SDK incorrectly handles Instagram Posts Discover by URL API #21

@builderpepc

Under the Instagram - Posts section in the available scrapers for Instagram, there are two endpoints: Collect by URL and Discover by URL. The Collect by URL endpoint is used for retrieving information about specific posts for which we already have URLs, while the Discover by URL endpoint is used for retrieving information about recent posts given one or more public profile URLs.

According to the API playground, these two endpoints use the same API route and differ only in their query arguments.

  • Collect by URL: https://api.brightdata.com/datasets/v3/scrape?dataset_id=gd_lk5ns7kz21pck8jpis&notify=false&include_errors=true
  • Discover by URL: https://api.brightdata.com/datasets/v3/scrape?dataset_id=gd_lk5ns7kz21pck8jpis&notify=false&include_errors=true&type=discover_new&discover_by=url

Note the type=discover_new&discover_by=url query arguments in the Discover by URL endpoint. Without these, a request sent to the same route will be interpreted as a request for Collect by URL.
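To make the difference concrete, here is a minimal sketch that builds both URLs from the values shown above. The function name `endpoint_url` and the constant names are mine, purely for illustration; they are not part of the SDK.

```python
from urllib.parse import urlencode

# Route and query arguments copied from the API playground URLs above.
BASE = "https://api.brightdata.com/datasets/v3/scrape"
COMMON = {
    "dataset_id": "gd_lk5ns7kz21pck8jpis",
    "notify": "false",
    "include_errors": "true",
}
# The only difference between the two endpoints.
DISCOVER_EXTRA = {"type": "discover_new", "discover_by": "url"}


def endpoint_url(discover: bool) -> str:
    """Build the Collect by URL or Discover by URL request URL (hypothetical helper)."""
    params = dict(COMMON)
    if discover:
        params.update(DISCOVER_EXTRA)
    return f"{BASE}?{urlencode(params)}"


print(endpoint_url(discover=False))
print(endpoint_url(discover=True))
```

If the two extra arguments are dropped anywhere between the caller and the HTTP request, the result is indistinguishable from a Collect by URL request.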

Now consider this error response I received while trying to use the SDK to request data from the Discover by URL endpoint:

{'success': False, 'cost': None, 'error': 'Trigger failed: Trigger failed (HTTP 400): {"error":"Invalid input provided","code":"validation_error","type":"validation","line":"{\\"url\\":\\"https://www.instagram.com/*****\\",\\"num_of_posts\\":3,\\"post_type\\":\\"post\\"}","index":1,"errors":[["num_of_posts","This input should not contain a num_of_posts field"],["post_type","This input should not contain a post_type field"]]}', 'trigger_sent_at': datetime.datetime(2026, 1, 12, 7, 6, 21, 716138, tzinfo=datetime.timezone.utc), 'data_fetched_at': datetime.datetime(2026, 1, 12, 7, 6, 22, 12345, tzinfo=datetime.timezone.utc), 'url': '', 'status': 'error', 'data': None, 'snapshot_id': None, 'platform': 'instagram', 'method': 'web_scraper', 'root_domain': None, 'snapshot_id_received_at': None, 'snapshot_polled_at': [], 'html_char_size': None, 'row_count': None, 'field_count': None}

For clarity, the validation messages in that response are:

  • This input should not contain a num_of_posts field
  • This input should not contain a post_type field
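For anyone else debugging this, the messages can be pulled out of the SDK's `error` string programmatically. This is only a sketch: it assumes the string keeps the shape shown above (a text prefix followed by the API's JSON body), and `validation_errors` is a hypothetical helper, not an SDK function. The sample string is a shortened copy of the one I received.

```python
import json


def validation_errors(error_text: str) -> list[tuple[str, str]]:
    """Return (field, message) pairs from an error string shaped like the one above."""
    # Assumption: the JSON body starts at the first "{" after the text prefix.
    start = error_text.find("{")
    if start == -1:
        return []
    try:
        payload = json.loads(error_text[start:])
    except json.JSONDecodeError:
        return []
    return [(field, message) for field, message in payload.get("errors", [])]


sample = (
    "Trigger failed: Trigger failed (HTTP 400): "
    '{"error":"Invalid input provided","code":"validation_error",'
    '"errors":[["num_of_posts","This input should not contain a num_of_posts field"],'
    '["post_type","This input should not contain a post_type field"]]}'
)

for field, message in validation_errors(sample):
    print(f"{field}: {message}")
```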

These messages are confusing because both fields are documented for the Discover by URL endpoint and work in the API playground. The Collect by URL endpoint, however, does not accept them, so I assume the SDK is omitting the type=discover_new&discover_by=url query arguments from the request, which causes the API to interpret it as a Collect by URL request.

As a temporary workaround, I can just send the request directly without the SDK wrapper, but it's quite inconvenient given that the SDK handles the asynchronous snapshot polling very neatly. Hoping this can be fixed soon.
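For reference, the workaround looks roughly like the sketch below. Several things here are assumptions on my part rather than confirmed API behavior: that the route accepts a POST with a JSON array of inputs, that authentication is a Bearer token in the `Authorization` header, and the placeholder token and profile URL. The input fields (`url`, `num_of_posts`, `post_type`) are the ones that appear in the error response above. `build_discover_request` is a hypothetical helper name.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE = "https://api.brightdata.com/datasets/v3/scrape"
# Query arguments copied from the Discover by URL playground URL.
DISCOVER_PARAMS = {
    "dataset_id": "gd_lk5ns7kz21pck8jpis",
    "notify": "false",
    "include_errors": "true",
    "type": "discover_new",
    "discover_by": "url",
}


def build_discover_request(token, profile_urls, num_of_posts, post_type="post"):
    """Build (but do not send) a Discover by URL request. Hypothetical helper."""
    # Assumption: the body is a JSON array with one object per profile URL.
    inputs = [
        {"url": url, "num_of_posts": num_of_posts, "post_type": post_type}
        for url in profile_urls
    ]
    return urllib.request.Request(
        f"{BASE}?{urlencode(DISCOVER_PARAMS)}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            # Assumption: Bearer-token authentication.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; nothing is sent until urlopen() is called on the request.
req = build_discover_request("YOUR_API_TOKEN", ["https://www.instagram.com/example/"], 3)
print(req.get_method(), req.full_url)
```

Sending it is then `urllib.request.urlopen(req)`. This does not cover the asynchronous snapshot polling that the SDK handles, which is the part I would like to keep using.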
