Description
In the Instagram - Posts section of the available Instagram scrapers, there are two endpoints: Collect by URL and Discover by URL. The Collect by URL endpoint retrieves information about specific posts whose URLs we already have, while the Discover by URL endpoint retrieves information about recent posts given one or more public profile URLs.
According to the API playground, these two endpoints actually use the same API route and differ only in their query arguments.
- Collect by URL:
  https://api.brightdata.com/datasets/v3/scrape?dataset_id=gd_lk5ns7kz21pck8jpis&notify=false&include_errors=true
- Discover by URL:
  https://api.brightdata.com/datasets/v3/scrape?dataset_id=gd_lk5ns7kz21pck8jpis&notify=false&include_errors=true&type=discover_new&discover_by=url
Note the type=discover_new&discover_by=url query arguments in the Discover by URL endpoint. Without these, a request sent to the same route will be interpreted as a request for Collect by URL.
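To make the difference concrete, here is a minimal sketch that builds both playground URLs from the same route and checks whether a given URL carries the Discover by URL arguments. The helper names `build_scrape_url` and `is_discover_request` are hypothetical (they are not part of the SDK); the route, dataset ID, and query arguments are copied from the playground URLs above. A check like `is_discover_request` could be applied to whatever URL the SDK actually sends.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Route and dataset ID copied from the API playground URLs above.
BASE_URL = "https://api.brightdata.com/datasets/v3/scrape"
DATASET_ID = "gd_lk5ns7kz21pck8jpis"


def build_scrape_url(discover: bool) -> str:
    """Hypothetical helper: build the Collect by URL or Discover by URL request URL.

    Both modes share one route; Discover by URL only appends
    type=discover_new and discover_by=url.
    """
    params = {
        "dataset_id": DATASET_ID,
        "notify": "false",
        "include_errors": "true",
    }
    if discover:
        params["type"] = "discover_new"
        params["discover_by"] = "url"
    return f"{BASE_URL}?{urlencode(params)}"


def is_discover_request(url: str) -> bool:
    """Hypothetical helper: True only if both Discover by URL arguments are present."""
    query = parse_qs(urlparse(url).query)
    return query.get("type") == ["discover_new"] and query.get("discover_by") == ["url"]


print(build_scrape_url(discover=False))
print(build_scrape_url(discover=True))
```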
Now consider this error response I received while trying to use the SDK to request data from the Discover by URL endpoint:
{'success': False, 'cost': None, 'error': 'Trigger failed: Trigger failed (HTTP 400): {"error":"Invalid input provided","code":"validation_error","type":"validation","line":"{\\"url\\":\\"https://www.instagram.com/*****\\",\\"num_of_posts\\":3,\\"post_type\\":\\"post\\"}","index":1,"errors":[["num_of_posts","This input should not contain a num_of_posts field"],["post_type","This input should not contain a post_type field"]]}', 'trigger_sent_at': datetime.datetime(2026, 1, 12, 7, 6, 21, 716138, tzinfo=datetime.timezone.utc), 'data_fetched_at': datetime.datetime(2026, 1, 12, 7, 6, 22, 12345, tzinfo=datetime.timezone.utc), 'url': '', 'status': 'error', 'data': None, 'snapshot_id': None, 'platform': 'instagram', 'method': 'web_scraper', 'root_domain': None, 'snapshot_id_received_at': None, 'snapshot_polled_at': [], 'html_char_size': None, 'row_count': None, 'field_count': None}
For clarity, the error messages it contains are:
- This input should not contain a num_of_posts field
- This input should not contain a post_type field
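The per-field messages above are buried inside the SDK's `error` string. A minimal sketch for pulling them out, assuming the string always has the shape seen in the response above (a text prefix followed by the API's JSON error body with an `errors` list of `[field, message]` pairs); `extract_field_errors` is a hypothetical helper, not an SDK function.

```python
import json


def extract_field_errors(error: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (field, message) pairs from the SDK error string.

    Assumes the JSON error body starts at the first "{" in the string,
    e.g. 'Trigger failed: Trigger failed (HTTP 400): {...}'.
    """
    start = error.find("{")
    if start == -1:
        return []  # no JSON body present
    # raw_decode parses one JSON value starting at `start` and ignores any trailing text.
    payload, _ = json.JSONDecoder().raw_decode(error, start)
    return [(field, message) for field, message in payload.get("errors", [])]
```

Applied to the `error` value of the response above, this yields the `num_of_posts` and `post_type` pairs listed.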
These messages confuse me because those fields are documented as part of the Discover by URL endpoint and can be used in the API playground. However, they are not accepted by the Collect by URL endpoint, so I suspect the SDK is not adding the type=discover_new&discover_by=url query arguments to the request, causing the API to interpret it as a Collect by URL request.
As a temporary workaround, I can just send the request directly without the SDK wrapper, but it's quite inconvenient given that the SDK handles the asynchronous snapshot polling very neatly. Hoping this can be fixed soon.
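For reference, the direct-request workaround might look roughly like the sketch below, which only builds the request (it does not send it or poll the snapshot). Several parts are assumptions rather than confirmed API details: the helper name `build_discover_request` is hypothetical, Bearer-token auth in the `Authorization` header is assumed, and the body is assumed to be a JSON array with one input object per profile URL using the `url`, `num_of_posts`, and `post_type` fields from the rejected input above. The exact body shape should be checked against the API playground.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_URL = "https://api.brightdata.com/datasets/v3/scrape"


def build_discover_request(api_token: str, profile_urls: list[str],
                           num_of_posts: int, post_type: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a Discover by URL request."""
    query = urlencode({
        "dataset_id": "gd_lk5ns7kz21pck8jpis",
        "notify": "false",
        "include_errors": "true",
        # The two arguments the SDK appears to omit.
        "type": "discover_new",
        "discover_by": "url",
    })
    # Assumption: the body is a JSON array with one input object per profile URL.
    inputs = [
        {"url": url, "num_of_posts": num_of_posts, "post_type": post_type}
        for url in profile_urls
    ]
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            # Assumption: Bearer-token auth with a Bright Data API token.
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it (not executed here):
# with urllib.request.urlopen(build_discover_request(token, urls, 3, "post")) as resp:
#     print(resp.read())
```

This covers only the trigger; the snapshot polling that the SDK handles would still have to be reimplemented by hand, which is what makes the workaround inconvenient.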