Source Salesforce capped records to page_size value for incremental syncs #13657
Labels
connectors/source/salesforce
lang/python
team/connectors-python
team/extensibility
type/bug
Environment
Current Behavior
This only happens for streams with more records than `page_size`. Today `page_size` is 30k records, so the issue is not reproducible from our integration account without the following test steps:
Sync using Salesforce connector version 1.0.9 with only the ObjectPermission stream, in dedup + history sync mode.

Reset the connection before each run, because this is an incremental sync.
Change `page_size=1000` here:
airbyte/airbyte-integrations/connectors/source-salesforce/source_salesforce/streams.py
Lines 151 to 155 in e52b656
Sync using a dev build with that change: the sync is capped at 1k records.

The reason this happens:
Airbyte compares the `count` variable with `page_size` to decide whether the current page is the last one.
airbyte/airbyte-integrations/connectors/source-salesforce/source_salesforce/streams.py
Lines 383 to 391 in e52b656
It works this way because Salesforce doesn't return a direct next-page token that identifies the last page. In theory it should behave as follows: with 70k records, the sync is split into three batches of 30k + 30k + 10k, and the last batch stops the sync because its record count is lower than `page_size`.
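To make the intended stop condition concrete, here is a minimal sketch of the arithmetic only. `pages_fetched` is a hypothetical helper written for this issue, not connector code, and it assumes each request returns at most `page_size` records.

```python
from typing import List


def pages_fetched(total_records: int, page_size: int) -> List[int]:
    """Return the size of each page read until a short page ends the loop."""
    sizes: List[int] = []
    remaining = total_records
    while True:
        page = min(page_size, remaining)
        sizes.append(page)
        remaining -= page
        if page < page_size:
            # Fewer records than page_size: treat this as the last page.
            break
    return sizes


print(pages_fetched(70_000, 30_000))  # [30000, 30000, 10000]
```

Note that with this rule a total that is an exact multiple of `page_size` costs one extra, empty request before the loop stops.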
The problem is that `count` is not the total number of records read for the page but the chunk size (which is 100).
airbyte/airbyte-integrations/connectors/source-salesforce/source_salesforce/streams.py
Lines 304 to 316 in e52b656
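The effect can be reproduced in isolation. The sketch below is a simplified stand-in, not the code at e52b656: it assumes the reader yields `(index_within_chunk, row)` pairs and that the caller reuses the loop variable as its record count. `read_with_chunks` and `last_count_seen` are illustrative names here.

```python
from typing import Iterator, List, Tuple


def read_with_chunks(rows: List[dict], chunk_size: int = 100) -> Iterator[Tuple[int, dict]]:
    """Yield (position within the current chunk, row); the position restarts at 1 per chunk."""
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        for n, row in enumerate(chunk, 1):
            yield n, row


def last_count_seen(page_rows: List[dict], chunk_size: int = 100) -> int:
    """Mimic the suspected bug: the loop variable overwrites the initial count."""
    count = 0
    for count, _row in read_with_chunks(page_rows, chunk_size):
        pass
    return count


page_size = 1000
full_page = [{"id": i} for i in range(page_size)]
count = last_count_seen(full_page)
print(count, count < page_size)  # 100 True
```

Because the value left in `count` can never exceed the chunk size, a full page of 1000 records is still treated as the last page, which matches the cap observed above.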
One option is to set the chunk size to exactly the page size, but this could cause OOM errors for streams with many columns. In my opinion, the better fix is to increment a counter here (note there is already a `count`, but it is overwritten by the for loop):
airbyte/airbyte-integrations/connectors/source-salesforce/source_salesforce/streams.py
Lines 383 to 386 in e52b656
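A minimal sketch of the suggested counter follows, under stated assumptions: the reader yields `(index_within_chunk, row)` pairs, and pages are addressed by a plain offset purely for illustration (the connector pages differently). `sync`, `make_fetch`, and `read_with_chunks` are hypothetical names, not the connector's API.

```python
from typing import Callable, Iterator, List, Tuple


def read_with_chunks(rows: List[dict], chunk_size: int = 100) -> Iterator[Tuple[int, dict]]:
    """Yield (position within the current chunk, row), as in the simplified reader."""
    for start in range(0, len(rows), chunk_size):
        for n, row in enumerate(rows[start:start + chunk_size], 1):
            yield n, row


def sync(fetch_page: Callable[[int], List[dict]], page_size: int) -> List[dict]:
    """Read pages until one returns fewer than page_size records."""
    records: List[dict] = []
    offset = 0
    while True:
        count = 0
        # Ignore the per-chunk index and keep a running total for the page.
        for _n, row in read_with_chunks(fetch_page(offset)):
            count += 1
            records.append(row)
        if count < page_size:
            break  # short page: this was the last one
        offset += count
    return records


def make_fetch(total: int, page_size: int) -> Callable[[int], List[dict]]:
    """Build a fake page fetcher over `total` in-memory records."""
    data = [{"id": i} for i in range(total)]
    return lambda offset: data[offset:offset + page_size]


print(len(sync(make_fetch(2500, 1000), page_size=1000)))  # 2500
```

Keeping the chunk size at 100 preserves the memory profile, while the running total makes the `count < page_size` comparison reflect the whole page.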
Another example using version 1.0.9:

And after applying the counter as suggested above:

Expected Behavior
Sync all data in the first run
Logs
Steps to Reproduce
Are you willing to submit a PR?
Yes