Source FB Marketing: improve insights jobs reliability & runtime #8282
This is an investigation of the Facebook Marketing performance issue, based on logs provided by a customer. Here is a plot of the number of records produced over the course of the running read operation:
Based on this plot and the logs, we can draw some conclusions:
How Ads Insights is implemented now
We use async calls to schedule at most 10 async jobs, each with the minimal date range (1 day), so at any time there are running jobs covering the next 10 days. Results are read sequentially, starting from the earliest day, and the next job is appended to the queue only after the first job completes. If a job fails, it is rescheduled, up to 5 times. If a job has not completed within ~30 minutes of being scheduled, it is considered failed and rescheduled.
Pros:
Cons:
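To make the head-of-line behaviour of the current scheme concrete, here is a minimal Python simulation of it. It models timing only (no Facebook API calls); the helper names and the rule that an attempt longer than the timeout is abandoned and immediately retried are simplifying assumptions, not the connector's actual code.

```python
# Timing-only simulation of the current Ads Insights scheduling.
# All numbers are minutes; no real Facebook API calls are made.
MAX_ACTIVE_JOBS = 10   # at most 10 async jobs, one day each
MAX_ATTEMPTS = 5       # a failed job is rescheduled, up to 5 attempts here
JOB_TIMEOUT_MIN = 30   # a job still running after ~30 minutes counts as failed


def job_finish_time(start: float, attempt_durations: list) -> float:
    """Return when a job's first successful attempt finishes.

    Each entry is how long that attempt would take; an attempt longer than
    the timeout is abandoned at the timeout and the next attempt starts.
    """
    t = start
    for duration in attempt_durations[:MAX_ATTEMPTS]:
        if duration <= JOB_TIMEOUT_MIN:
            return t + duration
        t += JOB_TIMEOUT_MIN
    raise RuntimeError("job failed after all attempts")


def sequential_read_times(days: list) -> list:
    """Time at which each day's results are read when reading strictly in order."""
    finish = {}
    # Start the first batch of one-day jobs at minute 0.
    for day in range(min(MAX_ACTIVE_JOBS, len(days))):
        finish[day] = job_finish_time(0, days[day])
    next_day = len(finish)
    clock = 0.0
    read_times = []
    for day in range(len(days)):
        # Head-of-line blocking: wait for this day even if later days are done.
        clock = max(clock, finish.pop(day))
        read_times.append(clock)
        # Only now is the next day's job added to the queue.
        if next_day < len(days):
            finish[next_day] = job_finish_time(clock, days[next_day])
            next_day += 1
    return read_times
```

For example, if day 0 needs a retry and finishes at minute 35 while days 1 to 9 finish at minute 2, all ten days are read at minute 35, and the job for day 10 is only started then.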
Proposed solutions
Improve existing approach
The main downside of the existing approach is that it waits for the current job to complete before proceeding to the next jobs, even though those may already have completed. I'm not sure this is a serious downside, because according to this there are limits on the number of rows in the response and on the number of data points required to compute the total, but no limit on the number of active jobs is mentioned (besides the fact that a job ID expires after 30 days). To improve it we need:
I'm not sure how this would work with small sets of data; it needs additional investigation.
Only one async job at a time
We could run only one async job at a time, with an adjustable date range based on the X-Business-Use-Case header, which reports the amount of resources used by the previous request. This is the easiest option to implement and should work fine for both large and small sets of data.
A lot of async jobs but no particular read order
This is similar to the existing approach, but instead of reading results sequentially, we process the first job that has completed and spawn the next one. It would require additional logic to keep the state consistent.
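The last option can be sketched with a simplified timing model (each job's duration in minutes, no API calls, no retries). Reading in completion order amounts to a priority queue keyed by finish time. For state consistency, the sketch assumes one possible rule: the saved state only advances to the latest day for which every earlier day has also been read, so a restart never skips a day whose job was still running. The function name and this checkpoint rule are hypothetical, not taken from the connector.

```python
import heapq

MAX_ACTIVE_JOBS = 10  # same cap on concurrent async jobs as today


def completion_order_reads(durations: list) -> tuple:
    """Read whichever job finishes first and immediately spawn the next day.

    durations[i] is how long (minutes) the job for day i takes. Returns the
    order in which days are read and, after each read, the latest day that is
    safe to save as state (every day up to and including it has been read).
    """
    running = []  # min-heap of (finish_time, day)
    next_day = 0
    while next_day < min(MAX_ACTIVE_JOBS, len(durations)):
        heapq.heappush(running, (durations[next_day], next_day))
        next_day += 1

    done = set()
    checkpoint = -1  # -1 means no day can be committed yet
    read_order, checkpoints = [], []
    while running:
        clock, day = heapq.heappop(running)  # first job to complete
        read_order.append(day)
        done.add(day)
        # Only advance over a contiguous run of finished days.
        while checkpoint + 1 in done:
            checkpoint += 1
        checkpoints.append(checkpoint)
        # Spawn the next day's job as soon as a slot frees up.
        if next_day < len(durations):
            heapq.heappush(running, (clock + durations[next_day], next_day))
            next_day += 1
    return read_order, checkpoints
```

With durations `[35] + [2] * 10`, days 1 to 10 are read first, but the checkpoint stays at -1 until day 0 finishes, then jumps to 10. The gain is that later days are no longer blocked; the cost is the extra bookkeeping for the checkpoint.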
I have tested the improved connector on short and long intervals (up to 1 year). There are still minor defects that need to be fixed:
Created a new PR to give a better overview of the final changes: #9805
Tell us about the problem you're trying to solve
Placeholder for https://github.com/airbytehq/oncall/issues/32 and https://github.com/airbytehq/oncall/issues/31
The *_insights streams for FB provide a suboptimal user experience for a few reasons:
Right now, our approach for tuning jobs is: come up with a tuning configuration / selection of columns and cross our fingers that it works. If it does, we call it case closed (but don't necessarily learn why it failed previously). If it fails or takes too long, there is no clear answer as to what went wrong. It seems really wrong that we're having these issues with one of the biggest ad platforms in the world. We really need to rethink how we interact with the FB API.
Some questions:
A fantastic outcome of this investigation would be:
Basically: make it work, then make it fast.
The linked issues at the beginning of the ticket contain more information about the Airbyte users facing this problem and where to find their instances.