-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When syncs fail, normalization shouldn't run for full refresh streams #9663
Comments
@sherifnada @grishick is there plans to take on this issue, or perhaps a way I could contribute to see this come together? |
After a brief check, there was some work done already to skip normalization when the sync didn't persist any data, but it has been reverted by #17574: airbyte/airbyte-workers/src/main/java/io/airbyte/workers/temporal/sync/SyncWorkflowImpl.java Lines 133 to 137 in 00fb4bb
Would it be the potential fix for this issue, or at least the place to add the proper fix? The functionality has been initially added in #9672 but then reverted because the normalization is relied upon to clean the data (more details in #17578). In any case, it's important to link this discussion and #17578, as the outcome of either of them might affect the other. |
Closing this issue - normalization is gone, and we are going to change the full refresh behavior via https://github.com/airbytehq/airbyte-internal-issues/issues/7475 |
Current Behavior
If a sync fails, it doesn’t make sense to normalize a full refresh stream which failed during the sync. It incurs cost needlessly on the destination, especially because that stream will just get re-normalized anyways in the sync attempt. Plus, since the whole sync job failed, the user probably won't trust the output in the destination until the whole sync succeeds.
But currently there’s no way to know which full refresh streams failed and which succeeded.
So the tradeoff given the current constraints is we either don’t normalize any FR streams on sync job failure, or we normalize all of them. I’m leaning to the former: don’t normalize any FR streams on a failure, because the customer won’t trust the result anyway if
they see the red X saying failure.
Expected Behavior
If a sync job fails, normalization should not run for streams configured for full refresh.
Alternatively, if we can have a way to know which streams succeeded (like a STREAM SYNC BEGIN and STREAM SYNC END message emitted by the source) it can help on this, and also other things like memory management in the destination but would be a bigger protocol level addition.
The text was updated successfully, but these errors were encountered: