Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When syncs fail, normalization shouldn't run for full refresh streams #9663

Closed
sherifnada opened this issue Jan 20, 2022 · 3 comments
Closed
Labels
autoteam frozen Not being actively worked on normalization team/destinations Destinations team's backlog team/platform-move type/bug Something isn't working

Comments

@sherifnada
Copy link
Contributor

Current Behavior

If a sync fails, it doesn’t make sense to normalize a full refresh stream which failed during the sync. It incurs cost needlessly on the destination, especially because that stream will just get re-normalized anyways in the sync attempt. Plus, since the whole sync job failed, the user probably won't trust the output in the destination until the whole sync succeeds.

But currently there’s no way to know which full refresh streams failed and which succeeded.

So the tradeoff given the current constraints is we either don’t normalize any FR streams on sync job failure, or we normalize all of them. I’m leaning to the former: don’t normalize any FR streams on a failure, because the customer won’t trust the result anyway if
they see the red X saying failure.

Expected Behavior

If a sync job fails, normalization should not run for streams configured for full refresh.

Alternatively, if we can have a way to know which streams succeeded (like a STREAM SYNC BEGIN and STREAM SYNC END message emitted by the source) it can help on this, and also other things like memory management in the destination but would be a bigger protocol level addition.

@killthekitten
Copy link
Contributor

@sherifnada @grishick is there plans to take on this issue, or perhaps a way I could contribute to see this come together?

@killthekitten
Copy link
Contributor

killthekitten commented Jan 11, 2023

After a brief check, there was some work done already to skip normalization when the sync didn't persist any data, but it has been reverted by #17574:

LOGGER.info("No records to normalize detected");
// Normalization skip has been disabled: issue #5417
// LOGGER.info("Skipping normalization because there are no records to normalize.");
// continue;
}

Would it be the potential fix for this issue, or at least the place to add the proper fix?

The functionality has been initially added in #9672 but then reverted because the normalization is relied upon to clean the data (more details in #17578).

In any case, it's important to link this discussion and #17578, as the outcome of either of them might affect the other.

@evantahler
Copy link
Contributor

Closing this issue - normalization is gone, and we are going to change the full refresh behavior via https://github.com/airbytehq/airbyte-internal-issues/issues/7475

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autoteam frozen Not being actively worked on normalization team/destinations Destinations team's backlog team/platform-move type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants