🐛 Destination Snowflake: duplicate rows on retries when using incremental staging #8832

joshuataylor · 2021-12-16T07:46:16Z

Environment

Airbyte version: v0.33.12-alpha
OS Version / Instance: Ubuntu
Deployment: Docker
Source Connector and version: Postgres 0.3.17
Destination Connector and version: Snowflake 0.1.2
Severity: High
Step where error happened: Sync job

Current Behavior

When creating a new sync, if the sync fails and has to retry, the retry appends its rows to the stage again, on top of the rows the failed attempt already put there. As a result, the destination ends up with duplicate rows.

Expected Behavior

A retry should not produce duplicate rows in the destination.

Uploading data from stage:  stream xxx. schema public, tmp table _airbyte_tmp_boa_xxx, stage PUBLIC_AIRBYTE_RAW_XXX

Each sync should write to its own folder within the stage, as described here https://docs.snowflake.com/en/user-guide/data-load-local-file-system-stage.html:

put file:///data/data.csv @~/staged/<SOME_UUID>;

This way the UUID is used only for that sync attempt, and each retry should use a new UUID. On failure, the files under that UUID should be deleted.
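A minimal Python sketch of the proposal above. `build_attempt_statements` is a hypothetical helper, not connector code: it only builds the Snowflake SQL strings (PUT, COPY INTO, REMOVE) scoped to a per-attempt folder derived from a fresh UUID. The stage name, table name, and CSV file format are illustrative assumptions, not what the Snowflake destination actually runs.

```python
import uuid
from typing import Dict, Optional


def build_attempt_statements(
    stage: str, table: str, local_file: str, attempt_id: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical helper: scope every staging statement to one attempt's folder."""
    # A fresh UUID per attempt means a retry never reuses a failed attempt's folder.
    attempt_id = attempt_id or str(uuid.uuid4())
    folder = f"@{stage}/{attempt_id}"
    return {
        # Upload the local file into this attempt's folder only.
        "put": f"PUT file://{local_file} {folder};",
        # Load only the files under this attempt's folder, not the whole stage.
        "copy": f"COPY INTO {table} FROM {folder}/ FILE_FORMAT = (TYPE = CSV);",
        # On failure, delete this attempt's files so nothing is left behind.
        "cleanup": f"REMOVE {folder}/;",
    }
```

For example, `build_attempt_statements("~/staged", "my_table", "/data/data.csv", "attempt-1")["put"]` yields `PUT file:///data/data.csv @~/staged/attempt-1;`. Passing a fixed `attempt_id` is only for illustration; under the proposal each retry would omit it and get a new UUID.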

Attempt 1:

XXX GB | XXXX records | 1h 36m 14s
/tmp/workspace/3/0/logs.log.

Attempt 2:

XX.XX GB | XXXX records | 1h 41m 54s
/tmp/workspace/3/1/logs.log.

Steps to Reproduce

Create a new Snowflake (SF) destination
At the end, while it is inserting into SF (or at any point during the process), cancel the query in SF
It then retries (good!), but the result has duplicate rows.

Are you willing to submit a PR?

Maybe?


sherifnada · 2021-12-17T00:43:02Z

@joshuataylor if I understand correctly, the problem is that all files in the stage are loaded, rather than only the files from this particular sync, correct?

joshuataylor · 2021-12-17T00:46:32Z

Correct. The files stay in the stage, so when a sync fails and retries, the retry adds new files to the stage, and the rows from retry 1 and retry 2 end up duplicated.
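To make the failure mode concrete, here is a toy in-memory model in plain Python, with no Snowflake involved. It assumes, as described in this thread, that files from a failed attempt stay in the stage and that the load step reads everything in the stage. `Stage` and `run_sync` are hypothetical names for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class Stage:
    """Toy stand-in for a Snowflake stage: maps a folder name to staged rows."""

    def __init__(self) -> None:
        self.folders: Dict[str, List[int]] = defaultdict(list)

    def put(self, folder: str, rows: List[int]) -> None:
        self.folders[folder].extend(rows)

    def load(self, folder: Optional[str] = None) -> List[int]:
        # folder=None mimics loading every file in the stage.
        if folder is None:
            return [row for rows in self.folders.values() for row in rows]
        return list(self.folders[folder])


def run_sync(scope_to_attempt: bool) -> List[int]:
    rows = [1, 2, 3]
    stage = Stage()
    # Attempt 1 stages all rows, then fails before loading (e.g. the query is cancelled).
    stage.put("attempt-1" if scope_to_attempt else "shared", rows)
    # Attempt 2 (the retry) stages the same rows again.
    folder = "attempt-2" if scope_to_attempt else "shared"
    stage.put(folder, rows)
    # Load either the whole stage or only the retry's own folder.
    return stage.load(folder if scope_to_attempt else None)
```

With a shared stage, `run_sync(False)` returns `[1, 2, 3, 1, 2, 3]`, which is the duplication reported here; scoping to the attempt's folder, `run_sync(True)` returns `[1, 2, 3]`.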

VitaliiMaltsev · 2021-12-23T11:46:49Z

@joshuataylor please advise how to cancel a query in Snowflake. I believe we would need to know the query ID for that.

VitaliiMaltsev · 2021-12-23T12:11:27Z

@joshuataylor please ignore my previous comment. Already found needed approach

joshuataylor added needs-triage type/bug Something isn't working labels Dec 16, 2021

octavia-squidington-iii added the community label Dec 16, 2021

alafanechere changed the title ~~When syncing to Snowflake with Incremental Staging, retries will duplicate rows~~ 🐛 Destination Snowflake: Incremental Staging, retries will duplicate rows Dec 16, 2021

alafanechere added area/connectors Connector related issues and removed needs-triage labels Dec 16, 2021

alafanechere changed the title ~~🐛 Destination Snowflake: Incremental Staging, retries will duplicate rows~~ 🐛 Destination Snowflake: duplicate rows on retries when using incremental staging Dec 16, 2021

alafanechere added the priority/critical Critical priority! label Dec 16, 2021

sherifnada added this to GL Roadmap Dec 17, 2021

sherifnada moved this to Prioritized for Scoping in GL Roadmap Dec 17, 2021

VitaliiMaltsev self-assigned this Dec 17, 2021

alexandr-shegeda moved this from Prioritized for Scoping to Ready for implementation in GL Roadmap Dec 17, 2021

VitaliiMaltsev moved this from Ready for implementation to Implementation in progress in GL Roadmap Dec 23, 2021

VitaliiMaltsev linked a pull request Dec 28, 2021 that will close this issue

Destination Snowflake : fixed duplicate rows on retries #9141

Merged


VitaliiMaltsev moved this from Implementation in progress to Internal review in GL Roadmap Dec 28, 2021

VitaliiMaltsev moved this from Internal review to Airbyte review in GL Roadmap Dec 28, 2021

VitaliiMaltsev closed this as completed in #9141 Jan 10, 2022

VitaliiMaltsev moved this from Airbyte review to Done in GL Roadmap Jan 11, 2022

karinakuz added connectors/destinations-warehouse connectors/destination/snowflake labels Jan 17, 2022
