source-mssql: Duplicate rows for same LSN #24206

Closed

sivankumar86
Contributor

What

Describe what the change is solving
Multiple rows can share the same commit LSN, which makes it hard to determine which row is the latest.

How

Describe the solution
There is an extra column that identifies the change sequence, but it was not included in the Airbyte output.

https://debezium.io/documentation/reference/stable/connectors/sqlserver.html

"The connector sorts the changes that it reads in ascending order, based on the values of their commit LSN and change LSN. This sorting order ensures that the changes are replayed by Debezium in the same order in which they occurred in the database."
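
As a rough illustration of that ordering (not the code changed in this PR), the sketch below keeps only the newest row per key by comparing the pair (commit LSN, change LSN). The field names `commit_lsn`, `change_lsn`, and `id` are assumptions made for the example, as is the premise that LSNs arrive as fixed-width hex strings, so plain string comparison matches their numeric order.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def order_key(row: Row) -> Tuple[str, str]:
    # Assumed field names; fixed-width hex strings compare correctly as text.
    return (row["commit_lsn"], row["change_lsn"])


def latest_per_key(rows: Iterable[Row], key: str = "id") -> List[Row]:
    """Keep, for each key value, the row with the greatest (commit LSN, change LSN)."""
    latest: Dict[str, Row] = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or order_key(row) > order_key(current):
            latest[row[key]] = row
    return list(latest.values())


rows = [
    # Two changes to id=1 in the same transaction: same commit LSN, different change LSN.
    {"id": "1", "commit_lsn": "00000027:00000758:0005",
     "change_lsn": "00000027:00000758:0002", "name": "old"},
    {"id": "1", "commit_lsn": "00000027:00000758:0005",
     "change_lsn": "00000027:00000758:0003", "name": "new"},
    {"id": "2", "commit_lsn": "00000027:00000700:0001",
     "change_lsn": "00000027:00000700:0001", "name": "only"},
]
deduplicated = latest_per_key(rows)
```

With the commit LSN alone, the two `id=1` rows tie; the change LSN is what breaks the tie.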

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Community member or Airbyter

https://discuss.airbyte.io/t/duplicated-registries-when-syncing-from-mssql-cdc/3752/7

@sivankumar86 sivankumar86 requested a review from a team as a code owner March 17, 2023 20:20
@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues bounty community connectors/source/mssql area/documentation Improvements or additions to documentation labels Mar 17, 2023
@marcosmarxm
Member

@grishick can you check this contribution and add it to the source team backlog for future review?

@sherifnada sherifnada removed the request for review from a team March 29, 2023 17:30
@evantahler evantahler removed request for a team April 17, 2023 16:27
@sivankumar86
Contributor Author

@grishick @prateekmukhedkar any update on this?

@plenti-jacob-roe
Contributor

plenti-jacob-roe commented May 5, 2023

@marcosmarxm @grishick @prateekmukhedkar Any updates on this?

@sivankumar86
Contributor Author

@marcosmarxm did you get a chance to look into this?

@sivankumar86
Contributor Author

/test connector=connectors/source-mssql

@sivankumar86
Contributor Author

@sashaNeshcheret Could you take a look at this PR?

@sivankumar86
Contributor Author

@marcosmarxm it has been more than 5 months. Could someone take a look at this PR?

@sivankumar86 sivankumar86 changed the title Source-Mssql: Duplicate rows for same LSN source-mssql: Duplicate rows for same LSN Jul 5, 2023
@sivankumar86
Contributor Author

@akashkulk Could you review this PR?

@jrolom jrolom added needs-triage team/db-dw-sources Backlog for Database and Data Warehouse Sources team labels Jul 7, 2023
@prateekmukhedkar
Contributor

@sivankumar86 I acknowledge the delay on this PR. The reason is that we changed how cursor fields are defined for CDC-related syncs, so that when data is published to the destination, the destination connector can use this cursor field to de-duplicate rows with the same LSN. The change brings CDC syncs in line with the Airbyte protocol.

We made this change for the Postgres source connector in #27442. It specifies _ab_cdc_lsn as a source-defined cursor. We will need to follow the same approach for MySQL as well: use _ab_cdc_lsn, which will result in a breaking change. I will discuss with the team and get back to you on the next steps. Thank you for your understanding.

@sivankumar86
Contributor Author

@prateekmukhedkar Thanks for taking a look. We may need to either combine two fields from SQL Server CDC to create the _ab_cdc_lsn column, or change the deduplication logic to use more than one field. I am taking the second approach, since I use a custom dbt job to deduplicate, and it uses two columns.
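
For the first option, a minimal sketch of how two LSNs could be folded into one sortable cursor value is shown below. It is only an illustration, not what the connector does: the function names are hypothetical, and it assumes each LSN is a colon-separated hex string that encodes a 10-byte value (20 hex digits), such as `00000027:00000758:0005`.

```python
def lsn_to_hex(lsn: str) -> str:
    """Normalize an LSN string to 20 lowercase hex digits (assumed 10-byte LSN)."""
    hex_digits = lsn.replace(":", "").lower()
    if len(hex_digits) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(hex_digits)}: {lsn!r}")
    int(hex_digits, 16)  # raises ValueError if the string is not valid hex
    return hex_digits


def combined_cursor(commit_lsn: str, change_lsn: str) -> str:
    """Concatenate commit LSN and change LSN into one fixed-width string.

    Because both halves have a fixed width, comparing the results as text
    orders rows by commit LSN first and by change LSN second.
    """
    return lsn_to_hex(commit_lsn) + lsn_to_hex(change_lsn)


first = combined_cursor("00000027:00000758:0005", "00000027:00000758:0002")
second = combined_cursor("00000027:00000758:0005", "00000027:00000758:0003")
later_commit = combined_cursor("00000027:00000760:0001", "00000027:00000760:0001")
```

A single combined value like this would let a destination deduplicate on one cursor field, whereas the second option keeps the two columns separate and compares them as a pair.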

@octavia-squidington-iii octavia-squidington-iii removed area/documentation Improvements or additions to documentation area/connectors Connector related issues labels Aug 6, 2023
Labels
community connectors/source/mssql needs-triage team/db-dw-sources Backlog for Database and Data Warehouse Sources team

8 participants