cdc: fix debezium shutdown interruption bug #24166
Conversation
Affected Connector Report

NOTE

Connector | Version | Changelog | Publish
---|---|---|---
source-alloydb | 2.0.13 | ✅ | ✅
source-alloydb-strict-encrypt | 2.0.13 | 🔵 (ignored) | 🔵 (ignored)
source-mssql | 1.0.8 | ✅ | ✅
source-mssql-strict-encrypt | 1.0.8 | 🔵 (ignored) | 🔵 (ignored)
source-mysql | 2.0.11 | ✅ | ✅
source-mysql-strict-encrypt | 2.0.11 | 🔵 (ignored) | 🔵 (ignored)
source-postgres | 2.0.13 | ✅ | ✅
source-postgres-strict-encrypt | 2.0.13 | 🔵 (ignored) | 🔵 (ignored)
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Destinations (0)
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Other Modules (0)
Actionable Items
(click to expand)

Category | Status | Actionable Item
---|---|---
Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector.
Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish | ⚠ not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.
Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.
Why don't you just trigger the shutdown in a different thread and let the Airbyte main thread finish consuming the queue as usual?
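The reviewer's suggestion above, as a hedged sketch (class, queue contents, and sentinel are hypothetical illustrations, not code from this PR): request shutdown on a separate thread while the main thread keeps draining the queue until the engine signals it is done.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the reviewer's suggestion: trigger shutdown on a
// separate thread so the main thread can keep consuming the queue as usual.
public class AsyncShutdownSketch {

  public static int drainWithAsyncShutdown(BlockingQueue<String> queue) throws InterruptedException {
    // Simulate the engine flushing a final record during shutdown and then
    // signalling completion with a sentinel value.
    Thread shutdownTrigger = new Thread(() -> {
      try {
        queue.put("last-record");
        queue.put("SENTINEL"); // engine finished shutting down
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    shutdownTrigger.start();

    int consumed = 0;
    while (true) {
      String event = queue.poll(10, TimeUnit.SECONDS);
      if (event == null || "SENTINEL".equals(event)) {
        break;
      }
      consumed++; // main thread keeps processing records during shutdown
    }
    shutdownTrigger.join();
    return consumed;
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    queue.put("record-1");
    queue.put("record-2");
    System.out.println(drainWithAsyncShutdown(queue)); // prints 3
  }
}
```

The catch, as the author explains further down, is that in the real code the main thread pauses once shutdown is requested, so it cannot keep servicing the queue like this.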
...ebezium/src/main/java/io/airbyte/integrations/debezium/internals/DebeziumRecordIterator.java
.../src/test/java/io/airbyte/integrations/debezium/internals/DebeziumShutdownProcedureTest.java
Nice work!! hard to tackle with the spaghetti code we have... 🤦
@@ -55,7 +55,8 @@ protected Properties getDebeziumProperties() {
    props.setProperty("max.queue.size", "8192");

+   props.setProperty("errors.max.retries", "10");
-   props.setProperty("errors.retry.delay.initial.ms", "1000");
+   props.setProperty("errors.retry.delay.initial.ms", "300");
Why change these retry delays and timeouts?
Because in our Airbyte code we only wait 60 seconds for Debezium to return a record, Debezium needs to hurry and restart in case of an error. If it waits 10 seconds between retries, there is a possibility that by the time it restarts, we would have asked it to shut down anyway.
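The timing constraint described above can be sketched as follows (a toy illustration, not the PR's code; the 60-second figure comes from the comment, the method name is invented): the consumer waits a bounded time for the next record, and a timed-out poll is what leads to a shutdown request, so Debezium's error-retry backoff must fit inside that window.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hedged sketch of the timing constraint: the consumer waits a bounded time
// for the next record; if Debezium spends too long in error-retry backoff,
// the poll times out and shutdown is requested instead.
public class PollTimeoutSketch {

  public static boolean shouldRequestShutdown(BlockingQueue<String> queue,
                                              long waitMillis) throws InterruptedException {
    // A null result means no record arrived within the window, which the
    // caller treats as "no more data, shut the engine down".
    return queue.poll(waitMillis, TimeUnit.MILLISECONDS) == null;
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Empty queue and a short window: the poll times out.
    System.out.println(shouldRequestShutdown(queue, 50)); // prints true
    queue.put("change-event");
    System.out.println(shouldRequestShutdown(queue, 50)); // prints false
  }
}
```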
while (!debeziumShutdownProcedure.getRecordsRemainingAfterShutdown().isEmpty()) {
  final ChangeEvent<String, String> event;
  try {
    event = debeziumShutdownProcedure.getRecordsRemainingAfterShutdown().poll(10, TimeUnit.SECONDS);
Wouldn't continuing to read from the regular queue make space for any outstanding ChangeEvents after shutdown? If not, we should discard these.
We can't read from the regular queue because the reading happens inside the main thread, and the main thread pauses once we request shutdown, so the reading stops. The records cannot be discarded.
#24166 (comment)
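The constraint just described motivates a shutdown procedure along these lines (a simplified sketch assuming names seen in the snippets above; the transfer logic and timeouts are illustrative, not the PR's exact implementation): a dedicated thread moves whatever is still in the engine's bounded source queue into a separate unbounded queue, so the engine can finish flushing even though the main thread is paused.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Simplified sketch of a shutdown drain: a dedicated thread transfers events
// from the engine's bounded source queue into an unbounded target queue, so
// the engine is never blocked on a full queue while shutting down.
public class ShutdownDrainSketch<T> {

  private final BlockingQueue<T> sourceQueue;
  // Unbounded, so transfers never block on capacity.
  private final BlockingQueue<T> recordsRemainingAfterShutdown = new LinkedBlockingQueue<>();
  private final ExecutorService transfer = Executors.newSingleThreadExecutor();

  public ShutdownDrainSketch(BlockingQueue<T> sourceQueue) {
    this.sourceQueue = sourceQueue;
  }

  public BlockingQueue<T> getRecordsRemainingAfterShutdown() {
    return recordsRemainingAfterShutdown;
  }

  public void initiateShutdown() throws InterruptedException {
    transfer.submit(() -> {
      try {
        T event;
        // Keep transferring until the source queue stays empty briefly.
        while ((event = sourceQueue.poll(100, TimeUnit.MILLISECONDS)) != null) {
          recordsRemainingAfterShutdown.put(event);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    transfer.shutdown();
    transfer.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```

After the drain, the main thread can consume `getRecordsRemainingAfterShutdown()` at its own pace, which is what the `poll(10, TimeUnit.SECONDS)` loop above is doing.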
    continue;
  }
  hasSnapshotFinished = hasSnapshotFinished(Jsons.deserialize(event.value()));
  return event;
Are events received after shutdown still going to be inside the bounds of our target LSN? If they are later, we should discard them, no?
We cannot and should not discard it because it's still relevant data. Debezium advances its offset up to the point it has fetched records, so we need to process these records. See this
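The offset argument above can be made concrete with a toy illustration (everything here is invented for illustration, not Debezium's actual offset machinery): once the engine's committed offset has advanced past a record's position, a restart resumes after that position, so a fetched-but-discarded record is never delivered again.

```java
import java.util.List;

// Toy illustration: once the offset is committed at position N, a restart
// only re-reads positions strictly greater than N. Records fetched but
// discarded before processing would therefore be lost for good.
public class OffsetSketch {

  public static List<Integer> nextRunReads(List<Integer> allPositions, int committedOffset) {
    return allPositions.stream().filter(p -> p > committedOffset).toList();
  }

  public static void main(String[] args) {
    List<Integer> log = List.of(1, 2, 3, 4, 5);
    // Suppose positions 1..4 were fetched (offset committed at 4) but 3 and 4
    // were discarded instead of processed: the next run starts at 5, so 3
    // and 4 are never seen again.
    System.out.println(nextRunReads(log, 4)); // prints [5]
  }
}
```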
try {
  T event = sourceQueue.poll(10, TimeUnit.SECONDS);
  if (event != null) {
    targetQueue.put(event);
Suggestion: use offer to avoid blocking or erroring in case the target queue is at capacity.
The targetQueue is not bounded by size, so it should never be at capacity.
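The offer-versus-put distinction in this exchange can be demonstrated directly (a generic `java.util.concurrent` illustration, not code from the PR): `offer` returns false immediately on a full bounded queue where `put` would block, while on an unbounded `LinkedBlockingQueue`, as the author notes for targetQueue, `offer` always succeeds and the distinction is moot.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Demonstrates offer() vs put(): offer() fails fast on a full bounded queue,
// while an unbounded LinkedBlockingQueue accepts elements until memory runs
// out, so offer() on it effectively always succeeds.
public class QueueCapacitySketch {

  public static void main(String[] args) {
    BlockingQueue<String> bounded = new ArrayBlockingQueue<>(1);
    bounded.offer("a");
    System.out.println(bounded.offer("b")); // prints false: at capacity

    BlockingQueue<String> unbounded = new LinkedBlockingQueue<>();
    unbounded.offer("a");
    System.out.println(unbounded.offer("b")); // prints true: no capacity bound
  }
}
```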
/test connector=connectors/source-postgres
Build Passed. Test summary info:
/test connector=connectors/source-mssql
Build Passed. Test summary info:
/test connector=connectors/source-mysql
Build Passed. Test summary info:
/publish connector=connectors/source-alloydb
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-alloydb-strict-encrypt
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-postgres-strict-encrypt
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-postgres
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-mysql-strict-encrypt
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-mysql
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-mssql
The publish was successful, but the action hit a rate limit and as a result didn't update the comment.
/publish connector=connectors/source-mssql-strict-encrypt run-tests=false
if you have connectors that successfully published but failed definition generation, follow step 4 here
Hi @subodh1810, I am testing source-mysql 2.0.17 and found an interesting problem that may be related to this bug. refs #25988
For issue https://github.com/airbytehq/oncall/issues/1596
See this comment for detailed explanation