Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cdc: fix debezium shutdown interruption bug #24166

Merged
merged 17 commits into from
Mar 28, 2023

Conversation

subodh1810
Copy link
Contributor

@subodh1810 subodh1810 commented Mar 16, 2023

For issue https://github.com/airbytehq/oncall/issues/1596
See this comment for detailed explanation

@subodh1810 subodh1810 self-assigned this Mar 16, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Mar 16, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (8)

Connector Version Changelog Publish
source-alloydb 2.0.13
source-alloydb-strict-encrypt 2.0.13 🔵
(ignored)
🔵
(ignored)
source-mssql 1.0.8
source-mssql-strict-encrypt 1.0.8 🔵
(ignored)
🔵
(ignored)
source-mysql 2.0.11
source-mysql-strict-encrypt 2.0.11 🔵
(ignored)
🔵
(ignored)
source-postgres 2.0.13
source-postgres-strict-encrypt 2.0.13 🔵
(ignored)
🔵
(ignored)
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.

diff seed version
The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

@subodh1810 subodh1810 marked this pull request as ready for review March 22, 2023 09:10
@sergio-ropero
Copy link
Contributor

Why don't you just trigger the shutdown in a different thread and let Airbyte main thread to finish the consumption of the queue as usual?
I just did a quick read on the changes and it's super confusing without any details on the PR or comments in the code itself. Do we really need this type of classes: DebeziumShutdownProcedure<ChangeEvent<String, String>>?

Copy link
Contributor

@sergio-ropero sergio-ropero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!! hard to tackle with the spaghetti code we have... 🤦

@@ -55,7 +55,8 @@ protected Properties getDebeziumProperties() {
props.setProperty("max.queue.size", "8192");

props.setProperty("errors.max.retries", "10");
props.setProperty("errors.retry.delay.initial.ms", "1000");
props.setProperty("errors.retry.delay.initial.ms", "300");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing these delays timeouts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cause in our Airbyte code we only wait for 60 seconds for Debezium to return a record. So Debezium needs to hurry and restart in case of an error. If it waits for 10 seconds, there is a possibility by the time it restarts, we would ask it to shutdown anyway

while (!debeziumShutdownProcedure.getRecordsRemainingAfterShutdown().isEmpty()) {
final ChangeEvent<String, String> event;
try {
event = debeziumShutdownProcedure.getRecordsRemainingAfterShutdown().poll(10, TimeUnit.SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't keep reading from the regular queue make space for any outstanding ChangeEvents after shutdown?
If not we should discard these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WE cant read form the regular queue cause the reading happens inside the main thread and the main thread pauses once we request shutdown and thus the reading stops. The records can not be discarded.
#24166 (comment)

continue;
}
hasSnapshotFinished = hasSnapshotFinished(Jsons.deserialize(event.value()));
return event;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are events received after shutdown still going to be inside the bounds of our target LSN?
If later than we should discard it, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can not/should not discard it cause it's still relevant data. Debezium updates its offset uptill the point it has fetched records so we need to process these records. See this

try {
T event = sourceQueue.poll(10, TimeUnit.SECONDS);
if (event != null) {
targetQueue.put(event);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: use offer to avoid blocking / error in case target queue is at capacity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The targetQueue is not bound by size so it should not be at capacity.

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4541777327
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4541777327
No Python unittests run

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:578: The previous and actual discovered catalogs are identical.
=================== 68 passed, 5 skipped in 81.70s (0:01:21) ===================

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/test connector=connectors/source-mssql

🕑 connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/4541918364
✅ connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/4541918364
No Python unittests run

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestConnection.test_check: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestDiscovery.test_discover: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestBasicRead.test_read: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestFullRefresh.test_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
================= 22 passed, 6 skipped, 30 warnings in 18.18s ==================

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/test connector=connectors/source-mysql

🕑 connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/4541919472
✅ connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/4541919472
No Python unittests run

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestConnection.test_check: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestDiscovery.test_discover: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestBasicRead.test_read: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestFullRefresh.test_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
================= 22 passed, 6 skipped, 30 warnings in 17.27s ==================

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/publish connector=connectors/source-alloydb

🕑 Publishing the following connectors:
connectors/source-alloydb
https://github.com/airbytehq/airbyte/actions/runs/4542255199


Connector Did it publish? Were definitions generated?
connectors/source-alloydb

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/publish connector=connectors/source-alloydb-strict-encrypt

🕑 Publishing the following connectors:
connectors/source-alloydb-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/4542256236


Connector Did it publish? Were definitions generated?
connectors/source-alloydb-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/publish connector=connectors/source-postgres-strict-encrypt

🕑 Publishing the following connectors:
connectors/source-postgres-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/4542257119


Connector Did it publish? Were definitions generated?
connectors/source-postgres-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/publish connector=connectors/source-postgres

🕑 Publishing the following connectors:
connectors/source-postgres
https://github.com/airbytehq/airbyte/actions/runs/4542257803


Connector Did it publish? Were definitions generated?
connectors/source-postgres

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/publish connector=connectors/source-mysql-strict-encrypt

🕑 Publishing the following connectors:
connectors/source-mysql-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/4542261852


Connector Did it publish? Were definitions generated?
connectors/source-mysql-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/publish connector=connectors/source-mysql

🕑 Publishing the following connectors:
connectors/source-mysql
https://github.com/airbytehq/airbyte/actions/runs/4542262664


Connector Did it publish? Were definitions generated?
connectors/source-mysql

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/publish connector=connectors/source-mssql

🕑 Publishing the following connectors:
connectors/source-mssql
https://github.com/airbytehq/airbyte/actions/runs/4542264576


Connector Did it publish? Were definitions generated?

THe publish was successful but action hit rate limit as a result didn't update the comment

Run peter-evans/create-or-update-comment@v1
Error: API rate limit exceeded for installation ID 6913263.

Screenshot 2023-03-28 at 7 21 24 PM

@subodh1810
Copy link
Contributor Author

subodh1810 commented Mar 28, 2023

/publish connector=connectors/source-mssql-strict-encrypt run-tests=false

🕑 Publishing the following connectors:
connectors/source-mssql-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/4542265582


Connector Did it publish? Were definitions generated?
connectors/source-mssql-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@subodh1810 subodh1810 merged commit 29a7e2e into master Mar 28, 2023
@subodh1810 subodh1810 deleted the fix-cdc-debezium-shutdown-issue branch March 28, 2023 14:17
@ifun248
Copy link

ifun248 commented May 11, 2023

@subodh1810 Hi Subodh1810, I am testing source MySQL 2.0.17, and find an interesting problem maybe is related to this bug. refs #25988.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants