[fix][broker] pattern regex error in PulsarLedgerManager cause zk data notification can not execute #23977

Open. Wants to merge 2 commits into master.

TakaHiR07
Contributor

@TakaHiR07 TakaHiR07 commented Feb 13, 2025

Motivation

A ledger's ZooKeeper path looks like "/ledgers/00/0601/L7170". However, the current pattern regex is wrong, so the ZK data notification is never executed.

ledgerPathRegex.matcher(n.getPath()).matches() is always false.
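
To make the failure mode concrete, here is a minimal sketch of a pattern that `matches()` a hierarchical ledger path. The class name, the pattern, and the `isLedgerPath` helper are illustrative assumptions, not the code in PulsarLedgerManager. Note that `Matcher.matches()` requires the whole input to match, so a pattern that covers only part of the path always returns false.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; names and pattern are assumptions for illustration only.
final class LedgerPathRegexSketch {
    // "/ledgers", zero or more numeric directory levels, then a leaf named "L<digits>".
    static final Pattern LEDGER_PATH = Pattern.compile("/ledgers/(?:\\d+/)*L\\d+");

    static boolean isLedgerPath(String path) {
        // matches() succeeds only if the entire path fits the pattern.
        return LEDGER_PATH.matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLedgerPath("/ledgers/00/0601/L7170")); // true
        System.out.println(isLedgerPath("/ledgers/00/0601"));       // false: intermediate node
    }
}
```

The ledgers root is hard-coded here for brevity; a real implementation would likely build the pattern from the configured root path.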

[Image: WeCom screenshot]

Modifications

Use the correct pattern.

Alternative modification: remove the check in handleDataNotification(), since getLedgerId(n.getPath()) would throw an error if the path is not a ledger path.
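
The alternative relies on the ID parser rejecting non-ledger paths by itself. A rough sketch of that idea follows, assuming the ledger ID is the concatenation of the digits in the path components below the ledgers root. The `parseLedgerId` helper and its signature are hypothetical and are not the actual getLedgerId implementation.

```java
// Hypothetical sketch; parseLedgerId is an assumed helper, not the real getLedgerId.
final class LedgerIdFromPathSketch {
    static long parseLedgerId(String ledgersRoot, String path) {
        if (!path.startsWith(ledgersRoot + "/")) {
            throw new IllegalArgumentException("Not under ledgers root: " + path);
        }
        String[] parts = path.substring(ledgersRoot.length() + 1).split("/");
        String leaf = parts[parts.length - 1];
        if (!leaf.startsWith("L")) {
            throw new IllegalArgumentException("Not a ledger node: " + path);
        }
        // Concatenate the directory levels and the leaf without its "L" prefix.
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < parts.length - 1; i++) {
            digits.append(parts[i]);
        }
        digits.append(leaf.substring(1));
        // NumberFormatException is a subclass of IllegalArgumentException.
        return Long.parseLong(digits.toString());
    }

    public static void main(String[] args) {
        System.out.println(parseLedgerId("/ledgers", "/ledgers/00/0601/L7170")); // 6017170
    }
}
```

With a parser like this, the caller could catch the exception and skip the notification instead of pre-filtering with a regex. The trade-off is that exceptions become part of the normal control flow for non-ledger nodes.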

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment
  • The public API

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@github-actions github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) Feb 13, 2025

@thetumbled
Member

Some tests are required to ensure the notification logic is triggered and works correctly.

@lhotari
Member

lhotari commented Feb 18, 2025

> A ledger's ZooKeeper path looks like "/ledgers/00/0601/L7170". However, the current pattern regex is wrong, so the ZK data notification is never executed.

Great catch @TakaHiR07. What is the current impact of this in Pulsar & Bookkeeper (which is using PulsarLedgerManager in the Pulsar distribution of Bookkeeper)?

@TakaHiR07
Contributor Author

> Great catch @TakaHiR07. What is the current impact of this in Pulsar & Bookkeeper (which is using PulsarLedgerManager in the Pulsar distribution of Bookkeeper)?

One impact is that none of the asyncOpenLedgerNoRecovery calls in Pulsar can successfully register a MetadataListener. The code is here: https://github.com/apache/bookkeeper/blob/606db747eae9856fed0aeb3f16ef01e7c9254e26/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadOnlyLedgerHandle.java#L95-L105

I am not sure whether other places use PulsarLedgerManager and register a ZK listener.

@lhotari
Member

lhotari commented Feb 18, 2025

> Some tests are required to ensure the notification logic is triggered and works correctly.

@thetumbled That's right that there should be tests, but this just shows that the original code didn't have proper test coverage if it's currently broken.

One possible resolution would be to add an issue report about the missing test coverage and add the tests later. That moment usually never comes, but it's also bad to have this issue around.

@lhotari
Member

lhotari commented Feb 18, 2025

> Great catch @TakaHiR07. What is the current impact of this in Pulsar & Bookkeeper (which is using PulsarLedgerManager in the Pulsar distribution of Bookkeeper)?
>
> One impact is that none of the asyncOpenLedgerNoRecovery calls in Pulsar can successfully register a MetadataListener. The code is here: https://github.com/apache/bookkeeper/blob/606db747eae9856fed0aeb3f16ef01e7c9254e26/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadOnlyLedgerHandle.java#L95-L105
>
> I am not sure whether other places use PulsarLedgerManager and register a ZK listener.

I wonder what parts of the metadata could change. My guess is LAC (lastAddConfirmed) and length based on this:
https://github.com/apache/bookkeeper/blob/54bdc0d60b32830b513089167cee67f52f4735eb/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L367-L370 .
I would assume that this would be relevant when the ledger is in recovery state.
States:
https://github.com/apache/bookkeeper/blob/2192caaf9738cf4efb799647cc5a5f68bf1823b2/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/api/LedgerMetadata.java#L154-L167

@TakaHiR07
Contributor Author

TakaHiR07 commented Feb 18, 2025

> Great catch @TakaHiR07. What is the current impact of this in Pulsar & Bookkeeper (which is using PulsarLedgerManager in the Pulsar distribution of Bookkeeper)?
>
> One impact is that none of the asyncOpenLedgerNoRecovery calls in Pulsar can successfully register a MetadataListener. The code is here: https://github.com/apache/bookkeeper/blob/606db747eae9856fed0aeb3f16ef01e7c9254e26/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadOnlyLedgerHandle.java#L95-L105
> I am not sure whether other places use PulsarLedgerManager and register a ZK listener.
>
> I wonder what parts of the metadata could change. My guess is LAC (lastAddConfirmed) and length based on this: https://github.com/apache/bookkeeper/blob/54bdc0d60b32830b513089167cee67f52f4735eb/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L367-L370 . I would assume that this would be relevant when the ledger is in recovery state. States: https://github.com/apache/bookkeeper/blob/2192caaf9738cf4efb799647cc5a5f68bf1823b2/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/api/LedgerMetadata.java#L154-L167

@lhotari I think that if the ledger is in the recovery state, the LAC would change. But we should not rely on asyncOpenLedgerNoRecovery to register a ZK metadata listener for that; instead, we should use asyncOpenLedger to update the ledger handle's metadata. That is not a problem, since it does not rely on ZK.

But if the ledger is already closed and BookKeeper auto-recovery is then triggered by a disk error, the ledger's quorum would change, and the ledger's ZK node would change as well.

Actually, I found this issue while fixing another one; see #21552.

@codecov-commenter

codecov-commenter commented Feb 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.20%. Comparing base (bbc6224) to head (8959e2e).
Report is 929 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #23977      +/-   ##
============================================
+ Coverage     73.57%   74.20%   +0.62%     
+ Complexity    32624    32278     -346     
============================================
  Files          1877     1853      -24     
  Lines        139502   143871    +4369     
  Branches      15299    16350    +1051     
============================================
+ Hits         102638   106753    +4115     
+ Misses        28908    28729     -179     
- Partials       7956     8389     +433     
Flag Coverage Δ
inttests 26.72% <100.00%> (+2.14%) ⬆️
systests 23.21% <100.00%> (-1.11%) ⬇️
unittests 73.73% <100.00%> (+0.88%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...ulsar/metadata/bookkeeper/PulsarLedgerManager.java 57.01% <100.00%> (+8.33%) ⬆️

... and 1042 files with indirect coverage changes

4 participants