Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Consumer stuck when delete subscription __compaction failed #23980

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Feb 13, 2025

Motivation

Background

  • The steps of unsubscribing __compaction
    • Delete compacted ledger
    • Delete cursor
    • Remove subscription

Issue 1: consumer will stuck if deleting cursor(the step 2 above) failed

  • Delete compacted ledger
  • Delete cursor failed
  • Reload the topic
  • Compactor subscription relates to a deleted ledger
  • Consumers are stuck because the compacted ledger has been deleted, and the messages were lost. The broker will keep printing the error log below
2025-02-13T17:23:51,448 - ERROR - [broker-topic-workers-OrderedExecutor-4-0:PersistentDispatcherSingleActiveConsumer] - [persistent://public/default/tp-74ea581c-d7ca-4558-999b-804b6e33faee / reader-801f712e61-Consumer{subscription=PersistentSubscription{topic=persistent://public/default/tp-74ea581c-d7ca-4558-999b-804b6e33faee, name=reader-801f712e61}, consumerId=1, consumerName=Ti6va, address=[id: 0xce1b338f, L:/127.0.0.1:52193 - R:/127.0.0.1:52199] [SR:127.0.0.1, state:Connected]}] Error reading entries at 11:0 : org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies - Retrying to read in 15.0 seconds

You can reproduce the issue by the test testReadMsgsAfterDisableCompaction(true)


issue 2
The compaction task can concurrently execute by deleting the cursor __compaction

Modifications

  • Fix issue 1
  • Fix issue 2
  • Print an error log if a compactor subscription is initialized twice, which may cause a ledger lost/

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode self-assigned this Feb 13, 2025
@poorbarcode poorbarcode added the type/bug The PR fixed a bug or issue reported a bug label Feb 13, 2025
@poorbarcode poorbarcode added this to the 4.1.0 milestone Feb 13, 2025
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 13, 2025
@@ -245,6 +245,7 @@ public static boolean isDedupCursorName(String name) {
private static final Long COMPACTION_NEVER_RUN = -0xfebecffeL;
private volatile CompletableFuture<Long> currentCompaction = CompletableFuture.completedFuture(
COMPACTION_NEVER_RUN);
private volatile AtomicBoolean disablingCompaction = new AtomicBoolean(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private volatile AtomicBoolean disablingCompaction = new AtomicBoolean(false);
private final AtomicBoolean disablingCompaction = new AtomicBoolean(false);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I have changed the modifier

if (e != null) {
log.warn("[{}][{}] Last compaction task failed", topic, subscriptionName);
// Avoid concurrently execute compaction and unsubscribing.
synchronized (this) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there really a need for a synchronized block here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is. It prevents the modifying of the variable currentCompaction.

Comment on lines 401 to 402
AtomicBoolean disablingCompaction =
WhiteboxImpl.getInternalState(persistentTopic, "disablingCompaction");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid using reflection/WhiteboxImpl. A better approach is to have a default access method which is annotation with @VisibleForTesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

WhiteboxImpl.getInternalState(persistentTopic, "currentCompaction");
persistentTopic.triggerCompaction();
CompletableFuture<Long> currentCompaction2 =
WhiteboxImpl.getInternalState(persistentTopic, "currentCompaction");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, avoid reflection/WhiteboxImpl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

// Avoid concurrently execute compaction and unsubscribing.
synchronized (this) {
if (!disablingCompaction.compareAndSet(false, true)) {
unsubscribeFuture.completeExceptionally(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be protected with if (!unsubscribeFuture.isDone()) { so that creating the exception could be avoided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you mention the code line? I could not find it.

@lhotari lhotari changed the title [fix][broker]Consumer stuck when delete subscription __compaction failed [fix][broker] Consumer stuck when delete subscription __compaction failed Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs release/3.0.11 release/3.3.6 release/4.0.4 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants