[improve][broker] Don't call ManagedLedger#asyncAddEntry in Netty I/O thread #23983

BewareMyPower · 2025-02-13T12:21:51Z

Motivation

#23940 introduced a behavior change: the core logic of ManagedLedger#asyncAddEntry no longer switches threads, so it is now executed directly in the Netty I/O thread via PersistentTopic#asyncAddEntry.

The beforeAddEntry method calls the intercept and interceptWithNumberOfMessages methods for all broker entry interceptors and prepends a new broker entry metadata buffer to the original buffer (though it's just a composite buffer).

There is a risk that when many producers send messages to the same managed ledger concurrently, asyncAddEntry might block the Netty I/O thread for some time and cause a performance regression.

Modifications

Expose the getExecutor() method on ManagedLedger and, in PersistentTopic#publishMessage, execute ManagedLedger#asyncAddEntry in that executor. The change from #12606 is moved to PersistentTopic as well, so that the buffer is retained before switching to another thread.
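
As a rough illustration of why the buffer has to be retained before the thread switch, here is a minimal JDK-only sketch. `RetainBeforeHandoff`, `RefCountedBuffer` and `handOff` are hypothetical stand-ins invented for this example; they are not Pulsar or Netty classes. The latch forces the worst case, where the task runs only after the caller has already released its own reference.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class RetainBeforeHandoff {
    /** Toy reference-counted buffer; a hypothetical stand-in, not Netty's ByteBuf. */
    static final class RefCountedBuffer {
        private final AtomicInteger refCnt = new AtomicInteger(1);
        private final String payload;

        RefCountedBuffer(String payload) {
            this.payload = payload;
        }

        void retain() {
            refCnt.incrementAndGet();
        }

        void release() {
            refCnt.decrementAndGet();
        }

        String read() {
            if (refCnt.get() <= 0) {
                throw new IllegalStateException("buffer already released");
            }
            return payload;
        }
    }

    /** Hands the payload to another thread and returns what that thread read. */
    public static String handOff(String payload) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        RefCountedBuffer buffer = new RefCountedBuffer(payload);
        CountDownLatch callerReleased = new CountDownLatch(1);
        try {
            // Retain in the calling thread, before the task is submitted.
            buffer.retain();
            Future<String> result = executor.submit(() -> {
                // Force the worst case: run only after the caller has released.
                callerReleased.await();
                try {
                    return buffer.read();
                } finally {
                    buffer.release(); // drop the reference taken for the hand-off
                }
            });
            buffer.release(); // the caller gives up its own reference right away
            callerReleased.countDown();
            return result.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

If the `retain()` call were moved inside the submitted task, the reference count in this sketch would already be zero by the time the task reads the buffer.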

After that, only afterAddEntryToQueue is synchronized with the other synchronized methods of ManagedLedgerImpl. P.S. I don't think synchronized is actually needed here, but the logic is not as trivial as beforeAddEntryToQueue and beforeAddEntry, so I keep it synchronized.

ManagedLedgerImpl#asyncAddEntry still doesn't switch threads, so it is still possible for a downstream application to synchronize asyncAddEntry itself, either by adding a lock (e.g. synchronized) or by executing this method in a single thread.
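
The two downstream options mentioned above (a shared lock, or a single thread) can be sketched roughly as follows with JDK classes only. `SerializedAddSketch` and all of its members are hypothetical names for illustration; the unsynchronized `add` merely stands in for a method that expects its callers to serialize access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class SerializedAddSketch {
    // Stand-in for the pending-entry queue; deliberately not thread safe.
    private final List<String> entries = new ArrayList<>();

    // Unsynchronized on purpose: callers are responsible for serializing calls.
    void add(String entry) {
        entries.add(entry);
    }

    /** Option 1: every caller takes the same lock around add(). */
    public static List<String> withLock(int producers, int perProducer) {
        SerializedAddSketch ledger = new SerializedAddSketch();
        runProducers(producers, perProducer, entry -> {
            synchronized (ledger) {
                ledger.add(entry);
            }
        });
        return ledger.entries; // Thread.join() in runProducers makes the writes visible
    }

    /** Option 2: every caller hands add() to the same single thread. */
    public static List<String> withSingleThread(int producers, int perProducer) {
        SerializedAddSketch ledger = new SerializedAddSketch();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            runProducers(producers, perProducer, entry -> executor.execute(() -> ledger.add(entry)));
            // Read on the executor thread, after all queued adds have run.
            Callable<List<String>> snapshot = () -> new ArrayList<>(ledger.entries);
            return executor.submit(snapshot).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    // Starts `producers` threads that each emit "<id>-<i>" in increasing i, then joins them.
    private static void runProducers(int producers, int perProducer, Consumer<String> sink) {
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    sink.accept(id + "-" + i);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /** True if each producer's entries appear in the order that producer sent them. */
    public static boolean orderedPerProducer(List<String> entries, int producers) {
        int[] next = new int[producers];
        for (String entry : entries) {
            String[] parts = entry.split("-");
            int id = Integer.parseInt(parts[0]);
            if (Integer.parseInt(parts[1]) != next[id]++) {
                return false;
            }
        }
        return true;
    }
}
```

Both variants keep each producer's entries in send order; the interleaving between producers is not fixed in either case.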

Documentation

doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository

PR in forked repository: BewareMyPower#40

merlimat

An extra context switch for each entry is costly, especially when you have many small entries and little or no batching. That's why we put it on the same thread.

If the interceptor needs to do expensive work, maybe only the interceptor part should be done in a different thread, though it shouldn't affect it when we don't use interceptor.

lhotari · 2025-02-13T18:20:03Z

An extra context switch for each entry is costly, especially when you have many small entries and little or no batching. That's why we put it on the same thread.

@merlimat The thread switching was added in PR #9039, already in December 2020. The reason for this change is a performance concern about the #23940 changes, which removed the thread switching.

pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java

Lines 796 to 826 in ee5b13a

    
    public void asyncAddEntry(ByteBuf buffer, int numberOfMessages, AddEntryCallback callback, Object ctx) {
        if (log.isDebugEnabled()) {
            log.debug("[{}] asyncAddEntry size={} state={}", name, buffer.readableBytes(), state);
        }
        // retain buffer in this thread
        buffer.retain();
        // Jump to specific thread to avoid contention from writers writing from different threads
        final var addOperation = OpAddEntry.createNoRetainBuffer(this, buffer, numberOfMessages, callback, ctx,
                currentLedgerTimeoutTriggered);
        var added = false;
        try {
            // Use synchronized to ensure if `addOperation` is added to queue and fails later, it will be the first
            // element in `pendingAddEntries`.
            synchronized (this) {
                if (managedLedgerInterceptor != null) {
                    managedLedgerInterceptor.beforeAddEntry(addOperation, addOperation.getNumberOfMessages());
                }
                final var state = STATE_UPDATER.get(this);
                beforeAddEntryToQueue(state);
                pendingAddEntries.add(addOperation);
                added = true;
                afterAddEntryToQueue(state, addOperation);
            }
        } catch (Throwable throwable) {
            if (!added) {
                addOperation.failed(ManagedLedgerException.getManagedLedgerException(throwable));
            } // else: all elements of `pendingAddEntries` will fail in another thread
        }
    }

In Pulsar use cases, synchronization on CPU intensive operations (or blocking IO operations) in Netty IO threads could cause performance regressions. In this case, it would impact use cases where there's a large number of producers producing to a single topic.
Blocking IO threads will have a broader impact since it will impact Netty IO of all connections sharing the same IO thread.

Before #23940, the code looked like this:

pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java

Lines 796 to 810 in 7a79c78

    
    public void asyncAddEntry(ByteBuf buffer, int numberOfMessages, AddEntryCallback callback, Object ctx) {
        if (log.isDebugEnabled()) {
            log.debug("[{}] asyncAddEntry size={} state={}", name, buffer.readableBytes(), state);
        }
        // retain buffer in this thread
        buffer.retain();
        // Jump to specific thread to avoid contention from writers writing from different threads
        executor.execute(() -> {
            OpAddEntry addOperation = OpAddEntry.createNoRetainBuffer(this, buffer, numberOfMessages, callback, ctx,
                    currentLedgerTimeoutTriggered);
            internalAsyncAddEntry(addOperation);
        });
    }

btw. In the Pulsar code base, we have a problem in how IO threads are used. IO threads are used to process work that shouldn't be handled with IO threads at all. I have created an issue #23865. There should be a separate thread pool for running blocking operations and CPU intensive synchronized operations.
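
To make the idea of a separate pool for blocking work concrete, here is a small JDK-only sketch. It does not use Netty or any Pulsar API; `OffloadSketch`, the thread names `io-0` and `blocking`, and `slowTransform` are all assumptions made up for the example. A single-thread executor plays the role of an I/O thread, and the slow step is moved to a different pool before the result is handed back.

```java
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadSketch {
    public static String handle(String request) {
        // Hypothetical stand-ins: one "I/O" thread and a separate pool for blocking work.
        ExecutorService ioThread = Executors.newSingleThreadExecutor(r -> new Thread(r, "io-0"));
        ExecutorService blockingPool = Executors.newFixedThreadPool(2, r -> new Thread(r, "blocking"));
        try {
            return CompletableFuture
                    // Cheap work stays on the I/O thread.
                    .supplyAsync(() -> request.trim(), ioThread)
                    // The slow step leaves the I/O thread, which stays free meanwhile.
                    .thenApplyAsync(OffloadSketch::slowTransform, blockingPool)
                    // The result is handed back to the I/O thread for the reply.
                    .thenApplyAsync(s -> s + " via " + Thread.currentThread().getName(), ioThread)
                    .join();
        } finally {
            ioThread.shutdown();
            blockingPool.shutdown();
        }
    }

    // Simulates a blocking or CPU-heavy step with a short sleep.
    private static String slowTransform(String s) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return s.toUpperCase(Locale.ROOT);
    }
}
```

The extra hand-offs are exactly the per-entry context switches discussed earlier in this thread, so the sketch shows the mechanism rather than settling the trade-off.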

lhotari

Great work @BewareMyPower. Some comments added in this first pass.

pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java

lhotari · 2025-02-13T18:40:32Z

pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java

+            ledger.getExecutor().execute(() -> ledger.asyncAddEntry(buffer, (int) publishContext.getNumberOfMessages(),
+                    this, publishContext));


This should be an internal concern of the ManagedLedger implementation's asyncAddEntry method. One reason for this is that there are multiple asyncAddEntry signatures and we'd like to add them all in order.

BewareMyPower · Feb 14, 2025 (edited)

~~I think we should restore the synchronized keyword to the asyncAddEntry method to make it thread safe as it was before~~

Instead of that, I still think the synchronization should be performed from the caller. asyncAddEntry only needs to be synchronized with other asyncAddEntry or addEntry method calls. It does not need to be synchronized with the managed ledger's other synchronized methods. Let me improve the apiNotes parts.

lhotari · 2025-02-17T10:14:26Z

There are 2 different aspects to consider: thread safety and ordering.

Regarding "I still think the synchronization should be performed from the caller":
In Java, synchronization is not only about performing operations one by one under a mutually exclusive lock. "Visibility" is an important aspect of Java thread safety. That's why it doesn't make sense for callers to synchronize calls to asyncAddEntry since all callers would need to use the same lock for both ordering and thread safety.

Snippet from "Java Concurrency in Practice", Chapter 2 "Thread safety":

It is a common mistake to assume that synchronization needs to be used only when writing to shared variables; this is simply not true.
For each mutable state variable that may be accessed by more than one thread, all accesses to that variable must be performed with the same lock held. In this case, we say that the variable is guarded by that lock.

How a ManagedLedger implementation achieves ordering guarantees and thread safety is an internal implementation detail. In the case of ManagedLedger, it doesn't make sense to delegate the responsibility of thread safety to the caller.

Another downside of the ledger.getExecutor().execute solution is that it exposes internal implementation details that callers of the API must be aware of. This is not great API design when such implementation details are exposed.

I think we need to consider a different solution. I can see some code examples in #23940 of what needs to be solved. To me, it seems this could be solved with the Object ctx parameter by passing a ctx that is also understood by the interceptor. @BewareMyPower, would you be able to research that type of solution instead?
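
As a small illustration of the "guarded by the same lock" rule quoted above, the following JDK-only sketch keeps the lock private to the class, so callers neither see it nor need to agree on one. `GuardedEntries` and its members are hypothetical names for this example and do not reflect how ManagedLedgerImpl is actually structured.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedEntries {
    private final Object lock = new Object();
    // Both fields are guarded by `lock`: every read and write below holds it.
    private final List<Integer> entries = new ArrayList<>();
    private int nextSequence = 0;

    /** Assigns the next sequence number and appends it as one atomic step. */
    public int add() {
        synchronized (lock) {
            int seq = nextSequence++;
            entries.add(seq);
            return seq;
        }
    }

    /** Reads take the same lock, which is what makes earlier writes visible. */
    public List<Integer> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(entries);
        }
    }

    /** Runs concurrent writers against one instance and returns the final contents. */
    public static List<Integer> demo(int threads, int perThread) {
        GuardedEntries ledger = new GuardedEntries();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    ledger.add();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return ledger.snapshot();
    }
}
```

Because numbering and appending happen under one lock, the stored sequence is always 0, 1, 2, ... regardless of which thread called `add()`.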

BewareMyPower · 2025-02-14T03:05:03Z

@merlimat The thread switching was added in PR #9039, already in December 2020.

@merlimat @lhotari to correct it, this is the very early behavior introduced in #1521.

This PR intends to decouple ManagedLedger#asyncAddEntry and PersistentTopic#asyncAddEntry so that the managed ledger interface can be more flexible for the downstream protocol handlers to use.

After that, all write operations from Pulsar client will still keep the original behavior that switches to managed ledger's executor to call ManagedLedger#asyncAddEntry.

However, regarding the downstream, for example, in my Kafka protocol handler implementation, PersistentTopic#publishMessage is not called in an I/O thread. Instead, it's called in an independent worker thread. Then I can choose to call persistentTopic.getManagedLedger().asyncAddEntry(/* ... */) in order, which can be achieved by adding the synchronized keyword or using the same worker thread for the same topic.

The comment here makes sense to a certain extent, but it might be a new topic (e.g. thread switching vs. synchronized) to discuss, which is beyond the scope of this PR. At least, the existing thread switching approach can already achieve high publish performance, which is verified by many benchmarks.

lhotari

Please check the review comments

lhotari · 2025-02-17T09:47:08Z

managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java

        } catch (Throwable throwable) {
            if (!added) {
                addOperation.failed(ManagedLedgerException.getManagedLedgerException(throwable));
            } // else: all elements of `pendingAddEntries` will fail in another thread
        }
    }

-    protected void beforeAddEntryToQueue(State state) throws ManagedLedgerException {
+    protected void beforeAddEntryToQueue() throws ManagedLedgerException {
+        final var state = STATE_UPDATER.get(this);


using the STATE_UPDATER doesn't make any difference for plain reads of the field value.
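
A minimal sketch of that point, using the JDK's AtomicReferenceFieldUpdater on a volatile field: reading through the updater and reading the field directly are both volatile reads of the same variable, and the updater only adds something for atomic updates such as compareAndSet. `StateReadSketch` and its `State` enum are hypothetical and only mirror the naming in the diff.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

public class StateReadSketch {
    public enum State { Open, Closed }

    private static final AtomicReferenceFieldUpdater<StateReadSketch, State> STATE_UPDATER =
            AtomicReferenceFieldUpdater.newUpdater(StateReadSketch.class, State.class, "state");

    // The updater requires the target field to be volatile.
    private volatile State state = State.Open;

    /** Read through the updater. */
    public State readViaUpdater() {
        return STATE_UPDATER.get(this);
    }

    /** Direct read of the same volatile field. */
    public State readDirectly() {
        return state;
    }

    /** Atomic transition, which is where the updater is actually needed. */
    public boolean close() {
        return STATE_UPDATER.compareAndSet(this, State.Open, State.Closed);
    }
}
```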

lhotari · 2025-02-17T09:47:19Z

managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java

-    protected void afterAddEntryToQueue(State state, OpAddEntry addOperation) throws ManagedLedgerException {
+    // TODO: does this method really need to be synchronized?
+    protected synchronized void afterAddEntryToQueue(OpAddEntry addOperation) throws ManagedLedgerException {
+        final var state = STATE_UPDATER.get(this);


using the STATE_UPDATER doesn't make any difference for plain reads of the field value.

[improve][broker] Don't call ManagedLedger#asyncAddEntry in Netty I/O thread

0260324

BewareMyPower requested review from lhotari, codelipenghui, gaoran10, dao-jun and Demogorgon314 February 13, 2025 12:21

github-actions bot added the doc-not-needed label Feb 13, 2025

BewareMyPower self-assigned this Feb 13, 2025

BewareMyPower added the type/enhancement and release/4.0.3 labels Feb 13, 2025

BewareMyPower added 2 commits February 13, 2025 20:24

Add synchronized to afterAddEntryToQueue

67b31ea

Clal buffer.release() in asyncAddEntry

61dcac7

BewareMyPower marked this pull request as draft February 13, 2025 13:00

BewareMyPower added 2 commits February 13, 2025 21:06

Synchronize all addEntry operations

c0e844a

Fix testBrokerClosedProducerClientRecreatesProducerThenSendCommand

73fb146

BewareMyPower marked this pull request as ready for review February 13, 2025 13:17

BewareMyPower marked this pull request as draft February 13, 2025 14:07

Fix tests

d709574

BewareMyPower marked this pull request as ready for review February 13, 2025 14:09

merlimat reviewed Feb 13, 2025

lhotari added the release/blocker label Feb 13, 2025

lhotari reviewed Feb 13, 2025

BewareMyPower marked this pull request as draft February 14, 2025 01:56

BewareMyPower added 3 commits February 14, 2025 10:13

Fix memory leak of PersistentTopic#publishMessage

c4940d8

Improve API docs

5f920a9

Add more tests for the corner case

01656af

BewareMyPower marked this pull request as ready for review February 14, 2025 02:39

BewareMyPower marked this pull request as draft February 14, 2025 07:46

Fix tests

8106bac

BewareMyPower marked this pull request as ready for review February 14, 2025 07:48

lhotari requested changes Feb 17, 2025

BewareMyPower marked this pull request as draft February 17, 2025 10:57

BewareMyPower mentioned this pull request Feb 17, 2025

[revert] "[improve][ml] Do not switch thread to execute asyncAddEntry's core logic (#23940)" #23994

Merged

lhotari added release/4.0.4 and removed release/4.0.3 labels Feb 19, 2025
