-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] maxMessagePublishBufferSizeInMB permits leak can stall and timeout connections #23921
Comments
This is also a location where permits can leak: pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Producer.java Lines 198 to 205 in 3d0625b
|
similar problems in all of these completedSendOperation calls: pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Producer.java Lines 214 to 264 in 3d0625b
Well these contain another additional problem. The "permits" get returned without being even acquired. |
Search before asking
Read release policy
Version
master branch code analysis
Minimal reproduce step
There's currently an issue that the org.apache.pulsar.broker.service.ServerCnx#completedSendOperation might not get called in error cases.
The impact of this is that message publishing could stop for all connections using a particular IO thread.
The broker
maxMessagePublishBufferSizeInMB
limit is split into amaxPendingBytesPerThread
limit:pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Lines 342 to 343 in 3fce309
The pending bytes is incremented in sending:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Line 3357 in 3fce309
It is decremented in ServerCnx#completedSendOperation method:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Lines 3376 to 3377 in 3fce309
If the call to decrement is missing, there will be a leak which will eventually cause all message publishing to stop for all connections using a particular IO thread.
The leak happens here:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
Lines 732 to 749 in 2a9d4ac
There should be a call to MessagePublishContext#completed for all exception cases. ServerCnx#completedSendOperation gets called for exception path in MessagePublishContext#completed here:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Producer.java
Lines 480 to 499 in 3d0625b
The other exception cases contain the required call to
callback.completed
which will call ServerCnx#completedSendOperation:pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
Lines 776 to 794 in 2a9d4ac
What did you expect to see?
There shouldn't be a leak in
maxPendingBytesPerThread
permits which eventually leads to message publishing stopping for all connections using a particular IO thread.What did you see instead?
Based on the analysis of the code, there's a leak.
Anything else?
This might be related to issue #23920
A heap dump could be used to check if the issue applies. This can be done by searching
org.apache.pulsar.broker.service.ServerCnx$PendingBytesPerThreadTracker
instances in the heap dump and checking thependingBytes
andlimitExceeded
field values.Are you willing to submit a PR?
The text was updated successfully, but these errors were encountered: