Skip to content
This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Unbounded queue in ThresholdBundler.closedBundles #157

Closed
mattnworb opened this issue Nov 30, 2016 · 9 comments
Closed

Unbounded queue in ThresholdBundler.closedBundles #157

mattnworb opened this issue Nov 30, 2016 · 9 comments
Assignees
Labels
blocking GAX GA 🚨 This issue needs some love. triage me I really want to be triaged.

Comments

@mattnworb
Copy link

Copying over some possible issues found in googleapis/google-cloud-java#1432:

  1. com.google.api.gax.bundling.ThresholdBundler has an unbounded queue for closedBundles that will fill endlessly if an application is publishing faster than the queue can be drained, or if draining/publishing can't occur at all for some reason. We've seen on a few instances (3-5 applications out of 1000s) that this eventually leads to a OutOfMemoryError as the heap space fills up.

  2. It seems like if the thread that ThresholdBundlingForwarder starts dies for any reason, it won't be restarted, and the messages accumulating in the Bundler will not be processed at all (eventually leading to the OOM above).

I am only familiar with the bundling behavior as it is used in google-cloud-pubsub, but for the first issue I think it may make sense for the Bundler to allow for specifying a maximum number of bundles/items that it will hold onto, and then either reject new items from being added or drop the oldest bundles/items when this limit is hit.

@garrettjonesgoogle
Copy link
Member

So the bundling system will propagate service success results and service failures (as exceptions), but it doesn't propagate failures to initiate the service call. If that was propagated to the caller, then the caller wouldn't keep providing more and more messages with no feedback on the failures.

@mattnworb
Copy link
Author

If that was propagated to the caller, then the caller wouldn't keep providing more and more messages with no feedback on the failures.

Do you mean the caller of e.g. com.google.cloud.pubsub.PubSub? This would seem to require the application developer to develop some sort of "should I publish?" state and logic, whereas the ability to set a maximum limit on the number of items stored by the bundling system seems like it could prevent unbounded memory usage at a more foundational level. Of course, these aren't mutually exclusive though.

@garrettjonesgoogle
Copy link
Member

Here is the scenario I'm describing:

  • Main thread calls publish 100 times
  • The bundler combines those publish calls into a bundle, tries to publish to the service
  • The service call fails with UNAUTHENTICATED
  • Currently, this will kill the bundling thread, and the main thread will not see any failures. My suggestion: have each call site that called publish throw an exception indicating UNAUTHENTICATED, and clear out the bundle from the bundler because it failed.

Basically, any terminal state on an attempt to publish a bundle should clear out the bundle from closedBundles and propagate the exception to the call sites that called publish. There really shouldn't be any reason to keep them around.

@garrettjonesgoogle
Copy link
Member

@pongad could you make sure this gets addressed for pubsub?

@pongad
Copy link
Contributor

pongad commented Jan 27, 2017

I think both issues are solved in the new implementation.

  • You can set FlowControlSettings to either throw or block when there are too many messages in the queue.
  • The publish method is now asynchronous, returning Future<String>. Any error will propagate to caller of future.get().

@mattnworb
Copy link
Author

@pongad do you mean "now asynchronous"?

@pongad
Copy link
Contributor

pongad commented Jan 30, 2017

@mattnworb Yes I did!

@garrettjonesgoogle
Copy link
Member

We now have flow control in ThresholdBundler, which is (1) in the requester's list. As for the processing thread dying (2), that might get fixed with converting the bundler to a push model. Assigning to @michaelbausor to verify as part of his bundling changes.

@michaelbausor
Copy link
Contributor

Correct, (1) is resolved by flow control, and (2) is resolved by converting to a pushing bundler, which gets rid of dedicated threads and uses an executor instead.

@yoshi-automation yoshi-automation added 🚨 This issue needs some love. triage me I really want to be triaged. labels Apr 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
blocking GAX GA 🚨 This issue needs some love. triage me I really want to be triaged.
Projects
None yet
Development

No branches or pull requests

5 participants