-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can the bundling behavior of gRPC be disabled for pubsub publishers? #1432
Comments
This is very unusual - the 63k elements is far beyond the limit set in the bundler; that amount of elements should have triggered a publish. The default limits are 1000 elements, 10 ms, and 10 mb; surpassing any individual limit should trigger a publish. Are you publishing synchronously or asynchronously? I think it makes sense to expose the controls - we are already working on a design that will do that. |
Hey Garret, we are publishing asynchronously, in a fire-and-forget manner. Our code adds a listener to the returned future to log if it succeeded or failed but we otherwise don't handle the response in any way. Each of our publish requests contains one message. I think I forgot to mention earlier that we are using all of the default settings with the client. Is there any state in particular of the ThresholdBundler or other pubsub-related instances that would be good to know about? Since I have a heap dump of an affected host I can examine anything that might seem relevant for the unexpected Bundler behavior. |
Hi, @garrettjonesgoogle. I see the code for those limits you're talking about. These seem to be settings for each individual bundle and not the Is the Speculation: What if some exception happens in Could this cause our OOM? |
Ahh yes you're right - there is no limit on |
@garrettjonesgoogle, this is a stacktrace where
Let me know if this helps and/or if you need more info. |
So have you been able to successfully send any messages? UNAUTHENTICATED implies that you haven't - or another scenario might be that you authenticated once, but for some reason the authentication expired. |
@garrettjonesgoogle I think we have been, but isn't the unauthenticated part an orthogonal issue? |
Yes it is, but it might be a secondary issue that also needs to be addressed - if authentication expires, it needs to be refreshed somehow. |
Gotcha. I've attached the full stack trace here. |
The thing I'm curious about is what type of credentials you are using. Are you using the default credentials (which use Application Default Credentials under the hood), or something else? |
I think default? @mattnworb will have more details. |
we are injecting a path to a credentials file from GOOGLE_APPLICATION_CREDENTIALS environment variable. (we use a version of the library before it moved to google-auth-library-java, so it follows this pattern:
It's verified to pick up the correct credentials and being able to use them without auth issues on other instances and services (and seems to have no problem re-authing after having seen UNAUTHENTICATED in the logs). |
To verify what @lndbrg mentioned we are using a service account with the credentials supplied as he mentioned. Our publisher application runs on 1000s of hosts and we have seen this problem with the OutOfMemoryError on just a handful of hosts. Looking at the logs of one of the affected host, it seems like it begins to get publishing errors almost immediately after startup due to an inability to connect to get an access token for the service account:
The root of the exception originates in The application is then attempting to publish new messages every 10-20 seconds, and this stacktrace repeats for each attempt (so it seems as if the thread launched by ThresholdBundlingForwarder is not dying, or is being restarted). This continues for 8 days or so until we see an OutOfMemoryError. Instances of the application that don't have issues connecting to the relevant googleapi to get an access token don't experience any issues publishing and never receive an OOM. So I think there are three issues going on:
Since 1 and 2 are really issues within the googleapis/gax-java library, should we open issues there instead? I believe @davidxia also opened an issue in the Cloud Support Center - we'd be happy to continue the discussion there (or here), whichever seems best for resolving this. |
Filing issues 1 & 2 against gax-java makes sense. (The same people work on both repositories.) One potential way to mitigate failing hosts in your case is to send 1 synchronous "canary" call when the app starts; if it gets UNAUTHENTICATED, then you can recover & fix that then, instead of having the host do nothing useful and eventually OOM. Potentially it doesn't even have to be a publish call - calling We'll try to get all of these other issues fixed, but be aware that we are currently resource-constrained. |
The remaining request here not covered by googleapis/gax-java#157 is to allow bundling to be disabled at the handwritten layer. @pongad , can you consider whether this should be exposed in the Pub/Sub rewrite? |
As currently designed, the bundling can be "turned off" by setting EDIT: "as currently designed" refers to a new Publisher implementation which is not in master yet |
@pongad actually the problem is that even with a threshold of 1, the publish is still done in the bundling thread, and the error isn't propagated to the calling thread. |
@pongad is the new Publisher implementation that you refer to the one on the It seems like what is in this branch would solve my case, but is there still bundling done at the grpc layer? |
Since @pongad is in the Sydney timezone, I will answer your questions:
|
Thanks for the quick answer. I think the main thing that would solve the original problem here would be that bundles that can't be sent (for whatever reason) aren't kept around by the bundler or whomever indefinitely. Publishing the bundle immediately (with threshold=1) is useful too, but the main issue that I recall is the closed bundle list growing without bound. |
The rewrite includes https://github.com/googleapis/gax-java/blob/bundling-hp/src/main/java/com/google/api/gax/grpc/FlowControlSettings.java , which allows for explicit limits to be set on the bundler. |
The rewrite has been merged to master and released. @mattnworb could you take a look to see if your concerns are addressed? |
@garrettjonesgoogle thanks for the update. Are there any docs on how to move from using |
@pongad could you point @mattnworb to the information he needs? |
@mattnworb Could you see if this helps? We are still in the process of updating all our code samples. If you don't specify any |
@mattnworb I'm going to close out this issue now - feel free to file a new issue if you're having problems adopting the new Publisher. |
@garrettjonesgoogle I haven't had a chance to try out the new Publisher yet, but thanks overall for following up on this and addressing the issue. |
…0.10 (#1432) [](https://renovatebot.com) This PR contains the following updates: | Package | Change | Age | Adoption | Passing | Confidence | |---|---|---|---|---|---| | [com.google.cloud:google-cloud-pubsub](https://github.com/googleapis/java-pubsub) | `1.120.9` -> `1.120.10` | [](https://docs.renovatebot.com/merge-confidence/) | [](https://docs.renovatebot.com/merge-confidence/) | [](https://docs.renovatebot.com/merge-confidence/) | [](https://docs.renovatebot.com/merge-confidence/) | --- ### Release Notes <details> <summary>googleapis/java-pubsub</summary> ### [`v1.120.10`](https://github.com/googleapis/java-pubsub/blob/HEAD/CHANGELOG.md#​112010-httpsgithubcomgoogleapisjava-pubsubcomparev11209v112010-2022-08-04) [Compare Source](https://github.com/googleapis/java-pubsub/compare/v1.120.9...v1.120.10) ##### Dependencies - update dependency com.google.cloud:google-cloud-core to v2.8.8 ([#​1231](https://github.com/googleapis/java-pubsub/issues/1231)) ([9d13dd8](https://github.com/googleapis/java-pubsub/commit/9d13dd8bc43e24815884dde421409136958d4b0f)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, click this checkbox. --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://app.renovatebot.com/dashboard#github/googleapis/java-bigquerydatatransfer). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzMi4xNDMuMSIsInVwZGF0ZWRJblZlciI6IjMyLjE0My4xIn0=-->
🤖 I have created a release *beep* *boop* --- ## [2.3.3](googleapis/java-bigquerydatatransfer@v2.3.2...v2.3.3) (2022-08-09) ### Dependencies * update dependency com.google.cloud:google-cloud-bigquery to v2.14.1 ([#1430](googleapis/java-bigquerydatatransfer#1430)) ([befed25](googleapis/java-bigquerydatatransfer@befed25)) * update dependency com.google.cloud:google-cloud-bigquery to v2.14.2 ([#1433](googleapis/java-bigquerydatatransfer#1433)) ([30b6942](googleapis/java-bigquerydatatransfer@30b6942)) * update dependency com.google.cloud:google-cloud-bigquery to v2.14.3 ([#1434](googleapis/java-bigquerydatatransfer#1434)) ([5c054b0](googleapis/java-bigquerydatatransfer@5c054b0)) * update dependency com.google.cloud:google-cloud-pubsub to v1.120.10 ([#1432](googleapis/java-bigquerydatatransfer#1432)) ([4ab39e9](googleapis/java-bigquerydatatransfer@4ab39e9)) * update dependency com.google.cloud:google-cloud-pubsub to v1.120.8 ([#1428](googleapis/java-bigquerydatatransfer#1428)) ([ddab51f](googleapis/java-bigquerydatatransfer@ddab51f)) * update dependency com.google.cloud:google-cloud-pubsub to v1.120.9 ([#1431](googleapis/java-bigquerydatatransfer#1431)) ([f909343](googleapis/java-bigquerydatatransfer@f909343)) * update dependency com.google.cloud:google-cloud-shared-dependencies to v3 ([#1426](googleapis/java-bigquerydatatransfer#1426)) ([8a2c31e](googleapis/java-bigquerydatatransfer@8a2c31e)) * update dependency com.google.protobuf:protobuf-java-util to v3.21.3 ([#1423](googleapis/java-bigquerydatatransfer#1423)) ([50bc04e](googleapis/java-bigquerydatatransfer@50bc04e)) * update dependency com.google.protobuf:protobuf-java-util to v3.21.4 ([#1425](googleapis/java-bigquerydatatransfer#1425)) ([60d8c78](googleapis/java-bigquerydatatransfer@60d8c78)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
…s#1687) (googleapis#1432) * chore(java): add a note in README for migrated split repos Disable renovate bot and flaky bot for split repositories that have moved to the Java monorepo. The Java monorepo will pass the "monorepo=True" parameter to java.common_templates method in its owlbot.py files so that the migration note will not appear in the README in the monorepo. Co-authored-by: Jeff Ching <chingor@google.com> Source-Link: googleapis/synthtool@d4b2916 Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-java:latest@sha256:edae91ccdd2dded2f572ec341a768ad180305a3e8fbfd93064b28e237d35920a Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> Co-authored-by: Jeff Ching <chingor@google.com> Co-authored-by: Deepankar Dixit <90280028+ddixit14@users.noreply.github.com>
I've encountered a case where an application that publishes pubsub messages (using
google-cloud-pubsub:0.5.1
) has encountered OutOfMemoryErrors which I have traced down to an instance ofcom.google.api.gax.bundling.ThresholdBundler
taking up 92% of the heap space. This application purposefully runs with a small heap of ~256mb.This application publishes all of it's messages to a single topic, so it seems as if all of the requests end up in the same ThresholdBundler instance. In the heap dump that I have, the
ThresholdBundler
instance has 63984 elements in theclosedBundles
list.It looks like the gax library allows for the bundling behavior to be disabled via
BundlingSettings$Builder.setIsEnabled(Boolean)
, but the pubsub layer does not expose any of these options and always leaves the bundling behavior as enabled.Would it be possible to add flags to PubsubOptions (or wherever is appropriate) to allow for this request batching to be disabled if desired?
The text was updated successfully, but these errors were encountered: