
RFC: Consistent timeout handling in Collector pipelines #11948

Closed · 11 commits

Conversation
Conversation

@jmacd (Contributor) commented Dec 18, 2024

Description

Calls for deadline-awareness across common Collector pipeline components, including the batch processor, queue sender, retry sender, and timeout sender.

Link to tracking issue

Part of #11183

Testing

n/a

Documentation

This is a new RFC. As these changes are accepted and implemented, user-facing documentation will be added.

codecov bot commented Dec 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.82%. Comparing base (4593ba7) to head (52d0cdb).
Report is 231 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11948      +/-   ##
==========================================
+ Coverage   91.59%   91.82%   +0.23%     
==========================================
  Files         449      463      +14     
  Lines       23761    24774    +1013     
==========================================
+ Hits        21763    22748     +985     
- Misses       1623     1644      +21     
- Partials      375      382       +7     


@jmacd (Contributor Author) commented Dec 19, 2024

I will present this RFC in the next two Collector SIG meetings: 1/7/2025 (APAC/PT) and 1/15/2025 (NA).

@jmacd jmacd marked this pull request as ready for review December 19, 2024 19:07
@jmacd jmacd requested a review from a team as a code owner December 19, 2024 19:07
@jmacd jmacd requested a review from evan-bradley December 19, 2024 19:07
Comment on lines +279 to +296
### Queue sender

A new field will be introduced, with default matching the
original behavior of this component.

```golang
// FailFast indicates that the queue should immediately
// reject requests when the queue is full, without considering
// the request deadline. Default: true.
FailFast bool `mapstructure:"fail_fast"`
```

When the new FailFast flag is false, there are two cases (sketched in code after this list):

1. The request has a deadline. In this case, wait until the deadline
to enqueue the request.
2. The request has no deadline. In this case, let the request block
indefinitely until it can be enqueued.
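
A minimal sketch of these semantics, assuming a hypothetical channel-backed queue; the names `boundedQueue`, `Offer`, and `errQueueFull` are illustrative, not the Collector's actual types:

```golang
package queuesender // hypothetical package name

import (
	"context"
	"errors"
)

var errQueueFull = errors.New("sending queue is full")

// boundedQueue is a hypothetical channel-backed queue used only to
// illustrate the proposed semantics; it is not the Collector's actual
// queue implementation.
type boundedQueue struct {
	failFast bool
	ch       chan any
}

// Offer enqueues item, applying the proposed fail_fast semantics.
func (q *boundedQueue) Offer(ctx context.Context, item any) error {
	if q.failFast {
		// Default (original) behavior: reject immediately when full,
		// without consulting the request deadline.
		select {
		case q.ch <- item:
			return nil
		default:
			return errQueueFull
		}
	}
	// fail_fast=false: block until the item is enqueued or the request
	// deadline expires (case 1). A context without a deadline blocks
	// indefinitely (case 2).
	select {
	case q.ch <- item:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```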
Member:
Nice


## Specific proposals

### Timeout sender
Member:
I still struggle with this and with what you are trying to achieve. Have you seen a real problem that motivates it?

Contributor Author:
The real problem stems from disabling the queue sender and/or the batch sender. Suppose another collector producing data into this component has been delayed, so that by the time a request arrives at the timeout sender it has only 1s of timeout remaining. The component's configuration states "Timeout is the timeout for every attempt to send data to the backend.", but that is not true: a configuration of "5s" will not raise the timeout, and there is no support anywhere in the pipeline to ignore the incoming timeout.

The existing behavior, in which the timeout indicates a maximum value, is fine by itself. The limitations I want to fix are that the timeout sender has no way to reset the timeout (to unlimited) and no way to reject a too-small timeout in anticipation of a deadline-exceeded error.
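
A minimal sketch of the deadline handling this reply argues for, assuming a hypothetical helper and the `min_timeout` option discussed later in this thread; none of these names come from the Collector's actual API:

```golang
package timeoutsender // hypothetical package name

import (
	"context"
	"errors"
	"time"
)

var errTimeoutTooSmall = errors.New("remaining request timeout below min_timeout")

// effectiveTimeout is a hypothetical helper illustrating the proposed
// semantics: the configured timeout remains a maximum, an incoming
// deadline can lower it, and requests whose remaining time is below a
// configured minimum are rejected early rather than sent to fail.
func effectiveTimeout(ctx context.Context, configured, minTimeout time.Duration) (time.Duration, error) {
	deadline, ok := ctx.Deadline()
	if !ok {
		// No incoming deadline: the configured timeout applies as-is.
		return configured, nil
	}
	remaining := time.Until(deadline)
	if remaining < minTimeout {
		// Anticipate deadline-exceeded instead of attempting the send.
		return 0, errTimeoutTooSmall
	}
	if remaining < configured {
		// Existing behavior: the incoming deadline lowers the timeout.
		return remaining, nil
	}
	return configured, nil
}
```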

to use deadlines always and/or use separate pipelines for requests
with and without deadlines.

### Receiver helper
Member:
There are some behaviors, like whether the deadline has expired, that we can automatically check for all receivers by adding virtual components to the processing graph while initializing the service.

Some components that require configuration do need a helper or user-code change.

Contributor Author:
Yes. I expect a deadline-aware receiver to reject an expired request before it constructs a pdata object. Therefore, I'm not sure checking prior to every processor makes sense; only the processors that block need special logic to await cancelation. Otherwise, exporters could consistently enforce deadline handling (e.g., in the HTTP exporter) even though some exporters (e.g., gRPC) enforce it themselves.
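
As a rough illustration of the "virtual component" idea in this thread, a deadline-checking stage could be expressed as a wrapper around the next consumer in a pipeline. This is a sketch using the Collector's consumer and ptrace packages, not an implementation from the RFC:

```golang
package deadlinecheck // hypothetical package name

import (
	"context"
	"time"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// tracesDeadlineCheck wraps the next consumer in a pipeline and rejects
// requests whose context deadline has already passed, so expired data
// is dropped before any further processing happens.
type tracesDeadlineCheck struct {
	next consumer.Traces
}

func (d tracesDeadlineCheck) ConsumeTraces(ctx context.Context, td ptrace.Traces) error {
	if deadline, ok := ctx.Deadline(); ok && !time.Now().Before(deadline) {
		return context.DeadlineExceeded
	}
	return d.next.ConsumeTraces(ctx, td)
}

func (d tracesDeadlineCheck) Capabilities() consumer.Capabilities {
	// The wrapper only inspects the context; it never mutates data.
	return consumer.Capabilities{MutatesData: false}
}
```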

Comment on lines +327 to +328
- `min_timeout` (duration): Limits the allowable timeout for new requests to a minimum value. >=0 means deadline checking.
- `timeout` (duration): Limits the allowable timeout for new requests to a maximum value. Must be >= 0.
Member:
Why not have a processor for this?

Contributor Author:
I support the idea of a processor to disregard or reset the timeout, but I also think it works just as well in the timeout sender.

I've stated that I think all the functionality of the exporterhelper would be useful in a processor; I'd call that a "pipeline processor". open-telemetry/opentelemetry-collector-contrib#35803 (comment)
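
For reference, the two options quoted above might surface in the timeout sender's configuration roughly as follows. This mirrors the FailFast example earlier in the RFC; the struct and field names are illustrative only, not the exporterhelper's actual declarations:

```golang
package timeoutsender // hypothetical package name

import "time"

// TimeoutConfig sketches how the two quoted options could be declared.
type TimeoutConfig struct {
	// Timeout limits the allowable timeout for new requests to a
	// maximum value. Must be >= 0.
	Timeout time.Duration `mapstructure:"timeout"`

	// MinTimeout limits the allowable timeout for new requests to a
	// minimum value; >= 0 enables deadline checking.
	MinTimeout time.Duration `mapstructure:"min_timeout"`
}
```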

@jmacd (Contributor Author) commented Feb 3, 2025

As with my related RFC, #11947, I am not sure this is up-to-date. I will close it and wait for the batch/queue_sender dust to settle, then see if an RFC is still the correct way to proceed.

@jmacd jmacd closed this Feb 3, 2025