RFC: Consistent timeout handling in Collector pipelines #11948
Conversation
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

    @@            Coverage Diff             @@
    ##             main   #11948      +/-   ##
    ==========================================
    + Coverage   91.59%   91.82%   +0.23%
    ==========================================
      Files         449      463      +14
      Lines       23761    24774    +1013
    ==========================================
    + Hits        21763    22748     +985
    - Misses       1623     1644      +21
    - Partials      375      382       +7

☔ View full report in Codecov by Sentry.
I will present this RFC in the next two Collector SIG meetings: 1/7/2025 (APAC/PT) and 1/15/2025 (NA).
### Queue sender

A new field will be introduced, with a default matching the
original behavior of this component.

```golang
// FailFast indicates that the queue should immediately
// reject requests when the queue is full, without considering
// the request deadline. Default: true.
FailFast bool `mapstructure:"fail_fast"`
```

If the new FailFast flag is false, there are two cases:

1. The request has a deadline. In this case, wait until the deadline
   to enqueue the request.
2. The request has no deadline. In this case, let the request block
   indefinitely until it can be enqueued.
Nice
## Specific proposals

### Timeout sender |
I still struggle with this. Have you seen a real problem here, and what are you trying to achieve?
The real problem stems from disabling the queue sender and/or the batch sender. Suppose another collector producing data into this component has been delayed, so that by the time a request arrives at the timeout sender it has 1s of timeout remaining. The component's configuration states "Timeout is the timeout for every attempt to send data to the backend", but that is not true: a configuration of "5s" will not raise the timeout, and there is no support anywhere in the pipeline to ignore the incoming timeout.
The existing behavior, that timeout indicates a maximum value, is fine by itself. The fact that the timeout sender has no way to reset the timeout (to unlimited), or to reject a too-small timeout in anticipation of a deadline-exceeded error, those are the limitations I want to fix.
to use deadlines always and/or use separate pipelines for requests
with and without deadlines.

### Receiver helper |
There are some behaviors, such as deadline expiry, that we can check automatically for all receivers by adding virtual components to the processing graph while initializing the service.
Some components that require configuration do need a helper/user code change.
Yes. I expect a deadline-aware receiver to reject a request before it constructs a pdata object, if the request is already expired. Therefore, I'm not sure having the check run before every processor makes sense; only the processors that block need special logic to await cancelation. Otherwise, exporters could consistently enforce deadline handling (e.g., in the HTTP exporter), even though some exporters (e.g., gRPC) already enforce it themselves.
- `min_timeout` (duration): Limits the allowable timeout for new requests to a minimum value. >=0 means deadline checking.
- `timeout` (duration): Limits the allowable timeout for new requests to a maximum value. Must be >= 0.
Why not have a processor for this?
I support the idea of a processor to disregard or reset the timeout, but I also think it's just as good done in the timeout sender.
I've stated that I think all the functionality of the exporterhelper would be useful in a processor; I'd call that a "pipeline processor". open-telemetry/opentelemetry-collector-contrib#35803 (comment)
As with my related RFC, #11947, I am not sure this is up-to-date. I will close and wait for the batch/queue_sender dust to settle, then see if an RFC is still the correct way to proceed.
Description
Calls for deadline-awareness across common Collector pipeline components, including batch processors, queue sender, retry and
timeout senders.
Link to tracking issue
Part of #11183
Testing
n/a
Documentation
This is a new RFC. As these changes are accepted and implemented, user-facing documentation will be added.