docs: Adaptive concurrency documentation and stats #8582
Merged
b1191df
wip
e826d94
more docs
7711f1e
draft
3c9acd7
fix format and test new stat
ab749c1
format
0077514
Merge remote-tracking branch 'upstream/master' into acc_docs
33b3066
typo
eee2321
snow comments
48f6438
format
e76ee97
Add limitations
tonya11en 26ccda7
Matt's comments.
197 changes: 197 additions & 0 deletions
docs/root/configuration/http/http_filters/adaptive_concurrency_filter.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,197 @@ | ||
.. _config_http_filters_adaptive_concurrency: | ||
|
||
Adaptive Concurrency | ||
==================== | ||
|
||
.. attention:: | ||
|
||
The adaptive concurrency filter is experimental and is currently under active development. | ||
|
||
This filter should be configured with the name ``envoy.filters.http.adaptive_concurrency``. | ||
|
||
See the :ref:`v2 API reference <envoy_api_msg_config.filter.http.adaptive_concurrency.v2alpha.AdaptiveConcurrency>` for details on each configuration parameter. | ||
|
||
Overview | ||
-------- | ||
The adaptive concurrency filter dynamically adjusts the allowed number of requests that can be | ||
outstanding (concurrency) to all hosts in a given cluster at any time. Concurrency values are | ||
calculated by sampling the latency of completed requests and comparing the samples measured in a | ||
time window against the expected latency for hosts in the cluster. | ||
|
||
Concurrency Controllers | ||
----------------------- | ||
Concurrency controllers implement the algorithm responsible for making forwarding decisions for each | ||
request and recording latency samples to use in the calculation of the concurrency limit. | ||
|
||
Gradient Controller | ||
~~~~~~~~~~~~~~~~~~~ | ||
The gradient controller makes forwarding decisions based on a periodically measured ideal round-trip | ||
time (minRTT) for an upstream. | ||
|
||
:ref:`v2 API reference <envoy_api_msg_config.filter.http.adaptive_concurrency.v2alpha.GradientControllerConfig>` | ||
|
||
Calculating the minRTT | ||
^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The minRTT is periodically measured by only allowing a single outstanding request at a time to an | ||
upstream cluster and measuring the latency under these ideal conditions. The length of this minRTT | ||
calculation window is variable depending on the number of requests the filter is configured to | ||
aggregate to represent the expected latency of an upstream. | ||
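As a rough illustration of how a set of latency samples can be collapsed into one representative value, the sketch below applies a nearest-rank percentile to a vector of samples (the example configuration later in this document summarizes with the p90). The function name `percentileLatencyMs` and the nearest-rank method are assumptions made for this example only; the filter's actual aggregation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Envoy: summarizes latency samples (in
// milliseconds) with a nearest-rank percentile. `percentile` is in (0, 100].
double percentileLatencyMs(std::vector<double> samples_ms, double percentile) {
  assert(!samples_ms.empty());
  assert(percentile > 0.0 && percentile <= 100.0);
  std::sort(samples_ms.begin(), samples_ms.end());
  // Nearest rank: the smallest 1-based rank covering `percentile` percent of the samples.
  const auto rank = static_cast<std::size_t>(
      std::ceil(percentile * static_cast<double>(samples_ms.size()) / 100.0));
  return samples_ms[std::max<std::size_t>(rank, 1) - 1];
}
```

With 50 samples and a percentile of 90, this picks the 45th smallest latency as the summary value.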
|
||
A configurable *jitter* value is used to randomly delay the start of the minRTT calculation window | ||
by some amount of time. Jitter is optional and can be disabled; however, enabling it is | ||
recommended because it prevents all hosts in a cluster from being in a minRTT calculation window | ||
(and having a concurrency limit of 1) at the same time. The jitter helps negate the effect of the | ||
minRTT calculation on the downstream success rate if retries are enabled. | ||
|
||
Request 503s may increase noticeably during the minRTT measurement window because the concurrency | ||
limit can drop significantly. This is expected, and enabling retries for resets/503s is | ||
recommended. | ||
|
||
.. note:: | ||
|
||
It is recommended to use :ref:`the previous_hosts retry predicate | ||
<arch_overview_http_retry_plugins>`. Due to the minRTT recalculation jitter, it's unlikely that | ||
all hosts in the cluster will be in a minRTT calculation window at the same time, so retrying on | ||
a different host in the cluster will have a higher likelihood of success in this scenario. | ||
|
||
Once calculated, the minRTT is then used in the calculation of a value referred to as the | ||
*gradient*. | ||
|
||
The Gradient | ||
^^^^^^^^^^^^ | ||
The gradient is calculated using summarized sampled request latencies (sampleRTT): | ||
|
||
.. math:: | ||
|
||
gradient = \frac{minRTT}{sampleRTT} | ||
|
||
The gradient has a useful property: it decreases as the sampled latencies increase. | ||
The gradient value is then used to update the concurrency limit via: | ||
|
||
.. math:: | ||
|
||
limit_{new} = gradient * limit_{old} + headroom | ||
|
||
Concurrency Limit Headroom | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
The headroom value is necessary as a driving factor to increase the concurrency limit when the | ||
sampleRTT is in the same ballpark as the minRTT. This value must be present in the limit | ||
calculation, since it forces the concurrency limit to increase until there is a deviation from the | ||
minRTT latency. In the absence of a headroom value, the concurrency limit could potentially stagnate | ||
at an unnecessarily small value if the sampleRTT and minRTT are close to each other. | ||
|
||
Because the headroom value is essential to the proper functioning of the gradient controller, it | ||
is not configurable and is pinned to the square root of the concurrency limit. | ||
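The two formulas above, together with the square-root headroom, can be sketched as a single update step. This is a minimal illustration under stated assumptions, not the filter's implementation: the function name `nextConcurrencyLimit` is hypothetical, the result is truncated to an integer, and the maximum-gradient and maximum-limit caps exposed through runtime are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not part of Envoy: one update step of
//   limit_new = gradient * limit_old + headroom
// with gradient = minRTT / sampleRTT and headroom = sqrt(limit_old).
// Truncating to an integer is an assumption made for this sketch.
uint32_t nextConcurrencyLimit(double min_rtt_ms, double sample_rtt_ms, uint32_t limit_old) {
  assert(min_rtt_ms > 0.0 && sample_rtt_ms > 0.0);
  assert(limit_old >= 1);
  const double gradient = min_rtt_ms / sample_rtt_ms;
  const double headroom = std::sqrt(static_cast<double>(limit_old));
  const double limit_new = gradient * static_cast<double>(limit_old) + headroom;
  return static_cast<uint32_t>(limit_new);
}
```

With a limit of 100, a sampleRTT equal to the minRTT gives a gradient of 1 and a new limit of 110, so the limit keeps growing by the headroom. Once the sampleRTT doubles relative to the minRTT, the gradient halves and the same step yields 60.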
|
||
Limitations | ||
----------- | ||
The adaptive concurrency filter's control loop relies on latency measurements | ||
and adjustments to the concurrency limit based on those measurements. Because of | ||
this, the filter must operate in conditions where it has full control over | ||
request concurrency. This means that: | ||
|
||
1. The filter works as intended in the filter chain for a local cluster. | ||
|
||
2. The filter must be able to limit the concurrency for a cluster. This means | ||
there must not be requests destined for a cluster that are not decoded by | ||
the adaptive concurrency filter. | ||
|
||
Example Configuration | ||
--------------------- | ||
An example filter configuration can be found below. Not all fields are required and many of the | ||
fields can be overridden via runtime settings. | ||
|
||
.. code-block:: yaml | ||
|
||
  name: envoy.filters.http.adaptive_concurrency | ||
  config: | ||
    gradient_controller_config: | ||
      sample_aggregate_percentile: | ||
        value: 90 | ||
      concurrency_limit_params: | ||
        concurrency_update_interval: 0.1s | ||
      min_rtt_calc_params: | ||
        jitter: | ||
          value: 10 | ||
        interval: 60s | ||
        request_count: 50 | ||
    enabled: | ||
      default_value: true | ||
      runtime_key: "adaptive_concurrency.enabled" | ||
|
||
The above configuration can be understood as follows: | ||
|
||
* Gather latency samples for a time window of 100ms. When entering a new window, summarize the | ||
requests (sampleRTT) and update the concurrency limit using this sampleRTT. | ||
* When calculating the sampleRTT, use the p90 of all sampled latencies for that window. | ||
* Recalculate the minRTT every 60s and add a jitter (random delay) of 0s-6s to the start of the | ||
minRTT recalculation. The delay is dictated by the jitter value. | ||
* Collect 50 request samples to calculate the minRTT and use the p90 to summarize them. | ||
* The filter is enabled by default. | ||
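The 0s-6s jitter range in the list above follows from treating the jitter value as a percentage of the minRTT recalculation interval. The sketch below works through that arithmetic; the helper names are hypothetical and the uniform distribution of the delay is an assumption, not something this document specifies.

```cpp
#include <cassert>
#include <chrono>
#include <random>

using Millis = std::chrono::milliseconds;

// Hypothetical helper, not part of Envoy: the largest delay that a jitter
// percentage allows for a given minRTT recalculation interval.
Millis maxJitterDelay(Millis interval, double jitter_pct) {
  assert(jitter_pct >= 0.0 && jitter_pct <= 100.0);
  return Millis(static_cast<Millis::rep>(interval.count() * jitter_pct / 100.0));
}

// Draws a delay in [0, maxJitterDelay]. A uniform draw is assumed here.
template <class Rng>
Millis randomJitterDelay(Millis interval, double jitter_pct, Rng& rng) {
  const Millis max_delay = maxJitterDelay(interval, jitter_pct);
  std::uniform_int_distribution<Millis::rep> dist(0, max_delay.count());
  return Millis(dist(rng));
}
```

For the example configuration, `maxJitterDelay(Millis(60000), 10.0)` is 6000 ms, so each recalculation starts somewhere between 0s and 6s after the 60s interval elapses.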
|
||
.. note:: | ||
|
||
It is recommended that the adaptive concurrency filter come after the healthcheck filter in the | ||
filter chain to prevent latency sampling of health checks. If health check traffic is sampled, | ||
it could potentially affect the accuracy of the minRTT measurements. | ||
|
||
Runtime | ||
------- | ||
|
||
The adaptive concurrency filter supports the following runtime settings: | ||
|
||
adaptive_concurrency.enabled | ||
Overrides whether the adaptive concurrency filter will use the concurrency controller for | ||
forwarding decisions. If set to ``false``, the filter will be a no-op. Defaults to what is | ||
specified for ``enabled`` in the filter configuration. | ||
|
||
adaptive_concurrency.gradient_controller.min_rtt_calc_interval_ms | ||
Overrides the interval at which the ideal round-trip time (minRTT) will be recalculated. | ||
|
||
adaptive_concurrency.gradient_controller.min_rtt_aggregate_request_count | ||
Overrides the number of requests sampled for calculation of the minRTT. | ||
|
||
adaptive_concurrency.gradient_controller.jitter | ||
Overrides the random delay introduced to the minRTT calculation start time. A value of ``10`` | ||
indicates a random delay of 10% of the configured interval. The runtime value specified is | ||
clamped to the range [0,100]. | ||
|
||
adaptive_concurrency.gradient_controller.sample_rtt_calc_interval_ms | ||
Overrides the interval at which the concurrency limit is recalculated based on sampled latencies. | ||
|
||
adaptive_concurrency.gradient_controller.max_concurrency_limit | ||
Overrides the maximum allowed concurrency limit. | ||
|
||
adaptive_concurrency.gradient_controller.max_gradient | ||
Overrides the maximum allowed gradient value. The runtime value specified is clamped to a | ||
minimum of 1.0. | ||
|
||
adaptive_concurrency.gradient_controller.sample_aggregate_percentile | ||
Overrides the percentile value used to represent the collection of latency samples in | ||
calculations. A value of ``95`` indicates the 95th percentile. The runtime value specified is | ||
clamped to the range [0,100]. | ||
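The clamping described for the jitter and sample_aggregate_percentile keys can be sketched as a small helper that mirrors the `std::max`/`std::min` pattern in this PR's code change. The function name `normalizedPercent` is hypothetical and the runtime lookup itself is omitted.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not part of Envoy: clamps a runtime-provided
// percentage to [0, 100] and normalizes it to the range [0.0, 1.0].
double normalizedPercent(double runtime_value) {
  const double clamped = std::max(0.0, std::min(runtime_value, 100.0));
  assert(clamped >= 0.0 && clamped <= 100.0);
  return clamped / 100.0;
}
```

Out-of-range runtime values are therefore pulled back to the nearest bound rather than rejected: 150 is treated as 100% and -5 as 0%.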
|
||
Statistics | ||
---------- | ||
The adaptive concurrency filter outputs statistics in the | ||
*http.<stat_prefix>.adaptive_concurrency.* namespace. The :ref:`stat prefix | ||
<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.stat_prefix>` | ||
comes from the owning HTTP connection manager. Statistics are specific to the concurrency | ||
controllers. | ||
|
||
Gradient Controller Statistics | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
The gradient controller uses the namespace | ||
*http.<stat_prefix>.adaptive_concurrency.gradient_controller*. | ||
|
||
.. csv-table:: | ||
:header: Name, Type, Description | ||
:widths: auto | ||
|
||
rq_blocked, Counter, Total requests that were blocked by the filter. | ||
min_rtt_calculation_active, Gauge, Set to 1 if the controller is in the process of a minRTT calculation. 0 otherwise. | ||
concurrency_limit, Gauge, The current concurrency limit. | ||
gradient, Gauge, The current gradient value. | ||
burst_queue_size, Gauge, The current headroom value in the concurrency limit calculation. | ||
min_rtt_msecs, Gauge, The current measured minRTT value in milliseconds. | ||
sample_rtt_msecs, Gauge, The current measured sampleRTT aggregate in milliseconds. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,6 +27,7 @@ namespace ConcurrencyController { | |
*/ | ||
#define ALL_GRADIENT_CONTROLLER_STATS(COUNTER, GAUGE) \ | ||
COUNTER(rq_blocked) \ | ||
+ GAUGE(min_rtt_calculation_active, Accumulate) \ | ||
GAUGE(concurrency_limit, NeverImport) \ | ||
GAUGE(gradient, NeverImport) \ | ||
GAUGE(burst_queue_size, NeverImport) \ | ||
|
@@ -70,36 +71,39 @@ class GradientControllerConfig : public Logger::Loggable<Logger::Id::filter> { | |
} | ||
|
||
double maxGradient() const { | ||
- return runtime_.snapshot().getDouble(RuntimeKeys::get().MaxGradientKey, max_gradient_); | ||
+ return std::max( | ||
+ 1.0, runtime_.snapshot().getDouble(RuntimeKeys::get().MaxGradientKey, max_gradient_)); | ||
} | ||
|
||
// The percentage is normalized to the range [0.0, 1.0]. | ||
double sampleAggregatePercentile() const { | ||
- return runtime_.snapshot().getDouble(RuntimeKeys::get().SampleAggregatePercentileKey, | ||
- sample_aggregate_percentile_) / | ||
- 100.0; | ||
+ const double val = runtime_.snapshot().getDouble( | ||
Review comment: Nit: prefer code and major docs PR to be separate, but it's fine for this PR.
Reply: +1 in the future.
+ RuntimeKeys::get().SampleAggregatePercentileKey, sample_aggregate_percentile_); | ||
+ return std::max(0.0, std::min(val, 100.0)) / 100.0; | ||
} | ||
|
||
- // The percentage is normalized to the range [0.0, 1.0]. | ||
+ // The percentage is normalized and clamped to the range [0.0, 1.0]. | ||
double jitterPercent() const { | ||
- return runtime_.snapshot().getDouble(RuntimeKeys::get().JitterPercentKey, jitter_pct_) / 100.0; | ||
+ const double val = | ||
+ runtime_.snapshot().getDouble(RuntimeKeys::get().JitterPercentKey, jitter_pct_); | ||
+ return std::max(0.0, std::min(val, 100.0)) / 100.0; | ||
} | ||
|
||
private: | ||
class RuntimeKeyValues { | ||
public: | ||
const std::string MinRTTCalcIntervalKey = | ||
- "http.adaptive_concurrency.gradient_controller.min_rtt_calc_interval_ms"; | ||
+ "adaptive_concurrency.gradient_controller.min_rtt_calc_interval_ms"; | ||
const std::string SampleRTTCalcIntervalKey = | ||
- "http.adaptive_concurrency.gradient_controller.sample_rtt_calc_interval_ms"; | ||
+ "adaptive_concurrency.gradient_controller.sample_rtt_calc_interval_ms"; | ||
const std::string MaxConcurrencyLimitKey = | ||
- "http.adaptive_concurrency.gradient_controller.max_concurrency_limit"; | ||
+ "adaptive_concurrency.gradient_controller.max_concurrency_limit"; | ||
const std::string MinRTTAggregateRequestCountKey = | ||
- "http.adaptive_concurrency.gradient_controller.min_rtt_aggregate_request_count"; | ||
- const std::string MaxGradientKey = "http.adaptive_concurrency.gradient_controller.max_gradient"; | ||
+ "adaptive_concurrency.gradient_controller.min_rtt_aggregate_request_count"; | ||
+ const std::string MaxGradientKey = "adaptive_concurrency.gradient_controller.max_gradient"; | ||
const std::string SampleAggregatePercentileKey = | ||
- "http.adaptive_concurrency.gradient_controller.sample_aggregate_percentile"; | ||
- const std::string JitterPercentKey = "http.adaptive_concurrency.gradient_controller.jitter"; | ||
+ "adaptive_concurrency.gradient_controller.sample_aggregate_percentile"; | ||
+ const std::string JitterPercentKey = "adaptive_concurrency.gradient_controller.jitter"; | ||
}; | ||
|
||
using RuntimeKeys = ConstSingleton<RuntimeKeyValues>; | ||
|
It occurs to me that if we ever want to support both ingress and egress adaptive concurrency in a single sidecar, we will have conflicting runtime names. Would it be better to not hard-code these names and instead read them from runtime value configuration fields or similar? We might consider doing this in a follow-up before we consider this filter production ready. WDYT?
Since these runtime parameters are all overrides of the config parameters, we can just use the runtime configuration fields for the config like you mention. That'll allow for unique runtime names.
Let's knock that out in a different patch. I'll open an issue.