docs: Adaptive concurrency documentation and stats #8582

tonya11en · 2019-10-11T01:50:26Z

Documentation for the adaptive concurrency control filter. This patch also introduces a new stat to the filter and adds some coverage in an existing test.

Fixes #7789

Signed-off-by: Tony Allen <tallen@lyft.com>

repokitteh-read-only · 2019-10-11T01:50:30Z

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #8582 was opened by tonya11en.

Signed-off-by: Tony Allen <tallen@lyft.com>

snowp

Took a look at the docs as someone who hasn't been following along. Seems pretty well explained in general, just a few comments.

docs/root/configuration/http/http_filters/adaptive_concurrency_filter.rst

Signed-off-by: Tony Allen <tallen@lyft.com>

htuch

LGTM; I haven't been following this that closely either. The only question I have is whether there are references to other systems that implement this algorithm?

Also, this only works if the system is linear, right? Is there some heuristic to know whether adaptive control works in a given situation, or does it basically work in all practical scenarios?

htuch · 2019-10-14T01:04:29Z

...ce/extensions/filters/http/adaptive_concurrency/concurrency_controller/gradient_controller.h

-    return runtime_.snapshot().getDouble(RuntimeKeys::get().SampleAggregatePercentileKey,
-                                         sample_aggregate_percentile_) /
-           100.0;
+    const double val = runtime_.snapshot().getDouble(


Nit: prefer code and major docs PRs to be separate, but it's fine for this PR.

+1 in the future.

tonya11en · 2019-10-14T05:39:16Z

@htuch thanks for taking a look. The only other system I'm aware of that has implemented this algorithm is the Netflix concurrency limits library and the associated blog post. I'll admit I didn't look too closely at the Java library, so my implementation here might be dramatically different; however, the concurrency limit calculation should be identical.
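For readers new to gradient-style limiters, the general shape of such a calculation can be sketched as below. This is not the filter's code or the Netflix library's: the struct, the function name `updateLimit`, the clamp bounds, and the additive headroom term are all assumptions made for illustration. The only idea taken from the thread is that the limit is derived from how sampled latency compares with a baseline.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical inputs for one limit update; none of these names come from the filter.
struct GradientInputs {
  double min_rtt_ms;      // Latency measured while the host is lightly loaded.
  double sample_rtt_ms;   // Recent latency summary (for example, a high percentile).
  uint32_t current_limit; // Concurrency limit in force during the sample window.
  double headroom;        // Assumed additive slack so the limit can grow again.
};

// Shrinks the limit when sampled latency exceeds the baseline and lets it grow
// when sampled latency is at or below the baseline.
inline uint32_t updateLimit(const GradientInputs& in) {
  assert(in.min_rtt_ms > 0.0 && in.sample_rtt_ms > 0.0);
  // Assumed clamp so that one noisy window cannot move the limit too far.
  const double gradient = std::clamp(in.min_rtt_ms / in.sample_rtt_ms, 0.5, 2.0);
  const double next = gradient * in.current_limit + in.headroom;
  // Never return a limit below 1, so some traffic is always admitted.
  return static_cast<uint32_t>(std::max(1.0, std::floor(next)));
}
```

With these assumed constants, a window whose sampled latency is twice the baseline roughly halves the limit, while a window at the baseline grows it by the headroom.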

There is still a lot that needs to be explored regarding the filter's performance under various workloads. I've only tested the filter for ingress circuit breaking while protecting a single host. Also, since the filter's decisions are measurement-based, I doubt it would perform well in situations where it cannot sample all requests going to the host or in scenarios where the latency distribution is bi-modal (since the sampleRTT is measured as a percentile).
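The bi-modal concern can be made concrete with a small sketch. It assumes a nearest-rank percentile and a made-up latency window with two modes (10 ms and 200 ms). The helper names `percentile` and `bimodalWindow` are hypothetical and are not how the filter aggregates samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least `pct` percent
// of all samples are less than or equal to it. Takes a copy so it can sort.
inline double percentile(std::vector<double> samples, double pct) {
  assert(!samples.empty() && pct > 0.0 && pct <= 100.0);
  std::sort(samples.begin(), samples.end());
  const auto rank = static_cast<std::size_t>(std::ceil(pct / 100.0 * samples.size()));
  return samples[std::max<std::size_t>(rank, 1) - 1];
}

// Hypothetical bi-modal window: `fast` requests at 10 ms and `slow` requests at 200 ms.
inline std::vector<double> bimodalWindow(std::size_t fast, std::size_t slow) {
  std::vector<double> samples(fast, 10.0);
  samples.insert(samples.end(), slow, 200.0);
  return samples;
}
```

In this toy setup, the 90th percentile of an 80/20 fast/slow window lands in the slow mode, while the same percentile of a 95/5 window lands in the fast mode. A modest shift in the request mix can therefore move the summary latency between modes even though neither mode got slower.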

To figure out the situations where this filter does work well, I hacked together a client/server load test with configurable RPS, concurrency, request latencies, etc. I have a lot of data showing how the filter performs under various workload patterns and scenarios where the upstream itself begins to "degrade" by artificially increasing the server's time to respond. I can share this data if you'd like, but I didn't think the configuration docs were the place for my experimental results. I did recently provide an update to the parent issue with a single experiment that might be of interest.

I haven't found any synthetic workload patterns where the filter doesn't work as expected. I'll be rolling this out in our staging environment over the next few weeks to see how it performs in real life, so stay tuned for those results.

htuch · 2019-10-14T22:52:58Z

@tonya11en thanks for the context, it sounds like this is an awesome piece of kit. I agree we don't need a complete writeup of experiments in the docs, but some explanation of how this technology works and where its limits might be would be great. I'm thinking a single paragraph, a few sentences.

Signed-off-by: Tony Allen <tony@allen.gg>

mattklein123

Thanks, awesome work. A few small comments. Super excited to see this land.

/wait

mattklein123 · 2019-10-16T23:34:35Z

docs/root/configuration/http/http_filters/adaptive_concurrency_filter.rst

+
+The adaptive concurrency filter supports the following runtime settings:
+
+adaptive_concurrency.enabled


It occurs to me that if we ever want to support both ingress and egress adaptive concurrency in a single sidecar, we will have conflicting runtime names. Would it be better to not hard-code these names and instead read them from runtime value configuration fields or similar? We might consider doing this in a follow-up before we consider this filter production ready. WDYT?

Since these runtime parameters are all overrides of the config parameters, we can just use the runtime configuration fields for the config like you mention. That'll allow for unique runtime names.

Let's knock that out in a different patch. I'll open an issue.
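To illustrate the naming conflict discussed above, here is a minimal sketch in which each filter instance derives its runtime key from a per-instance prefix. The class `RuntimeKeyNames` and the prefix scheme are hypothetical simplifications. The proposal in this thread is to read the key names from the filter's configuration fields, and the eventual patch may look quite different.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: builds runtime key names under a per-filter prefix
// instead of hard-coding a single global name.
class RuntimeKeyNames {
public:
  explicit RuntimeKeyNames(std::string prefix) : prefix_(std::move(prefix)) {
    assert(!prefix_.empty());
  }

  // For example "ingress.adaptive_concurrency.enabled".
  std::string enabled() const { return prefix_ + ".adaptive_concurrency.enabled"; }

private:
  std::string prefix_;
};
```

Under this assumption, an ingress instance and an egress instance in the same sidecar would read distinct keys, so overriding one would not affect the other.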

mattklein123 · 2019-10-16T23:35:38Z

...ce/extensions/filters/http/adaptive_concurrency/concurrency_controller/gradient_controller.h

-    return runtime_.snapshot().getDouble(RuntimeKeys::get().SampleAggregatePercentileKey,
-                                         sample_aggregate_percentile_) /
-           100.0;
+    const double val = runtime_.snapshot().getDouble(


+1 in the future.

Signed-off-by: Tony Allen <tony@allen.gg>

mattklein123

Thanks!

Onewaysidewalks · 2020-09-08T21:20:23Z

Is there a point where this moves out of "experimental" in the documentation? (still that way as of 1.16.x)

tonya11en · 2020-09-10T17:09:22Z

I'm not quite sure what the criteria are for removing that disclaimer. Have you used this filter in your deployments or are you waiting for a "green light" before trying it out?

Onewaysidewalks · 2020-09-15T04:10:40Z

Definitely the second, as it looks like the Netflix library this was based on (https://github.com/Netflix/concurrency-limits/) is now abandoned.

Tony Allen added 6 commits October 9, 2019 17:11

wip

b1191df

Signed-off-by: Tony Allen <tallen@lyft.com>

more docs

e826d94

Signed-off-by: Tony Allen <tallen@lyft.com>

draft

7711f1e

Signed-off-by: Tony Allen <tallen@lyft.com>

fix format and test new stat

3c9acd7

Signed-off-by: Tony Allen <tallen@lyft.com>

format

ab749c1

Signed-off-by: Tony Allen <tallen@lyft.com>

Merge remote-tracking branch 'upstream/master' into acc_docs

0077514

tonya11en requested a review from mattklein123 as a code owner October 11, 2019 01:50

typo

33b3066

Signed-off-by: Tony Allen <tallen@lyft.com>

jmarantz assigned mattklein123 and htuch Oct 11, 2019

snowp suggested changes Oct 11, 2019

Tony Allen added 2 commits October 11, 2019 12:38

snow comments

eee2321

Signed-off-by: Tony Allen <tallen@lyft.com>

format

48f6438

Signed-off-by: Tony Allen <tallen@lyft.com>

htuch suggested changes Oct 14, 2019

Add limitations

e76ee97

Signed-off-by: Tony Allen <tony@allen.gg>

mattklein123 requested changes Oct 16, 2019

repokitteh-read-only bot added the waiting label Oct 16, 2019

Matt's comments.

26ccda7

Signed-off-by: Tony Allen <tony@allen.gg>

repokitteh-read-only bot removed the waiting label Oct 17, 2019

mattklein123 approved these changes Oct 17, 2019

mattklein123 merged commit 999c27b into envoyproxy:master Oct 17, 2019

tonya11en deleted the acc_docs branch November 13, 2019 04:35
