-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Adaptive concurrency documentation and stats #8582
Conversation
Signed-off-by: Tony Allen <tallen@lyft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a look at the docs from someone who hasn't been following along - seems pretty well explained in general, just a few comments
docs/root/configuration/http/http_filters/adaptive_concurrency_filter.rst
Outdated
Show resolved
Hide resolved
docs/root/configuration/http/http_filters/adaptive_concurrency_filter.rst
Outdated
Show resolved
Hide resolved
docs/root/configuration/http/http_filters/adaptive_concurrency_filter.rst
Outdated
Show resolved
Hide resolved
docs/root/configuration/http/http_filters/adaptive_concurrency_filter.rst
Outdated
Show resolved
Hide resolved
Signed-off-by: Tony Allen <tallen@lyft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; I haven't been following this that closely either. The only question I have is whether there are references to other systems that implement this algorithm?
Also, this only works if the system is linear, right? Is there some heuristic to know whether adaptive control works in a given situation, or does it basically work in all practical scenarios?
return runtime_.snapshot().getDouble(RuntimeKeys::get().SampleAggregatePercentileKey, | ||
sample_aggregate_percentile_) / | ||
100.0; | ||
const double val = runtime_.snapshot().getDouble( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: prefer code and major docs PR to be separate, but it's fine for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 in the future.
@htuch thanks for taking a look. The only other system I'm aware of that has implemented this algorithm is the Netflix concurrency limits library and the associated blog post. I'll admit I didn't look too closely at the Java library, so my implementation here might be dramatically different; however, the concurrency limit calculation should be identical. There are still a lot that needs to be explored regarding the filter's performance under various workloads. I've only tested the filter for ingress circuit breaking while protecting a single host. Also, since the filter's decisions are measurement-based, I doubt it would perform well in situations where it cannot sample all requests going to the host or in scenarios where the latency distribution is bi-modal (since the sampleRTT is measured as a percentile). To figure out the situations where this filter does work well, I hacked together a client/server load test with configurable RPS, concurrency, request latencies, etc. I have a lot of data showing how the filter performs under various workload patterns and scenarios where the upstream itself begins to "degrade" by artificially increasing the server's time to respond. I can share this data if you'd like, but I didn't think the configuration docs were the place for my experimental results. I did recently provide an update to the parent issue with a single experiment that might be of interest. I haven't found any synthetic workload patterns where the filter doesn't work as expected. I'll be rolling this out in our staging environment over the next few weeks to see how it performs in real life, so stay tuned for those results. |
@tonya11en thanks for the context, it sounds like this is an awesome piece of kit. I agree we don't need a complete writeup of experiments in the docs, but some explanation of how this technology works and where its limits might be would be great. I'm thinking a single paragraph, a few sentences. |
Signed-off-by: Tony Allen <tony@allen.gg>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, awesome work. A few small comments. Super excited to see this land.
/wait
|
||
The adaptive concurrency filter supports the following runtime settings: | ||
|
||
adaptive_concurrency.enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurs to me that if we ever want to support both ingress and egress adaptive concurrency in a single side car, we will have conflicting runtime names. Would it be better to not hard code these names and instead read them from runtime value configuration fields or similar? We might consider doing this in a follow up before we consider this filter production ready. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these runtime parameters are all overrides of the config parameters, we can just use the runtime configuration fields for the config like you mention. That'll allow for unique runtime names.
Let's knock that out in a different patch. I'll open an issue.
...ce/extensions/filters/http/adaptive_concurrency/concurrency_controller/gradient_controller.h
Outdated
Show resolved
Hide resolved
return runtime_.snapshot().getDouble(RuntimeKeys::get().SampleAggregatePercentileKey, | ||
sample_aggregate_percentile_) / | ||
100.0; | ||
const double val = runtime_.snapshot().getDouble( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 in the future.
...ce/extensions/filters/http/adaptive_concurrency/concurrency_controller/gradient_controller.h
Show resolved
Hide resolved
Signed-off-by: Tony Allen <tony@allen.gg>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Is there a point where this moves out of "experimental" in the documentation? (still that way as of 1.16.x) |
I'm not quite sure what the criteria are for removing that disclaimer. Have you used this filter in your deployments or are you waiting for a "green light" before trying it out? |
definitely the second, as it looks like the netflix lib this was based off of (https://github.com/Netflix/concurrency-limits/) is now abandoned |
Documentation for the adaptive concurrency control filter. This patch also introduces a new stat to the filter and adds some coverage in an existing test.
Fixes #7789