idle cpu consumption >40% #996
Comments
xemul added a commit to xemul/seastar that referenced this issue on Jan 11, 2022
Right now the token replenisher runs from a group's steady-clock timer. This causes several problems. First, the replenishing period is hard-coded to 500 usec, with no justification other than "seems to work decent enough with default latency goal". Next, when the reactor is idle, this timer wakes it up frequently enough to generate noticeable user time. Finally, the timer sits on a group and is thus run by a single shard, which makes the whole group depend on that shard not stalling.

The proposed fix is to make each shard replenish the capacity when it really needs it. Benefits of this approach are:
- no magic 500us constant for replenish rate
- no dependency on a single shard rate (replenisher is shard-safe)
- no user-time generation when idling

fixes: scylladb#996

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
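The "replenish on grab, not by timer" idea from this commit message can be illustrated with a minimal sketch. This is not Seastar's `fair_group` code: the class `on_demand_bucket`, its members, and the millisecond-based clock argument are all hypothetical, chosen only to show how a caller that runs short of tokens can credit the elapsed time itself, and how a compare-and-swap on the last-replenish timestamp keeps concurrent callers from crediting the same interval twice.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>

// Hypothetical illustration only; names and layout are assumptions,
// not the actual Seastar fair_group implementation.
class on_demand_bucket {
    const uint64_t _rate_per_ms;         // tokens generated per millisecond
    const uint64_t _limit;               // maximum tokens the bucket can hold
    std::atomic<uint64_t> _tokens{0};    // tokens currently available
    std::atomic<uint64_t> _last_ms;      // time of the last replenishment

    // Add tokens without exceeding the bucket limit.
    void add_capped(uint64_t extra) {
        uint64_t cur = _tokens.load();
        uint64_t next;
        do {
            next = std::min(_limit, cur + extra);
        } while (!_tokens.compare_exchange_weak(cur, next));
    }

public:
    on_demand_bucket(uint64_t rate_per_ms, uint64_t limit, uint64_t start_ms)
        : _rate_per_ms(rate_per_ms), _limit(limit), _last_ms(start_ms) {}

    // Credit the time elapsed since the last replenishment.
    // The CAS on _last_ms lets exactly one caller claim each interval.
    void replenish(uint64_t now_ms) {
        uint64_t last = _last_ms.load();
        while (now_ms > last) {
            if (_last_ms.compare_exchange_weak(last, now_ms)) {
                add_capped((now_ms - last) * _rate_per_ms);
                return;
            }
            // On failure `last` holds the fresh value; the loop re-checks it.
        }
    }

    // Take n tokens; replenish only when the bucket is short.
    // No timer is involved, so an idle caller costs nothing.
    bool try_grab(uint64_t n, uint64_t now_ms) {
        if (_tokens.load() < n) {
            replenish(now_ms);
        }
        uint64_t cur = _tokens.load();
        while (cur >= n) {
            if (_tokens.compare_exchange_weak(cur, cur - n)) {
                return true;
            }
        }
        return false;
    }

    uint64_t available() const { return _tokens.load(); }
};
```

In this sketch nothing runs unless some caller invokes `try_grab`, which mirrors the "no user-time generation when idling" benefit listed above.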
avikivity added a commit that referenced this issue on Jan 11, 2022
"
The rate-limiter-based IO scheduler uses two token buckets to limit the request rate. Tokens are put into the second bucket (from which they are then grabbed for dispatch) by a procedure called the "replenisher", which is run by a steady timer. This timer causes several problems: its rate is magically selected, it runs on a single shard, and it generates noticeable user time when the reactor is idling.

To fix that, the proposal is to make the io-queue poller replenish the tokens from all shards when they need them. Before this change it's worth tuning the replenishment threshold to be not less than the minimal capacity that can be claimed from the group.

Verified on i3en instance with the rl-iosched.

tests: unit(dev), manual.rl-iosched(dev)

This set places one more item into the TODO list. If the disk slows down for some reason, the replenisher may start generating more tokens for the 2nd bucket than appear on the 1st. When that happens, the replenishment code drops some re-generated tokens until some future time, thus slowing down its rate. This behavior is deliberate and was aimed at making the token buckets adapt to the real disk speed.

However, this logic may lead to false drops. The tokens appear on the 1st bucket in batches, with the "trendline" being at the expected rate. The replenisher, however, most likely runs between those batches and thus constantly generates more tokens than are available, just because those batches are not "linear enough". This is what surfaced during verification: when the replenisher was switched to on-demand operation it became more "aggressive" and thus lost more tokens. This was partially addressed by the threshold increase, but some more care is still needed.
"

Fixes #996

* 'br-fair-group-replenish-relax' of https://github.com/xemul/seastar:
  fair_queue: Replenish tokens on grab, not by timer
  fair_queue, io_queue: Configure replenish threshold from minimal IO request
  fair_group: Generalize duration -> capacity conversion
  fair_queue: Tune-up clock_type
  fair_queue: Remove unused _base
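The replenishment threshold mentioned in the cover letter can be sketched as follows. This is a simplified, single-threaded model under stated assumptions, not the code from the series: `thresholded_replenisher` and `threshold_for` are hypothetical names, and the rule "skip replenishment until the elapsed time is worth at least the threshold" is one plausible reading of "not less than the minimal capacity that can be claimed".

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical model; field and function names are assumptions.
struct thresholded_replenisher {
    uint64_t rate_per_ms;   // tokens generated per millisecond
    uint64_t threshold;     // smallest batch worth replenishing
    uint64_t last_ms = 0;   // time of the last successful replenishment
    uint64_t tokens = 0;    // tokens available for dispatch

    // Returns the number of tokens added by this call.
    uint64_t maybe_replenish(uint64_t now_ms) {
        if (now_ms <= last_ms) {
            return 0;
        }
        uint64_t extra = (now_ms - last_ms) * rate_per_ms;
        if (extra < threshold) {
            // Too little to matter: leave last_ms untouched so the
            // elapsed time keeps accumulating for the next call.
            return 0;
        }
        last_ms = now_ms;
        tokens += extra;
        return extra;
    }
};

// Assumed rule: never replenish in steps smaller than the cheapest request.
inline uint64_t threshold_for(uint64_t min_request_cost, uint64_t configured) {
    return std::max(min_request_cost, configured);
}
```

With a threshold at least as large as the cheapest request, frequent on-demand calls do less work per call, and every successful replenishment adds enough tokens to dispatch at least one request.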
Bisected to 837cadb
reproducer: build/dev/apps/httpd/httpd --smp 2
Observe shard 0 eats 40% cpu instead of 2-3%.