Minimal code changes run rls at 500% CPU for minutes #688
I tried running with
This also appears to be the case for the rls that's shipped with the current stable release.
So, I tried to repro, but Cargo couldn't make the

One thing that changed somewhat recently is that if there is not a build running, then we will start one immediately on typing instead of waiting for some time (we only wait on subsequent builds). We still don't run more than one build at once (though we may be running a build at the same time as indexing the previous build). We've never been able to cancel builds once they start.
I have a hypothesis for something which might cause this. Builds in the RLS are a `cargo check` followed by post-processing the generated save-analysis data. The former is usually much longer than the latter. The way we queue builds is that we never start one until the current one has finished, and we wait for a timeout before starting one. However, we don't queue the save-analysis indexing - we start that as soon as the build is finished, and it can run in parallel with another build.

It is possible that if you have a crate where the indexing time is greater than the build time, then you could end up with lots of indexing happening at the same time, even though multiple builds are not. I wonder if something like that is happening here?

@sfackler could you run with `RUST_LOG=rls=debug` please? That would avoid getting spammed by Racer or the compiler.
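A rough sketch of the shape of the problem described in this hypothesis (hypothetical names, not the actual RLS internals): builds are serialized, but each finished build spawns an indexing job with no concurrency limit, so when indexing outlasts the build, indexing work can pile up.

```rust
use std::thread;
use std::time::Duration;

// Hypothetical sketch: builds run one at a time, but each finished build
// spawns an unbounded indexing job. If indexing is slower than the build,
// indexing threads accumulate and CPU usage grows.
fn run_build(n: u32) {
    println!("build {} running", n);
    thread::sleep(Duration::from_millis(200)); // fast `cargo check` stand-in
}

fn index_save_analysis(n: u32) {
    println!("indexing build {}", n);
    thread::sleep(Duration::from_secs(2)); // slow post-processing stand-in
}

fn main() {
    let mut indexers = Vec::new();
    for n in 0..10 {
        run_build(n); // builds are serialized...
        // ...but indexing is fired off with no limit, so many of these
        // can be alive at once when indexing outlasts the build.
        indexers.push(thread::spawn(move || index_save_analysis(n)));
    }
    for h in indexers {
        h.join().unwrap();
    }
}
```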
Will do, but I won’t have a chance until Wednesday
I can reproduce this behaviour. The culprit in my case is racer. For this to happen, racer completion must take far longer than our request timeout, so medium/large crates. After the timeout these tasks remain running in their thread. Though we do limit the max tasks to

This is how I reproduce it, by rapidly typing

First balm for the wound will be failing requests as soon as our request work pool is full (currently they'll just stack up). This will at least bound the 100% CPU time to the completion of N requests. I'll raise a PR to do this, and log a warning of what tasks have consumed the work pool. This will also help confirm my suspicion that racer is causing this in general.

Really we'd want to be able to cancel long-running threads, like racer requests and builds.
Interesting, thanks for investigating! Instead of/as well as limiting the number of active requests, we should probably limit the number of active Racer tasks, probably to 1 (like we do for builds). That would allow other requests through without us getting bogged down in Racer jobs. I hope we can do that without a whole queuing infrastructure like we have for builds...
Unlike the builds, where we have to deal with slowness, request actions are expected to be fast, because they're generally useless unless that's true. Because of this expectation we time out such requests. Without cancellation we are faced with either blocking future tasks until the timed-out task finishes, or starting the new ones and additionally trusting the timed-out tasks to finish before long. We go with the latter; #702 then is about mitigating the worst edge cases that can cause this issue. I was thinking it was better to address this generally rather than with Racer-specific rules. Racer shouldn't be so slow & intensive, right?

The #702 rules shouldn't generally be hit. For example, for `textDocument/completion` to be failed on a 4-thread CPU we'd need 2 tasks already running. Since each task is waited on for 1.5 seconds, that means one of those tasks has been running at least 3 seconds and the other at least 1.5 seconds.

New thinking is: once a task is taking that long, it becomes acceptable to shift our trust and cancel future tasks of the same type. For racer to cause this issue (6-thread utilisation) on the current logic, a racer task had to be running for more than 7.5-9 seconds.

#702 will help, but after merging can we find out why racer is so slow & intensive?
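To illustrate the "time out but keep running" behaviour described above (a minimal sketch, not the actual RLS code; names are hypothetical): the caller stops waiting after 1.5 seconds, but without cancellation the spawned worker keeps using CPU until it finishes on its own.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical sketch of a timed-out request: we stop waiting after 1.5 s,
// but the spawned work keeps running to completion because it can't be cancelled.
fn handle_completion_request() -> Result<String, mpsc::RecvTimeoutError> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Stand-in for a slow Racer completion call.
        thread::sleep(Duration::from_secs(10));
        let _ = tx.send("completions".to_string());
    });
    // The caller only waits 1.5 s; on timeout the worker thread lives on.
    rx.recv_timeout(Duration::from_millis(1500))
}

fn main() {
    match handle_completion_request() {
        Ok(result) => println!("got {}", result),
        Err(_) => println!("request timed out; worker still running"),
    }
}
```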
After I added the long-task logging I see:
So beyond #702 we need to get to the bottom of why racer takes so long. Maybe in the meantime add logic to disable racer completion functionality after we detect it's slow?
cc @jwilm since this is a Racer issue
This seems pretty fine as a throttling mechanism to me. It effectively detects slow runs of Racer and prevents it from running. It is not optimal, since we are still running Racer and thus using some CPU, but we shouldn't really crush multiple cores because we're only launching one task at a time.

I'm in favour of both general and specific mechanisms here - I don't think it is over-specific. We only outsource work to rustc, Racer, and Rustfmt, so I think specific mechanisms for all three are still OK, complexity-wise.
Yep, we can absolutely do better with what we have; that's what #702 is about. But cancellation just seems such a good fit for the RLS, which is why I keep harping on about it. At least it's something to revisit when a decent solution exists in
Fair enough. It is worth noting that on some projects racer isn't slow and our assumptions work. I don't think it was particularly wrong that we have an optimistic "tasks should finish fast" approach, where we leave timed-out tasks on the pool and expect them not to live long.

That's why I think it's ok to have 2 concurrent racer-completion tasks running. When the 2nd one runs and finds another already running, it means one racer task has timed out. Optimistically we should assume that the first will finish soon and go ahead with the new request (not generally wanting to cancel requests). However, when a 3rd request comes in and we have 2 ongoing tasks, we can be much more sure there's an issue and fast-fail the request.

In this way we handle when completion is fast, when completion is roughly equal to the timeout, and when completion is slow. Worst case: 2 cores are used for the duration of a single racer execution, while typing. I think this reasoning will apply to every kind of request. But of course, if we can improve the worst case without harming the others, all the better!
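A minimal sketch of the fast-fail heuristic described in the comment above (hypothetical names, not the RLS implementation): allow up to two in-flight completion tasks, and reject a third outright instead of queueing it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

const MAX_IN_FLIGHT: usize = 2;

// Hypothetical sketch: track in-flight completion tasks and fast-fail a new
// request when two are already running, on the assumption that at least one
// of those has long outlived its timeout.
fn try_spawn_completion(in_flight: Arc<AtomicUsize>) -> Result<(), &'static str> {
    let previous = in_flight.fetch_add(1, Ordering::SeqCst);
    if previous >= MAX_IN_FLIGHT {
        in_flight.fetch_sub(1, Ordering::SeqCst);
        return Err("too many completion tasks in flight; failing fast");
    }
    thread::spawn(move || {
        // Stand-in for a (possibly very slow) completion computation.
        thread::sleep(Duration::from_secs(5));
        in_flight.fetch_sub(1, Ordering::SeqCst);
    });
    Ok(())
}

fn main() {
    let in_flight = Arc::new(AtomicUsize::new(0));
    for i in 0..4 {
        match try_spawn_completion(in_flight.clone()) {
            Ok(()) => println!("request {} dispatched", i),
            Err(e) => println!("request {} rejected: {}", i, e),
        }
        thread::sleep(Duration::from_millis(100));
    }
}
```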
I hate to +1 with a comment, but this bug is extremely frustrating to the point where I cannot edit projects using the latest RLS build. I've reverted back to the 01-01-2018 build where I only occasionally get the issue. Restarting the RLS temporarily resolved the issue.
I completely stopped using RLS. It happens within the first minute, so RLS has become unusable for me.
I still see this when working on typed-headers - it's just pinned at 200% CPU now due to #702.
We should think about how/if we can do this. It will be pretty annoying to disable Racer since it means no more code completion, but that seems better than no code completion and all of the CPU. What I'm trying to say is we need to be careful not to get too many false positives.
Yes, we need to be careful with disabling autocomplete automatically, but it would be useful if we can do it well. However, having an RLS config option for autocomplete would be fine and would allow manual disabling for large projects where autocomplete isn't behaving anyway.

```toml
# atom rls.toml
autocomplete = false
```

On another front, has anyone raised an issue for racer to try to improve this upstream? The issue should be reproducible with a more basic/direct racer plugin. What's the status of autocomplete without racer? I'm guessing that's still far off?
Sorry if this isn't the right place to ask this, and I am not exactly familiar with how rls works internally, but why is rls actually using racer? Why can't rls "just use" the data from rustc? I am not quite sure what the status of racer is, but it seems quite buggy (it can't auto-complete features which it cannot parse, I assume? See #588. It also seems like it uses a really old version of syntex_syntax, which may be the cause of some parsing issues?).
@cynecx basically because we can't compile the program fast enough - once the user is typing, we need to compile the program and calculate the code completion results fast enough to be useful (approx. <= 0.25s). We can't get anywhere near that right now. Hopefully, as incremental compilation gets better, plus some other changes, we will be able to.
I'm very much for a manual option to disable code completion. There's still a lot of value in error/warning highlighting that I can't get right now because code completion is preventing that.
I actually cloned typed-headers, and wow, this project really suffers from this; you weren't joking @sfackler. I can confirm that the culprit is indeed racer completion. It's the first two requests which time out and keep running for ... I waited a long time to see how long they take. I think this project may somehow cause an infinite loop in racer. This should be reported on https://github.com/racer-rust/racer.

When I restarted the RLS with racer completion disabled, #747 style, the problem vanished. That projects like this can be such hard edges for racer is good enough reason to provide the config option.
Provides manual workaround for rust-lang#688
With #747 now merged, the `racer_completion` setting can be used to disable racer. So, for example, in atom a line in your rls.toml with the following would disable racer completion.

```toml
# rls.toml for atom
racer_completion = false
```
I assume this is fixed by the last racer update; if not, let me know.
I'm using

```
rls-preview 0.124.0-nightly (511321a 2018-01-23)
```

from the latest nightly. Did request cancellation break, maybe?