Implement "pipelined" rustc compilation #6660
Duplicate of #4831?

Heh yep! I'm gonna close that one in favor of this though
What is the reason for using a signalling mechanism instead of just using 2 rustc processes? (The first for the metadata and the second for the rlib.) Then the 2nd and 3rd of the proposed steps largely disappear, as the "signal" for the metadata being ready is just that the metadata rustc process returns. I believe this would allow better integration with other build systems since the dependency information would be captured in the DAG rather than via a special interaction between cargo and rustc.
From when we discussed the previous issue, my only quibble here is that for consumption of a build plan by external build tools the "rustc sends a signal to cargo" part of the design gives a fairly tight coupling. Using a single rustc process to emit both the metadata and the rlib would of course be more efficient than spawning rustc twice in series, but using separate processes would be very straightforward to represent in a build plan and also likely require fewer changes to cargo's existing job execution architecture. I don't know enough about the internals of rustc to offer an informed opinion about whether the latter approach would be feasible from that side of things.
@mshal despite incremental compilation being implemented and fairly quick it's not "zero time" and so to get maximal benefit we'd need to keep rustc running and avoid it having to re-parse all internal state and re-typecheck and all that (even if incremental). It's true it might be simpler to just emit the metadata and spawn another process, but I'd fear that it wouldn't be as big of a win as we could possibly get.

@luser currently build plans are already a very lossy view of what Cargo does (although we have plans to fix that!). Depending on how the implementation goes here we could always of course generate a build plan that spawns multiple processes.
I feel like while two processes wouldn't get the same benefits right away, they definitely offer a path to the same benefits by allowing later additions. I'd favor a simpler and easy-to-drive MVP with a path towards optimal results over a more complex and harder-to-drive MVP that tries to jump there in one go.
I think that while we should still experiment with this, we should be wary of stabilizing something here, since MIR-only rlibs and/or multi-crate incremental sessions may end up being better solutions overall. That is, this pipelining is something we should've done a long time ago, but now we're not so sure it's the best way. Other than that though, I think experimenting with this and gathering statistics on it is good!
The version that invokes rustc twice seems simpler from a build-system perspective. One thing that might be interesting: once rmetas are consumed, the two-process version could work too.
@eddyb I disagree that we should continue to block for MIR-only rlibs; we've collected data and research showing that it's not a silver bullet and has fundamental downsides that the current model of compilation solves. In that sense I wouldn't consider this an experiment, but rather how I'd personally like Cargo to permanently compile Rust code. If MIR-only rlibs ship one day then they'd presumably also benefit from a pipelining architecture, although perhaps not quite as much.

@michaelwoerister correct, the two-process version should work after rmetas are consumed!

Also FWIW I'd personally like to push as much as possible for "this is an unstable interface between Cargo and rustc". I find the current state somewhat draconian, where if we want Cargo, the primary build tool for Rust, to do anything different today we have to go through a huge stabilization process. This hugely hinders experimentation and development of a much richer interface. Cargo has all sorts of knowledge that rustc has to either relearn and/or spend time recomputing. This is just a tiny feature ("signal something via some method") which is quite easy technically to implement in rustc, and it would be a bummer if it were slowed down by process.
@alexcrichton I don't disagree regarding the current situation, but I suppose if you're taking those downsides into account, the long term silver bullet might be multi-crate compilation sessions. I also don't want to block anything here, as long as it stays between Cargo and rustc.
One additional issue with the single-process approach is that it would break sccache unless we also implemented support for making sccache act as an intermediary between cargo and rustc (this would be a bit of a pain but not impossible). However, sccache also gained support for icecream-style distributed compilation, including distributed Rust compilation, and I don't think the single-process model would be compatible with distributed compilation. This would be especially unfortunate because a distributed compilation configuration would stand to benefit the most from unlocking extra parallelism in the build.
@luser That just sounds like sccache is misplaced in the compilation process (which had been my suspicion for a long time, given the differences in compilation models of file-based compilers and rustc, which multi-crate sessions only exacerbate). Distributed rustc could work in a similar fashion to "parallel rustc", albeit we need a design based more on opportunistic deduplication between threads than locks everywhere. At the very least, what you want to share is the new incremental persistence, not the old metadata.

EDIT: my bad, @eternaleye pointed out on IRC that you were talking about the signalling approach and not necessarily anything regarding further developments in incremental/parallel rustc. Regarding that, I agree running rustc twice is closer to a "pure computation dependency graph" which is the ideal case for distributed compilation.
I think we forgot to write this down somewhere, but I want to also propose an alternative that I believe @ehuss came up with. Let's start with a clean slate, given rustc today. Instead of the above signaling proposal, let's instead:
And that's it! This seems much more amenable to me and would also natively work with tools like sccache which would presumably want to preserve the output of the compiler as well and Cargo would simply pick it up as usual. It also feels a bit cleaner and honestly easier to implement than a signaling solution, and continues to put most of the burden of complexity on Cargo which seems appropriate in this situation.
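The concrete steps of the alternative are elided above, but the mechanism rustc eventually grew for this is a JSON "artifact notification" printed to its output when the rmeta file is written, which Cargo picks up like any other compiler message. A rough sketch of the Cargo-side handling, with a simplified string scan standing in for real JSON parsing (the function name and exact message shape here are illustrative):

```rust
/// Simplified sketch: scan one line of rustc's JSON output and, if it is an
/// artifact notification for metadata, return the artifact path.
/// (Real Cargo uses a proper JSON parser; this string scan is illustrative.)
fn rmeta_notification(line: &str) -> Option<String> {
    // Only lines announcing a metadata artifact are interesting here.
    if !line.contains("\"emit\":\"metadata\"") {
        return None;
    }
    // Pull the value of the "artifact" key out of the JSON object.
    let key = "\"artifact\":\"";
    let start = line.find(key)? + key.len();
    let rest = &line[start..];
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}

fn main() {
    let line = r#"{"artifact":"/tmp/debug/deps/liba.rmeta","emit":"metadata"}"#;
    match rmeta_notification(line) {
        // Cargo would now mark a's metadata as ready and unblock dependents.
        Some(path) => println!("rmeta ready: {}", path),
        None => println!("not an rmeta notification"),
    }
}
```

Since the notification travels over the compiler's ordinary output stream, any intermediary that preserves that stream (such as sccache) forwards it for free.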
This commit starts to lay the groundwork for rust-lang#6660 where Cargo will invoke rustc in a "pipelined" fashion. The goal here is to execute one command to produce both an `*.rmeta` file as well as an `*.rlib` file for candidate compilations. In that case if another rlib depends on that compilation, then it can start as soon as the `*.rmeta` is ready and not have to wait for the `*.rlib` compilation.

The major refactoring in this commit is to add a new form of `CompileMode`: `BuildRmeta`. This mode is introduced to represent that a dependency edge only depends on the metadata of a compilation rather than the entire linked artifact. After this is introduced the next major change is to actually hook this up into the dependency graph. The approach taken by this commit is to have a postprocessing pass over the dependency graph. After we build a map of all dependencies between units, a "pipelining" pass runs and actually introduces the `BuildRmeta` mode. This also makes it trivial to disable/enable pipelining, which we'll probably want to do for a preview period at least! The `pipeline_compilations` function is intended to be extensively documented with the graph that it creates as well as how it works in terms of adding `BuildRmeta` nodes into the dependency graph.

This commit is not all that will be required for pipelining compilations. It does, however, get the entire test suite passing with this refactoring. The way this works is by ensuring that a pipelined unit, one split from `Build` into both `Build` and `BuildRmeta`, is a unit that doesn't actually do any work. That way the `BuildRmeta` actually does all the work currently and we should have a working Cargo like we did before. Subsequent commits will work on updating the `JobQueue` to account for pipelining...

Note that this commit itself doesn't really contain any tests because there's no functional change to Cargo, only internal refactorings.
This does have a large impact on the test suite because the `--emit` flag has now changed by default, so lots of test assertions needed updating.
This commit starts to lay the groundwork for rust-lang#6660 where Cargo will invoke rustc in a "pipelined" fashion. The goal here is to execute one command to produce both an `*.rmeta` file as well as an `*.rlib` file for candidate compilations. In that case if another rlib depends on that compilation, then it can start as soon as the `*.rmeta` is ready and not have to wait for the `*.rlib` compilation.

Initially attempted in rust-lang#6864 with a pretty invasive refactoring, this iteration is much more lightweight and fits much more cleanly into Cargo's backend. The approach taken here is to update the `DependencyQueue` structure to carry a piece of data on each dependency edge. This edge information represents the artifact that one node requires from another, and then when a node has no outgoing edges it's ready to build. A dependency on a metadata file is modeled as just that, a dependency on just the metadata and not the full build itself. Most of cargo's backend doesn't really need to know about this edge information so it's basically just calculated as we insert nodes into the `DependencyQueue`. Once that's all in place it's just a few pieces here and there to identify compilations that *can* be pipelined, and then they're wired up to depend on the rmeta file instead of the rlib file.
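The edge-data idea in the commit message above can be sketched with a toy structure (names like `DepQueue` and `Artifact` are illustrative, not Cargo's actual types):

```rust
use std::collections::HashMap;

/// What a dependent needs from a dependency edge (illustrative, not
/// Cargo's actual types).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Artifact {
    Metadata, // only the .rmeta is required; unblocked early by pipelining
    All,      // the full .rlib is required (e.g. for linking)
}

/// Toy dependency queue: each unit lists (dependency, required artifact).
struct DepQueue {
    deps: HashMap<&'static str, Vec<(&'static str, Artifact)>>,
}

impl DepQueue {
    /// A unit is ready once every artifact it depends on has been produced.
    fn ready(&self, unit: &str, produced: &[(&str, Artifact)]) -> bool {
        self.deps[unit]
            .iter()
            .all(|&(dep, art)| produced.contains(&(dep, art)))
    }
}

fn main() {
    let mut deps = HashMap::new();
    // `b` is an rlib build: it only needs a's metadata.
    deps.insert("b", vec![("a", Artifact::Metadata)]);
    // The final binary needs a and b fully built for linking.
    deps.insert("bin", vec![("a", Artifact::All), ("b", Artifact::All)]);
    let q = DepQueue { deps };

    // As soon as a's rmeta exists, b can start, before a's rlib is done.
    assert!(q.ready("b", &[("a", Artifact::Metadata)]));
    assert!(!q.ready("bin", &[("a", Artifact::Metadata)]));
    println!("b unblocked by a.rmeta alone");
}
```

The key point is that readiness is decided per edge, so an rlib dependent can be unblocked by the rmeta while the linking step still waits for the full artifact.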
Implement the Cargo half of pipelined compilation (take 2)

This commit starts to lay the groundwork for #6660 where Cargo will invoke rustc in a "pipelined" fashion. The goal here is to execute one command to produce both an `*.rmeta` file as well as an `*.rlib` file for candidate compilations. In that case if another rlib depends on that compilation, then it can start as soon as the `*.rmeta` is ready and not have to wait for the `*.rlib` compilation.

Initially attempted in #6864 with a pretty invasive refactoring, this iteration is much more lightweight and fits much more cleanly into Cargo's backend. The approach taken here is to update the `DependencyQueue` structure to carry a piece of data on each dependency edge. This edge information represents the artifact that one node requires from another, and then when a node has no outgoing edges it's ready to build. A dependency on a metadata file is modeled as just that, a dependency on just the metadata and not the full build itself. Most of cargo's backend doesn't really need to know about this edge information so it's basically just calculated as we insert nodes into the `DependencyQueue`. Once that's all in place it's just a few pieces here and there to identify compilations that *can* be pipelined, and then they're wired up to depend on the rmeta file instead of the rlib file.

Closes #6660
I've made a post on internals about evaluating pipelined compilation now that nightly Cargo/rustc both fully support it.
One possible feature we've talked about for a long time but never got around to implementing is the concept of pipelined rustc compilation. Today, let's say that we have a crate A and a crate B that depends on A, and that both are being compiled as rlibs. When compiling this project Cargo will compile A first and then wait for it to finish completely before starting to compile B.
In reality, though, the compiler doesn't need the full compilation results of A to start B. Instead the compiler only needs metadata from A to start compiling B. Ideally Cargo would start compiling B as soon as A's metadata is ready to go.
This idea of pipelining rustc and starting rustc sooner doesn't reduce the overall work being done on each build, but it does in theory greatly increase the parallelism of the build as we can spawn rustc faster and keep all of a machine's cores warm doing Rust compilation. This is expected to have even bigger wins in release mode where post-metadata work in the compiler often takes quite some time. Furthermore incremental release builds should see huge wins because during incremental rebuilds of a chain of crates you can keep all cores busy instead of just dealing with one crate at a time.
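A toy model of the expected win for a chain of crates: with pipelining, each crate starts as soon as the previous crate's metadata is ready rather than when it fully finishes. The numbers below are made up for illustration and assume enough cores that every unblocked crate compiles concurrently:

```rust
/// Toy model: each crate in a dependency chain takes `meta` seconds to
/// produce its .rmeta and `total` seconds to finish completely.
/// Returns (non-pipelined wall time, pipelined wall time).
fn build_times(crates: &[(f64, f64)]) -> (f64, f64) {
    // Non-pipelined: each crate waits for the previous one to fully finish.
    let serial: f64 = crates.iter().map(|&(_, total)| total).sum();

    // Pipelined: a crate starts as soon as the previous crate's rmeta exists.
    let mut start = 0.0;
    let mut finish: f64 = 0.0;
    for &(meta, total) in crates {
        finish = finish.max(start + total);
        start += meta; // next crate unblocks when this rmeta is ready
    }
    (serial, finish)
}

fn main() {
    // A chain of three crates: rmeta ready at 2s, fully done at 10s each.
    let (serial, pipelined) = build_times(&[(2.0, 10.0), (2.0, 10.0), (2.0, 10.0)]);
    println!("serial: {serial}s, pipelined: {pipelined}s"); // 30s vs 14s
}
```

The gap widens as the post-metadata portion grows, which is why release mode (where codegen dominates) stands to benefit the most.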
There are three main parts of this implementation that need to happen:
In the ideal world the compiler would also wait just before linking for Cargo to let it know that all dependencies are ready. That's somewhat difficult, however, so I think it's probably best to start out incrementally and simply say that Cargo doesn't start a compilation that requires linking until all dependencies are finished (as it does today).
I've talked with @ehuss about possibly implementing this as well as the compiler team about the idea, but I think this is a small enough chunk of work (although certainly not trivial) to be done in the near future!