Add rustdoc-json perf check #1512

GuillaumeGomez · 2023-01-10T14:32:07Z

The JSON output format is taking more and more importance in rustdoc and we reached a point where performance checks would start to make sense.

Now, with that said, this remains a very specific feature and I'm not sure if it's worth having it tested by rustc-perf directly. It will also add noise considering that rustdoc benchmarks tend to be noisy ones. Maybe the solution here would be to only run this check manually or on rustdoc PRs? But before even that, do you think it's a good idea at all?

Kobzol · 2023-01-10T14:45:52Z

Maybe the solution here would be to only run this check manually or on rustdoc PRs?

Running manually is always an option, and running it automatically only on rustdoc PRs seem reasonable. With the upcoming runtime benchmarks, we will maybe also need some semi-automatic way of filtering what benchmarks should be executed, based on the PR type.

But before even that, do you think it's a good idea at all?

It sounds reasonable to me, assuming that:

rustdoc-json is not a niche feature (to be honest, I have no idea how often is it used). I assume that it is used by tools to programmatically ingest rustdoc output?
It tends to have real performance bottlenecks that are problematic for users. I don't know what are the absolute numbers and deviations of running JSON doc on various crates. For example, if it takes 5 seconds on the largest crate imaginable, I'm not sure if it's worth optimizing. Or, if it takes 1 seconds on the fastest one and 1.5 on the slowest one, the deviation is maybe again too small to super optimize this (these numbers were made up ofc). I also wonder how different it is in comparison to the current doc builds. I assume that it basically runs a doc build (in terms of analysis of the code), but then outputs JSON instead of HTML? If that is so, is the bottleneck truly often in the "JSON part", so that it is warranted to run two different doc benchmarks? Or will they be extremely correlated?

I'm also a bit worried about extending the Profile/Scenario enums. I already dealt with this when we were adding the "binary size" benchmarks, which basically do an opt build, but with specific flags. It would be nice if we could describe the individual build/compile parameters in a more generic/unified way, but it would clash with all of the data that we already have in the DB. Not to mention that runtime benchmarks will completely mix this up :D But this is not a problem for this PR.

GuillaumeGomez · 2023-01-10T14:48:29Z

We will run some perf checks on our side and publish them here so we can have a good idea overall if it's even that useful in the first place. 👍

Mark-Simulacrum · 2023-01-10T14:48:37Z

Running manually is always an option, and running it automatically only on rustdoc PRs seem reasonable. With the upcoming runtime benchmarks, we will maybe also need some semi-automatic way of filtering what benchmarks should be executed, based on the PR type.

I think this is hard -- you need the runs against master to compare to, and while we could make that incremental too (i.e., only collect partial results and then back-collect as needed), it seems hard to make that work well for things like noise level evaluation, etc.

So I think we should prefer running always, and see if the impact to overall time taken for collection is significant. Ultimately the current cost of the collection infrastructure is relatively low; if we invest in parallelism support in perf we could scale horizontally there relatively easily.

aDotInTheVoid · 2023-03-02T13:18:16Z

Just ran into another case where this would be super usefull to have (rust-lang/rust#108626).

In responce to your questions:

rustdoc-json is not a niche feature

While we don't have any way to get absolute numbers, the rustdoc-types crate (which contains the type definitions needed for a rust progam to read the output) has 282,984 downloads on crates.io at the time of writing. I'm not sure weather that qualifies as niche or not.

It tends to have real performance bottlenecks that are problematic for users.

Looking further into this, I'm not sure this is true. stm32f0xx-hal (which takes 70s for HTML), is done in 0.75s

(stm32f0xx-hal v0.18.0, cargo 1.69.0-nightly (9880b408a 2023-02-28), all fresh builds)

This seems to suggest that performance (for JSON) is entirely dominated by how quickly rustc can be done, whereas rustdoc HTML can become dominant.

aDotInTheVoid · 2023-03-02T13:22:22Z

On the other hand, theirs alot of selection bias in play here, as I chose stm32f0xx-hal because I know it's slow with rustdoc-html. Their could also be crates that are slow for rustdoc JSON, and we haven't heard about them yet.

Kobzol · 2023-03-02T13:40:16Z

To be honest, if stm32f0xx-hal (which is massive, documentation wise) takes <1s for JSON doc, then I'm not sure if it's that important to measure JSON output specifically.

On the other hand, this PR is small and it could be quite useful for local testing, so I'm not opposed to it. I wouldn't include json-doc to the perf. RLO CI suite though, as we would have to think of some way how to distinguish it from the rest of the profiles. Just adding another jsondoc profile section wouldn't be very nice, because already the existing doc results are sometimes "muddying the water" a little bit, as they are not on the same importance level as check/debug/opt.

aDotInTheVoid · 2023-03-02T13:46:29Z

Further investigation has shown that stm32f0xx-hal might not be representitive.

Trying instead with aws-sdk-ec2:

cargo +nightly rustdoc --timings -p aws-sdk-ec2 -- -w html: HTML Takes 50s, out of 70s total
cargo +nightly rustdoc --timings -p aws-sdk-ec2 -- -w json -Z unstable-options: JSON Takes 33s, out of 50s total

(awslabs/aws-sdk-rust@66423e0, all fresh builds)

Which seems to suggest that Rustdoc JSON is the performance bottleneck for some users and displays completely different performance charecteristics to HTML

Kobzol · 2023-03-02T13:51:24Z

Thanks for the investigation! Then it sounds like indeed JSON output is something that could be investigated for better performance theoretically.

I would still not add it to the CI suite as it is though, we'd need to think of some way of changing the current categories. But for local profiling it's fine by me.

jyn514 · 2023-03-11T14:12:09Z

This seems to suggest that performance (for JSON) is entirely dominated by how quickly rustc can be done, whereas rustdoc HTML can become dominant.

This is not what's slow in stm32. stm32 is dominated by this horrible O(n^4) section of code: https://github.com/rust-lang/rust/blob/c183110cc26abb506dba0a4def917735fb6eb6f0/src/librustdoc/clean/blanket_impl.rs#L22-L85

Maybe the JSON backend doesn't collect blanket impls and that's why it's closer on stm32.

blyxyas · 2023-08-18T10:48:39Z

collector/src/benchmark/mod.rs

@@ -291,7 +291,7 @@ impl Benchmark {
                }

                // Rustdoc does not support incremental compilation
-                if profile != Profile::Doc {
+                if profile.is_doc() {


typo?

Suggested change

if profile.is_doc() {

if !profile.is_doc() {

Add rustdoc-json perf check

0dc32ad

GuillaumeGomez mentioned this pull request Jan 10, 2023

Clean up code conversions of Profile enum #1511

Merged

This was referenced Jan 10, 2023

rustdoc-json: Track Performance rust-lang/rust#94141

Open

Rustdoc JSON 2023 Roadmap rust-lang/rust#106697

Open

aDotInTheVoid mentioned this pull request Mar 2, 2023

rustdoc-json: switch from HashMap to FxHashMap to fix non-determinism rust-lang/rust#108626

Merged

blyxyas reviewed Aug 18, 2023

View reviewed changes

GuillaumeGomez closed this Aug 12, 2024

GuillaumeGomez deleted the rustdoc-json-perf-check branch August 12, 2024 11:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add rustdoc-json perf check #1512

Add rustdoc-json perf check #1512

GuillaumeGomez commented Jan 10, 2023

Kobzol commented Jan 10, 2023

GuillaumeGomez commented Jan 10, 2023

Mark-Simulacrum commented Jan 10, 2023

aDotInTheVoid commented Mar 2, 2023

aDotInTheVoid commented Mar 2, 2023

Kobzol commented Mar 2, 2023

aDotInTheVoid commented Mar 2, 2023

Kobzol commented Mar 2, 2023

jyn514 commented Mar 11, 2023

blyxyas Aug 18, 2023

Add rustdoc-json perf check #1512

Add rustdoc-json perf check #1512

Conversation

GuillaumeGomez commented Jan 10, 2023

Kobzol commented Jan 10, 2023

GuillaumeGomez commented Jan 10, 2023

Mark-Simulacrum commented Jan 10, 2023

aDotInTheVoid commented Mar 2, 2023

aDotInTheVoid commented Mar 2, 2023

Kobzol commented Mar 2, 2023

aDotInTheVoid commented Mar 2, 2023

Kobzol commented Mar 2, 2023

jyn514 commented Mar 11, 2023

blyxyas Aug 18, 2023

Choose a reason for hiding this comment