Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rustdoc-json perf check #1512

Conversation

GuillaumeGomez
Copy link
Member

The JSON output format is taking more and more importance in rustdoc and we reached a point where performance checks would start to make sense.

Now, with that said, this remains a very specific feature and I'm not sure if it's worth having it tested by rustc-perf directly. It will also add noise considering that rustdoc benchmarks tend to be noisy ones. Maybe the solution here would be to only run this check manually or on rustdoc PRs? But before even that, do you think it's a good idea at all?

@Kobzol
Copy link
Contributor

Kobzol commented Jan 10, 2023

Maybe the solution here would be to only run this check manually or on rustdoc PRs?

Running manually is always an option, and running it automatically only on rustdoc PRs seem reasonable. With the upcoming runtime benchmarks, we will maybe also need some semi-automatic way of filtering what benchmarks should be executed, based on the PR type.

But before even that, do you think it's a good idea at all?

It sounds reasonable to me, assuming that:

  1. rustdoc-json is not a niche feature (to be honest, I have no idea how often is it used). I assume that it is used by tools to programmatically ingest rustdoc output?
  2. It tends to have real performance bottlenecks that are problematic for users. I don't know what are the absolute numbers and deviations of running JSON doc on various crates. For example, if it takes 5 seconds on the largest crate imaginable, I'm not sure if it's worth optimizing. Or, if it takes 1 seconds on the fastest one and 1.5 on the slowest one, the deviation is maybe again too small to super optimize this (these numbers were made up ofc). I also wonder how different it is in comparison to the current doc builds. I assume that it basically runs a doc build (in terms of analysis of the code), but then outputs JSON instead of HTML? If that is so, is the bottleneck truly often in the "JSON part", so that it is warranted to run two different doc benchmarks? Or will they be extremely correlated?

I'm also a bit worried about extending the Profile/Scenario enums. I already dealt with this when we were adding the "binary size" benchmarks, which basically do an opt build, but with specific flags. It would be nice if we could describe the individual build/compile parameters in a more generic/unified way, but it would clash with all of the data that we already have in the DB. Not to mention that runtime benchmarks will completely mix this up :D But this is not a problem for this PR.

@GuillaumeGomez
Copy link
Member Author

We will run some perf checks on our side and publish them here so we can have a good idea overall if it's even that useful in the first place. 👍

@Mark-Simulacrum
Copy link
Member

Running manually is always an option, and running it automatically only on rustdoc PRs seem reasonable. With the upcoming runtime benchmarks, we will maybe also need some semi-automatic way of filtering what benchmarks should be executed, based on the PR type.

I think this is hard -- you need the runs against master to compare to, and while we could make that incremental too (i.e., only collect partial results and then back-collect as needed), it seems hard to make that work well for things like noise level evaluation, etc.

So I think we should prefer running always, and see if the impact to overall time taken for collection is significant. Ultimately the current cost of the collection infrastructure is relatively low; if we invest in parallelism support in perf we could scale horizontally there relatively easily.

@aDotInTheVoid
Copy link
Member

Just ran into another case where this would be super usefull to have (rust-lang/rust#108626).

In responce to your questions:

rustdoc-json is not a niche feature

While we don't have any way to get absolute numbers, the rustdoc-types crate (which contains the type definitions needed for a rust progam to read the output) has 282,984 downloads on crates.io at the time of writing. I'm not sure weather that qualifies as niche or not.

It tends to have real performance bottlenecks that are problematic for users.

Looking further into this, I'm not sure this is true. stm32f0xx-hal (which takes 70s for HTML), is done in 0.75s

(stm32f0xx-hal v0.18.0, cargo 1.69.0-nightly (9880b408a 2023-02-28), all fresh builds)

This seems to suggest that performance (for JSON) is entirely dominated by how quickly rustc can be done, whereas rustdoc HTML can become dominant.

@aDotInTheVoid
Copy link
Member

On the other hand, theirs alot of selection bias in play here, as I chose stm32f0xx-hal because I know it's slow with rustdoc-html. Their could also be crates that are slow for rustdoc JSON, and we haven't heard about them yet.

@Kobzol
Copy link
Contributor

Kobzol commented Mar 2, 2023

To be honest, if stm32f0xx-hal (which is massive, documentation wise) takes <1s for JSON doc, then I'm not sure if it's that important to measure JSON output specifically.

On the other hand, this PR is small and it could be quite useful for local testing, so I'm not opposed to it. I wouldn't include json-doc to the perf. RLO CI suite though, as we would have to think of some way how to distinguish it from the rest of the profiles. Just adding another jsondoc profile section wouldn't be very nice, because already the existing doc results are sometimes "muddying the water" a little bit, as they are not on the same importance level as check/debug/opt.

@aDotInTheVoid
Copy link
Member

Further investigation has shown that stm32f0xx-hal might not be representitive.

Trying instead with aws-sdk-ec2:

(awslabs/aws-sdk-rust@66423e0, all fresh builds)

Which seems to suggest that Rustdoc JSON is the performance bottleneck for some users and displays completely different performance charecteristics to HTML

@Kobzol
Copy link
Contributor

Kobzol commented Mar 2, 2023

Thanks for the investigation! Then it sounds like indeed JSON output is something that could be investigated for better performance theoretically.

I would still not add it to the CI suite as it is though, we'd need to think of some way of changing the current categories. But for local profiling it's fine by me.

@jyn514
Copy link
Member

jyn514 commented Mar 11, 2023

This seems to suggest that performance (for JSON) is entirely dominated by how quickly rustc can be done, whereas rustdoc HTML can become dominant.

This is not what's slow in stm32. stm32 is dominated by this horrible O(n^4) section of code: https://github.com/rust-lang/rust/blob/c183110cc26abb506dba0a4def917735fb6eb6f0/src/librustdoc/clean/blanket_impl.rs#L22-L85

Maybe the JSON backend doesn't collect blanket impls and that's why it's closer on stm32.

@@ -291,7 +291,7 @@ impl Benchmark {
}

// Rustdoc does not support incremental compilation
if profile != Profile::Doc {
if profile.is_doc() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo?

Suggested change
if profile.is_doc() {
if !profile.is_doc() {

@GuillaumeGomez GuillaumeGomez deleted the rustdoc-json-perf-check branch August 12, 2024 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants