enable optimizer in tests and procmacros #5155

Draft · wants to merge 7 commits into base: master · changes from 5 commits
8 changes: 8 additions & 0 deletions Cargo.toml
@@ -8,6 +8,14 @@
lto = false # Preserve the 'thin local LTO' for this build.
split-debuginfo = "unpacked"
lto = "thin"

# Enable basic optimizations for unittests
[profile.test]
opt-level = 1

Reviewer:
One of the reasons unit tests do not have optimization enabled is to speed up compilation during interactive development.
When working on a single unit test, you want the fastest edit/compile/run cycle.
Here is an example of a command I would run when working on a specific unit test:

cargo test \
    --package solana-core \
    --test epoch_accounts_hash \
    --all-features -- \
    --exact --nocapture test_snapshots_have_expected_epoch_accounts_hash

Have you considered the increase in the compilation time when switching from opt-level 0 to 1 for these use cases?

I think a common approach to this kind of CI speed-up is to build unit tests with optimizations in CI only.
Cargo supports custom profiles, so it should be possible to create a profile, used only by CI, that enables higher optimization levels.

More details on the custom profiles can be found here:
https://doc.rust-lang.org/cargo/reference/profiles.html#custom-profiles
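A custom profile of this kind might look like the following sketch in the workspace `Cargo.toml` (the profile name `ci-test` is hypothetical, not something that exists in this repository):

```toml
# Hypothetical CI-only profile: unit tests built locally keep the fast,
# unoptimized `test` profile, while CI opts in explicitly with
# `cargo test --profile ci-test`.
[profile.ci-test]
inherits = "dev"
opt-level = 1
```

Custom profiles require an `inherits` key naming an existing profile as the baseline for any unspecified settings.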

Author:

o1 is in many cases practically free in terms of compile times. For example:

running time cargo test --package solana-core --test epoch_accounts_hash

clean-build with optimizations (o1):

Executed in  177.98 secs    fish           external
   usr time   76.21 mins    0.00 micros   76.21 mins
   sys time   10.00 mins  839.00 micros   10.00 mins

clean-build with no optimizations (default settings):

Executed in  159.20 secs    fish           external
   usr time   40.46 mins  837.00 micros   40.46 mins
   sys time    8.29 mins    0.00 micros    8.29 mins

So, we would save 15 seconds of real build time (on full rebuild), while spending far longer waiting for the actual tests to run (assuming you are running more than one at a time). Some of our tests already take minutes to complete, especially the ones in local-cluster, and we only need to build stuff once to run all the tests. Full CI runs are almost always > 1 hour. o1 gives about 10x boost in speed compared to o0.

Another aspect here is that with optimizer disabled certain kinds of bugs such as race conditions may just never show up. Essentially we end up testing something other than what we run in prod in terms of actual generated assembly, and the differences can be quite dramatic.

Reviewer:

> o1 is in many cases practically free in terms of compile times. For example:
>
> running time cargo test --package solana-core --test epoch_accounts_hash
>
> clean-build with optimizations (o1):
>
>     Executed in  177.98 secs    fish           external
>        usr time   76.21 mins    0.00 micros   76.21 mins
>        sys time   10.00 mins  839.00 micros   10.00 mins
>
> clean-build with no optimizations (default settings):
>
>     Executed in  159.20 secs    fish           external
>        usr time   40.46 mins  837.00 micros   40.46 mins
>        sys time    8.29 mins    0.00 micros    8.29 mins
>
> So, we would save 15 seconds of real build time (on full rebuild), while spending far longer waiting for the actual tests to run (assuming you are running more than one at a time).

When working on a single test, I would be running just this one test.
And if the compilation is 15 seconds slower, most of these 15 seconds will be added to my edit/compile/run cycle. The speed-up in a single test execution is normally unnoticeable.

Are you using a multithreaded linker, such as mold?
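For reference, enabling mold is typically done through a `.cargo/config.toml` fragment like the following sketch (the target triple and the use of `clang` as the linker driver are assumptions; adjust for your machine):

```toml
# Use the mold linker for faster link times on Linux.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Link time often dominates the incremental edit/compile/run cycle, which is why the linker choice matters for this discussion.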

> Some of our tests already take minutes to complete, especially the ones in local-cluster

local-cluster is the most extreme example, and it is an integration test rather than a unit test.
That is why it is a separate crate, whose optimization level can be configured independently.
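Such a per-crate setting could be expressed with Cargo's profile package overrides, for example (a sketch; the crate name `solana-local-cluster` and whether the override reaches the test binaries themselves are assumptions about this workspace and Cargo's override rules):

```toml
# Keep most unit tests unoptimized for fast edit/compile/run cycles,
# but optimize the slow integration-test crate specifically.
[profile.test.package.solana-local-cluster]
opt-level = 1
```

This keeps the global `test` profile fast while paying the optimization cost only where test runtime dominates.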

> we only need to build stuff once to run all the tests. Full CI runs are almost always > 1 hour. o1 gives about 10x boost in speed compared to o0.

This is the CI use case.
And I think speeding up our CI is a great cause.

But for the developer use case, you normally compile one crate and run one test.
If unit tests are small and fast, you do not benefit from the speed-up, but still suffer the increase in compilation time.

> Another aspect here is that with optimizer disabled certain kinds of bugs such as race conditions may just never show up. Essentially we end up testing something other than what we run in prod in terms of actual generated assembly, and the differences can be quite dramatic.

While it is great to be able to find UB issues, I am not sure this is a good argument here.
I do not have comprehensive statistics, but my guess is that, depending on the UB, you may or may not observe it at different optimization levels.
So it could be the switch between optimization levels that is fruitful, or you could miss the bug in either case.
You cannot really rely on unit tests to catch undefined behavior.

ilya-bobyr (Mar 7, 2025):

While a 15-second increase in compilation time is, I believe, a rather objective thing to consider, this reminded me of this XKCD:

https://xkcd.com/1172/

Author:

15 sec is for a full rebuild after `cargo clean`. While incrementally building one test, the difference is not really measurable.


# Enable optimizations for procmacros for faster recompile
[profile.dev.build-override]
opt-level = 1

[workspace]
members = [
"account-decoder",
8 changes: 0 additions & 8 deletions accounts-db/src/append_vec.rs
@@ -1277,13 +1277,6 @@ pub mod tests {
}
}

fn get_executable_byte(&self) -> u8 {
Author:

This function was only used in a test.

Reviewer:

This change seems unrelated to the unit tests speed up.
Why not move it into a separate PR?

Author:

The only reason this change is happening is to make optimizations possible for unit tests. For more substantial changes, separate PRs have been made by the teams handling the respective pieces of code. I'll be making some as well.

Reviewer:

If this check is not really useful (as is claimed in the discussion), removing it can be done in a separate PR, as it has value by itself: making things simpler and avoiding UB, IIUC.
And merging smaller PRs is generally easier.

For example, should you end up introducing a separate compilation profile for the CI, this PR will increase in size and will take some time to land.

But this particular change might go in pretty quickly on its own.

It is up to you, of course. I was just speaking from personal experience :)

let executable_bool: bool = self.executable();
// UNSAFE: Force to interpret mmap-backed bool as u8 to really read the actual memory content
let executable_byte: u8 = unsafe { std::mem::transmute::<bool, u8>(executable_bool) };
executable_byte
}

fn set_executable_as_byte(&self, new_executable_byte: u8) {
// UNSAFE: Force to interpret mmap-backed &bool as &u8 to write some crafted value;
unsafe {
@@ -1830,7 +1823,6 @@
{
let executable_bool: bool = account.executable();
assert!(!executable_bool);
Author:

Removed this assert as per advice from @roryharr

roryharr (Mar 7, 2025):

More context: this assert was added a long time ago by ryoqun just to monitor some undefined behaviour.

The executable bit is a boolean which uses a byte of storage. The only valid values for the byte are zero and one. This is verified during account load by the function sanitize_executable.

This test corrupts the byte and verifies that sanitize_executable fails, which it does. Afterwards it has various asserts to see how it would behave if the sanitize check were not present. As per ryoqun's comments, these are purely for his interest.

When compiling with optimizations on, the compiler's behaviour changes when accessing the byte by value:
`unsafe { std::mem::transmute::<bool, u8>(executable_bool) }` returns the raw byte, rather than the underlying data converted to a bool and then converted to a u8.

#1485 (comment)
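For illustration, a minimal sketch of the distinction being discussed (the helper name `raw_byte_of` is hypothetical): transmuting a `bool` to `u8` asserts to the compiler that the backing byte is a valid `bool` (0 or 1), so an optimizer may fold the read away, whereas reading through a raw `u8` pointer makes no such validity claim.

```rust
// Reads the byte backing a bool without asserting bool validity.
// Unlike `transmute::<bool, u8>` on a corrupted value, reading the
// memory as a u8 stays well-defined even if the byte were neither 0 nor 1.
fn raw_byte_of(b: &bool) -> u8 {
    // Cast the reference to a raw u8 pointer and read the byte directly.
    unsafe { *(b as *const bool as *const u8) }
}

fn main() {
    // For valid bools the raw byte matches the guaranteed representation:
    // Rust stores `true` as 1 and `false` as 0 in a single byte.
    println!("{}", raw_byte_of(&true));
    println!("{}", raw_byte_of(&false));
}
```

Of course, once the byte has actually been corrupted (as this test deliberately does), any conclusion drawn from reading it through a `bool` is on shaky ground regardless of optimization level.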

Comment:

@ryoqun In case you are still interested

assert_eq!(account.get_executable_byte(), 0); // Wow, not crafted_executable!
}
})
.unwrap();
1 change: 1 addition & 0 deletions gossip/src/cluster_info.rs
@@ -3752,6 +3752,7 @@ mod tests {
let mut node = cluster_info.my_contact_info.write().unwrap();
node.set_shred_version(42);
}
std::thread::sleep(Duration::from_millis(1));
cluster_info.refresh_my_gossip_contact_info();
cluster_info.flush_push_queue();
// Should now include both epoch slots.