Reduce memory usage by deduplicating type information #649

Open · ChayimFriedman2 wants to merge 1 commit into master from ChayimFriedman2:common-typeid-real

Conversation

ChayimFriedman2 (Contributor)

We were storing the type information, 3 words wide, for each memo in each slot, while it is always constant wrt. the ingredient (different slots of the same ingredients will always have the same memos in the same order). This introduces some more unsafety, and the result wasn't as fast so I also had to use some lock-free structures, but the result is worth it: this shaves off 230mb from rust-analyzer with new Salsa.
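For illustration, a minimal sketch of the layout change being described. The field and type names here are assumptions for the sketch (except MemoTableWithTypes, which appears in the diff later in this thread), not necessarily Salsa's actual definitions:

```rust
use std::any::TypeId;

// Roughly three words of per-memo type metadata: a `TypeId` (two words)
// plus a function pointer used to recover the concrete memo type from a
// type-erased pointer. Field names are assumptions for this sketch.
#[derive(Clone, Copy)]
struct MemoEntryType {
    type_id: TypeId,
    to_dyn_fn: fn(*const ()) -> *const (),
}

// Before: every memo entry in every slot carried its own copy.
struct MemoEntryBefore {
    memo: *const (),           // type-erased memo
    entry_type: MemoEntryType, // duplicated across every slot
}

// After: slots store only the type-erased memo...
struct MemoEntryAfter {
    memo: *const (),
}

// ...and the metadata lives once per ingredient, indexed by memo slot,
// which works because every slot of an ingredient has the same memos in
// the same order.
struct MemoTableTypes {
    types: Vec<MemoEntryType>,
}

// Accessors pair a slot's memos with the shared metadata on demand.
struct MemoTableWithTypes<'a> {
    memos: &'a [MemoEntryAfter],
    types: &'a MemoTableTypes,
}
```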


netlify bot commented Jan 5, 2025

Deploy Preview for salsa-rs canceled.

🔨 Latest commit: fd9cfb8
🔍 Latest deploy log: https://app.netlify.com/sites/salsa-rs/deploys/67bdeafb190eda0008c5a7a0


codspeed-hq bot commented Jan 5, 2025

CodSpeed Performance Report

Merging #649 will not alter performance

Comparing ChayimFriedman2:common-typeid-real (fd9cfb8) with master (7d417ac)

Summary

✅ 11 untouched benchmarks

ChayimFriedman2 force-pushed the common-typeid-real branch 2 times, most recently from 2ecdea8 to 13ecc36 on January 7, 2025 06:41
@nikomatsakis (Member) left a comment


I love the idea, but I'm finding myself a bit confused. We have some tech debt here in that the mdbook is not complete or up to date; I'm realizing that I wish it were, because I'd love to read an mdbook update explaining the data structure(s) here. It feels like we could be more efficient still, but I've probably just not fully brought everything back into cache.

@ChayimFriedman2 (Contributor, Author)

@nikomatsakis I addressed your comments.

@MichaReiser (Contributor)

MichaReiser commented Jan 26, 2025

The benchmark regressions are a bit concerning (as high as 8%). Do we understand where they come from?

@Veykril (Member) left a comment


I wonder, could we maybe store the type info in the corresponding Page instead? Then we would only have duplication of type info among pages, which is negligible, while not having to go through the ingredient on lookups. It would also keep more info (for drop and the like) localized.
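For illustration, a rough sketch of that suggestion under the same assumed names as the sketch above; the real Page layout may differ:

```rust
use std::any::TypeId;

// Per-memo type metadata, as in the earlier sketch.
#[derive(Clone, Copy)]
struct MemoEntryType {
    type_id: TypeId,
    to_dyn_fn: fn(*const ()) -> *const (),
}

// Hypothetical page layout: the memo type metadata is stored once per page,
// next to the slots it describes, so a memo lookup never has to reach back
// into the ingredient. Duplication across pages stays small because a page
// holds many slots.
struct Page<T> {
    slots: Vec<T>,
    memo_types: Vec<MemoEntryType>,
}
```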

@ChayimFriedman2 (Contributor, Author)

Hmm... If we share it in an Arc there should be very little duplication.

@nikomatsakis (Member)

I don't quite follow here; I'll try to make time tomorrow to review this PR. I'm still finding myself a bit confused about exactly what type information we are deduplicating, and where.

@ChayimFriedman2 (Contributor, Author)

@Veykril I implemented your suggestion to store the types on the page.

ChayimFriedman2 force-pushed the common-typeid-real branch 3 times, most recently from 1abc1d8 to 793c14b on February 3, 2025 21:41
@ChayimFriedman2 (Contributor, Author)

@MichaReiser I've investigated the perf regressions and I think they should be fixed now.

@MichaReiser (Contributor)

Thanks. The regression now seems to be specific to benchmarks where we primarily create elements. Can we try to optimize that code path as well? An 18% regression is rather significant.

@ChayimFriedman2 (Contributor, Author)

@MichaReiser I was referring to that regression, I think I fixed it.

@MichaReiser (Contributor)

@ChayimFriedman2 Hmm, in that case I'm unsure whether the underlying problem was addressed, because the codspeed benchmarks still show an 18% regression.

@ChayimFriedman2 (Contributor, Author)

@MichaReiser It hadn't been rerun back then; now it shows an 11% regression. I'll see if I can shrink it even more.

@ChayimFriedman2 (Contributor, Author)

I think the benchmarks are flawed. They re-create the ingredients on every iteration, which is very much unlike real-world usage, where the ingredient is created once. I'll put up a PR to fix this.

@ChayimFriedman2 (Contributor, Author)

OK, after #667 it shows there is no perf regression.

@nikomatsakis (Member)

@ChayimFriedman2 needs rebase

ChayimFriedman2 force-pushed the common-typeid-real branch 2 times, most recently from 6e5346f to 681547d on February 21, 2025 06:04
@ChayimFriedman2 (Contributor, Author)

@nikomatsakis Rebased.

Comment on lines +89 to +98
/// Access the [`MemoTable`] for this slot.
///
/// # Safety
///
/// You must have mutable access to the slot.
unsafe fn memos_no_revision(&self) -> &MemoTable;
Member

Then why does this not just take &mut self?

Contributor Author

Problems with the borrow checker: calling Page::get() borrows the entire page, and then we cannot access Page::memo_types.
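For illustration, a simplified reproduction of the borrow-checker clash being described, with hypothetical types; the comments show the formulation that borrowck rejects:

```rust
struct MemoTable;
struct MemoTableTypes;

struct Slot {
    memos: MemoTable,
}

// Simplified page: slots plus the shared memo type metadata.
struct Page {
    slots: Vec<Slot>,
    memo_types: MemoTableTypes,
}

struct MemoTableWithTypesMut<'a> {
    memos: &'a mut MemoTable,
    types: &'a MemoTableTypes,
}

impl Page {
    fn get_mut(&mut self, slot: usize) -> &mut Slot {
        &mut self.slots[slot]
    }

    fn memos_mut(&mut self, slot: usize) -> MemoTableWithTypesMut<'_> {
        // Going through `get_mut` does not compile: it borrows all of `*self`
        // mutably for as long as `memos` lives, so the later read of
        // `self.memo_types` is rejected:
        //
        //     let memos = &mut self.get_mut(slot).memos;
        //     MemoTableWithTypesMut { memos, types: &self.memo_types } // E0502
        //
        // Borrowing the fields directly is accepted, because the compiler can
        // see the two borrows touch disjoint fields. The real `Page` cannot do
        // that if slot access has to go through an accessor like `Page::get`,
        // which is the clash described above.
        let memos = &mut self.slots[slot].memos;
        MemoTableWithTypesMut { memos, types: &self.memo_types }
    }
}
```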

Comment on lines -303 to +368
-    fn memos_mut(&mut self, slot: SlotIndex) -> &mut MemoTable {
-        self.get_mut(slot).memos_mut()
+    fn memos_mut(&mut self, slot: SlotIndex) -> MemoTableWithTypes<'_> {
+        // SAFETY: We have a `&mut` reference.
+        let memos = unsafe { self.get(slot).memos_no_revision() };
Member

This change seems unnecessary to me; we can just use get_mut and keep the memos_mut accessor as before.

Contributor Author

But we need to attach the types to it anyway. I can create a new type MemoTableWithTypesMut, but that just seems unnecessary. And given that it no longer returns a mutable reference, memos_mut() seems like an inappropriate name now.

Member

I don't mind the name change; I was mainly raising this because of the unnecessary unsafe block. There is no reason to decay the mutable reference in this call chain to an immutable one; doing so results in this extra unsafe that isn't really necessary otherwise.

Member

Ah, given your other comment on my review, I guess that would require duplicating the function for mutability.

Contributor Author

What do you mean by "duplicating the function for mutability"?

Member

> Problems with the borrow checker: calling Page::get() borrows the entire page, and then we cannot access Page::memo_types.

I interpreted this as you needing to keep the current version for the borrowck issue either way.

Contributor Author

No, what I meant is that if this function takes &mut self, then Page.get_mut().memos_mut() clashes with Page.memo_types, leading to a borrowck error.

ChayimFriedman2 force-pushed the common-typeid-real branch 3 times, most recently from 246d80e to 4eea15f on February 24, 2025 22:15
@MichaReiser (Contributor)

> OK, after #667 it shows there is no perf regression.

It seems the perf regression is back :(

@ChayimFriedman2 (Contributor, Author)

@MichaReiser I rebased and now they're at 5% max. It's hard to investigate because codspeed doesn't identify the same methods, so I don't know what the cause is, but I'm pretty sure the 8% was measurement error or something like it (e.g. it showed a regression for shallow_verify_memo(), where literally nothing it calls changed).

@MichaReiser (Contributor)

The benchmarks have been fairly stable after @Veykril made some changes. That's why I'm not convinced that they're outright wrong. Have you tried recording a profile locally and e.g. comparing them in firefox-profiler?

@Veykril (Member)

Veykril commented Feb 25, 2025

There does seem to be some noise still left, as my rebased #710 version reports fairly different perf changes now (though I don't think the PR here is entirely noise).

@MichaReiser (Contributor)

> There does seem to be some noise still left, as my rebased #710 version reports fairly different perf changes now (though I don't think the PR here is entirely noise).

Rebasing does change the base against which codspeed reports the performance, so that makes sense to some extent. What's interesting is that the perf results are very stable for e.g. https://codspeed.io/salsa-rs/salsa/branches/Veykril%3Aveykril%2Fpush-orlktsrpwzrv

@ChayimFriedman2 (Contributor, Author)

I'm trying to investigate the regressions locally now.

@ChayimFriedman2 (Contributor, Author)

The benchmark regressions seem unavoidable (I also suspect part of them is noise), and they are not large. The main reason is that accessing a memo now has to take two different RwLocks (read locks only), one for the types and one for the memos, instead of just one. I can't see an easy way to get rid of this, so I think the memory decrease is worth the perf penalty.

Some ideas I had for mitigating the lock on the types:

- Synchronize on the memos lock instead. But that means we have to go through the entire database and lock each and every memo when we want to register an ingredient.
- Drop the lock and instead use an atomic pointer that is replaced with the new types when registering an ingredient, either with arc-swap or without reference counting at all. arc-swap is doable, but the overhead is already pretty tiny and I don't know how much it would save; a custom-made implementation would mean we need some kind of GC to collect the unreachable types on a new revision. We already have one for memos, but expanding it to types won't be trivial.
- Statically register the memo ingredients at compile time with something like linkme. While I like the idea, it would require thorough design work.
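For illustration, a minimal sketch of the arc-swap idea mentioned above, assuming the arc-swap crate; the names are hypothetical and this is not what the PR implements:

```rust
use std::any::TypeId;
use std::sync::Arc;

use arc_swap::ArcSwap;

// Per-memo type metadata (illustrative).
#[derive(Clone, Copy)]
struct MemoEntryType {
    type_id: TypeId,
}

// Hypothetical lock-free holder for the per-ingredient memo types.
struct MemoTypes {
    types: ArcSwap<Vec<MemoEntryType>>,
}

impl MemoTypes {
    fn new() -> Self {
        Self { types: ArcSwap::from_pointee(Vec::new()) }
    }

    // Readers take no lock: just an atomic load of the current snapshot.
    fn load(&self) -> Arc<Vec<MemoEntryType>> {
        self.types.load_full()
    }

    // Writers (registering a new memo ingredient) publish an extended copy.
    // Old snapshots stay alive until their last reader drops them, so this
    // reference-counted variant needs no separate GC.
    fn push(&self, entry: MemoEntryType) {
        self.types.rcu(|current| {
            let mut next = Vec::clone(current);
            next.push(entry);
            next
        });
    }
}
```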
