Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding RandomState::with_seed #1

Open
arthurprs opened this issue Aug 15, 2024 · 8 comments
Open

Consider adding RandomState::with_seed #1

arthurprs opened this issue Aug 15, 2024 · 8 comments

Comments

@arthurprs
Copy link

arthurprs commented Aug 15, 2024

Sometimes it's useful to create a deterministically seeded RandomState.

Even if FixedState exists, it's a different type, so the user must choose one or another at the type level instead of creation time. I suppose there are uses for both patterns.

@orlp
Copy link
Owner

orlp commented Aug 15, 2024

Your concern is legitimate and I have ran into this myself as well, unfortunately I don't currently see a zero-cost way of making it happen.

To reduce the size of hash maps RandomState only contains the per-hasher randomness (a single u64), while the rest of the random state is stored globally:

FoldHasher::with_seed(self.per_hasher_seed, self.global_seed.get())
.

We could sacrifice one bit of the per_hasher_seed to indicate whether it should use the global deterministic initializer, or the global random initializer. But this would introduce extra code (and worse, a branch) in the actual hash path.

I could expand the RandomState by 8 bytes (which inflates all hash maps by 8 bytes) to contain a reference to the global random data, and this reference would then either be the non-deterministic global seed or the deterministic static seed, but even this would add an extra mov to the normal RandomState path I believe.

@arthurprs
Copy link
Author

arthurprs commented Aug 15, 2024

The size rationale makes sense to me as this is a hashmap-optimized hasher.

Sacrificing one bit of the instance's seed sounds reasonable at first glance. It could be possible to massage that (predictable) branch to a conditional move. Maybe 3 more instructions in total.

	        let mut global_seed = &rand_global_seed;
            if self.per_hasher_seed & 1 == 1 {
                  global_seed = &fixed_global_seed;
            }
            FoldHasher::with_seed(self.per_hasher_seed, global_seed.get())

@orlp
Copy link
Owner

orlp commented Aug 15, 2024

Maybe 3 more instructions in total.

Unfortunately that is not acceptable. The current u64 hash is 5 instructions on modern x86-64:

view_assembly:
        mov     rax, qword ptr [rip + example::seed::GLOBAL_SEED_STORAGE::ha8aed685c422c912@GOTPCREL]
        xor     rdi, qword ptr [rsi]
        mov     rdx, rdi
        mulx    rax, rcx, qword ptr [rax]
        xor     rax, rcx
        ret

@orlp
Copy link
Owner

orlp commented Aug 15, 2024

It seems that storing a reference in RandomState is fine in terms of codesize:

view_assembly:
        mov     rax, qword ptr [rsi]
        xor     rdi, qword ptr [rsi + 8]
        mov     rdx, rdi
        mulx    rax, rcx, qword ptr [rax]
        xor     rax, rcx
        ret

So the question is just whether this feature is worth increasing the size of RandomState from 8 to 16 bytes. I think the answer is probably yes.

@arthurprs
Copy link
Author

arthurprs commented Aug 15, 2024

Yeah, I think so, too. It makes it more practical, in my experience. You could even have a SmallRandomState variant that optimizes for size (but generates more code), probably more useful than FixedState.

@c410-f3r
Copy link

c410-f3r commented Oct 2, 2024

Just a feedback.

I was also looking for RandomState::with_seed but ended up with FixedState after digging the documentation. For newcomers, it would be nice to have a citation about this scenario in the documentation.

@sammysheep
Copy link

Ditto on this issue. In our case, we want to pass a seed if we detect an environment variable in order to do deterministic regression testing, otherwise the normal random state is preferred.

Great work on this crate, btw!

@geeknoid
Copy link

I'd also love to see a with_seed method, or a distinct SeededRandomState struct with the feature. My use case is that I'm building fully static/const hash maps and sets. For that use case, I need to hash the keys at compile time to build the collections and then hash again at runtime in order to do lookups.

I want customers of my collections to have different random seeds from customer to customer, and indeed from build to build in order to provide effective HashDoS protection. At the moment, I use ahash for this feature. I pick some seeds during build (via a build script or procedural macros) and burn those into the binary so I use the same seeds at runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants