-
-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement SeedableRng::seed_from_u64 #537
Conversation
rand_core/src/lib.rs
Outdated
|
||
for (i1, r1) in results.iter().enumerate() { | ||
let weight = r1.count_ones(); | ||
assert!(weight >= 20); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment how likely this is to fail if the seeds were random.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the binomial distribution B(64, 0.5)
, so chance of failure is F(20; 64, 0.5)
using the notation on wikipedia, which is 0.000781 (0.07%) according to some calculator I found online.
Maybe we should evaluate the different choices of RNG to see how they recover from a seed with lots of zeros? |
Go ahead if you like. Since the basis of this is |
I'm working on it. 😄 |
I agree PCG is a better choice than SplitMix here. I added If you see SplitMix as repeatedly using a hash with good avalanche (MurmurHash) to extend the seed it seems theoretically a little nicer. But I can't imagine a situation where the results from PCG aren't good enough. @dhardy What do you think of including something like a And can you update ISAAC to move their |
@pitdicker Which seeds for SplitMix are weak? |
I'll take it back, that was probably an unnecessary step (wrote to much from memory). |
Okay, are you still saying PCG is a better choice? I'm curious for what reason. |
Not sure what you mean; using MurmurHash repeatedly sounds like a completely different PRNG. Essentially SplitMix is simply a Weyl sequence (
Simply repeating the seed should work fine for those PRNGs but so should just using the default implementation of
👍 (And I guess in this case it makes sense to avoid value-breakage.) |
It was shown that using the same RNG for seeding as for generating can introduce correlations. I would not feel comfortable doing this without testing. |
I think he meant that splitmix64 uses a MurmurHash avalanche function as its output function. @dhardy I still would not implement |
@vks you propose not adding to
Not couldn't, but may not (always) want to. Scientific simulations (stochastic models) are models with obvious limitations, but it is of course desirable to rule out as many sources of bias as possible — therefore it may be desirable to use a cryptographic generator (or multiple generators) just for quality and/or diversity. Of course the seed used doesn't matter, but it may be desirable to make results reproducible (e.g. a project I was working on used volunteer computing and replication of some work units as a measure of protection against bad results). |
I added specific implementations for the ISAAC generators as suggested. General notes:
Another option for In a couple of ways, not having a default implementation is better:
The disadvantage is that we then need to add implementation helpers (perhaps
See #543 |
02750aa
to
52563b0
Compare
Yes, this is what I was suggesting. It does not support using them for just their statistical quality, as you mentioned.
This is similar to my suggestion, except that it would not force them to implement it. |
I still think we should make all seedable RNGs implement Interesting point in your other thread about not implementing |
Implementation strategy for adding as a required function without default:
|
If we don't provide a default implementation, why do we want to force implementors of There might be use cases for manually seeding I think we might want to offer reproducibility guarantees for minor versions of Rand. |
Maybe there's just insufficient use-cases to justify the existence of this method. The better option may be a "universal seeder" based on a hash function, e.g. SipHash. The hash itself can be used to extend the result as necessary. But rather than push that functionality into another hash function implementation or into Rand proper, it may be better to create a sub-crate for it ( Usage could then be something like: let rng = Seeder::from(x).make_rng::<Xoshiro128>();
// or, with less "magic":
let mut seeder = Seeder::new();
seeder.hash(x);
let mut seed = <Xoshiro128 as SeedableRng>::Seed::default();
seeder.fill(&mut seed[..]);
let rng = Xoshiro128::from_seed(seed); |
Another option would be |
I think let seed = [0u8; 32];
for i in 0..4 {
seed[i] = myseed[i];
}
let mut rng = StdRng::from_seed(seed); or someone doing let mut rng = StdRng::from_entropy();
let my_secure_token = Alphanumeric.sample_iter(&mut rng).take(6).collect::<String>(); I.e. we can't prevent people from doing stupid stuff. The best thing we can do is make it easier to do the right thing than the wrong thing. I.e. have an API which make it easy to get a CSPRNG which has been seeded from a good source of entropy. Then point to that API in the documentation for the I think the "recipes" in the documentation and the simplicity of the |
I much liked this PR and would like to see it merged as it is now (please don't close 😄). It is a win for users porting from rand 0.3 and 0.4, for simple tests, games, and possibly simulations. When it comes to security, we already make it easy to do the right thing with And I don't think we can really design Rand with the motto 'make it hard to do the insecure thing'. Good security practices are just not always possible to make convenient. And for many use cases of Rand convenience is an important aspect. Since 0.4 we made good improvements for security and correctness when it comes to seeding: easy seeding from the OS, error reporting, handling of endianness, not silently accepting small sizes. But we do not have to go overboard with it, and use 'security' as a reason to not add simple, reproducible seeding that can benefit many use cases. |
What is this discussion around removing |
No love for #554 then? It covers the "convenient seeding" thing nicely enough IMO, but is complicated enough to warrant being a separate crate. One strange thing about adding The other thing to consider is use-cases. Many users don't care about reproducibility; they don't need this. Then there are those wanting a random secret generator; they can't use it. That leaves only simulations and a few games wanting reproducibility. Using an external crate like #554 should be good enough for these people, and that one is significantly more capable (I don't want to say it's suitable for cryptography because I haven't tried doing any crypto-analysis, but it carries 256 bits of state and uses a fairly strong algorithm, so it should be good enough — though not for password hashing). |
I think #554 is awesome. I could definitely see making a separate crate for it for the use cases that would benefit from it. But I think most use cases that wants reproducibility doesn't need something that powerful. I agree with you that But I really like the |
Sorry, didn't look at the open PRs yet... You have resurrected the hashing approach! I still feel a bit uneasy about adding hashing to a core randomness trait, the idea we had in dhardy#18. There was no clear algorithm that fit best. But you made it an external crate, so that's good.
I see this as a nice property. Many PRNG designs come with a method to seed from a smaller integer. This extra method will give implementations the chance to implement the method chosen by the author. Seems like a nice thing to me. In theory it could also help with reproducibility from other languages, but I don't believe that is easy or worthwhile. |
I was surprised this PR didn't need rebasing after the ISAAC PRNGs were split out, but the base branch is 0.5. Probably doesn't matter much, this just means users get the feature sooner. What needs to happen here? I see no reason why we would have to choose between |
Yes, this should affect There is absolutely no reason we cannot have both this and #554, though #554 makes this redundant. You are correct that allowing PRNGs to override the |
According to Rand's docs, |
It's a little late to change anything but this may be useful http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html |
Aside from the two being independent, what should it do? The key point here is:
Because @pitdicker appears to be proposing the latter, but I'm not keen. On the other hand, the former is redundant with #554, but simpler and less capable. |
I guess this deserves fleshing out: unit tests often want to be deterministic (so cannot use Allowing |
Note on licence: PCG is Apache 2.0 only, so we would have to make a slight exception here instead of the dual MIT / Apache 2.0 we use elsewhere (unless @imneme agrees otherwise). |
To make various people happy, I relicensed the C++ version of the code to MIT/Apache 2.0 a while back. I have no problem with a similar relicense of the C code; the only reason I've not done so is the hassle of making a patch to update all the notices. If someone submits a pull request to update the license for the C code, I'd happily accept it (check out the equivalent patch for the C++ version for what's needed). But if you just want my okay, it's okay. |
Your word is good enough for me. This is actually based off the snipped on your download page. Since I got your attention @imneme, is there any particular reason for preferring the hash "function" constructed in your blog post over PCG for seeding RNGs? And is there any reason for concern when using PCG to seed PCG? As far as I can see PCG has a little better mixing but both are probably sufficient. |
Well, this PR has been open for comment for two months now, and response is mostly positive. I've been hesitant because once people start using it, almost any tweaking becomes a breaking change. This thing has two use-cases (unit tests and reproducible simulations) and meets the requirements for those use-cases. The remaining question is whether |
Implements #522
We never quite came to a conclusion on the algorithm to use, but I don't see any reason not to use PCG32. It's reasonably good quality, reasonably fast and should work fine on 32-bit CPUs too.