Pass a random seed number in the function that generates random numbers #316
Comments
Just to add some details / context to this issue: RRTMGP is not reproducible w.r.t. its random number generation for threaded runs because of this line: `RRTMGP.jl/src/optics/CloudOptics.jl`, line 215 in 89db03b.
That line calls `rand!`, which mutates the global RNG state shared between threads. Since the order of the columns that each thread works on is non-deterministic, so are the sampled random numbers. To make this reproducible for threaded runs, we'll need to pass in a seed per column.
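To illustrate the per-column seed idea, here is a minimal sketch in Python (standard library only, not RRTMGP.jl code). The names `sample_column`, `sample_all`, and `base_seed` are hypothetical, and deriving each column's seed from the string `"base_seed:column"` is an assumption made for illustration. The point is only that each column owns its generator, so the thread schedule cannot change what a column draws.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sample_column(base_seed, column, n_samples):
    # Hypothetical helper: each column builds its own generator, seeded
    # only from (base_seed, column), so no RNG state is shared between threads.
    rng = random.Random(f"{base_seed}:{column}")
    return [rng.random() for _ in range(n_samples)]


def sample_all(base_seed, n_columns, n_samples, n_threads):
    # Columns may be processed in any order by any thread; the result for a
    # given column depends only on its own seed. pool.map returns results in
    # column order regardless of completion order.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(
            pool.map(
                lambda column: sample_column(base_seed, column, n_samples),
                range(n_columns),
            )
        )


if __name__ == "__main__":
    serial = sample_all(base_seed=42, n_columns=8, n_samples=4, n_threads=1)
    threaded = sample_all(base_seed=42, n_columns=8, n_samples=4, n_threads=4)
    print(serial == threaded)
```

With this layout, two runs with the same `base_seed` produce the same samples for any thread count, at the cost of constructing one generator per column.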
More generally, it would be very useful if we could support reconstructing precisely the state of the random number generator upon restarts, so that the stream of random numbers is the same as if we hadn't restarted the simulation. If this is not possible, we won't be able to use restarts to debug broken builds.
For this, @sriharshakandala mentioned we will need to store the random numbers, which will increase the memory footprint. Would that be ok?
Storing the random numbers will increase the memory footprint by about 2 to 3 orders of magnitude.
2 orders of magnitude sounds large and I would rather avoid that. What do others think?
Our goal is to be able to run two identical runs. This requires thread-safety (different threads not changing each other's RNG state), so the first step would be to understand the CUDA RNG scheme. This conversation seems to indicate that the RNG is warp-safe out of the box.
We should understand this. Maybe all we have to worry about is warp vs. thread. Second, it would be good to be able to save the RNG state and recover it so that we can support restarts. The details of this will depend on the RNG used.
@sriharshakandala Let's fix the reproducibility issue when running two identical runs first, which shouldn't require storing the random numbers. We can talk about restarts after the first issue is fixed.
From the conversation, it looks like passing in a single seed might work! The results could still differ from those of the CPU simulation, though. maleadt danielwe
I just discussed this with Sriharsha. We will modify the code to ensure reproducibility when running two identical runs, without worrying about restarts. After that is done, we can explore whether it is feasible to support restarts without increasing the memory footprint too much. @Sbozzolo What do you think?
Yes, this is a good start, but I would like us to think about supporting restarts as well. I don't think it makes sense for the memory footprint to increase by orders of magnitude: even if we saved one element per point on the domain, it would only be the same size as any other 3D variable. Also, the state has to be saved only when we produce a checkpoint.
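As a rough illustration of saving the generator state only at checkpoint time, here is a sketch using Python's standard `random` module (not the RNG used by RRTMGP.jl or CUDA, whose state layout will differ). The functions `save_rng` and `restore_rng` are hypothetical names, and `pickle` stands in for whatever checkpoint format the model actually uses.

```python
import pickle
import random


def save_rng(rng):
    # Serialize the generator's internal state. This is needed only when a
    # checkpoint is written, not at every step.
    return pickle.dumps(rng.getstate())


def restore_rng(blob):
    # Rebuild a generator that continues exactly where the saved one left off.
    rng = random.Random()
    rng.setstate(pickle.loads(blob))
    return rng


if __name__ == "__main__":
    rng = random.Random(2024)
    _ = [rng.random() for _ in range(5)]   # steps before the checkpoint
    checkpoint = save_rng(rng)

    uninterrupted = [rng.random() for _ in range(5)]  # run continues

    resumed_rng = restore_rng(checkpoint)             # run restarts instead
    resumed = [resumed_rng.random() for _ in range(5)]

    print(resumed == uninterrupted)
```

What is stored here is the generator state rather than the random numbers themselves, so its size does not grow with the number of samples drawn. If each column derived its seed from a base seed, the column index, and a step counter (an assumption, not something settled in this thread), the checkpoint might only need the base seed and the counter.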
Add back PR #542