
Pass a random seed number in the function that generates random numbers #316

Open
szy21 opened this issue Sep 20, 2022 · 11 comments · Fixed by #542
Labels: enhancement (New feature or request)
szy21 (Member) commented Sep 20, 2022

No description provided.

charleskawczynski (Member) commented May 27, 2023

Just to add some details / context to this issue:

RRTMGP is not reproducible w.r.t. its random number generation for unthreaded runs because of this line: `Random.rand!(local_rand)`.

For threaded runs, different threads will call `rand!`, mutating the global RNG state shared between threads, and since the order in which threads pick up columns is non-deterministic, so are the sampled random numbers. To make this reproducible for threaded runs, we'll need to pass in a seed per column.
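To illustrate the per-column-seed idea, here is a generic sketch in Python's stdlib `random` (not RRTMGP's actual API; `sample_column` and the seed-mixing constant are made up for illustration): each column draws from its own generator derived from a base seed and the column index, so the result no longer depends on which thread processes which column first.

```python
import random

def sample_column(base_seed, col, n=4):
    # Independent generator per column, derived from the base seed and
    # the column index, instead of a shared global RNG.
    rng = random.Random(base_seed * 1_000_003 + col)
    return [rng.random() for _ in range(n)]

base_seed = 42
ncol = 8

# "Thread" order 1: columns processed in ascending order.
run1 = {col: sample_column(base_seed, col) for col in range(ncol)}

# "Thread" order 2: columns processed in a shuffled order.
order = list(range(ncol))
random.shuffle(order)
run2 = {col: sample_column(base_seed, col) for col in order}

assert run1 == run2  # identical results regardless of processing order
```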

Sbozzolo (Member) commented

More generally, it would be very useful if we could support reconstructing precisely the state of the random number generator upon restarts, so that the stream of random numbers is the same as if we hadn't restarted the simulation. If this is not possible, we won't be able to use restarts to debug broken builds.

szy21 (Member, Author) commented Sep 23, 2024

> More generally, it would be very useful if we could support reconstructing precisely the state of the random number generator upon restarts, so that the stream of random numbers is the same as if we hadn't restarted the simulation. If this is not possible, we won't be able to use restarts to debug broken builds.

For this, @sriharshakandala mentioned we will need to store the random number, which will increase the memory. Would that be ok?

sriharshakandala (Member) commented Sep 23, 2024

Storing the random numbers will increase the memory footprint by about 2 to 3 orders of magnitude.
We can pass in a seed for each column if that helps! Is this preferable?
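A back-of-the-envelope check of that estimate (the dimensions below are illustrative assumptions, not RRTMGP's actual sizes): if one random number is drawn per g-point per layer, storing them all costs roughly `ngpt` times as much as a single (column × layer) field, which for a few hundred g-points lands in the 2-3 orders of magnitude range.

```python
# Illustrative sizes only (assumptions, not RRTMGP's actual dimensions).
ncol = 1000   # horizontal columns
nlay = 60     # vertical layers
ngpt = 256    # spectral g-points

floats_for_samples = ncol * nlay * ngpt   # one number per (col, lay, g-point)
floats_per_3d_field = ncol * nlay         # a typical 3D model variable

ratio = floats_for_samples / floats_per_3d_field
print(ratio)  # 256.0 -> between 10**2 and 10**3, i.e. 2-3 orders of magnitude
```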

szy21 (Member, Author) commented Sep 23, 2024

2 orders of magnitude sounds large, and I would rather avoid that. What do others think?

Sbozzolo (Member) commented

Our goal is to be able to run two identical runs. This requires thread safety (different threads must not change each other's RNG state), so the first step would be to understand the CUDA RNG scheme.

This conversation seems to indicate that the RNG is warp-safe out of the box:
https://discourse.julialang.org/t/kernel-random-numbers-generation-entropy-randomness-issues/105637

> We use overlay method tables during GPU compilation to replace `Random.default_rng()` with a custom, GPU-friendly RNG: https://github.com/JuliaGPU/CUDA.jl/blob/2ae53761a6a254b98a6689ed0d39781176b245cf/src/device/random.jl#L97. Similarly, just calling `rand()` in a kernel just works and uses the correct RNG.
>
> Specifically, we use Philox2x32 (Switch to Philox2x32 for device-side RNG by maleadt · Pull Request #882 · JuliaGPU/CUDA.jl), a counter-based PRNG. The seed is passed from the host, and the counters are maintained per-warp and initialized at the start of each kernel that uses the RNG (rand: seed kernels from the host. by maleadt · Pull Request #2035 · JuliaGPU/CUDA.jl). The implementation isn’t fully generic, e.g. you can’t have multiple RNG objects, but it’s pretty close to how Random.jl works.

We should understand this scheme. Maybe all we have to do is worry about warp vs. thread granularity.

Second, it would be good to be able to save the RNG state and recover it so that we can support restarts. The details of this will depend on the RNG used.
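As a generic sketch of that idea (Python's stdlib `random` here, purely to illustrate; the real mechanics depend on the RNG CUDA.jl uses): checkpoint the generator's state, and a restarted run that restores it continues the exact same stream.

```python
import random

rng = random.Random(2022)
_ = [rng.random() for _ in range(5)]   # stream consumed before the checkpoint

state = rng.getstate()                 # would be written to the checkpoint file
uninterrupted = [rng.random() for _ in range(5)]

# After a restart: restore the saved state into a fresh generator.
restored = random.Random()
restored.setstate(state)
after_restart = [restored.random() for _ in range(5)]

assert after_restart == uninterrupted  # stream continues as if never restarted
```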

szy21 (Member, Author) commented Sep 23, 2024

@sriharshakandala Let's fix the reproducibility issue when running two identical runs first, which shouldn't require storing the random numbers. We can talk about restarts after the first issue is fixed.

sriharshakandala (Member) commented Sep 23, 2024

From the conversation, it looks like passing in a single seed might work! Though the results could still differ from those of the CPU simulation.

Quoting maleadt (Julia Discourse, Nov 2023, replying to danielwe):

> I think it can be different per warp, but IIRC (it’s been a while since I wrote that code) the idea was to use a single seed for all warps, as we offset it using a counter that’s based on the global ID of the thread. That’s also what happens by default: a single seed is passed from the host and applied from every thread.
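The scheme described there can be sketched generically (Python stdlib; the `mix` function below is a splitmix64-style integer hash chosen for illustration, not CUDA.jl's actual Philox2x32): a single host seed, offset per thread by its global ID, yields an independent yet fully reproducible stream per thread.

```python
import random

MASK64 = (1 << 64) - 1

def mix(seed, global_id):
    # splitmix64-style finalizer applied to the seed offset by the
    # thread's global ID (illustrative stand-in for a counter-based PRNG).
    z = (seed + global_id * 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)

def thread_stream(seed, global_id, n=3):
    # One independent generator per (host seed, global ID) pair.
    rng = random.Random(mix(seed, global_id))
    return [rng.random() for _ in range(n)]

host_seed = 1234
streams = [thread_stream(host_seed, tid) for tid in range(4)]

# Same host seed and thread IDs always reproduce the same streams.
assert streams == [thread_stream(host_seed, tid) for tid in range(4)]
# Different global IDs give different streams.
assert streams[0] != streams[1]
```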

szy21 (Member, Author) commented Sep 24, 2024

I just discussed this with Sriharsha. We will modify the code to ensure reproducibility when running two identical runs, without worrying about the restart. After that is done we can explore whether it is feasible to support restart without increasing the memory footprint by too much. @Sbozzolo What do you think?

Sbozzolo (Member) commented

> I just discussed this with Sriharsha. We will modify the code to ensure reproducibility when running two identical runs, without worrying about the restart. After that is done we can explore whether it is feasible to support restart without increasing the memory footprint by too much. @Sbozzolo What do you think?

Yes, this is a good start, but I would like us to think about supporting restarts as well.

I don't think it makes sense for the memory footprint to increase by orders of magnitude: even if we saved one element per point on the domain, it would only be the same size as any other 3D variable. Also, the state has to be saved only when we produce a checkpoint.

sriharshakandala (Member) commented

Add back PR #542


Successfully merging a pull request may close this issue.

4 participants