Initial public release
AndiH committed Jul 20, 2024
0 parents commit e98ae2b
Showing 11 changed files with 918 additions and 0 deletions.
13 changes: 13 additions & 0 deletions .gitignore
@@ -0,0 +1,13 @@
.sync.toml
*.tar.gz
.ipynb_checkpoints/*
.*
!/.gitignore
*.png
*.svg
*.pdf
*.jpg
*.html
*.sha256
stream_run/
stream_src/
49 changes: 49 additions & 0 deletions CITATION.cff
@@ -0,0 +1,49 @@
cff-version: 1.2.0
title: "JUPITER Benchmark Suite: STREAM"
message: >-
  In addition to citing this benchmark repository, please also cite either the JUPITER Benchmark Suite or the accompanying SC24 paper
authors:
  - given-names: Sebastian
    family-names: Achilles
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-1943-6803'
  - given-names: Thomas
    family-names: Breuer
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0003-3979-4795'
  - given-names: Kay
    family-names: Thust
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-1181-1832'
  - given-names: Yannik
    family-names: Müller
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0009-0001-5696-6512'
  - given-names: Andreas
    family-names: Herten
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-7150-2505'
  - given-names: Alexandre
    family-names: Strube
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-9177-6474'
  - given-names: Dorian
    family-names: Krause
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0001-9799-562X'
  - given-names: Salem
    family-names: El Sayed
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-7217-6027'
abstract: "The STREAM benchmark of the JUPITER Benchmark Suite"
identifiers:
  - type: doi
    value: 10.5281/zenodo.12787776
    description: Version-agnostic Zenodo Identifier
repository-code: 'https://github.com/FZJ-JSC/jubench-stream/'
license: MIT
date-released: '2024-07-20'
references:
  - title: "JUPITER Benchmark Suite"
    type: software
    doi: 10.5281/zenodo.12737073
168 changes: 168 additions & 0 deletions DESCRIPTION.md
@@ -0,0 +1,168 @@
# STREAM

## Purpose

The STREAM benchmark is a synthetic benchmark that measures the sustainable memory bandwidth of a compute node. It performs little to no computation per byte transferred to or from memory. Different versions of this benchmark are available; the one used here is written in C and uses only OpenMP for threading (no MPI).

## Source

Archive name: `stream-bench.tar.gz`.

The source code is available in the file `src/stream_src.tar.gz`.

The provided sources are equivalent to STREAM version 5.10 from the official website: https://www.cs.virginia.edu/stream/FTP/Code/

Download and unpack the tar file first; this creates the directory `stream_src`.

```
cd src
tar -xvzf stream_src.tar.gz
```

## Building

OpenMP must be enabled for both the compiler and the linker. The preprocessor macros `STREAM_ARRAY_SIZE` and `NTIMES` must be defined (see _Configurations_ below).

When using the JUBE script, JUBE builds the STREAM executable in the compile step (see _Execution with JUBE_ below). To build STREAM without JUBE, use a compiler invocation similar to the following.

```
cd stream_src
make CC=gcc CFLAGS="-fopenmp -Wall -DSTREAM_ARRAY_SIZE=$((2 ** 26)) -DNTIMES=200" stream_c.exe
```

Since the array size is a compile-time parameter, it is recommended to build a dedicated executable for each `STREAM_ARRAY_SIZE`.
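
As a rough sketch of that recommendation, the helper below prints (it does not run) one build command per array-size exponent. The function name `stream_build_cmds` and the per-size file names `stream_c.2pNN.exe` are illustrative assumptions and not part of this repository; the `make` invocation mirrors the line above. The old binary is removed first because `make` decides by timestamps and would not rebuild just because `CFLAGS` changed.

```shell
# Print (not run) one build command per array-size exponent.
# Assumption: the per-size names stream_c.2pNN.exe are hypothetical.
stream_build_cmds() {
    for exp in "$@"; do
        size=$((1 << $exp))   # STREAM_ARRAY_SIZE = 2^exp
        printf '%s\n' "rm -f stream_c.exe && make CC=gcc CFLAGS=\"-fopenmp -Wall -DSTREAM_ARRAY_SIZE=${size} -DNTIMES=200\" stream_c.exe && mv stream_c.exe stream_c.2p${exp}.exe"
    done
}

# Example: commands for the smallest and largest size of the 2^15..2^28 scan.
stream_build_cmds 15 28
```

The printed lines can be reviewed and then pasted into a shell inside `stream_src`.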

Depending on the tested configuration, the compiler flags may be modified; see the _Modification_ section below.

## Execution

The executable is called `stream_c.exe`.

### Command Line

STREAM can be executed manually, for example with the following command:

```
OMP_PLACES="cores" OMP_PROC_BIND="spread" OMP_NUM_THREADS=128 ./stream_c.exe
```

### Execution with JUBE

With JUBE, the benchmark can be executed as follows:

```
jube run stream.jube.xml [--tag tags...]
jube continue stream_run # wait/repeat until all steps are done
jube result -a stream_run
```

| Tags | Effect | Default |
|----------------------|------------------------------------|---------|
| `varySize` | `STREAM_ARRAY_SIZE=2^15..2^28` | |
| `varySizeextended` | `STREAM_ARRAY_SIZE=2^10..2^35` | |
| `fixedSize` | `STREAM_ARRAY_SIZE=2^28` | yes |
| `varyThreads` | `OMP_NUM_THREADS=1..16` | yes |
| `threads1` | `OMP_NUM_THREADS=1` | |
| `threads4` | `OMP_NUM_THREADS=4` | |
| `threadsCores` | `OMP_NUM_THREADS=#Cores` | |
| `threadsNuma` | `OMP_NUM_THREADS=#NUMA Domains` | |
| `threadsHyper` | `OMP_NUM_THREADS=#Hardware Threads`| |
| `s22` | `module load Stages/2022` | yes |
| `s21` | `module load Stages/2021` | |
| `s20` | `module load Stages/2020` | |
| `varyCompiler` | `CC=gcc,icc,nvc,clang` | |
| `gcc` | `CC=gcc` | yes |
| `intel` | `CC=icc` | |
| `nvhpc` | `CC=nvc` | |
| `aocc` | `CC=clang` (AOCC) | |

The results can be found in the columns `Copy, Scale, Add, Triad`.

### Configurations

Candidates are requested to run the following configurations; the corresponding JUBE tags are given. All STREAM benchmarks should be run in FP64 precision. See also the modification overview below.

| Name | Description | JUBE Tags |
| ---------------------------| ------------------------------------------------------------------------------------------------------------------------ | --------------------------|
| Threads1 | Single threaded, array sizes 2^15 to 2^28 should be reported; array size 2^28 will be used for evaluation | `threads1` `varySize` |
| Threads4 | Four threads located on the same socket, fixed size (2^28) | `threads4` |
| ThreadsCores | One thread per physical core on the node, fixed size (2^28) | `threadsCores` |
| ThreadsHyper | One thread per hardware thread on the node, fixed size (2^28) | `threadsHyper` |
| Optimal | A custom configuration that achieves maximum bandwidth (2^28) | `optimal`^ |
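
The fixed thread counts of the first four configurations can be sketched as a small lookup. This is only an illustration: the function name `stream_threads` is hypothetical, and the core and hardware-thread counts are assumed to be supplied by the caller (for example, read off `lscpu`). _Optimal_ is deliberately not covered, since its thread count is chosen freely.

```shell
# Map a configuration name to OMP_NUM_THREADS.
# Assumption: core and hardware-thread counts are passed in by the caller;
# the function name is hypothetical.
stream_threads() {
    config=$1; cores=$2; hwthreads=$3
    case "$config" in
        Threads1)     echo 1 ;;
        Threads4)     echo 4 ;;
        ThreadsCores) echo "$cores" ;;
        ThreadsHyper) echo "$hwthreads" ;;
        *)            echo "unknown configuration: $config" >&2; return 1 ;;
    esac
}

# Example for a hypothetical node with 64 cores and 128 hardware threads.
for c in Threads1 Threads4 ThreadsCores ThreadsHyper; do
    echo "$c: OMP_NUM_THREADS=$(stream_threads "$c" 64 128)"
done
```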

The following STREAM parameters are to be used:

| Name | Value |
|---------------------|-----------------------------------------------|
| `STREAM_ARRAY_SIZE` | 2^28; 2^15 to 2^28 for _Threads1_ in addition |
| `NTIMES` | 200 |
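
To get a feeling for what these sizes mean in memory, the following sketch computes the footprint for `STREAM_ARRAY_SIZE = 2^exp` in FP64. It relies on STREAM allocating three arrays (`a`, `b`, `c`) of 8-byte elements; the function names are illustrative. Integer arithmetic is used, so footprints below 1 MiB are printed as 0.

```shell
# Memory footprint of STREAM for STREAM_ARRAY_SIZE = 2^exp in FP64.
# STREAM allocates three arrays (a, b, c) of 8-byte elements.
stream_array_mib() { echo $(( (8 * (1 << $1)) / 1048576 )); }
stream_total_mib() { echo $(( (3 * 8 * (1 << $1)) / 1048576 )); }

echo "2^28: $(stream_array_mib 28) MiB per array, $(stream_total_mib 28) MiB total"
# prints "2^28: 2048 MiB per array, 6144 MiB total"
```

For the evaluation size 2^28, a node therefore needs about 6 GiB of free memory in the measured memory domain.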

In any case, the timing accuracy output by the program needs to be more than 20 clock ticks (`CLK/Ins`).
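
A minimal sketch of this check, matching the `CLK/Ins (>20)` column of the result tables and the rule that runs with CLK/Ins <= 20 are not counted. The function name `clk_ok` is hypothetical; `awk` is used because the value is reported as a floating-point number.

```shell
# Succeeds (exit status 0) only if the reported CLK/Ins value exceeds 20.
# Assumption: the value is passed as a plain number, e.g. 55.0.
clk_ok() {
    awk -v clk="$1" 'BEGIN { exit !(clk + 0 > 20) }'
}

# Example with two values of the kind shown in the result tables.
for v in 55.0 18.0; do
    if clk_ok "$v"; then echo "$v: counted"; else echo "$v: not counted"; fi
done
```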

If multiple memory domains are exposed to user space (for example HBM and DDR), each memory domain must be measured and reported separately. This can be achieved, for example, by using `numactl` and the `-m` (`--membind`) option. Mixing memory domains (for example _cache mode_) is not allowed.
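
As an illustration of the `numactl` approach, the helper below prints one STREAM invocation per memory domain. The node ids are placeholders and the function name `stream_membind_cmds` is hypothetical; which NUMA nodes correspond to which memory type depends on the system and can be inspected with `numactl --hardware`.

```shell
# Print one STREAM invocation per memory domain, each bound with numactl.
# Assumption: the NUMA node ids passed in are placeholders for the ids
# reported on the target system.
stream_membind_cmds() {
    for node in "$@"; do
        echo "numactl --membind=${node} ./stream_c.exe"
    done
}

# Example for a hypothetical system exposing two memory domains.
stream_membind_cmds 0 1
```

The OpenMP environment variables from the _Command Line_ section can be prepended to each printed line as usual.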

^ Tag already exists but values have to be defined in `parametersets.jube.xml`.

### Modification

The following modifications are allowed, depending on the configuration:

| Name | Compiler | Compiler Flags | Size | Threads | Affinity | Source Code |
| ------------ | -------- | -------------- | ---- | ------- | -------- | ----------- |
| Threads1 | Yes | Yes | No^ | No | No | No |
| Threads4 | Yes | Yes | No | No | No | No |
| ThreadsCores | Yes | Yes | No | No | No | No |
| ThreadsHyper | Yes | Yes | No | No | No | No |
| Optimal | Yes | Yes | No | Yes | Yes | No |


Here, _Size_ refers to the length of the arrays in STREAM (i.e., `STREAM_ARRAY_SIZE` / JUBE variable `size`), _Threads_ to the number of OpenMP threads (`OMP_NUM_THREADS` / JUBE's `threadspertask`), and _Affinity_ to the locations of spawned threads (`OMP_PLACES`, `OMP_PROC_BIND`, ...). For the _Threads1_ configuration, a report of additional array sizes is expected for the qualitative evaluation.

^: `STREAM_ARRAY_SIZE` should be 2^28 for the evaluation, but a scan through multiple array sizes (2^15 to 2^28) should be reported as part of the feedback.

## Verification

All executions should run without errors. In the case of JUBE: status _done_, no errors.
Only in the case of out-of-memory crashes may the range of `STREAM_ARRAY_SIZE` values be narrowed to prevent them.

## Results

In most cases, the average triad bandwidth is to be reported. For the case of the freely chosen _Optimal_ configuration, please report the best bandwidth achieved. Should multiple memory domains be available (for example HBM and DDR), the benchmark is to be run separately for each domain.
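
For orientation, the relation between array size, time per iteration, and triad bandwidth can be sketched as follows. The triad kernel (`a[i] = b[i] + scalar * c[i]`) touches three FP64 arrays per iteration, and STREAM reports rates in MB/s with 1 MB = 10^6 bytes. The function name `triad_mbps` is hypothetical, and which time is passed in (for example the average over the `NTIMES` iterations) is left to the caller.

```shell
# Triad bandwidth in MB/s for 2^e FP64 elements and a time in seconds.
# Triad (a[i] = b[i] + scalar * c[i]) touches three arrays of 8-byte
# elements per iteration; 1 MB is taken as 10^6 bytes.
triad_mbps() {
    awk -v e="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 1e-6 * 3 * 8 * 2^e / t }'
}

# Example: array size 2^28 and an assumed 0.05 s per triad iteration.
echo "$(triad_mbps 28 0.05) MB/s"
```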

Abbreviated example output follows.

| stage | modules | mem | CLK/Ins (>20) | N | Copy | Scale | Add | Triad | runtime[sec] |
|-------|---------|-----|---------------|-----|-------|-------|-------|-------|--------------|
| 2022 | GCC | 3.0 | 55.0 | 200 | 16786 | 19590 | 26872 | 26709 | 0.47 |
| 2022 | Intel | 3.0 | 57.0 | 200 | 22554 | 26494 | 26441 | 26655 | 0.48 |
| 2022 | NVHPC | 3.0 | 200.0 | 200 | 10701 | 9763 | 15062 | 14096 | 0.62 |
| 2022 | AOCC | | | 200 | | | | | 0.49 |


| stage | modules | Exp | Array [MiB] | Thread [MiB] | Threads | CLK/Ins (>20) | N | Triad | runtime[sec] |
|-------|---------|-----|-------------|--------------|---------|---------------|-----|------------|---------------|
| 2022 | GCC | 10 | 0.0 | 0.0 | 8 | 5.0 | 200 | 8589.9 | 0.37 |
| 2022 | GCC | 12 | 0.0 | 0.1 | 8 | 4.0 | 200 | 34359.7 | 0.37 |
| 2022 | GCC | 14 | 0.1 | 0.4 | 8 | 5.0 | 200 | 103079.2 | 0.36 |
| 2022 | GCC | 16 | 0.5 | 1.5 | 8 | 6.0 | 200 | 329853.5 | 0.37 |
| 2022 | GCC | 18 | 2.0 | 6.0 | 8 | 18.0 | 200 | 488671.8 | 0.38 |
| 2022 | GCC | 20 | 8.0 | 24.0 | 8 | 41.0 | 200 | 514893.3 | 0.41 |
| 2022 | GCC | 22 | 32.0 | 96.0 | 8 | 882.0 | 200 | 167478.2 | 0.92 |
| 2022 | GCC | 24 | 128.0 | 384.0 | 8 | 3672.0 | 200 | 166520.4 | 2.95 |
| 2022 | GCC | 26 | 512.0 | 1536.0 | 8 | 14652.0 | 200 | 159671.9 | 9.84 |
| 2022 | GCC | 28 | 2048.0 | 6144.0 | 8 | 53737.0 | 200 | 127464.6 | 35.08 |
| 2022 | GCC | 32 | 32768.0 | 98304.0 | 8 | 380153.0 | 200 | 153141.9 | 498.92 |

## Commitment

The following values should be reported. Any run with CLK/Ins <= 20 is not counted. For _Threads1_, the average triad bandwidth for array sizes from 2^15 to 2^28 is to be reported as well. The configuration chosen for _Optimal_ needs to be given.
If multiple memory types are exposed to user space (for example HBM in flat mode), the benchmark should be run on each.

| Name | Values |
|--------------|---------------------------------------------|
| Threads1     | Average triad bandwidth for array size 2^28 |
| Threads4     | Average triad bandwidth for array size 2^28 |
| ThreadsCores | Average triad bandwidth for array size 2^28 |
| ThreadsHyper | Average triad bandwidth for array size 2^28 |
| Optimal      | Average triad bandwidth for array size 2^28 |

21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Forschungszentrum Jülich GmbH

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
7 changes: 7 additions & 0 deletions README.md
@@ -0,0 +1,7 @@
# JUPITER Benchmark Suite: STREAM

[![DOI](https://zenodo.org/badge/831481274.svg)](https://zenodo.org/badge/latestdoi/831481274) [![Static Badge](https://img.shields.io/badge/DOI%20(Suite)-10.5281%2Fzenodo.12737073-blue)](https://zenodo.org/badge/latestdoi/764615316)

This benchmark is part of the [JUPITER Benchmark Suite](https://github.com/FZJ-JSC/jubench). See the repository of the suite for some general remarks.

This repository contains the STREAM benchmark. [`DESCRIPTION.md`](DESCRIPTION.md) contains details for compilation, execution, and evaluation. Sources are available in `./src/`, archived as a tarball.