Releases: LLNL/RAJA

v0.10.1

08 Nov 20:32
be91e04

This release brings in a minor change to the camp submodule, which affects the way camp configures its tests to build.

Please download the RAJA-v0.10.1.tar.gz file below. The others will not work due to the way RAJA uses git submodules.

v0.10.0

30 Oct 17:45
53cb89c

This release contains new features, several notable changes, and some bug fixes.

Please download the RAJA-v0.10.0.tar.gz file below. The others will not work due to the way RAJA uses git submodules.

Notable changes include:

  • New features:

    • Added CUDA block direct execution policies, which can be used to map loop iterations directly to CUDA thread blocks. These are analogous to the pre-existing thread direct policies. The new block direct policies can provide better performance for kernels than the block loop policies when load balancing may be an issue. Please see the RAJA User Guide for a description of all available RAJA execution policies.
    • Added a plugin registry feature that allows plugins linked into RAJA to act before and after kernel launches. One benefit of this is that RAJA no longer has an explicit CHAI dependency when used with CHAI. Future benefits will include integration with other tools for performance analysis, etc.
    • Added a shift method to RAJA::View, which allows one to create a new view object from an existing one that is shifted in index space from the original. Please see the RAJA User Guide for details.
    • Added support for RAJA::TypedView and RAJA::TypedOffsetLayout, so that the index type can be specified as a template parameter.
    • Added helper functions to convert a RAJA::Layout object to a RAJA::OffsetLayout object and RAJA::TypedLayout to RAJA::TypedOffsetLayout. Please see the RAJA User Guide for details.
    • Added a bounds checking option to RAJA Layout types as a debugging feature. This compile-time option reports an error when a given View or Layout index is out of bounds. See the View/Layout section in the RAJA User Guide for instructions on enabling this feature and how it works.
    • We've added a RAJA Template Project on GitHub, which shows how to use RAJA in an application, either as a Git submodule or as an externally installed library that you link your application against. It is available here: https://github.com/LLNL/RAJA-project-template. It is also linked to the main RAJA project page on GitHub.
    • Various user documentation improvements.
  • API change:

    • The type alias RAJA::IndexSet that was marked deprecated previously has been removed. Now, all index set usage must use the type RAJA::TypedIndexSet and specify all segment types (as template parameters) that the index set may potentially hold.
  • Bug fixes:

    • Fix for issue in OpenMP target offload back-end that previously caused some RAJA Performance Suite kernels to seg fault when built with the XL compiler.
    • Removed an internal RAJA class constructor to prevent users from doing potentially incorrect, and very difficult to hunt down, things in their code that are technically not supported in RAJA, such as inserting RAJA::statement::CudaSyncThreads() in arbitrary places inside a lambda expression.
  • Build changes/improvements:

    • RAJA now enforces a minimum CUDA compute capability of sm_35. Users can use the CMake variable 'CUDA_ARCH' to specify this. If not specified, the value sm_35 will be used and an informational message will be emitted indicating this. If a user attempts to set the value lower than sm_35, CMake will error out with a message explaining why.
    • Transition to using camp as a submodule after its open source release (https://github.com/llnl/camp).
    • Made minimum required CMake version 3.9.
    • Update BLT build system submodule to newer version (SHA-1 hash: 96419df).
    • Cleaned up compiler warnings in OpenMP target back-end implementation.

v0.9.0

25 Jul 19:05
df7ca1f

This release contains feature enhancements, one breaking change, and some bug fixes.

Please download the RAJA-v0.9.0.tar.gz file below. The others will not work due to the way RAJA uses git submodules.

  • Breaking change
    • The atomic namespace in RAJA has been removed. Now, use atomic operations as RAJA::atomicAdd(), not RAJA::atomic::atomicAdd(), for example. This was done to make atomic usage consistent with other RAJA features, such as reductions, scans, etc.
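
The flattened call style can be sketched with a free atomicAdd function at namespace scope. This is a conceptual stand-in built on std::atomic in an invented namespace `demo` (standing in for RAJA::), not RAJA's back-end-aware implementation:

```cpp
#include <atomic>
#include <cassert>

namespace demo {
  // Free function at namespace scope, called as demo::atomicAdd(...),
  // mirroring the move from RAJA::atomic::atomicAdd to RAJA::atomicAdd.
  template <typename T>
  T atomicAdd(std::atomic<T>* target, T value) {
    // fetch_add returns the value held before the addition, matching
    // the usual atomicAdd convention.
    return target->fetch_add(value);
  }
}
```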

Other notable changes include:

  • Features

    • The lambda statement interface has been extended in the RAJA kernel API. Earlier, when multiple lambda expressions were used in a kernel, they were required to all have the same arguments, although not all arguments had to be used in each lambda expression. Now, lambda arguments may be specified in the RAJA::statement::Lambda type so that each lambda expression need only take the arguments it uses. However, the previous usage pattern will continue to be supported. To support the new interface, new statement types have been introduced to indicate iteration space variables (Segs), local variable/array parameters (Params), and index offsets (Offsets). The offsets can be used with a For statement as a replacement for the ForICount statement. The new API features are described in the RAJA User Guide.
    • Minloc and maxloc reductions now support a tuple of index values. So now if you have a nested loop kernel with i, j, k loops, you can get the 'loc' value out as an i, j, k triple.
  • Bug Fixes:

    • Small change to make RAJA Views work properly with OpenMP target kernels.
    • Changes to fix OpenMP target back-end for XL compilers.
    • Fix build issue with older versions of GNU compiler.
    • Fixes for corner-case issues in which an improper number of threads per block or number of thread blocks was chosen for CUDA execution policies.
  • Build changes/improvements:

    • A few minor portability improvements
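
The tuple-valued 'loc' result described above can be sketched in plain C++ as a minloc over an i, j, k loop nest. This is an illustrative serial stand-in, not RAJA's ReduceMinLoc machinery:

```cpp
#include <array>
#include <cassert>
#include <vector>

// Result of a minloc reduction: the minimum value plus the (i, j, k)
// triple identifying where it occurred.
struct MinLoc3D {
  double val;
  std::array<int, 3> loc;  // (i, j, k) of the minimum
};

// Serial minloc over a row-major ni x nj x nk array.
inline MinLoc3D minloc3d(const std::vector<double>& a, int ni, int nj, int nk) {
  MinLoc3D r{a[0], {0, 0, 0}};
  for (int i = 0; i < ni; ++i)
    for (int j = 0; j < nj; ++j)
      for (int k = 0; k < nk; ++k) {
        double v = a[(i * nj + j) * nk + k];
        if (v < r.val) r = {v, {i, j, k}};
      }
  return r;
}
```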

v0.8.0

28 Mar 22:56
8d19a8c

This release contains one major change and some minor improvements to
compilation and performance.

Please download the RAJA-v0.8.0.tar.gz file below. The others will not work due to the way RAJA uses git submodules.

Major changes include:

  • Build system updated to use the latest version of BLT (or close to it). Depending on how one builds RAJA, this could require changes to how information is passed to CMake. Content has been added to the relevant sections of the RAJA User Guide which describes how this is done.

Other notable changes include:

Features (These are not yet documented and should be considered experimental. There will be documentation and usage examples in the next RAJA release.):

  • New thread, warp, and bitmask policies for CUDA.
  • Added AtomicLocalArray type which returns data elements wrapped in an AtomicRef object.

Bug Fixes:

  • Fixed issue in RangeStrideSegment iteration.
  • Fix 'align hint' macro to eliminate compile warning when XL compiler is used with nvcc.
  • Fixed issues associated with the CUDA architecture level (e.g., sm_*) being set too low, which generated compiler warnings/errors. Caveats about which RAJA features (mostly atomic operations) are available at different CUDA architecture levels were added to the User Guide.

Performance Improvements:

  • Some performance improvements in RAJA::kernel usage with CUDA back-end.
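
The AtomicLocalArray idea above, an array whose accessor hands back an element you can perform atomic operations on, can be sketched with std::atomic elements. This is a conceptual stand-in, not RAJA's type:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Fixed-size array whose accessor returns an atomic view of each
// element, so callers can do fetch_add etc. directly on entries.
template <std::size_t N>
struct AtomicArray {
  std::atomic<int> data[N]{};  // value-initialized to zero
  std::atomic<int>& operator()(std::size_t i) { return data[i]; }
};
```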

v0.7.0

07 Feb 18:40
caa33b3

This release contains several major changes, new features, a variety of bug fixes, and expanded user documentation and accompanying example codes. For more information and details about any of the changes listed below, please consult the RAJA documentation for the 0.7.0 release, which is linked to our GitHub project.

Please download the RAJA-0.7.0.tar.gz file above. The others will not work due to the way RAJA uses git submodules.

Major changes include:

  • RAJA::forallN methods were marked deprecated in the 0.6.0 release. They have been removed. All applications that contain nested loops and have been using forallN methods should convert them to use the RAJA::kernel interface.
  • RAJA::forall methods that take explicit loop bounds rather than segments (e.g., RAJA::forall(beg, end, ...)) were marked deprecated in the 0.6.0 release. They have been removed. Hopefully, this will result in faster compile times due to simpler template resolution. Users who have been passing loop bounds directly to forall methods should convert those cases to use RAJA segments instead.
  • CUDA execution policies for use in RAJA::kernel policies have been significantly reworked and redefined. The new set of policies are much more flexible and provide improved run time performance.
  • New, improved support for loop tiling algorithms and support for CPU cache blocking, CUDA GPU thread local data and shared memory is available. This includes RAJA::kernel policy statement types to make tile numbers and local tile indices available in user kernels (TileTCount and ForICount statement types), and a new RAJA::LocalArray type with various CPU and GPU memory policies. Due to these new features, RAJA 'shmemwindow' statements have been removed.
  • This release contains expanded documentation and example codes for the RAJA::kernel interface, including loop tiling algorithms and support for CPU cache blocking, CUDA GPU thread local data and shared memory.

Other notable changes include:

  • Features:

    • Initial support for OpenMP target execution policies with RAJA::kernel added.
    • The RAJA::AtomicRef interface is now consistent with the C++20 std::atomic_ref interface.
    • Atomic compare-exchange operations added.
    • CUDA reduce policies no longer require a thread-block size parameter.
    • New features considered preliminary, with no significant documentation or examples available yet:
      • RAJA::statement::Reduce type for use in RAJA::kernel execution policies. This enables the ability to perform reductions and access reduced values inside user kernels.
      • Warp-level execution policies added for CUDA.
  • Performance improvements:

    • Better use of inline directives to improve likelihood of SIMD instruction generation with the Intel compiler.
  • Bug fixes:

    • Several CHAI integration issues resolved.
    • Resolve issue with alignx directive when using XL compiler as host compiler with CUDA.
    • Fix issue associated with how XL compiler interprets OpenMP region definition.
    • Various tweaks to camp implementation to improve robustness.
  • Build changes/improvements:

    • The minimum required version of CMake has changed to 3.8 for all programming model back-ends, except CUDA. The minimum CMake version for CUDA support is 3.9.
    • Improved support for clang-cuda compiler. Some features still do not work with that compiler.
    • Update NVIDIA cub module to version 1.8.0.
    • Enable use of 'BLT_SOURCE_DIR' CMake variable to help prevent conflicts with BLT versions in RAJA and other libraries used in applications.
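
The compare-exchange operation mentioned above can be sketched directly with std::atomic, whose interface the RAJA atomic API follows: the new value is installed only if the current value still equals the expected one, and the call reports whether the swap happened.

```cpp
#include <atomic>
#include <cassert>

// Thin wrapper illustrating compare-exchange (CAS) semantics.
// Returns true and stores `desired` only if `target` currently
// holds `expected`; otherwise leaves `target` unchanged.
inline bool cas(std::atomic<int>& target, int expected, int desired) {
  return target.compare_exchange_strong(expected, desired);
}
```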

v0.6.0

27 Jul 16:31
cc7a97e

This release contains two major changes, a variety of bug fixes and feature
enhancements, and expanded user documentation and accompanying example codes.

Please download the RAJA-0.6.0.tar.gz file above. The others will not work due to the way RAJA uses git submodules.

Major changes include:

  • RAJA::forallN methods are marked deprecated. They will be removed in
    the 0.7.0 release.
  • RAJA::forall methods that take loop bounds rather than segments (e.g.,
    RAJA::forall(beg, end, ...) are marked deprecated. They will be removed
    in the 0.7.0 release.
  • RAJA::nested has been replaced with RAJA::kernel. The RAJA::kernel interface
    is much more flexible and full featured. Going forward, it will be the
    supported interface for nested loops and more complex kernels in RAJA.
  • This release contains new documentation and example codes for the
    RAJA::kernel interface. The documentation describes key features and
    summarizes available 'statement' types. However, it remains a
    work-in-progress, and expanded documentation with more examples will be
    available in future releases.
  • Documentation of other RAJA features has been expanded and improved in
    this release, along with additional example codes.

Other notable changes include:

  • New or improved features:

    • RAJA CUDA reductions now work with host/device lambdas.
    • List segments now work with RAJA::kernel loops.
    • New and expanded collection of build files for LC and ALCF machines.
      Hopefully, these will be helpful to folks getting started.
  • Performance improvements:

    • Some RAJA::View use cases.
    • Unnecessary operations removed in min/max atomics.
  • Bug fixes:

    • Issues in View with OffsetLayout fixed.
    • Construction of a const View from a non-const View now works.
    • CUDA kernel no longer launched in RAJA::kernel loops when the iteration
      space has size zero.

v0.5.3

31 Jan 21:21
1ca35c0

Please download the RAJA-0.5.3.tar.gz file above. The others will not work due to the way RAJA uses git submodules.

This is a bugfix release that fixes bugs in the IndexSetBuilder methods. These methods now work correctly with the strongly-typed IndexSet.

v0.5.2

30 Jan 23:02
4d5c3d5

Please download the RAJA-0.5.2.tar.gz file above. The others will not work due to the way RAJA uses git submodules.

This release fixes some small bugs, including compiler warnings issued for deprecated features, type narrowing, and the slice method for the RangeStrideSegment class.

It also adds a new CMake variable, RAJA_LOADED, that is used to determine whether RAJA's CMakeLists file has already been processed. This is useful when including RAJA as part of another CMake project.

v0.5.1

17 Jan 19:31
bf340ab

Please download the RAJA-0.5.1.tar.gz file above. The others will not work due to the way RAJA uses git submodules.

This release contains fixes for compiler warnings with newer GCC and Clang compilers, and allows strongly-typed indices to work with RangeStrideSegment.

Additionally, the index type for all segments in an IndexSet needs to be the same. This requirement is enforced with a static_assert.

v0.5.0

11 Jan 19:19
9b539d8

Please download the RAJA-0.5.0.tar.gz file above. The others will not work due to the way RAJA uses git submodules.

This release contains a variety of bug fixes, removes nvcc compiler
warnings, adds unit tests to expand coverage, and includes a variety of
other code cleanup and improvements. The most notable changes in this
version include:

  • New RAJA User Guide and Tutorial along with a set of example codes
    that illustrate basic usage of RAJA features and which accompany
    the tutorial. The examples are in the RAJA/examples directory.
    The user guide is available online.

  • RAJA::IndexSet is now deprecated. You may still use it until it is
    removed in a future release -- you will see a notification message at
    compile time that it is deprecated.

    Index set functionality is now available via RAJA::TypedIndexSet,
    where you specify all segment types as template parameters when you
    declare an instance of it. This change allows us to remove all virtual
    methods from the index set, pass index set objects to CUDA GPU kernels
    and use all of their functionality there, and support any arbitrary
    segment type, even user-defined ones. Please see the User Guide for
    details.

    Segment dependencies are being developed for the typed index set and
    will be available in a future release.

  • RAJA::nested::forall changes:

    • Addition of CUDA and OpenMP collapse policies for nested loops.
      OpenMP collapse will do what the OpenMP collapse clause does.
      CUDA collapse will collapse a loop nest into a single CUDA kernel,
      based on how the nested policies specify that loop levels should be
      distributed over blocks and threads.

    • Added new policy RAJA::cuda_loop_exec to enable inner loops to run
      sequentially inside a CUDA kernel with RAJA::nested::forall.

    • Fixed RAJA::nested::forall so it now works with RAJA's CUDA Reducer
      types.

    • Removed TypedFor policies. For type safety of nested loop iteration
      variables, it makes more sense to use TypedRangeSegment since the
      variables are associated with the loop kernel and not the execution
      policy, which may be applied to multiple loops with different variables.

  • Fixed OpenMP scans to calculate chunks of work based on actual number of
    threads the OpenMP runtime makes available.

  • Enhancements and fixes to RAJA/CHAI interoperability.

  • Added aliases for several camp types in the RAJA namespace; e.g.,
    camp::make_tuple can now be accessed as RAJA::make_tuple. This
    change makes the RAJA API more consistent and clear.
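
The OpenMP scan fix above amounts to dividing the work based on the number of threads the runtime actually provides. A minimal sketch of such a partition, with any remainder spread over the leading threads (the exact formula RAJA uses may differ):

```cpp
#include <cassert>

// Number of items thread `tid` receives when n items are split over
// `nthreads` threads as evenly as possible: the first n % nthreads
// threads get one extra item.
inline int chunk_size(int n, int nthreads, int tid) {
  int base = n / nthreads;
  int rem  = n % nthreads;
  return base + (tid < rem ? 1 : 0);
}
```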