This page documents the Alpaka integration within CMSSW. For more information about Alpaka itself see the Alpaka documentation.
The code in `Package/SubPackage/{interface,src,plugins,test}/alpaka` is compiled once for each enabled Alpaka backend. The `ALPAKA_ACCELERATOR_NAMESPACE` macro is substituted with a concrete, backend-specific namespace name in order to guarantee different symbol names for all backends, which allows `cmsRun` to dynamically load any set of the backend libraries.

Source files with the `.dev.cc` suffix are compiled with the backend-specific device compiler, while the other `.cc` source files are compiled with the host compiler.

The `BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="1"/>` to enable the behavior described above.
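For illustration, a plugin `BuildFile.xml` could look along the following lines. This is a hypothetical sketch: the library name and the exact dependencies are placeholders, the essential part is the `ALPAKA_BACKENDS` flag:

```xml
<library file="alpaka/*.cc alpaka/*.dev.cc" name="PackageSubPackagePluginsAlpaka">
  <use name="alpaka"/>
  <use name="HeterogeneousCore/AlpakaCore"/>
  <flags ALPAKA_BACKENDS="1"/>
  <flags EDM_PLUGIN="1"/>
</library>
```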
- Minimize explicit blocking synchronization calls
  - Avoid `alpaka::wait()` and non-cached memory buffer allocations
- If you can, use the `global::EDProducer` base class
- If you need per-stream storage
  - For few objects consider using `edm::StreamCache<T>` with the global module, or
  - Use `stream::EDProducer`
- If you need to transfer some data back to host, use `stream::SynchronizingEDProducer`
- All code using `ALPAKA_ACCELERATOR_NAMESPACE` should be placed in the `Package/SubPackage/{interface,src,plugins,test}/alpaka` directory
  - Alpaka-dependent code that uses templates instead of the namespace macro can be placed in the `Package/SubPackage/interface` directory
- All source files (not headers) using Alpaka device code (such as kernel calls, or functions called by kernels) must have the suffix `.dev.cc`, and be placed in the aforementioned `alpaka` subdirectory (see the sketch after this list)
- Any code that `#include`s a header from the framework or from `HeterogeneousCore/AlpakaCore` must be separated from the Alpaka device code, and have the usual `.cc` suffix.
  - Some framework headers are allowed to be used in `.dev.cc` files:
    - Any header containing only macros, e.g. `FWCore/Utilities/interface/CMSUnrollLoop.h`, `FWCore/Utilities/interface/stringize.h`
    - `FWCore/Utilities/interface/Exception.h`
    - `FWCore/MessageLogger/interface/MessageLogger.h`, although it is preferred to issue messages only in the `.cc` files
    - `HeterogeneousCore/AlpakaCore/interface/EventCache.h` and `HeterogeneousCore/AlpakaCore/interface/QueueCache.h` can, in principle, be used in `.dev.cc` files, even if there should be little need to use them explicitly
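To illustrate the file organization, a `.dev.cc` file could look along the following lines. This is a hypothetical sketch: the `Algo` class and its header are made up, but the kernel definition and launch follow the usual pattern of the helpers in `HeterogeneousCore/AlpakaInterface`:

```cpp
// Package/SubPackage/plugins/alpaka/Algo.dev.cc (hypothetical)
#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
#include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h"

#include "Algo.h"  // hypothetical header declaring Algo in ALPAKA_ACCELERATOR_NAMESPACE

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // device code, therefore defined in a .dev.cc file
  struct ScaleKernel {
    template <typename TAcc>
    ALPAKA_FN_ACC void operator()(TAcc const& acc, float* data, float factor, int32_t size) const {
      // uniform_elements() hides the backend-specific thread/element indexing
      for (auto i : cms::alpakatools::uniform_elements(acc, size)) {
        data[i] *= factor;
      }
    }
  };

  void Algo::scale(Queue& queue, float* data, float factor, int32_t size) const {
    // enqueue the kernel asynchronously into the given queue
    auto workDiv = cms::alpakatools::make_workdiv<Acc1D>(cms::alpakatools::divide_up_by(size, 256), 256);
    alpaka::exec<Acc1D>(queue, workDiv, ScaleKernel{}, data, factor, size);
  }
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```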
Data formats, for both Event and EventSetup, should be placed following their usual rules. The Alpaka-specific conventions are:

- There must be a host-only flavor of the data format that is either independent of Alpaka, or depends only on Alpaka's Serial backend
  - The host-only data format must be defined in the `Package/SubPackage/interface/` directory
  - If the data format is to be serialized (with ROOT), it must be serialized in a way that the on-disk format does not depend on Alpaka, i.e. it can be read without Alpaka
  - For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/classes{.h,_def.xml}`
    - As usual, the `classes_def.xml` should declare the dictionaries for the data product type `T` and `edm::Wrapper<T>`. These data products can be declared as persistent (default) or transient (`persistent="false"` attribute).
  - For EventSetup data products the registration macro `TYPELOOKUP_DATA_REG` should be placed in `Package/SubPackage/src/ES_<type name>.cc`.
- The device-side data formats are defined in the `Package/SubPackage/interface/alpaka/` directory
  - The device-side data format classes should be either templated over the device type, or defined in the `ALPAKA_ACCELERATOR_NAMESPACE` namespace.
    - For host backends (`serial`), the "device-side" data format class must be the same as the aforementioned host-only data format class
      - Use the `ASSERT_DEVICE_MATCHES_HOST_COLLECTION(<device collection type>, <host collection type>);` macro to ensure that; see an example in `DataFormats/PortableTestObjects/interface/alpaka/TestDeviceCollection.h` and the sketch after this list
      - This equality is necessary for the implicit data transfers to function properly
    - For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/alpaka/classes_<platform>{.h,_def.xml}`
      - The `classes_<platform>_def.xml` should declare the dictionaries for the data product type `T`, `edm::DeviceProduct<T>`, and `edm::Wrapper<edm::DeviceProduct<T>>`. All these dictionaries must be declared as transient with the `persistent="false"` attribute.
      - The list of `<platform>` values currently includes: `cuda`, `rocm`
    - For EventSetup data products the registration macro should be placed in `Package/SubPackage/src/alpaka/ES_<type name>.cc`
      - Data products defined in `ALPAKA_ACCELERATOR_NAMESPACE` should use the `TYPELOOKUP_ALPAKA_DATA_REG` macro
      - Data products templated over the device type should use the `TYPELOOKUP_ALPAKA_TEMPLATED_DATA_REG` macro
- For Event data products the `DataFormats/SubPackage/BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="!serial"/>`
  - unless the package has something that is really specific to the `serial` backend that is not generally applicable on host
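As a sketch of these conventions, a device-side collection header could look along the following lines, following the pattern of `DataFormats/PortableTestObjects/interface/alpaka/TestDeviceCollection.h` (the `TestSoA` layout, the type names, and the exact header for the assertion macro are illustrative assumptions):

```cpp
// Package/SubPackage/interface/alpaka/TestDeviceCollection.h (illustrative)
#include "DataFormats/Portable/interface/alpaka/PortableCollection.h"
#include "HeterogeneousCore/AlpakaInterface/interface/AssertDeviceMatchesHostCollection.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
// host-only flavor and SoA layout, defined outside the alpaka/ subdirectory
#include "Package/SubPackage/interface/TestHostCollection.h"
#include "Package/SubPackage/interface/TestSoA.h"

namespace ALPAKA_ACCELERATOR_NAMESPACE::portabletest {
  // on host backends this alias resolves to the host-only collection type
  using TestDeviceCollection = PortableCollection<::portabletest::TestSoA>;
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE::portabletest

// compile-time check that on host backends the device-side type
// is the same as the host-only type
ASSERT_DEVICE_MATCHES_HOST_COLLECTION(portabletest::TestDeviceCollection,
                                      portabletest::TestHostCollection);
```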
Note that even though the examples above used the `DataFormats` package for Event data formats, Event data formats are allowed to be defined in other packages too in some circumstances. For full details please see SWGuideCreatingNewProducts.
Both EDProducers and ESProducers make use of implicit data transfers. In CPU backends these data transfers are omitted, and the host-side and the "device-side" data products are the same.
The implicit host-to-device and device-to-host copies rely on specializations of the `cms::alpakatools::CopyToDevice` and `cms::alpakatools::CopyToHost` class templates, respectively. These have to be specialized along the lines of
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"
namespace cms::alpakatools {
template<>
struct CopyToDevice<TSrc> {
template <typename TQueue>
requires alpaka::isQueue<TQueue>
static auto copyAsync(TQueue& queue, TSrc const& hostProduct) -> TDst {
// code to construct TDst object, and launch the asynchronous memcpy from the host to the device of TQueue
return ...;
}
};
}
or
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"
namespace cms::alpakatools {
template <>
struct CopyToHost<TSrc> {
template <typename TQueue>
requires alpaka::isQueue<TQueue>
static auto copyAsync(TQueue& queue, TSrc const& deviceProduct) -> TDst {
// code to construct TDst object, and launch the asynchronous memcpy from the device of TQueue to the host
return ...;
}
};
}
respectively.
Note that the destination (device-side/host-side) type `TDst` can be different from or the same as the source (host-side/device-side) type `TSrc`, as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.

Both the `CopyToDevice` and `CopyToHost` class templates are partially specialized for all `PortableObject` and `PortableCollection` instantiations.
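For a data product that is not a `PortableCollection` or `PortableObject`, the specialization must be written by hand. The following is a minimal hypothetical sketch: the `ExampleDeviceProduct` and `ExampleHostProduct` types are made up, and the buffer helpers from `HeterogeneousCore/AlpakaInterface/interface/memory.h` are used for the allocations:

```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"
#include "HeterogeneousCore/AlpakaInterface/interface/memory.h"

// hypothetical device-side product: one buffer of floats in the device memory of TDev
template <typename TDev>
struct ExampleDeviceProduct {
  cms::alpakatools::device_buffer<TDev, float[]> data;
};

// hypothetical host-side counterpart
struct ExampleHostProduct {
  cms::alpakatools::host_buffer<float[]> data;
};

namespace cms::alpakatools {
  template <typename TDev>
  struct CopyToHost<ExampleDeviceProduct<TDev>> {
    template <typename TQueue>
      requires alpaka::isQueue<TQueue>
    static auto copyAsync(TQueue& queue, ExampleDeviceProduct<TDev> const& src) -> ExampleHostProduct {
      // allocate (cached, pinned where appropriate) host memory
      auto hostData = make_host_buffer<float[]>(queue, alpaka::getExtentProduct(src.data));
      // enqueue the asynchronous device-to-host copy into the given queue
      alpaka::memcpy(queue, hostData, src.data);
      return ExampleHostProduct{std::move(hostData)};
    }
  };
}  // namespace cms::alpakatools
```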
If the data product in question contains pointers to memory elsewhere within the data product, after the `alpaka::memcpy()` calls in `copyAsync()` those pointers still point to device memory, and need to be updated. Such data products are generally discouraged. Nevertheless, such pointers can be updated without any additional synchronization by implementing a `postCopy()` function in the `CopyToHost` specialization along the lines of the following (extending the `CopyToHost` example above)
namespace cms::alpakatools {
template <>
struct CopyToHost<TSrc> {
// copyAsync() definition from above
static void postCopy(TDst& obj) {
// modify obj
// any modifications must be such that the postCopy() can be
// skipped when the obj originates from the host (i.e. on CPU backends)
}
};
}
The `postCopy()` is called after the operations enqueued in `copyAsync()` have finished. The code in `postCopy()` must be such that the call to `postCopy()` can be omitted on CPU backends.

Note that for `CopyToDevice` such `postCopy()` functionality is not provided. It should be possible to issue a kernel call from the `CopyToDevice::copyAsync()` function to achieve the same effect.
In EDProducers, for each device-side data product a transfer from the device memory space to the host memory space is registered automatically. The data product is copied only if the job has another EDModule that consumes the host-side data product. For each device-side data product a specialization of `cms::alpakatools::CopyToHost` is required to exist.

In addition, for each host-side data product a transfer from the host memory space to the device memory space is registered automatically if a `cms::alpakatools::CopyToDevice` specialization exists. The data product is copied only if the job has another EDModule that consumes the device-side data product.

In ESProducers, for each host-side data product a transfer from the host memory space to the device memory space (of the backend of the ESProducer) is registered automatically. The data product is copied only if the job has another ESProducer or EDModule that consumes the device-side data product. For each host-side data product a specialization of `cms::alpakatools::CopyToDevice` is required to exist.
For more information see `DataFormats/Portable/README.md` and `DataFormats/SoATemplate/README.md`.
The Alpaka-based EDModules should use one of the following base classes (that are defined in the `ALPAKA_ACCELERATOR_NAMESPACE`):

- `global::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"`)
  - A global EDProducer that launches (possibly) asynchronous work
- `stream::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/EDProducer.h"`)
  - A stream EDProducer that launches (possibly) asynchronous work
- `stream::SynchronizingEDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/SynchronizingEDProducer.h"`)
  - A stream EDProducer that may launch (possibly) asynchronous work, and synchronizes the asynchronous work on the device with the host
  - The base class uses `edm::ExternalWork` for the non-blocking synchronization
The `...` can in principle be any of the module abilities listed in the linked TWiki pages, except `edm::ExternalWork`. The majority of the Alpaka EDProducers should be `global::EDProducer` or `stream::EDProducer`, with `stream::SynchronizingEDProducer` used only in cases where some data needs to be copied from the device to the host, and thus requires synchronization, for a reason other than copying an Event data product from the device to the host.
New base classes (or other functionality) can be added based on new use cases that come up.
The Alpaka-based ESProducers should use the `ESProducer` base class (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESProducer.h"`).

Note that the constructors of both the Alpaka-based EDProducers and ESProducers must pass the `edm::ParameterSet` argument to the constructor of their base class.

Note that currently Alpaka-based ESSources are not supported. If you need to produce EventSetup data products into a Record for which there is no ESSource yet, use `EmptyESSource`.
The Alpaka-based modules have a notion of a host memory space and device memory space for the Event and EventSetup data products. The data products in the host memory space are accessible for non-Alpaka modules, whereas the data products in device memory space are available only for modules of the specific Alpaka backend. The host backend(s) use the host memory space directly.
The EDModules get a `device::Event` and a `device::EventSetup` from the framework, from which data products in both the host memory space and the device memory space can be accessed. Data products can also be produced into either memory space. As discussed above, for each data product produced into the device memory space an implicit data copy from the device memory space to the host memory space is registered, and for each data product produced into the host memory space for which `cms::alpakatools::CopyToDevice` is specialized an implicit data copy from the host memory space to the device memory space is registered. The `device::Event::queue()` returns the Alpaka `Queue` object into which all work in the EDModule must be enqueued.
The ESProducer can have two different `produce()` function signatures:

- If the function has the usual `TRecord const&` parameter, the function can read an ESProduct from the host memory space, and produce another product into the host memory space. An implicit copy of the data product from the host memory space to the device memory space (of the backend of the ESProducer) is registered as discussed above.
- If the function has a `device::Record<TRecord> const&` parameter, the function can read an ESProduct from the device memory space, and produce another product into the device memory space. No further copies are made by the framework. The `device::Record<TRecord>::queue()` gives the Alpaka `Queue` object into which all work in the ESProducer must be enqueued.
The memory spaces of the consumed and (in the EDProducer case) produced data products are driven by the tokens. The token types to be used in the different cases are summarized below.

| | Host memory space | Device memory space |
| --- | --- | --- |
| Access Event data product of type `T` | `edm::EDGetTokenT<T>` | `device::EDGetToken<T>` |
| Produce Event data product of type `T` | `edm::EDPutTokenT<T>` | `device::EDPutToken<T>` |
| Access EventSetup data product of type `T` in Record `TRecord` | `edm::ESGetToken<T, TRecord>` | `device::ESGetToken<T, TRecord>` |
With the device memory space tokens the type-deducing `consumes()`, `produces()`, and `esConsumes()` calls must be used (i.e. do not specify the data product type as part of the function call). For more information on these registration functions see the general framework documentation.
In the `fillDescriptions()` function, specifying the module label automatically with `edm::ConfigurationDescriptions::addWithDefaultLabel()` is strongly recommended. Currently a `cfi` file is generated for each Alpaka backend of a module, such that the backend namespace is explicitly used in the module definition. An additional `cfi` file is generated for the "module type resolver" functionality, where the module type has the `@alpaka` postfix.

Also note that the `fillDescriptions()` function must have the same content for all backends, i.e. any backend-specific behavior with e.g. `#ifdef` or `if constexpr` is forbidden.
While the EventSetup can be used to handle copying data to all devices of an Alpaka backend, for data used only by one EDProducer a simpler way would be to use one of
- `cms::alpakatools::MoveToDeviceCache<TDevice, THostObject>` (recommended)
  - `#include "HeterogeneousCore/AlpakaCore/interface/MoveToDeviceCache.h"`
  - Moves the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. On host backends the argument `THostObject` is moved around, but not copied.
  - The `THostObject` must not be copyable
    - This is to avoid easy mistakes with objects that follow the copy semantics of `std::shared_ptr` (which includes Alpaka buffers), where the source memory buffer could be used via another copy during the asynchronous data copy to the device.
  - The constructor-argument `THostObject` object may not be used afterwards, unless it is initialized again, e.g. by assigning another `THostObject` into it.
  - The corresponding device-side object can be obtained with the `get()` member function using either an alpaka Device or Queue object. It can be used immediately after the constructor returns.
- `cms::alpakatools::CopyToDeviceCache<TDevice, THostObject>` (use only if a copyable `THostObject` must be used)
  - `#include "HeterogeneousCore/AlpakaCore/interface/CopyToDeviceCache.h"`
  - Copies the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. Host backends also do a copy.
  - The constructor-argument `THostObject` object can be used for other purposes immediately after the constructor returns
  - The corresponding device-side object can be obtained with the `get()` member function using either an alpaka Device or Queue object. It can be used immediately after the constructor returns.
For examples see HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerCopyToDeviceCache.cc
and HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerMoveToDeviceCache.cc
.
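A minimal hypothetical sketch of the recommended `MoveToDeviceCache` pattern in an EDProducer could look as follows; the producer, the `makeConfig()` helper, and the `ExampleHostObject` type are made-up names:

```cpp
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "HeterogeneousCore/AlpakaCore/interface/MoveToDeviceCache.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  class ExampleCachingProducer : public global::EDProducer<> {
  public:
    ExampleCachingProducer(edm::ParameterSet const& iConfig)
        : EDProducer<>(iConfig),
          // the host object is moved into the cache and copied to all devices synchronously
          cache_{makeConfig(iConfig)} {}

    void produce(edm::StreamID, device::Event& iEvent, device::EventSetup const&) const override {
      // get() with the Event's queue returns the object in this device's memory space;
      // it is usable without any further synchronization
      auto const& deviceConfig = cache_.get(iEvent.queue());
      // ... enqueue kernels using deviceConfig ...
    }

  private:
    // ExampleHostObject is a hypothetical non-copyable host-side type
    // with a cms::alpakatools::CopyToDevice specialization
    static ExampleHostObject makeConfig(edm::ParameterSet const& iConfig);

    cms::alpakatools::MoveToDeviceCache<Device, ExampleHostObject> cache_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```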
- All Event data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()`, when accessed through the `device::Event`.
- All Event data products in the host memory space are guaranteed to be accessible for all operations (after the data product has been obtained from the `edm::Event` or `device::Event`).
- All EventSetup data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed via the `device::EventSetup` (ED modules), or by `device::Record<TRecord>::queue()` when accessed via the `device::Record<TRecord>` (ESProducers).
- The EDM Stream does not proceed to the next Event until all asynchronous work of the current Event has finished.
  - Note: this implies that if an EDProducer in its `produce()` function uses the `Event::queue()` or gets a device-side data product, and does not produce any device-side data products, the `produce()` call will be synchronous (i.e. it will block the CPU thread until the asynchronous work finishes)
- Note: this implies if an EDProducer in its
For concrete examples see code in HeterogeneousCore/AlpakaTest
and DataFormats/PortableTestObjects
.
This example shows a mixture of behavior from the test code in `HeterogeneousCore/AlpakaTest/plugins/alpaka/`.
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDPutToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/Event.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EventSetup.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
// + usual #includes for the used framework components, data format(s), record(s)
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {
// Base class is defined in ALPAKA_ACCELEATOR_NAMESPACE as well (note, no edm:: prefix!)
class ExampleAlpakaProducer : public global::EDProducer<> {
public:
ExampleAlpakaProducer(edm::ParameterSet const& iConfig)
: EDProducer<>(iConfig),
// produces() must not specify the product type, it is deduced from deviceToken_
deviceToken_{produces()},
size_{iConfig.getParameter<int32_t>("size")} {}
// device::Event and device::EventSetup are defined in ALPAKA_ACCELERATOR_NAMESPACE as well
void produce(edm::StreamID sid, device::Event& iEvent, device::EventSetup const& iSetup) const override {
// get input data products
auto const& hostInput = iEvent.get(getTokenHost_);
auto const& deviceInput = iEvent.get(getTokenDevice_);
auto const& deviceESData = iSetup.getData(esGetTokenDevice_);
// run the algorithm, potentially asynchronously
portabletest::TestDeviceCollection deviceProduct{size_, event.queue()};
algo_.fill(event.queue(), hostInput, deviceInput, deviceESData, deviceProduct);
// put the asynchronous product into the event without waiting
// must use EDPutToken with emplace() or put()
//
// for a product produced with device::EDPutToken<T> the base class registers
// a separately scheduled transformation function for the copy to host
// the transformation function calls
// cms::alpakatools::CopyToDevice<portabletest::TestDeviceCollection>::copyAsync(Queue&, portabletest::TestDeviceCollection const&)
// function
event.emplace(deviceToken_, std::move(deviceProduct));
}
static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
// All backends must have exactly the same fillDescriptions() content!
edm::ParameterSetDescription desc;
desc.add<int32_t>("size");
descriptions.addWithDefaultLabel(desc);
}
private:
// use edm::EGetTokenT<T> to read from host memory space
edm::EDGetTokenT<FooProduct> const getTokenHost_;
// use device::EDGetToken<T> to read from device memory space
device::EDGetToken<BarProduct> const getTokenDevice_;
// use device::ESGetToken<T, TRecord> to read from device memory space
device::ESGetToken<TestProduct, TestRecord> const esGetTokenDevice_;
// use device::EDPutToken<T> to place the data product in the device memory space
device::EDPutToken<portabletest::TestDeviceCollection> const deviceToken_;
int32_t const size_;
// implementation of the algorithm
TestAlgo algo_;
};
} // namespace ALPAKA_ACCELERATOR_NAMESPACE
#include "HeterogeneousCore/AlpakaCore/interface/MakerMacros.h"
DEFINE_FWK_ALPAKA_MODULE(TestAlpakaProducer);
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaESProducer : public ESProducer {
  public:
    ExampleAlpakaESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    // return type can be
    // - std::optional<T> (T is cheap to move),
    // - std::unique_ptr<T> (T is not cheap to move),
    // - std::shared_ptr<T> (allows sharing between IOVs)
    //
    // the base class registers a separately scheduled function to copy the product to device memory
    // the function calls the
    // cms::alpakatools::CopyToDevice<SimpleProduct>::copyAsync(Queue&, SimpleProduct const&)
    // function
    std::optional<SimpleProduct> produce(TestRecord const& iRecord) {
      // get input data
      auto const& hostInput = iRecord.get(token_);

      // allocate data product on the host memory
      SimpleProduct hostProduct;
      // fill the hostProduct from hostInput

      return hostProduct;
    }

  private:
    edm::ESGetToken<TestProduct, TestRecord> token_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaESProducer);
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaDeriveESProducer : public ESProducer {
  public:
    ExampleAlpakaDeriveESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    std::optional<OtherProduct> produce(device::Record<TestRecord> const& iRecord) {
      // get input data in the device memory space
      auto const& deviceInput = iRecord.get(token_);

      // allocate data product on the device memory
      OtherProduct deviceProduct(iRecord.queue());

      // run the algorithm, potentially asynchronously
      algo_.fill(iRecord.queue(), deviceInput, deviceProduct);

      // return the product without waiting
      return deviceProduct;
    }

  private:
    device::ESGetToken<SimpleProduct, TestRecord> token_;

    OtherAlgo algo_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaDeriveESProducer);
There are a few different options for using Alpaka-based modules in the CMSSW configuration.

In all cases the configuration must load the necessary `ProcessAccelerator` objects (see below). For accelerators used in production, these are aggregated in `Configuration.StandardSequences.Accelerators_cff`. `runTheMatrix.py` handles the loading of this `Accelerators_cff` automatically. The HLT menus also load the necessary `ProcessAccelerator`s.
## Load explicitly
# One ProcessAccelerator for each accelerator technology, plus a generic one for Alpaka
process.load("Configuration.StandardSequences.Accelerators_cff")
The Alpaka modules can be used in the python configuration with their explicit, full type names
process.producerCPU = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...)
process.producerGPU = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
Obviously this kind of configuration can be run only on machines that provide the necessary hardware. The configuration is thus explicitly non-portable.
A step towards a portable configuration is to use the `SwitchProducer` mechanism, for which currently the only concrete implementation is `SwitchProducerCUDA`. The modules for the different Alpaka backends still need to be specified explicitly
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
process.producer = SwitchProducerCUDA(
cpu = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...),
cuda = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
)
# or
process.producer = SwitchProducerCUDA(
cpu = cms.EDAlias(producerCPU = cms.EDAlias.allProducts()),
cuda = cms.EDAlias(producerGPU = cms.EDAlias.allProducts())
)
This kind of configuration can be run on any machine that a given CMSSW build supports, but it is limited to CMSSW builds where the modules for all the Alpaka backends declared in the configuration can be built (`alpaka_serial_sync` and `alpaka_cuda_async` in this example). Therefore the `SwitchProducer` approach is here called "semi-portable".
A fully portable way to express a configuration can be achieved with the "module type resolver" approach. The module is specified in the configuration without the backend-specific namespace, and with the `@alpaka` postfix
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka", ...)
# backend can also be set explicitly
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
...
alpaka = cms.untracked.PSet(
backend = cms.untracked.string("serial_sync")
)
)
The `@alpaka` postfix in the module type tells the system that the module's exact class type should be resolved at run time. The type (or backend) is set according to the value of `process.options.accelerators` and the set of accelerators available in the machine. If the backend is set explicitly in the module's `alpaka` PSet, the module of that backend will be used.

This approach is portable also across CMSSW builds that support different sets of accelerators, as long as only the host backends (if any) are specified explicitly in the `alpaka` PSet.

The backend set explicitly for an individual module must be one of those allowed by the job-wide `process.options.accelerators` setting, and it overrides the `ProcessAcceleratorAlpaka` default described below.
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
...
alpaka = cms.untracked.PSet(
backend = cms.untracked.string("serial_sync") # or "cuda_async" or "rocm_async"
)
)
The default backend for all Alpaka modules in the job can be set via `ProcessAcceleratorAlpaka`, and the job can be restricted to specific accelerator hardware via `process.options.accelerators`. The `ProcessAcceleratorAlpaka` default must again be one of the backends allowed by `process.options.accelerators`, and it can be overridden for individual modules as described above.
process.ProcessAcceleratorAlpaka.setBackend("serial_sync") # or "cuda_async" or "rocm_async"
process.options.accelerators = ["cpu"] # or "gpu-nvidia" or "gpu-amd"
While the general approach is to favor asynchronous operations with non-blocking synchronization, for testing purposes it can be useful to synchronize the EDModule's `acquire()` / `produce()` functions or the ESProducer's production functions in a blocking way. Such blocking synchronization can be specified for individual modules via the `alpaka` PSet along the lines of
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka",
...
alpaka = cms.untracked.PSet(
synchronize = cms.untracked.bool(True)
)
)
The blocking synchronization can be specified for all Alpaka modules via the `ProcessAcceleratorAlpaka` along the lines of
process.ProcessAcceleratorAlpaka.setSynchronize(True)
Note that the per-module parameter, if set, overrides this global setting.
Unit tests that depend on Alpaka and define `<flags ALPAKA_BACKENDS="1"/>`, e.g. as a binary along the lines of
<bin name="<unique test binary name>" file="<comma-separated list of files>">
<use name="alpaka"/>
<flags ALPAKA_BACKENDS="1"/>
</bin>
or as a command (e.g. cmsRun
or a shell script) to run
<test name="<unique name of the test>" command="<command to run>">
<use name="alpaka"/>
<flags ALPAKA_BACKENDS="1"/>
</test>
will be run as part of `scram build runtests` according to the availability of the hardware:
- the `serial_sync` version is run always
- the `cuda_async` version is run if an NVIDIA GPU is present (i.e. `cudaIsEnabled` returns 0)
- the `rocm_async` version is run if an AMD GPU is present (i.e. `rocmIsEnabled` returns 0)
Tests for a specific backend (or hardware) can be explicitly requested by setting the `USER_UNIT_TESTS=cuda` or `USER_UNIT_TESTS=rocm` environment variable. In that case, tests not depending on the hardware are skipped, and if the corresponding hardware is not available, the tests will fail.
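For example, to run only the tests that require an NVIDIA GPU:

```
USER_UNIT_TESTS=cuda scram build runtests
```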