Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cl_ext_buffer_device_address #1159

Merged
merged 5 commits into from
Feb 25, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 80 additions & 0 deletions api/cl_ext_buffer_device_address.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
// Copyright 2024 The Khronos Group Inc.
// SPDX-License-Identifier: CC-BY-4.0

include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[]

=== Other Extension Metadata

*Last Modified Date*::
2025-02-04
*IP Status*::
No known IP claims.
*Contributors*::
- Pekka Jääskeläinen, Intel +
- Karol Herbst, Red Hat +
- Ben Ashbaugh, Intel +
- Kevin Petit, Arm +
- Henry Linjamäki, Intel +

=== Description

This extension provides access to raw device pointers for cl_mem buffers
without requiring a shared virtual address space between the host and
the device.

==== Background

Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
that enables raw pointers in the OpenCL standard. Its coarse-grain
variant is relatively simple to implement on various platforms in terms of
coherency requirements, but it requires mapping the buffer's address range
to the host virtual address space.
However, various higher-level heterogeneous APIs present a memory allocation
routine which can allocate device-only memory and provide raw addresses to
it without guarentees of system-wide uniqueness. For example, minimal
implementations of OpenMP's omp_target_alloc() and CUDA/HIP's
cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device.

Host-device unified addressing might not be a major implementation issue in
systems which can provide virtual memory across the platform, but might
bring challenges in cases where the device presents a global memory with
a disjoint address space (that can also be a physical memory address space) or,
for example, when a barebone embedded system lacks virtual memory support altogether.
This extension is targeted to complement the OpenCL SVM extension by providing
an additional lower-end step in the spectrum of type of pointers/buffers OpenCL
can allocate.

=== New Command

* {clSetKernelArgDevicePointerEXT}

=== New Types

* {cl_mem_device_address_EXT}

=== New Enums

* {cl_mem_properties_TYPE}
** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
* {cl_mem_info_TYPE}
** {CL_MEM_DEVICE_ADDRESS_EXT}
* {cl_kernel_exec_info_TYPE}
** {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT}

=== Version History

* Revision 1.0.0, 2025-01-15
** Initial version for detailed review.
* Revision 1.0.1, 2025-01-28
** Made it explicit that passing illegal pointers is legal as long as they are
not referenced. Removed CL_INVALID_ARG_VALUE as a possible error in
clSetKernelArgDevicePointerEXT() as there are no illegal pointer
cases when calling this function. Return CL_INVALID_OPERATION for
clGetMemObjectInfo() if the pointer is not a buffer device pointer.
clSetKernelExecInfo() and clSetKernelArgDevicePointerEXT() now only
error out if no devices in the context associated with kernel support
device pointers.
* Revision 1.0.2, 2025-02-04
** Converted the clSetKernelArgDevicePointerEXT() address parameter to
a value instead of a pointer to the value.

120 changes: 119 additions & 1 deletion api/opencl_runtime_layer.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -595,6 +595,35 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_HANDLE_LIST_KHR.asciidoc[]
{CL_MEM_DEVICE_HANDLE_LIST_END_KHR_anchor}) to associate with the
external memory handle.
endif::cl_khr_external_memory[]

ifdef::cl_ext_buffer_device_address[]

| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}

include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
| {cl_bool_TYPE}
| When set to {CL_TRUE}, specifies that the buffer must have a single fixed
device-side address for its lifetime, and the address can be queried via {clGetMemObjectInfo}.

Each device in the context can have their own (fixed) device-side address and
a copy of the created buffer which are synchronized
implicitly by the runtime.

The flag might imply that the buffer will be "pinned" permanently to
a device's memory, but might not be necessarily so, as long as the address
range of the buffer remains constant.

The device addresses of sub-buffers derived from {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
allocated buffers can be computed by adding the sub-buffer origin to the
device-specific start address.

If the device supports SVM and {clCreateBufferWithProperties} is called with a pointer
returned by {clSVMAlloc} as its _host_ptr_ argument, and {CL_MEM_USE_HOST_PTR} is
set in its _flags_ argument, the device-side address is guaranteed to match
the _host_ptr_.

endif::cl_ext_buffer_device_address[]

|====

ifdef::cl_khr_external_memory[]
Expand Down Expand Up @@ -662,6 +691,12 @@ ifdef::cl_khr_external_memory[]
{CL_MEM_DEVICE_HANDLE_LIST_KHR} is specified as part of _properties_.
** if _properties_ includes more than one external memory handle.
endif::cl_khr_external_memory[]
ifdef::cl_ext_buffer_device_address[]
* {CL_INVALID_OPERATION}
** If _properties_ includes {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there
are no devices in the context that support the {cl_ext_buffer_device_address_EXT}
extension.
endif::cl_ext_buffer_device_address[]

[[memory-flags-table]]
.List of supported memory flag values
Expand Down Expand Up @@ -6463,6 +6498,20 @@ include::{generated}/api/version-notes/CL_MEM_D3D11_RESOURCE_KHR.asciidoc[]
returns the _resource_ argument specified when _memobj_ was created.
endif::cl_khr_d3d11_sharing[]

ifdef::cl_ext_buffer_device_address[]
| {CL_MEM_DEVICE_ADDRESS_EXT_anchor}

include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
| {cl_mem_device_address_EXT_TYPE}[]
| If _memobj_ was created using {clCreateBufferWithProperties} with
the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to {CL_TRUE},
returns a list of device addresses for the buffer, one for each
device in the context in the same order as the list of devices
passed to {clCreateContext}.

endif::cl_ext_buffer_device_address[]


|====

// refError
Expand All @@ -6472,6 +6521,11 @@ successfully.
Otherwise, it returns one of the following errors:

* {CL_INVALID_MEM_OBJECT} if _memobj_ is a not a valid memory object.
ifdef::cl_ext_buffer_device_address[]
* {CL_INVALID_OPERATION} is returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if
the {cl_ext_buffer_device_address_EXT} is not supported or if the
buffer was not allocated with {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}.
endif::cl_ext_buffer_device_address[]
* {CL_INVALID_VALUE} if _param_name_ is not one of the supported values, or
if the size in bytes specified by _param_value_size_ is less than size of
the return type specified in the
Expand Down Expand Up @@ -10778,6 +10832,48 @@ Otherwise, it returns one of the following errors:
required by the OpenCL implementation on the host.
--

ifdef::cl_ext_buffer_device_address[]
[open,refpage='clSetKernelArgDevicePointerEXT',desc='Set a device pointer as the argument value for a specific argument of a kernel.',type='protos']
--
To set a device pointer as the argument value for a specific argument of a
kernel, call the function

include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]

* _kernel_ is a valid kernel object.
* _arg_index_ is the argument index.
Arguments to the kernel are referred by indices that go from 0 for the
leftmost argument to _n_ - 1, where _n_ is the total number of arguments
declared by a kernel.
* _arg_value_ is the device pointer that should be used as the argument value for
argument specified by _arg_index_.
The device pointer specified is the value used by all API calls that enqueue
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument
value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_.
The device pointer can only be used for arguments that are declared to be a
pointer to `global` memory allocated with {clCreateBufferWithProperties} with
the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property. The pointer value specified as
the argument value can be the pointer to the beginning of the buffer or any offset into
the buffer region. The device pointer value must be naturally aligned according to
the argument's type. It should be noted that it's legal to pass invalid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the argument's type. It should be noted that it's legal to pass invalid
the argument's type. It should be noted that it is legal to pass invalid

pointers as the value (similarly to C/C++ function calls with pointer arguments) as
long as the kernel doesn't dereference the pointer.

{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the argument was set
successfully. Otherwise, it returns one of the following errors:

* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support
the {cl_ext_buffer_device_address_EXT} extension.
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index.
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
by the OpenCL implementation on the device.
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
required by the OpenCL implementation on the host.
--
endif::cl_ext_buffer_device_address[]

[open,refpage='clSetKernelExecInfo',desc='Set additional execution information for a kernel.',type='protos']
--
To set additional execution information for a kernel, call the function
Expand Down Expand Up @@ -10844,6 +10940,19 @@ include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
If {clSetKernelExecInfo} has not been called with a value for
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM}, the default value is
{CL_TRUE}.

ifdef::cl_ext_buffer_device_address[]
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor}

include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[]
| {cl_mem_device_address_EXT_TYPE}[]
| Device pointers must reference locations contained entirely within
buffers that are passed to kernel as arguments, or that are passed
through the execution information. Non-argument device pointers accessed
by the kernel must be specified by passing pointers to those buffers
via this {clSetKernelExecInfo} option.
endif::cl_ext_buffer_device_address[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we don't want to require this be called before or after argument setting. We probably ought to test both.


|====

// refError
Expand All @@ -10853,7 +10962,16 @@ successfully.
Otherwise, it returns one of the following errors:

* {CL_INVALID_KERNEL} if _kernel_ is a not a valid kernel object.
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support SVM.
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_SVM_PTRS} and
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} if no devices in
the context associated with _kernel_ support SVM.
ifdef::cl_ext_buffer_device_address[]
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} if no
device in the context associated with _kernel_ support the {cl_ext_buffer_device_address_EXT}
extension.
endif::cl_ext_buffer_device_address[]
* {CL_INVALID_VALUE} if _param_name_ is not valid, if _param_value_ is
`NULL` or if the size specified by _param_value_size_ is not valid.
* {CL_INVALID_OPERATION} if _param_name_ is
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} and _param_value_ is {CL_TRUE}
and no devices in the context associated with _kernel_ support fine-grain
Expand Down
30 changes: 30 additions & 0 deletions xml/cl.xml
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,7 @@ server's OpenCL/api-docs repository.
<type category="define">typedef <type>cl_bitfield</type> <name>cl_platform_command_buffer_capabilities_khr</name>;</type>
<type category="define">typedef <type>cl_bitfield</type> <name>cl_mutable_dispatch_asserts_khr</name></type>
<type category="define">typedef <type>cl_bitfield</type> <name>cl_device_kernel_clock_capabilities_khr</name>;</type>
<type category="define">typedef <type>cl_ulong</type> <name>cl_mem_device_address_ext</name>;</type>

<comment>Structure types</comment>
<type category="struct" name="cl_dx9_surface_info_khr">
Expand Down Expand Up @@ -2315,6 +2316,12 @@ server's OpenCL/api-docs repository.
<unused start="0x42B0" end="0x4FFF"/>
</enums>

<enums start="0x5000" end="0x500F" name="enums.5000" comment="For cl_ext_buffer_device_address">
<enum value="0x5000" name="CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT"/>
<enum value="0x5001" name="CL_MEM_DEVICE_ADDRESS_EXT"/>
<enum value="0x5002" name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
</enums>

<enums start="0x10000" end="0x10FFF" name="enums.10000" vendor="Khronos" comment="Experimental range for internal development only. Do not allocate.">
<!-- Khronos will never assign values in this range, and vendors
should never ship using values in this range. It is intended
Expand Down Expand Up @@ -3730,6 +3737,12 @@ server's OpenCL/api-docs repository.
<param><type>cl_uint</type> <name>arg_index</name></param>
<param>const <type>void</type>* <name>arg_value</name></param>
</command>
<command suffix="CL_API_SUFFIX__VERSION_3_0">
<proto><type>cl_int</type> <name>clSetKernelArgDevicePointerEXT</name></proto>
<param><type>cl_kernel</type> <name>kernel</name></param>
<param><type>cl_uint</type> <name>arg_index</name></param>
<param><type>cl_mem_device_address_ext</type> <name>arg_value</name></param>
</command>
<command suffix="CL_API_SUFFIX__VERSION_2_0">
<proto><type>cl_int</type> <name>clSetKernelExecInfo</name></proto>
<param><type>cl_kernel</type> <name>kernel</name></param>
Expand Down Expand Up @@ -7191,6 +7204,23 @@ server's OpenCL/api-docs repository.
<command name="clSetContentSizeBufferPoCL"/>
</require>
</extension>
<extension name="cl_ext_buffer_device_address" revision="1.0.2" supported="opencl" depends="CL_VERSION_3_0">
<require>
<type name="cl_mem_device_address_ext"/>
</require>
<require>
<command name="clSetKernelArgDevicePointerEXT"/>
</require>
<require comment="cl_mem_properties">
<enum name="CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT"/>
</require>
<require comment="cl_mem_info">
<enum name="CL_MEM_DEVICE_ADDRESS_EXT"/>
</require>
<require comment="cl_kernel_exec_info">
<enum name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
</require>
</extension>
<extension name="cl_khr_command_buffer" revision="0.9.7" supported="opencl" depends="CL_VERSION_1_2" ratified="opencl" provisional="true">
<require>
<type name="CL/cl.h"/>
Expand Down