Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cl_ext_buffer_device_address #1159

Merged
merged 5 commits into from
Feb 25, 2025

Conversation

pjaaskel
Copy link
Contributor

@pjaaskel pjaaskel commented May 7, 2024

The basic cl_mem buffer API doesn't enable access to the underlying raw pointers in the device memory, preventing its use in host side data structures that need pointer references to objects. This API adds a minimal increment on top of cl_mem that provides such capabilities.

The version 0.1.0 is implemented in PoCL and rusticl for prototyping, but everything's still up for discussion. chipStar is the first client that uses the API.

@karolherbst
Copy link
Contributor

any motivation to get this merged? Or anything else needed to discuss before merging this? Could also try to bring it up at the CL WG if needed.

@pjaaskel
Copy link
Contributor Author

any motivation to get this merged? Or anything else needed to discuss before merging this? Could also try to bring it up at the CL WG if needed.

Yep, this is still being discussed in the WG. I personally think it's useful as is and shouldn't harm anything if merged as it even has 2 implementations now.

@pjaaskel
Copy link
Contributor Author

Thanks @SunSerega

@SunSerega
Copy link
Contributor

Alright, and now the problem I found in #1171 is visible here because the Presubmit workflow has been properly rerun.

@karolherbst

This comment was marked as resolved.

@pjaaskel
Copy link
Contributor Author

pjaaskel commented Sep 4, 2024

One thing I'm wondering about is how should clCreateSubBuffer behave when being executed on a bda memory object? Should it fail or succeed? I'm fine with either, just wondering if some of the behavior needs to be formalized.

CL_MEM_DEVICE_PTR_EXT and CL_MEM_DEVICE_PTRS_EXT comes to my mind, which should probably return the address + the sub buffer offset, at least that's how I have implemented it so far.

Yes, this was the idea. I'll add a mention in the next update.

@pjaaskel
Copy link
Contributor Author

pjaaskel commented Sep 9, 2024

Updated according to @karolherbst comments.

Copy link
Contributor

@karolherbst karolherbst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already implemented the extension fully in rusticl/mesa (including sharing the same address across devices) and I think it's fine, however I'd still urge to address the concerns I have for layered implementations implementing it on top of Vulkan. I've already considered the constraints when implementing it, however I think it's better to provide clients to query if address sharing across multiple devices is supported or not.

@pjaaskel pjaaskel force-pushed the cl_ext_buffer_device_address branch from b8df46b to 1931416 Compare September 24, 2024 12:26
@pjaaskel
Copy link
Contributor Author

@SunSerega thanks!

@pjaaskel
Copy link
Contributor Author

@karolherbst I asked about this in the CL/memory WG. We need to submit CTS tests and this might be good to go then with this one. Do you have good tests in Rusticl side we could use? The test in PoCL is quite basic (and needs to be updated), but can be used as a starting point also.

@karolherbst
Copy link
Contributor

@karolherbst I asked about this in the CL/memory WG. We need to submit CTS tests and this might be good to go then with this one. Do you have good tests in Rusticl side we could use? The test in PoCL is quite basic (and needs to be updated), but can be used as a starting point also.

I haven't written any more tests and only used the one in pocl. But I can probably help out with writing tests here.

Copy link
Contributor

@kpet kpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this proposal!

@pjaaskel pjaaskel force-pushed the cl_ext_buffer_device_address branch from 9ad04fb to edcff73 Compare December 6, 2024 15:53
@pjaaskel
Copy link
Contributor Author

pjaaskel commented Dec 6, 2024

Thanks @kpet for the feedback. Implemented most of it.

@pjaaskel pjaaskel force-pushed the cl_ext_buffer_device_address branch from edcff73 to c07ff8f Compare December 6, 2024 16:06
@pjaaskel pjaaskel requested review from karolherbst and kpet December 6, 2024 16:06
@pjaaskel
Copy link
Contributor Author

pjaaskel commented Dec 6, 2024

After @karolherbst and @kpet are happy with this, we'll implement in PoCL and @franz will add a CTS pull request. Then we can mark this 1.0.0 and merge, I think.

@pjaaskel
Copy link
Contributor Author

I cleaned up the commit history and the history description in the docs and upped it to 1.0.0. The headers generated OK. IMHO we could merge this in and update the CTS next.

Copy link
Contributor

@karolherbst karolherbst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through it again as I'm updating my implementation. I left a nit and a comment, but the later shouldn't block this.

@pjaaskel
Copy link
Contributor Author

Good points @bashbaug, I changed these and made it rev 1.0.1. Pinging @karolherbst @franz

The basic cl_mem buffer API doesn't enable access to the underlying raw
pointers in the device memory, preventing its use in host side
data structures that need pointer references to objects. This API
adds a minimal increment on top of cl_mem that provides such
capabilities.
The only content addition since the previous version is

"If the device supports SVM and {clCreateBufferWithProperties} is
 called with a pointer returned by {clSVMAlloc} as its _host_ptr_
 argument, and {CL_MEM_USE_HOST_PTR} is set in its _flags_ argument,
 the device-side address is guaranteed to match the _host_ptr."
* Made it explicit that passing illegal pointers is legal as long
as they are not referenced.
* Removed CL_INVALID_ARG_VALUE as a possible error in
clSetKernelArgDevicePointerEXT() as there are no illegal pointer
cases when calling this function. Return CL_INVALID_OPERATION for
clGetMemObjectInfo() if the pointer is not a buffer device
pointer.
* clSetKernelExecInfo() and clSetKernelArgDevicePointerEXT() now
only error out if no devices in the context associated with kernel
support device pointers.
Converted the clSetKernelArgDevicePointerEXT() address parameter to
a value instead of a pointer to the value.
@pjaaskel pjaaskel force-pushed the cl_ext_buffer_device_address branch from 2ef0476 to 8d13bfa Compare February 4, 2025 12:57
@pjaaskel
Copy link
Contributor Author

pjaaskel commented Feb 4, 2025

I believe I've now addressed the feedback. @franz pls check that this 1.0.2 matches the PoCL implementation and the CTS.

@bashbaug
Copy link
Contributor

bashbaug commented Feb 4, 2025

Would it be possible to create a draft PR with headers for this extension, generated from the XML file in this PR? Sometimes I've found that that can be a good way to identify XML issues, and we'll need to do this anyhow before merging the CTS changes.

@pjaaskel
Copy link
Contributor Author

pjaaskel commented Feb 4, 2025

Ah, I was planning to simply attach the generated cl_ext.h but forgot. Would this work:
cl_ext.h.gz

@karolherbst
Copy link
Contributor

Would it be possible to create a draft PR with headers for this extension, generated from the XML file in this PR? Sometimes I've found that that can be a good way to identify XML issues, and we'll need to do this anyhow before merging the CTS changes.

I can attach a diff, because I'll need one for mesa anyway.

@bashbaug
Copy link
Contributor

bashbaug commented Feb 6, 2025

Here's a draft PR with header changes generated from the XML file in this PR: KhronosGroup/OpenCL-Headers#273

@bashbaug
Copy link
Contributor

Discussed in the February 11th teleconference. Are these spec changes ready to go? In order to merge the CTS changes we need to merge the header changes first, and in order to merge the header changes we really ought to merge the XML changes first, which is included in this PR.

There are a few unresolved conversations in the PR, but I think many of them are close to a resolution.

@pjaaskel
Copy link
Contributor Author

This is ready to go as far as I'm concerned. I forgot to resolve the conversations, but I addressed those in the latest commits.

@bashbaug
Copy link
Contributor

Great. @kpet you still have some requested changes, but I think they've been addressed?

Let's see if we can get this merged in next week's call, if not before.

If we can get a review approval on the headers then we can merge the headers, also. KhronosGroup/OpenCL-Headers#273

Copy link
Contributor

@kpet kpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This final version is much nicer :).

the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property. The pointer value specified as
the argument value can be the pointer to the beginning of the buffer or any offset into
the buffer region. The device pointer value must be naturally aligned according to
the argument's type. It should be noted that it's legal to pass invalid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the argument's type. It should be noted that it's legal to pass invalid
the argument's type. It should be noted that it is legal to pass invalid

through the execution information. Non-argument device pointers accessed
by the kernel must be specified by passing pointers to those buffers
via this {clSetKernelExecInfo} option.
endif::cl_ext_buffer_device_address[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we don't want to require this be called before or after argument setting. We probably ought to test both.

@bashbaug
Copy link
Contributor

Merging as discussed in the February 25th teleconference.

@bashbaug bashbaug merged commit f8f1c3c into KhronosGroup:main Feb 25, 2025
2 checks passed
aharon-abramson pushed a commit to aharon-abramson/OpenCL-Docs that referenced this pull request Feb 27, 2025
* cl_ext_buffer_device_address

The basic cl_mem buffer API doesn't enable access to the underlying raw
pointers in the device memory, preventing its use in host side
data structures that need pointer references to objects. This API
adds a minimal increment on top of cl_mem that provides such
capabilities.

* BDA: Removed CL_MEM_DEVICE_SHARED_ADDRESS_EXT as unneeded.

Also made the enums globally unique.

* cl_ext_buffer_device_address to 1.0.0

The only content addition since the previous version is

"If the device supports SVM and {clCreateBufferWithProperties} is
 called with a pointer returned by {clSVMAlloc} as its _host_ptr_
 argument, and {CL_MEM_USE_HOST_PTR} is set in its _flags_ argument,
 the device-side address is guaranteed to match the _host_ptr."

* cl_ext_buffer_device_address: Revision 1.0.1

* Made it explicit that passing illegal pointers is legal as long
as they are not referenced.
* Removed CL_INVALID_ARG_VALUE as a possible error in
clSetKernelArgDevicePointerEXT() as there are no illegal pointer
cases when calling this function. Return CL_INVALID_OPERATION for
clGetMemObjectInfo() if the pointer is not a buffer device
pointer.
* clSetKernelExecInfo() and clSetKernelArgDevicePointerEXT() now
only error out if no devices in the context associated with kernel
support device pointers.

* cl_ext_buffer_device_address: Revision 1.0.2

Converted the clSetKernelArgDevicePointerEXT() address parameter to
a value instead of a pointer to the value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants