I am trying to add a macro to AMDGPU similar to the one that exists in CUDA.jl, `CUDA.@sync`.
One reason is that what AMDGPU currently returns for `AMDGPU.@sync` seems to differ from the expected behaviour.
While implementing the macro, taking inspiration from CUDA.jl, I am facing some issues. First, `import Base: @sync` appears to be required, which does not seem to be needed in CUDA.jl. Second, should the `quote` block not rather be ordered so that one syncs after `ret`?
With the MWE I am testing the syncing on, the correct behaviour can be obtained by commenting out the `AMDGPU.@sync` macro and syncing after the kernel with `AMDGPU.synchronize` (or by syncing the specific stream). This gives the expected performance on MI250x, whereas with the macro in place the reported performance is erroneous.
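To make the ordering question concrete, here is a minimal GPU-free sketch of a macro that evaluates the wrapped expression, synchronizes afterwards, and then returns the expression's value. It is not the AMDGPU or CUDA.jl implementation: `fake_synchronize`, `@sync_after`, `launch` and `EVENTS` are hypothetical names, with `fake_synchronize` standing in for a device call such as `AMDGPU.synchronize`.

```julia
# Records the order in which things happen, so the ordering can be inspected.
const EVENTS = String[]

# Hypothetical stand-in for a device synchronization such as AMDGPU.synchronize.
fake_synchronize() = (push!(EVENTS, "synchronize"); nothing)

# Hypothetical macro name, chosen so it does not clash with Base.@sync.
macro sync_after(ex)
    quote
        local ret = $(esc(ex))   # run the wrapped expression first
        fake_synchronize()       # then wait for outstanding work
        ret                      # finally return the expression's value
    end
end

# Hypothetical stand-in for a kernel launch that also returns a value.
launch() = (push!(EVENTS, "launch"); 42)

result = @sync_after launch()
```

With this ordering, the synchronization is recorded after the launch and the macro still evaluates to the value of the wrapped expression, which is the behaviour the question about syncing after `ret` is asking for.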