Data race in kernel packet writing? #121

Closed
jpsamaroo opened this issue Mar 30, 2021 · 0 comments · Fixed by #125
Labels
bug Something isn't working

Comments

@jpsamaroo
Member

We sometimes see errors like this:

Queue at 0x7fa2351f2000 inactivated due to async error:
HSA_STATUS_ERROR_INVALID_PACKET_FORMAT: The AQL packet is malformed.

I suspect our kernel launch procedure is not quite correct, and that in some situations the CPU reorders the stores to the packet buffer, so the packet can look malformed when it is read. We may also be able to catch these errors with a queue callback and then kill the queue, which would produce a nicer error and prevent a hang.
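To illustrate the kind of ordering that would rule out such a race, here is a minimal sketch in C11 (not this project's code). It assumes the usual publish pattern: write the packet body with plain stores, then write the header last with a release store, so a consumer that sees the new header also sees the body. The struct `demo_packet_t`, the `DEMO_TYPE_*` tags, and the `demo_*` functions are all hypothetical stand-ins and do not reproduce the real AQL packet layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one slot in the packet ring buffer.
   This is NOT the real AQL packet layout. */
typedef struct {
    _Atomic uint16_t header;      /* packet type; the consumer polls this */
    uint16_t workgroup_size_x;
    uint32_t grid_size_x;
    uint64_t kernel_object;
    uint64_t kernarg_address;
} demo_packet_t;

/* Hypothetical type tags used only in this sketch. */
enum { DEMO_TYPE_INVALID = 1, DEMO_TYPE_KERNEL_DISPATCH = 2 };

/* Mark a slot as not yet consumable. */
static void demo_reset(demo_packet_t *slot) {
    atomic_store_explicit(&slot->header, DEMO_TYPE_INVALID,
                          memory_order_relaxed);
}

/* Fill the body first, then publish the header. */
static void demo_publish(demo_packet_t *slot, uint64_t kernel_object,
                         uint64_t kernarg_address, uint32_t grid_x,
                         uint16_t wg_x) {
    assert(slot != NULL);

    /* Step 1: plain stores to every field except the header. */
    slot->workgroup_size_x = wg_x;
    slot->grid_size_x = grid_x;
    slot->kernel_object = kernel_object;
    slot->kernarg_address = kernarg_address;

    /* Step 2: release store of the header. The stores above cannot be
       reordered after this one, so the header never becomes valid
       before the body is complete. */
    atomic_store_explicit(&slot->header, DEMO_TYPE_KERNEL_DISPATCH,
                          memory_order_release);
}

/* Consumer-side read of the type tag, paired with the release store. */
static uint16_t demo_type(demo_packet_t *slot) {
    return atomic_load_explicit(&slot->header, memory_order_acquire);
}
```

If the header were written with a plain store, or written before the body, nothing would stop a reader from seeing a valid type tag next to stale fields, which would be consistent with the error above.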

@jpsamaroo jpsamaroo added the bug Something isn't working label Mar 30, 2021