Fix CSR query issue to enable new CSR protocol #200

Merged
merged 1 commit into from
Nov 18, 2022


@ericxu233 (Contributor) commented Nov 16, 2022

The SYCL runtime queries device info before we even program our .aocx. In our OpenCL runtime, clGetDeviceInfoIntelFPGA() is called whenever a device property is queried. Some properties require the runtime to create an OpenCL context, which queries the auto-discovery string and CSR version from the FPGA device. During startup, SYCL queries 7 such device properties, each of which triggers a clCreateContext call that reads the auto-discovery string and kernel CSR version.

Note that this device-info querying stage happens before the runtime programs the .aocx bitstream compiled by our compiler. Therefore, when we first run the .exe after running "aocl initialize acl0 pac_a10", the querying stage reads kernel information from the default .aocx provided by the BSP (on pac_a10 it is called "pac_a10.aocx"). That .aocx was compiled with a very old version of the compiler, which explains why we initially see CSR version 4.
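
The ordering problem can be illustrated with a small model. This is a sketch under stated assumptions, not runtime code: `Board`, `record_csr_on_context_create`, `run_startup`, and the new-compiler CSR version `5` are all hypothetical. Only the 7 startup queries, the default `pac_a10.aocx`, and its CSR version 4 come from the description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a board: whichever .aocx is currently programmed
// determines the CSR version that a context creation reads back.
struct Board {
  std::string programmed_aocx;
  int csr_version;
};

// Hypothetical stand-in for a context creation triggered by a device-info
// query: it records the CSR version of whatever image is on the board now.
void record_csr_on_context_create(const Board &board, std::vector<int> &observed) {
  observed.push_back(board.csr_version);
}

// Models the startup order described above: 7 device-info queries run first,
// and only afterwards does the runtime program the user's .aocx.
std::vector<int> run_startup(Board &board, const std::string &user_aocx,
                             int user_csr_version) {
  std::vector<int> observed;
  for (int i = 0; i < 7; ++i) {
    record_csr_on_context_create(board, observed);  // still the BSP default image
  }
  board.programmed_aocx = user_aocx;  // program the user's bitstream
  board.csr_version = user_csr_version;
  record_csr_on_context_create(board, observed);  // now sees the new image
  return observed;
}
```

Starting from `Board{"pac_a10.aocx", 4}`, every one of the first 7 recorded versions is 4, and only the check after programming sees the newer version.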

In hardware runs, there appear to be some dummy kernels present that interfere with the CSR check (the dummy kernels have CSR version 4). When the runtime checks the CSR version of these default dummy kernels, the check overwrites the default cra_address_offset, causing errors with the new compiler change. Adding this "else" statement ensures that cra_address_offset is set correctly.
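
The effect of the missing "else" can be sketched as follows. This is a minimal illustration, not the actual runtime code: the struct, both function names, the version threshold, and the two offset values are hypothetical placeholders, since the real constants are not given in this PR.

```cpp
#include <cassert>

// Hypothetical constants for illustration only; the real threshold and
// offsets live in the runtime source and are not stated in this PR.
constexpr int kLegacyCsrVersion = 4;
constexpr unsigned kLegacyCraAddressOffset = 0;
constexpr unsigned kDefaultCraAddressOffset = 8;

// Hypothetical stand-in for the per-device kernel interface state.
struct KernelInterface {
  unsigned cra_address_offset = kDefaultCraAddressOffset;
};

// Without the else: once a legacy (dummy) image is checked, the legacy
// offset sticks, even after a newer image is checked on the same object.
void check_csr_version_without_else(KernelInterface &kif, int csr_version) {
  if (csr_version <= kLegacyCsrVersion) {
    kif.cra_address_offset = kLegacyCraAddressOffset;
  }
}

// With the else: every check sets the offset explicitly, so checking the
// newly programmed image restores the default offset.
void check_csr_version_with_else(KernelInterface &kif, int csr_version) {
  if (csr_version <= kLegacyCsrVersion) {
    kif.cra_address_offset = kLegacyCraAddressOffset;
  } else {
    kif.cra_address_offset = kDefaultCraAddressOffset;
  }
}
```

Checking version 4 (the dummy kernels) and then a newer version leaves the legacy offset in place in the first variant, while the second variant ends with the default offset.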

With the current runtime, everything still works, since the issue does not concern backwards compatibility.

My previous sycl-l3 runs escaped this issue because I had not enabled runtime backwards-compatibility support at the time.

@pcolberg pcolberg added the bug Something isn't working label Nov 17, 2022
@pcolberg pcolberg added this to the 2023.1 milestone Nov 17, 2022
@ericxu233 (Contributor Author)

Newest sycl-l3 run shows no additional errors: https://spetc.intel.com/testsummary?testRunIds=7216415

@ericxu233 (Contributor Author)

@pcolberg @zibaiwan I've described the underlying root cause of the issue in the main comment above. This else statement is sufficient to fix the issue. There may be some runtime inefficiencies where we create contexts and query the auto-discovery string and CSR version too many times, but that would require more investigation and is likely out of scope for this change. This change is ready for review now.

@zibaiwan (Contributor) left a comment


Thanks @ericxu233. I think the else statement makes sense. I would suggest adding a bit more detail to the code comments explaining why an old CSR version can be queried: the runtime calls try_device, which loads the default aocx or whatever aocx was previously programmed on the board.

@zibaiwan previously approved these changes Nov 18, 2022

@zibaiwan (Contributor) left a comment


Thanks @ericxu233 !

@pcolberg (Contributor) left a comment


Thanks @ericxu233!

@pcolberg pcolberg merged commit 385bd9c into intel:main Nov 18, 2022