Fix CSR query issue to enable new CSR protocol #200
Newest sycl-l3 run shows no additional errors: https://spetc.intel.com/testsummary?testRunIds=7216415
@pcolberg @zibaiwan I've described the underlying root cause of the issue in the main comment above. This else statement is sufficient to fix the issue. There may be some runtime inefficiencies where we create contexts and query the auto-discovery string and CSR version too many times, but that would require more investigation and is likely out of scope for this change. This change is ready for review now.
Thanks @ericxu233. I think the else statement makes sense. I would suggest adding a bit more detail in the code comments to explain why the old CSR version can be queried: the runtime calls try_device to load the default aocx, or whatever aocx was programmed on the board before.
Thanks @ericxu233 !
Thanks @ericxu233!
The SYCL runtime queries device info before we even program our aocx. In our OpenCL runtime, the function clGetDeviceInfoIntelFPGA() is called whenever a device property is queried. Some properties require the runtime to create an OpenCL context, which queries the auto-discovery string and CSR version from our FPGA device. So what happens in the code is that SYCL queries 7 device properties that each trigger a clCreateContext, which in turn queries the auto-discovery string and kernel CSR. Note that this device info querying stage happens before the runtime programs the .aocx bitstream compiled by our compiler. Therefore, when we first run the .exe program after doing an "aocl initialize acl0 pac_a10", the device info querying stage queries the kernel information of the default aocx provided by the BSP (on pac_a10 it is called "pac_a10.aocx"). That aocx was compiled with a very old version of the compiler, which explains why we initially get CSR version 4.
In hardware runs, it seems that there are some dummy kernels present that interfere with the CSR check (the dummy kernels have a CSR version of 4). When the runtime checks the CSR version for the dummy default kernels, it overwrites the default cra_address_offset, which causes errors with the new compiler change. Adding this "else" statement makes sure that cra_address_offset is set correctly. For the current runtime, everything still works since the issue is not with backwards compatibility.
My previous sycl-l3 runs escaped this issue because I did not enable runtime backwards compatibility support back then.
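To illustrate the failure mode described above, here is a minimal sketch, not the actual runtime code. The struct, the function name, the version threshold, and the offset values are all hypothetical placeholders; only the name cra_address_offset and the observed CSR version 4 come from this discussion. It assumes the offset is chosen from the queried CSR version, and shows why an explicit else branch is needed once an old version has already been seen on the same device.

```cpp
#include <cstdint>

// Hypothetical placeholder values; the real constants live in the runtime.
constexpr std::uint32_t kLegacyCraOffset = 0x0;
constexpr std::uint32_t kNewCraOffset = 0x8;
constexpr std::uint32_t kFirstNewCsrVersion = 5;  // assumption: version 4 is "old"

// Hypothetical stand-in for the per-device kernel interface state.
struct KernelInterface {
  std::uint32_t cra_address_offset = kNewCraOffset;
};

// Called every time the CSR version is queried, for example once for the
// BSP default aocx and again after the user's aocx has been programmed.
void update_cra_offset(KernelInterface &kern, std::uint32_t csr_version) {
  if (csr_version < kFirstNewCsrVersion) {
    // Old kernels (such as the dummy default kernels) use the legacy offset.
    kern.cra_address_offset = kLegacyCraOffset;
  } else {
    // Without this branch, the legacy offset written by an earlier query
    // would persist after a newer aocx is programmed.
    kern.cra_address_offset = kNewCraOffset;
  }
}
```

In this sketch, querying version 4 first and a newer version afterwards leaves the offset at the new value, which is the behaviour the else statement is meant to guarantee.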