Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-threading support refactor #152

Merged
merged 1 commit into from
Oct 6, 2022

Conversation

ericxu233
Copy link
Contributor

First stage of the multi-threading support refactor by providing a wrapper mutex class around the current routines and use std::loack_guard() to enable RAII practices.

@ericxu233 ericxu233 requested a review from pcolberg July 26, 2022 20:11
@ericxu233 ericxu233 marked this pull request as draft July 26, 2022 20:11
@ericxu233
Copy link
Contributor Author

Note: Using std::lock_guard() since std::scope_lock() requires the mutex to do std::try_lock() which our underlying implementation doesn't support.

@ericxu233 ericxu233 force-pushed the MT-support-refactor branch from 2fb1124 to 1a57d60 Compare July 26, 2022 20:19
@pcolberg pcolberg requested a review from zibaiwan July 26, 2022 20:46
@ericxu233
Copy link
Contributor Author

Preliminary hardware testing done. Now doing bulk change.

@ericxu233
Copy link
Contributor Author

ericxu233 commented Aug 10, 2022

Klockwork is failing for some reason...not sure why...

174 (Local) /__w/fpga-runtime-for-opencl/fpga-runtime-for-opencl/test/acl_program_test.cpp:394 ABV.STACK (1:Critical) Analyze
Array 'this->m_context->device' of size 128 may use index value(s) 0..MAX
  * acl_program_test.cpp:394: 'm_context' is passed as argument '$1->num_devices' to function 'clCreateProgramWithBuiltInKernels'.
  * acl_program_test.cpp:394: Index value(s) 0..MAX may be used to access array 'this->m_context->device' of size 128 while calling 'clCreateProgramWithBuiltInKernels'.
    * acl_program_test.cpp:394: Result of expression '($1-><id>_cl_context::num_devices</id>)+(-1)' is >= 0.
      * acl_program_test.cpp:394: Function argument '$1->num_devices' is >= 1.
    * acl_program_test.cpp:394: Value(s) >= 0 is used as violation index in function 'clCreateProgramWithBuiltInKernels'.
      * acl_program.cpp:719: 'context' is passed as argument '$1->num_devices' to function 'clCreateProgramWithBuiltInKernelsIntelFPGA'.
        * acl_program.cpp:636: 'context' is passed as argument '$1->num_devices' to function 'acl_context_uses_device'.
          * acl_context.cpp:454: Array 'context->device' of size 128 is used.
Current status 'Analyze'

Summary: 1 Local
1 Total Issue(s)

@zibaiwan @pcolberg Have any ideas on this issue? I did not modify the code highlighted by klockwork...

@pcolberg
Copy link
Contributor

pcolberg commented Aug 11, 2022

@zibaiwan @pcolberg Have any ideas on this issue? I did not modify the code highlighted by klockwork...

This is a tricky one. Have you tried adding debug printfs to the existing lock functions,

void acl_lock() {
l_init_once();
if (acl_global_lock_count == 0) {
acl_acquire_condvar(&l_acl_global_condvar);
}
acl_global_lock_count++;
}
void acl_unlock() {
acl_assert_locked();
acl_global_lock_count--;
if (acl_global_lock_count == 0) {
acl_release_condvar(&l_acl_global_condvar);
}
}

as well as the new lock functions,

void aclMutex::lock() { acl_lock(); }
void aclMutex::unlock() { acl_unlock(); }

and running the test in question to see whether the locking behaviour differs from before the refactoring?

export AOCL_BOARD_PACKAGE_ROOT=$(realpath test/board/a10_ref)
./acl_test -v -g acl_program -n create_from_builtin

@ericxu233
Copy link
Contributor Author

On a unrelated note to the previous issue: acl_condition_lock_guard will not be templated for now since I can't seem to get it work. Will add improvements later.

@ericxu233
Copy link
Contributor Author

Indeed found that the locking and locking are different...but maybe due to another thing (I didn't refactor acl_test yet). Will do more testings later...

@ericxu233
Copy link
Contributor Author

The klockwork issue is not related to locking. Confirmed that the locking pattern is the same with the test failing.
Although, there is the new code change has these extra outputs for some reason.
image

@ericxu233
Copy link
Contributor Author

clCreateProgramWithBuiltInKernels(m_context, 1, &m_device[0], "hello", &status);
The klockwork issue is not recognizing the second argument 1 as passing? There is a risk in clCreateProgramWithBuiltInKernels since the num_devices argument (the second argument) is a 32 bit integer but context->devices only is a 128 size array. This makes this portion of the function very risky:
image

@ericxu233
Copy link
Contributor Author

clCreateProgramWithBuiltInKernels(m_context, 1, &m_device[0], "hello", &status); The klockwork issue is not recognizing the second argument 1 as passing? There is a risk in clCreateProgramWithBuiltInKernels since the num_devices argument (the second argument) is a 32 bit integer but context->devices only is a 128 size array. This makes this portion of the function very risky: image

Changing the function arguments from 1 to 0 fixed the klockwork issue...very weird. (This breaks the unit test tho)

@zibaiwan
Copy link
Contributor

Klocwork seemed to have passed with the latest fix.

@ericxu233 ericxu233 force-pushed the MT-support-refactor branch from 8f79bab to 745ca1f Compare August 26, 2022 15:38
@ericxu233 ericxu233 marked this pull request as ready for review August 26, 2022 15:38
@ericxu233 ericxu233 requested review from pcolberg and zibaiwan August 26, 2022 15:39
@ericxu233
Copy link
Contributor Author

Testing done. No runtime errors introduced with sycl-l3 test runs.

@zibaiwan
Copy link
Contributor

Testing done. No runtime errors introduced with sycl-l3 test runs.

@ericxu233 , thanks for doing the testing. If not already done, please run the ocl test as well. You can find the test set in the official FullCycle-oneAPI-ocl-weekly-lnx run. If it doesn't include the multithreading test that we discussed earlier, please run the multi-threading test separately as well.

@zibaiwan
Copy link
Contributor

Please run clang-format by following the instruction in contributing. I know the name of the workflow "run clang-format" sounds like it may run the clang-format for you, but it actually only checks if your code is in clang-format and doesn't modify your pull request. (You probably already know this, just in case you don't.).

Copy link
Contributor

@zibaiwan zibaiwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me as long as the conversations are resolved and the additional tests are completed.

@ericxu233
Copy link
Contributor Author

@pcolberg Could you please take a look and see if the changes are ok :)

Copy link
Contributor

@pcolberg pcolberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ericxu233, this is close to completion 🙂

@ericxu233 ericxu233 requested review from pcolberg and zibaiwan and removed request for pcolberg and zibaiwan September 28, 2022 18:15
@zibaiwan zibaiwan self-requested a review September 28, 2022 18:49
zibaiwan
zibaiwan previously approved these changes Sep 28, 2022
Copy link
Contributor

@zibaiwan zibaiwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ericxu233 . Looks good to me!

@pcolberg pcolberg force-pushed the MT-support-refactor branch 2 times, most recently from fa0934f to fc87493 Compare October 6, 2022 00:03
pcolberg
pcolberg previously approved these changes Oct 6, 2022
Copy link
Contributor

@pcolberg pcolberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Eric, this is ready for merging 🙂

I have slightly updated the comments in acl_thread.h and removed left-over lock_count in acl_event.cpp and acl_mem.cpp which were causing warnings.

I also moved acl_is_locked_callback(), acl_wait_for_device_update(), and acl_signal_device_update() back to their old location. You had moved them to be consistent with the order in acl_thread.h, but I think it's fine to have the order in acl_thread.cpp be slightly out of sync, in favour of modifying as little code as possible. That will be appreciated by a future reader, e.g., looking at git blame.

Please let me know if everything is in order before this is merged.

pcolberg
pcolberg previously approved these changes Oct 6, 2022
First stage of multi-threading support refactor by providing a mutex wrapper
class around current routines and using std::scoped_lock to follow RAII.
@pcolberg
Copy link
Contributor

pcolberg commented Oct 6, 2022

Third time is the charm 🙂

You can omit the mutex type when constructing the scoped lock:

diff --git a/src/acl_command.cpp b/src/acl_command.cpp
index 6e71fc4..23a8c12 100644
--- a/src/acl_command.cpp
+++ b/src/acl_command.cpp
@@ -38,7 +38,7 @@
 ACL_EXPORT
 CL_API_ENTRY cl_int CL_API_CALL
 clEnqueueBarrierIntelFPGA(cl_command_queue command_queue) {
-  std::scoped_lock<acl_mutex_wrapper_t> lock{acl_mutex_wrapper};
+  std::scoped_lock lock{acl_mutex_wrapper};
 
   if (!acl_command_queue_is_valid(command_queue)) {
     return CL_INVALID_COMMAND_QUEUE;

@ericxu233
Copy link
Contributor Author

@pcolberg Looks good to me. I think everything is in order now. We are ready to merge. Thanks for the detailed code review and explanatoins!

@pcolberg pcolberg merged commit 334bdb7 into intel:main Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants