-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove code making lock acquisition order inconsistent #243
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @IlanTruanovsky for the detailed analysis. I am not quite sure I understand the first solution, chosen here, yet. Besides silencing Coverity, could you explain how this future-proofs the code? Apart from that, I am not sure whether it is a good idea to add more code to the default flow in order to resolve an issue in a non-default (the timeout without acquiring the signal semaphore) flow, which is used to report kernel status only when debugging is enabled.
While I still need to understand the issue better, I am wondering whether your third solution, removing the offending code that "lowers (the) signal if it was sent between timeout_period and now" altogether and leaving it at that would cause any issues. Did you try whether removing the code addresses the Coverity issue? Why was it added in the first place? Is it needed for functional correctness? I don't see how given that it only decrements the semaphore if it can be acquired immediately, which may or may not be the case. Is it a performance optimization?
This implementation (along with every other fix that I mentioned) will guarantee that the lock acquisition order remains consistent. If we were to add some other function involving acquiring the signaling semaphore and the receiver mutex, then we would know what the proper order is in acquiring the two of them to avoid causing a possible deadlock scenario.
Removing the code does address the Coverity issue. From what I can tell, that part of the code only has any affect when:
It seems that this case would occur very rarely in practice. Like you said, the only critical thing it influences is the the internal value of the semaphore (and also the value of the It does not seem to be a performance optimization either. I'm not sure how it could be, and I could not find any information from when this was implemented indicating that it improves performance. |
Thanks for the detailed explanation. Let's go with your third solution then, removing the offending code. This does not add new code to the non-debug flow and only affects the debug flow in a non-functional manner, if at all. (Whereas the potential, theoretical performance impact of not decrementing the semaphore detailed in #241 (comment) would affect the non-debug flow.) |
57d47d0
to
132b124
Compare
132b124
to
408a23d
Compare
Perfect, looks like integration testing of the third solution has passed and this is ready to be merged. Could you revise the commit message to include your full analysis, the proposed solutions, and the reasoning for choosing the solution? |
The lock acquisition order for acl_timed_wait_condvar can have different orderings. Either the singnal semaphore gets acquired first and then the waiter mutex, or vice versa. Coverity detects this and indicates that there's a possible deadlock scenario. There are 3 solutions to this: - Only acquire the signaling semaphore after locking the waiter mutex. - Only acquire the waiter mutex after acquiring the signaling semaphore. - Do not acquire the signaling mutex on lines 424-430, which is the only place where the lock acquisition order can differ. To implement either of the first two solutions, a combination of unlocking and then relocking the mutex/semaphore would have to be added. This would add more complexity to the function's timeout feature which is only used for reporting the kernel status when debugging (non-default flow) and so it is not ideal. The third solution is the solution this commit implements. This solution does not add new code to the non-debug flow and only affects the debug flow in a non-functional manner, if at all. This code also does not seem to be a performance optimization. This fixes the following Coverity issue: lib/acl_threadsupport/src/acl_threadsupport.c:411:5: Type: Thread deadlock (ORDER_REVERSAL) lib/acl_threadsupport/src/acl_threadsupport.c:390:3: Inconsistent order of nested lock 1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:398:3: 2. path: Condition "0 == C->num_waiters[my_entry_q]++", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:403:5: 3. path: Condition "timeout_period", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:408:7: 4. lock_acquire: Calling "sem_wait" acquires lock "acl_condvar_s.signal_sem". lib/acl_threadsupport/src/acl_threadsupport.c:408:7: 5. path: Condition "sem_wait(&C->signal_sem)", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:411:5: 6. lock_order: Calling "acl_acquire_condvar" acquires lock "acl_condvar_s.waiter_mutex" while holding lock "acl_condvar_s.signal_sem" (count: 2 / 6). lib/acl_threadsupport/src/acl_threadsupport.c:350:3: Inconsistent order of nested lock 6.1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:358:5: 6.2. lock: "pthread_mutex_lock" locks "C->waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 6.3. path: Condition "ret == 0", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 6.4. path: Falling through to end of if statement. lib/acl_threadsupport/src/acl_threadsupport.c:411:5: Examples of presumed correct order of lock acquisitions 7. lock_acquire: Example 1: Calling "acl_acquire_condvar" acquires lock "acl_condvar_s.waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:350:3: Inconsistent order of nested lock 7.1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:358:5: 7.2. lock: "pthread_mutex_lock" locks "C->waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 7.3. path: Condition "ret == 0", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 7.4. path: Falling through to end of if statement. lib/acl_threadsupport/src/acl_threadsupport.c:427:7: 8. example_lock_order: Example 1 (cont.): Calling "acl_sem_trywait" acquires lock "acl_condvar_s.signal_sem" while holding "acl_condvar_s.waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:553:3: Inconsistent order of nested lock 8.1. path: Condition "1", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 8.2. path: Condition "sem_trywait(sem) == 0", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:556:10: 8.3. path: Condition "*__errno_location() == 4", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:557:7: 8.4. path: Continuing loop. lib/acl_threadsupport/src/acl_threadsupport.c:553:3: 8.5. path: Condition "1", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 8.6. lock: "sem_trywait" decrements semaphore "sem". lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 8.7. path: Condition "sem_trywait(sem) == 0", taking true branch. lib/acl_threadsupport/test/acl_threadsupport_test.cpp:379:5: Examples of presumed correct order of lock acquisitions 9. lock_acquire: Example 2: Calling "acl_acquire_condvar" acquires lock "acl_condvar_s.waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:350:3: Inconsistent order of nested lock 9.1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:358:5: 9.2. lock: "pthread_mutex_lock" locks "C->waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 9.3. path: Condition "ret == 0", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 9.4. path: Falling through to end of if statement. lib/acl_threadsupport/test/acl_threadsupport_test.cpp:387:5: 10. example_lock_order: Example 2 (cont.): Calling "acl_wait_condvar" acquires lock "acl_condvar_s.signal_sem" while holding "acl_condvar_s.waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:459:3: Inconsistent order of nested lock 10.1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:461:3: 10.2. path: Condition "acl_timed_wait_condvar(C, 0)", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:462:5: 10.3. path: Jumping back to the beginning of the loop. lib/acl_threadsupport/src/acl_threadsupport.c:461:3: 10.4. lock: "acl_timed_wait_condvar" decrements semaphore "C->signal_sem". lib/acl_threadsupport/src/acl_threadsupport.c:390:3: Inconsistent order of nested lock 10.4.1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:398:3: 10.4.2. path: Condition "0 == C->num_waiters[my_entry_q]++", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:403:5: 10.4.3. path: Condition "timeout_period", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:404:7: 10.4.4. path: Condition "acl_sem_timedwait(&C->signal_sem, timeout_period)", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:407:5: 10.4.5. path: Falling through to end of if statement. lib/acl_threadsupport/src/acl_threadsupport.c:417:5: 10.4.6. path: Condition "--C->num_waiters[my_entry_q]", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:418:7: 10.4.7. path: Jumping back to the beginning of the loop. lib/acl_threadsupport/src/acl_threadsupport.c:417:5: 10.4.8. path: Condition "--C->num_waiters[my_entry_q]", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:426:5: 10.4.9. path: Condition "timed_out", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:427:7: 10.4.10. lock: "acl_sem_trywait" decrements semaphore "C->signal_sem". lib/acl_threadsupport/src/acl_threadsupport.c:553:3: Inconsistent order of nested lock 10.4.10.1. path: Condition "1", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 10.4.10.2. path: Condition "sem_trywait(sem) == 0", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:556:10: 10.4.10.3. path: Condition "*__errno_location() == 4", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:557:7: 10.4.10.4. path: Continuing loop. lib/acl_threadsupport/src/acl_threadsupport.c:553:3: 10.4.10.5. path: Condition "1", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 10.4.10.6. lock: "sem_trywait" decrements semaphore "sem". lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 10.4.10.7. path: Condition "sem_trywait(sem) == 0", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:427:7: 10.4.11. path: Condition "!acl_sem_trywait(&C->signal_sem)", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:435:3: 10.4.12. path: Falling through to end of if statement. lib/acl_threadsupport/src/acl_threadsupport.c:461:3: 10.5. path: Condition "acl_timed_wait_condvar(C, 0)", taking false branch. lib/acl_threadsupport/test/acl_threadsupport_test.cpp:463:3: Examples of presumed correct order of lock acquisitions 11. lock_acquire: Example 3: Calling "acl_acquire_condvar" acquires lock "acl_condvar_s.waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:350:3: Inconsistent order of nested lock 11.1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:358:5: 11.2. lock: "pthread_mutex_lock" locks "C->waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 11.3. path: Condition "ret == 0", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 11.4. path: Falling through to end of if statement. lib/acl_threadsupport/test/acl_threadsupport_test.cpp:464:3: 12. example_lock_order: Example 3 (cont.): Calling "acl_timed_wait_condvar" acquires lock "acl_condvar_s.signal_sem" while holding "acl_condvar_s.waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:390:3: Inconsistent order of nested lock 12.1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:398:3: 12.2. path: Condition "0 == C->num_waiters[my_entry_q]++", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:403:5: 12.3. path: Condition "timeout_period", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:404:7: 12.4. path: Condition "acl_sem_timedwait(&C->signal_sem, timeout_period)", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:407:5: 12.5. path: Falling through to end of if statement. lib/acl_threadsupport/src/acl_threadsupport.c:417:5: 12.6. path: Condition "--C->num_waiters[my_entry_q]", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:418:7: 12.7. path: Jumping back to the beginning of the loop. lib/acl_threadsupport/src/acl_threadsupport.c:417:5: 12.8. path: Condition "--C->num_waiters[my_entry_q]", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:426:5: 12.9. path: Condition "timed_out", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:427:7: 12.10. lock: "acl_sem_trywait" decrements semaphore "C->signal_sem". lib/acl_threadsupport/src/acl_threadsupport.c:553:3: Inconsistent order of nested lock 12.10.1. path: Condition "1", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 12.10.2. path: Condition "sem_trywait(sem) == 0", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:556:10: 12.10.3. path: Condition "*__errno_location() == 4", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:557:7: 12.10.4. path: Continuing loop. lib/acl_threadsupport/src/acl_threadsupport.c:553:3: 12.10.5. path: Condition "1", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 12.10.6. lock: "sem_trywait" decrements semaphore "sem". lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 12.10.7. path: Condition "sem_trywait(sem) == 0", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:427:7: 12.11. path: Condition "!acl_sem_trywait(&C->signal_sem)", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:435:3: 12.12. path: Falling through to end of if statement. lib/acl_threadsupport/test/acl_threadsupport_test.cpp:472:3: Examples of presumed correct order of lock acquisitions 13. lock_acquire: Example 4: Calling "acl_acquire_condvar" acquires lock "acl_condvar_s.waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:350:3: Inconsistent order of nested lock 13.1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:358:5: 13.2. lock: "pthread_mutex_lock" locks "C->waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 13.3. path: Condition "ret == 0", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:359:5: 13.4. path: Falling through to end of if statement. lib/acl_threadsupport/test/acl_threadsupport_test.cpp:473:3: 14. example_lock_order: Example 4 (cont.): Calling "acl_timed_wait_condvar" acquires lock "acl_condvar_s.signal_sem" while holding "acl_condvar_s.waiter_mutex". lib/acl_threadsupport/src/acl_threadsupport.c:390:3: Inconsistent order of nested lock 14.1. path: Condition "!C", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:398:3: 14.2. path: Condition "0 == C->num_waiters[my_entry_q]++", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:403:5: 14.3. path: Condition "timeout_period", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:404:7: 14.4. path: Condition "acl_sem_timedwait(&C->signal_sem, timeout_period)", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:407:5: 14.5. path: Falling through to end of if statement. lib/acl_threadsupport/src/acl_threadsupport.c:417:5: 14.6. path: Condition "--C->num_waiters[my_entry_q]", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:418:7: 14.7. path: Jumping back to the beginning of the loop. lib/acl_threadsupport/src/acl_threadsupport.c:417:5: 14.8. path: Condition "--C->num_waiters[my_entry_q]", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:426:5: 14.9. path: Condition "timed_out", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:427:7: 14.10. lock: "acl_sem_trywait" decrements semaphore "C->signal_sem". lib/acl_threadsupport/src/acl_threadsupport.c:553:3: Inconsistent order of nested lock 14.10.1. path: Condition "1", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 14.10.2. path: Condition "sem_trywait(sem) == 0", taking false branch. lib/acl_threadsupport/src/acl_threadsupport.c:556:10: 14.10.3. path: Condition "*__errno_location() == 4", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:557:7: 14.10.4. path: Continuing loop. lib/acl_threadsupport/src/acl_threadsupport.c:553:3: 14.10.5. path: Condition "1", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 14.10.6. lock: "sem_trywait" decrements semaphore "sem". lib/acl_threadsupport/src/acl_threadsupport.c:554:5: 14.10.7. path: Condition "sem_trywait(sem) == 0", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:427:7: 14.11. path: Condition "!acl_sem_trywait(&C->signal_sem)", taking true branch. lib/acl_threadsupport/src/acl_threadsupport.c:435:3: 14.12. path: Falling through to end of if statement.
408a23d
to
e9a4997
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks @IlanTruanovsky 🙂
Fixes this Coverity issue:
The cause of this Coverity issue is due to what I think is a misuse of the semaphore.
If there is no timeout set, or if the code did not timeout when calling
acl_timed_wait_condvar
on lines 403-409, then the thread acquires the signaling semaphore. We then lock the waiter mutex. So, our order of acquisition is:
signaling semaphore -> waiter mutex
If there is a timeout set and line 404 did timeout, then the signaling semaphore is not acquired, but we proceed to lock the waiter mutex. However, later on lines 424-430, we try to decrement the signaling semaphore, possibly acquiring it in the process. So, our order of acquisition in this scenario is:
waiter mutex -> signaling semaphore
Coverity catches this, and although I don't think deadlock is possible with the way the multithreading library is implemented, we should still maintain a proper mutex/semaphore acquisition ordering to avoid Coverity from complaining and to make the code more future proof.
There are 3 solutions to this that I can think of: