Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not clear that lock is always taken for init_scan_list #78

Closed
cbiffle opened this issue Oct 9, 2020 · 4 comments
Closed

Not clear that lock is always taken for init_scan_list #78

cbiffle opened this issue Oct 9, 2020 · 4 comments

Comments

@cbiffle
Copy link

cbiffle commented Oct 9, 2020

I've noticed wavemon segfaulting a lot, and took the time to rebuild with debug symbols and capture a coredump.

It looks like it's interacting with bogus contents in the scan_result in init_scan_list. To reproduce this, start wavemon 0.9.1 (I am on Arch, specifically, running the 0.9.1-1 package version) and spam the F1-F3-F1-F3 keys until it crashes. Usually takes less than ten seconds for me.

I noticed this line in the source:

/** Initialize scan results. Requires lock to be taken. */

Nothing about the structure of the program ensures that the lock is taken, and in particular, it looks like the code path through main into scr_aplst_init calls init_scan_list on a global variable without any synchronization. It's also not clear how fini synchronizes with the mutex management in the scan thread; I don't see any calls to pthread_setcanceltype, so the pthread_cancel will take effect at some point later, possibly leading to data races. (I notice the mutex management changed in c74e52d and some synchronization was removed; I don't totally understand the change.)

@cbiffle
Copy link
Author

cbiffle commented Oct 9, 2020

On reflection, it also looks like the scan thread can call POSIX "cancellation point" APIs while the mutex is locked, which would mean that the thread could be cancelled without unlocking the mutex. Is there something preventing this that I've missed? (I would expect this to produce deadlocks etc. and I have not seen them, so I'm simply curious about this one.)

Edit: Also, where is the scan result mutex actually initialized? The code appears to rely on "all zeroes" being a legal mutex, which happens to work on glibc (apparently!) but is not correct per the POSIX standard, which states, "If mutex does not refer to an initialized mutex object, the behavior of pthread_mutex_lock(), pthread_mutex_trylock(), and pthread_mutex_unlock() is undefined."

@grrtrr
Copy link
Contributor

grrtrr commented Oct 9, 2020

So ... what is the solution you are suggesting?

@grrtrr
Copy link
Contributor

grrtrr commented Oct 17, 2020

It took me a while to reconstruct what is going on. The simplification in c74e52d removed the use of mutexes when destroying the scan list, since the thread can be canceled with a mutex held. The change was to use a static initializer (PTHREAD_MUTEX_INITIALIZER) in init_scan_list() - and I forgot to update the comment.

Your first comment above identifies the bug that c74e52d did not fix: pthread_cancel is called without a subsequent pthread_join, hence the thread can continue to run after parts of the data structure have already been freed.

I will rework the original fix and revisit the mutex handling. Thank you for reporting this.

grrtrr pushed a commit that referenced this issue Oct 17, 2020
This reverts commit c74e52d.
The fix was not right, using deferred pthread_cancel (the
default). Also the use of locks needs to be revised.
grrtrr pushed a commit that referenced this issue Oct 17, 2020
This addresses several issues in handling the scan thread:
1. pthread_cancel was called without pthread_join, allowing the
   detached thread to continue to run.
2. iw_nl80211_get_scan_data wiped the whole structure, including
   the mutex, causing undefined behaviour.
3. Address the problem of what happens if the thread is canceled
   with the mutex held: use a static allocator and disable thread
   cancellation during the time the mutex is held.
4. Sorting and computing metrics is done on a temporary struct
   to minimize the time the lock is held.
5. The main thread should not allocate/clear the scan data. This
   is now retained when switching windows.
@grrtrr
Copy link
Contributor

grrtrr commented Oct 17, 2020

@cbiffle - I have made comprehensive changes following your suggestions. Would you be able to compile from master and give this a spin? The result looks correct to my eyes, and I am not able to produce a segfault on my system.

@grrtrr grrtrr closed this as completed Oct 24, 2020
grrtrr pushed a commit that referenced this issue Nov 7, 2020
This updates the mutex handling to match how it was done for #78
(re-initalization of the static mutex is not required).
grrtrr pushed a commit that referenced this issue Nov 9, 2020
This updates the mutex handling to match how it was done for #78
(re-initalization of the static mutex is not required).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants