Not clear that lock is always taken for init_scan_list #78

cbiffle · 2020-10-09T04:51:25Z

I've noticed wavemon segfaulting a lot, and took the time to rebuild with debug symbols and capture a coredump.

It looks like it's interacting with bogus contents in the scan_result in init_scan_list. To reproduce this, start wavemon 0.9.1 (I am on Arch, specifically, running the 0.9.1-1 package version) and spam the F1-F3-F1-F3 keys until it crashes. Usually takes less than ten seconds for me.

I noticed this line in the source:

/** Initialize scan results. Requires lock to be taken. */

Nothing about the structure of the program ensures that the lock is taken, and in particular, it looks like the code path through main into scr_aplst_init calls init_scan_list on a global variable without any synchronization. It's also not clear how fini synchronizes with the mutex management in the scan thread; I don't see any calls to pthread_setcanceltype, so the pthread_cancel will take effect at some point later, possibly leading to data races. (I notice the mutex management changed in c74e52d and some synchronization was removed; I don't totally understand the change.)

The text was updated successfully, but these errors were encountered:

cbiffle · 2020-10-09T15:53:31Z

On reflection, it also looks like the scan thread can call POSIX "cancellation point" APIs while the mutex is locked, which would mean that the thread could be cancelled without unlocking the mutex. Is there something preventing this that I've missed? (I would expect this to produce deadlocks etc. and I have not seen them, so I'm simply curious about this one.)

Edit: Also, where is the scan result mutex actually initialized? The code appears to rely on "all zeroes" being a legal mutex, which happens to work on glibc (apparently!) but is not correct per the POSIX standard, which states, "If mutex does not refer to an initialized mutex object, the behavior of pthread_mutex_lock(), pthread_mutex_trylock(), and pthread_mutex_unlock() is undefined."

grrtrr · 2020-10-09T22:58:46Z

So ... what is the solution you are suggesting?

grrtrr · 2020-10-17T14:06:45Z

It took me a while to reconstruct what is going on. The simplification in c74e52d removed the use of mutexes when destroying the scan list, since the thread can be canceled with a mutex held. The change was to use a static initializer (PTHREAD_MUTEX_INITIALIZER) in init_scan_list() - and I forgot to update the comment.

Your first comment above identifies the bug that c74e52d did not fix: pthread_cancel is called without a subsequent pthread_join, hence the thread can continue to run after parts of the data structure have already been freed.

I will rework the original fix and revisit the mutex handling. Thank you for reporting this.

This reverts commit c74e52d. The fix was not right, using deferred pthread_cancel (the default). Also the use of locks needs to be revised.

This addresses several issues in handling the scan thread: 1. pthread_cancel was called without pthread_join, allowing the detached thread to continue to run. 2. iw_nl80211_get_scan_data wiped the whole structure, including the mutex, causing undefined behaviour. 3. Address the problem of what happens if the thread is canceled with the mutex held: use a static allocator and disable thread cancellation during the time the mutex is held. 4. Sorting and computing metrics is done on a temporary struct to minimize the time the lock is held. 5. The main thread should not allocate/clear the scan data. This is now retained when switching windows.

grrtrr · 2020-10-17T20:19:56Z

@cbiffle - I have made comprehensive changes following your suggestions. Would you be able to compile from master and give this a spin? The result looks correct to my eyes, and I am not able to produce a segfault on my system.

This updates the mutex handling to match how it was done for #78 (re-initalization of the static mutex is not required).

grrtrr pushed a commit that referenced this issue Oct 17, 2020

Revert "Scan screen: further simplify mutex use (#78)"

9c21760

This reverts commit c74e52d. The fix was not right, using deferred pthread_cancel (the default). Also the use of locks needs to be revised.

grrtrr closed this as completed Oct 24, 2020

grrtrr pushed a commit that referenced this issue Nov 7, 2020

Info Screen: revise mutex handling

975f642

This updates the mutex handling to match how it was done for #78 (re-initalization of the static mutex is not required).

grrtrr pushed a commit that referenced this issue Nov 9, 2020

Info Screen: revise mutex handling

9ff08d1

This updates the mutex handling to match how it was done for #78 (re-initalization of the static mutex is not required).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Not clear that lock is always taken for init_scan_list #78

Not clear that lock is always taken for init_scan_list #78

cbiffle commented Oct 9, 2020 •

edited

Loading

cbiffle commented Oct 9, 2020 •

edited

Loading

grrtrr commented Oct 9, 2020

grrtrr commented Oct 17, 2020

grrtrr commented Oct 17, 2020

Not clear that lock is always taken for init_scan_list #78

Not clear that lock is always taken for init_scan_list #78

Comments

cbiffle commented Oct 9, 2020 • edited Loading

cbiffle commented Oct 9, 2020 • edited Loading

grrtrr commented Oct 9, 2020

grrtrr commented Oct 17, 2020

grrtrr commented Oct 17, 2020

cbiffle commented Oct 9, 2020 •

edited

Loading

cbiffle commented Oct 9, 2020 •

edited

Loading