Multi-threading issue in `rocksdb::WriteThread::Writer::CreateMutex()` #2388
Unless there is a bug,
On the other hand, I need to do some benchmarking to see how much impact changing from lazily initializing the mutex to eager initialization would have. The mutex is lazily initialized because, except in a congested scenario, we shouldn't need it.
Thanks for the reply. Here are the answers to your questions:
And just in case, are you using two-phase commit?
No, two-phase commits are not involved.
Quick update on this issue: I peeked into the upstream RocksDB code, which seems to contain a fix for this problem already. Following is the change I made to our 5.1.4 copy of RocksDB, and that seems to fix our problem of threads hanging in RocksDB and not making progress. Here's my patch:
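The patch itself is not shown here. One hypothetical shape for such a change, assuming it strengthened the memory order of the state load inside the condition-variable wait in `WriteThread::BlockingAwaitState` (both the location and the original ordering are assumptions, not the actual patch), would be:

```cpp
// Hypothetical sketch, not the actual patch: strengthen the memory order
// of the state load inside the condition-variable wait so that writes made
// by the waking thread become visible together with the observed state change.
std::unique_lock<std::mutex> guard(w->StateMutex());
w->StateCV().wait(guard, [w] {
  // before (assumed): std::memory_order_relaxed
  // after: an acquire load
  return w->state.load(std::memory_order_acquire) != STATE_LOCKED_WAITING;
});
```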
I think this issue can then be closed, because the problem seems to be fixed in upstream RocksDB. The originally reported problem of multi-threading issues in `CreateMutex()` appears to be addressed there as well.
Thank you very much for the investigation. Regarding your patch: both before and after the patch, I suspect the bug is still hidden, but the fact that you change the memory order in your patch somehow mitigates it. Please keep an eye on it. Lock-free algorithms are hard to debug.
Yes, the fact that it works for us now may be a side-effect of the changed memory order... I'll investigate further should the problem re-occur.
Sounds good.
When we import data into RocksDB with multiple threads, we sometimes see multiple threads hanging in RocksDB code sections and not making any more progress.
Backtraces of the hanging threads all look like this:
Precisely, the threads are all hanging in `WriteThread::BlockingAwaitState(...)` in write_thread.cc:37 (note: the code is from RocksDB version 5.1.4):
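Paraphrased from 5.1.4 (approximate, not verbatim), that section looks roughly like this:

```cpp
// Paraphrased from RocksDB 5.1.4 write_thread.cc; approximate, not verbatim.
uint8_t WriteThread::BlockingAwaitState(Writer* w, uint8_t goal_mask) {
  // Lazily create the mutex and condition variable before blocking.
  w->CreateMutex();

  auto state = w->state.load(std::memory_order_acquire);
  if ((state & goal_mask) == 0 &&
      w->state.compare_exchange_strong(state, STATE_LOCKED_WAITING)) {
    // We won the CAS, so we may (and must) use the lazily created mutex.
    std::unique_lock<std::mutex> guard(w->StateMutex());
    w->StateCV().wait(guard, [w] {                     // <- threads hang here
      return w->state.load(std::memory_order_relaxed) != STATE_LOCKED_WAITING;
    });
    state = w->state.load(std::memory_order_relaxed);
  }
  assert((state & goal_mask) != 0);
  return state;
}
```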
I have attached to the process with gdb and added breakpoints after the `wait` call, but the `wait`s never return.

Browsing through the code of write_thread.cc, I found one obvious problem that may cause trouble when multiple threads want to write in parallel:
`BlockingAwaitState` unconditionally calls `Writer::CreateMutex()` in line 27, as can be seen above. Multiple threads can enter it at the same time. `CreateMutex` is then supposed to lazily initialize a mutex and a condition variable when not already initialized.

The actual problem is in `CreateMutex()`:
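Paraphrased from write_thread.h in 5.1.4 (approximate, not verbatim):

```cpp
// Paraphrased from RocksDB 5.1.4 write_thread.h; approximate, not verbatim.
void CreateMutex() {
  if (!made_waitable) {   // plain bool, read without any synchronization
    made_waitable = true;
    new (&state_mutex_bytes) std::mutex;
    new (&state_cv_bytes) std::condition_variable;
  }
}
```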
In line 131 it checks its `made_waitable` variable. If it's `false`, it sets it to `true` in line 135. This is not thread-safe: multiple threads can enter `CreateMutex()` at the same time, and all can see a `made_waitable` value of `false`. They will then all enter the initialization section, which may initialize the mutex and the condition variable multiple times, potentially overwriting existing state in the mutex and/or the condition variable. And if the mutex or condition variable state is corrupted, it is very likely that there are follow-up issues when using them, e.g. the hanging threads observed above.
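To make the race concrete, here is a minimal self-contained illustration of the same check-then-set pattern; it uses hypothetical names rather than actual RocksDB code. Both threads can pass the `if` before either one sets the flag:

```cpp
// Minimal illustration (not RocksDB code) of the check-then-set race:
// both threads can observe the flag as false, so both run the "one-time"
// initialization and clobber the mutex state.
#include <mutex>
#include <new>
#include <thread>
#include <type_traits>

struct LazyWaiter {
  bool made_waitable = false;  // plain bool: no synchronization at all
  std::aligned_storage_t<sizeof(std::mutex), alignof(std::mutex)> mutex_bytes;

  void CreateMutex() {
    if (!made_waitable) {             // thread 1 and thread 2 may both see false...
      made_waitable = true;
      new (&mutex_bytes) std::mutex;  // ...and both construct the mutex,
    }                                 // overwriting any state a waiter may
  }                                   // already have stored in it
};

int main() {
  LazyWaiter w;
  std::thread t1([&] { w.CreateMutex(); });
  std::thread t2([&] { w.CreateMutex(); });
  t1.join();
  t2.join();
  // (Destruction of the placement-new'd mutex is omitted for brevity.)
}
```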
I am wondering why the mutex and condition variable are initialized lazily at all, and why they are not initialized unconditionally in the constructor of `WriteThread::Writer`. I patched RocksDB locally as follows, and it seems to fix the problem:
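The patch itself is not reproduced here; a hypothetical reconstruction matching the description, with eager construction in the `Writer` constructor (exact member layout assumed), might look like this:

```cpp
// Hypothetical reconstruction of the local patch (not the actual diff):
// construct the synchronization objects unconditionally in the Writer
// constructor instead of lazily in CreateMutex().
struct Writer {
  std::atomic<uint8_t> state;
  std::mutex state_mutex_;            // constructed eagerly, exactly once
  std::condition_variable state_cv_;  // constructed eagerly, exactly once

  std::mutex& StateMutex() { return state_mutex_; }
  std::condition_variable& StateCV() { return state_cv_; }
  // made_waitable and CreateMutex() are removed entirely.
};
```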
An alternative would be to make `made_waitable` an atomic variable and use `compare_exchange_strong` in `CreateMutex` (see the sketch below). However, that would still require checking the state of the atomic variable on every invocation of `BlockingAwaitState(...)`. Initializing the objects just once and unconditionally in the ctor of `WriteThread::Writer` seems to be more efficient and is at least easier to implement correctly.

Note that all of the above applies to RocksDB version 5.1.4, but I peeked into the latest RocksDB version from the master branch and found the above code section to be still the same. So I think the problem is still there in current versions of RocksDB.
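For completeness, a hypothetical sketch of the `compare_exchange_strong` alternative mentioned above (member names assumed), which also illustrates why it is harder to implement correctly:

```cpp
// Hypothetical sketch of the compare_exchange_strong alternative
// (member names assumed), showing why it is harder to get right.
std::atomic<bool> made_waitable{false};

void CreateMutex() {
  bool expected = false;
  // Only the single thread that wins the CAS performs the construction.
  if (made_waitable.compare_exchange_strong(expected, true)) {
    new (&state_mutex_bytes) std::mutex;
    new (&state_cv_bytes) std::condition_variable;
  }
  // Caveat: a losing thread returns immediately, possibly while the winner
  // is still constructing, so this alone is not correct; callers would
  // additionally have to wait for construction to finish (e.g. by using
  // std::call_once instead).
}
```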
I browsed existing GitHub issues for similar bug reports, and I think that issue #1208 could be related.