Fix race conditions in the overlay network driver #2143
Conversation
The overlay network driver is not properly using its mutexes or sync.Onces. It made the classic mistake of not holding a lock through various read-modify-write operations. This can result in inconsistent state storage, leading to more catastrophic issues. This patch attempts to maintain the previous semantics while holding the driver lock through operations that read-modify-write the driver's network state.

One example of this race: if two goroutines invoke d.network() after the network ID has been removed from the table, both will try to reinstall it, causing the "once" to be reinitialized twice without any lock protection. This can then lead to the "once" being invoked twice on the same network. Furthermore, the changes to one of these network structures are effectively discarded. Also, because there would be two simultaneous instances of the network, the various network Lock() invocations would be meaningless for race prevention.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
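To make the failure mode concrete, here is a minimal, hypothetical sketch of the read-check-insert pattern the patch protects with the driver lock. The types and field names below are stand-ins that mirror the description, not the actual libnetwork code.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified stand-ins for the overlay driver's network table.
type network struct {
	id       string
	initOnce *sync.Once
}

type driver struct {
	sync.Mutex
	networks map[string]*network
}

// network looks up a network entry and reinstalls it if it is missing.
// Holding d.Lock() across the whole read-check-insert prevents two goroutines
// from each installing their own copy, each with a fresh sync.Once.
func (d *driver) network(id string) *network {
	d.Lock()
	defer d.Unlock()

	n, ok := d.networks[id]
	if !ok {
		n = &network{id: id, initOnce: &sync.Once{}}
		d.networks[id] = n
	}
	return n
}

func main() {
	d := &driver{networks: map[string]*network{}}

	var wg sync.WaitGroup
	results := make([]*network, 2)
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = d.network("n1")
		}(i)
	}
	wg.Wait()

	// Both goroutines observe the same instance, so its sync.Once can only
	// ever fire once.
	fmt.Println(results[0] == results[1]) // true
}
```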
swp := d.keys[0]
d.keys[0] = d.keys[priIdx]
d.keys[priIdx] = swp
d.keys[0], d.keys[priIdx] = d.keys[priIdx], d.keys[0]
Does this change give any specific improvement other than not using a swap variable?
No, just that this is more idiomatic Go as far as I'm aware.
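For reference, a tiny standalone example of the multiple-assignment swap (illustrative only, not the driver's code):

```go
package main

import "fmt"

func main() {
	keys := []string{"primary", "backup"}
	// Multiple assignment evaluates the right-hand side first, so the swap
	// needs no temporary variable.
	keys[0], keys[1] = keys[1], keys[0]
	fmt.Println(keys) // [backup primary]
}
```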
Thanks.
@@ -137,8 +141,8 @@ func (d *driver) NetworkFree(id string) error {
}

d.Lock()
defer d.Unlock()
The actual implementation locks and unlocks only at particular places, whereas this change locks at the beginning of the function and unlocks when the function exits. Like you mentioned, will there be any issue for other threads that now have to wait a little longer than they used to? Just trying to see if there is a trade-off in performance or some other issue.
There could be a tradeoff in performance here. The only extra operation that gets protected by the lock is network.releaseVxlanID() which does two things:
- locks itself temporarily to build a coherent list of VNIs to free
- invokes the idm.Release() method for each VNI
The first would be protected against concurrency by the lock anyway. The second, however, invokes the IDM, which uses the bitseq package and can end up writing to the store. That last bit is the only long blocking operation I can think of here.
Even so, this is an "ovmanager.driver" instance and the only operations it locks on are creation and deletion. I'd assume that our desired rate for creation/deletion would be measured in the order of thousands per second, in which case I highly doubt a serial write-per-op would hurt us.
But (full transparency), I chafe at bad concurrency patterns and want to make it easy to reason about the correctness of the code. As I said in the commit log for this section: I doubt this race is serious.
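As a rough illustration of the tradeoff being discussed, here is a hedged sketch of the "lock at entry, defer unlock" shape. The types and method bodies are hypothetical stand-ins, not the real ovmanager code; releaseVxlanID is a placeholder for the real method, which frees VNIs via idm/bitseq and may write to the store.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the ovmanager driver state.
type network struct{ id string }

func (n *network) releaseVxlanID() {
	// Potentially slow: store writes happen here in the real driver.
}

type driver struct {
	sync.Mutex
	networks map[string]*network
}

// NetworkFree holds the driver lock from entry to exit, so the lookup, the
// VNI release, and the map delete act as one atomic step with respect to
// concurrent create/delete calls. The cost is that the (possibly slow)
// release now runs under the lock.
func (d *driver) NetworkFree(id string) error {
	d.Lock()
	defer d.Unlock()

	n, ok := d.networks[id]
	if !ok {
		return fmt.Errorf("network %s not found", id)
	}
	n.releaseVxlanID()
	delete(d.networks, id)
	return nil
}

func main() {
	d := &driver{networks: map[string]*network{"n1": {id: "n1"}}}
	fmt.Println(d.NetworkFree("n1")) // <nil>
	fmt.Println(d.NetworkFree("n1")) // network n1 not found
}
```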
This one is probably not critical. The worst that could seemingly happen would be if two deletes occur at the same time (one of which should be an error):

1. network gets read from the map by delete-1
2. network gets read from the map by delete-2
3. delete-1 releases the network VNI
4. a network create arrives at the driver and allocates the now-free VNI
5. delete-2 releases the network VNI (error: it's been reallocated!)
6. both networks remove the VNI from the map

Step 6 could also become an issue if there were a simultaneous create for the same network. This leads to the modification of the NewNetwork() method, which now checks for an existing network before adding it to the map.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
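A minimal sketch of the check-then-insert guard described here, assuming a simplified driver type (addNetwork is an illustrative name, not the actual method):

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins; the real fix lives in the driver's network-creation
// path, which checks for an existing entry before adding to the map.
type network struct{ id string }

type driver struct {
	sync.Mutex
	networks map[string]*network
}

// addNetwork performs the existence check and the insert under the same lock,
// so a create racing with another create (or with a delete, as in step 6
// above) cannot silently clobber a map entry.
func (d *driver) addNetwork(n *network) error {
	d.Lock()
	defer d.Unlock()

	if _, ok := d.networks[n.id]; ok {
		return fmt.Errorf("network %s already exists", n.id)
	}
	d.networks[n.id] = n
	return nil
}

func main() {
	d := &driver{networks: map[string]*network{}}
	fmt.Println(d.addNetwork(&network{id: "n1"})) // <nil>
	fmt.Println(d.addNetwork(&network{id: "n1"})) // network n1 already exists
}
```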
Multiple simultaneous updates here would leave the driver in a very inconsistent state. The disadvantage to this change is that it requires holding the driver lock while reprogramming the keys. Signed-off-by: Chris Telfer <ctelfer@docker.com>
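A hedged sketch of what "holding the driver lock while reprogramming the keys" means in practice; the names and fields are illustrative, not the real driver's, and the actual kernel reprogramming is elided.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-in for the encrypted-overlay key state.
type key struct{ tag uint32 }

type driver struct {
	sync.Mutex
	keys []*key
}

// promotePrimaryKey swaps the key at priIdx into the primary slot while
// holding the driver lock across the whole read-modify-write, so two
// concurrent key updates cannot interleave and leave d.keys inconsistent.
func (d *driver) promotePrimaryKey(priIdx int) {
	d.Lock()
	defer d.Unlock()

	d.keys[0], d.keys[priIdx] = d.keys[priIdx], d.keys[0]
	// Reprogramming the data path with the new primary key would also
	// happen here, still under the lock.
}

func main() {
	d := &driver{keys: []*key{{tag: 1}, {tag: 2}, {tag: 3}}}
	d.promotePrimaryKey(2)
	fmt.Println(d.keys[0].tag, d.keys[2].tag) // 3 1
}
```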
LGTM
Codecov Report
@@ Coverage Diff @@
## master #2143 +/- ##
=========================================
Coverage ? 40.48%
=========================================
Files ? 139
Lines ? 22495
Branches ? 0
=========================================
Hits ? 9107
Misses ? 12049
Partials ? 1339
Continue to review full report at Codecov.
@@ -125,8 +125,12 @@ func (d *driver) NetworkAllocate(id string, option map[string]string, ipV4Data,
opts[netlabel.OverlayVxlanIDList] = val

d.Lock()
defer d.Unlock()
if _, ok := d.networks[id]; ok {
Is it possible to do this check (and error out) earlier, at the beginning of the function, rather than towards the end?
That's a good suggestion. Hrm... What precedes the lock is basically input checking. Some of it (the VxlanID allocation and possibly the cleanup on error) would access the store and could be lengthy, so holding the lock there would slow down other network create operations on different networks. But moving the check to the top would prevent concurrent attempts to create the same network from going through needless effort (and seems saner stylistically).
So ... slightly nicer hygiene vs. potentially increased parallelism (assuming the client actually uses said parallelism). I could go either way on this. What do you think?
I was mainly looking at it from the stylistically-sane perspective of erroring out early. I agree with the concern around a potential hit to parallelism too. So it's perhaps okay to leave it the way you have it now.
LGTM. A couple of the network-already-exists checks could be moved to the beginning of their routines, if possible, while maintaining the "defer unlock" pattern.
LGTM
While reviewing the code for #1765, I came across several race conditions in the overlay driver code. These make classic read-modify-write mistakes by not holding their locks through the modifications. This means that parallel invocations can lead to inconsistent state.
When reviewing, please look at the 3 commits individually. They don't overlap in code, and each commit message explains why I think there is a problem as well as any tradeoffs I see in the proposed fix.