
Fix race conditions in the overlay network driver #2143

Merged (3 commits) on May 3, 2018

Conversation

@ctelfer (Contributor) commented May 1, 2018

While reviewing the code for #1765, I came across several race conditions in the overlay driver code. These make classic read-modify-write mistakes by not holding their locks through the modifications. This means that parallel invocations can lead to inconsistent state.

When reviewing, please look at the 3 commits individually. They don't overlap in code, and each commit message explains why I think there is a problem as well as any tradeoffs I see in the proposed fix.

The overlay network driver is not properly using its mutexes or
sync.Onces.  It made the classic mistake of not holding a lock through
various read-modify-write operations.  This can result in inconsistent
state storage leading to more catastrophic issues.

This patch attempts to maintain the previous semantics while holding the
driver lock through operations that are read-modify-write of the
driver's network state.

One example of this race would be if two goroutines tried to invoke
d.network() after the network ID was removed from the table.  Both would
try to reinstall it causing the "once" to get reinitialized twice
without any lock protection.  This could then lead to the "once" getting
invoked twice on the same network.  Furthermore, the changes to one of
these network structures gets effectively discarded.  It's also the
case, that because there would be two simultaneous instances of the
network, the various network Lock() invocations would be meaningless for
race prevention.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
@ctelfer force-pushed the overlay-race-fix branch from 02d923f to 5109e8e on May 1, 2018 21:17
-	swp := d.keys[0]
-	d.keys[0] = d.keys[priIdx]
-	d.keys[priIdx] = swp
+	d.keys[0], d.keys[priIdx] = d.keys[priIdx], d.keys[0]
Contributor:

Does this change give any specific improvement other than not using a swap variable?

Contributor (Author):

No, just that this is more idiomatic Go as far as I'm aware.

Contributor:

Thanks.

@@ -137,8 +141,8 @@ func (d *driver) NetworkFree(id string) error {
}

d.Lock()
defer d.Unlock()
Contributor:

The current implementation locks and unlocks only at particular places, while this change locks at the beginning of the function and unlocks when the function exits. Like you mentioned, will there be any issue for other threads that now wait a little longer than they used to? Just trying to see whether there is a tradeoff in performance or some other issue.

Contributor (Author):

There could be a tradeoff in performance here. The only extra operation that gets protected by the lock is network.releaseVxlanID() which does two things:

  1. locks itself temporarily to build a coherent list of VNIs to free
  2. invokes the idm.Release() method for each VNI

The first would be protected against concurrency by the lock anyway. The second, however, invokes the IDM which uses the bitseq package which can end up writing to store. That last bit is the only long blocking operation I can think of here.

Even so, this is an "ovmanager.driver" instance and the only operations it locks on are creation and deletion. I'd assume that our desired rate for creation/deletion would not be measured in the order of thousands per second, in which case I highly doubt a serial write-per-op would hurt us.

But (full transparency), I chafe at bad concurrency patterns and want to make it easy to reason about the correctness of the code. As I said in the commit log for this section: I doubt this race is serious.

ctelfer added 2 commits May 1, 2018 17:41
This one is probably not critical.  The worst that could plausibly
happen is if 2 deletes occur at the same time (one of which should be
an error):
  1. network gets read from the map by delete-1
  2. network gets read from the map by delete-2
  3. delete-1 releases the network VNI
  4. network create arrives at the driver and allocates the now free VNI
  5. delete-2 releases the network VNI (error: it's been reallocated!)
  6. both networks remove the VNI from the map

Step 6 could also become an issue if there were a simultaneous create
for the same network.  This leads to the modification of the
NewNetwork() method, which now checks for an existing network before
adding it to the map.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
Multiple simultaneous updates here would leave the driver in a very
inconsistent state.  The disadvantage to this change is that it requires
holding the driver lock while reprogramming the keys.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
@ctelfer force-pushed the overlay-race-fix branch from 5109e8e to d0b208a on May 1, 2018 21:42
@selansen (Contributor) commented May 1, 2018

LGTM

@codecov-io

Codecov Report

❗ No coverage uploaded for pull request base (master@1aac06b).
The diff coverage is 6.45%.


@@            Coverage Diff            @@
##             master    #2143   +/-   ##
=========================================
  Coverage          ?   40.48%           
=========================================
  Files             ?      139           
  Lines             ?    22495           
  Branches          ?        0           
=========================================
  Hits              ?     9107           
  Misses            ?    12049           
  Partials          ?     1339
Impacted Files Coverage Δ
drivers/overlay/ov_network.go 2.79% <0%> (ø)
drivers/overlay/encryption.go 0% <0%> (ø)
drivers/overlay/ovmanager/ovmanager.go 49.58% <40%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1aac06b...d0b208a.

@fcrisciani fcrisciani requested a review from ddebroy May 2, 2018 00:00
@@ -125,8 +125,12 @@ func (d *driver) NetworkAllocate(id string, option map[string]string, ipV4Data,
opts[netlabel.OverlayVxlanIDList] = val

d.Lock()
defer d.Unlock()
if _, ok := d.networks[id]; ok {
Contributor:

Is it possible to do this check (and error out) at the beginning of the function rather than towards the end?

Contributor (Author):

That's a good suggestion. Hrm... What prefaces the lock itself is basically input checking. Some of it (the VxlanID allocation and possibly cleanup on error) would access the store and could be lengthy, so holding the lock would slow down other network create operations on different networks. But moving the check to the top would prevent concurrent attempts to create the same network from going through needless effort (and seems more sane stylistically).

So ... a bit nicer hygiene vs. potentially increased parallelism (assuming that the client is actually using said parallelism). I could go either way on this. What do you think?

Contributor:

I was mainly looking at it from the stylistically sane perspective and error out early. Agree with the concern around potential hit in parallelism too. So perhaps okay to leave it the way you have now.

@ddebroy (Contributor) commented May 2, 2018

LGTM. A couple of the network-already-exists checks could be moved to the beginning of the routines, if possible, while maintaining the "defer unlock" pattern.

@fcrisciani left a comment

LGTM

5 participants