
Fix race conditions in the overlay network driver #2143

Merged (3 commits) on May 3, 2018

Conversation

@ctelfer (Contributor) commented May 1, 2018

While reviewing the code for #1765, I came across several race conditions in the overlay driver code. These make classic read-modify-write mistakes by not holding their locks through the modifications. This means that parallel invocations can lead to inconsistent state.

When reviewing, please look at the 3 commits individually. They don't overlap in code, and each commit message explains why I think there is a problem as well as any tradeoffs I see in the proposed fix.

The overlay network driver is not properly using its mutexes or
sync.Onces.  It made the classic mistake of not holding a lock through
various read-modify-write operations.  This can result in inconsistent
state storage leading to more catastrophic issues.

This patch attempts to maintain the previous semantics while holding the
driver lock through operations that are read-modify-write of the
driver's network state.

One example of this race would be if two goroutines tried to invoke
d.network() after the network ID was removed from the table.  Both would
try to reinstall it causing the "once" to get reinitialized twice
without any lock protection.  This could then lead to the "once" getting
invoked twice on the same network.  Furthermore, the changes to one of
these network structures gets effectively discarded.  It's also the
case, that because there would be two simultaneous instances of the
network, the various network Lock() invocations would be meaningless for
race prevention.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
@ctelfer force-pushed the overlay-race-fix branch from 02d923f to 5109e8e on May 1, 2018 21:17
-	swp := d.keys[0]
-	d.keys[0] = d.keys[priIdx]
-	d.keys[priIdx] = swp
+	d.keys[0], d.keys[priIdx] = d.keys[priIdx], d.keys[0]
Contributor:

Does this change give any specific improvement other than not using a swap variable?

Contributor (Author):

No, just that this is more idiomatic Go as far as I'm aware.

Contributor:

Thanks.

@@ -137,8 +141,8 @@ func (d *driver) NetworkFree(id string) error {
}

d.Lock()
defer d.Unlock()
Contributor:

The current implementation locks and unlocks only at particular places, while this change locks at the beginning of the function and unlocks when the function exits. Like you mentioned, will there be any issue for other threads that now wait a little longer than they used to? Just trying to see whether there is a tradeoff in performance or some other issue.

Contributor (Author):

There could be a tradeoff in performance here. The only extra operation that gets protected by the lock is network.releaseVxlanID() which does two things:

  1. locks itself temporarily to build a coherent list of VNIs to free
  2. invokes the idm.Release() method for each VNI

The first would be protected against concurrency by the lock anyway. The second, however, invokes the IDM which uses the bitseq package which can end up writing to store. That last bit is the only long blocking operation I can think of here.

Even so, this is an "ovmanager.driver" instance and the only operations it locks on are creation and deletion. I'd assume that our desired rate for creation/deletion would not be measured in the order of thousands per second, in which case I highly doubt a serial write-per-op would hurt us.

But (full transparency), I chafe at bad concurrency patterns and want to make it easy to reason about the correctness of the code. As I said in the commit log for this section: I doubt this race is serious.

ctelfer added 2 commits May 1, 2018 17:41
This one is probably not critical.  The worst that could plausibly
happen is if 2 deletes occur at the same time (one of which should be
an error):
  1. network gets read from the map by delete-1
  2. network gets read from the map by delete-2
  3. delete-1 releases the network VNI
  4. network create arrives at the driver and allocates the now free VNI
  5. delete-2 releases the network VNI (error: it's been reallocated!)
  6. both networks remove the VNI from the map

Step 6 could also become an issue if there were a simultaneous create
for the same network.  This leads to the modification of the
NewNetwork() method, which now checks for an existing network before
adding it to the map.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
Multiple simultaneous updates here would leave the driver in a very
inconsistent state.  The disadvantage to this change is that it requires
holding the driver lock while reprogramming the keys.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
@ctelfer force-pushed the overlay-race-fix branch from 5109e8e to d0b208a on May 1, 2018 21:42
@selansen (Contributor) commented May 1, 2018

LGTM

@codecov-io

Codecov Report

❗ No coverage uploaded for pull request base (master@1aac06b).
The diff coverage is 6.45%.


@@            Coverage Diff            @@
##             master    #2143   +/-   ##
=========================================
  Coverage          ?   40.48%           
=========================================
  Files             ?      139           
  Lines             ?    22495           
  Branches          ?        0           
=========================================
  Hits              ?     9107           
  Misses            ?    12049           
  Partials          ?     1339
Impacted Files Coverage Δ
drivers/overlay/ov_network.go 2.79% <0%> (ø)
drivers/overlay/encryption.go 0% <0%> (ø)
drivers/overlay/ovmanager/ovmanager.go 49.58% <40%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1aac06b...d0b208a.

@fcrisciani fcrisciani requested a review from ddebroy May 2, 2018 00:00
@@ -125,8 +125,12 @@ func (d *driver) NetworkAllocate(id string, option map[string]string, ipV4Data,
opts[netlabel.OverlayVxlanIDList] = val

d.Lock()
defer d.Unlock()
if _, ok := d.networks[id]; ok {
Contributor:

Is it possible to do this check (and error out) at the beginning of the function rather than towards the end?

Contributor (Author):

That's a good suggestion. Hrm... What prefaces the lock itself is basically input checking. Some of it (the VxlanID allocation and possibly cleanup on error) would access the store and could be lengthy, so holding the lock would slow down other network create operations on different networks. But moving the check to the top would prevent concurrent attempts to create the same network from going through needless effort (and seems more sane stylistically).

So ... a bit nicer hygiene vs. potentially increased parallelism (assuming that the client is actually using said parallelism). I could go either way on this. What do you think?

Contributor:

I was mainly looking at it from the stylistically sane perspective and error out early. Agree with the concern around potential hit in parallelism too. So perhaps okay to leave it the way you have now.

@ddebroy (Contributor) commented May 2, 2018

LGTM. A couple of the network-already-exists checks could be moved to the beginning of the routines, if possible, while maintaining the "defer unlock" pattern.

@fcrisciani left a comment

LGTM

5 participants