Skip to content

Commit

Permalink
Refactor locking for join/leave to avoid race
Browse files Browse the repository at this point in the history
Instead of using "sync.Once" to determine whether to initialize a
network sandbox or subnet sandbox, we use a traditional mutex +
initialization boolean.  This is because the initialization state isn't
truly a once-and-done condition.  Rather, libnetwork destroys network
and subnet sandboxes when the last endpoint leaves them.  The use of
sync.Once in this kind of scenario requires, therefore, re-initializing
the Once which is impoissible.  So the approach that libnetwork
currently takes is to use a pointer to a Once and redirect that pointer
to a new Once on reset.  This leads to nasty race conditions.

In addition to refactoring the locking, this patch merges the functions
joinSandbox(), and joinSubnetSandbox(). This makes the code both cleaner
and it also holds the network and subnet locks through the series of
read-modify-writes avoiding further potential races.  This does reduce
the potential parallelism which could be applied should there be many
joins coming in on many different subnets in the same overlay network.
However, this should be an extremely minor performance hit for a very
obscure case.

One important pattern in this commit is that it is crucial to avoid
sending peerDB messages while holding a driver or network lock.  The
changes herein defer such (asynchronous) notifications until after
release of such locks.  This prevents deadlocks where the peerDB
blocks acquiring said locks while the network method blocks trying
to send to the peerDB's channel.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
  • Loading branch information
ctelfer committed Jul 10, 2018
1 parent 441a3f9 commit 5d113d1
Show file tree
Hide file tree
Showing 4 changed files with 82 additions and 106 deletions.
10 changes: 1 addition & 9 deletions drivers/overlay/joinleave.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,18 +47,10 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
return fmt.Errorf("couldn't get vxlan id for %q: %v", s.subnetIP.String(), err)
}

if err := n.joinSandbox(false); err != nil {
if err := n.joinSandbox(s, false, true); err != nil {
return fmt.Errorf("network sandbox join failed: %v", err)
}

if err := n.joinSubnetSandbox(s, false); err != nil {
return fmt.Errorf("subnet sandbox join failed for %q: %v", s.subnetIP.String(), err)
}

// joinSubnetSandbox gets called when an endpoint comes up on a new subnet in the
// overlay network. Hence the Endpoint count should be updated outside joinSubnetSandbox
n.incEndpointCount()

sbox := n.sandbox()

overlayIfName, containerIfName, err := createVethPair()
Expand Down
Loading

0 comments on commit 5d113d1

Please sign in to comment.