Memberlist revendor and optimizations #2040
Conversation
Codecov Report
```
@@            Coverage Diff            @@
##             master    #2040   +/- ##
=========================================
  Coverage          ?   40.45%
=========================================
  Files             ?      138
  Lines             ?    22167
  Branches          ?        0
=========================================
  Hits              ?     8967
  Misses            ?    11887
  Partials          ?     1313
```
Continue to review full report at Codecov.
networkdb/networkdb.go (outdated diff)
```go
}
// Update the entry
entry.ltime = nDB.tableClock.Increment()
entry.node = nDB.config.NodeID
```
The `createOrUpdateEntry` call below used to be within a Lock/Unlock. Was the locking unnecessary? Is it possible that, as the `entry` members (`ltime`/`node`/`value`) are being updated, a `getEntry` from a parallel thread might see inconsistent state for the `entry`, since `nDB` is not being locked?
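(For illustration of the concern: a minimal sketch with hypothetical names, not the actual libnetwork code, showing how a writer that mutates entry fields outside the lock can race with a reader that takes it.)

```go
package sketch

import "sync"

// entry is a hypothetical stand-in for a networkdb table entry.
type entry struct {
	ltime uint64
	node  string
	value []byte
}

type netDB struct {
	mu      sync.Mutex
	entries map[string]*entry
}

// updateEntryUnlocked mutates the shared entry without taking the lock: a
// concurrent reader can observe the new ltime together with the old
// node/value, i.e. an inconsistent entry.
func (db *netDB) updateEntryUnlocked(key string, ltime uint64, node string, value []byte) {
	if e, ok := db.entries[key]; ok {
		e.ltime = ltime
		e.node = node
		e.value = value
	}
}

// getEntry reads under the lock, but that only helps if writers take it too.
func (db *netDB) getEntry(key string) *entry {
	db.mu.Lock()
	defer db.mu.Unlock()
	return db.entries[key]
}
```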
The locking was to protect the internal data structure. Yes, what you are saying is correct, and it will affect both the update and delete code paths changed here. At the same time, the previous logic, which acquired the same lock twice, did not avoid the race between two creates for the same key. I will try to come up with a unit test for that and provide a different fix.
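(A hedged sketch of that second point, with hypothetical names: checking and inserting under two separate acquisitions of the same lock leaves a window in which two concurrent creates of the same key both succeed, while doing both in one critical section closes it.)

```go
package sketch

import "sync"

type kvStore struct {
	mu      sync.Mutex
	entries map[string][]byte
}

// createTwoLocks checks and inserts in separate critical sections: two
// goroutines can both pass the existence check before either one inserts.
func (s *kvStore) createTwoLocks(key string, value []byte) bool {
	s.mu.Lock()
	_, exists := s.entries[key]
	s.mu.Unlock()
	if exists {
		return false
	}
	// Window: another goroutine can create the same key right here.
	s.mu.Lock()
	s.entries[key] = value
	s.mu.Unlock()
	return true
}

// createOneLock performs the check and the insert atomically under one lock.
func (s *kvStore) createOneLock(key string, value []byte) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.entries[key]; exists {
		return false
	}
	s.entries[key] = value
	return true
}
```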
networkdb/networkdb.go (outdated diff)
```go
reapTime: nDB.config.reapEntryInterval,
}
entry.ltime = nDB.tableClock.Increment()
entry.node = nDB.config.NodeID
```
Is updating `entry.node` necessary in the delete path?
Yes, because a deletion has the same meaning as a write, so if node A deletes a key owned by node B, it becomes the new owner of the key. (Consider that this is all theoretical, because the current use made of it by libnetwork has a single writer per key.)
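(A minimal sketch of that ownership-transfer semantics, with hypothetical field and method names rather than the actual networkdb code: a delete is recorded like a write, so the deleting node stamps the tombstoned entry with its own node ID and a new logical time.)

```go
package sketch

import "sync"

// entry is a hypothetical table entry; a delete tombstones it rather than
// removing it, so the deletion itself can still be propagated.
type entry struct {
	ltime    uint64
	node     string
	deleting bool
	value    []byte
}

type tableDB struct {
	mu      sync.Mutex
	nodeID  string
	clock   uint64 // stand-in for a Lamport clock
	entries map[string]*entry
}

// DeleteEntry records the deletion as a write: node A deleting a key owned by
// node B becomes the new owner of the (tombstoned) key.
func (db *tableDB) DeleteEntry(key string) bool {
	db.mu.Lock()
	defer db.mu.Unlock()
	e, ok := db.entries[key]
	if !ok || e.deleting {
		// Same behavior as create: cannot delete a key that is already deleted.
		return false
	}
	db.clock++
	e.ltime = db.clock
	e.node = db.nodeID // ownership moves to the deleting node
	e.deleting = true
	e.value = nil
	return true
}
```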
Force-pushed from b23c9f4 to 697b17b
LGTM with one minor comment.
```go
if !ok || network.leaving || !nodePresent {
	// I'm out of the network OR the event owner is not anymore part of the network so do not propagate
	return false
}

nDB.Lock()
```
How about `defer nDB.Unlock()` right after `nDB.Lock()` here? That way there is no need to unlock in the exit scenario down below and again at the very end.
I was simply trying to avoid holding the lock for longer than necessary, for instance during the dispatch of the event. It is also for symmetry with the other methods: after the entry is inserted into the tree, you can release the lock and handle whatever operation follows, such as sending the notification to other nodes in the Create/Update/Delete case, or, as here, dispatching the event to the app.
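(A hedged sketch, with hypothetical names, of the two styles being weighed: defer keeps a single exit path but holds the lock for the whole body, while an explicit early Unlock releases it before the event is dispatched.)

```go
package sketch

import "sync"

type eventDB struct {
	mu      sync.Mutex
	entries map[string][]byte
	notify  func(key string) // stand-in for dispatching the event to the app
}

// handleWithDefer has a single exit path and no risk of a missed Unlock, but
// the lock is still held while the event is dispatched.
func (db *eventDB) handleWithDefer(key string, value []byte) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.entries[key] = value
	db.notify(key) // still inside the critical section
}

// handleWithEarlyUnlock releases the lock as soon as the map is updated, so
// the dispatch happens outside the critical section, at the cost of having to
// unlock explicitly on every exit path.
func (db *eventDB) handleWithEarlyUnlock(key string, value []byte) {
	db.mu.Lock()
	db.entries[key] = value
	db.mu.Unlock()
	db.notify(key)
}
```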
sounds good wrt symmetry.
LGTM
Needs a round on the e2e tests before being merged.
LGTM |
diff: hashicorp/memberlist@v0.1.0...master

Relevant changes:
- Calculate the timeout for dial using the deadline
- Reduce the LAN min suspicion multiplier
- Fix a deadlock in the shutdown process of memberlist

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
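(For the dial-timeout item, an illustrative Go sketch of the idea only, not memberlist's actual code: derive the per-dial timeout from how much of the overall deadline is left.)

```go
package sketch

import (
	"net"
	"os"
	"time"
)

// dialWithDeadline turns an absolute deadline into a relative timeout for the
// dial, so a nearly expired deadline yields a short dial timeout instead of a
// fixed one that could overshoot it.
func dialWithDeadline(addr string, deadline time.Time) (net.Conn, error) {
	timeout := time.Until(deadline)
	if timeout <= 0 {
		// Hypothetical choice of error for an already expired deadline.
		return nil, os.ErrDeadlineExceeded
	}
	d := net.Dialer{Timeout: timeout, Deadline: deadline}
	return d.Dial("tcp", addr)
}
```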
Avoid waiting for a double notification once a node rejoins; just put it back into the active state. Waiting for a further message does not really add anything to the safety of the operation; the source of truth for node status resides inside memberlist. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
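(A minimal, hypothetical sketch of the rejoin behavior this commit describes: when memberlist notifies a join for a node currently marked as failed, move it straight back to the active set instead of waiting for a second notification.)

```go
package sketch

import "sync"

type nodeState struct {
	name string
}

type clusterDB struct {
	mu          sync.Mutex
	activeNodes map[string]*nodeState
	failedNodes map[string]*nodeState
}

// handleNodeJoin is called when memberlist (the source of truth for node
// status) notifies a join; a previously failed node is reinstated immediately.
func (db *clusterDB) handleNodeJoin(name string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	if n, ok := db.failedNodes[name]; ok {
		delete(db.failedNodes, name)
		db.activeNodes[name] = n
		return
	}
	if _, ok := db.activeNodes[name]; !ok {
		db.activeNodes[name] = &nodeState{name: name}
	}
}
```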
Force-pushed from 697b17b to 44298c9
Rebased on top of the latest master, and the e2e tests passed.
- Revendor memberlist (diff: hashicorp/memberlist@v0.1.0...master)
- Insert failed nodes back when a join is notified by memberlist
- Delete has the same behavior as create: a key that is already deleted cannot be deleted again