
Memberlist revendor and optimizations #2040

Merged: 2 commits, Jan 24, 2018

Conversation


@fcrisciani fcrisciani commented Dec 19, 2017

  • Revendoring memberlist diff: hashicorp/memberlist@v0.1.0...master

  • Insert back failed nodes when join is notified by memberlist

  • Delete has the same behavior as create: a key that has already been deleted cannot be deleted again


codecov-io commented Dec 19, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@5ab4ab8).
The diff coverage is 79.16%.


@@            Coverage Diff            @@
##             master    #2040   +/-   ##
=========================================
  Coverage          ?   40.45%           
=========================================
  Files             ?      138           
  Lines             ?    22167           
  Branches          ?        0           
=========================================
  Hits              ?     8967           
  Misses            ?    11887           
  Partials          ?     1313
Impacted Files                  Coverage   Δ
networkdb/event_delegate.go     66.66%     <0%> (ø)
networkdb/delegate.go           74.35%     <100%> (ø)
networkdb/networkdb.go          66.2%      <80.95%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ab4ab8...44298c9.

}
// Update the entry
entry.ltime = nDB.tableClock.Increment()
entry.node = nDB.config.NodeID
@ddebroy (Contributor) commented Dec 21, 2017

The createOrUpdateEntry call below used to be within a Lock/Unlock. Was the locking unnecessary?

Is it possible that as the entry members (ltime/node/value) are being updated, a getEntry from a parallel thread might see inconsistent state for the entry, since nDB is not being locked?

@fcrisciani (Author)

The locking was there to protect the internal data structure.

Yes, what you are saying is correct, and it affects both the update and delete code paths changed here.
At the same time, the previous logic, which acquired the same lock twice, did not prevent the race between two creates for the same key.
I will try to come up with a unit test for that and provide a different fix.
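
A minimal standalone sketch of the point above (hypothetical types and names, not the networkdb code): build the entry completely and only publish it into the shared map while the lock is held, so a concurrent reader can only ever observe the old or the new entry, never a partially updated one.

package main

import (
	"fmt"
	"sync"
)

type entry struct {
	ltime uint64
	node  string
	value []byte
}

type store struct {
	sync.RWMutex
	entries map[string]*entry
}

// put publishes a fully-built entry under the write lock.
func (s *store) put(key string, e *entry) {
	s.Lock()
	defer s.Unlock()
	s.entries[key] = e
}

// get reads under the read lock and therefore only ever sees a complete entry.
func (s *store) get(key string) (*entry, bool) {
	s.RLock()
	defer s.RUnlock()
	e, ok := s.entries[key]
	return e, ok
}

func main() {
	s := &store{entries: map[string]*entry{}}
	// Build the new entry completely before publishing it.
	e := &entry{ltime: 42, node: "nodeA", value: []byte("v1")}
	s.put("key1", e)
	if got, ok := s.get("key1"); ok {
		fmt.Println(got.node, got.ltime)
	}
}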

reapTime: nDB.config.reapEntryInterval,
}
entry.ltime = nDB.tableClock.Increment()
entry.node = nDB.config.NodeID
Contributor

Is updating entry.node necessary in the delete path?

@fcrisciani (Author)

Yes, because a deletion has the same meaning as a write, so if node A deletes a key owned by node B, node A becomes the new owner of the key. (Consider that this is all theoretical, because the current use by libnetwork has a single writer per key.)
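
A small standalone sketch of that ownership rule (hypothetical names, not the DeleteEntry code): the delete is modelled as a write, so the deleting node stamps a new logical time and takes ownership of the tombstoned key.

package main

import "fmt"

type entry struct {
	ltime    uint64
	node     string
	deleting bool
}

// deleteEntry treats the delete like a write: the deleting node advances the
// logical clock and becomes the owner of the (now tombstoned) key.
func deleteEntry(clock *uint64, self string) entry {
	*clock++
	return entry{ltime: *clock, node: self, deleting: true}
}

func main() {
	clock := uint64(10)
	old := entry{ltime: 10, node: "nodeB"}
	tomb := deleteEntry(&clock, "nodeA")
	fmt.Printf("owner %s -> %s, ltime %d -> %d\n", old.node, tomb.node, old.ltime, tomb.ltime)
}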

@fcrisciani fcrisciani force-pushed the memberlist_revendor branch 2 times, most recently from b23c9f4 to 697b17b on December 21, 2017 22:43
@ddebroy (Contributor) left a comment

LGTM with one minor comment.

if !ok || network.leaving || !nodePresent {
// I'm out of the network OR the event owner is no longer part of the network, so do not propagate
return false
}

nDB.Lock()
Contributor

How about defer nDB.Unlock() right after nDB.Lock() here? That way there is no need to unlock in the exit scenario below and again at the very end.

@fcrisciani (Author)

I was simply trying to avoid holding the lock longer than necessary, e.g. during the dispatch of the event. It is for symmetry with the other methods: after the entry is inserted into the tree, you can release the lock and handle whatever operation follows, like sending the notification to the other nodes in the Create/Update/Delete case, or, as here, dispatching the event to the application.

Contributor

Sounds good w.r.t. symmetry.
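
For reference, a standalone sketch of the symmetry pattern described above (hypothetical types, not the networkdb code): hold the lock only while the shared table is updated, then dispatch the notification after releasing it.

package main

import (
	"fmt"
	"sync"
)

type db struct {
	mu      sync.Mutex
	entries map[string]string
	events  chan string
}

// handleEvent updates the shared state under the lock, then notifies
// listeners outside the lock so a slow consumer cannot block other
// readers/writers of the table.
func (d *db) handleEvent(key, value string) {
	d.mu.Lock()
	d.entries[key] = value
	d.mu.Unlock()

	d.events <- key
}

func main() {
	d := &db{entries: map[string]string{}, events: make(chan string, 1)}
	d.handleEvent("ep1", "10.0.0.2")
	fmt.Println(<-d.events, d.entries["ep1"])
}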

@selansen (Contributor)

LGTM

@fcrisciani (Author)

Needs a round of e2e tests before being merged.

@ddebroy (Contributor) commented Dec 22, 2017

LGTM

Flavio Crisciani added 2 commits January 23, 2018 14:22
diff: hashicorp/memberlist@v0.1.0...master

Relevant changes:
 - Calculate the dial timeout using the deadline
 - Reduce the LAN min suspicion multiplier
 - Fix a deadlock in the memberlist shutdown process

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
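
As an illustration of the first bullet above (a sketch under assumed names, not the actual memberlist code): deriving the dial timeout from an absolute deadline rather than using a fixed value.

package main

import (
	"fmt"
	"time"
)

// dialTimeout derives the timeout from an absolute deadline, falling back to
// a default when no deadline is set or it has already passed (simplified).
func dialTimeout(deadline time.Time, fallback time.Duration) time.Duration {
	if deadline.IsZero() {
		return fallback
	}
	if remaining := time.Until(deadline); remaining > 0 {
		return remaining
	}
	return fallback
}

func main() {
	deadline := time.Now().Add(2 * time.Second)
	fmt.Println(dialTimeout(deadline, 10*time.Second))
}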
Avoid waiting for a double notification once a node rejoins; just
put it back into the active state. Waiting for a further message does not
really add anything to the safety of the operation: the source of truth
for the node status resides inside memberlist.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
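
A minimal sketch of the behavior this commit describes (hypothetical names, not the actual event_delegate.go code): on a join notification, a node currently tracked as failed is moved straight back to the active set instead of waiting for a second message.

package main

import "fmt"

type cluster struct {
	active map[string]bool
	failed map[string]bool
}

// notifyJoin reinstates a previously failed node as soon as memberlist
// reports the join, without waiting for any further notification.
func (c *cluster) notifyJoin(name string) {
	if c.failed[name] {
		delete(c.failed, name)
	}
	c.active[name] = true
}

func main() {
	c := &cluster{
		active: map[string]bool{},
		failed: map[string]bool{"node-2": true},
	}
	c.notifyJoin("node-2")
	fmt.Println(c.active["node-2"], c.failed["node-2"]) // true false
}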
@fcrisciani (Author)

Rebased on top of the latest master; the e2e tests passed.
