
[Backport 17.06] NetworkDB qlen optimization #2379

Merged: 1 commit into moby:bump_17.06 on Jun 25, 2019

Conversation

trapier

@trapier trapier commented May 17, 2019

Backport of #2216 to 17.06. This should be considered a starting point: it behaves as expected, but needs further review from a logic standpoint.

```
git checkout -b 17.06-netdb-qlen-issue origin/bump_17.06
git cherry-pick -sx 5ed38221164e04c78d20f17839aab52fafd7fe88
```

Cherry-Pick Conflicts

| Location | Conflict Reason | Resolution |
| --- | --- | --- |
| networkdb/broadcast.go:113, networkdb/broadcast.go:171 | `node` was removed by #2216. `nDB.config.NodeName` was changed to `nDB.config.NodeID` via #1936 and was then removed in #2216. | "ours". Keeping this message in, since I don't know when and why it became unnecessary. See 5ed3822#r200441648. This decision is amenable to further research into why #2216, which made networkdb more efficient, removed it. |
| networkdb/cluster.go:30 | Order of variables in the patch context does not match, following #2169, which backported #2134. | "both". Keep `rejoinClusterDuration` and `rejoinInterval` as in 17.06 and bring in `maxQueueLenBroadcastOnSync` from #2216. |
| networkdb/delegate.go:277 | `reapEntryInterval` moved to `networkdb.go:Config` in #1957. | Stay with the 17.06 const convention for `reapEntryInterval`. |

Logical Conflicts

| Location | Conflict Reason | Resolution |
| --- | --- | --- |
| networkdb/delegate.go:179 | #2040 is not in 17.06; taking this line would cause `fatal error: sync: Unlock of unlocked RWMutex`. | "ours". There is no need to unlock a lock that 17.06 does not take. Do not take this line from #2216. |
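
For context on that resolution: in Go, calling `Unlock` on a `sync.RWMutex` that is not locked is an unrecoverable fatal runtime error, which is exactly the crash quoted above. The sketch below shows the invariant the resolution preserves; the type and handler names are illustrative, not the actual libnetwork code.

```go
package main

import (
	"fmt"
	"sync"
)

// nDBSketch stands in for NetworkDB's embedded RWMutex. The rule the
// conflict resolution preserves: every Unlock must pair with exactly one
// prior Lock on the same code path. Taking #2216's Unlock into 17.06,
// which never takes the corresponding Lock at delegate.go:179, would
// trigger "fatal error: sync: Unlock of unlocked RWMutex" at runtime.
type nDBSketch struct {
	sync.RWMutex
	entries map[string]string
}

// handleTableEvent shows the balanced lock/unlock pattern.
func (db *nDBSketch) handleTableEvent(key, value string) {
	db.Lock()
	defer db.Unlock() // paired with the Lock above
	db.entries[key] = value
}

func main() {
	db := &nDBSketch{entries: map[string]string{}}
	db.handleTableEvent("ep1", "10.0.0.2")
	fmt.Println(db.entries["ep1"])
}
```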

How I tested

  1. Vendor this PR onto github.com/docker/docker-ee tag v17.06.2-ee-21 and change the networkdb StatsPrintPeriod from 5 minutes to 15 seconds to improve observability in the test.

  2. In a separate build, backport the networkdb diagnostic API to libnetwork 9a47f8d and docker-ee cd8d740be0c048f156438d21975af753ac1b5225 (v17.06.2-ee-20). Use this build to stress networkdb in steps 6 and 8.

  3. Bring up a 31-node dind cluster (one manager, 30 workers) with the build from step 1.

  4. Join one worker on the 17.06 build from step 2, which has the networkdb diagnostic API.

  5. Create an overlay network and deploy a global-mode service to it.

  6. Use the diagnostic API to introduce a "static load" of 1400 additional entries, half in endpoint_table, and half in overlay_peer_table on the test network.

  7. Wait for test network qLen to reach 0 on all nodes according to NetworkDB stats in dockerd logs.

  8. Use the diagnostic node to introduce a "dynamic load" by adding and 0.5s later removing an endpoint_table entry, doing this three times a second continuously. This seems to be about the max rate the cluster can sustain under otherwise quiet conditions before qLen begins to grow without bound.

  9. Restart one of the non-diagnostic workers, and observe its qLen. Prior to this PR, qLen on the joining node initially rises to the total number of entries, then steadily increases from there over minutes. With this PR qLen on the joining node matches that of other nodes on the first sample after rejoining (within 15s StatsPrintPeriod).

Test tooling

Scripts developed for testing this PR are provided below for reference. They probably won't work flawlessly in your environment.

test.sh.txt brings up the dind cluster. Provide the argument `down` to remove the nodes.

stressndb.sh.txt requires an argument: either `static` to introduce the 1400 entries or `dynamic` to produce the load noted in steps 6 and 8 above, respectively.

Added some optimizations to reduce the messages in the queue:
1) On joining a network, the node executes a TCP sync with all the nodes
that it is aware are part of that network. Previously, the node
redistributed all the entries during this time. This meant that if the
network had 10K entries, the queue of the joining node would jump to 10K.
The fix adds a flag on the network that avoids inserting any entry into
the queue until the sync happens. Note that right now the flag is set in
a best-effort way; there is no real check that at least one of the nodes
succeeded.
2) Limit the number of messages to redistribute coming from a TCP sync.
Introduced a threshold that limits the number of messages that are
propagated; this disables the optimization in case of heavy load.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 5ed3822)
Signed-off-by: Trapier Marshall <trapier.marshall@docker.com>
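
The two optimizations in the commit message can be sketched as follows. This is a simplified illustration, not the actual libnetwork code: the `inSync` and `queue` fields and the flow around them are my own, with only `maxQueueLenBroadcastOnSync` taken from the PR's vocabulary.

```go
package main

import "fmt"

// maxQueueLenBroadcastOnSync caps how many entries may sit in the queue
// before entries learned through a TCP bulk sync stop being re-queued for
// gossip (optimization 2); the exact threshold value here is assumed.
const maxQueueLenBroadcastOnSync = 500

// networkSketch models the per-network state the fix touches.
type networkSketch struct {
	inSync bool     // set (best effort) while the joining node's bulk sync runs
	queue  []string // stand-in for the memberlist broadcast queue
}

// enqueueBroadcast shows optimization 1: while the initial sync is in
// progress, entries learned from it are not re-broadcast, so a 10K-entry
// network no longer pushes the joining node's queue to 10K.
func (n *networkSketch) enqueueBroadcast(entry string) {
	if n.inSync {
		return // peers already have this entry; skip gossip
	}
	if len(n.queue) >= maxQueueLenBroadcastOnSync {
		return // optimization 2: threshold reached, stop re-propagating
	}
	n.queue = append(n.queue, entry)
}

func main() {
	n := &networkSketch{inSync: true}
	n.enqueueBroadcast("entry-from-sync") // dropped: sync in progress
	n.inSync = false
	n.enqueueBroadcast("entry-after-sync") // queued normally
	fmt.Println(len(n.queue))
}
```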
@thaJeztah
Member

Haven't checked all the details, but would it be better to include some of the additional commits that caused the conflicts (e.g. include #1936 so that the cherry-pick will be clean)?

@trapier
Author

trapier commented May 24, 2019

Don't have enough experience with this code base to know what octopuses lurk behind the intervening PRs. I suppose it would make sense to start down that path and see what kind of hair starts growing. Should I do that on a separate PR so we can track progress and compare the results?

@thaJeztah
Member

😂 yes, if you could give it a quick go and push it separately, that'd be useful.

I do understand the concerns, and yes, we should check carefully that there's nothing in the cherry-picks that we don't want; it's just that having fewer modifications in the branch could help future cherry-picks / backports.

Contributor

@arkodg arkodg left a comment


LGTM

@arkodg
Contributor

arkodg commented Jun 21, 2019

PTAL @euanh

@euanh
Contributor

euanh commented Jun 24, 2019

@thaJeztah, @trapier Did you come to any conclusions about whether it's worthwhile to backport the intermediate commits to make future merges easier?

@trapier
Author

trapier commented Jun 24, 2019

I have not followed through on my suggestion to take another branch and investigate the cost of taking the interim PRs. Other accountabilities currently preclude my proceeding with this investigation.

For whatever it's worth, my sense is that there isn't currently, and before its EOL won't be, great demand for additional backports to 17.06. EE Engine 17.06 is currently scheduled for support EOL on 2020-04-16.

@euanh euanh merged commit c4e81a0 into moby:bump_17.06 Jun 25, 2019