[Backport 17.06] NetworkDB qlen optimization #2379
Conversation
Added some optimizations to reduce the number of messages in the queue:
1) On network join, the node executes a TCP sync with all the nodes that it is aware are part of the specific network. Previously, the node also redistributed all the entries it learned during this window, so if the network had 10K entries the queue of the joining node would jump to 10K. The fix adds a flag on the network that avoids inserting any entry in the queue until the sync happens. Note that right now the flag is set in a best-effort way; there is no real check that at least one of the nodes succeeded.
2) Limit the number of messages to redistribute coming from a TCP sync. Introduced a threshold that limits the number of messages that are propagated, which disables this optimization under heavy load.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 5ed3822)
Signed-off-by: Trapier Marshall <trapier.marshall@docker.com>
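As a reading aid, here is a minimal Go sketch of the two ideas in the commit message: a per-network flag that suppresses queueing of entries until the join-time bulk sync has been attempted, and a queue-length threshold above which entries learned via a TCP sync are not re-broadcast. The type, field, and function names and the threshold value are illustrative assumptions, not the actual networkdb code; only the constant name `maxQueueLenBroadcastOnSync` comes from the change itself.

```go
// Minimal sketch of the two queue optimizations; all names except
// maxQueueLenBroadcastOnSync, and the threshold value itself, are illustrative.
package main

import "fmt"

const maxQueueLenBroadcastOnSync = 500 // assumed value, for illustration only

type network struct {
	inSync   bool // set (best effort) once the join-time TCP bulk sync is issued
	queueLen int  // number of messages currently queued for gossip
}

// shouldRebroadcast decides whether an incoming table entry is placed on the
// broadcast queue for redistribution to other nodes.
func (n *network) shouldRebroadcast(fromBulkSync bool) bool {
	// 1) While the joining node has not yet synced, queue nothing: the peers
	//    it would gossip to already hold these entries.
	if !n.inSync {
		return false
	}
	// 2) Entries learned through a TCP bulk sync are only re-propagated while
	//    the queue is short; under heavy load this optimization is skipped.
	if fromBulkSync && n.queueLen > maxQueueLenBroadcastOnSync {
		return false
	}
	return true
}

func main() {
	n := &network{}
	fmt.Println(n.shouldRebroadcast(false)) // false: still syncing, nothing is queued
	n.inSync, n.queueLen = true, 10
	fmt.Println(n.shouldRebroadcast(true)) // true: queue is below the threshold
}
```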
Haven't checked all the details, but would it be better to include some of the additional commits that caused the conflicts (e.g. include #1936 so that the cherry-pick will be clean)?
Don't have enough experience with this code base to know what octopuses lurk behind the intervening PRs. I suppose it would make sense to start down that path and see what kind of hair starts growing. Should I do that on a separate PR so we can track progress and compare the results?
😂 yes, if you could give it a quick go and push it separately, that'd be useful. I do understand the concerns, and yes, we should look carefully that there's nothing in the cherry-picks that we don't want; it's just that it could help future cherry-picks / backports if we have fewer modifications in the branch.
LGTM
PTAL @euanh
@thaJeztah, @trapier Did you come to any conclusions about whether it's worthwhile to backport the intermediate commits to make future merges easier?
I have not followed through on my suggestion to take another branch and investigate the cost of taking the interim PRs. Other accountabilities currently preclude my ability to proceed with this investigation. For whatever it's worth, my sense is there isn't currently, and before its EOL won't be, a great demand for additional backports to 17.06. EE Engine 17.06 is currently scheduled for support EOL on 2020-04-16.
Backport of #2216 to 17.06. This should be considered a starting point. Behaves as expected. Needs further review from a logic standpoint.
Cherry-Pick Conflicts
- `networkdb/broadcast.go:113`, `networkdb/broadcast.go:171`: `node` was removed by #2216. `nDB.config.NodeName` changed to `nDB.config.NodeID` via #1936 and was then removed in #2216.
- `networkdb/cluster.go:30`: keep `rejoinClusterDuration` and `rejoinInterval` as in 17.06 and bring in `maxQueueLenBroadcastOnSync` from #2216 (see the sketch after this list).
- `networkdb/delegate.go:277`: `reapEntryInterval` moved to `networkdb.go:Config` in #1957.
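To make the `networkdb/cluster.go:30` resolution concrete, the sketch below shows how the merged constant block could look: the 17.06 rejoin constants stay and the threshold comes in from #2216. The durations and the threshold value shown are assumptions for illustration and were not verified against either branch.

```go
package networkdb

import "time"

const (
	// Rejoin constants kept as they are in the 17.06 branch (values assumed).
	rejoinClusterDuration = 10 * time.Second
	rejoinInterval        = 60 * time.Second

	// Brought in from #2216: queued-broadcast count above which entries
	// learned via a TCP bulk sync are not re-propagated (value assumed).
	maxQueueLenBroadcastOnSync = 500
)
```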
Logical Conflicts
- `networkdb/delegate.go:179`: `fatal error: sync: Unlock of unlocked RWMutex` (illustrated below).
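The entry above only records the resulting crash, not the offending code. As a hedged illustration of the kind of mis-resolved merge that produces it, the snippet below releases the lock on an early-return path and then releases it again, which makes the Go runtime abort with exactly this fatal error. The type and method names are made up and are not the code at `delegate.go:179`.

```go
package main

import "sync"

type networkDB struct {
	sync.RWMutex
	entries map[string]string
}

// handleTableEvent shows the double-unlock shape: the early-return branch has
// already dropped the lock, and a trailing Unlock left behind by a bad merge
// drops it a second time.
func (nDB *networkDB) handleTableEvent(key string) {
	nDB.Lock()
	if _, ok := nDB.entries[key]; !ok {
		nDB.Unlock()
		return
	}
	nDB.Unlock()
	// Leftover from a mis-resolved conflict: the mutex is already unlocked,
	// so the runtime aborts the whole process here.
	nDB.Unlock() // fatal error: sync: Unlock of unlocked RWMutex
}

func main() {
	db := &networkDB{entries: map[string]string{"present": "value"}}
	db.handleTableEvent("present")
}
```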
How I tested
1. Vendor this PR onto github.com/docker/docker-ee tag v17.06.2-ee-21 and change the networkdb `StatsPrintPeriod` from 5 minutes to 15 seconds to improve observability in test.
2. In a separate build, backport the networkdb diagnostic API to libnetwork 9a47f8d and docker-ee cd8d740be0c048f156438d21975af753ac1b5225 (v17.06.2-ee-20). Use this build to stress networkdb in steps 6 and 8.
3. Bring up a 31-node dind cluster (one manager, 30 workers) with the build from step 1.
4. Join one worker on the 17.06 build from step 2 with the networkdb diagnostic API.
5. Create an overlay network and deploy a mode global service to it.
6. Use the diagnostic API to introduce a "static load" of 1400 additional entries, half in `endpoint_table` and half in `overlay_peer_table`, on the test network.
7. Wait for the test network qLen to reach 0 on all nodes according to `NetworkDB stats` in the dockerd logs.
8. Use the diagnostic node to introduce a "dynamic load" by adding and, 0.5s later, removing an `endpoint_table` entry, doing this three times a second continuously. This seems to be about the maximum rate the cluster can sustain under otherwise quiet conditions before qLen begins to grow without bound (a sketch approximating this load follows these steps).
9. Restart one of the non-diagnostic workers and observe its qLen. Prior to this PR, qLen on the joining node initially rises to the total number of entries, then steadily increases from there over minutes. With this PR, qLen on the joining node matches that of the other nodes on the first sample after rejoining (within the 15s `StatsPrintPeriod`).
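For reference, the "dynamic load" in step 8 can be approximated with a small program along the lines of the sketch below: create an `endpoint_table` entry and delete it 0.5s later, three times per second. The diagnostic server address, the `/createentry` and `/deleteentry` paths, and the query parameter names are assumptions made for illustration; the actual runs used the stressndb.sh script listed under Test tooling.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

const diagAddr = "http://127.0.0.1:2000" // assumed address of the networkdb diagnostic server

// call issues one diagnostic request and discards the response body.
func call(path string, params url.Values) error {
	resp, err := http.Get(diagAddr + path + "?" + params.Encode())
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}

func main() {
	nid := "testnetworkid"                    // assumed network ID of the test overlay network
	ticker := time.NewTicker(time.Second / 3) // roughly three create/delete cycles per second
	defer ticker.Stop()

	for i := 0; ; i++ {
		<-ticker.C
		// Assumed parameter names for the entry operations.
		params := url.Values{
			"tname": {"endpoint_table"},
			"nid":   {nid},
			"key":   {fmt.Sprintf("stress-%d", i)},
			"value": {"dGVzdA=="},
		}
		if err := call("/createentry", params); err != nil {
			fmt.Println("create failed:", err)
			continue
		}
		go func(p url.Values) {
			time.Sleep(500 * time.Millisecond) // remove the entry 0.5s after creating it
			if err := call("/deleteentry", p); err != nil {
				fmt.Println("delete failed:", err)
			}
		}(params)
	}
}
```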
Test tooling
Scripts developed for testing this PR are provided for reference below. They probably won't work flawlessly in your environment.
- test.sh.txt brings up the dind cluster; provide the argument `down` to remove the nodes.
- stressndb.sh.txt requires the argument `static` to introduce 1400 entries or `dynamic` to produce the load noted in steps 6 and 8 above, respectively.