-
Notifications
You must be signed in to change notification settings - Fork 880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diagnostic client #2032
Diagnostic client #2032
Conversation
fcrisciani
commented
Dec 8, 2017
•
edited
Loading
edited
- decode internal state of service discovery and overlay
- remediate in case of orphan entries
- realign naming with moby/moby api
- GetEntry should return error if the key is under deletion (added unit test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Had a couple of minor comments. One q: can the remediation be triggered automatically once in a while as part of a "garbage collection sweep" or is that too dangerous?
diagnose/client/main.go
Outdated
if err != nil { | ||
logrus.WithError(err).Fatalf("The connection failed") | ||
} | ||
httpIsOk(resp.Body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can directly check resp.StatusCode
for 200/2xx codes which indicate HTTP OK rather than parsing resp.body
for OK
: https://golangcode.com/get-the-http-response-status-code/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think can be a good enhancement, right now the server side did not really set any specific error code, but it would be good to implement a more proper api with error codes
diagnose/client/main.go
Outdated
} | ||
httpIsOk(resp.Body) | ||
|
||
clsuterPeers := fetchNodePeers(*ipPtr, *portPtr, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clsuster
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Makefile
Outdated
@@ -52,6 +53,7 @@ cross-local: | |||
@echo "🐳 $@" | |||
go build -o "bin/dnet-$$GOOS-$$GOARCH" ./cmd/dnet | |||
go build -o "bin/docker-proxy-$$GOOS-$$GOARCH" ./cmd/proxy | |||
go build -o "bin/dignosticClient-$$GOOS-$$GOARCH" ./diagnose/client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why put it in /diagnose/ and not /cmd/ ? like the others ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep moved
diagnose/client/main.go
Outdated
for _, k := range orphanKeys { | ||
resp, err := http.Get(fmt.Sprintf(deleteEntry, ip, port, network, tableName, k)) | ||
if err != nil { | ||
logrus.WithError(err).Fatalf("Failed deleting entry k:%s", k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really want to fatal here ? since it will stop the program, it won't delete the remaining entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, changed in error
84493e3
to
989af94
Compare
Codecov Report
@@ Coverage Diff @@
## master #2032 +/- ##
=========================================
Coverage ? 40.46%
=========================================
Files ? 138
Lines ? 22172
Branches ? 0
=========================================
Hits ? 8971
Misses ? 11884
Partials ? 1317
Continue to review full report at Codecov.
|
@fcrisciani can you include a README / Markdown file in this PR? Misty started working on one in docker/docs#5558, and can be used as a starting point |
3ba06ad
to
ef512ba
Compare
@thaJeztah @ddebroy @vieux can you guys give another pass? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a couple of minor string nits
cmd/diagnostic/README.md
Outdated
A message like the following will appear in the Docker host logs: | ||
|
||
```none | ||
Starting the diagnose server listening on <port> for commands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the string diagnose
be diagnostics
?
cmd/diagnostic/README.md
Outdated
A message like the following will appear in the Docker host logs: | ||
|
||
```none | ||
Disabling the diagnose server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the string diagnose
be diagnostics
?
ef512ba
to
ed0721b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@thaJeztah can you also give the final blessing? |
766b46b
to
5b9614f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with a couple of minor comments.
diagnostic/server.go
Outdated
// IsDebugEnable returns true when the debug is enabled | ||
func (s *Server) IsDebugEnable() bool { | ||
// IsDiagnosticEnable returns true when the debug is enabled | ||
func (s *Server) IsDiagnosticEnable() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: how about changing the name to IsDiagnosticEnable**d**
like in controller
func (c *controller) IsDiagnosticEnabled() bool {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
networkdb/networkdb.go
Outdated
@@ -328,6 +328,9 @@ func (nDB *NetworkDB) GetEntry(tname, nid, key string) ([]byte, error) { | |||
if err != nil { | |||
return nil, err | |||
} | |||
if entry != nil && entry.deleting { | |||
return nil, types.NotFoundErrorf("entry not found in table %s with network id %s and key %s", tname, nid, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Masking the existence into a "not found" message may be confusing if this message makes into logs but say the actual deletion does not trigger for a while. How about making the message something along the lines of: "entry found but in deleting state. returning not found" to keep things super clear for support engineers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing in: return nil, types.NotFoundErrorf("entry in table %s network id %s and key %s deleted and pending garbage collection", tname, nid, key)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments 🤗
cmd/diagnostic/Dockerfile
Outdated
@@ -0,0 +1,8 @@ | |||
FROM docker:17.12-rc-dind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-rc tag 17.12-dind
is now available (https://hub.docker.com/_/docker/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking why we needed the dind
for this tool, but this is so that we can use it for older daemons, which do not have this functionality built-in, correct?
Should we have two versions of the image? one "minimal" (just the binary), and one dind
?
cmd/diagnostic/Dockerfile
Outdated
|
||
RUN apk add --no-cache curl | ||
|
||
WORKDIR /tool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of installing in /tool
, you could just install in /usr/local/bin
cmd/diagnostic/README.md
Outdated
**Standalone network:** | ||
|
||
```bash | ||
$ debugClient -c sd -v -net n8a8ie6tb3wr2e260vxj8ncy4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/debugClient/diagnosticClient/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
cmd/diagnostic/README.md
Outdated
**Overlay network:** | ||
|
||
```bash | ||
$ debugClient -port 2001 -c overlay -v -net n8a8ie6tb3wr2e260vxj8ncy4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/debugClient/diagnosticClient/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
cmd/diagnostic/Dockerfile
Outdated
WORKDIR /tool | ||
|
||
COPY daemon.json /etc/docker/daemon.json | ||
COPY diagnosticClient /tool/diagnosticClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a follow-up; we should make this Dockerfile a multi-stage build, and actually build the client in it (I had a branch with that, but looks like I didn't push, and it's not on my laptop 😊
Remember that table operations have ownership, so any `create entry` will be persistent till | ||
the diagnostic container is part of the swarm. | ||
|
||
1. Make sure that the node where the diagnostic client will run is not part of the swarm, if so do `docker swarm leave -f` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need different steps for 17.12+ daemons (as they don't have to leave the swarm, just the diagnostic client to connect to them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the container version so you will run the dind version and you will need to leave the swarm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood; thinking if we should have instructions (also using a containerised version) to use for 17.12 daemons (just bind-mounting /var/run/docker.sock e.g. and connecting to the running daemon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the containerized version of it, as dockereng/network-diagnostic:onlyclient
that can be used as it with --net host, no need for the docker.sock
cmd/diagnostic/README.md
Outdated
2. To run the container, use a command like the following: | ||
|
||
```bash | ||
$ docker container run --name net-diagnostic -d --privileged --network host fcrisciani/network-diagnostic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put the image under an official organization? (dockereng/
or dockercore/
e.g.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
198046b
to
2ccffde
Compare
Note to myself, after this gets merged the moby vendoring requires code change due to changes in method names |
Align it to the moby/moby external api Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- the client allows to talk to the diagnostic server and decode the internal values of the overlay and service discovery - the tool also allows to remediate in case of orphans entries - added README Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2ccffde
to
be91c3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but left two suggestions 👍
`HUP` signal to the PID you found in the previous step. | ||
|
||
```bash | ||
kill -HUP <pid-of-dockerd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this could be killall -HUP dockerd
`HUP` signal to the PID you found in the previous step. | ||
|
||
```bash | ||
kill -HUP <pid-of-dockerd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM