Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diagnostic client #2032

Merged
merged 2 commits into from
Jan 29, 2018
Merged

Diagnostic client #2032

merged 2 commits into from
Jan 29, 2018

Conversation

fcrisciani
Copy link

@fcrisciani fcrisciani commented Dec 8, 2017

  1. decode internal state of service discovery and overlay
  2. remediate in case of orphan entries
  3. realign naming with moby/moby api
  4. GetEntry should return error if the key is under deletion (added unit test)

Copy link
Contributor

@ddebroy ddebroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Had a couple of minor comments. One q: can the remediation be triggered automatically once in a while as part of a "garbage collection sweep" or is that too dangerous?

if err != nil {
logrus.WithError(err).Fatalf("The connection failed")
}
httpIsOk(resp.Body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can directly check resp.StatusCode for 200/2xx codes which indicate HTTP OK rather than parsing resp.body for OK: https://golangcode.com/get-the-http-response-status-code/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think can be a good enhancement, right now the server side did not really set any specific error code, but it would be good to implement a more proper api with error codes

}
httpIsOk(resp.Body)

clsuterPeers := fetchNodePeers(*ipPtr, *portPtr, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clsuster typo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Makefile Outdated
@@ -52,6 +53,7 @@ cross-local:
@echo "🐳 $@"
go build -o "bin/dnet-$$GOOS-$$GOARCH" ./cmd/dnet
go build -o "bin/docker-proxy-$$GOOS-$$GOARCH" ./cmd/proxy
go build -o "bin/dignosticClient-$$GOOS-$$GOARCH" ./diagnose/client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why put it in /diagnose/ and not /cmd/ ? like the others ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep moved

for _, k := range orphanKeys {
resp, err := http.Get(fmt.Sprintf(deleteEntry, ip, port, network, tableName, k))
if err != nil {
logrus.WithError(err).Fatalf("Failed deleting entry k:%s", k)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really want to fatal here ? since it will stop the program, it won't delete the remaining entries.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, changed in error

@fcrisciani fcrisciani force-pushed the debug-client branch 2 times, most recently from 84493e3 to 989af94 Compare December 14, 2017 23:14
@codecov-io
Copy link

codecov-io commented Dec 14, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@862df3a). Click here to learn what that means.
The diff coverage is 10.28%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #2032   +/-   ##
=========================================
  Coverage          ?   40.46%           
=========================================
  Files             ?      138           
  Lines             ?    22172           
  Branches          ?        0           
=========================================
  Hits              ?     8971           
  Misses            ?    11884           
  Partials          ?     1317
Impacted Files Coverage Δ
networkdb/networkdbdiagnostic.go 0% <0%> (ø)
agent.go 5.02% <0%> (ø)
networkdb/networkdb.go 67.8% <100%> (ø)
controller.go 36.74% <52.94%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 862df3a...be91c3e. Read the comment docs.

@thaJeztah
Copy link
Member

@fcrisciani can you include a README / Markdown file in this PR? Misty started working on one in docker/docs#5558, and can be used as a starting point

@fcrisciani fcrisciani force-pushed the debug-client branch 2 times, most recently from 3ba06ad to ef512ba Compare January 9, 2018 22:51
@fcrisciani
Copy link
Author

fcrisciani commented Jan 9, 2018

@thaJeztah @ddebroy @vieux can you guys give another pass?

Copy link
Contributor

@ddebroy ddebroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a couple of minor string nits

A message like the following will appear in the Docker host logs:

```none
Starting the diagnose server listening on <port> for commands
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the string diagnose be diagnostics ?

A message like the following will appear in the Docker host logs:

```none
Disabling the diagnose server
Copy link
Contributor

@ddebroy ddebroy Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the string diagnose be diagnostics ?

Copy link
Contributor

@ddebroy ddebroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fcrisciani
Copy link
Author

@thaJeztah can you also give the final blessing?

@fcrisciani fcrisciani changed the title Diagnose client Diagnostic client Jan 23, 2018
@fcrisciani fcrisciani force-pushed the debug-client branch 3 times, most recently from 766b46b to 5b9614f Compare January 24, 2018 18:55
Copy link
Contributor

@ddebroy ddebroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with a couple of minor comments.

// IsDebugEnable returns true when the debug is enabled
func (s *Server) IsDebugEnable() bool {
// IsDiagnosticEnable returns true when the debug is enabled
func (s *Server) IsDiagnosticEnable() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: how about changing the name to IsDiagnosticEnable**d** like in controller
func (c *controller) IsDiagnosticEnabled() bool {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -328,6 +328,9 @@ func (nDB *NetworkDB) GetEntry(tname, nid, key string) ([]byte, error) {
if err != nil {
return nil, err
}
if entry != nil && entry.deleting {
return nil, types.NotFoundErrorf("entry not found in table %s with network id %s and key %s", tname, nid, key)
Copy link
Contributor

@ddebroy ddebroy Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Masking the existence into a "not found" message may be confusing if this message makes into logs but say the actual deletion does not trigger for a while. How about making the message something along the lines of: "entry found but in deleting state. returning not found" to keep things super clear for support engineers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing in: return nil, types.NotFoundErrorf("entry in table %s network id %s and key %s deleted and pending garbage collection", tname, nid, key)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments 🤗

@@ -0,0 +1,8 @@
FROM docker:17.12-rc-dind
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-rc tag 17.12-dind is now available (https://hub.docker.com/_/docker/)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking why we needed the dind for this tool, but this is so that we can use it for older daemons, which do not have this functionality built-in, correct?

Should we have two versions of the image? one "minimal" (just the binary), and one dind?


RUN apk add --no-cache curl

WORKDIR /tool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of installing in /tool, you could just install in /usr/local/bin

**Standalone network:**

```bash
$ debugClient -c sd -v -net n8a8ie6tb3wr2e260vxj8ncy4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/debugClient/diagnosticClient/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

**Overlay network:**

```bash
$ debugClient -port 2001 -c overlay -v -net n8a8ie6tb3wr2e260vxj8ncy4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/debugClient/diagnosticClient/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

WORKDIR /tool

COPY daemon.json /etc/docker/daemon.json
COPY diagnosticClient /tool/diagnosticClient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a follow-up; we should make this Dockerfile a multi-stage build, and actually build the client in it (I had a branch with that, but looks like I didn't push, and it's not on my laptop 😊

Remember that table operations have ownership, so any `create entry` will be persistent till
the diagnostic container is part of the swarm.

1. Make sure that the node where the diagnostic client will run is not part of the swarm, if so do `docker swarm leave -f`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need different steps for 17.12+ daemons (as they don't have to leave the swarm, just the diagnostic client to connect to them)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the container version so you will run the dind version and you will need to leave the swarm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood; thinking if we should have instructions (also using a containerised version) to use for 17.12 daemons (just bind-mounting /var/run/docker.sock e.g. and connecting to the running daemon)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the containerized version of it, as dockereng/network-diagnostic:onlyclient that can be used as it with --net host, no need for the docker.sock

2. To run the container, use a command like the following:

```bash
$ docker container run --name net-diagnostic -d --privileged --network host fcrisciani/network-diagnostic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put the image under an official organization? (dockereng/ or dockercore/ e.g.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@fcrisciani fcrisciani force-pushed the debug-client branch 3 times, most recently from 198046b to 2ccffde Compare January 25, 2018 23:53
@fcrisciani
Copy link
Author

Note to myself, after this gets merged the moby vendoring requires code change due to changes in method names

Flavio Crisciani added 2 commits January 25, 2018 16:09
Align it to the moby/moby external api

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- the client allows to talk to the diagnostic server and
decode the internal values of the overlay and service discovery

- the tool also allows to remediate in case of orphans entries

- added README

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but left two suggestions 👍

`HUP` signal to the PID you found in the previous step.

```bash
kill -HUP <pid-of-dockerd>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could be killall -HUP dockerd

`HUP` signal to the PID you found in the previous step.

```bash
kill -HUP <pid-of-dockerd>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor

@vieux vieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fcrisciani fcrisciani merged commit 20dd462 into moby:master Jan 29, 2018
@fcrisciani fcrisciani deleted the debug-client branch January 29, 2018 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants