Document network diagnostic tool #5558

Closed
wants to merge 3 commits into from
2 changes: 2 additions & 0 deletions _data/toc.yaml
@@ -222,6 +222,8 @@ guides:
title: Get started with macvlan network driver
- path: /engine/userguide/networking/overlay-security-model/
title: Swarm mode overlay network security model
- path: /engine/userguide/networking/network-debug/
title: Debug overlay or swarm networking issues
- path: /engine/userguide/networking/configure-dns/
title: Configure container DNS in user-defined networks
- sectiontitle: Default bridge network
219 changes: 219 additions & 0 deletions engine/userguide/networking/network-debug.md
@@ -0,0 +1,219 @@
---
description: Learn to use the built-in network debugger to debug overlay networking problems
keywords: network, troubleshooting, debug
title: Debug overlay or swarm networking issues
---

Docker CE 17.12 and higher introduce a network debugging tool designed to help
debug issues with overlay networks and swarm services running on Linux hosts.
When enabled, a network diagnostic server listens on the specified port and
provides diagnostic information. The network debugging tool should only be
started to debug specific issues, and should not be left running all the time.

Information about networks is stored in a database, which can be examined using
the API.

Review comment (Contributor):

stored in a database: how about a little more details around the networkdb:

  1. What kind of information is stored?
  2. Mention that the database is shared/clustered across all nodes.

The Docker API exposes endpoints to query and control the network debugging
tool. CLI integration is provided as a preview, but the implementation is not
yet considered stable, and commands and options may change without notice.

Review comment (Member):

We should add a big fat warning that this tool should be used with care, as (IIUC) incorrect use of the tool can destroy/damage the network (it's not a read-only tool), and also expose information about your cluster's configuration that should be kept private (so don't expose this API outside of the host).


## Enable the diagnostic tool

The tool currently only works on Docker hosts running on Linux. Repeat these
steps for each node participating in the swarm.

1. In the `/etc/docker/daemon.json` configuration file, set
`network-diagnostic-port` to a port that is free on the Docker host.

```json
"network-diagnostic-port": <port>
```

2. Get the process ID (PID) of the `dockerd` process. It is the second field in
the output, and is typically a number from 2 to 6 digits long.

```bash
$ ps aux |grep dockerd | grep -v grep
```

Review comment (Member):

If running with systemd (perhaps it's ok to assume that's the case), it's possible to use `systemctl reload docker`.

3. Reload the Docker configuration without restarting Docker, by sending the
`HUP` signal to the PID you found in the previous step.

```bash
$ kill -HUP <pid-of-dockerd>
```

A message like the following will appear in the Docker host logs:

```none
Starting the diagnose server listening on <port> for commands
```
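
The three steps above can be sketched as a pair of shell functions. This is a minimal sketch rather than the documented procedure: both function names are hypothetical, `write_diag_config` assumes the target file holds no other daemon settings (it overwrites the file), and `pgrep` is assumed to be installed on the host.

```bash
# Hypothetical helper: write a daemon.json that only sets the diagnostic port.
# WARNING: overwrites the target file; merge by hand if it has other settings.
write_diag_config() {
  config_file="$1"   # for example /etc/docker/daemon.json
  port="$2"          # a free TCP port on this host
  printf '{\n  "network-diagnostic-port": %s\n}\n' "$port" > "$config_file"
}

# Hypothetical helper: send HUP to dockerd so it rereads its configuration.
reload_dockerd() {
  pid="$(pgrep -x dockerd | head -n 1)"
  if [ -z "$pid" ]; then
    echo "dockerd does not appear to be running" >&2
    return 1
  fi
  kill -HUP "$pid"
}
```

Run as root on each swarm node, a call might look like `write_diag_config /etc/docker/daemon.json 2000 && reload_dockerd`.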

## Disable the diagnostic tool

Repeat these steps for each node participating in the swarm.

1. Remove the `network-diagnostic-port` key from the `/etc/docker/daemon.json`
configuration file.

2. Get the process ID (PID) of the `dockerd` process. It is the second field in
the output, and is typically a number from 2 to 6 digits long.

```bash
$ ps aux |grep dockerd | grep -v grep
```

3. Reload the Docker configuration without restarting Docker, by sending the
`HUP` signal to the PID you found in the previous step.

```bash
$ kill -HUP <pid-of-dockerd>
```

A message like the following will appear in the Docker host logs:

```none
Disabling the diagnose server
```
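
To confirm which state a node is in, the daemon logs can be searched for the two messages shown in this document. The sketch below assumes the log text matches those messages and that log lines arrive on standard input; the function name `diag_state` is hypothetical.

```bash
# Hypothetical helper: read daemon log lines on stdin and report whether the
# most recent diagnostic-server message was a start or a stop.
diag_state() {
  last="$(grep -E 'Starting the diagnose server|Disabling the diagnose server' | tail -n 1)"
  case "$last" in
    *Starting*)  echo "enabled" ;;
    *Disabling*) echo "disabled" ;;
    *)           echo "unknown" ;;
  esac
}
```

On a host where `dockerd` runs under systemd (an assumption, and the unit name may differ), the logs could be piped in with `journalctl -u docker.service | diag_state`.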

## Access the diagnostic tool's API

The network diagnostic tool exposes its own RESTful API. To access the API,
send an HTTP request to the port where the tool is listening. The following
commands assume the tool is listening on port 2000.

Examples are not given for every endpoint.

### Get help

```bash
$ curl localhost:2000/help

OK
/updateentry
/getentry
/gettable
/leavenetwork
/createentry
/help
/clusterpeers
/ready
/joinnetwork
/deleteentry
/networkpeers
/
/join
```

Review comment (Contributor):

Shouldn't the leave endpoint be listed here too?


### Join or leave the network database cluster

```bash
$ curl "localhost:2000/join?members=ip1,ip2,..."
```

Review comment (Contributor):

It might be good to clarify ip1, ip2. Maybe add a note that they are the IPs of the nodes in the swarm? Ideal would be an example with a `docker node ls` and `docker node inspect` showing the IP addresses.

Review comment (Member):

Yes, I think that's useful (at least for one of the examples)

```bash
$ curl "localhost:2000/leave?members=ip1,ip2,..."
```
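
A reviewer on this pull request suggested that `ip1,ip2,...` are the IP addresses of the swarm nodes. Under that assumption, which the document itself does not confirm, the list could be assembled on a manager node from `docker node inspect`. The helper names below are hypothetical, and the addresses they print should be checked before use.

```bash
# Hypothetical helper: turn one IP per line on stdin into a comma-separated list.
members_csv() {
  paste -sd, -
}

# Hypothetical helper: print the address of every swarm node, one per line.
# Must run on a manager node; reads the Status.Addr field of each node.
swarm_node_ips() {
  docker node ls -q | xargs docker node inspect --format '{{ .Status.Addr }}'
}
```

The two helpers combine into a single request such as `curl "localhost:2000/join?members=$(swarm_node_ips | members_csv)"`.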

### Join or leave a network

```bash
$ curl "localhost:2000/joinnetwork?nid=<network id>"
```

```bash
$ curl "localhost:2000/leavenetwork?nid=<network id>"
```

### List cluster peers

```bash
$ curl localhost:2000/clusterpeers
```

### List nodes connected to a given network

```bash
$ curl "localhost:2000/networkpeers?nid=<network id>"
```
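
The `nid` parameter takes a network ID. One way to look it up from a network name is `docker network inspect`. The sketch below assumes the full ID it prints is the value the diagnostic tool expects; the function name and the network name `my-overlay` are hypothetical.

```bash
# Hypothetical helper: print the full ID of the network named in $1.
network_id() {
  docker network inspect --format '{{ .Id }}' "$1"
}
```

For example, `curl "localhost:2000/networkpeers?nid=$(network_id my-overlay)"` queries the peers of a network called `my-overlay`.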

### Dump database tables

The tables are called `endpoint_table` and `overlay_peer_table`. These names may
change.

```bash
$ curl "localhost:2000/gettable?nid=<network id>&tname=<table name>"
```

### Interact with a specific database table

The tables are called `endpoint_table` and `overlay_peer_table`. These names may
change.

```bash
$ curl "localhost:2000/<method>?nid=<network id>&tname=<table name>&key=<key>[&value=<value>]"
```
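
Because `?` and `&` are special to the shell, a URL with a query string has to reach `curl` as a single quoted argument. A small helper can assemble the URL from an endpoint name and `key=value` pairs. This is a sketch only: `diag_url` and the `DIAG_ADDR` variable are hypothetical names, and values are not URL-encoded.

```bash
# Hypothetical helper: build a diagnostic URL from an endpoint and key=value pairs.
# DIAG_ADDR is an assumed variable name; it defaults to the port used above.
diag_url() {
  endpoint="$1"
  shift
  url="${DIAG_ADDR:-localhost:2000}/${endpoint}"
  sep="?"
  for param in "$@"; do
    url="${url}${sep}${param}"
    sep="&"   # every parameter after the first is joined with &
  done
  printf '%s\n' "$url"
}
```

With the network ID and key held in shell variables, a lookup might read `curl "$(diag_url getentry "nid=$nid" "tname=endpoint_table" "key=$key")"`.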

## Access the diagnostic tool's CLI

The CLI is provided as a preview and is not yet stable. Commands or options may
change at any time.

The CLI executable is called `diagnosticClient` and is made available using a
standalone container.

The following flags are supported:

| Flag            | Description                                     |
|-----------------|-------------------------------------------------|
| `-c <string>`   | Command to run. One of `sd` or `overlay`.       |
| `-ip <string>`  | The IP address to query. Defaults to 127.0.0.1. |
| `-net <string>` | The target network ID.                          |
| `-port <int>`   | The target port.                                |
| `-v`            | Enable verbose output.                          |

### Access the CLI

The CLI is provided as a container that needs to run using privileged mode.

1. To run the container, use a command like the following:

```bash
$ docker container run --name net-diagnostic -d --privileged --network host fcrisciani/network-diagnostic
```

Review comment (Contributor):

How about we push the containerized tool to docker org in hub?

Review comment (Member):

Agreed; but that can be handled separate from this PR (and this reference updated in a follow-up)

2. Connect to the container using `docker container attach <container-ID>`,
and start the server using the following command:

```bash
$ kill -HUP 1
```

3. If you have not already done so, join the Docker host to the swarm, then
run the diagnostic CLI within the container.

```bash
$ ./diagnosticClient <flags>...
```

4. When finished debugging, stop the container.

### Examples

The following commands dump the service discovery table and verify node
ownership.

**Standalone network:**

```bash
$ ./diagnosticClient -c sd -v -net n8a8ie6tb3wr2e260vxj8ncy4
```

**Overlay network:**

```bash
$ ./diagnosticClient -port 2001 -c overlay -v -net n8a8ie6tb3wr2e260vxj8ncy4
```