This repository was archived by the owner on Jan 11, 2023. It is now read-only.

kube-dns-v20 deployment #3534

Closed
visokoo opened this issue Jul 24, 2018 · 13 comments · Fixed by #3373

visokoo commented Jul 24, 2018

Is this a request for help?:

Yes.

Is this an ISSUE or FEATURE REQUEST? (choose one):

Issue.

What version of acs-engine?:

v0.14.5 & v0.20.0

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
acs-engine 0.14.5 | k8s 1.9.6
acs-engine 0.20.0 | k8s 1.11.0

What happened:
I'm running into an issue similar to #2999: the kube-dns-v20 deployments go into a CrashLoopBackOff after I add a ConfigMap with my custom Consul upstream server.

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  upstreamNameservers: |
    ["<consul box ip>"]
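The later comments also mention a stubDomain; for reference, that key lives in the same ConfigMap. Sketch only; the domain and IP below are placeholders, not values from this issue:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"consul.example.int": ["10.0.0.100"]}
  upstreamNameservers: |
    ["10.0.0.100"]
```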

It doesn't happen right away; the problem usually appears after an hour or so. It starts with nslookups on the internal Kubernetes network failing, which causes the DNS stack to crash loop once the health check fails.

kubectl logs -n kube-system kube-dns-v20-77b8c6b4c-zz492 -c healthz -f
2018/07/23 22:31:20 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-07-23 22:30:51.167132151 +0000 UTC, error exit status 1
2018/07/23 22:31:30 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-07-23 22:31:25.407528227 +0000 UTC, error exit status 1
2018/07/23 22:31:40 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-07-23 22:31:25.407528227 +0000 UTC, error exit status 1
2018/07/23 22:31:50 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-07-23 22:31:25.407528227 +0000 UTC, error exit status 1
2018/07/23 22:32:00 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-07-23 22:31:58.678927868 +0000 UTC, error exit status 1

Though internal lookups fail, external lookups still work briefly when the cluster comes up. Testing with minikube, I noticed that the dnsmasq args are slightly different. My minikube cluster's DNS stack includes these args, which are not present on Azure:

        - --no-negcache
        - --server=/cluster.local/127.0.0.1#10053
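For context, these flags belong in the dnsmasq nanny container's args in the kube-dns deployment. A trimmed sketch of where they sit; the surrounding args are assumed from the upstream manifest, not copied from Azure's:

```yaml
- name: dnsmasq
  image: k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.10
  args:
    - -v=2
    - -logtostderr
    - -restartDnsmasq=true
    - --
    - -k
    - --cache-size=1000
    - --no-negcache                            # don't cache NXDOMAIN answers
    - --server=/cluster.local/127.0.0.1#10053  # only cluster.local goes to kube-dns
```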

After deleting the DNS pods and recreating them with those values added, my DNS pods have been stable for about a day. Is there a reason those args aren't included in the kube-dns stack spun up from Azure's images? I'm not sure this is the full fix, but could they be added?

kube-dns:1.14.8 is missing the same args but has a slightly different issue: requests from the pods for external services don't route to the upstream server at all. That isn't how it's supposed to work according to the Kubernetes documentation, which led me to try 1.14.10 instead.

Would appreciate any insight...

What you expected to happen:
DNS cluster should be stable after adding custom upstream.

How to reproduce it (as minimally and precisely as possible):
Create a cluster with k8s 1.11.0, add an upstreamNameservers entry via ConfigMap, resolve a few things, and wait about an hour for the DNS stack to start crash looping.

Anything else we need to know:
Our Consul DNS is set up with Google's DNS as its recursors for anything outside of the Consul domain.
kube components we're using:
k8s-gcrio.azureedge.net/exechealthz-amd64:1.2
k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.10
k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64:1.14.10


visokoo commented Jul 24, 2018

I'm going to be pasting some logs for the 1.14.8 kubedns issue since I have a fresh cluster that's doing the same thing:

After adding a stubDomain pointing to the Consul box and upstreamNameservers pointing to the Consul IPs:

nslookup standby.vault.service.<mydomain>.int
Server:		10.0.0.10
Address:	10.0.0.10#53

Name:	standby.vault.service.<mydomain>.int
Address: <consul box ip>

tcpdump from dnsmasq

22:15:45.531814 IP (tos 0x0, ttl 64, id 10131, offset 0, flags [none], proto UDP (17), length 107)
    10.4.20.97.47783 > 10.4.20.126.53: [udp sum ok] 53855+ A? standby.vault.service.<mydomain>.int.default.svc.cluster.local. (79)
22:15:45.532315 IP (tos 0x0, ttl 64, id 49997, offset 0, flags [DF], proto UDP (17), length 200)
    10.4.20.126.53 > 10.4.20.97.47783: [bad udp cksum 0x3dac -> 0x3081!] 53855 NXDomain q: A? standby.vault.service.<mydomain>.int.default.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1532469600 28800 7200 604800 60 (172)

tcpdump from kubedns

22:32:50.236535 IP (tos 0x0, ttl 64, id 45769, offset 0, flags [none], proto UDP (17), length 99)
    10.4.20.97.59860 > 10.4.20.126.53: [udp sum ok] 15302+ A? standby.vault.service.<mydomain>.int.svc.cluster.local. (71)
22:32:50.237393 IP (tos 0x0, ttl 64, id 39037, offset 0, flags [DF], proto UDP (17), length 192)
    10.4.20.126.53 > 10.4.20.97.59860: [bad udp cksum 0x3da4 -> 0xd2b1!] 15302 NXDomain q: A? standby.vault.service.<mydomain>.int.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1532469600 28800 7200 604800 60 (164)
22:32:50.238084 IP (tos 0x0, ttl 64, id 45770, offset 0, flags [none], proto UDP (17), length 95)
    10.4.20.97.56482 > 10.4.20.126.53: [udp sum ok] 53512+ A? standby.vault.service.<mydomain>.int.cluster.local. (67)
22:32:50.238669 IP (tos 0x0, ttl 64, id 39038, offset 0, flags [DF], proto UDP (17), length 188)
    10.4.20.126.53 > 10.4.20.97.56482: [bad udp cksum 0x3da0 -> 0xc47f!] 53512 NXDomain q: A? standby.vault.service.<mydomain>.int.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1532469600 28800 7200 604800 60 (160)
22:32:50.244734 IP (tos 0x0, ttl 64, id 45772, offset 0, flags [none], proto UDP (17), length 81)
    10.4.20.97.49290 > 10.4.20.126.53: [udp sum ok] 41724+ A? standby.vault.service.<mydomain>.int. (53)
22:32:50.244872 IP (tos 0x0, ttl 64, id 28062, offset 0, flags [DF], proto UDP (17), length 81)
    10.4.20.126.11182 > 10.255.0.7.53: [bad udp cksum 0x29d6 -> 0xb14b!] 34294+ A? standby.vault.service.<mydomain>.int. (53)
22:32:50.244912 IP (tos 0x0, ttl 64, id 29882, offset 0, flags [DF], proto UDP (17), length 81)
    10.4.20.126.11182 > 10.255.0.6.53: [bad udp cksum 0x29d5 -> 0xb14c!] 34294+ A? standby.vault.service.<mydomain>.int. (53)
22:32:50.245001 IP (tos 0x0, ttl 64, id 36254, offset 0, flags [DF], proto UDP (17), length 81)
    10.4.20.126.11182 > 10.255.0.5.53: [bad udp cksum 0x29d4 -> 0xb14d!] 34294+ A? standby.vault.service.<mydomain>.int. (53)
22:32:50.284266 IP (tos 0x0, ttl 64, id 50387, offset 0, flags [DF], proto UDP (17), length 185)
    10.255.0.7.53 > 10.4.20.126.11182: [udp sum ok] 34294* q: A? standby.vault.service.<mydomain>.int. 2/0/2 standby.vault.service.<mydomain>.int. A 10.255.0.5, standby.vault.service.<mydomain>.int. A 10.255.0.7 ar: standby.vault.service.<mydomain>.int. TXT "consul-network-segment=", standby.vault.service.<mydomain>.int. TXT "consul-network-segment=" (157)
22:32:50.284362 IP (tos 0x0, ttl 64, id 39049, offset 0, flags [DF], proto UDP (17), length 185)
    10.4.20.126.53 > 10.4.20.97.49290: [bad udp cksum 0x3d9d -> 0x0bec!] 41724* q: A? standby.vault.service.<mydomain>.int. 2/0/2 standby.vault.service.<mydomain>.int. A 10.255.0.5, standby.vault.service.<mydomain>.int. A 10.255.0.7 ar: standby.vault.service.<mydomain>.int. TXT "consul-network-segment=", standby.vault.service.<mydomain>.int. TXT "consul-network-segment=" (157)
22:32:50.284682 IP (tos 0x0, ttl 64, id 22801, offset 0, flags [DF], proto UDP (17), length 185)
    10.255.0.6.53 > 10.4.20.126.11182: [udp sum ok] 34294* q: A? standby.vault.service.<mydomain>.int. 2/0/2 standby.vault.service.<mydomain>.int. A 10.255.0.5, standby.vault.service.<mydomain>.int. A 10.255.0.7 ar: standby.vault.service.<mydomain>.int. TXT "consul-network-segment=", standby.vault.service.<mydomain>.int. TXT "consul-network-segment=" (157)
22:32:50.285674 IP (tos 0x0, ttl 64, id 1963, offset 0, flags [DF], proto UDP (17), length 185)
    10.255.0.5.53 > 10.4.20.126.11182: [udp sum ok] 34294* q: A? standby.vault.service.<mydomain>.int. 2/0/2 standby.vault.service.<mydomain>.int. A 10.255.0.5, standby.vault.service.<mydomain>.int. A 10.255.0.7 ar: standby.vault.service.<mydomain>.int. TXT "consul-network-segment=", standby.vault.service.<mydomain>.int. TXT "consul-network-segment=" (157)
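The query sequence in these dumps is the pod resolver's search-path expansion at work. A rough sketch of that expansion (not the actual resolver code), assuming the typical pod resolv.conf of `search default.svc.cluster.local svc.cluster.local cluster.local` with `options ndots:5`, and with `mydomain` standing in for the redacted domain:

```python
# Sketch of glibc/musl-style search-path expansion for a relative name.
# (On Azure nodes the xx.internal.cloudapp.net domain is typically
# appended to the search list too, which explains the extra query in
# the twitter.com dump further down.)
NDOTS = 5
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

def candidate_queries(name: str) -> list[str]:
    """Return the FQDNs tried, in order, for a name without a trailing dot."""
    if name.count(".") >= NDOTS:
        # "absolute enough": try the bare name first, search list after
        return [name] + [f"{name}.{d}" for d in SEARCH]
    # fewer than ndots dots: walk the search list first, bare name last
    return [f"{name}.{d}" for d in SEARCH] + [name]

# standby.vault.service.mydomain.int has only 4 dots, so three
# cluster-suffix queries (all NXDOMAIN) happen before the bare name
# finally reaches the stub domain:
for q in candidate_queries("standby.vault.service.mydomain.int"):
    print(q)
```

This matches the order of the A? queries in the dumps above: `.default.svc.cluster.local`, `.svc.cluster.local`, `.cluster.local`, then the bare name.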
nslookup twitter.com
Server:		10.0.0.10
Address:	10.0.0.10#53

Non-authoritative answer:
Name:	twitter.com
Address: 104.244.42.129
Name:	twitter.com
Address: 104.244.42.1

tcpdump from dnsmasq

23:05:31.406098 IP (tos 0x0, ttl 64, id 40617, offset 0, flags [none], proto UDP (17), length 83)
    10.4.20.97.55440 > 10.4.20.126.53: [udp sum ok] 24138+ A? twitter.com.default.svc.cluster.local. (55)
23:05:31.406583 IP (tos 0x0, ttl 64, id 1321, offset 0, flags [DF], proto UDP (17), length 176)
    10.4.20.126.53 > 10.4.20.97.55440: [bad udp cksum 0x3d94 -> 0x1cff!] 24138 NXDomain q: A? twitter.com.default.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1532473200 28800 7200 604800 60 (148)
23:05:31.407381 IP (tos 0x0, ttl 64, id 40618, offset 0, flags [none], proto UDP (17), length 75)
    10.4.20.97.46096 > 10.4.20.126.53: [udp sum ok] 24108+ A? twitter.com.svc.cluster.local. (47)
23:05:31.407734 IP (tos 0x0, ttl 64, id 1322, offset 0, flags [DF], proto UDP (17), length 168)
    10.4.20.126.53 > 10.4.20.97.46096: [bad udp cksum 0x3d8c -> 0x7c61!] 24108 NXDomain q: A? twitter.com.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1532473200 28800 7200 604800 60 (140)
23:05:31.411092 IP (tos 0x0, ttl 64, id 40620, offset 0, flags [none], proto UDP (17), length 109)
    10.4.20.97.49117 > 10.4.20.126.53: [udp sum ok] 4025+ A? twitter.com.d4eund02jxxu5kwr01sf5f2bvb.xx.internal.cloudapp.net. (81)
23:05:31.411333 IP (tos 0x0, ttl 64, id 26387, offset 0, flags [DF], proto UDP (17), length 109)
    10.4.20.126.53670 > 168.63.129.16.53: [bad udp cksum 0x483c -> 0xf307!] 64428+ A? twitter.com.d4eund02jxxu5kwr01sf5f2bvb.xx.internal.cloudapp.net. (81)
23:05:31.413181 IP (tos 0x0, ttl 128, id 2453, offset 0, flags [none], proto UDP (17), length 188)
    168.63.129.16.53 > 10.4.20.126.53670: [udp sum ok] 64428 NXDomain* q: A? twitter.com.d4eund02jxxu5kwr01sf5f2bvb.xx.internal.cloudapp.net. 0/1/0 ns: xx.internal.cloudapp.net. SOA localhost. hostmaster. 30751 900 600 86400 10 (160)
23:05:31.413351 IP (tos 0x0, ttl 64, id 1323, offset 0, flags [DF], proto UDP (17), length 164)
    10.4.20.126.53 > 10.4.20.97.49117: [bad udp cksum 0x3d88 -> 0xe19d!] 4025 NXDomain* q: A? twitter.com.d4eund02jxxu5kwr01sf5f2bvb.xx.internal.cloudapp.net. 0/1/0 ns: xx.internal.cloudapp.net. SOA localhost. hostmaster. 30751 900 600 86400 10 (136)
23:05:31.414237 IP (tos 0x0, ttl 64, id 40621, offset 0, flags [none], proto UDP (17), length 57)
    10.4.20.97.36236 > 10.4.20.126.53: [udp sum ok] 15458+ A? twitter.com. (29)
23:05:31.414535 IP (tos 0x0, ttl 64, id 26388, offset 0, flags [DF], proto UDP (17), length 57)
    10.4.20.126.45281 > 168.63.129.16.53: [bad udp cksum 0x4808 -> 0xdc4d!] 23512+ A? twitter.com. (29)
23:05:31.415774 IP (tos 0x0, ttl 128, id 2454, offset 0, flags [none], proto UDP (17), length 89)
    168.63.129.16.53 > 10.4.20.126.45281: [udp sum ok] 23512 q: A? twitter.com. 2/0/0 twitter.com. A 104.244.42.129, twitter.com. A 104.244.42.1 (61)
23:05:31.415903 IP (tos 0x0, ttl 64, id 1324, offset 0, flags [DF], proto UDP (17), length 89)
    10.4.20.126.53 > 10.4.20.97.36236: [bad udp cksum 0x3d3d -> 0x6891!] 15458 q: A? twitter.com. 2/0/0 twitter.com. A 104.244.42.129, twitter.com. A 104.244.42.1 (61)

tcpdump from kubedns

23:05:31.410952 IP (tos 0x0, ttl 64, id 40619, offset 0, flags [none], proto UDP (17), length 71)
    10.4.20.4.54360 > 10.4.20.19.53: [udp sum ok] 49449+ A? twitter.com.cluster.local. (43)
23:05:31.411090 IP (tos 0x0, ttl 64, id 64921, offset 0, flags [DF], proto UDP (17), length 71)
    10.4.20.19.29944 > 10.255.0.7.53: [bad udp cksum 0x2961 -> 0x6f21!] 140+ A? twitter.com.cluster.local. (43)
23:05:31.411126 IP (tos 0x0, ttl 64, id 26026, offset 0, flags [DF], proto UDP (17), length 71)
    10.4.20.19.29944 > 10.255.0.6.53: [bad udp cksum 0x2960 -> 0x6f22!] 140+ A? twitter.com.cluster.local. (43)
23:05:31.411187 IP (tos 0x0, ttl 64, id 14360, offset 0, flags [DF], proto UDP (17), length 71)
    10.4.20.19.29944 > 10.255.0.5.53: [bad udp cksum 0x295f -> 0x6f23!] 140+ A? twitter.com.cluster.local. (43)
23:05:31.411477 IP (tos 0x0, ttl 64, id 5718, offset 0, flags [DF], proto UDP (17), length 164)
    10.4.20.19.53 > 10.4.20.4.54360: [bad udp cksum 0x3cc0 -> 0x73c2!] 49449 NXDomain q: A? twitter.com.cluster.local. 0/1/0 ns: cluster.local. [1m] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1532473200 28800 7200 604800 60 (136)
23:05:31.470007 IP (tos 0x0, ttl 64, id 59901, offset 0, flags [DF], proto UDP (17), length 146)
    10.255.0.6.53 > 10.4.20.19.29944: [udp sum ok] 140 NXDomain q: A? twitter.com.cluster.local. 0/1/0 ns: . [23h59m27s] SOA a.root-servers.net. nstld.verisign-grs.com. 2018072401 1800 900 604800 86400 (118)
23:05:31.470218 IP (tos 0x0, ttl 64, id 56510, offset 0, flags [DF], proto UDP (17), length 146)
    10.255.0.5.53 > 10.4.20.19.29944: [udp sum ok] 140 NXDomain q: A? twitter.com.cluster.local. 0/1/0 ns: . [23h59m31s] SOA a.root-servers.net. nstld.verisign-grs.com. 2018072401 1800 900 604800 86400 (118)
23:05:31.470342 IP (tos 0x0, ttl 64, id 47434, offset 0, flags [DF], proto UDP (17), length 146)
    10.255.0.7.53 > 10.4.20.19.29944: [udp sum ok] 140 NXDomain q: A? twitter.com.cluster.local. 0/1/0 ns: . [23h59m58s] SOA a.root-servers.net. nstld.verisign-grs.com. 2018072401 1800 900 604800 86400 (118)

Even with upstreamNameservers set and a ClusterFirst dnsPolicy on the pods, queries still go out to the MSFT nameserver instead of the one specified. This seems like a bug to me, unless I'm missing something critical?
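For reference, ClusterFirst is set per pod spec; a minimal sketch (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-test           # placeholder
spec:
  dnsPolicy: ClusterFirst  # resolve via kube-dns; non-cluster names should go upstream
  containers:
    - name: test
      image: busybox       # placeholder
      command: ["sleep", "3600"]
```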


visokoo commented Jul 26, 2018

I initially posted this question on the kubernetes/dns repo and wanted to relay some of the comments here: kubernetes/dns#248

Looks like the upstream/stub-domain issue should be resolved if the acs-engine kube-dns-deployment.yaml matched the upstream version. Is there a reason it doesn't?

jackfrancis (Member) commented:

Hi @visokoo, this is the manifest we're currently using for 1.9 and 1.10 clusters:

https://github.com/Azure/acs-engine/blob/master/parts/k8s/addons/kubernetesmasteraddons-kube-dns-deployment.yaml

Do you see any obvious omissions or anything else fishy that would break your consul configuration? If so we'd love to incorporate changes that help your scenario!

@feiskyer who has worked in this area a bit, do you see anything in @visokoo's scenario that would require us to evolve our kube-dns implementation?


visokoo commented Jul 26, 2018

@jackfrancis, thanks for responding. The obvious ones I see missing from that manifest are the ones I listed above in the dnsmasq section:

        - --no-negcache
        - --server=/cluster.local/127.0.0.1#10053

Upstream has: https://github.com/kubernetes/kubernetes/blob/753632d85b7639ffadb05eed3e49dbfbbd5360b6/cluster/addons/dns/kube-dns/kube-dns.yaml.base#L170

In the acs-engine manifest, it looks like the change would need to go at line 132 of kube-dns-deployment.yaml, and it should also include the --no-negcache param.

jackfrancis (Member) commented:

@visokoo Thanks!

See:

#3564

Are you able to build from that PR branch and try a repro? (Or you could just kubectl edit deployment -n kube-system and manually add that additional dnsmasq config setting.)

Shall I assume that additional setting means "when you get an NXDOMAIN response (and perhaps other not-found results), don't cache it"? Something like that?


visokoo commented Jul 26, 2018

@jackfrancis Sorry I wasn't clear: aside from adding --no-negcache, line 132 referenced in kube-dns-deployment.yaml also needs to be edited to --server=/cluster.local/127.0.0.1#10053. Or, since the file reads from the <kubernetesKubeletClusterDomain> parameter, the change would be:
--server=/<kubernetesKubeletClusterDomain>/127.0.0.1#10053

And yes, --no-negcache functions like how you described it.

For the cluster.local change, here's the explanation from upstream:

Instead of --server=127.0.0.1#10053, the dnsmasq should be using --server=/cluster.local/127.0.0.1#10053 instead (or with the customized cluster domain). In your case because 127.0.0.1#10053 is listed as the first nameserver without explicit domain, basically all queries (that don't match other domains) will be forwarded to it.
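Put side by side, the two dnsmasq flag forms behave quite differently (sketch; these are the only args that change):

```yaml
# Without a domain, 127.0.0.1#10053 acts as a general upstream, so every
# query that doesn't match an explicit /domain/ server entry is forwarded
# straight to kube-dns:
- --server=127.0.0.1#10053

# Scoped to the cluster domain, only *.cluster.local goes to kube-dns;
# everything else follows the stubDomains / upstreamNameservers config:
- --server=/cluster.local/127.0.0.1#10053
```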

Can you add that to the PR and I can try building from that PR?

jackfrancis (Member) commented:

👌 PR updated, thanks for testing!


visokoo commented Jul 26, 2018

@jackfrancis Tested those settings on my cluster and resolution is working as expected.

Thanks for making the change! What's the ETA on this going out in the next release of acs-engine?

jackfrancis (Member) commented:

Good to hear! Unfortunately, shipping a new cluster with those changes (v1.8 at least) seems to have broken some functionality:

https://circleci.com/gh/Azure/acs-engine/37969?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

I'll run against other versions to see whether this config only misbehaves on certain k8s versions.

Once we get a non-regressive implementation, we'll test in master for a week or so and then cut a patch release; that's the normal process.


visokoo commented Jul 27, 2018

Thanks for the clarification!

Looks like you fixed the tests? https://circleci.com/gh/Azure/acs-engine/38014 =]

Looking forward to the patch and thanks again!

jackfrancis (Member) commented:

Sorry to be the bearer of bad news, but that test run is against another PR 😝

If you have a chance, test a cluster using your API model config by building from the branch in PR #3564, and let me know how that cluster looks!

jackfrancis (Member) commented:

@visokoo this PR, which introduces the dnsmasq flags you want into the 1.11 kube-dns config, is being tested with the original static config:

#3373

i.e., cluster.local

I'll report back with test results. If things check out, I'll probably close #3564 and drive #3373 to completion. Thanks again for hanging in there!


visokoo commented Jul 27, 2018

Thanks for the update @jackfrancis. Just for further clarification, this change would also be pushed to earlier versions of kube-dns as well, correct? Our prod cluster is specifically using k8s 1.9.6 with kube-dns 1.14.8. If we don't have to upgrade, that would be ideal...
