
DNS resolution uses resolv.conf nameserver even though upstreamNameserver is specified... #248

Closed
visokoo opened this issue Jul 21, 2018 · 8 comments


@visokoo

visokoo commented Jul 21, 2018

What happened
We're planning to use consul as a DNS server for internal services and also as a forwarder for external domains.

Following this article, it looks like all we need to do is specify upstreamNameservers in the kube-dns ConfigMap with our consul box's IP:

apiVersion: v1
data:
  upstreamNameservers: |
    ["10.255.0.6"]
kind: ConfigMap
metadata:
  creationTimestamp: 2018-07-21T00:14:39Z
  name: kube-dns
  namespace: kube-system
  resourceVersion: "333414"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-dns
  uid: 0f01cb8d-8c7b-11e8-ab6c-000d3af949c9

After doing this and ssh-ing into a pod to run nslookup active.vault.service.<mydomain>.int, I get:

Address:	10.0.0.10#53

** server can't find vault.service.<mydomain>.int: NXDOMAIN

Doing a tcpdump on the consul box yields intermittent activity:

08:27:54.337823 IP (tos 0x0, ttl 64, id 50796, offset 0, flags [DF], proto UDP (17), length 99)
    10.2.20.4.47202 > 10.255.0.6.53: [udp sum ok] 7008+ A? vault.service.<mydomain>.int.default.svc.cluster.local. (71)
08:27:54.338063 IP (tos 0x0, ttl 64, id 62340, offset 0, flags [DF], proto UDP (17), length 99)
    10.255.0.6.36954 > 8.8.8.8.53: [bad udp cksum 0x1b75 -> 0x3ab1!] 7008+ A? vault.service.<mydomain>.int.default.svc.cluster.local. (71)
08:27:54.339406 IP (tos 0x0, ttl 64, id 12324, offset 0, flags [DF], proto UDP (17), length 91)
    10.2.20.35.27387 > 10.255.0.6.53: [udp sum ok] 48234+ A? vault.service.<mydomain>.int.svc.cluster.local. (63)
08:27:54.339564 IP (tos 0x0, ttl 64, id 62341, offset 0, flags [DF], proto UDP (17), length 91)
    10.255.0.6.51761 > 8.8.8.8.53: [bad udp cksum 0x1b6d -> 0x9a93!] 48234+ A? vault.service.<mydomain>.int.svc.cluster.local. (63)
08:27:54.359153 IP (tos 0x0, ttl 117, id 36005, offset 0, flags [none], proto UDP (17), length 174)
    8.8.8.8.53 > 10.255.0.6.36954: [udp sum ok] 7008 NXDomain q: A? vault.service.<mydomain>.int.default.svc.cluster.local. 0/1/0 ns: . [23h59m4s] SOA a.root-servers.net. nstld.verisign-grs.com. 2018072100 1800 900 604800 86400 (146)
08:27:54.359283 IP (tos 0x0, ttl 64, id 3375, offset 0, flags [DF], proto UDP (17), length 174)
    10.255.0.6.53 > 10.2.20.4.47202: [bad udp cksum 0x29b6 -> 0xf7be!] 7008 NXDomain q: A? vault.service.<mydomain>.int.default.svc.cluster.local. 0/1/0 ns: . [23h59m4s] SOA a.root-servers.net. nstld.verisign-grs.com. 2018072100 1800 900 604800 86400 (146)
08:27:54.360847 IP (tos 0x0, ttl 117, id 37226, offset 0, flags [none], proto UDP (17), length 166)
    8.8.8.8.53 > 10.255.0.6.51761: [udp sum ok] 48234 NXDomain q: A? vault.service.<mydomain>.int.svc.cluster.local. 0/1/0 ns: . [23h59m44s] SOA a.root-servers.net. nstld.verisign-grs.com. 2018072100 1800 900 604800 86400 (138)
08:27:54.360922 IP (tos 0x0, ttl 64, id 29473, offset 0, flags [DF], proto UDP (17), length 166)
    10.255.0.6.53 > 10.2.20.35.27387: [bad udp cksum 0x29cd -> 0xde98!] 48234 NXDomain q: A? vault.service.<mydomain>.int.svc.cluster.local. 0/1/0 ns: . [23h59m44s] SOA a.root-servers.net. nstld.verisign-grs.com. 2018072100 1800 900 604800 86400 (138)

Question here...why does my query have the kubernetes search domains appended to it?

Looking at the tcpdump on the dnsmasq container, if I try to nslookup the consul domain, it never forwards to the upstream:

08:45:50.100471 IP (tos 0x0, ttl 64, id 60390, offset 0, flags [none], proto UDP (17), length 80)
    10.2.20.35.36683 > 10.2.20.49.53: [udp sum ok] 36502+ A? active.vault.service.<mydomain>.int. (52)
08:45:50.100566 IP (tos 0x0, ttl 64, id 32274, offset 0, flags [DF], proto UDP (17), length 80)
    10.2.20.49.53 > 10.2.20.35.36683: [bad udp cksum 0x3ca5 -> 0xd711!] 36502 NXDomain q: A? active.vault.service.<mydomain>.int. 0/0/0 (52)
08:45:50.102796 IP (tos 0x0, ttl 64, id 60392, offset 0, flags [none], proto UDP (17), length 80)
    10.2.20.35.55995 > 10.2.20.49.53: [udp sum ok] 37264+ A? active.vault.service.<mydomain>.int. (52)
08:45:50.102846 IP (tos 0x0, ttl 64, id 32275, offset 0, flags [DF], proto UDP (17), length 80)
    10.2.20.49.53 > 10.2.20.35.55995: [bad udp cksum 0x3ca5 -> 0x88a7!] 37264 NXDomain q: A? active.vault.service.<mydomain>.int. 0/0/0 (52)
08:45:50.103696 IP (tos 0x0, ttl 64, id 60393, offset 0, flags [none], proto UDP (17), length 80)
    10.2.20.35.43475 > 10.2.20.49.53: [udp sum ok] 14640+ A? active.vault.service.<mydomain>.int. (52)
08:45:50.103753 IP (tos 0x0, ttl 64, id 32276, offset 0, flags [DF], proto UDP (17), length 80)
    10.2.20.49.53 > 10.2.20.35.43475: [bad udp cksum 0x3ca5 -> 0x11f0!] 14640 NXDomain q: A? active.vault.service.<mydomain>.int. 0/0/0 (52)

If I look up an external domain, I do see that it eventually forwards to the Azure nameserver from resolv.conf (168.63.129.16). Since I have upstreamNameservers specified, shouldn't it use that instead of what's in resolv.conf?

If I add a separate stubDomain just for the consul domain, resolution works. But the article above seems to say that as long as your dnsPolicy is ClusterFirst and you specify an upstreamNameserver, any query outside cluster.local should be forwarded to that upstream. Is this not the case? Just looking for some clarity here...
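
For reference, a rough sketch of the ConfigMap data with the stubDomains workaround that does resolve (the domain and IPs stand in for our consul setup):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"<mydomain>.int": ["10.255.0.6"]}
  upstreamNameservers: |
    ["10.255.0.6"]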

Env
Azure
Kubernetes 1.9.6
KubeDNS 1.14.8
dnsPolicy: ClusterFirst

@chrisohaver
Contributor

Question here...why does my query have the kubernetes search domains appended to it?

This is the short-name DNS resolution process for k8s pods with the ClusterFirst policy.
The pod itself is adding the domains, based on the search path and ndots defined in its /etc/resolv.conf. The query vault.service.<mydomain>.int contains 3 dots (less than the ndots threshold of 5), so the pod tries each domain in the search path.
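
For example, a pod with ClusterFirst typically ends up with an /etc/resolv.conf roughly like this (values illustrative; the nameserver is the kube-dns service IP):

nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Since vault.service.<mydomain>.int has fewer than 5 dots, the resolver tries vault.service.<mydomain>.int.default.svc.cluster.local, vault.service.<mydomain>.int.svc.cluster.local, and so on before the bare name, which is exactly what shows up in your tcpdump.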

@visokoo
Author

visokoo commented Jul 24, 2018

Thanks for the explanation there. Any potential insight on the upstreamNameservers issue? I just tested this in a brand new k8s cluster spun up in Azure and am still seeing external queries being sent to the MSFT nameserver instead of the specified one. The upstreamNameservers value doesn't seem to be respected.

@chrisohaver
Contributor

We can look at the flags for dnsmasq (responsible for routing incoming DNS queries) to verify that they are configured correctly. You can see those in the kube-dns deployment. They are supposed to be configured automatically based on the kube-dns configmap settings.
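
For example, something along these lines (names will differ for your cluster):

kubectl -n kube-system get deployment -l k8s-app=kube-dns -o yaml
kubectl -n kube-system exec <kube-dns-pod> -c dnsmasq -- ps -f

The first shows the args the dnsmasq/nanny container was started with; the second shows the flags on the dnsmasq process that is actually running.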

@visokoo
Author

visokoo commented Jul 25, 2018

Looking at the dnsmasq container:

kubectl exec -n kube-system -it kube-dns-v20-658cf7bf44-ht8xl -c dnsmasq sh
/ # ps -f
PID   USER     TIME   COMMAND
    1 root       0:04 /dnsmasq-nanny -v=2 -logtostderr -configDir=/kube-dns-config -restartDnsmasq=true -- -k --cache-size=1000 --no-r
   16 root       0:06 /usr/sbin/dnsmasq -k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#100
   32 root       0:00 sh
   37 root       0:00 ps -f
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1beta1","kind":"Deployment","metadata":{"annotations":{},"labels":{"k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","version":"v20"},"name":"kube-dns-v20","namespace":"kube-system"},"spec":{"replicas":2,"selector":{"matchLabels":{"k8s-app":"kube-dns","version":"v20"}},"template":{"metadata":{"annotations":{"prometheus.io/port":"10055","prometheus.io/scrape":"true","scheduler.alpha.kubernetes.io/critical-pod":""},"labels":{"k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","version":"v20"}},"spec":{"affinity":{"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"k8s-app","operator":"In","values":["kube-dns"]}]},"topologyKey":"kubernetes.io/hostname"},"weight":100}]}},"containers":[{"args":["--domain=cluster.local.","--dns-port=10053","--v=2","--config-dir=/kube-dns-config"],"env":[{"name":"PROMETHEUS_PORT","value":"10055"}],"image":"k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64:1.14.8","livenessProbe":{"failureThreshold":5,"httpGet":{"path":"/healthz-kubedns","port":8080,"scheme":"HTTP"},"initialDelaySeconds":60,"successThreshold":1,"timeoutSeconds":5},"name":"kubedns","ports":[{"containerPort":10053,"name":"dns-local","protocol":"UDP"},{"containerPort":10053,"name":"dns-tcp-local","protocol":"TCP"},{"containerPort":10055,"name":"metrics","protocol":"TCP"}],"readinessProbe":{"httpGet":{"path":"/readiness","port":8081,"scheme":"HTTP"},"initialDelaySeconds":30,"timeoutSeconds":5},"resources":{"limits":{"memory":"170Mi"},"requests":{"cpu":"100m","memory":"70Mi"}},"volumeMounts":[{"mountPath":"/kube-dns-config","name":"kube-dns-config"}]},{"args":["-v=2","-logtostderr","-configDir=/kube-dns-config","-restartDnsmasq=true","--","-k","--cache-size=1000","--no-resolv","--server=127.0.0.1#10053","--server=/in-addr.arpa/127.0.0.1#10053","--server=/ip6.arpa/127.0.0.1#10053","--log-facility=-"],"image":"k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.8","name":"dnsmasq","ports":[{"containerPort":53,"name":"dns","protocol":"UDP"},{"containerPort":53,"name":"dns-tcp","protocol":"TCP"}],"volumeMounts":[{"mountPath":"/kube-dns-config","name":"kube-dns-config"}]},{"args":["--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 \u003e/dev/null","--url=/healthz-dnsmasq","--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 \u003e/dev/null","--url=/healthz-kubedns","--port=8080","--quiet"],"image":"k8s-gcrio.azureedge.net/exechealthz-amd64:1.2","livenessProbe":{"failureThreshold":5,"httpGet":{"path":"/healthz-dnsmasq","port":8080,"scheme":"HTTP"},"initialDelaySeconds":60,"successThreshold":1,"timeoutSeconds":5},"name":"healthz","ports":[{"containerPort":8080,"protocol":"TCP"}],"resources":{"limits":{"memory":"50Mi"},"requests":{"cpu":"10m","memory":"50Mi"}}}],"dnsPolicy":"Default","nodeSelector":{"beta.kubernetes.io/os":"linux"},"serviceAccountName":"kube-dns","tolerations":[{"key":"CriticalAddonsOnly","operator":"Exists"}],"volumes":[{"configMap":{"name":"kube-dns","optional":true},"name":"kube-dns-config"}]}}}}
  creationTimestamp: 2018-07-24T18:21:32Z
  generation: 1
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    version: v20
  name: kube-dns-v20
  namespace: kube-system
  resourceVersion: "25544"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/kube-dns-v20
  uid: 63ff3b25-8f6e-11e8-84cd-000d3a06123f
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      k8s-app: kube-dns
      version: v20
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "10055"
        prometheus.io/scrape: "true"
        scheduler.alpha.kubernetes.io/critical-pod: ""
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
        kubernetes.io/cluster-service: "true"
        version: v20
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: k8s-app
                  operator: In
                  values:
                  - kube-dns
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - --domain=cluster.local.
        - --dns-port=10053
        - --v=2
        - --config-dir=/kube-dns-config
        env:
        - name: PROMETHEUS_PORT
          value: "10055"
        image: k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64:1.14.8
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz-kubedns
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kubedns
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
        - containerPort: 10055
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /kube-dns-config
          name: kube-dns-config
      - args:
        - -v=2
        - -logtostderr
        - -configDir=/kube-dns-config
        - -restartDnsmasq=true
        - --
        - -k
        - --cache-size=1000
        - --no-resolv
        - --server=127.0.0.1#10053
        - --server=/in-addr.arpa/127.0.0.1#10053
        - --server=/ip6.arpa/127.0.0.1#10053
        - --log-facility=-
        image: k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.8
        imagePullPolicy: IfNotPresent
        name: dnsmasq
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /kube-dns-config
          name: kube-dns-config
      - args:
        - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
        - --url=/healthz-dnsmasq
        - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
        - --url=/healthz-kubedns
        - --port=8080
        - --quiet
        image: k8s-gcrio.azureedge.net/exechealthz-amd64:1.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz-dnsmasq
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: healthz
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: Default
      nodeSelector:
        beta.kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-dns
      serviceAccountName: kube-dns
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          name: kube-dns
          optional: true
        name: kube-dns-config
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: 2018-07-24T18:21:32Z
    lastUpdateTime: 2018-07-24T18:23:33Z
    message: ReplicaSet "kube-dns-v20-658cf7bf44" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2018-07-24T22:11:17Z
    lastUpdateTime: 2018-07-24T22:11:17Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 1
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2

Looks like the flag (I assume you're looking for another --server flag) isn't present in the dnsmasq container's args, but the logs do show that dnsmasq is supposedly using the configmap-defined nameservers.

kubectl logs -n kube-system kube-dns-v20-658cf7bf44-ht8xl -c dnsmasq
I0724 22:10:43.479786       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --log-facility=-] true} /kube-dns-config 10000000000}
I0724 22:10:43.480145       1 sync.go:167] Updated stubDomains to map[<mydomain>.int:[10.255.0.5 10.255.0.6 10.255.0.7]]
I0724 22:10:43.480234       1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --log-facility=- --server /<mydomain>.int/10.255.0.5 --server /<mydomain>.int/10.255.0.6 --server /<mydomain>.int/10.255.0.7]
I0724 22:10:43.807253       1 nanny.go:119]
I0724 22:10:43.807348       1 nanny.go:116] dnsmasq[12]: started, version 2.78 cachesize 1000
I0724 22:10:43.808409       1 nanny.go:116] dnsmasq[12]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0724 22:10:43.808423       1 nanny.go:116] dnsmasq[12]: using nameserver 10.255.0.7#53 for domain <mydomain>.int
I0724 22:10:43.808429       1 nanny.go:116] dnsmasq[12]: using nameserver 10.255.0.6#53 for domain <mydomain>.int
I0724 22:10:43.808435       1 nanny.go:116] dnsmasq[12]: using nameserver 10.255.0.5#53 for domain <mydomain>.int
I0724 22:10:43.808440       1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0724 22:10:43.808445       1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0724 22:10:43.808450       1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053
I0724 22:10:43.808456       1 nanny.go:116] dnsmasq[12]: read /etc/hosts - 7 addresses
W0724 22:10:43.808587       1 nanny.go:120] Got EOF from stdout
I0724 23:02:03.482634       1 sync.go:167] Updated stubDomains to map[<mydomain>.int:[10.255.0.5 10.255.0.6 10.255.0.7]]
I0724 23:02:03.482676       1 sync.go:177] Updated upstreamNameservers to [10.255.0.5 10.255.0.6 10.255.0.7]
I0724 23:02:03.482709       1 nanny.go:194] Restarting dnsmasq with new configuration
I0724 23:02:03.482717       1 nanny.go:143] Killing dnsmasq
I0724 23:02:03.482750       1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --log-facility=- --server /<mydomain>.int/10.255.0.5 --server /<mydomain>.int/10.255.0.6 --server /<mydomain>.int/10.255.0.7 --server 10.255.0.5 --server 10.255.0.6 --server 10.255.0.7 --no-resolv]
I0724 23:02:03.483242       1 nanny.go:119]
W0724 23:02:03.483386       1 nanny.go:120] Got EOF from stderr
I0724 23:02:03.787708       1 nanny.go:119]
W0724 23:02:03.787759       1 nanny.go:120] Got EOF from stdout
I0724 23:02:03.787788       1 nanny.go:116] dnsmasq[16]: started, version 2.78 cachesize 1000
I0724 23:02:03.787816       1 nanny.go:116] dnsmasq[16]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0724 23:02:03.787845       1 nanny.go:116] dnsmasq[16]: using nameserver 10.255.0.7#53
I0724 23:02:03.787852       1 nanny.go:116] dnsmasq[16]: using nameserver 10.255.0.6#53
I0724 23:02:03.787857       1 nanny.go:116] dnsmasq[16]: using nameserver 10.255.0.5#53
I0724 23:02:03.787864       1 nanny.go:116] dnsmasq[16]: using nameserver 10.255.0.7#53 for domain <mydomain>.int
I0724 23:02:03.787871       1 nanny.go:116] dnsmasq[16]: using nameserver 10.255.0.6#53 for domain <mydomain>.int
I0724 23:02:03.787878       1 nanny.go:116] dnsmasq[16]: using nameserver 10.255.0.5#53 for domain <mydomain>.int
I0724 23:02:03.787883       1 nanny.go:116] dnsmasq[16]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0724 23:02:03.787889       1 nanny.go:116] dnsmasq[16]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0724 23:02:03.787897       1 nanny.go:116] dnsmasq[16]: using nameserver 127.0.0.1#10053
I0724 23:02:03.788037       1 nanny.go:116] dnsmasq[16]: read /etc/hosts - 7 addresses

The only error I see is the EOF. I'm not sure why the value isn't reflected in the dnsmasq container itself...

We are using acs-engine, and the config that shows up from kubectl get deployments seems to match this: https://github.com/Azure/acs-engine/blob/5e62bbf26536cbda4b19b02a49469d881831df10/parts/k8s/addons/kubernetesmasteraddons-kube-dns-deployment.yaml

I'm wondering if that is somehow overriding my values, but the logs seem to disprove that.

@MrHohn
Member

MrHohn commented Jul 25, 2018

Instead of --server=127.0.0.1#10053, dnsmasq should be using --server=/cluster.local/127.0.0.1#10053 (or the customized cluster domain). In your case, because 127.0.0.1#10053 is listed as the first nameserver without an explicit domain, basically all queries (that don't match other domains) will be forwarded to it.

I think ACS has a misconfiguration here: https://github.com/Azure/acs-engine/blob/5e62bbf26536cbda4b19b02a49469d881831df10/parts/k8s/addons/kubernetesmasteraddons-kube-dns-deployment.yaml#L132.

Ref how this is configured upstream: https://github.com/kubernetes/kubernetes/blob/753632d85b7639ffadb05eed3e49dbfbbd5360b6/cluster/addons/dns/kube-dns/kube-dns.yaml.base#L170.
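
In other words, keeping everything else the same, the dnsmasq container args in the deployment would look roughly like this (only the plain --server=127.0.0.1#10053 entry changes):

- args:
  - -v=2
  - -logtostderr
  - -configDir=/kube-dns-config
  - -restartDnsmasq=true
  - --
  - -k
  - --cache-size=1000
  - --no-resolv
  - --server=/cluster.local/127.0.0.1#10053
  - --server=/in-addr.arpa/127.0.0.1#10053
  - --server=/ip6.arpa/127.0.0.1#10053
  - --log-facility=-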

I'm not sure why the value isn't reflected in the dnsmasq container itself.

The dnsmasq nanny watches the kube-dns configmap and does a live restart of dnsmasq with the updated flags. The deployment spec itself will not be updated.
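
So to see the flags actually in effect after a configmap change, check the running dnsmasq process (or the nanny logs) rather than the deployment spec, e.g. something like:

kubectl -n kube-system exec <kube-dns-pod> -c dnsmasq -- ps -f
kubectl -n kube-system logs <kube-dns-pod> -c dnsmasq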

@visokoo
Author

visokoo commented Jul 26, 2018

@MrHohn thanks for the clarification. I had an inkling that was the problem and have already created an issue with ACS; I will definitely link to this ticket there for reference as well.

Follow up question on:

The dnsmasq nanny watches the kube-dns configmap and does a live restart on dnsmasq using updated flags. The deployment itself will not be updated.

If the deployment itself is not updated, do subsequent scale ups of the DNS deployment contain stale data then?

@MrHohn
Member

MrHohn commented Jul 26, 2018

If the deployment itself is not updated, do subsequent scale ups of the DNS deployment contain stale data then?

No, new kube-dns replicas will read from the kube-dns configmap upon startup and hence will contain the latest configuration.
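
If you want to double-check after a scale-up, the nanny log of a fresh replica should show the configmap being applied, e.g. something like (pod name will differ):

kubectl -n kube-system scale deployment kube-dns-v20 --replicas=3
kubectl -n kube-system logs <new-kube-dns-pod> -c dnsmasq | grep -E 'stubDomains|upstreamNameservers'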

@visokoo
Author

visokoo commented Jul 26, 2018

Awesome, thank you for your help!

@visokoo visokoo closed this as completed Jul 26, 2018