kube-dns-v20 deployment #3534
I'm going to be pasting some logs for the 1.14.8 kubedns issue since I have a fresh cluster that's doing the same thing: After adding
tcpdump from dnsmasq
tcpdump from kubedns
tcpdump from dnsmasq
tcpdump from kubedns
Even with
I initially posted this question on the kubernetes/dns repo and wanted to relay some of the comments here: kubernetes/dns#248. Looks like the upstream/stub domain issue should be resolved if the acs kube-dns-deployment.yaml matched the upstream version of it. Is there any reason why that isn't the case?
Hi @visokoo, this is the manifest we're currently using for 1.9 and 1.10 clusters: Do you see any obvious omissions or anything else fishy that would break your consul configuration? If so we'd love to incorporate changes that help your scenario! @feiskyer, who has worked in this area a bit, do you see anything in @visokoo's scenario that would require us to evolve our kube-dns implementation?
@jackfrancis, thanks for responding. The obvious ones that I see missing from that manifest are the ones I listed above under the dnsmasq section:
In the acs manifest, it looks like the change would need to be here:
--no-negcache param.
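For reference, here is roughly what the dnsmasq container args look like in the stock upstream kube-dns manifest once `--no-negcache` is in place; the surrounding flags are assumed from the upstream deployment and may not match the acs-engine manifest exactly:

```yaml
- name: dnsmasq
  image: k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.10
  args:
  - -v=2
  - -logtostderr
  - -configDir=/etc/k8s/dns/dnsmasq-nanny
  - -restartDnsmasq=true
  - --
  - -k
  - --cache-size=1000
  - --no-negcache                            # don't cache NXDOMAIN / other negative answers
  - --log-facility=-
  - --server=/cluster.local/127.0.0.1#10053  # cluster names go to kubedns
  - --server=/in-addr.arpa/127.0.0.1#10053
  - --server=/ip6.arpa/127.0.0.1#10053
```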
@visokoo Thanks! See:
Are you able to build from that PR branch and try a repro? (Or really, you could just
Shall I assume that that add'l setting says "when you get an NXDOMAIN response (and perhaps other types of not found results) don't cache it"? Something like that?
@jackfrancis Sorry I wasn't clear, aside from adding the
And yes,
For the
Can you add that to the PR so I can try building from it?
👌 PR updated, thanks for testing!
@jackfrancis Tested those settings on my cluster and resolution is working as expected. Thanks for making the change! What's the ETA on this going out in the next release of acs-engine?
Good to hear! Unfortunately shipping a new cluster w/ those changes (v1.8 at least) seems to have broken some functionality: I'll run against other versions to see if this config has an affinity for certain k8s versions only. Once we get a non-regressive implementation we'll test in master for a week or so and then patch a release; that's the normal process.
Thanks for the clarification! Looks like you fixed the tests? https://circleci.com/gh/Azure/acs-engine/38014 =] Looking forward to the patch and thanks again!
Sorry to be the bearer of bad news, but that test run is against another PR 😝 If you have a chance, test a cluster using your api model config by building from the branch in PR #3564, and let me know how that cluster looks!
@visokoo this PR, which adds the dnsmasq flags you want to the 1.11 kube-dns config, is being tested with the original static config, i.e.,
I'll report back with test results. If things check out, I'll probably close #3564 and drive #3373 to completion. Thanks again for hanging in there!
Thanks for the update @jackfrancis. Just for further clarification, this change would also be pushed to earlier versions of kube-dns as well, correct? Our prod cluster is specifically using k8s 1.9.6 with kube-dns 1.14.8. If we don't have to upgrade, that would be ideal... |
Is this a request for help?:
Yes.
Is this an ISSUE or FEATURE REQUEST? (choose one):
Issue.
What version of acs-engine?:
v0.14.5 & v0.20.0
Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
acs-engine 0.14.5 | k8s 1.9.6
acs-engine 0.20.0 | k8s 1.11.0
What happened:
I'm running into an issue that's pretty similar to #2999 where the kube-dns-v20 deployments go into a CrashLoopBackOff after adding a config map with my custom consul upstream server.
It doesn't happen right away; the problem usually shows up after an hour or so and starts with nslookups on the internal Kubernetes network failing, which sends the DNS pods into a crash loop since the health check fails.
Though internal lookups fail, external lookups still work during the brief periods when the pods come back up. In testing with minikube, I noticed that the args for dnsmasq are slightly different. My minikube cluster's DNS stack includes these args that are not present in Azure:
After deleting the DNS stack and recreating it with those new values added, my DNS pods have been stable for about a day. Is there a reason why those args aren't included in the kube-dns stack that's spun up via Azure's images? I'm not sure if this is the fix, but can those be added?
kube-dns:1.14.8 is missing the same args but has a slightly different issue: requests from pods for external services don't get routed to the upstream server at all, which is odd and isn't how it's supposed to work according to the Kubernetes documentation, so I tried 1.14.10 instead.
Would appreciate any insight...
What you expected to happen:
DNS cluster should be stable after adding custom upstream.
How to reproduce it (as minimally and precisely as possible):
Create a cluster with k8s 1.11.0, add an upstreamNameserver via ConfigMap (see the example ConfigMap below), resolve a few names, and wait about an hour for the DNS stack to start crash looping.
Anything else we need to know:
Our consul DNS is set up with google's dns as its recursors for anything outside of the consul domain.
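For reference, a ConfigMap of the kind described above would look roughly like this; the `consul` stub domain and the 10.0.0.50 address are placeholders, not values taken from this cluster:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  # Send *.consul queries to the consul servers (placeholder IP).
  stubDomains: |
    {"consul": ["10.0.0.50"]}
  # Everything else leaving the cluster also goes to consul,
  # which recurses to Google DNS for non-consul names.
  upstreamNameservers: |
    ["10.0.0.50"]
```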
kube components that we're using:
k8s-gcrio.azureedge.net/exechealthz-amd64:1.2
k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.10
k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64:1.14.10
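For context, those images map onto the three containers in the kube-dns-v20 pod roughly as sketched below; the layout is assumed from the stock kube-dns deployment rather than copied from the acs-engine manifest:

```yaml
containers:
- name: kubedns
  image: k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64:1.14.10        # serves cluster records on port 10053
- name: dnsmasq
  image: k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.10   # caches and forwards to kubedns / upstreams
- name: healthz
  image: k8s-gcrio.azureedge.net/exechealthz-amd64:1.2                 # liveness checks; its failures drive the crash loop
```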