[Bug] maxPodsPerNode doesn't work with EKS 1.22 #5134

Closed
mathieu-lemay opened this issue Apr 18, 2022 · 21 comments · Fixed by #5808
Assignees: cPu1
Labels: area/nodegroup, kind/bug, priority/important-longterm (Important over the long term, but may not be currently staffed and/or may require multiple releases)

Comments

@mathieu-lemay (Author):

What were you trying to accomplish?

I'm trying to create a managed node group with a limit on the number of pods per node.

What happened?

The node group is created, but maxPodsPerNode is ignored and the nodes use their default value instead (29 in my case for an m5.large node).

How to reproduce it?

$ cat > nodegroup.yaml << EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
managedNodeGroups:
  -
    name: test-max-pods
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
    maxPodsPerNode: 12
    iam:
      withAddonPolicies:
        appMesh: true
        appMeshPreview: true
        autoScaler: true
        efs: true
metadata:
  name: my-eks-1-22-cluster
  region: ca-central-1
  version: auto
EOF

$ eksctl create nodegroup -f nodegroup.yaml
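
Once the node is up, a quick way to check the effective limit (node name taken from the creation logs below):

$ kubectl get node ip-10-75-1-120.ca-central-1.compute.internal -o jsonpath='{.status.capacity.pods}{"\n"}'
29  # expected 12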

Logs
Creation log

2022-04-18 14:46:56 [ℹ]  using region ca-central-1
2022-04-18 14:46:57 [ℹ]  will use version 1.22 for new nodegroup(s) based on control plane version
2022-04-18 14:46:58 [ℹ]  nodegroup "test-max-pods" will use "" [AmazonLinux2/1.22]
2022-04-18 14:46:59 [!]  retryable error (Throttling: Rate exceeded
        status code: 400, request id: e73d37bd-b940-484c-ad78-6312b8b5e6d3) from cloudformation/DescribeStacks - will retry after delay of 6.20133802s
2022-04-18 14:47:06 [ℹ]  4 existing nodegroup(s) (my-eks-1-22-cluster-a,my-eks-1-22-cluster-b,my-eks-1-22-cluster-c,my-eks-1-22-cluster-d) will be excluded
2022-04-18 14:47:06 [ℹ]  1 nodegroup (test-max-pods) was included (based on the include/exclude rules)
2022-04-18 14:47:06 [ℹ]  will create a CloudFormation stack for each of 1 managed nodegroups in cluster "my-eks-1-22-cluster"
2022-04-18 14:47:06 [ℹ]
2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "test-max-pods" } }
}
2022-04-18 14:47:06 [ℹ]  checking cluster stack for missing resources
2022-04-18 14:47:07 [ℹ]  cluster stack has all required resources
2022-04-18 14:47:07 [!]  retryable error (Throttling: Rate exceeded
        status code: 400, request id: 1aaf9b5d-6bb7-4370-a4f5-c982f58dcc34) from cloudformation/DescribeStacks - will retry after delay of 5.132635276s
2022-04-18 14:47:13 [ℹ]  building managed nodegroup stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:47:13 [ℹ]  deploying stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:47:13 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:47:32 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:47:51 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:48:07 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:48:25 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:48:41 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:49:01 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:49:17 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:49:33 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:49:52 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:50:10 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:50:28 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:50:45 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:51:05 [ℹ]  waiting for CloudFormation stack "eksctl-my-eks-1-22-cluster-nodegroup-test-max-pods"
2022-04-18 14:51:05 [ℹ]  no tasks
2022-04-18 14:51:05 [✔]  created 0 nodegroup(s) in cluster "my-eks-1-22-cluster"
2022-04-18 14:51:05 [ℹ]  nodegroup "test-max-pods" has 1 node(s)
2022-04-18 14:51:05 [ℹ]  node "ip-10-75-1-120.ca-central-1.compute.internal" is ready
2022-04-18 14:51:05 [ℹ]  waiting for at least 1 node(s) to become ready in "test-max-pods"
2022-04-18 14:51:05 [ℹ]  nodegroup "test-max-pods" has 1 node(s)
2022-04-18 14:51:05 [ℹ]  node "ip-10-75-1-120.ca-central-1.compute.internal" is ready
2022-04-18 14:51:05 [✔]  created 1 managed nodegroup(s) in cluster "my-eks-1-22-cluster"
2022-04-18 14:51:06 [ℹ]  checking security group configuration for all nodegroups
2022-04-18 14:51:06 [ℹ]  all godegroups have up-to-date cloudformation templates

kubectl describe node/ip-10-75-1-120.ca-central-1.compute.internal

<== removed ==>
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           83873772Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7934440Ki
  pods:                        29  # <-- Should be 12
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           76224326324
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7244264Ki
  pods:                        29  # <-- Should be 12
System Info:
  Machine ID:                 ec2c7770b7e8fd8b2edd9808f7b986a6
  System UUID:                ec2c7770-b7e8-fd8b-2edd-9808f7b986a6
  Boot ID:                    9bbc3b1f-38e7-424b-ac45-b2d093438d75
  Kernel Version:             5.4.181-99.354.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.13
  Kubelet Version:            v1.22.6-eks-7d68063
  Kube-Proxy Version:         v1.22.6-eks-7d68063
<== removed ==>

Anything else we need to know?
Debian 11, using the downloaded 0.93.0 binary.

Versions

$ eksctl info
eksctl version: 0.93.0
kubectl version: v1.23.5
OS: linux
$ eksctl get clusters --name my-eks-1-22-cluster
2022-04-18 14:55:24 [ℹ]  eksctl version 0.93.0
2022-04-18 14:55:24 [ℹ]  using region ca-central-1
NAME                VERSION STATUS CREATED              VPC     SUBNETS    SECURITYGROUPS PROVIDER
my-eks-1-22-cluster 1.22    ACTIVE 2022-04-14T14:19:15Z vpc-xxx subnet-xxx sg-xxx         EKS
@Skarlso (Contributor) commented Apr 18, 2022

Hello, can you please also verify that the created launch template's user data sets max pods to 12?

@mathieu-lemay (Author):

Looks like it. This is the user data:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=63096ae1a5df4c7b8a9e6a77290c89ef3f47a3a436b02df68a95bf6a8458

--63096ae1a5df4c7b8a9e6a77290c89ef3f47a3a436b02df68a95bf6a8458
Content-Type: text/x-shellscript
Content-Type: charset="us-ascii"

#!/bin/sh
set -ex
sed -i -E "s/^USE_MAX_PODS=\"\\$\{USE_MAX_PODS:-true}\"/USE_MAX_PODS=false/" /etc/eks/bootstrap.sh
KUBELET_CONFIG=/etc/kubernetes/kubelet/kubelet-config.json
echo "$(jq ".maxPods=12" $KUBELET_CONFIG)" > $KUBELET_CONFIG
--63096ae1a5df4c7b8a9e6a77290c89ef3f47a3a436b02df68a95bf6a8458--
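
(For reference, the user data above can be pulled out of the generated launch template like this; the launch template ID is a placeholder:)

$ aws ec2 describe-launch-template-versions \
    --launch-template-id lt-0123456789abcdef0 --versions '$Latest' \
    --query 'LaunchTemplateVersions[0].LaunchTemplateData.UserData' \
    --output text | base64 -d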

@Skarlso (Contributor) commented Apr 18, 2022

Okay cool. That's something at least. :)

We'll take a look, but if we provide the right flags, I'm afraid there is little we can do.

Have you tried testing it with more than 12 pods? It might write 29, but it might not allow more than 12 using the controller, or something something AWS magic? :)

@mathieu-lemay (Author):

Fair enough, it could be something that changed within EKS.

I did test it already; unfortunately, there was no AWS magic, and I ended up with about 27 pods. That's how I noticed the issue.

@Skarlso (Contributor) commented Apr 19, 2022

Thanks!

@cPu1 (Contributor) commented Apr 19, 2022

> Fair enough, it could be something that changed within EKS.
>
> I did test it already; unfortunately, there was no AWS magic, and I ended up with about 27 pods. That's how I noticed the issue.

I initially suspected that the script eksctl uses to set max pods for managed nodegroups no longer works in EKS 1.22, potentially because the bootstrap script in 1.22 AMIs has changed. But after testing, I can confirm that eksctl is still able to set maxPods in the kubelet config but it's not being honoured.

@cPu1 (Contributor) commented Apr 19, 2022

> Fair enough, it could be something that changed within EKS.
> I did test it already; unfortunately, there was no AWS magic, and I ended up with about 27 pods. That's how I noticed the issue.

> I initially suspected that the script eksctl uses to set max pods for managed nodegroups no longer works in EKS 1.22, potentially because the bootstrap script in 1.22 AMIs has changed. But after testing, I can confirm that eksctl is still able to set maxPods in the kubelet config but it's not being honoured.

I have tracked it down to EKS supplying --max-pods as an argument to the kubelet. The implementation for maxPodsPerNode in eksctl writes the maxPods field to the kubelet config, but the --max-pods command-line argument now passed by EKS overrides that field.

We can work around this, but we'll discuss it with the EKS team first, as there were some talks about deprecating max pods earlier.
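
For anyone who wants to confirm this on an affected node, both sources can be compared directly (paths are the Amazon Linux 2 defaults):

# Value eksctl wrote into the kubelet config file:
$ jq '.maxPods' /etc/kubernetes/kubelet/kubelet-config.json
# Flag(s) actually passed on the kubelet command line, which take precedence over the config file:
$ tr '\0' '\n' < /proc/$(pgrep -o -x kubelet)/cmdline | grep -- '--max-pods'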

@mathieu-lemay (Author):

> I have tracked it down to EKS supplying --max-pods as an argument to the kubelet. The implementation for maxPodsPerNode in eksctl writes the maxPods field to the kubelet config, but the --max-pods command-line argument now passed by EKS overrides that field.
>
> We can work around this, but we'll discuss it with the EKS team first, as there were some talks about deprecating max pods earlier.

Thanks for the update! In the meantime, we could work around the issue by setting resource requests on our pods instead of setting a hardcoded number of pods. We have been thinking about it for a while anyway; this was just the push we needed to take the time and do it.
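
For example, requests can be added to an existing workload from the CLI; the deployment name and values here are only placeholders:

# Let the scheduler pack pods by requested resources instead of a hard pod count.
$ kubectl set resources deployment/my-app --requests=cpu=250m,memory=256Mi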

@Himangini (Contributor):

@matthewdepietro tagging you here as per your request 👍🏻

@suket22 commented May 3, 2022

Adding some context on Managed Nodegroups' behavior: if the VPC CNI is running at version >= 1.9, Managed Nodegroups attempts to auto-calculate the value of maxPods and sets it on the kubelet, as @cPu1 has found. Managed Nodegroups looks at the different environment variables on the VPC CNI to determine what value to set (it essentially emulates the logic in this calculator script), taking into account PrefixDelegation, max ENIs, etc.
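
For reference, the non-prefix-delegation calculation boils down to ENIs * (IPv4 addresses per ENI - 1) + 2; the m5.large numbers below are the published ENI limits and reproduce the 29 seen above:

# max_pods = ENIs * (IPs per ENI - 1) + 2
$ ENIS=3; IPS_PER_ENI=10; echo $(( ENIS * (IPS_PER_ENI - 1) + 2 ))
29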

This logic should only be triggered when the managed nodegroup is being created without a custom AMI. When looking to override the kubelet config, it's recommended to specify an AMI in the launch template passed to CreateNodegroup, since you then get full control over all bootstrap parameters, including max pods.
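
To illustrate that route (a sketch only, not eksctl's implementation): with a custom AMI the user data owns the bootstrap call, so max pods can be pinned explicitly. The cluster name and value are taken from this issue; a real custom-AMI call also needs the API server endpoint and cluster CA:

#!/bin/bash
# Custom-AMI user data sketch: invoke the EKS bootstrap script directly and pass
# --max-pods straight through to kubelet.
/etc/eks/bootstrap.sh my-eks-1-22-cluster \
  --use-max-pods false \
  --kubelet-extra-args '--max-pods=12'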

@Himangini (Contributor):

We need to come up with a plan to support this as cleanly as possible without hacks.

Timebox: 1-2 days
Document the outcomes here.

@cPu1 self-assigned this Jun 13, 2022
@cPu1 (Contributor) commented Jun 15, 2022

Looking into this more, a clean solution to support max pods in eksctl is to resolve the AMI using SSM, pass it as a custom AMI to the MNG API, and use a custom bootstrap script that sets --max-pods to the supplied value when maxPodsPerNode is set. This approach, however, breaks eksctl upgrade nodegroup and requires eksctl to handle upgrades for nodegroups that have maxPodsPerNode set.

@cPu1 (Contributor) commented Jun 22, 2022

> Looking into this more, a clean solution to support max pods in eksctl is to resolve the AMI using SSM, pass it as a custom AMI to the MNG API, and use a custom bootstrap script that sets --max-pods to the supplied value when maxPodsPerNode is set. This approach, however, breaks eksctl upgrade nodegroup and requires eksctl to handle upgrades for nodegroups that have maxPodsPerNode set.

Alternatively, we can use a workaround/hack that modifies the bootstrap.sh script and removes the --max-pods argument passed in the launch template's user data generated by EKS. This is similar to how max-pods was implemented previously and requires less effort than the custom AMI approach.
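
Very roughly, the idea looks something like the following. This is an illustration only, not the change that eventually landed in #5808, and the file path and pattern are assumptions:

#!/bin/sh
# Illustrative sketch: drop any --max-pods value injected via the EKS-generated
# user data so the maxPods field already written to the kubelet config wins.
sed -i -E 's/--max-pods=[0-9]+//g' /etc/eks/bootstrap.sh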

@suket22 commented Jul 5, 2022

> and use a custom bootstrap script that sets --max-pods to the supplied value when maxPodsPerNode is set

This is the approach I'd be in favor of. maxPodsPerNode is essentially a property of the kubelet, and the only supported way to modify your kubeletConfiguration is by using custom AMIs with your managed nodegroup, so this approach makes sense to me.

I'm not sure I understood the mechanics of the workaround you'd mentioned. I think you meant you could edit the bootstrap script on the AMI itself and remove the max-pods argument that the MNG API tries to set, but I'm not sure I understand how eksctl would set the value of maxPodsPerNode on the kubelet itself. Lmk what I'm missing here.

In the long term, we've been thinking of rewriting the EKS bootstrap script so that kubelet parameter overrides can be specified within your UserData section, and it'll be honored however MNG bootstraps, but it's pending resourcing.

@Himangini (Contributor):

> Looking into this more, a clean solution to support max pods in eksctl is to resolve the AMI using SSM, pass it as a custom AMI to the MNG API, and use a custom bootstrap script that sets --max-pods to the supplied value when maxPodsPerNode is set. This approach, however, breaks eksctl upgrade nodegroup and requires eksctl to handle upgrades for nodegroups that have maxPodsPerNode set.
>
> Alternatively, we can use a workaround/hack that modifies the bootstrap.sh script and removes the --max-pods argument passed in the launch template's user data generated by EKS. This is similar to how max-pods was implemented previously and requires less effort than the custom AMI approach.

I am inclined towards this approach as well, rather than breaking eksctl upgrade nodegroup.

@cPu1 (Contributor) commented Jul 6, 2022

> I'm not sure I understood the mechanics of the workaround you'd mentioned. I think you meant you could edit the bootstrap script on the AMI itself and remove the max-pods argument that the MNG API tries to set

Correct.

> but I'm not sure I understand how eksctl would set the value of maxPodsPerNode on the kubelet itself. Lmk what I'm missing here.

eksctl will set it in the kubelet config, which will then be read by kubelet.

> In the long term, we've been thinking of rewriting the EKS bootstrap script so that kubelet parameter overrides can be specified within your UserData section, and it'll be honored however MNG bootstraps (awslabs/amazon-eks-ami#875), but it's pending resourcing.

Thanks for sharing this. I think we'll go with the workaround for now, given that we already have a similar workaround in place and it requires less effort than using a custom AMI with a custom bootstrap script. We'll revisit this approach after the EKS bootstrap script starts accepting kubelet parameter overrides.

@github-actions bot commented Aug 6, 2022

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions bot added the stale label Aug 6, 2022
@cPu1 removed the stale label Aug 8, 2022
@Himangini added the priority/important-longterm label Sep 2, 2022
@bryanasdev000 commented Sep 27, 2022

Just dumping this for reference:

https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/user_data.md#%EF%B8%8F-caveat
awslabs/amazon-eks-ami#873
awslabs/amazon-eks-ami#844

Also, maxPodsPerNode does not seem to work with the latest 1.21 AMIs anymore (awslabs/amazon-eks-ami@v20220824...v20220914).

I am using this in newly created clusters and it's still working: awslabs/amazon-eks-ami#844 (comment) (tested on 1.21, 1.22, and 1.23).

EDIT: It seems that it is working again in 1.21 for me with ami-051aa0d5889741142 (EKS 1.21/us-east-2) as of 2022-10-07.

@cPu1 (Contributor) commented Dec 1, 2022

> Just dumping this for reference:
>
> https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/user_data.md#%EF%B8%8F-caveat awslabs/amazon-eks-ami#873 awslabs/amazon-eks-ami#844
>
> Also, maxPodsPerNode does not seem to work with the latest 1.21 AMIs anymore (awslabs/amazon-eks-ami@v20220824...v20220914).
>
> I am using this in newly created clusters and it's still working: awslabs/amazon-eks-ami#844 (comment) (tested on 1.21, 1.22, and 1.23).
>
> EDIT: It seems that it is working again in 1.21 for me with ami-051aa0d5889741142 (EKS 1.21/us-east-2) as of 2022-10-07.

This was fixed by #5808. You should not run into this issue with a recent version of eksctl.

@matti (Contributor) commented Dec 10, 2022

@cPu1 are you sure that the fix in https://github.com/weaveworks/eksctl/pull/5808/files#diff-3a316f46904258df0dec1e9c9c1d6a89efb06e0637a5c0a6a930c162b5352498R99 is actually invoked? The sed appends it to KUBELET_EXTRA_ARGS, which is only used if --kubelet-extra-args is passed in.

@matti (Contributor) commented Dec 10, 2022

Okay, it does set it, but kubelet is running with --max-pods=110 --max-pods=123, where the latter is the maxPodsPerNode value.
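
One way to see which value wins on a node (the node name is a placeholder):

# Every --max-pods value on the running kubelet's command line:
$ ps -o args= -C kubelet | tr ' ' '\n' | grep -- '--max-pods'
# What the node actually advertises to the scheduler:
$ kubectl get node <node-name> -o jsonpath='{.status.capacity.pods}{"\n"}'

For a single-valued flag like --max-pods, the last occurrence on the command line normally takes effect, so the maxPodsPerNode value should win here, but it is worth confirming against the node's reported capacity.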
