
Cannot set --max-pods in the eks configuration #2551

Closed · insider89 opened this issue Apr 5, 2023 · 12 comments

@insider89 (Contributor)

Description

Cannot override max-pods with the latest 19.12 module. I have a cluster provisioned with m2.large instances, which sets 17 pods per node by default. I've set ENABLE_PREFIX_DELEGATION = "true" and WARM_PREFIX_TARGET = "1" for the vpc-cni addon, but it doesn't help; I still get 17 pods per node. In the launch template I see the following:

/etc/eks/bootstrap.sh dev --kubelet-extra-args '--node-labels=node_group=infra,eks.amazonaws.com/nodegroup-image=ami-04dc8cdc2e948f054,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=infra-20230316203627944100000001 --register-with-taints=infra=true:NoSchedule --max-pods=17' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP --use-max-pods false

I tried adding the following to my managed node group configuration, but the module just ignores it:

      enable_bootstrap_user_data = true
      bootstrap_extra_args       = "--kubelet-extra-args '--max-pods=50'"

      pre_bootstrap_user_data = <<-EOT
        export USE_MAX_PODS=false
      EOT

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (only if state is stored remotely, which is hopefully the best practice you are following): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists
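
Run from the project root, those checks amount to:

    # Only remove .terraform/ if your state is stored remotely.
    rm -rf .terraform/
    terraform init
    terraform plan   # or terraform apply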

Versions

  • Module version [Required]: 19.12

  • Terraform version: Terraform v1.4.2

  • Provider version(s):

Terraform v1.4.2
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v4.61.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.2
+ provider registry.terraform.io/hashicorp/helm v2.9.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.19.0
+ provider registry.terraform.io/hashicorp/time v0.9.1
+ provider registry.terraform.io/hashicorp/tls v4.0.4

Reproduction Code [Required]

# https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2009
data "aws_eks_cluster" "default" {
  name = local.name
  depends_on = [
    module.eks.eks_managed_node_groups,
  ]
}

data "aws_eks_cluster_auth" "default" {
  name = local.name
  depends_on = [
    module.eks.eks_managed_node_groups,
  ]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

data "aws_ami" "eks_default" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amazon-eks-node-${local.cluster_version}-v*"]
  }
}

data "aws_iam_roles" "sso_admins" {
  name_regex  = "AWSReservedSSO_AdministratorAccess_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/eu-west-1/"
}

data "aws_iam_roles" "sso_developers" {
  name_regex  = "AWSReservedSSO_DeveloperAccess_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/eu-west-1/"
}

locals {
  name            = "dev"
  cluster_version = "1.25"
  region          = "eu-west-1"

  vpc_cidr = data.terraform_remote_state.vpc.outputs.vpc_cidr_block
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  tags = {
    Environment = "dev"
    Team        = "DevOps"
    Terraform   = "true"
  }
}

data "aws_availability_zones" "available" {}
data "aws_caller_identity" "current" {}


################################################################################
# EKS Module
################################################################################

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.12"

  cluster_name                   = local.name
  cluster_version                = local.cluster_version
  cluster_endpoint_public_access = false

  cluster_addons = {
    coredns = {
      addon_version = "v1.9.3-eksbuild.2"

      timeouts = {
        create = "25m"
        delete = "10m"
      }
    }
    kube-proxy = {
      addon_version = "v1.25.6-eksbuild.2"
    }
    vpc-cni = {
      addon_version  = "v1.12.6-eksbuild.1"
      before_compute = true
      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
    aws-ebs-csi-driver = {
      addon_version            = "v1.17.0-eksbuild.1"
      service_account_role_arn = module.ebs_csi_irsa_role.iam_role_arn
    }
  }

  vpc_id                   = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids               = data.terraform_remote_state.vpc.outputs.private_subnets
  control_plane_subnet_ids = data.terraform_remote_state.vpc.outputs.intra_subnets

  # https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2009#issuecomment-1262099428
  cluster_security_group_additional_rules = {
    ingress = {
      description                = "EKS Cluster allows 443 port to get API call"
      type                       = "ingress"
      from_port                  = 443
      to_port                    = 443
      protocol                   = "TCP"
      cidr_blocks                = ["10.1.0.0/16"]
      source_node_security_group = false
    }
  }

  node_security_group_additional_rules = {
    node_to_node = {
      from_port = 0
      to_port   = 0
      protocol  = -1
      self      = true
      type      = "ingress"
    }
  }

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    attach_cluster_primary_security_group = true

    ami_type = "AL2_x86_64"

    instance_types = [
      "m5.large",
      "m5.xlarge",
      "m4.large",
      "m4.xlarge",
      "c3.large",
      "c3.xlarge",
      "t2.large",
      "t2.medium",
      "t2.xlarge",
      "t3.medium",
      "t3.large",
      "t3.xlarge"
    ]
    iam_role_additional_policies = {
      AmazonEC2ContainerRegistryReadOnly = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
    }
  }

  eks_managed_node_groups = {
    default = {
      description = "Default EKS managed node group"

      use_custom_launch_template = false

      remote_access = {
        ec2_ssh_key = data.terraform_remote_state.ssh_key.outputs.aws_key_pair_id
      }

      ami_id                     = data.aws_ami.eks_default.image_id
      enable_bootstrap_user_data = true
      bootstrap_extra_args       = "--kubelet-extra-args '--max-pods=50'"

      pre_bootstrap_user_data = <<-EOT
        export USE_MAX_PODS=false
      EOT

      min_size     = 1
      max_size     = 10
      desired_size = 1
      disk_size    = 20

      update_config = {
        max_unavailable_percentage = 33 # or set `max_unavailable`
      }

      labels = {
        node_group = "default"
      }
    }

    infra = {
      description                = "EKS managed node group for infra workloads"
      use_custom_launch_template = false

      remote_access = {
        ec2_ssh_key = data.terraform_remote_state.ssh_key.outputs.aws_key_pair_id
      }

      min_size     = 1
      max_size     = 10
      desired_size = 1
      disk_size    = 20

      update_config = {
        max_unavailable_percentage = 33 # or set `max_unavailable`
      }

      labels = {
        node_group = "infra"
      }

      taints = {
        dedicated = {
          key    = "infra"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }
    }
  }

  # aws-auth configmap
  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_admins.names)}"
      username = "sso-admin:{{SessionName}}"
      groups   = ["system:masters"]
    },
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_developers.names)}"
      username = "sso-developer:{{SessionName}}"
      groups   = ["system:masters"]
    },
  ]

  tags = local.tags
}

Expected behavior

Have 50 pods per node

Actual behavior

Have 17 pods per node

Additional context

I went through different issues but didn't find how to change max-pods. This suggestion doesn't work.

@insider89 (Contributor Author)

When I removed a few instance types and kept only those with a higher pod limit, I got a limit of 29 pods per node, but I still couldn't reach the goal of 110 pods.
When I left only the m2.large instance type, I got 110 pods per node, but that wasn't because of bootstrap_extra_args or any other configuration; it was set automatically, and I don't know why.

So the question still stands: how do I set max-pods to 110?

@Pionerd commented May 1, 2023

It looks like I have the same issue over here. Any updates from your side, @insider89?

@insider89 (Contributor Author)

@Pionerd I didn't find a way to set --max-pods in the EKS Terraform module. I figured out that if I provide different instance types in instance_types, it sets --max-pods to the lowest number among those instance types. So, first of all, I kept only instance types with the same amount of CPU and memory in the node group (as the cluster autoscaler cannot scale mixed instance types), and removed the instance types with the lowest max pods according to this list.
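
To check which instance type is dragging the limit down, the max-pods-calculator.sh script from awslabs/amazon-eks-ami reports the recommended value per instance type; roughly (download path and flags as documented in the AWS prefix delegation guide, CNI version assumed from the config above):

    curl -O https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh
    chmod +x max-pods-calculator.sh

    # Recommended max pods per instance type with prefix delegation enabled
    ./max-pods-calculator.sh --instance-type m4.large --cni-version 1.12.6-eksbuild.1 --cni-prefix-delegation-enabled
    ./max-pods-calculator.sh --instance-type m5.large --cni-version 1.12.6-eksbuild.1 --cni-prefix-delegation-enabled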

@Pionerd commented May 1, 2023

I hate to say this, but I recreated my environment from scratch and now my max_pods are 110...
I suspect it has to do with configuring the VPC CNI before creation of the node pools.

The following is sufficient; no need for bootstrap_extra_args:

  cluster_addons = {
    vpc-cni = {
      most_recent          = true
      before_compute       = true
      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
  }
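
A quick way to confirm the resulting limit once the nodes are up (assumes kubectl access to the cluster):

    kubectl get nodes -o custom-columns='NAME:.metadata.name,MAXPODS:.status.capacity.pods'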

@insider89 (Contributor Author)

@Pionerd I have this flag enabled as well for the CNI plugin, but the max pods per node still depends on the instance types I provide in the instance_types variable; in my case it's 20 pods per node (because I have m4.large among the instance types). Here is my full configuration:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.13"

  cluster_name                   = local.name
  cluster_version                = local.cluster_version
  cluster_endpoint_public_access = false

  cluster_addons = {
    coredns = {
      addon_version = "v1.9.3-eksbuild.2"

      timeouts = {
        create = "25m"
        delete = "10m"
      }
    }
    kube-proxy = {
      addon_version = "v1.26.2-eksbuild.1"
    }
    vpc-cni = {
      addon_version  = "v1.12.6-eksbuild.1"
      before_compute = true
      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
    aws-ebs-csi-driver = {
      addon_version            = "v1.17.0-eksbuild.1"
      service_account_role_arn = module.ebs_csi_irsa_role.iam_role_arn
    }
  }

  vpc_id                   = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids               = data.terraform_remote_state.vpc.outputs.private_subnets
  control_plane_subnet_ids = data.terraform_remote_state.vpc.outputs.intra_subnets

  # https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2009#issuecomment-1262099428
  cluster_security_group_additional_rules = {
    ingress = {
      description                = "EKS Cluster allows 443 port to get API call"
      type                       = "ingress"
      from_port                  = 443
      to_port                    = 443
      protocol                   = "TCP"
      cidr_blocks                = ["10.1.0.0/16"]
      source_node_security_group = false
    }
  }

  node_security_group_additional_rules = {
    node_to_node = {
      from_port = 0
      to_port   = 0
      protocol  = -1
      self      = true
      type      = "ingress"
    }
  }

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    attach_cluster_primary_security_group = true

    iam_role_additional_policies = {
      AmazonEC2ContainerRegistryReadOnly = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
    }
  }

  eks_managed_node_groups = {
    default = {
      description = "Default EKS managed node group"

      use_custom_launch_template = false

      remote_access = {
        ec2_ssh_key = data.terraform_remote_state.ssh_key.outputs.aws_key_pair_id
      }

      instance_types = [
        "m5.large",
        "t2.large",
        "t3.large",
        "m5d.large",
        "m5a.large",
        "m5ad.large",
        "m5n.large",
        "m5dn.large",
        "m4.large",
      ]

      min_size     = 1
      max_size     = 15
      desired_size = 1
      disk_size    = 20

      update_config = {
        max_unavailable_percentage = 33 # or set `max_unavailable`
      }

      labels = {
        node_group = "default"
      }
    }

    infra = {
      description                = "EKS managed node group for infra workloads"
      use_custom_launch_template = false

      instance_types = [
        "m5.large",
        "t2.large",
        "t3.large",
        "m5d.large",
        "m5a.large",
        "m5ad.large",
        "m5n.large",
        "m5dn.large",
        "m4.large"
      ]

      remote_access = {
        ec2_ssh_key = data.terraform_remote_state.ssh_key.outputs.aws_key_pair_id
      }

      min_size     = 1
      max_size     = 15
      desired_size = 1
      disk_size    = 20

      update_config = {
        max_unavailable_percentage = 33 # or set `max_unavailable`
      }

      labels = {
        node_group = "infra"
      }

      taints = {
        dedicated = {
          key    = "infra"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }
    }
  }

  # aws-auth configmap
  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_admins.names)}"
      username = "sso-admin:{{SessionName}}"
      groups   = ["system:masters"]
    },
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_developers.names)}"
      username = "sso-developer:{{SessionName}}"
      groups   = ["system:masters"]
    },
  ]
}

@Pionerd commented May 1, 2023

Hi @insider89

Just ran into the issue again with exactly the same code as before. It looks like some kind of timing issue still. What worked for me (this time, no guarantees) is leaving the cluster intact, removing only the existing node group, and recreating it.
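
In Terraform terms, that amounts to forcing replacement of just the node group; something along these lines should work (the resource address is an assumption based on the module's internal naming, so check terraform state list first):

    # Find the exact resource address first
    terraform state list | grep aws_eks_node_group

    # Then force replacement of only that node group (address below is assumed)
    terraform apply -replace='module.eks.module.eks_managed_node_group["default"].aws_eks_node_group.this[0]'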

@SrDayne commented May 4, 2023

Hello guys.

For me it looks like the problem is not in Terraform itself but in AWS; it seems Amazon's bootstrap script overrides the provided values.
I use the following workaround:

eks_managed_node_groups = {
    dev = {
      name = "k8s-dev"

      instance_types = ["t3.medium"]

      enable_bootstrap_user_data = false
      
      pre_bootstrap_user_data = <<-EOT
        #!/bin/bash
        LINE_NUMBER=$(grep -n "KUBELET_EXTRA_ARGS=\$2" /etc/eks/bootstrap.sh | cut -f1 -d:)
        REPLACEMENT="\ \ \ \ \ \ KUBELET_EXTRA_ARGS=\$(echo \$2 | sed -s -E 's/--max-pods=[0-9]+/--max-pods=30/g')"
        sed -i '/KUBELET_EXTRA_ARGS=\$2/d' /etc/eks/bootstrap.sh
        sed -i "$${LINE_NUMBER}i $${REPLACEMENT}" /etc/eks/bootstrap.sh
      EOT

      min_size = 1
      max_size = 3
      desired_size = 2

      #taints = [
      #  {
      #    key = "node.cilium.io/agent-not-ready"
      #    value = "true"
      #    effect = "NoExecute"
      #  }
      #]
    }
  }

It is not an elegant solution, but it works. It replaces, on the fly, the line in the bootstrap script that is responsible for --kubelet-extra-args. Note that if you use a custom ami_id the setup could be a little different, but it should still work.
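
To make the substitution concrete, here is the same sed expression run against a sample argument string (input adapted from the launch template line quoted earlier in this issue):

    #!/bin/bash
    # Toy run of the substitution the workaround applies inside bootstrap.sh
    args="--node-labels=node_group=infra --register-with-taints=infra=true:NoSchedule --max-pods=17"
    echo "$args" | sed -E 's/--max-pods=[0-9]+/--max-pods=30/g'
    # -> --node-labels=node_group=infra --register-with-taints=infra=true:NoSchedule --max-pods=30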

As a result:
kubectl describe node ip-10-1-0-102.eu-south-1.compute.internal

Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      3943372Ki
  pods:                        30
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           18242267924
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      3388364Ki
  pods:                        30

Edit: tried node autoscaling and recreated environments multiple times; the script works.

@bryantbiggs (Member)

  1. Yes - managed nodegroups own the bootstrap script in the user data, which leads to hacky work-arounds (see "How to enable containerd when using EKS managed node group" awslabs/amazon-eks-ami#844).
  2. The proper way to enable max pods is to set the intended values via the VPC CNI custom configuration. If the VPC CNI is configured before nodegroups are created and nodes are launched, EKS managed nodegroups will infer the proper value for max pods from the VPC CNI configuration. There is a flag that should be enabled to ensure the VPC CNI is created before the associated nodegroups; it has a default timeout of 30s that can be increased if necessary:
    # This sleep resource is used to provide a timed gap between the cluster creation and the downstream dependencies
    # that consume the outputs from here. Any of the values that are used as triggers can be used in dependencies
    # to ensure that the downstream resources are created after both the cluster is ready and the sleep time has passed.
    # This was primarily added to give addons that need to be configured BEFORE data plane compute resources
    # enough time to create and configure themselves before the data plane compute resources are created.
    resource "time_sleep" "this" {
      count = var.create ? 1 : 0

      create_duration = var.dataplane_wait_duration

      triggers = {
        cluster_name                       = aws_eks_cluster.this[0].name
        cluster_endpoint                   = aws_eks_cluster.this[0].endpoint
        cluster_version                    = aws_eks_cluster.this[0].version
        cluster_certificate_authority_data = aws_eks_cluster.this[0].certificate_authority[0].data
      }
    }

For now though, closing this out since there are no further actions (that I am aware of) that the module can take to improve upon this area.
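
For reference, the wait mentioned above surfaces as a module input; a minimal consumer-side sketch (assuming the variable from the snippet is exposed as dataplane_wait_duration, default 30s):

    module "eks" {
      source  = "terraform-aws-modules/eks/aws"
      version = "19.12"

      # ... cluster and node group configuration as above ...

      # Give addons created with `before_compute = true` (e.g. vpc-cni with
      # ENABLE_PREFIX_DELEGATION) extra time to configure themselves before
      # the managed node groups are created.
      dataplane_wait_duration = "120s"
    }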

@CostinaDamir

pre_bootstrap_user_data = <<-EOT
#!/bin/bash
LINE_NUMBER=$(grep -n "KUBELET_EXTRA_ARGS=$2" /etc/eks/bootstrap.sh | cut -f1 -d:)
REPLACEMENT="\ \ \ \ \ \ KUBELET_EXTRA_ARGS=$(echo $2 | sed -s -E 's/--max-pods=[0-9]+/--max-pods=30/g')"
sed -i '/KUBELET_EXTRA_ARGS=$2/d' /etc/eks/bootstrap.sh
sed -i "$${LINE_NUMBER}i $${REPLACEMENT}" /etc/eks/bootstrap.sh
EOT

I tried your workaround, but I get a tf error:

│ Error: Variables not allowed
│
│   on <value for var.eks_managed_node_groups> line 1:
│   (source code not available)

Any idea?

@ophintor

I think you need to escape all the $?

@SrDayne commented Jun 15, 2023

@CostinaDamir
As @ophintor said, you need to escape the $ characters. Copy the piece of code without any changes:

      pre_bootstrap_user_data = <<-EOT
        #!/bin/bash
        LINE_NUMBER=$(grep -n "KUBELET_EXTRA_ARGS=\$2" /etc/eks/bootstrap.sh | cut -f1 -d:)
        REPLACEMENT="\ \ \ \ \ \ KUBELET_EXTRA_ARGS=\$(echo \$2 | sed -s -E 's/--max-pods=[0-9]+/--max-pods=30/g')"
        sed -i '/KUBELET_EXTRA_ARGS=\$2/d' /etc/eks/bootstrap.sh
        sed -i "$${LINE_NUMBER}i $${REPLACEMENT}" /etc/eks/bootstrap.sh
      EOT

Also, you can replace --max-pods=30 with --max-pods=${var.cluster_max_pods} and set the number of pods with a variable.
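
For example (the variable name here is just an assumption; inside the heredoc, ${var.cluster_max_pods} is interpolated by Terraform, while the $$ and \$ sequences stay escaped so bash still receives them literally):

    variable "cluster_max_pods" {
      description = "Value substituted into --max-pods by the bootstrap.sh workaround"
      type        = number
      default     = 30
    }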

@github-actions (bot)

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Jul 16, 2023