
Fix unschedulable Tinkerbell stack when no worker nodes #2624

Merged (4 commits) on Nov 3, 2023
Conversation

@chrisdoherty4 (Contributor) commented Nov 1, 2023

Closes https://github.com/aws/eks-anywhere-internal/issues/1953.

If all worker nodes eligible to run Tinkerbell stack workloads go offline, the cluster and its Workload clusters cannot recover failed nodes. This change allows the Tinkerbell stack to run on control plane nodes but prefer worker nodes at scheduling time.

The patch was tested against a cluster with 1 control plane node and 1 worker node. The Helm chart was installed using existing values, then the worker node was drained. I observed the Tinkerbell stack move to the control plane node. Similarly, when a pod is deleted it prefers the worker node (when untainted). Additionally, a Workload cluster was created from a stack hosted in a cluster with 0 worker nodes.
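For readers following along, the scheduling pattern described above generally looks like the following in a pod spec: tolerations for the control plane taints plus a soft node-affinity preference for non-control-plane nodes. This is an illustrative sketch using the standard Kubernetes taint and label keys, not the chart's actual manifest:

```yaml
# Hypothetical illustration of the scheduling behavior described above;
# the name, image, and labels are placeholders, and the taint/label keys
# are the standard Kubernetes ones, not necessarily the chart's values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-tinkerbell-service   # placeholder name
spec:
  selector:
    matchLabels:
      app: example-tinkerbell-service
  template:
    metadata:
      labels:
        app: example-tinkerbell-service
    spec:
      containers:
        - name: app
          image: example/image:latest   # placeholder image
      tolerations:
        # Allow scheduling onto tainted control plane nodes
        # (both the current and legacy taint keys).
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      affinity:
        nodeAffinity:
          # Soft preference: favor nodes without the control plane label
          # (i.e. worker nodes), but fall back to control plane nodes
          # when no worker node is schedulable.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: DoesNotExist
```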

KubeVIP needs to run on control plane nodes so that when the Tinkerbell stack floats to a control plane node (with Envoy), the VIP can be hosted from that node and accommodate the Local externalTrafficPolicy.
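As background on that last point: with externalTrafficPolicy: Local, a LoadBalancer Service only delivers traffic to pods on the node that received it, so the node holding the KubeVIP-managed VIP must itself run a backing pod. A minimal illustrative Service follows; the name, selector, and ports are placeholders, not the chart's real values:

```yaml
# Illustrative only; the real chart's Service name, selector, and ports differ.
apiVersion: v1
kind: Service
metadata:
  name: tinkerbell-stack   # placeholder name
spec:
  type: LoadBalancer
  # Local: traffic is delivered only to pods on the node that received it,
  # so the node advertising the VIP must run one of the stack's pods.
  externalTrafficPolicy: Local
  selector:
    app: tinkerbell-stack  # placeholder selector
  ports:
    - name: http
      port: 80
      targetPort: 8080
```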
@eks-distro-bot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Nov 1, 2023
Comment on lines +44 to +45
tolerations:
{{- include "controlPlaneTolerations" . | indent 6 }}

@chrisdoherty4 (Contributor, Author) commented Nov 1, 2023
Relative to the previous PR, this is the new addition. I built a Workload cluster from a management cluster that had its worker nodes scaled down to 0, demonstrating the Tinkerbell stack is fully functional when scheduled on the control plane.
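For context on the snippet above, controlPlaneTolerations is a Helm named template that the chart includes under each workload's tolerations key. Its exact contents aren't shown in this thread; a plausible definition, assuming the standard control plane taint keys, would be:

```yaml
{{/* Hypothetical definition; the chart's actual helper may differ. */}}
{{- define "controlPlaneTolerations" -}}
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
{{- end -}}
```

The `include "controlPlaneTolerations" . | indent 6` in the diff then splices that toleration list under the pod spec's tolerations field at the correct indentation.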

@chrisdoherty4 (Contributor, Author)

/approve

@eks-distro-bot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrisdoherty4

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chrisdoherty4 requested a review from a team on November 2, 2023 at 17:42
@chrisdoherty4 (Contributor, Author)

/cherry-pick release-0.18

@eks-distro-pr-bot (Contributor)

@chrisdoherty4: new pull request created: #2638

In response to this:

/cherry-pick release-0.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels: approved, lgtm, size/M
Projects: none yet
4 participants