Fix unschedulable Tinkerbell stack when no worker nodes #2624
Conversation
If all worker nodes eligible to run Tinkerbell stack workloads go offline, the cluster and Workload clusters cannot recover failed nodes. This change allows the Tinkerbell stack to run on control plane nodes but prefer worker nodes at scheduling time.
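For illustration, here is a minimal sketch of what that scheduling behavior looks like in a rendered pod spec, assuming the standard control-plane taint and role label; the exact keys the chart uses may differ:

```yaml
# Hedged sketch: tolerate the control-plane taint, but prefer nodes
# without the control-plane role label (i.e. workers) at scheduling time.
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: DoesNotExist
```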
KubeVIP needs to run on control plane nodes so that, when the Tinkerbell stack floats to a control plane node (with Envoy), the VIP can be hosted from that node and still satisfy the Local externalTrafficPolicy.
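As a sketch of why this matters: with `externalTrafficPolicy: Local`, a Service only routes traffic through nodes that host a ready backend pod, so the VIP must be advertised from such a node. The name, selector, and ports below are hypothetical:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tink-stack                # hypothetical name for illustration
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local    # only nodes with a local endpoint serve traffic
  selector:
    app: tink-stack               # hypothetical selector
  ports:
  - port: 80
    targetPort: 8080
```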
```yaml
tolerations:
{{- include "controlPlaneTolerations" . | indent 6 }}
```
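For context, `controlPlaneTolerations` is a named template defined elsewhere in the chart; a minimal sketch of what such a helper might contain (the actual definition may differ):

```yaml
{{/* Hedged sketch of a controlPlaneTolerations named template; the real
     definition would live in the chart's helpers file. */}}
{{- define "controlPlaneTolerations" }}
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
{{- end }}
```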
Relative to the previous PR, this is the new addition. I built a Workload cluster from a management cluster that had its worker nodes scaled back to 0, demonstrating that the Tinkerbell stack is fully functional when scheduled on the control plane.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: chrisdoherty4. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/cherry-pick release-0.18
@chrisdoherty4: new pull request created: #2638, in response to the /cherry-pick release-0.18 command above.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Closes https://github.com/aws/eks-anywhere-internal/issues/1953.
The patch was tested against a cluster with 1 control plane node and 1 worker node. The Helm chart was installed using existing values, then the worker node was drained; I observed the Tinkerbell stack move to the control plane node. Similarly, when a pod is deleted it is rescheduled onto the worker node (when untainted). Additionally, a Workload cluster was created from a stack hosted in a cluster with 0 worker nodes.