
Fix unschedulable Tinkerbell stack when no worker nodes #2624

Merged (4 commits) on Nov 3, 2023
Conversation

@chrisdoherty4 (Contributor) commented Nov 1, 2023

Closes https://github.com/aws/eks-anywhere-internal/issues/1953.

If all worker nodes eligible to run Tinkerbell stack workloads go offline, the cluster and its Workload clusters cannot recover failed nodes. This change allows the Tinkerbell stack to run on control plane nodes but prefer worker nodes at scheduling time.

The patch was tested against a cluster with 1 control plane node and 1 worker node. The Helm chart was installed using existing values, then the worker node was drained. I observed the Tinkerbell stack move to the control plane node. Similarly, when a pod is deleted it prefers the worker node (when untainted). Additionally, a Workload cluster was created from a stack hosted in a cluster with 0 worker nodes.
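For readers following along, the scheduling pattern described above generally looks like the following in a pod spec: tolerations for the control plane taints plus a soft node-affinity preference for non-control-plane nodes. This is an illustrative sketch using the standard Kubernetes taint and label keys, not the chart's actual manifest:

```yaml
# Hypothetical illustration of the scheduling behavior described above;
# the name, image, and labels are placeholders, and the taint/label keys
# are the standard Kubernetes ones, not necessarily the chart's values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-tinkerbell-service   # placeholder name
spec:
  selector:
    matchLabels:
      app: example-tinkerbell-service
  template:
    metadata:
      labels:
        app: example-tinkerbell-service
    spec:
      containers:
        - name: app
          image: example/image:latest   # placeholder image
      tolerations:
        # Allow scheduling onto tainted control plane nodes
        # (both the current and legacy taint keys).
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      affinity:
        nodeAffinity:
          # Soft preference: favor nodes without the control plane label
          # (i.e. worker nodes), but fall back to control plane nodes
          # when no worker node is schedulable.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: DoesNotExist
```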

KubeVIP needs to run on control plane nodes so that when the Tinkerbell stack floats to a control plane node (with Envoy), the VIP can be hosted from that node and accommodate the Local externalTrafficPolicy.
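As background on that last point: with externalTrafficPolicy: Local, a LoadBalancer Service only delivers traffic to pods on the node that received it, so the node holding the KubeVIP-managed VIP must itself run a backing pod. A minimal illustrative Service follows; the name, selector, and ports are placeholders, not the chart's real values:

```yaml
# Illustrative only; the real chart's Service name, selector, and ports differ.
apiVersion: v1
kind: Service
metadata:
  name: tinkerbell-stack   # placeholder name
spec:
  type: LoadBalancer
  # Local: traffic is delivered only to pods on the node that received it,
  # so the node advertising the VIP must run one of the stack's pods.
  externalTrafficPolicy: Local
  selector:
    app: tinkerbell-stack  # placeholder selector
  ports:
    - name: http
      port: 80
      targetPort: 8080
```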
@eks-distro-bot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Nov 1, 2023
Comment on lines +44 to +45
tolerations:
{{- include "controlPlaneTolerations" . | indent 6 }}

@chrisdoherty4 (Contributor, Author) commented Nov 1, 2023
Relative to the previous PR, this is the new addition. I built a Workload cluster from a management cluster that had its worker nodes scaled down to 0, demonstrating the Tinkerbell stack is fully functional when scheduled on the control plane.
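For context on the snippet above, controlPlaneTolerations is a Helm named template that the chart includes under each workload's tolerations key. Its exact contents aren't shown in this thread; a plausible definition, assuming the standard control plane taint keys, would be:

```yaml
{{/* Hypothetical definition; the chart's actual helper may differ. */}}
{{- define "controlPlaneTolerations" -}}
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
{{- end -}}
```

The `include "controlPlaneTolerations" . | indent 6` in the diff then splices that toleration list under the pod spec's tolerations field at the correct indentation.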

@chrisdoherty4 (Contributor, Author)

/approve

@eks-distro-bot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrisdoherty4

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chrisdoherty4 requested a review from a team on November 2, 2023 at 17:42
@chrisdoherty4 (Contributor, Author)

/cherry-pick release-0.18

@eks-distro-pr-bot (Contributor)

@chrisdoherty4: new pull request created: #2638

In response to this:

/cherry-pick release-0.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels: approved, lgtm, size/M
Projects: none yet
4 participants