Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Clarify cluster paused and skipping remediation steps #10817

Merged

Conversation

SD-13
Copy link
Contributor

@SD-13 SD-13 commented Jun 29, 2024

Fixes #9026

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 29, 2024
- A cluster or a machine is usually paused automatically by Cluster API when it detects a migration.
- Users can skip remediation for a specific machine by setting the `cluster.x-k8s.io/skip-remediation` annotation on it.
- Paused Machines (Machines with the `cluster.x-k8s.io/paused` annotation) are not considered for remediation.
- If a specific MHC resource is paused (using `cluster.x-k8s.io/paused` annotation), it will stop to remediate the corresponding target machines.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- If a specific MHC resource is paused (using `cluster.x-k8s.io/paused` annotation), it will stop to remediate the corresponding target machines.
- If a specific MachineHealthCheck resource is paused (using `cluster.x-k8s.io/paused` annotation), it will stop to remediate the corresponding target machines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

- Users can skip remediation for a specific machine by setting the `cluster.x-k8s.io/skip-remediation` annotation on it.
- Paused Machines (Machines with the `cluster.x-k8s.io/paused` annotation) are not considered for remediation.
- If a specific MHC resource is paused (using `cluster.x-k8s.io/paused` annotation), it will stop to remediate the corresponding target machines.
- If the Cluster is paused (using the `cluster.x-k8s.io/paused` annotation or by setting `cluster.spec.paused` to true), all the MHC resources belonging to the Cluster will be implicitly paused, and thus stop remediating target machines.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- If the Cluster is paused (using the `cluster.x-k8s.io/paused` annotation or by setting `cluster.spec.paused` to true), all the MHC resources belonging to the Cluster will be implicitly paused, and thus stop remediating target machines.
- If the Cluster is paused (using the `cluster.x-k8s.io/paused` annotation or by setting `cluster.spec.paused` to `true`), all the MachineHealthCheck resources belonging to the Cluster will be implicitly paused, and thus stop remediating target machines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@SD-13 SD-13 force-pushed the clarity_pause_skip_remediation branch from b1b1dda to a350b75 Compare July 1, 2024 13:28
@SD-13
Copy link
Contributor Author

SD-13 commented Jul 1, 2024

@chrischdi Added your suggested changes, let me know if you have any other suggestions. Thanks!

@@ -209,15 +209,14 @@ This is useful for dynamically scaling clusters where the number of machines kee

## Skipping Remediation

Copy link
Member

@sbueringer sbueringer Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to further simplify this to

There are scenarios where remediation for a machine may be undesirable (eg. during cluster migration using `clusterctl move`). For such cases, MachineHealthCheck skips marking a Machine for remediation if:

- the Machine has the `cluster.x-k8s.io/skip-remediation` annotation
- the Machine has the `cluster.x-k8s.io/paused` annotation
- the MachineHealthCheck has the `cluster.x-k8s.io/paused` annotation
- the Cluster has `.spec.paused` set to `true`

Just a lot easier to parse in my opinion

(I didn't find any evidence of handling of the cluster.x-k8s.io/paused annotation on clusters, please let me know if I missed it. I think the same was written in the issue: #9026 (comment))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbueringer I agree, this is more readable.
I think cluster.x-k8s.io/paused is being handled similar to other annotation.

@sbueringer sbueringer added the area/documentation Issues or PRs related to documentation label Jul 2, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Jul 2, 2024
@kashifest
Copy link
Contributor

lgtm

@sbueringer sbueringer modified the milestones: v1.5, v1.8 Jul 10, 2024
@sbueringer
Copy link
Member

Thank you!!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 10, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 969b0b13b8599d51acab3190fb0fec9f1dbd6e4b

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2024
@k8s-ci-robot k8s-ci-robot merged commit 7a53f93 into kubernetes-sigs:main Jul 10, 2024
18 checks passed
@SD-13 SD-13 deleted the clarity_pause_skip_remediation branch July 10, 2024 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/documentation Issues or PRs related to documentation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The description of cluster paused and skipping remediation of MHC should be clearer
5 participants