
Node object should be removed by node-lifecycle controller #4310

Closed
Danil-Grigorev opened this issue Mar 15, 2021 · 7 comments
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.

Comments

@Danil-Grigorev (Member) commented Mar 15, 2021

User Story

Cluster API unnecessarily takes care of Node removal. The node-lifecycle controller already removes Node objects once the backing instance is terminated. Delegating removal to it would match the autoscaling pattern and reduce clutter in the code.

Detailed Description

Use nodeRef only for draining/cordoning and for machine status reporting.
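
Not actual Cluster API code, just a minimal sketch of what "nodeRef for cordoning only" could look like with client-go against the workload cluster; the helper name, kubeconfig path, and node name are placeholders.

```go
// Hypothetical illustration: cordon the Node referenced by a Machine's
// status.nodeRef, leaving Node deletion to the cloud provider's
// node-lifecycle controller.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// cordonNode marks the node as unschedulable via a merge patch,
// the same effect as `kubectl cordon <node>`.
func cordonNode(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	_, err := cs.CoreV1().Nodes().Patch(ctx, nodeName, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

func main() {
	// Kubeconfig path and node name are placeholders; in CAPI the node name
	// would come from the Machine's status.nodeRef.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/workload-kubeconfig")
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	if err := cordonNode(context.Background(), cs, "worker-node-0"); err != nil {
		panic(err)
	}
	fmt.Println("node cordoned")
}
```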

/kind cleanup

@k8s-ci-robot added the kind/cleanup label Mar 15, 2021
@fabriziopandini added and then removed the priority/awaiting-more-evidence label Mar 15, 2021
@fabriziopandini (Member) commented Mar 15, 2021

/milestone v0.4.0

I'm not sure this is the correct behavior from a Cluster API PoV.
AFAIK the node-lifecycle controller removes nodes at a predefined frequency, which could leave ghost nodes in the cluster for some time. I find it cleaner to remove the node in Cluster API as part of the machine deletion process.

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Mar 15, 2021
@enxebre (Member) commented Mar 15, 2021

We deliberately introduced this feature a while ago. There are scenarios where there is not necessarily a cloud provider running and the node won't be removed otherwise, e.g. libvirt or bare metal, and for the scenarios where there is one, this makes no difference. We agreed on enabling draining and deletion in CAPI: since we are the source of truth for deleting the infrastructure backing a node, it makes sense to ensure the node deletion as well.

@vincepri (Member)

/milestone Next

@fabriziopandini This isn't a release-blocking feature.

@Danil-Grigorev Wouldn't the removal be a no-op if some other controller takes care of it?

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.4.0, Next Mar 15, 2021
@Danil-Grigorev (Member, Author)

@vincepri Yes, it is a no-op; I was just wondering whether, with the CCM configuration in KubeadmControlPlane, this behavior could be disabled for known cloud providers. It is confusing to do the same job in two places if both configurations are managed by CAPI.

@fabriziopandini this frequency could be adjusted; that implementation is commonly used for autoscaling outside of the Cluster API implementation of it. CAPI could also leverage it, so there would be less discrepancy.

@enxebre Understood; I saw your comment on the initial issue explaining that this behavior is the more generic approach for providers which do not support the feature.
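
To make the no-op point concrete, here is a rough sketch (not actual CAPI code) of a node deletion that tolerates NotFound, which is why two controllers racing to remove the same Node cannot conflict; the package and helper names are made up.

```go
// Hypothetical illustration of why double deletion is harmless: whichever
// controller loses the race simply sees NotFound and treats it as success.
package node

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteNode removes the Node object; a missing Node is not an error.
func deleteNode(ctx context.Context, cs kubernetes.Interface, name string) error {
	err := cs.CoreV1().Nodes().Delete(ctx, name, metav1.DeleteOptions{})
	if apierrors.IsNotFound(err) {
		// Already removed, e.g. by the cloud node-lifecycle controller.
		return nil
	}
	return err
}
```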

@Danil-Grigorev (Member, Author)

If this implementation gets leveraged more by Cluster API, there might be a possibility to move the CAPI draining implementation server-side for everyone using a cloud environment: kubernetes/kubernetes#25625 (comment)

@enxebre (Member) commented Apr 22, 2021

> @vincepri Yes, it is a no-op; I was just wondering whether, with the CCM configuration in KubeadmControlPlane, this behavior could be disabled for known cloud providers. It is confusing to do the same job in two places if both configurations are managed by CAPI.

I really see no issue with two tangential components ensuring the node resource is garbage collected; as mentioned above, if there's no node, either component will simply no-op. I see no value in exposing this as config to the user, as it is an internal detail.
We own the host lifecycle, so it makes total sense to me to ensure the node resource is deleted after the infrastructure. This guarantees the same behaviour and UX across any environment, including those with no cloud-provider support.
This was originally introduced by #809, for more context.

@Danil-Grigorev Unless you or anyone else has any objection, I'd suggest we close this.

@Danil-Grigorev (Member, Author)

Yeah, the argument that some providers have no CCM in k/k and so lack this feature, meaning CAPI should take care of Node removal, seems good to me. Redundancy in this case is fine, and CAPI will surely respond to a Node removal request more quickly than a polling process in the CCM anyway.

/close
