
Node object should be removed by node-lifecycle controller #4310

Closed
Danil-Grigorev opened this issue Mar 15, 2021 · 7 comments
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.

Comments

@Danil-Grigorev (Member) commented Mar 15, 2021

User Story

Cluster API unnecessarily takes care of Node removal. The node-lifecycle controller already removes Node objects once the backing instance is terminated. Delegating removal to it would match the autoscaling pattern and reduce clutter in the code.

Detailed Description

Use nodeRef only for draining/cordoning and for machine status reporting.
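
Not actual Cluster API code, just a minimal sketch of what "nodeRef for cordoning only" could look like with client-go against the workload cluster; the helper name, kubeconfig path, and node name are placeholders.

```go
// Hypothetical illustration: cordon the Node referenced by a Machine's
// status.nodeRef, leaving Node deletion to the cloud provider's
// node-lifecycle controller.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// cordonNode marks the node as unschedulable via a merge patch,
// the same effect as `kubectl cordon <node>`.
func cordonNode(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	_, err := cs.CoreV1().Nodes().Patch(ctx, nodeName, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

func main() {
	// Kubeconfig path and node name are placeholders; in CAPI the node name
	// would come from the Machine's status.nodeRef.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/workload-kubeconfig")
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	if err := cordonNode(context.Background(), cs, "worker-node-0"); err != nil {
		panic(err)
	}
	fmt.Println("node cordoned")
}
```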

/kind cleanup

@k8s-ci-robot added the kind/cleanup label Mar 15, 2021
@fabriziopandini added and then removed the priority/awaiting-more-evidence label Mar 15, 2021
@fabriziopandini (Member) commented Mar 15, 2021

/milestone v0.4.0

I'm not sure this is the correct behavior from a Cluster API PoV.
AFAIK the node-lifecycle controller removes nodes at a predefined frequency, which could leave ghost nodes in the cluster for some time. I find it cleaner to remove the node in Cluster API as part of the machine deletion process.

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Mar 15, 2021
@enxebre (Member) commented Mar 15, 2021

We deliberately introduced this feature a while ago. There are scenarios where there is not necessarily a cloud provider running and the node won't be removed otherwise, e.g. libvirt or bare metal, and for the scenarios where there is one, this makes no difference. We agreed on enabling draining and deletion in CAPI: since we are the source of truth for deleting the infrastructure backing a node, it makes sense to ensure the node deletion as well.

@vincepri (Member)

/milestone Next

@fabriziopandini This isn't a release-blocking feature.

@Danil-Grigorev Wouldn't the removal be a no-op if some other controller takes care of it?

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.4.0, Next Mar 15, 2021
@Danil-Grigorev (Member, Author)

@vincepri Yes, it is a no-op; I was just wondering whether, with the CCM configuration in KubeadmControlPlane, this behavior could be disabled for known cloud providers. It is confusing to do the same job in two places if both configurations are managed by CAPI.

@fabriziopandini this frequency could be adjusted; that implementation is commonly used for autoscaling outside of the Cluster API implementation of it. CAPI could also leverage it, so there would be less discrepancy.

@enxebre Understood; I saw your comment on the initial issue explaining that this behavior is the more generic approach for providers which do not support the feature.
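
To make the no-op point concrete, here is a rough sketch (not actual CAPI code) of a node deletion that tolerates NotFound, which is why two controllers racing to remove the same Node cannot conflict; the package and helper names are made up.

```go
// Hypothetical illustration of why double deletion is harmless: whichever
// controller loses the race simply sees NotFound and treats it as success.
package node

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteNode removes the Node object; a missing Node is not an error.
func deleteNode(ctx context.Context, cs kubernetes.Interface, name string) error {
	err := cs.CoreV1().Nodes().Delete(ctx, name, metav1.DeleteOptions{})
	if apierrors.IsNotFound(err) {
		// Already removed, e.g. by the cloud node-lifecycle controller.
		return nil
	}
	return err
}
```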

@Danil-Grigorev (Member, Author)

If this implementation gets leveraged more by Cluster API, there might be a possibility to move the CAPI draining implementation server-side for everyone using a cloud environment: kubernetes/kubernetes#25625 (comment)

@enxebre (Member) commented Apr 22, 2021

> @vincepri Yes, it is a no-op; I was just wondering whether, with the CCM configuration in KubeadmControlPlane, this behavior could be disabled for known cloud providers. It is confusing to do the same job in two places if both configurations are managed by CAPI.

I really see no issue with two tangential components ensuring the node resource is garbage collected; as mentioned above, if there's no node, either component will simply no-op. I see no value in exposing this as config to the user, as it is an internal detail.
We own the host lifecycle, so it makes total sense to me to ensure the node resource is deleted after the infrastructure. This guarantees the same behaviour and UX across any environment, including those with no cloud-provider support.
This was originally introduced by #809, for more context.

@Danil-Grigorev Unless you or anyone else has any objection, I'd suggest we close this.

@Danil-Grigorev (Member, Author)

Yeah, the argument that some providers have no CCM in k/k and so lack this feature, meaning CAPI should take care of Node removal, seems good to me. Redundancy in this case is fine, and CAPI will surely respond to a Node removal request more quickly than a polling process in the CCM anyway.

/close
