Implementing Operator downscaling functionality #901

Open · wants to merge 21 commits into base: master

Conversation

bsocaciu (Contributor)

No description provided.

@bsocaciu bsocaciu marked this pull request as ready for review February 17, 2025 07:25
@bsocaciu bsocaciu requested a review from a team as a code owner February 17, 2025 07:25
@bsocaciu (Contributor, Author)

Fixes issue: #179

Comment on lines 2185 to 2193
err = r.Status().Update(ctx, hc)
if err != nil {
	r.Log.Error(err, "failed to update cluster status")
	return reconcile.Result{}, err
}
r.Log.Info(fmt.Sprintf("removing pod %s containing vhost %d", pod.Name, vhost))
if err := r.Delete(ctx, &pod); err != nil { // delete pod before unregistering node
	return reconcile.Result{}, r.logErrorAndReturn(err, fmt.Sprintf("failed to delete pod %s for vhost %d!", pod.Name, vhost))
}
Member

What happens if the r.Status().Update(ctx, hc) call succeeds, but then the r.Delete(ctx, &pod) fails? Or perhaps we update status, and then due to the operator being restarted, it doesn't get to do the pod deletion?

@SaaldjorMike (Member), Feb 19, 2025

If I read the code correctly, it would essentially "just" mean it'll append the ID one more time to the Status, and then try to do the pod deletion again? Perhaps it should only do the append operation if the vhost is not already in the list? Mostly to prevent some sort of edge case that appends the same vhost over and over, where it fails to actually delete the pod.

@bsocaciu (Contributor, Author), Feb 19, 2025

That's fine. If the delete fails after the update succeeds, then the node will remain marked as Evicted, and at the next reconcile run it will try to delete the pod again.
As per the check at line 2228, the node will NOT be unregistered if it's still alive (if the pod is not deleted).

Edit: Nvm, I see your point. I'll add a check before the insert.
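
A minimal sketch of such a guard, assuming the status keeps a plain slice of vhost IDs (the field name EvictedNodeIds and the use of Go's standard slices package are assumptions here, not the PR's actual code):

// Sketch only: guard against appending the same vhost repeatedly when the pod
// deletion fails and the reconcile loop runs again. The status field name
// (EvictedNodeIds) is an illustrative assumption; "slices" is the Go 1.21+
// standard library package.
if !slices.Contains(hc.Status.EvictedNodeIds, vhost) {
	hc.Status.EvictedNodeIds = append(hc.Status.EvictedNodeIds, vhost)
	if err := r.Status().Update(ctx, hc); err != nil {
		r.Log.Error(err, "failed to update cluster status")
		return reconcile.Result{}, err
	}
}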

// Default: false
// Preview: this feature is in a preview state
//+kubebuilder:default=false
EnableDownscalingFeature bool `json:"enableDownscalingFeature,omitempty"`
Contributor

This may be nitpicky but if we plan on having opt-in features, perhaps we should create a feature flag type for this purpose?

FeatureFlags could then hold a Downscaling field that is a bool, or could be a string that we parse.

In general, if we can keep the list of top-level fields as minimal as possible, I think that is cleaner, especially if they are just holding a boolean.
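
As a rough sketch of that shape (the type and field names HumioFeatureFlags and DownscalingEnabled are illustrative, not taken from the PR):

// Sketch only: one possible shape for grouping opt-in features under a single
// top-level field instead of adding a new bool per feature.
// Type and field names are illustrative assumptions.

// HumioFeatureFlags groups opt-in, preview-state features.
type HumioFeatureFlags struct {
	// DownscalingEnabled turns on the operator downscaling functionality.
	// Preview: this feature is in a preview state
	//+kubebuilder:default=false
	DownscalingEnabled bool `json:"downscalingEnabled,omitempty"`
}

// In the cluster spec, a single field would then replace the top-level boolean:
// FeatureFlags HumioFeatureFlags `json:"featureFlags,omitempty"`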

@bsocaciu (Contributor, Author)

Great idea. I've implemented it, now I'm testing it, along with the eviction comment.

continue
}
vhost := nodeIdToPodMap[pod.GetName()]
err = r.HumioClient.SetIsBeingEvicted(ctx, humioHttpClient, req, vhost, true)
Contributor

This section makes me nervous. Do we have any protection against runaway evictions?

For example, what if SetIsBeingEvicted returns an error but still executes the eviction, or something else happens (operator restart, etc.) in the window after we mark the node for eviction but before we mark the pod as being evicted?

I think we should do some validation before this step, or change things where we're forced to be a little more careful. For example, in the reconciliation we could:

1. First, check against the humio API to see whether any nodes are marked for eviction. If they are, ensure the labels/annotations are set. If it sets the annotation, close the reconciliation loop; if it is already set, then continue.

2. Second, check whether there are any pods with the label/annotation and compare this against the humio API. If it doesn't match, trigger the eviction for the pods that have the label and restart the reconciliation loop. If it does match, continue with matchPodsToHosts() and start the eviction process for a node, assuming the node can be safely evicted.

3. Add logic to restrict the maximum number of concurrent evictions by default, and possibly offer a way to override it via the spec (perhaps a max per zone?).

This way if someone evicts via the UI then it will eventually lead to the pod being replaced, and we will not inadvertently evict all nodes in the cluster.
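
As a rough illustration of the last point, capping concurrent evictions inside the reconcile pass might look something like this (the annotation key, the default limit, and the foundPodList/podsToEvict variables are illustrative assumptions, not identifiers from the PR):

// Sketch only: cap the number of in-flight evictions per reconcile pass.
// Pods that already carry the eviction annotation count against the limit, so
// an operator restart or an eviction triggered from the UI does not cascade
// into evicting every node in the cluster. The annotation key, the default
// limit, and foundPodList/podsToEvict are illustrative assumptions.
const evictionAnnotation = "humio.com/marked-for-eviction"
maxConcurrentEvictions := 1 // default; could be made overridable via the spec, e.g. per zone

inFlight := 0
for _, pod := range foundPodList {
	if _, ok := pod.Annotations[evictionAnnotation]; ok {
		inFlight++
	}
}

for _, pod := range podsToEvict {
	if inFlight >= maxConcurrentEvictions {
		break // let a later reconcile continue once current evictions finish
	}
	vhost := nodeIdToPodMap[pod.GetName()]
	if err := r.HumioClient.SetIsBeingEvicted(ctx, humioHttpClient, req, vhost, true); err != nil {
		return reconcile.Result{}, err
	}
	inFlight++
}

Counting already-annotated pods against the budget before marking new ones is what would let an eviction triggered externally (e.g. via the UI) consume the limit instead of being evicted around.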

… scaling up the cluster as old pods don't have the new eviction annotation set