-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v2: refactor managed decommission #288
Conversation
The refactor to the controller ended up being pretty heavy but the logic is largely unchanged with the exception of |
b591c32
to
a36efbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// NB: Deleting this Pod will take an unexpectedly long time as the | ||
// pre-stop hook will "spin" due to not handling decommissioned brokers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment. I'm under impression that podEvict is called after Redpanda is successfully decommissioned. Maybe the problem is that maintenance mode is not achievable if particular Redpanda is no longer part of the Cluster?
redpanda-operator/operator/internal/controller/redpanda/managed_decommission_controller.go
Lines 208 to 221 in a36efbc
if !decomStatus.Finished { | |
log.Info("decommission status not finished", "decommission-broker-status", decomStatus) | |
return &resources.RequeueAfterError{ | |
RequeueAfter: wait.Jitter(defaultDecommissionWaitInterval, decommissionWaitJitterFactor), | |
Msg: fmt.Sprintf("broker %d decommission status not finished", decommissionNodeID), | |
} | |
} | |
log.Info("Node decommissioned") | |
r.EventRecorder.AnnotatedEventf(rp, | |
map[string]string{v1alpha2.GroupVersion.Group + revisionPath: rp.ResourceVersion}, | |
corev1.EventTypeNormal, v1alpha2.EventTypeTrace, fmt.Sprintf("Node decommissioned: %d", decommissionNodeID)) | |
if err := r.podEvict(ctx, rp); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the problem is that maintenance mode is not achievable if particular Redpanda is no longer part of the Cluster?
Exactly this! You can't set maintenance mode on a broker that's not part of a cluster. The admin API returns a 404.
Repository: ptr.To("redpandadata/redpanda"), // Override the default to make use of the docker-io image cache. | ||
}, | ||
Config: &redpandav1alpha2.Config{}, | ||
Image: &redpandav1alpha2.RedpandaImage{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why an override was removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It got added by accident 😓 I've actually added it back in and updated the comment. This way we won't inflate our own metrics :P
Image: &redpandav1alpha2.RedpandaImage{ | ||
Repository: ptr.To("redpandadata/redpanda"), // Override the default to make use of the docker-io image cache. | ||
}, | ||
Config: &redpandav1alpha2.Config{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why empty Config struct is added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry that leaked in from another PR. It's so that I can set config later on without having to worry about nil pointers. Added a comment.
7c8d7be
to
12205e4
Compare
Prior to this commit the managed decommission controller was utilizing patterns akin to the node watcher and decommission controller for connecting to the admin API. This made it difficult to test via the `RedpandaControllerSuite` test which was desirable due to the flakey nature (especially when considering a later commit) of it's corresponding kuttl based test. This commit refactors the manged decommission controller to leverage the `ClientFactory` struct used in the redpanda controller which makes it possible to test in the faster and more easily debuggable `RedpandaControllerSuite`.
12205e4
to
908182b
Compare
Prior to this commit the managed decommission controller was utilizing patterns akin to the node watcher and decommission controller for connecting to the admin API. This made it difficult to test via the
RedpandaControllerSuite
test which was desirable due to the flakey nature (especially when considering a later commit) of it's corresponding kuttl based test.This commit refactors the manged decommission controller to leverage the
ClientFactory
struct used in the redpanda controller which makes it possible to test in the faster and more easily debuggableRedpandaControllerSuite
.