Resolved Service Annotation Loop Issue in ROSA Environment #739

Merged
spilchen merged 4 commits into main from spilchen/fix-cloud-annotations-reconcile
Mar 20, 2024

Conversation

spilchen
Collaborator

In ROSA (Red Hat OpenShift Service on AWS), we noticed that setting up a network LoadBalancer caused an annotation to be added automatically to the Service object. This sent the operator into a continuous reconcile loop: it would remove the annotation, only to have OpenShift add it back.

To fix this, we now allow annotations to be added manually to Service objects. The operator only ensures that the annotations it generates have the correct values. It ignores any additional annotations that were added outside of the VerticaDB.
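
The merge behavior described above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: `mergeAnnotations` and the `example.com/...` annotation keys are hypothetical names chosen for the example. It assumes the operator compares only the keys it generates and leaves every other key on the Service untouched.

```go
package main

import "fmt"

// mergeAnnotations is a hypothetical helper (not taken from the operator).
// existing holds the annotations currently on the Service, which may include
// keys added by the platform. desired holds only the annotations the operator
// generates from the VerticaDB. The result keeps every existing key and
// overrides just the operator-managed ones. changed is true only when an
// operator-managed key was missing or had a different value, so externally
// added keys never trigger an update on their own.
func mergeAnnotations(existing, desired map[string]string) (map[string]string, bool) {
	merged := make(map[string]string, len(existing)+len(desired))
	for k, v := range existing {
		merged[k] = v
	}
	changed := false
	for k, v := range desired {
		if cur, ok := existing[k]; !ok || cur != v {
			merged[k] = v
			changed = true
		}
	}
	return merged, changed
}

func main() {
	// Hypothetical annotation keys, for illustration only.
	existing := map[string]string{
		"example.com/added-by-platform": "true",
		"example.com/operator-managed":  "old",
	}
	desired := map[string]string{"example.com/operator-managed": "new"}

	merged, changed := mergeAnnotations(existing, desired)
	fmt.Println(merged, changed)

	// A second pass against the merged result reports no change, which is
	// what lets the reconcile settle instead of looping.
	_, changedAgain := mergeAnnotations(merged, desired)
	fmt.Println(changedAgain)
}
```

One limitation of this sketch: it never deletes keys, so an annotation that is later removed from the VerticaDB would stay on the Service unless the operator tracks which keys it previously set.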

I am also cleaning up the PVC expansion events. When we are continuously reconciling, these events generate a lot of noise about skipping expansion. The skip events have been changed to log entries instead.

Matt Spilchen added 2 commits March 20, 2024 09:15
@spilchen spilchen requested a review from roypaulin March 20, 2024 13:14
@spilchen spilchen self-assigned this Mar 20, 2024
Collaborator

@roypaulin roypaulin left a comment

Looks good!

apiVersion: vertica.com/v1
kind: VerticaDB
name: v-pvc-expansion
# No event to check if expansion is skipped
Collaborator

Why do you still keep this file?

Collaborator Author

If we run with a PVC that allows for expansion, then the step can still be used to verify that.

@roypaulin
Collaborator

Is this related to the crash from yesterday?

@spilchen
Collaborator Author

> Is this related to the crash from yesterday?

Yes, we were repeatedly doing reconcile iterations. Eventually, the K8s OOMKiller stepped in and killed the pod. We were allocating and freeing so much memory that garbage collection couldn't keep up.

@spilchen spilchen merged commit 47b3c07 into main Mar 20, 2024
30 checks passed
@spilchen spilchen deleted the spilchen/fix-cloud-annotations-reconcile branch March 20, 2024 19:38