Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add sync-cluster-config command #231

Merged
merged 1 commit into from
Sep 17, 2024
Merged

Conversation

chrisseto
Copy link
Contributor

This commit adds the sync-cluster-config command which is designed to replace the bash scripts in the helm chart's post-__ hooks.

Rather than re-inventing the wheel and loading up the admin API configuration, we re-use RPK's config loading as the redpanda.yaml and environment variables are known to be valid.

@chrisseto chrisseto force-pushed the chris/p/sync-cluster-config branch from d256041 to 4dff2eb Compare September 13, 2024 20:10
Base automatically changed from chris/p/add-envsubst to main September 13, 2024 20:23

// NB: It's unclear why this endpoint is being hit. It's preserved
// as a historical artifact.
if err := client.RestartService(ctx, "schema-registry"); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember why this restart is needed. The PR which introduced this change have 0 context. I can only guess now that cluster which is upgraded to from no SASL to cluster with SASL enabled might have problem with Schema Registry. It would be good to create follow up to create test case for such upgrade and remove this schema registry restart.

redpanda-data/helm-charts#846

https://redpandadata.atlassian.net/wiki/spaces/CORE/pages/326500353/Schema+Registry+Restart

redpanda-data/redpanda#9631

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll file an issue to circle back on this one and leave it in place for now unless you have any major objections?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objection I tried to provide context to the best of my ability.

@RafalKorepta
Copy link
Contributor

https://buildkite.com/redpanda/redpanda-operator/builds/2603#_

Test containers was waiting for pulling Redpanda container and apparently it failed :(


2024/09/16 17:55:12 ⏳ Waiting for container id 94e70c6744c0 image: docker.redpanda.com/redpandadata/redpanda:v24.2.4. Waiting for: &{timeout:<nil> deadline:<nil> Strategies:[0xc0002468d0 0xc000246930 0xc000246990]}
--
  | 2024/09/16 17:56:12 container logs (wait until ready: get state: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.44/containers/94e70c6744c034b922f43c35d1b0a9a5ae54b0cc91b0cb7d6422d6f2b3b01c19/json": context deadline exceeded):
  |  
  | sync_test.go:47:
  | Error Trace:	/work/src/go/k8s/cmd/syncclusterconfig/sync_test.go:47
  | Error:      	Received unexpected error:
  | start container: started hook: wait until ready: get state: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.44/containers/94e70c6744c034b922f43c35d1b0a9a5ae54b0cc91b0cb7d6422d6f2b3b01c19/json": context deadline exceeded
  | Test:       	TestSync
  | FAIL: TestSync (72.72s)


},
}

cmd.Flags().StringVar(&bootstrapYAMLPath, "bootstrap-yaml", "", "")
Copy link
Contributor

@RafalKorepta RafalKorepta Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker.

As far as I know the default place where Redpanda is looking for .bootstrap.yaml file is /etc/redpanda/ path.

Comment on lines 35 to 36
github.com/testcontainers/testcontainers-go v0.33.0
github.com/testcontainers/testcontainers-go/modules/redpanda v0.33.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something broke with go dependency update. #231 (comment)

Could you downgrade some of the dependency to see if that helps?

@RafalKorepta
Copy link
Contributor

I wonder if CI have some problems and my hack would work?

if os.Getenv("CI") == "true" {
err := os.Setenv("TESTCONTAINERS_RYUK_DISABLED", "true")
require.NoError(t, err)
}

Copy link
Contributor

@RafalKorepta RafalKorepta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'm not sure why CI is failing. Maybe new version of testcontainer introduced a bug.

@chrisseto chrisseto force-pushed the chris/p/sync-cluster-config branch from f3424df to 0d4c9a0 Compare September 16, 2024 21:02
@chrisseto chrisseto force-pushed the chris/p/sync-cluster-config branch 2 times, most recently from 388e2b4 to ca8f15e Compare September 16, 2024 21:20
@chrisseto chrisseto mentioned this pull request Sep 16, 2024
Copy link
Contributor

@andrewstucki andrewstucki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally like the movement to go code, though it makes me wonder why this hook exists in the first place.

Additionally, thoughts about the library upgrades.

Comment on lines 45 to 53
k8s.io/api v0.30.3
k8s.io/apiextensions-apiserver v0.30.1
k8s.io/apimachinery v0.30.3
k8s.io/client-go v0.30.3
k8s.io/component-helpers v0.29.5
k8s.io/kubectl v0.29.5
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8
pgregory.net/rapid v1.1.0
sigs.k8s.io/controller-runtime v0.17.2
sigs.k8s.io/controller-runtime v0.18.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure on all of these changes, but I know that a bunch of Kubernetes stuff along with controller-runtime are highly sensitive to minor version changes in the underlying libraries. Just looking above it looks like we also have kubectl and component-helpers at v0.29.5... I'm not sure if that'll cause odd problems or not.

I think that the helm-charts repo also has a controller-runtime dependency? Might version mismatches there cause problems here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've yet to experience any issues from version mismatches that don't appear at either compile time or very early on in startup / init() code. Have you seen runtime issues before?

I think that the helm-charts repo also has a controller-runtime dependency? Might version mismatches there cause problems here?

It does but it uses very little of the API(s) exposed. I'm fairly certain that it's just the client.Client and some configuration helpers. I don't imagine any issues occurring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced all the k8s.io/* deps with the exception of k8s.io/utils which doesn't really have any published versions.

logger.Info(fmt.Sprintf("$%s not set. Skipping license setting...", licenseEnvvar))
} else {
logger.Info(fmt.Sprintf("$%s set. Setting license...", licenseEnvvar))
if err := client.SetLicense(cmd.Context(), strings.NewReader(license)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random question, but do we know why we're setting the license like this? Is it because wherever the license is being dumped doesn't get re-read by the cluster nodes? Would it make sense to put something like this in the config watching sidecar instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the license has to be set like this. I don't believe there's anywhere it actually gets read from.

// NB: It's unclear why this endpoint is being hit. It's preserved
// as a historical artifact.
// See also: https://github.com/redpanda-data/redpanda-operator/issues/232
if err := client.RestartService(ctx, "schema-registry"); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thought with this -- if it's something that just fires after a configuration component changes and is meant to avoid an issue where we aren't re-reading configuration off disk, can't we just add this to the config watcher?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's suspected that this is due to a cluster level configuration change which isn't something that's read from disk. I guess technically we could watch the bootstrap file but that's only read once at cluster initialization time.

@chrisseto chrisseto force-pushed the chris/p/sync-cluster-config branch from ca8f15e to b91bacd Compare September 17, 2024 19:20
This commit adds the `sync-cluster-config` command which is designed to replace
the bash scripts in the helm chart's post-__ hooks.

Rather than re-inventing the wheel and loading up the admin API configuration,
we re-use RPK's config loading as the redpanda.yaml and environment variables
are known to be valid.
@chrisseto chrisseto force-pushed the chris/p/sync-cluster-config branch from b91bacd to 187a795 Compare September 17, 2024 19:21
@chrisseto chrisseto enabled auto-merge (rebase) September 17, 2024 19:22
@chrisseto chrisseto merged commit 967e6b3 into main Sep 17, 2024
4 of 5 checks passed
@RafalKorepta RafalKorepta deleted the chris/p/sync-cluster-config branch December 2, 2024 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants