Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow customizing admin url when using k8s service #1874

Merged
merged 5 commits into from
Mar 6, 2025

Conversation

diurnalist
Copy link
Contributor

this is a no-op by default, but the fqdn can help when other network policy agents
are in the mix, which may have rules set up for svc.cluster.local to
enable traffic passing. for example, in an environment with istio and opa in
the mix, there might be policies and/or assumptions about what form the service
address takes. we already saw this with openshift's certificate signing; this just
opens a door to other exceptions in the future.

@diurnalist
Copy link
Contributor Author

for context, i ran in to this issue on our local deployments of the operator. i would prefer not having to be forced to use an Ingress in this case, and honestly am not sure how our internal network security and infra would support that. having the ability to override the AdminUrl to the k8s service fqdn is very helpful. we currently seem to cater to Openshift's own demands on this host 😅

@theSuess
Copy link
Member

Thanks for the PR! I think this is a valid usecase but we'll discuss it in the next weekly on monday before merging

@diurnalist
Copy link
Contributor Author

@theSuess sounds good, thanks! i admit i don't really love this implementation, i dislike adding very specific flags like this. when i first did this patch in our fork, it was from an older version, before the OpenShift-specific adaptation was added. originally i just renamed it from <name>.<namespace> to <name>.<namespace>.svc.cluster.local, figured that was backwards-compatible and a proper FQDN made sense for other things too. the fact that openshift doesn't use the cluster.local TLD part of the host is a bit frustrating!

@diurnalist
Copy link
Contributor Author

actually, i wonder if it can just be svc.cluster.local - it looks like the .svc suffix was maybe added b/c we started requiring TLS on internal services, which wasn't intentional, and was fixed in #1690?

@weisdd
Copy link
Collaborator

weisdd commented Mar 2, 2025

@diurnalist .svc suffix was intentionally added to accommodate automatic OpenShift certs in #1713

To me, the implementation that relies on templating seems to be overly complex. - I think the only variation that users are likely to see lies in cluster domain (= suffix), so a simple string join is likely to be sufficient unless you can think of a more complex scenario.

@diurnalist
Copy link
Contributor Author

@weisdd that's fair, for our use-case, just specifying the suffix would do the job. i was concerned there might be other weird cases after seeing this was changed for OpenShift--but i was probably over-thinking :)

@theSuess
Copy link
Member

theSuess commented Mar 3, 2025

We discussed this today and came to the conclusion that adding a suffix field is the best way forwards because of the ease of use and maintenance simplicity. Happy to review any work in that direction

@diurnalist
Copy link
Contributor Author

@theSuess sounds good, will rework this today when I have a moment.

this is a no-op by default, but the fqdn can help when other network policy agents
are in the mix, which may have rules set up for svc.cluster.local to
enable traffic passing. for example, in an environment with istio and opa in
the mix, there might be policies and/or assumptions about what form the service
address takes. we already saw this with openshift's certificate signing; this just
opens a door to other exceptions in the future.
@diurnalist
Copy link
Contributor Author

i have updated to use the suffix/domain approach. there isn't a great avenue for testing this kind of nested functionality in the operator right now as far as i can tell -- extracting the logic into a function that was tested w/ a whitebox test (like the existing test for the service reconciler) doesn't imo add much value b/c it's essentially a string concat test. if folks have an idea about how better to test the value of the status field, i'm happy to add something :)

@diurnalist
Copy link
Contributor Author

diurnalist commented Mar 4, 2025

the more i pull at this, the weirder it seems that openshift wouldn't support .cluster.local. the original change added .svc to the admin URL, i.e., it wasn't explicitly taking away the .cluster.local domain part. i wonder if openshift's certificates include the .cluster.local FQDN address as a SAN on the cert. do we have a way to check this? i don't have any openshift deployment at-hand.

if openshift supports the FQDN, the simplest fix is to just use the FQDN for this in all cases.

@weisdd
Copy link
Collaborator

weisdd commented Mar 5, 2025

@diurnalist Before.svc was added in #1713, we had other PRs where we stripped .svc.cluster.local (#791 for v4, #962 for v5) due to user complains coming from users who relied on custom domains. We cannot reintroduce .cluster.local as it would break DNS resolution for some setups.
.svc, on the other hand, was a backwards-compatible change as .svc is present everywhere, so we didn't mind having it to simplify life of OpenShift users (personally, I haven't seen OpenShift in ages, but some other maintainers work with it all the time).

clusterDomain is how this setting is generally referred to in kubernetes;
aligning on that terminology for consistency.
@diurnalist
Copy link
Contributor Author

diurnalist commented Mar 5, 2025

@weisdd aha, thanks for that info. i didn't actually realize this was even configurable in kubernetes :)

so my current understanding is that we are using a relative/partial domain name so as to be agnostic to clusterDomain being set on the cluster (and i guess also b/c it helps OpenShift just work) 👍 -- i've revised the feature to align more on that terminology. we default to an empty string, which means to not address using any clusterDomain, and it can be overridden to a custom domain, or to the k8s default domain of "cluster.local" if needed (as in my use-case.)

@diurnalist diurnalist requested a review from weisdd March 5, 2025 19:42
@weisdd
Copy link
Collaborator

weisdd commented Mar 6, 2025

@diurnalist I think it looks great now, thanks for making the final adjustments :)

@weisdd weisdd added this pull request to the merge queue Mar 6, 2025
Merged via the queue into grafana:master with commit 47f993d Mar 6, 2025
15 checks passed
@diurnalist diurnalist deleted the b/adminurl-fqdn branch March 6, 2025 21:21
@weisdd weisdd added the feature this PR introduces a new feature label Mar 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature this PR introduces a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants