WIP: Rework metrics #715
Conversation
Force-pushed from 4770c26 to 1ba0565
A few comments. Overall looks good.
```go
OperatorSDKPrometheusMetricsPort = 60000

// OperatorSDKPrometheusMetricsPortName define the port name used in kubernetes deployment and service
OperatorSDKPrometheusMetricsPortName = "sdk-metrics"
```
Any reason to rename the port to `sdk-metrics`? I'm thinking `metrics` is simple and clear enough. The `sdk` prefix doesn't add much.
Yes, the reason being that port names need to be unique in the deployment files, and `metrics` is a very generic name, so if a user defines a new port of their own the `metrics` name would not make sense any more. Plus, in the next few PRs, once the controller-runtime metrics get into a release, we want to add a port for 8080 (where they expose their metrics) to the deployment and service resources, and we want to name it differently. Hope that makes sense?

Note: this just means the name of the port is called `sdk-metrics`, not the actual path.
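To illustrate the uniqueness constraint described above, here is a minimal, hypothetical Service fragment (the operator name and the 8080 pairing are assumptions for illustration, not this PR's actual manifest): port names must be unique within a Service's `spec.ports`, so using the prefixed `sdk-metrics` name keeps the generic `metrics` name free for a user-defined port.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-operator   # hypothetical operator name
spec:
  ports:
  - name: sdk-metrics      # SDK-served metrics; names must be unique per Service
    port: 60000
  - name: metrics          # still available for a user-defined metrics port
    port: 8080
```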
So we are going to serve two metrics endpoints, one for the SDK and one for the controller-runtime metrics? Could we just register the controller-runtime metrics as part of the SDK metrics and then serve them via a single endpoint?
Yes, for now. Currently there are no metrics in a controller-runtime release, so there are no metrics there yet. But see #715 (comment). My plan is to adjust controller-runtime to not use a global registry, and we can then do a follow-up PR to reuse that registry once that is merged. Hope that clears things up?
I might be wrong, but I think we don't need this. The Manager already creates a Server for our desired port to serve the controller metrics from the runtime's `metrics.Registry`. With the option `MetricsBindAddress` we can specify what port that should be:

```go
opts := manager.Options{MetricsBindAddress: ":" + strconv.Itoa(OperatorSDKPrometheusMetricsPort)}
mgr, err := manager.New(cfg, opts)
```

Of course this means users need to register their own custom operator metrics with `metrics.Registry`, meaning you can't have 2 registries. @lilic WDYT? And let me know if I'm off here, since you've already tested this out locally.
@hasbro17 I agree that we should reuse the registry from controller-runtime, and my plan is to do just that, but currently we cannot, as it uses a global registry, which means we can't replace that registry with our own. The port name change does not affect where the metrics are exposed.
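The registry-ownership problem being discussed can be sketched generically. This is a minimal illustration with made-up types, not controller-runtime or Prometheus code: a package-level global registry forces every caller to share one instance, while an injected registry can be replaced or kept separate by the caller.

```go
package main

import "fmt"

// Registry is a stand-in for a Prometheus registry.
type Registry struct{ names []string }

// MustRegister records a collector name on this registry instance.
func (r *Registry) MustRegister(name string) { r.names = append(r.names, name) }

// globalRegistry mirrors the package-level global pattern: every caller
// shares this one instance and cannot substitute their own.
var globalRegistry = &Registry{}

// RegisterGlobal forces all callers onto the shared global registry.
func RegisterGlobal(name string) { globalRegistry.MustRegister(name) }

// RegisterWith is the configurable alternative: the caller owns the
// registry and passes it in, so components can keep separate registries
// or deliberately share a single one.
func RegisterWith(r *Registry, name string) { r.MustRegister(name) }

func main() {
	RegisterGlobal("runtime_metric")
	RegisterGlobal("operator_metric") // forced to share the global instance

	mine := &Registry{}
	RegisterWith(mine, "operator_metric") // isolated and replaceable

	fmt.Println(len(globalRegistry.names), len(mine.names))
}
```

This is why a configurable registry in controller-runtime would let the SDK pass in (or reuse) a single registry instead of being forced onto the library's global one.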
This way InitOperatorService can accept custom values for the port and the port name.

Due to the changes to operator-sdk the old metrics were obsolete; instead we just want the Go and process specific collectors.

Due to possible naming conflicts, rename the port to make it clearer.
Force-pushed from 1ba0565 to da2abc0
Force-pushed from 47fcc7e to 613d23c
Nits and a question. I like the addition of the metrics package to the top level.
pkg/metrics/metrics.go
```go
)

// ExposeMetricsPort generate a Kubernetes Service to expose metrics port
func ExposeMetricsPort() (*v1.Service, error) {
```
This is such a nit, but what if we change this to `ExposePort()`? The call changes from `metrics.ExposeMetricsPort` to `metrics.ExposePort`, which I think is just as clear and has less redundancy.
Yes, I wasn't happy with that naming; not sure `ExposePort` is the way to go either. This function creates the service, exposes the port, and starts to listen and serve for the Prometheus registry. So maybe `metrics.SetupPrometheus()` would make more sense? WDYT?
What about `metrics.Start()` or `metrics.Expose()` or `metrics.Setup()`.
`metrics.Setup()` sounds good to me, will do. I only wanted to include Prometheus in the name to make it clearer that it's specific to it.
Force-pushed from 63fe9bd to 18e1fb1
Force-pushed from 18e1fb1 to 0f9e01b
Force-pushed from 0f9e01b to 5764f34
@shawn-hurley @hasbro17 @estroz Tests pass and I addressed all the comments. Please have another look, thanks!
@lilic I think we're going to receive some pushback on that. I personally think the controller-runtime having a global pkg level registry is fine.
We can already use the controller-runtime's global registry to serve those metrics on whatever endpoint we want. E.g. in your PR that would just be:

```go
import (
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)
...
mux.Handle(PrometheusMetricsPath, promhttp.HandlerFor(metrics.Registry, ...))
```

Any reason why we can't do the above? Or some other reason we need to change controller-runtime to not have a global registry?

As for the question of how to serve both the controller-runtime metrics and custom operator metrics, we have two options. However, in both cases I think we should keep the actual TCP port the same.

@lilic Unless I'm missing why we can't use the runtime's global registry, I think it might help if you could use the latest commit from controller-runtime in this PR to show what you intended to do in your follow-up PR. Also it seems like the default runtime registry will give us the process and Go metrics as well.
```go
// HandlerFuncs registers the handles the /healthz and /metrics, to be able to serve the prometheus registry.
func HandlerFuncs(mux *http.ServeMux, reg *prometheus.Registry) {
	mux.HandleFunc("/healthz", handlerHealthz)
```
Why do we need to register the `healthz` endpoint? Right now `handlerHealthz()` would just ensure that the operator is always healthy, which is the default even if we don't set this.

Liveness (`healthz`) and Readiness (`readyz`) endpoints/probes should be based on the operator logic, or possibly related to leader election.

Does Prometheus need to query the `/healthz` endpoint before scraping the actual metrics?
> Liveness (`healthz`) and Readiness (`readyz`) endpoints/probes should be based on the operator logic or possibly related to leader election.

Good point, will open an issue about the `healthz` endpoint if we don't have one already.
Should we rename this to `RegisterHandlerFuncs` for added clarity? `HandlerFuncs` made me think it would return handler functions.
This is already done; all the metrics would be served on the same endpoint.

Not sure I see why this is needed; can you explain why a custom endpoint would be needed?
Thanks @hasbro17, I missed that comment, kubernetes-sigs/controller-runtime#132 (comment), completely! As far as I understand, a global registry is something that needs to be avoided, and instead we should refactor so the controller-runtime registry is configurable. But I could be wrong; pinging @mxinden @brancz for a second opinion on the above. If they agree I will open an issue on controller-runtime, adjust the controller-runtime metrics registry, and get back to this PR after that.
@lilic yes, we should try to avoid global registries; more details in the upcoming Kubernetes enhancement proposal kubernetes/community#2909.
For what it's worth, we regret having introduced the global registry as a pattern, and in the 0.10 release of the Golang client library there will be breaking changes in this regard (as we don't want to break the world forever, there will still be a global registry, but in a different package). Overall, working with non-global registries gets you a lot of great functionality that is simply not possible with global ones (such as prefixed registries and label-curried registries).
Thanks! I opened an issue to discuss this in controller-runtime: kubernetes-sigs/controller-runtime#210. Depending on that outcome, will continue with this PR.
Just one nit. LGTM otherwise.
```diff
@@ -45,7 +45,7 @@ func GetOperatorName() (string, error) {
 }

 // InitOperatorService return the static service which expose operator metrics
-func InitOperatorService() (*v1.Service, error) {
+func InitOperatorService(port int32, portName string) (*v1.Service, error) {
```
Did you say something about adding an `OwnerReference` to the service? Is this the service, or am I getting that confused with something else?
As this PR has comments that are no longer relevant, since we went with using the global registry from controller-runtime, I decided to close it and open a new PR to avoid confusion. Closing in favor of #786.
Description of the change:
This PR:
pkg/metrics/...
Note: `NewOperatoSDKPrometheusRegistery` and `HandlerFuncs` are exposed because users who already have a registry, or who want to register additional things, can use those functions instead of `ExposeMetricsPort`, which does it all. Will add documentation for that and the rest in a separate PR.

Also note: the metrics have not yet landed in a controller-runtime release, so I will open a PR with the changes to add/expose the ports they use when that release happens. I have tested it locally and it should work with few changes.