Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use recording rule metric for cluster cpu utilization #745

Merged
merged 2 commits into from
Feb 23, 2022

Conversation

ravilr
Copy link
Contributor

@ravilr ravilr commented Feb 15, 2022

  • Updates cluster resources dashboards to use precomputed cluster:node_cpu:ratio_rate5m metric. the current query is slower to evaluate on clusters having many nodes with multi-core cpus ( cores >= 64 ).

cc @povilasv @paulfantom

@@ -140,7 +140,7 @@ local template = grafana.template;
})
.addPanel(
g.panel('CPU Utilisation') +
g.statPanel('1 - sum(avg by (mode) (rate(node_cpu_seconds_total{%(nodeExporterSelector)s, mode=~"idle|iowait|steal", %(clusterLabel)s="$cluster"}[%(grafanaIntervalVar)s])))' % $._config)
g.statPanel('cluster:node_cpu:ratio{%(clusterLabel)s="$cluster"}' % $._config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a good pattern to refer a recording rule(cluster:node_cpu:ratio ) which is not defined in the dependency of kubernetes-mixin. kube-prometheus dependes on k8s-mixin, not the otherway!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed. This mixin doesn't depend on node exporter as far as I remember. :)

Copy link
Contributor Author

@ravilr ravilr Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Is moving those recording rules from kube-prometheus to kubernetes-mixin an option ? who can make that call here ?

there are node-exporter related rules here at https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/rules/node.libsonnet

Can we move some rules from https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/components/mixin/rules/node-rules.libsonnet to above ? Or copying those rules here with different metric names works ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixin itself depends on the existence of node_exporter data for some queries (hence nodeExporterSelector option in config.jsonnet). So I would say it is fine to move the recording rule which speeds things up from kube-prometheus to this project. One caveat though, those recording rules need to be used by alerts or dashboards shipped by kubernetes-mixin project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok makes sense then. Agreed @paulfantom.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright. Updated to include the recording rule here. Also, renamed the metric name to be different from the one in kube-prometheus. PTAL.

@metalmatze
Copy link
Member

Looks great!
We are able to have recording rules with the same name (and hopefully same underlying metric) and then exclude one of them on kube-prometheus' side. So we don't have to necessarily rename to avoid conflicts.

@ravilr
Copy link
Contributor Author

ravilr commented Feb 22, 2022

We are able to have recording rules with the same name (and hopefully same underlying metric) and then exclude one of them on kube-prometheus' side. So we don't have to necessarily rename to avoid conflicts.

the recording rule in kube-prometheus was already broken(fixed recently at prometheus-operator/kube-prometheus#1628).
I think we could just remove those rules from kube-prometheus, once this is merged.

Copy link
Member

@metalmatze metalmatze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the great work!

@metalmatze metalmatze merged commit 177bc8e into kubernetes-monitoring:master Feb 23, 2022
@ravilr ravilr deleted the cluster_cpu_util branch February 23, 2022 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants