Our monitoring stack consists of multiple segmented Prometheus instances with retention periods ranging from 6 months to 1 year (depending on the metric type), in addition to Thanos in receiver mode.
The current behaviour of `promql/range_query` is to fetch data via the API from the defined Prometheus hosts and run the check for each alert, which is usually not desirable. Allowing users to define a `range_query` equal to the retention period might be fine for smaller installations, but when you have hundreds or thousands of alerts against metrics with varying levels of cardinality, it can lead to undesirable consequences (long evaluation times and resource starvation). Such cases can be mitigated with `query/cost`, but with the same high number of alerts the tradeoff is a long pipeline execution time, because each alert is executed against the designated Prometheus/Thanos endpoint. Being able to define rules the same way `rule/for` currently allows would be beneficial in cases where letting users set `range_query` equal to the Prometheus retention is not a desirable outcome.
TL;DR:
Allowing users to manually define a rule for `promql/range_query`, in the same way as `rule/for`, might be beneficial for larger installations.