Commit 07b0830

docs: full notification policy example
1 parent 8f79bd6 commit 07b0830

14 files changed, +467 -26 lines changed

docs/docs/alerting.md

-26
This file was deleted.

docs/docs/alerting/_index.md

+76
@@ -0,0 +1,76 @@
---
title: Alerting
weight: 13
---
{{% pageinfo color="primary" %}}
Alerting resources require Grafana version 9.5 or higher.
{{% /pageinfo %}}

The Grafana Operator currently only supports _Grafana Managed Alerts_.

For data source managed alerts, refer to the documentation and tooling available for the respective data source.
{{% alert title="Note" color="primary" %}}
When using Mimir/Prometheus, you can use the [`mimir.rules.kubernetes`](https://grafana.com/docs/alloy/latest/reference/components/mimir/mimir.rules.kubernetes/) component of [Grafana Alloy](https://grafana.com/docs/alloy/latest/) to deploy rules as Kubernetes resources.
{{% /alert %}}


## Full example

The following resources construct the flow outlined in the [Grafana notification documentation](https://grafana.com/docs/grafana/latest/alerting/fundamentals/notifications/).

They create:
1. Three alert rules across two different groups
2. Two contact points for two different teams
3. A notification policy to route alerts to the correct team

{{< figure src="notification-routing.png" title="Flowchart of alerts routed through this system" width="500" >}}

{{% alert title="Note" color="primary" %}}
If you want to try this for yourself, you can [get started with demo data in Grafana Cloud](https://grafana.com/docs/grafana-cloud/get-started/#install-demo-data-sources-and-dashboards).
The examples below use these demo data sources to give you real data to alert on.
{{% /alert %}}

### Alert rule groups

The first resources in this flow are _Alert Rule Groups_.
An alert rule group can contain multiple alert rules.
All rules in a group are evaluated on the same interval, and the group is stored in a Grafana folder, alongside dashboards.

First, create the folder:

{{< readfile file="../examples/notifications-full/folder.yaml" code="true" lang="yaml" >}}

The first alert rule group is responsible for alerting on well-known Kubernetes issues:

{{< readfile file="../examples/notifications-full/kubernetes-alert-rules.yaml" code="true" lang="yaml" >}}

The second alert rule group is responsible for alerting on security issues:

{{< readfile file="../examples/notifications-full/security-alert-rules.yaml" code="true" lang="yaml" >}}

After applying the resources, you can see the created rule groups in the _Alert rules_ overview page:

![Alert rules overview page](./overview-page.png)

### Contact Points

Before you can route alerts to the correct receivers, you need to define how these alerts should be delivered.
[Contact points](./contact-points) define how a notification is delivered and through which provider.

Since the two teams are notified at different email addresses, two contact points are required.

{{< readfile file="../examples/notifications-full/contact-points.yaml" code="true" lang="yaml" >}}

### Notification Policy

Now that all parts are in place, the only missing component is the notification policy.
The instance's notification policy routes alerts to contact points based on labels.
Because the notification policy is a global object, a Grafana instance can have only one applied at a time.

The following notification policy routes alerts based on the `team` label and additionally sets the repeat interval for high-severity alerts belonging to the operations team:

{{< readfile file="../examples/notifications-full/notification-policy.yaml" code="true" lang="yaml" >}}
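
The contents of the referenced file are not reproduced in this section. As a rough sketch of the routing just described, a policy along the following lines could be used. The resource name, the default receiver `grafana-default-email` (assumed to already exist in the instance), and the `repeat_interval` value are illustrative assumptions; the team receivers reuse the contact point names from the previous step.

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaNotificationPolicy
metadata:
  name: team-routing-sketch # hypothetical name
spec:
  instanceSelector:
    matchLabels:
      instance: my-grafana-stack
  route:
    # Fallback for alerts that match none of the child routes (assumed contact point)
    receiver: grafana-default-email
    group_by:
      - grafana_folder
      - alertname
    routes:
      # Alerts labeled team=operations go to the operations contact point
      - receiver: operations-team
        object_matchers:
          - - team
            - =
            - operations
        routes:
          # Nested route: only high-severity operations alerts get this repeat interval
          - receiver: operations-team
            object_matchers:
              - - severity
                - =
                - high
            repeat_interval: 5m # illustrative value
      # Alerts labeled team=security go to the security contact point
      - receiver: security-team
        object_matchers:
          - - team
            - =
            - security
```

Nesting the severity matcher under the operations route keeps the shorter repeat interval scoped to that team, so security alerts continue to use the defaults of the root route.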

After applying the resource, Grafana shows the following notification policy tree:

![Notification policy tree after applying the resource](./notification-policy-tree.png)
+12
@@ -0,0 +1,12 @@
---
title: Alert Rule Groups
---

Alert Rule Groups contain a list of alert rules that are evaluated at the same interval.
Every rule group must belong to a folder and contain at least one rule.

The easiest way to get the YAML specification for an alert rule is to use the [modify export feature](https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/export-alerting-resources/), introduced in Grafana 10.

The following snippet shows an example alert rule group with a single alert rule that fires when the temperature is below zero degrees.

{{< readfile file="../examples/alertrulegroups/resources.yaml" code="true" lang="yaml" >}}

docs/docs/alerting/contact-points.md

+18
@@ -0,0 +1,18 @@
---
title: Contact Points
---

Contact points contain the configuration for sending alert notifications. You can assign a contact point either in the alert rule or notification policy options.
For a complete explanation of contact points, refer to the [upstream Grafana documentation](https://grafana.com/docs/grafana/latest/alerting/fundamentals/notifications/contact-points/).

{{% alert title="Note" color="secondary" %}}
The Grafana Operator currently only supports a single receiver per contact point definition.
As a workaround, you can create multiple contact points with the same `spec.name` value.
Follow issue [#1529](https://github.com/grafana/grafana-operator/issues/1529) for further updates on this topic.
{{% /alert %}}
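
To illustrate the workaround, the following sketch defines two `GrafanaContactPoint` resources that share one `spec.name` but each carry a different receiver. The resource names, the instance label, the email address, and the webhook URL are placeholders chosen for this example; how Grafana presents the resulting contact point may depend on your operator and Grafana versions, so treat this as a starting point rather than a reference.

```yaml
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaContactPoint
metadata:
  name: operations-team-email # Kubernetes resource names must be unique
spec:
  name: operations-team # shared contact point name in Grafana
  type: "email"
  instanceSelector:
    matchLabels:
      instance: my-grafana-stack
  settings:
    addresses: 'operations@example.com'
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaContactPoint
metadata:
  name: operations-team-webhook
spec:
  name: operations-team # same value as above, so both receivers belong together
  type: "webhook"
  instanceSelector:
    matchLabels:
      instance: my-grafana-stack
  settings:
    url: 'https://example.com/alert-hook' # placeholder URL
```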

The following snippet shows an example contact point which notifies a specific email address.
It also highlights how secrets and config maps can be used to externalize some of the configuration.
This is especially useful for contact points which contain sensitive information.

{{< readfile file="../examples/contactpoint_override/resources.yaml" code="true" lang="yaml" >}}
@@ -0,0 +1,15 @@
---
title: Notification Policies
---

Notification policies provide you with a flexible way of designing how to handle notifications and minimize alert noise.
For a complete explanation of notification policies, see the [upstream Grafana documentation](https://grafana.com/docs/grafana/latest/alerting/fundamentals/notifications/notification-policies/).

{{% alert title="Tip" color="secondary" %}}
If you already know which contact point an alert should be sent to, you can directly set the [`receivers`]({{% relref "/docs/api/#grafanaalertrulegroupspecrulesindexnotificationsettings" %}}) property on the alert rule.
{{% /alert %}}


The following snippet shows an example notification policy routing to the `operations` or `security` team based on the `team` label.

{{< readfile file="../examples/notification-policy/resources.yaml" code="true" lang="yaml" >}}
66.6 KB
227 KB

docs/docs/alerting/overview-page.png

54.9 KB
@@ -0,0 +1,24 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaNotificationPolicy
metadata:
  name: grafananotificationpolicy-sample
spec:
  instanceSelector:
    matchLabels:
      dashboards: "grafana"
  route:
    receiver: grafana-email-default
    group_by:
      - grafana_folder
      - alertname
    routes:
      - receiver: grafana-email-operations
        object_matchers:
          - - team
            - =
            - operations
      - receiver: grafana-email-security
        object_matchers:
          - - team
            - =
            - security
@@ -0,0 +1,26 @@
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaContactPoint
metadata:
  name: operations-team
spec:
  name: operations-team
  type: "email"
  instanceSelector:
    matchLabels:
      instance: my-grafana-stack
  settings:
    addresses: 'operations@example.com'
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaContactPoint
metadata:
  name: security-team
spec:
  name: security-team
  type: "email"
  instanceSelector:
    matchLabels:
      instance: my-grafana-stack
  settings:
    addresses: 'security@example.com'
+8
@@ -0,0 +1,8 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaFolder
metadata:
  name: alerts-demo
spec:
  instanceSelector:
    matchLabels:
      instance: "my-grafana-stack"
@@ -0,0 +1,167 @@
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaAlertRuleGroup
metadata:
  name: kubernetes-alert-rules
spec:
  folderRef: alerts-demo
  instanceSelector:
    matchLabels:
      instance: "my-grafana-stack"
  interval: 15m
  rules:
    - uid: be1q3344udslcf
      title: Pod stuck in CrashLoop
      condition: C
      for: 0s
      data:
        - refId: A
          relativeTimeRange:
            from: 600
            to: 0
          datasourceUid: grafanacloud-demoinfra-prom
          model:
            datasource:
              type: prometheus
              uid: grafanacloud-demoinfra-prom
            editorMode: code
            expr: max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", job!=""}[5m])
            instant: true
            intervalMs: 1000
            legendFormat: __auto
            maxDataPoints: 43200
            range: false
            refId: A
        - refId: B
          datasourceUid: __expr__
          model:
            conditions:
              - evaluator:
                  params: []
                  type: gt
                operator:
                  type: and
                query:
                  params:
                    - B
                reducer:
                  params: []
                  type: last
                type: query
            datasource:
              type: __expr__
              uid: __expr__
            expression: A
            intervalMs: 1000
            maxDataPoints: 43200
            reducer: last
            refId: B
            type: reduce
        - refId: C
          datasourceUid: __expr__
          model:
            conditions:
              - evaluator:
                  params:
                    - 0
                  type: gt
                operator:
                  type: and
                query:
                  params:
                    - C
                reducer:
                  params: []
                  type: last
                type: query
            datasource:
              type: __expr__
              uid: __expr__
            expression: B
            intervalMs: 1000
            maxDataPoints: 43200
            refId: C
            type: threshold
      noDataState: OK
      execErrState: Error
      labels:
        team: operations
      isPaused: false
    - uid: de1q3hd5d5clce
      for: 0s
      title: Disk Usage - 80%
      condition: C
      data:
        - refId: A
          relativeTimeRange:
            from: 600
            to: 0
          datasourceUid: grafanacloud-demoinfra-prom
          model:
            datasource:
              type: prometheus
              uid: grafanacloud-demoinfra-prom
            editorMode: code
            expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
            instant: true
            intervalMs: 1000
            legendFormat: __auto
            maxDataPoints: 43200
            range: false
            refId: A
        - refId: B
          datasourceUid: __expr__
          model:
            conditions:
              - evaluator:
                  params: []
                  type: gt
                operator:
                  type: and
                query:
                  params:
                    - B
                reducer:
                  params: []
                  type: last
                type: query
            datasource:
              type: __expr__
              uid: __expr__
            expression: A
            intervalMs: 1000
            maxDataPoints: 43200
            reducer: last
            refId: B
            type: reduce
        - refId: C
          datasourceUid: __expr__
          model:
            conditions:
              - evaluator:
                  params:
                    - 0.2
                  type: lt
                operator:
                  type: and
                query:
                  params:
                    - C
                reducer:
                  params: []
                  type: last
                type: query
            datasource:
              type: __expr__
              uid: __expr__
            expression: B
            intervalMs: 1000
            maxDataPoints: 43200
            refId: C
            type: threshold
      noDataState: NoData
      execErrState: Error
      labels:
        severity: high
        team: operations
      isPaused: false
