[2.0] InfluxDB notification rule doesn't send Slack message on status change #17809

Closed
mildebrandt opened this issue Apr 21, 2020 · 14 comments

@mildebrandt

mildebrandt commented Apr 21, 2020

Steps to reproduce:

  1. Create a check, slack notification endpoint, and notification rule using the "changes from" condition.

Expected behavior:
A Slack notification is sent when the state changes.

Actual behavior:
No notification happens.

Environment info:
Docker container of InfluxDB 2.0.0 beta 8

I don't know if it's only Slack, but that's what we're testing with. When the rule is changed to the "is equal to" condition, we get the Slack message successfully, so the endpoint and the check work correctly.

When I look at the history of the notification rule, the table shows that the state change to "crit" was seen and that a notification was sent. But we don't get any notification in Slack.

@desa
Contributor

desa commented Apr 22, 2020

@mildebrandt thanks for opening this issue. We have a fix that is making its way into the code base. As a temporary workaround, everything should work as expected if you don't use "changes from". Can you confirm that switching to "is equal to" behaves as expected?

@mildebrandt
Author

Thanks for the update, looking forward to that!

Yes, "is equal to" works, but of course it sends multiple messages until the condition is resolved.

Another related thing we noticed: we have a check that's scheduled every 2 minutes and a notification that's scheduled every 10 minutes. We get 5 notifications every time the notification fires (I'm assuming one for each of the 5 times the check ran). Does that sound right to you, or should we open another issue?

@desa
Contributor

desa commented Apr 22, 2020

@mildebrandt how many series are in the check? You can find out by running:

from(bucket: "_monitoring")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "statuses")
  |> filter(fn: (r) => r["_check_id"] == "<id of my check>")
  |> limit(n: 1)
  |> group()
  |> count()

@mildebrandt
Author

mildebrandt commented Apr 22, 2020

The check was recently changed to run every 1 minute, and the notification was changed to run every 2 minutes. When I run your query for that check, the result is 2.

@ajkerrigan

I'm happy to see this issue and response :). For the record I hit the same behavior for HTTP notifications (and assumed I was doing something wrong). I'm guessing the fix that's in the works is independent of destination, just figured this was worth noting in case any other non-Slack folks come across this. Thanks!

@samhld
Contributor

samhld commented Apr 23, 2020

@mildebrandt clarification question: when you get 5 (or 2, in the subsequent case) notifications, are the Checks evaluating to a state that should be triggering notifications? Or is there a notification for each of the Check windows in the Rule interval no matter what the Check evaluates to?

@mildebrandt
Author

I think I understand what you're asking. Let's see if this helps:

Check every 2 minutes. Notification on Crit every 10 minutes.

t0 -> OK (Check sees OK, notification sees OK, doesn't do anything)
t6 -> OK (Check sees OK)
t7 -> Crit
t8 -> Crit (Check sees Crit)
t9 -> Crit
t10 -> Crit (Check sees Crit, Notification sends two Slack messages, one for each Crit check)

So, I'd get two Slack notifications, one for each check that saw the Crit status. I would like just one.

I don't know what happens if the status goes back to OK at t10, whether I'd get one Slack notification or none. Personally, I'd expect no Slack notifications at that point, but I could see someone else wanting to know whether the check reached the Crit level within the notification time window.
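To make the timeline above concrete, here is a minimal Python sketch of the two rule conditions. It is an illustration only, not InfluxDB's implementation: the function names, the `(minute, level)` status tuples, and the half-open window `(start, end]` are all assumptions made for this example.

```python
from typing import List, Tuple

Status = Tuple[int, str]  # (minute the check ran, level it recorded); hypothetical shape


def equal_to_notifications(
    statuses: List[Status], start: int, end: int, level: str = "crit"
) -> List[Status]:
    """Assumed "is equal to" behavior: one notification per status in
    (start, end] whose level matches."""
    return [(t, lvl) for t, lvl in statuses if start < t <= end and lvl == level]


def change_notifications(
    statuses: List[Status], start: int, end: int, to_level: str = "crit"
) -> List[Status]:
    """Assumed "changes from" behavior: one notification per transition
    into to_level, where the new status falls in (start, end]."""
    fired = []
    previous = None
    for t, lvl in sorted(statuses):
        changed = previous is not None and previous != lvl
        if changed and lvl == to_level and start < t <= end:
            fired.append((t, lvl))
        previous = lvl
    return fired


# Check runs every 2 minutes; the underlying state turns Crit at t7,
# so the check first records Crit at t8.
check_runs = [(0, "ok"), (2, "ok"), (4, "ok"), (6, "ok"), (8, "crit"), (10, "crit")]

# Rule runs at t10 and looks back over its 10-minute interval.
print(len(equal_to_notifications(check_runs, 0, 10)))  # 2
print(len(change_notifications(check_runs, 0, 10)))    # 1
```

Under these assumptions, "is equal to" yields one message per Crit status in the window (two here), while a state-change condition yields a single message for the OK to Crit transition, which is the behavior I'm after.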

Hopefully that helps.

@samhld
Contributor

samhld commented Apr 23, 2020

Ok, thanks @mildebrandt !

@mildebrandt
Author

I found out what happens if t10 is OK: the Crit notifications still get sent. Here are the checks:
[Image: Screen Shot 2020-04-23 at 4 24 43 PM]

And the notifications:
[Image: Screen Shot 2020-04-23 at 4 22 43 PM]

I was sent one notification at 23:10 and six notifications at 23:20.

@desa
Contributor

desa commented May 4, 2020

I believe we have recently fixed this issue in Cloud 2, and the fix should be available in the next InfluxDB OSS release. Specifically, the state-change functionality should now work.

@mildebrandt
Author

That's great, I'll watch for the next release. Thanks!

@ajkerrigan

Just writing in to note that my state change notifications did indeed start working after that last release. Yay and thank you!

@mildebrandt
Author

Sorry, I forgot to come back. This works for me now as well. Thanks!

@amit12cool

I have the check applied and the notification rule set from ANY to ANY, but I still don't get any notification in the Slack channel.
In Alert History I see the LEVEL changing from OK to WARN and vice versa. That should trigger the notification according to the notification rule, but it doesn't.

My check is scheduled every 5m with an offset of 2s.
My notification rule is scheduled every 5m with an offset of 5s.

@mildebrandt any idea how this can be solved? I see you got this resolved.

My InfluxDB versions are:

InfluxDB v2.4.0
Server: de247ba
Frontend: a2bd1f3
