[Security Solution] ML rule can miss anomaly documents if its interval/lookback is too short #158152

banderror · 2023-05-21T17:25:31Z

Summary

A user had an ML rule with interval = 15 mins and lookback time = 1 min. The rule was based on an ML job with a fixed interval of 15 mins and had anomaly_threshold = 90.

Although there were anomaly documents with record_score >= 90, the rule missed them and generated no alerts.

The user increased the rule interval to 22 mins, which fixed the issue: the rule started to generate alerts.

This feels like a bug in the ML rule type/executor. If the lookback time for a given ML rule depends on the corresponding job parameters and has to be higher than a certain value, our app should tell the user about that and/or set its value automatically.


elasticmachine · 2023-05-21T17:25:34Z

Pinging @elastic/security-detections-response (Team:Detections and Resp)

elasticmachine · 2023-05-21T17:25:35Z

Pinging @elastic/security-solution (Team: SecuritySolution)

yctercero · 2023-05-22T04:52:07Z

This seems like one we should either fix, or triage so that we know how to document it as a known issue for 8.9. Depending on how far back the issue goes, it may be worth adding it as a known issue for older releases too.

cc @peluja1012

rylnd · 2023-05-26T18:23:30Z

After some discussion with @marshallmain and the rest of the team, I think I can describe a generalization of this issue (or at least the hypothesis):

# 15m job, anomalies A, B
|-A--|B---|----|----|----|

# 15m interval rule, no lookback, offset from job
# Xs represent execution windows for executions 1, 2
--1----2----3----|----|
XXX
   XXXXX

In the above diagram, we can see the execution of both an ML job and an ML rule. In rule execution 1, the rule will be looking in the range of anomaly A; however, the ML job may still be processing that 15m bucket, and anomaly A may not yet exist or it might be an "interim" anomaly; in either case, it will not be alerted upon. Similarly, in execution 2, anomaly B may not be found for the same reasons, and anomaly A is now outside of execution 2's window. Neither of these anomalies will be found by the rule.

By increasing the rule lookback time (as was the fix in the referenced SDH), we ensure that subsequent rule execution will alert on a finalized anomaly from a previous bucket window.
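To make the hypothesis concrete, here is a minimal simulation sketch. It is not the rule executor's actual logic. It assumes that an anomaly is timestamped at the start of its bucket, that it only becomes alertable once finalized (here, one minute after the bucket closes), and that each rule execution searches the window [run_at - interval - lookback, run_at]. The names `Anomaly` and `detected`, the 3-minute offset, and all the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    name: str
    timestamp: int     # bucket start, in minutes (assumed anomaly timestamp)
    finalized_at: int  # minute at which the finalized record is searchable (assumption)


def detected(anomalies, interval, lookback, offset, executions):
    """Return the names of anomalies that any rule execution would alert on."""
    found = set()
    for k in range(executions):
        run_at = offset + k * interval
        window_start = run_at - interval - lookback
        for a in anomalies:
            in_window = window_start <= a.timestamp <= run_at
            # Interim or not-yet-written anomalies are assumed to be skipped.
            if in_window and a.finalized_at <= run_at:
                found.add(a.name)
    return found


BUCKET_SPAN = 15
# Anomaly A is in the first bucket and B in the second; each is finalized
# 1 minute after its bucket closes (hypothetical processing delay).
ANOMALIES = [
    Anomaly("A", timestamp=0, finalized_at=BUCKET_SPAN + 1),
    Anomaly("B", timestamp=BUCKET_SPAN, finalized_at=2 * BUCKET_SPAN + 1),
]

# Rule offset from the job by 3 minutes, as in the diagram.
no_lookback = detected(ANOMALIES, interval=15, lookback=0, offset=3, executions=4)
with_lookback = detected(ANOMALIES, interval=15, lookback=15, offset=3, executions=4)
```

Under these assumptions `no_lookback` is empty (both anomalies are missed), while `with_lookback` contains both A and B, because each later execution still covers the previous, now finalized, bucket.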

Assuming the above is correct, I would agree that we should validate that a rule's execution window (interval + lookback) is at least double the job's bucket_span, and issue a warning if that's not the case.
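A sketch of what that validation could look like, assuming all durations are already available in seconds. The function name `check_execution_window` and the warning text are hypothetical and do not come from any existing Kibana code.

```python
from typing import Optional


def check_execution_window(interval_s: int, lookback_s: int, bucket_span_s: int) -> Optional[str]:
    """Return a warning if interval + lookback is less than twice the job's bucket_span."""
    window_s = interval_s + lookback_s
    required_s = 2 * bucket_span_s
    if window_s >= required_s:
        return None
    return (
        f"Rule execution window ({window_s}s) is shorter than twice the job's "
        f"bucket_span ({required_s}s); finalized anomalies may be missed. "
        f"Consider increasing the lookback by at least {required_s - window_s}s."
    )


# The configuration from the original report: 15m interval, 1m lookback, 15m job.
warning = check_execution_window(interval_s=900, lookback_s=60, bucket_span_s=900)
```

For the reported configuration this returns a warning, since 960s is less than the required 1800s. With a 15m lookback it returns `None`.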

Another note: if the above is true, then #90316 arguably exacerbated this problem (assuming that interim anomaly scores are always <= finalized scores).

rylnd · 2023-06-02T19:27:40Z

I was discussing this with @yctercero and I think we've come up with a potential solution: if ML were to maintain an updated_at or finalized_at (name TBD) time field, that was updated at the same time as is_interim on anomaly documents, the detection rules could use that field to sort results and eliminate the "late arrival/finalization" problem described above.

@marshallmain does that make sense to you? @jgowdyelastic would that be a reasonable request/task for the ML team?

darnautov · 2023-06-05T09:40:46Z

Hi @rylnd, I reckon that first it's worth trying to adjust the lookback interval logic to the same approach we have in the Anomaly Detection rule type.

There, the lookback interval is set to double the bucket span plus the query delay.

You can read more about it in this blog post.

We also share the alerting service from the ML app, perhaps you could reuse it as well. Let me know if I can help you with this.
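For reference, the "double the bucket span plus the query delay" rule described above works out as follows. This is only a sketch of the arithmetic, not the ML alerting service's code: `parse_duration` is a hypothetical helper that handles a small subset of time units (s, m, h, d), and the 60s query delay in the example is an assumed value.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> int:
    """Convert a duration such as '15m' or '60s' to seconds (subset of units only)."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"Unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def recommended_lookback_seconds(bucket_span: str, query_delay: str) -> int:
    """Lookback = 2 * bucket_span + query_delay, all in seconds."""
    return 2 * parse_duration(bucket_span) + parse_duration(query_delay)


# A 15m bucket span with an assumed 60s query delay gives 2 * 900 + 60 seconds.
lookback_s = recommended_lookback_seconds("15m", "60s")
```

Here `lookback_s` is 1860 seconds (31 minutes), well above the 1-minute lookback from the original report.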

banderror assigned yctercero May 21, 2023

banderror added the sdh-linked label May 21, 2023

yctercero removed the triage_needed label May 22, 2023

yctercero removed their assignment May 31, 2023

peteharverson mentioned this issue Jun 7, 2023

[Lateral Movement Detection] Update package to add RDP based lateral movement detection elastic/integrations#6406

Closed

4 tasks

yctercero mentioned this issue Sep 6, 2023

[DE] - Detection Engine backlog overview #165878

Closed
