-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable metric reporting interval #284
Configurable metric reporting interval #284
Conversation
Hmm. If you send SIGUSR1 to fapolicyd, the results are in the file /run/fapolicyd/fapolicyd.state. |
Ah, 👀, thanks @stevegrubb.. thought that was only done at shutdown. Is there value in having the decision log interleaved with cache stats? This PR would allow logging that type of consolidated timeline. I'm also a little concerned with adding noise to the logs when using SIGUSR1, (1) when bumping the event loop to ensure stats are not stale (see fapolicyd-cli ~line 850) and (2) when reading the contents of the state file. The noise from both of these can be filtered out, though that creates a tricky case where log entries are interesting but match the filter, eg. profiling a profiler. Are there any technical issues with the implementation provided in this PR? Ill take a look at revising the cli logic, adding the new command doesn't seem right. |
4e060d3
to
e2b2801
Compare
"fapolicyd-cli --check-status" does the SIGUSR1 thing and then displays the resulting file. It does not dump the LRU's. That only happens at shutdown and if fapolicyd.conf has do_stat_report = 1, which is the default. When set it to 0, you get just the metrics on shutdown. But no matter the value, SIGUSR1 is a concise report. |
Thanks for clarifying that @stevegrubb. I may have been using the term So with that said, I'm not looking to use values from the Those metrics are also logged at The fallback if this PR cannot merge, is to poll with the SIGUSR1 signal, which is feasible. The downside of the fallback is the file access operations of (1) a contrived file read to bump the event loop (see fapolicyd-cli ~line 850), and (2) reading the metrics from disk; each of which adds irrelevant file accesses, ie. noise, to the access log. |
Background in fapolicy-analyzer issue |
I like the idea of making periodic metrics available. There are systems like statsd/prometheus/collectd/fluentd/etc that could help gain visibility into how the daemon (and by extention the system) is performing. By sending this to syslog it is not trivial to dig out the last occurrence. The intention of the SIGUSR1 interface was to have an on-demand way of seeing the daemon's internal metrics. The output is short and always the latest status. This would be ideal for collecting metrics. The fapolicyd-cli app looks at the pid file, checks what it points to, if it is fapolicyd, it sends a sigusr1. Fapolicyd receives the signal on the inbound/main thread. The report is run by the decision thread so that it is serialized with any updates to the metrics. If the queue is empty, it sleeps indefinitely on a conditional wait. In order to wake it up, the cli app accesses a file (the fapolicyd config file). This can skew the statistics for the access report. (This is only written when the daemon exits and has no effect on performance data.) I suppose we can change the file it accesses, but to which one? It might be possible to trigger the conditional wait from the signal handler or the main loop without needing a file access. This would let the report run immediately without skewing access statistics. We might need to do something in the decision thread to be OK if the queue is still empty. Another possibility would be to add a configuration item for periodic reports. What this would do is setup a periodic timerfd which could be polled. When the timer fires, run a new report. This would avoid the need to create an access which skews statistics. The serialization of the report still needs to be handled in the decision thread. This might still need the conditional wait work mentioned in the previous paragraph. Then plugins can setup a notification on the report changing to trigger a re-read/parsing. |
Yes, this sounds right. I made an attempt at sketching that out in this PR, it has sharp edges but is a working example. It needs some more isolated testing. I have a lot of events flowing on my system, so hard to say for sure if the
|
Thanks. I'll look over the changes in the coming week. |
Ok thanks. It seems to be working as expected, both when testing in isolation on a single directory and also when running it on my full system. |
@jw3 do you have anything to add? If not we could probably merge it. |
@radosroka I think it's ready to merge. |
Would you mind squashing the commits into one? |
c102878
to
8eaa07b
Compare
@radosroka it is squashed and rebased 👍 |
Provides recurring metric reports at a specified interval using a nonblocking timerfd and pthread_cond_timedwait. The interval is configured through the report_interval key in fapolicyd.conf. This value represents the length of interval in seconds between reports. If set to 0 reporting intervals are disabled, this is the default. Reports are written to the fapolicyd.state file. The output format and contents of this file are unchanged by this PR. This is the same file that was written at shutdown and upon receiving SIGUSR1. The signal handling function remains unchanged. Error handling is based around disabling interval reporting to preserve the decision thread. When an unrecoverable reporting error occurs the interval reports are disabled and the decision thread will use the non-interval report handler for the remainder of it's execution. The structuring of the reporting loop gives priority to the fan handling loop. When a reporting interval expires, the fan handler runs prior to writing the report to prioritize the fan handling and also to ensure the cache stats are as recent as possible. Also the reporting loop is only entered when no fan events have been recorded. This PR also restructure the initialization of the decision thread and reporting loop to ensure a fresh fapolicyd.state file when fapolicyd starts. The previous behavior was that the file from the last shutdown was still present until either a SIGUSR1 or shutdown.
rebased on @stevegrubb's latest commits Resolved a few minor conflicts in notify.c, nothing significant. |
5d58216
into
linux-application-whitelisting:main
Provides recurring metric reports at a specified interval.
Uses a nonblocking timerfd alongside pthread_cond_timedwait to create timed reporting intervals. The interval is configured through the
report_interval
key in fapolicyd.conf. This value represents the length of interval in seconds between reports. If set to0
reporting intervals are disabled, this is the default.Reports are written to the
fapolicyd.state
file. The output format and contents of this file are unchanged by this PR. This is the same file that was written at shutdown and upon receivingSIGUSR1
. The signal handling function remains unchanged.Error handling is based around disabling interval reporting to preserve the decision thread. When an unrecoverable reporting error occurs the interval reports are disabled and the decision thread will use the non-interval report handler for the remainder of it's execution.
The structuring of the reporting loop gives priority to the fan handling loop. When a reporting interval expires, the fan handler runs prior to writing the report to prioritize the fan handling and also to ensure the cache stats are as recent as possible. Also the reporting loop is only entered when no fan events have been recorded.
This PR also restructure the initialization of the decision thread and reporting loop to ensure a fresh
fapolicyd.state
file when fapolicyd starts. The previous behavior was that the file from the last shutdown was still present until either aSIGUSR1
or shutdown.