systemd-watchdog: improve notifications to watchdog #595
systemd will now be notified on send timeouts and when event handling is running for a long time.
Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>
Hello @alexmohr, one thing I wonder about is that the watchdog is triggered in several places instead of from one central place, e.g. when the poll in dlt_daemon_handle_event() returns. Would that make more sense?
Hi @michael-methner, it's triggered from multiple places to prevent watchdog timeouts when we run into socket timeouts.
 * the watchdog interval is small and multiple timeouts occur back to back
 */
if (sd_notify(0, "WATCHDOG=1") < 0)
    dlt_vlog(LOG_WARNING, "%s: Could not reset systemd watchdog\n", __func__);
Is there a reason why dlt_daemon_trigger_systemd_watchdog_if_necessary() was not used here?
Because there is no timestamp to use as a reference time. We could call the function and always pass 0 as timestamp.
As you mentioned in the other comment, this patch improves the watchdog behavior, but it is far from perfect, especially in environments with congested networks between the clients and the daemon combined with a heavy logging load.
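The trade-off in this thread (rate-limiting watchdog notifications against a reference timestamp versus notifying unconditionally) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dlt-daemon implementation: `watchdog_trigger_if_due`, `watchdog_notify_fn`, `demo_notify`, and `demo_notify_calls` are hypothetical names, and the callback stands in for a real `sd_notify(0, "WATCHDOG=1")` call so the sketch builds without libsystemd.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback type. In a real daemon this would wrap
 * sd_notify(0, "WATCHDOG=1") from <systemd/sd-daemon.h>. */
typedef int (*watchdog_notify_fn)(void);

/* Stand-in for the real notification; it only counts its calls. */
static int demo_notify_calls = 0;

static int demo_notify(void)
{
    demo_notify_calls++;
    return 1;
}

/* Hypothetical helper: notify only if at least `interval` time units have
 * passed since *last_trigger. Returns 1 if a notification was sent,
 * 0 if it was skipped, and -1 if the notification failed. */
static int watchdog_trigger_if_due(uint64_t *last_trigger, uint64_t now,
                                   uint64_t interval, watchdog_notify_fn notify)
{
    if (now - *last_trigger < interval)
        return 0; /* Too early, skip the notification. */

    if (notify() < 0) {
        fprintf(stderr, "%s: Could not reset systemd watchdog\n", __func__);
        return -1;
    }

    *last_trigger = now; /* Remember when we last notified. */
    return 1;
}
```

In this sketch, a call site without a meaningful reference time can pass a `last_trigger` of 0, which makes the elapsed time look large and forces a notification. That roughly corresponds to the "always pass 0 as timestamp" option mentioned above, at the cost of losing the rate limiting.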
Got it. Thanks.
Thanks for the explanation. As the change is definitely an improvement, I will merge it. It still worries me that slow clients may cause watchdog timeouts, but I understand that this will be part of a bigger rework.
systemd will now be notified on send timeouts
and when event handling is running for a long time
this prevents dlt-daemon from being killed by
systemd watchdog when processing log messages
would take longer than the watchdog timeout
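For context on the watchdog timeout mentioned here: systemd exports the configured timeout to the service in the `WATCHDOG_USEC` environment variable (in microseconds), and the usual recommendation is to notify at about half that interval. The following is a minimal sketch of deriving such a ping interval. `watchdog_ping_interval_usec` is a hypothetical helper, not part of dlt-daemon or libsystemd; a real service would more likely call `sd_watchdog_enabled()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a WATCHDOG_USEC-style string and return half of
 * it in microseconds. Returns 0 if the string is NULL, empty, or not a plain
 * decimal number, meaning "no usable watchdog interval". */
static uint64_t watchdog_ping_interval_usec(const char *watchdog_usec)
{
    char *end = NULL;
    unsigned long long usec;

    if (watchdog_usec == NULL || *watchdog_usec == '\0')
        return 0;

    errno = 0;
    usec = strtoull(watchdog_usec, &end, 10);
    if (errno != 0 || *end != '\0')
        return 0; /* Overflow or trailing garbage. */

    return (uint64_t)(usec / 2); /* Notify at half the timeout. */
}
```

A caller would typically pass `getenv("WATCHDOG_USEC")` and disable watchdog notifications entirely when the result is 0.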
To test we need the following things
Execute the script below on your test system.
It spawns a lot of dlt-receive processes; the goal is for dlt-daemon to survive.
The program was tested solely for our own use cases, which might differ from yours.
Licensed under Mozilla Public License Version 2.0
Alexander Mohr, alexander.m.mohr@mercedes-benz.com, Mercedes-Benz Tech Innovation GmbH, imprint