Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DaemonClient: Refactor to include parsing of client response #5940

Merged
merged 2 commits into from
Mar 30, 2023

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Mar 22, 2023

Fixes #5934

Recently, the DaemonClient has been updated to provide all necessary
functionality to manage the daemon through the Python API just as it is
possible through verdi daemon. For example, the methods start_daemon
and stop_daemon were added. However, the interface of the client is
still lacking, particularly because the various methods returned a
dictionary from which the caller would still have to parse the actual
result from string values.

The call_client method is updated to detect various problems and
communicate them more clearly by raising an exception. Various
subclasses of DaemonException are raised for certain problems:

  • DaemonNotRunningException: If the daemon is not running
  • DaemonStalePidException: If the daemon could not be reached and the
    PID file appears to be stale.
  • DaemonTimeoutException: If the daemon connection timed out
  • DaemonException: For any other (unknown) reason

The functionality that checks for a stale PID file is moved from the
aiida.cmdline.utils.daemon module to the DaemonClient. This way it
is not just used through verdi daemon but also through the client.

The locations where the automatic cleaning of stale PID files has also
been changed. Summary of behavior changes:

  • verdi daemon status and verdi status no longer delete stale PID file
  • DaemonClient.start_daemon is now the only place that deletes them

The verdi status and verdi daemon status commands were calling the
function aiida.cmdline.utils.daemon.delete_stale_pid_file which would
check the daemon PID file and delete it if it was considered stale. This
was added to prevent users having to clean up manually if the daemon got
killed abruptly, leaving behind an orphaned PID file. However, this is a
side-effect that users wouldn't expect from a command that should simply
report the status of a service.

The aiida.cmdline.utils.daemon.delete_stale_pid_file function has been
removed and its functionality has been refactored and moved to the
DaemonClient._check_pid_file method. It raises if the PID file is
likely to be stale. This allows it to be wrapped by _is_pid_file_stale
which returns a boolean, and _clean_potentially_stale_pid_file which
replaces the old functionality of automatically deleting the PID file if
it is stale and logs a warning of having done so.

The daemon.timeout option default is changed from 20 to 5 seconds.
There is no reason why a functioning daemon wouldn't be able to respond
within 5 seconds, and forcing a user to wait 20 seconds in case of a
broken daemon is frustratingly long.

verdi process list: Simplify the daemon load implementation

The calculation of the daemon load at the end of the command was using a
utility function that relied manually parsed the response of the daemon
client. However, the DaemonClient.get_numprocesses now does this auto-
matically and aiida.cmdline.commands.cmd_daemon.execute_client_command
converts this into the correct formatting for stdout.

The CalculationQueryBuilder is replaced with a query using the
QueryBuilder directly as we only need the count and with the former we
are forced to define a projection and retrieve the results from the
database to count them in Python, which is more expensive an operation.

@ltalirz
Copy link
Member

ltalirz commented Mar 25, 2023

Hi @sphuber , so far I have just read through your (great) PR description, and I agree with all changes discussed there.

I expect any comments from code review to be minor, please feel free to proceed / let me know when I should look deeper

@sphuber sphuber force-pushed the fix/5934/refactor-daemon-client branch 3 times, most recently from bf9a058 to 9acddc5 Compare March 30, 2023 13:46
sphuber added 2 commits March 30, 2023 16:11
Recently, the `DaemonClient` has been updated to provide all necessary
functionality to manage the daemon through the Python API just as it is
possible through `verdi daemon`. For example, the methods `start_daemon`
and `stop_daemon` were added. However, the interface of the client is
still lacking, particularly because the various methods returned a
dictionary from which the caller would still have to parse the actual
result from string values.

The `call_client` method is updated to detect various problems and
communicate them more clearly by raising an exception. Various
subclasses of `DaemonException` are raised for certain problems:

* `DaemonNotRunningException`: If the daemon is not running
* `DaemonStalePidException`: If the daemon could not be reached and the
  PID file appears to be stale.
* `DaemonTimeoutException`: If the daemon connection timed out
* `DaemonException`: For any other (unknown) reason

The functionality that checks for a stale PID file is moved from the
`aiida.cmdline.utils.daemon` module to the `DaemonClient`. This way it
is not just used through `verdi daemon` but also through the client.

The locations where the automatic cleaning of stale PID files has also
been changed. Summary of behavior changes:

* `verdi daemon status` and `verdi status` no longer delete stale PID file
* `DaemonClient.start_daemon` is now the only place that deletes them

The `verdi status` and `verdi daemon status` commands were calling the
function `aiida.cmdline.utils.daemon.delete_stale_pid_file` which would
check the daemon PID file and delete it if it was considered stale. This
was added to prevent users having to clean up manually if the daemon got
killed abruptly, leaving behind an orphaned PID file. However, this is a
side-effect that users wouldn't expect from a command that should simply
report the status of a service.

The `aiida.cmdline.utils.daemon.delete_stale_pid_file` function has been
removed and its functionality has been refactored and moved to the
`DaemonClient._check_pid_file` method. It raises if the PID file is
likely to be stale. This allows it to be wrapped by `_is_pid_file_stale`
which returns a boolean, and `_clean_potentially_stale_pid_file` which
replaces the old functionality of automatically deleting the PID file if
it is stale and logs a warning of having done so.

The `daemon.timeout` option default is changed from 20 to 5 seconds.
There is no reason why a functioning daemon wouldn't be able to respond
within 5 seconds, and forcing a user to wait 20 seconds in case of a
broken daemon is frustratingly long.
The calculation of the daemon load at the end of the command was using a
utility function that relied manually parsed the response of the daemon
client. However, the `DaemonClient.get_numprocesses` now does this auto-
matically and `aiida.cmdline.commands.cmd_daemon.execute_client_command`
converts this into the correct formatting for stdout.

The `CalculationQueryBuilder` is replaced with a query using the
`QueryBuilder` directly as we only need the count and with the former we
are forced to define a projection and retrieve the results from the
database to count them in Python, which is more expensive an operation.
@sphuber sphuber force-pushed the fix/5934/refactor-daemon-client branch from a2c2863 to 9178370 Compare March 30, 2023 14:11
@sphuber sphuber merged commit 505bb01 into aiidateam:main Mar 30, 2023
@sphuber sphuber deleted the fix/5934/refactor-daemon-client branch March 30, 2023 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stale daemon PIDs should not be removed by verdi status and verdi daemon status
2 participants