Stale daemon PIDs should not be removed by verdi status and verdi daemon status #5934

Closed
sphuber opened this issue Mar 16, 2023 · 0 comments · Fixed by #5940


sphuber commented Mar 16, 2023

The verdi status and verdi daemon status commands both call aiida.cmdline.utils.delete_stale_pid_file, which checks whether the daemon's PID file is stale and deletes it if so. This was added in 04e80d2. The reason was that without it, if the daemon was killed abruptly (for example by a computer restart or some other problem), the PID file would not be cleaned up, and the status command would assume the daemon was running and try to contact it, resulting in a timeout. The user would then have to clean up the PID file manually.

Although this solution saves the user from having to clean things up manually, it can have negative side effects. Users may not expect a status command to modify anything. For example, in combination with a weakness in the logic for determining whether a PID file is stale, a PID file can be incorrectly labeled as stale and removed while the daemon is actually still running. The original daemon process is then orphaned and a new one is started. There are now rogue daemon instances running that can cause all sorts of problems and cannot be killed through AiiDA's API.

The proposal is to remove the delete_stale_pid_file check from the status commands and move it to verdi daemon start instead. The status commands then only have to detect a stale PID file and give a more informative error message, suggesting that the daemon may be running on another machine with access to the same system or, if that is not the case, that the user restart the daemon to clean up the PID file.
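
To illustrate the proposed split, here is a minimal sketch of a read-only check used by the status path and a separate cleanup used only by the start path. It is not AiiDA's implementation: the function names `check_pid_file`, `status_message` and `clean_before_start` are hypothetical, the PID file is assumed to contain just the PID as text, and the liveness check assumes a POSIX system.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists on the local machine (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_pid_file(pid_file: Path) -> str:
    """Classify the PID file as 'missing', 'running' or 'stale' without modifying it."""
    if not pid_file.exists():
        return 'missing'
    try:
        pid = int(pid_file.read_text().strip())
    except ValueError:
        return 'stale'
    if pid <= 0:  # 0 and negative values address process groups, not a single process
        return 'stale'
    return 'running' if pid_is_running(pid) else 'stale'


def status_message(pid_file: Path) -> str:
    """Hypothetical status path: report the state, never delete anything."""
    state = check_pid_file(pid_file)
    if state == 'stale':
        return (
            'The PID file looks stale. The daemon may be running on another machine '
            'with access to the same system; if not, restart the daemon to clean it up.'
        )
    return f'daemon is {"running" if state == "running" else "not running"}'


def clean_before_start(pid_file: Path) -> None:
    """Hypothetical start path: the only place where a stale PID file is removed."""
    if check_pid_file(pid_file) == 'stale':
        pid_file.unlink()
```

A local check like this cannot tell whether the PID belongs to a daemon on a different machine sharing the same file system, which is one reason for the status path to report the situation rather than act on it.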
