Daemon not running in Docker with aiida 0.11.0 #1068
@vdikan First of all, thanks for reporting this and sorry for the late reply - @giovannipizzi usually took care of the docker images and he was unavailable after the 0.11 release (he will be back soon, I believe). I can confirm that the daemon somehow doesn't start properly inside the docker image (see the travis logs on aiidateam/aiida_docker_compose#10). This might have to do with the removal of supervisor.
@giovannipizzi if you can find the time to take a quick look at what is going on in the travis logs on aiidateam/aiida_docker_compose/pull/10, that would be much appreciated.
I've restarted both jobs; now, for both runs, the logs confirm that the daemon is not running. Could you add a commit that adds, for instance at this point https://github.com/aiidateam/aiida_docker_compose/blob/develop/test-aiida-basic/aiida/scripts/plugin/test_plugin.py#L86, a printout of some info on what the daemon is doing (`verdi daemon status`, the output of the daemon log file, etc.) so we can debug further? I agree it might have to do with the change when we removed supervisor.
After some local debugging: the problem is that `~/.local/bin` is not in the `PATH`. If I go on the machine and run
If I do
One simple solution is probably to add a line with
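For reference, the same idea can also be applied at the point where the daemon spawns celery, by handing the subprocess an environment with the extra directory prepended to `PATH` (this is roughly what a helper such as AiiDA's `_get_env_with_venv_bin` is for). A hypothetical sketch; the helper name and directory are illustrative, not the actual AiiDA code:

```python
import os

# Prepend a directory (e.g. ~/.local/bin, where pip --user installs
# the celery executable) to the PATH of the environment that will be
# passed to subprocess.Popen(..., env=...).
def env_with_local_bin(extra_dir=os.path.expanduser("~/.local/bin")):
    env = os.environ.copy()
    env["PATH"] = extra_dir + os.pathsep + env.get("PATH", "")
    return env

env = env_with_local_bin("/tmp/fake-bin")
print(env["PATH"].split(os.pathsep)[0])  # /tmp/fake-bin
```

This makes the spawned process find `celery` regardless of whether the invoking shell was a login shell.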
Ha! The
What puzzles me is what caused this issue to appear here... apparently not the
So, I just checked, and also on the previous docker image
Ok, so the problem is you need to connect via a login shell...
this actually works for both the
I think it is the supervisor removal: before, the full absolute path was stored in the supervisor script (in "recent" AiiDA versions, up until 0.10.0).
I mean, in https://travis-ci.org/aiidateam/aiida_docker_compose/jobs/333007950 on line 592, it clearly says
You are right, indeed if I connect with
So it shuts down in less than a second. Restarting it by hand works and it stays up.
One idea: maybe the new 'daemonizer' does not properly daemonize celery, so when the script (after less than a second) finishes its job (after running
I can probably confirm my idea above in some way: if I run
the daemon claims to be starting, but then it is not running.
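As an aside, one plausible mechanism for this kind of silent death is the `stdout=subprocess.PIPE`/`stderr=subprocess.PIPE` wiring: once the process holding the read ends of the pipes is gone, the child's next write hits a broken pipe. A minimal Python 3 illustration of that failure mode (not AiiDA code, just the pipe mechanics):

```python
import os
import subprocess
import sys

# Spawn a child whose stdout is the write end of a pipe, then close the
# read end in the parent so any write from the child hits a broken pipe.
r, w = os.pipe()
child = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; sys.stdout.write('x' * 100000); sys.stdout.flush()"],
    stdout=w,
    stderr=subprocess.DEVNULL,  # hide the child's traceback
)
os.close(w)  # parent gives up its copy of the write end
os.close(r)  # ...and nobody will ever read: the pipe is now broken
child.wait()
print(child.returncode)  # non-zero: the child died on BrokenPipeError
```

The child exits with a non-zero status because its write raises `BrokenPipeError` (Python ignores `SIGPIPE` by default, so the failure surfaces as an exception rather than a signal).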
So, the question is: what is causing the daemon to die... Just exiting the terminal in which the daemon was started is not enough, at least not on my mac. I guess in your case of connecting to the docker image, the user actually logs out(?)

Edit: On Quantum Mobile with aiida v0.11.0 (Ubuntu 16.04.3), the daemon is not killed when logging out.
It is correct - the daemon should not be killed when logging out (that is actually one of its main points).
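For reference, in Python 3 a child can be detached from the login session by starting it in its own session (and hence its own process group) with `start_new_session=True`, which calls `os.setsid()` in the child before exec. This way, signals delivered to the terminal's process group on logout (e.g. SIGHUP) do not reach the worker. A minimal sketch, not AiiDA's actual code:

```python
import os
import subprocess
import sys

# Spawn a child in its own session: after os.setsid() its process group
# id equals its own pid, which differs from the parent's group.
child = subprocess.Popen(
    [sys.executable, "-c", "import os; print(os.getpgid(0))"],
    stdout=subprocess.PIPE,
    start_new_session=True,  # runs os.setsid() in the child before exec
)
child_pgid = int(child.communicate()[0])
print(child_pgid != os.getpgid(0))  # True: the child is in its own group
```

On Python 2 (which the diffs in this thread target) the equivalent is `preexec_fn=os.setsid`.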
I think I can confirm my intuition. If I do these changes:

```diff
diff --git a/aiida/cmdline/commands/daemon.py b/aiida/cmdline/commands/daemon.py
index 4a077c8..4bf9435 100644
--- a/aiida/cmdline/commands/daemon.py
+++ b/aiida/cmdline/commands/daemon.py
@@ -158,6 +158,7 @@ class Daemon(VerdiCommandWithSubcommands):
         print "Starting AiiDA Daemon (log file: {})...".format(self.logfile)
         currenv = _get_env_with_venv_bin()
+        _devnull = os.open(os.devnull, os.O_RDWR)
         process = subprocess.Popen([
             "celery", "worker",
             "--app", "tasks",
@@ -169,10 +170,12 @@ class Daemon(VerdiCommandWithSubcommands):
             ],
             cwd=self.workdir,
             close_fds=True,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
+            stdout=_devnull,
+            stderr=subprocess.STDOUT,
             env=currenv)
+        os.close(_devnull)
+
         # The following lines are needed for the workflow_stepper
         # (re-initialize the timestamps used to lock the task, in case
         # it crashed for some reason).
```

i.e. I redirect stdout and stderr of the child process to /dev/null (and then close the descriptor right after starting it, so it does not remain connected), the daemon is not killed and everything works. Now, the caveat is that I'm not 100% sure this is the correct/best way to do it. I think it is ok in terms of solving the problem, but then all print statements occurring in the daemon will get lost, I imagine (there shouldn't be any in principle, but there might be in reality!). Maybe there is a way to redirect stdout to a file at daemon startup? Or maybe we can just open a real file instead of /dev/null, close it right after, and this works?
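In Python 3 the same redirection can be written with `subprocess.DEVNULL`, which opens and closes `os.devnull` internally, so the manual `os.open`/`os.close` pair is not needed. A minimal sketch of the pattern (not the actual AiiDA code):

```python
import subprocess
import sys

# Hand the child /dev/null instead of a pipe: its stdout/stderr stay
# valid even after the parent process exits, so the child is not killed
# by a broken pipe later on.
child = subprocess.Popen(
    [sys.executable, "-c", "print('hello from the daemonized child')"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.STDOUT,
    close_fds=True,
)
child.wait()
print(child.returncode)  # 0: the child ran fine, its output went to /dev/null
```

The trade-off is exactly the one noted above: anything the child prints is discarded.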
As a note, if I do these changes

```diff
diff --git a/aiida/cmdline/commands/daemon.py b/aiida/cmdline/commands/daemon.py
index 4a077c8..e180c7a 100644
--- a/aiida/cmdline/commands/daemon.py
+++ b/aiida/cmdline/commands/daemon.py
@@ -82,6 +82,7 @@ class Daemon(VerdiCommandWithSubcommands):
         }
         self.logfile = setup.DAEMON_LOG_FILE
+        self.stdouterrfile = os.path.join(setup.AIIDA_CONFIG_FOLDER, setup.DAEMON_SUBDIR, "daemon-stdout-err.log")
         self.pidfile = setup.DAEMON_PID_FILE
         self.workdir = os.path.join(os.path.split(os.path.abspath(aiida.__file__))[0], "daemon")
         self.celerybeat_schedule = os.path.join(setup.AIIDA_CONFIG_FOLDER, setup.DAEMON_SUBDIR, "celerybeat-schedule")
@@ -158,6 +159,7 @@ class Daemon(VerdiCommandWithSubcommands):
         print "Starting AiiDA Daemon (log file: {})...".format(self.logfile)
         currenv = _get_env_with_venv_bin()
+        _stdouterr = os.open(self.stdouterrfile, os.O_RDWR|os.O_CREAT|os.O_APPEND)
         process = subprocess.Popen([
             "celery", "worker",
             "--app", "tasks",
@@ -169,10 +171,12 @@ class Daemon(VerdiCommandWithSubcommands):
             ],
             cwd=self.workdir,
             close_fds=True,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
+            stdout=_stdouterr,
+            stderr=subprocess.STDOUT,
             env=currenv)
+        os.close(_stdouterr)
+
         # The following lines are needed for the workflow_stepper
         # (re-initialize the timestamps used to lock the task, in case
         # it crashed for some reason).
diff --git a/aiida/daemon/tasks.py b/aiida/daemon/tasks.py
index 9deec9e..528eaaf 100644
--- a/aiida/daemon/tasks.py
+++ b/aiida/daemon/tasks.py
@@ -79,6 +79,9 @@ def updater():
     from aiida.daemon.execmanager import update_jobs
     LOGGER.info('Checking for calculations to update')
     set_daemon_timestamp(task_name='updater', when='start')
+    import sys
+    print "TO STDOUT"
+    print >> sys.stderr, "TO STDERR"
     update_jobs()
     set_daemon_timestamp(task_name='updater', when='stop')
```

the daemon seems to be working correctly (note that it is important to set the `os.open` flags properly), but the string
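The log-file variant of the fix boils down to the following pattern (a minimal Python 3 sketch with a temporary path, not the actual AiiDA code):

```python
import os
import subprocess
import sys
import tempfile

# Open the log file with O_CREAT|O_APPEND so the descriptor is valid on
# first start and appends on every restart, hand it to the child, and
# close the parent's copy right after spawning.
logfile = os.path.join(tempfile.mkdtemp(), "daemon-stdout-err.log")
fd = os.open(logfile, os.O_RDWR | os.O_CREAT | os.O_APPEND)
child = subprocess.Popen(
    [sys.executable, "-c", "print('daemon says hi')"],
    stdout=fd,
    stderr=subprocess.STDOUT,
    close_fds=True,
)
os.close(fd)  # the child keeps its own copy of the descriptor
child.wait()
with open(logfile) as fh:
    print(fh.read().strip())  # daemon says hi
```

Unlike the /dev/null version, anything the child prints survives in the file, which matches the goal of not losing the daemon's stdout/stderr.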
As an additional note, `sys.stdout` and `sys.stderr` are actually
As an additional edit, the logger only has one handler,
Note that the difference is if, in a shell, you run
Thanks for the reply! Unfortunately,
The behavior of the daemon is strange: it starts, but only once(!). However, I was also looking to use aiida inside docker on the production workstation, and this bug prevents me from doing so :(
Indeed, the $PATH was something I thought was the cause, but it is not. The following messages should explain the actual problem in more detail. I think I understood how to potentially solve it and I'm working on it, but it might take some time because I also want to fix the missing logs. In the meantime, could you try to change your code according to the first diff I posted above (the one with
(You can create the new file, and modify the Dockerfile to COPY the file into the right place - if you do it as the root user, remember to chown it to the aiida:aiida user.)
@giovannipizzi since it seems that celery doesn't properly redirect the stdout and/or stderr of its workers to the logfile, wouldn't it make sense to drop the
The following snippet seems to work fine here:

```python
import subprocess

with open("celery.out", 'w') as fhandle:
    process = subprocess.Popen(
        [
            'celery',
            'worker',
            '--app', 'aiida.daemon.tasks',
            '--loglevel', 'INFO',
            '--beat',
            '--schedule', '/users/tiziano/.aiida/daemon/celerybeat-schedule',
        ],
        stdout=fhandle,
        stderr=subprocess.STDOUT,
        close_fds=True,
    )
```
@giovannipizzi the diff doesn't work:
@dev-zero it could be a good idea. Anyway, I think I discovered why celery is losing some logs: it has to do with the way we configure the logging. I will try something soon, but I think it can be combined with your suggestion.

@vdikan strange, to me it was working properly... A couple of questions just to make sure: is it right that you are running as root? In general that's not a very good idea.
@giovannipizzi You are right, I did not set up a separate user for the dev/test container. For testing purposes it's fine not to (though it is a good idea for a production workstation). The only trick needed for celery to run under root was
In some time I will test it with a second user in the container.
@giovannipizzi Is there something left to be done here? |
Hopefully, #1246 should solve this issue. Note, BTW, that when we merge the 'workflows' branch, the implementation of the daemon will be different, so this PR will not be useful there (I also mention @sphuber, so that when he merges this or develop into 'workflows' and most probably finds conflicts, he knows that these files probably don't even exist anymore in that branch).

To test, you can use the aiida_docker_compose repository, subfolder test-aiida-basic, replacing the Dockerfile in the aiida subfolder with the attached one, which picks the code from my branch:

BTW, @ltalirz: I updated aiida_docker_base for 0.11.0 with the fix of putting ~/.aiida/local/bin in the PATH in the bashrc; you might need to remove your docker cache (do a

Sorry it took longer than expected, but it was much more complex than anticipated. In the end, before the actual fix (putting the new process in a different process group), the various (partial) solutions I was trying were randomly working or not working, and I wasn't able to reproduce the results. Now with this fix I'm quite convinced it should work, and I tried to create the machine a few times and all seems to be ok.
@giovannipizzi thanks for the mention, indeed good to know for when we merge. On that note, we should remember to test whether the new daemon setup also works on docker.
@giovannipizzi Thanks a lot. #1246 is merged - I'll close this issue now. By the way, this issue isn't related to the random failures of the travis tests by any chance, is it?
For the ssh on travis - no, it should be unrelated; I think that is more of a problem with the base_ssh docker image. I'm not sure what's causing it, and it's hard to debug (also because we rerun the jobs and lose the few logs we have).
In a Docker container, verdi daemon fails to restart with
Apparently, it starts only once(!) in a fresh container, while setting up the computer and codes. With aiida 0.10.1 this approach worked well.
The startup script is here; it is a lightweight modification of the aiida_docker_compose template:
https://github.com/vdikan/aiida_siesta_plugin/blob/with_11/dockerscripts/setup_develop.sh
The first run goes equally well for v0.10.1 and 0.11.0:
But subsequent starts (`docker-compose up`):
While with aiida v0.10.1 the daemon starts (and continuously shows its log as a "service")
Maybe someone has ideas about the possible reason?