Daemon not running in Docker with aiida 0.11.0 #1068

Closed
vdikan opened this issue Jan 18, 2018 · 31 comments

vdikan commented Jan 18, 2018

In a Docker container, verdi daemon fails to restart with

Daemon not running (cannot find the PID for it)

Apparently, it starts only once in a fresh container while setting up computer and codes. With aiida 0.10.1 the approach worked well.

The startup script is here; it is a lightweight modification of the aiida_docker_compose template:
https://github.com/vdikan/aiida_siesta_plugin/blob/with_11/dockerscripts/setup_develop.sh

The first run goes equally well for v0.10.1 and v0.11.0:

aiida_1  | Executing now a migrate command...
aiida_1  | ...for Django backend
aiida_1  | Operations to perform:
aiida_1  |   Apply all migrations: contenttypes, db, sites, auth, sessions
aiida_1  | Running migrations:
aiida_1  |   Applying contenttypes.0001_initial... OK
aiida_1  |   Applying auth.0001_initial... OK
aiida_1  |   Applying db.0001_initial... OK
aiida_1  |   Applying db.0002_db_state_change... OK
aiida_1  |   Applying db.0003_add_link_type... OK
aiida_1  |   Applying db.0004_add_daemon_and_uuid_indices... OK
aiida_1  |   Applying db.0005_add_cmtime_indices... OK
aiida_1  |   Applying db.0006_delete_dbpath... OK
aiida_1  |   Applying db.0007_update_linktypes... OK
aiida_1  |   Applying sessions.0001_initial... OK
aiida_1  |   Applying sites.0001_initial... OK
aiida_1  | Database was created successfully
aiida_1  | Loading new environment...
aiida_1  | Installing default AiiDA user...
aiida_1  | Starting user configuration for aiida@localhost...
aiida_1  | You set up AiiDA using the default Daemon email (aiida@localhost),
aiida_1  | therefore no further user configuration will be asked.
aiida_1  | Setup finished.
aiida_1  | Clearing all locks ...
aiida_1  | Starting AiiDA Daemon (log file: /root/.aiida/daemon/log/celery.log)...
aiida_1  | Re-initializing workflow stepper stop timestamp
aiida_1  | Daemon started
aiida_1  | At any prompt, type ? to get some help.
aiida_1  | ---------------------------------------
aiida_1  | => Computer name: Creating new computer with name 'develop'
aiida_1  | => Fully-qualified hostname: => Description: => Enabled: => Transport type: => Scheduler type: => shebang line at the beginning of the submission script: => AiiDA work directory: => mpirun command: => Default number of CPUs per machine: => Text to prepend to each command execution: 
aiida_1  |    # This is a multiline input, press CTRL+D on a
aiida_1  |    # empty line when you finish
aiida_1  |    # ------------------------------------------
aiida_1  |    # End of old input. You can keep adding     
aiida_1  |    # lines, or press CTRL+D to store this value
aiida_1  |    # ------------------------------------------
aiida_1  | => Text to append to each command execution: 
aiida_1  |    # This is a multiline input, press CTRL+D on a
aiida_1  |    # empty line when you finish
aiida_1  |    # ------------------------------------------
aiida_1  |    # End of old input. You can keep adding     
aiida_1  |    # lines, or press CTRL+D to store this value
aiida_1  |    # ------------------------------------------
aiida_1  | Computer 'develop' successfully stored in DB.
aiida_1  | pk: 1, uuid: 71f4cd8f-d792-4c6e-872e-50c633cf3fa3
aiida_1  | Note: before using it with AiiDA, configure it using the command
aiida_1  |   verdi computer configure develop
aiida_1  | (Note: machine_dependent transport parameters cannot be set via 
aiida_1  | the command-line interface at the moment)
aiida_1  | Configuring computer 'develop' for the AiiDA user 'aiida@localhost'
aiida_1  | Computer develop has transport of type local
aiida_1  | There are no special keys to be configured. Configuration completed.
aiida_1  | Testing computer 'develop' for user aiida@localhost...
aiida_1  | > Testing connection...
aiida_1  | > Getting job list...
aiida_1  |   `-> OK, 11 jobs found in the queue.
aiida_1  | > Creating a temporary file in the work directory...
aiida_1  |   `-> Getting the remote user name...
aiida_1  |       [remote username: root]
aiida_1  |       [Checking/creating work directory: /scratch/aiida_run/]
aiida_1  |   `-> Creating the file tmpYPTDQc...
aiida_1  |   `-> Checking if the file has been created...
aiida_1  |       [OK]
aiida_1  |   `-> Retrieving the file and checking its content...
aiida_1  |       [Retrieved]
aiida_1  |       [Content OK]
aiida_1  |   `-> Removing the file...
aiida_1  |   [Deleted successfully]
aiida_1  | Test completed (all 3 tests succeeded)
aiida_1  | At any prompt, type ? to get some help.
aiida_1  | ---------------------------------------
aiida_1  | => Label: => Description: => Local: => Default input plugin: => Remote computer name: => Remote absolute path: => Text to prepend to each command execution
aiida_1  | FOR INSTANCE, MODULES TO BE LOADED FOR THIS CODE: 
aiida_1  |    # This is a multiline input, press CTRL+D on a
aiida_1  |    # empty line when you finish
aiida_1  |    # ------------------------------------------
aiida_1  |    # End of old input. You can keep adding     
aiida_1  |    # lines, or press CTRL+D to store this value
aiida_1  |    # ------------------------------------------
aiida_1  | => Text to append to each command execution: 
aiida_1  |    # This is a multiline input, press CTRL+D on a
aiida_1  |    # empty line when you finish
aiida_1  |    # ------------------------------------------
aiida_1  |    # End of old input. You can keep adding     
aiida_1  |    # lines, or press CTRL+D to store this value
aiida_1  |    # ------------------------------------------
aiida_1  | Code 'siesta' successfully stored in DB.
aiida_1  | pk: 1, uuid: 206f39d3-565f-44cd-a495-f037e0f6f2d1
aiida_1  | Shutting down AiiDA Daemon (134)...
aiida_1  | Waiting for the AiiDA Daemon to shut down...
aiida_1  | AiiDA Daemon shut down correctly.

But subsequent starts (docker-compose up) give:

aiida_1  | Aiida development environment is set up.
aiida_1  | Clearing all locks ...
aiida_1  | Starting AiiDA Daemon (log file: /root/.aiida/daemon/log/celery.log)...
aiida_1  | Re-initializing workflow stepper stop timestamp
aiida_1  | Daemon started
aiida_1  | Daemon not running (cannot find the PID for it)

With aiida v0.10.1, by contrast, the daemon starts (and continuously shows its log, running as a "service").
Does anyone have ideas about the possible reason?


ltalirz commented Jan 24, 2018

@vdikan First of all, thanks for reporting this and sorry for the late reply - @giovannipizzi usually took care of the docker images and he was unavailable after the 0.11 release (will be back soon, I believe).

I can confirm the problem that the daemon somehow doesn't seem to start properly inside the docker image (see travis logs on aiidateam/aiida_docker_compose#10)

This might have to do with the removal of the supervisor package

@sphuber sphuber added this to the v0.11.1 milestone Jan 25, 2018

ltalirz commented Feb 5, 2018

@giovannipizzi if you can find the time to give a quick look to what is going on in the travis logs on aiidateam/aiida_docker_compose/pull/10 that would be much appreciated

@giovannipizzi (Member)

I've restarted both jobs; for both runs, the logs now confirm that the daemon is not running. Could you add a commit that prints, for instance at https://github.com/aiidateam/aiida_docker_compose/blob/develop/test-aiida-basic/aiida/scripts/plugin/test_plugin.py#L86, some info on what the daemon is doing (verdi daemon status, the output of the daemon log file, etc.) so we can debug further?

I agree it might have to do with the change when we removed supervisor

@giovannipizzi (Member)

After some local debugging: the problem is that ~/.local/bin is not in the PATH. If I log in to the machine and run ~/.local/bin/verdi daemon restart I get

Clearing all locks ...
Starting AiiDA Daemon (log file: /home/aiida/.aiida/daemon/log/celery.log)...
Traceback (most recent call last):
  File "/home/aiida/.local/bin/verdi", line 9, in <module>
    sys.exit(run())
  File "/home/aiida/code/aiida_core/aiida/cmdline/verdilib.py", line 1049, in run
    aiida.cmdline.verdilib.exec_from_cmdline(sys.argv)
  File "/home/aiida/code/aiida_core/aiida/cmdline/verdilib.py", line 1034, in exec_from_cmdline
    CommandClass.run(*argv[command_position + 1:])
  File "/home/aiida/code/aiida_core/aiida/cmdline/baseclass.py", line 217, in run
    function_to_call(*args[1:])
  File "/home/aiida/code/aiida_core/aiida/cmdline/commands/daemon.py", line 404, in daemon_restart
    self.daemon_start()
  File "/home/aiida/code/aiida_core/aiida/cmdline/commands/daemon.py", line 174, in daemon_start
    env=currenv)
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

If I do export PATH=~/.local/bin/:$PATH, followed by ~/.local/bin/verdi daemon restart, the daemon starts and the calculations complete successfully.
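The OSError [Errno 2] above is what subprocess.Popen raises when it cannot find an executable on the PATH it passes to the child. A minimal sketch of the same failure mode (the command name here is made up):

```python
import subprocess

# Hypothetical command name; any executable missing from the given PATH
# fails the same way "celery" did when ~/.local/bin was not included.
try:
    subprocess.Popen(["some-command-not-on-path"], env={"PATH": "/usr/bin"})
except OSError as exc:
    print(exc.errno)  # 2 (ENOENT), i.e. "No such file or directory"
```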

@giovannipizzi (Member)

One simple solution is probably to add a line with export PATH=~/.local/bin/:$PATH to the .bashrc; what do you think?
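A sketch of that fix, assuming pip install --user placed verdi in ~/.local/bin:

```shell
# Prepend the user-level bin directory to PATH so the shell finds verdi.
export PATH="$HOME/.local/bin:$PATH"
# To make it permanent, the same line can be appended to ~/.bashrc:
# echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
```

Note that a plain non-interactive shell (e.g. docker exec ... bash -c) does not source ~/.bashrc, so a login shell (bash -l) may still be needed to pick this up.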


ltalirz commented Feb 5, 2018

Ha! The ~/.local/bin problem already struck in materialscloud-org/issues/issues/3 today, although (I believe?) in a completely unrelated way.
I'll try to add it to the .bashrc.

What puzzles me is what caused this issue to appear here... apparently not the supervisor removal after all?
We've been using pip install --user in aiidateam/aiida_core_base-docker since a very long time...


ltalirz commented Feb 5, 2018

So, I just checked and also on the previous docker image aiidateam/aiida_core_base:0.10.0 I have

$ docker exec -it c632f4b602f591bd5b9e4dd232621634f0707fc0b72b1ef9df47f9dfffa73bba bash
root@c632f4b602f5:~# verdi
bash: verdi: command not found
root@c632f4b602f5:~# su aiida
aiida@c632f4b602f5:~$ verdi
bash: verdi: command not found


ltalirz commented Feb 5, 2018

Ok, so the problem is you need to connect via a login shell...

$ docker exec -it c632f4b602f591bd5b9e4dd232621634f0707fc0b72b1ef9df47f9dfffa73bba bash -l
root@c632f4b602f5:~/code/aiida_core# verdi
Usage: verdi [--profile=PROFILENAME|-p PROFILENAME] COMMAND [<args>]
...

this actually works for both the 0.10.0 and 0.11.0 docker base images.
Back to trying to figure out what went wrong in aiidateam/aiida_docker_compose/pull/10


giovannipizzi commented Feb 5, 2018

What puzzles me is what caused this issue to appear here... apparently not the supervisor removal after all?

I think it is the supervisor removal: previously, the full absolute path was stored in the supervisor script (in "recent" AiiDA versions, up to and including 0.10.0).


ltalirz commented Feb 5, 2018

I mean, in https://travis-ci.org/aiidateam/aiida_docker_compose/jobs/333007950 on line 592 it clearly says Daemon started. So the problem is that it seems to die afterwards without anybody noticing...
Sorry, for the moment I have to do other things... happy for anyone to investigate further.

@giovannipizzi (Member)

You are right: if I connect with bash -l, restarting works.
Indeed, in the celery log I find

[2018-02-05 22:31:06,132: INFO/MainProcess] Connected to sqla+postgresql://aiida:w1-6O6NXkI8j@db:5432/aiida_default
[2018-02-05 22:31:06,250: WARNING/MainProcess] celery@214b044f3885 ready.
[2018-02-05 22:31:06,257: INFO/Beat] beat: Starting...
[2018-02-05 22:31:07,074: INFO/MainProcess] beat: Shutting down...

So it shuts down in less than a second. Restarting by hand works and it stays up.
A quick Google search showed people having problems with other package versions (kombu, but referring to much older versions than ours) or with the broker (but they used MongoDB, which we do not use), so I'm not sure those suggestions apply...

@giovannipizzi (Member)

One idea: maybe the new 'daemonizer' does not properly daemonize 'celery', so when the script finishes its job (less than a second after running verdi daemon start) and exits, celery gets killed?

@giovannipizzi (Member)

I can probably confirm my idea above in some way: if I run

docker-compose exec --user aiida aiida /bin/bash -l -c "verdi daemon start"

the daemon claims to be starting, but then it is not running.
If instead I connect and run verdi daemon start, it works and stays up.


ltalirz commented Feb 5, 2018

So, the question is: what is causing the daemon to die...

Just exiting the terminal in which the daemon was started is not enough, at least not on my mac.

I guess in your case of connecting to the docker image, the user actually logs out(?)
Should the daemon even continue running in this case or should it be killed then?

Edit: On Quantum Mobile with aiida v0.11.0 (Ubuntu 16.04.3), the daemon is not killed when logging out.

@giovannipizzi (Member)

That is correct: the daemon should not be killed when logging out (that is actually one of its main points).
The issue is, I think, related to the redirection of stdin/stdout. Currently the daemon is started with stdout=PIPE, stderr=PIPE, which is probably incorrect, as the child might remain connected to the parent process to try to communicate with it. I still have to figure out how to properly redirect the stdout to a file: if the file is opened by the parent, I think this does not solve the problem.
I think this is connected to #1030. There, a workaround is implemented, but the workaround AFAIK runs within the daemon tasks, so if the parent process exits quickly (before a task is run), the two processes are still connected, and killing/terminating the parent therefore terminates the child as well?

@giovannipizzi (Member)

I think I can confirm my intuition. If I do these changes:

diff --git a/aiida/cmdline/commands/daemon.py b/aiida/cmdline/commands/daemon.py
index 4a077c8..4bf9435 100644
--- a/aiida/cmdline/commands/daemon.py
+++ b/aiida/cmdline/commands/daemon.py
@@ -158,6 +158,7 @@ class Daemon(VerdiCommandWithSubcommands):
 
         print "Starting AiiDA Daemon (log file: {})...".format(self.logfile)
         currenv = _get_env_with_venv_bin()
+        _devnull = os.open(os.devnull, os.O_RDWR)
         process = subprocess.Popen([
                 "celery",  "worker",
                 "--app", "tasks",
@@ -169,10 +170,12 @@ class Daemon(VerdiCommandWithSubcommands):
                 ],
             cwd=self.workdir,
             close_fds=True,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
+            stdout=_devnull,
+            stderr=subprocess.STDOUT,
             env=currenv)
 
+        os.close(_devnull)
+
         # The following lines are needed for the workflow_stepper
         # (re-initialize the timestamps used to lock the task, in case
         # it crashed for some reason).

i.e. if I redirect stdout and stderr of the child process to /dev/null (and then close the descriptor right after starting the child, so the parent does not remain connected), the daemon is not killed and everything works.

Now, the issue is that I'm not 100% sure this is the correct/best way to do it. I think it is OK in terms of solving the problem, but then all print statements occurring in the daemon will be lost, I imagine (there shouldn't be any in principle, but there might be in reality!). Maybe there is a way to redirect stdout to a file at daemon startup? Or maybe we can just open a real file instead of /dev/null, close it right after, and that works?
I don't know if @dev-zero has a more in-depth knowledge of this or has some suggestions.

@giovannipizzi (Member)

As a note, if I do these changes

diff --git a/aiida/cmdline/commands/daemon.py b/aiida/cmdline/commands/daemon.py
index 4a077c8..e180c7a 100644
--- a/aiida/cmdline/commands/daemon.py
+++ b/aiida/cmdline/commands/daemon.py
@@ -82,6 +82,7 @@ class Daemon(VerdiCommandWithSubcommands):
         }
 
         self.logfile = setup.DAEMON_LOG_FILE
+        self.stdouterrfile = os.path.join(setup.AIIDA_CONFIG_FOLDER, setup.DAEMON_SUBDIR, "daemon-stdout-err.log")
         self.pidfile = setup.DAEMON_PID_FILE
         self.workdir = os.path.join(os.path.split(os.path.abspath(aiida.__file__))[0], "daemon")
         self.celerybeat_schedule = os.path.join(setup.AIIDA_CONFIG_FOLDER, setup.DAEMON_SUBDIR, "celerybeat-schedule")
@@ -158,6 +159,7 @@ class Daemon(VerdiCommandWithSubcommands):
 
         print "Starting AiiDA Daemon (log file: {})...".format(self.logfile)
         currenv = _get_env_with_venv_bin()
+        _stdouterr = os.open(self.stdouterrfile, os.O_RDWR|os.O_CREAT|os.O_APPEND)
         process = subprocess.Popen([
                 "celery",  "worker",
                 "--app", "tasks",
@@ -169,10 +171,12 @@ class Daemon(VerdiCommandWithSubcommands):
                 ],
             cwd=self.workdir,
             close_fds=True,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
+            stdout=_stdouterr,
+            stderr=subprocess.STDOUT,
             env=currenv)
 
+        os.close(_stdouterr)
+
         # The following lines are needed for the workflow_stepper
         # (re-initialize the timestamps used to lock the task, in case
         # it crashed for some reason).
diff --git a/aiida/daemon/tasks.py b/aiida/daemon/tasks.py
index 9deec9e..528eaaf 100644
--- a/aiida/daemon/tasks.py
+++ b/aiida/daemon/tasks.py
@@ -79,6 +79,9 @@ def updater():
     from aiida.daemon.execmanager import update_jobs
     LOGGER.info('Checking for calculations to update')
     set_daemon_timestamp(task_name='updater', when='start')
+    import sys
+    print "TO STDOUT"
+    print >> sys.stderr, "TO STDERR"
     update_jobs()
     set_daemon_timestamp(task_name='updater', when='stop')
    

the daemon seems to be working correctly (note that it is important to set the os.open flags properly), but the strings TO STDOUT and TO STDERR do not appear in any log... As a note, however, stdout/stderr did not seem to be working/logged in vanilla 0.11.0 either! This is probably something we want to fix as well.

@giovannipizzi (Member)

As an additional note, sys.stdout and sys.stderr are actually celery.utils.log.LoggingProxy instances, so it's probably worth looking into how celery deals with them.

@giovannipizzi (Member)

As an additional edit: the logger only has one handler (sys.stdout.logger.handlers == [<kombu.log.NullHandler object at 0x7f378216c810>]) and a level of 30 (WARNING). So I guess the correct thing to do is to add a file handler to it, and we can probably keep redirecting to /dev/null when creating the process?
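A sketch of that idea in plain logging terms (the logger name and file path are made up, and celery's actual LoggingProxy would need its own wiring): attach a FileHandler next to the NullHandler and lower the level, so that redirected prints land in a real file.

```python
import logging

# Hypothetical logger mimicking the observed state: a single NullHandler
# and level WARNING, which silently drops INFO-level messages.
logger = logging.getLogger("celery.redirected")  # name is an assumption
logger.propagate = False
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.WARNING)

# Proposed addition: a real file handler, plus a lower level.
handler = logging.FileHandler("/tmp/daemon-stdout-err.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("TO STDOUT, via the logger this time")
```

With this in place, the process's own stdout can still be redirected to /dev/null without losing the messages.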

@giovannipizzi (Member)

Note that the difference is whether, in a shell, you run
exec verdi daemon start (it does not work; same behaviour as in Docker)
vs
verdi daemon start (it works)
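A minimal illustration of the difference (not AiiDA-specific): exec replaces the current shell with the command, keeping the same PID, so there is no intermediate shell left between the started daemon and whatever launched the shell.

```shell
# The outer bash prints its PID, then exec-replaces itself with another
# bash that prints *its* PID: both lines show the same number.
bash -c 'echo $$; exec bash -c "echo \$\$"'
```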


vdikan commented Feb 8, 2018

Thanks for the reply! Unfortunately,
I cannot reproduce your experience, since in my container $PATH already points to where the verdi executable resides:

root@d1b06ef64b80:~# which verdi
/usr/local/bin/verdi
root@d1b06ef64b80:~# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

The behavior of the daemon is strange: it starts, but only once(!).
In fact, the docker stack can still be used as a development environment, because dev containers are expendable on purpose. See my launcher script for this case:
https://github.com/vdikan/aiida_siesta_plugin/blob/with_11_dev/dockerscripts/setup_develop.sh

However, I was also looking to use aiida inside Docker on a production workstation, and this bug prevents me from doing so :(

@giovannipizzi (Member)

Indeed, $PATH was something I thought was the cause, but it is not; my messages above should explain the actual problem in more detail. I think I understand how to solve it and I'm working on it, but it might take some time, because I also want to fix the missing logs. In the meantime, could you try to change your code according to the first diff I posted above (the one with os.devnull) and tell me if this solves the problem?

(You can create the new file and modify the Dockerfile to COPY the file into the right place; if you do it as the root user, remember to chown it to the aiida:aiida user.)


dev-zero commented Feb 8, 2018

@giovannipizzi since it seems that celery doesn't properly redirect the stdout and/or stderr of its workers to the logfile, wouldn't it make sense to drop the --logfile ... parameter instead and open the logfile directly?

The following snippet seems to work fine here:

import subprocess

with open("celery.out", 'w') as fhandle:
    process = subprocess.Popen(
        [
            'celery',
            'worker',
            '--app', 'aiida.daemon.tasks',
            '--loglevel', 'INFO',
            '--beat',
            '--schedule', '/users/tiziano/.aiida/daemon/celerybeat-schedule',
            ],
        stdout=fhandle,
        stderr=subprocess.STDOUT,
        close_fds=True,
        )


vdikan commented Feb 8, 2018

@giovannipizzi the diff doesn't work:

root@e474c836a097:/code# verdi daemon status
# Most recent daemon timestamp:0h:00m:09s ago
Daemon is running as pid 151 since 2018-02-08 18:13:34.020000, child processes:
   * celery[167]   sleeping, started at 2018-02-08 18:13:34
   * celery[168]   sleeping, started at 2018-02-08 18:13:34
   * celery[169]   sleeping, started at 2018-02-08 18:13:34
   * celery[170]   sleeping, started at 2018-02-08 18:13:34
   * celery[171]   sleeping, started at 2018-02-08 18:13:34
   * celery[172]   sleeping, started at 2018-02-08 18:13:34
   * celery[173]   sleeping, started at 2018-02-08 18:13:34
   * celery[174]   sleeping, started at 2018-02-08 18:13:34
   * celery[178] disk-sleep, started at 2018-02-08 18:13:35
root@e474c836a097:/code# verdi daemon stop
Shutting down AiiDA Daemon (151)...
Waiting for the AiiDA Daemon to shut down...
AiiDA Daemon shut down correctly.
root@e474c836a097:/code# verdi daemon start
Clearing all locks ...
Starting AiiDA Daemon (log file: /root/.aiida/daemon/log/celery.log)...
Daemon started
root@e474c836a097:/code# verdi daemon status
# Most recent daemon timestamp:0h:00m:22s ago
Daemon not running (cannot find the PID for it)
root@e474c836a097:/code# cat /usr/local/lib/python2.7/site-packages/aiida/cmdline/commands/daemon.py | grep _devnull
        _devnull = os.open(os.devnull, os.O_RDWR)
            stdout=_devnull,
        os.close(_devnull)

@giovannipizzi (Member)

@dev-zero it could be a good idea. Anyway, I think I discovered why celery is losing some logs: it has to do with the way we configure the logging. I will try something soon, and I think it can be combined with your suggestion.

@vdikan strange, for me it was working properly... A couple of questions just to make sure:

1. Is it right that you are running as root? In general that is not a very good idea, and I see you actually have to use tricks/set environment variables to allow things to work under root; this might be one of the problems. I suggest you have a look at our https://github.com/aiidateam/aiida_core_base-docker, where we run as a specific aiida user; with that docker (actually run via the scripts in https://github.com/aiidateam/aiida_docker_compose) it was working for me with the fix.
2. If point 1 is not the problem, could you double-check that, if you do 'import aiida' in a verdi shell, it is indeed importing from /usr/local/lib/python2.7/site-packages/aiida/?


vdikan commented Feb 9, 2018

@giovannipizzi You are right, I did not set up a separate user for the dev/test container. For testing purposes it's fine not to (though it is a good idea for a production workstation). The only trick needed for celery to run under root was export C_FORCE_ROOT=1. The whole thing works as expected for v0.10.1.
Second point, aiida really points to /usr/local/lib/python2.7/site-packages/aiida/.

In some time I will test it with a second user in the container.


ltalirz commented Mar 7, 2018

@giovannipizzi Is there something left to be done here?
This is the last issue for the release_v0.11.1 milestone...

giovannipizzi added a commit to giovannipizzi/aiida-core that referenced this issue Mar 7, 2018:
"…ker and also fix the missing stdout/stderr logs."
@giovannipizzi (Member)

Hopefully, #1246 should solve this issue. Note, BTW, that when we merge the 'workflows' branch the implementation of the daemon will be different, so this PR will no longer be relevant. (I also mention @sphuber, so that when he merges this or develop into 'workflows' and most probably finds conflicts, he knows that these files probably don't even exist anymore in that branch.)

To test, you can use the aiida_docker_compose repository, subfolder test-aiida-basic, replacing the Dockerfile in the aiida subfolder with the attached one, which picks up the code from my branch:
Dockerfile.txt
(Note: you have to remove the .txt extension.)

BTW, @ltalirz: I updated the aiida_docker_base for 0.11.0 with the fix of putting ~/.local/bin in the PATH in the bashrc; you might need to remove your docker cache (do a docker image rm for that image).

Sorry it took longer than expected, but it was much more complex than anticipated. In the end, before the actual fix (putting the new process in a different process group), the various (partial) solutions I was trying were randomly working or not working, and I wasn't able to reproduce the results. Now, with this fix, I'm quite convinced it should work; I tried creating the machine a few times and all seems to be OK.
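A sketch of the "different process group" fix mentioned above, assuming POSIX (sleep stands in for the real celery invocation): starting the child in its own session via os.setsid means signals delivered to the parent's process group no longer reach it. In Python 3, start_new_session=True is the equivalent.

```python
import os
import subprocess

# Stand-in for the real celery command; stdio is detached as in the
# earlier diff, and the child becomes leader of a brand-new session.
devnull = os.open(os.devnull, os.O_RDWR)
process = subprocess.Popen(
    ["sleep", "1"],
    stdout=devnull,
    stderr=subprocess.STDOUT,
    close_fds=True,
    preexec_fn=os.setsid,  # equivalent: start_new_session=True (Python 3)
)
os.close(devnull)

# The child is no longer in the parent's process group.
print(os.getpgid(process.pid) != os.getpgid(0))  # prints True
process.wait()
```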


sphuber commented Mar 8, 2018

@giovannipizzi thanks for the mention; indeed good to know for when we merge. On that note, we should remember to test whether the new daemon setup also works on Docker.


ltalirz commented Mar 8, 2018

@giovannipizzi Thanks a lot. #1246 is merged, so I am closing this issue now.
Happy to reopen if there are problems with Docker with aiida v0.11.1.

By the way, this issue isn't related to the random failing of the travis tests by any chance, is it?
Do we have an open issue for this? I think it's kind of important in terms of the amount of extra work it is generating... I've never looked into it properly, so perhaps someone who has could collect the current state of knowledge.

@ltalirz ltalirz closed this as completed Mar 8, 2018
@giovannipizzi (Member)

For the SSH on travis: no, it should be unrelated; that, I think, is more of a problem with the base_ssh docker. I'm not sure what's causing it, and it's hard to debug (also because we rerun the jobs and lose the few logs we have).
It would be good to open an issue and ask people to copy-paste all logs there before restarting, so it helps in debugging. I think what happens is some kind of SSH_CONNECTION_... error, not sure why though.
