Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown hangs on SIGINT #473

Closed
almaslov opened this issue Nov 18, 2020 · 10 comments
Closed

Shutdown hangs on SIGINT #473

almaslov opened this issue Nov 18, 2020 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@almaslov
Copy link

Bug report

Required Info:

  • Operating System:
    • DISTRIB_ID=Ubuntu
    • DISTRIB_RELEASE=20.04
    • DISTRIB_CODENAME=focal
    • DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
  • Installation type:
    • binaries
  • Version or commit hash:
    • ros-foxy-ros2launch 0.10.3-1focal.20201008.001853
  • DDS implementation:
    • Fast-RTPS

Steps to reproduce issue

Any rarely logging binary/script, e.g. /tmp/simple_script.bash:

#!/usr/bin/env bash
sleep 20
echo 'Hello'
sleep 20

A launch file /tmp/simple.launch.py:

from launch import LaunchDescription
import launch.actions


def generate_launch_description():
    return LaunchDescription([
        launch.actions.ExecuteProcess(cmd=['/tmp/simple_script.bash'], output='screen')
    ])

Run the launch file with debug option:

ros2 launch -d /tmp/simple.launch.py

In separate terminal tab send SIGINT to launcher process:

pkill -2 ros2

Expected behavior

In best case I expect launcher would forward SIGINT to my bash subprocess and both processes terminate gracefully. At least I expect launcher would terminate in finite time.

Actual behavior

Nothing happens until subprocess echoes. Looks like Shutdown event stuck in queue. In the log below you can see there's about 19 second between emitting Shutdown event and processing one:

1605703730.0072842 [DEBUG] [launch.launch_context]: emitting event synchronously: 'launch.events.IncludeLaunchDescription'
1605703730.0082226 [INFO] [launch]: All log files can be found below /home/user/.ros/log/2020-11-18-12-48-49-887166-27703d5967b1-7901
1605703730.0083606 [INFO] [launch]: Default logging verbosity is set to DEBUG
1605703730.0106397 [DEBUG] [launch]: processing event: '<launch.events.include_launch_description.IncludeLaunchDescription object at 0x7f462fc195b0>'
1605703730.0108380 [DEBUG] [launch]: processing event: '<launch.events.include_launch_description.IncludeLaunchDescription object at 0x7f462fc195b0>' ✓ '<launch.event_handlers.on_include_launch_description.OnIncludeLaunchDescription object at 0x7f462fc28e80>'
1605703730.0200140 [INFO] [simple_script.bash-1]: process started with pid [7903]
1605703730.0203502 [DEBUG] [launch.launch_context]: emitting event: 'launch.events.process.ProcessStarted'
1605703730.0207794 [DEBUG] [launch]: processing event: '<launch.events.process.process_started.ProcessStarted object at 0x7f463042ccd0>'
1605703731.6404905 [WARNING] [launch]: user interrupted with ctrl-c (SIGINT)
1605703731.6411691 [DEBUG] [launch.launch_context]: emitting event synchronously: 'launch.events.Shutdown'
1605703750.3496287 [DEBUG] [launch]: processing event: '<launch.events.shutdown.Shutdown object at 0x7f46304a48e0>'(8)
1605703750.3498898 [DEBUG] [launch]: processing event: '<launch.events.shutdown.Shutdown object at 0x7f46304a48e0>' ✓ '<launch.event_handlers.on_shutdown.OnShutdown object at 0x7f462fb29100>'
1605703750.3528886 [DEBUG] [launch]: processing event: '<launch.events.shutdown.Shutdown object at 0x7f46304a48e0>' ✓ '<launch.event_handlers.on_shutdown.OnShutdown object at 0x7f462fc82550>'
1605703750.3548706 [DEBUG] [launch.launch_context]: emitting event synchronously: 'launch.events.process.ProcessStdout'
1605703750.3557782 [DEBUG] [launch]: processing event: '<launch.events.process.process_stdout.ProcessStdout object at 0x7f462fb29d90>'
1605703750.3559816 [DEBUG] [launch]: processing event: '<launch.events.process.process_stdout.ProcessStdout object at 0x7f462fb29d90>' ✓ '<launch.event_handlers.on_process_io.OnProcessIO object at 0x7f462fb290d0>'
1605703755.3556085 [DEBUG] [launch.launch_context]: emitting event: 'launch.events.TimerEvent'
1605703755.3560567 [DEBUG] [launch]: processing event: '<launch.events.timer_event.TimerEvent object at 0x7f462fcd1490>'
1605703755.3564785 [DEBUG] [launch]: processing event: '<launch.events.timer_event.TimerEvent object at 0x7f462fcd1490>' ✓ '<launch.event_handler.EventHandler object at 0x7f462fb29f70>'
1605703755.3568366 [ERROR] [simple_script.bash-1]: process[simple_script.bash-1] failed to terminate '5' seconds after receiving 'SIGINT', escalating to 'SIGTERM'
1605703755.3571658 [DEBUG] [launch.launch_context]: emitting event synchronously: 'launch.events.process.SignalProcess'
1605703755.3583355 [DEBUG] [launch]: processing event: '<launch.events.process.signal_process.SignalProcess object at 0x7f462fb29a90>'
1605703755.3584905 [DEBUG] [launch]: processing event: '<launch.events.process.signal_process.SignalProcess object at 0x7f462fb29a90>' ✓ '<launch.event_handler.EventHandler object at 0x7f462fb29070>'
1605703755.3588274 [INFO] [simple_script.bash-1]: sending signal 'SIGTERM' to process[simple_script.bash-1]
1605703755.3613477 [ERROR] [simple_script.bash-1]: process has died [pid 7903, exit code -15, cmd '/tmp/simple_script.bash'].
1605703755.3616023 [DEBUG] [launch.launch_context]: emitting event: 'launch.events.process.ProcessExited'
1605703755.3632402 [DEBUG] [launch]: processing event: '<launch.events.process.process_exited.ProcessExited object at 0x7f462fcd1490>'(9)
1605703755.3634789 [DEBUG] [launch]: processing event: '<launch.events.process.process_exited.ProcessExited object at 0x7f462fcd1490>' ✓ '<launch.event_handlers.on_process_exit.OnProcessExit object at 0x7f462fb291f0>'
1605703755.3637152 [DEBUG] [launch]: processing event: '<launch.events.process.process_exited.ProcessExited object at 0x7f462fcd1490>' ✓ '<launch.event_handlers.on_process_exit.OnProcessExit object at 0x7f462fb29130>'

Additional information

This behaviour is not observed when sending SIGINT with Ctrl+C in terminal, since TTY machinery sends signal to the whole process group and subprocesses terminate without parent's help.
In my case I tried to run ros2 launch in docker container. docker stop sends specified signal only to parent process.

@jacobperron jacobperron self-assigned this Nov 18, 2020
@jacobperron
Copy link
Member

I can confirm the issue also affects Rolling.

@jacobperron jacobperron transferred this issue from ros2/launch_ros Nov 18, 2020
@jacobperron
Copy link
Member

I've transferred this issue since it affects launch generally. We can reproduce without ros2launch as well, extending the original example:

import sys

from launch import LaunchDescription
from launch import LaunchService
import launch.actions


def generate_launch_description():
    return LaunchDescription([
        launch.actions.ExecuteProcess(cmd=['./simple_script.bash'], output='screen')
    ])

if __name__ == '__main__':
    ls = LaunchService(argv=sys.argv)
    ls.include_launch_description(generate_launch_description())
    sys.exit(ls.run())

Launch with:

python3 simple.launch.py

then kill with something like:

pkill -2 python3

@jacobperron jacobperron added the bug Something isn't working label Nov 18, 2020
@jacobperron
Copy link
Member

It looks like ExecuteProcess assumes that if shutdown was initiated by a SIGINT, that the subprocess also received the signal:

send_sigint=(not cast(Shutdown, event).due_to_sigint),

@jacobperron
Copy link
Member

I'm not sure why, but we never reach the following line when sigint is sent from a separate shell:

send_sigint=(not cast(Shutdown, event).due_to_sigint),

@jacobperron jacobperron removed their assignment Nov 18, 2020
@almaslov
Copy link
Author

Yep.
Actually I only meant event stuck as a bug. But if you implement SIGINT forwarding, that would be great.

@jacobperron
Copy link
Member

I don't have time right now to look into a patch (I'm not sure exactly where in the code a fix needs to land); help is appreciated.

@almaslov
Copy link
Author

I don't have time right now to look into a patch

So do I.
In case this task takes too long, as a temporary solution for my initial problem tiny -g or dumb-init can be used as an entry point in docker to forward signal to process group.

@hidmic
Copy link
Contributor

hidmic commented Dec 10, 2020

Ok, there are several different issues here:

  1. As noted by @jacobperron, launch assumes SIGINT came via Ctrl+C i.e. SIGINT was sent to the ros2 launch process group from a shell with job control enabled. If that's not the case, the signal will not be propagated to subprocesses. That's why SIGINT is never sent and SIGTERM eventually kicks in.
  2. Even if launch did forward SIGINT, the provided repro script would still not exit as it does not perform signal forwarding on its own. Moreover, if it did forward signals by running sleep as a background process and wait'ing on it, background processes run in a non-interactive shells ignore SIGINT.
  3. Something's off with events sent from signal handlers. It does not seem it's waking up the loop. This post suggests asyncio.Queue implementation is not re-entrant and that the correct way to ensure a signal handler safely interacts with an asyncio event loop is using the add_signal_handler() method on Unix platforms. This isn't supported at all on Windows -- this pypi package suggests it's plain broken. I wonder if there's a connection between this and Tests hanging during ROS 2 CI tests build_farmer#248.

I'm not sure how to deal with (1.). Perhaps we could let launch know it's running in a non-interactive shell so it can forward or not forward signals accordingly. (2.) is on the user to properly deal with signals and job control. (3.) is the most interesting one but it requires a significant rehash of the current signal handling code. It'll take me longer.

@hidmic
Copy link
Contributor

hidmic commented Dec 10, 2020

Putting it in for Galactic solely because signal handling seems broken. FYI @clalancette.

@hidmic
Copy link
Contributor

hidmic commented Jan 22, 2021

I believe this has been solved. I'll proceed to close this ticket. Feel free to reopen if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants