-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
On Unix-like OSes, std::process::Command::spawn() should not use assert! from the child process after fork() #73894
Comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This is a good catch. We certainly try to be signal-safe in such contexts, and failing to do so is a bug. Sorry to see that this didn't get a response. I'm reopening this, as it remains a bug. |
I discovered that if rust/src/libstd/sys/unix/stack_overflow.rs Lines 92 to 106 in 891e6fe
Maybe I should open a new issue or report the abort(3) bug to glibc...
|
On Mon, Jul 20, 2020 at 01:32:22AM -0700, hyd-dev wrote:
I just realized if `libc::abort()` is not signal-safe (in glibc), then `rtabort!()` is not signal-safe, so the signal handlers in the standard library that call `rtabort!()` is actually not signal-safe:
https://github.com/rust-lang/rust/blob/891e6fee572009ff2be4d4057fb33483610c36a7/src/libstd/sys/unix/stack_overflow.rs#L92-L106
Maybe I should open a new issue or report the `abort(3)` bug to glibc.
`signal-safety(7)` lists `abort(3)` as async-signal-safe. If it isn't in
practice async-signal-safe in glibc, that's a bug and you should report
it.
|
Reported as https://sourceware.org/bugzilla/show_bug.cgi?id=26275. |
So I think |
@joshtriplett Should this be fixed like that? |
Given the above research, I don't think |
Using |
I would prefer just using |
That call chain is rust/library/std/src/sys/unix/mod.rs Lines 159 to 168 in e5e33eb
|
The note about flushing files isn't something we can rely on: https://man7.org/linux/man-pages/man3/abort.3.html
|
In upstream glibc the goal is for abort() to be AS-safe as required by the standard and it should be today (after glibc 2.27). In fact doing anything superfluous during the abort() generally constitutes a security hazard because the process can be in an unknown state when corruption is detected and the abort() called. For the sake of reducing restart latency and avoiding executing any compromised parts of the process we shut down as quickly as possible. We strongly recommend out-of-process backtracing to avoid needing to run any of that code in the process with bad or unknown state. |
It's not. It tries to hold a lock. |
Example demonstrating a deadlock. After a dozen of so runs leaves deadlocked children behind: #include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
void *lazy_abort(void *arg) {
struct timespec req = {.tv_sec = 0, .tv_nsec = 1000000 };
nanosleep(&req, NULL);
abort();
return NULL;
}
int main() {
pthread_t thread;
signal(SIGABRT, SIG_IGN);
int r = pthread_create(&thread, NULL, lazy_abort, NULL);
assert(r == 0);
while (1) {
pid_t pid = fork();
assert(pid != -1);
if (pid == 0) {
abort();
}
}
} At least the conditions for deadlock are quite unlikely, since both a child process and a thread in the parent process must abort. The musl implementation looks correct (it also uses a lock, which by itself is only a weak indicator of a potential issue). |
@tmiasko You may want to post that on the glibc bug report here: https://sourceware.org/bugzilla/show_bug.cgi?id=26275 |
@tmiasko I have a fix for glibc. Locks are basically required to avoid seeing the unmasked signal state in the child. However, you have to take that lock during fork to ensure that you own and can unlock the lock in the child (avoids the deadlock). The regression test was not straight forward to write and on average it only reproduces once every 100-600 seconds on an x86_64 VM with 4vCPU. That's good enough though and I run the test for 20s to probabilistically try to catch if the defect comes back. On average if I ran the regression test just once per dev-test cycle it might take something like 32 years to show the defect :-( Thankfully more people than just the dev-test cycle will be using the API. |
Unfortunately, avoiding |
I think we can make |
Please don't use |
|
It is: https://www.man7.org/linux/man-pages/man7/signal-safety.7.html Although you should probably use |
Sorry. I confused. You're correct. |
@Amanieu https://man7.org/linux/man-pages/man3/raise.3.html#DESCRIPTION says I can submit a PR to do that. |
IMO we should just leave it as glibc's acknowledged bug, unless folks are actively hitting this problem. While we do hope to be technically "pure", the actual abort race seems like a very artificial situation that I doubt is a problem in practice. |
@cuviper @Amanieu I've found that if I have to avoid
Fair. Also, thanks @codonell for fixing the race in glibc. I can still submit a PR to just change the |
On Unix-like OSes,
std::process::Command::spawn()
uses anassert!
to check if the pipe I/O succeeded. If thatassert!
fails, it will callpanic!
, which is not signal-safe. It's better to useif ....is_err() { std::intrinsics::abort() }
there.libc::abort()
(std::process::abort()
) shouldn't be used as a replacement, as, at least in glibc, it's not signal-safe[1][2], although all of the C standard, the C++ standard, the POSIX standard, and Linux man-pages say it should be.[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort.c;h=df98782dd7ea6c1476184a365bd9f3954f481a18;hb=refs/heads/master#l54
[2] https://www.gnu.org/software/libc/manual/html_node/Aborting-a-Program.html#Aborting-a-Program
The text was updated successfully, but these errors were encountered: