
Accounting is wrong when 32-bit build launches a 64-bit binary #1

Closed

tavianator opened this issue Nov 17, 2021 · 4 comments

tavianator commented:

Continuing the discussion from sharkdp/fd#410 (comment)

The actual argv/envp pointers are counted in units of size_of::<*const c_char>():

argmax/src/unix.rs, lines 50 to 54 at 51575d7:

pub(crate) fn arg_size<O: AsRef<OsStr>>(arg: O) -> i64 {
size_of::<*const c_char>() as i64 // size for the pointer in argv**
+ arg.as_ref().len() as i64 // size for argument string
+ 1 // terminating NULL
}

But this is wrong if a 32-bit program launches a 64-bit one on a 64-bit kernel: the argv/envp pointers in the new 64-bit process take 8 bytes each, so the 32-bit build's 4-byte accounting under-counts the total. This leads to this CI failure: https://github.com/sharkdp/argmax/runs/4231259608?check_suite_focus=true. It can be reproduced locally with

$ rustup target add i686-unknown-linux-gnu
$ cargo test --target=i686-unknown-linux-gnu
...
Trying execution with 260638 args
thread 'can_run_command_with_maximum_number_of_arguments' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 7, kind: ArgumentListTooLong, message: "Argument list too long" }', tests/main.rs:49:30
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

A conservative fix would be to assume that pointers are at least 8 bytes wide on all platforms. Ideally we'd only do this on 64-bit kernels, but I don't know a good way to detect that from userspace.
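A sketch of that conservative accounting in Rust, assuming a hard-coded 8-byte floor for the pointer size (this illustrates the proposal; it is not necessarily the exact code that later landed):

use std::ffi::OsStr;
use std::mem::size_of;
use std::os::raw::c_char;

/// Assumed floor: the exec'd program may use 8-byte pointers even if
/// this build is 32-bit, so never count less than 8 bytes per pointer.
const MIN_POINTER_SIZE: i64 = 8;

pub(crate) fn arg_size<O: AsRef<OsStr>>(arg: O) -> i64 {
    let ptr_size = (size_of::<*const c_char>() as i64).max(MIN_POINTER_SIZE);
    ptr_size // size for the pointer in argv**
        + arg.as_ref().len() as i64 // size for the argument string
        + 1 // terminating NUL
}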

tavianator commented:

This bug affects bfs too. I just implemented a binary search to find ARG_MAX empirically, which is much better than before: tavianator/bfs@ac1d28e

tavianator commented:

GNU xargs tries to do the same thing: https://git.savannah.gnu.org/cgit/findutils.git/tree/lib/buildcmd.c?id=372cd34894e247fe5c2991eb75185ea2ec850ee2#n190

But their implementation is buggy, because limit can end up far outside [largest_successful_arg_count, smallest_failed_arg_count]. In my case I get shift == 1 and it has to reduce limit from ~30k to ~10k, which takes forever:

$ ./configure CFLAGS="-m32 -g" TIME_T_32_BIT_OK=yes
$ make
$ ulimit -s 512
$ yes foo | head -n100000 | time xargs echo >/dev/null
xargs echo > /dev/null  0.03s user 0.08s system 102% cpu 0.111 total
$ yes foo | head -n100000 | time ./xargs/xargs echo >/dev/null
./xargs/xargs echo > /dev/null  15.91s user 7.06s system 101% cpu 22.550 total

sharkdp (Owner) commented Nov 18, 2021:

Thank you very much! I implemented the proposed fix in 2057ea1. This fixes the unit tests for i686-unknown-linux-gnu.

The updated status is here: #2. The only thing that would still be nice to fix is the unreasonably small limit on musl targets.

> This bug affects bfs too. I just implemented a binary search to find ARG_MAX empirically, which is much better than before: tavianator/bfs@ac1d28e

So even with the conservative guess for the pointer size, do you think there will always be cases where the command line length can overflow? Will such a backup strategy be needed for fd as well?

By binary search, do you mean exponential backoff, i.e. dividing the number of arguments by two until it works? Or do you really go up again after you've found a working number of arguments? In that case, would bfs -exec do something like the following?

  • try with 400,000 arguments => fails
  • try with 200,000 arguments => fails
  • try with 100,000 arguments => succeeds => i.e. the first 100,000 search results have been processed; 300,000 to go
  • try with 150,000 arguments => fails
  • try with 125,000 arguments => succeeds => i.e. the first 225,000 search results have now been processed…

tavianator commented:

> By binary search, do you mean exponential backoff, i.e. dividing the number of arguments by two until it works? Or do you really go up again after you've found a working number of arguments?

Yeah, the limit can go back up:

bfs: -D exec: -exec: Got E2BIG, shrinking argument list...
bfs: -D exec: -exec: ARG_MAX between [0, 2086316], trying 1043158
bfs: -D exec: -exec: ARG_MAX between [1043085, 2086316], trying 1564700
bfs: -D exec: -exec: ARG_MAX between [1564682, 2086316], trying 1825499
bfs: -D exec: -exec: ARG_MAX between [1825451, 2086316], trying 1955883
bfs: -D exec: -exec: ARG_MAX between [1955849, 2086316], trying 2021082
bfs: -D exec: -exec: Got E2BIG, shrinking argument list...
bfs: -D exec: -exec: ARG_MAX between [1955849, 2021017], trying 1988433
...
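The idea, as a rough Rust sketch (bfs itself is written in C; try_exec here is a hypothetical callback that runs the command with as many arguments as fit in the given byte budget and reports whether it avoided E2BIG):

/// Track a window [lo, hi] where `lo` bytes are known to work and
/// `hi` bytes are known to fail, and probe the midpoint on each run.
fn search_arg_max(mut lo: i64, mut hi: i64, mut try_exec: impl FnMut(i64) -> bool) -> i64 {
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if try_exec(mid) {
            lo = mid; // succeeded: the real limit is at least mid
        } else {
            hi = mid; // got E2BIG: the real limit is below mid
        }
    }
    lo
}

Each successful probe is a real execution that consumes arguments, so the bounds are refined across successive command invocations rather than up front, as in the -exec walkthrough above.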
