doc/stdenv/cross-compilation.chapter.md: explain tuples #180030

Closed
wants to merge 9 commits into from
9 changes: 8 additions & 1 deletion doc/stdenv/cross-compilation.chapter.md
@@ -44,7 +44,14 @@ The exact schema these fields follow is a bit ill-defined due to a long and conv

`config`

: This is a 3- or 4- component shorthand for the platform. Examples of this would be `x86_64-unknown-linux-gnu` and `aarch64-apple-darwin14`. This is a standard format called the "LLVM target triple", as they are pioneered by LLVM. In the 4-part form, this corresponds to `[cpu]-[vendor]-[os]-[abi]`. This format is strictly more informative than the "Nix host double", as the previous format could analogously be termed. This needs a better name than `config`!
: This is a 3-, 4-, or 5- component shorthand for the platform. Examples of this would be `x86_64-unknown-linux-gnux32`, `aarch64-apple-darwin14`, and `mips64el-unknown-linux-muslabin32`. This is a standard format called the "[multiarch tuple](https://wiki.debian.org/Multiarch/Tuples)", as [pioneered by autoconf](https://www.gnu.org/software/autoconf/manual/autoconf-2.65/html_node/System-Type.html#System-Type), [disambiguated to create multiarch](https://wiki.debian.org/Multiarch/Tuples#Used_solution), and adopted by LLVM. In the 5-part form, this corresponds to `[cpu]-[vendor]-[os]-[libc][abi]`; note that there is no hyphen separating the `[libc]` field from the `[abi]` field. This format is strictly more informative than the "Nix host double", as the previous format could analogously be termed. This needs a better name than `config`!
Member

I'd honestly leave Multiarch out of this. It is yet another system that doesn't necessarily have anything to do with us, since it is what Debian does. autoconf, OTOH, is used for figuring out `builtins.currentSystem`, so it makes sense as a point of reference for us.

Another problem with multiarch is that we don't actually do multiarch (in the sense of the directory layout), and getting a multiarch (i.e. multilib) gcc to work is a struggle (if it's even possible?), so using the term may lead to confusion.

Note also that autoconf itself calls its (1), 2, 3 or 4 component platform strings target triplets, canonical system type or canonical name.

Considering the string to be up to 5 components is a bad idea, especially for documentation purposes. Both nixpkgs and autoconf only allow up to 4 dash-separated components. While autoconf surely globs on the 4th component (in fact it probably does on all of them), considering it as (sometimes) multi-field is a stretch. Nixpkgs treats it as an opaque string and this is, in my opinion, the correct interpretation, even though some of those strings relate to each other via some structure. The used libc is derived from the ABI in nixpkgs and never directly derived from the platform string.

Nix host double is a bad term, it'd be better to call them Nix systems or Nix system doubles for consistency. It's also not the “previous format”, but very much used today.

Author

@ghost ghost Jul 4, 2022

> If you're interested, I have a note somewhere on my laptop from when I researched this topic, maybe that's helpful for your purposes.

Yes, I am interested in that.

> I'd honestly leave Multiarch out of this.

Okay.

> Note also that autoconf itself calls its (1), 2, 3 or 4 component platform strings target triplets

I think it is slightly bonkers to call a four-component thing a "triple".

> Both nixpkgs and autoconf only allow up to 4 dash-separated components

Unfortunately the last two components are not separated by a dash.

> The used libc is derived from the ABI in nixpkgs

This is not true! Both `mips64el-linux-gnuabin32` and `mips64el-linux-muslabin32` use the same ABI, n32, yet they use different libcs. If the libc were derived from the ABI then this would be impossible!

This is exactly the sort of confusion that comes from people pretending that the last two components are really one big component.
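
A minimal sketch of the reading argued for in this comment, where the last dash-separated field is a libc name immediately followed by an optional ABI name. The helper `split_libc_abi` and its prefix list are hypothetical and cover only the examples in this thread; this is not how nixpkgs or autoconf parse the string.

```python
# Hypothetical helper: splits the final field of a tuple into (libc, abi).
# The prefix list is an assumption that covers only the examples in this thread.
KNOWN_LIBC_PREFIXES = ("gnu", "musl")


def split_libc_abi(config):
    last_field = config.split("-")[-1]
    for libc in KNOWN_LIBC_PREFIXES:
        if last_field.startswith(libc):
            # Whatever follows the libc name is treated as the ABI (may be empty).
            return libc, last_field[len(libc):]
    # No known libc prefix: leave the field uninterpreted.
    return None, last_field


for config in ("mips64el-linux-gnuabin32", "mips64el-linux-muslabin32"):
    print(config, split_libc_abi(config))
```

Under this reading the two MIPS examples share the same ABI part and differ only in the libc part.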

Author

> Nix host double is a bad term, it'd be better to call them Nix systems or Nix system doubles for consistency. It's also not the “previous format”, but very much used today.

FWIW, I didn't write that sentence; it was preexisting text. I have, however, updated it with your recommendation.

Member

> This is not true! Both `mips64el-linux-gnuabin32` and `mips64el-linux-muslabin32` use the same ABI, n32, yet they use different libcs. If the libc were derived from the ABI then this would be impossible!
>
> This is exactly the sort of confusion that comes from people pretending that the last two components are really one big component.

As you can see, nixpkgs does not consider these ABIs the same:

nix-repl> :p (lib.systems.elaborate { config = "mips64el-linux-gnuabin32"; }).parsed.a
{ _type = "abi"; abi = "n32"; name = "gnuabin32"; }

nix-repl> :p (lib.systems.elaborate { config = "mips64el-linux-muslabin32"; }).parsed.
{ _type = "abi"; abi = "n32"; name = "muslabin32"; }

The libc is derived after the triple itself has been parsed, by looking at the ABI, but you can of course also theoretically override this: `lib.systems.elaborate { config = "mips64el-linux-muslabin32"; libc = "glibc"; }` (where the ABI would stay musl and even `isMusl == true`). This is why I think it makes more sense to treat the ABI component as some opaque string which we can sometimes extract more information out of (n32 ABI, musl “ABI”).
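
A toy model of the behaviour described in this comment: the last field is looked up as one opaque name, the libc is derived from it afterwards, and an explicit `libc` argument overrides only the derived value. The `ABI_TABLE` contents and the `elaborate` function here are illustrative assumptions written in Python; they are not the nixpkgs implementation.

```python
# Toy model only: the table and the derivation rule are assumptions,
# not a copy of nixpkgs' lib.systems code.
ABI_TABLE = {
    "gnu": {},
    "musl": {},
    "gnuabin32": {"abi": "n32"},
    "muslabin32": {"abi": "n32"},
}


def elaborate(config, libc=None):
    name = config.split("-")[-1]
    # The whole last field is one opaque ABI name with optional extra attributes.
    parsed_abi = {"name": name, **ABI_TABLE[name]}
    is_musl = name.startswith("musl")
    derived_libc = "musl" if is_musl else "glibc"
    return {
        "abi": parsed_abi,
        "isMusl": is_musl,
        # An explicit libc wins, but the parsed ABI and isMusl are unaffected.
        "libc": libc if libc is not None else derived_libc,
    }


print(elaborate("mips64el-linux-muslabin32", libc="glibc"))
```

In this model the override changes `libc` while `isMusl` stays true, which is the situation the comment describes.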

Overall I guess it is a bit of a problem that we conflate ABI and libc in nixpkgs in a weird way which is also not done by autoconf (the third component for autoconf can either be os or kernel-system which is much vaguer (e.g. `linux-musl`)).

> I think it is slightly bonkers to call a four-component thing a "triple".

Yes, but I'm not sure about a good alternative that is also in use; LLVM's triples are also not always three-component strings.

> Yes, I am interested in that.

https://gist.github.com/sternenseemann/a00d91b8e58cca3e18792771483b4c25

Author

@ghost ghost Jul 6, 2022

> As you can see, nixpkgs does not consider these ABIs the same:

That is an egregious nixpkgs bug which really should be fixed.

n32 is an ABI and so is x32; `gnuabin32`, `muslabin32`, `gnuabix32`, and `muslabix32` are not ABIs.

> musl “ABI”).

Musl isn't an ABI... they maintain a list of ABIs they support and links to what they use as the definition for each ABI.

ABIs exist even when there is no libc around:

  • A statically-linked `no_std` Rust binary will have no libc involved at all (neither "musl" nor "gnu" nor anything else), yet it still has to pick an ABI if it wants to make system calls into the kernel.
  • When enabling seccomp (for example for the nix sandbox) a process needs to declare what ABIs it will use, in order to enable system call filtering. This is totally independent of libc choice.
  • `file` can easily detect the ABI of a statically-linked ELF binary (it's part of the magic bytes), but not which libc (if any) it includes.
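
To illustrate the last point, here is a rough sketch of reading the ABI from an ELF header without any knowledge of the libc. It assumes the MIPS convention that the n32 ABI is marked by the `EF_MIPS_ABI2` bit (0x20) in the header's `e_flags` field; the function name `mips_uses_n32` is made up for this example, and this is not how `file` itself is implemented.

```python
import struct

EM_MIPS = 8          # e_machine value for MIPS
EF_MIPS_ABI2 = 0x20  # e_flags bit assumed here to mark the n32 ABI


def mips_uses_n32(header):
    """Return True if these ELF header bytes describe a MIPS n32 object."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    # e_flags follows e_entry, e_phoff and e_shoff, whose width depends on the class.
    flags_offset = 48 if is_64bit else 36
    (e_flags,) = struct.unpack_from(endian + "I", header, flags_offset)
    return e_machine == EM_MIPS and bool(e_flags & EF_MIPS_ABI2)


# Synthetic little-endian ELF32 header, built only for demonstration.
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
fake_n32 = ident + struct.pack("<HHIIIII", 2, EM_MIPS, 1, 0, 0, 0, EF_MIPS_ABI2)
print(mips_uses_n32(fake_n32))
```

Nothing in the header fields read here says which libc, if any, was linked in.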

> (the third component for autoconf can either be os or kernel-system which is much vaguer (e.g. `linux-musl`)).

As far as I can tell, autoconf considers any part matching the regex `e?abi[^-]*$` to be a comment field. As I mentioned elsewhere, `arm-unknown-linux-eabicrazypants` is a valid autoconf name.
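
A small sketch of what that regex matches, using Python's `re` module. It only shows which suffix of a tuple the pattern picks out; whether autoconf really treats that suffix as a comment field is the claim made above and is not verified here. The helper name `abi_suffix` is made up for the example.

```python
import re

# The regex quoted above. How autoconf applies it is the commenter's reading,
# which this sketch does not check.
ABI_SUFFIX = re.compile(r"e?abi[^-]*$")


def abi_suffix(config):
    """Return the trailing part of the tuple matched by the regex, or None."""
    match = ABI_SUFFIX.search(config)
    return match.group(0) if match else None


for config in (
    "arm-unknown-linux-eabicrazypants",
    "mips64el-unknown-linux-muslabin32",
    "x86_64-unknown-linux-gnu",
):
    print(config, abi_suffix(config))
```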

> > I think it is slightly bonkers to call a four-component thing a "triple".
>
> Yes, but I'm not sure about a good alternative that is also in use

canonical?


"Multiarch tuple" means exactly the same thing as "autoconf tuple" except in [two specific cases](https://wiki.debian.org/Multiarch/Tuples) dealing with 32-bit architectures:

1. Autoconf has multiple `[cpu]` fields for 32-bit x86 systems (`i386-`, `i486-`, `i586-`, and `i686-`). Multiarch uses `i386-` for all of them.
2. The 32-bit ARM ABI for systems with hardware floating point is incompatible with the ABI for systems without floating point. Autoconf tuples use the same tuples (`arm-*-*eabi`) for both of these ABIs; multiarch distinguishes between them as `arm-*-*eabi` and `arm-*-*eabihf`.

This field should be *canonicalized*. The rules for canonicalizing a tuple are kept in the `config.guess` file in the source code for `autoconf`.

`parsed`
