stdenvAdapters.useLibsFrom: consider reverting or changing the interface #283517

Open · SomeoneSerge opened this issue Jan 24, 2024 · 6 comments
Labels: 6.topic: cuda (Parallel computing platform and API) · 6.topic: stdenv (Standard environment)

SomeoneSerge (Contributor) commented Jan 24, 2024

Issue description

PR #275947 introduced a new attribute in stdenvAdapters: the experimental adapter useLibsFrom = stdenvForLibs: stdenvForCC: ..., which constructs a stdenv with the former argument's "standard libraries" (chiefly libstdc++) and the latter's compilers and tools. The intended use case is building shared libraries with older toolchains (e.g. gcc12Stdenv) so that they can be loaded into processes that may require a newer libstdc++ (the one from the "default" stdenv, e.g. gcc13). The particular form of the adapter is the result of a brief exchange on Matrix, not the product of thoughtful consideration or any reasonable review process.
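A minimal usage sketch, assuming the curried form quoted above (the derivation name and source are placeholders):

```nix
# Sketch only: "standard libraries" (chiefly libstdc++) come from the first
# stdenv, compiler and tools from the second.
with import <nixpkgs> { };

(stdenvAdapters.useLibsFrom stdenv gcc12Stdenv).mkDerivation {
  name = "built-with-gcc12-against-default-libstdcxx";  # placeholder
  src = ./.;                                            # placeholder
}
```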

Impossible promises

As explicated in #282356, for this to actually work in non-trivial settings we'd need to be able to compose multiple cc-wrappers (in particular, through their cc and gccForLibs arguments). This may not be trivial to implement, in which case the new adapter is dangerous: it promises something we cannot provide.

Leaky interfaces

Additionally, the "adapter" interface is expressed in terms of wrapped values: it doesn't accept e.g. a ccForLibs or a cxxStdlib (a made-up name referring to either libcxx or gcc.cc.lib) directly, but only boxed inside a stdenv. The same applies to the cc-wrapper itself: it takes two different inputs, libcxx and gccForLibs, which affect the same aspect of the wrapper and may be specified individually or simultaneously. In the latter case, inferring the priorities requires consulting the documentation.
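For illustration, a hedged sketch of the two overlapping inputs (wrapCCWith forwards extra attributes such as gccForLibs to the cc-wrapper; which input takes precedence is exactly the ambiguity described above):

```nix
# Sketch: both knobs influence which C++ standard library the wrapper uses;
# figuring out which one wins requires reading the cc-wrapper documentation.
let
  pkgs = import <nixpkgs> { };
in
pkgs.wrapCCWith {
  cc = pkgs.gcc12.cc;                    # the actual compiler
  gccForLibs = pkgs.gcc13.cc;            # supplies a newer libstdc++
  # libcxx = pkgs.llvmPackages.libcxx;   # the other, competing input
}
```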

It may be a revealing symptom that downstream expressions are sometimes tempted to inspect the cc-wrapper for their needs, e.g. to infer which implementation of the C++ standard library is being used, or whether the shared libraries' location is expected to be prefixed with the platform's "triple". A more frequent and innocuous need is to infer whether the wrapper contains a gcc or a clang. We should perhaps set (document) explicit boundaries so that the cc-wrapper may be treated as a black box, and one with a minimal surface. It's likely OK to ask cc.isClang. On the other hand, a downstream package trying to directly access the C++ standard library might signal that we need to expose a more constrained tool (possibly useLibsFrom is one). The kind of introspection in question is sketched below.
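A deliberately simplified illustration; isClang is a real wrapper attribute, while treating cc.cc.lib as "the C++ standard library" is precisely the leaky assumption being criticized:

```nix
# Illustrative sketch only; the attribute paths beyond isClang are assumptions
# about today's wrapper internals rather than a supported interface.
let
  pkgs = import <nixpkgs> { };
  stdenv = pkgs.stdenv;
in
if stdenv.cc.isClang
then pkgs.llvmPackages.libcxx   # assumes a libc++-based clang toolchain
else stdenv.cc.cc.lib           # assumes gcc's libstdc++ lives in the lib output
```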

Some possibly motivating examples:

Rigid interfaces

The current interface might also be too rigid. Example: the current form of useLibsFrom specifies a single library to use at both build- and run-time. We couldn't express, e.g., a requirement like "use the oldest available libc/libstdc++ with a slightly less old gcc, but then put the newest possible libstdc++ in the runpaths". I don't know whether this is a valid use case (why not also use the oldest possible gcc?), but if it were, we'd fail to accommodate it. A hypothetical, more flexible signature is sketched below.
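A purely hypothetical signature (not something that exists in nixpkgs; the name and arguments are made up for illustration) that could express such a requirement:

```nix
# Hypothetical sketch: separate the stdenv whose libstdc++ is seen at
# build/link time from the one whose libstdc++ ends up in the runpaths.
{
  useLibsFromSplit =
    { buildLibsFrom    # e.g. the oldest available libc/libstdc++
    , runtimeLibsFrom  # e.g. the default stdenv, for the newest libstdc++
    , ccFrom           # e.g. a slightly less old gcc
    }:
    # No implementation implied; whether this is even feasible is the question.
    throw "sketch only";
}
```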

Notify the maintainers

I've no experience with the wrapper or with stdenv, so I'm hoping for some advice here. @RaitoBezarius has dropped some comments on the matter before, even though this probably isn't his main area of interest. I understand @Ericson2314, @trofi, @aszlig, and @a-n-n-a-l-e-e have been deeply involved with these subsystems; do any of you have the time? Naturally, ideas from @amjoseph-nixpkgs would be very welcome.

SomeoneSerge added the 6.topic: stdenv (Standard environment) label on Jan 24, 2024
trofi (Contributor) commented Jan 24, 2024

Dropping useLibsFrom (or turning it into an unconditional error) to prevent proliferation of users sounds reasonable.

Silly question: why not use the default compiler and the libstdc++ / libc++ that comes with it? Skimming through #275947, the only broken user I noticed is a tensorflow library that fails to build. It feels like it only needs a few missing #include files, which could be added without much breakage. Is that about right? Or are there other, more complex breakages? I can help fix trivial migration issues and debug more complex breakages (as long as they don't require special hardware).

I'm asking to get an idea of the intended use case for swapping a C++ standard library implementation out from under the compiler.

In theory the implementation could be swapped out. In practice I see a bunch of constraints that will probably require quite a bit of engineering:

  • Picking the right library at link time and at run time: for the swap-out mechanism to work, we need to run (and ideally link as well) against the highest libstdc++ version in the whole linked closure. nixpkgs does not have reasonable tools for that today and is prone to https://trofi.github.io/posts/248-how-do-shared-library-collisions-break.html. Using a single default library is much simpler.
  • What to swap: in theory libstdc++ is portable across compilers; in practice, compatibility errors slip into headers, as it's not a very frequently exercised path. Do you need to override both the headers and the library, or just the library? The two have different failure modes. Using the library that matches the compiler is a lot simpler.
  • libc++ vs libstdc++ interplay: do you want to allow overriding with libc++ against a libstdc++ toolchain, or vice versa?

I'd say both nuts are tough to crack in nixpkgs. But maybe a special case could be crafted in a more robust form if we look at the specific failure modes we have. Would it be reasonable to look at those in more detail?

SomeoneSerge (Contributor, Author) commented:

> Silly question: why not use the default compiler and the libstdc++ / libc++ that comes with it?

The intent is to use the old compiler, but omit the old libstdc++ from the closure and from the runpaths. Keeping only the newest libstdc++ at runtime is important for shared libraries, e.g. for native python extensions.

There are two use-cases: 1) intermittent breakages like e.g. tensorflow and openvino right now (I didn't scan for more, but historically there have been many), which are usually fixed with a bunch of includes, but it's a burden for the maintainer to find those and prepare the patches; 2) variants of packages that consistently require the older toolchain, e.g. those using cuda's compiler driver.

> Using a single default library is much simpler

👍🏻 This is exactly the intended usage. The issue is that if we keep an "adapter", it should do something sensible outside those intended use-cases...

> libc++ vs libstdc++ interplay: do you want to allow overriding with libc++ against a libstdc++ toolchain, or vice versa?

My understanding is that libcxx is more or less a replacement for libstdc++, so only one should be used at a time? I don't know how to handle libc++ in general, but I want the general solution, restricted to .*linux and libstdc++, to lead to always using the newest libstdc++ at runtime.

> or turning it into an unconditional error

I think it's still in early stages and shouldn't have metastasized 😆

trofi (Contributor) commented Jan 24, 2024

> There are two use-cases: 1) intermittent breakages like e.g. tensorflow and openvino right now (I didn't scan for more, but historically there have been many), which are usually fixed with a bunch of includes, but it's a burden for the maintainer to find those and prepare the patches

From a process standpoint, I would say that the most prominent broken users should have been handled as part of the compiler update, as a prerequisite before the merge into staging, or at least a prerequisite before merging into master. Perhaps there is a place to add at least tensorflow as a nixpkgs-unstable channel blocker (unless it's already there)?

Otherwise it makes sense to hold back the package on a previous compiler version (if we had a mechanism for that today).

> 2) variants of packages that consistently require the older toolchain, e.g. those using cuda's compiler driver.

Does the cuda compiler target x86_64? How does it differ from gcc / libstdc++? What does the story of porting to a new compiler usually look like?

SomeoneSerge (Contributor, Author) commented Jan 24, 2024

> From a process standpoint...

Blocking nixpkgs-unstable is probably reasonable, but blocking python-updates/staging/master is somewhat unrealistic: it's too many constraints, and there are way too few maintainers on the python side of things (often just one, if we're honest). Blocking the unstable branches sounds reasonable; I'll check whether we already do that.

> Does the cuda compiler target x86_64? How does it differ from gcc / libstdc++?

  • We support x86_64-linux and aarch64-linux.
  • So far we only have experience with nvidia's "compiler driver", nvcc.
  • There are also cuda targets in LLVM, with which we don't yet have any experience.
  • The nvcc "compiler driver" relies on either gcc or clang as a back-end. It splits the sources into host and device parts. It imposes major-version upper bounds on the compatible back-end compilers.
  • We maintain several package scopes under cudaPackages_XX_Y, corresponding to nvidia's releases. Each pins its own back-end gcc. When a package (say, torch) is configured with CUDA support, it takes its stdenv from cudaPackages instead of pkgs (this also looks pretty cumbersome, and we'd be happy to remove it).
  • We wrap the compilers so as to use the newest possible libstdc++ even when using an old toolchain. Up until a week ago we'd been doing this ad hoc; right now we (ab)use the stdenv adapter in question (a rough sketch of this pattern follows the list).
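A rough sketch of that pattern (attribute names as used in nixpkgs' CUDA package sets at the time of writing; treat the plumbing as simplified, and the derivation itself as a placeholder):

```nix
# Each pinned CUDA scope carries a backendStdenv whose compiler is a gcc
# major version nvcc accepts, re-wrapped so that the newest libstdc++ still
# ends up in the runpaths.
let
  pkgs = import <nixpkgs> { };
  cudaPackages = pkgs.cudaPackages_12_2;  # one of the pinned scopes
in
cudaPackages.backendStdenv.mkDerivation {
  name = "cuda-consumer-sketch";  # placeholder
  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];
  src = ./.;                      # placeholder
}
```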

To sum up, cuda-stuff requiring old toolchains is no different from tensorflow requiring old toolchains

> Otherwise it makes sense to hold back the package on a previous compiler version (if we had a mechanism for that today).

Yes. My suggestion is that we introduce a mechanism for that (useLibsFrom is one possible implementation), and deliberately use it during gcc updates instead of leaving broken packages behind or doing the upstreams' job of patching for the new toolchain.
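What "holding back" could look like, as a heavily hedged sketch; it assumes the package's expression accepts a stdenv argument (not all of them do), and uses the adapter only as one candidate mechanism:

```nix
# Hypothetical: build tensorflow with a gcc12 toolchain while keeping the
# default (newer) libstdc++ in its runpaths.
let
  pkgs = import <nixpkgs> { };
  heldBackStdenv = pkgs.stdenvAdapters.useLibsFrom pkgs.stdenv pkgs.gcc12Stdenv;
in
# Assumption: the expression declares `stdenv` among its arguments.
pkgs.python3Packages.tensorflow.override { stdenv = heldBackStdenv; }
```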

SomeoneSerge (Contributor, Author) commented Jan 25, 2024

An alternative question to ask is: is it reasonable for anyone to rely on gcc11Stdenv coming with gcc11's libstdc++? I'd say that right now it looks like an internal detail, because the gccXXStdenvs behave differently from clangStdenv:

```
❯ nix build github:NixOS/nixpkgs/master#clangStdenv.cc
❯ cat result/nix-support/cc-ldflags
-L/nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/lib/gcc/x86_64-unknown-linux-gnu/13.2.0 -L/nix/store/np3cndfk53miqg2cilv7vfdxckga665h-gcc-13.2.0-lib/lib ...
❯ nix build github:NixOS/nixpkgs/master#gcc11Stdenv.cc
❯ cat result/nix-support/cc-ldflags
-L/nix/store/qzd8pw1rmpbnzl3ivf549i41pilx3xyj-gcc-11.4.0-lib/lib
```

So maybe, instead of attempting to generalize useLibsFrom, we should consider changing the meaning of the gccXXStdenvs?

ConnorBaker (Contributor) commented:

@SomeoneSerge raised an excellent point in #281371 (comment):

> I agree with switching to .override now, but I'd like to emphasize that the issue is relevant in the short term (1mo) too, because it'll be harder to delete the adapter after 24.05

Is there a way to track this issue as something which should be addressed prior to 24.05?

ConnorBaker added the 6.topic: cuda (Parallel computing platform and API) label on Apr 24, 2024