Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/source-types.nix: remove redundant sourceType #175498

Closed
wants to merge 2 commits into from
Closed

lib/source-types.nix: remove redundant sourceType #175498

wants to merge 2 commits into from

Conversation

ghost
Copy link

@ghost ghost commented May 30, 2022

Description of changes

The provenance type fromSource is redundant, and adds numerous special edge cases. Let's remove it. I pointed this out during the review of #161098; @risicle asked that I submit this as a seperate subsequent PR in order to not further delay the merge of #161098. Therefore, I am submitting it now.

The fromSource provenance type is redundant; the following are semantically indistinguishable:

  meta = {
    sourceProvenance = [];
  }

  meta = {
    sourceProvenance = [ lib.sourceTypes.fromSource ];
  }

Furthermore, fromSource gets special treatment: "a package with no meta.sourceProvenance set implies it has no known sourceTypes other than fromSource.".

This complicates the "is this package built from source" routine. Consider the exemplar allowNonSourcePredicate to permit non-source firmware given in check-meta.nix. This predicate is likely to be a very popular policy choice for nixpkgs users, due to the firmware-saturated state of the consumer-grade hardware market. It changes from:

  allowNonSourcePredicate = with lib.lists; pkg: !(any (p: !p.isSource && p!=lib.sourceTypes.binaryFirmware) (toList pkg.meta.sourceProvenance)

to

  allowNonSourcePredicate = with lib.lists; pkg: any (p: p!=lib.sourceTypes.binaryFirmware) (toList pkg.meta.sourceProvenance);

and, if #175495 merges, even further to:

  allowNonSourcePredicate = pkg: lib.lists.any (p: p!=lib.sourceTypes.binaryFirmware) pkg.meta.sourceProvenance;

Similar simplifications will be enjoyed by any predicates which users write for themselves, or any "provenance functions" which express the sourceProvenance of one package as a function of the sourceProvenance of some subpackage.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

The provenance type `fromSource` is redundant, and adds numerous
special edge cases.  Let's remove it.  I pointed this out during the
review of #161098; @risicle asked that I submit this as a seperate
subsequent PR in order to not further delay the merge of #161098.
Therefore, I am submitting it now.

The `fromSource` provenance type is redundant; the following are
semantically indistinguishable:

```
  meta = {
    sourceProvenance = [];
  }

  meta = {
    sourceProvenance = [ lib.source-types.fromSource ];
  }
```

Furthermore, `fromSource` gets special treatment: "a package with no
`meta.sourceProvenance` set implies it has no *known* `sourceType`s
**other than `fromSource`.**".

This greatly complicates the "is this package built from source"
routine.  Consider the exemplar `allowNonSourcePredicate` to permit
non-source firmware given in check-meta.nix.  This predicate is likely
to be a very popular policy choice for nixpkgs users, due to the level
of firmware-saturation in consumer-grade hardware.  It changes from:

```
  allowNonSourcePredicate = with lib.lists; pkg: !(any (p: !p.isSource && p!=lib.sourceTypes.binaryFirmware) (toList pkg.meta.sourceProvenance)
```

to

```
  allowNonSourcePredicate = lib.lists.any (p: p!=lib.sourceTypes.binaryFirmware) (toList pkg.meta.sourceProvenance);
```

and, if #175495 merges, even further to:

```
  allowNonSourcePredicate = lib.lists.any (p: p!=lib.sourceTypes.binaryFirmware) pkg.meta.sourceProvenance;
```

Similar simplifications will be enjoyed by any predicates which users
write for themselves, or any "provenance functionals" which express
the `sourceProvenance` of one package as a function of the
`sourceProvenance` of some subpackage.
@github-actions github-actions bot added 6.topic: stdenv Standard environment 8.has: documentation This PR adds or changes documentation labels May 30, 2022
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels May 30, 2022
@ghost
Copy link
Author

ghost commented May 30, 2022

@risicle should be added as a reviewer (am I able to add reviewers to my own PRs?)

@pennae pennae requested a review from risicle June 1, 2022 21:17
@risicle
Copy link
Contributor

risicle commented Jun 5, 2022

Without fromSource there would be no way of differentiating between a package that is entirely downloaded binaries and a package which only has some of its dependencies as binaries. See

sourceProvenance = with sourceTypes; [
vs.
sourceProvenance = with sourceTypes; [ binaryBytecode ];
. The practical difference between these two is we have some ability to patch the former if necessary. We should perhaps amend the guidelines to state that a missing or null sourceProvenance is assumed to be [ fromSource ], whereas an empty list has no special meaning.

What we probably neglected to do is add an equivalent of allowlistedLicenses and blocklistedLicenses to make this easier to do on a per-sourceType basis. Honestly it's a bit of a weird interface IMHO and would be more flexible as a callable function, but it would probably be better to match the license-handling behaviour.

@ghost
Copy link
Author

ghost commented Jun 6, 2022

a package which only has some of its dependencies as binaries.

In that case, one of the package's buildInputs (or nativeBuildInputs) would have its own meta.sourceProvenance = [ binarySomething ]. In your ghidra example, it is the separate mkDerivation that you already have for the dependencies:

deps = stdenv.mkDerivation {

You can just add a one-attribute meta = { sourceProvenance = [ binarySomething ]; } to that mkDerivation. This approach documents which of the build process inputs are contributing not-built-from-source software to the final result, rather than just conveying "some are, some aren't".

Note that even fetchurl, fetchSourceHut, etc all boil down to mkDerivation calls, so you can even tag those instead if you prefer! Check this out:

https://github.com/NixOS/nixpkgs/blob/b016543b93f3eeecb1aeb0c300b46e33bf10f9e8/pkgs/applications/misc/hello/default.nix#L17

With allowNonSource=false in your config, nix-build . -A hello will correctly refuse to build it.

We should perhaps amend the guidelines to state that a missing or null sourceProvenance is assumed to be [ fromSource ], whereas an empty list has no special meaning.

That would leave no way to express "no promises, because nobody has looked yet", which is by far the most common situation right now (and probably for the forseeable future). I think the fact that a missing attribute means "no promises" was one of the strong points of your RFC. I liked that a lot, I hope we keep it.

it would probably be better to match the license-handling behaviour.

I may have mentioned this before, but meta.license was the result of evolution rather than design, and as a mandatory field present in every package definition it is really painful to change anything about it. Also: sourceProvenance already diverges from it simply because you're allowed to omit sourceProvenance entirely, but license is mandatory.

@risicle
Copy link
Contributor

risicle commented Jun 7, 2022

...marking src derivations with provenance...

This is something I thought about while I've been trawling through nixpkgs - the problem with it is in many cases a file we download contains binary elements, but we don't use them. A number of java packages for instance include a pre-built .jar for convenience, but we ignore it and build from source. We also have some packages that very cleverly download an electron-based app, pull the app.asar file out of it (which is essentially like a tarball of minified javascript and other resources) and jam it in to our own electron. So the src has all kinds of binary elements, but the resulting package is clean.

Which reminds me we should probably add a minifiedSource type.

We should perhaps amend the guidelines to state that a missing or null sourceProvenance is assumed to be [ fromSource ], whereas an empty list has no special meaning.

That would leave no way to express "no promises, because nobody has looked yet", which is by far the most common situation right now (and probably for the forseeable future). I think the fact that a missing attribute means "no promises" was one of the strong points of your RFC. I liked that a lot, I hope we keep it.

Could still explicitly set meta.sourceProvenance = null;.

it would probably be better to match the license-handling behaviour.

I may have mentioned this before, but meta.license was the result of evolution rather than design, and as a mandatory field present in every package definition it is really painful to change anything about it. Also: sourceProvenance already diverges from it simply because you're allowed to omit sourceProvenance entirely, but license is mandatory.

I realize this, but it still doesn't mean source types should diverge unless they are explicitly better (I won't say indisputably because this is the internet). Ideally using a design that is able to be ported back to the licenses system if it's successful.

The licenses-style design starts off with a couple of points in its column because of consistency, and it also has the advantage of being able to filter both "by license" and "by package" with short expressions. So - I'd only personally support something that can do this and more.

For a clean-slate design, my thinking would be around having a single (replaceable) default deciding function that calls out to another per-(license/sourcetype) function in its inner loop. Though it would also be nice to be able to replace the top-level function while still being able to defer a decision back to the default function ... possibly by returning null? ... or making the default function accessible through lib? Not sure, but the aim would be to allow custom quirks to be added without having to re-implement the whole default function (this is currently achieved by the predicate's decision being AND-ed with the default function's I think?).

@ghost
Copy link
Author

ghost commented Jun 8, 2022

in many cases a file we download contains binary elements, but we don't use them.

If you're sure those binary elements aren't used in building the package's derivation outputs, then they are not relevant to the package's sourceProvenance:

The value of a package's meta.sourceProvenance attribute specifies the provenance of the package's derivation outputs.

If you're unsure, or want to guard against future changes, a good solution is to simply delete the unused elements before building, for example in postUnpack. That's what I did in #174691 which was a much more serious situation affecting meta.license -- not only were we not building the binary from source, the source code was not even available.

Ideally using a design that is able to be ported back to the licenses system if it's successful.

That's extremely unrealistic. The meta.license system is in such widespread use that making any kind of breaking change is excruciatingly painful. I doubt any kind of backwards-incompatible change will ever be made to it.

I realize this, but it still doesn't mean source types should diverge unless they are explicitly better

I agree.

I think that the simplification proposed by this PR is explicitly better (otherwise I wouldn't have opened it!)

@risicle
Copy link
Contributor

risicle commented Jun 9, 2022

If you're sure those binary elements aren't used in building the package's derivation outputs, then they are not relevant to the package's sourceProvenance:

Sure, that's what I do - but here I was referring to why it's not sufficient to just mark the downloaded source file(s) derivation with a sourceProvenance as a potential solution to not being able to sensibly tag a part-compiled-part-binary package without having fromSource. Sometimes the src file is literally a tarball with the source code ... and a directory full of dependencies as jars. These packages deserve to differentiable from those that are pure jar downloads.

@risicle
Copy link
Contributor

risicle commented Jun 9, 2022

Ideally using a design that is able to be ported back to the licenses system if it's successful.

That's extremely unrealistic.

I don't think it would be particularly difficult to achieve. The new mechanism simply needs to be able to implement the old mechanism as its default behaviour, using replaceable functions.

I think that the simplification proposed by this PR is explicitly better (otherwise I wouldn't have opened it!)

Except it still can't allow or disallow by-source-type without having to replicate the whole loop logic in the predicate function, whereas duplicating the licenses' allowlist/blocklist attributes would enable users to make simple additions to lists.

I don't see it as worth diverging for something that's just differently annoying.

@ghost
Copy link
Author

ghost commented Jun 10, 2022

sensibly tag a part-compiled-part-binary package without having fromSource. Sometimes the src file is literally a tarball with the source code ... and a directory full of dependencies as jars.

If you have the burning urge to classify different individual files within a tarball, you do it like this:

  1. Delete the jars in postUnpack
  2. Mark the package meta.sourceProvenance = [ ] since you deleted the non-source components.
  3. Add a fetchurl { ... meta { sourceProvenance = [ binaryBytecode ] } } block with the same url as src. Nix will only download the url once.

I have to say this is a pretty niche use case. But it is representable nonetheless.

I think that the simplification proposed by this PR is explicitly better (otherwise I wouldn't have opened it!)

Except it still can't allow or disallow by-source-type without having to replicate the whole loop logic in the predicate function,

I'm having trouble understanding what you mean here. Could you please give a concrete example of "replicate the whole loop logic in the predicate function"? You mean replicating the 53-character string "lib.lists.any (p: p!=lib.sourceTypes.binaryFirmware)"?

@ghost
Copy link
Author

ghost commented Jun 19, 2022

😞

@ghost ghost closed this Jun 19, 2022
@ghost ghost deleted the pr/sourceProvenance/remove-redunant branch January 23, 2024 06:50
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: stdenv Standard environment 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant