Prioritize matching build metadata for '=' requirements #9548

dtolnay · 2021-06-06T21:21:18Z

This PR makes lz4-sys = "=1.0.1+1.7.3" resolve to https://crates.io/crates/lz4-sys/1.0.1+1.7.3 while lz4-sys = "=1.0.1+1.7.5" resolves to https://crates.io/crates/lz4-sys/1.0.1+1.7.5. Previously, build metadata in a requirement was irrelevant.

This is not in opposition to the semver ranges proto-spec, which has this to say:

Implementations MAY reject Ranges with SemVer strings that include build metadata, but if they do not, then they MUST ignore any build metadata when evaluating versions for inclusion.

Build metadata continues to be ignored when evaluation versions for inclusion (i.e. whether a version is considered matching the version requirement). Build metadata only comes into play once there is a set of multiple matching versions that Cargo needs to choose among, which is not a behavior that is governed by the spec. Versions with a different build metadata are still considered matching, and one may use cargo update --precise to update among them.

Examples:

$ tail Cargo.toml
[dependencies]
lz4-sys = "=1.0.1+1.7.3"

$ cargo generate-lockfile   resolves to 1.0.1+1.7.3

$ cargo update -p lz4-sys --precise 1.0.1+1.7.5   okay

$ tail Cargo.toml
[dependencies]
lz4-sys = "=1.0.1+1.7.5"

$ cargo generate-lockfile   resolves to 1.0.1+1.7.5

$ cargo update -p lz4-sys --precise 1.0.1+1.7.3   okay (also a match)

$ tail Cargo.toml
[dependencies]
lz4-sys = "=1.0.1+asdf"

$ cargo generate-lockfile   resolves to 1.0.1+1.7.5

$ cargo generate-lockfile -Z minimal-versions   resolves to 1.0.1+1.7.3

rust-highfive · 2021-06-06T21:21:20Z

r? @alexcrichton

(rust-highfive has picked a reviewer for you, use r? to override)

ehuss · 2021-06-06T21:36:49Z

Just FYI, there is more discussion about handling build metadata in rust-lang/crates.io#1059. I think everyone is leaning towards not allowing multiple packages that differ only in the metadata string. I think it is fine for this PR to match more tightly, but the intent is to not allow this to happen in the first place.

alexcrichton

Could you expand a bit more on the motivation for this as well? It makes sense in that I understand what this does, but I'm not sure why we'd want to do this. (ideally we'd just ignore this since it'll become invalid on crates.io at some point in the future)

alexcrichton · 2021-06-07T21:50:50Z

src/cargo/core/resolver/dep_cache.rs

+            dep.specified_req()
+                .unwrap()
+                .trim()
+                .trim_end_matches(&v.to_string())


Historically the resolver at least as a whole is definitely performance sensitive, although I don't recall if this exact region is so sensitive to performance. I would expect that we have a lot of = exact requirements which typically come from "locked" dependencies and lockfiles (an artifact of how Cargo implemented lock files originally, very dated now...), so this may end up getting executed a lot.

Would you be up for trying to gather some peformance data about the impact of this comparison? I'm mostly worried about the temporarily allocated string here and the various string manipulations, and if they should up at all in comparisons or anything. One of the better ways I've found to benchmark the resolver is to use the --build-plan option which avoids actually building anything but still runs the resolver the same way cargo build does.

Or another way to put this is that this feels somewhat odd doing a bunch of string manipulations here, could the semver crate be used instead to extract build metadata and keep that stored somewhere? (or something like that)

This code is in a closure that only executes if all of the following are true:

dep.version_req() is an = requirement.

dep.version_req() gives a specific patch version, i.e. =x.y.z or =x.y.z-pre or =x.y.z+meta or =x.y.z-pre+meta, but not =x.y or =x;

ret contains multiple matching candidates that must be sorted. If ret contains only 1 possible candidate, then ret.sort_unstable_by is never going to run the argument closure.

Because of how = with a specific patch version works, the only possible matching candidates (the contents of ret) all have v.major == req.major && v.minor == req.minor && v.patch == req.patch && v.pre == req.pre, meaning it's only possible for ret to contain more than one matching candidate if those candidates all have identical major/minor/patch/pre and differ only in metadata.

So I wouldn't expect a string allocation or string manipulation in this closure to affect resolver performance in general, except when dealing with a large dependency tree consisting overwhelmingly of versions with identical major/minor/patch/pre and differing metadata, which doesn't happen in practice.

For benchmarking do you mostly care about typical dependency trees in which versions don't tend to differ only in metadata (and this closure would therefore execute 0 times), or about worst case performance where every crate has multiple of the same version everywhere?

There is a self.registry_cache around this method, so it is called at most once per dep per call to resolve. So I would not expect it to be hot.

Oh it makes sense that this doesn't really run that often, but Cargo's resolver runs (unintuitively) twice on each build. The first is the "real" one and the latter is a re-resolution to get a smaller set of dependencies (it's weird and it's a bug we've been meaning to fix, well, forever). The implementation with lock files will also synthetically switch many dependencies to = dependencies under the hood to ensure the same resolution happens once you already have a lock file (again this is a bug we'd like to fix...)

I think you're right though that the closure probably won't get run since most of the candidate lists will be of length 1. I'm not really worried at all about candidate lists with tons of build metadata and same versions. That being said I'd personally appreciate some smoke testing at least to confirm that this doesn't have a performance impact (as expected).

For performance I'd be interested in big trees, not any particular shape of a tree per se. Something like rust-lang/rust is probably a big enough tree that just doing cargo build --build-plan for rustc itself should show a meaningful difference if there is one (and we'd expect no difference here really)

alexcrichton · 2021-06-07T21:51:32Z

src/cargo/core/resolver/dep_cache.rs

+                }
+            }
+
+            // Prefer prioritized summaries (those in `try_to_use`).
            let a_in_previous = self.try_to_use.contains(&a.package_id());


How come this is no longer prioritized? Were you running into issues which forced the new comparison to come first?

Naively I'd sort of expect this to come after the semver compare, in that if we really care about same-version-different-build-metadata that seems like it'd be part of the semver comparison function below for ordering, basically doing first a sort on version and then a sort on "does this equal or not"

Good call, I think the previous_cmp comparison should go above the exact_match_cmp comparison. The case I had in mind writing this code was that somebody has =x.y.z+build1 in Cargo.toml, resolves it, then edits Cargo.toml to say =x.y.z+build2, and resolves again. In that case probably what they meant is for the exact match (build2) to be chosen.

However, that person's use case is less important than somebody else who has =x.y.z+build2 in Cargo.toml, ran cargo update --precise x.y.z+build1, then makes some unrelated change (maybe adding a dependency) in Cargo.toml which causes the resolver to rerun. That person probably wants the previous match (build1) to remain chosen.

Yeah I think this is where multiple versions with the same build metadata doesn't really make sense. I see what you mean about how one ordering here respects lock files while the other ordering respects --precise. I don't think we can support both simultaneously because we don't know what manifest was used to generate a lock file. It seems reasonable to prefer the --precise flag for now I guess?

This whole area seems like it's just full of surprising gotchas and is ripe for us receiving lots of bug reports if it's used too much... Hopefully it's not though?

dtolnay · 2021-06-07T22:52:23Z

Could you expand a bit more on the motivation for this as well?

Closing up loose ends from dtolnay/semver#107 (comment). The approach here is based on @sgrif advocating that metadata in = requirements be relevant in dtolnay/semver#107 (comment), but reconciles that with what's written in the standard.

alexcrichton · 2021-06-09T15:24:51Z

I think it may actually make the most sense for Cargo to print a warning about a version requirement with build metadata, since it's explicitly ignored. I think we could support the cargo update --precise case explicitly to match a very specific build metadata, but otherwise I'd ideally like to follow the semver spec here and simply ignore build metadata.

I sort of wish we didn't even parse it if it's ignored, but that ship is probably sailed so all we can do now is print a warning.

dtolnay · 2021-06-09T17:23:36Z

Makes sense. I'll go ahead and close since your description is already how Cargo works. There is a warning for dependency reqs in Cargo.toml containing build metadata, but cargo update --precise still respects the metadata.

cargo/src/cargo/util/toml/mod.rs

Lines 1721 to 1728 in 66686fd

    
                       if version.contains('+') { 
        
                           cx.warnings.push(format!( 
        
                               "version requirement `{}` for dependency `{}` \ 
        
                                includes semver metadata which will be ignored, removing the \ 
        
                                metadata is recommended to avoid confusion", 
        
                               version, name_in_toml 
        
                           )); 
        
                       }

sgrif · 2021-06-09T17:29:10Z

Thanks for following up on that @dtolnay ❤️

alexcrichton · 2021-06-09T18:18:14Z

Oh geez I clearly understand how Cargo works as-is...

dtolnay added 3 commits June 6, 2021 13:44

Add test of choosing between same versions with different metadata

74c9cf2

Prioritize matching build metadata for '=' requirements

2767dec

Add cargo update test with exact req

5150c62

rust-highfive assigned alexcrichton Jun 6, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 6, 2021

alexcrichton reviewed Jun 7, 2021

View reviewed changes

dtolnay closed this Jun 9, 2021

dtolnay deleted the exact branch June 9, 2021 17:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prioritize matching build metadata for '=' requirements #9548

Prioritize matching build metadata for '=' requirements #9548

dtolnay commented Jun 6, 2021

rust-highfive commented Jun 6, 2021

ehuss commented Jun 6, 2021

alexcrichton left a comment

alexcrichton Jun 7, 2021

alexcrichton Jun 7, 2021

dtolnay Jun 7, 2021

Eh2406 Jun 7, 2021

alexcrichton Jun 9, 2021

alexcrichton Jun 7, 2021

alexcrichton Jun 7, 2021

dtolnay Jun 7, 2021

alexcrichton Jun 9, 2021

dtolnay commented Jun 7, 2021 •

edited

Loading

alexcrichton commented Jun 9, 2021

dtolnay commented Jun 9, 2021

sgrif commented Jun 9, 2021

alexcrichton commented Jun 9, 2021

Prioritize matching build metadata for '=' requirements #9548

Prioritize matching build metadata for '=' requirements #9548

Conversation

dtolnay commented Jun 6, 2021

Examples:

rust-highfive commented Jun 6, 2021

ehuss commented Jun 6, 2021

alexcrichton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dtolnay commented Jun 7, 2021 • edited Loading

alexcrichton commented Jun 9, 2021

dtolnay commented Jun 9, 2021

sgrif commented Jun 9, 2021

alexcrichton commented Jun 9, 2021

dtolnay commented Jun 7, 2021 •

edited

Loading