Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prioritize matching build metadata for '=' requirements #9548

Closed
wants to merge 3 commits into from

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Jun 6, 2021

This PR makes lz4-sys = "=1.0.1+1.7.3" resolve to https://crates.io/crates/lz4-sys/1.0.1+1.7.3 while lz4-sys = "=1.0.1+1.7.5" resolves to https://crates.io/crates/lz4-sys/1.0.1+1.7.5. Previously, build metadata in a requirement was irrelevant.

This is not in opposition to the semver ranges proto-spec, which has this to say:

Implementations MAY reject Ranges with SemVer strings that include build metadata, but if they do not, then they MUST ignore any build metadata when evaluating versions for inclusion.

Build metadata continues to be ignored when evaluation versions for inclusion (i.e. whether a version is considered matching the version requirement). Build metadata only comes into play once there is a set of multiple matching versions that Cargo needs to choose among, which is not a behavior that is governed by the spec. Versions with a different build metadata are still considered matching, and one may use cargo update --precise to update among them.

Examples:

$ tail Cargo.toml
[dependencies]
lz4-sys = "=1.0.1+1.7.3"

$ cargo generate-lockfile   resolves to 1.0.1+1.7.3

$ cargo update -p lz4-sys --precise 1.0.1+1.7.5   okay
$ tail Cargo.toml
[dependencies]
lz4-sys = "=1.0.1+1.7.5"

$ cargo generate-lockfile   resolves to 1.0.1+1.7.5

$ cargo update -p lz4-sys --precise 1.0.1+1.7.3   okay (also a match)
$ tail Cargo.toml
[dependencies]
lz4-sys = "=1.0.1+asdf"

$ cargo generate-lockfile   resolves to 1.0.1+1.7.5

$ cargo generate-lockfile -Z minimal-versions   resolves to 1.0.1+1.7.3

@rust-highfive
Copy link

r? @alexcrichton

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 6, 2021
@ehuss
Copy link
Contributor

ehuss commented Jun 6, 2021

Just FYI, there is more discussion about handling build metadata in rust-lang/crates.io#1059. I think everyone is leaning towards not allowing multiple packages that differ only in the metadata string. I think it is fine for this PR to match more tightly, but the intent is to not allow this to happen in the first place.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand a bit more on the motivation for this as well? It makes sense in that I understand what this does, but I'm not sure why we'd want to do this. (ideally we'd just ignore this since it'll become invalid on crates.io at some point in the future)

dep.specified_req()
.unwrap()
.trim()
.trim_end_matches(&v.to_string())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically the resolver at least as a whole is definitely performance sensitive, although I don't recall if this exact region is so sensitive to performance. I would expect that we have a lot of = exact requirements which typically come from "locked" dependencies and lockfiles (an artifact of how Cargo implemented lock files originally, very dated now...), so this may end up getting executed a lot.

Would you be up for trying to gather some peformance data about the impact of this comparison? I'm mostly worried about the temporarily allocated string here and the various string manipulations, and if they should up at all in comparisons or anything. One of the better ways I've found to benchmark the resolver is to use the --build-plan option which avoids actually building anything but still runs the resolver the same way cargo build does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or another way to put this is that this feels somewhat odd doing a bunch of string manipulations here, could the semver crate be used instead to extract build metadata and keep that stored somewhere? (or something like that)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is in a closure that only executes if all of the following are true:

  • dep.version_req() is an = requirement.
  • dep.version_req() gives a specific patch version, i.e. =x.y.z or =x.y.z-pre or =x.y.z+meta or =x.y.z-pre+meta, but not =x.y or =x;
  • ret contains multiple matching candidates that must be sorted. If ret contains only 1 possible candidate, then ret.sort_unstable_by is never going to run the argument closure.

Because of how = with a specific patch version works, the only possible matching candidates (the contents of ret) all have v.major == req.major && v.minor == req.minor && v.patch == req.patch && v.pre == req.pre, meaning it's only possible for ret to contain more than one matching candidate if those candidates all have identical major/minor/patch/pre and differ only in metadata.

So I wouldn't expect a string allocation or string manipulation in this closure to affect resolver performance in general, except when dealing with a large dependency tree consisting overwhelmingly of versions with identical major/minor/patch/pre and differing metadata, which doesn't happen in practice.

For benchmarking do you mostly care about typical dependency trees in which versions don't tend to differ only in metadata (and this closure would therefore execute 0 times), or about worst case performance where every crate has multiple of the same version everywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a self.registry_cache around this method, so it is called at most once per dep per call to resolve. So I would not expect it to be hot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it makes sense that this doesn't really run that often, but Cargo's resolver runs (unintuitively) twice on each build. The first is the "real" one and the latter is a re-resolution to get a smaller set of dependencies (it's weird and it's a bug we've been meaning to fix, well, forever). The implementation with lock files will also synthetically switch many dependencies to = dependencies under the hood to ensure the same resolution happens once you already have a lock file (again this is a bug we'd like to fix...)

I think you're right though that the closure probably won't get run since most of the candidate lists will be of length 1. I'm not really worried at all about candidate lists with tons of build metadata and same versions. That being said I'd personally appreciate some smoke testing at least to confirm that this doesn't have a performance impact (as expected).

For performance I'd be interested in big trees, not any particular shape of a tree per se. Something like rust-lang/rust is probably a big enough tree that just doing cargo build --build-plan for rustc itself should show a meaningful difference if there is one (and we'd expect no difference here really)

}
}

// Prefer prioritized summaries (those in `try_to_use`).
let a_in_previous = self.try_to_use.contains(&a.package_id());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come this is no longer prioritized? Were you running into issues which forced the new comparison to come first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naively I'd sort of expect this to come after the semver compare, in that if we really care about same-version-different-build-metadata that seems like it'd be part of the semver comparison function below for ordering, basically doing first a sort on version and then a sort on "does this equal or not"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I think the previous_cmp comparison should go above the exact_match_cmp comparison. The case I had in mind writing this code was that somebody has =x.y.z+build1 in Cargo.toml, resolves it, then edits Cargo.toml to say =x.y.z+build2, and resolves again. In that case probably what they meant is for the exact match (build2) to be chosen.

However, that person's use case is less important than somebody else who has =x.y.z+build2 in Cargo.toml, ran cargo update --precise x.y.z+build1, then makes some unrelated change (maybe adding a dependency) in Cargo.toml which causes the resolver to rerun. That person probably wants the previous match (build1) to remain chosen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think this is where multiple versions with the same build metadata doesn't really make sense. I see what you mean about how one ordering here respects lock files while the other ordering respects --precise. I don't think we can support both simultaneously because we don't know what manifest was used to generate a lock file. It seems reasonable to prefer the --precise flag for now I guess?

This whole area seems like it's just full of surprising gotchas and is ripe for us receiving lots of bug reports if it's used too much... Hopefully it's not though?

@dtolnay
Copy link
Member Author

dtolnay commented Jun 7, 2021

Could you expand a bit more on the motivation for this as well?

Closing up loose ends from dtolnay/semver#107 (comment). The approach here is based on @sgrif advocating that metadata in = requirements be relevant in dtolnay/semver#107 (comment), but reconciles that with what's written in the standard.

@alexcrichton
Copy link
Member

I think it may actually make the most sense for Cargo to print a warning about a version requirement with build metadata, since it's explicitly ignored. I think we could support the cargo update --precise case explicitly to match a very specific build metadata, but otherwise I'd ideally like to follow the semver spec here and simply ignore build metadata.

I sort of wish we didn't even parse it if it's ignored, but that ship is probably sailed so all we can do now is print a warning.

@dtolnay
Copy link
Member Author

dtolnay commented Jun 9, 2021

Makes sense. I'll go ahead and close since your description is already how Cargo works. There is a warning for dependency reqs in Cargo.toml containing build metadata, but cargo update --precise still respects the metadata.

if version.contains('+') {
cx.warnings.push(format!(
"version requirement `{}` for dependency `{}` \
includes semver metadata which will be ignored, removing the \
metadata is recommended to avoid confusion",
version, name_in_toml
));
}

@dtolnay dtolnay closed this Jun 9, 2021
@dtolnay dtolnay deleted the exact branch June 9, 2021 17:23
@sgrif
Copy link
Contributor

sgrif commented Jun 9, 2021

Thanks for following up on that @dtolnay ❤️

@alexcrichton
Copy link
Member

Oh geez I clearly understand how Cargo works as-is...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants