-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prioritize matching build metadata for '=' requirements #9548
Conversation
(rust-highfive has picked a reviewer for you, use r? to override) |
Just FYI, there is more discussion about handling build metadata in rust-lang/crates.io#1059. I think everyone is leaning towards not allowing multiple packages that differ only in the metadata string. I think it is fine for this PR to match more tightly, but the intent is to not allow this to happen in the first place. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expand a bit more on the motivation for this as well? It makes sense in that I understand what this does, but I'm not sure why we'd want to do this. (ideally we'd just ignore this since it'll become invalid on crates.io at some point in the future)
dep.specified_req() | ||
.unwrap() | ||
.trim() | ||
.trim_end_matches(&v.to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically the resolver at least as a whole is definitely performance sensitive, although I don't recall if this exact region is so sensitive to performance. I would expect that we have a lot of =
exact requirements which typically come from "locked" dependencies and lockfiles (an artifact of how Cargo implemented lock files originally, very dated now...), so this may end up getting executed a lot.
Would you be up for trying to gather some peformance data about the impact of this comparison? I'm mostly worried about the temporarily allocated string here and the various string manipulations, and if they should up at all in comparisons or anything. One of the better ways I've found to benchmark the resolver is to use the --build-plan
option which avoids actually building anything but still runs the resolver the same way cargo build
does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or another way to put this is that this feels somewhat odd doing a bunch of string manipulations here, could the semver
crate be used instead to extract build metadata and keep that stored somewhere? (or something like that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is in a closure that only executes if all of the following are true:
dep.version_req()
is an=
requirement.dep.version_req()
gives a specific patch version, i.e.=x.y.z
or=x.y.z-pre
or=x.y.z+meta
or=x.y.z-pre+meta
, but not=x.y
or=x
;ret
contains multiple matching candidates that must be sorted. Ifret
contains only 1 possible candidate, thenret.sort_unstable_by
is never going to run the argument closure.
Because of how =
with a specific patch version works, the only possible matching candidates (the contents of ret
) all have v.major == req.major && v.minor == req.minor && v.patch == req.patch && v.pre == req.pre
, meaning it's only possible for ret
to contain more than one matching candidate if those candidates all have identical major/minor/patch/pre and differ only in metadata.
So I wouldn't expect a string allocation or string manipulation in this closure to affect resolver performance in general, except when dealing with a large dependency tree consisting overwhelmingly of versions with identical major/minor/patch/pre and differing metadata, which doesn't happen in practice.
For benchmarking do you mostly care about typical dependency trees in which versions don't tend to differ only in metadata (and this closure would therefore execute 0 times), or about worst case performance where every crate has multiple of the same version everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a self.registry_cache
around this method, so it is called at most once per dep per call to resolve. So I would not expect it to be hot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it makes sense that this doesn't really run that often, but Cargo's resolver runs (unintuitively) twice on each build. The first is the "real" one and the latter is a re-resolution to get a smaller set of dependencies (it's weird and it's a bug we've been meaning to fix, well, forever). The implementation with lock files will also synthetically switch many dependencies to =
dependencies under the hood to ensure the same resolution happens once you already have a lock file (again this is a bug we'd like to fix...)
I think you're right though that the closure probably won't get run since most of the candidate lists will be of length 1. I'm not really worried at all about candidate lists with tons of build metadata and same versions. That being said I'd personally appreciate some smoke testing at least to confirm that this doesn't have a performance impact (as expected).
For performance I'd be interested in big trees, not any particular shape of a tree per se. Something like rust-lang/rust is probably a big enough tree that just doing cargo build --build-plan
for rustc itself should show a meaningful difference if there is one (and we'd expect no difference here really)
} | ||
} | ||
|
||
// Prefer prioritized summaries (those in `try_to_use`). | ||
let a_in_previous = self.try_to_use.contains(&a.package_id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this is no longer prioritized? Were you running into issues which forced the new comparison to come first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naively I'd sort of expect this to come after the semver compare, in that if we really care about same-version-different-build-metadata that seems like it'd be part of the semver comparison function below for ordering, basically doing first a sort on version and then a sort on "does this equal or not"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I think the previous_cmp comparison should go above the exact_match_cmp comparison. The case I had in mind writing this code was that somebody has =x.y.z+build1
in Cargo.toml, resolves it, then edits Cargo.toml to say =x.y.z+build2
, and resolves again. In that case probably what they meant is for the exact match (build2) to be chosen.
However, that person's use case is less important than somebody else who has =x.y.z+build2
in Cargo.toml, ran cargo update --precise x.y.z+build1
, then makes some unrelated change (maybe adding a dependency) in Cargo.toml which causes the resolver to rerun. That person probably wants the previous match (build1) to remain chosen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think this is where multiple versions with the same build metadata doesn't really make sense. I see what you mean about how one ordering here respects lock files while the other ordering respects --precise
. I don't think we can support both simultaneously because we don't know what manifest was used to generate a lock file. It seems reasonable to prefer the --precise
flag for now I guess?
This whole area seems like it's just full of surprising gotchas and is ripe for us receiving lots of bug reports if it's used too much... Hopefully it's not though?
Closing up loose ends from dtolnay/semver#107 (comment). The approach here is based on @sgrif advocating that metadata in |
I think it may actually make the most sense for Cargo to print a warning about a version requirement with build metadata, since it's explicitly ignored. I think we could support the I sort of wish we didn't even parse it if it's ignored, but that ship is probably sailed so all we can do now is print a warning. |
Makes sense. I'll go ahead and close since your description is already how Cargo works. There is a warning for dependency reqs in Cargo.toml containing build metadata, but cargo/src/cargo/util/toml/mod.rs Lines 1721 to 1728 in 66686fd
|
Thanks for following up on that @dtolnay ❤️ |
Oh geez I clearly understand how Cargo works as-is... |
This PR makes
lz4-sys = "=1.0.1+1.7.3"
resolve to https://crates.io/crates/lz4-sys/1.0.1+1.7.3 whilelz4-sys = "=1.0.1+1.7.5"
resolves to https://crates.io/crates/lz4-sys/1.0.1+1.7.5. Previously, build metadata in a requirement was irrelevant.This is not in opposition to the semver ranges proto-spec, which has this to say:
Build metadata continues to be ignored when evaluation versions for inclusion (i.e. whether a version is considered matching the version requirement). Build metadata only comes into play once there is a set of multiple matching versions that Cargo needs to choose among, which is not a behavior that is governed by the spec. Versions with a different build metadata are still considered matching, and one may use
cargo update --precise
to update among them.Examples: