Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix 'uv pip install' handling of gzip'd response and PEP 691 #1978

Merged
merged 1 commit into from
Feb 26, 2024

Conversation

thundergolfer
Copy link
Contributor

@thundergolfer thundergolfer commented Feb 26, 2024

Thank you for writing uv! We're already using it internally on some container image builds and finding that it's noticeably faster 💯

Summary

I was attempting to use uv alongside modal's internal PyPi mirror and ran into some issues. The first issue was the following error:

error: Failed to download: nltk==3.8.1
  Caused by: content-length header is missing from response

This error was coming from within RegistryClient::wheel_metadata_no_pep658. By logging requests on the client (uv) and server (internal mirror) sides I've concluded that it's occurring because uv is sending a header suggesting that it can accept a gzip'd response, but decompressing the gzip'd response strips the content-length header: seanmonstar/reqwest#294.

Logged request, client-side:

0.981664s   0ms  INFO uv_client::registry_client JONO, REQ: Request { method: HEAD, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(172.21.0.1)), port: Some(5555), path: "/simple/joblib/joblib-1.3.2-py3-none-any.whl", query: None, fragment: None }, headers: {} }

No headers set explicitly by uv.

Logged request, server-side:

2024-02-26T03:45:08.598272Z DEBUG pypi_mirror: origin request = Request { method: HEAD, uri: /simple/joblib/joblib-1.3.2-py3-none-any.whl, version: HTTP/1.1, headers: {"accept": "*/*", "user-agent": "uv", "accept-encoding": "gzip, br", "host": "172.21.0.1:5555"}, body: Body(Empty) }

Server receives "accept-encoding": "gzip, br",.

My change adding the header to the request fixed this issue. But our internal mirror is just passing through PyPI responses and PyPI responses do contain PEP 658 data, and so wheel_metadata_no_pep658 shouldn't execute.

The issue there is that the PyPi response field has dashes not underscores (https://peps.python.org/pep-0691/).

image

After changing the alias the PEP 658 codepath now runs correctly :)

Test Plan

I tested by installing against both our mirror and against PyPi:

RUST_LOG="uv=trace" UV_NO_CACHE=true UV_INDEX_URL="http://172.21.0.1:5555/simple" target/release/uv pip install -v nltk
RUST_LOG="uv=trace" UV_NO_CACHE=true UV_INDEX_URL="http://172.21.0.1:5555/simple" target/release/uv pip uninstall -v nltk
target/release/uv pip install -v nltk
target/release/uv pip uninstall -v nltk

@@ -459,6 +459,10 @@ impl RegistryClient {
.client
.uncached()
.head(url.clone())
.header(
"accept-encoding",
http::HeaderValue::from_static("identity"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iiinteresting, so when the server returned a gzipped response here, we effectively lost the headers that are required for the range request?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is related: #1754.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, that issue suggests that Content-Type is preserved, nevermind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep they get stripped because they're not accurate after decompression seanmonstar/reqwest#294

@@ -38,7 +38,7 @@ fn sorted_simple_json_files<'de, D: Deserializer<'de>>(d: D) -> Result<Vec<File>
#[serde(rename_all = "kebab-case")]
pub struct File {
// Non-PEP 691-compliant alias used by PyPI.
#[serde(alias = "data_dist_info_metadata")]
#[serde(alias = "data-dist-info-metadata")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that's a huge oversight, thank you! This could make a big difference in cold resolution time actually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

@charliermarsh
Copy link
Member

I'm gonna benchmark this real quick.

@charliermarsh
Copy link
Member

The bad news is that we missed that typo and it's embarrassing. The good news is that it makes cold resolution like 30-40% faster for free:

❯ python -m scripts.bench \
        --uv-path ./target/release/uv \
        --uv-path ./target/release/main \
        ./scripts/requirements/jupyter.in --benchmark resolve-cold --min-runs 20
Benchmark 1: ./target/release/uv (resolve-cold)
  Time (mean ± σ):     627.2 ms ±  33.8 ms    [User: 215.1 ms, System: 97.3 ms]
  Range (min … max):   567.0 ms … 719.3 ms    20 runs

Benchmark 2: ./target/release/main (resolve-cold)
  Time (mean ± σ):     838.0 ms ±  38.9 ms    [User: 228.0 ms, System: 110.9 ms]
  Range (min … max):   775.9 ms … 897.9 ms    20 runs

Summary
  './target/release/uv (resolve-cold)' ran
    1.34 ± 0.10 times faster than './target/release/main (resolve-cold)'

@charliermarsh charliermarsh merged commit c80d5c6 into astral-sh:main Feb 26, 2024
7 checks passed
@charliermarsh
Copy link
Member

Thanks for the great find! I'll release this in the morning.

@charliermarsh charliermarsh added performance Potential performance improvement great writeup A wonderful example of a quality contribution 💜 labels Feb 26, 2024
@zanieb
Copy link
Member

zanieb commented Feb 26, 2024

Thank you! This is great work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
great writeup A wonderful example of a quality contribution 💜 performance Potential performance improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants