Small reads in uv_distribution::distribution_database::download producing v. slow download performance when installing torch #2220
Comments
I'm happy to look more into what's going on later in the week 🙂. I've read through some of the code in question.
Cool, would definitely welcome more exploration / eyes on it and happy to support. We unzip the wheel to disk directly as we stream the download in memory; it looks like we're making way too many small reads right now?
Yeah, I added some logging. It'd take me a decent bit more time to unpack why these small reads are being produced and slap a buffered reader in the right place.
Is there any way to reproduce this with publicly available indexes?
Yeah, I think that's job 1 so that we can collaborate better on this! We won't open-source our mirror implementation just yet.
This behavior seems very similar to one I'm seeing as well. For our internal index, proxied public packages have good performance even when the wheel is large (PyTorch), but internal packages with large wheels are much slower with uv than pip. The only thing I can add: I know our index doesn't support range requests/HEAD, so maybe that fallback path is what triggers the slow behavior here.
We don't use range requests for the wheel download IIRC, just for fetching wheel metadata, so hopefully not a factor here.
Adding some details of attempting a repro with an open-source mirror, proxpi.

With just:

```
PIP_TRUSTED_HOST="127.0.0.1:5000" PIP_NO_CACHE_DIR=off PIP_INDEX_URL="http://127.0.0.1:5000/index" pip install torch
```

the proxy logs show:

```
2024-03-09 04:39:12 [ INFO] gunicorn.access: 172.18.0.1 "GET /index/torch/ HTTP/1.1" 200 24056 7ms
2024-03-09 04:39:14 [ INFO] gunicorn.access: 172.18.0.1 "GET /index/torch/torch-2.2.1-cp311-cp311-manylinux1_x86_64.whl HTTP/1.1" 200 0 2080ms
```

I get around 2s. So there's a difference, but nothing like the ~5 minute slowdown seen in our internal mirror. So a negative result on providing an open-source reproduction. I'll keep digging.
I'd really like to fix this. I'll try to play around with it today. The simplest thing we should try:

```diff
diff --git a/crates/uv-extract/src/stream.rs b/crates/uv-extract/src/stream.rs
index 8d8dfb11..b9272df5 100644
--- a/crates/uv-extract/src/stream.rs
+++ b/crates/uv-extract/src/stream.rs
@@ -18,7 +18,7 @@ pub async fn unzip<R: tokio::io::AsyncRead + Unpin>(
     target: impl AsRef<Path>,
 ) -> Result<(), Error> {
     let target = target.as_ref();
-    let mut reader = reader.compat();
+    let mut reader = futures::io::BufReader::new(reader.compat());
     let mut zip = async_zip::base::read::stream::ZipFileReader::new(&mut reader);
     let mut directories = FxHashSet::default();
```

(It looks like the async ZIP reader uses a BufReader internally, but only for the entry bodies, and not the headers.)
Not seeing any improvement from this change. FWIW, on my machine, the difference between pip and uv is much smaller with the open-source proxy. Which is maybe plausible given that we're unzipping and writing to disk -- not sure.
We could also try increasing the buffer size.
OK, from looking at trace logs for our internal mirror, I think the difference in encoding strategy between the two clients is key. Will work out why this difference occurs.
Ohh nice!
I think this is another instance of the header issue from #1978. I added some extra debug logging to uv and compared its request headers against pip's. By making the same header change that's done in #1978, torch installs much, much faster against the mirror! A 15.64s download against the mirror is still slower than what pip gets, though.
Can you try tweaking the buffer size on the BufReader? I think 8KiB might be the exact default.
You mean doing this but with a non-default buffer size? #2220 (comment)
Yeah. Alternatively I can change the buffer used inside async-zip-rs; we use a fork anyway.
(I don't have much intuition for how nested BufReaders interact.)
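For concreteness, here's a minimal, self-contained sketch of the variant being discussed: wrapping the tokio-to-futures adapted stream in a `futures::io::BufReader` with a non-default capacity before handing it to a streaming consumer. This is not uv's actual code path; the wheel filename stands in for the HTTP response body, and 128KiB is an illustrative capacity, not a settled choice (assumes the `futures`, `tokio`, and `tokio-util` crates with the `compat` feature).

```rust
use futures::io::{AsyncReadExt, BufReader};
use tokio_util::compat::TokioAsyncReadCompatExt;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Stand-in for the streamed wheel download (any tokio AsyncRead works here).
    let wheel = tokio::fs::File::open("torch-2.2.1-cp311-cp311-manylinux1_x86_64.whl").await?;

    // `compat()` bridges tokio's AsyncRead to the futures-style AsyncRead that a
    // streaming ZIP reader consumes. The BufReader capacity controls how large
    // each read against the underlying stream can be: 128KiB here instead of
    // the 8KiB default.
    let mut reader = BufReader::with_capacity(128 * 1024, wheel.compat());

    // A consumer issuing small reads (as a ZIP header parser does) now hits the
    // in-memory buffer rather than the underlying stream on every call.
    let mut header = [0u8; 30];
    reader.read_exact(&mut header).await?;
    println!("local file header signature: {:02x?}", &header[..4]);
    Ok(())
}
```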
Think this is enough to make a PR. It'd be two things: setting `accept-encoding: identity` on the wheel download, and increasing the buffer size used when unzipping the stream.
Sounds great, thank you! The only thing I'll want to test before merging is whether the buffer size changes benchmarks much / at all against PyPI.
Makes sense. Looks like I could even check this myself by reading the benchmarking docs.
Totally, up to you! I'd just do a release build on main, move it to a separate binary name, then do a release build of the branch and compare the two against PyPI.
By the way, I can get this released pretty quickly. |
## Summary

Addressing the extremely slow performance detailed in #2220. There are two changes to increase download performance:

1. setting `accept-encoding: identity`, in the spirit of pypa/pip#1688
2. increasing the buffer from 8KiB to 128KiB

### 1. accept-encoding: identity

I think this related `pip` PR has a good explanation of what's going on: pypa/pip#1688

```
# We use Accept-Encoding: identity here because requests
# defaults to accepting compressed responses. This breaks in
# a variety of ways depending on how the server is configured.
# - Some servers will notice that the file isn't a compressible
#   file and will leave the file alone and with an empty
#   Content-Encoding
# - Some servers will notice that the file is already
#   compressed and will leave the file alone and will add a
#   Content-Encoding: gzip header
# - Some servers won't notice anything at all and will take
#   a file that's already been compressed and compress it again
#   and set the Content-Encoding: gzip header
```

The `files.pythonhosted.org` server is the 1st kind. In an example debug log I added in `uv` when installing against PyPI, there is no `content-encoding` header in the response, the `.whl` hasn't been compressed, and there is a `content-length` header.

Our internal mirror is the third case. It does seem sensible that our mirror should be modified to act like the 1st kind, but `uv` should handle all three cases like `pip` does.

### 2. buffer increase

In #2220 I observed that `pip`'s downloading was causing flushes of up to 128KiB in our mirror. After fix 1, `uv` was still only causing flushes of up to 8KiB, and was slower to download than `pip`. Increasing this buffer from the default 8KiB led to a download performance improvement against our mirror and the expected observed 128KiB flushes.

## Test Plan

Ran benchmarking as instructed by @charliermarsh. No performance improvement or regression against PyPI.
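As a standalone illustration of change 1 (this is not uv's actual client code, and the wheel URL is a placeholder), a client can opt out of transparent compression by sending `accept-encoding: identity` explicitly on the request:

```rust
use reqwest::header::{HeaderValue, ACCEPT_ENCODING};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();

    // Ask the server to serve the wheel bytes as-is: the payload is already a
    // ZIP, so letting the server gzip it (or mislabel it) only causes trouble,
    // as described in pypa/pip#1688.
    let response = client
        .get("https://example-mirror.invalid/packages/torch-2.2.1-cp311-cp311-manylinux1_x86_64.whl")
        .header(ACCEPT_ENCODING, HeaderValue::from_static("identity"))
        .send()
        .await?;

    // With an identity encoding there should be no content-encoding header, and
    // content-length should be present, matching the well-behaved PyPI case.
    println!(
        "status: {}, content-encoding: {:?}, content-length: {:?}",
        response.status(),
        response.headers().get("content-encoding"),
        response.content_length(),
    );
    Ok(())
}
```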
I tested the fix on my slow large wheel and it works great. Installation time is now fast for that one wheel, and I guess my index server had the same issue here. Thanks to both of you for continuing to improve uv's performance.
All @thundergolfer!
I'll probably cut a release tomorrow with this in it.
Hey, this is a successor to #1978 as I'm still trying to integrate `uv` with Modal's internal mirror. But this time I don't yet have a solution to this problem 🙂. I've observed extremely degraded install performance when installing pytorch against our mirror. The install command is `uv pip install torch`.

On my test machine, installing without our mirror is fast, taking around 7 seconds. But when I get our mirror involved (the same `uv pip install torch`, pointed at our mirror's index), performance tanks and an install takes over 5 minutes.

Investigating...

Using debug logs on our mirror I see that there is a vast (~64x) difference in read sizes between `uv pip install` and `pip install`. Our mirror serves `.whl` files off disk. By enabling debug logs (`RUST_LOG=debug`) I can see that when using `pip install` the read chunks are large: reads of up to 128KiB are happening. Performance of the `pip install` command is good; the whl download takes around 3 seconds. `pip --version` is `pip 23.3.1 from /home/ubuntu/modal/venv/lib/python3.11/site-packages/pip (python 3.11)`.

The read sizes of `pip` contrast strongly with `uv pip install`, where I'm seeing read sizes between 11 and 1120 bytes. I believe these relatively tiny read sizes when downloading are contributing to the slow download performance on the 720.55MiB torch `.whl` file.

Looking at the `trace` logs, this is the region of code I'm in: `uv/crates/uv-distribution/src/distribution_database.rs`, line 161 at 043d726.
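To make the effect of those read sizes concrete, here is a small self-contained sketch (synchronous `std::io` for brevity, not uv's actual async path) that counts how many reads reach the underlying source when a consumer asks for ZIP-header-sized chunks, with and without a `BufReader` in between:

```rust
use std::io::{BufReader, Read};

/// Wraps a reader and records how many `read` calls reach the underlying
/// source -- a stand-in for the mirror seeing each individual socket read.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

/// Simulates a parser that requests small, header-sized chunks (a ZIP local
/// file header's fixed portion is 30 bytes) rather than large blocks.
fn consume_in_tiny_chunks<R: Read>(mut reader: R) {
    let mut chunk = [0u8; 30];
    while let Ok(n) = reader.read(&mut chunk) {
        if n == 0 {
            break;
        }
    }
}

fn main() {
    let data = vec![0u8; 1024 * 1024]; // 1MiB stand-in for part of a wheel

    // Unbuffered: every 30-byte request goes straight to the source.
    let mut direct = CountingReader { inner: &data[..], reads: 0 };
    consume_in_tiny_chunks(&mut direct);
    println!("unbuffered: {} reads against the source", direct.reads);

    // Buffered: the source sees a handful of 128KiB reads instead.
    let mut counted = CountingReader { inner: &data[..], reads: 0 };
    consume_in_tiny_chunks(BufReader::with_capacity(128 * 1024, &mut counted));
    println!("buffered:   {} reads against the source", counted.reads);
}
```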
uv version

OS

```
uname -a
Linux ip-10-1-1-198 5.15.0-1052-aws #57~20.04.1-Ubuntu SMP Mon Jan 15 17:04:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
```