-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data descriptors aren't supported in zip files #2216
Comments
👋🏽 Hi from Buf, the company running the Regardless, we (and our customers) really appreciate the attention you're giving to #2202 😻 |
Hey thanks for chiming in! I'd be happy to hear about why / how you're using data descriptors. I'd like to support them but it'll require either making some upgrades to the |
Hi Charlie - I don't think our use of data descriptors is intentional, just what our downstream library (klauspost/compress) does when creating zip files. FWIW, it seems like similar behavior exists in the Go standard library. The calling code is roughly here, but it's nothing special - I believe any call to Hope that helps - not sure there's much we can do on our end, but I'll be watching over at Majored/rs-async-zip#122 to see if things can be resolved over there. Thanks again for all the work! |
Thanks @stefanvanburen -- I can also try to take a look today. We use an async-zip fork anyway (https://github.com/charliermarsh/rs-async-zip) so we can easily experiment with different behaviors. |
Data descriptors add a lot of complexity for streaming. In ZIP, each entry looks like [header][file contents][data descriptor]. The [data descriptor] is optional, and the [header] tells you if it's present. The [header] tells you the length of [file contents], so that you can read the body. But if [data descriptor] is present, the the length is in [data descriptor], not [header]... So you can't stream it (trivially), because you don't know the length of the file until you're past it. In theory it should be possible to make it work for Deflate, since the compressed body will be self-terminating. Alternatively, we could fall back to downloading the entire file, then unzipping it from disk. |
I think the best path is to catch this error and fall back to downloading + unzipping from disk. It's not that much of a slowdown. |
## Summary Some zip files can't be streamed; in particular, `rs-async-zip` doesn't support data descriptors right now (though it may in the future). This PR adds a fallback path for such zips that downloads the entire zip file to disk, then unzips it from disk (which gives us `Seek`). Closes #2216. ## Test Plan `cargo run pip install --extra-index-url https://buf.build/gen/python hashb_foxglove_protocolbuffers_python==25.3.0.1.20240226043130+465630478360 --force-reinstall -n`
We now support data descriptors natively. |
If you run:
You'll hit an error like:
The text was updated successfully, but these errors were encountered: