Stability of parallel compression #2238
Comments
Compression is reproducible.
The single-threaded output will be different from the multi-threaded output. However, both are deterministic, and multi-threaded compression produces the same compressed data no matter how many threads you use.
Great, nice work!
One further question: is the compressed output dependent on the number of threads used for compression?
No. Single-threaded output is different from multi-threaded output, but the multi-threaded output is the same for any number of threads.
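To illustrate why output can be independent of the thread count, here is a minimal Python sketch. It is not zstd's implementation: it uses the standard-library `zlib` as a stand-in codec, and `CHUNK_SIZE`, `compress_sharded`, and `decompress_sharded` are hypothetical names made up for this example. The assumption it demonstrates is that when job boundaries depend only on a fixed job size, and the results are joined in input order, the number of workers cannot change the bytes produced.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Fixed job size, chosen independently of the worker count (hypothetical value).
CHUNK_SIZE = 64 * 1024


def compress_sharded(data: bytes, workers: int, level: int = 6) -> bytes:
    """Compress fixed-size chunks in parallel and join them in input order."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, not completion order,
        # so scheduling differences between runs do not reorder the output.
        parts = list(pool.map(lambda c: zlib.compress(c, level), chunks))
    return b"".join(parts)


def decompress_sharded(blob: bytes) -> bytes:
    """Decode a concatenation of independent zlib streams."""
    out = []
    while blob:
        d = zlib.decompressobj()
        out.append(d.decompress(blob))
        blob = d.unused_data  # bytes that belong to the next stream, if any
    return b"".join(out)


# Sample input spanning many chunks.
data = b"".join(b"debug section line %d\n" % i for i in range(50_000))

# Compress the same input with several worker counts.
outputs = {w: compress_sharded(data, w) for w in (1, 2, 4, 8)}
```

Comparing the entries of `outputs` byte for byte is the same kind of check one could run against real multi-threaded zstd output. If the chunk size were instead derived from the worker count, the chunk boundaries, and therefore the compressed bytes, would change with the number of threads.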
When:
The job number is calculated by
Then in this case, a different number of threads may produce different compressed data. Is this intended?
No. I'd call that a bug. I've opened Issue #2327 to track it. If you need to work around this bug, don't start your streaming job with
See D117853: compressing debug sections is a bottleneck, so there is large value in parallelizing the step. zstd provides a multi-threading API, and the output is deterministic even with different numbers of threads (see facebook/zstd#2238). Therefore we can leverage it instead of using the pigz-style sharding approach.

Also, switch to the default compression level 3. The current level 5 is significantly slower without providing a justifying size benefit.

```
'dash b.sh 1' ran
  1.05 ± 0.01 times faster than 'dash b.sh 3'
  1.18 ± 0.01 times faster than 'dash b.sh 4'
  1.29 ± 0.02 times faster than 'dash b.sh 5'

level=1 size: 358946945
level=3 size: 309002145
level=4 size: 307693204
level=5 size: 297828315
```

Reviewed By: andrewng, peter.smith
Differential Revision: https://reviews.llvm.org/D133679
Just to be sure: when using parallel compression, is the compressed stream stable?
I mean, can it vary when compression is repeated, or is it deterministic?