Stability of parallel compression #2238

marxin · 2020-07-09T10:26:28Z

Just to be sure: when using parallel compression, is the compressed stream stable?
That is, can it vary when compression is repeated, or is it deterministic?

Cyan4973 · 2020-07-09T15:50:10Z

Compression is reproducible.

terrelln · 2020-07-09T16:26:29Z

The single-threaded output will be different than the multi-threaded output. However, both are deterministic, and the multi-threaded output produces the same compressed data no matter how many threads you use.

marxin · 2020-07-10T07:19:22Z

Great, good job!

marxin · 2020-07-23T09:24:02Z

One further question: is the compressed output dependent on the number of threads used for compression?

terrelln · 2020-07-23T20:33:12Z

> One further question: is the compressed output dependent on the number of threads used for compression?

No. Single-threaded output is different than multi-threaded output, but the multi-threaded output is the same for any number of threads.

ghost · 2020-09-25T01:53:45Z

> the multi-threaded output is the same for any number of threads.

When:

- the stream is started with the ZSTD_e_end end directive, and
- the output buffer size is >= ZSTD_compressBound(),

the number of jobs is calculated by the ZSTDMT_computeNbJobs() function, which takes the number of threads as an input.

In this case, different numbers of threads may produce different compressed data. Is this intended?

terrelln · 2020-09-25T02:31:24Z

> In this case, different numbers of threads may produce different compressed data. Is this intended?

No. I'd call that a bug. I've opened Issue #2327 to track it.

If you need to work around this bug, don't start your streaming job with ZSTD_e_end. Pass at least one byte of input with ZSTD_e_continue before calling ZSTD_e_end.

See D117853: compressing debug sections is a bottleneck, so there is significant value in parallelizing that step. zstd provides a multi-threading API, and its output is deterministic even with different numbers of threads (see facebook/zstd#2238). Therefore we can leverage it instead of using the pigz-style sharding approach.

Also, switch to the default compression level 3. The current level 5 is significantly slower without a size benefit that justifies it.

```
'dash b.sh 1' ran
    1.05 ± 0.01 times faster than 'dash b.sh 3'
    1.18 ± 0.01 times faster than 'dash b.sh 4'
    1.29 ± 0.02 times faster than 'dash b.sh 5'

level=1 size: 358946945
level=3 size: 309002145
level=4 size: 307693204
level=5 size: 297828315
```

Reviewed By: andrewng, peter.smith

Differential Revision: https://reviews.llvm.org/D133679
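The level 5 versus level 3 trade-off in that commit message can be checked with a little arithmetic on its figures. The benchmark summary reports each level relative to level 1, so dividing two of those ratios gives the run time relative to level 3. The ± error bars are ignored here. The table and function names below are illustrative only.

```c
#include <stdio.h>

/* Figures copied from the benchmark summary above. speedup_vs_l1[i] says how
 * many times faster level 1 ran than levels[i] (1.00 for level 1 itself);
 * sizes[i] is the output size in bytes at levels[i]. */
static const int    levels[]        = { 1, 3, 4, 5 };
static const double speedup_vs_l1[] = { 1.00, 1.05, 1.18, 1.29 };
static const double sizes[]         = { 358946945.0, 309002145.0,
                                        307693204.0, 297828315.0 };

/* Run time of levels[i] relative to level 3 (index 1); > 1 means slower. */
static double time_vs_level3(int i)
{
    return speedup_vs_l1[i] / speedup_vs_l1[1];
}

/* Output size of levels[i] relative to level 3; < 1 means smaller. */
static double size_vs_level3(int i)
{
    return sizes[i] / sizes[1];
}

static void print_table(void)
{
    for (int i = 0; i < 4; i++)
        printf("level %d: time x%.3f, size x%.4f (relative to level 3)\n",
               levels[i], time_vs_level3(i), size_vs_level3(i));
}
```

Level 5 takes about 1.29 / 1.05 ≈ 1.23 times as long as level 3 and produces output about 3.6% smaller. Level 4 takes about 1.12 times as long for roughly 0.4% smaller output.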

marxin mentioned this issue Jul 9, 2020 in "Support threading for zstd compression." rpm-software-management/rpm#1303 (merged)

Cyan4973 added the question label Jul 9, 2020

marxin closed this as completed Jul 10, 2020

terrelln mentioned this issue Sep 25, 2020 in "Zstd multithreaded output can depend on number of threads" #2327 (closed)

codicodi mentioned this issue Oct 7, 2020 in "mkinitcpio: Add support for the zstd compressor" archlinux/mkinitcpio#35 (merged)
