On 6/5/19 12:53 AM, Chris Murphy wrote:
On Mon, Jun 3, 2019 at 7:01 PM Jason L Tibbitts III
<tibbs(a)math.uh.edu> wrote:
>
>>>>>> "PM" == Panu Matilainen <pmatilai(a)redhat.com> writes:
>
> PM> Note that rpm doesn't support parallel zstd compression, and while
> PM> it does for xz, that's not even utilized in Fedora.
>
> Doing parallel xz compression has a surprising cost in compression ratio
> which gets worse as the thread count increases (because it just splits
> the input into independent blocks and compresses them separately). I
> did start on a feature to have it enabled but then abandoned that after
> realizing that it didn't really work as I'd hoped.
Which is also why parallel xz compression doesn't produce reproducible results.
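
To illustrate the block-splitting cost, here is a minimal sketch using
Python's lzma module. It is an approximation, not what xz -T does
internally: each chunk is written as a separate .xz stream here, and the
sample data, block counts and function names are made up for the
example.

```python
import lzma
import random


def make_sample(size=400_000, seed=0):
    """Build deterministic, moderately redundant bytes (hypothetical stand-in data)."""
    rng = random.Random(seed)
    # A fixed vocabulary of random lowercase "words" that repeat throughout.
    words = [
        bytes(rng.choices(range(97, 123), k=rng.randint(3, 10)))
        for _ in range(2000)
    ]
    out = bytearray()
    while len(out) < size:
        out += rng.choice(words) + b" "
    return bytes(out[:size])


def single_stream_size(data):
    """Compressed size when the whole input is one xz stream."""
    return len(lzma.compress(data, preset=6))


def independent_blocks_size(data, blocks):
    """Total compressed size when the input is cut into `blocks` chunks,
    each compressed on its own with no shared history."""
    step = -(-len(data) // blocks)  # ceiling division
    return sum(
        len(lzma.compress(data[i:i + step], preset=6))
        for i in range(0, len(data), step)
    )


if __name__ == "__main__":
    data = make_sample()
    print("single stream:", single_stream_size(data))
    for n in (2, 4, 8, 16):
        print(n, "blocks:", independent_blocks_size(data, n))
```

On this kind of input the split totals should come out larger than the
single stream, since every block has to relearn the same redundancy from
scratch.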
> That said, I do wonder how difficult it would be to do parallel zstd
> compression/decompression within RPM. If it were possible then that
> might help to obviate some of the downsides.
At least for small files, and there are many in any distribution,
a dictionary could well improve compression and decompression time,
as well as compression ratio, more than threads would. Adding dictionary
support would help all the single-threaded hardware, and even the
builders when the zstd -T0 option determines there are only 1 or 2
threads available. On the generic sample set, it's functionally like
getting 4 threads in terms of speed, and the compression ratio even goes
up by ~3x. But I have no idea how that sample set compares to Fedora's
files.
Yes, but as I mentioned in another email, rpm doesn't compress the files
individually, it compresses them as one big continuous archive. A
dictionary is unlikely to help with that (in my quick test yesterday it
actually made things worse).
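
A rough way to see the difference between the two cases, as a sketch
only: this uses zlib's preset dictionary (zdict) as a stand-in, not
zstd and not a trained zstd dictionary, and the file contents, the
dictionary and the function names are invented for the example. It says
nothing about real Fedora payloads.

```python
import zlib

# Hypothetical boilerplate shared by many small files; also used as the
# preset dictionary below.
BOILERPLATE = (
    b"[Unit]\nDescription=Example service\nAfter=network.target\n\n"
    b"[Service]\nType=simple\nRestart=on-failure\n"
)


def make_files(n=200):
    """Generate n small, similar files (made-up content)."""
    return [
        BOILERPLATE + b"ExecStart=/usr/bin/example-%d --id=%d\n" % (i, i)
        for i in range(n)
    ]


def deflate(data, zdict=None):
    """Compress with zlib level 9, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(9)
    else:
        comp = zlib.compressobj(9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def compare(files, zdict=BOILERPLATE):
    """Compressed sizes for per-file vs. one-stream, with and without a dictionary."""
    joined = b"".join(files)
    return {
        "per_file_plain": sum(len(deflate(f)) for f in files),
        "per_file_dict": sum(len(deflate(f, zdict)) for f in files),
        "one_stream_plain": len(deflate(joined)),
        "one_stream_dict": len(deflate(joined, zdict)),
    }


if __name__ == "__main__":
    for name, size in compare(make_files()).items():
        print(name, size)
```

When each file is compressed on its own, the dictionary supplies the
shared content that every tiny input would otherwise lack. In one
continuous stream the earlier files already play that role, so a
dictionary can at best help with the very start of the archive.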
- Panu -