On Wed, May 29, 2019 at 2:20 PM Ben Cotton <bcotton(a)redhat.com> wrote:
https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression
= Switch RPMs to zstd compression =
== Summary ==
Binary RPMs are currently compressed with xz level 2.
Switching to zstd would increase decompression speed significantly.
Arch has been discussing this change also, with more elaborate test
results. This is the most recent table including --ultra flag to
unlock level 20+
https://lists.archlinux.org/pipermail/arch-dev-public/2019-March/029542.html
The first post
https://lists.archlinux.org/pipermail/arch-dev-public/2019-March/029520.html
* The recommended compression level is 19. The builds will take
longer, but the additional compression time is negligible in the total
build time and it pays off in better compression ratio than xz lvl2
has.
Even without dictionary, -T0 will outperform xz. By how much depends
on the resources available at the time.
Phase 2 (it's out of scope for this feature): If you can figure out a
way to leverage the zstd (training) dictionary feature, that would
increase compression ratios, reduce compression time, as well as
decompression speed. The gotcha is the dictionary must be specified
for both compression and decompression. So you'd need a way for the
RPM metadata to reference a dictionary version, and package the
dictionary with RPM to make sure it's available. You only need one
version, but if future training demonstrates a significant
improvement, you'd want a way to deploy multiple dictionaries, and
differentiate which was used to compress an RPM since packages could
be made with either or none.
https://github.com/facebook/zstd See "the case for small data compression"
== Benefit to Fedora ==
* Faster installations/upgrades of user systems
* Faster koji builds (installations in build roots)
* Faster container builds
* Lower bandwidth on mirrors if we choose the highest compression level
Likely also faster openqa installations and testing.
== How To Test ==
* dnf install <package>
* rpm -q --qf "%{PAYLOADCOMPRESSOR} %{PAYLOADFLAGS}\n" <package>
* expected output: zstd 19
Someone building an RPM locally for local use (or within their
organization) shouldn't get hit with level 19 compression time and
memory requirements. They're probably alright with just the default,
level 3. That's way faster than xz, and compresses better than any of
the zips.
Is there a way to configure different defaults, either on the command
line or with a configuration file? If you don't want to expose all of
the zstd options, even coming up with your own mapping/grouping is
useful: faster=3, better=20 And at some future date, both of them can
use the latest version dictionary automatically.
--
Chris Murphy