On 26/03/24 10:41, Sirius via devel wrote:
In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth:
> In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth:
>> On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
>> devel(a)lists.fedoraproject.org> wrote:
>>
>>> Aoife Moloney wrote:
>>>> The zstd compression type was chosen to match createrepo_c settings.
>>>> As an alternative, we might want to choose xz,
>>> Since xz consistently compresses better than zstd, I would strongly
>>> suggest
>>> using xz everywhere to minimize download sizes. However:
>>>
>>>> especially after zlib-ng has been made the default in Fedora and brought
>>>> performance improvements.
>>> zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
>>> (which is mostly due to the format, so, while some implementations manage
>>> to
>>> do better than others at the expense of more compression time, there is a
>>> limit to how well they can do and it is nowhere near xz or even zstd) and
>>> should hence never be used at all.
>>>
>>>
>> There are two parts to this which users will see as 'slowness'. Part one
>> is downloading the data from a mirror. Part two is uncompressing the data. In
>> work I have been a part of, we have found that while xz gave us much
>> smaller files, the time to uncompress was so much larger that our download
>> gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
>> but uncompressed much faster than xz. This is data dependent though so it
>> would be good to see if someone could test to see if xz uncompression of
>> the datafiles will be too slow.
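The trade-off described above (smaller download vs. slower decompression) can be sketched with a bit of arithmetic. This is purely illustrative: the bandwidth figure and the per-format sizes and timings below are assumptions, not measurements from this thread.

```shell
#!/bin/sh
# Illustrative only: total user-visible time = download time + decompress
# time. All numbers here are hypothetical, chosen to show how a smaller
# file can still lose overall if decompression is slow enough.
BANDWIDTH_MIBS=10   # assumed download speed in MiB/s

total_time() {
    size_mib=$1; decomp_s=$2
    awk -v s="$size_mib" -v d="$decomp_s" -v bw="$BANDWIDTH_MIBS" \
        'BEGIN { printf "%.1f\n", s / bw + d }'
}

echo "xz:   $(total_time 50 12) s"   # smaller file, slower decompress
echo "zstd: $(total_time 58 1) s"    # ~15% bigger, much faster decompress
```

At a slower assumed bandwidth the smaller file wins again, which is why the answer is data- and audience-dependent.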
> Hi there,
>
> Ran tests with gzip 1-9 and xz 1-9 on a F41 XML file that was 940MiB.
Added tests with zstd 1-19, without using a dictionary to improve the
ratio any further.
Input File: f41-filelist.xml, Size: 985194446 bytes
ZStd Level 1, 1.7s to compress, 6.46% file size, 0.6s decompress
ZStd Level 2, 1.7s to compress, 6.34% file size, 0.7s decompress
ZStd Level 3, 2.1s to compress, 6.26% file size, 0.7s decompress
ZStd Level 4, 2.3s to compress, 6.26% file size, 0.7s decompress
ZStd Level 5, 5.7s to compress, 5.60% file size, 0.6s decompress
ZStd Level 6, 7.2s to compress, 5.42% file size, 0.6s decompress
ZStd Level 7, 8.1s to compress, 5.39% file size, 0.6s decompress
ZStd Level 8, 9.5s to compress, 5.31% file size, 0.6s decompress
ZStd Level 9, 10.4s to compress, 5.28% file size, 0.6s decompress
ZStd Level 10, 13.6s to compress, 5.26% file size, 0.6s decompress
ZStd Level 11, 18.4s to compress, 5.25% file size, 0.6s decompress
ZStd Level 12, 19.5s to compress, 5.25% file size, 0.6s decompress
ZStd Level 13, 30.9s to compress, 5.25% file size, 0.6s decompress
ZStd Level 14, 39.7s to compress, 5.23% file size, 0.6s decompress
ZStd Level 15, 56.1s to compress, 5.21% file size, 0.6s decompress
ZStd Level 16, 1min58s to compress, 5.52% file size, 0.7s decompress
ZStd Level 17, 2min25s to compress, 5.36% file size, 0.7s decompress
ZStd Level 18, 3min46s to compress, 5.43% file size, 0.8s decompress
ZStd Level 19, 10min36s to compress, 4.66% file size, 0.7s decompress
So to save 5.2MB in file size (lvl19 vs lvl15) the server has to spend
eleven times longer compressing the file (and I did not look at resources
like CPU or RAM while doing this). I am sure there are other compression
mechanisms that can squeeze these files a bit further, but at what cost?
If it is a once-a-day event, maybe a high compression ratio is
justifiable. If it has to happen hundreds of times per day - not so much.
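A quick back-of-envelope check of that trade-off, using the level-19 vs level-15 numbers measured above (roughly 580 extra seconds of compression for 5.2MiB saved per download). The downloads-per-rebuild figure is an assumption for the sake of the example:

```shell
#!/bin/sh
# Break-even sketch for zstd level 19 vs level 15, using the figures
# measured in this thread. DOWNLOADS_PER_RUN is an assumed value.
EXTRA_COMPRESS_S=580      # ~10min36s (lvl19) minus ~56s (lvl15)
SAVED_MIB=5.2             # size difference per download
DOWNLOADS_PER_RUN=100000  # assumed downloads before the next rebuild

awk -v n="$DOWNLOADS_PER_RUN" -v s="$SAVED_MIB" -v t="$EXTRA_COMPRESS_S" \
    'BEGIN { printf "%.0f MiB of transfer saved for %d extra CPU-seconds\n",
             n * s, t }'
```

With many downloads per rebuild the extra CPU time amortises well; with few, it does not.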
## zstd
# Expects INPUTFILE and INPUTFILESIZE to be set by the caller.
function do_zstd()
{
    let cl=1
    echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
    echo
    while [[ $cl -le 19 ]]
    do
        echo "ZStd compression level ${cl}"
        echo "Time to compress the file"
        time zstd -z -${cl} "${INPUTFILE}"
        COMPRESSED_SIZE=$(ls -ln "${INPUTFILE}.zst" | awk '{print $5}')
        echo "Compressed to"
        echo "scale=5
${COMPRESSED_SIZE}/${INPUTFILESIZE}*100" | bc
        echo "% of original"
        echo "Time to decompress the file, output to /dev/null"
        time zstd -d -c "${INPUTFILE}.zst" > /dev/null
        rm -f "${INPUTFILE}.zst"
        let cl=$cl+1
        echo
    done
}
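The function above relies on INPUTFILE and INPUTFILESIZE being set beforehand. A minimal setup might look like this; the file name and contents here are stand-ins (the real run used f41-filelist.xml, 985194446 bytes):

```shell
#!/bin/sh
# Example setup for do_zstd: create a stand-in input file and record
# its size. Uses GNU coreutils stat (-c %s); BSD stat spells it -f %z.
INPUTFILE=sample.xml
head -c 1048576 /dev/urandom > "${INPUTFILE}"   # 1 MiB of throwaway data
INPUTFILESIZE=$(stat -c %s "${INPUTFILE}")
echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
# ...then call do_zstd
```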
--
Kind regards,
/S
--
Also note that adding '-T0' to use all available cores of the CPU will
greatly speed up the results with zstd.
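A small round-trip sketch of the -T0 flag, assuming the zstd CLI is installed. Note that -T0 parallelises compression only; decompressing a single stream remains essentially single-threaded:

```shell
#!/bin/sh
# Multithreaded zstd compression: -T0 uses all available cores.
head -c 4194304 /dev/zero > sample.dat          # 4 MiB test input
zstd -q -19 -T0 sample.dat -o sample.dat.zst    # level 19, all cores
zstd -q -d sample.dat.zst -o sample.out
cmp sample.dat sample.out && echo "roundtrip OK"
rm -f sample.dat sample.dat.zst sample.out
```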
However, for all this talk about the optimal compression level, in the
end there is no way to set it in the createrepo_c options, so.... ;-)
Mattia