On 26/03/24 10:41, Sirius via devel wrote:
In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth:
> In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth:
>> On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
>> devel(a)lists.fedoraproject.org> wrote:
>>
>>> Aoife Moloney wrote:
>>>> The zstd compression type was chosen to match createrepo_c settings.
>>>> As an alternative, we might want to choose xz,
>>> Since xz consistently compresses better than zstd, I would strongly
>>> suggest
>>> using xz everywhere to minimize download sizes. However:
>>>
>>>> especially after zlib-ng has been made the default in Fedora and brought
>>>> performance improvements.
>>> zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
>>> (which is mostly due to the format, so, while some implementations manage
>>> to
>>> do better than others at the expense of more compression time, there is a
>>> limit to how well they can do and it is nowhere near xz or even zstd) and
>>> should hence never be used at all.
>>>
>>>
>> There are two parts to this which users will see as 'slowness'. Part one
>> is downloading the data from a mirror. Part two is uncompressing the data. In
>> work I have been a part of, we have found that while xz gave us much
>> smaller files, the time to uncompress was so much larger that our download
>> gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
>> but uncompressed much faster than xz. This is data dependent though so it
>> would be good to see if someone could test to see if xz uncompression of
>> the datafiles will be too slow.
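The trade-off described above (smaller download vs. slower decompression) can be sketched with a bit of arithmetic. This is purely illustrative: the bandwidth figure and the per-format sizes and timings below are assumptions, not measurements from this thread.

```shell
#!/bin/sh
# Illustrative only: total user-visible time = download time + decompress
# time. All numbers here are hypothetical, chosen to show how a smaller
# file can still lose overall if decompression is slow enough.
BANDWIDTH_MIBS=10   # assumed download speed in MiB/s

total_time() {
    size_mib=$1; decomp_s=$2
    awk -v s="$size_mib" -v d="$decomp_s" -v bw="$BANDWIDTH_MIBS" \
        'BEGIN { printf "%.1f\n", s / bw + d }'
}

echo "xz:   $(total_time 50 12) s"   # smaller file, slower decompress
echo "zstd: $(total_time 58 1) s"    # ~15% bigger, much faster decompress
```

At a slower assumed bandwidth the smaller file wins again, which is why the answer is data- and audience-dependent.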
> Hi there,
>
> Ran tests with gzip 1-9 and xz 1-9 on a F41 XML file that was 940MiB.
Added tests with zstd 1-19, without using a dictionary to improve the
ratio any further.
Input File: f41-filelist.xml, Size: 985194446 bytes
ZStd Level 1, 1.7s to compress, 6.46% file size, 0.6s decompress
ZStd Level 2, 1.7s to compress, 6.34% file size, 0.7s decompress
ZStd Level 3, 2.1s to compress, 6.26% file size, 0.7s decompress
ZStd Level 4, 2.3s to compress, 6.26% file size, 0.7s decompress
ZStd Level 5, 5.7s to compress, 5.60% file size, 0.6s decompress
ZStd Level 6, 7.2s to compress, 5.42% file size, 0.6s decompress
ZStd Level 7, 8.1s to compress, 5.39% file size, 0.6s decompress
ZStd Level 8, 9.5s to compress, 5.31% file size, 0.6s decompress
ZStd Level 9, 10.4s to compress, 5.28% file size, 0.6s decompress
ZStd Level 10, 13.6s to compress, 5.26% file size, 0.6s decompress
ZStd Level 11, 18.4s to compress, 5.25% file size, 0.6s decompress
ZStd Level 12, 19.5s to compress, 5.25% file size, 0.6s decompress
ZStd Level 13, 30.9s to compress, 5.25% file size, 0.6s decompress
ZStd Level 14, 39.7s to compress, 5.23% file size, 0.6s decompress
ZStd Level 15, 56.1s to compress, 5.21% file size, 0.6s decompress
ZStd Level 16, 1min58s to compress, 5.52% file size, 0.7s decompress
ZStd Level 17, 2min25s to compress, 5.36% file size, 0.7s decompress
ZStd Level 18, 3min46s to compress, 5.43% file size, 0.8s decompress
ZStd Level 19, 10min36s to compress, 4.66% file size, 0.7s decompress
So to save 5.2MB in file size (lvl19 vs lvl15) the server has to spend
eleven times longer compressing the file (and I did not look at resources
like CPU or RAM while doing this). I am sure there are other compression
mechanisms that can squeeze these files a bit further, but at what cost?
If it is a once-a-day event, maybe a high compression ratio is
justifiable. If it has to happen hundreds of times per day - not so much.
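A quick back-of-envelope check of that trade-off, using the level-19 vs level-15 numbers measured above (roughly 580 extra seconds of compression for 5.2MiB saved per download). The downloads-per-rebuild figure is an assumption for the sake of the example:

```shell
#!/bin/sh
# Break-even sketch for zstd level 19 vs level 15, using the figures
# measured in this thread. DOWNLOADS_PER_RUN is an assumed value.
EXTRA_COMPRESS_S=580      # ~10min36s (lvl19) minus ~56s (lvl15)
SAVED_MIB=5.2             # size difference per download
DOWNLOADS_PER_RUN=100000  # assumed downloads before the next rebuild

awk -v n="$DOWNLOADS_PER_RUN" -v s="$SAVED_MIB" -v t="$EXTRA_COMPRESS_S" \
    'BEGIN { printf "%.0f MiB of transfer saved for %d extra CPU-seconds\n",
             n * s, t }'
```

With many downloads per rebuild the extra CPU time amortises well; with few, it does not.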
## zstd
# Expects INPUTFILE and INPUTFILESIZE to be set by the caller.
function do_zstd()
{
    let cl=1
    echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
    echo
    while [[ $cl -le 19 ]]
    do
        echo "ZStd compression level ${cl}"
        echo "Time to compress the file"
        time zstd -z -${cl} "${INPUTFILE}"
        COMPRESSED_SIZE=$(ls -ln "${INPUTFILE}.zst" | awk '{print $5}')
        echo "Compressed to"
        echo "scale=5
${COMPRESSED_SIZE}/${INPUTFILESIZE}*100" | bc
        echo "% of original"
        echo "Time to decompress the file, output to /dev/null"
        time zstd -d -c "${INPUTFILE}.zst" > /dev/null
        rm -f "${INPUTFILE}.zst"
        let cl=$cl+1
        echo
    done
}
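The function above relies on INPUTFILE and INPUTFILESIZE being set beforehand. A minimal setup might look like this; the file name and contents here are stand-ins (the real run used f41-filelist.xml, 985194446 bytes):

```shell
#!/bin/sh
# Example setup for do_zstd: create a stand-in input file and record
# its size. Uses GNU coreutils stat (-c %s); BSD stat spells it -f %z.
INPUTFILE=sample.xml
head -c 1048576 /dev/urandom > "${INPUTFILE}"   # 1 MiB of throwaway data
INPUTFILESIZE=$(stat -c %s "${INPUTFILE}")
echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
# ...then call do_zstd
```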
--
Kind regards,
/S
--
Also note that adding '-T0' to use all available cores of the CPU will
greatly speed up the results with zstd.
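A small round-trip sketch of the -T0 flag, assuming the zstd CLI is installed. Note that -T0 parallelises compression only; decompressing a single stream remains essentially single-threaded:

```shell
#!/bin/sh
# Multithreaded zstd compression: -T0 uses all available cores.
head -c 4194304 /dev/zero > sample.dat          # 4 MiB test input
zstd -q -19 -T0 sample.dat -o sample.dat.zst    # level 19, all cores
zstd -q -d sample.dat.zst -o sample.out
cmp sample.dat sample.out && echo "roundtrip OK"
rm -f sample.dat sample.dat.zst sample.out
```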
However, for all this talk about the optimal compression level, in the
end there is no way to set it in the createrepo_c options, so.... ;-)
Mattia