Summary
Improve compression ratio of the SquashFS filesystem on the installation media.
Owner
Name: Bohdan Khomutskyi
Email: bkhomuts@redhat.com
Current status
Targeted release: I propose this change for Fedora 32
Last updated: Jan 5 2020
Pagure.io issue: https://pagure.io/releng/issue/9127
I was unable to create an article in the Fedora wiki system.
Detailed Description
As of Fedora 31, the LiveOS/squashfs.img file on the installation image is compressed with the default settings of mksquashfs. The standard configuration uses the XZ algorithm with a block size of 128k and the BCJ filter enabled. Those parameters can be adjusted, which will lead to a better compression ratio and/or a reduction of CPU usage at build time.
This is simple to achieve. Lorax has recently gained support[1] for adjusting the compression options passed to mksquashfs via its configuration file. The file should be altered as follows:
[compression]
bcj = yes
args = -b 1M -Xdict-size 1M -no-recovery
Where -b 1M and -Xdict-size 1M are the block size and dictionary size respectively; both can be adjusted.
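For reference, this corresponds roughly to the following direct mksquashfs invocation (paths are illustrative; in practice Lorax passes these options itself, and bcj = yes translates to -Xbcj x86 on x86_64):

  mksquashfs /path/to/install-root LiveOS/squashfs.img \
      -comp xz -Xbcj x86 -b 1M -Xdict-size 1M -no-recovery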
Benefit to Fedora
- Reduction of the installation media size, and of the cost of storing and distributing Fedora.
- Reduction of CPU usage at build time, depending on which compression parameters are chosen.
- See graphical details at https://pagure.io/releng/issue/9127.
Scope
- Proposal owners: The build environment should have support for adjusting the Lorax configuration file. Lorax is the program that produces the LiveOS/squashfs.img file on the installation media. One way to allow such customization is to add a feature to Pungi that passes the -c option to Lorax (see the illustrative invocation after this list).
- Release engineering: #9127 https://pagure.io/releng/issue/9127
- Policies and guidelines: N/A
- Trademark approval: N/A
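For illustration, a compose could invoke Lorax with a custom configuration file along these lines (the config path, repository URL and output directory are placeholders; the missing piece is the Pungi plumbing to pass -c through):

  lorax -c /etc/lorax/squashfs-tuning.conf \
      -p Fedora -v 32 -r 32 \
      -s https://example.org/fedora/32/Everything/x86_64/os/ \
      /var/tmp/lorax-output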
Upgrade/compatibility impact
- This change comes at the cost of higher memory usage during installation. Based on my estimations, this should not be an issue, since decompression should require up to 1 MiB per thread.
User Experience
- Increasing the block size with the current EXT4-based configuration should increase latency while accessing the EXT4 filesystem. The exact impact is to be evaluated.
- The latency impact will be reduced if the plain SquashFS option is chosen.
Dependencies
- N/A
Contingency Plan
- N/A
Documentation
https://pagure.io/releng/issue/9127
mksquashfs(1)
Release Notes
See also
https://pagure.io/releng/issue/8646
--
Bohdan Khomutskyi
Release Configuration Management Engineer, Red Hat
On Sun, Jan 5, 2020 at 5:24 AM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
Summary
Improve compression ratio of SquashFS filesystem on the installation media.
On the issues of Fedora ISOs using excessive CPU, related to lzma decompression: https://bugzilla.redhat.com/show_bug.cgi?id=1717728 https://pagure.io/releng/issue/8581
Create images using plain squashfs (without nested ext4) https://pagure.io/releng/issue/8646
And koji issue to enhance it so it can accept configurable rootfs types (plain squashfs and configurable compression) https://pagure.io/koji/issue/1622
I'm wondering if you can relate this feature proposal to those issues and feature requests?
In my testing, xz does provide better compression ratios, well suited for seldom used images like archives. But it really makes the installation experience worse by soaking the CPU, times thousands of installations (openQA tests on every single nightly, every human QA tester for nightlies, betas, and then the final released product used by Fedora end users).
Has zstandard been evaluated? In my testing of images compressed with zstd, the CPU hit is cut by more than 50%, and is no longer a bottleneck during installations. Image size does increase, although I haven't tested mksquashfs block size higher than 256K. Using zstd with Fedora images also builds on prior evaluation, testing, and effort moving RPM from xz to zstd.
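For anyone who wants to reproduce that comparison, one rough way is to unpack the existing image and recompress it with zstd (file names are illustrative; -Xcompression-level is the mksquashfs option for zstd, assuming squashfs-tools built with zstd support):

  unsquashfs -d rootfs LiveOS/squashfs.img
  mksquashfs rootfs squashfs-zstd.img -comp zstd -Xcompression-level 15 -b 256K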
My testing with mksquashfs block size suggests compression ratio improves but latency gets worse, and becomes somewhat pathological with a nested ext4 in it: my best guess is the random-access nature of ext4, where many 4KiB seeks turn into larger 128KiB seeks; and also squashfs and ext4 probably have different localities (where data is placed in relation to their metadata, in an attempt to optimize). Dropping the nested ext4 image also improved performance quite a bit, independent of compression algorithm. I forget how much exactly but it may be ~30%.
I've pretty much concluded Fedora is best off dropping the nested ext4 in favor of plain squashfs, and using zstd. It's not required to do both, but the benefit is additive and significant. The work in dracut and lorax to support plain squashfs, assembling it using overlayfs instead of device-mapper is already done, and tested.
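As a simplified sketch (the real dracut module also handles device discovery and persistent overlays), the plain-squashfs assembly boils down to something like:

  # read-only squashfs root plus a tmpfs-backed writable overlay
  mount -t squashfs -o loop,ro LiveOS/squashfs.img /run/rootfsbase
  mount -t tmpfs tmpfs /run/overlayfs
  mkdir -p /run/overlayfs/upper /run/overlayfs/work
  mount -t overlay overlay \
      -o lowerdir=/run/rootfsbase,upperdir=/run/overlayfs/upper,workdir=/run/overlayfs/work \
      /sysroot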
On Sun, Jan 05, 2020 at 10:08:07AM -0700, Chris Murphy wrote:
I've pretty much concluded Fedora is best off dropping the nested ext4 in favor of plain squashfs, and using zstd. It's not required to do both, but the benefit is additive and significant. The work in dracut and lorax to support plain squashfs, assembling it using overlayfs instead of device-mapper is already done, and tested.
I agree with Chris here, I think we should make the switch to plain squashfs unless someone can come up with something dramatic that it will break :) Tweaking the current settings would be fine if we didn't have a better, simpler solution.
A side note about the xz bcj compression -- in some experiments I noticed that enabling x86 and armthumb resulted in further reduction (about 400k with the default block size). My guess was due to use of ARM instructions in the firmware blobs.
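For reference, mksquashfs takes the BCJ filters as a comma-separated list, so that experiment looks roughly like this (file names are illustrative):

  mksquashfs rootfs squashfs.img -comp xz -Xbcj x86,armthumb -b 128K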
Brian C. Lane wrote:
I agree with Chris here, I think we should make the switch to plain squashfs unless someone can come up with something dramatic that it will break :)
Does SquashFS support all the advanced features that are needed, such as extended attributes (used at least by SELinux), file system capabilities, etc.?
Kevin Kofler
On Tue, Jan 07, 2020 at 09:56:21AM +0100, Kevin Kofler wrote:
Brian C. Lane wrote:
I agree with Chris here, I think we should make the switch to plain squashfs unless someone can come up with something dramatic that it will break :)
Does SquashFS support all the advanced features that are needed, such as extended attributes (used at least by SELinux), file system capabilities, etc.?
Yes, according to the manpage it supports xattrs.
On Mon, 2020-01-06 at 16:35 -0800, Brian C. Lane wrote:
On Sun, Jan 05, 2020 at 10:08:07AM -0700, Chris Murphy wrote:
I've pretty much concluded Fedora is best off dropping the nested ext4 in favor of plain squashfs, and using zstd. It's not required to do both, but the benefit is additive and significant. The work in dracut and lorax to support plain squashfs, assembling it using overlayfs instead of device-mapper is already done, and tested.
I agree with Chris here, I think we should make the switch to plain squashfs unless someone can come up with something dramatic that it will break :) Tweaking the current settings would be fine if we didn't have a better, simpler solution.
A side note about the xz bcj compression -- in some experiments I noticed that enabling x86 and armthumb resulted in further reduction (about 400k with the default block size). My guess was due to use of ARM instructions in the firmware blobs.
Also, does squashfs support zstd compression? IIRC the speedups in compression and decompression speed we got for RPMs[0] with zstd were pretty nice, so having the same for the media would be nice as well. :)
[0] https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression
-- Brian C. Lane (PST8PDT) - weldr.io - lorax - parted - pykickstart
On Tue, Jan 7, 2020 at 11:07 AM Martin Kolman mkolman@redhat.com wrote:
On Mon, 2020-01-06 at 16:35 -0800, Brian C. Lane wrote:
On Sun, Jan 05, 2020 at 10:08:07AM -0700, Chris Murphy wrote:
I've pretty much concluded Fedora is best off dropping the nested ext4 in favor of plain squashfs, and using zstd. It's not required to do both, but the benefit is additive and significant. The work in dracut and lorax to support plain squashfs, assembling it using overlayfs instead of device-mapper is already done, and tested.
I agree with Chris here, I think we should make the switch to plain squashfs unless someone can come up with something dramatic that it will break :) Tweaking the current settings would be fine if we didn't have a better, simpler solution.
A side note about the xz bcj compression -- in some experiments I noticed that enabling x86 and armthumb resulted in further reduction (about 400k with the default block size). My guess was due to use of ARM instructions in the firmware blobs.
Also, does squashfs support zstd compression?
Yes, that's what I was referring to in the first sentence quoted above.
On Tue, 2020-01-07 at 11:20 -0700, Chris Murphy wrote:
On Tue, Jan 7, 2020 at 11:07 AM Martin Kolman mkolman@redhat.com wrote:
On Mon, 2020-01-06 at 16:35 -0800, Brian C. Lane wrote:
On Sun, Jan 05, 2020 at 10:08:07AM -0700, Chris Murphy wrote:
I've pretty much concluded Fedora is best off dropping the nested ext4 in favor of plain squashfs, and using zstd. It's not required to do both, but the benefit is additive and significant. The work in dracut and lorax to support plain squashfs, assembling it using overlayfs instead of device-mapper is already done, and tested.
I agree with Chris here, I think we should make the switch to plain squashfs unless someone can come up with something dramatic that it will break :) Tweaking the current settings would be fine if we didn't have a better, simpler solution.
A side note about the xz bcj compression -- in some experiments I noticed that enabling x86 and armthumb resulted in further reduction (about 400k with the default block size). My guess was due to use of ARM instructions in the firmware blobs.
Also, does squashfs support zstd compression?
Yes, that's what I was referring to in the first sentence quoted above.
Oh right, now I see it. :D In any case, nice! :)
On Tue, 7 Jan 2020 at 18:07, Martin Kolman mkolman@redhat.com wrote:
IIRC the speedups in compression and decompression speed we got for RPMs[0] with zstd were pretty nice
If it helps the argument, at the moment 99.7% of the time building the AppStream metadata is spent decompressing the RPMs. If zstd helps with that *at all*, I'm a huge proponent.
Richard
Chris Murphy wrote:
Has zstandard been evaluated? In my testing of images compressed with zstd, the CPU hit is cut by more than 50%, and is no longer a bottleneck during installations. Image size does increase, although I haven't tested mksquashfs block size higher than 256K.
I think increasing the size of the live images, also affecting the download time and the time to write the image to media (even USB sticks are not instant), to get a one-time installation speedup is a very bad tradeoff.
Kevin Kofler
On Tue, Jan 7, 2020 at 10:01 AM Kevin Kofler kevin.kofler@chello.at wrote:
Chris Murphy wrote:
Has zstandard been evaluated? In my testing of images compressed with zstd, the CPU hit is cut by more than 50%, and is no longer a bottleneck during installations. Image size does increase, although I haven't tested mksquashfs block size higher than 256K.
I think increasing the size of the live images, also affecting the download time and the time to write the image to media (even USB sticks are not instant), to get a one-time installation speedup is a very bad tradeoff.
Well for the general user, everything is one-time. One download, one write to USB, one install. Saving a minute in one step and adding it to a different step doesn't really matter, it's the same sum overall (unless you pay considerable money for the extra downloaded data, of course). Where it matters is when you do a high number of operations. And those operations are likely to be installations. Either in some school lab, where you install 20 machines, or, in our very specific example, when you perform automated testing/CI as part of the release process, and perform tens or hundreds of installations every single day. The time difference (and CPU usage difference) saved during installation gets really noticeable in such cases.
Kamil Paral wrote:
Well for the general user, everything is one-time. One download, one write to USB, one install. Saving a minute in one step and adding it to a different step doesn't really matter, it's the same sum overall (unless you pay considerable money for the extra downloaded data, of course).
But the larger download will take several minutes extra even on a low-end "broadband" connection. On slower connections, which are still standard in parts of the world, it will take hours longer.
Kevin Kofler
On Tue, Jan 7, 2020 at 4:21 PM Kevin Kofler kevin.kofler@chello.at wrote:
Kamil Paral wrote:
Well for the general user, everything is one-time. One download, one write to USB, one install. Saving a minute in one step and adding it to a different step doesn't really matter, it's the same sum overall (unless you pay considerable money for the extra downloaded data, of course).
But the larger download will take several minutes extra even on a low-end "broadband" connection. On slower connections, which are still standard in parts of the world, it will take hours longer.
Sorry, but your argument is just wrong. We've had a similar discussion regarding RPM payload compression, so we know we're talking about a small percentage increase by changing the compression, if any. That means a few tens of MBs for e.g. the Workstation image. And if you spend *hours* downloading the extra 50 MBs, that means you're on a dial-up connection and the whole image would take you a *week* to download. This whole example is simply unrealistic. Anyone who has a problem downloading an extra 50-100 MBs can hardly use Fedora at all, because even the first dnf metadata update will consume exactly this amount of data, and then they will be presented with 1 GB worth of system updates.
Not to mention we're talking here about removing the nested ext4 filesystem, which is likely to *reduce* the image size (and combined with changing the compression type can equal out to no change at all).
It makes no sense to pre-emptively hate the discussed changes. Let's try them and then discuss the cost/benefit of the output with actual numbers in hand.
It's untenable to consider ISO size alone. It is a legitimate concern, but it can't be reasonable to soak every single CPU, times thousands. You're willing to exchange less download time for longer install time and higher energy demand, but there are quite a lot of other uses occurring that are relevant.
I don't have an immediate breakdown but RPM switching from xz to zstd increased RPM sizes somewhat, but the installation performance makes up for the extra download time.
-- Chris Murphy
Chris Murphy wrote:
It's untenable to consider ISO size alone. It is a legitimate concern, but it can't be reasonable to soak every single CPU, times thousands. You're willing to exchange less download time for longer install time and higher energy demand, but there are quite a lot of other uses occurring that are relevant.
Downloads also require energy, on the client, on the server, on the intermediate hops, and even for the actual information transmission in the cables. So I am not convinced that you are going to save any energy by making the image significantly larger just so that it is faster to decompress.
(As for the time factor, I already explained how that does not compute either, at least in large, less-privileged parts of the world.)
Kevin Kofler
On Tue, Jan 7, 2020 at 9:00 AM Kevin Kofler kevin.kofler@chello.at wrote:
I think increasing the size of the live images, also affecting the download time and the time to write the image to media (even USB sticks are not instant), to get a one-time installation speedup is a very bad tradeoff.
While not exactly the same, the measured increase in size by the Arch community for their packaging by moving from xz to zstd was ~0.8% (and gaining a huge reduction in CPU utilization at the decompress end).
If those (approximate) numbers hold for this use case (someone would clearly have to test to confirm or refute those numbers) I would have to suggest that is likely a good tradeoff that is worth further consideration.
On Tue, Jan 7, 2020 at 9:58 AM Gary Buhrmaster gary.buhrmaster@gmail.com wrote:
On Tue, Jan 7, 2020 at 9:00 AM Kevin Kofler kevin.kofler@chello.at wrote:
I think increasing the size of the live images, also affecting the download time and the time to write the image to media (even USB sticks are not instant), to get a one-time installation speedup is a very bad tradeoff.
While not exactly the same, the measured increase in size by the Arch community for their packaging by moving from xz to zstd was ~0.8% (and gaining a huge reduction in CPU utilization at the decompress end).
If those (approximate) numbers hold for this use case (someone would clearly have to test to confirm or refute those numbers) I would have to suggest that is likely a good tradeoff that is worth further consideration.
Even at 8% bigger it would be worth it. And probably 16%.
Gaining additional features, like on-the-fly checksumming, is worth considering (at least not making it harder to implement in the future, by taking it into account in the work implied in this proposal). The monolithic ISO check is terrible. It's dog slow. It's optional. And it's a one-time check. Typically real optical media tends to work or persistently fail, whereas USB sticks can have transient bad reads (explicit or silently corrupt).
Stacked images on the same media functionality is in the kernel, it's not complicated, it's well tested, doesn't require any gymnastics in the initramfs - your bootloader entries can each point to different root=UUIDs and image assembly is figured out entirely in kernel code, no special handling in the client side deliverable. Yes the image creator needs to know some things to achieve this.
Why stacked images? Consider a single base.img that's maybe 1G, and now you don't have to do separate composes for server, cloud, GNOME, KDE, Cinnamon, LXQt, Astronomy that repeat a lot of the same steps, including expensive steps like compressing the same things over and over again. Just do a 'dnf group install' tacked onto that base.img, the work being done is custom for that output, rather than repetitive. Not complicated. It would be fast enough that the high level variants could be composed on demand. Seconds. It'd be fast enough to queue it for download within the hour.
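A very rough sketch of what composing such a layered spin could look like, just to make the idea concrete (all names and paths are hypothetical; this is not existing tooling):

  # stack a writable layer on top of the shared base image
  mount -t squashfs -o loop,ro base.img /mnt/base
  mkdir -p /tmp/upper /tmp/work /mnt/stack
  mount -t overlay overlay \
      -o lowerdir=/mnt/base,upperdir=/tmp/upper,workdir=/tmp/work /mnt/stack
  # install only the spin-specific payload into the stacked root
  dnf --installroot=/mnt/stack --releasever=32 group install -y "KDE Plasma Workspaces"
  # the upper layer alone becomes the per-spin diff image
  mksquashfs /tmp/upper kde-diff.img -comp zstd -b 1M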
Chris Murphy wrote:
Even at 8% bigger it would be worth it. And probably 16%.
I disagree. We need to stop treating bloat like a feature.
And please see my other replies for why this is a particularly bad tradeoff in this particular case.
Gaining additional features, like on-the-fly checksumming, is worth considering (at least not making it harder to implement in the future, by taking it into account in the work implied in this proposal). The monolithic ISO check is terrible. It's dog slow. It's optional. And it's a one-time check. Typically real optical media tends to work or persistently fail, whereas USB sticks can have transient bad reads (explicit or silently corrupt).
Can't we just drop the mediacheck entirely? It is optional for a reason.
Stacked images on the same media functionality is in the kernel, it's not complicated, it's well tested, doesn't require any gymnastics in the initramfs - your bootloader entries can each point to different root=UUIDs and image assembly is figured out entirely in kernel code, no special handling in the client side deliverable. Yes the image creator needs to know some things to achieve this.
Why stacked images? Consider a single base.img that's maybe 1G, and now you don't have to do separate composes for server, cloud, GNOME, KDE, Cinnamon, LXQt, Astronomy that repeat a lot of the same steps, including expensive steps like compressing the same things over and over again. Just do a 'dnf group install' tacked onto that base.img, the work being done is custom for that output, rather than repetitive. Not complicated. It would be fast enough that the high level variants could be composed on demand. Seconds. It'd be fast enough to queue it for download within the hour.
Then how do you deliver the stacked images? Either the user still needs to download base.img + the specific image the user actually wants, either as 2 downloads (but then how does the user reliably get them onto bootable media? Surely you don't want to require 2 media!) or as 1 combined download, or you ship one image with base.img and all the specific layers at once, which will waste a lot of download size for all the images the user does not care about. I do not see what use case would be served by stacked images.
What would be a much more useful feature is hybrid netinstall, i.e., allowing liveinst to netinstall additional packages on top of the installed live image. See the Calamares netinstall module (e.g., on my old Kannolo 27 images, as long as I don't have newer ones) for how the user experience can look like. And that requires only installer support, no file system or compression support.
Kevin Kofler
On 1/7/20 11:16 AM, Kevin Kofler wrote:
Chris Murphy wrote:
Stacked images on the same media functionality is in the kernel, it's not complicated, it's well tested, doesn't require any gymnastics in the initramfs - your bootloader entries can each point to different root=UUIDs and image assembly is figured out entirely in kernel code, no special handling in the client side deliverable. Yes the image creator needs to know some things to achieve this.
Why stacked images? Consider a single base.img that's maybe 1G, and now you don't have to do separate composes for server, cloud, GNOME, KDE, Cinnamon, LXQt, Astronomy that repeat a lot of the same steps, including expensive steps like compressing the same things over and over again. Just do a 'dnf group install' tacked onto that base.img, the work being done is custom for that output, rather than repetitive. Not complicated. It would be fast enough that the high level variants could be composed on demand. Seconds. It'd be fast enough to queue it for download within the hour.
Then how do you deliver the stacked images? Either the user still needs to download base.img + the specific image the user actually wants, either as 2 downloads (but then how does the user reliably get them onto bootable media? Surely you don't want to require 2 media!) or as 1 combined download, or you ship one image with base.img and all the specific layers at once, which will waste a lot of download size for all the images the user does not care about. I do not see what use case would be served by stacked images.
What would be a much more useful feature is hybrid netinstall, i.e., allowing liveinst to netinstall additional packages on top of the installed live image. See the Calamares netinstall module (e.g., on my old Kannolo 27 images, as long as I don't have newer ones) for how the user experience can look like. And that requires only installer support, no file system or compression support.
At first I did think the same thing as you did, that it would have all of them. But my understanding of what he was suggesting is similar to your suggestion. There's a base image and then for each spin, there's an extra stacked image that contains whatever extra is needed for that spin. So there's a separate iso for each spin, but each one has the base image plus (at least) one other.
On Sun, Jan 12, 2020 at 3:53 PM Samuel Sieb samuel@sieb.net wrote:
On 1/7/20 11:16 AM, Kevin Kofler wrote:
Chris Murphy wrote:
Stacked images on the same media functionality is in the kernel, it's not complicated, it's well tested, doesn't require any gymnastics in the initramfs - your bootloader entries can each point to different root=UUIDs and image assembly is figured out entirely in kernel code, no special handling in the client side deliverable. Yes the image creator needs to know some things to achieve this.
Why stacked images? Consider a single base.img that's maybe 1G, and now you don't have to do separate composes for server, cloud, GNOME, KDE, Cinnamon, LXQt, Astronomy that repeat a lot of the same steps, including expensive steps like compressing the same things over and over again. Just do a 'dnf group install' tacked onto that base.img, the work being done is custom for that output, rather than repetitive. Not complicated. It would be fast enough that the high level variants could be composed on demand. Seconds. It'd be fast enough to queue it for download within the hour.
Then how do you deliver the stacked images? Either the user still needs to download base.img + the specific image the user actually wants, either as 2 downloads (but then how does the user reliably get them onto bootable media? Surely you don't want to require 2 media!) or as 1 combined download, or you ship one image with base.img and all the specific layers at once, which will waste a lot of download size for all the images the user does not care about. I do not see what use case would be served by stacked images.
What would be a much more useful feature is hybrid netinstall, i.e., allowing liveinst to netinstall additional packages on top of the installed live image. See the Calamares netinstall module (e.g., on my old Kannolo 27 images, as long as I don't have newer ones) for how the user experience can look like. And that requires only installer support, no file system or compression support.
At first I did think the same thing as you did, that it would have all of them. But my understanding of what he was suggesting is similar to your suggestion. There's a base image and then for each spin, there's an extra stacked image that contains whatever extra is needed for that spin. So there's a separate iso for each spin, but each one has the base image plus (at least) one other.
The ISO contains a base.img + the diff.img that makes that particular edition/spin ISO unique. Plausibly there's more than one diff file on an ISO depending on the desired level of granularity, and whether it's useful to have more than one bootable option per ISO. A single USB stick could get imaged with a single ISO file, capable of installing all the editions.
But yeah, netinstall is a much faster and easier way to get to a one-off custom "spin". A portal that basically helps create a kickstart or updates.img file for Anaconda to consume. And voilà: a custom installer. The gotcha there is: it's not live demo media, and to install multiple times you either take multiple download hits, or you have to set up a local mirror.
Why stacked images? Consider a single base.img that's maybe 1G, and now you don't have to do separate composes for server, cloud, GNOME, KDE, Cinnamon, LXQt, Astronomy that repeat a lot of the same steps, including expensive steps like compressing the same things over and over again. Just do a 'dnf group install' tacked onto that base.img, the work being done is custom for that output, rather than repetitive. Not complicated. It would be fast enough that the high level variants could be composed on demand. Seconds. It'd be fast enough to queue it for download within the hour.
Well, I would like that. +1
-- Chris Murphy
Thanks everyone for your comments. These are all valid concerns.
I filed a new change proposal at https://fedoraproject.org/wiki/Category:Changes/OptimizeSquashFS.
I'll run more benchmarks, including the Zstd compression algorithm, and will post results, hopefully this weekend. I'll also try to measure the impact of compression and block size on installation time.
Also, the decompression of SquashFS currently happens in a single thread. This could be changed with the CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU kernel build-time configuration option. I'll file a new change proposal for it, along with benchmarks.
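For context, the squashfs decompressor behaviour is a build-time choice in fs/squashfs/Kconfig; exactly one of the following can be selected for a given kernel build:

  CONFIG_SQUASHFS_DECOMP_SINGLE        # one decompressor instance, lowest memory use
  CONFIG_SQUASHFS_DECOMP_MULTI         # several decompressors, allocated on demand
  CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU  # one decompressor per online CPU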
Hello,
I posted more benchmark results in this article: https://fedoraproject.org/wiki/Category:Changes/OptimizeSquashFS
In short, a bigger block size and higher compression ratio do not increase the installation time for Fedora Workstation; I saw the opposite effect. Zstd compression performed worse than XZ in the compression-ratio test. On the other hand, a 40% lower installation time was documented for Zstd, along with 37% lower CPU consumption. All installation tests were performed from and to local NVMe storage, which I consider far from a real-life scenario.
I plan to perform more testing, with slower installation media, and will post the results next weekend. I did not evaluate CONFIG_SQUASHFS_DECOMP_MULTI this weekend.
-- Bohdan Khomutskyi, RHCE Release configuration management engineer, PnT DevOps Red Hat Czech s.r.o T: +420532270289 IRC: bkhomuts
On Sun, Jan 12, 2020 at 05:44:33PM +0100, Bohdan Khomutskyi wrote:
Hello,
I posted more benchmark results in this article: https://fedoraproject.org/wiki/Category:Changes/OptimizeSquashFS
In short, a bigger block size and higher compression ratio do not increase the installation time for Fedora Workstation; I saw the opposite effect. Zstd compression performed worse than XZ in the compression-ratio test. On the other hand, a 40% lower installation time was documented for Zstd, along with 37% lower CPU consumption. All installation tests were performed from and to local NVMe storage, which I consider far from a real-life scenario.
I plan to perform more testing, with slower installation media, and will post the results next weekend. I did not evaluate CONFIG_SQUASHFS_DECOMP_MULTI this weekend.
Many thanks for doing this testing. It's really appreciated as well as driving this change forward. :)
kevin
On Sun, Jan 12, 2020 at 9:45 AM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
Hello,
I posted more benchmark results in this article: https://fedoraproject.org/wiki/Category:Changes/OptimizeSquashFS
Cool!
Do you have any tests to compare plain squashfs xz with zstd? The nested ext4 stuff is really pointless now because Fedora hasn't used 'dd' + resizing the ext4 file system as an installation method in a long time (going back to Fedora 18 I think). All of the Live installations use rsync.
Zstd compression performed worse than XZ in the compression-ratio test. On the other hand, a 40% lower installation time was documented for Zstd, along with 37% lower CPU consumption. All installation tests were performed from and to local NVMe storage, which I consider far from a real-life scenario.
Fedora QA nightly tests are real and I think it'll make a meaningful impact for both the creation of the ISOs and their consumption, in a lot of cases, even if it doesn't impact USB installations. I do VM installs on both SSD and NVMe and it matters there. But also the power consumption of xz is, I think, relevant whether baremetal or virtual.
Thanks!
On Sun, 2020-01-12 at 15:02 -0700, Chris Murphy wrote:
Do you have any tests to compare plain squashfs xz with zstd? The nested ext4 stuff is really pointless now because Fedora hasn't used 'dd' + resizing the ext4 file system as an installation method in a long time (going back to Fedora 18 I think). All of the Live installations use rsync.
I am not entirely clear on what this proposal covers (it would actually be nice if this was specified on the Change page; it's not immediately clear *which of Fedora's images* are affected by this Change), but are you sure of this? Do we not do it even for Cloud image deployments or ARM disk image deployments? ISTR there being a filesystem resize involved there, which is why I ask...
On Mon, Jan 20, 2020 at 4:24 AM Adam Williamson adamwill@fedoraproject.org wrote:
On Sun, 2020-01-12 at 15:02 -0700, Chris Murphy wrote:
Do you have any tests to compare plain squashfs xz with zstd? The nested ext4 stuff is really pointless now because Fedora hasn't used 'dd' + resizing the ext4 file system as an installation method in a long time (going back to Fedora 18 I think). All of the Live installations use rsync.
I am not entirely clear on what this proposal covers (it would actually be nice if this was specified on the Change page; it's not immediately clear *which of Fedora's images* are affected by this Change), but are you sure of this? Do we not do it even for Cloud image deployments or ARM disk image deployments? ISTR there being a filesystem resize involved there, which is why I ask...
Cloud images don't use squashfs so they can be ignored. (But yes, some cloud related images do fs resize on first boot.)
With one exception [1], all ISO images I've looked at have a squashfs image file on them containing a LiveOS used as system root. If the change proposal implements plain squashfs images, startup assembly will use overlayfs. The current ISOs have a nested ext4 in that squashfs image, and startup assembly uses device-mapper.
Also with one exception [1], any ISO with the word "Live" in the filename, might switch from rsync based installation to unsquashfs. The netinstalls and DVD ISOs are unaffected.
[1] Fedora CoreOS, fedora-coreos-31.20200113.3.1-live.x86_64.iso (congratulations on the final release btw), which has an initramfs based LiveOS, uses dd for installation, with ignition doing provisioning on first boot that also includes GPT fixup and fs resize.
-- Chris Murphy
On Sun, Jan 12, 2020 at 5:46 PM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
Hello,
I posted more benchmark results in this article: https://fedoraproject.org/wiki/Category:Changes/OptimizeSquashFS
In short, a bigger block size and higher compression ratio do not increase the installation time for Fedora Workstation; I saw the opposite effect. Zstd compression performed worse than XZ in the compression-ratio test. On the other hand, a 40% lower installation time was documented for Zstd, along with 37% lower CPU consumption. All installation tests were performed from and to local NVMe storage, which I consider far from a real-life scenario.
This is very interesting, thank you!
The "CPU user time" should be independent of the number of CPU cores you have, is that correct? I.e. the number should always be roughly the same, whether you run it on 1 core, 2 cores or 8 cores, right? I'm asking because our QA tests often use 1-2 cores for installation, and I assume you used all your available cores (if I read it correctly, you seem to have a 4-core system), therefore the "real time" value is applicable just to your system, but the "CPU user time" should be better comparable to other systems.
How exactly did you measure those numbers, can you please provide reproduction steps?
I'm quite surprised that plain squashfs is a bit smaller, but also a bit slower than squashfs+ext4. Our expectations were that it would be faster.
Looking at compressions, the most interesting results for me are:
- -comp xz, without -Xbcj x86 --- cutting CPU time by 50% at the expense of 30MB is awesome
- -Xdict-size 1M -b 1M, without -Xbcj x86 (optionally with hardlinking) --- 33% speedup while also saving 110 MB
- -comp zstd -Xcompression 15 -b 1M --- blazing fast installation, cutting CPU time by 80%, but also increasing the size by 150 MB
I'm sure different people will have different priorities regarding size and installation time, but these are really interesting numbers, thanks for benchmarking.
Hello,
Thanks everyone for posting feedback. More benchmarking results are available at https://fedoraproject.org/wiki/Category:Changes/OptimizeSquashFS, including results for the 'plain' SquashFS filesystem. After performing the tests, I personally recommend using xz compression with a 1MiB block size, without bcj, on a 'plain' SquashFS filesystem -- this will lead to a reduction of 142MiB on the ISO, compared to the stock Fedora 31 Workstation x86_64 image. Alternative compression options, such as Zstd, are also mentioned in the change proposal.
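In Lorax configuration terms, that recommendation would look roughly like the snippet below (same [compression] section as in the original proposal; whether disabling the filter is spelled 'bcj = no' should be double-checked against the Lorax documentation):

  [compression]
  bcj = no
  args = -b 1M -Xdict-size 1M -no-recovery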
Select re-packaged ISOs of Fedora 31 Workstation x86-64 are available for download at https://khomutsky.com/fedora-dvd/
On Sun, Jan 19, 2020 at 8:41 AM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
Hello,
Thanks everyone for posting feedback. More benchmarking results are available at https://fedoraproject.org/wiki/Category:Changes/OptimizeSquashFS, including results for the 'plain' SquashFS filesystem. After performing the tests, I personally recommend using xz compression with a 1MiB block size, without bcj, on a 'plain' SquashFS filesystem -- this will lead to a reduction of 142MiB on the ISO, compared to the stock Fedora 31 Workstation x86_64 image. Alternative compression options, such as Zstd, are also mentioned in the change proposal.
Thanks for all the tests.
While I see the meaningfully reduced CPU hit of xz compressed images, the proposal leaves a lot of performance improvement on the table by not also enabling zstd as an option in the compose process. The tests show zstd results, but the proposal doesn't mention zstd at all.
In particular for Workstation ISO, the CPU hit isn't worth the size savings for regular users, let alone the recurring hit for releng composes and QA's automated installation tests. It's a lot of CPU burn at both ends of the candle, for not a lot of size savings. I'm not convinced it's worth the extra hit on the create side for Zstd level 22, compared to Zstd 15 or 17.
I admit I'm biased toward the two endpoints: create and consume, not distribution, i.e. the mirror donors. Their storage and bandwidth concerns were evaluated with the RPM change from xz to zstd. So I'm mystified by the bias for image size.
Anyway, I approve of the change, but I'd be disappointed if it really doesn't give Fedora release engineering the ability to choose (possibly based on image type - maybe there's some benefit to using xz for raw and qcow2 images).
Chris,
Thanks for your feedback and comments, it's very valuable to me.
In my previous message, I mentioned that CPU is *underutilized* during installation. I haven't investigated further why, but I suspect it's due to the inefficiency caused by the usage of the *loop* device and/or inefficiency in the rsync itself. In fact, I have an optimization to file next weekend on my to do list.
All of the Live installations use rsync.
And that's what I propose to change: to use unsquashfs instead of rsync, preliminary benchmarks show 8x improvement in decompressing speed on local media for XZ on local storage.
On my PC configuration, that will require approx. 52.98 MiB/s sequential read performance from the local media and approx. 181.19 MiB/s write speed to the destination media. That level of performance is not common among today's USB drives or optical media -- the installation speed will not be capped by CPU limits, but rather by the sequential read speed of the installation media. It means that selecting an algorithm with a better compression ratio should reduce the installation time from commonly used USB storage.
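In essence the change replaces the rsync copy with a direct extraction into the target root, something like the following (paths are illustrative of a live installer environment, assuming a plain SquashFS image):

  # unsquashfs decompresses blocks on all available CPUs by default
  unsquashfs -f -d /mnt/sysimage /run/initramfs/live/LiveOS/squashfs.img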
Yes, Zstd consumes 12.24x less CPU user time during unsquashfs, but let's consider the practical application.
Will Zstd decrease the installation time, given the constraints and optimization above -- that's what I plan to investigate in upcoming weekends.
My proposal focuses on reducing the installation media size, and *recommends* using certain compression options. But, I think, the final decision is to be made by FESCO.
On Mon, Jan 20, 2020 at 09:37:32AM +0100, Bohdan Khomutskyi wrote:
Chris,
Thanks for your feedback and comments, it's very valuable to me.
In my previous message, I mentioned that CPU is *underutilized* during installation. I haven't investigated further why, but I suspect it's due to the inefficiency caused by the usage of the *loop* device and/or inefficiency in the rsync itself.
We might be able to do something with nbdkit inside the installer to transparently uncompress and loop mount at the same time. (Although this would require xz, not zstd, because of a missing feature in zstd - see my other comment in this thread).
Rich.
On Mon, Jan 20, 2020 at 1:38 AM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
In my previous message, I mentioned that CPU is underutilized during installation. I haven't investigated further why, but I suspect it's due to the inefficiency caused by the usage of the loop device and/or inefficiency in the rsync itself.
In all installations with xz compression, I see the loop1 device pegged at 100% CPU (single thread), and perf shows this is almost entirely lzma decompression. Better utilization would happen with parallelized decompression threads. But what about the use cases where there's only one CPU? VMs may not assign multiple CPUs, and what about the ARM boards we support?
With a zstd compressed squashfs image, loop1 uses 30% CPU or less, and IO of either the source or target is near 100% utilization, therefore I'm not sure parallelization in this case would improve things by much.
In fact, I have an optimization to file next weekend on my to do list.
All of the Live installations use rsync.
And that's what I propose to change: to use unsquashfs instead of rsync, preliminary benchmarks show 8x improvement in decompressing speed on local media for XZ on local storage.
I had not considered unsquashfs, so that's an interesting optimization.
Yes, Zstd consumes 12.24x less CPU user time while unsquashfs, but let's consider the practical application.
I am. There's an electricity cost when there's enough heat generated by an installation that my computer sounds like a hair dryer.
Therefore, I'm still biased against the heavy CPU cost of xz for an insignificant reduction in ISO size, multiplied by thousands of installations per week (real, virtual and test), quite a lot of which don't use USB sticks as sources.
Will Zstd decrease the installation time, given the constraints and optimization above -- that's what I plan to investigate in upcoming weekends.
My proposal focuses on reducing the installation media size, and recommends to use certain compression options. But, I think, the final decision is to be made by FESCO.
If image size is a significant consideration, then evaluation of erofs seems indicated. It promises both significant compression and CPU performance. The intended use case is for Android device read-only partitions with both limited storage and CPU/power capacity.
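If someone wants to run that evaluation, erofs-utils can build a comparable image from the same tree (file names are illustrative; lz4hc was the strongest compressor erofs supported at the time):

  mkfs.erofs -zlz4hc erofs-test.img rootfs/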
On Mon, Jan 20, 2020 at 1:38 AM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
In my previous message, I mentioned that CPU is underutilized during installation. I haven't investigated further why, but I suspect it's due to the inefficiency caused by the usage of the loop device and/or inefficiency in the rsync itself.
Could this be read amplification?
This paper on erofs suggests read amplification can be a significant side effect with squashfs. It could be exacerbated with random reads, and I expect it gets worse with larger block size. That's probably mitigated with unsquashfs.
Specifically page 4, 2nd paragraph. https://www.usenix.org/system/files/atc19-gao.pdf
This also makes me wonder about the memory consumption effect of a 1M block size, especially for Fedora ARM where it looks like Raspberry Pi 2B
Most of the ARM images are raw.xz but some are bootable ISOs, dvd and netinstall. And those contain a squashfs sysroot. Even if there's no out of memory problem, it could result in paging. All ISOs setup swap-on-ZRAM these days, lives, DVD, and netinstall. I think the ARM case needs testing before committing to 1M block size across all ISOs, or implementing changes in Fedora release engineering.
Hello,
I opened a request to the anaconda team with a draft patch to use unsquashfs instead of rsync: https://github.com/rhinstaller/anaconda/pull/2292. That should lower the installation time from Live media. Adjustments still need to be made for this patch to work; I was not able to install the image using it, as the Anaconda installer crashes.
Chris,
Could this be read amplification? That's probably mitigated with unsquashfs.
That could be read amplification; it will be mitigated with unsquashfs. The memory usage should not be a problem: the xz(1) manual page states that 2 MiB of memory is required to decompress an archive with a 1MiB block size. My previous observations found that the squashfs is currently decompressed in a single thread. I welcome additional testing, but I think the memory limit will not be a problem.
If image size is a significant consideration, then evaluation of erofs seems indicated. It promises both significant compression and CPU performance. The intended use case is for Android device read-only partitions with both limited storage and CPU/power capacity.
I briefly reviewed the document and found that LZ4 is the only supported algorithm in EROFS. Even if EROFS has a perfect filesystem layout, it is too good to be true that it could outperform XZ in compression ratio tests. The difference for SquashFS LZ4hc 1M vs XZ 1M is >40%.
Hello,
It was a long time since the last message in this change proposal.
Recently I was working to reduce the impact of the increased compression ratio on the installation image size for Fedora. I have achieved outstanding results -- working proof of concept. With the following change: https://github.com/rhinstaller/anaconda/pull/2292 , not only the higher compression does not impact the installation time. In certain cases, the installation time is even reduced. This is because of the fact the filesystem internal structure aware process is used to install the system from the SquashFS. The new process also allows for taking advantage of the multi-core architecture of the system during installation -- does the decompression on multiple processors in parallel.
The combination of https://github.com/rhinstaller/anaconda/pull/2292 and https://fedoraproject.org/wiki/Category:Changes/OptimizeSquashFS should reduce _both_ the image size and the installation time. The installation time will be reduced in case the system is installed from the SquashFS. This is the case in Fedora Workstation.
For the SquashFS optimization itself, I will work on requesting support for the required functionality in Pungi, the compose build software.
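For readers who want the gist of what the pull request changes, a rough sketch of the two install paths (paths are illustrative, not the exact ones Anaconda uses):

    # current approach (simplified): loop-mount the squashfs, then the nested
    # ext4 rootfs.img inside it, and copy every file with rsync
    mount -o loop,ro /path/to/LiveOS/squashfs.img /mnt/squash
    mount -o loop,ro /mnt/squash/LiveOS/rootfs.img /mnt/source
    rsync -a /mnt/source/ /mnt/sysroot/

    # proposed approach (simplified): let unsquashfs walk the SquashFS
    # structure itself and decompress blocks on multiple CPUs in parallel
    # (-f overwrites into the already existing target tree)
    unsquashfs -f -d /mnt/sysroot /path/to/LiveOS/squashfs.img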
-- Bohdan Khomutskyi, RHCE Release configuration management engineer, PnT DevOps Red Hat Czech s.r.o T: +420532270289 IRC: bkhomuts
On Wed, May 13, 2020 at 1:33 PM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
This sounds very good. Thanks for working on this.
On Wed, May 13, 2020 at 5:32 AM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
Hi, since the feedback was that a higher emphasis be placed on install time being reduced, even if there was some increase in ISO size (not without limit, it's a balancing act), I'm still curious how the change compares when using zstd, all other things being equal.
For example Solus recently changed from xz to zstd in squashfs, and claim 3-4x faster install times, with some increase in image size. https://getsol.us/2020/01/25/solus-4-1-released/
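For reference, re-running the same mksquashfs comparison with zstd is essentially a one-line change (a sketch; the compression level is only an example, and squashfs-tools 4.4 or later is assumed for zstd support):

    mksquashfs rootfs/ root-zstd.img -comp zstd -Xcompression-level 19 -b 1M -noappend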
On Wed, May 13, 2020 at 6:03 PM Chris Murphy lists@colorremedies.com wrote:
Hi, since the feedback was that a higher emphasis be placed on install time being reduced, even if there was some increase in ISO size (not without limit, it's a balancing act), I'm still curious how the change compares when using zstd, all other things being equal.
For example Solus recently changed from xz to zstd in squashfs, and claim 3-4x faster install times, with some increase in image size. https://getsol.us/2020/01/25/solus-4-1-released/
From anaconda.log for a default/auto LVM+ext4 install using Fedora-Workstation-Live-x86_64-32-1.6.iso:

19:51:52 DBG ui.gui.spokes.installation_progress: The installation has started.
19:57:16 DBG ui.gui.spokes.installation_progress: The installation has finished.

Elapsed: 00:05:24
This is not an exact comparison to using a plain squashfs image and writing out (I'm guessing) 30,000 files to the install target. But, using unsquashfs to extract the root.img and write it to the same target:
real    0m50.315s
user    2m18.318s
sys     0m6.569s
I'm extracting just one file, the embedded ext4. But (a) unsquashfs is parallelizing at about 270% CPU for a 3 virtual core VM and (b) /dev/loop1 isn't busy at all. Do unsquashfs and ext4 slow down when handed 30K files to write out instead of one big one? Dunno. But prior testing suggests this is a CPU-bound problem, not a disk-contention problem, so I'm definitely in the "tell me more" position.
2m18s is a lot better than 5m24s. And honestly 5m isn't bad; it takes a lot longer to install Windows 10 or macOS.
I still think zstd would get even better decompression rates with less of a CPU (and thus power) hit, though it could be splitting hairs. I'm not sure.
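For anyone wanting to reproduce the comparison, the second number above came from something along these lines (a sketch with placeholder paths, not the exact commands used):

    # time extraction of the embedded ext4 image from the live squashfs
    time unsquashfs -d /mnt/target /run/initramfs/live/LiveOS/squashfs.img
    # the anaconda.log timestamps above give the rsync-based install time to compare against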
On Wed, May 13, 2020 at 13:32:19 +0200, Bohdan Khomutskyi bkhomuts@redhat.com wrote:
For optimization of the SquashFS, I will work on requesting the support of the required functionality in the Pungi compose build software.
Note that squashfs-tools 4.4 just went into Rawhide a couple of days ago. By default it does reproducible builds, and this might affect performance since the files need to be added in a consistent order. This probably increases the wall-clock time when creating images. I wouldn't expect it to have much effect on reading data from images, but you might see some changes.
Hello,
I have implemented the necessary changes in Pungi, the software that creates the Fedora compose, so this change now has the potential to move forward.
I have created two pull requests for release engineering for this change:
https://pagure.io/pungi-fedora/pull-request/871
https://pagure.io/pungi-fedora/pull-request/872
Fedora release engineering will be able to test this change and provide fresh test results once Pungi 4.2.4 is released.
I hope this change can land in Fedora 33.
As a reminder, here is the change proposal: https://fedoraproject.org/wiki/Changes/OptimizeSquashFS
Regards,
-- Bohdan Khomutskyi
Red Hat
On Sunday, January 19, 2020 4:41:06 PM MST Chris Murphy wrote:
I admit I'm biased toward the two endpoints: create and consume, not distribution, i.e. the mirror donors. Their storage and bandwidth concerns were evaluated with the RPM change from xz to zstd. So I'm mystified by the bias for image size.
End consumers also benefit from reduced image size, namely in the amount of space needed for installation media.
On Sun, Jan 19, 2020 at 4:42 PM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
Hello,
Thanks everyone for posting feedback. More benchmarking results are available at https://fedoraproject.org/wiki/Category:Changes/OptimizeSquashFS, including the 'plain' SquashFS filesystem. After performing the tests, I personally recommend using xz compression with a 1 MiB block size, without BCJ, on a 'plain' SquashFS filesystem -- this leads to a reduction of 142 MiB in the ISO compared to the stock Fedora 31 Workstation x86_64 image. Alternative compression options, such as zstd, are also mentioned in the change proposal.
Hmm, I see I was completely confused in my last reply: I took the numbers in the first image [1] to be installation times. Instead they are image creation times. OK. Looking at the new second image [2], zstd seems like a clear winner to me, at least for QA purposes. By fine-tuning the compression level, we can achieve almost the same file size for a great installation speedup.
[1] https://fedoraproject.org/wiki/File:Compression_vs_SquashFS_creation_time.pn... [2] https://fedoraproject.org/wiki/File:Compression_vs_installation_time.png
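In Lorax configuration terms, the settings recommended above (xz, 1 MiB block, no BCJ) would look roughly like this (a sketch based on the options discussed in this thread, not a file shipped anywhere yet):

    [compression]
    bcj = no
    args = -b 1M -Xdict-size 1M -no-recovery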
Gary Buhrmaster wrote:
While not exactly the same, the measured increase in size by the Arch community for their packaging by moving from xz to zstd was ~0.8% (and gaining a huge reduction in CPU utilization at the decompress end).
I don't know what xz settings Arch was using, but in the case of our RPMs, xz was being used with very conservative settings, mainly so that applying DeltaRPMs (which recompresses the RPMs) can be done in a reasonable time (and by the way, IIRC, the switch to zstd actually slows down that use case!), though decompression time was also a criterion. That's why switching to zstd was not a huge size increase. But we could have saved significant size by using higher xz compression rates.
If Arch was using similar settings, that would explain the relatively small size increase. But it is still a size increase.
Kevin Kofler
On Sun, Jan 5, 2020, at 12:08 PM, Chris Murphy wrote:
I've pretty much concluded Fedora is best off dropping the nested ext4 in favor of plain squashfs, and using zstd.
Fedora CoreOS already uses zstd for squashfs: https://github.com/coreos/coreos-assembler/blob/master/src/cmd-buildextend-i...
(I'm also working to rebase Fedora Silverblue on Fedora CoreOS' toolchain and I'd like the same to happen for IoT, which means this proposal should more clearly call out that it affects Fedora images that use Anaconda.)
On Sun, Jan 05, 2020 at 10:08:07AM -0700, Chris Murphy wrote:
In my testing, xz does provide better compression ratios, well suited for seldom used images like archives. But it really makes the installation experience worse by soaking the CPU, times thousands of installations (openQA tests on every single nightly, every human QA tester for nightlies, betas, and then the final released product used by Fedora end users).
Has zstandard been evaluated? In my testing of images compressed with zstd, the CPU hit is cut by more than 50%, and is no longer a bottleneck during installations. Image size does increase, although I haven't tested mksquashfs block size higher than 256K. Using zstd with Fedora images also builds on prior evaluation, testing, and effort moving RPM from xz to zstd.
Block-based decompression works with xz, but not with zstd. We do use this feature. Here's the GitHub issue to get block-based decompression supported in zstd, and a bit of background about how we use the feature:
https://github.com/facebook/zstd/issues/395#issuecomment-535875379
Rich.
On Sunday, January 5, 2020 5:23:12 AM MST Bohdan Khomutskyi wrote:
This looks like an excellent Change, especially as these images grow.
Hi,
it does not look like you followed the formal "Changes policy". The process for submitting change proposals is described here:
https://docs.fedoraproject.org/en-US/program_management/changes_policy/
Vít
On Sun, Jan 5, 2020 at 7:24 AM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
I was unable to create an article in Fedora wiki system.
Since you don't have any non-CLA groups in FAS, I have added you to the wikiedit group. Please add this to the wiki ASAP. This proposal is past the deadline for Fedora 32 System-Wide Change proposals, but since it is only a few days late, I'll continue shepherding it and FESCo can decide if it's worth granting an exception.
On Sun, Jan 5, 2020 at 1:24 PM Bohdan Khomutskyi bkhomuts@redhat.com wrote:
Summary
Improve compression ratio of SquashFS filesystem on the installation media.
Hi Bohdan, as a member of QA, I'll happily support any proposal that improves the installation speed (the image size is not that important from my POV). Chris found that dropping the nested ext4 filesystem can get us substantial gains in this area, as well as changing xz to zstd. I see he already provided you with links to the respective tickets. Perhaps you can work with him to make this change a reality (and benefit in both areas - speed and size)? QA would really appreciate it :-) Thanks! Kamil