Why not zfs?
On 6/26/2020 10:42 AM, Ben Cotton wrote:
> == Summary ==
> For laptop and workstation installs of Fedora, we want to provide file
> system features to users in a transparent fashion. We want to add new
> features, while reducing the amount of expertise needed to deal with
> situations like [https://pagure.io/fedora-workstation/issue/152
> running out of disk space.] Btrfs is well adapted to this role by
> design philosophy; let's make it the default.
> == Owners ==
> * Names: [[User:Chrismurphy|Chris Murphy]], [[User:Ngompa|Neal
> Gompa]], [[User:Josef|Josef Bacik]], [[User:Salimma|Michel Alexandre
> Salim]], [[User:Dcavalca|Davide Cavalca]], [[User:eeickmeyer|Erich
> Eickmeyer]], [[User:ignatenkobrain|Igor Raits]],
> [[User:Raveit65|Wolfgang Ulbrich]], [[User:Zsun|Zamir SUN]],
> [[User:rdieter|Rex Dieter]], [[User:grinnz|Dan Book]],
> [[User:nonamedotc|Mukundan Ragavan]]
> * Emails: chrismurphy(a)fedoraproject.org, ngompa13(a)gmail.com,
> josef(a)toxicpanda.com, michel(a)michel-slm.name, dcavalca(a)fb.com,
> erich(a)ericheickmeyer.com, ignatenkobrain(a)fedoraproject.org,
> fedora(a)raveit.de, zsun(a)fedoraproject.org, rdieter(a)gmail.com,
> grinnz(a)gmail.com, nonamedotc(a)gmail.com
> * Products: All desktop editions, spins, and labs
> * Responsible WGs: Workstation Working Group, KDE Special Interest Group
> == Detailed Description ==
> Fedora desktop edition/spin variants will switch to using Btrfs as the
> filesystem by default for new installs. Labs derived from these
> variants inherit this change, and other editions may opt into this
> change.
> The change is based on the installer's custom partitioning Btrfs
> preset. It's been well tested for 7 years.
> '''''Current partitioning'''''<br />
> <span style="color: tomato">vg/root</span> LV mounted at <span
> style="color: tomato">/</span> and a <span style="color:
> tomato">vg/home</span> LV mounted at <span style="color:
> tomato">/home</span>. These are separate file system volumes, with
> separate free/used space.
> '''''Proposed partitioning'''''<br />
> <span style="color: tomato">root</span> subvolume mounted at <span
> style="color: tomato">/</span> and <span style="color:
> tomato">home</span> subvolume mounted at <span style="color:
> tomato">/home</span>. Subvolumes don't have a size; they act mostly
> like directories, and space is shared.
> '''''Unchanged'''''<br />
> <span style="color: tomato">/boot</span> will be a small ext4
> volume.
> A separate boot is needed to boot dm-crypt sysroot installations; it's
> less complicated to keep the layout the same, regardless of whether
> sysroot is encrypted. There will be no automatic snapshots/rollbacks.
> If you choose to encrypt your data, LUKS (dm-crypt) will still be used
> as it is today (with the small difference that Btrfs is used instead
> of LVM+Ext4). There is upstream work on native encryption for Btrfs
> that will be considered once ready; it is the subject of a different
> change proposal in a future Fedora release.
> === Optimizations (Optional) ===
> The detailed description above is the proposal. It's intended to be a
> minimalist and transparent switch. It's also the same as was
> [[Features/F16BtrfsDefaultFs|proposed]] (and accepted) for Fedora 16.
> The following optimizations improve on the proposal, but are not
> critical. They are also transparent to most users. The general idea is
> to agree to the base proposal first, and then consider these as
> enhancements.
> ==== Boot on Btrfs ====
> * Instead of a 1G ext4 boot, create a 1G Btrfs boot.
> * Advantage: Makes it possible to include /boot in a snapshot and
> rollback regime. GRUB has had stable support for Btrfs for 10+ years.
> * Scope: Contingent on bootloader and installer team review and
> approval. blivet should use <code>mkfs.btrfs --mixed</code>.
> ==== Compression ====
> * Enable transparent compression using zstd on select directories:
> <span style="color: tomato">/usr</span> <span style="color:
> tomato">/var/lib/flatpak</span> <span style="color:
> * Advantage: Saves space and significantly increases the lifespan of
> flash-based media by reducing write amplification. It may improve
> performance in some instances.
> * Scope: Contingent on installer team review and approval to enhance
> anaconda to perform the installation using <code>mount -o
> compress=zstd</code>, then set the proper XATTR for each directory.
> The XATTR can't be set until after the directories are created via:
> rsync, rpm, or unsquashfs based installation.
> ==== Additional subvolumes ====
> * <span style="color: tomato">/var/log/</span> <span style="color:
> tomato">/var/lib/libvirt/images</span> and <span style="color:
> tomato">~/.local/share/gnome-boxes/images/</span> will use separate
> subvolumes.
> * Advantage: Makes it easier to exclude them from snapshots,
> rollbacks, and send/receive. (Btrfs snapshotting is not recursive, it
> stops at a nested subvolume.)
> * Scope: Anaconda knows how to do this already, just change the
> kickstart to add additional subvolumes (minus the subvolume in <span
> style="color: tomato">~/</span>). GNOME Boxes will need enhancement to
> detect that the user home is on Btrfs and create <span style="color:
> tomato">~/.local/share/gnome-boxes/images/</span> as a subvolume.
> == Feedback ==
> ==== Red Hat doesn't support Btrfs? Can Fedora do this? ====
> Red Hat supports Fedora well, in many ways. But Fedora already works
> closely with, and depends on, upstreams. And this will be one of them.
> That's an important consideration for this proposal. The community has
> a stake in ensuring it is supported. Red Hat will never support Btrfs
> if Fedora rejects it. Fedora necessarily needs to be first, and make
> the persuasive case that it solves more problems than alternatives.
> Feature owners believe it does, hands down.
> The Btrfs community has users that have been using it for most of the
> past decade at scale. It's been the default on openSUSE (and SUSE
> Linux Enterprise) since 2014, and Facebook has been using it for all
> their OS and data volumes, in their data centers, for almost as long.
> Btrfs is a mature, well-understood, and battle-tested file system,
> used on both desktop/container and server/cloud use-cases. We do have
> developers of the Btrfs filesystem maintaining and supporting the code
> in Fedora, one is a Change owner, so issues that are pinned to Btrfs
> can be addressed quickly.
> ==== What about device-mapper alternatives? ====
> dm-thin (thin provisioning):
> [https://pagure.io/fedora-workstation/issue/152 Issue #152] still
> happens, because the installer won't over provision by default. It
> still requires manual intervention by the user to identify and resolve
> the problem. Upon growing a file system on dm-thin, the pool is over
> committed, and file system sizes become a fantasy: they don't add up
> to the total physical storage available. The truth of used and free
> space is only known by the thin pool, and CLI and GUI programs are
> unprepared for this. Integration points like rpm free space checks or
> GNOME disk-space warnings would have to be adapted as well.
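The over-commit point above can be put in numbers. This is only a rough sketch: the sizes and the helper name `overcommit_ratio` are made up for illustration and do not come from the proposal or from any real install.

```python
def overcommit_ratio(virtual_sizes_gib, pool_size_gib):
    """Sum of the sizes the file systems advertise, divided by the real pool size."""
    if pool_size_gib <= 0:
        raise ValueError("pool size must be positive")
    return sum(virtual_sizes_gib) / pool_size_gib


# Hypothetical layout: a 256 GiB thin pool backing two thin volumes
# whose file systems were each grown to 200 GiB.
ratio = overcommit_ratio([200, 200], 256)
print(f"file systems advertise {ratio:.2f}x the physical space")
```

Any ratio above 1.0 means the per-file-system "free space" figures cannot all be true at once, which is the situation the quoted text says CLI and GUI tools are unprepared for.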
> dm-vdo: is not yet merged, and isn't as straightforward to selectively
> enable per directory and per file, as is the case on Btrfs using
> <code>chattr +c</code> on <span style="color:
> Btrfs solves the problems that need solving, with few side effects or
> pitfalls for users. It has more features we can take advantage of
> immediately and transparently: compression, integrity, and IO
> isolation. Many Btrfs features and optimizations can be opted into
> selectively per directory or file, such as compression and nodatacow,
> rather than as a layer that's either on or off.
> ==== What about UI/UX and integration in the desktop? ====
> If Btrfs isn't the default file system, there's no commitment, nor
> reason to work on any UI/UX integration. There are ideas to make
> certain features discoverable: selective compression; systemd-homed
> may take advantage of either Btrfs online resize, or near-term planned
> native encryption, which could make it possible to live convert
> non-encrypted homes to encrypted; and system snapshot and rollbacks.
> Anaconda already has sophisticated Btrfs integration.
> ==== What Btrfs features are recommended and supported? ====
> The primary goal of this feature is to be largely transparent to the
> user. It does not require or expect users to learn new commands, or to
> engage in peculiar maintenance rituals.
> The full set of Btrfs features that is considered stable and enabled
> by default upstream will be enabled in Fedora. Fedora is a community
> project. What is supported within Fedora depends on what the community
> decides to put forward in terms of resources.
> See the upstream [https://btrfs.wiki.kernel.org/index.php/Status
> feature status page].
> ==== Are subvolumes really mostly like directories? ====
> Subvolumes behave like directories in terms of navigation in both the
> GUI and CLI, e.g. <code>cp</code>, <code>mv</code>,
> owner/permissions, and SELinux labels. They also share space, just
> like a directory.
> But it is an incomplete answer.
> A subvolume is an independent file tree, with its own POSIX namespace,
> and has its own pool of inodes. This means inode numbers repeat
> themselves on a Btrfs volume. Inodes are only unique within a given
> subvolume. A subvolume has its own st_dev, so if you use <code>stat
> FILE</code> it reports a device value referring to the subvolume the
> file is in. And it also means hard links can't be created between
> subvolumes. From this perspective, subvolumes start looking more like
> a separate file system. But subvolumes share most of the other trees,
> so they're not truly independent file systems. They're also not block
> devices.
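A small sketch of what the st_dev and inode remarks above mean for a program. The helper names are hypothetical; only `os.stat` and its `st_dev`/`st_ino` fields are standard Python. It assumes nothing beyond what the quoted text states: inode numbers are only unique within a subvolume, and each subvolume reports its own st_dev.

```python
import os


def file_identity(path):
    """Return (st_dev, st_ino). The pair identifies a file even when
    inode numbers repeat across subvolumes; st_ino alone does not."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def may_hard_link(path_a, path_b):
    """A differing st_dev rules out a hard link between the two locations.
    An equal st_dev is necessary but not a guarantee that link() succeeds."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

So a tool that deduplicates or tracks files by inode number alone could confuse two unrelated files living in different subvolumes; comparing the (st_dev, st_ino) pair avoids that.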
> == Benefit to Fedora ==
> Problems Btrfs helps solve:
> * Users running out of free space on either <span style="color:
> tomato">/</span> or <span style="color:
> ** "one big file system": no hard barriers like partitions or logical
> volumes
> ** transparent compression: significantly reduces write amplification,
> improves lifespan of storage hardware
> ** reflinks and snapshots are more efficient for use cases like
> containers (Podman supports both)
> * Storage devices can be flaky, resulting in data corruption
> ** Everything is checksummed and verified on every read
> ** Corrupt data results in EIO (input/output error), instead of
> resulting in application confusion, and isn't replicated into backups
> and archives
> * Poor desktop responsiveness when under pressure
> ** Currently only Btrfs has proper IO isolation capability via cgroups2
> ** Completes the resource control picture: memory, cpu, IO isolation
> * File system resize
> ** Online shrink and grow are fundamental to the design
> * Complex storage setups are... complicated
> ** Simple and comprehensive command interface. One master command
> ** Simpler to boot, all code is in the kernel, no initramfs complexities
> ** Simple and efficient file system replication, including incremental
> backups, with <code>btrfs send</code> and <code>btrfs
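The "checksummed and verified on every read" item above can be illustrated with a toy model. This is not how Btrfs is implemented: the class and method names are invented, and it uses zlib's CRC-32 purely as a stand-in (Btrfs defaults to crc32c). It only shows the behaviour the list describes, where a mismatch surfaces as EIO instead of silently returning bad data.

```python
import errno
import zlib


class ChecksummedStore:
    """Toy block store: keeps a CRC-32 next to each block, verifies it on read."""

    def __init__(self):
        self._blocks = {}

    def write(self, key, data):
        data = bytes(data)
        self._blocks[key] = (data, zlib.crc32(data))

    def corrupt(self, key):
        """Flip one bit of a non-empty block without touching its checksum
        (simulates a flaky device)."""
        data, csum = self._blocks[key]
        self._blocks[key] = (bytes([data[0] ^ 0x01]) + data[1:], csum)

    def read(self, key):
        data, csum = self._blocks[key]
        if zlib.crc32(data) != csum:
            # Refuse to hand corrupt data to the caller.
            raise OSError(errno.EIO, "checksum mismatch")
        return data
```

A backup job reading through such a store would fail loudly on the damaged block, which is the "isn't replicated into backups and archives" property from the list.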
> == Scope ==
> * Proposal owners:
> ** Submit PRs for Anaconda to change <code>default_scheme =
> BTRFS</code> to the proper product files.
> ** Multiple test days: build community support network
> ** Aid with documentation
> * Other developers:
> ** Anaconda, review PRs and merge
> ** Bootloader team, review PRs and merge
> ** Recommended optimization <code>chattr +C</code> set on the
> containing directory for virt-manager and GNOME Boxes.
> * Release engineering: https://pagure.io/releng/issue/9545
> * Policies and guidelines: N/A
> * Trademark approval: N/A
> == Upgrade/compatibility impact ==
> Change will not affect upgrades.
> Documentation will be provided for existing Btrfs users to "retrofit"
> their setups to that of a default Btrfs installation (base plus any
> approved options).
> == How To Test ==
> '''''Today'''''<br />
> Do a custom partitioning installation; change the scheme drop-down
> menu to Btrfs; click the blue "automatically create partitions"; and
> install.<br />
> Fedora 31, 32, Rawhide, on x86_64 and ARM.
> '''''Once change lands'''''<br />
> It should be simple enough to test, just do a normal install.
> == User Experience ==
> ==== Pros ====
> * Mostly transparent
> * Space savings from compression
> * Longer lifespan of hardware, also from compression.
> * Utilities for used and free space, CLI and GUI, are expected to
> behave the same. No special commands are required.
> * More detailed information can be revealed by <code>btrfs</code>
> specific commands.
> ==== Enhancement opportunities ====
> * updatedb does not index /home when /home is a bind mount. This can
> also affect rpm-ostree installations, including Silverblue.
> * Incorrect numbers when using multiple btrfs subvolumes. This isn't
> Btrfs specific; it happens with a "one big ext4" volume as well.
> * RFE: create qcow2 with 'nocow' option when on btrfs /home. This is
> Btrfs specific, and is a recommended optimization for both GNOME Boxes
> and virt-manager.
> * automatically use btrfs driver if on btrfs
> == Dependencies ==
> == Contingency Plan ==
> * Contingency mechanism: Owner will revert changes back to LVM+ext4
> * Contingency deadline: Beta freeze
> * Blocks release? Yes
> * Blocks product? Workstation and KDE
> == Documentation ==
> Strictly speaking no documentation is required reading for users. But
> there will be some Fedora documentation to help get the ball rolling.
> For those who want to know more:
> btrfs wiki main page and full feature list.<br />
> <code>man 5 btrfs</code> contains: mount options, features, swapfile
> support, checksum algorithms, and more<br />
> <code>man btrfs</code> contains an overview of the btrfs
> <code>man btrfs <nowiki><subcommand></nowiki></code> will show the man
> page for that subcommand
> NOTE: The btrfs command will accept partial subcommands, as long as
> it's not ambiguous. These are equivalent commands:<br />
> <code>btrfs subvolume snapshot</code><br />
> <code>btrfs sub snap</code><br />
> <code>btrfs su sn</code>
> You'll discover your own convention. It might be preferable to write
> out the full command on forums and lists, but then maybe some folks
> don't learn about this useful shortcut?
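For anyone curious how "partial subcommands, as long as it's not ambiguous" can work, here is a minimal sketch of unique-prefix matching. It is an illustration, not the btrfs-progs implementation: the function name `resolve` is made up, and the command lists are small illustrative subsets rather than the full tables.

```python
def resolve(token, commands):
    """Return the single command that token is a prefix of, else raise ValueError."""
    if token in commands:
        return token  # an exact match always wins
    matches = [c for c in commands if c.startswith(token)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown command: {token!r}")
    raise ValueError(f"ambiguous command {token!r}: {sorted(matches)}")


# Illustrative subsets only, not the full btrfs command tables.
TOP_LEVEL = ["subvolume", "filesystem", "send", "receive", "scrub"]
SUBVOLUME = ["create", "delete", "list", "snapshot", "show"]

print(resolve("su", TOP_LEVEL), resolve("sn", SUBVOLUME))
```

With these subsets, `su` and `sn` each have exactly one match, while a bare `s` would be rejected as ambiguous in either list.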
> For those who want to know a lot more:
> Btrfs developer documentation<br />
> == Release Notes ==
> The default file system on the desktop is Btrfs.