On Wed, Jul 1, 2020 at 5:49 AM Steven Whitehouse <swhiteho(a)redhat.com> wrote:
> If the / and /home split is the main issue, then dm-thin might be an
> alternative solution, and we should check to see if some of the issues
> listed on the change page have been addressed. I'm copying in Jon for
> additional comment on that. Are those btrfs benefits which are listed on
> the change page in priority order?
They are of equal priority, from the perspective of both feature
owners and the working group, based on many months of discussion.
Individual users definitely have their own priorities, which also vary.
There is perhaps an emphasis on solving /home and / free space
competition, because it is one of the most pernicious issues that
really leaves users on an island to fend for themselves in order to
Importantly, dm-thin doesn't fix this problem by avoiding it in the
first place; Btrfs does. On dm-thin, the user must still identify
which file system is out of space, and grow that file system.
Once file systems are snapshotted or over-provisioned, the only
arbiter of used and free space truth is the thin pool. File system
sizes become virtual, and current CLI and GUI apps are unprepared to
deal with that. So solving that is a prerequisite before anything
dm-thin based could be used, because fantasy free space reporting is
objectively a UX regression.
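To make the "fantasy free space" point concrete, here is a toy Python model of two over-provisioned thin volumes sharing one pool. The class names and sizes are hypothetical, it ignores snapshot block sharing, discard, and metadata, and it is not LVM's API; it only shows how each file system's own view of free space can add up to far more than the pool can actually store.

```python
from dataclasses import dataclass

GiB = 1024**3

@dataclass
class ThinVolume:
    name: str
    virtual_size: int  # size the file system on top believes it has
    used: int          # bytes actually written (and allocated in the pool)

@dataclass
class ThinPool:
    size: int
    volumes: list

    def pool_free(self) -> int:
        # Real free space: pool capacity minus what all volumes consumed.
        return self.size - sum(v.used for v in self.volumes)

def df_view(vol: ThinVolume) -> int:
    """Free space as the file system (and thus df) sees it: virtual size only."""
    return vol.virtual_size - vol.used

# Hypothetical layout: a 100 GiB pool backing two 80 GiB thin volumes.
pool = ThinPool(size=100 * GiB, volumes=[
    ThinVolume("root", virtual_size=80 * GiB, used=30 * GiB),
    ThinVolume("home", virtual_size=80 * GiB, used=50 * GiB),
])

reported = {v.name: df_view(v) // GiB for v in pool.volumes}
print(reported)                 # {'root': 50, 'home': 30}
print(pool.pool_free() // GiB)  # 20
```

Together the two file systems claim 80 GiB free while only 20 GiB can really be written, and nothing in the per-file-system view reveals that.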
The transparent compression feature is perhaps understated. It's
configurable per directory and per file. That includes algorithm
selection. A planned future feature is a configurable compression level
in the XATTR. The simplest use case is selective compression of
high-value targets like /usr and flatpaks. Future feature ideas include
user selection of directories, and UI that shows compression efficacy.
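As a sketch of what per-directory selection looks like from user space: to my understanding, `btrfs property set <path> compression <alg>` works through the `btrfs.compression` extended attribute, so a Python tool could set it directly. The helper name is hypothetical; the attribute only affects new writes (existing data isn't recompressed), and new files created in a directory inherit it.

```python
import os

def set_btrfs_compression(path: str, algorithm: str = "zstd") -> bool:
    """Hypothetical helper: request Btrfs compression for new writes under path.

    Sets the btrfs.compression extended attribute (assumed to be what
    `btrfs property set <path> compression <alg>` manipulates).
    Returns False when the path or file system doesn't support it.
    """
    try:
        os.setxattr(path, "btrfs.compression", algorithm.encode())
        return True
    except OSError:
        # Not Btrfs, unknown algorithm, missing path, or insufficient permission.
        return False
```

A UI could call this on a user-selected directory and simply report failure on non-Btrfs file systems.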
Reflinks are permitted between Btrfs subvolumes, whereas neither
reflinks nor hard links are possible between dm-thin snapshots. One
use case is cheaply restoring individual files from snapshots. Also,
thin snapshots currently pin the file system journal inside them,
making them rather expensive in terms of space consumption compared to
Btrfs snapshots. Again, the cost of thin snapshots is only revealed by
the thin pool, not the file system. Whereas on Btrfs, 'df' and friends
are expected to properly report free space, and they do.
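Here is a minimal sketch of restoring one file from a snapshot with a reflink, similar in spirit to `cp --reflink=auto`. It uses the Linux FICLONE ioctl; `fcntl.FICLONE` exists only in newer Pythons, so the fallback constant 0x40049409 is an assumption about its value on common architectures. The function name is hypothetical. On file systems without reflink support it falls back to a full copy.

```python
import fcntl
import shutil

# FICLONE request number; Python 3.12+ exposes it as fcntl.FICLONE on Linux.
# 0x40049409 is assumed to be its value on common Linux architectures.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

def restore_from_snapshot(snapshot_file: str, dest: str) -> bool:
    """Copy snapshot_file to dest, sharing extents via reflink when possible.

    Returns True if a reflink clone was made, False if a full copy was needed.
    """
    with open(snapshot_file, "rb") as src, open(dest, "wb") as dst:
        try:
            # ioctl(dest_fd, FICLONE, src_fd): dest shares src's extents.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            pass  # e.g. not supported here, or source on another file system
    shutil.copyfile(snapshot_file, dest)
    return False
```

On Btrfs the clone costs only metadata, so restoring a large file from a snapshot consumes almost no additional space until one copy is modified.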
Also, cgroup2 developers report that the IO isolation features of any
file system are lost when it sits on anything device-mapper based.
Work to fix that is in progress, but it's not there yet.
Integrity checking is highly valued by some and less by others.
We know hardware isn't 100% reliable and doesn't always report its own
failures as expected, which is why most file systems now checksum at
least their metadata. Given that, I'm not persuaded that data should
be left unchecked, with corruption left for user space to handle
somehow.
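A toy Python illustration of why data checksums matter: record a checksum per block on write, verify on read, and a silent bit flip is caught instead of being handed to applications. Btrfs defaults to crc32c; Python's `zlib` only offers crc32, so it stands in here, and the 4 KiB block size and function names are illustrative, not Btrfs's on-disk format.

```python
import zlib

BLOCK = 4096  # illustrative block size

def checksum_blocks(data: bytes) -> list[int]:
    """Record one checksum per block, as a file system would at write time."""
    return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def verify(data: bytes, sums: list[int]) -> list[int]:
    """At read time, return indices of blocks whose checksum no longer matches."""
    return [i for i, s in enumerate(checksum_blocks(data)) if s != sums[i]]

original = bytes(range(256)) * 64  # 16 KiB, i.e. 4 blocks
sums = checksum_blocks(original)

corrupted = bytearray(original)
corrupted[5000] ^= 0x01                # silent single-bit flip inside block 1
print(verify(bytes(corrupted), sums))  # [1]
```

On Btrfs a mismatch like this produces an I/O error rather than bad data, and with a redundant profile the good copy can be used to repair it.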
> File system resize is mentioned there, but pretty much all local
> filesystems support grow. Also, no use cases are listed for that
Windows and macOS have supported online shrink and grow for more than
a decade. While it doesn't often come up on the desktop, if you don't
have it and need it, it's aggravating.
The typical use case today is to reprovision a system with an
additional, or eventually replacement, OS without first having to
destroy the existing one. I'd call it rare. But it's also essentially
expected.
The much more common use case is systemd-homed, for managing
authentication and user homes, including when encrypted. It hasn't
been decided whether to integrate sd-homed, but it supports multiple
storage types. One of those storage types, LUKS on loop, effectively
depends on file system shrink capability. While Fedora's use case is
mainly single user, and we should optimize for that case, it is not
exclusively single user, so the chosen solution shouldn't cause
difficult regressions. And we get a number of free and used space
knock-on effects here if the file system can't do online shrink. Is
lack of online shrink disqualifying? No, but having it significantly
improves UX. So whether LUKS or future fscrypt/Btrfs encryption, this
road points to Btrfs.
Yes, we could drop LVM and go with one big file system; that too was
discussed. The main knock-on effect there is that a significant
minority of users want to do a clean install of Fedora from time to
time while preserving user home. The installer permits this with LVM
layouts, and with Btrfs it only requires creating a new root subvolume
to mount at /.
It doesn't mean Btrfs applies to 100% of Fedora use cases. No single
layout does. But Btrfs consistently solves more problems than it
causes knock-on effects. This doesn't make the alternatives bad. They
just leave a variety of problems unsolved. That too isn't inherently bad
or disqualifying, but it's an opportunity that begs for a more
> Shrink is more tricky, and can easily result in poor file
> layouts, particularly if there are repeated grow/shrink operations, not
> to mention potential complications with NFS if the fs is exported.
A systemd-homed workflow suggests some cases where there will be many
grow/shrink operations. If there are two or three active users, this
might mean several grow/shrink operations per day. So it would need to
be a file system explicitly designed with this in mind, including no
negative locality knock-on effects. Only Btrfs meets this use case
requirement without knock-on effects. Its metadata has no fixed
layout; it's written dynamically, so it doesn't suffer from the poor
layout problem other file systems do.
> Eric has already pointed out that XFS has cgroups2 support, so the
> statement that btrfs is the only fs with that is incorrect. It would
> help to make things a bit clearer if that list was updated, with the
> information gathered so far,
Updated. I've asked a number of cgroups2 kernel developers about this
and they've consistently told me that they know Btrfs does it
correctly, ext4 has priority inversions, and they don't know about