https://fedoraproject.org/wiki/Changes/BtrfsByDefault
== Summary ==
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy; let's make it the default.
== Owners ==
* Names: [[User:Chrismurphy|Chris Murphy]], [[User:Ngompa|Neal Gompa]], [[User:Josef|Josef Bacik]], [[User:Salimma|Michel Alexandre Salim]], [[User:Dcavalca|Davide Cavalca]], [[User:eeickmeyer|Erich Eickmeyer]], [[User:ignatenkobrain|Igor Raits]], [[User:Raveit65|Wolfgang Ulbrich]], [[User:Zsun|Zamir SUN]], [[User:rdieter|Rex Dieter]], [[User:grinnz|Dan Book]], [[User:nonamedotc|Mukundan Ragavan]]
* Emails: chrismurphy@fedoraproject.org, ngompa13@gmail.com, josef@toxicpanda.com, michel@michel-slm.name, dcavalca@fb.com, erich@ericheickmeyer.com, ignatenkobrain@fedoraproject.org, fedora@raveit.de, zsun@fedoraproject.org, rdieter@gmail.com, grinnz@gmail.com, nonamedotc@gmail.com
* Products: All desktop editions, spins, and labs
* Responsible WGs: Workstation Working Group, KDE Special Interest Group
== Detailed Description ==
Fedora desktop edition/spin variants will switch to using Btrfs as the filesystem by default for new installs. Labs derived from these variants inherit this change, and other editions may opt into this change.
The change is based on the installer's custom partitioning Btrfs preset, which has been well tested for 7 years.
'''''Current partitioning'''''<br /> <span style="color: tomato">vg/root</span> LV mounted at <span style="color: tomato">/</span> and a <span style="color: tomato">vg/home</span> LV mounted at <span style="color: tomato">/home</span>. These are separate file system volumes, with separate free/used space.
'''''Proposed partitioning'''''<br /> <span style="color: tomato">root</span> subvolume mounted at <span style="color: tomato">/</span> and <span style="color: tomato">home</span> subvolume mounted at <span style="color: tomato">/home</span>. Subvolumes don't have a fixed size; they act mostly like directories, and space is shared.
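As an illustration of what this means at mount time, the resulting <code>/etc/fstab</code> would look roughly like the sketch below (the label and exact options are assumptions, not the final installer output):

```
LABEL=fedora  /      btrfs  subvol=root  0 0
LABEL=fedora  /home  btrfs  subvol=home  0 0
```

Both entries reference the same single Btrfs volume, so <code>df</code> on <code>/</code> and <code>/home</code> reports the same shared pool of free space.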
'''''Unchanged'''''<br /> <span style="color: tomato">/boot</span> will be a small ext4 volume. A separate <span style="color: tomato">/boot</span> is needed to boot dm-crypt sysroot installations; it's less complicated to keep the layout the same regardless of whether sysroot is encrypted. There will be no automatic snapshots/rollbacks.
If you choose to encrypt your data, LUKS (dm-crypt) will still be used as it is today (with the small difference that Btrfs is used instead of LVM+ext4). There is upstream work on native encryption for Btrfs; it will be considered once ready, and is the subject of a different change proposal for a future Fedora release.
=== Optimizations (Optional) ===
The detailed description above is the proposal. It's intended to be a minimalist and transparent switch. It's also the same as was [[Features/F16BtrfsDefaultFs|proposed]] (and [https://lwn.net/Articles/446925/ accepted]) for Fedora 16. The following optimizations improve on the proposal, but are not critical. They are also transparent to most users. The general idea is to agree on the base proposal first, and then consider these as enhancements.
==== Boot on Btrfs ====
* Instead of a 1G ext4 boot, create a 1G Btrfs boot.
* Advantage: makes it possible to include <span style="color: tomato">/boot</span> in a snapshot and rollback regime. GRUB has had stable support for Btrfs for 10+ years.
* Scope: contingent on bootloader and installer team review and approval. blivet should use <code>mkfs.btrfs --mixed</code>.
==== Compression ====
* Enable transparent compression using zstd on select directories: <span style="color: tomato">/usr</span> <span style="color: tomato">/var/lib/flatpak</span> <span style="color: tomato">~/.local/share/flatpak</span>
* Advantage: saves space and significantly increases the lifespan of flash-based media by reducing write amplification. It may improve performance in some instances.
* Scope: contingent on installer team review and approval to enhance anaconda to perform the installation using <code>mount -o compress=zstd</code>, then set the proper XATTR for each directory. The XATTR can't be set until after the directories are created via rsync, rpm, or unsquashfs based installation.
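For reference, a sketch of the two mechanisms involved (paths from the list above; the label is an assumption). <code>compress=zstd</code> is a volume-wide mount option, while <code>btrfs property set</code> records the per-directory compression XATTR:

```
# /etc/fstab (sketch): request zstd compression at mount time
LABEL=fedora  /  btrfs  subvol=root,compress=zstd  0 0

# Per-directory alternative, run once after the directory exists:
#   btrfs property set /usr compression zstd
# New writes under /usr are then compressed; existing files can be
# recompressed in place with:
#   btrfs filesystem defragment -r -czstd /usr
```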
==== Additional subvolumes ====
* <span style="color: tomato">/var/log/</span> <span style="color: tomato">/var/lib/libvirt/images</span> and <span style="color: tomato">~/.local/share/gnome-boxes/images/</span> will use separate subvolumes.
* Advantage: makes it easier to exclude them from snapshots, rollbacks, and send/receive. (Btrfs snapshotting is not recursive; it stops at a nested subvolume.)
* Scope: Anaconda knows how to do this already; just change the kickstart to add the additional subvolumes (minus the subvolume in <span style="color: tomato">~/</span>). GNOME Boxes will need an enhancement to detect that the user home is on Btrfs and create <span style="color: tomato">~/.local/share/gnome-boxes/images/</span> as a subvolume.
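A sketch of the corresponding kickstart lines (the subvolume names and label are illustrative; see the pykickstart <code>btrfs</code> command documentation for the exact parent-reference syntax):

```
part btrfs.01 --fstype=btrfs --grow
btrfs none --label=fedora btrfs.01
btrfs / --subvol --name=root fedora
btrfs /home --subvol --name=home fedora
btrfs /var/log --subvol --name=var_log fedora
btrfs /var/lib/libvirt/images --subvol --name=libvirt_images fedora
```

The subvolume under <code>~/</code> is deliberately absent: per the proposal, GNOME Boxes would create it at runtime.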
== Feedback ==
==== Red Hat doesn't support Btrfs? Can Fedora do this? ====
Red Hat supports Fedora well, in many ways. But Fedora already works closely with, and depends on, upstreams. And this will be one of them. That's an important consideration for this proposal. The community has a stake in ensuring it is supported. Red Hat will never support Btrfs if Fedora rejects it. Fedora necessarily needs to be first, and make the persuasive case that it solves more problems than alternatives. Feature owners believe it does, hands down.
The Btrfs community has users that have been using it for most of the past decade at scale. It's been the default on openSUSE (and SUSE Linux Enterprise) since 2014, and Facebook has been using it for all their OS and data volumes, in their data centers, for almost as long. Btrfs is a mature, well-understood, and battle-tested file system, used on both desktop/container and server/cloud use-cases. We do have developers of the Btrfs filesystem maintaining and supporting the code in Fedora, one is a Change owner, so issues that are pinned to Btrfs can be addressed quickly.
==== What about device-mapper alternatives? ====
dm-thin (thin provisioning): [https://pagure.io/fedora-workstation/issue/152 Issue #152] still happens, because the installer won't over provision by default. It still requires manual intervention by the user to identify and resolve the problem. Upon growing a file system on dm-thin, the pool is over committed, and file system sizes become a fantasy: they don't add up to the total physical storage available. The truth of used and free space is only known by the thin pool, and CLI and GUI programs are unprepared for this. Integration points like rpm free space checks or GNOME disk-space warnings would have to be adapted as well.
dm-vdo is not yet merged, and isn't as straightforward to enable selectively per directory and per file, as is the case on Btrfs using <code>chattr +c</code> on <span style="color: tomato">/var/lib/flatpak/</span>.
Btrfs solves the problems that need solving, with few side effects or pitfalls for users. It has more features we can take advantage of immediately and transparently: compression, integrity, and IO isolation. Many Btrfs features and optimizations can be opted into selectively per directory or file, such as compression and nodatacow, rather than as a layer that's either on or off.
==== What about UI/UX and integration in the desktop? ====
If Btrfs isn't the default file system, there's no commitment, nor reason to work on any UI/UX integration. There are ideas to make certain features discoverable: selective compression; systemd-homed may take advantage of either Btrfs online resize, or near-term planned native encryption, which could make it possible to live convert non-encrypted homes to encrypted; and system snapshot and rollbacks.
Anaconda already has sophisticated Btrfs integration.
==== What Btrfs features are recommended and supported? ====
The primary goal of this feature is to be largely transparent to the user. It does not require or expect users to learn new commands, or to engage in peculiar maintenance rituals.
The full set of Btrfs features that is considered stable and enabled by default upstream will be enabled in Fedora. Fedora is a community project. What is supported within Fedora depends on what the community decides to put forward in terms of resources.
See the upstream [https://btrfs.wiki.kernel.org/index.php/Status Btrfs feature status page].
==== Are subvolumes really mostly like directories? ====
Subvolumes behave like directories in terms of navigation in both the GUI and CLI, e.g. <code>cp</code>, <code>mv</code>, <code>du</code>, owner/permissions, and SELinux labels. They also share space, just like a directory.
But it is an incomplete answer.
A subvolume is an independent file tree, with its own POSIX namespace, and has its own pool of inodes. This means inode numbers repeat themselves on a Btrfs volume. Inodes are only unique within a given subvolume. A subvolume has its own st_dev, so if you use <code>stat FILE</code> it reports a device value referring to the subvolume the file is in. And it also means hard links can't be created between subvolumes. From this perspective, subvolumes start looking more like a separate file system. But subvolumes share most of the other trees, so they're not truly independent file systems. They're also not block devices.
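The st_dev behavior is easy to observe with coreutils <code>stat</code>. The demo below compares two separate mounts (<code>/</code> and <code>/proc</code>) so it runs on any Linux system; on Btrfs, pointing it at two subvolumes such as <code>/</code> and <code>/home</code> shows the same effect, even though both sit on one block device:

```shell
# Print st_dev and st_ino for two paths. Different file trees report
# different device numbers -- different mounts here, and on Btrfs,
# different subvolumes of the same volume.
stat -c 'dev=%d ino=%i  %n' / /proc
```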
== Benefit to Fedora ==
Problems Btrfs helps solve:
* Users running out of free space on either <span style="color: tomato">/</span> or <span style="color: tomato">/home</span> [https://pagure.io/fedora-workstation/issue/152 Workstation issue #152]
** "one big file system": no hard barriers like partitions or logical volumes
** transparent compression: significantly reduces write amplification, improves lifespan of storage hardware
** reflinks and snapshots are more efficient for use cases like containers (Podman supports both)
* Storage devices can be flaky, resulting in data corruption
** Everything is checksummed and verified on every read
** Corrupt data results in EIO (input/output error), instead of resulting in application confusion, and isn't replicated into backups and archives
* Poor desktop responsiveness when under pressure [https://pagure.io/fedora-workstation/issue/154 Workstation issue #154]
** Currently only Btrfs has proper IO isolation capability via cgroups2
** Completes the resource control picture: memory, CPU, IO isolation
* File system resize
** Online shrink and grow are fundamental to the design
* Complex storage setups are... complicated
** Simple and comprehensive command interface. One master command
** Simpler to boot: all code is in the kernel, no initramfs complexities
** Simple and efficient file system replication, including incremental backups, with <code>btrfs send</code> and <code>btrfs receive</code>
== Scope ==
* Proposal owners:
** Submit PRs for Anaconda to set <code>default_scheme = BTRFS</code> in the proper product files.
** Multiple test days: build community support network
** Aid with documentation
* Other developers:
** Anaconda: review PRs and merge
** Bootloader team: review PRs and merge
** Recommended optimization: <code>chattr +C</code> set on the containing directory for virt-manager and GNOME Boxes.
* Release engineering: [https://pagure.io/releng/issue/9545 #9545]
* Policies and guidelines: N/A
* Trademark approval: N/A
== Upgrade/compatibility impact ==
Change will not affect upgrades.
Documentation will be provided for existing Btrfs users to "retrofit" their setups to that of a default Btrfs installation (base plus any approved options).
== How To Test ==
'''''Today'''''<br /> Do a custom partitioning installation; change the scheme drop-down menu to Btrfs; click the blue "automatically create partitions"; and install.<br /> Fedora 31, 32, Rawhide, on x86_64 and ARM.
'''''Once change lands'''''<br /> It should be simple enough to test, just do a normal install.
== User Experience ==
==== Pros ====
* Mostly transparent
* Space savings from compression
* Longer lifespan of hardware, also from compression
* Utilities for used and free space, CLI and GUI, are expected to behave the same. No special commands are required.
* More detailed information can be revealed by <code>btrfs</code>-specific commands.
==== Enhancement opportunities ====
[https://bugzilla.redhat.com/show_bug.cgi?id=906591 updatedb does not index /home when /home is a bind mount] This can also affect rpm-ostree installations, including Silverblue.
[https://gitlab.gnome.org/GNOME/gnome-usage/-/issues/49 GNOME Usage: Incorrect numbers when using multiple btrfs subvolumes] This isn't Btrfs specific; it happens with a "one big ext4" volume as well.
[https://gitlab.gnome.org/GNOME/gnome-boxes/-/issues/88 GNOME Boxes, RFE: create qcow2 with 'nocow' option when on btrfs /home] This is Btrfs specific, and is a recommended optimization for both GNOME Boxes and virt-manager.
[https://github.com/containers/libpod/issues/6563 containers/libpod: automatically use btrfs driver if on btrfs]
== Dependencies ==
None.
== Contingency Plan ==
* Contingency mechanism: Owner will revert changes back to LVM+ext4
* Contingency deadline: Beta freeze
* Blocks release? Yes
* Blocks product? Workstation and KDE
== Documentation ==
Strictly speaking no documentation is required reading for users. But there will be some Fedora documentation to help get the ball rolling.
For those who want to know more:
[https://btrfs.wiki.kernel.org/index.php/Main_Page btrfs wiki main page and full feature list.]
<code>man 5 btrfs</code> contains: mount options, features, swapfile support, checksum algorithms, and more<br /> <code>man btrfs</code> contains an overview of the btrfs subcommands<br /> <code>man btrfs <nowiki><subcommand></nowiki></code> will show the man page for that subcommand
NOTE: The btrfs command will accept partial subcommands, as long as it's not ambiguous. These are equivalent commands:<br /> <code>btrfs subvolume snapshot</code><br /> <code>btrfs sub snap</code><br /> <code>btrfs su sn</code>
You'll discover your own convention. It might be preferable to write out the full command on forums and lists, though then some folks may never learn about this useful shortcut.
For those who want to know a lot more:
[https://btrfs.wiki.kernel.org/index.php/Main_Page#Developer_documentation Btrfs developer documentation]<br /> [https://github.com/btrfs/btrfs-dev-docs/blob/master/trees.txt Btrfs trees]
== Release Notes ==
The default file system on the desktop is Btrfs.
On 26.06.2020 16:42, Ben Cotton wrote:
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
I'm strongly against this proposal. BTRFS is the most unstable file system I ever seen. It can break up even under an ideal conditions and lead to a complete data loss. There are lots of complaints and bug reports in Linux kernel bugzilla and Reddit.
Such changes could affect Fedora reputation among other distributions.
On Fri, Jun 26, 2020 at 04:58:19PM +0200, Vitaly Zaitsev via devel wrote:
I'm strongly against this proposal. BTRFS is the most unstable file system I ever seen. It can break up even under an ideal conditions and lead to a complete data loss. There are lots of complaints and bug reports in Linux kernel bugzilla and Reddit.
That certainly would be concerning, but do you have citations on this? I did a search on reddit and did not find a significant number of such complaints in the top results -- in fact, mostly positive reports. For kernel bugzilla issues, do you have numbers compared to other filesystems?
My Reddit search _did_ turn up this presentation from Usenix:
https://www.usenix.org/conference/atc19/presentation/jaffer
From that in part:
* ext4 has significantly improved over ext3 in both detection and recovery from data corruption and I/O injection errors. Our extensive test suite generates only minor errors or data losses in the file system, in stark contrast with [a 2005 paper], where ext3 was reported to silently discard write errors.
* On the other hand, Btrfs, which is a production grade filesystem with advanced features like snapshot and cloning, has good failure detection mechanisms, but is unable to recover from errors that affect its key data structures, partially due to disabling metadata replication when deployed on SSDs.
[...]
* We notice potentially fatal omissions in error detection and recovery for all file systems except for ext4. This is concerning since technology trends, such as continually growing SSD drive capacities and increasing densities as QLC drives which are coming on the market, all seem to point towards increasing rather than decreasing SSD error rates in the future. [...]
On Fri, Jun 26, 2020 at 04:58:19PM +0200, Vitaly Zaitsev via devel wrote:
On 26.06.2020 16:42, Ben Cotton wrote:
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
I'm strongly against this proposal. BTRFS is the most unstable file system I ever seen. It can break up even under an ideal conditions and lead to a complete data loss. There are lots of complaints and bug reports in Linux kernel bugzilla and Reddit.
Anecdata… OTOH, I'm using btrfs on most of my machines. I had one data loss, when a RAM module went bad and caused corruption in the bcache attached to my btrfs /. It was the fault of neither bcache nor btrfs.
On Fri, 26 Jun 2020 at 16:05, Vitaly Zaitsev via devel < devel@lists.fedoraproject.org> wrote: [..]
I'm strongly against this proposal. BTRFS is the most unstable file system I ever seen.
I would be really interested in how you came to that conclusion (how did you measure that?). Do you have any metrics data which shows Linux filesystem stability?
Does anyone know any source of some data which could be used to put all Linux filesystems on some stability ruler? Maybe some FS crash statistics taken from systems working on the same/similar HW in some DCs?
kloczek
On Fri, Jun 26, 2020 at 8:58 AM Vitaly Zaitsev via devel devel@lists.fedoraproject.org wrote:
I'm strongly against this proposal. BTRFS is the most unstable file system I ever seen. It can break up even under an ideal conditions and lead to a complete data loss. There are lots of complaints and bug reports in Linux kernel bugzilla and Reddit.
I've got a Samsung 840 EVO that I know has firmware bugs. Is that an ideal condition? What about compiling webkitgtk and losing control of the system under load (unresponsive GUI while the compiling continues to write)? Is it an ideal condition? And because I'm notoriously impatient, I often yank the power cord. Ideal condition? And I've done this over 100 times in the last year. Ideal condition?
100% of the subsequent cold boots were identical to boots after a prior clean shutdown. Zero btrfs complaints. One person, one laptop, one SSD: not a totally disqualified scientific sample, but a really insignificant anecdote, except that even at this scale, if there were intrinsic file system defects, I think I'd have seen them.
Question is, what happens when the firmware has a hiccup and I also get a power fail. What am I likely to see, and what do I do? When there are problems, we're used to a particular pattern with ext4. That pattern will change with btrfs. There will be fewer of some problems, more of others, and the messages will be different. fsck.ext4 is pretty much all we have, all we're used to, and it's a binary pass/fail. Even though we're talking about edge cases at this level, those who get unlucky for whatever reason are going to need a community of user to user support giving them good advice. Will Fedora?
It's also important to talk about what's left on the table *without* this change. The potential to almost transparently drop in a new file system that extends the life of users' hardware, eliminates the free space competition problem between /home and /, and allocates it more efficiently. And asks *less* of day to day users, while inviting *more* from those who want to explore more features. On the same file system.
The fear/concern component is real, it has to be addressed and not dismissed. But that component is already present with what we have. We're just used to it. Is there enough of a sense of adventure and bravery in Fedora to overcome the fear component, and in exchange we get a modern file system that actually helps us solve problems we're having today right now? And offers features that beg for future creativity and innovation?
I think the answer is yes, but the Fedora community is going to have to decide.
On 6/26/20 12:31 PM, Chris Murphy wrote:
That pattern will change with btrfs. There will be fewer of some problems, more of others, and the messages will be different. fsck.ext4 is pretty much all we have, all we're used to, and it's a binary pass/fail. Even though we're talking about edge cases at this level, those who get unlucky for whatever reason are going to need a community of user to user support giving them good advice. Will Fedora?
Well said. BTRFS is more complex and will require getting used to.
In case of FS trouble, everyone knows 'fsck' but as Josef wrote
With btrfs you are just getting started. You have several built in mount options for recovering different failures, all read only. But you have to know that they are there and how to use them.
which is both encouraging and terrifying :)
I remember that two issues that made me apprehensive wrt. BTRFS were its handling of the 'disk full' situation, and the lack of a straightforward 'fsck' workflow. I think the first issue has been resolved, and we probably just need some docs and scripts that handle file system corruption by remounting R/O and printing some suggestions what to do next.
It's also important to talk about what's left on the table *without* this change. The potential to almost transparently drop in a new file system that extends the life of users' hardware, eliminates the free space competition problem between /home and /, and allocates it more efficiently. And asks *less* of day to day users, while inviting *more* from those who want to explore more features. On the same file system.
For what it's worth, this is really needed, and overdue. I have repeatedly failed Fedora OS release upgrades on different machines by running out of root fs space. I think the default / is around 50GB, and it's too easy to fill: during OS update we need space for three copies of each package: the old version, the downloaded new version, and the space to install the new version.
Even though technically dnf system-upgrade can use --download-dir to point to a location off /, it doesn't seem to work with the actual upgrade, so the only way I know is to delete the largest packages (flightGear*, piglit*, KiCAD*, ...) and reinstall them after the update.
One thing that hasn't been mentioned yet is that btrfs is also important for our plans to preserve system responsiveness under heavy load, https://pagure.io/fedora-workstation/issue/154.
On Fri, Jun 26, 2020 at 5:22 pm, Przemek Klosowski via devel devel@lists.fedoraproject.org wrote:
For what it's worth, this is really needed, and overdue. I have repeatedly failed Fedora OS release upgrades on different machines by running out of root fs space. I think the default / is around 50GB, and it's too easy to fill: during OS update we need space for three copies of each package: the old version, the downloaded new version, and the space to install the new version.
We raised it to 70 GB, but it's still too small. I keep running out of space too, most recently just a couple days ago. This is a problem we're determined to solve, and raising the size of / further just increases the chance of the user running out of space on /home under the current scheme, so if btrfs doesn't pass, we will (very likely) switch to single-partition ext4 (or maybe xfs). See https://pagure.io/fedora-workstation/issue/152.
On Fri, Jun 26, 2020 at 3:22 PM Przemek Klosowski via devel devel@lists.fedoraproject.org wrote:
I remember that two issues that made me apprehensive wrt. BTRFS were its handling of the 'disk full' situation, and the lack of a straightforward 'fsck' workflow. I think the first issue has been resolved, and we probably just need some docs and scripts that handle file system corruption by remounting R/O and printing some suggestions what to do next.
A medium term goal is to make systemd and the desktop environment more tolerant to starting up read-only, and even though this is a limited environment the user isn't just stuck at a prompt. SUSE/openSUSE can today boot read-only snapshots as part of its rollback strategy but I'm not sure how/why it works or whether it's adaptable.
A short term goal, possibly even a requirement for the proposal, is some kind of message at a dracut prompt to at least give the user something to go on, in sequence, including even 'join us on #fedora-btrfs' or whatever. A bigger problem is that right now (a) new installs don't set a password for root user, and (b) systemd emergency target requires a root user login to get to a prompt. It has to be a mount *failure* to get to a dracut prompt where we could show some messages. There is this middle area where the user is stuck no matter the file system.
Some of these are long standing problems, but they're perhaps being spotlit by the change.
For what it's worth, this is really needed, and overdue. I have repeatedly failed Fedora OS release upgrades on different machines by running out of root fs space. I think the default / is around 50GB, and it's too easy to fill: during OS update we need space for three copies of each package: the old version, the downloaded new version, and the space to install the new version.
75G on new installs today but yes there are many folks still with a 50G root volume at /
And changing this to 80+G is sorta 'kick the can' but also as it turns out it doesn't really fix the problem that well and puts pressure on /home in cases where the laptop drive is kinda small. There are other valid ways to solve this single problem, e.g. a single plain ext4 or xfs volume. But both of those leave things on the table users benefit from.
Of course it isn't all about features. If it's just a feature contest btrfs wins somewhat dramatically. What's going to make the feature successful is the community backing it up. The change needs the desire and resources of Fedora more than just features. A dozen owners on the proposal hopefully gives confidence that it's serious, but it's going to take more than that.
On Fri, Jun 26, 2020 at 05:49:03PM -0600, Chris Murphy wrote:
For what it's worth, this is really needed, and overdue. I have repeatedly failed Fedora OS release upgrades on different machines by running out of root fs space. I think the default / is around 50GB, and it's too easy to fill: during OS update we need space for three copies of each package: the old version, the downloaded new version, and the space to install the new version.
75G on new installs today but yes there are many folks still with a 50G root volume at /
And changing this to 80+G is sorta 'kick the can' but also as it turns out it doesn't really fix the problem that well and puts pressure on /home in cases where the laptop drive is kinda small. There are other valid ways to solve this single problem, e.g. a single plain ext4 or xfs volume. But both of those leave things on the table users benefit from.
We cannot do anything for existing installs; it is up to the owner to juggle partitions. Also, with the btrfs proposal we do not have to decide how to split space between / and /home. Both are just subvolumes and share all the space.
On 6/26/20 2:22 PM, Przemek Klosowski via devel wrote:
Even though technically dnf system-upgrade can use --download-dir to point to a location off /, it doesn't seem to work with the actual upgrade, so the only way I know is to delete the largest packages (flightGear*, piglit*, KiCAD*, ...) and reinstall them after the update.
Somewhat off-topic, but you can symlink /var/lib/dnf/system-upgrade to somewhere else and all the downloaded packages will be stored there and the upgrade will still work. (As long as the linked storage is automatically mounted at boot.) I've even used a USB flash drive for this.
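The symlink trick can be sketched with plain shell. Here a scratch directory stands in for the real paths, since the real operation needs root; only <code>/var/lib/dnf/system-upgrade</code> is from the message above, everything else is a stand-in:

```shell
# Simulate pointing dnf's system-upgrade download directory at another
# volume via a symlink. $root stands in for the system root; on a real
# machine you would symlink /var/lib/dnf/system-upgrade itself (as root).
root=$(mktemp -d)
mkdir -p "$root/var/lib/dnf" "$root/bigdisk/system-upgrade"

# Replace the expected path with a symlink to the roomier location:
ln -s "$root/bigdisk/system-upgrade" "$root/var/lib/dnf/system-upgrade"

# Writes through the expected path now land on the big volume:
touch "$root/var/lib/dnf/system-upgrade/example.rpm"
ls "$root/bigdisk/system-upgrade"   # prints: example.rpm
```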
On Fri, Jun 26, 2020 at 04:58:19PM +0200, Vitaly Zaitsev via devel wrote:
On 26.06.2020 16:42, Ben Cotton wrote:
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
I'm strongly against this proposal. BTRFS is the most unstable file system I ever seen. It can break up even under an ideal conditions and lead to a complete data loss. There are lots of complaints and bug reports in Linux kernel bugzilla and Reddit.
I don't have any info to either confirm or refute this assertion, but I want to say we should be careful to actually compare apples to apples.
btrfs is not a 1-1 equivalent of ext4, because the scope of btrfs is much broader. It should likely be compared against some combo of existing functionality, such as ext4+devicemapper, to get a fairer picture.
It isn't just a matter of whether the kernel parts are reliable. It is also important how well the userspace tools fit together to form the end user solution. This impacts how likely it is for the user to shoot themselves in the foot when making changes to their storage stack.
Regards, Daniel
On Fri, Jun 26, 2020 at 05:32:45PM +0100, Daniel P. Berrangé wrote:
btrfs is not a 1-1 equivalent of ext4, because the scope of btrfs is much broader. It should likely be compared against some combo of existing functionality, such as ext4+devicemapper, to get a fairer picture.
Well, specifically, we should compare the existing default partitioning scheme.
On 26 June 2020 16:58:19 CEST, Vitaly Zaitsev via devel devel@lists.fedoraproject.org wrote:
On 26.06.2020 16:42, Ben Cotton wrote:
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
I'm strongly against this proposal. BTRFS is the most unstable file system I ever seen. It can break up even under an ideal conditions and lead to a complete data loss. There are lots of complaints and bug reports in Linux kernel bugzilla and Reddit.
Such changes could affect Fedora reputation among other distributions.
I strongly agree. BTRFS has been 5 years from production ready for almost a decade now, please don't force this on users that doesn't know any better.
On Fri, Jun 26, 2020 at 8:45 pm, Markus Larsson qrsbrwn@uidzero.se wrote:
I strongly agree. BTRFS has been 5 years from production ready for almost a decade now, please don't force this on users that doesn't know any better.
This is hard to square with the fact that it's already being used in production on millions of systems. It's also hard to square with the data presented by Josef -- the only hard evidence I've seen on the topic of filesystem reliability -- which shows btrfs is an order of magnitude more reliable than xfs (although we don't know how it compares to ext4). Surely if xfs is good enough for RHEL, and btrfs is at least 10x more reliable than xfs, that suggests btrfs should probably be good enough for Fedora?
Do you have any real evidence for your claim that would be more convincing than what Josef has presented?
On 26 June 2020 21:04:00 CEST, Michael Catanzaro mcatanzaro@gnome.org wrote:
On Fri, Jun 26, 2020 at 8:45 pm, Markus Larsson qrsbrwn@uidzero.se wrote:
I strongly agree. BTRFS has been 5 years from production ready for almost a decade now; please don't force this on users who don't know any better.
This is hard to square with the fact that it's already being used in production on millions of systems. It's also hard to square with the data presented by Josef -- the only hard evidence I've seen on the topic of filesystem reliability -- which shows btrfs is an order of magnitude more reliable than xfs (although we don't know how it compares to ext4). Surely if xfs is good enough for RHEL, and btrfs is at least 10x more reliable than xfs, that suggests btrfs should probably be good enough for Fedora?
Do you have any real evidence for your claim that would be more convincing than what Josef has presented?
Josef's server farms are a bit of a different use case than laptops, as other people have already pointed out. If you want data on how it works in a desktop/laptop scenario, talk to openSUSE users about how many times the "btrfs randomly ate my volume" bug was "fixed".
When I ran an environment of about 4500 SLES and about 5000 RHEL servers, btrfs failed about 3 times as often as xfs (this is from our own in-house statistics). That was 3 years ago, but filesystems take a long time to mature, and I have been keeping an ear near openSUSE to see where it goes. Is this as big as Josef's environment? No, but it is first-hand data to me (to you it is of course just anecdotal evidence).
BTRFS has the potential to become great, I just think it isn't there yet and it'll take 5 years of smooth sailing to convince me.
devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Fri, 2020-06-26 at 21:22 +0200, Markus Larsson wrote:
On 26 June 2020 21:04:00 CEST, Michael Catanzaro < mcatanzaro@gnome.org> wrote:
On Fri, Jun 26, 2020 at 8:45 pm, Markus Larsson qrsbrwn@uidzero.se wrote:
I strongly agree. BTRFS has been 5 years from production ready for almost a decade now; please don't force this on users who don't know any better.
This is hard to square with the fact that it's already being used in production on millions of systems. It's also hard to square with the data presented by Josef -- the only hard evidence I've seen on the topic of filesystem reliability -- which shows btrfs is an order of magnitude more reliable than xfs (although we don't know how it compares to ext4). Surely if xfs is good enough for RHEL, and btrfs is at least 10x more reliable than xfs, that suggests btrfs should probably be good enough for Fedora?
Do you have any real evidence for your claim that would be more convincing than what Josef has presented?
Josef's server farms are a bit of a different use case than laptops, as other people have already pointed out. If you want data on how it works in a desktop/laptop scenario, talk to openSUSE users about how many times the "btrfs randomly ate my volume" bug was "fixed".
When I ran an environment of about 4500 SLES and about 5000 RHEL servers, btrfs failed about 3 times as often as xfs (this is from our own in-house statistics). That was 3 years ago, but filesystems take a long time to mature, and I have been keeping an ear near openSUSE to see where it goes. Is this as big as Josef's environment? No, but it is first-hand data to me (to you it is of course just anecdotal evidence).
Keep in mind that SLES does backport btrfs patches because they support it. RHEL does not. And we are talking about Fedora here anyway.
BTRFS has the potential to become great, I just think it isn't there yet and it'll take 5 years of smooth sailing to convince me.
Probably you should try it with Fedora's kernel?
-- Igor Raits ignatenkobrain@fedoraproject.org
On 26 June 2020 21:32:31 CEST, Igor Raits ignatenkobrain@fedoraproject.org wrote:
Josef's server farms are a bit of a different use case than laptops, as other people have already pointed out. If you want data on how it works in a desktop/laptop scenario, talk to openSUSE users about how many times the "btrfs randomly ate my volume" bug was "fixed".
When I ran an environment of about 4500 SLES and about 5000 RHEL servers, btrfs failed about 3 times as often as xfs (this is from our own in-house statistics). That was 3 years ago, but filesystems take a long time to mature, and I have been keeping an ear near openSUSE to see where it goes. Is this as big as Josef's environment? No, but it is first-hand data to me (to you it is of course just anecdotal evidence).
Keep in mind that SLES does backport btrfs patches because they support it. RHEL does not. And we are talking about Fedora here anyway.
We didn't use BTRFS on any RHEL machines, only on the SLES ones.
BTRFS has the potential to become great, I just think it isn't there yet and it'll take 5 years of smooth sailing to convince me.
Probably you should try it with Fedora's kernel?
Oh I will, when BTRFS has had smooth sailing for 5 years. I have no problem with others running btrfs; I just don't think it should be the default, since we seem to be all about not creating problems for new users. That's at least what the recent changes tell me :)
On Fri, Jun 26, 2020 at 2:05 PM Michael Catanzaro mcatanzaro@gnome.org wrote:
On Fri, Jun 26, 2020 at 8:45 pm, Markus Larsson qrsbrwn@uidzero.se wrote:
I strongly agree. BTRFS has been 5 years from production ready for almost a decade now; please don't force this on users who don't know any better.
This is hard to square with the fact that it's already being used in production on millions of systems. It's also hard to square with the data presented by Josef -- the only hard evidence I've seen on the topic of filesystem reliability -- which shows btrfs is an order of magnitude more reliable than xfs (although we don't know how it compares to ext4). Surely if xfs is good enough for RHEL, and btrfs is at least 10x more reliable than xfs, that suggests btrfs should probably be good enough for Fedora?
Saying production on millions of systems is a bit misleading here, when you are talking about millions of systems at a single company.
On 2020-06-26 22:13, Justin Forbes wrote:
Saying production on millions of systems is a bit misleading here, when you are talking about millions of systems at a single company.
...in a redundant configuration where losing a disk is tolerated by design, and managing data that have very low value (mostly pictures of cats and random chats).
Filesystem quality must be measured in other conditions: a Postgres database on it, financial transactions, random blackouts, etc.
On Sat, 2020-06-27 at 10:35 +0200, Roberto Ragusa wrote:
On 2020-06-26 22:13, Justin Forbes wrote:
Saying production on millions of systems is a bit misleading here, when you are talking about millions of systems at a single company.
...in a redundant configuration where losing a disk is tolerated by design, and managing data that have very low value (mostly pictures of cats and random chats).
Filesystem quality must be measured in other conditions: a Postgres database on it, financial transactions, random blackouts, etc.
Do you run postgres, financial transactions and random blackouts on your laptop / workstation? If so, isn't it just for testing purposes?
I'm not saying that it is not important to have a stable filesystem and such, just that typical workstation workloads are not utilizing disks that much (if at all?).
-- Roberto Ragusa mail at robertoragusa.it
-- Igor Raits ignatenkobrain@fedoraproject.org
Le samedi 27 juin 2020 à 10:47 +0200, Igor Raits a écrit :
Do you run postgres, financial transactions and random blackouts on your laptop / workstation? If so, isn't it just for testing purposes?
Workstations are full of high-value personal data, because home users do not have an IT organisation to back it up in a professional way.
On Sat, Jun 27, 2020 at 10:59:57AM +0200, Nicolas Mailhot via devel wrote:
Le samedi 27 juin 2020 à 10:47 +0200, Igor Raits a écrit :
Do you run postgres, financial transactions and random blackouts on your laptop / workstation? If so, isn't it just for testing purposes?
Workstations are full of high-value personal data, because home users do not have an IT organisation to back it up in a professional way.
That's why I have my personal, valuable and irreplaceable data (photos, contracts, etc.) on btrfs raid1. I do backups regularly, but only btrfs is able to catch and correct silent corruption, which does happen. Without btrfs, I could be happily backing up corrupted photos.
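To illustrate the detection half of that (a toy sketch only: btrfs actually checksums every block with crc32c as part of its on-disk format and, on raid1, repairs from the intact mirror automatically), one can checksum a file block by block and watch a single silently corrupted byte show up:

```shell
# Checksum a file in 4 KiB blocks, silently corrupt one byte, re-verify.
cd "$(mktemp -d)"
head -c 16384 /dev/zero > data.img          # 4 blocks of 4 KiB
split -b 4096 data.img blk.
cksum blk.* > sums.before                   # per-block checksums
printf 'X' | dd of=data.img bs=1 seek=5000 conv=notrunc status=none
rm blk.*
split -b 4096 data.img blk.
cksum blk.* > sums.after
diff sums.before sums.after || echo "corruption detected"
```

Only the checksum of the second block changes, so the damage is localized; a plain backup run would have copied the corrupted block without noticing, which is exactly the failure mode described above.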
On Saturday, June 27, 2020, Nicolas Mailhot via devel < devel@lists.fedoraproject.org> wrote:
Le samedi 27 juin 2020 à 10:47 +0200, Igor Raits a écrit :
Do you run postgres, financial transactions and random blackouts on your laptop / workstation? If so, isn't it just for testing purposes?
Workstations are full of high-value personal data, because home users do not have an IT organisation to back it up in a professional way.
Speaking of backups, some popular (even cheap) NAS systems for "home users" do use btrfs -- those users also do not have professional IT support to help them.
-- Nicolas Mailhot
On Sat, Jun 27, 2020 at 06:41:17PM +0200, drago01 wrote:
Speaking of backups, some popular (even cheap) NAS systems for "home users" do use btrfs -- those users also do not have professional IT support to help them.
Um, yes they do -- the folks who supplied the NAS software, aka the device manufacturer. For that equipment, btrfs is an implementation detail, completely hidden from the user.
- Solomon
On 2020-06-27 10:47, Igor Raits wrote:
On Sat, 2020-06-27 at 10:35 +0200, Roberto Ragusa wrote:
On 2020-06-26 22:13, Justin Forbes wrote:
Saying production on millions of systems is a bit misleading here, when you are talking about millions of systems at a single company.
...in a redundant configuration where losing a disk is tolerated by design, and managing data that have very low value (mostly pictures of cats and random chats).
Filesystem quality must be measured in other conditions: a Postgres database on it, financial transactions, random blackouts, etc.
Do you run postgres, financial transactions and random blackouts on your laptop / workstation? If so, isn't it just for testing purposes?
No, but I do run on my laptop/workstation the same technologies that have been proven to be good for serious stuff. That is the fundamental Linux advantage, or at least it has always been, and that's why I have been using Linux daily since the days when other people were waiting for the release of Win95.
On Sat, Jun 27, 2020 at 9:30 AM Roberto Ragusa mail@robertoragusa.it wrote:
On 2020-06-27 10:47, Igor Raits wrote:
On Sat, 2020-06-27 at 10:35 +0200, Roberto Ragusa wrote:
On 2020-06-26 22:13, Justin Forbes wrote:
Saying production on millions of systems is a bit misleading here, when you are talking about millions of systems at a single company.
...in a redundant configuration where losing a disk is tolerated by design, and managing data that have very low value (mostly pictures of cats and random chats).
Filesystem quality must be measured in other conditions: a Postgres database on it, financial transactions, random blackouts, etc.
Do you run postgres, financial transactions and random blackouts on your laptop / workstation? If so, isn't it just for testing purposes?
No, but I do run on my laptop/workstation the same technologies that have been proven to be good for serious stuff. That is the fundamental Linux advantage, or at least it has always been, and that's why I have been using Linux daily since the days when other people were waiting for the release of Win95.
By that metric, Btrfs qualifies, as it's the default filesystem on SUSE Linux Enterprise (and has been since 2014). SUSE has built several products specifically on top of Btrfs, including their Kubernetes product, which relies on Btrfs features to offer safety and high performance.
And Facebook runs it for nearly all their infrastructure, as noted by Josef upthread.
Google uses it to power the Crostini Linux environment on Chrome OS.
Synology has used it by default for their NAS products since 2016, with DSM 6.0.
I can keep going on, but I think this shows that Btrfs is a mature, battle-tested filesystem used for *very* serious workloads.
-- 真実はいつも一つ!/ Always, there's only one truth!
On Sat, Jun 27, 2020 at 09:39:36AM -0400, Neal Gompa wrote:
By that metric, Btrfs qualifies, as it's the default filesystem on SUSE Linux Enterprise (and has been since 2014). SUSE has built
One thing I'd like to see addressed.
Back in the RHEL7.4 days, btrfs was explicitly deprecated:
"The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a fully supported feature and it will be removed in a future major release of Red Hat Enterprise Linux.
"The Btrfs file system did receive numerous updates from the upstream in Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise Linux 7 series. However, this is the last planned update to this feature.
So, why did SuSE consider BTRFS "ready" while RedHat did not, to the point of removing support for it? And what has changed since then?
- Solomon
On 27 June 2020 16:17:16 CEST, Solomon Peachy pizza@shaftnet.org wrote:
On Sat, Jun 27, 2020 at 09:39:36AM -0400, Neal Gompa wrote:
By that metric, Btrfs qualifies, as it's the default filesystem on SUSE Linux Enterprise (and has been since 2014). SUSE has built
One thing I'd like to see addressed.
Back in the RHEL7.4 days, btrfs was explicitly deprecated:
"The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a fully supported feature and it will be removed in a future major release of Red Hat Enterprise Linux.
"The Btrfs file system did receive numerous updates from the upstream in Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise Linux 7 series. However, this is the last planned update to this feature.
So, why did SuSE consider BTRFS "ready" while RedHat did not, to the point of removing support for it? And what has changed since then?
I don't know how to say this without throwing shade, so here goes anyway. Anyone who has worked with both RHEL and SLES systems knows why. My feeling from working with both products in large-scale heterogeneous environments is that SLES is many times less reliable than RHEL. I don't know exactly how or why, because SuSE has many talented people on payroll who do good work in many areas; it's just that when it's time to put SLES together, it isn't very reliable. I'm sorry for the harsh words; I just don't know how to put it any other way.
/Markus
On Sat, Jun 27, 2020 at 10:17 AM Solomon Peachy pizza@shaftnet.org wrote:
On Sat, Jun 27, 2020 at 09:39:36AM -0400, Neal Gompa wrote:
By that metric, Btrfs qualifies, as it's the default filesystem on SUSE Linux Enterprise (and has been since 2014). SUSE has built
One thing I'd like to see addressed.
Back in the RHEL7.4 days, btrfs was explicitly deprecated:
"The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a fully supported feature and it will be removed in a future major release of Red Hat Enterprise Linux.
"The Btrfs file system did receive numerous updates from the upstream in Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise Linux 7 series. However, this is the last planned update to this feature.
So, why did SuSE consider BTRFS "ready" while RedHat did not, to the point of removing support for it? And what has changed since then?
Red Hat deprecated it because they have zero engineers knowledgeable about Btrfs in a way that they could regularly and meaningfully contribute to its development upstream and maintain it for the Red Hat Enterprise Linux kernel. They all left for different companies over the past several years. That situation has not changed at Red Hat to the best of my knowledge.
However, Fedora, as the cutting edge platform that uses new technologies first, is not bound by Red Hat's lack of staff on Btrfs. Indeed, one of the change owners (Josef Bacik) does not work at Red Hat, but is an upstream Btrfs developer who is helping to push this change.
Perhaps with Fedora adopting Btrfs, this may change in the future. I do not know. But as a Fedoran, I want Fedora to use the best technology we have to solve problems. My firm belief is that Btrfs is that for the problems we are facing today.
-- 真実はいつも一つ!/ Always, there's only one truth!
By that metric, Btrfs qualifies, as it's the default filesystem on SUSE Linux Enterprise (and has been since 2014). SUSE has built
One thing I'd like to see addressed.
Back in the RHEL7.4 days, btrfs was explicitly deprecated:
"The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a fully supported feature and it will be removed in a future major release of Red Hat Enterprise Linux.
"The Btrfs file system did receive numerous updates from the upstream in Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise Linux 7 series. However, this is the last planned update to this feature.
So, why did SuSE consider BTRFS "ready" while RedHat did not, to the point of removing support for it? And what has changed since then?
I suspect, though I do not know, so it's purely my own opinion, that it was because there was not the internal knowledge to be able to support the filesystem (or yet another filesystem) for paying customers. Adding support for something like a filesystem where you have paying customers is not something taken lightly; customers tend to like their data, and enterprise support is only as good as its response when the absolute worst possible thing happens. Having worked at hosting providers, and been consulting onsite at some very large companies, I know from experience that a lot of enterprises take more time over decisions to change storage platforms, and options around storage, than probably all other decisions combined. An outage of a load balancer or a network switch can be dealt with via resiliency, and replacing one is quite straightforward if a device doesn't live up to expectations. Storage is quite the opposite: data corruption is often not easy to recover from, so things like new filesystems are not taken lightly by some customers.
Peter
On 6/27/20 4:35 AM, Roberto Ragusa wrote:
On 2020-06-26 22:13, Justin Forbes wrote:
Saying production on millions of systems is a bit misleading here, when you are talking about millions of systems at a single company.
...in a redundant configuration where losing a disk is tolerated by design, and managing data that have very low value (mostly pictures of cats and random chats).
Huh? I can assure you that we care very much about our users' data, and do not lose "cat pictures" randomly and call it a day. If you are going to make technical arguments then I'm happy to talk about actual issues, but insulting the hard work we put into maintaining a very high quality production environment is not helpful or relevant. Thanks,
Josef
On Fri, 2020-06-26 at 14:04 -0500, Michael Catanzaro wrote:
[...]
Surely if xfs is good enough for RHEL, and btrfs is at least 10x more reliable than xfs, that suggests btrfs should
This is a good argument for having Fedora officially support BtrFS as a possible installation option, yes; but a _default_ filesystem needs to be absolutely tried-and-true, and I believe that BtrFS has not yet been put through its paces well enough for this. Ext4 should remain the default FS for now.
BtrFS might have significant feature advantage over Ext4, yes; but so far it has only seen production use from a handful of companies, and even then, not for very long. (The BtrFS wiki page [1] lists most of these users only as of late-2018 -- just under two years.)
In contrast to this, Ext4 has been the default FS in many enterprise systems for well over a decade: For instance, Google transitioned its systems to Ext4 in January 2010 [2], and transitioned Android to Ext4 in December 2010 [3]. And most modern Linux distros made Ext4 their default filesystem at around the same time.
Moreover, putting aside any issues of stability, Ext4 performance and interactivity continue to beat those of BtrFS except in very specific scenarios. For example, in this December 2019 Phoronix benchmark [4], BtrFS was better under some RAID setups, but consistently slightly worse than Ext4 for the single-disk use cases. Considering that most desktops and laptops (minus some workstation-class PCs and laptops, like a ThinkPad P-series perhaps) use single disks instead of RAID arrays, it does not yet make sense to choose a default filesystem that will hamper performance for the sake of features (like support for 16 EiB volumes and snapshots) that most users will probably not use or care about.
In summary: Yes, it would be very cool for Fedora to support BtrFS; but it should not yet be the default filesystem. It still needs a lot of time to mature and stabilize.
[1] https://btrfs.wiki.kernel.org/index.php/Production_Users [2] https://lists.openwall.net/linux-ext4/2010/01/04/8 [3] https://thunk.org/tytso/blog/2010/12/12/android-will-be-using-ext4-starting-...
[4] https://www.phoronix.com/scan.php?page=article&item=linux54-hdd-raid
On Fri, Jun 26, 2020 at 6:17 PM Peter Gordon peter@thecodergeek.com wrote:
This is a good argument for having Fedora officially support BtrFS as a possible installation option, yes;
It is already a release-blocking (supported) file system as an install-time option, and has been for ~10 years.
BtrFS might have significant feature advantage over Ext4, yes; but so far it has only seen production use from a handful of companies, and even then, not for very long. (The BtrFS wiki page [1] lists most of these users only as of late-2018 -- just under two years.)
Facebook since 2015. SUSE/openSUSE on the desktop and on servers since 2014, by default. Are you suggesting they can do it and we can't?
In contrast to this, Ext4 has been the default FS in many enterprise systems for well over a decade: For instance, Google transitioned its systems to Ext4 in January 2010 [2], and transitioned Android to Ext4 in December 2010 [3]. And most modern Linux distros made Ext4 their default filesystem at around the same time.
Google has been using btrfs as part of Crostini, which I mentioned upthread, as the file system to support native Linux apps on Chrome OS. It would appear they're choosing different things for different purposes to solve specific problems.
And in Fedora we think users want to improve the life of their hardware, get better efficiency with reflinks and snapshots for containers, and improve the responsiveness of the desktop by including IO isolation as part of a better resource control solution, and not have corrupt data pass through to user space or to their backups, silently. Btrfs provides all of these things, and helps solve users' problems.
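As a small illustration of the reflink point (hedged: `cp --reflink=auto` clones shared extents on Btrfs and falls back to an ordinary copy on filesystems without reflink support, so this runs anywhere; the file names are arbitrary):

```shell
# Clone a file; on Btrfs the data blocks are shared copy-on-write,
# so the "copy" is instant and initially occupies no extra space.
cd "$(mktemp -d)"
head -c 1048576 /dev/zero > image.raw       # 1 MiB scratch file
cp --reflink=auto image.raw clone.raw
cmp -s image.raw clone.raw && echo "clone is identical"

# On Btrfs, whole subvolumes can be snapshotted the same cheap way
# (needs Btrfs and appropriate privileges; shown for illustration only):
#   btrfs subvolume snapshot /home /home-before-upgrade
```

Container runtimes exploit exactly this to create writable layers without duplicating image data.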
On 27 June 2020 03:21:32 CEST, Chris Murphy lists@colorremedies.com wrote:
On Fri, Jun 26, 2020 at 6:17 PM Peter Gordon peter@thecodergeek.com wrote:
Facebook since 2015. SUSE/openSUSE on the desktop and on servers since 2014, by default. Are you suggesting they can do it and we can't?
There's a difference between "can" and "should". I find this "<other guy> can do this are you less of a man than <other guy>" tiresome. When SLES made the switch, they only recommended it for system data, not production data, because it kept breaking and data loss is painful. That was still the case 3 years ago; if they have reconsidered, it has been done later than that. It's very clear from both the openSUSE and the Arch communities that btrfs has higher failure rates than ext4, and the rate of catastrophic failure is non-negligible. To push for btrfs is doing a disservice to the new users and the not yet competent.
On Sat, Jun 27, 2020 at 3:12 AM Markus Larsson qrsbrwn@uidzero.se wrote:
There's a difference between "can" and "should". I find this "<other guy> can do this are you less of a man than <other guy>" tiresome.
Yes, I also find it tiresome when people make grandiose claims of having facts on their side, and yet provide none, but inject hyperbole into the conversation instead.
When SLES made the switch, they only recommended it for system data, not production data, because it kept breaking and data loss is painful. That was still the case 3 years ago; if they have reconsidered, it has been done later than that. It's very clear from both the openSUSE and the Arch communities that btrfs has higher failure rates than ext4, and the rate of catastrophic failure is non-negligible.
Excellent! Provide the data. I'm looking forward to seeing this very clear data. You can provide it, today?
I'm not using Btrfs because Facebook does or SUSE does. I'm using it because I trust it, I value my data, I value the contents of my wallet (money), and I'm saving time overall in my myriad use of it for work and for testing Fedora. I've seen btrfs catch corruption other file systems aren't designed to, however rare these are at an individual level. Every day I benefit from compression, reflink copies and snapshots, however incremental that benefit. And yet, I mostly interact with it just like any other file system. It is an exceptionally ordinary experience most of the time.
To push for btrfs is doing a disservice to the new users and the not yet competent.
This is not at all persuasive.
On 27 June 2020 17:55:09 CEST, Chris Murphy lists@colorremedies.com wrote:
On Sat, Jun 27, 2020 at 3:12 AM Markus Larsson qrsbrwn@uidzero.se wrote:
There's a difference between "can" and "should". I find this "<other guy> can do this are you less of a man than <other guy>" tiresome.
Yes, I also find it tiresome when people make grandiose claims of having facts on their side, and yet provide none, but inject hyperbole into the conversation instead.
When SLES made the switch, they only recommended it for system data, not production data, because it kept breaking and data loss is painful. That was still the case 3 years ago; if they have reconsidered, it has been done later than that. It's very clear from both the openSUSE and the Arch communities that btrfs has higher failure rates than ext4, and the rate of catastrophic failure is non-negligible.
Excellent! Provide the data. I'm looking forward to seeing this very clear data. You can provide it, today?
The actual data I will never ever be able to share. I have ended my time at that particular company but even when I was there I was not permitted to share such data. Or did you mean data from openSUSE and Arch? Just have a look at their bug trackers. You can dismiss it as anecdotal, that's fine. You could also try to see why someone would get the view that I hold. I have no problem with Fedora supporting btrfs, I have a problem with having it as the default option. This is because my experience tells me that it isn't ready yet. Josef has a different view and that's good, even fine tbh. Disagreement is good, that's how mistakes are avoided.
That said, arguing doesn't do much good now; the decision looks like it has already been made.
On Sat, Jun 27, 2020 at 10:21 AM Markus Larsson qrsbrwn@uidzero.se wrote:
The actual data I will never ever be able to share. I have ended my time at that particular company but even when I was there I was not permitted to share such data. Or did you mean data from openSUSE and Arch?
Whatever data makes the claim "very clear."
Just have a look at their bug trackers. You can dismiss it as anecdotal, that's fine. You could also try to see why someone would get the view that I hold. I have no problem with Fedora supporting btrfs, I have a problem with having it as the default option. This is because my experience tells me that it isn't ready yet. Josef has a different view and that's good, even fine tbh. Disagreement is good, that's how mistakes are avoided.
I agree, which is why we need to be very clear about what you mean by failure. Intrinsic btrfs failures? Or btrfs being more sensitive to hardware failures?
And also your recommendation necessarily means choosing a shorter lifespan for more people's hardware. It means leaving other useful features we could take advantage of, off the table. There is a choice to be made, no matter what.
How do you weigh the value of extending the life of most people's hardware against the negative UX shift in the disaster recovery pattern? That is difficult to assess objectively, so I don't dispute a subjective component to this evaluation. But we have to be clear about all the parts being evaluated and not just focus on worry.
That said, arguing doesn't do much good now, the decision looks like it has already been made.
It is definitely not made.
On 27 June 2020 17:55:09 CEST, Chris Murphy <lists(a)colorremedies.com> wrote:
The actual data I will never ever be able to share. I have ended my time at that particular company but even when I was there I was not permitted to share such data. Or did you mean data from openSUSE and Arch? Just have a look at their bug trackers.
Our bugtracker (openSUSE bugzilla, that is) has been curiously silent about btrfs issues recently. Actually, ever since we switched from the btrfs + xfs setup to pure btrfs (with an improved subvolume layout), we have seen far fewer complaints about most of the issues users had previously.
LCP [Stasiek] https://lcp.world
On Sat, 2020-06-27 at 22:59 +0000, Stasiek Michalski wrote:
On 27 June 2020 17:55:09 CEST, Chris Murphy <lists(a)colorremedies.com> wrote:
The actual data I will never ever be able to share. I have ended my time at that particular company but even when I was there I was not permitted to share such data. Or did you mean data from openSUSE and Arch? Just have a look at their bug trackers.
Our bugtracker (openSUSE bugzilla that is) has been curiously silent about btrfs issues recently. Actually ever since we switched from btrfs + xfs setup to pure btrfs (with improved subvolume layout) we have seen way less complaints about most of the issues the users had previously.
(Hi LCP, I hope life is good.) That's great to hear, just a few questions. Lately, how would you rate that in number of years? While you seem to have pulled through, would you say the switch to btrfs as default has been painful? But to summarize, I'm mainly glad you have fewer issues with btrfs now.
/M
LCP [Stasiek] https://lcp.world _______________________________________________ devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Sat, 2020-06-27 at 22:59 +0000, Stasiek Michalski wrote:
(Hi LCP I hope life is good) That's great to hear, just a few questions. Lately, how would you rate that in number of years?
The change to partitioning occurred in November of 2018 iirc, so it's over 1.5 years
While you seem to have pulled through, would you say the switch to btrfs as default has been painful?
Oh yeah, it was a nightmare. If you plan on using ancient kernels, you will not have a particularly great time with certain btrfs features either. The initial switch happened in 2014, and at the time the lack of confidence showed itself through the choice of a secondary filesystem for /home. However, the reputation has been more problematic than the filesystem itself, I feel, since every single problem with the low-level stuff was attributed to btrfs for so many years now
But to summarize, I'm mainly glad you have fewer issues with btrfs now.
I'm happier the users have less issues
LCP [Stasiek] https://lcp.world
On Sat, Jun 27, 2020 at 6:47 PM Stasiek Michalski stasiek@michalski.cc wrote:
On Sat, 2020-06-27 at 22:59 +0000, Stasiek Michalski wrote:
(Hi LCP I hope life is good) That's great to hear, just a few questions. Lately, how would you rate that in number of years?
The change to partitioning occurred in November of 2018 iirc, so it's over 1.5 years
While you seem to have pulled through, would you say the switch to btrfs as default has been painful?
Oh yeah, it was a nightmare. If you plan on using ancient kernels, you will not have a particularly great time with certain btrfs features either. The initial switch happened in 2014, and at the time the lack of confidence showed itself through the choice of a secondary filesystem for /home. However, the reputation has been more problematic than the filesystem itself, I feel, since every single problem with the low-level stuff was attributed to btrfs for so many years now
I wonder to what degree some of the problems, especially enospc bugs, were exacerbated by a somewhat small root for btrfs combined with a fairly aggressive snapshotting regime by default? I agree with the "shoot the messenger" problem with btrfs. It's a victim of its own design: reports the facts, but doesn't assign blame.
I'm happier the users have less issues
Agreed. What do you think are the biggest remaining issues you have with btrfs? Or even not directly btrfs, but side effects that are still unresolved? Any desktop integration issues that stand out in particular?
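The ENOSPC confusion mentioned above comes from btrfs allocating the device into chunks: the whole device can be *allocated* while the chunks themselves still hold free space. A minimal sketch of inspecting that gap, using a condensed, hypothetical sample of `btrfs filesystem usage` output (real output has more lines; run the command itself, as root, on an actual btrfs system):

```shell
# Hypothetical, condensed sample of `btrfs filesystem usage /` output.
sample='Device size:      40.00GiB
Device allocated: 40.00GiB
Used:             31.50GiB
Free (estimated):  7.80GiB'

# "Device allocated" vs "Used" is the interesting pair: a fully allocated
# device whose chunks are only partially used is the classic pre-balance
# ENOSPC situation.
allocated=$(echo "$sample" | awk '/Device allocated:/ {print $3}')
used=$(echo "$sample" | awk '/^Used:/ {print $2}')
echo "allocated=$allocated used=$used"
# prints: allocated=40.00GiB used=31.50GiB

# `btrfs balance start -dusage=50 /` repacks data chunks that are at most
# 50% used, returning fully freed chunks to the allocator.
```

On small root filesystems with aggressive snapshotting, that allocated/used gap closes faster, which is why the combination was a plausible contributor to the early reports.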
On Sat, Jun 27, 2020 at 6:47 PM Stasiek Michalski <stasiek(a)michalski.cc> wrote:
I wonder to what degree some of the problems, especially enospc bugs, were exacerbated by a somewhat small root for btrfs combined with a fairly aggressive snapshotting regime by default? I agree with the "shoot the messenger" problem with btrfs. It's a victim of its own design: reports the facts, but doesn't assign blame.
Yeah, some mistakes were made when handling the root size, and some other issues came up with openQA when trying to fix it; Richard Brown had a fun couple of weeks with that stuff, but it was all worth the effort. We didn't change much about how aggressively everything is snapshotted, because in practice, since most desktop updates are done on live systems (obviously excluding ro filesystems with transactional/atomic updates), everything can go wrong both pre- and post-transaction, so every snapshot might be the one you need
Agreed. What do you think are the biggest remaining issues you have with btrfs? Or even not directly btrfs, but side effects that are still unresolved? Any desktop integration issues that stand out in particular?
There is no gui for basically anything btrfs related anywhere, since SUSE has had close to 0 interest in desktop for around 10 years. Since I heard there is nobody maintaining gnome-disk-utility, I might have some motivation to help out with it, since I am a huge fan of it, so we will see how much time I have over the coming weeks to implement things there. We wouldn't want it to die like banshee, would we?
LCP [Stasiek] https://lcp.world
On Sat, Jun 27, 2020 at 8:05 PM Stasiek Michalski stasiek@michalski.cc wrote:
Yeah, some mistakes were made when handling the root size, some other issues with openQA when trying to fix it, Richard Brown had fun couple of weeks with that stuff, but it was all worth the effort. We didn't change much with how aggressively everything is snapshotted, because in practice, since most desktop updates are done on live systems (obviously excluding ro filesystems with transactional/atomic updates), everything can go wrong, both pre and post the transaction, so every snapshot might be the one you need
Can you elaborate on the sorts of reasons you'd need the pre rolled back versus the post? I imagine one is more common to use as a rollback than the other.
Agreed. What do you think are the biggest remaining issues you have with btrfs? Or even not directly btrfs, but side effects that are still unresolved? Any desktop integration issues that stand out in particular?
There is no gui for basically anything btrfs related anywhere, since SUSE has had close to 0 interest in desktop for around 10 years. Since I heard there is nobody maintaining gnome-disk-utility, I might have some motivation to help out with it, since I am a huge fan of it, so we will see how much time I have over the coming weeks to implement things there. We wouldn't want it to die like banshee, would we?
That would be cool. There are some notes about this in the tracker for the proposal we're using, #153. In particular when I think of the layout (open)SUSE is using, I'd think you probably don't want to show all subvolumes in this interface, let alone subvolume snapshots (many of those on an (open)SUSE system!)
On Sat, Jun 27, 2020 at 8:05 PM Stasiek Michalski <stasiek(a)michalski.cc> wrote:
Can you elaborate on the sorts of reasons you'd need the pre rolled back versus the post? I imagine one is more common to use as a rollback than the other.
Post is usually used when something else goes wrong with the system, outside of use cases foreseen by the automated snapshots. So it depends when the issue happens, and which part of the system caused it. Obviously we can't expect the user to make snapshots before doing something potentially dangerous, so after the last update seems like a good restore point for a system
That would be cool. There are some notes about this in the tracker for the proposal we're using, #153. In particular when I think of the layout (open)SUSE is using, I'd think you probably don't want to show all subvolumes in this interface, let alone subvolume snapshots (many of those on an (open)SUSE system!)
Yup, I already had a look in places, to see what is needed, and what people expect from gdu and associated utilities
LCP [Stasiek] https://lcp.world
On Sun, Jun 28, 2020 at 2:05 am, Stasiek Michalski stasiek@michalski.cc wrote:
There is no gui for basically anything btrfs related anywhere, since SUSE has had close to 0 interest in desktop for around 10 years. Since I heard there is nobody maintaining gnome-disk-utility, I might have some motivation to help out with it, since I am a huge fan of it, so we will see how much time I have over the coming weeks to implement things there. We wouldn't want it to die like banshee, would we?
It's being maintained by Kai Lüke, but it certainly doesn't appear to be under active development. I'm sure he would appreciate help. :) Certainly, nobody has volunteered to work on btrfs support there. I know udisks2 has a btrfs API, though, which should help.
I'm strongly against this proposal. BTRFS is the most unstable file system I have ever seen. It can break even under ideal conditions and lead to complete data loss. There are lots of complaints and bug reports in the Linux kernel bugzilla and on Reddit.
Without providing evidence, this is just unsubstantiated FUD. As with any piece of software, btrfs may have bugs, but the only known issue with implications for data loss is the raid5/6 write hole, which is documented on btrfs' gotchas [1] and status [2] pages. However, there are many other good reasons why raid5/6 configurations should be avoided, with any filesystem. For more detailed explanations see [3] and [4]. Even though these articles are written for ZFS, the drawbacks of raid5/6 apply equally to other filesystems.
Also, to note, many of the early "issues" and "bug" reports around btrfs were due to user-space utilities such as snapper. I ran into some of these issues myself (specifically an issue with metadata on openSUSE around 2012), which made me very sceptical of btrfs for several years. However, I recently did my research on modern filesystems when setting up a home NAS and came to the conclusion that ZFS and btrfs are the best filesystems currently available. I subsequently opted for ZFS due to the excellent community support and user-space utilities, and now use ZFS not only for the NAS but also on my Fedora laptop.
I personally like the articles on modern filesystems by Jim Salter; the one from 2014 in particular discusses the advantages of ZFS and btrfs [5]. (That article was an excellent entry point to the topic for me, and is especially well suited for people otherwise not really familiar with it.)
Armin
[1] https://btrfs.wiki.kernel.org/index.php/Gotchas
[2] https://btrfs.wiki.kernel.org/index.php/Status
[3] https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/
[4] http://nex7.blogspot.com/2013/03/readme1st.html
[5] https://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cow...
Wow! Is it 2010 already? Time flies! :)
In seriousness: thanks for all of the effort put into this change proposal, and the impressive list of change owners. I'm following the discussion here with much interest!
I couldn't believe it either when I saw the proposal, so 2010-ish :)
Anyway, I'm in great favour of this proposal and I'd love to see btrfs as the default. I personally use it on all of my systems (desktops, laptops and workstations) except for servers, where it lacks reliability in some raid configurations I use (there I use ZFS instead, which also supports native encryption). I had my share of issues in the early days, but it has proven extremely reliable lately. My biggest complaint nowadays is this: https://lwn.net/Articles/674865/ Not being able to mount partitions from my Raptor Talos Power 9 on my x86 systems annoys me, but I guess it shouldn't bother many people. On the other hand, I have had lots of hardware issues on my Raptor machine, and my hourly btrfs snapshots have already saved the day multiple times (most recently while upgrading from Fedora 31 to 32). I would love to see it as the default, possibly with full GRUB rollback integration.
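The snapshot workflow described above is cheap to adopt by hand. A minimal sketch, assuming the layout from this proposal (a "root" subvolume mounted at /); the `/.snapshots` location and the `DRYRUN` guard are illustrative choices, not anything the proposal mandates:

```shell
# DRYRUN=1 makes the sketch runnable anywhere; on a real btrfs system,
# drop it and run as root.
DRYRUN=1
snap() {
    if [ "${DRYRUN:-0}" = 1 ]; then
        echo "would run: btrfs subvolume snapshot -r $1 $2"
    else
        btrfs subvolume snapshot -r "$1" "$2"
    fi
}

# Read-only (-r) snapshots are instant and share all unmodified extents
# with the live subvolume, so keeping one around before a release
# upgrade costs almost nothing.
snap / "/.snapshots/root-pre-upgrade-$(date +%Y%m%d)"
```

If the upgrade goes wrong, the read-only snapshot can be browsed directly or snapshotted back into a writable subvolume for rollback.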
On 6/26/20 11:14 AM, niccolo.belli@linuxsystems.it wrote:
I couldn't believe it either when I saw the proposal, so 2010-ish :)
Anyway I'm in great favour of this proposal and I'd love to see btrfs the default.
Glad to hear!
My biggest complain nowadays is this: https://lwn.net/Articles/674865/ Not being able to mount partitions of my Raptor Talos Power 9 into my x86 systems annoys me, but I guess it shouldn't bother many people. On the other side I had lots of hardware issues on my Raptor machine and my btrfs hourly snapshots already saved my day multiple times (latest one was while upgrading from Fedora 31 to 32).
Sadly that's the present situation, but I think it's being worked on. As desktop ARM hopefully gains in popularity (between the Raspberry Pi 4 being almost there and Apple about to ship ARM-based Macs, the next couple of years will be interesting), I could imagine the need to mount partitions created on Intel on an ARM machine, and vice versa, becoming more important.
From what Josef told me, once the kernel's btrfs driver supports this, existing filesystems would mount fine cross-platform.
Regards,
On Fri, Jun 26, 2020 at 11:00 AM Matthew Miller mattdm@fedoraproject.org wrote:
Wow! Is it 2010 already? Time flies! :)
In seriousness: thanks for all of the effort put into this change proposal, and the impressive list of change owners. I'm following the discussion here with much interest!
For the record, as this directly affects the Workstation deliverable, I will be voting -1 until and unless the Workstation WG votes in favor.
Yes, it's a large set of Change owners, but since only two of them are Workstation WG members, they are not representative of that group.
On Tue, Jun 30, 2020 at 10:26 am, Stephen Gallagher sgallagh@redhat.com wrote:
For the record, as this directly affects the Workstation deliverable, I will be voting -1 until and unless the Workstation WG votes in favor.
Yes, it's a large set of Change owners, but since only two of them are Workstation WG members, they are not representative of that group.
Workstation WG hat on:
I don't think there's any need to vote -1 for that reason alone. The Workstation WG has discussed the change proposal at several meetings recently (really, we've spent a long time on this), and frankly we were not making a ton of progress towards reaching a decision either way, so going forward with the change proposal and moving the discussion to devel@ to get feedback from a wider audience and from FESCo seemed like a good idea. Most likely, we'll wind up doing whatever FESCo chooses here, but unless FESCo were to explicitly indicate intent to override the Workstation WG, we would not consider a FESCo decision to limit what the Workstation WG can do with the Workstation product. At least, my understanding of the power structure FESCo has established is that the WG can make product-specific decisions that differ from FESCo's decisions whenever we want, unless FESCo says otherwise (because FESCo always has final say). That is, if FESCo were to approve btrfs by default, but Workstation WG were to vote to stick with ext4, then we would stick with ext4 unless FESCo were to say "no really, you need to switch to btrfs" (which I highly doubt would happen). So I don't see any reason to vote -1 here out of concern for overriding the WG.
Michael
On Tue, 30 Jun 2020 at 11:09, Michael Catanzaro mcatanzaro@gnome.org wrote:
On Tue, Jun 30, 2020 at 10:26 am, Stephen Gallagher sgallagh@redhat.com wrote:
For the record, as this directly affects the Workstation deliverable, I will be voting -1 until and unless the Workstation WG votes in favor.
Yes, it's a large set of Change owners, but since only two of them are Workstation WG members, they are not representative of that group.
Workstation WG hat on:
I don't think there's any need to vote -1 for that reason alone. The Workstation WG has discussed the change proposal at several meetings recently (really, we've spent a long time on this), and frankly we were not making a ton of progress towards reaching a decision either way, so going forward with the change proposal and moving the discussion to devel@ to get feedback from a wider audience and from FESCo seemed like a good idea. Most likely, we'll wind up doing whatever FESCo chooses here, but unless FESCo were to explicitly indicate intent to override the Workstation WG, we would not consider a FESCo decision to limit what the Workstation WG can do with the Workstation product. At least, my understanding of the power structure FESCo has established is that the WG can make product-specific decisions that differ from FESCo's decisions whenever we want, unless FESCo says otherwise (because FESCo always has final say). That is, if FESCo were to approve btrfs by default, but Workstation WG were to vote to stick with ext4, then we would stick with ext4 unless FESCo were to say "no really, you need to switch to btfs" (which I highly doubt would happen). So I don't see any reason to vote -1 here out of concern for overriding the WG.
The problem is that the request as discussed reads as "FESCo says use it for workstation" vs "FESCo has no problem with Workstation saying they want btrfs" or "FESCo says use btrfs as default". Yes it says "desktop variants" but only 1 variant really counts and that is Workstation. So yes, either Workstation agrees to it or it isn't getting voted on. If Workstation can't come to an agreement on it, then the proposal is dead. Anything else is an end-run and a useless trolling of people to see how many rants LWN counts in its weekly messages.
On Tue, Jun 30, 2020 at 11:22 AM Stephen John Smoogen smooge@gmail.com wrote:
The problem is that the request as discussed reads as "FESCo says use it for workstation" vs "FESCo has no problem with Workstation saying they want btrfs" or "FESCo says use btrfs as default". Yes it says "desktop variants" but only 1 variant really counts and that is Workstation. So yes, either Workstation agrees to it or it isn't getting voted on. If Workstation can't come to an agreement on it, then the proposal is dead.
Right, this is basically what I was trying to say here. I think it's all well and good that the proposal has plenty of support, but the fact of the matter is that the Workstation WG is the set of people who will be stuck with maintaining it long-term, so I'd prefer that they at least get to say "Sure, let's do it", or "No way in Hell can we handle that".
On Tuesday, June 30, 2020 8:22:00 AM MST Stephen John Smoogen wrote:
On Tue, 30 Jun 2020 at 11:09, Michael Catanzaro mcatanzaro@gnome.org wrote:
On Tue, Jun 30, 2020 at 10:26 am, Stephen Gallagher sgallagh@redhat.com wrote:
For the record, as this directly affects the Workstation deliverable, I will be voting -1 until and unless the Workstation WG votes in favor.
Yes, it's a large set of Change owners, but since only two of them are Workstation WG members, they are not representative of that group.
Workstation WG hat on:
I don't think there's any need to vote -1 for that reason alone. The Workstation WG has discussed the change proposal at several meetings recently (really, we've spent a long time on this), and frankly we were not making a ton of progress towards reaching a decision either way, so going forward with the change proposal and moving the discussion to devel@ to get feedback from a wider audience and from FESCo seemed like a good idea. Most likely, we'll wind up doing whatever FESCo chooses here, but unless FESCo were to explicitly indicate intent to override the Workstation WG, we would not consider a FESCo decision to limit what the Workstation WG can do with the Workstation product. At least, my understanding of the power structure FESCo has established is that the WG can make product-specific decisions that differ from FESCo's decisions whenever we want, unless FESCo says otherwise (because FESCo always has final say). That is, if FESCo were to approve btrfs by default, but Workstation WG were to vote to stick with ext4, then we would stick with ext4 unless FESCo were to say "no really, you need to switch to btfs" (which I highly doubt would happen). So I don't see any reason to vote -1 here out of concern for overriding the WG.
The problem is that the request as discussed reads as "FESCo says use it for workstation" vs "FESCo has no problem with Workstation saying they want btrfs" or "FESCo says use btrfs as default". Yes it says "desktop variants" but only 1 variant really counts and that is Workstation. So yes, either Workstation agrees to it or it isn't getting voted on. If Workstation can't come to an agreement on it, then the proposal is dead. Anything else is an end-run and a useless trolling of people to see how many rants LWN counts in its weekly messages.
Well, it's not only Workstation that this proposal is trying to throw btrfs on, but the other desktops as well, such as KDE Spin.
On Tue, Jun 30, 2020 at 2:39 PM John M. Harris Jr johnmh@splentity.com wrote:
On Tuesday, June 30, 2020 8:22:00 AM MST Stephen John Smoogen wrote:
On Tue, 30 Jun 2020 at 11:09, Michael Catanzaro mcatanzaro@gnome.org wrote:
On Tue, Jun 30, 2020 at 10:26 am, Stephen Gallagher sgallagh@redhat.com wrote:
For the record, as this directly affects the Workstation deliverable, I will be voting -1 until and unless the Workstation WG votes in favor.
Yes, it's a large set of Change owners, but since only two of them are Workstation WG members, they are not representative of that group.
Workstation WG hat on:
I don't think there's any need to vote -1 for that reason alone. The Workstation WG has discussed the change proposal at several meetings recently (really, we've spent a long time on this), and frankly we were not making a ton of progress towards reaching a decision either way, so going forward with the change proposal and moving the discussion to devel@ to get feedback from a wider audience and from FESCo seemed like a good idea. Most likely, we'll wind up doing whatever FESCo chooses here, but unless FESCo were to explicitly indicate intent to override the Workstation WG, we would not consider a FESCo decision to limit what the Workstation WG can do with the Workstation product. At least, my understanding of the power structure FESCo has established is that the WG can make product-specific decisions that differ from FESCo's decisions whenever we want, unless FESCo says otherwise (because FESCo always has final say). That is, if FESCo were to approve btrfs by default, but Workstation WG were to vote to stick with ext4, then we would stick with ext4 unless FESCo were to say "no really, you need to switch to btfs" (which I highly doubt would happen). So I don't see any reason to vote -1 here out of concern for overriding the WG.
The problem is that the request as discussed reads as "FESCo says use it for workstation" vs "FESCo has no problem with Workstation saying they want btrfs" or "FESCo says use btrfs as default". Yes it says "desktop variants" but only 1 variant really counts and that is Workstation. So yes, either Workstation agrees to it or it isn't getting voted on. If Workstation can't come to an agreement on it, then the proposal is dead. Anything else is an end-run and a useless trolling of people to see how many rants LWN counts in its weekly messages.
Well, it's not only Workstation that this proposal is trying to throw btrfs on, but the other desktops as well, such as KDE Spin.
And I am driving this as a member of the KDE SIG, though I am a member of both groups. Both the Workstation WG and KDE SIG are responsible groups for this Change. Chris Murphy is the primary driver of this from the Workstation WG side, and I am from the KDE SIG side.
-- 真実はいつも一つ!/ Always, there's only one truth!
On Tue, Jun 30, 2020 at 1:39 PM John M. Harris Jr johnmh@splentity.com wrote:
On Tuesday, June 30, 2020 8:22:00 AM MST Stephen John Smoogen wrote:
On Tue, 30 Jun 2020 at 11:09, Michael Catanzaro mcatanzaro@gnome.org wrote:
On Tue, Jun 30, 2020 at 10:26 am, Stephen Gallagher sgallagh@redhat.com wrote:
For the record, as this directly affects the Workstation deliverable, I will be voting -1 until and unless the Workstation WG votes in favor.
Yes, it's a large set of Change owners, but since only two of them are Workstation WG members, they are not representative of that group.
Workstation WG hat on:
I don't think there's any need to vote -1 for that reason alone. The Workstation WG has discussed the change proposal at several meetings recently (really, we've spent a long time on this), and frankly we were not making a ton of progress towards reaching a decision either way, so going forward with the change proposal and moving the discussion to devel@ to get feedback from a wider audience and from FESCo seemed like a good idea. Most likely, we'll wind up doing whatever FESCo chooses here, but unless FESCo were to explicitly indicate intent to override the Workstation WG, we would not consider a FESCo decision to limit what the Workstation WG can do with the Workstation product. At least, my understanding of the power structure FESCo has established is that the WG can make product-specific decisions that differ from FESCo's decisions whenever we want, unless FESCo says otherwise (because FESCo always has final say). That is, if FESCo were to approve btrfs by default, but Workstation WG were to vote to stick with ext4, then we would stick with ext4 unless FESCo were to say "no really, you need to switch to btfs" (which I highly doubt would happen). So I don't see any reason to vote -1 here out of concern for overriding the WG.
The problem is that the request as discussed reads as "FESCo says use it for workstation" vs "FESCo has no problem with Workstation saying they want btrfs" or "FESCo says use btrfs as default". Yes it says "desktop variants" but only 1 variant really counts and that is Workstation. So yes, either Workstation agrees to it or it isn't getting voted on. If Workstation can't come to an agreement on it, then the proposal is dead. Anything else is an end-run and a useless trolling of people to see how many rants LWN counts in its weekly messages.
Well, it's not only Workstation that this proposal is trying to throw btrfs on, but the other desktops as well, such as KDE Spin.
How is that even a thing? Shouldn't a spin maintainer be responsible for choosing the defaults of their spin? This proposal seems fairly absurd insofar as it dictates what other people should do.
On Tuesday, June 30, 2020, Justin Forbes jmforbes@linuxtx.org wrote:
On Tue, Jun 30, 2020 at 1:39 PM John M. Harris Jr johnmh@splentity.com wrote:
On Tuesday, June 30, 2020 8:22:00 AM MST Stephen John Smoogen wrote:
On Tue, 30 Jun 2020 at 11:09, Michael Catanzaro mcatanzaro@gnome.org wrote:
On Tue, Jun 30, 2020 at 10:26 am, Stephen Gallagher sgallagh@redhat.com wrote:
For the record, as this directly affects the Workstation deliverable, I will be voting -1 until and unless the Workstation WG votes in favor.
Yes, it's a large set of Change owners, but since only two of them are Workstation WG members, they are not representative of that group.
Workstation WG hat on:
I don't think there's any need to vote -1 for that reason alone. The Workstation WG has discussed the change proposal at several meetings recently (really, we've spent a long time on this), and frankly we were not making a ton of progress towards reaching a decision either way, so going forward with the change proposal and moving the discussion to devel@ to get feedback from a wider audience and from FESCo seemed like a good idea. Most likely, we'll wind up doing whatever FESCo chooses here, but unless FESCo were to explicitly indicate intent to override the Workstation WG, we would not consider a FESCo decision to limit what the Workstation WG can do with the Workstation product. At least, my understanding of the power structure FESCo has established is that the WG can make product-specific decisions that differ from FESCo's decisions whenever we want, unless FESCo says otherwise (because FESCo always has final say). That is, if FESCo were to approve btrfs by default, but Workstation WG were to vote to stick with ext4, then we would stick with ext4 unless FESCo were to say "no really, you need to switch to btrfs" (which I highly doubt would happen). So I don't see any reason to vote -1 here out of concern for overriding the WG.
The problem is that the request as discussed reads as "FESCo says use it for workstation" vs "FESCo has no problem with Workstation saying they want btrfs" or "FESCo says use btrfs as default". Yes it says "desktop variants" but only 1 variant really counts and that is Workstation. So yes, either Workstation agrees to it or it isn't getting voted on. If Workstation can't come to an agreement on it, then the proposal is dead. Anything else is an end-run and a useless trolling of people to see how many rants LWN counts in its weekly messages.
Well, it's not only Workstation that this proposal is trying to throw btrfs on, but the other desktops as well, such as KDE Spin.
How is that even a thing? Shouldn't a spin maintainer be responsible for choosing the defaults of their spin? This proposal seems fairly absurd in the regard of dictating what other people should do.
That argument can be used against any change not restricted to a specific spin. Treating all desktop based spins the same unless there is a reason not to makes sense.
On Tue, Jun 30, 2020 at 4:30 PM Justin Forbes jmforbes@linuxtx.org wrote:
On Tue, Jun 30, 2020 at 1:39 PM John M. Harris Jr johnmh@splentity.com wrote:
On Tuesday, June 30, 2020 8:22:00 AM MST Stephen John Smoogen wrote:
On Tue, 30 Jun 2020 at 11:09, Michael Catanzaro mcatanzaro@gnome.org wrote:
On Tue, Jun 30, 2020 at 10:26 am, Stephen Gallagher sgallagh@redhat.com wrote:
For the record, as this directly affects the Workstation deliverable, I will be voting -1 until and unless the Workstation WG votes in favor.
Yes, it's a large set of Change owners, but since only two of them are Workstation WG members, they are not representative of that group.
Workstation WG hat on:
I don't think there's any need to vote -1 for that reason alone. The Workstation WG has discussed the change proposal at several meetings recently (really, we've spent a long time on this), and frankly we were not making a ton of progress towards reaching a decision either way, so going forward with the change proposal and moving the discussion to devel@ to get feedback from a wider audience and from FESCo seemed like a good idea. Most likely, we'll wind up doing whatever FESCo chooses here, but unless FESCo were to explicitly indicate intent to override the Workstation WG, we would not consider a FESCo decision to limit what the Workstation WG can do with the Workstation product. At least, my understanding of the power structure FESCo has established is that the WG can make product-specific decisions that differ from FESCo's decisions whenever we want, unless FESCo says otherwise (because FESCo always has final say). That is, if FESCo were to approve btrfs by default, but Workstation WG were to vote to stick with ext4, then we would stick with ext4 unless FESCo were to say "no really, you need to switch to btrfs" (which I highly doubt would happen). So I don't see any reason to vote -1 here out of concern for overriding the WG.
For what it's worth, I asked spin owners from each one before adding them. That's why the change covers them all, they all assented to it. I am doing all the work for it, but I asked for their approval to be covered under this.
Please don't assume such absurd things like that, especially given the list of change owners and listed responsible entities.
-- 真実はいつも一つ!/ Always, there's only one truth!
On Tue, Jun 30, 2020 at 4:02 PM Neal Gompa ngompa13@gmail.com wrote:
I honestly hadn't considered it until it came up that the Workstation WG has not come to agreement on this change yet. Either way, it is my belief that the spins should be able to decide what they want to use, when they want to use it. If they have bought in, that's great. From a kernel standpoint, the only change being asked here is to make btrfs inline instead of a module. If it is to become the default fs for any spin, I don't have a problem with that.
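For reference, the kernel-side change Justin describes amounts to flipping a single Kconfig option in the Fedora kernel config from module to built-in. This is a sketch of the option in question, not the actual Fedora config change:

```
# Btrfs currently built as a loadable module:
CONFIG_BTRFS_FS=m

# Proposed: build btrfs into the kernel image, so a btrfs root
# can be mounted without first loading a filesystem module:
CONFIG_BTRFS_FS=y
```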
On Tue, Jun 30, 2020 at 5:19 PM Justin Forbes jmforbes@linuxtx.org wrote:
I submitted it because it was agreed to submit it[1]. I would have waited otherwise.
[1]: https://meetbot.fedoraproject.org/teams/workstation/workstation.2020-06-25-0...
On Tue, Jun 30, 2020 at 4:29 PM Neal Gompa ngompa13@gmail.com wrote:
So it seems the purpose of the proposal was to generate discussion (which it certainly has), but the Workstation WG has not decided what they really want yet. I do get wanting discussion about it. I do not get how it is a proper change request at this point. Seems very much like "We would like to propose a change that we may or may not do", and if the decision is ultimately to not do it, time was wasted.
On Tue, 30 Jun 2020 at 17:09, Neal Gompa ngompa13@gmail.com wrote:
The issue isn't that you haven't done your work. It is that it looks like you were set up to fail. The email from Michael comes across that Workstation couldn't make a decision and told you to go see if FESCO would approve it... but even then they don't have to follow through on it because they are independent. So all that work, all the tantrums from people who just love to fly off the handle on anything, all that bull.. is for essentially nothing. Because in the end, if FESCO does approve it, it means every spin etc is stuck with it while Workstation can decide not to... even though they sent you to get the decision. That is where if I was on FESCO I would say this proposal is dead. Either a Working Group wants something and will fight for it, or they don't. If they don't and have veto authority over anything FESCO says.. then it doesn't matter what FESCO decides.
On Tue, Jun 30, 2020 at 7:16 pm, Stephen John Smoogen smooge@gmail.com wrote:
At this point, we're discussing a weird corner case where FESCo approves this change proposal and then the WG does not. I guess it's my fault for suggesting that might occur, but it's really not a very likely scenario. Reality is that the WG members are not filesystem experts and after several weeks of discussing the issue, it became clear that we need more feedback from a larger group of developers. That's what the systemwide change proposal process is designed for.
And to be clear, FESCo has veto authority over the WG, not the other way around. The WG was actually created by FESCo itself. I think technically we're a subcommittee of FESCo. Of course we certainly expect that we can ship Fedora Workstation with different defaults than the rest of Fedora, to the extent FESCo continues to allow that.
Michael
On Tue, Jun 30, 2020 at 9:03 PM Michael Catanzaro mcatanzaro@gnome.org wrote:
I think there has been a good deal of miscommunication on all sides (starting with me).
What I was attempting to say in the first place was this: "It's not clear to me that this proposal has the blessing of the Workstation WG or Spins. I'm not willing to *assert* that they must do this work without hearing whether they are willing and have capacity to do so." I think I phrased this poorly initially.
What I would like is just to have a statement added to the Change Proposal that "Workstation WG and the maintainers of Spins Foo, Bar and Baz are willing to make this the default if this Change Proposal is accepted." I just didn't want anyone getting *dictated* at without their input.
On Wed, Jul 1, 2020 at 8:55 AM Stephen Gallagher sgallagh@redhat.com wrote:
To me, this sounds weird, because the implication of this Change being accepted is that we *would* do this. That's sort of the point of it. The owners of the spins are listed as change owners because I talked to all of them and they all accepted. I even have the pull request ready for Anaconda to make the change as soon as the change is accepted (I'm working on the other bits, kickstarts are complicated...).
I would say that this is redundant with the statement that "the default for new installs shall be btrfs" that is in the Change itself.
Nobody is being forced to do this in the manner I'm guessing you think.
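As a rough sketch of the kickstart side Neal mentions (the actual pull requests may well differ), the proposed default scheme can be requested with the long-standing btrfs support in kickstart:

```
# Automatic partitioning using the btrfs scheme
# (real pykickstart syntax; whether this is the exact mechanism
#  used for the new Fedora defaults is an assumption):
autopart --type=btrfs
```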
-- 真実はいつも一つ!/ Always, there's only one truth!
On Wed, Jul 1, 2020 at 7:54 AM Stephen Gallagher sgallagh@redhat.com wrote:
So why not word the proposal "The Workstation WG and maintainers of Spins Foo, Bar, and Baz are free to make btrfs the default file system if they so choose"?
Justin
On Tue, Jun 30, 2020 at 10:00 AM Michael Catanzaro mcatanzaro@gnome.org wrote:
As I said earlier when I brought up the kernel stance on this, I very much consider this a Workstation WG decision. If the Workstation WG can not come to a consensus, I don't think it should be on FESCo to force one. It is not like what is there now is broken. Going from EXT4 to BTRFS or any other file system is a series of trade offs. You gain some features, you lose some features. I would recommend that FESCo not take this up until the Workstation WG can come to a consensus.
Justin
On Fri, Jun 26, 2020 at 10:42:25AM -0400, Ben Cotton wrote:
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
So... can btrfs now be trusted to not crap itself?
The change is based on the installer's custom partitioning Btrfs preset. It's been well tested for 7 years.
What does "Well tested" mean, in this context? Do we have data that shows roughly how many installs were done in Fedora-land, and how long they lasted?
(two of the installs in that 7 year period were mine, and ended in complete filesystem loss across clean shutdown/restart cycles. Hardware is still in use, and other than a failed fan, hasn't so much as hiccupped since scrapping btrfs)
- Solomon
On 6/26/20 11:04 AM, Solomon Peachy wrote:
On Fri, Jun 26, 2020 at 10:42:25AM -0400, Ben Cotton wrote:
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
So... can btrfs now be trusted to not crap itself?
The change is based on the installer's custom partitioning Btrfs preset. It's been well tested for 7 years.
What does "Well tested" mean, in this context? Do we have data that shows roughly how many installs were done in Fedora-land, and how long they lasted?
Not Fedora land, but Facebook installs it on all of our root devices, so millions of machines. We've done this for 5 years. It's worked out very well. Thanks,
Josef
On Fri, Jun 26, 2020 at 11:13:39AM -0400, Josef Bacik wrote:
Not Fedora land, but Facebook installs it on all of our root devices, so millions of machines. We've done this for 5 years. It's worked out very well. Thanks,
Josef, I'd love to hear your comments on any differences between that situation and the typical laptop-user case for Fedora desktop systems. Anything we should consider?
On Fri, Jun 26, 2020 at 11:15:54AM -0400, Matthew Miller wrote:
On Fri, Jun 26, 2020 at 11:13:39AM -0400, Josef Bacik wrote:
Not Fedora land, but Facebook installs it on all of our root devices, so millions of machines. We've done this for 5 years. It's worked out very well. Thanks,
Josef, I'd love to hear your comments on any differences between that situation and the typical laptop-user case for Fedora desktop systems. Anything we should consider?
And, perhaps more crucially, what subset of btrfs features are in use.
(Plus perhaps the underlying hardware; I suspect the server-class hardware facebook uses is a grade above the typical desktop..)
- Solomon
On Fri, 26 Jun 2020 at 11:36, Solomon Peachy pizza@shaftnet.org wrote:
On Fri, Jun 26, 2020 at 11:15:54AM -0400, Matthew Miller wrote:
On Fri, Jun 26, 2020 at 11:13:39AM -0400, Josef Bacik wrote:
Not Fedora land, but Facebook installs it on all of our root devices, so millions of machines. We've done this for 5 years. It's worked out very well. Thanks,
Josef, I'd love to hear your comments on any differences between that situation and the typical laptop-user case for Fedora desktop systems. Anything we should consider?
And, perhaps more crucially, what subset of btrfs features are in use.
(Plus perhaps the underlying hardware; I suspect the server-class hardware facebook uses is a grade above the typical desktop..)
Actually the opposite. The Facebook hardware is built at scale from nearly the cheapest components it can be. That said, they have a different failure scale than personal hardware: they are OK if entire racks or areas in a DC blow themselves up, because the data is spread out. That is different from a laptop, where a failure means loss of everything since the last backup (aka never). So what is stored on the systems is a different use case, and how it is considered safe is different. That said, they have probably put in a lot more at-scale testing of the filesystem than Fedora could do.
[this is neither an endorsement nor hatred of the proposal.]
On 6/26/20 11:15 AM, Matthew Miller wrote:
On Fri, Jun 26, 2020 at 11:13:39AM -0400, Josef Bacik wrote:
Not Fedora land, but Facebook installs it on all of our root devices, so millions of machines. We've done this for 5 years. It's worked out very well. Thanks,
Josef, I'd love to hear your comments on any differences between that situation and the typical laptop-user case for Fedora desktop systems. Anything we should consider?
We buy worse hardware than a typical laptop user uses, at least for our hard drives. Also we hit our disks harder than most typical Fedora users. Consider the web tier, for example: we push the entire website to every box in the web tier (measured in hundreds of thousands of machines) probably 6-10 times a day. This is roughly 40 GiB of data, getting written to these truly terrible consumer-grade flash drives (along with some spinning rust), 6-10 times a day. In addition to the normal sort of logging, package updates, etc. that happen.
Also keep in mind we pay really close attention to burn rates for our drives, because obviously at our scale it translates to millions of dollars. Btrfs has improved our burn rates with the compression, as the write amplification goes drastically down, thus extending the life of the drives.
Obviously the Facebook scale, recoverability, and workload is going to be drastically different from a random Fedora user. But hardware wise we are pretty close, at least on the disk side. Thanks,
Josef
On Fri, Jun 26, 2020 at 12:30:35PM -0400, Josef Bacik wrote:
Obviously the Facebook scale, recoverability, and workload is going to be drastically different from a random Fedora user. But hardware wise we are pretty close, at least on the disk side. Thanks,
Thanks. I guess it's really recoverability I'm most concerned with. I expect that if one of these nodes has a metadata corruption that results in an unbootable system, that's really no big deal in the big scheme of things. It's a bigger deal to home users. :)
On 6/26/20 12:43 PM, Matthew Miller wrote:
On Fri, Jun 26, 2020 at 12:30:35PM -0400, Josef Bacik wrote:
Obviously the Facebook scale, recoverability, and workload is going to be drastically different from a random Fedora user. But hardware wise we are pretty close, at least on the disk side. Thanks,
Thanks. I guess it's really recoverability I'm most concerned with. I expect that if one of these nodes has a metadata corruption that results in an unbootable system, that's really no big deal in the big scheme of things. It's a bigger deal to home users. :)
Sure, I've answered this a few different times with various members of the working group committee (or whatever they're called nowadays). I'll copy and paste what I said to them. The context is "what do we do with bad drives that blow up at the wrong time".
Now as for what does the average Fedora user do? I've also addressed that a bunch over the last few weeks, but instead of pasting like 9 emails I'll just summarize.
The UX of a completely fucked fs sucks, regardless of the file system. Systemd currently does not handle booting with a read-only file system (though apparently it soon will), which is essentially what you get when you have critical metadata corrupted. You are dumped to an emergency shell, and then you have to know what to do from there.
With ext4/xfs, you mount read only or you run fsck. With Btrfs you can do that too, but then there's like a whole level of other options depending on how bad the disk is. I've written a lot of tools over the years (which are in btrfs-progs) to recover various levels of broken file systems. To the point that you can pretty drastically mess up a FS and I'll still be able to pull data from the disk.
But, again, the UX for this _sucks_. You have to know first of all that you should try mounting read only, and then you have to get something plugged into the box and copy it over. Then assume the worst: you can't mount read only. Now with ext4/xfs that's it, you are done. With btrfs you are just getting started. You have several built-in mount options for recovering from different failures, all read only. But you have to know that they are there and how to use them.
These things are easily addressed with documentation, but that's only so good. This sort of scenario needs to be baked into Fedora itself, because it's the same problem no matter which file system you use. Thanks,
Josef
Email elaborating my comments about btrfs's sensitivity to bad hardware and how we test.
---------------
The fact is I can make any file system unmountable with the right corruption. The only difference with btrfs is that our metadata is completely dynamic, while xfs and ext4 are less so. So they're overwriting the same blocks over and over again, and there is simply less "important" metadata the file system needs in order to function.
The "problem" that btrfs has is its main strength: it does COW. That means our important metadata is constantly being re-written to different segments of the disk. So if you have a bad disk, you are much more likely to get unlucky and end up with some core piece of metadata getting corrupted, and thus resulting in a file system that cannot be mounted read/write.
Now you are much more likely to hit this in a data segment, because generally speaking there are more data writes than metadata writes. The thing I brought up in the meeting last week was a potential downside for sure, but not something that will be a common occurrence. I just checked the fleet for this week: we've had to reprovision 20 of the 138 machines that threw crc errors, out of N total machines with btrfs fs'es, which is in the millions. In the same time period I have 15 xfs boxes that needed to be reprovisioned because of metadata corruption, out of <100k machines that have xfs. I don't have data on ext4 because it doesn't exist in our fleet anymore.
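For scale, a back-of-the-envelope comparison of those reprovision rates (the fleet sizes below are illustrative stand-ins; the email only says "in the millions" and "<100k"):

```python
# Assumed fleet sizes: illustrative, not Facebook data. The email only
# gives rough bounds ("in the millions" for btrfs, "<100k" for xfs).
BTRFS_FLEET = 1_000_000
XFS_FLEET = 100_000

btrfs_reprovisioned = 20  # from the email: 20 of 138 crc-error machines
xfs_reprovisioned = 15    # from the email: xfs metadata corruption boxes

btrfs_rate = btrfs_reprovisioned / BTRFS_FLEET  # 2e-05, i.e. 0.002% per week
xfs_rate = xfs_reprovisioned / XFS_FLEET        # 1.5e-04, i.e. 0.015% per week

print(f"btrfs: {btrfs_rate:.4%}/week, xfs: {xfs_rate:.4%}/week")
```

Even with these rough numbers, the per-machine reprovision rate for btrfs comes out an order of magnitude below xfs, which is the point the email is making.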
As for testing, there are 8 tests in xfstests that utilize my dm-log-writes target. These tests mount the file system, do a random workload, and then replay the workload one write at a time to validate the file system isn't left in some intermediate broken state. This simulates the case of weird things happening but in a much more concrete and repeatable manner.
There are 65 tests that utilize dm-flakey, which randomly corrupts or drops writes, and again these are to test different scenarios that have given us issues in the past. There are more of these because up until a few years ago this was our only mechanism for testing this class of failures. I wrote dm-log-writes to bring some determinism to our testing.
All of our file systems in Linux are extremely thoroughly tested for a variety of power-fail cases. The only area where btrfs is more likely to screw up is in the case of bad hardware, and again we're not talking huge percentage points of difference. It's a trade-off. You are trading a slightly increased chance that bad hardware will result in a file system that cannot be mounted read/write for the ability to detect silent corruption from your memory, cpu, or storage device. Thanks,
Josef
Le vendredi 26 juin 2020 à 12:30 -0400, Josef Bacik a écrit :
On 6/26/20 11:15 AM, Matthew Miller wrote:
On Fri, Jun 26, 2020 at 11:13:39AM -0400, Josef Bacik wrote:
Not Fedora land, but Facebook installs it on all of our root devices, so millions of machines. We've done this for 5 years. It's worked out very well. Thanks,
Josef, I'd love to hear your comments on any differences between that situation and the typical laptop-user case for Fedora desktop systems. Anything we should consider?
We buy worse hardware than a typical laptop user uses, at least for our hard drives.
The difference between an operation like Facebook and the Fedora user base is that Facebook will have a huge fleet of crap hardware, with the support teams to baby-sit the crap hardware, and attention to reducing the variety of crap hardware to limit the support-matrix breadth, while Fedora has to deal with a huge support-matrix breadth without the support teams and the support-team tooling to baby-sit hardware. (Besides, Facebook designs the levels of crappiness they allow in their hardware, meaning they know exactly where they are pushing limits to lower hardware costs.)
And, it’s not always the crap hardware that hits bugs. Sometimes expensive gamer hardware will fail first because its manufacturer has pushed the limits to eke out some performance points over the competition.
Therefore, using btrfs in Fedora is inherently more ambitious than using it at Facebook.
Regards,
On 6/27/20 2:57 AM, Nicolas Mailhot via devel wrote:
Le vendredi 26 juin 2020 à 12:30 -0400, Josef Bacik a écrit :
On 6/26/20 11:15 AM, Matthew Miller wrote:
On Fri, Jun 26, 2020 at 11:13:39AM -0400, Josef Bacik wrote:
Not Fedora land, but Facebook installs it on all of our root devices, so millions of machines. We've done this for 5 years. It's worked out very well. Thanks,
Josef, I'd love to hear your comments on any differences between that situation and the typical laptop-user case for Fedora desktop systems. Anything we should consider?
We buy worse hardware than a typical laptop user uses, at least for our hard drives.
The difference between an operation like Facebook and the Fedora user base is that Facebook will have a huge fleet of crap hardware, with the support teams to baby-sit the crap hardware, and attention to reducing the variety of crap hardware to limit the support-matrix breadth, while Fedora has to deal with a huge support-matrix breadth without the support teams and the support-team tooling to baby-sit hardware. (Besides, Facebook designs the levels of crappiness they allow in their hardware, meaning they know exactly where they are pushing limits to lower hardware costs.)
And, it’s not always the crap hardware that hits bugs. Sometimes expensive gamer hardware will fail first because its manufacturer has pushed the limits to eke out some performance points over the competition.
Therefore, using btrfs in Fedora is inherently more ambitious than using it at Facebook.
I've been very clear from the outset that Facebook's fault tolerance is much higher than the average Fedora user. The only reason I've agreed to assist in answering questions and support this proposal is because I have multi-year data that shows our failure rates are the same that we see on every other file system, which is basically the failure rate of the disks themselves.
And I specifically point out the hardware that we use that most closely reflects the drives that an average Fedora user is going to have. We of course have a very wide variety of hardware. In fact the very first thing we deployed on were these expensive hardware RAID setups. Btrfs found bugs in that firmware that was silently corrupting data. These corruptions had been corrupting AI test data for years under XFS, and Btrfs found it in a matter of days because of our checksumming.
We use all sorts of hardware, and have all sorts of similar stories like this. I agree that the hardware is going to be muuuuuch more varied with Fedora users, and that Facebook has muuuuch higher fault tolerance. But higher production failures inside FB means more engineering time spent dealing with those failures, which translates to lost productivity. If btrfs was causing us to run around fixing it all the time then we wouldn't deploy it. The fact is that it's not, it's perfectly stable from our perspective. Thanks,
Josef
I've been very clear from the outset that Facebook's fault tolerance is much higher than the average Fedora user. The only reason I've agreed to assist in answering questions and support this proposal is because I have multi-year data that shows our failure rates are the same that we see on every other file system, which is basically the failure rate of the disks themselves.
And I specifically point out the hardware that we use that most closely reflects the drives that an average Fedora user is going to have. We of course have a very wide variety of hardware. In fact the very first thing we deployed on were these expensive hardware RAID setups. Btrfs found bugs in that firmware that was silently corrupting data. These corruptions had been corrupting AI test data for years under XFS, and Btrfs found it in a matter of days because of our checksumming.
We use all sorts of hardware, and have all sorts of similar stories like this. I agree that the hardware is going to be muuuuuch more varied with Fedora users, and that Facebook has muuuuch higher fault tolerance. But higher production failures inside FB means more engineering time spent dealing with those failures, which translates to lost productivity. If btrfs was causing us to run around fixing it all the time then we wouldn't deploy it. The fact is that it's not, it's perfectly stable from our perspective. Thanks,
Thanks for the details. Do you have any data/information/opinions on non-x86 architectures such as aarch64/armv7/ppc64le, all of which have supported desktops too?
Peter
On 6/27/20 9:57 AM, Peter Robinson wrote:
I've been very clear from the outset that Facebook's fault tolerance is much higher than the average Fedora user. The only reason I've agreed to assist in answering questions and support this proposal is because I have multi-year data that shows our failure rates are the same that we see on every other file system, which is basically the failure rate of the disks themselves.
And I specifically point out the hardware that we use that most closely reflects the drives that an average Fedora user is going to have. We of course have a very wide variety of hardware. In fact the very first thing we deployed on were these expensive hardware RAID setups. Btrfs found bugs in that firmware that was silently corrupting data. These corruptions had been corrupting AI test data for years under XFS, and Btrfs found it in a matter of days because of our checksumming.
We use all sorts of hardware, and have all sorts of similar stories like this. I agree that the hardware is going to be muuuuuch more varied with Fedora users, and that Facebook has muuuuch higher fault tolerance. But higher production failures inside FB means more engineering time spent dealing with those failures, which translates to lost productivity. If btrfs was causing us to run around fixing it all the time then we wouldn't deploy it. The fact is that it's not, it's perfectly stable from our perspective. Thanks,
Thanks for the details, you have any data/information/opinions on non x86 architectures such as aarch64/armv7/ppc64le all of which have supported desktops too?
I can't speak to ppc* at all, and I'm not sure how much I can talk about our arm stuff, but it was tested and used in production on arm a few years ago. But obviously the bulk of our workload is x86. Thanks,
Josef
On Sat, Jun 27, 2020 at 7:58 AM Peter Robinson pbrobinson@gmail.com wrote:
I've been very clear from the outset that Facebook's fault tolerance is much higher than the average Fedora user. The only reason I've agreed to assist in answering questions and support this proposal is because I have multi-year data that shows our failure rates are the same that we see on every other file system, which is basically the failure rate of the disks themselves.
And I specifically point out the hardware that we use that most closely reflects the drives that an average Fedora user is going to have. We of course have a very wide variety of hardware. In fact the very first thing we deployed on were these expensive hardware RAID setups. Btrfs found bugs in that firmware that was silently corrupting data. These corruptions had been corrupting AI test data for years under XFS, and Btrfs found it in a matter of days because of our checksumming.
We use all sorts of hardware, and have all sorts of similar stories like this. I agree that the hardware is going to be muuuuuch more varied with Fedora users, and that Facebook has muuuuch higher fault tolerance. But higher production failures inside FB means more engineering time spent dealing with those failures, which translates to lost productivity. If btrfs was causing us to run around fixing it all the time then we wouldn't deploy it. The fact is that it's not, it's perfectly stable from our perspective. Thanks,
Thanks for the details, you have any data/information/opinions on non x86 architectures such as aarch64/armv7/ppc64le all of which have supported desktops too?
Sample size of 1: Raspberry Pi Zero running Arch for ~a year. I use the mount option -o compress=zstd:1. I haven't benchmarked it; it's a Pi Zero, so it's slow no matter what file system is used. But anecdotally I can't tell enough of a difference to even speculate.
This is a bit of an overly verbose mess, but the takeaway is that, at least for /usr, I'm saving about 41%. Space and writes.
$ sudo compsize /usr
Processed 48038 files, 28473 regular extents (28757 refs), 25825 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       59%         879M         1.4G         1.4G
none       100%         435M         435M         435M
lzo         54%         153M         281M         287M
zstd        37%         289M         767M         786M
$
I could instead selectively compress just certain directories or files, using an XATTR (there is a btrfs command for setting it). Compression can also be applied after the fact by defragmenting with a compression option.
I think the reduction in write amplification in this use case is significant because SD cards are just so impressively terrible. I have only ever seen them return garbage rather than the device itself admitting a read error (UNC read error), and btrfs will catch that. I seriously would only ever use btrfs for this. I might consider another file system if I were using industrial SD cards, but *shrug* in that case I'd probably spend a bit more time benchmarking things and seeing if I can squeak out a bit more performance from lzo or zstd:1 on reads due to a reduction in IO latency. Because SLC is going to be slower than TLC or anything else.
I don't know much about eMMC media, but if it's a permanent resident on the board, all the more reason I'd use btrfs and compress everything. I *might* even consider changing the compression level to something more aggressive for updates, because the performance limitation isn't the compression hit but rather the internet bandwidth. This is as simple as 'mount -o remount,compress=zstd:9 /' and then doing the update; upon reboot it's still zstd:1 or whatever is in the fstab/systemd mount unit. A future feature might be to add a level to the existing XATTR method of setting compression per dir or per file, so you could indicate things like "always use heavier compression" for specific dirs.
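As a concrete sketch of that workflow (the UUID below is a placeholder, and this fragment is an illustration rather than anything Fedora ships):

```
# /etc/fstab -- persistent default: light zstd compression on the root subvolume
UUID=xxxx-xxxx  /  btrfs  subvol=root,compress=zstd:1  0 0

# Before a large update, temporarily trade CPU for smaller writes:
#   mount -o remount,compress=zstd:9 /
#   dnf upgrade
# A reboot (or another remount) returns to the fstab value.
```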
On Fri, Jun 26, 2020 at 6:30 PM Josef Bacik josef@toxicpanda.com wrote:
On 6/26/20 11:15 AM, Matthew Miller wrote:
On Fri, Jun 26, 2020 at 11:13:39AM -0400, Josef Bacik wrote:
Not Fedora land, but Facebook installs it on all of our root devices, so millions of machines. We've done this for 5 years. It's worked out very well. Thanks,
Josef, I'd love to hear your comments on any differences between that situation and the typical laptop-user case for Fedora desktop systems. Anything we should consider?
We buy worse hardware than a typical laptop user uses, at least for our hard drives. Also we hit our disks harder than most typical Fedora users. Consider the web tier, for example: we push the entire website to every box in the web tier (measured in hundreds of thousands of machines) probably 6-10 times a day. This is roughly 40 GiB of data, getting written to these truly terrible consumer-grade flash drives (along with some spinning rust), 6-10 times a day. In addition to the normal sort of logging, package updates, etc. that happen.
Also keep in mind we pay really close attention to burn rates for our drives, because obviously at our scale it translates to millions of dollars. Btrfs has improved our burn rates with the compression, as the write amplification goes drastically down, thus extending the life of the drives.
Hi Josef,
Out of curiosity, do you also monitor SMART data for all your hard drives? If yes, have you seen any correlations between specific errors reported by btrfs and those picked up by SMART (not necessarily the fatal ones)? Any useful conclusions?
Best regards, A.
Why not zfs?
On 6/26/2020 10:42 AM, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/BtrfsByDefault
== Summary ==
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
== Owners ==
- Names: [[User:Chrismurphy|Chris Murphy]], [[User:Ngompa|Neal
Gompa]], [[User:Josef|Josef Bacik]], [[User:Salimma|Michel Alexandre Salim]], [[User:Dcavalca|Davide Cavalca]], [[User:eeickmeyer|Erich Eickmeyer]], [[User:ignatenkobrain|Igor Raits]], [[User:Raveit65|Wolfgang Ulbrich]], [[User:Zsun|Zamir SUN]], [[User:rdieter|Rex Dieter]], [[User:grinnz|Dan Book]], [[User:nonamedotc|Mukundan Ragavan]]
- Emails: chrismurphy@fedoraproject.org, ngompa13@gmail.com,
josef@toxicpanda.com, michel@michel-slm.name, dcavalca@fb.com, erich@ericheickmeyer.com, ignatenkobrain@fedoraproject.org, fedora@raveit.de, zsun@fedoraproject.org, rdieter@gmail.com, grinnz@gmail.com, nonamedotc@gmail.com
- Products: All desktop editions, spins, and labs
- Responsible WGs: Workstation Working Group, KDE Special Interest Group
== Detailed Description ==
Fedora desktop edition/spin variants will switch to using Btrfs as the filesystem by default for new installs. Labs derived from these variants inherit this change, and other editions may opt into this change.
The change is based on the installer's custom partitioning Btrfs preset. It's been well tested for 7 years.
'''''Current partitioning'''''<br /> <span style="color: tomato">vg/root</span> LV mounted at <span style="color: tomato">/</span> and a <span style="color: tomato">vg/home</span> LV mounted at <span style="color: tomato">/home</span>. These are separate file system volumes, with separate free/used space.
'''''Proposed partitioning'''''<br /> <span style="color: tomato">root</span> subvolume mounted at <span style="color: tomato">/</span> and <span style="color: tomato">home</span> subvolume mounted at <span style="color: tomato">/home</span>. Subvolumes don't have size, they act mostly like directories, space is shared.
'''''Unchanged'''''<br /> <span style="color: tomato">/boot</span> will be a small ext4 volume. A separate boot is needed to boot dm-crypt sysroot installations; it's less complicated to keep the layout the same, regardless of whether sysroot is encrypted. There will be no automatic snapshots/rollbacks.
If you choose to encrypt your data, LUKS (dm-crypt) will still be used as it is today (with the small difference that Btrfs is used instead of LVM+Ext4). There is upstream work on native encryption for Btrfs; it will be considered once ready and is the subject of a different change proposal in a future Fedora release.
=== Optimizations (Optional) ===
The detailed description above is the proposal. It's intended to be a minimalist and transparent switch. It's also the same as was [[Features/F16BtrfsDefaultFs|proposed]] (and [https://lwn.net/Articles/446925/ accepted]) for Fedora 16. The following optimizations improve on the proposal, but are not critical. They are also transparent to most users. The general idea is to agree on the base proposal first, and then consider these as enhancements.
==== Boot on Btrfs ====
- Instead of a 1G ext4 boot, create a 1G Btrfs boot.
- Advantage: Makes it possible to include /boot in a snapshot and rollback
regime. GRUB has had stable support for Btrfs for 10+ years.
- Scope: Contingent on bootloader and installer team review and
approval. blivet should use <code>mkfs.btrfs --mixed</code>.
==== Compression ====
- Enable transparent compression using zstd on select directories:
<span style="color: tomato">/usr</span>, <span style="color: tomato">/var/lib/flatpak</span>, and <span style="color: tomato">~/.local/share/flatpak</span>
- Advantage: Saves space and significantly increases the lifespan of
flash-based media by reducing write amplification. It may improve performance in some instances.
- Scope: Contingent on installer team review and approval to enhance
anaconda to perform the installation using <code>mount -o compress=zstd</code>, then set the proper XATTR for each directory. The XATTR can't be set until after the directories are created via rsync-, rpm-, or unsquashfs-based installation.
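The XATTR step can be sketched from Python. This is a hypothetical helper, not Anaconda code; the property name <code>btrfs.compression</code> is the same one that <code>btrfs property set</code> manipulates, and the call simply fails on filesystems that don't support it:

```python
import os

def set_btrfs_compression(path: str, algo: str = "zstd") -> bool:
    """Ask btrfs to compress future writes under `path` by setting the
    btrfs.compression xattr (the mechanism behind `btrfs property set`).
    Returns False instead of raising where the filesystem doesn't
    support it (non-btrfs mounts, insufficient privileges)."""
    try:
        os.setxattr(path, "btrfs.compression", algo.encode("ascii"))
        return True
    except OSError:
        return False
```

Note the xattr only affects data written after it is set, which is why the proposal has to apply it after the directories exist but ideally before their contents are unpacked.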
==== Additional subvolumes ====
- <span style="color: tomato">/var/log/</span>, <span style="color: tomato">/var/lib/libvirt/images</span>, and <span style="color: tomato">~/.local/share/gnome-boxes/images/</span> will use separate subvolumes.
- Advantage: Makes it easier to exclude them from snapshots,
rollbacks, and send/receive. (Btrfs snapshotting is not recursive; it stops at a nested subvolume.)
- Scope: Anaconda knows how to do this already; just change the
kickstart to add the additional subvolumes (minus the subvolume in <span style="color: tomato">~/</span>). GNOME Boxes will need enhancement to detect that the user home is on Btrfs and create <span style="color: tomato">~/.local/share/gnome-boxes/images/</span> as a subvolume.
== Feedback ==
==== Red Hat doesn't support Btrfs? Can Fedora do this? ====
Red Hat supports Fedora well, in many ways. But Fedora already works closely with, and depends on, upstreams. And this will be one of them. That's an important consideration for this proposal. The community has a stake in ensuring it is supported. Red Hat will never support Btrfs if Fedora rejects it. Fedora necessarily needs to be first, and make the persuasive case that it solves more problems than alternatives. Feature owners believe it does, hands down.
The Btrfs community has users that have been using it for most of the past decade at scale. It's been the default on openSUSE (and SUSE Linux Enterprise) since 2014, and Facebook has been using it for all their OS and data volumes, in their data centers, for almost as long. Btrfs is a mature, well-understood, and battle-tested file system, used on both desktop/container and server/cloud use-cases. We do have developers of the Btrfs filesystem maintaining and supporting the code in Fedora, one is a Change owner, so issues that are pinned to Btrfs can be addressed quickly.
==== What about device-mapper alternatives? ====
dm-thin (thin provisioning): [https://pagure.io/fedora-workstation/issue/152 Issue #152] still happens, because the installer won't over-provision by default. It still requires manual intervention by the user to identify and resolve the problem. Upon growing a file system on dm-thin, the pool is overcommitted, and file system sizes become a fantasy: they don't add up to the total physical storage available. The truth of used and free space is only known by the thin pool, and CLI and GUI programs are unprepared for this. Integration points like rpm free-space checks or GNOME disk-space warnings would have to be adapted as well.
dm-vdo: is not yet merged, and isn't as straightforward to selectively enable per directory and per file as is the case on Btrfs using <code>chattr +c</code> on <span style="color: tomato">/var/lib/flatpaks/</span>.
Btrfs solves the problems that need solving, with few side effects or pitfalls for users. It has more features we can take advantage of immediately and transparently: compression, integrity, and IO isolation. Many Btrfs features and optimizations can be opted into selectively per directory or file, such as compression and nodatacow, rather than as a layer that's either on or off.
==== What about UI/UX and integration in the desktop? ====
If Btrfs isn't the default file system, there's no commitment, nor reason to work on any UI/UX integration. There are ideas to make certain features discoverable: selective compression; systemd-homed may take advantage of either Btrfs online resize, or near-term planned native encryption, which could make it possible to live convert non-encrypted homes to encrypted; and system snapshot and rollbacks.
Anaconda already has sophisticated Btrfs integration.
==== What Btrfs features are recommended and supported? ====
The primary goal of this feature is to be largely transparent to the user. It does not require or expect users to learn new commands, or to engage in peculiar maintenance rituals.
The full set of Btrfs features that is considered stable and enabled by default upstream will be enabled in Fedora. Fedora is a community project. What is supported within Fedora depends on what the community decides to put forward in terms of resources.
See the upstream [https://btrfs.wiki.kernel.org/index.php/Status Btrfs feature status page].
==== Are subvolumes really mostly like directories? ====
Subvolumes behave like directories in terms of navigation in both the GUI and CLI, e.g. <code>cp</code>, <code>mv</code>, <code>du</code>, owner/permissions, and SELinux labels. They also share space, just like a directory.
But it is an incomplete answer.
A subvolume is an independent file tree, with its own POSIX namespace, and has its own pool of inodes. This means inode numbers repeat themselves on a Btrfs volume. Inodes are only unique within a given subvolume. A subvolume has its own st_dev, so if you use <code>stat FILE</code> it reports a device value referring to the subvolume the file is in. And it also means hard links can't be created between subvolumes. From this perspective, subvolumes start looking more like a separate file system. But subvolumes share most of the other trees, so they're not truly independent file systems. They're also not block devices.
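The st_dev behavior described above is easy to observe. A sketch, assuming an existing Btrfs filesystem mounted at a hypothetical <span style="color: tomato">/mnt</span> (requires root):

```shell
# Create a subvolume and a file on each side of the boundary.
btrfs subvolume create /mnt/sub
touch /mnt/top-file /mnt/sub/sub-file

# %d prints the st_dev value; the two files report different device
# numbers even though they live on one Btrfs volume.
stat --format='%d %n' /mnt/top-file /mnt/sub/sub-file

# A hard link across the subvolume boundary fails with EXDEV
# ("Invalid cross-device link").
ln /mnt/top-file /mnt/sub/link
```

Tools that honor device boundaries, such as <code>find -xdev</code> or rsync's <code>--one-file-system</code>, therefore stop at subvolume boundaries.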
== Benefit to Fedora ==
Problems Btrfs helps solve:
* Users running out of free space on either <span style="color: tomato">/</span> or <span style="color: tomato">/home</span> [https://pagure.io/fedora-workstation/issue/152 Workstation issue #152]
** "one big file system": no hard barriers like partitions or logical volumes
** transparent compression: significantly reduces write amplification, improves lifespan of storage hardware
** reflinks and snapshots are more efficient for use cases like containers (Podman supports both)
* Storage devices can be flaky, resulting in data corruption
** Everything is checksummed and verified on every read
** Corrupt data results in EIO (input/output error) instead of application confusion, and isn't replicated into backups and archives
* Poor desktop responsiveness when under pressure [https://pagure.io/fedora-workstation/issue/154 Workstation issue #154]
** Currently only Btrfs has proper IO isolation capability via cgroups2
** Completes the resource-control picture: memory, CPU, and IO isolation
* File system resize
** Online shrink and grow are fundamental to the design
* Complex storage setups are... complicated
** Simple and comprehensive command interface: one master command
** Simpler to boot: all code is in the kernel, no initramfs complexities
** Simple and efficient file system replication, including incremental backups, with <code>btrfs send</code> and <code>btrfs receive</code>
== Scope ==
* Proposal owners:
** Submit PRs for Anaconda to change <code>default_scheme = BTRFS</code> in the proper product files
** Multiple test days: build a community support network
** Aid with documentation
* Other developers:
** Anaconda: review and merge PRs
** Bootloader team: review and merge PRs
** Recommended optimization: set <code>chattr +C</code> on the containing directory for virt-manager and GNOME Boxes
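That <code>chattr +C</code> (nodatacow) recommendation can be sketched as follows. The directory path is hypothetical; the attribute must be set before image files are created, since it only takes effect for newly created files:

```shell
# Create the directory that will hold VM disk images, then mark it
# nodatacow; qcow2/raw images created inside it inherit +C, avoiding
# copy-on-write fragmentation under heavy random writes.
mkdir -p ~/vm-images
chattr +C ~/vm-images

# Verify the attribute on the directory.
lsattr -d ~/vm-images
```

Note that +C also disables checksumming and compression for those files, which is the usual trade-off for VM and database workloads.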
Release engineering: [https://pagure.io/releng/issue/9545 #9545]
Policies and guidelines: N/A
Trademark approval: N/A
== Upgrade/compatibility impact ==
This change will not affect upgrades.
Documentation will be provided for existing Btrfs users to "retrofit" their setups to that of a default Btrfs installation (base plus any approved options).
== How To Test ==
'''''Today'''''<br /> Do a custom partitioning installation; change the scheme drop-down menu to Btrfs; click the blue "automatically create partitions"; and install.<br /> Fedora 31, 32, Rawhide, on x86_64 and ARM.
'''''Once change lands'''''<br /> It should be simple enough to test, just do a normal install.
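For unattended testing, the same layout can be requested from a kickstart file. A minimal sketch, assuming a standard kickstart-driven install:

```
# Let the installer auto-partition using the Btrfs scheme.
autopart --type=btrfs
```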
== User Experience ==
==== Pros ====
* Mostly transparent
* Space savings from compression
* Longer lifespan of hardware, also from compression
* Utilities for used and free space, CLI and GUI, are expected to behave the same; no special commands are required
* More detailed information can be revealed by <code>btrfs</code>-specific commands
==== Enhancement opportunities ====
[https://bugzilla.redhat.com/show_bug.cgi?id=906591 updatedb does not index /home when /home is a bind mount] This can also affect rpm-ostree installations, including Silverblue.
[https://gitlab.gnome.org/GNOME/gnome-usage/-/issues/49 GNOME Usage: Incorrect numbers when using multiple btrfs subvolumes] This isn't Btrfs-specific; it happens with a "one big ext4" volume as well.
[https://gitlab.gnome.org/GNOME/gnome-boxes/-/issues/88 GNOME Boxes, RFE: create qcow2 with 'nocow' option when on btrfs /home] This is Btrfs-specific, and is a recommended optimization for both GNOME Boxes and virt-manager.
[https://github.com/containers/libpod/issues/6563 containers/libpod: automatically use btrfs driver if on btrfs]
== Dependencies ==
None.
== Contingency Plan ==
Contingency mechanism: Owner will revert changes back to LVM+ext4
Contingency deadline: Beta freeze
Blocks release? Yes
Blocks product? Workstation and KDE
== Documentation ==
Strictly speaking no documentation is required reading for users. But there will be some Fedora documentation to help get the ball rolling.
For those who want to know more:
[https://btrfs.wiki.kernel.org/index.php/Main_Page btrfs wiki main page and full feature list.]
<code>man 5 btrfs</code> contains: mount options, features, swapfile support, checksum algorithms, and more<br /> <code>man btrfs</code> contains an overview of the btrfs subcommands<br /> <code>man btrfs <nowiki><subcommand></nowiki></code> will show the man page for that subcommand
NOTE: The btrfs command will accept partial subcommands, as long as it's not ambiguous. These are equivalent commands:<br /> <code>btrfs subvolume snapshot</code><br /> <code>btrfs sub snap</code><br /> <code>btrfs su sn</code>
You'll discover your own convention. It might be preferable to write out the full command on forums and lists, but then maybe some folks don't learn about this useful shortcut?
For those who want to know a lot more:
[https://btrfs.wiki.kernel.org/index.php/Main_Page#Developer_documentation Btrfs developer documentation]<br /> [https://github.com/btrfs/btrfs-dev-docs/blob/master/trees.txt Btrfs trees]
== Release Notes ==
The default file system on the desktop is Btrfs.
On Fri, Jun 26, 2020 at 11:15:24AM -0400, Michael Watters wrote:
Why not zfs?
We cannot include ZFS in Fedora for legal reasons. Additionally, ZFS is not really intended for the laptop use case.
On Friday, June 26, 2020 8:22:49 AM MST Matthew Miller wrote:
On Fri, Jun 26, 2020 at 11:15:24AM -0400, Michael Watters wrote:
Why not zfs?
We cannot include ZFS in Fedora for legal reasons. Additionally, ZFS is not really intended for the laptop use case.
Has that actually been explored? How does Canonical get around the legal issues with OpenZFS' licensing?
On Sun, 28 Jun 2020 09:59:52 -0700, you wrote:
Has that actually been explored? How does Canonical get around the legal issues with OpenZFS' licensing?
For a start they aren't a US company, and unlike Red Hat they aren't the same tempting target for a lawsuit.
On Sunday, June 28, 2020 5:14:14 PM MST Gerald Henriksen wrote:
On Sun, 28 Jun 2020 09:59:52 -0700, you wrote:
Has that actually been explored? How does Canonical get around the legal issues with OpenZFS' licensing?
For a start they aren't a US company, and unlike Red Hat they aren't the same tempting target for a lawsuit.
I fail to see how being a US company or not would have much bearing on this. As for being a "tempting target", they're both big tech companies providing a Linux distro as their primary product, working on a support model.
On Sun, Jun 28, 2020 at 09:59:52AM -0700, John M. Harris Jr wrote:
We cannot include ZFS in Fedora for legal reasons. Additionally, ZFS is not really intended for the laptop use case.
Has that actually been explored? How does Canonical get around the legal issues with OpenZFS' licensing?
I can't really speculate on Canonical's legal stance and I encourage everyone else to also not.
I can point to Red Hat's, though: the knowledge base article here https://access.redhat.com/solutions/79633 says:
* ZFS is not included in the upstream Linux kernel due to licensing reasons.
* Red Hat applies the upstream first policy for kernel modules (including filesystems). Without upstream presence, kernel modules like ZFS cannot be supported by Red Hat.
and "due to licensing reasons" links to https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/ which is quite interesting and quite long. If you have just time to read one section, the two paragraphs at the end under "Do Not Rely On This Document As Legal Advice" seem like the _most_ interesting to me.
On Monday, June 29, 2020 9:26:09 AM MST Matthew Miller wrote:
On Sun, Jun 28, 2020 at 09:59:52AM -0700, John M. Harris Jr wrote:
We cannot include ZFS in Fedora for legal reasons. Additionally, ZFS is not really intended for the laptop use case.
Has that actually been explored? How does Canonical get around the legal issues with OpenZFS' licensing?
I can't really speculate on Canonical's legal stance and I encourage everyone else to also not.
I can point to Red Hat's, though: the knowledge base article here https://access.redhat.com/solutions/79633 says:
- ZFS is not included in the upstream Linux kernel due to licensing
reasons.
- Red Hat applies the upstream first policy for kernel modules (including filesystems). Without upstream presence, kernel modules like ZFS cannot
be supported by Red Hat.
and "due to licensing reasons" links to https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/ which is quite interesting and quite long. If you have just time to read one section, the two paragraphs at the end under "Do Not Rely On This Document As Legal Advice" seem like the _most_ interesting to me.
I've both read that page, and linked to it further down in this thread. Yes, I believe that Canonical's implementation is a GPL violation, but it doesn't need to be. So long as the source is in a separate package, and it's packaged as a kmod, it wouldn't be a GPL violation. It's worth considering, in my opinion, whether or not it'd be available for RHEL. It wouldn't be the first package RHEL doesn't have, but Fedora does. :)
On Monday, June 29, 2020, John M. Harris Jr johnmh@splentity.com wrote:
On Monday, June 29, 2020 9:26:09 AM MST Matthew Miller wrote:
On Sun, Jun 28, 2020 at 09:59:52AM -0700, John M. Harris Jr wrote:
We cannot include ZFS in Fedora for legal reasons. Additionally, ZFS
is
not really intended for the laptop use case.
Has that actually been explored? How does Canonical get around the
legal
issues with OpenZFS' licensing?
I can't really speculate on Canonical's legal stance and I encourage everyone else to also not.
I can point to Red Hat's, though: the knowledge base article here https://access.redhat.com/solutions/79633 says:
- ZFS is not included in the upstream Linux kernel due to licensing
reasons.
- Red Hat applies the upstream first policy for kernel modules (including filesystems). Without upstream presence, kernel modules like ZFS cannot
be supported by Red Hat.
and "due to licensing reasons" links to https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/ which is quite interesting and quite long. If you have just time to read one section,
the
two paragraphs at the end under "Do Not Rely On This Document As Legal Advice" seem like the _most_ interesting to me.
I've both read that page, and linked to it further down in this thread. Yes, I believe that Canonical's implementation is a GPL violation, but it doesn't need to be. So long as the source is in a separate package, and it's packaged as a kmod, it wouldn't be a GPL violation. It's worth considering, in my opinion, whether or not it'd be available for RHEL. It wouldn't be the first package RHEL doesn't have, but Fedora does. :)
That's not how the GPL work - using that argument you can link anything to a GPL only library as long as it is in a separate source tree (which is mostly the case).
It's either a derived work of the kernel and thus is bound by the GPL restrictions or it isn't. Does not matter in which tarball or rpm or $whatever it is in.
On Mon, Jun 29, 2020 at 10:20:17AM -0700, John M. Harris Jr wrote:
I've both read that page, and linked to it further down in this thread. Yes, I believe that Canonical's implementation is a GPL violation, but it doesn't need to be. So long as the source is in a separate package, and it's packaged as a kmod, it wouldn't be a GPL violation. It's worth considering, in my opinion, whether or not it'd be available for RHEL. It wouldn't be the first package RHEL doesn't have, but Fedora does. :)
The Conservancy page does address source distribution as well, and they (as well as Red Hat's lawyers) have a different conclusion.
It's not a GPL violation. OpenZFS works under Linux through a compatibility layer called SPL, the Solaris Porting Layer. SPL is licensed under GPL. Torvalds himself said that a non-GPL file system that was written for another OS cannot be considered a derivative of the Linux kernel: https://yarchive.net/comp/linux/gpl_modules.html
SPL is a derived work from the Linux kernel because it's designed for the Linux kernel. SPL is therefore under GPL. ZFS is designed for Solaris and therefore a different license is fine.
Dell, a friggin huge US company, wouldn't distribute Ubuntu with their laptops if they as the distributor did something illegal.
On Monday, June 29, 2020 3:40:57 PM MST Markus S. wrote:
It's not a GPL violation. OpenZFS works under Linux through a compatibility layer called SPL, the Solaris Porting Layer. SPL is licensed under GPL. Torvalds himself said that a non-GPL file system that was written for another OS cannot be considered a derivative of the Linux kernel: https://yarchive.net/comp/linux/gpl_modules.html
SPL is a derived work from the Linux kernel because it's designed for the Linux kernel. SPL is therefore under GPL. ZFS is designed for Solaris and therefore a different license is fine.
Dell, a friggin huge US company, wouldn't distribute Ubuntu with their laptops if they as the distributor did something illegal.
That's a good point, I didn't think about that. Additionally, having the context from Linus is very useful, thank you for that!
On Mon, 2020-06-29 at 12:26 -0400, Matthew Miller wrote:
On Sun, Jun 28, 2020 at 09:59:52AM -0700, John M. Harris Jr wrote:
We cannot include ZFS in Fedora for legal reasons. Additionally, ZFS is not really intended for the laptop use case.
Has that actually been explored? How does Canonical get around the legal issues with OpenZFS' licensing?
I can't really speculate on Canonical's legal stance and I encourage everyone else to also not.
I can point to Red Hat's, though: the knowledge base article here https://access.redhat.com/solutions/79633 says:
- ZFS is not included in the upstream Linux kernel due to licensing
reasons.
- Red Hat applies the upstream first policy for kernel modules
(including filesystems). Without upstream presence, kernel modules like ZFS cannot be supported by Red Hat.
This is not fully true to my knowledge. Red Hat ships VDO and that is not even sent to upstream (yet?).
and "due to licensing reasons" links to https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/ which is quite interesting and quite long. If you have just time to read one section, the two paragraphs at the end under "Do Not Rely On This Document As Legal Advice" seem like the _most_ interesting to me.
-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader _______________________________________________ devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
-- Igor Raits ignatenkobrain@fedoraproject.org
Hi,
On 29/06/2020 19:54, Igor Raits wrote:
On Mon, 2020-06-29 at 12:26 -0400, Matthew Miller wrote:
On Sun, Jun 28, 2020 at 09:59:52AM -0700, John M. Harris Jr wrote:
We cannot include ZFS in Fedora for legal reasons. Additionally, ZFS is not really intended for the laptop use case.
Has that actually been explored? How does Canonical get around the legal issues with OpenZFS' licensing?
I can't really speculate on Canonical's legal stance and I encourage everyone else to also not.
I can point to Red Hat's, though: the knowledge base article here https://access.redhat.com/solutions/79633 says:
- ZFS is not included in the upstream Linux kernel due to licensing
reasons.
- Red Hat applies the upstream first policy for kernel modules
(including filesystems). Without upstream presence, kernel modules like ZFS cannot be supported by Red Hat.
This is not fully true to my knowledge. Red Hat ships VDO and that is not even sent to upstream (yet?).
It has taken a bit longer than perhaps expected. However the intention is very much that it will go upstream,
Steve.
OpenZFS is frequently lagging behind in support for newer kernels which would work against Fedora's "rolling" approach to kernel releases.
Proxmox and Ubuntu don't feature rolling kernel releases. That's why they can ship OpenZFS (without legal problems, btw).
OpenZFS is frequently lagging behind in support for newer kernels which would work against Fedora's "rolling" approach to kernel releases.
Yes, there is quite often a time delay between kernel releases and OpenZFS releases that contain compatibility patches. However, in my experience, the OpenZFS developers are aware of this and act rather quickly. I believe that if a project like Fedora were to switch to ZFS, this would not be an issue at all - ZFS compatibility patches are usually available early on during the kernel development cycle, the delay is mostly due to the lack of testing and review.
Proxmox and Ubuntu don't feature rolling kernel releases. That's why they can ship OpenZFS (without legal problems, btw).
Would you care to elaborate why a rolling release kernel is not hit by any legal problems? I fail to see how that is relevant here, but then again, I am certainly not a lawyer and my understanding of the legal implications is rudimentary at best.
-Armin
On Fri, Jun 26, 2020 at 10:42:25AM -0400, Ben Cotton wrote:
==== Boot on Btrfs ====
- Instead of a 1G ext4 boot, create a 1G Btrfs boot.
- Advantage: Makes it possible to include in a snapshot and rollback
regime. GRUB has stable support for Btrfs for 10+ years.
GRUB2 btrfs support tends to lag a bit. Users would need to be careful not to enable some btrfs features before GRUB2 supports them.
- Scope: Contingent on bootloader and installer team review and
approval. blivet should use <code>mkfs.btrfs --mixed</code>.
When going with btrfs /boot, you can forgo the separate partition and just make a /boot subvolume in the main pool.
Advantage: fewer partitions.
Disadvantages: using encryption is harder. GRUB2 supports only LUKS1 encryption (AFAIK). Obviously, there is no plymouth integration, so the password would have to be entered at least twice. When not using encryption, the above is not a problem.
On 6/26/20 8:23 AM, Tomasz Torcz wrote:
On Fri, Jun 26, 2020 at 10:42:25AM -0400, Ben Cotton wrote:
==== Boot on Btrfs ====
...
When going with btrfs /boot, you can forego separate partition and just make a /boot subvolume in main pool.
Advantage: fewer partitions.
Disadvantages: using encryption is harder. GRUB2 supports only LUKS1 encryption (AFAIK). Obviously, there is not plymouth integration, so the password would have to be entered at least twice. When not using encryption above is not a problem.
Once there's native btrfs encryption, agreed, /boot should just be a separate subvolume where we make sure we don't turn on any unsupported features.
Regards,
On 6/26/20 5:23 PM, Tomasz Torcz wrote:
Disadvantages: using encryption is harder. GRUB2 supports only LUKS1 encryption (AFAIK). Obviously, there is not plymouth integration, so the password would have to be entered at least twice. When not using encryption above is not a problem.
There's support for LUKS2 already waiting for next GRUB2 upstream release, AFAICT: https://git.savannah.gnu.org/cgit/grub.git/commit/?id=365e0cc3e7e44151c14dd2...
Regards O.
On Fr, 26.06.20 10:42, Ben Cotton (bcotton@redhat.com) wrote:
If this is decided to be the way to go, please work with kernel maintainers to make btrfs.ko a built-in kernel module, so that initrd-less boots work... (it's kinda pointless anyway to have something as a module that is now gonna be used by most people, it just slows things down for little benefit)
Lennart
-- Lennart Poettering, Berlin
On Fri, 2020-06-26 at 17:30 +0200, Lennart Poettering wrote:
On Fr, 26.06.20 10:42, Ben Cotton (bcotton@redhat.com) wrote:
If this is decided to be the way to go, please work with kernel maintainers to make btrfs.ko a built-in kernel module, so that initrd-less boots work... (it's kinda pointless anyway to have something as module that is now gonna used by most people anyway, it just slows things down for little benefit)
Good point, we'll make sure to not forget about it.
Lennart
-- Lennart Poettering, Berlin
-- Igor Raits ignatenkobrain@fedoraproject.org
On Fri, Jun 26, 2020 at 10:30 AM Lennart Poettering mzerqung@0pointer.de wrote:
On Fr, 26.06.20 10:42, Ben Cotton (bcotton@redhat.com) wrote:
If this is decided to be the way to go, please work with kernel maintainers to make btrfs.ko a built-in kernel module, so that initrd-less boots work... (it's kinda pointless anyway to have something as module that is now gonna used by most people anyway, it just slows things down for little benefit)
That would make sense if this were decided. My big issue with this is we have no internal RH expertise on btrfs, and would be entirely dependent on the upstream community for support. There are instances of CVEs that get ignored for long periods of time, CVE-2019-19378 and CVE-2019-19448 being the current examples, with the latter being not a huge deal, but still an outstanding CVE. In general btrfs CVEs tend to stick around longer than XFS and ext4 before a fix is pushed upstream. The Fedora kernel supports btrfs, it has for quite some time, and that doesn't change regardless of the outcome of this proposal. I honestly cannot tell you what the stability would be like spread across the majority of Fedora users, because not being default, the typical btrfs user probably currently has a better understanding of what they are getting into. While the lack of internal RH expertise makes me lean against this proposal, I believe the Desktop SIG with FESCo should be able to make the decisions for defaults on the desktop spin.
On Fri, Jun 26, 2020 at 8:45 AM Ben Cotton bcotton@redhat.com wrote:
Related: Chromebooks are using btrfs in a particular way. ChromeOS has something called Crostini which is a set of technologies they use for enabling native Linux app support. This is LXC/LXD based, and uses a (I think per user) loop mounted file that is btrfs, and they leverage btrfs snapshotting for the containers.
Once upon a time, Ben Cotton bcotton@redhat.com said:
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
So... I freely admit I have not looked closely at btrfs in some time, so I could be out of date (and my apologies if so). One issue that I have seen mentioned as an issue within the last week is still the problem of running out of space when it still looks like there's space free. I didn't read the responses, so not sure of the resolution, but I remember that being a "thing" with btrfs. Is that still the case? What are the causes, and if so, how can we keep from getting a lot of the same question on mailing lists/forums/etc.?
I'm pretty neutral on this... I run a bunch of RHEL/CentOS systems, so I tend to stick close to that on my Fedora systems (so I'd probably stick with ext4/xfs on LVM myself). I remember when btrfs was going to be the one FS to rule them all, but then had issues, and specific weird cases (like with VM images IIRC at one point), and kind of fell off my map then. That is not intended as a criticism - filesystems are complex, and developing them is hard... I think some of the reputation came from some people pushing btrfs before it was really ready.
On Fri, Jun 26, 2020 at 1:31 PM Chris Adams linux@cmadams.net wrote:
Once upon a time, Ben Cotton bcotton@redhat.com said:
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
So... I freely admit I have not looked closely at btrfs in some time, so I could be out of date (and my apologies if so). One issue that I have seen mentioned as an issue within the last week is still the problem of running out of space when it still looks like there's space free. I didn't read the responses, so not sure of the resolution, but I remember that being a "thing" with btrfs. Is that still the case? What are the causes, and if so, how can we keep from getting a lot of the same question on mailing lists/forums/etc.?
Josef gave a fairly detailed answer upthread: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
However, I'll give some of my own color on this, as well. I have not personally experienced this issue on any of my systems in the past three years. I experienced it a couple of times when I first started out using it in 2014~2015, but it's not been a problem for me since.
We could stand to have some improved documentation here, and I hope this is something we can build up to support our user community. I'm sure there's some documentation from our friends at openSUSE that we can borrow as well.
I'm pretty neutral on this... I run a bunch of RHEL/CentOS systems, so I tend to stick close to that on my Fedora systems (so I'd probably stick with ext4/xfs on LVM myself). I remember when btrfs was going to be the one FS to rule them all, but then had issues, and specific weird cases (like with VM images IIRC at one point), and kind of fell of my map then. That is not intended as a criticism - filesystems are complex, and developing them hard... I think some of the reputation came from some people pushing btrfs before it was really ready.
I absolutely agree. I've often wondered if Btrfs would have a better reputation if it had been developed for a few years behind closed doors before being unveiled. I think the way people perceive the filesystem would be very different then.
Thankfully, I think today we're in a very good place with Btrfs upstream, and having Josef (an upstream Btrfs developer) helping drive this in Fedora makes me very confident in this change.
On 6/26/20 1:43 PM, Neal Gompa wrote:
One issue that I have seen mentioned as an issue within the last week is still the problem of running out of space when it still looks like there's space free. I didn't read the responses, so not sure of the resolution, but I remember that being a "thing" with btrfs. Is that still the case? What are the causes, and if so, how can we keep from getting a lot of the same question on mailing lists/forums/etc.?
Josef gave a fairly detailed answer upthread:
In this reply, he does not specifically address the disk-full issue, but I seem to remember that it was resolved. I couldn't however find a reference---could someone authoritatively say something one way or another?
On Fri, 26 Jun 2020 12:30:02 -0500 Chris Adams linux@cmadams.net wrote:
So... I freely admit I have not looked closely at btrfs in some time, so I could be out of date (and my apologies if so). One issue that I have seen mentioned as an issue within the last week is still the problem of running out of space when it still looks like there's space free. I didn't read the responses, so not sure of the resolution, but I remember that being a "thing" with btrfs. Is that still the case? What are the causes, and if so, how can we keep from getting a lot of the same question on mailing lists/forums/etc.?
Yes, it happened to me last week. The workstation has been upgraded since F25 and is now at F31. A yum update last week ran a restorecon -r / which filled up the filesystem and RAM and swap. The 460 GB filesystem had about 140GB of real data, 100 GB of data bloat from underfull blocks, and the rest (200GB) was metadata. I had to boot from a live USB and run btrfs balance to free up the bloat. I expect to reformat it to ext4 when the quarantine is over.
This is my last BTRFS filesystem. One was on a laptop hard disk that was painfully slow, especially when compared with its ext4 twin sitting next to it. It was reformatted to ext4. I also had a BTRFS RAID 0 hard disk array. It was also slow and also ended up needing rescue. I converted it over to xfs on MD raid and it's been faster and perfectly reliable ever since.
While I like subvolumes and snapshots, I find the maintenance, reliability, and performance overhead to be not worth it.
Not recommended.
Jim
On Fri, Jun 26, 2020 at 12:58 pm, James Szinger jszinger@gmail.com wrote:
Yes, it happened to me last week. The workstation has been upgraded since F25 and is now at F31. A yum update last week ran a restorecon -r / which filled up the filesystem and RAM and swap. The 460 GB filesystem had about 140GB of real data, 100 GB of data bloat from underfull blocks, and the rest (200GB) was metadata. I had to boot from a live USB and run btrfs balance to free up the bloat. I expect to reformat it to ext4 when the quarantine is over.
Could the proposal owners comment on this, please? Sounds really bad.
On 6/26/20 2:58 PM, James Szinger wrote:
On Fri, 26 Jun 2020 12:30:02 -0500 Chris Adams linux@cmadams.net wrote:
So... I freely admit I have not looked closely at btrfs in some time, so I could be out of date (and my apologies if so). One issue that I have seen mentioned as an issue within the last week is still the problem of running out of space when it still looks like there's space free. I didn't read the responses, so not sure of the resolution, but I remember that being a "thing" with btrfs. Is that still the case? What are the causes, and if so, how can we keep from getting a lot of the same question on mailing lists/forums/etc.?
Yes, it happened to me last week. The workstation has been upgraded since F25 and is now at F31. A yum update last week ran a restorecon -r / which filled up the filesystem and RAM and swap. The 460 GB filesystem had about 140GB of real data, 100 GB of data bloat from underfull blocks, and the rest (200GB) was metadata. I had to boot from a live USB and run btrfs balance to free up the bloat. I expect to reformat it to ext4 when the quarantine is over.
This is my last BTRFS filesystem. One was on a laptop hard disk that was painfully slow, especially when compared with its ext4 twin sitting next to it. It was reformatted to ext4. I also had a BTRFS RAID 0 hard disk array. It was also slow and also ended up needing rescue. I converted it over to xfs on MD raid and it's been faster and perfectly reliable ever since.
While I like subvolumes and snapshots, I find the maintenance, reliability, and performance overhead to be not worth it.
Not recommended.
Generally speaking btrfs performance has been the same if not better for our workloads. This is millions of boxes with thousands of different workloads and performance requirements.
That being said I can make btrfs look really stupid on some workloads. There's going to be cases where Btrfs isn't awesome. We still use xfs for all our storage related tiers (think databases). Performance is always going to be workload dependent, and Btrfs has built in overhead out the gate because of checksumming and the fact that we generate far more metadata.
As for your ENOSPC issue, I've made improvements on that area. I see this in production as well, I have monitoring in place to deal with the machine before it gets to this point. That being said if you run the box out of metadata space things get tricky to fix. I've been working my way down the list of issues in this area for years, this last go around of patches I sent were in these corner cases.
I described this case to the working group last week, because it hit us in production this winter. Somebody screwed up and suddenly pushed 2 extra copies of the whole website to everybody's VM. The website is mostly metadata, because of the inline extents, so it exhausted everybody's metadata space. Tens of thousands of machines affected. Of those machines I had to hand boot and run balance on ~20 of them to get them back. The rest could run balance from the automation and recover cleanly.
It's a shit user experience, and it's a shitty corner case that still needs work. It's a top priority of mine. Thanks,
Josef
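The recovery path mentioned above (booting live media and running a balance) is typically a *filtered* balance, which rewrites only mostly-empty chunks back into fewer, fuller ones. A hedged sketch; the usage thresholds are illustrative, not magic values:

```shell
# Inspect allocation first: how much space is allocated vs. actually used,
# broken out by data and metadata chunk type.
sudo btrfs filesystem usage /

# Reclaim data chunks that are less than 50% full; only those chunks are
# rewritten, so this is much cheaper than a full balance.
sudo btrfs balance start -dusage=50 /

# Metadata chunks can be compacted the same way if metadata is the problem.
sudo btrfs balance start -musage=50 /
```

Starting with a low threshold (e.g. `-dusage=10`) and raising it as needed keeps the amount of data rewritten small.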
On Fri, Jun 26, 2020 at 03:22:07PM -0400, Josef Bacik wrote:
I described this case to the working group last week, because it hit us in production this winter. Somebody screwed up and suddenly pushed 2 extra copies of the whole website to everybody's VM. The website is mostly metadata, because of the inline extents, so it exhausted everybody's metadata space. Tens of thousands of machines affected. Of those machines I had to hand boot and run balance on ~20 of them to get them back. The rest could run balance from the automation and recover cleanly.
Is there a way to mitigate this by reserving space or setting quotas? Users running out of space on their laptops because:
* they downloaded a lot of media
* they created huge vms
* some sort of horrible log thing gone awry
are pretty common in both a) my anecdotal experience helping people professionally and personally and b) um, me.
One question: are we looking at using a layout like openSUSE is doing ( https://en.opensuse.org/SDB:BTRFS ), utilizing subvolumes, or are we looking at something like
/boot/efi > EFI (FAT32)
/ > btrfs
On Fri, Jun 26, 2020 at 4:45 PM Matthew Miller mattdm@fedoraproject.org wrote:
On Fri, Jun 26, 2020 at 03:22:07PM -0400, Josef Bacik wrote:
I described this case to the working group last week, because it hit us in production this winter. Somebody screwed up and suddenly pushed 2 extra copies of the whole website to everybody's VM. The website is mostly metadata, because of the inline extents, so it exhausted everybody's metadata space. Tens of thousands of machines affected. Of those machines I had to hand boot and run balance on ~20 of them to get them back. The rest could run balance from the automation and recover cleanly.
Is there a way to mitigate this by reserving space or setting quotas? Users running out of space on their laptops because:
- they downloaded a lot of media
- they created huge vms
- some sort of horrible log thing gone awry
are pretty common in both a) my anecdotal experience helping people professionally and personally and b) um, me.
-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader _______________________________________________ devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Fri, Jun 26, 2020 at 6:11 PM Alex Thomas karlthane@gmail.com wrote:
One question: are we looking at using a layout like openSUSE is doing ( https://en.opensuse.org/SDB:BTRFS ), utilizing subvolumes, or are we looking at something like
/boot/efi > EFI (FAT32)
/ > btrfs
We are planning on using Fedora's current default layout, which has a subvolume for / and a subvolume for /home.
Ok, I thought I saw a proposal by you to change the default btrfs layout to something like openSUSE's using subvolumes, but now, of course, I cannot find it.
On Fri, Jun 26, 2020 at 5:25 PM Neal Gompa ngompa13@gmail.com wrote:
On Fri, Jun 26, 2020 at 6:11 PM Alex Thomas karlthane@gmail.com wrote:
One question: are we looking at using a layout like openSUSE is doing ( https://en.opensuse.org/SDB:BTRFS ), utilizing subvolumes, or are we looking at something like
/boot/efi > EFI (FAT32)
/ > btrfs
We are planning on using Fedora's current default layout, which has a subvolume for / and a subvolume for /home.
Ok, I thought I saw a proposal by you to change the default btrfs layout to something like openSUSE's using subvolumes, but now, of course, I cannot find it.
-- 真実はいつも一つ!/ Always, there's only one truth!
On Fri, Jun 26, 2020 at 6:37 PM Alex Thomas karlthane@gmail.com wrote:
On Fri, Jun 26, 2020 at 5:25 PM Neal Gompa ngompa13@gmail.com wrote:
On Fri, Jun 26, 2020 at 6:11 PM Alex Thomas karlthane@gmail.com wrote:
One question: are we looking at using a layout like openSUSE is doing ( https://en.opensuse.org/SDB:BTRFS ), utilizing subvolumes, or are we looking at something like
/boot/efi > EFI (FAT32)
/ > btrfs
We are planning on using Fedora's current default layout, which has a subvolume for / and a subvolume for /home.
Ok, I thought I saw a proposal by you to change the default btrfs layout to something like openSUSE's using subvolumes, but now, of course, I cannot find it.
I have, at various points in the past couple of years, considered different subvolume configurations. Right now, I'm keeping it simple to our currently tested configuration: /boot on ext4, / and /home as btrfs subvolumes on a single btrfs volume.
The only modification I may consider would be moving /boot to be btrfs volume or subvolume, but that's contingent on some discussion with the installer and bootloader teams.
However, the existing configuration works *very* well right now.
-- 真実はいつも一つ!/ Always, there's only one truth!
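For concreteness, the default layout Neal describes corresponds to an /etc/fstab roughly like this (the UUIDs are placeholders and the mount options are trimmed to the essentials):

```
# One btrfs volume; / and /home are subvolumes sharing its free space.
UUID=<fs-uuid>    /      btrfs  subvol=root  0 0
UUID=<fs-uuid>    /home  btrfs  subvol=home  0 0
# /boot stays ext4 in the currently tested configuration.
UUID=<boot-uuid>  /boot  ext4   defaults     1 2
```

Note that both btrfs lines carry the same filesystem UUID; only the `subvol=` option differs.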
On Fri, 26 Jun 2020 at 23:21, Alex Thomas karlthane@gmail.com wrote:
One question: are we looking at using a layout like openSUSE is doing ( https://en.opensuse.org/SDB:BTRFS ), utilizing subvolumes, or are we looking at something like
/boot/efi > EFI (FAT32)
/ > btrfs
BTW, about that layout: Anaconda still does not allow installing something like that, because it does not allow /boot on btrfs (technically there is no reason to demand that, and /boot could just be a subvolume on the root btrfs pool).
kloczek
On Fri, Jun 26, 2020 at 5:31 PM Tomasz Kłoczko kloczko.tomasz@gmail.com wrote:
On Fri, 26 Jun 2020 at 23:21, Alex Thomas karlthane@gmail.com wrote:
One question: are we looking at using a layout like openSUSE is doing ( https://en.opensuse.org/SDB:BTRFS ), utilizing subvolumes, or are we looking at something like
/boot/efi > EFI (FAT32)
/ > btrfs
BTW, about that layout: Anaconda still does not allow installing something like that, because it does not allow /boot on btrfs (technically there is no reason to demand that, and /boot could just be a subvolume on the root btrfs pool).
kloczek
Tomasz Kłoczko | LinkedIn: http://lnkd.in/FXPWxH
Something to think about when trying to set up for snapshot/rollbacks, then.
Le vendredi 26 juin 2020 à 23:28 +0100, Tomasz Kłoczko a écrit :
On Fri, 26 Jun 2020 at 23:21, Alex Thomas karlthane@gmail.com wrote:
One question: are we looking at using a layout like openSUSE is doing ( https://en.opensuse.org/SDB:BTRFS ), utilizing subvolumes, or are we looking at something like
/boot/efi > EFI (FAT32)
/ > btrfs
BTW, about that layout: Anaconda still does not allow installing something like that, because it does not allow /boot on btrfs (technically there is no reason to demand that, and /boot could just be a subvolume on the root btrfs pool).
Anaconda will detect you’re reusing an EFI partition, complain that it does not fit its requirements of the day, and force you to recreate it from scratch, blowing up the EFI parts installed by other systems for their own boot in the process.
Thus, Anaconda EFI support is terrible, period.
On 6/26/20 5:44 PM, Matthew Miller wrote:
On Fri, Jun 26, 2020 at 03:22:07PM -0400, Josef Bacik wrote:
I described this case to the working group last week, because it hit us in production this winter. Somebody screwed up and suddenly pushed 2 extra copies of the whole website to everybody's VM. The website is mostly metadata, because of the inline extents, so it exhausted everybody's metadata space. Tens of thousands of machines affected. Of those machines I had to hand boot and run balance on ~20 of them to get them back. The rest could run balance from the automation and recover cleanly.
Is there a way to mitigate this by reserving space or setting quotas? Users running out of space on their laptops because:
- they downloaded a lot of media
- they created huge vms
- some sort of horrible log thing gone awry
are pretty common in both a) my anecdotal experience helping people professionally and personally and b) um, me.
There's a difference between data ENOSPC and metadata ENOSPC. And again, this is a pretty specific failure case. Obviously it's not impossible to hit, but it's not something that's going to be a common occurrence. The two times we've hit these issues in production: the first was the thing that I mentioned, which had 750 GiB filesystems completely full with their 20 GiB of metadata completely filled up.
The second was a bad service that was spewing empty files onto the disk slowly filling up the metadata chunks, coupled with a bug in how we allocated data and metadata chunks. The chunk allocation thing has been fixed for a year or two now. This isn't something a normal user is going to hit most of the time. It obviously does happen, I'm aware of it, and I've made progress on making it less likely to get you into a "call Josef" situation. I'm sure there's still more work to be done, but there's continual progress on this particular edge case. Thanks,
Josef
On Fri, Jun 26, 2020 at 3:44 PM Matthew Miller mattdm@fedoraproject.org wrote:
On Fri, Jun 26, 2020 at 03:22:07PM -0400, Josef Bacik wrote:
I described this case to the working group last week, because it hit us in production this winter. Somebody screwed up and suddenly pushed 2 extra copies of the whole website to everybody's VM. The website is mostly metadata, because of the inline extents, so it exhausted everybody's metadata space. Tens of thousands of machines affected. Of those machines I had to hand boot and run balance on ~20 of them to get them back. The rest could run balance from the automation and recover cleanly.
Is there a way to mitigate this by reserving space or setting quotas? Users running out of space on their laptops because:
- they downloaded a lot of media
- they created huge vms
- some sort of horrible log thing gone awry
are pretty common in both a) my anecdotal experience helping people professionally and personally and b) um, me.
Real out of space can happen on any file system. Bogus ENOSPC on btrfs due to edge cases hitting bugs is less common than real ENOSPC due to the current partitioning arrangement creating competition between /home and / free space - which won't exist with btrfs. I expect a net reduction of out-of-space incidents as a result of the change.
There is a reserve in btrfs to help make sure if you do get to a real out of space condition, that the file system (a) stays read write and (b) can be backed out of the full condition by deleting files and successfully freeing up space. Edge cases where this doesn't work are bugs, and there are some non-obvious ways to back out of it if someone does hit one.
The old stories on #btrfs and linux-btrfs@ do include cases of a file system that goes read-only, can't be remounted read-write, and you have to backup->reformat->restore. And that is a PITA. But also not data loss.
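On the quota side of Matthew's question: btrfs can cap a subvolume with qgroups, though they are not enabled by default. A hedged sketch, with an illustrative limit:

```shell
# Enable quota (qgroup) tracking on the filesystem; this is a one-time step.
sudo btrfs quota enable /

# Cap the home subvolume at 100 GiB (the size is illustrative).
sudo btrfs qgroup limit 100G /home

# Show per-qgroup usage and limits.
sudo btrfs qgroup show -r /
```

Note that qgroups carry their own accounting overhead, so this is an opt-in mitigation rather than something the default install would turn on.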
* Josef Bacik:
As for your ENOSPC issue, I've made improvements on that area. I see this in production as well, I have monitoring in place to deal with the machine before it gets to this point. That being said if you run the box out of metadata space things get tricky to fix. I've been working my way down the list of issues in this area for years, this last go around of patches I sent were in these corner cases.
Is there anything we need to do in userspace to improve the behavior of fflush and similar interfaces?
This is not strictly a btrfs issue: Some of us are worried about scenarios where the write system call succeeds and the data never makes it to storage *without a catastrophic failure*. (I do not consider running out of disk space a catastrophic failure.) NFS apparently has this property, and you have to call fsync or close the descriptor to detect this. fsync is not desirable due to its performance impact.
Hi,
On 27/06/2020 11:00, Florian Weimer wrote:
- Josef Bacik:
As for your ENOSPC issue, I've made improvements on that area. I see this in production as well, I have monitoring in place to deal with the machine before it gets to this point. That being said if you run the box out of metadata space things get tricky to fix. I've been working my way down the list of issues in this area for years, this last go around of patches I sent were in these corner cases.
Is there anything we need to do in userspace to improve the behavior of fflush and similar interfaces?
This is not strictly a btrfs issue: Some of us are worried about scenarios where the write system call succeeds and the data never makes it to storage *without a catastrophic failure*. (I do not consider running out of disk space a catastrophic failure.) NFS apparently has this property, and you have to call fsync or close the descriptor to detect this. fsync is not desirable due to its performance impact.
It doesn't matter which filesystem you use, you can't be sure that the data is really safe on disk without calling fsync. In the case of a new inode, that means fsync on the file and on the containing directory.
There can be performance issues depending on how that is done, however there are a number of solutions to those issues which can reduce the performance effects to the point where they are usually no longer a problem. That is with the caveat that slow storage will always be slow, of course!
The usual tricks are to avoid doing lots of small fsyncs by gathering up smaller files, ideally sorting them into inode number order for local filesystems, and then issuing the fsyncs asynchronously, waiting for them all only once they have all been issued. fadvise/madvise can be useful in these situations too.
Steve.
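The pattern Steven describes (fsync the new file, then its containing directory) can be sketched in Python; `durable_write` is a hypothetical helper name, shown here as a minimal illustration:

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data so it survives a crash: fsync the file, then its directory."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # flush the file's data and metadata to stable storage
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomically replace any previous version
    # fsync the containing directory so the new directory entry is durable too
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

The write-to-temp-then-rename step is the common extra trick for crash consistency: a crash leaves either the old file or the new one, never a half-written mix.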
* Steven Whitehouse:
On 27/06/2020 11:00, Florian Weimer wrote:
- Josef Bacik:
As for your ENOSPC issue, I've made improvements on that area. I see this in production as well, I have monitoring in place to deal with the machine before it gets to this point. That being said if you run the box out of metadata space things get tricky to fix. I've been working my way down the list of issues in this area for years, this last go around of patches I sent were in these corner cases.
Is there anything we need to do in userspace to improve the behavior of fflush and similar interfaces?
This is not strictly a btrfs issue: Some of us are worried about scenarios where the write system call succeeds and the data never makes it to storage *without a catastrophic failure*. (I do not consider running out of disk space a catastrophic failure.) NFS apparently has this property, and you have to call fsync or close the descriptor to detect this. fsync is not desirable due to its performance impact.
It doesn't matter which filesystem you use, you can't be sure that the data is really safe on disk without calling fsync. In the case of a new inode, that means fsync on the file and on the containing directory.
In my opinion, there is a conceptual difference between the machine or storage crashing hard, and just running out of disk space.
There can be performance issues depending on how that is done, however there are a number of solutions to those issues which can reduce the performance effects to the point where they are usually no longer a problem. That is with the caveat that slow storage will always be slow, of course!
The usual tricks are to avoid doing lots of small fsyncs by gathering up smaller files, ideally sorting them into inode number order for local filesystems, and then issuing the fsyncs asynchronously, waiting for them all only once they have all been issued. fadvise/madvise can be useful in these situations too.
None of this applies to shell utilities such as grep and cat. They work around data loss as a result of the write system call not reporting ENOSPC errors: they close stdout and stderr underneath glibc, which leads to a different class of problems. It turns out that on Linux, close does more space checks than write, so this allows the shell utilities to check for ENOSPC without issuing fsyncs. At present, lack of space checks from write seems to primarily happen with NFS.
So let me rephrase: Does btrfs report ENOSPC during write? If it does not, what can we do to check for sufficient space during fflush and similar operations?
If we change the shell utilities to do an fsync on close, we get traditional UNIX behavior with traditional UNIX performance. I don't think that's what people want.
Thanks, Florian
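The close-time check Florian describes can be sketched in Python. This is a hedged illustration of the pattern, not the coreutils implementation; `copy_checked` is a hypothetical name:

```python
import os

def copy_checked(src: str, dst: str) -> None:
    """Copy src to dst, checking for errors both at write() and at close().

    Some filesystems (NFS notably) may defer ENOSPC past write(); like the
    shell utilities discussed above, treat a failing close() as a failed write.
    """
    out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        with open(src, "rb") as f:
            while chunk := f.read(65536):
                os.write(out, chunk)  # may or may not report ENOSPC here
    finally:
        try:
            os.close(out)  # deferred write errors can surface at close
        except OSError as e:
            raise OSError(e.errno, f"write to {dst} failed at close: {e.strerror}")
```

This catches errors that close() reports without paying the cost of an fsync per file; it does not, of course, guarantee durability the way fsync does.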
* Josef Bacik:
That being said I can make btrfs look really stupid on some workloads. There's going to be cases where Btrfs isn't awesome. We still use xfs for all our storage related tiers (think databases). Performance is always going to be workload dependent, and Btrfs has built in overhead out the gate because of checksumming and the fact that we generate far more metadata.
Just to be clear here, the choice of XFS here is purely based on performance, not on the reliability of the file systems, right? (So it's not “all the really important data is stored in XFS”.)
Thanks, Florian
On 6/29/20 5:33 AM, Florian Weimer wrote:
- Josef Bacik:
That being said I can make btrfs look really stupid on some workloads. There's going to be cases where Btrfs isn't awesome. We still use xfs for all our storage related tiers (think databases). Performance is always going to be workload dependent, and Btrfs has built in overhead out the gate because of checksumming and the fact that we generate far more metadata.
Just to be clear here, the choice of XFS here is purely based on performance, not on the reliability of the file systems, right? (So it's not “all the really important data is stored in XFS”.)
Yes that's correct. At our scale everything falls over, including XFS, and as I've stated elsewhere in this thread we actually see a higher rate of failure (relative to the install size) with XFS. The databases we use already do all of the fancy things that btrfs does in the application. If we could get away with it we'd just use raw disks for those applications, and in fact we may do that in the future. Thanks,
Josef
On 6/29/20 8:39 AM, Josef Bacik wrote:
On 6/29/20 5:33 AM, Florian Weimer wrote:
- Josef Bacik:
That being said I can make btrfs look really stupid on some workloads. There's going to be cases where Btrfs isn't awesome. We still use xfs for all our storage related tiers (think databases). Performance is always going to be workload dependent, and Btrfs has built in overhead out the gate because of checksumming and the fact that we generate far more metadata.
Just to be clear here, the choice of XFS here is purely based on performance, not on the reliability of the file systems, right? (So it's not “all the really important data is stored in XFS”.)
Yes that's correct. At our scale everything falls over, including XFS, and as I've stated elsewhere in this thread we actually see a higher rate of failure (relative to the install size) with XFS. The databases we use already do all of the fancy things that btrfs does in the application. If we could get away with it we'd just use raw disks for those applications, and in fact we may do that in the future. Thanks,
Josef, with my XFS hat on, are these recent failures? Have they all been reported to the XFS list?
It makes sense to look at reliability in the context of this thread, but offering "btrfs fails less often than XFS for us" without any context (what kind of failure, what kernel, when, etc) doesn't help much, it's just more anecdotes.
Thanks, -Eric
On 6/29/20 2:23 PM, Eric Sandeen wrote:
On 6/29/20 8:39 AM, Josef Bacik wrote:
On 6/29/20 5:33 AM, Florian Weimer wrote:
- Josef Bacik:
That being said I can make btrfs look really stupid on some workloads. There's going to be cases where Btrfs isn't awesome. We still use xfs for all our storage related tiers (think databases). Performance is always going to be workload dependent, and Btrfs has built in overhead out the gate because of checksumming and the fact that we generate far more metadata.
Just to be clear here, the choice of XFS here is purely based on performance, not on the reliability of the file systems, right? (So it's not “all the really important data is stored in XFS”.)
Yes that's correct. At our scale everything falls over, including XFS, and as I've stated elsewhere in this thread we actually see a higher rate of failure (relative to the install size) with XFS. The databases we use already do all of the fancy things that btrfs does in the application. If we could get away with it we'd just use raw disks for those applications, and in fact we may do that in the future. Thanks,
Josef, with my XFS hat on, are these recent failures? Have they all been reported to the XFS list?
It makes sense to look at reliability in the context of this thread, but offering "btrfs fails less often than XFS for us" without any context (what kind of failure, what kernel, when, etc) doesn't help much, it's just more anecdotes.
Yup this is why I try to avoid talking about other file systems. This shouldn't be interpreted as "XFS drools, btrfs rules!", just that in our own environment, btrfs does not fail at any significant rate higher than xfs.
Xfs is used in completely different workloads, and with completely different (much better) hardware.
And the reason they haven't been brought up to the list is because it fails at such a low rate that I didn't even realize we were having xfs reprovisions until I went and looked at the data. So far of the 15 machines that fell over, 10 of them appear to be hardware related. The other 5 have logs that are in a different database that take longer to pull out. Thanks,
Josef
On 6/29/20 1:47 PM, Josef Bacik wrote:
Just to be clear here, the choice of XFS here is purely based on performance, not on the reliability of the file systems, right? (So it's not “all the really important data is stored in XFS”.)
Yes that's correct. At our scale everything falls over, including XFS, and as I've stated elsewhere in this thread we actually see a higher rate of failure (relative to the install size) with XFS. The databases we use already do all of the fancy things that btrfs does in the application. If we could get away with it we'd just use raw disks for those applications, and in fact we may do that in the future. Thanks,
Josef, with my XFS hat on, are these recent failures? Have they all been reported to the XFS list?
It makes sense to look at reliability in the context of this thread, but offering "btrfs fails less often than XFS for us" without any context (what kind of failure, what kernel, when, etc) doesn't help much, it's just more anecdotes.
Yup this is why I try to avoid talking about other file systems. This shouldn't be interpreted as "XFS drools, btrfs rules!", just that in our own environment, btrfs does not fail at any significant rate higher than xfs.
Xfs is used in completely different workloads, and with completely different (much better) hardware.
And the reason they haven't been brought up to the list is because it fails at such a low rate that I didn't even realize we were having xfs reprovisions until I went and looked at the data. So far of the 15 machines that fell over, 10 of them appear to be hardware related. The other 5 have logs that are in a different database that take longer to pull out. Thanks,
Josef
Thanks for the context, Josef, I appreciate it.
-Eric
On Mon, Jun 29, 2020 at 11:33:40AM +0200, Florian Weimer wrote:
Just to be clear here, the choice of XFS here is purely based on performance, not on the reliability of the file systems, right? (So it's not “all the really important data is stored in XFS”.)
Be careful about overloading quite a few definitions into the single word "reliability".
You seem to be referring to btrfs features like file checksumming that can detect silent corruption, and automagically fix things if you've enabled the equally automagic RAID1-like features. (Which, for the record, I think are really frickin' awesome!)
But what good is btrfs' attestation of file integrity when it craps itself to the point where it doesn't even know those files _exist_ anymore? How can we brag about robustness in the face of cosmic rays or recovery from the power cord getting yanked when it couldn't reliably _remount_ a lightly-used, cleanly unmounted filesystem?
By that "reliability" metric, for me XFS has been infinitely better than btrfs; sure, XFS can't automagically tell me if an individual file got corrupted (much less fix it), but it's also never eaten entire filesystems across clean unmount/mount cycles. Whereas btrfs has done so. Twice.
I realize this is several-years-out-of-date anecdata, but it's the sort of thing that has given btrfs (quite deservedly) a very bad reputation.
(BTW, I didn't try using XFS until after my second bad btrfs experience. It hasn't so much as hiccupped on me since, proving to be the most robust filesystem I've ever used...)
I concede that my experience is outdated, and am willing to take the btrfs authors at their word that the bugs that led to my filesystems eating themselves have been fixed, but.. sure, the btrfs proponents say it's "ready for production" now, but they also said that back then, too.
So. Instead of making btrfs the default for F33, perhaps a better approach is to plan to make it the default for F34, and use the F33 cycle to encourage folks (e.g. release notes, installer prompting?) to try using btrfs.
The point here is not F32 vs F33 or whatever, but that of _time_ -- I don't think there's enough time between now and the F33 go/no-go point for folks like me to set up and sufficiently burn-in F32 btrfs systems to gain confidence that btrfs is indeed ready. (In any case, the traditional beta period is _way_ too short for something like this!)
- Solomon
* Solomon Peachy:
On Mon, Jun 29, 2020 at 11:33:40AM +0200, Florian Weimer wrote:
Just to be clear here, the choice of XFS here is purely based on performance, not on the reliability of the file systems, right? (So it's not “all the really important data is stored in XFS”.)
Be careful about overloading quite a few definitions into the single word "reliability".
You seem to be referring to btrfs features like file checksumming that
No, I was not. To me, for file systems, it means that under conditions I personally consider reasonable (generally healthy hardware, and only the occasional hard power-off after a system becomes unresponsive), the file system can be mounted, retains consistent metadata, and most of the data is still there, with the possible exception of things that have been written a short time after the crash.
It's not about getting the best out of partially faulty hardware or an execution environment with frequent power outages.
As you point out, historically, checksumming file systems weren't very good at this, but I think for btrfs, this has improved. (For a long time, there was a FAQ for a different checksumming file system that had something along the lines of “Q: I can't mount my file system due to a checksum error. A: Restore from backup.”.)
Thanks, Florian
On Mon, Jun 29, 2020 at 7:55 AM Solomon Peachy pizza@shaftnet.org wrote:
On Mon, Jun 29, 2020 at 11:33:40AM +0200, Florian Weimer wrote:
Just to be clear here, the choice of XFS here is purely based on performance, not on the reliability of the file systems, right? (So it's not “all the really important data is stored in XFS”.)
Be careful about overloading quite a few definitions into the single word "reliability".
You seem to be referring to btrfs features like file checksumming that can detect silent corruption, and automagically fix things if you've enabled the equally automagic RAID1-like features. (Which, for the record, I think are really frickin' awesome!)
But what good is btrfs' attestation of file integrity when it craps itself to the point where it doesn't even know those files even _exist_ anymore?
You've got an example where 'btrfs restore' saw no files at all? And you think it's the file system rather than the hardware, why?
I think this is the wrong metaphor because it suggests btrfs caused the crapping. The sequence is: btrfs does the right thing, drive firmware craps itself and there's a power failure or a crash. Btrfs in the ordinary case doesn't care and boots without complaint. In the far less common case some critical node just happened to get nerfed and there's no way to automatically recover. The user is left on an island. This part should get better anyway, even though it can happen with any file system.
And as a community we need the user-to-user support to make sure folks aren't left on an island - can we do that? This is the question. It really is a community question more than it is a technology question.
How can we brag about robustness in the face of cosmic rays or recovery from the power cord getting yanked when it couldn't reliably _remount_ a lightly-used, cleanly unmounted filesystem?
Come on. It's cleanly unmounted and doesn't mount?
I guess you missed the other emails about dm-log-writes and xfstests, but they directly relate here. Josef relayed that all of his deep dives into Btrfs failures since the dm-log-writes work, have all been traced back to hardware doing the wrong thing.
All file systems have write ordering expectations. If the hardware doesn't honor that, it's trouble if there's a crash. What you're describing is 100% a hardware crapped itself case. You said it cleanly unmounted i.e. the exact correct write ordering did happen. And yet the file system can't be mounted again. That's a hardware failure.
I realize this is several-years-out-of-date anecdata, but it's the sort of thing that has given btrfs (quite deservedly) a very bad reputation.
The frustration and skepticism are palpable. But here is the problem with the road you're going down: you are arguing in favor of closed door development practices. Keep all the scary early development out of public scrutiny, as a form of messaging control, so that reputation isn't damaged by knowing about all the sausage making.
The point here is not F32 vs F33 or whatever, but that of _time_ -- I don't think there's enough time between now and the F33 go/no-go point for folks like me to set up and sufficiently burn-in F32 btrfs systems to gain confidence that btrfs is indeed ready. (In any case, the traditional beta period is _way_ too short for something like this!)
There is no way for one person to determine if Btrfs is ready. That's done by a combination of synthetic tests (xfstests) and volume regression testing on actual workloads. And by the way, the Red Hat CKI project is going to help run btrfs xfstests for Fedora kernels.
The questions are whether the Fedora community wants and is ready for Btrfs by default.
On Mon, Jun 29, 2020 at 10:26:37AM -0600, Chris Murphy wrote:
You've got an example where 'btrfs restore' saw no files at all? And you think it's the file system rather than the hardware, why?
Because the system failed to boot up, and even after offline repair attempts was still missing a sufficiently large chunk of the root filesystem to necessitate re-installation.
Because the same hardware provided literally years of problem-free stability with ext4 (before) and xfs (after).
I think this is the wrong metaphor because it suggests btrfs caused the crapping. The sequence is: btrfs does the right thing, drive firmware craps itself and there's a power failure or a crash. Btrfs in the ordinary case doesn't care and boots without complaint. In the far
The first time, I needed to physically move the system, so the machine was shut down via 'shutdown -h now' on a console, and didn't come back up.
The second time was a routine post-dnf-update 'reboot', without power cycling anything.
At no point was there ever any unclean shutdown, and at the time of those reboots, no errors were reported in the kernel logs.
Once is a fluke, twice is a trend... and I didn't have the patience for a third try because I needed to be able to rely on the system to not eat itself.
I can't get the complete details at the moment, but it was an AMD E-350 system with a 32GB ADATA SATA SSD, configured using anaconda's btrfs defaults and only about 30% of disk space used. Pretty minimal I/O.
I will concede that it's possible there was/is some sort of hardware/firmware bug, but if so, only btrfs seemed to trigger it.
Come on. It's cleanly unmounted and doesn't mount?
Yes. (See above)
(Granted, I'm using "mount" to mean "successfully mounted a writable filesystem with data largely intact" -- I'm a bit fuzzy on the exact details but I believe it did mount read-only before the boot crapped out due to missing/inaccessible system libraries. I had to resort to a USB stick to attempt repairs that were only partially successful.)
All file systems have write ordering expectations. If the hardware doesn't honor that, it's trouble if there's a crash. What you're describing is 100% a hardware crapped itself case. You said it cleanly unmounted i.e. the exact correct write ordering did happen. And yet the file system can't be mounted again. That's a hardware failure.
That may be the case, but when there were no crashes, and neither ext4 nor xfs crapped themselves under day-to-day operation with the same hardware, it's reasonable to infer that the problem has _something_ to do with the variable that changed, ie btrfs.
There is no way for one person to determine if Btrfs is ready. That's done by combination of synthetic tests (xfstests) and volume regression testing on actual workloads. And by the way the Red Hat CKI project is going to help run btrfs xfstests for Fedora kernels.
Of course not, but the Fedora community is made up of innumerable "one persons", each responsible for several special snowflake systems.
Let's say for sake of argument that my bad btrfs experiences were due to bugs in device firmware with btrfs's completely-legal usage patterns rather than bugs in btrfs-from-five-years-ago. That's great... except my system still got trashed to the point of needing to be reinstalled, and finger-pointing can't bring back lost data.
How many more special snowflake drives are out there? Think about how long it took Fedora to enable TRIM out of concern for potential data loss. Why should this be any different?
(We're always going to be stuck with buggy firmware. FFS, the Samsung 860 EVO SATA SSD that I have in my main workstation will hiccup to the point of trashing data when used with AMD SATA controllers... even under Windows! Their official support answer is "Use an Intel controller". And that's a tier-one manufacturer who presumably has among the best QA and support in the industry.)
If there is device/firmware known to be problematic, we need to keep track of these buggy devices and either automatically provide workarounds or some way to tell the user that proceeding with btrfs may be perilous to their data.
(Or perhaps the issues I had were due to bugs in btrfs-of-five-years-ago that have long since been fixed. Either way, given my twice-burned experiences, I would want to verify that for myself before I entrust it with any data I care about...)
The questions are whether the Fedora community wants and is ready for Btrfs by default.
There are obviously some folks here (myself included) that have had very negative btrfs experiences. Similarly, there are folks that have successfully overseen large-scale deployments of btrfs in their managed environments (not on Fedora though, IIUC).
So yes, I think an explicit "let's all test btrfs (as anaconda configures it) before we make it default" period is warranted.
Perhaps one can argue that Fedora has already been doing that for the past two years (since 2018-or-later-btrfs is what everyone with positive results appears to be talking about), but it's still not clear that those deployments utilize the same feature set as Fedora's defaults, and how broad the hardware sample is.
- Solomon
On Mon, Jun 29, 2020 at 03:15:23PM -0400, Solomon Peachy wrote:
So yes, I think an explicit "let's all test btrfs (as anaconda configures it) before we make it default" period is warranted.
Perhaps one can argue that Fedora has already been doing that for the past two years (since 2018-or-later-btrfs is what everyone with positive results appears to be talking about), but it's still not clear that those deployments utilize the same feature set as Fedora's defaults, and how broad the hardware sample is.
Making btrfs opt-in for F33 and (assuming the results go well) opt-out for F34 could be a good option. I know technically it is already opt-in, but it's not very visible or popular. We could make the btrfs option more prominent and ask people to pick it if they are ready to handle potential fallout.
Normally we just switch the default or we don't, without half measures. But the fs is important enough and complicated enough to be extra careful about any transitions.
Zbyszek
Hi,
On 01/07/2020 07:54, Zbigniew Jędrzejewski-Szmek wrote:
Making btrfs opt-in for F33 and (assuming the results go well) opt-out for F34 could be a good option. I know technically it is already opt-in, but it's not very visible or popular. We could make the btrfs option more prominent and ask people to pick it if they are ready to handle potential fallout.
Normally we just switch the default or we don't, without half measures. But the fs is important enough and complicated enough to be extra careful about any transitions.
Zbyszek
Indeed, it is an important point, and taking care is very important when dealing with other people's data, which is in effect what we are discussing here.
When we looked at btrfs support in RHEL, we took quite a long time over it. In fact I'm not quite sure how long, since the process had started before I was involved, but it was not a decision that was made quickly, and a great deal of thought went into it. It was difficult to get concrete information about the stability aspects at the time. Just like the discussions that have taken place on this thread, there was a lot of anecdotal evidence, but that is not always a good indicator. Since time has passed since then, and there is now more evidence, this part of the process should be easier. That said, to get a meaningful comparison one would ideally want to compare user populations of similar size and technical skill level, and look not just at the overall number of bugs reported, but at the rate at which those bugs are being reported too.
It is often tricky to be sure of the root cause of bugs - just because a filesystem reports an error doesn't mean that it is at fault, it might be a hardware problem, or an issue with volume management. Figuring out where the real problem lies is often very time consuming work. Without that work though, the raw numbers of bugs reported can be very misleading.
It is also worth noting that when we made the decision for RHEL it was not just a question of stability, although that is obviously an important consideration. We looked at a wide range of factors, including the overall design and features. We had reached out to a number of potential users and asked them what features they wanted from their filesystems and tried to understand where we had gaps in our existing offerings. It would be worth taking that step here, and asking each of the spins what are the features that they would most like to see from the storage/fs stack. Comparing filesystems in the abstract is a difficult task, and it is much easier against a context. I know that some of the issues have already been discussed in this thread, but maybe if someone was to gather up a list of requirements from those messages then that would help to direct further discussion.
Steve.
On Wed, Jul 01, 2020 at 11:28:10AM +0100, Steven Whitehouse wrote:
Indeed, it is an important point, and taking care is very important when dealing with other people's data, which is in effect what we are discussing here.
When we looked at btrfs support in RHEL, we took quite a long time over it. In fact I'm not quite sure how long, since the process had started before I was involved, but it was not a decision that was made quickly, and a great deal of thought went into it. It was difficult to get concrete information about the stability aspects at the time. Just like the discussions that have taken place on this thread, there was a lot of anecdotal evidence, but that is not always a good indicator. Since time has passed since then, and there is now more evidence, this part of the process should be easier. That said, to get a meaningful comparison one would ideally want to compare user populations of similar size and technical skill level, and look not just at the overall number of bugs reported, but at the rate at which those bugs are being reported too.
Yeah. I have no doubt that the decision was made carefully back then. That said, time has passed, and btrfs has evolved and our use cases have evolved too, so a fresh look is good.
We have https://fedoraproject.org/wiki/Changes/DNF_Better_Counting, maybe this could be used to collect some statistics about the fs type too.
It is often tricky to be sure of the root cause of bugs - just because a filesystem reports an error doesn't mean that it is at fault, it might be a hardware problem, or an issue with volume management. Figuring out where the real problem lies is often very time consuming work. Without that work though, the raw numbers of bugs reported can be very misleading.
It would be worth taking that step here, and asking each of the spins what are the features that they would most like to see from the storage/fs stack. Comparing filesystems in the abstract is a difficult task, and it is much easier against a context. I know that some of the issues have already been discussed in this thread, but maybe if someone was to gather up a list of requirements from those messages then that would help to direct further discussion.
Actually that part has been answered pretty comprehensively. The split between / and /home is hurting users and we completely sidestep it with this change. The change page lists a bunch of other benefits, incl. better integration with the new resource allocation mechanisms we have with cgroups2. So in a way this is a follow-up to the cgroupsv2-by-default change in F31. Snapshots and subvolumes also give additional powers to systemd-nspawn and other tools. I'd say that the huge potential of btrfs is clear. It's the possibility of the loss of stability that is my (and others') worry and the thing which is hard to gauge.
Zbyszek
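The space-sharing behavior Zbyszek describes can be sketched with a throwaway loopback filesystem. This is an illustrative sketch only, not the installer's actual layout: it requires root, and the paths and sizes are hypothetical:

```shell
# Build a scratch Btrfs filesystem on a loopback file (requires root).
truncate -s 1G /tmp/btrfs-demo.img
mkfs.btrfs -q /tmp/btrfs-demo.img
mkdir -p /mnt/btrfs-demo
mount -o loop /tmp/btrfs-demo.img /mnt/btrfs-demo

# Subvolumes behave like directories with their own snapshot boundary;
# they have no fixed size of their own.
btrfs subvolume create /mnt/btrfs-demo/root
btrfs subvolume create /mnt/btrfs-demo/home

# Both subvolumes draw from one shared pool, so there is no fixed
# split of free space between / and /home to get wrong at install time.
df -h /mnt/btrfs-demo
```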
Hi,
On 01/07/2020 12:09, Zbigniew Jędrzejewski-Szmek wrote:
On Wed, Jul 01, 2020 at 11:28:10AM +0100, Steven Whitehouse wrote:
Indeed, it is an important point, and taking care is very important when dealing with other people's data, which is in effect what we are discussing here.
When we looked at btrfs support in RHEL, we took quite a long time over it. In fact I'm not quite sure how long, since the process had started before I was involved, but it was not a decision that was made quickly, and a great deal of thought went into it. It was difficult to get concrete information about the stability aspects at the time. Just like the discussions that have taken place on this thread, there was a lot of anecdotal evidence, but that is not always a good indicator. Since time has passed since then, and there is now more evidence, this part of the process should be easier. That said to get a meaningful comparison then ideally one would want to compare on the basis of user populations of similar size and technical skill level, and look not just at the overall number of bugs reported, but at the rate those bugs are being reported too.
Yeah. I have no doubt that the decision was made carefully back then. That said, time has passed, and btrfs has evolved and our use cases have evolved too, so a fresh look is good.
We have https://fedoraproject.org/wiki/Changes/DNF_Better_Counting, maybe this could be used to collect some statistics about the fs type too.
Yes, and also the questions that Fedora is trying to answer are different too. So I don't think that our analysis for RHEL is applicable here in general. The method that we went through, in general terms, may potentially be helpful.
It is often tricky to be sure of the root cause of bugs - just because a filesystem reports an error doesn't mean that it is at fault, it might be a hardware problem, or an issue with volume management. Figuring out where the real problem lies is often very time consuming work. Without that work though, the raw numbers of bugs reported can be very misleading.

It would be worth taking that step here, and asking each of the spins what are the features that they would most like to see from the storage/fs stack. Comparing filesystems in the abstract is a difficult task, and it is much easier against a context. I know that some of the issues have already been discussed in this thread, but maybe if someone was to gather up a list of requirements from those messages then that would help to direct further discussion.
Actually that part has been answered pretty comprehensively. The split between / and /home is hurting users and we completely sidestep it with this change. The change page lists a bunch of other benefits, incl. better integration with the new resource allocation mechanisms we have with cgroups2. So in a way this is a follow-up to the cgroupsv2-by-default change in F31. Snapshots and subvolumes also give additional powers to systemd-nspawn and other tools. I'd say that the huge potential of btrfs is clear. It's the possibility of the loss of stability that is my (and others') worry and the thing which is hard to gauge.
Zbyszek
If the / and /home split is the main issue, then dm-thin might be an alternative solution, and we should check to see if some of the issues listed on the change page have been addressed. I'm copying in Jon for additional comment on that. Are those btrfs benefits which are listed on the change page in priority order?
File system resize is mentioned there, but pretty much all local filesystems support grow. Also, no use cases are listed for that benefit. Shrink is more tricky, and can easily result in poor file layouts, particularly if there are repeated grow/shrink operations, not to mention potential complications with NFS if the fs is exported. So is there some specific use case there that cannot be supported easily with the existing tools? There are a few other features listed that are available in other fs/volume management tools as well.
Eric has already pointed out that XFS has cgroups2 support, so the statement that btrfs is the only fs with that is incorrect. It would help to make things a bit clearer if that list was updated with the information gathered so far.
Steve.
On 7/1/20 7:49 AM, Steven Whitehouse wrote:
Eric has already pointed out that XFS has cgroups2 support, so the statement that btrfs is the only fs with that is incorrect. It would help to make things a bit clearer if that list was updated with the information gathered so far.
Yeah that should be changed.
There's a big gap between having cgroups2 support and it actually working. The thing that I've said consistently is that there's nothing keeping XFS from working with cgroups2, it's just that we (Facebook) haven't tested it, because at the time we were rolling it out it didn't have writeback support.
Even btrfs with writeback support enabled still required a few investigations and follow up work to get everything working properly, because you don't ever know what's going to break until you actually use it. So while XFS technically has support, Btrfs is the only fs that we use cgroup2 with IO isolation in production, so it's the only thing we're comfortable with. XFS may work perfectly fine, but AFAIK nobody has ever tested it or used it in production. Thanks,
Josef
On 7/1/20 9:24 AM, Josef Bacik wrote:
Yeah that should be changed.
There's a big gap between having cgroups2 support and it actually working.
Well, that's why dchinner pushed back on the first patches from FB; there was no way for us or any other filesystem to validate that it worked, or would continue to work.
The thing that I've said consistently is that there's nothing keeping XFS from working with cgroups2, it's just that we (Facebook) haven't tested it, because at the time we were rolling it out it didn't have writeback support.
Even btrfs with writeback support enabled still required a few investigations and follow up work to get everything working properly, because you don't ever know what's going to break until you actually use it. So while XFS technically has support, Btrfs is the only fs that we use cgroup2 with IO isolation in production, so it's the only thing we're comfortable with. XFS may work perfectly fine, but AFAIK nobody has ever tested it or used it in production. Thanks,
The work was done by Christoph and was sponsored and tested by Profihost AG, who presumably use it in production.
-Eric
On Wed, Jul 1, 2020 at 5:49 AM Steven Whitehouse swhiteho@redhat.com wrote:
If the / and /home split is the main issue, then dm-thin might be an alternative solution, and we should check to see if some of the issues listed on the change page have been addressed. I'm copying in Jon for additional comment on that. Are those btrfs benefits which are listed on the change page in priority order?
They are of equal priority, from the perspective of both the feature owners and the working group, based on many months of discussion. Individual users definitely have their own priorities, which also vary. There is perhaps an emphasis on solving /home and / free space competition, because it is one of the most pernicious issues, and it really leaves users on an island to fend for themselves in order to resolve it.
Importantly, dm-thin doesn't fix this problem by avoiding it in the first place, which Btrfs does. On dm-thin, the user must still identify which file system is out of space, and grow the file system. Once file systems are either snapshot or over-provisioned, the only arbiter of used and free space truth is the thin pool. File system sizes become virtual, which current CLI and GUI apps are unprepared to deal with. And that sets up a prerequisite solution before anything dm-thin based could be used, because fantasy free space reporting is objectively a UX regression.
The transparent compression feature is perhaps understated. It's configurable per directory and per file. That includes algorithm selection. A future feature includes configurable compression level in the XATTR. The simplest use case is selective compression of high value targets like /usr and flatpaks. Future feature ideas include user selection of directories, and UI that shows compression efficacy.
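As a sketch of how the per-directory and per-file configuration looks in practice (the paths and the choice of zstd are illustrative, not part of the proposal; requires a mounted Btrfs file system):

```shell
# Volume-wide default via a mount option, with a compression level
# (illustrative fstab-style line):
#   UUID=...  /  btrfs  subvol=root,compress=zstd:1  0 0

# Per-directory override: files newly written under /usr get zstd
btrfs property set /usr compression zstd

# Per-file: inspect or clear the setting on a single file
btrfs property get /usr/bin/example compression
btrfs property set /usr/bin/example compression none
```

Properties apply to data written after they are set; existing extents are not rewritten unless defragmented with a compression target.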
Reflinks are permitted between Btrfs subvolumes, whereas neither reflinks nor hard links are possible between dm-thin snapshots. One use case is cheaply restoring individual files from snapshots. Also, thin snapshots currently pin the file system journal inside, making them rather expensive in terms of space consumption compared to Btrfs snapshots. Again, the cost of thin snapshots is only revealed by the thin pool, not the file system. Whereas on Btrfs, 'df' and friends are expected to properly report free space, and they do.
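A minimal sketch of the single-file restore use case (snapshot name and paths are hypothetical; requires root and a Btrfs mount):

```shell
# Take a read-only snapshot of the home subvolume
btrfs subvolume snapshot -r /home /home/.snapshots/before-upgrade

# Later, restore one file by reflink: only metadata is written,
# the data extents stay shared with the snapshot
cp --reflink=always /home/.snapshots/before-upgrade/alice/thesis.tex \
   /home/alice/thesis.tex
```

Because the copy is a reflink, the restore is near-instant and consumes no extra space until either copy is modified.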
Also, cgroup2 developers report that the IO isolation features of any file system are lost on anything device-mapper based. And while that work is in progress, it's not there yet.
Integrity checking is highly valued by some and less by others. Considering that we know hardware isn't 100% reliable and doesn't always report its own failures as expected, which is why most file systems now at least checksum metadata, it's not persuasive to me that the data should be left unchecked and corruption handled by user space somehow.
File system resize is mentioned there, but pretty much all local filesystems support grow. Also, no use cases are listed for that benefit.
Windows and macOS have supported online shrink and grow for more than a decade. While it doesn't often come up on the desktop, if you don't have it and need it, it's aggravating.
The typical use case today is to reprovision a system with an additional or eventual substitute OS, without first having to destroy another. I'd call it rare. But it's also essentially expected.
The much more common use case is systemd-homed, for managing authentication and user homes, including when encrypted. It's not decided whether to integrate sd-homed, but it supports multiple storage types. One of those storage types, LUKS on loop, does effectively depend on file system shrink capability. While the use case for Fedora is mainly single user, and we optimize for that case, it is not exclusively single user, so the chosen solution shouldn't cause difficult regressions. And we get a number of free and used space knock-on effects here if the file system can't do online shrink. Is lack of online shrink disqualifying? No, but having it significantly improves UX. So whether LUKS or future fscrypt/Btrfs encryption, this road points to Btrfs.
Yes, we could drop LVM and go with one big file system; that too was discussed. The main knock-on effect there is that a significant minority of users want to do a clean install of Fedora from time to time while preserving user home. The installer permits this behavior with LVM layouts, and with Btrfs it only requires that a new root subvolume be created for mounting at /.
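In rough terms, the proposed layout and the reinstall path look like this (device name is hypothetical; the subvolume names match the change proposal; requires root):

```shell
# Initial install: one Btrfs volume, two subvolumes sharing all free space
mkfs.btrfs /dev/vda2
mount /dev/vda2 /mnt
btrfs subvolume create /mnt/root   # mounted at /
btrfs subvolume create /mnt/home   # mounted at /home

# A later clean install replaces only the root subvolume;
# home stays untouched on the same volume
btrfs subvolume delete /mnt/root
btrfs subvolume create /mnt/root
```

Since subvolumes have no fixed size, neither / nor /home can strand free space the other needs.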
It doesn't mean Btrfs applies to 100% of Fedora use cases. No single layout does. But Btrfs consistently solves more problems than it causes knock-on effects. This doesn't make the alternatives bad; they just leave a variety of problems unsolved. That too isn't inherently bad or disqualifying, but it's an opportunity that calls for a more complete picture.
Shrink is more tricky, and can easily result in poor file layouts, particularly if there are repeated grow/shrink operations, not to mention potential complications with NFS if the fs is exported.
A systemd-homed workflow suggests some cases where there will be many grow/shrink operations. If there are two or three active users, this might mean several grow/shrink operations per day. So it would need to be a file system explicitly designed with this in mind, including no negative locality knock-on effects. Only Btrfs meets this use case requirement without knock-on effects. Its metadata has no fixed layout; it's written dynamically, so it doesn't suffer from the poor layout problem other file systems do.
Eric has already pointed out that XFS has cgroups2 support, so the statement that btrfs is the only fs with that is incorrect. It would help to make things a bit clearer if that list was updated, with the information gathered so far,
Updated. I've asked a number of cgroups2 kernel developers about this and they've consistently told me that they know Btrfs does it correctly, ext4 has priority inversions, and they don't know about XFS.
Chris Murphy
On 7/1/20 12:50 PM, Chris Murphy wrote:
...
Integrity checking is highly valued by some and less by others. Considering that we know hardware isn't 100% reliable, and doesn't always report its own failures as expected, and hence why most file systems now at least checksum metadata, it's not persuasive to me that the data should be left unchecked, and corruption ought to be handled by user space somehow.
There's a flip side to this coin - in my experience, if the right btrfs metadata blocks experience this disk corruption, there can be a complete inability to recover the btrfs filesystem from that error - i.e. it won't mount, and btrfsck --repair won't get it to a mountable state.
So if we're saying disk corruption happens often enough that data checksumming is critical, then it happens often enough that metadata recovery is at least as critical.
I've been trying to quantify this and have not come up with a particularly compelling test scenario, because it involves purposefully (though at random) corrupting enough blocks on a filesystem image that a critical block gets hit, so it looks synthetic. But the net result is frequently a filesystem where btrfsck and/or mount fails, and at first blush this type of failure happens much more often than on other filesystems.[1]
I think Josef has alluded to this situation as well. To me, that's a big concern. Not trying to be a wet blanket here but I think this needs to be carefully investigated and evaluated to understand what impact it may have on Fedora btrfs users and their ability to recover their data in the face of metadata corruption, because it looks to me like a definite btrfs weak spot.
-Eric
[1] some details - I used the mangle.c fuzzer from fsfuzzer, and modified it so that it corrupts 8192 bytes of an image, which in fs terms can be up to 8192 filesystem blocks. I also avoided the first 4k so that any filesystem signature was not damaged.
I then ran a loop where I created a 1G base image, populated it, fuzzed it in this way, (so up to 3% of blocks were damaged) and ran the filesystem's fsck utility (in btrfs' case, btrfsck --repair) and then tried to mount (in btrfs' case, with bare mount, then -o usebackuproot if mount failed). If it mounted, I used "find | wc" to see how many files were reachable vs the original image.
If either fsck or mount reports an exit code that reflects failure to complete properly, I recorded that.
It was a quick hack, and it's not beautiful, so there are probably holes to be poked in it; if you want to look, I threw the bash script and the C source up at https://people.redhat.com/esandeen/fsckfuzzer/
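For reference, the loop described above looks roughly like this (a reconstruction from the description, not the actual script at the URL; the mangle invocation and populate step are assumptions; destructive, requires root, run only against scratch images):

```shell
for i in $(seq 1 10); do
    # 1G base image, populated with some files
    truncate -s 1G test.img
    mkfs.btrfs -f test.img >/dev/null
    mount -o loop test.img /mnt && cp -a /usr/share/doc /mnt/ && umount /mnt

    # corrupt 8192 bytes at random offsets, sparing the first 4k
    ./mangle test.img

    btrfsck --repair test.img || echo "fsck failed"
    if ! mount -o loop test.img /mnt; then
        mount -o loop,usebackuproot test.img /mnt || echo "mount failed"
    fi
    # count reachable files versus the original image
    mountpoint -q /mnt && { find /mnt | wc -l; umount /mnt; }
    rm -f test.img
done
```

For ext4 and xfs the same loop would substitute the corresponding mkfs, fsck, and plain mount commands.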
Running 10 loops on each of btrfs, ext4, and xfs I got results that look like this (ext4 always creates empty lost+found so it will always find at least 1 file there)
btrfs
fsck failed 0 files in lost+found, 628 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
526 files in lost+found, 9 files gone/unreachable
595 files in lost+found, 55 files gone/unreachable
53 files in lost+found, 8 files gone/unreachable
57 files in lost+found, 44 files gone/unreachable
fsck failed 7 files in lost+found, 1491 files gone/unreachable
fsck failed, mount failed
fsck failed, mount failed
88 files in lost+found, 40 files gone/unreachable
== 4 fsck failures, 2 mount failures
ext4
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
164 files in lost+found, 2 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 1 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
9 files in lost+found, 1 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
== 0 fsck failures, 0 mount failures
xfs
0 files in lost+found, 1 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
958 files in lost+found, 629 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
2 files in lost+found, 0 files gone/unreachable
0 files in lost+found, 1 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
8 files in lost+found, 1 files gone/unreachable
3 files in lost+found, -1 files gone/unreachable
== 0 fsck failures, 0 mount failures
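As a quick sanity check, the summary lines can be recomputed from the raw runs by counting the lines that report each failure type (btrfs shown, with its ten runs pasted verbatim):

```shell
runs=$(mktemp)
cat > "$runs" <<'EOF'
fsck failed 0 files in lost+found, 628 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
526 files in lost+found, 9 files gone/unreachable
595 files in lost+found, 55 files gone/unreachable
53 files in lost+found, 8 files gone/unreachable
57 files in lost+found, 44 files gone/unreachable
fsck failed 7 files in lost+found, 1491 files gone/unreachable
fsck failed, mount failed
fsck failed, mount failed
88 files in lost+found, 40 files gone/unreachable
EOF
fsck_fails=$(grep -c 'fsck failed' "$runs")
mount_fails=$(grep -c 'mount failed' "$runs")
echo "== $fsck_fails fsck failures, $mount_fails mount failures"
rm -f "$runs"
```

This prints `== 4 fsck failures, 2 mount failures`, matching the summary above.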
On Thursday, 2 July 2020 21.38.46 WEST Eric Sandeen wrote:
3 files in lost+found, -1 files gone/unreachable
This last line from the xfs test seems suspicious (the -1 file gone). :-)
On 7/2/20 3:58 PM, José Abílio Matos wrote:
On Thursday, 2 July 2020 21.38.46 WEST Eric Sandeen wrote:
3 files in lost+found, -1 files gone/unreachable
This last line from the xfs test seems suspicious (the -1 file gone). :-)
It is weird, but it shows I didn't fudge the numbers ;)
directory repair may have inadvertently created a file or something, not sure.
-Eric
On 7/2/20 4:38 PM, Eric Sandeen wrote:
On 7/1/20 12:50 PM, Chris Murphy wrote:
...
Integrity checking is highly valued by some and less by others. Considering that we know hardware isn't 100% reliable, and doesn't always report its own failures as expected, and hence why most file systems now at least checksum metadata, it's not persuasive to me that the data should be left unchecked, and corruption ought to be handled by user space somehow.
There's a flip side to this coin - in my experience, if the right btrfs metadata blocks experience this disk corruption, there can be a complete inability to recover the btrfs filesystem from that error - i.e. it won't mount, and btrfsck --repair won't get it to a mountable state.
So if we're saying disk corruption happens often enough that data checksumming is critical, then it happens often enough that metadata recovery is at least as critical.
I've been trying to quantify this and have not come up with a particularly compelling test scenario, because it involves purposefully (though at random) corrupting enough blocks on a filesystem image that a critical block gets hit, so it looks synthetic. But the net result is frequently a filesystem where btrfsck and/or mount fails, and at first blush this type of failure happens much more often than on other filesystems.[1]
I think Josef has alluded to this situation as well. To me, that's a big concern. Not trying to be a wet blanket here but I think this needs to be carefully investigated and evaluated to understand what impact it may have on Fedora btrfs users and their ability to recover their data in the face of metadata corruption, because it looks to me like a definite btrfs weak spot.
Yeah this is what I've said many times over the last 3 weeks. Btrfs is more vulnerable to metadata corruption.
Now there are things that we can do to mitigate this. I have one patch up to handle one of the main cases (a corrupt global tree). The next patch set will keep entire metadata trees around for longer, as long as we have space to handle it. These two things will drastically improve the situation, but of course if I'm being evil we can still end up in a bad spot. These patches are not hard or controversial; they'll likely land in 5.9, which will be what F33 ships with (if I'm doing my math right).
And this sort of ignores the other side of the coin. fsfuzzer isn't just corrupting metadata, it's corrupting data. Btrfs is the only file system that's going to notice that and let the user know.
Checksumming is great because it lets the user know things are going wrong before they go catastrophically wrong. However just because we know something went wrong doesn't mean we can do anything about it, it just means that the user knows now that they need to restore from backups and find a new drive. These features do not mean you are absolved of good practices. If you care about data, you need to have it in multiple places. End of story. Btrfs is just going to let you know in advance that things are going wrong.
We're talking about this issue like it's reasonable that xfs and ext4 are going to allow the user to get back a bunch of data they don't know is ok or not. We're also talking about it like the user should be able to carry on their merry way. In these cases the drive is dying and needs to be shredded, a new install needs to happen, and a restore from backups needs to happen. Is the btrfs failure much less user friendly? No doubt about it. Is it any comfort at all when a user shows up and we say "where are your backups" and they say "what backups?", no. But if we're going to talk about this like ext4 and xfs are much better because they give you the _appearance_ that your data is fine, that's a bit disingenuous.
"Well what if it was just /usr." Sure, then you got lucky and you could copy things off. But what if it wasn't? That's the measure that's being applied to btrfs here. Is it likely that random corruption is going to be so bad that you end up with an unmountable file system? It's about as likely that the random corruption is on your dissertation or your family photographs. The difference is that btrfs will tell you that your dissertation or your family photographs are now bad, whereas ext4 and xfs will not.
These are tradeoffs, no doubt. Every file system choice is a series of tradeoffs. We're arguing/optimizing for the narrowest use case. Arguments can be made either way, but in the end, is it important enough to not move ahead with btrfs? Thanks,
Josef
On 7/2/20 4:44 PM, Josef Bacik wrote:
We're talking about this issue like it's reasonable that xfs and ext4 are going to allow the user to get back a bunch of data they don't know is ok or not. We're also talking about it like the user should be able to carry on his happy merry way. In these cases the drive is dying and needs to be shredded, and a new install needs to happen and a restore from backups needs to happen. Is the btrfs failure much less user friendly? No doubt about it. Is it any comfort at all when a user shows up and we say "where are your backups" and they say "what backups?", no. But if we're going to talk about this like ext4 and xfs are much better because they give you the _appearance_ that your data is fine, that's a bit disingenuous.
If I had talked about it like that, it would have been disingenuous.
But I didn't; this was an investigation of resiliency to metadata corruption, not data error detection, and to what degree metadata corruption can render files or even entire filesystems unreachable after normal administrative recovery efforts.
-Eric
Yeah I mean the general discussion, not you specifically. Thanks,
Josef
Le jeudi 02 juillet 2020 à 17:44 -0400, Josef Bacik a écrit :
However just because we know something went wrong doesn't mean we can do anything about it, it just means that the user knows now that they need to restore from backups
That’s a perfect answer for an Enterprise server setup with systematic backup/restore procedures.
For workstations? Even in an Enterprise context? Not so much.
Regards,
On 7/2/20 4:38 PM, Eric Sandeen wrote:
Running 10 loops on each of btrfs, ext4, and xfs I got results that look like this (ext4 always creates empty lost+found so it will always find at least 1 file there)
btrfs ... == 4 fsck failures, 2 mount failures
ext4 ... == 0 fsck failures, 0 mount failures
xfs ... == 0 fsck failures, 0 mount failures
Did you check the content of the filesystem, to make sure that the files restored by fsck are actually correct?
I think ext4/xfs may be showing 0 files lost but they may or may not contain the pre-damage content, while btrfs would just fess up that it lost them if the checksums didn't agree.
On Wed, 1 Jul 2020 at 07:19, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
Yeah. I have no doubt that the decision was made carefully back then. That said, time has passed, and btrfs has evolved and our use cases have evolved too, so a fresh look is good.
We have https://fedoraproject.org/wiki/Changes/DNF_Better_Counting, maybe this could be used to collect some statistics about the fs type too.
I am going to try and nip this one in the bud right here. DNF counting is NOT the place to do this.
Starting to collect such information is a slippery slope, and the more you collect, the harder it is to remove personally identifiable data. If this sort of data is going to be collected it needs to be done by a specific program for that purpose, which can be audited, which can be deleted, and which can be 'cleaned' to meet GDPR and other rules. The DNF counting works because all it does is give a 'better guess' on what is going on by randomly burping a countme over a week (if countme is turned on). The data it collects in the end is not absolute but fuzzy.. it is just supposedly a better fuzzy than the previous guesses.
The other information gathered in that transaction is stuff that already has a business need to work.. our servers need to know what architecture, what release, and what IP to send the appropriate mirrorlist back to.. we also need to keep a log of the transaction to debug why XYZ proxy decided to send the wrong thing or some other issue.
Mirrormanager does not have a need to know what filesystem you are using; it does not need to know what exact CPU, memory amount, or a bunch of other things which would be useful for the project. Even something like the packages you have installed, which is sort of closely aligned with a business need and which other distros do collect, it does not collect, and adding it now would be a headache.
If the project wants this, then someone needs to make a smolt replacement but with some people who truly understand privacy programming well enough to not end up with a landmine field.
On Wed, Jul 01, 2020 at 08:48:57AM -0400, Stephen John Smoogen wrote:
We have https://fedoraproject.org/wiki/Changes/DNF_Better_Counting, maybe this could be used to collect some statistics about the fs type too.
I am going to try and nip this one in the bud right here. DNF counting is NOT the place to do this.
Yeah -- that feature is explicitly limited. I know Christian is interested in a system-information-collection system developed by Endless Computing as presented at GUADEC ... was that just last year? Sometime. (What is time anyway?)
Le mercredi 01 juillet 2020 à 11:09 +0000, Zbigniew Jędrzejewski-Szmek a écrit :
Actually that part has been answered pretty comprehensively. The split between / and /home is hurting users
Actually this split is a godsend because you can convince anaconda to leave your home alone when reinstalling, while someone always seems to invent a new Fedora change that justifies the reformatting of /.
Good luck dealing with user data the next time workstation (or any other group) feels the / filesystem should change, once you've put user data on the same mount point
Regards,
On Wed, Jul 1, 2020 at 10:26 AM Nicolas Mailhot via devel devel@lists.fedoraproject.org wrote:
Le mercredi 01 juillet 2020 à 11:09 +0000, Zbigniew Jędrzejewski-Szmek a écrit :
Actually that part has been answered pretty comprehensively. The split between / and /home is hurting users
Actually this split is a godsend because you can convince anaconda to leave your home alone when reinstalling, while someone always seems to invent a new Fedora change that justifies the reformatting of /.
Good luck dealing with user data the next time workstation (or any other group) feels the / filesystem should change, once you've put user data on the same mount point
Anaconda does this behavior correctly with btrfs with / and /home as separate subvolumes on the same btrfs volume. So the preservation of /home would still work, while we get flexible storage allocation at the volume level.
Le mercredi 01 juillet 2020 à 10:27 -0400, Neal Gompa a écrit :
On Wed, Jul 1, 2020 at 10:26 AM Nicolas Mailhot via devel devel@lists.fedoraproject.org wrote:
Le mercredi 01 juillet 2020 à 11:09 +0000, Zbigniew Jędrzejewski- Szmek a écrit :
Actually that part has been answered pretty comprehensively. The split between / and /home is hurting users
Actually this split is a godsend because you can convince anaconda to leave your home alone when reinstalling, while someone always seems to invent a new Fedora change that justifies the reformatting of /.
Good luck dealing with user data the next time workstation (or any other group) feels the / filesystem should change, once you've put user data on the same mount point
Anaconda does this behavior correctly with btrfs with / and /home as separate subvolumes on the same btrfs volume. So the preservation of /home would still work, while we get flexible storage allocation at the volume level.
That only works as long as btrfs is the next shiny thing, or as long as no one decides the options Fedora used to create btrfs volumes with are crap and it’s a good idea to recreate them with new better options.
(btrfs may offer migration to new volume options like ext4 did in the past, I still would not want anaconda to touch my existing user data volumes, especially when doing emergency reinstalls because the kernel, systemd, glibc or any other core component crapped itself).
Regards,
On Wed, Jul 01, 2020 at 04:25:31PM +0200, Nicolas Mailhot via devel wrote:
Le mercredi 01 juillet 2020 à 11:09 +0000, Zbigniew Jędrzejewski-Szmek a écrit :
Actually that part has been answered pretty comprehensively. The split between / and /home is hurting users
Actually this split is a godsend because you can convince anaconda to leave your home alone when reinstalling, while someone always seems to invent a new Fedora change that justifies the reformatting of /.
Good luck dealing with user data the next time workstation (or any other group) feels the / filesystem should change, once you've put user data on the same mount point
The whole point of the btrfs change is to keep / and /home on separate subvolumes to avoid the anaconda requirement to reformat / from affecting /home while also avoiding the problem of running out of space on one while still having tons of free space on the other.
On Wed, Jul 1, 2020 at 4:25 pm, Nicolas Mailhot via devel devel@lists.fedoraproject.org wrote:
Actually this split is a godsend because you can convince anaconda to leave your home alone when reinstalling, while someone always seems to invent a new Fedora change that justifies the reformatting of /.
Good luck dealing with user data the next time workstation (or any other group) feels the / filesystem should change, once you've put user data on the same mount point
So for the avoidance of doubt: if the btrfs change is rejected, we are almost certain to put everything on the same mount point. We haven't approved this yet, but odds are very high IMO. The options we are seriously considering for our default going forward are (a) btrfs, (b) failing that, probably ext4 all one big partition without LVM, (c) less-likely, maybe xfs all one big partition without LVM. This is being discussed in https://pagure.io/fedora-workstation/issue/152
We have a high number of complaints from developers running out of space on / with plenty of space left on /home (happens to me all the time). The opposite scenario is a problem too. Separate mountpoints by default is just not a good default, sorry. Ensuring users don't run out of space due to bad partitioning is more important than keeping /home during reinstall IMO. But with btrfs, then /home will just be a subvolume so we can have our cake and eat it too.
On 2020-07-01 18:53, Michael Catanzaro wrote:
The options we are seriously considering for our default going forward are (a) btrfs, (b) failing that, probably ext4 all one big partition without LVM, (c) less-likely, maybe xfs all one big partition without LVM. This is being discussed in https://pagure.io/fedora-workstation/issue/152
One partition without LVM? Maybe labeling this partition C:\
The real solution would be to make wise usage of LVM, for example by not allocating 100% of the extents at the beginning (or even dm-thin) and/or using filesystems where a shrink is supported (I'm here blaming xfs for not having this, while ext4 does).
On Wed, Jul 1, 2020 at 11:01 pm, Roberto Ragusa mail@robertoragusa.it wrote:
The real solution would be to make wise usage of LVM, for example by not allocating 100% of the extents at the beginning (or even dm-thin) and/or using filesystems where a shrink is supported (I'm here blaming xfs for not having this, while ext4 has).
Leaving space unallocated doesn't gain us anything because the user still has to manually resize both logical volumes and the partitions inside them. Our default needs to be something that doesn't require users to resize partitions.
On 2020-07-01 23:04, Michael Catanzaro wrote:
On Wed, Jul 1, 2020 at 11:01 pm, Roberto Ragusa mail@robertoragusa.it wrote:
The real solution would be to make wise usage of LVM, for example by not allocating 100% of the extents at the beginning (or even dm-thin) and/or using filesystems where a shrink is supported (I'm here blaming xfs for not having this, while ext4 has).
Leaving space unallocated doesn't gain us anything because the user still has to manually resize both logical volumes and the partitions inside them. Our default needs to be something that doesn't require users to resize partitions.
But those are things that can be done in a few seconds with one or two commands. Attempts to make easy things easier lead to making other things difficult: some not so inexperienced users will find themselves with their disk having only one big partition, no LVM, everything inside (system+data) and trying to decipher the suggestion found on a forum "with btrfs you can sort of format / without losing /home even if you do not have separate partitions".
On 7/1/20 11:53 AM, Michael Catanzaro wrote:
On Wed, Jul 1, 2020 at 4:25 pm, Nicolas Mailhot via devel devel@lists.fedoraproject.org wrote:
Actually this split is a godsend because you can convince anaconda to leave your home alone when reinstalling, while someone always seems to invent a new Fedora change that justifies the reformatting of /.
Good luck dealing with user data the next time workstation (or any other group) feels the / filesystem should change, once you've put user data on the same mount point
So for the avoidance of doubt: if the btrfs change is rejected, we are almost certain to put everything on the same mount point. We haven't approved this yet, but odds are very high IMO. The options we are seriously considering for our default going forward are (a) btrfs, (b) failing that, probably ext4 all one big partition without LVM, (c) less-likely, maybe xfs all one big partition without LVM. This is being discussed in https://pagure.io/fedora-workstation/issue/152
We have a high number of complaints from developers running out of space on / with plenty of space left on /home (happens to me all the time). The opposite scenario is a problem too. Separate mountpoints by default is just not a good default, sorry. Ensuring users don't run out of space due to bad partitioning is more important than keeping /home during reinstall IMO. But with btrfs, then /home will just be a subvolume so we can have our cake and eat it too.
This can be mitigated with directory (project) quotas, btw.
On XFS, exceeding a directory tree quota even yields ENOSPC. (on ext4, it's EDQUOT right now.)
So one big / partition including /home, with a directory quota set on /home at 20G, will yield ENOSPC when home contains 20G and will not allow / to get filled with user files.
It's also trivial to adjust the directory quota on /home up or down, as needed.
It's another cake eating-and-having option which is a pretty trivial thing to implement.
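A sketch of the setup Eric describes (the 20G limit comes from the example above; the project name and id 42 are arbitrary; requires root and an XFS / mounted with the prjquota option):

```shell
# Map a project id to the directory tree, then a name to the id
echo '42:/home' >> /etc/projects
echo 'home:42'  >> /etc/projid

# Initialize the project and set a hard block limit; -x is expert mode
xfs_quota -x -c 'project -s home' /
xfs_quota -x -c 'limit -p bhard=20g home' /

# Adjusting the quota up or down later is one command
xfs_quota -x -c 'limit -p bhard=40g home' /
```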
-Eric
On Wed, Jul 1, 2020 at 5:06 PM Eric Sandeen sandeen@redhat.com wrote:
On 7/1/20 11:53 AM, Michael Catanzaro wrote:
On Wed, Jul 1, 2020 at 4:25 pm, Nicolas Mailhot via devel devel@lists.fedoraproject.org wrote:
Actually this split is a godsend because you can convince anaconda to leave your home alone when reinstalling, while someone always seems too invent a new Fedora change that justifies the reformatting of /.
Good luck dealing with user data the next time workstation (or any other group) feels the / filesystem should change, once you've put user data on the same mount point
So for the avoidance of doubt: if the btrfs change is rejected, we are almost certain to put everything on the same mount point. We haven't approved this yet, but odds are very high IMO. The options we are seriously considering for our default going forward are (a) btrfs, (b) failing that, probably ext4 all one big partition without LVM, (c) less-likely, maybe xfs all one big partition without LVM. This is being discussed in https://pagure.io/fedora-workstation/issue/152
This can be mitigated with directory (project) quotas, btw.
This does not solve the "Anaconda will blow away /home because it's technically part of /" problem, though. Btrfs subvolumes do.
Directory quotas only protect against space contention, and while Btrfs quotas do the same thing, we're deliberately not proposing setting those up because we want space allocation to be flexible.
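For concreteness, under the proposed layout both mountpoints come from a single Btrfs filesystem, selected by subvolume. A sketch of what the resulting /etc/fstab entries could look like (the UUID is a placeholder):

```
UUID=0000-PLACEHOLDER  /      btrfs  subvol=root  0 0
UUID=0000-PLACEHOLDER  /home  btrfs  subvol=home  0 0
```

Both entries name the same filesystem, so free space is shared, and a reinstall can recreate the root subvolume while leaving home untouched.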
On 7/1/20 4:08 PM, Neal Gompa wrote:
This does not solve the "Anaconda will blow away /home because it's technically part of /" problem, though. Btrfs subvolumes do.
Directory quotas only protect against space contention, and while Btrfs quotas do the same thing, we're deliberately not proposing setting those up because we want space allocation to be flexible.
I was not proposing directory quotas as any protection against mkfs of the root device, of course. Changing that behavior in Anaconda would be another rather minor change as well, i.e. the equivalent of "rm -rf /usr /var/ ..." instead of mkfs at reinstall time.
-Eric
On Wed, Jul 01, 2020 at 06:54:02AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
Making btrfs opt-in for F33 and (assuming the results go well) opt-out for F34 could be a good option. I know technically it is already opt-in, but it's not very visible or popular. We could make the btrfs option more prominent and ask people to pick it if they are ready to handle potential fallout.
I'm leaning towards recommending this as well. I feel like we don't have good data to make a decision on -- the work that Red Hat did previously when making a decision was 1) years ago and 2) server-focused, and the Facebook production usage is encouraging but also not the same use case. I'm particularly concerned about metadata corruption fragility as noted in the Usenix paper. (It'd be nice if we could do something about that!)
Given the number of Fedora desktop users, even an increase of 0.1% in now-I-can't-boot situations would be a catastrophe. Is that a risk? I literally don't know. Maybe it's not -- but we've worked hard to get Fedora a reputation of being problem-free and something that leads without being "bleeding edge". It's a tricky balance.
Normally we just switch the default or we don't, without half measures. But the fs is important enough and complicated enough to be extra careful about any transitions.
Exactly.
Maybe we could add an "Automatically configure with btrfs (experimental)" option to the Installation Destination screen, and then feature that in Fedora Magazine and schedule a number of test days?
To be clear, I'm not suggesting this as a blocking tactic. The assumption would be that we'd go ahead with flipping the defaults (as you say above) for F34 unless the results come back in a way that gives us pause.
On 1 July 2020 20:24:37 CEST, Matthew Miller mattdm@fedoraproject.org wrote:
Maybe we could add an "Automatically configure with btrfs (experimental)" option to the Installation Destination screen, and then feature that in Fedora Magazine and schedule a number of test days?
To be clear, I'm not suggesting this as a blocking tactic. The assumption would be that we'd go ahead with flipping the defaults (as you say above) for F34 unless the results come back in a way that gives us pause.
This is pretty much exactly how I would like this to happen. It has a schedule, so it doesn't just slip, while still being as cautious as one should be about fs changes. The only thing that would make it even better is a clear definition of what severity of problem is needed to not implement it as the default in F34, and what happens then. This is to avoid the inevitable discussion before F34. With this plan I have no problems.
I like this approach, a lot. I'm all in favour of switching to btrfs (I've been using it for a while, on server & desktop), and I think this would be a safe approach to do so.
Christopher
On 7/1/20 2:24 PM, Matthew Miller wrote:
I'm leaning towards recommending this as well. I feel like we don't have good data to make a decision on -- the work that Red Hat did previously when making a decision was 1) years ago and 2) server-focused, and the Facebook production usage is encouraging but also not the same use case. I'm particularly concerned about metadata corruption fragility as noted in the Usenix paper. (It'd be nice if we could do something about that!)
There's only so much we can do about this. I've sent up patches to ignore failed global trees, to allow users to more easily recover data in case of corruption in the global trees, but as they say, if only 1 bit is off in a node, we throw the whole node away. And throwing a node away means you lose access to any of its children, which could be a large chunk of the file system.
This sounds like a "wtf, why are you doing this btrfs?" sort of thing, but this is just the reality of using checksums. It's a checksum, not ECC. We don't know _which_ bits are fucked, we just know something's fucked, so we throw it all away. If you have RAID or DUP then we go read the other copy, and fix the broken copy if we find a good copy. If we don't, well then there's really nothing we can do.
As for their complaint about DIR_INDEX vs DIR_ITEM recovery, that's been around for a while now. A lot of these things have been added over the last year.
Another thing to keep in mind is that fsck is _very_ conservative for a reason. Its only job is to get the fs back to the point that it can be mounted; it has no knowledge of which data is important and which is not. So by default it doesn't do much, because we want the user to be able to use the rescue tools to pull off any data they can before they run repair. It's possible that fsck decides to delete problematic entries, and maybe those entries point to data you cared about.
I've stated this many times before: btrfs is more vulnerable to things going wrong. It's also more likely to notice things going wrong. There are things we can do to make it easier in the face of these issues; they're patches I've written and submitted in the last few days. There are bigger, more complex things that I can do to make us more resilient in the face of these corruptions. But even with all of the things I have in my head, I could still go do one or two things and render the file system unusable. Would these things happen in practice? Unlikely. Is it impossible? Unfortunately, no. Thanks,
Josef
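The detect-and-fall-back behavior Josef describes for RAID/DUP can be sketched in miniature: checksum each copy, and on a mismatch read the other copy and repair the bad one. This is a toy illustration of the idea, not Btrfs's actual on-disk logic:

```python
import zlib

def write_dup(block: bytes):
    """Store two copies of a block, each with its own CRC (the 'dup' idea)."""
    return [(bytearray(block), zlib.crc32(block)) for _ in range(2)]

def read_dup(copies):
    """Return good data; repair a corrupt copy from a good one if possible."""
    good = None
    for data, crc in copies:
        if zlib.crc32(bytes(data)) == crc:   # checksum says this copy is intact
            good = bytes(data)
            break
    if good is None:
        raise IOError("both copies corrupt: nothing we can do")
    for data, crc in copies:                 # "fix the broken copy" in place
        if zlib.crc32(bytes(data)) != crc:
            data[:] = good
    return good

copies = write_dup(b"btrfs metadata node")
copies[0][0][3] ^= 0xFF                      # simulate a flipped byte in copy 0
assert read_dup(copies) == b"btrfs metadata node"
assert bytes(copies[0][0]) == b"btrfs metadata node"   # copy 0 was repaired
```

With a single copy ("single" profile), the first loop finds no intact copy and the read fails, which is exactly the "we throw it all away" case above.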
On 7/1/20 3:50 PM, Josef Bacik wrote:
This sounds like a "wtf, why are you doing this btrfs?" sort of thing, but this is just the reality of using checksums. It's a checksum, not ECC.
Yes, exactly---why isn't it ECC? Wouldn't it work better, especially in the context of faulty hardware?
I do realize it would require changing the on-disk format, and maybe slow the critical path...
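To illustrate what ECC would buy: a checksum only detects corruption, while a code like Hamming(7,4) locates a single flipped bit, and so can fix it in place without a second copy. A toy sketch (nothing like Btrfs's actual format):

```python
import zlib

def hamming74_encode(d1, d2, d3, d4):
    """Encode 4 data bits into 7, each parity bit covering half the positions."""
    p1 = d1 ^ d2 ^ d4          # covers positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4          # covers positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4          # covers positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(code):
    p1, p2, d1, p3, d2, d3, d4 = code
    syndrome = (p1 ^ d1 ^ d2 ^ d4) \
             | (p2 ^ d1 ^ d3 ^ d4) << 1 \
             | (p3 ^ d2 ^ d3 ^ d4) << 2   # 0 = clean, else 1-based bad position
    if syndrome:
        code = code[:]
        code[syndrome - 1] ^= 1           # ECC: we know WHICH bit, so fix it
    return code[2], code[4], code[5], code[6]

code = hamming74_encode(1, 0, 1, 1)
code[4] ^= 1                              # flip one bit "in transit"
assert hamming74_decode(code) == (1, 0, 1, 1)    # corrected

# A checksum over the same bits only says "something's wrong", not where:
assert zlib.crc32(bytes(code)) != zlib.crc32(bytes(hamming74_encode(1, 0, 1, 1)))
```

The cost hinted at above is real: here 3 extra bits protect only 4 data bits, and computing the syndrome sits on the read path, which is part of why filesystems typically checksum and rely on redundant copies rather than ECC-encoding metadata.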
On Wed, Jul 1, 2020, at 9:03 PM, Przemek Klosowski via devel wrote:
Yes, exactly---why isn't it ECC? Wouldn't it work better, especially in the context of faulty hardware?
I do realize it would require changing the on-disk format, and maybe slow the critical path...
Or maybe make all metadata raid 1, even on single disk set up?
V/r, James Cassell
On Wed, Jul 1, 2020 at 9:27 PM James Cassell fedoraproject@cyberpear.com wrote:
Or maybe make all metadata raid 1, even on single disk set up?
Not that it isn't interesting, but what would be the mirror target on a single disk setup?
On Wed, Jul 1, 2020, at 9:43 PM, Neal Gompa wrote:
Or maybe make all metadata raid 1, even on single disk set up?
Not that isn't interesting, but what would be the mirror target on a single disk setup?
The idea is that the second copy of metadata on the same disk might be readable in case the first copy has a checksum error, in case of faulty hardware. I haven't tried it, but I'd gladly give up a little space for more robustness, especially if btrfs is sensitive to metadata corruption by the hardware. If btrfs demands a separate device for raid1 metadata, I wonder if a small 1G partition could be dedicated for purely mirrored metadata use.
V/r, James Cassell
On Wed, Jul 1, 2020 at 8:24 PM James Cassell fedoraproject@cyberpear.com wrote:
The idea is that the second copy of metadata on the same disk might be readable in case the first copy has a checksum error, in case of faulty hardware. I haven't tried it, but I'd gladly give up a little space for more robustness, especially if btrfs is sensitive to metadata corruption by the hardware. If btrfs demands a separate device for raid1 metadata, I wonder if a small 1G partition could be dedicated for purely mirrored metadata use.
This is called the 'dup' profile in Btrfs: two copies of a block group. It can be set on metadata only, or on both metadata and data block groups. It is the default mkfs option for HDDs. It is not enabled by default on SSDs because concurrent writes of metadata (i.e., both copies happen essentially at the same time) mean the data is likely to end up on the same erase block, and typical corruptions affect the whole block, so it's widely considered pointless to use dup on flash media. You can use it anyway, either with mkfs or by converting the block group from the single profile to dup. This is a safe procedure.
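The two routes just mentioned, sketched as commands (device and mountpoint names are placeholders; treat this as an outline of the procedure, not a recipe):

```shell
# At mkfs time: duplicate metadata, single data, on one device
mkfs.btrfs -m dup -d single /dev/sdX

# On an existing, mounted filesystem: convert metadata to dup via balance
btrfs balance start -mconvert=dup /mnt

# Verify which profiles are now in use
btrfs filesystem df /mnt
```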
On Wed, Jul 1, 2020 at 11:25 PM Chris Murphy lists@colorremedies.com wrote:
This is called 'dup' profile in Btrfs. Two copies of a block group.
Two copies of a block group ^on the same drive.
Chris Murphy wrote on Wed, Jul 01, 2020:
This is called the 'dup' profile in Btrfs: two copies of a block group. It can be set on metadata only, or on both metadata and data block groups. It is the default mkfs option for HDDs. It is not enabled by default on SSDs because concurrent writes of metadata (i.e., both copies happen essentially at the same time) mean the data is likely to end up on the same erase block, and typical corruptions affect the whole block, so it's widely considered pointless to use dup on flash media. You can use it anyway, either with mkfs or by converting the block group from the single profile to dup. This is a safe procedure.
Does anyone know if anything in the nvme spec says that creating two namespaces should or could prevent coalescing IO like this? Perhaps if the blocksize is different?
(this doesn't really help with the default setup case, but it could make sense to split the disk in two, with data single + metadata raid1 over an nvme namespace, for people who can be bothered to create one. Unfortunately nvme namespaces are rather messy and I don't think autopartitioning tools should mess with that, but having a raid just for metadata is one of btrfs' strengths so it's a shame to pass on it... Alternatively it would require something like async copyback of the second metadata copy, but that in itself has a lot of other problems and doesn't really look like an option)
Once upon a time, Josef Bacik josef@toxicpanda.com said:
This sounds like a "wtf, why are you doing this btrfs?" sort of thing, but this is just the reality of using checksums. It's a checksum, not ECC. We don't know _which_ bits are fucked, we just know something's fucked, so we throw it all away. If you have RAID or DUP then we go read the other copy, and fix the broken copy if we find a good copy. If we don't, well then there's really nothing we can do.
That's where an fsck and a lost+found-type directory should come into play. Maybe punt to user space, but still try to see what you can make sense of and salvage. If you are saying a single-bit error in the wrong place can basically lop off a good chunk of a filesystem, then I'm going to say that's not an improvement in reliability.