On 8/23/19 1:10 PM, Neal Gompa wrote:
On Fri, Aug 23, 2019 at 3:48 PM Justin Forbes jmforbes@linuxtx.org wrote:
On Fri, Aug 23, 2019 at 2:17 PM Adam Williamson adamwill@fedoraproject.org wrote:
Hey folks!
So, there was recently a Thing where btrfs installs were broken, and this got accepted as a release blocker:
https://bugzilla.redhat.com/show_bug.cgi?id=1733388
The bug was fixed, so that's fine, but along the way, Laura said this:
"I'm strongly against anything with btrfs being a blocker. If that's in the criteria I think we should see about removing btrfs simply because we don't have the resources to actually deal with btrfs besides reporting bugs upstream."
and Justin followed up with:
"Agreed, btrfs has been a gamble pretty much always. See previous discussion around proposals to make btrfs default. Ext4 and xfs should be the only release blocking."
So, that's the whole kernel team 'strongly against' blocking on btrfs. Which means we should talk about not doing that any more!
This is a bit complicated, though, because of how the Final criteria are phrased. Basic does not include btrfs at all, and Beta includes a laundry list we can just remove btrfs from:
"When using both the installer-native and the blivet-gui-based custom partitioning flow, the installer must be able to:
- Correctly interpret, and modify as described below, any disk with a valid ms-dos or gpt disk label and partition table containing ext4 partitions, LVM and/or btrfs volumes, and/or software RAID arrays at RAID levels 0, 1 and 5 containing ext4 partitions
- Create mount points backed by ext4 partitions, LVM volumes or btrfs volumes, or software RAID arrays at RAID levels 0, 1 and 5 containing ext4 partitions ..."
so those two are easy. However, the Final criterion is not laundry list-style. The relevant Final criterion is this:
"The installer must be able to create and install to any workable partition layout using any file system and/or container format combination offered in a default installer configuration."
with a somewhat apologetic explanatory footnote:
"Wait, what? Yeah, we know. This is a huge catch-all criterion and it's subject to a lot of on-the-fly interpretation. Broadly what it's 'meant to mean' is that you should be able to do anything sane that the Installation Destination spoke attempts to let you do, without the installer exploding or failing. We are trying to write more specific criteria covering this area, but it's not easy. Patches welcome, as the kids say..."
so as the footnote says, the rule is basically "if the installer lets you do it, it ought to work". It seems a bit awkward to craft an exception for btrfs from that. I mean, technically it's easy:
"The installer must be able to create and install to any workable partition layout using any file system and/or container format combination offered in a default installer configuration, except btrfs."
but that's odd. Why is btrfs, alone, an exception? It kinda goes against the fundamental idea of the criterion: that we stand behind everything the UI offers.
All of this, the criteria, and the UI support for btrfs date from the years-old proposal to make btrfs the default filesystem. In the beginning, it was not ready, but did show promise. This proposal came up for several releases in a row, and at the end of it, even the upstream developers recommended against it. At this point, it is safe to say that btrfs will not be the default. Since that time, things have not gotten better. Yes, there is active btrfs development upstream. It is fairly narrowly focused, and not something we can rely upon for a supported default among the Fedora use cases.
Getting btrfs in Fedora to be in a state where it *could* be the default is something I am working towards. However, it is *very* hard when people keep shutting down discussions that I try to have about enablement related to it. The situation with btrfs today is many orders of magnitude better than before, and yet I've mostly been improving Btrfs support in Fedora in tiny ways because the bigger things to do (improving kickstart, Anaconda, etc.) are impossible due to how difficult it is to contribute to those projects.
The *only* remaining "major" issue in Btrfs itself is the RAID 5/6 feature, which does not provide write hole protection without additional work (similar to mdraid). There was some work last year by David Sterba to rework the RAID code for the SUSE Hackweek 17, but it has not been completed yet. Some work was done again to try to land this for the 5.3 cycle, but some last-minute issues got that postponed. It's definitely on the radar to fix, though.
I've been watching and using Btrfs since May of 2015, and the development has drastically improved. I know for a fact no one has asked the upstream developers in at least the last two years, because I've been getting "cautionary" recommendations that it'd be okay to do so since early last year, and last week I got much more enthusiastic responses when I met some of the Facebook folks (like David, whom I've CC'd on this email).
The question I have is what things do we need to target to make Btrfs better for Fedora? I've already got some work done for boot to snapshot support, and I'm looking at how to adapt that for the BLS work being done today.
We need a person to respond to bug reports and deal with them in a timely fashion.
While Fedora does enable it in the kernel, and plans to continue doing so, it is enabled in the same "if you break it, you get to keep the pieces" manner as many other options. Sure, we will be happy to bring in a patch that is headed upstream if it fixes a bug, and someone points us to it. No, we aren't going to spend time debugging issues with it ourselves. There is no shortage of issues in more "core" kernel pieces that require attention.
Would it help if we could get a kernel engineer who works on Btrfs to join the Fedora kernel team to help with Btrfs-specific issues? If the issue is the inability to work on that code, then getting one who does would help, right?
I don't think we need someone to join the team per se. All we need is someone who we can assign bugs to and have them work through the issues, whether that's development or working with upstream to test. We have a fedora-btrfs bug alias and we can add whoever we want on here.
I'm okay with keeping btrfs alive if there's enough of a community who is willing to actually fix bugs and work through the issues. We do this with other parts of the kernel too.
So...what should we do? Here are the options as I see 'em:
1. Keep supporting btrfs
2. Just modify the criterion with a btrfs exception, even if it's weird
3. Rewrite the criterion entirely
4. Keep btrfs support in the installer (and blivet-gui) but hide it as we used to - require a special boot argument for it to be visible
5. Drop btrfs support from the installer
I would opt for 4 or 5, and would be in full support of 5. I do not think that it can (or should) be dropped from the kernel, because we don't want to cut off existing users, and it can still be a useful filesystem for specific cases.
I would prefer option 1, provided that people stop shutting down discussions for improving support for it. As the btrfs-progs maintainer, I'm also trying to improve the quality of btrfs support in Fedora. Even if I'm mostly doing it alone right now, I'm hoping to have that change soon.
I think 3-5 are the best options right now with a focus on having btrfs be available but not "supported". If we had a group of people who were willing to actively debug issues like the one Adam reported, I'd be okay with #1.
Thanks, Laura