On Mon, Jun 29, 2020 at 7:55 AM Solomon Peachy <pizza(a)shaftnet.org> wrote:
> On Mon, Jun 29, 2020 at 11:33:40AM +0200, Florian Weimer wrote:
> > Just to be clear here, the choice of XFS here is purely based on
> > performance, not on the reliability of the file systems, right?
> > (So it's not “all the really important data is stored in XFS”.)
> Be careful about overloading quite a few definitions into the single
> term "reliability". You seem to be referring to btrfs features like
> file checksumming that can detect silent corruption, and automagically
> fix things if you've enabled the equally automagic RAID1-like
> features. (Which, for the record, I think are really frickin' awesome!)
>
> But what good is btrfs' attestation of file integrity when it craps
> itself to the point where it doesn't even know those files _exist_
> anymore?
You've got an example where 'btrfs restore' saw no files at all? And
you think it's the file system rather than the hardware, why?
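As an aside for readers following along: the checksum-plus-mirror behavior being described can be sketched in a few lines of Python. This is a toy model only (CRC-32 from zlib standing in for btrfs's crc32c/xxhash, a dict standing in for two devices; the helper names are made up), not btrfs's actual on-disk logic:

```python
import zlib

def make_mirrored_block(data: bytes):
    # Two copies of the block (RAID1-style) plus a checksum, loosely
    # analogous to btrfs data mirroring plus csum-tree entries.
    return {"csum": zlib.crc32(data),
            "copies": [bytearray(data), bytearray(data)]}

def read_with_self_heal(block):
    # Return the first copy whose checksum verifies; rewrite any copy
    # that fails verification with the known-good data ("self heal").
    good = None
    for copy in block["copies"]:
        if zlib.crc32(bytes(copy)) == block["csum"]:
            good = bytes(copy)
            break
    if good is None:
        raise IOError("all copies fail checksum: unrecoverable")
    for i, copy in enumerate(block["copies"]):
        if zlib.crc32(bytes(copy)) != block["csum"]:
            block["copies"][i] = bytearray(good)  # automatic repair
    return good

blk = make_mirrored_block(b"important data")
blk["copies"][0][0] ^= 0xFF            # silent corruption on one "device"
assert read_with_self_heal(blk) == b"important data"
assert bytes(blk["copies"][0]) == b"important data"  # corrupt copy healed
```

With a single copy (no RAID1-like profile) the checksum can still *detect* the corruption, but there is nothing to heal from — which is the distinction being drawn above.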
I think this is the wrong metaphor because it suggests btrfs caused
the crapping. The sequence is: btrfs does the right thing, the drive
firmware craps itself, and there's a power failure or a crash. Btrfs in
the ordinary case doesn't care and boots without complaint. In the far
less common case some critical node just happened to get nerfed and
there's no way to automatically recover. The user is left on an
island. This part should get better anyway, even though it can happen
with any file system.
And as a community we need the user-to-user support to make sure folks
aren't left on an island. Can we do that? That is the question, and it
really is a community question more than it is a technology question.
> How can we brag about robustness in the face of cosmic rays or
> recovery from the power cord getting yanked when it couldn't reliably
> _remount_ a lightly-used, cleanly unmounted filesystem?
Come on. It's cleanly unmounted and doesn't mount?
I guess you missed the other emails about dm-log-writes and xfstests,
but they directly relate here. Josef relayed that all of his deep
dives into Btrfs failures since the dm-log-writes work have been
traced back to hardware doing the wrong thing.
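For anyone unfamiliar with dm-log-writes: it records every write, along with flush/FUA markers, so a test harness can replay the log up to each ordering point and check that the filesystem is consistent there. The idea in miniature (hypothetical names and a toy two-byte "filesystem"; the real tool is a device-mapper target driven by xfstests, not this):

```python
# Miniature model of the dm-log-writes testing idea: capture a write
# log, then replay it and verify a consistency invariant at every
# flush barrier, since everything before a flush is guaranteed on-disk.
def replay_and_check(log, disk_size, check):
    disk = bytearray(disk_size)
    ok = True
    for entry in log:
        if entry[0] == "write":
            _, off, data = entry
            disk[off:off + len(data)] = data
        elif entry[0] == "flush":
            ok = ok and check(bytes(disk))
    return ok

# Toy "filesystem": byte 0 is a commit flag, bytes 1-2 are data.
# Invariant: if the commit flag is set, the data must already be "AB".
def check(disk):
    return disk[0] == 0 or disk[1:3] == b"AB"

good_log = [("write", 1, b"AB"), ("flush",), ("write", 0, b"\x01"), ("flush",)]
bad_log  = [("write", 0, b"\x01"), ("flush",), ("write", 1, b"AB"), ("flush",)]
assert replay_and_check(good_log, 4, check) is True   # correct ordering
assert replay_and_check(bad_log, 4, check) is False   # commit reordered first
```

The `bad_log` case is exactly what misbehaving firmware produces: the commit record becomes durable before the data it commits.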
All file systems have write ordering expectations. If the hardware
doesn't honor that, it's trouble if there's a crash. What you're
describing is 100% a "hardware crapped itself" case. You said it cleanly
unmounted, i.e. the exact correct write ordering did happen. And yet
the file system can't be mounted again. That's a hardware failure.
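To make the ordering point concrete, here is the classic application-level version of the same contract: a hypothetical `atomic_replace()` helper (name and code mine, POSIX semantics assumed) using the write-fsync-rename pattern. If storage reorders the rename ahead of the data flush and then power is lost, you get exactly the "file exists but contents are garbage" class of failure:

```python
import os
import tempfile

def atomic_replace(path: str, data: bytes):
    # Write new contents to a temp file, flush them to stable storage,
    # then rename over the old file. The rename must not become durable
    # before the data flush, or a crash can expose an empty/garbage file.
    d = os.path.dirname(os.path.abspath(path)) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        os.write(fd, data)
        os.fsync(fd)       # barrier: data must be durable first...
    finally:
        os.close(fd)
    os.replace(tmp, path)  # ...before the rename commits it
    dirfd = os.open(d, os.O_RDONLY)
    try:
        os.fsync(dirfd)    # make the rename itself durable (POSIX)
    finally:
        os.close(dirfd)
```

A filesystem's own commit machinery makes the same kind of assumption at the block layer via FLUSH/FUA requests; hardware that acknowledges those without honoring them breaks any journaling or copy-on-write filesystem, not just btrfs.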
> I realize this is several-years-out-of-date anecdata, but it's the sort
> of thing that has given btrfs (quite deservedly) a very bad reputation.
The frustration and skepticism are palpable. But here is the problem
with the road you're going down: you are arguing in favor of
closed-door development practices. Keep all the scary early development out
of public scrutiny, as a form of messaging control, so that reputation
isn't damaged by knowing about all the sausage making.
> The point here is not F32 vs F33 or whatever, but that of _time_ -- I
> don't think there's enough time between now and the F33 go/no-go point
> for folks like me to set up and sufficiently burn in F32 btrfs systems
> to gain confidence that btrfs is indeed ready. (In any case, the
> traditional beta period is _way_ too short for something like this!)
There is no way for one person to determine if Btrfs is ready. That's
done by a combination of synthetic tests (xfstests) and volume
regression testing on actual workloads. And by the way, the Red Hat CKI
project is going to help run btrfs xfstests for Fedora kernels.
The question, then, is whether the Fedora community wants, and is
ready for, Btrfs by default.