I've been very clear from the outset that Facebook's fault tolerance is much
higher than the average Fedora user's. The only reason I've agreed to assist in
answering questions and to support this proposal is that I have multi-year data
showing our failure rates are the same as those we see on every other file
system, which is basically the failure rate of the disks themselves.
And I specifically point out the hardware that we use that most closely reflects
the drives that an average Fedora user is going to have. We of course have a
very wide variety of hardware. In fact, the very first thing we deployed on was
these expensive hardware RAID setups. Btrfs found bugs in that firmware that
were silently corrupting data. The firmware had been corrupting AI test
data for years under XFS, and Btrfs found it in a matter of days because of our
We use all sorts of hardware, and have all sorts of stories like this.
I agree that the hardware is going to be muuuuuch more varied with Fedora users,
and that Facebook has muuuuch higher fault tolerance. But higher production
failures inside FB means more engineering time spent dealing with those
failures, which translates to lost productivity. If Btrfs were causing us to run
around fixing it all the time, then we wouldn't deploy it. The fact is that it's
not; it's perfectly stable from our perspective. Thanks,
Thanks for the details. Do you have any data/information/opinions on
non-x86 architectures such as aarch64/armv7/ppc64le, all of which have
supported desktops too?