On Sat, Jun 27, 2020 at 7:58 AM Peter Robinson <pbrobinson(a)gmail.com> wrote:
> I've been very clear from the outset that Facebook's fault tolerance is
> higher than the average Fedora user. The only reason I've agreed to assist in
> answering questions and support this proposal is because I have multi-year data
> that shows our failure rates are the same that we see on every other file
> system, which is basically the failure rate of the disks themselves.
> And I specifically point out the hardware that we use that most closely reflects
> the drives that an average Fedora user is going to have. We of course have a
> very wide variety of hardware. In fact the very first thing we deployed on were
> these expensive hardware RAID setups. Btrfs found bugs in that firmware that
> was silently corrupting data. These corruptions had been corrupting AI test
> data for years under XFS, and Btrfs found it in a matter of days because of our
> checksumming. We use all sorts of hardware, and have all sorts of similar stories like this.
> I agree that the hardware is going to be muuuuuch more varied with Fedora users,
> and that Facebook has muuuuch higher fault tolerance. But higher production
> failures inside FB means more engineering time spent dealing with those
> failures, which translates to lost productivity. If btrfs was causing us to run
> around fixing it all the time then we wouldn't deploy it. The fact is that it's
> not; it's perfectly stable from our perspective. Thanks,
Thanks for the details. Do you have any data/information/opinions on
non-x86 architectures such as aarch64/armv7/ppc64le, all of which have
supported desktops too?
Sample size of 1: a Raspberry Pi Zero running Arch for ~a year. I use
the mount option -o compress=zstd:1. I haven't benchmarked it; it's a
Pi Zero, so it's slow no matter what file system is used. But
anecdotally I can't tell enough of a difference to even speculate.
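For reference, here is roughly what that option looks like in an fstab
line (a sketch only; the UUID and subvolume name below are
illustrative, not from my actual setup):

```shell
# /etc/fstab entry enabling light zstd compression (level 1).
# The UUID and subvol name are placeholders - substitute your own.
UUID=1234abcd-0000-0000-0000-000000000000  /  btrfs  compress=zstd:1,subvol=root  0 0
```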
The output below is a bit verbose, but the takeaway is that, at least
for /usr, I'm saving about 41% in both space and writes.
$ sudo compsize /usr
Processed 48038 files, 28473 regular extents (28757 refs), 25825 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       59%         879M         1.4G         1.4G
none       100%         435M         435M         435M
lzo         54%         153M         281M         287M
zstd        37%         289M         767M         786M
I could instead selectively compress just certain directories or
files, using an XATTR (there is a btrfs command for setting it).
Compression can also be applied after the fact by defragmenting with a
compression option.
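A minimal sketch of both approaches, using 'btrfs property set' for
the XATTR and 'btrfs filesystem defragment -c' for recompressing
existing data (the path is illustrative):

```shell
# Set the compression property on a directory; new writes beneath it
# will be compressed with zstd. Path is just an example.
$ sudo btrfs property set /usr/share compression zstd

# Recompress data that is already on disk by defragmenting
# recursively with a compression option.
$ sudo btrfs filesystem defragment -r -czstd /usr/share
```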
I think the reduction in write amplification in this use case is
significant because SD cards are just so impressively terrible. I have
only ever seen them return garbage rather than have the device itself
admit a read error (UNC), and btrfs will catch that. I would
seriously only ever use btrfs for this. I might consider another
file system if I were using industrial SD cards, but *shrug* in that
case I'd probably spend a bit more time benchmarking to see
if I can eke out a bit more read performance from lzo or zstd:1
due to a reduction in IO latency, because SLC is going to be slower
than TLC or anything else.
I don't know much about eMMC media, but if it's a permanent resident
on the board, all the more reason I'd use btrfs and compress
everything. I *might* even consider changing the compression level to
something more aggressive for updates because the performance
limitation isn't the compression hit, but rather the internet
bandwidth. This is as simple as 'mount -o remount,compress=zstd:9 /',
then doing the update - and upon reboot it's back to zstd:1 or whatever
is in fstab/the systemd mount unit. A future feature might be to add a
level to the existing XATTR method of setting compression per directory
or per file, so you could indicate things like "always use heavier
compression" for specific directories.
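Sketched as a sequence (the package manager shown is illustrative;
any update tool works the same way, since the remount only changes
the compression level used for new writes):

```shell
# Temporarily raise the compression level for the duration of the update.
$ sudo mount -o remount,compress=zstd:9 /

# Run the update while heavy compression is in effect; downloaded
# packages and installed files get written at zstd:9.
$ sudo dnf upgrade

# Drop back to the everyday level from fstab (or just reboot).
$ sudo mount -o remount,compress=zstd:1 /
```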