On 6/26/20 2:58 PM, James Szinger wrote:
On Fri, 26 Jun 2020 12:30:02 -0500
Chris Adams <linux(a)cmadams.net> wrote:
> So... I freely admit I have not looked closely at btrfs in some time,
> so I could be out of date (and my apologies if so). One issue that I
> have seen mentioned as an issue within the last week is still the
> problem of running out of space when it still looks like there's
> space free. I didn't read the responses, so not sure of the
> resolution, but I remember that being a "thing" with btrfs. Is that
> still the case? What are the causes, and if so, how can we keep from
> getting a lot of the same question on mailing lists/forums/etc.?
Yes, it happened to me last week. The workstation has been upgraded
since F25 and is now at F31. A yum update last week ran a restorecon
-r /, which filled up the filesystem, and RAM and swap along with it.
The 460 GB filesystem had about 140 GB of real data, 100 GB of bloat
from underfull block groups, and the rest (roughly 200 GB) was
metadata. I had to boot from a live USB and run btrfs balance to free
up the bloat. I expect to reformat it to ext4 when the quarantine is
over.
This is my last BTRFS filesystem. One was on a laptop hard disk that
was painfully slow, especially when compared with its ext4 twin
sitting next to it. It was reformatted to ext4. I also had a BTRFS
RAID 0 hard disk array. It was also slow and also ended up needing
rescue. I converted it over to xfs on MD raid and it's been faster
and perfectly reliable ever since.
While I like subvolumes and snapshots, I find the maintenance,
reliability, and performance overhead to be not worth it.
Generally speaking, btrfs performance has been the same if not better for our
workloads, and this is across millions of boxes with thousands of different
workloads. That being said, I can make btrfs look really stupid on some
workloads. There are going to be cases where Btrfs isn't awesome. We still use
xfs for all our storage-related tiers (think databases). Performance is always
going to be workload dependent, and Btrfs has built-in overhead out of the gate
because of checksumming and the fact that we generate far more metadata.
As for your ENOSPC issue, I've made improvements in that area. I see this in
production as well, but I have monitoring in place to deal with a machine before
it gets to this point. That being said, if you run the box out of metadata
space, things get tricky to fix. I've been working my way down the list of
issues in this area for years; the last round of patches I sent targeted these
corner cases.
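The "deal with the machine before it gets to this point" part can be sketched in plain shell: btrfs can only create new metadata chunks out of unallocated device space, so a simple check is to alert while some unallocated space is still left. The threshold and the sample numbers below are invented for illustration, not the poster's actual tooling; on a real box the here-doc would be replaced by the live `btrfs filesystem usage -b <mount>` output:

```shell
#!/bin/sh
# Sample output in the shape of the "Overall:" section printed by
# `btrfs filesystem usage -b <mount>`. Numbers are invented for
# illustration -- replace the here-doc with the live command.
usage_sample() {
cat <<'EOF'
Overall:
    Device size:                 500107862016
    Device allocated:            494780232704
    Device unallocated:            5327629312
    Device missing:                         0
    Used:                        160000000000
EOF
}

# Warn while there is still unallocated space for new metadata chunks;
# once this reaches zero, recovery gets much harder. The 5% threshold
# is an arbitrary example.
check_unallocated() {
    usage_sample | awk '
        /Device size:/        { size = $3 }
        /Device unallocated:/ { unalloc = $3 }
        END {
            pct = 100 * unalloc / size
            if (pct < 5)
                printf "WARN: only %.1f%% unallocated -- balance or grow soon\n", pct
            else
                printf "OK: %.1f%% unallocated\n", pct
        }'
}
check_unallocated
```

The point of alerting on the unallocated pool rather than on plain df output is that df can show free space even when every byte of the device is already allocated to data chunks, which is precisely the state where metadata ENOSPC strikes.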
I described this case to the working group last week, because it hit us in
production this winter. Somebody screwed up and suddenly pushed two extra
copies of the whole website to everybody's VM. The website is mostly metadata,
because of the inline extents, so it exhausted everybody's metadata space. Tens
of thousands of machines were affected. Of those, I had to hand-boot and run
balance on ~20 of them to get them back; the rest could run balance from the
automation and recover cleanly.
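For readers tuning for this failure mode: the inline-extent behavior described above is controlled by the max_inline mount option, which sets the largest file that gets stored directly in the metadata b-tree (the default is 2048 bytes on recent kernels). A hypothetical fstab entry, with placeholder UUID and mount point, that disables inlining entirely, trading a little space efficiency for less metadata pressure:

```
# Hypothetical /etc/fstab entry -- UUID and mount point are placeholders.
# max_inline=0 stores even tiny files as regular data extents instead of
# inlining them into the metadata b-tree.
UUID=xxxx-xxxx  /  btrfs  defaults,max_inline=0  0 0
```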
It's a shit user experience, and it's a shitty corner case that still needs work.
It's a top priority of mine. Thanks,