On Fri, Jun 26, 2020 at 3:44 PM Matthew Miller <mattdm(a)fedoraproject.org> wrote:
> On Fri, Jun 26, 2020 at 03:22:07PM -0400, Josef Bacik wrote:
> > I described this case to the working group last week, because it hit
> > us in production this winter. Somebody screwed up and suddenly
> > pushed 2 extra copies of the whole website to everybody's VM. The
> > website is mostly metadata, because of the inline extents, so it
> > exhausted everybody's metadata space. Tens of thousands of machines
> > affected. Of those machines I had to hand boot and run balance on
> > ~20 of them to get them back. The rest could run balance from the
> > automation and recover cleanly.
> Is there a way to mitigate this by reserving space or setting quotas? Users
> running out of space on their laptops because:
> * they downloaded a lot of media
> * they created huge vms
> * some sort of horrible log thing gone awry
> are pretty common in both a) my anecdotal experience helping people
> professionally and personally and b) um, me.
Real out-of-space can happen on any file system. Bogus ENOSPC on btrfs,
from edge cases hitting bugs, is less common than real ENOSPC caused by
the current partitioning arrangement, where /home and / compete for
free space. That competition won't exist with btrfs, so I expect a net
reduction in out-of-space conditions as a result of the change.
Btrfs keeps a reserve so that if you do reach a real out-of-space
condition, the file system (a) stays read-write and (b) can be backed
out of the full condition by deleting files to free up space. Edge
cases where this doesn't work are bugs, and there are some non-obvious
ways to back out of it if someone does hit one.
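
For the ordinary "disk really is filling up" case, a minimal sketch of an
early-warning check, assuming Python is available. None of this is from the
thread: the function names and the 10% threshold are made up for
illustration. shutil.disk_usage reports statvfs-style numbers, which on
btrfs are estimates and do not separate data from metadata, so
`btrfs filesystem usage` is still the place to look for the real picture.

```python
import shutil


def is_low_on_space(total_bytes, free_bytes, min_free_fraction=0.10):
    """Return True when free space is below min_free_fraction of the total.

    The 10% default is an arbitrary illustrative threshold, not a btrfs rule.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_path(path="/", min_free_fraction=0.10):
    """Check the file system holding `path` (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_low_on_space(usage.total, usage.free, min_free_fraction)


if __name__ == "__main__":
    print("low on space:", check_path("/"))
```

Something like this could run from cron or a login hook and nag the user
before the file system is actually full. It does nothing for the metadata
exhaustion case Josef described.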
The old stories on #btrfs and linux-btrfs@ do include cases of a file
system that goes read-only, can't be remounted read-write, and you
have to back up -> reformat -> restore. And that is a PITA. But also not