* Steven Whitehouse:
On 27/06/2020 11:00, Florian Weimer wrote:
> * Josef Bacik:
>> As for your ENOSPC issue, I've made improvements on that area. I
>> see this in production as well, I have monitoring in place to deal
>> with the machine before it gets to this point. That being said if
>> you run the box out of metadata space things get tricky to fix.
>> I've been working my way down the list of issues in this area for
>> years, this last go around of patches I sent were in these corner
> Is there anything we need to do in userspace to improve the behavior
> of fflush and similar interfaces?
> This is not strictly a btrfs issue: Some of us are worried about
> scenarios where the write system call succeeds and the data never
> makes it to storage *without a catastrophic failure*. (I do not
> consider running out of disk space a catastrophic failure.) NFS
> apparently has this property, and you have to call fsync or close the
> descriptor to detect this. fsync is not desirable due to its
> performance impact.
It doesn't matter which filesystem you use, you can't be sure that the
data is really safe on disk without calling fsync. In the case of a
new inode, that means fsync on the file and on the containing
In my opinion, there is a conceptual difference between the machine or
storage crashing hard, and just running out of disk space.
There can be performance issues depending on how that is done, but
there are a number of solutions to those issues which can reduce the
performance effects to the point where they are usually no longer a
problem. That is with the caveat that slow storage will always be
slow, of course!
The usual tricks are to avoid doing lots of small fsyncs, by gathering
up smaller files, ideally sorting them into inode number order for
local filesystems, and then issuing fsyncs asynchronously, waiting for
them all only once all the fsyncs have been issued.
fadvise/madvise can also be useful in these situations.
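
A minimal sketch of the batching idea described above, written in Python
for brevity. The helper name fsync_batch and the max_workers parameter are
hypothetical, and a thread pool is used here only as a stand-in for
whatever asynchronous fsync mechanism is actually available. It assumes
the files have already been written and closed.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fsync_batch(paths, max_workers=8):
    """Flush a batch of already-written files to storage.

    Hypothetical helper: returns a dict mapping each path to None on
    success, or to the exception raised while syncing that file.
    """
    # Sort into inode number order, as suggested for local filesystems.
    ordered = sorted(paths, key=lambda p: os.stat(p).st_ino)

    def sync_one(path):
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every fsync first ...
        futures = [(p, pool.submit(sync_one, p)) for p in ordered]
        # ... and only then wait for all of them to finish.
        for path, fut in futures:
            results[path] = fut.exception()
    return results
```

A caller would collect the paths it has written, call fsync_batch once,
and treat any non-None entry in the result as a failed write.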
None of this applies to shell utilities such as grep and cat. They work
around data loss as a result of the write system call not reporting
ENOSPC errors: they close stdout and stderr underneath glibc, which
leads to a different class of problems. It turns out that on Linux,
close does more space checks than write, so this allows the shell
utilities to check for ENOSPC without issuing fsyncs. At present, lack
of space checks from write seems to primarily happen with NFS.
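
As a rough illustration of checking close rather than calling fsync, here
is a minimal Python sketch. The function names write_all and
write_then_close are hypothetical, and this does not reproduce what the
shell utilities do underneath glibc. It only shows the pattern of treating
an error from close as a failed write.

```python
import errno
import os


def write_all(fd, data):
    """Write all of data to fd, looping over short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_then_close(fd, data):
    """Write data, then close fd; return 0 or the first errno seen.

    The close is checked as well, because a deferred error such as
    ENOSPC may only be reported there and not by write.
    """
    err = 0
    try:
        write_all(fd, data)
    except OSError as exc:
        err = exc.errno
    try:
        os.close(fd)
    except OSError as exc:
        err = err or exc.errno
    return err
```

A utility following this pattern would exit with a nonzero status when
write_then_close returns anything other than 0, for example
errno.ENOSPC.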
So let me rephrase: Does btrfs report ENOSPC during write? If it does
not, what can we do to check for sufficient space during fflush and
If we change the shell utilities to do an fsync on close, we get
traditional UNIX behavior with traditional UNIX performance. I don't
think that's what people want.