On 6/29/20 2:23 PM, Eric Sandeen wrote:
> On 6/29/20 8:39 AM, Josef Bacik wrote:
>> On 6/29/20 5:33 AM, Florian Weimer wrote:
>>> * Josef Bacik:
>>>
>>>> That being said I can make btrfs look really stupid on some workloads.
>>>> There's going to be cases where Btrfs isn't awesome. We still use xfs
>>>> for all our storage related tiers (think databases). Performance is
>>>> always going to be workload dependent, and Btrfs has built in overhead
>>>> out the gate because of checksumming and the fact that we generate far
>>>> more metadata.
>>>
>>> Just to be clear here, the choice of XFS here is purely based on
>>> performance, not on the reliability of the file systems, right?
>>> (So it's not “all the really important data is stored in XFS”.)
>>>
>>
>> Yes that's correct. At our scale everything falls over, including XFS,
>> and as I've stated elsewhere in this thread we actually see a higher rate
>> of failure (relative to the install size) with XFS. The databases we use
>> already do all of the fancy things that btrfs does in the application.
>> If we could get away with it we'd just use raw disks for those
>> applications, and in fact may do that in the future. Thanks,
>
> Josef, with my XFS hat on, are these recent failures? Have they
> all been reported to the XFS list?
>
> It makes sense to look at reliability in the context of this thread, but
> offering "btrfs fails less often than XFS for us" without any context
> (what kind of failure, what kernel, when, etc) doesn't help much, it's
> just more anecdotes.
Yup, this is why I try to avoid talking about other file systems. This shouldn't
be interpreted as "XFS drools, btrfs rules!", just that in our own environment,
btrfs does not fail at a significantly higher rate than xfs.

XFS is used in completely different workloads, and with completely different
(much better) hardware.

And the reason they haven't been brought up to the list is that it fails at
such a low rate that I didn't even realize we were having xfs reprovisions until
I went and looked at the data. So far, of the 15 machines that fell over, 10 of
them appear to be hardware related. The other 5 have logs that are in a
different database that take longer to pull out. Thanks,
Josef