On Tue, Jun 23, 2020 at 5:35 PM Chris Murphy
<chrismurphy(a)fedoraproject.org> wrote:
> On Tue, Jun 23, 2020 at 3:56 PM Neal Gompa <ngompa13(a)gmail.com> wrote:
>
> I personally would throw out the datacenter numbers.
The use case is different. But what they represent is sheer volume. No company is going
to put up with intrinsic problems with a file system at this scale. It's hundreds of
thousands of btrfs instances, and millions of containers. If there were a per se problem,
it would be a shit show.
Consider openSUSE, which has been using it for six years. And even in Fedora, the
experience of btrfs users at a slightly smaller scale is positive.
> Why?
> Through your whole email you keep saying "except for hardware failures"
> Datacenters tend to have lifetimes. Meaning that if a machine/disk is
> too old it gets replaced. Or at least put in the "when this dies it
> gets replaced" pile.
> Datacenters are also much more uniform. If you have 1000 machines, the
> odds are that at most you have 10 to 20 varieties of machines and/or
> hard drives. At the datacenter I last worked at, if you had 1000
> machines, you had 1 variety.
Yes. Although they report using consumer hardware that they know is very ordinary if not
below average. Quite a lot of datacenters don't do that.
> What Fedora, and especially desktop Fedora, has is variety. Variety
> on new, medium, and old machines.
> How many people on this list often get hand-me-down machines from
> Windows users, and we throw Fedora on them and they work great?
> Why do they work great?
> Because most of those hardware errors, especially disk errors, are
> recoverable or not even noticed.
You've got two basic kinds: silent data corruption, and the case of unrecoverable
read (or write) error reported by the drive itself. The latter should always result in an
i/o error, doesn't matter what the file system is. And there's no recovery for any
file system unless there's redundancy.
For silent data corruption, you don't actually know it's benign, you just know
it's getting to user space. What happens depends on the corruption and the
application. It could result in a crash, or application confusion that leads to more
corruption, or maybe nothing. If it's your backup application, you're replicating
corruption into your backups silently.
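To make the silent corruption case concrete, here is a rough Python sketch of the general idea behind checksumming: store a checksum when the data is written, compare it on read, and fail with an I/O error instead of handing bad bytes to user space. This is an illustration only, not how btrfs is implemented. The function names are made up for the example, and `zlib.crc32` is plain CRC-32, whereas the btrfs default is crc32c.

```python
import errno
import zlib


def store_block(data: bytes) -> tuple[bytes, int]:
    """Pretend to write a block: keep the data together with its checksum."""
    return data, zlib.crc32(data)


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Simulate silent corruption by flipping one bit in the block."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


def verified_read(data: bytes, expected_crc: int) -> bytes:
    """Return the data only if it still matches the stored checksum.

    On a mismatch, raise an I/O error rather than returning bad bytes.
    """
    if zlib.crc32(data) != expected_crc:
        raise OSError(errno.EIO, "checksum mismatch, refusing to return data")
    return data
```

Without the check in `verified_read`, the flipped bit would flow straight into whatever reads the block, including a backup job.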
It is correct that Fedora will have to assess the relative importance of catching such
corruptions early or letting them do whatever it is that they do. Btrfs, upon detecting
data corruption, returns an i/o error, which should be properly handled by applications
anyway. And the path to the affected file is reported in kernel messages.
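As a sketch of what "properly handled by applications" can look like, the Python below reads a file and treats EIO as a reportable failure instead of crashing or carrying on with partial data. The function name and the message text are made up for the example. It assumes the error surfaces as `OSError` with `errno.EIO`, which is how Python reports a failed read(2).

```python
import errno


def read_file_or_report(path, chunk_size=1 << 20):
    """Read a whole file.

    Returns (data, None) on success, or (None, message) if the kernel
    reported an I/O error. Other errors (missing file, permissions)
    are left to propagate to the caller.
    """
    chunks = []
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                chunks.append(chunk)
    except OSError as e:
        if e.errno == errno.EIO:
            # Discard anything read so far; partial data is not trustworthy.
            return None, f"I/O error reading {path}; check kernel messages"
        raise
    return b"".join(chunks), None
```

A backup tool built this way would skip or flag the file and keep the previous good copy, instead of replicating the damage.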
> Did you count how many times you said "unless there's hardware
> problems" in your email?
> That does not give me confidence.
In the hardware? Or the file system? Because just ignoring the reality of this class of
corruption is itself a choice that has consequences. And a significant consequence is, you
just don't know about it. How do you have confidence about events that you don't
know are occurring or what's happening as a result?
Thank you for the clarifications and explanations. That helped calm me down.
I will admit, I'm still nervous, and will try it out on machines I can
easily wipe clean again. But I did that when XFS was becoming default
as well.
I still hope that someone will talk to whoever the right person is to
get btrfs back into RHEL.
Troy