BTRFS: The Good, The Bad and The Ugly

Josef Bacik josef at toxicpanda.com
Wed Jul 13 21:14:05 UTC 2011


On Wed, Jul 13, 2011 at 4:53 PM, Manuel Escudero <Jmlevick at gmail.com> wrote:
> Today I'll be switching from BTRFS back to Ext4 because of the troubles
> I've been having with the new Linux filesystem. As BTRFS is going to be
> the default in F16, I wanted the developers to know what kind of troubles
> I've been experiencing with this FS in F15, so they can take a look at
> them in order to have a better F16 release:
> The Good:
> Since BTRFS arrived on my computer (everything on the HDD is formatted
> with BTRFS except "/boot") I've seen a performance improvement in data
> transfers to and from the computer (copying files seems to be faster than
> before). But that's about all the good things I noticed...
> The Bad:
> BTRFS has reduced the system's overall performance. At this point it is
> sometimes OK and sometimes VERY BAD; I've noticed performance spikes in
> F15 with BTRFS, and the boot times are not nice: I mean, they are not the
> slowest ones, but they're not as good as before in F14 with Ext4 instead
> of BTRFS.
> The performance running/launching apps has been affected too, and now the
> PC freezes sometimes (that never happened in F14 unless I really pushed
> it, with 4 VMs eating the 4GB of RAM I have); and now it freezes quite
> often on its own, without much effort.
> The Ugly:
> Running VMs with their virtual HDDs stored on a BTRFS partition is DEATH!
> They're very slow, sometimes they open, sometimes they don't, usually
> they freeze; you can't work with them. Same thing with GNOME Shell
> running on top of a BTRFS partition: it is really slow, sometimes it
> reacts but most of the time it is pretty unresponsive.
> Reading on the web, I found that some users think the poor BTRFS
> performance is caused by some special kind of fragmentation it suffers,
> others think it's because of its copy-on-write behaviour, and some others
> blame other stuff, God knows! The only thing I know is that BTRFS is not
> ready to be used on normal production machines (as I thought) and it
> needs to be fixed before the release of F16, because its performance is
> really far from good...
> Other stuff I noticed is that with kernel 2.6.38.8-35 the system seems to
> work better than with the previous one, just a little, but it is some
> kind of improvement.
> Here you have all the info I found on the net about BTRFS Performance
> issues noticed by users:
> https://bugzilla.redhat.com/show_bug.cgi?id=689127
> http://arosenfeld.wordpress.com/2010/12/27/back-to-ext4-from-btrfs/
> http://www.vyatta4people.org/btrfs-is-a-bad-choice-when-running-kvm/
> http://lkml.org/lkml/2010/7/13/475
> http://blog.patshead.com/2011/03/btrfs---six-months-later.html
> I only have one question:
> Why is any kind of VM sooo slow when stored on a BTRFS partition? Any way
> to solve this, or at least to get a BTRFS performance improvement?

Yeah, VMs are a particular problem with Btrfs.  There are a ton of
reasons for this; for example, by default we use fsync.  Fsync _sucks_
for btrfs currently, and it has historically not been a well optimized
piece of code.  I'm working on fixing this, but it requires VFS-level
changes that are currently sitting in Al's queue.  I suspect they will
go into 3.1 and then we can move ahead with our work, but for now, it
sucks.  You can use cache=none to get better performance, but still
not that great.  And this is all because of one major thing:

Btrfs has threads for _everything_.  This works out fantastically when
you have big chunks of reads or writes you want done.  It _sucks_ when
you are doing little piddly IOs.  The reason for all of this is that we
don't want you to get bottlenecked on us calculating/verifying
checksums, so we farm all IO submission and endio work out to different
threads, which, as I said, works out great if you are trying to cram
gigs of data down your drive's throat.

But with VMs you are doing small, scattered IOs, so the IO comes down,
we prepare it, farm it off to a thread, and wait for that thread to
wake up and submit the IO.  Then the IO completes and the completion is
farmed off to another thread and we wait on that.  This switching
around and waiting for things to wake up is hugely painful when all
you want to do is write a few bytes.  If you were to do

dd if=/dev/zero of=/mnt/btrfs/file bs=4k count=100 oflag=direct

on a btrfs fs and then do the same on an ext4 fs, you would see about
a 20% difference between the two.  But if you use, say, bs=20M, the
gap closes quite a bit.
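
For comparison, and assuming an ext4 filesystem mounted at a
hypothetical /mnt/ext4, the large block size version of the same test
would look something like this (the exact numbers will of course vary
with your hardware):

dd if=/dev/zero of=/mnt/btrfs/file bs=20M count=100 oflag=direct
dd if=/dev/zero of=/mnt/ext4/file bs=20M count=100 oflag=direct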

I fixed part of this problem for O_DIRECT (which is what cache=none
uses with qemu): if the IOs are small we don't send them off to a
thread but submit them within the submitting thread's context, which
is what got us to within 20% of ext4 as opposed to 50%.  The other
half is doing the completion in the submitter's context, which is
going to take some extra work.  I'm fixing this in the fsync case as
well, but as I said we need a VFS patch to do it properly, so that
will come a little later.  After that I can do the endio part of it
and hopefully get us within spitting distance of ext4.
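
For reference, with qemu cache=none is set on the -drive option; here
is a minimal sketch, assuming a hypothetical raw image path (libvirt
exposes the same setting as cache='none' on the disk's <driver>
element):

qemu-kvm -m 1024 \
    -drive file=/var/lib/libvirt/images/guest.img,if=virtio,cache=none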

So there's my long-ass explanation of why VMs on Btrfs suck.  I'm
sorry; I'm aware of the problem and I'm trying to fix it, but it's a
slow-going process.

As for your other spikes, can you test an upstream kernel?  I've made
various other performance changes to try to get rid of those problems
and would like to know whether they help.  If the spikes last long
enough, a sysrq+w would be very helpful in seeing what is going on, so
we can try to address the problems you are seeing.
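
If it helps, sysrq+w can be triggered from a shell (as root) while a
spike is happening, assuming sysrq is enabled, and the blocked task
traces then show up in dmesg:

echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 200

Thanks,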

Josef

