phoronix benchmarks ext4 vs. btrfs

David Quigley selinux at davequigley.com
Fri Mar 9 16:09:11 UTC 2012


On 03/09/2012 11:00, Josef Bacik wrote:
> On Fri, Mar 9, 2012 at 10:11 AM, David Quigley
> <selinux at davequigley.com> wrote:
>> On 03/09/2012 08:42, Przemek Klosowski wrote:
>>>
>>> On 03/09/2012 01:43 AM, Adam Williamson wrote:
>>>>
>>>> On Thu, 2012-03-08 at 22:19 -0700, Chris Murphy wrote:
>>>>>
>>>>> I'm not sure how useful 'time' is as a benchmark for file copies.
>>>>
>>>>
>>>> Don't file transfers get cached and return to a console as 
>>>> 'complete'
>>>> long before the data is ever written, sometimes?
>>>>
>>>> I'm pretty sure you sometimes hit the case where you copy 200MB to 
>>>> a USB
>>>> stick, it returns to the console pretty fast, but the light on the 
>>>> stick
>>>> is still flashing, and if you run 'sync', it sits there for quite 
>>>> a
>>>> while before returning to the console, indicating the transfer 
>>>> wasn't
>>>> really complete. So I'm not sure 'time'ing a 'cp' is an accurate 
>>>> test of
>>>> actual final-write-to-device.
>>>
>>>
>>> That is true---but in that case, we could flush the disks and then
>>> time the operation, followed by another flush, i.e.:
>>>
>>> sync; time (cp ...; sync)
>>>
>>> I assume that the old-time Unix superstition of calling sync three
>>> times no longer applies :)
>>>
>>> Perhaps a dedicated disk benchmark like bonnie++ would be a better
>>> test, though.
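
(Spelling that suggestion out, with made-up mount points and a made-up
data set, the comparison would look something like:

   # flush dirty data first, then time the copy plus a final sync so
   # the measurement includes the write actually reaching the device
   sync; time ( cp -a testdata /mnt/ext4/  ; sync )
   sync; time ( cp -a testdata /mnt/btrfs/ ; sync )

/mnt/ext4, /mnt/btrfs and testdata are only placeholders here.)
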
>>
>>
>>
>> If you want to look seriously into file-system benchmarking, I would
>> suggest looking at the work done by the fsbench people at Stony Brook
>> University's Filesystem and Storage Lab (FSL). There is a survey paper
>> there covering the last decade of FS benchmarks, their shortcomings,
>> and what should be addressed.
>>
>>
>> http://www.fsl.cs.sunysb.edu/project-fsbench.html
>>
>
> fsbench is amazing, I also use fio and fs_mark to test various 
> things.
>  But these are artificial workloads!  These numbers don't mean a
> damned thing to anybody, the only way you know if a fs is going to
> work for you is if you run your application on a couple of fses and
> figure out which one is faster for you!  For example if you mostly
> compile kernels, btrfs is fastest.  However if you mostly use a fs 
> for
> your virt images, don't use btrfs!  It's all a matter of workloads 
> and
> no amount of benchmarking is going to be able to tell you if your pet
> workload is going to work well at all.
>
> The work that we file system developers do with benchmarking is to
> stress particular areas of our respective filesystems.  For example,
> with Dave's tests he was testing our ability to scale as the amount 
> of
> metadata gets ridiculously huge.  He has exposed real problems that 
> we
> are working on fixing.  However these real problems are things that I
> imagine 99% of you will never run into, and therefore should not be
> what you use to base your decisions on.
>
> So let's try to remember that benchmarks mean next to nothing to real
> users, unless watching iozone output happens to be what you use your
> computer for.  Thanks,
>
> Josef


True, fsbench can be used for micro-benchmarking, but if you read the 
paper on that page it also goes over the benchmarking suites that are 
supposed to provide real-world workloads as well. Copying files isn't 
much more complex than a couple of micro-benchmarks; it really only 
tests read/write performance. If you want to do real performance 
testing then, like you said, you need to be running real-world 
workloads. The database benchmarks in the paper cover some of them, 
but the paper also provides criticism about the nature of those 
workloads.

The cool thing about the projects on those pages is that they let you 
capture a workload and replay it on different filesystems. You can 
hope you get the same workload twice across two runs, or you can 
capture the workload with tracefs and replay it with replayfs. This 
does introduce some overhead, since these are stackable filesystems 
and add an additional thin VFS-like layer to your analysis, but if 
both filesystems are measured with that layer in place you can factor 
the overhead out and get performance data for each filesystem 
individually.
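
To put that last point in rough symbols (mine, not from the FSL 
papers): if the replay takes T1 on one filesystem and T2 on the other, 
both measured through the same stackable layer whose cost on that 
workload is roughly a constant C, then the figures worth comparing are

   T1 - C   versus   T2 - C

C has to be estimated separately, for example by running the same 
workload on one filesystem with and without an extra pass-through 
layer in the stack, and treating it as the same on both filesystems is 
itself an assumption.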

Dave

