[ANNOUNCE] bfq disk I/O scheduler

Paolo Valente posta_paolo at yahoo.it
Wed Aug 4 23:14:25 UTC 2010


Vivek Goyal wrote:
> On Wed, Aug 04, 2010 at 02:44:21PM -0400, Ric Wheeler wrote:
>   
>> On 08/04/2010 01:58 PM, Paolo Valente wrote:
>>     
>>> Hi,
>>> I have been working for a few years (with Fabio Checconi) on a disk
>>> scheduler that provides considerably lower latencies than cfq, as well
>>> as higher throughput with most of the test workloads we used (or the
>>> same throughput as cfq with the other workloads). We named this
>>> scheduler bfq (budget fair queueing). I hope this is the right list
>>> for announcing this work.
>>>
>>> One of the things we measured in our tests is the cold-cache execution
>>> time of a command such as "bash -c exit", "xterm /bin/true" or
>>> "konsole -e /bin/true", while the disk was also being accessed by
>>> different combinations of sequential or random readers and/or writers.
>>> Depending on which of these background workloads was used, these
>>> execution times were five to nine times lower with bfq under 2.6.32.
>>> Under 2.6.35 they were from six to fourteen times lower. The highest
>>> price paid for these lower latencies was a 20% loss of aggregate disk
>>> throughput for konsole in the case of background workloads made only
>>> of sequential requests (because bfq necessarily favors, more than cfq,
>>> the seeky IO needed to load konsole and its dependencies). In
>>> contrast, with shorter commands, such as bash or xterm, bfq also
>>> provided up to 30% higher aggregate throughput.
>>>
>>> We also saw 15% to 30% higher aggregate throughput in our
>>> throughput-only tests. In [1] you can find all the details of our
>>> tests and of other nice features of bfq, such as the fact that it
>>> distributes the disk throughput exactly as desired, independently of
>>> physical disk parameters such as ZBR (zone bit recording). In [1] you
>>> can also find a detailed description of bfq and a short report on the
>>> maturity level of the code (TODO list), plus all the scripts used for
>>> the tests.
>>>
>>> The results I mentioned so far were achieved with the latest version
>>> of bfq, released about two months ago as patchsets for 2.6.33 and
>>> 2.6.34. As of a few days ago, a patchset for 2.6.35 is available too,
>>> as well as a backport to 2.6.32. The latter was prepared by Mauro
>>> Andreolini, who also helped me a lot with debugging. All these patches
>>> can be found in [2]. Mauro also built a binary kernel package for the
>>> current lucid release and hosted it in a PPA, which can be found in
>>> [3].
>>>
>>> A few days after being released, this version of bfq was made the
>>> default disk scheduler in the Zen Kernel. It has been adopted as the
>>> default disk scheduler in Gentoo Linux too. I also recorded downloads
>>> from users of other distributions, e.g., Ubuntu and ArchLinux. As of
>>> now we have received only positive feedback from the users.
>>>
>>> Paolo
>>>
>>> [1] http://algo.ing.unimo.it/people/paolo/disk_sched/
>>> [2] http://algo.ing.unimo.it/people/paolo/disk_sched/sources.php
>>> [3] Ubuntu PPA: ppa:mauro-andreolini/ubuntu-kernel-bfq
>>>
>>>       
>> Hi Paolo,
>>
>> Have you tried to post this to the upstream developers of CFQ and IO schedulers?
>>
>>     
>
> Hi Paolo,
>   
Hi Vivek :)
> The last time BFQ was discussed on lkml, Jens did not want one more IO
> scheduler and wanted CFQ to be fixed instead. So are you not convinced
> that CFQ can be fixed to achieve results similar to BFQ's?
>   
I have thought about how to integrate the nice features of bfq into cfq,
but I am afraid that the problem is intrinsic to cfq's base algorithm
itself.
I will try to provide more details in the points below.
> IIRC, BFQ had two main things.
>
> - B-WF2Q+ algorithm
> - Budget allocation in terms of sectors and not time.
>
> I am not sure that in practice we really need the kind of fairness and
> accuracy B-WF2Q+ tries to provide. The reason is that WF2Q+ assumes
> continuous IO traffic from queues, and on desktop systems,
I'm not sure I fully understood what you mean, but in general WF2Q+ has
been designed exactly to provide the smoothest possible service, i.e.,
the lowest possible latencies, with intermittent traffic.
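To make the idea concrete, here is a toy user-space sketch of the WF2Q+
selection rule (just an illustration of the principle, not the bfq code;
the queue names, budgets and the simplified virtual-time bookkeeping are
made up). The point is that a task waking up after an idle period is
immediately eligible and, since it asks for little service, gets a small
virtual finish time and is dispatched ahead of the heavy queues:

/* Toy user-space illustration of the WF2Q+ selection rule, not the bfq
 * code.  Each queue has a weight, a virtual start time S and a virtual
 * finish time F = S + requested_service / weight.  The scheduler
 * serves, among the backlogged queues whose S does not exceed the
 * system virtual time V, the one with the smallest F.  A queue that
 * becomes backlogged gets S = max(V, its previous F). */
#include <stdio.h>

struct toy_queue {
	const char *name;
	double weight;
	double start, finish;	/* virtual timestamps S and F */
	int backlogged;
};

/* WF2Q+ choice: eligible (S <= V) backlogged queue with minimum F */
static struct toy_queue *wf2q_pick(struct toy_queue *q, int n, double v)
{
	struct toy_queue *best = NULL;
	int i;

	for (i = 0; i < n; i++)
		if (q[i].backlogged && q[i].start <= v &&
		    (!best || q[i].finish < best->finish))
			best = &q[i];
	return best;
}

/* V never falls behind the smallest S of the backlogged queues */
static double wf2q_min_start(struct toy_queue *q, int n)
{
	double m = -1.0;
	int i;

	for (i = 0; i < n; i++)
		if (q[i].backlogged && (m < 0.0 || q[i].start < m))
			m = q[i].start;
	return m;
}

int main(void)
{
	struct toy_queue qs[2] = {
		{ "seq-reader", 1.0, 0.0, 2048.0, 1 },	/* always busy   */
		{ "small-task", 1.0, 0.0,    0.0, 0 },	/* arrives later */
	};
	double V = 0.0, m;
	int round;

	for (round = 0; round < 6; round++) {
		struct toy_queue *q;
		double budget;

		if (round == 3) {	/* the small task wakes up now */
			qs[1].backlogged = 1;
			qs[1].start = V > qs[1].finish ? V : qs[1].finish;
			qs[1].finish = qs[1].start + 64.0 / qs[1].weight;
		}

		q = wf2q_pick(qs, 2, V);
		budget = (q == &qs[0]) ? 2048.0 : 64.0;
		printf("round %d: serve %s\n", round, q->name);

		/* charge the served queue for its budget */
		q->start = q->finish;
		q->finish = q->start + budget / q->weight;
		if (q == &qs[1])
			q->backlogged = 0;	/* it issued one request */

		/* advance the system virtual time (WF2Q+ rule) */
		V += budget / 2.0;		/* 2.0 = sum of weights */
		m = wf2q_min_start(qs, 2);
		if (m > V)
			V = m;
	}
	return 0;
}

Running it, the small task is served in the very round in which it wakes
up, right in the middle of the sequential reader's dispatches.
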
>  I have seen most of the
> things we care about do a small amount of random reads/writes and go away.
> Only sequential readers keep the IO queue busy for longer intervals.
>   
I agree. Apart from sequential readers/writers, which can use the disk
for a very long time, most other applications are intermittent.
The problem is that many of these applications can easily cause load
periods that last for several seconds: consider, for example, updating
the databases used by many applications, or a make run with a suitable
level of concurrency, or, even better, both at the same time :)
Besides, in a realistic heavy-load scenario, sequential traffic must be
added too: consider, e.g., peer-to-peer file exchange, and possibly some
downloads.
The purpose of the four sequential/random read/write background workloads
used in my tests was to mimic these more or less short load periods.
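For completeness, this is roughly what taking one latency sample amounts
to (only a sketch: the actual test scripts, including the ones that
spawn the background readers and writers, are in [1]; the command and
the cache-dropping step below are just illustrative):

/* Rough sketch of one cold-cache latency sample, for illustration only;
 * the background readers/writers would be started separately. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
	struct timespec t0, t1;
	double ms;

	/* drop page/dentry/inode caches so the command starts cold
	 * (needs root) */
	if (system("sync; echo 3 > /proc/sys/vm/drop_caches") != 0)
		fprintf(stderr, "warning: could not drop caches\n");

	clock_gettime(CLOCK_MONOTONIC, &t0);
	system("bash -c exit");	/* the command being timed */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
	     (t1.tv_nsec - t0.tv_nsec) / 1e6;
	printf("cold-cache execution time: %.1f ms\n", ms);
	return 0;
}
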
> IOW, I thought that one can fix latency issues by just reducing the time
> slice length and by giving preference to random readers/writers.
>
>   
Reducing the slice length does reduce latencies but, except in a scenario
where every application issues random requests uniformly distributed over
the disk surface, it reduces the aggregate throughput too.
The same happens if preference is given to random requests.
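To see why, here is a rough back-of-the-envelope model (the numbers are
made up, just in a realistic range for a rotational disk, and are not
taken from my tests): switching to a different queue costs roughly one
seek, and a sequential queue is served at close to the peak rate for the
whole slice.

  seek per queue switch ~ 8 ms, peak transfer rate ~ 100 MB/s
  100 ms slices: ~ 100/(100+8) * 100 MB/s ~ 93 MB/s aggregate
   10 ms slices: ~  10/(10+8)  * 100 MB/s ~ 56 MB/s aggregate

So shrinking the slices by a factor of ten to cut latencies can easily
cost around 40% of the aggregate throughput when the background load is
sequential.
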
In contrast, thanks to a different scheduling policy, bfq reduces
latencies and at the same time achieves a higher throughput than cfq with
almost all the workloads I used (or about the same throughput with the
other workloads).

> I am wondering what different design philosophy BFQ brings that cannot
> be employed in CFQ to fix things.
>
>   
The 'secret' is the core of bfq, i.e., B-WF2Q+. From a conceptual point
of view, the key is that the scheduling scheme of B-WF2Q+ allows the
bandwidth distribution to be decoupled from the slice lengths (more
precisely, from the budgets). This gives bfq one more degree of freedom
than cfq for trying to achieve both low latencies and high throughput.
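As a toy illustration of that decoupling (made-up numbers, equal
weights): in B-WF2Q+ a queue is charged budget/weight in virtual time
each time it is served, so the long-term fraction of the disk it obtains
depends only on its weight, while its budget only determines how often,
and for how long at a time, it is served.

  sequential reader: weight 1, budget 2048 sectors -> charged 2048 per dispatch
  seeky task:        weight 1, budget  128 sectors -> charged  128 per dispatch

The seeky task is therefore dispatched 16 times for each dispatch of the
sequential reader (16 * 128 = 2048), and over any long interval both
receive the same amount of service, as dictated by their equal weights.
Shrinking the seeky task's budget further lowers its latency without
changing its share. With time slices granted round-robin, instead, the
share roughly tracks the slice length, so latency and bandwidth cannot be
tuned independently.
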
> Thanks
> Vivek
>   
Thank you
Paolo


