Fwd: Re: [ANNOUNCE] bfq disk I/O scheduler

Wed Aug 4 19:26:50 UTC 2010

My response got rejected by list. Now I have subscribed to the list.
Hoepefully it will go through this time.

Vivek

----- Forwarded message from Vivek Goyal <vgoyal at redhat.com> -----

Date: Wed, 4 Aug 2010 15:18:45 -0400
From: Vivek Goyal <vgoyal at redhat.com>
To: Paolo Valente <posta_paolo at yahoo.it>
Cc: Development discussions related to Fedora <devel at lists.fedoraproject.org>,
	Jeff Moyer <jmoyer at redhat.com>, Ric Wheeler <rwheeler at redhat.com>
Subject: Re: [ANNOUNCE] bfq disk I/O scheduler
User-Agent: Mutt/1.5.20 (2009-12-10)
Message-ID: <20100804191845.GD11807 at redhat.com>

On Wed, Aug 04, 2010 at 02:44:21PM -0400, Ric Wheeler wrote:
> On 08/04/2010 01:58 PM, Paolo Valente wrote:
> >Hi,
> >I have been working for a few years (with Fabio Checconi) on a disk
> >scheduler providing definitely lower latencies than cfq, as well as a
> >higher throughput with most of the test workloads we used (or the same
> >throughput as cfq with the other workloads). We named this scheduler
> >bfq (budget fair queueing). I hope this is the right list for announcing
> >this work.
> >
> >One of the things we measured in our tests is the cold-cache execution
> >time of a command as, e.g., "bash -c exit", "xterm /bin/true" or
> >"konsole -e /bin/true", while the disk was also accessed by different
> >combinations of sequential, or random, readers and/or
> >writers. Depending on which of these background workloads was used,
> >these execution times were five to nine times lower with bfq under
> >2.6.32. Under 2.6.35 they were instead from six to fourteen times
> >lower. The highest price paid for these lower latencies was a 20% loss
> >of aggregated disk throughput for konsole in case of background
> >workloads made only of sequential requests (due to the fact that bfq
> >of course privileges, more than cfq, the seeky IO needed to load
> >konsole and its dependencies). In contrast, with shorter commands, as
> >bash or xterm, bfq also provided up to 30% higher aggregated
> >throughput.
> >
> >We saw from 15% to 30% higher aggregated throughput also in our
> >only-aggregated-throughput tests. You can find in [1] all the details
> >on our tests and on other nice features of bfq, such as the fact that
> >it perfectly distributes the disk throughput as desired, independently
> >of disk physical parameters like, e.g., ZBR. in [1] you can also find
> >a detailed description of bfq and a short report on the maturity level
> >of the code (TODO list), plus all the scripts used for the tests.
> >
> >The results I mentioned so far have been achieved with the last
> >version of bfq, released about two months ago as patchsets for 2.6.33
> >or 2.6.34. From a few days a patchset for 2.6.35 is available too, as
> >well as a backport to 2.6.32. The latter has been prepared by Mauro
> >Andreolini, who also helped me a lot with debugging. All these patches
> >can be found here [2]. Mauro also built a binary kernel package for
> >current lucid, and hosted it into a PPA, which can be found here [3].
> >
> >A few days after being released, this version of bfq has been
> >introduced as the default disk scheduler in the Zen Kernel. It has
> >been adopted as the default disk scheduler in Gentoo Linux too. I
> >also recorded downloads from users with other distributions, as, e.g.,
> >Ubuntu and ArchLinux. As of now we received only positive feedbacks
> >from the users.
> >
> >Paolo
> >
> >[1] http://algo.ing.unimo.it/people/paolo/disk_sched/
> >[2] http://algo.ing.unimo.it/people/paolo/disk_sched/sources.php
> >[3] Ubuntu PPA: ppa:mauro-andreolini/ubuntu-kernel-bfq
> >
> 
> Hi Paolo,
> 
> Have you tried to post this to the upstream developers of CFQ and IO schedulers?
> 

Hi Paolo,

Last time when BFQ was discussed on lkml, Jens did not want one more IO
scheduler and wanted CFQ to be fixed. So are you not convinced that CFQ
can be fixed to achieve similar results as BFQ?

IIRC, BFQ had two main things.

- B-WF2Q+ algorithm
- Budget allocation in terms of sectors and not time.

I am not sure that in practice we really need kind of fairness and accracy
B-W2Q+ tries to provide. Reason being that WF2Q+ was assuming continuous
IO traffic from queues and on desktop systems, I have seen most of the
things we care about do small amount of random read/write and go away.
Only sequential readers keep the IO queue busy for longer intervals.

IOW, I thought that one can fix latency issues by just reducing the time
slice length and by giving perference to random readers/writers.

I am wondering what difference design philosophy we bring in BFQ which
can not be employed in CFQ and fix the things.

Thanks
Vivek

----- End forwarded message -----