On Mon, Jun 29, 2020 at 9:45 PM Tom Seewald <tseewald(a)gmail.com> wrote:
> > The latter but considering they're a broad variety of workloads I
> > think it's misleading to call them server workloads as if that's one
> > particular type of thing, or not applicable to a desktop under IO
> > pressure. Why? (a) they're using consumer storage devices (b) these
> > are real workloads rather than simulations (c) even by upstream's own
> > descriptions of the various IO schedulers only mq-deadline is intended
> > to be generic (d) it's really hard to prove anything in this area
> > without a lot of data.
> You are right that the difference between them is blurry. My question
> comes from being unsure whether Fedora users are experiencing problems
> with bfq but not reporting them, or whether something specific is
> causing that pathological scheduling behavior at Facebook.
They're using mq-deadline most everywhere, not just on servers, but on
local computers and VMs too. They use kyber (which Facebook
contributed) for high-end storage, and it's not indicated for our
usage. I'm not sure they're seeing anything wrong per se with bfq;
it's just consistently not performing as well as mq-deadline due to
latencies. I'm not sure that's a bug if it's improving performance in
other areas that are relevant for the intended workloads. The gotcha
is: what are the intended workloads? What is even a desktop workload?
> It was also my understanding that Facebook primarily uses NVMe
> drives, and that is the class of storage Fedora does not use bfq
> with. Is it possible these latency problems occurred when using bfq
> with NVMe drives?
Not certain. But in our case we use 'none' for NVMe drives. For most
people that's OK, but some workloads will suffer if you get a task
with a heavy demand for tags, because there's no scheduler to spread
them out among those demanding them. So, pulling a number out of my
butt, 'none' could be fine for 90% of users and not great for the
other 10%. If anything, 'none' plus NVMe is a server-like
configuration, if it's running a typically homogeneous workload.
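For anyone who wants to check what their own machine is doing: the
active scheduler per device is visible through the standard sysfs
interface, with the current choice shown in brackets (device names
will of course vary per machine; writing a scheduler name to the same
file switches it until reboot):

```shell
# Print the active scheduler (shown in [brackets]) for every block
# device visible on this machine. To switch one at runtime, as root:
#   echo mq-deadline > /sys/block/sda/queue/scheduler
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if no block devices are visible
    d=${f#/sys/block/}; d=${d%/queue/scheduler}
    printf '%s: %s\n' "$d" "$(cat "$f")"
done
```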
I now see that Paolo was cc'd in comment #9 of the bugzilla
ticket, so hopefully he responds.
> But fair enough, I'll see about collecting some data before asking to
> change the IO scheduler yet again.
For the record, I definitely agree that mq-deadline should become the default scheduler
for NVMe drives.
The other question I have: I'm pretty sure we're using the same udev
rule across all of Fedora, not just on the desktops. My Fedora Server
is using bfq for everything. VMs are using mq-deadline for /dev/vd*
virtio devices, and bfq for /dev/sr* and /dev/sd* devices. I have
nothing against bfq, but I'm inclined to go with the most generic IO
scheduler as the default and let people optimize for their specific
workload, rather than the other way around.
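For reference, that per-device split is the kind of thing a single
udev rules file expresses. This is a sketch from memory, not the exact
rule Fedora ships (the file name and match patterns may differ):

```
# Hypothetical /etc/udev/rules.d/99-io-scheduler.rules - a sketch, not
# Fedora's actual rule. SATA/USB disks and optical drives get bfq,
# virtio disks get mq-deadline; NVMe is left on the kernel default.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd*[!0-9]|sr*", ATTR{queue/scheduler}="bfq"
ACTION=="add", SUBSYSTEM=="block", KERNEL=="vd*[!0-9]", ATTR{queue/scheduler}="mq-deadline"
```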
It's super annoying for me to post, because benchmarks drive me crazy,
and yet here I am posting one - pasting this is almost like
self-flagellation...
None of these benchmarks are representative of a generic desktop. The
difficulty with desktop workloads is their heterogeneity. Some people
are mixing music, others are compiling, still others do lots of web
browsing (Chrome OS, I guess, went to bfq around the same time we
did), and we just don't really know what people are going to do. Some
even use Workstation as a base for more typical server operations.
The geometric mean isn't helpful either, because none of the tests are
run concurrently or attempt to produce tag starvation, which would
result in latency spikes. That's where mq-deadline would do better.
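To illustrate the summary-statistic point with toy numbers (not from
any benchmark): a geometric mean compresses a single large outlier, so
a set of runs containing one 100x latency spike still produces a
small-looking number.

```shell
# Four hypothetical latency results, one of them a 100x outlier.
# The geometric mean barely registers the spike.
echo "1 1 1 100" | awk '
    { p = 1
      for (i = 1; i <= NF; i++) p *= $i
      printf "geometric mean of %d runs: %.2f\n", NF, p ^ (1 / NF) }'
# prints: geometric mean of 4 runs: 3.16
```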