Re: drop bfq scheduler, instead use mq-deadline across the board

Tuesday, 30 June 2020

On Tue, Jun 30, 2020 at 07:28:53PM +0100, Ankur Sinha wrote:
...
 On Tue, Jun 30, 2020 17:23:16 +0000, Zbigniew Jędrzejewski-Szmek wrote:
 > On Tue, Jun 30, 2020 at 04:25:23PM +0100, Ankur Sinha wrote:
 > > On Mon, Jun 29, 2020 15:01:24 -0600, Chris Murphy wrote:
 > > > https://bugzilla.redhat.com/show_bug.cgi?id=1851783
 > > > 
 > > > The main argument is that for typical and varied workloads in Fedora,
 > > > mostly on consumer hardware, we should use mq-deadline scheduler
 > > > rather than either none or bfq.
 > > > 
 > > > It may be true most folks with NVMe won't see anything bad with none,
 > > > but those who have heavier IO workloads are likely to be better off
 > > > with mq-deadline.
 > > > 
 > > > Further details are in the bug, but let's discuss it on list. Thanks!
 > > 
 > > There was this thread about our systems hanging, and the workaround was
 > > to revert to mq-deadline from bfq:
 > > 
 > > https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.o...
 > 
 > To clarify: you could reliably reproduce the issue when building steps in mock.
 > Did you verify that it is reliably fixed simply by switching bfq→mq-deadline?

 Yes, that was the first change I had made and it had stopped the
 hanging. As a permanent fix, though, I switched to using isolation =
 simple in mock, and since that works, I've not changed it since.

OK, thanks.
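
For anyone wanting to confirm which scheduler a disk is actually using before and after such a switch, here is a minimal sketch. It assumes the usual sysfs layout, where /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets. The function names and the device name "sda" in the comment are illustrative only and do not come from this thread.

```python
from pathlib import Path
from typing import List, Tuple


def parse_schedulers(line: str) -> Tuple[str, List[str]]:
    """Parse a sysfs scheduler line such as 'mq-deadline kyber [bfq] none'.

    Returns (active, available). The active scheduler is the bracketed
    entry; active is an empty string if no entry is bracketed.
    """
    active = ""
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active one
            active = token
        available.append(token)
    return active, available


def read_schedulers(device: str) -> Tuple[str, List[str]]:
    """Read the scheduler line for a block device, e.g. 'sda' (hypothetical)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_schedulers(path.read_text())
```

Writing a scheduler name to the same file as root switches that device until the next reboot. That is enough to test whether a hang goes away, but it is not a permanent setting.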

...
 (I make it a point to provide the needed information for bugs, but this
 release my quota is currently being used up on getting Docker + minikube
 to work on F32 for $dayjob)

 > > There are a few threads on AskFedora about systems hanging. They're not
 > > the easiest to debug but we did suggest people try switching to
 > > mq-deadline to see if it helps:
 > > 
 > > https://ask.fedoraproject.org/t/whole-os-freezes-watching-a-video-with-mp...
 > > 
 > > I don't know enough about this to say if it's a bug and if it has been
 > > fixed.
 > 
 > There's a lot of noise in those bug reports. For heisenbugs, the fact
 > that something was an issue and after a flurry of half-random changes
 > to the system isn't, does not allow us to conclude _anything_. We need
 > somebody who understands what they are doing to isolate the issue. In
 > particular, if this is a kernel hang, then we need a proper traceback
 > from the kernel, and not just assume it's the scheduler.

 There is a kernel trace in the related bug that was cited there:
 https://bugzilla.redhat.com/show_bug.cgi?id=1767097#c7

 which links to another bfq bug here that's currently needinfo:
 https://bugzilla.redhat.com/show_bug.cgi?id=1767539

 > (In particular, if this is a race condition, changing the scheduler
 > could be just making the condition less likely because the system is
 > slower or faster or just schedules processes in a different order,
 > without the scheduler being relevant to the bug).

 Like I said, I don't know. I'm a fairly advanced Linux user but you can
 hardly expect me to also be a kernel hacker.  :)

 For kernel bugs, I'd strongly suggest giving reporters step-by-step
 instructions or links to using a "serial console" or a "netconsole".
 These are not part of my working vocabulary (I cannot speak for others).

Thanks for the links. This seems to be a tough cookie and I hope it
gets resolved at some point. And to clarify: my comment about
debugging was not directed to you in particular, apart from the
question above which you have already answered.

Zbyszek
