On Tue, Jun 30, 2020 at 07:28:53PM +0100, Ankur Sinha wrote:
On Tue, Jun 30, 2020 17:23:16 +0000, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Jun 30, 2020 at 04:25:23PM +0100, Ankur Sinha wrote:
> > On Mon, Jun 29, 2020 15:01:24 -0600, Chris Murphy wrote:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1851783
> > >
> > > The main argument is that for typical and varied workloads in Fedora,
> > > mostly on consumer hardware, we should use mq-deadline scheduler
> > > rather than either none or bfq.
> > >
> > > It may be true most folks with NVMe won't see anything bad with none,
> > > but those who have heavier IO workloads are likely to be better off
> > > with mq-deadline.
> > >
> > > Further details are in the bug, but let's discuss it on list. Thanks!
> >
> > There was this thread about our systems hanging, and the workaround was
> > to revert to mq-deadline from bfq:
> >
> > https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.o...
>
> To clarify: you could reliably reproduce the issue when building steps in mock.
> Did you verify that it is reliably fixed simply by switching bfq→mq-deadline?
Yes, that was the first change I had made and it had stopped the
hanging. As a permanent fix, though, I switched to using isolation =
simple in mock, and since that works, I've not changed it since.
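For anyone who wants to confirm which scheduler a device is actually using
before and after such a switch, here is a minimal sketch. It assumes the usual
sysfs layout, where /sys/block/<dev>/queue/scheduler lists the available
schedulers and marks the active one with square brackets. The function names
are illustrative only and not part of any existing tool.

```python
from pathlib import Path


def parse_scheduler_line(line):
    """Split a sysfs scheduler line into (active, available).

    Example input: "mq-deadline kyber [bfq] none"
    """
    active = None
    available = []
    for name in line.split():
        # The kernel marks the scheduler currently in use with [brackets].
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_schedulers(sys_block=Path("/sys/block")):
    """Map each block device name to its (active, available) schedulers."""
    result = {}
    for path in sorted(sys_block.glob("*/queue/scheduler")):
        try:
            # path is <sys_block>/<dev>/queue/scheduler
            result[path.parent.parent.name] = parse_scheduler_line(path.read_text())
        except OSError:
            # Skip devices whose scheduler file cannot be read.
            continue
    return result


if __name__ == "__main__":
    for dev, (active, available) in read_schedulers().items():
        print(f"{dev}: active={active} available={','.join(available)}")
```

Writing a scheduler name into the same file as root (for example,
`echo mq-deadline > /sys/block/sda/queue/scheduler`, with sda standing in for
your device) switches it for the running system only; it does not persist
across reboots.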
OK, thanks.
(I make it a point to provide the needed information for bugs, but this
release my quota is currently being used up on getting Docker + minikube
to work on F32 for $dayjob)
> > There are a few threads on AskFedora about systems hanging. They're not
> > the easiest to debug but we did suggest people try switching to
> > mq-deadline to see if it helps:
> >
> > https://ask.fedoraproject.org/t/whole-os-freezes-watching-a-video-with-mp...
> >
> > I don't know enough about this to say if it's a bug and if it has been
> > fixed.
>
> There's a lot of noise in those bug reports. For heisenbugs, the fact
> that something was an issue and after a flurry of half-random changes
> to the system isn't, does not allow us to conclude _anything_. We need
> somebody who understands what they are doing to isolate the issue. In
> particular, if this is a kernel hang, then we need a proper traceback
> from the kernel, and not just assume it's the scheduler.
There is a kernel trace in the related bug that was cited there:
https://bugzilla.redhat.com/show_bug.cgi?id=1767097#c7
which links to another bfq bug here that's currently needinfo:
https://bugzilla.redhat.com/show_bug.cgi?id=1767539
> (In particular, if this is a race condition, changing the scheduler
> could be just making the condition less likely because the system is
> slower or faster or just schedules processes in a different order,
> without the scheduler being relevant to the bug).
Like I said, I don't know. I'm a fairly advanced Linux user but you can
hardly expect me to also be a kernel hacker. :)
For kernel bugs, I'd strongly suggest giving reporters step-by-step
instructions or links to using a "serial console" or a "netconsole".
These are not part of my working vocabulary (I cannot speak for others).
Thanks for the links. This seems to be a tough cookie and I hope it
gets resolved at some point. And to clarify: my comment about
debugging was not directed to you in particular, apart from the
question above which you have already answered.
Zbyszek