On 03/08/2022 22:47, Adam Williamson wrote:
> On Sun, 2022-07-24 at 10:28 +0100, Richard W.M. Jones wrote:
> > The current Fedora Rawhide kernels are too slow to run libguestfs
> > tests when doing Koji builds. These run in a qemu VM, running the
> > Rawhide kernel, emulated using software virtualization (i.e. TCG).
> > They now time out because these kernels are so slow. Until fairly
> > recently they were slow but working.
> >
> > I wondered if particular debug options had a greater effect on
> > performance, so I compiled many kernels (v5.19-rc7 from upstream)
> > using the baseline "no debug" config, then adding each debug option
> > that we use in turn, and measuring the performance using [1], using
> > qemu software virtualization (TCG). The tests were run many times
> > with warmups discarded to get the mean and standard deviation, using
> > the hyperfine program[2].
> >
> > The results are below, and not very conclusive, but some options do
> > have a very large performance impact.
> >
> > NO_DEBUG is the kernel compiled with no debug options enabled (i.e. the
> > baseline).
> >
> > In the actual debug kernel I expect the slowdowns to be multiplied
> > together. To test that I did an extra run with all debug options
> > enabled (ALL_DEBUG).
> >
> > CONFIG_PROVE_LOCKING, CONFIG_LOCK_STAT and CONFIG_DEBUG_LOCK_ALLOC
> > were present and enabled in the kernel when it was imported into git
> > in 2010.
> >
> > CONFIG_DEBUG_WW_MUTEX_SLOWPATH was turned off in the past
> > (RHBZ#1114160). It seems to have been switched on again in 2020.
> >
> > CONFIG_DEBUG_KMEMLEAK seems like it was enabled in 2012.
> >
> > It's also possible that an existing debug option has got slower in the
> > upstream kernel, that is, it's not that we've recently changed
> > something in Fedora.
>
> Thanks a lot for this work, Richard! And thanks to Justin for looking
> at it. I would be super appreciative of anything we can do to reduce
> the performance hit here, as it is also an issue for openQA testing -
> we get noticeably more test failures due to timeouts, things taking
> longer than expected, or typing errors when Rawhide is on a debug
> kernel.
In Cockpit we recently enabled Rawhide testing on Testing Farm and
noticed similar performance issues. [1]
Compared to Fedora 36, one test scenario takes 5 minutes longer on
Rawhide, so it would be great to speed that up a bit!
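
For anyone wanting to put a number on this kind of regression, the
measurement approach Richard describes above (many runs, warmups
discarded, mean and standard deviation) can be sketched roughly as
below. This is only an illustration and not what hyperfine or our CI
actually does: the function names, the run counts and the placeholder
command are all assumptions, and you would substitute the real test
command for each kernel or release you want to compare.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=10, warmups=3):
    """Run cmd (warmups + runs) times and return (mean, stdev) in seconds.

    The first `warmups` runs are executed but their timings are discarded.
    """
    if runs < 2:
        raise ValueError("need at least 2 timed runs for a standard deviation")
    samples = []
    for i in range(warmups + runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        if i >= warmups:  # keep only the runs after the warmup phase
            samples.append(elapsed)
    return statistics.mean(samples), statistics.stdev(samples)


def slowdown(baseline_mean, candidate_mean):
    """Return how many times slower the candidate is than the baseline."""
    return candidate_mean / baseline_mean


if __name__ == "__main__":
    # Placeholder command (an assumption): replace with the real test run.
    cmd = [sys.executable, "-c", "pass"]
    mean, stdev = time_command(cmd, runs=5, warmups=2)
    print(f"{mean:.3f} s ± {stdev:.3f} s")
```

Running this once per configuration and passing the two means to
slowdown() gives a single ratio, which is easier to track over time
than "5 minutes longer".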
[1] https://gitlab.com/testing-farm/general/-/issues/45