The current Fedora Rawhide kernels are too slow to run the libguestfs tests during Koji builds. The tests run in a qemu VM that boots the Rawhide kernel under software emulation (i.e. TCG). They now time out because these kernels are so slow. Until fairly recently they were slow but still worked.
I wondered if particular debug options had a greater effect on performance, so I compiled many kernels (v5.19-rc7 from upstream): first with the baseline "no debug" config, then adding each debug option that we use, one at a time. I measured the performance of each kernel using libguestfs-test-tool [1] under qemu software emulation (TCG). Each test was run many times with the hyperfine program [2], discarding warmup runs, to get the mean and standard deviation.
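The "discard warmups, then report mean and standard deviation" step can be sketched in a few lines of Python. This is only an illustration of the statistic, not how hyperfine is implemented: the function name `summarize`, the warmup count, the use of the sample standard deviation, and the example timings are all assumptions made for the sketch.

```python
import statistics


def summarize(timings, warmup=3):
    """Drop the first `warmup` timings and return (mean, sample stddev)."""
    measured = timings[warmup:]
    if len(measured) < 2:
        raise ValueError("need at least two measured runs after warmup")
    return statistics.mean(measured), statistics.stdev(measured)


# Made-up wall-clock times in seconds (NOT measurements from this thread).
# The first three stand in for warmup runs and are discarded.
example = [14.9, 13.1, 12.6, 12.4, 12.3, 12.5, 12.4]
mean, sd = summarize(example, warmup=3)
print(f"{mean:.3f} s ± {sd:.3f} s")
```

Discarding the first few runs keeps cold caches from inflating the mean of the remaining runs.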
The results are below. They are not very conclusive, but some options do have a very large performance impact.
NO_DEBUG is the kernel compiled with no debug options enabled (i.e. the baseline).
In the actual debug kernel I expect the slowdowns to be multiplied together. To test that I did an extra run with all debug options enabled (ALL_DEBUG).
CONFIG_PROVE_LOCKING, CONFIG_LOCK_STAT and CONFIG_DEBUG_LOCK_ALLOC were present and enabled in the kernel when it was imported into git in 2010.
CONFIG_DEBUG_WW_MUTEX_SLOWPATH was turned off in the past (RHBZ#1114160). It seems to have been switched on again in 2020.
CONFIG_DEBUG_KMEMLEAK seems like it was enabled in 2012.
It's also possible that an existing debug option has got slower in the upstream kernel, that is, it's not that we've recently changed something in Fedora.
Rich.
[1] https://libguestfs.org/libguestfs-test-tool.1.html
[2] https://github.com/sharkdp/hyperfine
NO_DEBUG: 12.362 s ± 0.093 s
ALL_DEBUG: 30.134 s ± 0.402 s (+143%)
CONFIG_PROVE_LOCKING=y: 23.435 s ± 0.526 s (+88%)
CONFIG_LOCK_STAT=y: 17.707 s ± 0.254 s (+43%)
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y: 15.804 s ± 0.161 s (+27%)
CONFIG_DEBUG_KMEMLEAK=y: 15.794 s ± 0.261 s (+27%)
CONFIG_DEBUG_LOCK_ALLOC=y: 15.696 s ± 0.116 s (+27%)
CONFIG_PAGE_TABLE_CHECK_ENFORCED=y: 12.694 s ± 0.104 s (+2%)
CONFIG_FAILSLAB=y: 12.679 s ± 0.122 s
CONFIG_NOUVEAU_DEBUG_MMU=y: 12.657 s ± 0.156 s
CONFIG_FAULT_INJECTION_DEBUG_FS=y: 12.630 s ± 0.158 s
CONFIG_DMA_API_DEBUG=y: 12.624 s ± 0.148 s
CONFIG_PERF_USE_VMALLOC=y: 12.611 s ± 0.125 s
CONFIG_NOUVEAU_DEBUG_PUSH=y: 12.608 s ± 0.165 s
CONFIG_DEBUG_SPINLOCK=y: 12.600 s ± 0.132 s
CONFIG_PM_ADVANCED_DEBUG=y: 12.586 s ± 0.132 s
CONFIG_FAIL_IO_TIMEOUT=y: 12.580 s ± 0.131 s
CONFIG_FAIL_MMC_REQUEST=y: 12.571 s ± 0.103 s
CONFIG_INTEL_IOMMU_DEBUGFS=y: 12.569 s ± 0.111 s
CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y: 12.564 s ± 0.111 s
CONFIG_SYNTH_EVENT_GEN_TEST=m: 12.552 s ± 0.082 s
CONFIG_LOCK_EVENT_COUNTS=y: 12.551 s ± 0.118 s
CONFIG_FAIL_MAKE_REQUEST=y: 12.550 s ± 0.098 s
CONFIG_TEST_MIN_HEAP=m: 12.545 s ± 0.071 s
CONFIG_DEBUG_RWSEMS=y: 12.543 s ± 0.117 s
CONFIG_FAULT_INJECTION=y: 12.541 s ± 0.153 s
CONFIG_LOCKDEP_BITS=16: 12.532 s ± 0.161 s
CONFIG_FAIL_PAGE_ALLOC=y: 12.532 s ± 0.136 s
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12: 12.526 s ± 0.068 s
CONFIG_KDB_DEFAULT_ENABLE=0x0: 12.523 s ± 0.143 s
CONFIG_TEST_LIST_SORT=m: 12.522 s ± 0.062 s
CONFIG_SND_VERBOSE_PRINTK=y: 12.522 s ± 0.120 s
CONFIG_WQ_WATCHDOG=y: 12.518 s ± 0.141 s
CONFIG_RTW89_DEBUG=y: 12.517 s ± 0.099 s
CONFIG_KDB_KEYBOARD=y: 12.517 s ± 0.183 s
CONFIG_DETECT_HUNG_TASK=y: 12.517 s ± 0.123 s
CONFIG_TEST_LOCKUP=m: 12.514 s ± 0.080 s
CONFIG_IWLWIFI_DEVICE_TRACING=y: 12.514 s ± 0.139 s
CONFIG_QUOTA_DEBUG=y: 12.511 s ± 0.114 s
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120: 12.511 s ± 0.159 s
CONFIG_DEBUG_RT_MUTEXES=y: 12.511 s ± 0.116 s
CONFIG_EFI_PGT_DUMP=y: 12.507 s ± 0.130 s
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14: 12.506 s ± 0.095 s
CONFIG_PAGE_TABLE_CHECK=y: 12.504 s ± 0.102 s
CONFIG_DEBUG_VM_PGFLAGS=y: 12.500 s ± 0.106 s
CONFIG_XFS_WARN=y: 12.497 s ± 0.168 s
CONFIG_SND_JACK_INJECTION_DEBUG=y: 12.495 s ± 0.098 s
CONFIG_FAIL_FUNCTION=y: 12.495 s ± 0.127 s
CONFIG_DMAR_DEBUG=y: 12.486 s ± 0.145 s
CONFIG_RTW88_DEBUG=y: 12.484 s ± 0.050 s
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=4096: 12.483 s ± 0.075 s
CONFIG_DEBUG_PERF_USE_VMALLOC=y: 12.481 s ± 0.107 s
CONFIG_KPROBE_EVENT_GEN_TEST=m: 12.478 s ± 0.148 s
CONFIG_LOCKDEP=y: 12.475 s ± 0.095 s
CONFIG_KDB_CONTINUE_CATASTROPHIC=0: 12.474 s ± 0.136 s
CONFIG_DEBUG_ATOMIC_SLEEP=y: 12.469 s ± 0.125 s
CONFIG_SND_PCM_XRUN_DEBUG=y: 12.467 s ± 0.073 s
CONFIG_DEBUG_VM_PGTABLE=y: 12.466 s ± 0.099 s
CONFIG_LOCKDEP_CHAINS_BITS=17: 12.460 s ± 0.148 s
CONFIG_DEBUG_SG=y: 12.456 s ± 0.177 s
CONFIG_MODULE_FORCE_UNLOAD=y: 12.453 s ± 0.150 s
CONFIG_DEBUG_MISC=y: 12.453 s ± 0.133 s
CONFIG_DMADEVICES_DEBUG=y: 12.450 s ± 0.135 s
CONFIG_DEBUG_NET=y: 12.450 s ± 0.088 s
CONFIG_PERCPU_STATS=y: 12.448 s ± 0.097 s
CONFIG_CEPH_LIB_PRETTYDEBUG=y: 12.447 s ± 0.086 s
CONFIG_DMAR_PERF=y: 12.445 s ± 0.146 s
CONFIG_DMABUF_DEBUG=y: 12.445 s ± 0.178 s
CONFIG_TRACE_IRQFLAGS_NMI=y: 12.444 s ± 0.196 s
CONFIG_CAN_DEBUG_DEVICES=y: 12.440 s ± 0.100 s
CONFIG_DMA_API_DEBUG_SG=y: 12.437 s ± 0.159 s
CONFIG_CRYPTO_DEV_CCP_DEBUGFS=y: 12.433 s ± 0.139 s
CONFIG_PTDUMP_DEBUGFS=y: 12.432 s ± 0.129 s
CONFIG_EXT4_DEBUG=y: 12.431 s ± 0.124 s
CONFIG_DEBUG_NOTIFIERS=y: 12.424 s ± 0.140 s
CONFIG_PROVE_RCU=y: 12.420 s ± 0.183 s
CONFIG_SND_CTL_VALIDATION=y: 12.417 s ± 0.152 s
CONFIG_IOMMU_DEBUGFS=y: 12.415 s ± 0.149 s
CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y: 12.414 s ± 0.135 s
CONFIG_DEBUG_STACK_USAGE=y: 12.412 s ± 0.170 s
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y: 12.410 s ± 0.119 s
CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION=y: 12.409 s ± 0.089 s
CONFIG_DEBUG_OBJECTS_WORK=y: 12.408 s ± 0.162 s
CONFIG_CARL9170_DEBUGFS=y: 12.408 s ± 0.054 s
CONFIG_SND_DEBUG=y: 12.406 s ± 0.103 s
CONFIG_RTW89_DEBUGMSG=y: 12.406 s ± 0.144 s
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y: 12.406 s ± 0.081 s
CONFIG_DEBUG_OBJECTS_TIMERS=y: 12.403 s ± 0.177 s
CONFIG_RTW88_DEBUGFS=y: 12.395 s ± 0.177 s
CONFIG_B43_DEBUG=y: 12.392 s ± 0.127 s
CONFIG_B43LEGACY_DEBUG=y: 12.390 s ± 0.135 s
CONFIG_ACPI_APEI_ERST_DEBUG=m: 12.389 s ± 0.124 s
CONFIG_DEBUG_OBJECTS_FREE=y: 12.387 s ± 0.136 s
CONFIG_RTW89_DEBUGFS=y: 12.381 s ± 0.143 s
CONFIG_LOCKDEP_STACK_TRACE_BITS=19: 12.372 s ± 0.136 s
CONFIG_RCU_REF_SCALE_TEST=m: 12.367 s ± 0.102 s
CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y: 12.363 s ± 0.180 s
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1: 12.362 s ± 0.160 s
CONFIG_JBD2_DEBUG=y: 12.361 s ± 0.120 s
CONFIG_TRACE_IRQFLAGS=y: 12.359 s ± 0.121 s
CONFIG_ACPI_CUSTOM_METHOD=m: 12.353 s ± 0.115 s
CONFIG_PREEMPTIRQ_TRACEPOINTS=y: 12.349 s ± 0.099 s
CONFIG_DEBUG_CREDENTIALS=y: 12.346 s ± 0.103 s
CONFIG_DEBUG_OBJECTS=y: 12.344 s ± 0.120 s
CONFIG_ATH_DEBUG=y: 12.328 s ± 0.104 s
CONFIG_ACPI_DEBUGGER=y: 12.326 s ± 0.146 s
CONFIG_DRBD_FAULT_INJECTION=y: 12.314 s ± 0.130 s
CONFIG_BPF_KPROBE_OVERRIDE=y: 12.314 s ± 0.093 s
CONFIG_ACPI_DEBUGGER_USER=m: 12.312 s ± 0.142 s
CONFIG_KGDB_KDB=y: 12.309 s ± 0.131 s
CONFIG_DEBUG_MUTEXES=y: 12.287 s ± 0.109 s
CONFIG_DEBUG_PER_CPU_MAPS=y: 12.286 s ± 0.131 s
CONFIG_ACPI_EC_DEBUGFS=m: 12.285 s ± 0.117 s
CONFIG_ACPI_CONFIGFS=m: 12.280 s ± 0.110 s
CONFIG_ACPI_DEBUG=y: 12.277 s ± 0.101 s
CONFIG_BTRFS_ASSERT=y: 12.268 s ± 0.130 s
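The expectation that the slowdowns multiply can be checked with a little arithmetic on the mean times reported above. The sketch below uses only the six options that the results mark with a slowdown. The variable names and the "additive" comparison are assumptions added for illustration. With these figures the product of the individual ratios comes out well above the measured ALL_DEBUG ratio, which would be consistent with these options overlapping rather than compounding independently, although that interpretation is not something the measurements themselves establish.

```python
from math import prod

BASELINE = 12.362   # NO_DEBUG mean, seconds
ALL_DEBUG = 30.134  # ALL_DEBUG mean, seconds

# Mean times (seconds) for the options reported with a visible slowdown.
SLOW_OPTIONS = {
    "CONFIG_PROVE_LOCKING": 23.435,
    "CONFIG_LOCK_STAT": 17.707,
    "CONFIG_DEBUG_WW_MUTEX_SLOWPATH": 15.804,
    "CONFIG_DEBUG_KMEMLEAK": 15.794,
    "CONFIG_DEBUG_LOCK_ALLOC": 15.696,
    "CONFIG_PAGE_TABLE_CHECK_ENFORCED": 12.694,
}


def ratio(seconds):
    """Slowdown factor relative to the NO_DEBUG baseline."""
    return seconds / BASELINE


# Prediction if every slowdown compounded independently.
multiplied = prod(ratio(t) for t in SLOW_OPTIONS.values())
# Prediction if each option simply added its own extra time.
added = 1 + sum(ratio(t) - 1 for t in SLOW_OPTIONS.values())
# What the ALL_DEBUG run actually measured.
observed = ratio(ALL_DEBUG)

print(f"multiplied: {multiplied:.2f}x")
print(f"added:      {added:.2f}x")
print(f"observed:   {observed:.2f}x")
```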
On Sun, Jul 24, 2022 at 4:29 AM Richard W.M. Jones rjones@redhat.com wrote:
It's also possible that an existing debug option has got slower in the upstream kernel, that is, it's not that we've recently changed something in Fedora.
Thanks for looking into this. You are probably correct: it ends up being a mix of things Fedora does differently and, especially, of upstream changing the performance profile of debug options over time. I was on vacation last week, and will be slammed with merge window/test week for the next couple of weeks, but I definitely want to address some of this during the 6.0 cycle.
Justin
On Sun, 2022-07-24 at 10:28 +0100, Richard W.M. Jones wrote:
It's also possible that an existing debug option has got slower in the upstream kernel, that is, it's not that we've recently changed something in Fedora.
Thanks a lot for this work, Richard! And thanks to Justin for looking at it. I would be super appreciative of anything we can do to reduce the performance hit here, as it is also an issue for openQA testing - we get noticeably more test failures due to timeouts, things taking longer than expected, or typing errors when Rawhide is on a debug kernel.
On 03/08/2022 22:47, Adam Williamson wrote:
Thanks a lot for this work, Richard! And thanks to Justin for looking at it. I would be super appreciative of anything we can do to reduce the performance hit here, as it is also an issue for openQA testing - we get noticeably more test failures due to timeouts, things taking longer than expected, or typing errors when Rawhide is on a debug kernel.
In Cockpit we recently enabled Rawhide testing on the Testing Farm and noticed similar performance issues. [1] Compared to Fedora 36, one test scenario takes 5 minutes longer. So it would be great to speed that up a bit!
On Sun, Jul 24, 2022 at 4:29 AM Richard W.M. Jones rjones@redhat.com wrote:
CONFIG_PROVE_LOCKING, CONFIG_LOCK_STAT and CONFIG_DEBUG_LOCK_ALLOC were present and enabled in the kernel when it was imported into git in 2010.
CONFIG_DEBUG_WW_MUTEX_SLOWPATH was turned off in the past (RHBZ#1114160). It seems to have been switched on again in 2020.
Just to reiterate, this is not being dropped. I did a bit of research: CONFIG_DEBUG_WW_MUTEX_SLOWPATH was actually turned back on in 2018, when upstream changed PROVE_LOCKING to select it. As such, there is no way to turn it off without turning off PROVE_LOCKING. All things considered, PROVE_LOCKING has found a good number of bugs over the years. From the numbers you have given, there doesn't seem to be much opportunity for real improvement without turning off that chain.
Overall, this requires some thought about the Rawhide debug strategy more than anything else. We do have things (CKI and similar) which can give us more useful data without making a debug kernel the default, but they don't catch issues seen in hardware drivers so much. Though if debug kernels have gotten slow enough that no one is running them anymore, and people are opting for the no-debug repo instead, we may be better served by changing our strategy here.
Justin
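The point that a selected option cannot be switched off on its own can be illustrated with a toy model of Kconfig's "select" behaviour. This is a hypothetical sketch, not the kernel's Kconfig logic: the `SELECTS` map lists only the single relation mentioned above (the real PROVE_LOCKING entry selects more than this), and `effective_config` is a name invented for the example.

```python
# Hypothetical, heavily simplified model of Kconfig "select".
# Only the relation stated in this thread is listed here.
SELECTS = {
    "PROVE_LOCKING": ["DEBUG_WW_MUTEX_SLOWPATH"],
}


def effective_config(requested):
    """Return the set of options that end up enabled.

    `requested` maps option name -> bool. An option that is selected by an
    enabled option is forced on, even if it was requested off.
    """
    enabled = set()
    stack = [name for name, on in requested.items() if on]
    while stack:
        name = stack.pop()
        if name in enabled:
            continue
        enabled.add(name)
        # Follow the "select" edges of every option that is turned on.
        stack.extend(SELECTS.get(name, []))
    return enabled


# Requesting the slow path off has no effect while PROVE_LOCKING is on.
wanted = {"PROVE_LOCKING": True, "DEBUG_WW_MUTEX_SLOWPATH": False}
print(sorted(effective_config(wanted)))
```

In this model the only way to drop DEBUG_WW_MUTEX_SLOWPATH is to drop PROVE_LOCKING as well, which matches the constraint described in the message above.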
On Fri, 2022-10-14 at 10:27 -0500, Justin Forbes wrote:
Though if debug kernels have gotten slow enough that no one is running them anymore, opting for the no-debug repo, we may be better served by changing our strategy here.
My personal experience when running Rawhide on my main laptops is that, yeah, debug kernels are sufficiently slow that I will stick to booting one of the periodic non-debug builds unless I'm specifically running into a kernel bug and want to see if the debug kernel provides more information on it.