We have seen a significant reduction in performance in our research DomU OS kernel when running on Fedora 16 with Linux 3.4.2 vs. 3.3.7. We run a series of benchmarks which are DomU-kernel-space-CPU-heavy; many of these run 10x slower when using the 3.4.2 Linux kernel as Dom0.
This is a little surprising---we've been tracking the Fedora kernels for a long time with no problem like this. Did anyone else notice any changes?
On Mon, 2 Jul 2012, W. Michael Petullo wrote:
We have seen a significant reduction in performance in our research DomU OS kernel when running on Fedora 16 with Linux 3.4.2 vs. 3.3.7. We run a series of benchmarks which are DomU-kernel-space-CPU-heavy; many of these run 10x slower when using the 3.4.2 Linux kernel as Dom0.
This is a little surprising---we've been tracking the Fedora kernels for a long time with no problem like this. Did anyone else notice any changes?
I have noticed a slow boot on one of my systems, and it has been known to lose its connection to the keyboard and mouse when a guest is started. Given the previous issues with this system, I suspect IRQ problems in the kernel's interaction with Xen, though I haven't investigated further.
Michael Young
On Mon, Jul 02, 2012 at 04:50:42PM -0500, W. Michael Petullo wrote:
We have seen a significant reduction in performance in our research DomU OS kernel when running on Fedora 16 with Linux 3.4.2 vs. 3.3.7. We run a series of benchmarks which are DomU-kernel-space-CPU-heavy; many of these run 10x slower when using the 3.4.2 Linux kernel as Dom0.
This is a little surprising---we've been tracking the Fedora kernels for a long time with no problem like this. Did anyone else notice any changes?
Just to verify: are both the 3.3.7 and 3.4.2 Linux kernels 'release' builds, and not debug versions from rawhide?
-- Pasi
We have seen a significant reduction in performance in our research DomU OS kernel when running on Fedora 16 with Linux 3.4.2 vs. 3.3.7. We run
a series of benchmarks which are DomU-kernel-space-CPU-heavy; many of
these run 10x slower when using the 3.4.2 Linux kernel as Dom0.
This is a little surprising---we've been tracking the Fedora kernels for a long time with no problem like this. Did anyone else notice any changes?
Just to verify.. both the 3.3.7 and 3.4.2 Linux kernel are 'release' builds? and not debug-versions from rawhide?
Yes, they are the Fedora 16 release builds.
On Tue, Jul 03, 2012 at 07:16:04AM -0500, W. Michael Petullo wrote:
We have seen a significant reduction in performance in our research DomU OS kernel when running on Fedora 16 with Linux 3.4.2 vs. 3.3.7. We run
a series of benchmarks which are DomU-kernel-space-CPU-heavy; many of
these run 10x slower when using the 3.4.2 Linux kernel as Dom0.
This is a little surprising---we've been tracking the Fedora kernels for a long time with no problem like this. Did anyone else notice any changes?
Just to verify.. both the 3.3.7 and 3.4.2 Linux kernel are 'release' builds? and not debug-versions from rawhide?
Yes, they are the Fedora 16 release builds.
The commits that went in (3.4) were:
1fd1443 xen/Kconfig: fix Kconfig layout
76a8df7 xen/pci: don't use PCI BIOS service for configuration space accesses
b7e5ffe xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEs
558daa2 xen/apic: Return the APIC ID (and version) for CPU 0.
a7a97c6 drivers/video/xen-fbfront.c: add missing cleanup code
7eb7ce4 xen: correctly check for pending events when restoring irq flags
b930fe5 xen/acpi: Workaround broken BIOSes exporting non-existing C-states.
cf405ae xen/smp: Fix crash when booting with ACPI hotplug CPUs.
521394e xen: use the pirq number to check the pirq_eoi_map
df88b2d xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded.
cd74257 x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler
2a14e54 ACPI: Convert wake_sleep_flags to a value instead of function
3d81acb Revert "xen/p2m: m2p_find_override: use list_for_each_entry_safe"
186bab1 xen/resume: Fix compile warnings.
3066616 xen/xenbus: Add quirk to deal with misconfigured backends.
a71e23d xen/blkback: Fix warning error.
b960d6c xen/p2m: m2p_find_override: use list_for_each_entry_safe
e8e937b xen/gntdev: do not set VM_PFNMAP
6b5e7d9 xen/grant-table: add error-handling code on failure of gnttab_resume
f09d843 xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success
0ee46ec xen/pciback: fix XEN_PCI_OP_enable_msix result
e8c9e78 xen/smp: Remove unnecessary call to smp_processor_id()
2531d64 xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'
e95ae5a xen: only check xen_platform_pci_unplug if hvm
136d249 x86/ioapic: Add io_apic_ops driver layer to allow interception
3389bb8 xen/blkback: Make optional features be really optional.
4dae767 xen/blkback: Squash the discard support for 'file' and 'phy' type.
df7a3ee xen/acpi: Fix Kconfig dependency on CPU_FREQ
f132c5b Fix full_name_hash() behaviour when length is a multiple of 8
b9136d2 xen: initialize platform-pci even if xen_emul_unplug=never
106b443 xen/smp: Fix bringup bug in AP code.
27257fc xen/acpi: Remove the WARN's as they just create noise.
8e6f7c2 xen/tmem: cleanup
9846ff1 xen: support pirq_eoi_map
102b208 xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
48cdd82 xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
448c8b1 provide disable_cpufreq() function to disable the API.
3467811 xen-blkfront: make blkif_io_lock spinlock per-device
dad5cf6 xen/blkfront: don't put bdev right after getting it
34ae2e4 xen-blkfront: use bitmap_set() and bitmap_clear()
b2167ba xen/blkback: Enable blkback on HVM guests
4f14faa xen/blkback: use grant-table.c hypercall wrappers
4bc25af xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
a7b422c provide disable_cpufreq() function to disable the API.
59a5680 xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.
ead1d01 xen: constify all instances of "struct attribute_group"
42c46e6 xen/xenbus: ignore console/0
cf8e019 hvc_xen: introduce HVC_XEN_FRONTEND
02e19f9 hvc_xen: implement multiconsole support
eb5ef07 hvc_xen: support PV on HVM consoles
bd0d5aa xenbus: don't free other end details too early
a1f37788 tboot: Add return values for tboot_sleep
09f98a8 x86, acpi, tboot: Have a ACPI os prepare sleep instead of calling tboot_sleep.
73c154c xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
cc7335b xen/setup/pm/acpi: Remove the call to boot_option_idle_override.
5ac0800 xenbus: address compiler warnings
1160831 xen/pciback: Support pci_reset_function, aka FLR or D3 support.
6fbf9e7 PCI: Introduce __pci_reset_function_locked to be used when holding device_lock.
cf66f9d xen/netfront: add netconsole support.
f3ff924 Remove useless get_driver()/put_driver() calls
2113f46 xen: use this_cpu_xxx replace percpu_xxx funcs
cd9db80 xen/pciback: Support pci_reset_function, aka FLR or D3 support.
a96d627 pci: Introduce __pci_reset_function_locked to be used when holding device_lock.
8605c68 xen: Utilize the restore_msi_irqs hook.
So one thing that you might be hitting is that the CPU freq driver is now uploading its data to the hypervisor, so the hypervisor might be doing power-save stuff instead of concentrating on giving you raw performance.
So can you start with 'cpufreq=verbose,performance' on your hypervisor line?
Besides that..there is:
9846ff1 xen: support pirq_eoi_map
(and its fix)
521394e xen: use the pirq number to check the pirq_eoi_map
See if reverting those makes issues go away.
-- Mike
:wq
xen mailing list xen@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/xen
On Tue, 3 Jul 2012, Konrad Rzeszutek Wilk wrote:
Besides that..there is:
9846ff1 xen: support pirq_eoi_map (and its fix) 521394e xen: use the pirq number to check the pirq_eoi_map
In my case it is definitely IRQ related. From xl dmesg I get
(XEN) physdev.c:164: dom0: wrong map_pirq type 3
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
(XEN) traps.c:2488:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc90005193030.
(XEN) traps.c:2488:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000519b030.
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
(XEN) traps.c:2488:d1 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc900001a9030.
(XEN) do_IRQ: 1.40 No irq handler for vector (irq -1)
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
Michael Young
On Thu, Jul 05, 2012 at 11:00:05PM +0100, M A Young wrote:
On Tue, 3 Jul 2012, Konrad Rzeszutek Wilk wrote:
Besides that..there is:
9846ff1 xen: support pirq_eoi_map (and its fix) 521394e xen: use the pirq number to check the pirq_eoi_map
In my case it is definitely IRQ related. From xl dmesg I get
If you revert those two, are those issues still present?
(XEN) physdev.c:164: dom0: wrong map_pirq type 3
This one is due to a different commit: a fix that went into 3.3 (allowing PCI domains to work).
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
So vector 240...or 0xf0. That I am not sure about.
(XEN) traps.c:2488:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc90005193030.
(XEN) traps.c:2488:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000519b030.
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
(XEN) traps.c:2488:d1 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc900001a9030.
(XEN) do_IRQ: 1.40 No irq handler for vector (irq -1)
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
Michael Young
On Thu, 5 Jul 2012, Konrad Rzeszutek Wilk wrote:
On Thu, Jul 05, 2012 at 11:00:05PM +0100, M A Young wrote:
On Tue, 3 Jul 2012, Konrad Rzeszutek Wilk wrote:
Besides that..there is:
9846ff1 xen: support pirq_eoi_map (and its fix) 521394e xen: use the pirq number to check the pirq_eoi_map
In my case it is definitely IRQ related. From xl dmesg I get
If you revert those two, are those issues still present?
(XEN) physdev.c:164: dom0: wrong map_pirq type 3
This one is due to a different commit: a fix that went into 3.3 (allowing PCI domains to work).
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
So vector 240...or 0xf0. That I am not sure about.
(XEN) traps.c:2488:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc90005193030.
(XEN) traps.c:2488:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000519b030.
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
(XEN) traps.c:2488:d1 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc900001a9030.
(XEN) do_IRQ: 1.40 No irq handler for vector (irq -1)
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
Vectors 240 and 40 should be IRQ 0 and IRQ 1 respectively:
(XEN) IRQ: 0 affinity:00000000,00000000,00000000,00000001 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
(XEN) IRQ: 1 affinity:00000000,00000000,00000000,00000001 vec:28 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0: 1(----),
(XEN) IO-APIC interrupt information:
(XEN) IRQ 0 Vec240:
(XEN) Apic 0x00, Pin 2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN) IRQ 1 Vec 40:
(XEN) Apic 0x00, Pin 1: vec=28 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
Incidentally, I have had IRQ problems with this computer before; e.g., see http://lists.xen.org/archives/html/xen-devel/2010-08/msg01390.html, though that turned out to be on the Xen side.
Michael Young
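The matching of unhandled vectors to IRQs discussed above can be checked mechanically. Below is a minimal Python sketch, not anything posted in the thread: the sample lines are copied from the xl dmesg output quoted earlier, and the helper names (`vector_to_irq`, `unhandled_vectors`) are made up for illustration. It assumes that the number after the dot in "do_IRQ: 1.240" is the vector in decimal, and that "vec:" in the IRQ dump is the same vector in hex.

```python
import re
from collections import Counter

# Sample lines copied from the xl dmesg output quoted in this thread.
LOG = """\
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
(XEN) do_IRQ: 1.40 No irq handler for vector (irq -1)
(XEN) do_IRQ: 1.240 No irq handler for vector (irq -1)
(XEN) IRQ: 0 affinity:00000000,00000000,00000000,00000001 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
(XEN) IRQ: 1 affinity:00000000,00000000,00000000,00000001 vec:28 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0: 1(----),
"""

# Assumption: "do_IRQ: A.B" means "<cpu>.<vector>", with the vector in decimal.
UNHANDLED = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler")
# IRQ dump lines carry the IRQ number and the vector in hex after "vec:".
IRQ_DUMP = re.compile(r"\(XEN\) IRQ:\s*(\d+) .*?vec:([0-9a-f]+)")


def vector_to_irq(log):
    """Map each vector (as an int) to the IRQ number the dump assigns it."""
    return {int(m.group(2), 16): int(m.group(1)) for m in IRQ_DUMP.finditer(log)}


def unhandled_vectors(log):
    """Count how often each vector shows up in 'No irq handler' messages."""
    return Counter(int(m.group(2)) for m in UNHANDLED.finditer(log))


if __name__ == "__main__":
    irq_of = vector_to_irq(LOG)
    for vec, count in sorted(unhandled_vectors(LOG).items()):
        irq = irq_of.get(vec, "?")
        print(f"vector {vec} (0x{vec:02x}): IRQ {irq}, unhandled {count}x")
```

Feeding it a full `xl dmesg` capture instead of the sample string would show whether any unhandled vector falls outside the IRQs listed in the dump.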
We have seen a significant reduction in performance in our research DomU OS kernel when running on Fedora 16 with Linux 3.4.2 vs. 3.3.7. We run
a series of benchmarks which are DomU-kernel-space-CPU-heavy; many of
these run 10x slower when using the 3.4.2 Linux kernel as Dom0.
This is a little surprising---we've been tracking the Fedora kernels for a long time with no problem like this. Did anyone else notice any changes?
Just to verify.. both the 3.3.7 and 3.4.2 Linux kernel are 'release' builds? and not debug-versions from rawhide?
Yes, they are the Fedora 16 release builds.
I'm having trouble with the Dom0 performance of this kernel too (vs. its effect on DomU performance). I submitted a bug with a reproducible test case:
https://bugzilla.redhat.com/show_bug.cgi?id=841330
On Wed, Jul 18, 2012 at 12:36:43PM -0500, W. Michael Petullo wrote:
We have seen a significant reduction in performance in our research DomU OS kernel when running on Fedora 16 with Linux 3.4.2 vs. 3.3.7. We run
a series of benchmarks which are DomU-kernel-space-CPU-heavy; many of
these run 10x slower when using the 3.4.2 Linux kernel as Dom0.
This is a little surprising---we've been tracking the Fedora kernels for a long time with no problem like this. Did anyone else notice any changes?
Just to verify.. both the 3.3.7 and 3.4.2 Linux kernel are 'release' builds? and not debug-versions from rawhide?
Yes, they are the Fedora 16 release builds.
I'm having trouble with the Dom0 performance of this kernel too (vs. its effect on DomU performance). I submitted a bug with a reproducible test case:
OK, did you try any of the options I've asked for (see earlier part of the thread)?
On Wed, Jul 18, 2012 at 02:50:14PM -0400, Konrad Rzeszutek Wilk wrote:
On Wed, Jul 18, 2012 at 12:36:43PM -0500, W. Michael Petullo wrote:
We have seen a significant reduction in performance in our research DomU OS kernel when running on Fedora 16 with Linux 3.4.2 vs. 3.3.7. We run
a series of benchmarks which are DomU-kernel-space-CPU-heavy; many of
these run 10x slower when using the 3.4.2 Linux kernel as Dom0.
This is a little surprising---we've been tracking the Fedora kernels for a long time with no problem like this. Did anyone else notice any changes?
Just to verify.. both the 3.3.7 and 3.4.2 Linux kernel are 'release' builds? and not debug-versions from rawhide?
Yes, they are the Fedora 16 release builds.
I'm having trouble with the Dom0 performance of this kernel too (vs. its effect on DomU performance). I submitted a bug with a reproducible test case:
OK, did you try any of the options I've asked for (see earlier part of the thread)?
And you are running an AMD CPU. I think you are hitting a Xen bug related to power management. In 3.4 the driver that uploads power management data to the hypervisor is now enabled, which means it exposes some power management functionality in the hypervisor that v3.3 did not activate. Hence I want you to try running the hypervisor in performance mode to double-check.
On Tue, Jul 03, 2012 at 08:59:34AM -0400, Konrad Rzeszutek Wilk wrote:
On Tue, Jul 03, 2012 at 07:16:04AM -0500, W. Michael Petullo wrote:
We have seen a significant reduction in performance in our research DomU OS kernel when running on Fedora 16 with Linux 3.4.2 vs. 3.3.7. We run
a series of benchmarks which are DomU-kernel-space-CPU-heavy; many of
these run 10x slower when using the 3.4.2 Linux kernel as Dom0.
This is a little surprising---we've been tracking the Fedora kernels for a long time with no problem like this. Did anyone else notice any changes?
Just to verify.. both the 3.3.7 and 3.4.2 Linux kernel are 'release' builds? and not debug-versions from rawhide?
Yes, they are the Fedora 16 release builds.
The commits that went in (3.4) were:
...
So one thing that you might be hitting is that now the CPU freq driver is uploading the data to the hypervisor - the hypervisor might be doing power-save stuff instead of concentrating on giving your raw performance.
So can you start with 'cpufreq=verbose,performance' on your hypervisor line.
Michael opened a bug, and on it we found that booting with xen-acpi-processor.off=1 solves the performance problem. What that does is stop the C-state and P-state information from being uploaded to the hypervisor. So I pulled up an AMD box and found that the problem occurs only if the hypervisor enters C2 states. If I run 'xenpm set-max-cstate 1', it gets back to working nicely.
Wei, any ideas? This is with Xen 4.1.