New subject: f10 x86_64 xen VM guests fail to boot on f8 host (guest setting NX bit in L1 PTE?)

Monday, 19 January 2009

Hi,

I've posted this same problem on the fedora-xen list, and the fedora
forums.  Sorry to anybody who is getting duplicates.

Additional log info is available at
http://forums.fedoraforum.org/showthread.php?p=1149972&posted=1#post1149972
It is also formatted a lot better and may be easier to follow.

------------------------------------------------------------------
I have two machines running fresh installs of f8 with Xen. Kernel
and all software versions are the same on both.
Specifically:
[root@machineA boot]# uname -a
Linux machineA 2.6.21.7-5.fc8xen #1 SMP Thu Aug 7 12:44:22 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@machineA boot]# virsh version
Compiled against library: libvir 0.4.4
Using library: libvir 0.4.4
Using API: Xen 3.0.1
Running hypervisor: Xen 3.1.0
------------------------------------------------------------------

And:
------------------------------------------------------------------
[root@machineB ~]# uname -a
Linux machineB 2.6.21.7-5.fc8xen #1 SMP Thu Aug 7 12:44:22 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@machineB ~]# virsh version
Compiled against library: libvir 0.4.4
Using library: libvir 0.4.4
Using API: Xen 3.0.1
Running hypervisor: Xen 3.1.0
------------------------------------------------------------------

MachineA has two AMD Opteron 275s. MachineB has four Intel(R) Xeon(TM)
CPU 2.80GHz processors.

Both machines are as up to date as possible.

I can boot or create x86_64 f10 guests on MachineA with no trouble
whatsoever.

MachineB will not boot/create x86_64 f10 guests.

The configuration files are created in the same manner, but as soon as
Xen tries to unpause the newly created domain, it crashes pretty much
instantly.

------------------------------------------------------------------
/var/log/xen/xend.log relevant output:
[2009-01-16 14:45:32 4120] DEBUG (DevController:150) Waiting for devices vtpm.
[2009-01-16 14:45:32 4120] INFO (XendDomain:1130) Domain f10testB (21) unpaused.
[2009-01-16 14:45:32 4120] WARNING (XendDomainInfo:1203) Domain has crashed: name=f10testB id=21.
[2009-01-16 14:45:32 4120] DEBUG (XendDomainInfo:1802) XendDomainInfo.destroy: domid=21
[2009-01-16 14:45:32 4120] DEBUG (XendDomainInfo:1821) XendDomainInfo.destroyDomain(21)
------------------------------------------------------------------

I've also tried moving a functional guest from MachineA to MachineB to
boot it there, with the same results. Guest will not boot on MachineB.

f8 64bit guests will boot on MachineB with no problems.
f10 32bit guests will boot on MachineB with no problems.

Only 64bit f10 guests seem to be borked.

Mark on the fedora-xen list suggested running xenctx on the crashed
domain. Output is as follows:
------------------------------------------------------------------
xenctx output:
/usr/lib64/xen/bin/xenctx -s System.map-2.6.27.5-117.fc10.x86_64 46
rip: ffffffff8100b8a2 set_page_prot+0x6d
rsp: ffffffff81573f08
rax: ffffffea   rbx: 000016e1   rcx: 00000055   rdx: 00000000
rsi: 800000014ffc6061   rdi: ffffffff816e1000   rbp: ffffffff81573f68
 r8: 0000000f    r9: ffffffff817eb450   r10: ffffffff817eb650   r11: 00000010
r12: ffffffff816e1000   r13: 800000014ffc6061   r14: 8000000000000161   r15: 00000016
 cs: 0000e033    ds: 00000000    fs: 00000000    gs: 00000000

Stack:
 0000000000000055 0000000000000010 ffffffff8100b8a2 000000010000e030
 0000000000010082 ffffffff81573f48 000000000000e02b ffffffff8100b89e
 0000000000000200 ffffffff816e4000 0000000000000800 0000000000002c00
 ffffffff81573ff8 ffffffff815a3c60 0000000000002c00 0000000000000000

Code:
7b 4a 1d 00 4c 89 e7 4c 89 ee 31 d2 e8 22 d9 ff ff 85 c0 74 04 <0f> 0b eb fe 5b 41 5c 41 5d 41 5e

Call Trace:
  [<ffffffff8100b8a2>] set_page_prot+0x6d <--
  [<ffffffff8100b8a2>] set_page_prot+0x6d
  [<ffffffff8100b89e>] set_page_prot+0x69
  [<ffffffff815a3c60>] xen_start_kernel+0x5dd
------------------------------------------------------------------
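
The register values above can be decoded by hand. What follows is a minimal sketch in Python, assuming the standard x86_64 page-table entry bit layout (bit 0 present, bit 1 writable, bit 63 NX), assuming that rsi/r13 hold the PTE being written, and assuming that rax holds a negative errno returned as a 32-bit value. None of these assumptions is confirmed by the xenctx output itself, and the helper names decode_pte and as_errno are made up for illustration.

```python
import errno

# x86_64 flag bits for a 4 KiB page-table entry (architecture-defined).
PTE_FLAGS = {
    0: "PRESENT",
    1: "RW",
    2: "USER",
    3: "PWT",
    4: "PCD",
    5: "ACCESSED",
    6: "DIRTY",
    7: "PAT",
    8: "GLOBAL",
    63: "NX",
}


def decode_pte(value):
    """Return the names of the flag bits set in a 64-bit PTE value."""
    return [name for bit, name in sorted(PTE_FLAGS.items())
            if (value >> bit) & 1]


def as_errno(reg32):
    """Treat a 32-bit register value as a signed int and, if it is
    negative, return the matching errno name (assumption: the value
    is a -errno return code)."""
    signed = reg32 - (1 << 32) if reg32 & 0x80000000 else reg32
    if signed >= 0:
        return None
    return errno.errorcode.get(-signed, "unknown")


if __name__ == "__main__":
    print("rsi:", decode_pte(0x800000014ffc6061))
    print("r14:", decode_pte(0x8000000000000161))
    print("rax:", as_errno(0xffffffea))
```

If the assumptions hold, the value in rsi decodes to a present, read-only entry with the NX bit set, and rax decodes to -EINVAL. The marked bytes <0f> 0b in the Code line are the x86 ud2 instruction, which would fit an invalid opcode trap.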

I also finally figured out you can look at the Xen dmesg, which includes
the following line:
(XEN) traps.c:405:d44 Unhandled invalid opcode fault/trap [#6] in domain 46 on VCPU 0 [ec=0000]

The domain does install, so the following bug does not seem to be the
cause of the current issue:
http://fedoraproject.org/wiki/Bugs/F10Common#Installing_Fedora_10_DomU_on_Fedora_8_Dom0_Fails

Any information / help / insight as to why this is happening would be
very much appreciated. The machines are pretty similar, and since the
guests are paravirtualized it does not really make sense for the
processors to be the cause of the problem.
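
One thing that could be compared between the two hosts is whether each one advertises the nx CPU flag. The sketch below is only an illustration: it assumes that /proc/cpuinfo as seen from dom0 reflects what the hypervisor actually uses, which may not be true under Xen, and the helper name cpu_has_flag is hypothetical.

```python
def cpu_has_flag(cpuinfo_text, flag):
    """Return True if any 'flags' line in cpuinfo-style text lists flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("nx advertised:", cpu_has_flag(f.read(), "nx"))
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```

Running it on both MachineA and MachineB and comparing the results would show whether the hosts differ on this point. A difference would be a hint to follow up on, not proof of the cause.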

Thanks,
jon
