Host lockup
OS is:
Linux seanl64.xxxxxx.com 2.6.32.14-1.2.107.xendom0.fc12.x86_64 #1 SMP Wed Jun 16 19:26:35 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
xen-4.0.1-0.1.rc3.fc12.x86_64
The livelock appears to be related to the network driver interrupt (source:
scanning the newsgroups). The network LED is flashing continuously, but the
console is unresponsive and we can no longer SSH into the box.
Could this be related to interrupt priorities? (I don't know the hardware ins
and outs of that.)
We have a serial capture of the Xen output!
Here is the suspicious piece:
(XEN) MCE: MSR 417 is not MCA MSR
(XEN) traps.c:2854: GPF (0000): ffff82c4801ae806 -> ffff82c4801f7cb3
(XEN) vlapic.c:702:d13 Local APIC Write to read-only register 0x30
(XEN) vlapic.c:702:d13 Local APIC Write to read-only register 0x20
(XEN) vlapic.c:702:d13 Local APIC Write to read-only register 0x20
(XEN) irq.c:243: Dom13 PCI link 0 changed 5 -> 0
(XEN) irq.c:243: Dom13 PCI link 1 changed 10 -> 0
(XEN) irq.c:243: Dom13 PCI link 2 changed 11 -> 0
(XEN) irq.c:243: Dom13 PCI link 3 changed 5 -> 0
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 1683748570 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 1683749071 seconds
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 86402 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 91586 seconds
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 86408 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 92211 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 86456 seconds
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 94532 seconds
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 86402 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 94742 seconds
Note the "skipping 1683748570 seconds", which is more than 53 years.
But we haven't had the computer switched on for that long :-)
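To sanity-check the "more than 53 years" figure and compare it with the other skips, here is a small Python sketch that pulls the skip values out of the serial log and converts them to days and years. It assumes the log line format shown above and a 365.25-day year; the function and constant names are illustrative only.

```python
import re

# Matches the HVM RTC lines in the Xen serial log, e.g.
# "(XEN) rtc.c:296: HVM RTC: dom 2 skipping 86402 seconds"
RTC_SKIP = re.compile(r"HVM RTC: dom (\d+) skipping (\d+) seconds")

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # average (Julian) year


def rtc_skips(log_lines):
    """Return (domain, seconds, days, years) for every RTC skip message."""
    results = []
    for line in log_lines:
        match = RTC_SKIP.search(line)
        if match:
            dom, secs = int(match.group(1)), int(match.group(2))
            results.append((dom, secs, secs / SECONDS_PER_DAY, secs / SECONDS_PER_YEAR))
    return results


if __name__ == "__main__":
    log = [
        "(XEN) rtc.c:296: HVM RTC: dom 2 skipping 1683748570 seconds",
        "(XEN) rtc.c:296: HVM RTC: dom 2 skipping 86402 seconds",
        "(XEN) vlapic.c:702:d13 Local APIC Write to read-only register 0x20",
    ]
    # Non-RTC lines such as the vlapic message are ignored.
    for dom, secs, days, years in rtc_skips(log):
        print(f"dom {dom}: {secs} s = {days:.2f} days = {years:.2f} years")
```

The first skip works out to about 19,488 days, or roughly 53.35 years, while most of the later skips are close to one day (86,400 s).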
Cheers
V
On Thu, 1 Jul 2010 09:25:48 am Virgil wrote:
Update:
Completely stable with 7 VMs. Hasn't missed a beat for several days now.
Will now run the pings again in 2 of the 64PV machines from the virtual
consoles (no graphics, and disconnected) and see if the host clock tick dies
again.
Cheers
V
On Thu, 24 Jun 2010 03:17:51 pm Virgil wrote:
> Quick update:
>
> Added 3 more VMs. Total of 7 now on this "desktop" computer.
>
> 3x32PCFC6
> 1x32HVFC12
> 1x32HVwinXP-pro
> 2x64PVFC12
>
> Host is maxed out now.
>
> Everything is going well as long as the pings are not running in the
> 64PVFC12 machines.
>
> Will leave it going for another couple of days.
>
> Cheers
> V
>
> P.S. FYI, the Samba4-Alpha12 Active Directory controller is working well
> on 64PVFC12.
>
> On Wed, 23 Jun 2010 05:18:56 pm Virgil wrote:
> > On Wed, 23 Jun 2010 03:30:52 pm Pasi Kärkkäinen wrote:
> > > On Wed, Jun 23, 2010 at 11:11:58AM +1000, Virgil wrote:
> > > > Hi Pasi,
> > > >
> > > > Had a hiccup overnight:
> > > >
> > > > The host became unresponsive in a weird way. The time stopped
> > > > incrementing.
> > > >
> > > > Turns out the clock stopped ticking (which I put down to the
> > > > interrupts being disconnected).
> > > >
> > > > Anyway, I decided to reset the time using 'date -s 10:41:30'.
> > > >
> > > > Kaboom, or actually deathly silence. The machine fully stopped dead
> > > > in its tracks.
> > > >
> > > > Just prior to this, I connected to the console of one of the 64PV
> > > > machines, which had been running a ping since yesterday. Anyway,
> > > > 60,000 or so lines of ping output went to the console, zipping up
> > > > the screen. Then it was dead. I pressed CTRL-C and eventually it
> > > > returned to the prompt.
> > > >
> > > > So I looked at the other 64PV machine, which was also pinging, and
> > > > found an identical situation.
> > > >
> > > > So I reckon there's some kind of buffer overflow going on when
> > > > you're not connected with "xm console MACHINE". Once you pass 60,000
> > > > lines of text, this buffer overflow somehow causes the RTC to hang.
> > >
> > > Do you have xenconsoled running?
> > >
> > > I've noticed that PV guests that print a lot to the console will stall
> > > if xenconsoled is not running. xenconsoled needs to clear the guest
> > > console buffer.
> > >
> > > -- Pasi
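Pasi's explanation of the stall can be pictured with a toy model: a fixed-size console ring that the guest writes into and a daemon such as xenconsoled drains. This is a simplified Python sketch for illustration only; the class, its size and its method names are hypothetical and do not mirror Xen's actual console ring code. It shows the one property that matters here: with no consumer, the ring fills and further writes are refused, so in this model a guest printing tens of thousands of ping lines would eventually be unable to make progress.

```python
class ConsoleRing:
    """Toy model of a guest console output ring (hypothetical, not Xen's code).

    The guest is the producer and a daemon such as xenconsoled is the
    consumer. Indices are free-running counters; the buffer position is
    the counter modulo the ring size.
    """

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.prod = 0  # total bytes written by the guest
        self.cons = 0  # total bytes drained by the daemon

    def free_space(self):
        return self.size - (self.prod - self.cons)

    def write(self, data):
        """Guest side: copy as much of data as fits, return bytes accepted.

        A return value smaller than len(data) means the guest has to wait.
        """
        n = min(len(data), self.free_space())
        for i in range(n):
            self.buf[(self.prod + i) % self.size] = data[i]
        self.prod += n
        return n

    def drain(self):
        """Daemon side: read everything pending and free the space."""
        pending = self.prod - self.cons
        out = bytes(self.buf[(self.cons + i) % self.size] for i in range(pending))
        self.cons = self.prod
        return out


if __name__ == "__main__":
    ring = ConsoleRing(64)
    line = b"64 bytes from 10.0.0.1: icmp_seq=1 ttl=64\n"
    accepted = [ring.write(line) for _ in range(3)]
    print("without a consumer:", accepted)  # later writes are cut short, then refused
    ring.drain()                             # the role xenconsoled plays
    print("after draining:", ring.write(line))
```

The free-running counters keep "full" and "empty" distinguishable without wasting a slot, which is a common choice for shared-memory rings.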
> >
> > It seems to be running now. I'm pretty sure it was then too.
> >
> > udev-post 0:off 1:on 2:on 3:on 4:on 5:on 6:off
> > wpa_supplicant 0:off 1:off 2:off 3:off 4:off 5:off 6:off
> > xenconsoled 0:off 1:off 2:off 3:on 4:on 5:on 6:off
> > xend 0:off 1:off 2:off 3:on 4:on 5:on 6:off
> > xendomains 0:off 1:off 2:off 3:on 4:on 5:on 6:off
> > xenstored 0:off 1:off 2:off 3:on 4:on 5:on 6:off
> > ypbind 0:off 1:off 2:off 3:off 4:off 5:off 6:off
> > [root@seanl64 ~]# ps -ef | grep xenconso
> > root 1508 1 0 10:19 ? 00:00:00 /usr/sbin/xenconsoled --log=none --log-dir=/var/log/xen/console
> > root 7815 7732 0 17:07 pts/5 00:00:00 grep xenconso
> >
> > Cheers
> > V
> >
> > > > I pressed the reset button, but this time the two 64PV machines are
> > > > not logged in. I'll just let it run and see if it keeps going.
> > > >
> > > > Cheers
> > > > V
> > > >
> > > > On Tue, 22 Jun 2010 04:29:06 pm Pasi Kärkkäinen wrote:
> > > > > On Tue, Jun 22, 2010 at 12:03:53PM +1000, Virgil wrote:
> > > > > > Hi Pasi,
> > > > > >
> > > > > > On Mon, 21 Jun 2010 08:57:55 pm Pasi Kärkkäinen wrote:
> > > > > > > On Mon, Jun 21, 2010 at 01:56:36PM +0300, Pasi Kärkkäinen wrote:
> > > > > > > > On Mon, Jun 21, 2010 at 02:28:15PM +1000, Virgil wrote:
> > > > > > > > > Another quick update....
> > > > > > > > >
> > > > > > > > > Just compiled xen-4.0.1-0.1.rc3.fc13.src.rpm under fc12.
> > > > > > > > >
> > > > > > > > > Identical results with this too (i.e. it's probably in the
> > > > > > > > > kernel).
> > > > > > > > >
> > > > > > > > > I have a (silly) idea for the serial console. The wiki page
> > > > > > > > > recommends using a phone camera to capture the screen...
> > > > > > > > >
> > > > > > > > > Well, my idea is to add an n-millisecond delay every time the
> > > > > > > > > output stream in Xen sees a \n. This would slow the screen
> > > > > > > > > updates down enough for the camera to capture them. The n
> > > > > > > > > should be configurable on the kernel boot command line; it's
> > > > > > > > > effectively 0 right now.
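The per-newline delay idea can be sketched as follows. This is a hypothetical Python illustration of the concept, not Xen code (Xen's console output path is written in C), and the function and parameter names are made up; it does not imply that any such Xen boot option exists. A delay of 0 writes the text with no pauses, matching the "effectively 0" starting point.

```python
import sys
import time


def write_with_line_delay(text, delay_ms, out=sys.stdout, sleep=time.sleep):
    """Write text, pausing delay_ms milliseconds after each newline.

    Hypothetical illustration of the proposed console throttle; delay_ms=0
    writes the text with no pauses at all. `sleep` is injectable so the
    behaviour can be tested without actually waiting.
    """
    for ch in text:
        out.write(ch)
        if ch == "\n" and delay_ms > 0:
            out.flush()               # make the finished line visible first
            sleep(delay_ms / 1000.0)  # then hold it on screen for the camera


if __name__ == "__main__":
    write_with_line_delay("(XEN) line one\n(XEN) line two\n", delay_ms=250)
```

Pausing only on newlines keeps each complete line on screen for the full delay, rather than slowing every character.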
> > > > > > > >
> > > > > > > > Yeah, we really need to get a log somehow to troubleshoot your
> > > > > > > > problem.
> > > > > > > >
> > > > > > > > A serial console log would be best:
> > > > > > > >
> > > > > > > > http://wiki.xensource.com/xenwiki/XenSerialConsole
> > > > > > >
> > > > > > > Btw, are you running the latest kernel?
> > > > > > >
> > > > > > > http://koji.fedoraproject.org/koji/taskinfo?taskID=2254110
> > > > > > >
> > > > > > > Or are you running a custom/self-compiled kernel?
> > > > > >
> > > > > > Everything is working with:
> > > > > > xen-4.0.1-0.1.rc3 compiled from source on an fc12 machine and
> > > > > > 2.6.32.14-1.2.107.xendom0.fc12.x86_64 from the myoung repo.
> > > > > >
> > > > > > All fixed.
> > > > >
> > > > > Good to hear it works!
> > > > >
> > > > > > We also now have a "null modem" cable to another old computer
> > > > > > with a COM port. Turns out I was the only old man who could
> > > > > > remember what a null modem cable is. The young guy said "wtf?"
> > > > > > It also turns out I'm the only one who knows what minicom is and
> > > > > > what 8N1 means.
> > > > > >
> > > > > > :-)
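For anyone as puzzled as the young guy: 8N1 means 8 data bits, no parity bit and 1 stop bit, and each character on an asynchronous serial line is also preceded by a start bit, so 10 bits go on the wire per character. The sketch below computes the resulting character rate, which bounds how fast log text can come out of the serial console. The 115200 baud in the example is an assumption, since the thread does not say what speed the null modem link runs at; the function name is illustrative.

```python
def serial_chars_per_second(baud, data_bits=8, parity=False, stop_bits=1):
    """Characters per second on an asynchronous serial line.

    Every character is framed by one start bit, the data bits, an optional
    parity bit and the stop bit(s). 8N1 = 8 data bits, no parity, 1 stop bit.
    """
    bits_per_char = 1 + data_bits + (1 if parity else 0) + stop_bits
    return baud / bits_per_char


if __name__ == "__main__":
    # 115200 baud is an assumed speed; the thread does not state the rate used.
    print(serial_chars_per_second(115200))  # 8N1 -> 10 bits per character
```

At the assumed 115200 baud, 8N1 gives 11,520 characters per second.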
> > > > >
> > > > > Hehe, yeah, I guess young people don't get to play with serial
> > > > > consoles nowadays, until they're doing networking stuff.
> > > > >
> > > > > So I guess most SOL devices in servers go unused. :)
> > > > >
> > > > > -- Pasi
> > > > >
> > > > > > All VMs are now running concurrently.
> > > > > >
> > > > > > Very happy again. Thanks.
> > > > > > V
> > > > > >
> > > > > > > -- Pasi
> > > > > > >
> > > > > > > > > Cheers
> > > > > > > > > V
> > > > > > > > >
> > > > > > > > > On Mon, 21 Jun 2010 12:10:17 pm Virgil wrote:
> > > > > > > > > > Just a quick update:
> > > > > > > > > >
> > > > > > > > > > Just tried xen-4.0.0-2, recompiled from source on
> > > > > > > > > > fc12.x86_64.
> > > > > > > > > >
> > > > > > > > > > Identical behaviour.
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > > V
> > > > > > > > > >
> > > > > > > > > > On Fri, 18 Jun 2010 03:17:19 pm Virgil wrote:
> > > > > > > > > > > On Sat, 29 May 2010 11:26:50 pm M A Young wrote:
> > > > > > > > > > > > If anyone wants to test xen 3.4.3, I have put up a
> > > > > > > > > > > > source RPM at
> > > > > > > > > > > >
> > > > > > > > > > > > http://myoung.fedorapeople.org/dom0/src/xen-3.4.3-0.91.fc13.src.rpm
> > > > > > > > > > > >
> > > > > > > > > > > > Michael Young
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > xen mailing list
> > > > > > > > > > > > xen(a)lists.fedoraproject.org
> > > > > > > > > > > >
> > > > > > > > > > > > https://admin.fedoraproject.org/mailman/listinfo/xen
> > > > > > > > > > >
> > > > > > > > > > > Hi list,
> > > > > > > > > > >
> > > > > > > > > > > The host crashes on the 64FC12 -105 dom0 kernel when two
> > > > > > > > > > > PV64 machines are running.
> > > > > > > > > > >
> > > > > > > > > > > I can run HV32WinXP, HV32FC12 and one PV64FC12, all at the
> > > > > > > > > > > same time.
> > > > > > > > > > >
> > > > > > > > > > > However, whenever a combination involves two PV64FC12
> > > > > > > > > > > machines (kernel version doesn't matter), the host crashes.
> > > > > > > > > > >
> > > > > > > > > > > Running on the -97 dom0 kernel, everything works in all
> > > > > > > > > > > combinations.
> > > > > > > > > > >
> > > > > > > > > > > Using Xen 3.4.3.
> > > > > > > > > > >
> > > > > > > > > > > Turning off the virtual network cards in the PV64FC12
> > > > > > > > > > > machines makes things work (obviously not much use,
> > > > > > > > > > > though).
> > > > > > > > > > >
> > > > > > > > > > > Tried disabling IPv6, the firewall, etc.
> > > > > > > > > > >
> > > > > > > > > > > Sometimes it would fire up and run, but whichever machine
> > > > > > > > > > > is started second gets really long ping times, as if it's
> > > > > > > > > > > not receiving unless it sends something (if that makes
> > > > > > > > > > > sense). Sooner or later the host crashes.
> > > > > > > > > > >
> > > > > > > > > > > Strangely, a PV64FC12 and a PV64FC10 machine coexist
> > > > > > > > > > > happily. It only happens when a second PV64FC12 machine
> > > > > > > > > > > starts up.
> > > > > > > > > > >
> > > > > > > > > > > V
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> >
>