PROBLEM alert - Host fas03 is DOWN
jcm at redhat.com
Sat Sep 11 17:12:36 UTC 2010
On Sat, 2010-09-11 at 11:40 -0500, Mike McGrath wrote:
> On Sat, 11 Sep 2010, Jon Masters wrote:
> > > The code in block/blk-core:338 contains an explicit check to ensure that
> > > interrupts have been disabled, but this is not true since blkif_interrupt
> > > is not registered with IRQF_DISABLED set at the time of the setup in
> > > bind_evtchn_to_irqhandler. Thus it might be that interrupts are still on
> > > when we get to kick_pending_request_queues. Does this always happen?
> > >
> > > This perhaps happened because upstream removed IRQF_DISABLED and now
> > > runs with interrupts disabled in handle_IRQ_event, so Xen won't see
> > > this. But on 2.6.32 this change had not yet happened. It's also 2:50am
> > > and I might be reading this wrong, but I at least suggest you open a
> > > RHEL6 bug and try a more recent kernel build.
> > Ah, of course I shouldn't email before bed. There's an obvious giant
> > spin_lock_irqsave/restore there, but as noted on xen-devel (when they
> > were mulling over moving all of the blkif_interrupt bits into a tasklet
> > just a couple of weeks ago): "It looks like __blk_end_request_all...is
> > returning with interrupts enabled sometimes". I pinged some folks.
> Just so everyone else knows, I've set kernel.panic to 10 on these hosts so
> at least they'll reboot when they panic. Hopefully we can avoid a few
> wake-and-reboot issues like we had last night :-/
I pinged some folks about it last night. I would hope there will be a
fix for that soon. I suspect it's reproducible on the 70+ kernels, but
can you check that for us and update the BZ?
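
As a rough aid for confirming the kernel.panic setting Mike mentions above, here is a minimal sketch that reads the value back from procfs. It assumes a Linux host exposing /proc/sys/kernel/panic; the helper names read_panic_timeout and reboots_on_panic are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Default location of the kernel.panic sysctl on Linux (assumed to be present).
PANIC_SYSCTL = "/proc/sys/kernel/panic"


def read_panic_timeout(path=PANIC_SYSCTL):
    """Return the kernel.panic value as an integer.

    0 means the kernel hangs forever after a panic, a positive value is
    the number of seconds to wait before rebooting, and a negative value
    means reboot immediately.
    """
    return int(Path(path).read_text().strip())


def reboots_on_panic(path=PANIC_SYSCTL):
    """Hypothetical helper: True if the host will reboot itself after a panic."""
    return read_panic_timeout(path) != 0


if __name__ == "__main__":
    print("kernel.panic =", read_panic_timeout())
    print("reboots on panic:", reboots_on_panic())
```

To make the value survive a reboot, the usual approach is a "kernel.panic = 10" line in /etc/sysctl.conf, though how these particular hosts are configured is not stated in this thread.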