PROBLEM alert - Host fas03 is DOWN

Stephen John Smoogen smooge at gmail.com
Sun Sep 12 16:12:00 UTC 2010


On Sun, Sep 12, 2010 at 09:46, Jon Masters <jonathan at jonmasters.org> wrote:
> On Sat, 2010-09-11 at 11:40 -0500, Mike McGrath wrote:
>> On Sat, 11 Sep 2010, Jon Masters wrote:
>>
>> > On Sat, 2010-09-11 at 02:51 -0400, Jon Masters wrote:
>> > > On Fri, 2010-09-10 at 19:24 -0600, Stephen John Smoogen wrote:
>> > >
>> > > > Sep 11 01:10:23 fas03 kernel: WARNING: at block/blk-core.c:338
>> > >
>> > > > Sep 11 01:10:23 fas03 kernel: [<c044fc97>] ? warn_slowpath_common+0x77/0xb0
>> > > > Sep 11 01:10:23 fas03 kernel: [<c05ca5dc>] ? blk_start_queue+0x6c/0x70
>> > > > Sep 11 01:10:23 fas03 kernel: [<c044fce3>] ? warn_slowpath_null+0x13/0x20
>> > > > Sep 11 01:10:23 fas03 kernel: [<c05ca5dc>] ? blk_start_queue+0x6c/0x70
>> > > > Sep 11 01:10:23 fas03 kernel: [<ed63896b>] ? kick_pending_request_queues+0x1b/0x30 [xen_blkfront]
>> > > > Sep 11 01:10:23 fas03 kernel: [<ed638b80>] ? blkif_interrupt+0x200/0x220 [xen_blkfront]
>> > > > Sep 11 01:10:23 fas03 kernel: [<c04ad7c5>] ? handle_IRQ_event+0x45/0x140
>> > >
>> > > The code in block/blk-core.c:338 contains an explicit check to ensure that
>> > > interrupts have been disabled, but this is not true since blkif_interrupt
>> > > is not registered with IRQF_DISABLED set at the time of the setup in
>> > > bind_evtchn_to_irqhandler. Thus it might be that interrupts are still on
>> > > when we get to kick_pending_request_queues. Does this always happen?
>> > >
>> > > This perhaps happened because upstream removed IRQF_DISABLED and now
>> > > runs with interrupts disabled in handle_IRQ_event, so Xen won't see
>> > > this. But on 2.6.32 this change had not yet happened. It's also 2:50am
>> > > and I might be reading this wrong, but I at least suggest you open a
>> > > RHEL6 bug and try a more recent kernel build.
>> >
>> > Ah, of course I shouldn't email before bed. There's an obvious giant
>> > spin_lock_irqsave/restore there, but as noted on xen-devel (when they
>> > were mulling over moving all of the blkif_interrupt bits into a tasklet
>> > just a couple of weeks ago): "It looks like __blk_end_request_all...is
>> > returning with interrupts enabled sometimes". I pinged some folks.
>> >
>>
>> Just so everyone else knows, I've set kernel.panic to 10 on these hosts so
>> at least they'll reboot when they panic.  Hopefully we can avoid a few
>> wake-and-reboot issues like we had last night :-/
>
> Mike, is there any chance you could boot the -debug kernel on one of
> these affected systems? Also, can you let us know about the host?
>

kernel.panic set to 10 did not reboot the systems. What and where is a
debug kernel?
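
One possible explanation, offered as an assumption rather than a diagnosis
of these hosts: kernel.panic only takes effect once the kernel actually
panics. A WARNING or an oops that leaves the machine wedged does not count
unless kernel.panic_on_oops is also set. A minimal sketch for checking both
values on a guest follows; the helper names and the `base` parameter are
hypothetical and exist only so the check can be pointed at a test directory.

```python
from pathlib import Path


def read_panic_settings(base="/proc/sys/kernel"):
    """Return the panic-related sysctl values found under base, as ints."""
    settings = {}
    for name in ("panic", "panic_on_oops"):
        path = Path(base) / name
        try:
            settings[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing or unreadable on this kernel.
            settings[name] = None
    return settings


def will_reboot_on_oops(settings):
    """True only if an oops escalates to a panic AND a panic triggers a reboot."""
    return bool(settings.get("panic")) and bool(settings.get("panic_on_oops"))


if __name__ == "__main__":
    current = read_panic_settings()
    print(current, "reboot on oops:", will_reboot_on_oops(current))
```

If panic_on_oops reads 0, a host that oopses and then hangs would sit
there regardless of the kernel.panic timeout.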




-- 
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines

