Kernel-3.1 Crash

Thu Oct 27 19:44:32 UTC 2011

Vivek Goyal <vgoyal at redhat.com> writes:

> On Thu, Oct 27, 2011 at 03:09:05PM -0400, Don Zickus wrote:
>> On Thu, Oct 27, 2011 at 02:43:22PM -0400, Jeff Moyer wrote:
>> > >> This doesn't look like the same problem.  Here we've got BUG: scheduling
>> > >> while atomic.  If it was the bug fixed by the above commits, then you
>> > >> would hit a BUG_ON.  I would start looking at the btrfs bits to see if
>> > >> they're holding any locks in this code path.
>> > >
>> > > Ignore that one and move to IMG_0350.IMG.  'scheduling while atomic' is
>> > > just noise.  Besides Mike and Vivek told me to blame you for not pushing
>> > > Jens harder on these fixes. :-)))))
>> > 
>> > I'm looking at 0355, which shows the very top of the trace, and that
>> > says BUG: scheduling while atomic.  So the problem reported here *is*
>> > different from the one fixed by the above two commits.  In fact, I don't
>> > see evidence of the multipath + flush issue in any of these pictures.
>> 
>> You have to ignore the 'scheduling while atomic' thing; it is just a
>> printk("BUG: scheduling while atomic"), it is _not_ a BUG().  :-)
>> (Hint: read kernel/sched.c::__schedule_bug.)
>
> Maybe the thread holding the queue lock got scheduled out, hence leading
> to a deadlock?

Assuming all of these messages were from the same boot, the "scheduling
while atomic" message actually came *after* the NMI lockup detection
logic fired.

Is there any more information available on this bug?  Is it
reproducible?  What is the storage configuration?

-Jeff