Re: Kernel-3.1 Crash

Thursday, 27 October 2011

On Thu, Oct 27, 2011 at 02:43:22PM -0400, Jeff Moyer wrote:
...
 >> This doesn't look like the same problem.  Here we've got BUG: scheduling
 >> while atomic.  If it was the bug fixed by the above commits, then you
 >> would hit a BUG_ON.  I would start looking at the btrfs bits to see if
 >> they're holding any locks in this code path.
 >
 > Ignore that one and move to IMG_0350.JPG.  'scheduling while atomic' is
 > just noise.  Besides Mike and Vivek told me to blame you for not pushing
 > Jens harder on these fixes. :-)))))

 I'm looking at 0355, which shows the very top of the trace, and that
 says BUG: scheduling while atomic.  So the problem reported here *is*
 different from the one fixed by the above two commits.  In fact, I don't
 see evidence of the multipath + flush issue in any of these pictures.

You have to ignore the 'scheduling while atomic' thing; it is just a
printk("BUG: scheduling while atomic"), it is _not_ a BUG().  :-)
(hint: read kernel/sched.c::__schedule_bug)

I see those messages all the time; it really should be a WARN and not a
misleading BUG, but whatever.
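
To make the distinction concrete, here is a rough userspace sketch, not
the kernel source.  The names report_sched_while_atomic(), FAKE_BUG_ON()
and schedule_sketch() are made up for illustration, and the message
format is only meant to resemble what shows up in the log.  The point is
that the report path prints a line starting with "BUG:" and returns,
while a BUG_ON-style check never returns.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts how many times the report path ran (illustration only). */
static int nr_reports;

/*
 * Hypothetical stand-in for a printk-style report: it logs a line that
 * starts with "BUG:" and then simply returns to the caller.
 */
static void report_sched_while_atomic(const char *comm, int pid,
                                      unsigned int preempt_count)
{
    assert(comm != NULL);
    fprintf(stderr, "BUG: scheduling while atomic: %s/%d/0x%08x\n",
            comm, pid, preempt_count);
    nr_reports++;
}

/*
 * Hypothetical stand-in for a BUG_ON-style check: if the condition is
 * true, it prints a message and aborts, so execution never continues.
 */
#define FAKE_BUG_ON(cond)                                            \
    do {                                                             \
        if (cond) {                                                  \
            fprintf(stderr, "fatal BUG at %s:%d\n",                  \
                    __FILE__, __LINE__);                             \
            abort();                                                 \
        }                                                            \
    } while (0)

/*
 * Sketch of a scheduler entry point: a nonzero preempt count triggers
 * the report, but the function still runs to completion afterwards.
 */
static int schedule_sketch(unsigned int preempt_count)
{
    if (preempt_count != 0)
        report_sched_while_atomic("demo", 1, preempt_count);

    FAKE_BUG_ON(0); /* a real BUG_ON would stop here if it fired */
    return 1;       /* reached even after the "BUG:" line was printed */
}
```

Calling schedule_sketch(1) prints the "BUG:" line and still returns 1,
which is why that message alone does not explain a dead machine.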

His machine died because the NMI watchdog detected a lockup.  The lockup
happened because, in blk_insert_cloned_request(), spin_lock_irqsave()
disabled interrupts and then spun forever waiting on q->queue_lock
(IMG_0350.JPG).
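
The shape of that lockup can be sketched in userspace with C11 atomics.
This is only an analogy under stated assumptions: queue_lock,
spin_lock_with_watchdog() and spin_unlock_sketch() are hypothetical
names, and the spin budget stands in for the NMI watchdog, since nothing
in userspace corresponds to spinning with interrupts disabled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock standing in for q->queue_lock. */
static atomic_flag queue_lock = ATOMIC_FLAG_INIT;

/*
 * Try to take the lock by spinning.  A real spinlock would spin without
 * limit; the budget here plays the role of the watchdog, so the caller
 * learns about the lockup instead of hanging forever.
 * Returns true if the lock was acquired, false if the budget ran out.
 */
static bool spin_lock_with_watchdog(atomic_flag *lock, unsigned long budget)
{
    assert(lock != NULL);
    for (unsigned long spins = 0; spins < budget; spins++) {
        /* test_and_set returns the previous value: false means we won. */
        if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
            return true;
    }
    return false; /* lock never became free: this is the "lockup" case */
}

/* Release the lock so the next spinner can make progress. */
static void spin_unlock_sketch(atomic_flag *lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

If the lock is already held and never released, the second attempt burns
its whole budget and returns false.  Without the budget, and with
interrupts off, only something like an NMI can notice.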

Mike and Vivek both said that is what you fixed for 3.2.  They also said
the only caller of blk_insert_cloned_request() is multipath, hence that
argument.  I'll cc them.  Or maybe I can have them walk over to your cube.
:-)

Cheers,
Don
