On Mon, Nov 21, 2011 at 10:26:52AM -0500, Adam Jackson wrote:
> On Mon, 2011-11-21 at 12:11 +0100, Gianluca Cecchi wrote:
> > On Fri, Nov 18, 2011 at 7:00 PM, Adam Jackson  wrote:
> > 
> > > If you debuginfo-install gnome-shell, attach with gdb instead of sending
> > > SIGHUP, and run 'thread apply all backtrace', what do you get?
> > 
> > As MJ would have said... This is it ... ;-)
> The interesting part seems to be:
> Thread 1 (Thread 0x7fbf4aa8c9c0 (LWP 1609)):
> #0  0x0000003cb7ee6443 in __GI___poll (fds=<optimized out>,
> nfds=<optimized out>, timeout=<optimized out>)
> at ../sysdeps/unix/sysv/linux/poll.c:87
> #1  0x0000003cbc208ba2 in ?? () from /usr/lib64/
> #2  0x0000003cbc2090ff in ?? () from /usr/lib64/
> #3  0x0000003cbc209184 in xcb_writev () from /usr/lib64/
> #4  0x0000003cbc6456e7 in _XSend (dpy=0xb47a30, data=<optimized out>,
> size=<optimized out>) at xcb_io.c:436
> #5  0x0000003cbc639d55 in SendZImage (dest_scanline_pad=0,
> dest_bits_per_pixel=32, req_yoffset=<optimized out>, req_xoffset=0,
> image=0x7fffe21a7240, 
>     req=<optimized out>, dpy=0xb47a30) at PutImage.c:802
> This is showing gnome-shell trying to write an image to the X server,
> but blocking because the socket to the X server does not appear to be
> ready for writing.
> So there are (at least) three things that could be going wrong here, from
> probably most to least likely:
> 1) the write queue to the X server really might be blocked
> 2) libxcb could have a logic bug that's getting stuck here
> 3) the kernel might have a bug in poll()
> #1 typically only happens in two cases: either the X server is stuck
> away from the dispatch loop, or it's explicitly ignoring you because
> there's a grab in progress.  In the former case SIGHUP wouldn't help:
> simply reloading the shell won't un-stick the X server.  But in the
> latter case, it might; if the grab is from one of the shell's other
> threads, then closing all of the shell's display connections would reset
> the grab.
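
To make case #1 concrete, here is a minimal sketch (not the real libxcb
or X server code) of a writer whose peer has stopped reading. It uses a
local socketpair as a stand-in for the client/server connection; the
helper names writable(), fill_send_buffer(), drain() and demo() are made
up for this illustration. Once the send buffer is full, poll() stops
reporting the socket as writable until the other end reads again, which
is the state the backtrace above appears to show.

```python
import select
import socket


def writable(sock, timeout_ms=100):
    """Return True if poll() reports sock ready for writing in time."""
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    return any(ev & select.POLLOUT for _fd, ev in poller.poll(timeout_ms))


def fill_send_buffer(sock):
    """Write until the kernel refuses more data; return bytes queued."""
    sock.setblocking(False)
    queued = 0
    chunk = b"\0" * 4096
    try:
        while True:
            queued += sock.send(chunk)
    except BlockingIOError:
        return queued


def drain(sock, nbytes):
    """Read nbytes, standing in for a server that resumes dispatching."""
    remaining = nbytes
    while remaining > 0:
        data = sock.recv(min(65536, remaining))
        if not data:
            break
        remaining -= len(data)


def demo():
    # "client" plays the shell, "server" plays a peer that is not reading.
    client, server = socket.socketpair()
    try:
        before = writable(client)         # empty buffer: ready to write
        queued = fill_send_buffer(client)
        stuck = writable(client)          # full buffer, nobody reading
        drain(server, queued)
        after = writable(client)          # peer caught up: ready again
        return before, stuck, after, queued
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo())
```

A blocking writer in the "stuck" state would sit in poll() indefinitely,
regardless of whether the peer is wedged or deliberately not servicing
the connection.
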
> So my next intuition would be to gdb the X server and see what's up.  If
> you find it waiting patiently on a call to select(), then the second
> case is more likely, and 'print AllClients' should show you an fd_set
> with only one bit set.
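
For reading that output: an fd_set is a bitmask stored as an array of
machine words, so "only one bit set" means a single descriptor is still
being serviced. The sketch below converts such a word array into
descriptor numbers. It assumes the usual glibc layout on x86_64 (64-bit
words, descriptor n stored in word n / 64 at bit n % 64); fds_in_set()
is a hypothetical helper and the sample values are invented, not taken
from a real session.

```python
def fds_in_set(fds_bits, bits_per_word=64):
    """Return the descriptor numbers whose bits are set in the words."""
    fds = []
    for index, word in enumerate(fds_bits):
        for bit in range(bits_per_word):
            # Works for negative values too, which is how a signed word
            # with its top bit set would be printed.
            if word & (1 << bit):
                fds.append(index * bits_per_word + bit)
    return fds


if __name__ == "__main__":
    # Invented example: bit 17 of the first word is the only one set.
    words = [0x20000] + [0] * 15
    print(fds_in_set(words))
```

If the result is a single descriptor, that descriptor should belong to
the client holding the grab.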

I think I have the same problem here. I traced it once by attaching gdb
to the server, and it was in select(), so maybe I'll try to do it again
and run 'print AllClients'. For me it reproduces 100% of the time by
doing a chvt or a suspend and resume. As a workaround to get back to
work, I chvt to some console, run "killall -9 gnome-shell; sleep 5;
DISPLAY=:0.0 gnome-shell" and quickly change back. Recently gnome-shell
has started to get unstuck on its own occasionally if I wait about
10-20 seconds, but I'm not always that patient.

> - ajax

