Getting harder and harder to debug startup probs

Adam Williamson awilliam at redhat.com
Tue Feb 14 02:38:14 UTC 2012


On Mon, 2012-02-13 at 18:30 -0800, Adam Williamson wrote:
> On Mon, 2012-02-13 at 10:59 +0000, M A Young wrote:
> > On Mon, 13 Feb 2012, Matthias Runge wrote:
> > 
> > > More general:
> > >
> > > If you're hit by some bug, you get lost; nowadays sooner than later. I
> > > can't count, how many times I had some error getting X up (or even
> > > system up) since the move to systemd.
> > >
> > > Next sad thing is, it isn't reproducible every time. Since we're
> > > testing moving targets, it's pretty unclear, how to reproduce the
> > > situation _now_ on _my_ special system? Since I can't login, I can't
> > > get a package list.
> > 
> > It isn't just systemd. I don't think gnome-shell helps either, because you 
> > tend to get an unhelpful "blue screen of death" type message if something 
> > goes wrong making it difficult to work out what the problem is. Previously 
> > the system continued as best it could, meaning you might see what the 
> > problem was in what did and didn't appear on the desktop, and might get 
> > enough functionality on the desktop to debug the problem more easily.
> 
> I'd say the answer to the question is 'no, it's just different'.
> 
> systemd actually makes it *easier* to debug startup issues, really,
> because it has good and sophisticated tools for investigating the status
> of all services and the dependencies between them. But it's _different_
> from sysv/upstart, and you have to learn how to use the tools and logs.
> Once you do that, it's actually much better. You can see from the bug
> report how Kay diagnosed this particular dbus/dracut issue: you can do
> that too. Also learn how to use systemctl - the man page is great.
> 
> For Shell, whatever causes the fail whale will almost invariably be
> pointed up by ~/.xsession-errors; you just have to read it carefully.
> The fail whale comes up if any one of a certain set of core GNOME
> components spawns (runs) more than twice within a 60 second period -
> this is intended to catch crash/respawn loops. So you're looking for a
> component - usually shell itself, or gdm, or gnome-settings-daemon -
> crashing more than once.

Oh, extending that: the way I usually confirm a 'fail whale' scenario and
get more info on it is to leave the broken GNOME session running, switch
to a VT, log in as myself, and look at ~/.xsession-errors. When I think
I've spotted which process is failing, I do this:

DISPLAY=:0 (command)

So if it's gnome-settings-daemon that's failing, I do:

DISPLAY=:0 /usr/libexec/gnome-settings-daemon

That runs the process against the X session's display (:0 is usually the
first local one) rather than the VT you're actually typing the command
on. That way you can confirm that it really is that process that's
failing, and see any error output it writes to the console but doesn't
put in the logs.
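
If you end up doing this a lot, the steps above can be wrapped in a small
shell function. This is only a rough sketch: the name run_in_session, the
SESSION_DEBUG_LOG variable and the default log path are all made up for
illustration, and it assumes the broken session is on the display you
pass in (usually :0).

```shell
# Hypothetical helper (not part of GNOME or Fedora): run a command against
# an X display and keep a copy of everything it prints.
run_in_session() {
    display="$1"    # e.g. :0, assumed to be the broken session's display
    shift
    # Assumed log location; override by setting SESSION_DEBUG_LOG.
    log="${SESSION_DEBUG_LOG:-$HOME/session-debug.log}"
    # DISPLAY is set for this one command only. stdout and stderr are
    # merged, shown live on the terminal, and saved to the log by tee.
    DISPLAY="$display" "$@" 2>&1 | tee "$log"
}

# Typical use from the VT, after reading the session log:
#   tail -n 50 ~/.xsession-errors
#   run_in_session :0 /usr/libexec/gnome-settings-daemon
```

Because of the pipe into tee, the function's exit status is tee's, not
the command's, so judge success by the output rather than by $?.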
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net


