How to best debug kernel problems

Fri Jul 1 19:46:15 UTC 2011

>>> Does turning on kdump and installing debuginfo kernel stuff change
>>> things in a way that a watchdog oops won't happen?
>> I doubt the debuginfo package install has anything to do with it, as
>> that is really only used by things like crash (and maybe perf?).
>> Enabling kdump does cause some changes though, as the kernel will
>> reserve a section of memory to put the kdump kernel in.  It's
>> plausible that your machine is tripping on some memory issues and
>> enabling kdump is forcing the kernel to not touch that memory at
>> runtime.  It's something of a long shot but it's plausible.
> I am going to try the memtest. The system this morning was incredibly
> sluggish with a yum install taking several times longer than in the
> past. A reboot of the system this morning had it oops a lot on agetty
> versus working.. but it locked up solid so a power reboot was needed
> so I couldn't get anything 'saved' from ram. sigh.
> Will run memtest86 for the day and see if that gets anything. I had
> tested the system earlier on my rawhide experience with the IBM
> maintenance tools but they may miss something.

I ran memtest86+ for the morning, and nothing showed up. I then
rebooted into 3.0 and it halted on a watchdog error in such a way that
FN-CNTRL-SHIFT-SYSRQ-B didn't do anything. Power cycling caused it to
reboot into another watchdog issue, so I went into single-user mode.
There I was able to turn off watchdog resets and exit (this was not
meant as a solution so much as a way to see what is causing the
problem).

What seems to happen is that some sort of disk access gets slower and
slower. [Sorry for the wishy-washy wording; I need to figure out how
to better debug this for you guys.] The system came up and I logged in
after 5 minutes. A yum update took about 4 seconds to get to the
(y/N) prompt, then sat there for a while, and then did a long run of
disk accesses. Doing it again took even longer, with long pauses. My
guess is that something is sitting on the CPU, and the watchdog saw
nothing happening for 10 seconds and whacked it. On the third try the
system just sat there and never returned (the keyboard doesn't seem to
allow for a reset, so I don't know what is going on).
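
One way to put numbers on the "slower and slower" disk accesses would
be something like the sketch below. It is only a rough idea, not
something that was run on this machine: the function name
time_synced_writes, the /var/tmp directory, and the round count and
write size are all made up for illustration. It writes a block, forces
it to disk with fsync, and prints how long each round took, so growing
times would show up before the box locks.

```python
import os
import tempfile
import time


def time_synced_writes(directory, rounds=5, size=1 << 20):
    """Write `size` bytes `rounds` times, fsync each, return seconds per round.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    payload = b"\0" * size
    timings = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(rounds):
            start = time.monotonic()
            os.lseek(fd, 0, os.SEEK_SET)   # overwrite the same region each round
            os.write(fd, payload)
            os.fsync(fd)                   # push the data to disk, not just page cache
            timings.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)                    # leave no scratch file behind
    return timings


if __name__ == "__main__":
    # /var/tmp is an assumed scratch location on the disk under suspicion.
    for i, secs in enumerate(time_synced_writes("/var/tmp", rounds=10)):
        # flush so each line is visible even if the machine hangs mid-run
        print(f"round {i}: {secs:.3f}s", flush=True)
```

Running it once right after boot and again after a yum update would
give two sets of timings to compare.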

At the moment, the system is usable with the 2.6.38 kernel from
Fedora 15 (though systemd will not allow a reboot because it keeps
trying to restart systemd-kmsg something). I will look into getting a
null modem cable, though I may just go for a picture of the watchdog
issue.

