F7: Trying to figure out why kernel crashes with journal commit I/O error

Howard Wilkinson howard at cohtech.com
Mon Oct 8 07:47:25 UTC 2007


Gilbert Sebenste wrote:
> Hello all,
>
> I am having an absolutely vexing problem that maybe somebody might 
> shed some light on.
>
> I just got 2 new computers, both running F7. They each have one 
> Seagate 750 GB SATA 3 Gb/s, 7200 RPM drive with 16 MB of cache. Each 
> machine has 4 GB of RAM and a Core 2 Quad 6700 on an ASUS motherboard.
>
> OK. I run the computers pretty hard. But I have two Pentium 4s that 
> work just as hard. All of them receive a weather feed from the National 
> Weather Service (20 MB/sec peak, 1 MB/sec average), and the Pentium 4s 
> run flawlessly for months, until I install new kernels and reboot.
>
> OK, within 12 hours of starting up a new machine, running the same 
> software as the slower machines with the exact same data feed, 
> I get
>
> kernel: journal commit I/O error
>
> I can log in, but can't do commands. A manual power-down (shutdown -r 
> now won't work) and reboot clears it fine.
>
> First I suspected a hard drive error on both machines. But then
> replacement hard drives came in. It seemed to stop the problem for a 
> few days, so I closed a bugzilla I had. Nope, this weekend, it went 
> back to crashing every 4-18 hours.
>
> I tried to cut the read-writes in half, to no effect, by reducing the
> amount of data/files coming in.
>
> I have:
>
> Replaced the hard drive 3 times with new ones (to no avail)
>
> Reduced the read/writes by around half
>
> Turned off legacy USB support, which also caused my keyboard and mouse 
> to stop working with errors (that's been cleared and is OK)
>
> Filed a bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=318661
>
> Tonight, I tried using the original kernel that came with F7
> (2.6.21-1.3194.fc7) instead of the latest (2.6.22.9-91.fc7).
> As of two hours into this, so far so good, but I'm not confident.
>
> The two other machines, Pentium 4s at 3 GHz with ASUS motherboards, 
> purr like a kitten.
>
> Has anyone seen anything like this, or know what could be the problem?
>
> As always, grateful for any help, and thanks for reading this!
>
> Gilbert
>
> *******************************************************************************
> Gilbert Sebenste                                                     ********
> (My opinions only!)                                                  ******
> *******************************************************************************
>
>
I would suspect a hardware issue with the motherboards as my first port 
of call. I had a similar problem recently with a new Pentium 4 board, 
where the ATA disc interface went offline every 18 hours or so; after 
replacing the drive with a SATA drive, the system has purred for weeks.

Secondly, the kernel version may be important: Core 2 Quad processors 
are fairly new, so later kernels SHOULD have better support. Maybe try a 
development kernel on one of the machines, e.g. 2.6.23.-----

Thirdly, have you run a full fsck on the drives after they fail? Reboot 
into single-user mode and run fsck -f. You may find that the problem is 
disc structure corruption ... then you have to find out why.

You do not say which journalling file system you are using: is this 
ext3, jfs, reiserfs, ...?

Finally, have you run memtest86+ on these machines? A memory fault could 
be going unnoticed, especially if they do not have ECC memory.

Not sure if this will help, but I hope it is not just noise....

-- 
Howard Wilkinson
Coherent Technology Limited
23 Northampton Square,
United Kingdom, EC1V 0HL

Phone:  +44(20)76907075
Mobile: +44(7980)639379
Email:  howard at cohtech.com
