My desktop system is an MSI K8T Master2-FAR with 2 Opteron CPUs. This system has been running various releases of i386 Fedora (not x86_64, due to various 32- vs 64-bit issues), starting at Fedora Core 2 and upgrading through FC3, FC4, FC5, and FC6. It had no real issues with any of those releases.
During the Christmas break I did a fresh install of Fedora 8 with all the current patches. The system ran fine for a couple of weeks on kernel 2.6.23.9-85.fc8. It then started getting this error:
Message from syslogd@draco at Jan 9 08:43:51 ... kernel: journal commit I/O error
The errors don't have any kind of pattern - the system may run for less than 24 hours or for several days before the error happens.
Figuring the hard drive might be going, I replaced it and moved the Lite-on DVD drive to the other IDE bus, using new IDE cables for both drives. The problems continued, so I tried disconnecting the DVD drive in case it was somehow causing the journal commit errors. I still got the errors, so I went back to the kernel that came with the F8 install, 2.6.23.1-42.fc8. The system ran for a week with no problems on that kernel.
I've also tried the latest F8 kernel, 2.6.23.14-107.fc8, and got a journal commit error there too (in less than 24 hours), so I'm back on the 2.6.23.1-42.fc8 kernel for now.
Does anyone have any ideas on how I can track this issue down? From what I can tell, it's either a motherboard problem or a kernel bug/issue, but there may be something I'm overlooking that might be causing this.
Thanks,
> Does anyone have any ideas on how I can track this issue down? From what I can tell, it's either a motherboard problem or a kernel bug/issue, but there may be something I'm overlooking that might be causing this.
Posting the actual text of system logs, and info about the hardware - e.g. what controller. Most likely it's a drive failing.
Alan Cox wrote:
>> Does anyone have any ideas on how I can track this issue down? From what I can tell, it's either a motherboard problem or a kernel bug/issue, but there may be something I'm overlooking that might be causing this.
> Posting the actual text of system logs, and info about the hardware - e.g. what controller. Most likely it's a drive failing.
There's nothing logged in the logs after the journal commit error because the kernel makes the drive read-only at that point. Most of the time I'm not able to run anything either - I just get a generic "input/output" error message. I haven't tried running dmesg when the problem happens; would anything useful come out of that?
There's only one hard drive in the system, and I replaced it a week or so ago with a brand new drive. Although anything is possible, I doubt that the new drive is exhibiting the same problems as the old drive.
The motherboard chipset is a VIA VT8237. The kernel says it's using the pata_via driver:
Jan 28 09:27:46 draco kernel: scsi0 : pata_via
Jan 28 09:27:46 draco kernel: scsi1 : pata_via
Jan 28 09:27:46 draco kernel: ata1: PATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001d000 irq 14
Jan 28 09:27:46 draco kernel: ata2: PATA max UDMA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001d008 irq 15
Jan 28 09:27:46 draco kernel: ata1.00: ATA-7: WDC WD1600AAJB-00PVA0, 00.07H00, max UDMA/100
Jan 28 09:27:46 draco kernel: ata1.00: 312581808 sectors, multi 16: LBA48
Jan 28 09:27:46 draco kernel: ata1.00: configured for UDMA/100
Jan 28 09:27:46 draco kernel: ata2.00: ATAPI: LITE-ON DVDRW LH-20A1H, LL05, max UDMA/66
Jan 28 09:27:46 draco kernel: ata2.00: limited to UDMA/33 due to 40-wire cable
Jan 28 09:27:46 draco kernel: ata2.00: configured for UDMA/33
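When comparing boot messages like these between kernel versions (say 2.6.23.1-42.fc8 against 2.6.23.14-107.fc8), it can help to pull out just the device and the transfer mode it was configured for. The following is only a sketch: the function name `summarize_ata` and the two regular expressions are illustrative assumptions derived from the pata_via lines quoted here, not part of any existing tool, and other drivers or kernels may word these messages differently.

```python
import re

# Assumed patterns, modelled on the log lines quoted in this thread.
# e.g. "ata1.00: ATA-7: WDC WD1600AAJB-00PVA0, 00.07H00, max UDMA/100"
IDENTIFIED = re.compile(r"(ata\d+\.\d+): (ATA-\d+|ATAPI): ([^,]+),")
# e.g. "ata1.00: configured for UDMA/100"
CONFIGURED = re.compile(r"(ata\d+\.\d+): configured for (\S+)")

SAMPLE = """\
Jan 28 09:27:46 draco kernel: ata1.00: ATA-7: WDC WD1600AAJB-00PVA0, 00.07H00, max UDMA/100
Jan 28 09:27:46 draco kernel: ata1.00: 312581808 sectors, multi 16: LBA48
Jan 28 09:27:46 draco kernel: ata1.00: configured for UDMA/100
Jan 28 09:27:46 draco kernel: ata2.00: ATAPI: LITE-ON DVDRW LH-20A1H, LL05, max UDMA/66
Jan 28 09:27:46 draco kernel: ata2.00: limited to UDMA/33 due to 40-wire cable
Jan 28 09:27:46 draco kernel: ata2.00: configured for UDMA/33
"""


def summarize_ata(log_text):
    """Return {device: {"kind", "model", "mode"}} parsed from libata boot messages."""
    devices = {}
    # Scan the whole text so it also works if several entries share one line.
    for m in IDENTIFIED.finditer(log_text):
        dev, kind, model = m.groups()
        devices.setdefault(dev, {}).update(kind=kind, model=model.strip())
    for m in CONFIGURED.finditer(log_text):
        dev, mode = m.groups()
        devices.setdefault(dev, {})["mode"] = mode
    return devices


for dev, info in sorted(summarize_ata(SAMPLE).items()):
    print(dev, info.get("kind"), info.get("model"), info.get("mode"))
```

Running this against the boot log of each kernel and comparing the results would show whether the drive ends up in a different transfer mode under the kernels that fail.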
> There's nothing logged in the logs after the journal commit error because the kernel makes the drive read-only at that point. Most of the time I'm not able to run anything either - I just get a generic "input/output" error message. I haven't tried running dmesg when the problem happens; would anything useful come out of that?
Probably yes. It will at least give a good idea of the problem. Without that it's guesswork.
Alan
Alan Cox wrote:
> Probably yes. It will at least give a good idea of the problem. Without that it's guesswork.
Ok, I've rebooted to the latest F8 kernel. When it happens again I'll post any more information I can get out of it.
Alan Cox wrote:
>> There's nothing logged in the logs after the journal commit error because the kernel makes the drive read-only at that point. Most of the time I'm not able to run anything either - I just get a generic "input/output" error message. I haven't tried running dmesg when the problem happens; would anything useful come out of that?
> Probably yes. It will at least give a good idea of the problem. Without that it's guesswork.
The problem (finally) happened again. I was unable to get anything via dmesg:
dmesg
/bin/dmesg: Input/output error.
Any other suggestions on how to get more information out of this?
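One untested idea, offered only as a sketch: the "/bin/dmesg: Input/output error" message presumably means the dmesg binary itself could not be read from the failed filesystem. If so, a process that is already running before the failure might still be able to read the kernel ring buffer. The Python below polls the buffer through glibc's klogctl() with action 3 (read all, non-destructive, as described in syslog(2)) and prints anything new to a terminal left open. The names `make_reader`, `new_text`, `follow`, and `emit_to_stdout` are hypothetical, and the assumptions are Linux with glibc, sufficient privileges (probably root), and an interpreter that needs nothing further from the disk once started.

```python
import ctypes
import os
import sys
import time

SYSLOG_ACTION_READ_ALL = 3  # per syslog(2): read the ring buffer without clearing it


def make_reader(size=1 << 17):
    """Resolve klogctl() up front and return a function that reads the ring buffer."""
    libc = ctypes.CDLL(None, use_errno=True)  # assumption: Linux/glibc process
    klogctl = libc.klogctl
    buf = ctypes.create_string_buffer(size)

    def read_log():
        n = klogctl(SYSLOG_ACTION_READ_ALL, buf, size)
        if n < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        return buf.raw[:n].decode("utf-8", errors="replace")

    return read_log


def new_text(previous, current):
    """Return the part of current that was not in the previous snapshot."""
    if current.startswith(previous):
        return current[len(previous):]
    # The buffer wrapped or changed in some other way: show everything.
    return current


def emit_to_stdout(text):
    sys.stdout.write(text)
    sys.stdout.flush()


def follow(read_log, emit=emit_to_stdout, interval=5.0, rounds=None):
    """Poll read_log() and emit new text; rounds=None means loop until killed."""
    previous = ""
    done = 0
    while rounds is None or done < rounds:
        current = read_log()
        fresh = new_text(previous, current)
        if fresh:
            emit(fresh)
        previous = current
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

To use it, one would start an interpreter as root before the failure, load these definitions, and call `follow(make_reader())`, leaving the terminal open (ideally a remote session whose scrollback lives on another machine). Whether the ATA errors actually reach the ring buffer before the filesystem goes read-only is a separate question this sketch does not answer.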