Recently I replaced MB/CPU/Memory on my main workstation, and a few days later I installed FC14. The problems started then. I have tried looking this one up in bugzilla, but so far with no luck.
The system ran cleanly for 4 days on FC13, and the problems I am describing started only after the new install.
THE PROBLEM (h/w s/w details after this section):
What is happening is that processes are dying randomly. So far there have been no core dumps, so I can't go into it that way. I did start a background process, however, and every 2 minutes it dumps dmesg to a file.
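For anyone wanting to reproduce the logging setup described above, here is a minimal sketch in Python. It is not the script actually used on this box; the function names, the output directory and the file naming scheme are assumptions made for illustration.

```python
import subprocess
import time
from datetime import datetime
from pathlib import Path


def read_dmesg():
    """Return the current kernel ring buffer as text by running `dmesg`."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def save_snapshot(out_dir, read_log=read_dmesg, now=None):
    """Write one snapshot into out_dir and return the path of the new file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Timestamped names (hypothetical scheme) keep the snapshots in order.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    path = out_dir / f"dmesg-{stamp}.txt"
    path.write_text(read_log())
    return path


def run_forever(out_dir, interval_seconds=120):
    """Save a snapshot every interval_seconds (2 minutes by default)."""
    while True:
        save_snapshot(out_dir)
        time.sleep(interval_seconds)
```

Calling `run_forever("dmesg-snapshots")` from a background shell would give roughly the 2-minute cadence mentioned above. Because the ring buffer is dumped whole each time, consecutive snapshots overlap heavily.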
Currently I have about 3000 saves of dmesg to look at. Only a couple of things fall out:
1) 16 times I have hit a combination of both the messages
   "BUG: unable to handle kernel NULL pointer dereference at 0000000000000049"
   "Oops: 0000 [#2] SMP" (or [#14] which I presume means both 1 & 4)
   (16 out of 3000 is statistically below the noise threshold)
2) 88 times I got the message:
   "last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/host2/target2:0:0/2:0:0:0/block/sdb/sdb6/stat"
Those were the only consistent problems which came out of the dmesg outputs - the 'tainted' process has most commonly been ps (54 times), followed by jackd (11 times) and plasma-desktop (4 times)
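Counts like the ones above can be produced by a short script over the saved snapshots. The following is only a sketch: it assumes the oops text contains a line of the form `Pid: 1234, comm: ps Tainted: ...`, and the names `tally` and `tally_dir` and the `dmesg-*.txt` file pattern are made up for illustration. It counts each distinct line once, since the same message shows up in many overlapping snapshots. That only separates repeated events if the lines carry kernel timestamps; without them, identical messages collapse into one.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed format of the process line in an oops, e.g.
# "Pid: 1234, comm: ps Tainted: P D 2.6.35 #1"
COMM_RE = re.compile(r"Pid: \d+, comm: (\S+)")
OOPS_MARKER = "BUG: unable to handle kernel NULL pointer dereference"


def tally(snapshot_texts):
    """Return (oops_count, Counter of process names), counting each distinct line once."""
    seen = set()
    oopses = 0
    procs = Counter()
    for text in snapshot_texts:
        for line in text.splitlines():
            line = line.strip()
            if line in seen:
                continue  # already counted in an earlier, overlapping snapshot
            if OOPS_MARKER in line:
                seen.add(line)
                oopses += 1
            else:
                match = COMM_RE.search(line)
                if match:
                    seen.add(line)
                    procs[match.group(1)] += 1
    return oopses, procs


def tally_dir(snapshot_dir):
    """Run tally() over every dmesg-*.txt file in snapshot_dir, oldest name first."""
    files = sorted(Path(snapshot_dir).glob("dmesg-*.txt"))
    return tally(f.read_text(errors="replace") for f in files)
```

Something like `oopses, procs = tally_dir("dmesg-snapshots")` followed by `procs.most_common()` would list the process names by frequency.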
I would have thought the message about the problems with /sdb6 was significant but I tried installing on a different drive, and the same problem continues. Then it hit me - I have accounting turned on and that's where /var/log lives .... so that one was probably a false lead.
Finally I suspected a memory problem - but memtest86 couldn't find the problem.
I still have about a week to return MB or CPU or Memory if I could narrow the problem down to one of them - but without being able to narrow the problem down to a specific component, I can't really do that.
Any suggestions would be greatly appreciated. I don't want to be stuck with a buggy system.
H/W S/W DETAILS:
Here are some of the pertinent details for this box (BTW: nothing is overclocked):
MB: Gigabyte 890XA-UD3
CPU: AMD Phenom(tm) II X6 1090T Processor
Mem: 8gb Kingston HyperX Blu 4GB 240-Pin DDR3 SDRAM DDR3 1333
HD's: 5 sata 3 drives, 3.75Tb total
Installed cards:
- D-Link System Inc DGE-560T PCI Express Gigabit Ethernet Adapter (rev 13)
- Adaptec AHA-2940U2/U2W
- Creative Labs SB Audigy (rev 04)
- JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller*
(* for a set of 4 2GB drives to hold data for analysis - drives not currently on system)
Alien drivers: I'm using the nvidia drivers now, but the same thing happened with the nouveau and vesa drivers, so I don't think that's it.
Finally, I'm a computer geek for a living, and I also build about 6-8 new systems a year for myself or friends and upgrade 12-15. I've been doing this for at least 15 years and consider myself relatively experienced with working at the hardware level.
The install was a fresh install, wiping out the previous FC13 installation.
Thanks,
--
William
william w. austin
airedad@att.net
======================================================================
"life is just another phase i'm going through... this time anyway"
The install was a fresh install, wiping out the previous FC13 installation.
If you put FC13 back, what happens? A first test would be to install and boot an FC13 kernel with no Nvidia driver weirdness but leave the rest on FC14. The second would be to do a full FC13 install (otherwise it's hard to swap all of X out, which is probably the other obvious test).
Alan
On 06/03/2011 12:38 AM, William Austin wrote:
Recently I replaced MB/CPU/Memory on my main workstation, and a few days later I installed FC14. The problems started then. I have tried looking this one up in bugzilla, but so far with no luck.
The system ran cleanly for 4 days on FC13, and the problems I am describing started only after the new install.
THE PROBLEM (h/w s/w details after this section):
What is happening is that processes are dying randomly. So far there have been no core dumps, so I can't go into it that way. I did start a background process, however, and every 2 minutes it dumps dmesg to a file.
Currently I have about 3000 saves of dmesg to look at. Only a couple of things fall out:
- 16 times I have hit a combination of both the messages "BUG: unable to handle kernel NULL pointer dereference at 0000000000000049" "Oops: 0000 [#2] SMP" (or [#14] which I presume means both 1 & 4) (16 out of 3000 is statistically below the noise threshold)
- 88 times I got the message: "last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/host2/target2:0:0/2:0:0:0/block/sdb/sdb6/stat"
I had this same type of problem with a newly-built FC14 system using an AMD 1090T on a Gigabyte mobo. The problem, as it turned out, was that I needed to update the BIOS of the mobo to support the AMD 1090T, and it has been rock-solid ever since then. My mobo was an earlier-rev 880GM series.
So you might want to check to see that your BIOS is compatible with the 1090T CPU.
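To see which BIOS revision the board is currently running without rebooting into setup, the DMI entries that Linux exposes under /sys/class/dmi/id can be read directly. A minimal sketch follows; the function name `read_dmi` and the choice of fields are assumptions for illustration, and whether a given BIOS version supports the 1090T still has to be checked against whatever CPU support list the board vendor publishes.

```python
from pathlib import Path

# Standard DMI attribute files under /sys/class/dmi/id on Linux.
DMI_FIELDS = ("board_vendor", "board_name", "bios_vendor", "bios_version", "bios_date")


def read_dmi(dmi_dir="/sys/class/dmi/id", fields=DMI_FIELDS):
    """Return a dict of DMI field -> value, or None where a field cannot be read."""
    info = {}
    for name in fields:
        try:
            info[name] = (Path(dmi_dir) / name).read_text().strip()
        except OSError:
            # Missing or unreadable entry; report it rather than fail.
            info[name] = None
    return info
```

Printing the result, for example with `for key, value in read_dmi().items(): print(key, value)`, gives the board model plus the BIOS version and date to compare against the vendor's list.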