FC4 crashes repeatedly on dual dual-core Opteron Supermicro AS1020A-T

Michal Szymanski msz at astrouw.edu.pl
Sat Apr 1 06:21:00 UTC 2006


Hi,

I have recently purchased three Supermicro AS1020A-T servers, each
equipped with two dual-core Opteron 280 processors, H8DAR-T
motherboards, and 8 or 12 GB of RAM. The systems run FC4 x86_64 with a
proprietary driver (made by Adaptec) for the onboard Marvell 88SX6041
SATA controller. They use the original (install) kernel,
2.6.11-1.1369_FC4smp, which unfortunately cannot be upgraded because
the SATA driver is not available for other kernel versions.

All three systems crash (either hanging with "machine check exception"
kernel messages or resetting) when loaded with repeated runs of
1.3 GB jobs that are CPU-intensive with some I/O. I run 2 or 4 jobs
simultaneously, and no system has ever survived more than a few hours.

Suspecting a SATA driver problem, I mounted /tmp as "tmpfs" and
repeated the tests entirely in /tmp (with plenty of RAM this means
doing I/O in memory). The crashes persisted.
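
A minimal Python sketch for confirming that /tmp really is on tmpfs
before repeating such a test. It parses text in the format of
/proc/mounts; the function name mount_fstype is only illustrative, and
the assumption is a Linux system where /proc/mounts is readable.

```python
def mount_fstype(mounts_text, mount_point):
    """Return the filesystem type of the last mount on mount_point, or None."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # a later entry shadows an earlier one
    return fstype


# Usage on a live Linux system (assumes /proc is mounted):
#   with open("/proc/mounts") as f:
#       print(mount_fstype(f.read(), "/tmp"))
```

If the reported type is anything other than "tmpfs", the test I/O is
still going through the disk driver.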

Things are somewhat better when I run similar-size jobs with no I/O,
but these also crash, although less frequently.

I also tried installing the i386 version; it crashes as well.

Memtest does not show any RAM errors.

The SATA driver behaves somewhat strangely. In addition to the kernel
module "aar81xx", there is also an "aar81xx_wd" process in user space,
which "ps" always shows in state "D" (uninterruptible sleep), so the
"idle" load average of the machine is always 1.
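
Linux counts tasks in uninterruptible sleep toward the load average,
which is why one permanently blocked process keeps the load at 1. A
rough Python sketch for listing such processes by reading
/proc/<pid>/stat follows; the function names are illustrative, and it
assumes a Linux /proc filesystem.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, command name, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so locate the last ')' instead of splitting.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, name) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling uninterruptible_processes() a few times in a row shows whether
the same process stays in state D or whether different processes only
pass through it briefly.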

There is no Marvell driver for FC5 (yet?) so I cannot give it a try.

Any ideas, or shared experience with this hardware under Linux or with
dual-core Opterons under Linux (e.g. kernel version requirements), would
be welcome.

regards, Michal.

-- 
  Michal Szymanski (msz at astrouw dot edu dot pl)
  Warsaw University Observatory, Warszawa, POLAND



