memtest86+ ECC oddity

Jack Howarth howarth at bromo.msbb.uc.edu
Thu May 4 14:13:58 UTC 2006


   We have a machine with ECC support enabled in the motherboard firmware
and ECC DIMMs installed. Recently this machine has suffered a couple of
random freezes, and yesterday it began to report the following kernel
error...

kernel: EDAC MC0: UE page 0x8e0, offset 0x0, grain 4096, row 0, labels ":": i82875p UE

...indicating an uncorrectable (UE) memory error. However, when I boot
into memtest86+, the default settings, with ECC disabled, report no
memory errors during the test. If I enable the ECC mode in memtest86+,
I finally do see a bad memory location reported repeatedly.
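
   To check whether the location memtest86+ flags is the same one the
kernel complained about, the EDAC page and offset can be turned into a
physical address. The following is only a minimal sketch. It assumes
that "page" in the EDAC message is a page frame number and that the
kernel uses 4 KiB pages. The function name and regular expression are
made up for illustration and only handle a line shaped like the one
quoted above.

```python
import re

# Assumption: "page" in the EDAC message is a page frame number and the
# kernel uses 4 KiB pages (the x86 default).
PAGE_SIZE = 4096

# Matches only the two fields needed from a line shaped like the one above.
EDAC_FIELDS = re.compile(r"page 0x([0-9a-fA-F]+), offset 0x([0-9a-fA-F]+)")


def edac_physical_address(line: str) -> int:
    """Return the physical address implied by an EDAC page/offset pair."""
    match = EDAC_FIELDS.search(line)
    if match is None:
        raise ValueError("no EDAC page/offset fields found")
    page = int(match.group(1), 16)
    offset = int(match.group(2), 16)
    return page * PAGE_SIZE + offset


line = ('kernel: EDAC MC0: UE page 0x8e0, offset 0x0, grain 4096, '
        'row 0, labels ":": i82875p UE')
addr = edac_physical_address(line)
print(f"0x{addr:x} ({addr / 2**20:.3f} MiB)")
```

Under those assumptions the reported error sits at 0x8e0000, a little
under 9 MiB into physical memory, which can be compared against the
failing address memtest86+ prints.
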
   What exactly is happening in this situation? My guess is that the
ECC hardware is silently correcting the bad memory location, so that it
passes when the memtest86+ memory test is run with ECC disabled. That
would only make sense if enabling the ECC mode in memtest86+ somehow
lets it bypass the ECC feature, so that it can see errors that ECC
would otherwise correct silently in the background. Is this a correct
reading of the situation?
                  Jack



