EDAC error

Roger Heflin rogerheflin at gmail.com
Mon Mar 24 14:27:19 UTC 2008


Ric Moore wrote:
> On Sat, 2008-03-22 at 10:03 -0500, Roger Heflin wrote:
>> Ric Moore wrote:
>>> On Thu, 2008-03-20 at 21:58 -0500, Roger Heflin wrote:
>>>> Brent Snow, Mr. wrote:
>>>>> Hi All,
>>>>>
>>>>> I am having a problem with a new Dell PowerEdge 1900 Server
>>>>> running Fedora 8.
>>>>>
>>>>> The System setup is as follows:
>>>>>
>>>>>   2 - Xeon E5310 (Quad-Core 1.6 GHz) processors
>>>>>   16 GB of RAM, I SATA 80 GB HDD.
>>>>>
>>> Holy Smokes! 2 quad cores? That's 8 cores total(?) and 16 GIGS of Ram??
>>> My Gawd, not only am I jealous as all hell, I'm wondering what kinda
>>> kernel are you running?? Any sort of stock kernel would roll over and
>>> join the Choir Eternal. 
>> Actually, fairly normal kernels work just fine on the large boxes; I have run
>> stock FC6 kernels on up to 8 cpus/16 cores and up to 64GB of ram with no issues.
>>
>>> Wouldn't you be running some sort of mini clustering setup?? Setup
>>> right, it should really blow serious coal. Your problem might lie in
>>> that direction. You might have training wheels on a Dodge Hemi. With a
>>> machine like that, I could almost do without eating! 
>>> <huge drooling grins> Ric
>>>  
>> Clustering setups are only needed when you have more than 1 machine. Having lots
>> of cpus on a single machine is much easier than clustering, as you don't
>> have to worry about the networking, and the memory can be shared easily between
>> the cpus.
> 
> Huh, I wonder then why he's having problems. In the -OLD- days he'd be
> rolling a new kernel. Is the stock kernel multi-cpu aware or does he
> need a more specialized kernel, or is it the kernel at all?? That's
> where I would be looking, fer sure. God, I want one like he's got.
> <scratching strong itch> I always stay a couple of years behind. :) Ric
> 

Hyperthreading and dual-core CPUs have both been around long enough that pretty
much everyone ships with SMP on *NOW*.   And you are correct: several years ago
SMP was off by default on a number of distributions, so you almost always had
to compile your own kernel.

EDAC errors mean either that the memory is actually bad (or not correctly
seated, or has dirty connectors, or has some other issue), or that EDAC has some
sort of issue with either his BIOS or his hardware.    I guess the easiest way
to test would be to try a minimum RAM configuration and see if *ANY* config
gets no EDAC errors.  If he can find a configuration that has no errors, then it
is fairly likely that EDAC actually works on that MB, and it is likely he has
one of the memory problems listed above.
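
To make that test less tedious, something like the rough Python sketch below
could be run after each RAM configuration has been up for a while.  It assumes
the EDAC driver for the chipset is loaded and exposes its per-controller
counters as ce_count (corrected errors) and ue_count (uncorrected errors) under
/sys/devices/system/edac/mc/mcN/.  The function names are made up for
illustration, and I have not tried this on a PowerEdge 1900.

```python
from pathlib import Path

# Assumption: the EDAC sysfs tree lives here and the chipset driver is loaded.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller: (ce_count, ue_count)} for every mc* directory.

    An empty dict means no memory controllers were found, which usually
    means EDAC is not active rather than that the memory is fine.
    """
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip anything that does not look like a controller directory.
            continue
        counts[mc.name] = (ce, ue)
    return counts


def config_is_clean(counts):
    """True only if at least one controller was seen and all counters are 0."""
    return bool(counts) and all(ce == 0 and ue == 0 for ce, ue in counts.values())
```

Calling config_is_clean(read_edac_counts()) after some uptime on each
configuration gives a yes/no answer per configuration.  The counters start from
zero when the driver loads, so they need to be read after the box has been under
some memory load, not right after boot.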

It is really much harder to build the big machines.  They have more DIMMs to
start with, and each DIMM has 2x-4x the number of chips that a normal PC DIMM
has (ignoring the ECC chips on the DIMM).  This is because the DIMMs are often
double-sided, and sometimes on top of that have 2 or 4 chips stacked on top of
each other to increase the capacity (I don't remember the term for that).  Once
you start stacking, the fanout on the memory controller rises, and everything
gets a lot nastier and harder to get working reliably; timing has to be changed
(and how much it has to be changed depends on the number of DIMMs on the
controller).  It just gets messy.  I have seen some really weird failures when
using all of the DIMM slots on MBs; often things are not adequately tested by
the MB companies and/or noted in the MB manual.


                                  Roger



