Fedora Makes a Terrible Server?

Roger Heflin rogerheflin at gmail.com
Tue Mar 25 22:55:02 UTC 2008


Les Mikesell wrote:
> Roger Heflin wrote:
>>
>> Yes, and typically to support anything recent you have too many
>> add-ons on the enterprise OSes. If you are in a fast moving enterprise
>> environment, RHEL won't work.
> 
> Fast moving and enterprise are words you don't usually see together. 
> Don't you have to keep decades-old processes running?
> 
>> RHEL is probably quite good for any of the nice simple static
>> enterprise environments, but most would argue that there you should
>> probably lock everything down so tight that few kernel or userspace
>> updates are even required for anything. The problem is in an
>> environment where you are constantly having to bring in new hardware
>> that does not work on the older release, where you cannot wait 6
>> months for RHEL to catch up.
> 
> I can't recall ever being in a position of "having to bring in new 
> hardware".  What scenario forces this issue on you?  I haven't noticed a 
> shortage of vendors who will sell RHEL supported boxes.  But it sounds 
> like you have an interesting job...
> 

The need for more CPU power to do the job.  And the new boxes aren't officially
RHEL supported (sometimes they won't even boot with the latest RHEL update, but
will work with the latest Fedora or kernel.org kernel).  You typically bring in
a large set of new machines at a time (usually 100-200 machines), update only
the pieces required to support that new machine, and then run some tests to
validate that it gives the correct answers for various jobs.  It is really a
money issue: the new machine is 2x the speed but not yet supported by RHEL, so
you would need 2x the number of old supported machines, for 2x the cost or
more.  Reliability was also required: there were 50 or so disk servers, and the
failure of any one of them would cause at least partial loss of access to data.

The problem is that validating new hardware with a new OS, or old hardware with
just a huge update, is the same process, so you change only what you need to,
and you are better off starting with as new a release as possible and going
from there.  We were typically only updating the kernel.  Changing userspace
was even more dangerous, as a bad library update could change answers, so
unless we found a library bug that could not be easily worked around, we did
not update it.
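
A minimal sketch of the kind of answer check described above, assuming each
job's output can be captured as raw bytes and compared between a proven machine
and a newly built one.  The function names, the job names, the sample outputs
and the choice of SHA-256 digests are all assumptions made for illustration;
the post does not say how the validation was actually done.

```python
import hashlib


def digest(output: bytes) -> str:
    """Return a SHA-256 hex digest of one job's raw output."""
    return hashlib.sha256(output).hexdigest()


def find_mismatches(baseline: dict[str, bytes],
                    candidate: dict[str, bytes]) -> list[str]:
    """List the jobs whose output differs between the two machines.

    A job that is missing from the candidate run counts as a mismatch.
    """
    mismatched = []
    for job, expected in sorted(baseline.items()):
        actual = candidate.get(job)
        if actual is None or digest(actual) != digest(expected):
            mismatched.append(job)
    return mismatched


# Hypothetical outputs: the same jobs run on a proven machine and a new one.
baseline_run = {"job_a": b"42.000\n", "job_b": b"3.14159\n"}
candidate_run = {"job_a": b"42.000\n", "job_b": b"3.14160\n"}

print(find_mismatches(baseline_run, candidate_run))  # ['job_b']
```

A byte-for-byte comparison like this is deliberately strict: any library or
kernel change that alters an answer shows up as a mismatch, even in the last
digit.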

Some of the compute customers don't apply updates at all, because the risk of
downtime or wrong answers is too high.  They fix the issues that they find, and
then every 1-2 years they update the older machines to whatever is currently
being used and proven on the newest machines.  The result is sets of machines
with slightly different OS loads (which is kind of nasty), but the other choice
is to update everything all of the time, and that is even worse, as too many
different types of HW must be revalidated to still give the correct answer.
You end up with old servers still running whatever was originally determined to
be stable by testing, and they aren't touched.  In this environment the testing
is just as bad with one OS as with any other: you start with something like F8,
update or downgrade any parts that fail to something that works, and then you
don't touch it for several years.

I had a subset of about 250 machines, almost all of which had reached 500+ days
of uptime (the uptime counter rolled over).  The 20 or so machines out of that
set that failed to reach that uptime all had HW issues.  These were usually
disk failures, but some were more lethal failures that required retiring the
specific machine.  It was typically not worth paying to fix the nastier issues
(usually MB failures), because it was often cheaper to just buy something new
that was validated and much faster.

The issue with all OSes is that no one tests enough to catch these high-MTBF
issues.  In a big environment (on the order of 1000 machines), each machine
crashing once per 1000 days of uptime comes to 1 machine a day crashing because
of software.  The enterprise OSes typically aren't even close to that level,
and while Fedora is worse, it is just not that much worse.
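
The arithmetic behind that last point can be sketched as follows.  This assumes
a fleet of 1000 machines (the figure that makes "1 machine a day" work out; the
post only says "a big environment") and a constant, independent crash rate per
machine, which is a simplification.  The function names are made up for this
example.

```python
import math


def expected_crashes_per_day(machines: int, mtbf_days: float) -> float:
    """Expected fleet-wide crashes per day when each machine crashes,
    on average, once every mtbf_days days."""
    return machines / mtbf_days


def expected_survivors(machines: int, mtbf_days: float,
                       target_days: float) -> float:
    """Expected number of machines that reach target_days of uptime
    with no crash, under a constant-rate (exponential) failure model."""
    return machines * math.exp(-target_days / mtbf_days)


# Assumed fleet of 1000 machines, one software crash per 1000 days each.
print(expected_crashes_per_day(1000, 1000.0))  # 1.0

# Same per-machine rate applied to a 250-machine subset over 500 days.
print(expected_survivors(250, 1000.0, 500.0))
```

Under this model, a per-machine rate that sounds excellent still produces a
daily crash somewhere in a large fleet, and it would leave well under 250 of
250 machines at 500 days of uptime.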

                            Roger



