We have a room full of *identical* boxes that we have run Red Hat Linux classes (6.x, 7.x, 8.0 and 9) on over the past 4 years. These are 500 MHz Intel 440BX motherboard boxes.
No problems until RHL9 came out. On about 50% of the machines (identical hardware, remember, including BIOS settings), kernel system calls on RH 2.4.20 kernels run about 4x to 10x slower.
Of course with this problem the whole system runs dog slow and is painful to use.
The vanilla kernel.org kernels and the RH 2.4.18 kernels (from RHL8.0) do NOT exhibit the slowdown.
The problem can be easily quantified using strace. Take a look at the following (especially the third column):
First with vanilla ftp.kernel.org 2.4.20 compiled using kernel-2.4.20-i686.config from RH.
[root@station9 root]# uname -r
2.4.20
[root@station9 root]# strace -c ls -al /etc > /dev/null
execve("/bin/ls", ["ls", "-al", "/etc"], [/* 30 vars */]) = 0
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 41.25    0.002895          10       289           lstat64
 12.92    0.000907          36        25           read
 12.61    0.000885          22        41        14 open
  5.34    0.000375          21        18           readlink
  5.27    0.000370          12        31           old_mmap
  5.14    0.000361          52         7           getdents64
  5.06    0.000355          15        23           munmap
  2.82    0.000198           7        30           close
  2.11    0.000148           5        28           fstat64
  1.44    0.000101           3        31           fcntl64
  1.44    0.000101          51         2           socket
  1.42    0.000100          50         2         2 connect
  1.27    0.000089           5        17           brk
  0.80    0.000056          56         1           mmap2
  0.57    0.000040           8         5           write
  0.19    0.000013           4         3         2 rt_sigaction
  0.17    0.000012           4         3         3 ioctl
  0.13    0.000009           9         1           uname
  0.06    0.000004           4         1           gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00    0.007019                   558        21 total
Now the latest RHL9 errata kernel. All RHL9 kernels and RHL8.0 kernels >= 2.4.20 perform the same:
[root@station9 root]# uname -r; strace -c ls -al /etc > /dev/null
2.4.20-20.9
execve("/bin/ls", ["ls", "-al", "/etc"], [/* 30 vars */]) = 0
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 43.57    0.019390          67       289           lstat64
 10.88    0.004841         194        25           read
  9.82    0.004369         109        40        13 open
  5.84    0.002601         145        18           readlink
  5.78    0.002574         112        23           munmap
  4.51    0.002007          72        28           fstat64
  4.17    0.001857         265         7           getdents64
  3.12    0.001387          45        31           fcntl64
  2.53    0.001124          37        30           close
  2.42    0.001078          98        11           old_mmap
  1.87    0.000834          49        17           brk
  1.07    0.000475         238         2         2 connect
  1.06    0.000473         237         2           socket
  1.00    0.000446          20        22           mmap2
  0.91    0.000405          81         5           write
  0.52    0.000233          78         3         2 rt_sigaction
  0.44    0.000197          66         3         3 ioctl
  0.44    0.000197         197         1           uname
  0.02    0.000011          11         1           set_thread_area
  0.01    0.000003           3         1           gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00    0.044502                   559        20 total
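The per-syscall slowdown can be read off the usecs/call column of the two runs. As a rough illustration only (not part of the original report), here is a minimal Python sketch that parses `strace -c` summary rows and prints the ratio for each syscall. The function names `parse_summary` and `slowdown` are made up for this example, and only a few rows from each run are included.

```python
import re

# One data row of an `strace -c` summary:
# % time, seconds, usecs/call, calls, optional errors, syscall name.
ROW = re.compile(
    r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)\s+(\d+)\s+(?:(\d+)\s+)?([a-z_][a-z_0-9]*)\s*$"
)


def parse_summary(text):
    """Map syscall name -> usecs/call for each row of an `strace -c` table."""
    usecs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m.group(6) != "total":  # the totals row has no usecs/call value
            usecs[m.group(6)] = int(m.group(3))
    return usecs


def slowdown(baseline, other):
    """Ratio other/baseline of usecs/call for syscalls present in both runs."""
    return {
        name: other[name] / baseline[name]
        for name in baseline
        if name in other and baseline[name] > 0
    }


# Excerpt of the vanilla 2.4.20 run.
VANILLA = """
 41.25    0.002895          10       289           lstat64
 12.92    0.000907          36        25           read
 12.61    0.000885          22        41        14 open
  5.14    0.000361          52         7           getdents64
100.00    0.007019                   558        21 total
"""

# Excerpt of the 2.4.20-20.9 run.
REDHAT = """
 43.57    0.019390          67       289           lstat64
 10.88    0.004841         194        25           read
  9.82    0.004369         109        40        13 open
  4.17    0.001857         265         7           getdents64
100.00    0.044502                   559        20 total
"""

vanilla = parse_summary(VANILLA)
redhat = parse_summary(REDHAT)
ratios = slowdown(vanilla, redhat)

# Print the syscalls with the largest slowdown first.
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {vanilla[name]:4d} -> {redhat[name]:4d} usecs/call ({ratio:.1f}x)")
```

On these rows, lstat64 goes from 10 to 67 usecs/call (6.7x). The total time goes from 0.007019 to 0.044502 seconds, roughly 6.3x.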
I'm posting this message to see if anyone else has seen anything similar or has any ideas. This same problem is 100% reproducible on multiple machines in the classroom.
You may want to add comments or add yourself to the CC list here:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90116
The machines in this classroom are being replaced in two weeks with P4 2.8 GHz HyperThreaded boxes, so ideally we can get this problem nailed down soon.
Dax Kelson
Hey Dax.. say hi to Bryan for me.. and tell him he got the 40th digit of PI wrong.
Can you give the complete listing of the machines, and I mean a COMPLETE listing.. We had something like this with newer hardware but found that we had two different revisions of the disk BIOSes. One set of machines was just plain slower by 1+ rotations of the drive... (I only know this because the guy who helped build 360's here worked out the speed differences to being an average of 1-3 disk drive rotations if the drive was really 7200 RPM).
On Thu, 2003-09-04 at 00:35, Dax Kelson wrote:
> We have a room full of *identical* boxes that we have run Red Hat Linux classes (6.x, 7.x, 8.0 and 9) on over the past 4 years. These are 500 MHz Intel 440BX motherboard boxes.
> No problems until RHL9 came out. On about 50% of the machines (identical hardware, remember, including BIOS settings), kernel system calls on RH 2.4.20 kernels run about 4x to 10x slower.
> Of course with this problem the whole system runs dog slow and is painful to use.
On Thu, 2003-09-04 at 09:40, Stephen Smoogen wrote:
> Hey Dax.. say hi to Bryan for me.. and tell him he got the 40th digit of PI wrong.
> Can you give the complete listing of the machines, and I mean a COMPLETE listing.. We had something like this with newer hardware but found that we had two different revisions of the disk BIOSes. One set of machines was just plain slower by 1+ rotations of the drive... (I only know this because the guy who helped build 360's here worked out the speed differences to being an average of 1-3 disk drive rotations if the drive was really 7200 RPM).
He doesn't believe he got the 40th digit (after the decimal place) wrong. He says it is 1. Ask him what his home zip code is though, and that throws him for a loop. :)
Bryan sends congrats on the test scores.
I was able to track down the cause of the slowness: it was lm_sensors. See the following for details:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90116
Dax Kelson
Guru Labs