On Wed, Aug 13, 2003 at 10:23:01PM +0200, Jean Francois Martinez wrote:
> Given that most/all of the recent boxes (ie the ones doing the real
> work) are P4s and Athlons it is time RedHat stopped compiling
> with -mcpu=i686 and started optimizing for the P4: -mcpu=p4
RHL glibc is actually compiled with -march=i686, and apart from those
enabled by -mfpmath=sse there are not many instructions which the
compiler would generate for normal code with -march=pentium3 but not
with -march=i686 (the only other difference is scheduling, and to my
knowledge that difference is not very big between i686 and PIII).
-mfpmath=sse is not usable for libm, because glibc on IA-32 relies
on extended precision in several places.
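To illustrate what is lost when intermediates are kept in double instead
of x87 extended precision, here is a minimal sketch. It is not taken from
glibc; the function names are made up for the example, and it assumes a
C99 compiler with hex float literals.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if 1 + 2^-60 rounds back to 1.0 once
 * the result is stored in a double (53-bit mantissa). */
int lost_in_double(void)
{
    volatile double d = 1.0;   /* volatile forces a real store/round */
    d += 0x1p-60;
    return d == 1.0;
}

/* Hypothetical helper: returns 1 if the same sum survives in long double.
 * That needs at least 61 mantissa bits; the x87 extended format has 64. */
int kept_in_long_double(void)
{
    volatile long double x = 1.0L;
    x += 0x1p-60L;
    return x != 1.0L;
}
```

On IA-32 the second function should return 1, because long double maps
to the 80-bit x87 format. Code that silently depends on those extra bits
would presumably change its results if the arithmetic moved to SSE
registers.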
The scheduling difference between P4 and i686 is bigger, but I don't
think P4-scheduled code runs that well on Athlons.
> Another point is that there is no such thing as low-level glibc
> functions for the P4 and the Athlon. The highest targeted
> processor is the PIII. However, documents on AMD's web site show
> that moving data (i.e. memcpy and friends) can be made several times
> faster by using 3DNow! instructions and data prefetching. I gave only
> a cursory glance to the assembler parts of glibc, but it didn't look
> like those parts (targeting the PIII) would be even remotely ideal
> for the Athlon. Same thing for the P4.
Where have you seen PIII optimized assembly in glibc? AFAIK there is none.
P4/Athlon/PIII optimized stringops are certainly welcome (patches to
libc-alpha(a)sources.redhat.com), but bear in mind that any use of floating
point regs (SSE/SSE2/whatever) has quite a big price in a lazy FPU saving
environment. Another thing to keep in mind is what the typical arguments
to these functions are.
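As a rough sketch of the trade-off above, assuming short copies are the
common case: a copy routine can keep small sizes on a plain byte loop and
add prefetching only for large blocks. This is not glibc code. The name
sketch_copy and both constants are hypothetical and unmeasured, and
__builtin_prefetch is a GCC builtin, so it is guarded.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_COPY_LIMIT 64    /* hypothetical threshold, not measured */
#define PREFETCH_AHEAD   256   /* hypothetical prefetch distance in bytes */

/* Copies n bytes from src to dst (regions must not overlap). */
void *sketch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0;
    size_t j;

    assert(n == 0 || (dst != NULL && src != NULL));

    if (n >= SMALL_COPY_LIMIT) {
        /* Bulk path: 64-byte chunks, hinting at data still to come. */
        for (; i + 64 <= n; i += 64) {
#if defined(__GNUC__)
            if (i + PREFETCH_AHEAD < n)
                __builtin_prefetch(s + i + PREFETCH_AHEAD, 0, 0);
#endif
            for (j = 0; j < 64; j++)
                d[i + j] = s[i + j];
        }
    }

    /* Small copies and any remaining tail take the simple path. */
    for (; i < n; i++)
        d[i] = s[i];

    return dst;
}
```

The sketch uses no explicit SSE or 3DNow! instructions. Whether a
vectorized bulk path pays off would depend on how often callers really
pass large sizes, which has to be measured.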
Jakub