Re: P4s, Athlons and bandwidth

Wednesday, 13 August 2003

On Wed, Aug 13, 2003 at 10:23:01PM +0200, Jean Francois Martinez wrote:
...
 Given that most/all of the recent boxes (i.e. the ones doing the real
 work) are P4s and Athlons, it is time Red Hat stopped compiling
 with -mcpu=i686 and started optimizing for the P4: -mcpu=p4

Actually, RHL glibc is compiled with -march=i686, and apart from those
enabled by -mfpmath=sse there are not many instructions which the
compiler would generate for normal code with -march=pentiumiii but not
with -march=i686 (the only other difference is scheduling, and to my
knowledge that difference is not very big between i686 and PIII).
-mfpmath=sse is not usable for libm, because glibc on IA-32 relies
on extended precision in several places.
The scheduling difference between P4 and i686 is bigger, but I don't
think code scheduled for the P4 runs that well on Athlons.
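
To make the extended-precision point concrete, here is a minimal sketch.
It assumes IEEE double (53-bit mantissa) and, for the second function to
return 1, an x87-style long double with a 64-bit mantissa. The function
names are made up for illustration and nothing here is taken from glibc.

```c
#include <assert.h>
#include <float.h>

/* Returns 1 if 1 + 2^-60 rounds back to exactly 1 in double precision.
 * A 53-bit mantissa cannot hold both the leading 1 and the 2^-60 term. */
int tiny_lost_in_double(void)
{
    assert(DBL_MANT_DIG == 53);             /* assumption: IEEE double */
    volatile double sum = 1.0 + 0x1p-60;    /* volatile forces rounding to double */
    return sum == 1.0;
}

/* Returns 1 if the same sum keeps the 2^-60 term in long double.
 * This needs at least 61 mantissa bits; the x87 extended format has 64. */
int tiny_kept_in_long_double(void)
{
    volatile long double sum = 1.0L + 0x1p-60L;
    return sum != 1.0L;
}
```

Code that counts on the second behaviour for its intermediate results
would lose those low bits if the same arithmetic were done in SSE
registers, which hold at most double precision.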

...
 Another point is that there is no such thing as low-level glibc
 functions for the P4 and the Athlon.  The highest targeted
 processor is the PIII.  However, documents on AMD's web site show
 that moving data (i.e. memcpy and friends) can be made several times
 faster by using 3DNow! instructions and data prefetching.  I gave only
 a cursory glance to the assembler parts of glibc, but it didn't look
 like those parts (targeting the PIII) would be even remotely ideal
 for the Athlon.  Same thing about the P4.

Where have you seen PIII optimized assembly in glibc? AFAIK there is none.
P4/Athlon/PIII optimized stringops are certainly welcome (patches to
libc-alpha(a)sources.redhat.com), but bear in mind that any use of floating
point regs (SSE/SSE2/whatever) has quite a big price in a lazy FPU saving
environment. Another thing to keep in mind is what the typical arguments
to these functions are.
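
One way to act on both caveats is to keep short copies on an integer-only
path and reserve any SSE/3DNow! routine for large sizes. The following is
only a sketch under assumptions: the 128-byte cutoff is an arbitrary
placeholder rather than a measured value, copy_wide() is a stand-in that
simply calls memcpy(), and all names are hypothetical. The two counters
are there to show which path the typical arguments actually take.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: below it, touching FP/SIMD registers is assumed
 * to cost more than it gains. Needs measuring on real workloads. */
#define WIDE_COPY_THRESHOLD 128

/* Per-path call counters, to observe the distribution of arguments. */
static unsigned long small_calls, wide_calls;

/* Integer-only byte copy; never touches floating point registers. */
static void *copy_small(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a CPU-specific wide copy; here it just defers to memcpy. */
static void *copy_wide(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Picks a path by size and records which one was taken. */
void *copy_dispatch(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    if (n < WIDE_COPY_THRESHOLD) {
        small_calls++;
        return copy_small(dst, src, n);
    }
    wide_calls++;
    return copy_wide(dst, src, n);
}
```

If small_calls dominates in a real program, a faster wide path would
rarely be reached, whatever it does for large blocks.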

	Jakub
