2009/2/3 Callum Lerwick <seg(a)haxxed.com>:
On Tue, 2009-02-03 at 01:01 -0500, Gregory Maxwell wrote:
> We would see much more substantial gains from things like -msse2 &
> -mfpmath=sse but, unfortunately, unlike i586 there are a *lot* of
> systems out there (and still being sold) which do not have all the
> fancy instruction set extensions.
I've found -mfpmath=sse to actually be slightly slower than x87. GCC
just isn't very good at SSE yet, but people have been tuning its x87
output for decades now.
And at this point, all the really performance-critical code I've
ever seen is already using runtime selection of hand-tuned SSE/MMX/etc.
inner loops. This is absolutely key. There's little to be gained from
diddling with GCC's instruction set usage, because most performance-critical
software already runs hand-tuned assembly in its hot paths on CPUs
that support those extensions.
Speaking of which, it's worth mentioning liboil:
http://liboil.freedesktop.org/wiki/