Jakub's Recommendations for ia32 Support

Gregory Maxwell gmaxwell at gmail.com
Tue Feb 3 21:31:18 UTC 2009


On Tue, Feb 3, 2009 at 3:45 PM, Dominik 'Rathann' Mierzejewski
<dominik at greysector.net> wrote:
>> There are certainly cases where cmov can be faster.  Perhaps exclusively
>> on older micro architectures (P4s, early Core2, maybe AMD, haven't
>> checked).  But in general it's no win.
>
> Well, I talk to people who write hand-optimized assembly and care to
> squeeze every cycle out of various CPUs and they say it's definitely
> a win.

GCC is not a person who writes hand-optimized assembly, yet it is
GCC's use of cmov that matters to us. It wouldn't surprise me to find
that profile-driven use of cmov works a lot better than the generic
case.

These people can continue to write their cmov by hand in asm. If they
are doing that kind of tuning work then they are likely also doing SSE
detection and can handle switching code variants at run time.
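
For illustration only, here is a rough sketch of what that kind of
run-time switching might look like. It assumes GCC or Clang on x86 and
relies on the `__builtin_cpu_supports` builtin; on other compilers or
targets it simply falls back to the plain variant. The function names
(max_branch, max_select, pick_max) are made up for this example and
are not taken from any real project.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine, variant 1: ordinary branch. */
static int max_branch(int a, int b) {
    if (a > b)
        return a;
    return b;
}

/* Hypothetical routine, variant 2: branch-free select.
 * A compiler may lower this to cmov, but that is not guaranteed;
 * a hand-tuned build would use inline asm here instead. */
static int max_select(int a, int b) {
    int mask = -(a > b);              /* all ones if a > b, else zero */
    return (a & mask) | (b & ~mask);
}

typedef int (*max_fn)(int, int);

/* Returns 1 if the running CPU reports cmov support, 0 otherwise.
 * Assumption: GCC/Clang x86 builtins; anything else reports 0. */
static int cpu_has_cmov(void) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("cmov") ? 1 : 0;
#else
    return 0;
#endif
}

/* Pick a variant once at start-up; callers go through the pointer. */
static max_fn pick_max(void) {
    return cpu_has_cmov() ? max_select : max_branch;
}

/* Sanity check: both variants must agree, whichever one is picked. */
static void check_variants(void) {
    int samples[] = { -5, 0, 3, 7 };
    size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(max_branch(samples[i], samples[j]) ==
                   max_select(samples[i], samples[j]));
}
```

The point is only that the choice is made once, at run time, by code
that already knows which CPUs it cares about; the distribution's
default compiler flags do not have to change for that to work.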

> So please, show me some code instead of hand-waving.

… I did post benchmarks of libTheora showing a 0.2% gain on Core2 from
cmov.  Perhaps you'd care to benchmark freetype?



