Fedora 15 x64 stunning performance (vs Win7).

Sat Jan 7 00:35:38 UTC 2012

Hardware platform: Dell Studio XPS, Core i7 950, 8 MB L3 cache.
Operating systems: Fedora 15 x64 / Windows 7 x64 Ultimate

Hi everybody,

I have developed a sorting/searching library written in assembly
language. As long as one stays in the L1 cache (in-place physical sorts)
speeds are identical, but when the proportion of L1 cache misses
is high (sorts by reference, which return an ordering vector as APL
sorts do) Fedora dramatically outperforms Win7.
This performance gap is stunning but consistent, and I am not
exaggerating. Something weird occurs when one leaves the L1 cache. As
long as one remains in the L3 cache (my Core i7 950 has an 8 MB L3
cache) the performance penalty on Win7 is about 33%; with mixed L3
cache/main memory accesses it grows to 50-60%! I wish some x64 Linux
kernel developer could enlighten me. The assembly code is exactly the
same in both cases (except of course for Windows API calls being
replaced with Linux system calls), and the JWASM assembler is used for
both. No disk swapping, large/huge pages, or virtual machine
involved and my test program is a plain application run from the command
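
For readers unfamiliar with the distinction, here is a minimal C sketch
of a "sort by reference" that returns an ordering vector, in the spirit
of an APL grade-up. It is not the assembly library itself: the names
grade_up, cmp_by_key and g_keys are hypothetical, and the standard
qsort merely stands in for whatever algorithm the library really uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Keys being ranked; file-scope so the qsort comparator can reach them.
   (Hypothetical helper state, not part of the original library.) */
static const int *g_keys;

/* Compare two indices by the keys they refer to. Ties are broken by
   index so the resulting order is stable, like an APL grade-up. */
static int cmp_by_key(const void *a, const void *b)
{
    size_t i = *(const size_t *)a;
    size_t j = *(const size_t *)b;

    if (g_keys[i] != g_keys[j])
        return (g_keys[i] > g_keys[j]) - (g_keys[i] < g_keys[j]);
    return (i > j) - (i < j);
}

/* Fill order[0..n-1] with indices such that keys[order[k]] is ascending.
   The keys array itself is never moved; only the indices are sorted. */
static void grade_up(const int *keys, size_t *order, size_t n)
{
    for (size_t k = 0; k < n; k++)
        order[k] = k;

    g_keys = keys;
    qsort(order, n, sizeof order[0], cmp_by_key);
}
```

Called on the keys {30, 10, 20, 10}, grade_up fills the ordering vector
with 1 3 2 0. Every comparison reaches the keys through an index, so
reads of the key array are scattered rather than sequential, which is
why this kind of sort misses the L1 cache far more often than an
in-place sort on large inputs.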

Any thoughts?


