Fixing the glibc Adobe Flash incompatibility

Thu Nov 18 16:15:57 UTC 2010

On Thu, Nov 18, 2010 at 04:23:56PM +0100, Jakub Jelinek wrote:

 > It is very sad that Intel/AMD just didn't make sure rep movsb
 > is the fastest copying sequence on all of their CPUs,
 > which underneath could do whatever magic based on size and src/dst
 > alignment (e.g. for small length handle it in hw so it is as quick as
 > possible, for larger sizes perhaps handle it in microcode) - rep movsb
 > can be easily inlined and is quite short as well.  But on many, especially 
 > recent, CPUs it performs very badly compared to these much larger SSE* optimized
 > routines.
 > 
 > If you want exact numbers, best ask Intel folks who wrote and tuned the
 > SSE4.2 memcpy routine.

I wonder if the Intel people who benchmarked memcpy throughput also benchmarked
the increased context switch time that results now that the kernel's lazy FPU
state saving is effectively disabled every time something calls memcpy.
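
To make the trade-off concrete, here is a rough sketch of the kind of inlined
rep movsb copy being discussed. It assumes GCC-style extended asm on x86, and
movsb_copy is a hypothetical name used only for illustration; it is not
glibc's code. The point is that it touches only integer registers, so a task
using it would not dirty the FPU/SSE state the way an SSE-based memcpy does.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only, not glibc's memcpy. */
static inline void *movsb_copy(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__) || defined(__i386__)
    /*
     * rep movsb copies the number of bytes held in the count register
     * from the source index register to the destination index register.
     * Only integer registers are involved, so no SSE state is used.
     * The ABI guarantees the direction flag is clear on function entry.
     */
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

Calling movsb_copy(dst, src, len) behaves like memcpy for non-overlapping
buffers. Whether it is competitive on throughput depends on the CPU, which is
exactly the complaint quoted above.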

	Dave