memcpy overlap: quickly detect, diagnose, work around

John Reiser jreiser at
Sun Nov 28 23:13:43 UTC 2010

This patch (with .rpms for x86_64 and i686) optionally enables glibc
to detect, diagnose, and work around overlap in memcpy/mempcpy.
The check is controlled by an environment variable MEMCPY_CHECK_,
which influences choices made by __init_cpu_features and the
STT_GNU_IFUNC mechanism for choosing alternate implementations
at runtime.  The patch extends the IFUNC mechanism by passing
&getenv as an actual argument.  If MEMCPY_CHECK_ is unset or 0, then
there is no runtime overhead at all when calling memcpy.  Setting
MEMCPY_CHECK_ nonzero enables detection, diagnosis, and work-around
using memmove.  When there is no overlap, the runtime cost of the
check is about 4 or 5 cycles per call to memcpy or mempcpy.
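
To illustrate the kind of check involved, here is a minimal sketch in
portable C.  It is not the code from the patch: the names
regions_overlap and checked_memcpy are hypothetical, and the real
check lives inside the IFUNC-selected assembly routines.  The sketch
only shows the idea of comparing the distance between the two
pointers with the length, reporting the overlap, and falling back to
memmove.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if [dst, dst+n) and [src, src+n)
   share at least one byte.  The regions overlap exactly when the
   distance between the two start addresses is less than n. */
static int regions_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t gap = (d > s) ? d - s : s - d;
    return n != 0 && gap < n;
}

/* Hypothetical wrapper: detect overlap, diagnose it on stderr,
   and work around it by using memmove instead of memcpy. */
static void *checked_memcpy(void *dst, const void *src, size_t n)
{
    if (regions_overlap(dst, src, n)) {
        fprintf(stderr, "memcpy overlap: dst=%p src=%p n=%zu\n",
                dst, (const void *)src, n);
        return memmove(dst, src, n);
    }
    return memcpy(dst, src, n);
}
```

In the no-overlap case the extra work is one subtraction, one compare,
and one branch that is not taken, which is consistent with a cost of
a few cycles per call.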

A previous thread "Fixing the glibc adobe flash incompatibility"
is related.

The patch demonstrates what is possible.  It is one example
of a selective diagnostic tool that may be useful in some cases.
Other tools such as valgrind (memcheck) may be preferable for
general development.

Looking at the new architecture-dependent implementations of memcpy
for i686 and x86_64, it seems that overlap makes no difference
for lengths of 40 bytes or less (i686) or 80 bytes or less (x86_64).
The new code copies short source regions entirely into registers
before storing any bytes at the destination.  Thus the implementation
could perform the overlap check only in the branch for long regions.
There the 4 or 5 cycles would be an almost insignificant overhead,
particularly in relation to the claimed savings of hundreds of cycles
for the recently-introduced code.
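
The reason short copies are insensitive to overlap can be sketched in
C.  This is an illustration only, not the glibc code: the helper name
copy_8_to_16 and the 8-to-16-byte range are assumptions chosen to keep
the example small.  Every source byte is loaded into a local variable
before any destination byte is stored, so the result is correct no
matter how the regions overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, 8 <= n <= 16, with all loads
   done before any store.  The two 8-byte windows cover the first
   and last 8 bytes of the region and may overlap each other. */
static void copy_8_to_16(unsigned char *dst, const unsigned char *src,
                         size_t n)
{
    uint64_t head, tail;

    /* Load phase: the whole source region is now held in locals. */
    memcpy(&head, src, 8);
    memcpy(&tail, src + n - 8, 8);

    /* Store phase: the source is no longer read, so overwriting
       part of it cannot change the result. */
    memcpy(dst, &head, 8);
    memcpy(dst + n - 8, &tail, 8);
}
```

A long copy cannot hold the whole source in registers, so only that
path needs the overlap check.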
