On Mon, Mar 28, 2011 at 5:29 AM, Chris Ball <cjb(a)laptop.org> wrote:
> Hi,
> On Sun, Mar 27 2011, Gordan Bobic wrote:
> > On 03/27/2011 04:19 AM, Chris Ball wrote:
> >> I suspect you're right that recompiling the world with NEON is no big
> >> deal, but simply doing glibc/X/liboil/codecs should be a large win by
> >> itself. In those cases there's pre-vectorized code sitting there
> >> and waiting to be emitted once the right flag's turned on.
> >
> > If there is such code in there (and I'm not convinced there is much, if
> > any), it is likely to be hand-crafted assembly - and if that is the
> > case, it's a virtual certainty that it isn't ARM assembly.
> You are wrong. Here is a patch for NEON-optimized memcpy() for glibc
> written in ARM assembly:
> http://sourceware.org/ml/libc-ports/2009-07/msg00003.html
> Orc¹, which replaced liboil in gstreamer, also emits NEON asm:
> http://code.entropywave.com/git?p=orc.git;a=blob;f=orc/orcrules-neon.c
> As does pixman, which accelerates X rendering and reports simple
> fill/blit operations being at least twice as fast with NEON:
> http://sandbox.movial.com/blog/2009/06/pixman-gets-neon-support/

Yip, and Skia, EFL, and FFmpeg too. Linaro is doing quite a bit of work
in this area, including on Cairo, libjpeg, AAC, and VP8:
http://status.linaro.org/group/tr-graphics-toolkits-optimization-cairo.html
http://status.linaro.org/group/tr-multimedia-optimize-jpeg-decoding.html
http://status.linaro.org/group/tr-multimedia-optimize-aac-encoding.html
http://status.linaro.org/group/tr-multimedia-optimize-vp8-decoding.html
On the toolchain side, compiling ordinary code with NEON enabled by
default is pretty much performance-neutral. We're working on improving
that, though:
https://blueprints.launchpad.net/gcc-linaro/+spec/auto-vectorization-impr...
You can do runtime selection based on the capabilities of the chip
using something custom, glibc's hwcaps, or the recently added IFUNC
support. That way you get both compatibility and speed, at the cost of
a bigger installed image.
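
As a rough illustration of the "something custom" option, here is a
minimal sketch of runtime dispatch through a function pointer. It assumes
Linux and a libc that provides getauxval() (glibc 2.16 or later). The
names copy_generic, copy_neon, cpu_has_neon and fast_copy are hypothetical,
and copy_neon is only a stand-in that calls memcpy() where a real
hand-written NEON routine would go. On anything other than 32-bit ARM the
check simply reports "no NEON" and the generic path is used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__linux__)
#include <sys/auxv.h>      /* getauxval(), AT_HWCAP */
#endif
#if defined(__linux__) && defined(__arm__)
#include <asm/hwcap.h>     /* HWCAP_NEON (32-bit ARM only) */
#endif

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback that runs on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical stand-in: a real build would put NEON assembly or
 * intrinsics here. It defers to memcpy() so the sketch compiles anywhere. */
static void *copy_neon(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Ask the kernel, via the ELF auxiliary vector, whether NEON is present.
 * Returns 0 on platforms where we do not know how to check. */
static int cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__) && defined(HWCAP_NEON)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return 0;
#endif
}

/* Pick an implementation based on what the chip reports. */
static copy_fn select_copy(void)
{
    return cpu_has_neon() ? copy_neon : copy_generic;
}

/* Public entry point: selects once on first use, then reuses the choice.
 * (Not thread-safe as written; a real library would guard the first call.) */
void *fast_copy(void *dst, const void *src, size_t n)
{
    static copy_fn impl;
    if (impl == NULL)
        impl = select_copy();
    assert(impl != NULL);
    return impl(dst, src, n);
}
```

Callers just use fast_copy(dst, src, len) and never see which variant
ran. IFUNC moves the same kind of selection into a resolver function that
the dynamic linker calls when it binds the symbol, so the per-call pointer
check goes away. Either way both variants ship in the binary, which is
where the bigger installed image comes from.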
-- Michael