On Tue, 23 Jul 2019 12:16:45 +0200
Igor Gnatenko <ignatenkobrain(a)fedoraproject.org> wrote:
> On Tue, Jul 23, 2019 at 12:08 PM Kevin Kofler
> <kevin.kofler(a)chello.at> wrote:
> >
> > Igor Gnatenko wrote:
> > > 1. Lower requirement to something like SSE4 and select other CPU
> > > features which are available in most CPUs from the last decade.
> >
> > Sorry, but -1 to SSE4 too. One of my machines supports only up to
> > SSSE3, and other replies in this thread have also suggested SSSE3
> > as the most we can assume. And if you ask me, we should just stick
> > to SSE2 as the baseline. What are the big gains to be had from
> > SSE3, SSSE3, SSE4.1, and SSE4.2? Especially if you limit it to
> > packages that don't do runtime detection? (Performance-sensitive
> > software SHOULD do runtime detection, and most of it does, e.g.,
> > OpenBLAS.)
>
> I used SSE4 as an example. Obviously one needs to spend time digging
> into all this and find an appropriate set.
>
> From what I saw, OpenBLAS does not do any runtime detection. You
> either compile it with AVX2 or not. And at runtime it will check
> whether it was enabled during compilation and use some kind of
> fallback.

OpenBLAS can do runtime CPU detection for x86, aarch64 and Power, if
built accordingly.
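
For anyone unsure what "runtime detection" means here: as far as I
know, the relevant OpenBLAS build option is DYNAMIC_ARCH=1, which
compiles kernels for several CPU targets into one library and picks
one at load time. The Python below is only a rough sketch of that idea,
not OpenBLAS code. The kernel names and the KERNELS table are made up
for illustration, and reading /proc/cpuinfo is a Linux/x86 assumption.

```python
# Hypothetical kernel table, ordered from most to least demanding.
# Each entry names a kernel and the CPU flags it needs (flag spelling
# follows the "flags" line of /proc/cpuinfo on x86 Linux).
KERNELS = [
    ("avx2", {"avx2", "fma"}),
    ("sse4_2", {"sse4_2"}),
    ("ssse3", {"ssse3"}),
    ("sse2", {"sse2"}),
]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags reported by the kernel, or an empty set."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # file missing or unreadable: report no known features
    return set()


def pick_kernel(flags):
    """Pick the first (most capable) kernel whose required flags are all present."""
    for name, required in KERNELS:
        if required <= flags:
            return name
    return "generic"  # portable fallback when nothing matches


if __name__ == "__main__":
    print(pick_kernel(read_cpu_flags()))
```

A machine that tops out at SSSE3 would get the "ssse3" kernel from the
same binary that hands "avx2" to a newer CPU, which is why packages
built this way don't need a raised baseline.
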
Dan