Igor Gnatenko wrote:
> From what I saw, openblas does not do any runtime detection. You
> either compile it with avx2 or not. And in runtime it will check
> whether it was enabled during compilation and use some kind of
> fallback.

If built with the DYNAMIC_ARCH option, which is the case in the Fedora
packages, OpenBLAS actually compiles its routines several times, once for
each of the many CPU targets it supports (even some that share the same
instruction set but have different performance characteristics), and then
picks the best one at runtime based on the CPUID information.
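
To make the idea concrete, here is a rough sketch of that kind of dispatch. It is written in Python for brevity and is not OpenBLAS code: the kernel names, the table and the use of /proc/cpuinfo instead of the CPUID instruction are all assumptions made for illustration.

```python
# Hypothetical sketch of runtime kernel dispatch. The kernel names and the
# table are invented for illustration and are not taken from OpenBLAS.

def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags reported by Linux, or an empty set."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.partition(":")[2].split())
    except OSError:
        pass  # not Linux, or /proc not mounted: treat as "no known flags"
    return set()


def dot_generic(x, y):
    """Plain dot product, standing in for every compiled variant."""
    return sum(a * b for a, b in zip(x, y))


# Ordered from most to least demanding. In a real library each entry would
# be a separately compiled routine; here they all share one Python body so
# that the sketch stays runnable.
KERNELS = [
    ({"avx2", "fma"}, "avx2_fma", dot_generic),
    ({"avx"}, "avx", dot_generic),
    (set(), "generic", dot_generic),
]


def pick_kernel(flags):
    """Return (name, function) for the first entry whose requirements are met."""
    for required, name, func in KERNELS:
        if required <= flags:
            return name, func
    raise RuntimeError("unreachable: the generic entry has no requirements")


name, dot = pick_kernel(read_cpu_flags())
print(name, dot([1.0, 2.0], [3.0, 4.0]))
```

The selection happens once, at startup, and every later call goes through the chosen function. A machine without any of the listed features simply ends up on the generic entry.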

>> We already have 2 such mechanisms:
>> * several upstream software packages check CPUID directly. See, e.g., how
>> OpenBLAS does it. Or the performance-sensitive parts of Chromium. Etc.
> You can't do several things with this, like FMA.

Sure you can! OpenBLAS also checks for both FMA variants (FMA3 and FMA4)
during its runtime detection, and the routines for the CPUs that support
FMA make use of it.
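
As a small illustration of telling the two variants apart (again a Python sketch, not what OpenBLAS does internally): on Linux, /proc/cpuinfo reports FMA3 as the flag "fma" and FMA4 as "fma4". The function name below is made up for this example.

```python
# Hypothetical helper: classify FMA support from a Linux "flags" line.

def fma_variants(flags_line):
    """Report which FMA variants appear in a space-separated flag list."""
    # Split into whole words first; a substring test such as
    # `"fma" in flags_line` would wrongly match "fma4" as well.
    flags = set(flags_line.split())
    return {"fma3": "fma" in flags, "fma4": "fma4" in flags}


print(fma_variants("fpu sse2 avx avx2 fma"))
print(fma_variants("fpu sse2 avx fma4"))
```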

> You are talking about libraries while I am talking about binaries.

Then just build the binary as a library, twice, and make a dummy main
program that links to the library. (See also the "kdeinit hack", which
does/did something similar for different reasons.)
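
A rough sketch of what such a launcher could look like follows. In practice it would be a tiny C stub; Python with ctypes is used here only to keep it short. The library names (libmyapp-avx2.so, libmyapp-generic.so), the directory and the exported app_main(argc, argv) entry point are all hypothetical.

```python
# Hypothetical launcher: the application is built twice as a shared library
# and a small stub picks one build at startup. All names are made up.
import ctypes
import os


def choose_build(flags, libdir):
    """Return the path of the library build that suits the given CPU flags."""
    variant = "avx2" if {"avx2", "fma"} <= set(flags) else "generic"
    return os.path.join(libdir, "libmyapp-%s.so" % variant)


def launch(argv, flags, libdir):
    """Load the chosen build and call its assumed app_main(argc, argv)."""
    lib = ctypes.CDLL(choose_build(flags, libdir))
    lib.app_main.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    lib.app_main.restype = ctypes.c_int
    # Build a NULL-terminated argv array, as a C main() would expect.
    args = [os.fsencode(a) for a in argv] + [None]
    c_argv = (ctypes.c_char_p * len(args))(*args)
    return lib.app_main(len(argv), c_argv)


print(choose_build({"sse2", "avx", "avx2", "fma"}, "/usr/lib64/myapp"))
print(choose_build({"sse2"}, "/usr/lib64/myapp"))
```

The stub itself has to be compiled for the baseline instruction set, since it runs before anything is known about the CPU. Only the two library builds differ in their compiler flags.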
Kevin Kofler