Dave Love wrote:
> Kevin Kofler writes:
>> If you are talking about the missing RPM AutoProvides:
>> Provides: libblas.so.3()(64bit)
>> does wonders.
> I mean you need to get the soname right and ensure that you have
> everything implemented in the replacement library.
Only the soname of the Provides matters. The actual library file can be a
symlink to the monolithic libopenblas.so.0, the dynamic linker (ld.so) will
load it just fine. The soname is only read at link time, and there, it is
fine (and in fact desired) that newly linked applications get
libopenblas.so.0 recorded as the soname, not libblas.so.3.
>>> Various things have been changed to use openblas on x86 after
>>> some of us
>> The problem is, "various things" is not enough, we need a plan to ensure
>> ALL things use it.
> It's not available for them all as far as I know -- there's an rpm macro
> which says which ones. I'm happy if that's wrong now.
"things" = "packages" here. Surely OpenBLAS should work for all the
using packages on x86, especially if we symlink libblas.so to it. If not, it
is a bug either in OpenBLAS or in the package.
OpenBLAS is not available for some exotic architectures, but the solution
there is to build ATLAS (or some other implementation) for those
architectures (and those architectures only) and set up the symlinks there
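In RPM terms, that per-architecture split could look something like the hypothetical spec fragment below. The `%{openblas_arches}` macro is possibly the macro alluded to above; the package names are illustrative, not a definitive implementation.

```spec
# Hypothetical spec fragment: pick the BLAS/LAPACK backend per architecture.
%ifarch %{openblas_arches}
# OpenBLAS is available here; libblas.so.3 is a symlink to libopenblas.so.0.
Requires: openblas
%else
# Exotic architectures fall back to ATLAS (or another implementation),
# with the same libblas.so.3 symlink pointing at that build instead.
Requires: atlas
%endif
```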
>> But the new approach I am proposing installs only one version of
>> BLAS and LAPACK (the OpenBLAS one), so there cannot possibly be
>> mismatched versions (except if you have third-party binaries bundling
>> BLAS and linking to the system LAPACK or the other way round, but those
>> are then very broken and will also fail on other distributions for the
>> same reason).
"Third party" (user and system) binaries linking non-OB linear algebra
is normal on the sort of systems I work on, though I wish I was allowed
to package system installations. (Red Hat would have caused chaos on
our systems by introducing backwards-incompatible openmpi if it hadn't
been caught by the local package dependencies.)
It is fine if third-party binaries either:
* link to the system version of both BLAS and LAPACK, or
* bundle their own version of both BLAS and LAPACK.
The only case where you can end up with an incompatible mix is if they link
to the system version of one and bundle the other, which is very broken and
hopefully not too common. (It will also break on Debian.)
> Also, OB still has at least some correctness problems as far as I
According to the comments, the latest version has precision issues only in
the single-precision version. If you are using single precision where
precision matters, you have a problem already.
Also, having an error of 17.something times a small error instead of 16.00
times is hardly "incorrect". I would be more worried if it were returning
completely wrong results (e.g., 1 instead of 0 or something like that).
And in the end, all software has bugs. glibc also has bugs, so should we
attempt to make all of Fedora switchable to musl at runtime (or worse,
arbitrarily link some of it to glibc, some to musl, some to uClibc, and some
to dietlibc, just because we can – this is the situation with BLAS/LAPACK)?
And keep in mind that floating-point computation ALWAYS returns
approximations. If you need rigorous error bounds, you probably need
something entirely different (e.g., interval arithmetic), which will of
course have its own limitations.
> Is there actually anything wrong with Debian's tried and tested
> alternatives system, as long as openblas is preferred where appropriate?
Sorry, I really don't like the alternatives system. It requires you to make
a global systemwide switch to change your implementation. If you want to
make the implementation switchable by the user, it needs to be a runtime
choice (using, e.g., environment modules). But I think we do not have to
leave this decision to the user to begin with.
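As a sketch of that "runtime choice" idea, an environment-modules modulefile only needs to prepend the chosen implementation's directory to the loader search path. The paths and module name below are hypothetical:

```tcl
#%Module1.0
## Hypothetical modulefile "blas/openblas" -- paths are illustrative.
## Loading it makes ld.so find this OpenBLAS build of libblas.so.3 first;
## a sibling "blas/atlas" module would point elsewhere.
prepend-path LD_LIBRARY_PATH /opt/blas/openblas/lib64
```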
>> ld.so.conf.d is the only way to build those [versions optimized for
>> subarchitectures where no runtime detection is available], if you want to
>> support them. I wonder whether non-x86 architectures are even worth
>> investing the effort.
> How would relevant hwcaps not help, if they were available, as they
> seemed to be on some architectures when I looked some time ago? (That
> used to be important on SPARC for efficiency in crypto libraries, at
My point is, those architectures are so rarely used, with Fedora at least,
that I wonder whether it is worth Fedora maintainers' time to optimize for
their subarchitectures. Of course it will give a performance benefit, that
goes without saying. But is it worth our time, considering the actual usage?
> Yes it is, but if you can have atlas-sse3, I don't see why you can't
> have blis-avx512. (Dynamism was on the radar for BLIS when I last
> looked, and might be worth contributing, but I haven't evaluated it
> against OB on anything other than KNL.)
It could be done, but the idea behind my proposal is that atlas-sse3 would
go away. :-) Compile-time switching done right would mean shipping at least 6
or 7 atlas-* packages on x86_64 (even more on i686, because you have SSE 1,
MMX, and x87-only to support there too), probably over a dozen (because
different CPUs supporting the same level of SSE/AVX don't necessarily have
the same optimal settings, see the different kernels the OpenBLAS runtime
switching supports). But of course, if AVX-512 is the only special case, it
is probably doable (using the same ld.so.conf.d approach that atlas-sse3
uses), though the BLIS build would need to be a drop-in for the default
implementation, which would be OpenBLAS.
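The atlas-sse3 mechanism referred to is an ld.so.conf.d drop-in shipped by the subpackage, so a BLIS AVX-512 build could ship the analogous fragment. The file name and directory below are hypothetical:

```
# /etc/ld.so.conf.d/blis-avx512.conf -- hypothetical, mirroring what
# atlas-sse3 does. ldconfig adds this directory to the search path, so
# the loader prefers the libblas.so.3 found here; the subpackage is only
# installed on machines whose CPUs actually support AVX-512.
/usr/lib64/blis-avx512
```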
> and the KNL qua Haswell that is contributed is worse than you might expect.
To be honest, I did not expect it to be great. Not using AVX-512 is of
course a bummer. Now, I would naïvely have guessed a factor of 2 rather than
3 (because AVX2 is 256-bit, AVX-512 is 512-bit), but I guess AVX2 is just
not as optimized as AVX-512 in the KNL hardware's circuitry.
Realistically, I think native AVX-512 support will come in OpenBLAS when
people start getting those Skylake-X CPUs that have been out for a few weeks
now. The market for high-performance coprocessors is just too small (which
is kinda sad because the whole point of those coprocessors was to give you
performance not available in any CPUs at the time).