On Wed, 27 May 2020 12:29:36 +0200 Iñaki Ucar iucar@fedoraproject.org wrote:
Hi,
I wanted to bring some attention to this in devel, not only to openblas' maintainer (in CC), because there have been some discussions around BLAS/LAPACK in the past here.
As Dave Love pointed out in a previous discussion, parallelization is generally done at the top level, and then you simply call a single-threaded BLAS/LAPACK implementation. But as it turns out, openblas is not thread-safe in such a scenario since at least v0.3.7. To ensure thread-safety, we need to build single-threaded openblas with USE_LOCKING=1 [1] (which we don't do now, and we should, especially if we intend to make this implementation a system-wide default [2]).
The correct approach is to use the OpenMP flavor of OpenBLAS, which avoids these issues. If you use the OpenMP library in a sequential program, the BLAS runs in parallel; if you use it in an OpenMP parallel program, the BLAS runs either sequentially (within already-parallel regions) or in parallel (within sequential regions).
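
For anyone who wants to check a given build, the top-level parallelization pattern quoted above can be sketched roughly as follows. This is only an illustration: matmul is a hypothetical pure-Python stand-in for a single-threaded BLAS call, not an OpenBLAS binding, and the helper names are made up for the example. The idea is to run the same calls serially and from several threads, then compare the results.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def matmul(a, b):
    """Hypothetical stand-in for a single-threaded BLAS GEMM call (not OpenBLAS)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def make_inputs(count, size, seed=0):
    """Build `count` pairs of random square matrices, reproducibly."""
    rng = random.Random(seed)

    def mat():
        return [[rng.random() for _ in range(size)] for _ in range(size)]

    return [(mat(), mat()) for _ in range(count)]


def run_serial(inputs):
    """Reference results: one call after another, no threads involved."""
    return [matmul(a, b) for a, b in inputs]


def run_threaded(inputs, workers=4):
    """Parallelize at the top level; each worker calls the single-threaded kernel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda pair: matmul(*pair), inputs))


def results_match(inputs, workers=4):
    """True if the threaded run reproduces the serial reference exactly."""
    return run_serial(inputs) == run_threaded(inputs, workers)


if __name__ == "__main__":
    print(results_match(make_inputs(count=8, size=6)))
```

With the stand-in kernel the two runs agree by construction. Swapping matmul for calls into a real BLAS binding would turn this into a crude check: a mismatch or crash in the threaded run would point to the kind of thread-safety problem reported here.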
So it's clear that things are failing out there, and USE_LOCKING=1 is a sensible default that we should apply. I couldn't find, though, what the performance penalty of setting this flag is.
I've toggled USE_LOCKING=1 in openblas-0.3.9-3.
But if that's noticeable, then this is another argument in favour of providing a proper mechanism for the user to switch the implementation, as e.g. Debian does.
Debian's mechanism for switching implementations is improper, for reasons already discussed on this list.