Hi,
I wanted to bring some attention to this in devel, not only to openblas' maintainer (in CC), because there have been some discussions around BLAS/LAPACK in the past here.
As Dave Love pointed out in a previous discussion, parallelization is generally done at the top level, and each thread then simply calls a single-threaded BLAS/LAPACK implementation. But as it turns out, openblas is not thread-safe in such a scenario since v0.3.7 at least. To ensure thread-safety, we need to build single-threaded openblas with USE_LOCKING=1 [1] (which we don't do now, and we should, especially if we intend to make this implementation a system-wide default [2]).
Some bug reports motivated the inclusion of this new flag in v0.3.7 (see [3, 4]). And I stumbled upon this due to a question in r-sig-fedora [5] about a proper way to switch BLAS/LAPACK version in Fedora motivated by this issue in an R package.
So it's clear that things are failing out there, and USE_LOCKING=1 is a sensible default that we should apply. I couldn't find, though, what the performance penalty of setting this flag is. But if it's noticeable, then this is another argument in favour of providing a proper mechanism for the user to switch the implementation, as e.g. Debian does.
[1] https://github.com/xianyi/OpenBLAS/wiki/Faq/4bded95e8dc8aadc70ce65267d1093ca...
[2] https://fedoraproject.org/wiki/Changes/OpenBLAS_as_default_BLAS
[3] https://github.com/xianyi/OpenBLAS/issues/2126
[4] https://github.com/xianyi/OpenBLAS/issues/2155
[5] https://stat.ethz.ch/pipermail/r-sig-fedora/2020-May/000616.html
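To make the scenario above concrete, here is a minimal sketch in Python of top-level parallelism where every worker thread calls a serial routine that relies on shared internal state. This is not OpenBLAS code: the names `dot`, `parallel_dots`, `_scratch` and `_scratch_lock` are hypothetical, and the lock only illustrates, in spirit, the kind of protection a locking-enabled build is meant to provide.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a serial numerical library that keeps one
# internal scratch buffer shared by all callers (NOT actual OpenBLAS code).
_scratch = [0.0] * 1024
_scratch_lock = threading.Lock()


def _dot_unlocked(x, y, n):
    # Stage the partial products in the shared buffer, then reduce them.
    for i in range(n):
        _scratch[i] = x[i] * y[i]
    return sum(_scratch[:n])


def dot(x, y, use_locking=True):
    """Dot product that stages partial products in the shared scratch buffer."""
    n = len(x)
    if n != len(y) or n > len(_scratch):
        raise ValueError("vectors must have equal length of at most 1024")
    if use_locking:
        # Assumption: this mimics the idea of a locking-enabled build, where
        # concurrent callers take turns using the shared internal state.
        with _scratch_lock:
            return _dot_unlocked(x, y, n)
    # Without the lock, concurrent callers may overwrite each other's data.
    return _dot_unlocked(x, y, n)


def parallel_dots(pairs, workers=4):
    """Top-level parallelism: each worker thread calls the serial routine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: dot(p[0], p[1]), pairs))


pairs = [([float(k)] * 100, [1.0] * 100) for k in range(8)]
results = parallel_dots(pairs)
```

Calling `dot(..., use_locking=False)` from several threads at once is the unsafe variant: whether it actually produces wrong results depends on thread scheduling, which is part of what makes this class of bug hard to reproduce.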
On Wed, 27 May 2020 12:29:36 +0200 Iñaki Ucar <iucar@fedoraproject.org> wrote:
> Hi,
> I wanted to bring some attention to this in devel, not only to openblas' maintainer (in CC), because there have been some discussions around BLAS/LAPACK in the past here.
> As Dave Love pointed out in a previous discussion, parallelization is generally done at the top level, and each thread then simply calls a single-threaded BLAS/LAPACK implementation. But as it turns out, openblas is not thread-safe in such a scenario since v0.3.7 at least. To ensure thread-safety, we need to build single-threaded openblas with USE_LOCKING=1 [1] (which we don't do now, and we should, especially if we intend to make this implementation a system-wide default [2]).
The correct approach is to use the OpenMP flavor of OpenBLAS to avoid these issues. If you use the OpenMP library in a sequential program, the BLAS runs in parallel, and if you use the OpenMP library in an OpenMP parallel program, the BLAS runs either sequentially (within already-parallel regions) or in parallel (within sequential regions).
> So it's clear that things are failing out there, and USE_LOCKING=1 is a sensible default that we should apply. I couldn't find, though, what the performance penalty of setting this flag is.
I've toggled USE_LOCKING=1 in openblas-0.3.9-3.
> But if that's noticeable, then this is another argument in favour of providing a proper mechanism for the user to switch the implementation, as e.g. Debian does.
Debian's mechanism for switching the implementations is improper, due to reasons already discussed on this list.
On Thu, 28 May 2020 at 10:04, Susi Lehtola <jussilehtola@fedoraproject.org> wrote:
> The correct approach is to use the OpenMP flavor of OpenBLAS to avoid these issues. If you use the OpenMP library in a sequential program, the BLAS runs in parallel, and if you use the OpenMP library in an OpenMP parallel program, the BLAS runs either sequentially (within already-parallel regions) or in parallel (within sequential regions).
The problem arose in a threaded algorithm. Reference BLAS, as well as serial MKL and ATLAS, are thread-safe. I did recommend openblas-openmp to the user who reported the issue with an R package, but not everybody knows or understands why there are so many versions.
>> So it's clear that things are failing out there, and USE_LOCKING=1 is a sensible default that we should apply. I couldn't find, though, what the performance penalty of setting this flag is.
> I've toggled USE_LOCKING=1 in openblas-0.3.9-3.
Great, thanks!
>> But if that's noticeable, then this is another argument in favour of providing a proper mechanism for the user to switch the implementation, as e.g. Debian does.
> Debian's mechanism for switching the implementations is improper, due to reasons already discussed on this list.
I'm not proposing Debian's mechanism. I'm just saying that Debian has *a* mechanism, and for that reason some people prefer Debian. But there may be a better way: see [1].
[1] https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...