Ivan K
806073ccbc
utest: test fork safety on OpenMP >= 5
In addition to testing fork safety on non-OpenMP builds, test it when
omp_pause_resource_all() is available to release the locks.
3 years ago
Ivan K
f677f4f29b
blas_thread_shutdown: release OpenMP resources too
OpenMP 5.0 introduced the function omp_pause_resource_all that instructs
the runtime to "relinquish resources used by OpenMP on all devices". In
practice, these resources include the locks that would otherwise trip up
the runtime after a fork(). Releasing these resources in a function
called by pthread_atfork() makes it possible for the child process to
continue functioning after the runtime automatically re-acquires its
resources.
Thread safety: blas_thread_shutdown doesn't check whether there are
other BLAS operations running in parallel, so this isn't any less safe
than before with respect to OpenBLAS function calls. On the other hand,
if there are other OpenMP operations in progress, asking the runtime to
pause may result in unspecified behaviour. A hard pause is allowed to
deallocate threadprivate variables too.
3 years ago
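The fork-safety mechanism described in f677f4f29b can be sketched roughly as follows. This is a minimal illustration, not the OpenBLAS code: the names release_runtime_before_fork and install_fork_handler are hypothetical, the choice of a hard pause is an assumption, and the OpenMP call is compiled only when the compiler reports OpenMP 5.0 or later (_OPENMP >= 201811).

```c
#include <pthread.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical "prepare" handler: runs in the parent just before fork(). */
static void release_runtime_before_fork(void)
{
#if defined(_OPENMP) && _OPENMP >= 201811
    /* Ask the OpenMP runtime to relinquish its resources (including locks).
       Assumption: a hard pause; the runtime re-acquires resources on next use.
       The return value is ignored in this sketch. */
    (void)omp_pause_resource_all(omp_pause_hard);
#endif
    /* Without OpenMP >= 5.0 this sketch does nothing here. */
}

/* Hypothetical installer: registers the handler once, returns 0 on success. */
int install_fork_handler(void)
{
    return pthread_atfork(release_runtime_before_fork, NULL, NULL);
}
```

As the commit message notes, pausing while other OpenMP work is in progress may result in unspecified behaviour, so a handler like this is only appropriate when the caller forks from a quiescent state.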
Ivan K
c5f0dcf72b
c_check: test for omp_pause_resource_all()
Assume that the function is available if the ctest3.c program compiles
and links successfully.
1 year ago
Martin Kroeker
eb84aac7ad
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
1 year ago
Martin Kroeker
abbd78aa59
Merge pull request #5138 from martin-frbg/issue5131
Ensure that gmake builds with flang-new link the flang runtime into the shared library
1 year ago
Martin Kroeker
ebcab90976
Handle flang-new runtime library linking on Linux like classic-flang
1 year ago
Martin Kroeker
ed1584666c
Merge pull request #5137 from martin-frbg/issue5136
Fix the CMake build to define USE_TRMM for RISCV64 targets as well
1 year ago
Martin Kroeker
b9ae246f20
define USE_TRMM for RISCV64 targets as well
1 year ago
Martin Kroeker
86cf9d8a2e
Merge pull request #5133 from OpenMathLib/revert-4920-issue4917
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
1 year ago
Martin Kroeker
0b3c56968d
Merge pull request #5135 from martin-frbg/ghwf-n2
CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow
1 year ago
Martin Kroeker
c1bb90a823
remove the express NeoverseN2 target from the Cobalt100 job
1 year ago
Martin Kroeker
77c638db67
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
1 year ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
1 year ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
1 year ago
Martin Kroeker
a64b75a2e0
Merge pull request #5127 from Harishmcw/gesv-threshold
Refined GESV Parallelization Logic for Windows on ARM64
1 year ago
Martin Kroeker
453efbd103
Merge pull request #5128 from martin-frbg/issue5120
Add -O2 to flang flags when building on WoA in Release mode
1 year ago
Martin Kroeker
877d5a5be6
Add -O2 to flang flags when building on WoA in Release mode
1 year ago
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
1 year ago
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
1 year ago
Martin Kroeker
e8b11a126b
Merge pull request #5125 from martin-frbg/issue5122
Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code
1 year ago
Martin Kroeker
9a3948df82
Merge pull request #5126 from martin-frbg/cirrusbsd4
CirrusCI: Update FreeBSD jobs to 14.2
1 year ago
Martin Kroeker
7f1f776f58
Update FreeBSD jobs to 14.2
1 year ago
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
1 year ago
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
1 year ago
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed in LAPACK tests).
The only solution is to follow the approach of the C code: find the maximum
value in the array and divide each element by that maximum to avoid this issue.
1 year ago
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
1 year ago
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S and crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
1 year ago
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
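The tie-handling issue mentioned in 7f1ebc7ae6 can be illustrated with a scalar C sketch. It assumes the reference BLAS convention of returning the 1-based index of the first element with the largest absolute value; isamax_first is a hypothetical name and unit stride is assumed.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: 1-based index of the first element with the largest
   absolute value, or 0 when n is 0. */
size_t isamax_first(size_t n, const float *x)
{
    if (n == 0)
        return 0;

    size_t best = 0;
    float maxabs = fabsf(x[0]);
    for (size_t i = 1; i < n; i++) {
        float a = fabsf(x[i]);
        /* Strict comparison: an equal value later in the array never
           replaces the earlier index. */
        if (a > maxabs) {
            maxabs = a;
            best = i;
        }
    }
    return best + 1;
}
```

A vectorized kernel has to reproduce the same tie-breaking when it reduces several lanes to one result, which is where an index retrieval error of this kind can creep in.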
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Hao Chen
5d6356bc16
LoongArch64: Fixed amax_lsx.S
Fixed register zeroing operation
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Martin Kroeker
f42ce7067f
Merge pull request #5116 from martin-frbg/issue5110
Handle INCX=0 in ?NRM2
1 year ago
Martin Kroeker
7478c10268
Merge branch 'OpenMathLib:develop' into issue5110
1 year ago
Martin Kroeker
c54f5417cc
Merge pull request #5118 from martin-frbg/zrot_utestext
Disable extended utests for CSROT/ZDROT that invoke undefined behavior
1 year ago
Martin Kroeker
57208b8bce
Disable tests with incx,incy=0 (undefined behavior)
1 year ago
Martin Kroeker
3a4a9b21eb
Disable tests with incx,incy=0 (undefined behavior)
1 year ago
Martin Kroeker
60d0be0e97
Update nrm2.c
1 year ago
Martin Kroeker
0fd5448b2c
Handle INCX=0
1 year ago
Martin Kroeker
1b85b6a396
Merge pull request #5108 from taoye9/sbgemm_neoversev1
Add SBGEMM for arm neoversev1
1 year ago
Martin Kroeker
cae480683a
Merge pull request #5113 from martin-frbg/issue5112
Ensure that GEMMTR name appears in XERBLA if GEMMT was called as such
1 year ago
Martin Kroeker
db7e5f1fa7
Update gemmt.c
1 year ago
Martin Kroeker
ff30ac9666
Update Makefile
1 year ago
Martin Kroeker
7c3e169b67
Update gemmt.c
1 year ago
Martin Kroeker
09414a4187
Ensure that GEMMTR name appears in XERBLA if gemmt was called as such
1 year ago
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
1 year ago
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
1 year ago
Martin Kroeker
c139b63342
Merge pull request #5107 from jhgit/develop
fix signedness of pointer to integer type passed to blas_lock()
1 year ago
John Hein
6cd9bbe531
fix signedness of pointer to integer type passed to blas_lock()
1 year ago
Martin Kroeker
5de5072940
Improve flang-new identification and add CI job for it on OSX-x86_64 (#5103)
* AzureCI: Add LLVM/flang-new build on OSX-x86_64
* distinguish classic flang from flang-new in name based recognition
1 year ago
Martin Kroeker
1f74fb9a07
Merge pull request #5101 from martin-frbg/issue5100
Fix CMake build for PPCG4 breaking due to unparsable KERNEL file
1 year ago