Mark Seminatore
4ebf814b42
fix bug failing to mark task as finished.
2 years ago
Mark Seminatore
5f51811728
try at new threading model
2 years ago
Martin Kroeker
a8cb611157
Merge pull request #4358 from martin-frbg/lapack954
Fix keyword used to count successful tests (Reference-LAPACK PR 954)
2 years ago
Martin Kroeker
589f2b6466
Fix search phrase used to count successful tests (Reference-LAPACK PR 954)
2 years ago
Martin Kroeker
6aa5f53e26
Merge pull request #4357 from martin-frbg/lapack953
Fix memory leak in LAPACK testing framework (Reference-LAPACK PR 953)
2 years ago
Martin Kroeker
effb7af2a2
Fix memory leak (Reference-LAPACK PR 953)
2 years ago
Martin Kroeker
5915a69734
Merge pull request #4356 from martin-frbg/lapack736-2
Add LAPACK tests for the Dynamic Mode Decomposition functions from Reference-LAPACK PR 736
2 years ago
Martin Kroeker
226a14c549
Restore library path adjustments
2 years ago
Martin Kroeker
c5fa318add
Add tests for DMD (Reference-LAPACK PR 736)
2 years ago
Martin Kroeker
fa03e5497a
Add tests for the DMD functions (Reference-LAPACK PR 736)
2 years ago
Martin Kroeker
a53a79e059
Add tests for the DMD functions (Reference-LAPACK PR 736)
2 years ago
Martin Kroeker
e3039fa7f6
Merge pull request #4351 from catap/cmake-old-macos
Use 64bit build on `CMAKE_SYSTEM_PROCESSOR=i386` on Darwin
2 years ago
Octavian Maghiar
4a12cf53ec
[RISC-V] Improve RVV kernel generator LMUL usage
The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers.
Since LMUL groups vector registers together into larger ones, it should actually be used as a multiplier in the calculation of vlenmax.
At the moment, no matter what LMUL is provided, the generated kernels only set the maximum number of vector elements to VLEN/SEW.
This commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.
2 years ago
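The LMUL relationship described in the commit above can be sketched as follows. This is a minimal illustration, not the generator script itself: the function names `vlenmax` and `register_groups` are hypothetical, and only integer LMUL values are handled.

```python
# Hypothetical helpers illustrating the arithmetic from the commit message;
# they are not taken from the OpenBLAS kernel generator.

RVV_VECTOR_REGISTERS = 32  # the RVV architecture defines 32 vector registers


def vlenmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Maximum number of elements per register group: (VLEN / SEW) * LMUL."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only handles integer LMUL of 1, 2, 4 or 8")
    return (vlen_bits // sew_bits) * lmul


def register_groups(lmul: int) -> int:
    """Number of usable register groups once registers are grouped by LMUL."""
    return RVV_VECTOR_REGISTERS // lmul


if __name__ == "__main__":
    for lmul in (1, 2, 4, 8):
        print(lmul, vlenmax(128, 32, lmul), register_groups(lmul))
```

With VLEN=128 and SEW=32, doubling LMUL doubles the element count per group while halving the number of groups available as accumulators, which is the trade-off the commit message notes.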
Octavian Maghiar
e4586e81b8
[RISC-V] Add RISC-V Vector 128-bit target
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
2 years ago
Erik Bråthen Solem
2381132ada
Darwin < 20: always write xerbla.c.o into archive
Write xerbla.c.o into archive regardless of timestamp by using ar -rs
instead of ar -ru.
2 years ago
Erik Bråthen Solem
89fa51d495
Revert 42b5e08 ("Allow weak linking on old macOS")
2 years ago
Kirill A. Korinsky
08fde5ebd2
Use 64bit build on `CMAKE_SYSTEM_PROCESSOR=i386` on Darwin
This case is a bit tricky.
The value of `CMAKE_SYSTEM_PROCESSOR` comes from the output of `uname -m`, which
might report a 32-bit system even when a 64-bit application is being built.
So, in that case, use `CMAKE_SIZEOF_VOID_P` to detect the target.
See https://trac.macports.org/ticket/68488
2 years ago
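The idea in the commit above (trust the pointer size over the reported machine name) can be illustrated outside CMake. The sketch below is an analogy in Python, not the actual CMake logic: `choose_arch` is a hypothetical helper, and the mapping of `i386` to `x86_64` is an assumption based on the commit title.

```python
import platform
import struct


def choose_arch(system_processor: str, sizeof_void_p: int) -> str:
    """Pick the build architecture, preferring pointer size over the name.

    Hypothetical helper: if the reported processor is i386 but pointers are
    8 bytes wide, treat the target as 64-bit.
    """
    if system_processor == "i386" and sizeof_void_p == 8:
        return "x86_64"
    return system_processor


if __name__ == "__main__":
    # platform.machine() plays the role of `uname -m`;
    # struct.calcsize("P") is the pointer size of the running interpreter.
    print(choose_arch(platform.machine(), struct.calcsize("P")))
```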
Martin Kroeker
39bf8ece20
Merge pull request #4340 from yinshiyou/la-dev
Add some refines and optimizations for LoongArch.
2 years ago
Martin Kroeker
42b5e081d8
Merge pull request #4348 from catap/macos-undefinded-dynamic-lookup
Allow weak linking on old macOS
2 years ago
Kirill A. Korinsky
a1562e4bae
Allow weak linking on old macOS
2 years ago
Martin Kroeker
c4a622db9e
Merge pull request #4346 from martin-frbg/issue4343
Fix CMAKE installation location of lapacke_mangling header
2 years ago
Shiyou Yin
9fe07d82fd
loongarch: Add LSX optimization for dot.
2 years ago
Shiyou Yin
13b8c44b44
loongarch: Add optimization for dsdot kernel.
2 years ago
Shiyou Yin
3def6a8143
loongarch: Add LASX optimization for dot.
2 years ago
Shiyou Yin
1310a0931b
loongarch: Refine build control for loongarch64.
1. Use getauxval instead of cpucfg to test hardware capability.
2. Remove unnecessary code and option for compiler check in c_check.
2 years ago
Martin Kroeker
ff92e6e707
Fix installation location of lapacke_mangling header
2 years ago
Martin Kroeker
b7a28f5e42
Merge pull request #4344 from catap/macos-always-use-ar
Enable overstep of too long args without DYNAMIC_ARCH
2 years ago
Kirill A. Korinsky
9beee55167
Enable overstep of too long args without DYNAMIC_ARCH
2 years ago
Kirill A. Korinsky
01c7010543
cmake/openblas.pc.in: fixed version and URL
2 years ago
Martin Kroeker
fc66ecd25a
Merge pull request #4339 from martin-frbg/lapack-3-12-0
Update version number and documentation of Reference-LAPACK to 3.12.0
2 years ago
Martin Kroeker
08be9004f8
Update version number and copyright date to Reference-LAPACK 3.12.0
2 years ago
Martin Kroeker
578f0f9590
Update version number to 3.12.0
2 years ago
Martin Kroeker
3d9e20f614
Update version to 3.12.0
2 years ago
Martin Kroeker
f7351e493c
Update Reference-LAPACK docs to 3.12.0
2 years ago
Martin Kroeker
be8661ba40
Merge pull request #4338 from martin-frbg/lapack941
Docu fix for Truncated QR With Pivoting (Reference-LAPACK PR 941)
2 years ago
Martin Kroeker
ca5a87ff1d
Small documentation fix for Truncated QR With Pivoting (Reference-LAPACK PR 941)
2 years ago
Shiyou Yin
f745f02f35
benchmark: Fix missing colons in outputs of ./strsv.goto
2 years ago
Martin Kroeker
97d3c9b827
Merge pull request #4336 from martin-frbg/fix4322
Revert unintentional change to gmake linking rule in LAPACK TESTING/LIN
2 years ago
Martin Kroeker
c883abf838
Revert unintentional change to linking rule from PR 4322
2 years ago
Martin Kroeker
8138999cd0
Merge pull request #4333 from codeworm96/update_dynamic_core_readme
Update the list of default dynamic targets for x86_64 in the README to be consistent with the Makefile
2 years ago
Martin Kroeker
a938e48fa2
Merge pull request #4334 from RajalakshmiSR/Makefile_power
POWER: Fixing Makefile error
2 years ago
Rajalakshmi Srinivasaraghavan
47da601a2d
POWER: Fixing Makefile error
Recent commit d99aad8ee3 added an
extra `)`. This patch fixes the resulting warning from the Makefile.
2 years ago
Yuning Zhang
54be8f4d67
Update the list of default dynamic targets for x86_64 in the README to be consistent with the Makefile
Signed-off-by: Yuning Zhang <codeworm96@outlook.com>
2 years ago
Martin Kroeker
d526c4306f
Merge pull request #4329 from isuruf/sbgemm
Fix building test_sbgemm
2 years ago
Martin Kroeker
2ea65bacd0
Merge pull request #4330 from bartoldeman/asum-init-mask
Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
2 years ago
Bart Oldeman
c34e2cf380
Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
for skylake kernels. This is the same method as used in [sd]asum.
_mm_set1_epi64x was commented out for zasum, but has the advantage
of avoiding possible undefined behaviour (using an uninitialized
variable), optimized out by NVHPC and icx. The new code works
fine with those compilers.
For GCC 12.3 the generated code is identical; no matter which method
you use, the compiler optimizes the code into a compile-time
constant, so there is no performance benefit to using _mm_cmpeq_epi8,
since the corresponding instruction (VPCMPEQB) isn't actually
generated!
2 years ago
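For readers unfamiliar with the mask trick in the commit above, here is a scalar sketch of what such a mask does. It assumes the mask in question is the usual absolute-value mask `0x7fffffff`, which clears the IEEE-754 sign bit of a 32-bit float; the commit message does not state the value. The functions `abs_via_mask` and `casum` are hypothetical and do not reproduce the SIMD kernel.

```python
import struct

# Assumed mask value: every bit set except the float32 sign bit.
ABS_MASK_32 = 0x7FFFFFFF


def abs_via_mask(x: float) -> float:
    """Absolute value of x (rounded to float32) by clearing the sign bit."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & ABS_MASK_32))[0]


def casum(values) -> float:
    """BLAS-style asum for complex input: sum of |re| + |im|."""
    return sum(abs_via_mask(z.real) + abs_via_mask(z.imag) for z in values)


if __name__ == "__main__":
    print(casum([complex(-1, 2), complex(3, -4)]))
```

Building the mask from an explicit constant, rather than by comparing an uninitialized variable with itself, is what removes the undefined behaviour the commit describes.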
Martin Kroeker
864c65b526
Merge pull request #4328 from martin-frbg/4239-3
Copy XCode15-specific workaround for Apple M to Fortran flags to fix build of tests
2 years ago
Isuru Fernando
6b2651ece3
Fix building test_sbgemm
2 years ago
Martin Kroeker
22aa401656
Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC (#4327)
* Temporarily disable the C/ZASUM microkernels for any version of NVHPC
2 years ago
Martin Kroeker
47b03fd4b4
Copy XCode15-specific workaround to Fortran flags to fix build of tests
2 years ago