manjam01
5c4e38ab17
Optimize gemv_n_sve kernel
11 months ago
Martin Kroeker
ef9e3f7159
Merge pull request #5149 from martin-frbg/fixup5077-5088
Make the Neoverse GEMM/GEMV throttling code conditional on SMP
11 months ago
Martin Kroeker
09ba099461
make throttling code conditional on SMP
11 months ago
Martin Kroeker
1533fe49be
Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
11 months ago
Martin Kroeker
c03a81b927
Merge pull request #5141 from michalowski-arm/fork-throttle
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
11 months ago
Martin Kroeker
643966d9c7
Merge pull request #5146 from martin-frbg/issue5123
Fix "dummy2" flag reading in PPC970 S/DSCAL
11 months ago
Martin Kroeker
77fba0f400
Fix "dummy2" flag handling
11 months ago
Ye Tao
f0bea79a6e
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
11 months ago
Martin Kroeker
20d1118865
Merge pull request #5143 from martin-frbg/issue5111
Fix GEMMT transforming the input array B in some complex cases
11 months ago
Martin Kroeker
75b958a018
Transform the B array back if necessary before returning
11 months ago
Marek Michalowski
650a062e19
Add thread throttling profile for SGEMV on `NEOVERSEV2`
11 months ago
Marek Michalowski
b723c1b7b7
Add thread throttling profile for SGEMM on `NEOVERSEV2`
11 months ago
Martin Kroeker
ceb8f1e34b
Merge pull request #5140 from martin-frbg/issue5139
Add ARM64 options for NVIDIA HPC
11 months ago
Martin Kroeker
f1fa370579
fix missing endif
11 months ago
Martin Kroeker
6d1444be3a
Add ARM64 options for NVIDIA HPC
11 months ago
Martin Kroeker
eb84aac7ad
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
11 months ago
Martin Kroeker
abbd78aa59
Merge pull request #5138 from martin-frbg/issue5131
Ensure that gmake builds with flang-new link the flang runtime into the shared library
11 months ago
Martin Kroeker
ebcab90976
Handle flang-new runtime library linking on Linux like classic-flang
11 months ago
Martin Kroeker
ed1584666c
Merge pull request #5137 from martin-frbg/issue5136
Fix the CMake build to define USE_TRMM for RISCV64 targets as well
11 months ago
Martin Kroeker
b9ae246f20
define USE_TRMM for RISCV64 targets as well
11 months ago
Martin Kroeker
86cf9d8a2e
Merge pull request #5133 from OpenMathLib/revert-4920-issue4917
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
11 months ago
Martin Kroeker
0b3c56968d
Merge pull request #5135 from martin-frbg/ghwf-n2
CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow
11 months ago
Martin Kroeker
c1bb90a823
remove the express NeoverseN2 target from the Cobalt100 job
11 months ago
Martin Kroeker
77c638db67
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
11 months ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
11 months ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
1 year ago
Martin Kroeker
a64b75a2e0
Merge pull request #5127 from Harishmcw/gesv-threshold
Refined GESV Parallelization Logic for Windows on ARM64
11 months ago
Martin Kroeker
453efbd103
Merge pull request #5128 from martin-frbg/issue5120
Add -O2 to flang flags when building on WoA in Release mode
11 months ago
Martin Kroeker
877d5a5be6
Add -O2 to flang flags when building on WoA in Release mode
11 months ago
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
11 months ago
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
11 months ago
Martin Kroeker
e8b11a126b
Merge pull request #5125 from martin-frbg/issue5122
Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code
11 months ago
Martin Kroeker
9a3948df82
Merge pull request #5126 from martin-frbg/cirrusbsd4
CirrusCI: Update FreeBSD jobs to 14.2
11 months ago
Martin Kroeker
7f1f776f58
Update FreeBSD jobs to 14.2
11 months ago
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
11 months ago
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
11 months ago
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed in LAPACK tests).
The only solution is to follow the C implementation's approach: find the maximum
value in the array and divide each element by that maximum to avoid this issue.
11 months ago
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
11 months ago
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S and crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
11 months ago
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
11 months ago
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
11 months ago
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Hao Chen
5d6356bc16
LoongArch64: Fixed amax_lsx.S
Fixed register zeroing operation
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Martin Kroeker
f42ce7067f
Merge pull request #5116 from martin-frbg/issue5110
Handle INCX=0 in ?NRM2
11 months ago
Martin Kroeker
7478c10268
Merge branch 'OpenMathLib:develop' into issue5110
11 months ago
Martin Kroeker
c54f5417cc
Merge pull request #5118 from martin-frbg/zrot_utestext
Disable extended utests for CSROT/ZDROT that invoke undefined behavior
11 months ago
Martin Kroeker
57208b8bce
Disable tests with incx,incy=0 (undefined behavior)
11 months ago
Martin Kroeker
3a4a9b21eb
Disable tests with incx,incy=0 (undefined behavior)
11 months ago
Martin Kroeker
60d0be0e97
Update nrm2.c
11 months ago
Martin Kroeker
0fd5448b2c
Handle INCX=0
11 months ago