Martin Kroeker
c1bb90a823
remove the express NeoverseN2 target from the Cobalt100 job
11 months ago
Martin Kroeker
a64b75a2e0
Merge pull request #5127 from Harishmcw/gesv-threshold
Refined GESV Parallelization Logic for Windows on ARM64
1 year ago
Martin Kroeker
453efbd103
Merge pull request #5128 from martin-frbg/issue5120
Add -O2 to flang flags when building on WoA in Release mode
1 year ago
Martin Kroeker
877d5a5be6
Add -O2 to flang flags when building on WoA in Release mode
1 year ago
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
1 year ago
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
1 year ago
Martin Kroeker
e8b11a126b
Merge pull request #5125 from martin-frbg/issue5122
Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code
1 year ago
Martin Kroeker
9a3948df82
Merge pull request #5126 from martin-frbg/cirrusbsd4
CirrusCI: Update FreeBSD jobs to 14.2
1 year ago
Martin Kroeker
7f1f776f58
Update FreeBSD jobs to 14.2
1 year ago
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
1 year ago
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
1 year ago
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
For single-precision real and single-precision complex data,
converting to double precision does not prevent overflow (as exposed in LAPACK tests).
The only solution is to follow the approach of the C implementation: find the maximum
value in the array and divide each element by that maximum to avoid this issue.
1 year ago
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
1 year ago
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S and crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
1 year ago
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
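For context on the tie case mentioned above: the reference BLAS ISAMAX returns the index of the first element with the largest absolute value. A minimal sketch of that rule follows, assuming finite inputs and a positive stride; `first_isamax` is a hypothetical name, and this is not the LSX kernel.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenBLAS symbol): 1-based index of the FIRST
 * element with the largest absolute value, or 0 when n == 0.
 * Assumes a positive stride incx and finite inputs. */
size_t first_isamax(size_t n, const float *x, size_t incx)
{
    if (n == 0)
        return 0;

    size_t best = 0;
    float maxabs = fabsf(x[0]);

    for (size_t i = 1; i < n; i++) {
        float a = fabsf(x[i * incx]);
        /* Strict '>' keeps the earliest index when magnitudes tie. */
        if (a > maxabs) {
            maxabs = a;
            best = i;
        }
    }
    return best + 1; /* BLAS indices are 1-based */
}
```

A vectorized kernel has to reproduce the same tie-breaking when it reduces across lanes, which is where index retrieval can go wrong.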
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Hao Chen
5d6356bc16
LoongArch64: Fixed amax_lsx.S
Fixed register zeroing operation
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Martin Kroeker
f42ce7067f
Merge pull request #5116 from martin-frbg/issue5110
Handle INCX=0 in ?NRM2
1 year ago
Martin Kroeker
7478c10268
Merge branch 'OpenMathLib:develop' into issue5110
1 year ago
Martin Kroeker
c54f5417cc
Merge pull request #5118 from martin-frbg/zrot_utestext
Disable extended utests for CSROT/ZDROT that invoke undefined behavior
1 year ago
Martin Kroeker
57208b8bce
Disable tests with incx,incy=0 (undefined behavior)
1 year ago
Martin Kroeker
3a4a9b21eb
Disable tests with incx,incy=0 (undefined behavior)
1 year ago
Martin Kroeker
60d0be0e97
Update nrm2.c
1 year ago
Martin Kroeker
0fd5448b2c
Handle INCX=0
1 year ago
Martin Kroeker
1b85b6a396
Merge pull request #5108 from taoye9/sbgemm_neoversev1
Add SBGEMM for arm neoversev1
1 year ago
Martin Kroeker
cae480683a
Merge pull request #5113 from martin-frbg/issue5112
Ensure that GEMMTR name appears in XERBLA if GEMMT was called as such
1 year ago
Martin Kroeker
db7e5f1fa7
Update gemmt.c
1 year ago
Martin Kroeker
ff30ac9666
Update Makefile
1 year ago
Martin Kroeker
7c3e169b67
Update gemmt.c
1 year ago
Martin Kroeker
09414a4187
Ensure that GEMMTR name appears in XERBLA if gemmt was called as such
1 year ago
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
1 year ago
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
1 year ago
Martin Kroeker
c139b63342
Merge pull request #5107 from jhgit/develop
fix signedness of pointer to integer type passed to blas_lock()
1 year ago
John Hein
6cd9bbe531
fix signedness of pointer to integer type passed to blas_lock()
1 year ago
Martin Kroeker
5de5072940
Improve flang-new identification and add CI job for it on OSX-x86_64 (#5103)
* AzureCI: Add LLVM/flang-new build on OSX-x86_64
* distinguish classic flang from flang-new in name based recognition
1 year ago
Martin Kroeker
1f74fb9a07
Merge pull request #5101 from martin-frbg/issue5100
Fix CMake build for PPCG4 breaking due to unparsable KERNEL file
1 year ago
Martin Kroeker
d7036cfd74
Remove trailing blanks that break the cmake parser
1 year ago
Martin Kroeker
3375a0c990
Merge pull request #5099 from martin-frbg/issue5097-2
Simplify build instructions for Windows on Arm
1 year ago
Martin Kroeker
7a27e2b00d
Simplify build instructions for Windows on Arm
1 year ago
Martin Kroeker
fdeac17237
Merge pull request #5098 from martin-frbg/issue5095
Fix compilation with BUILD_BFLOAT16 enabled
1 year ago
Martin Kroeker
1829ac5b44
Add (dummy) declaration of SBROT_M
1 year ago
Martin Kroeker
53d20a83f3
Merge pull request #5089 from annop-w/gemv_t
Simplify gemv_t_sve_v1x3 kernel
1 year ago
Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
1 year ago
Martin Kroeker
9b11fd5802
Merge pull request #5088 from michalowski-arm/develop
Add thread throttling profile for SGEMV on `NEOVERSEV1`
1 year ago
Martin Kroeker
5930c162ef
Merge pull request #5097 from matthew-brett/fix-woa-cmd
Fix Windows on ARM build instructions
1 year ago
Marek Michalowski
838bb57e27
Merge branch 'develop' into develop
1 year ago
Matthew Brett
252c43265d
Fix Windows on ARM build instructions
The command as merged uses the compiler target as the compiler path.
I have run and tested a build with this command.
@Mugundanmcw - is this correct?
1 year ago
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
1 year ago
Martin Kroeker
a54f9a9c69
Merge pull request #5071 from annop-w/sgemm_throttling
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago