Martin Kroeker
b30dc9701f
Merge pull request #5215 from annop-w/gemv_t
Use SVE kernel for S/DGEMVT for SVE machines
1 year ago
Martin Kroeker
2893d0add4
Merge pull request #5211 from guoyuanplct/develop
Optimizing the Implementation of GEMV on the RISC-V V Extension
1 year ago
Annop Wongwathanarat
ec146157d3
Use SVE kernel for S/DGEMVT for SVE machines
1 year ago
Martin Kroeker
70865a894e
Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
1 year ago
lglglglgy
1ff303f36e
Optimizing the Implementation of GEMV on the RISC-V V Extension
Specialized some scenarios, performed loop unrolling, and reduced the
number of multiplications.
1 year ago
ColumbusAI
7bf848454d
Update zsum.c -- fixed spelling error to successfully compile
spelling error where zsum_kernel is used and it should be zasum_kernel. Will not compile without fix.
1 year ago
Egbert Eich
ea6515c4b3
On zarch don't produce objects from assembler with a writable stack section
On z-series, the current version of the GNU toolchain produces warnings
such as:
```
/usr/lib64/gcc/[...]/s390x-suse-linux/bin/ld: warning: ztrmm_kernel_RC_Z14.o: missing .note.GNU-stack section implies
executable stack
/usr/lib64/[...]/s390x-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
```
To prevent this message and make sure we are future proof, add
```
.section .note.GNU-stack,"",@progbits
```
Also add the `.size` bit to give the asm defined functions a proper size
in the symbol table.
Signed-off-by: Egbert Eich <eich@suse.com>
1 year ago
Ruiyang Wu
02fd1df10b
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
1 year ago
Ye Tao
f27ba5efd1
fix bugs in aarch64 sbgemv_n kernel
1 year ago
Annop Wongwathanarat
edef2e4441
Fix bug in ARM64 sbgemv_t
1 year ago
Martin Kroeker
b55ca71d5b
Merge pull request #5182 from annop-w/sgemm_ncopy
Optimize aarch64 sgemm_ncopy
1 year ago
Martin Kroeker
2f778554b8
Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16
replace customize bf16_to_fp32 with arm neon vcvtah_f32_bf16
1 year ago
Annop Wongwathanarat
9807f56580
Optimize aarch64 sgemm_ncopy
1 year ago
Martin Kroeker
a3e7b16072
Merge pull request #5157 from manaalmj/feature
Optimize gemv_n_sve kernel
1 year ago
Ye Tao
4c00099ed6
replace customize bf16_to_fp32 with arm neon vcvtah_f32_bf16
1 year ago
Annop Wongwathanarat
a085b6c9ec
Fix aarch64 sbgemv_t compilation error for GCC < 13
1 year ago
manjam01
5c4e38ab17
Optimize gemv_n_sve kernel
1 year ago
Martin Kroeker
1d5ed5c46b
Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
1 year ago
Ye Tao
6b8b35cdf2
fix minior issues of redeclaration of float x0,x1 in sbgemv_n_neon.c
1 year ago
Ye Tao
38ee7c9301
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
1 year ago
Martin Kroeker
2b941c44b5
Merge branch 'develop' into sbgemv_n_neon
1 year ago
Ye Tao
35bdbca153
Add sbgemv_n_neon kernel for arm64.
1 year ago
Annop Wongwathanarat
edaf51dd99
Add sbgemv_t_bfdot kernel for ARM64
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M=N=[2,512].
1 year ago
Martin Kroeker
77fba0f400
Fix "dummy2" flag handling
1 year ago
Martin Kroeker
eb84aac7ad
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
1 year ago
Martin Kroeker
b9ae246f20
define USE_TRMM for RISCV64 targets as well
1 year ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
1 year ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
1 year ago
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
1 year ago
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
1 year ago
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
1 year ago
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed in LAPACK tests).
The only solution is to follow C's approach: find the maximum value in the
array and divide each element by that maximum to avoid this issue
1 year ago
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
1 year ago
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S ane crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
1 year ago
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Hao Chen
5d6356bc16
LoongArch64: Fixed amax_lsx.S
Fixed register zeroing operation
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
1 year ago
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
1 year ago
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
1 year ago
Martin Kroeker
d7036cfd74
Remove trailing blanks that break the cmake parser
1 year ago
Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
1 year ago
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
1 year ago
Martin Kroeker
180ba5e7d0
Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
1 year ago
Deeksha Goplani
d1bfa979f7
small gemm kernel packing modifications
1 year ago
Martin Kroeker
1a6a9fb22f
add another generator line for rotm
1 year ago
Martin Kroeker
4924319c50
fix position of srotm, qrotm
1 year ago
Martin Kroeker
b58cba9eb6
fix qrotm build rules
1 year ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Annop Wongwathanarat
c0318cea6e
Simplify gemv_t_sve_v1x3 kernel
1 year ago