Martin Kroeker
a1efb03610
Update cscal.c
11 months ago
Martin Kroeker
80bf765839
Update cscal.c
11 months ago
Martin Kroeker
b1008985ae
Update cscal.c
11 months ago
Martin Kroeker
41cd46c2a9
Update cscal.c
11 months ago
Martin Kroeker
7b915870eb
Update cscal.c
11 months ago
Martin Kroeker
ef01810dde
Update cscal.c
11 months ago
Martin Kroeker
234bba3810
Update cscal.c
11 months ago
Martin Kroeker
62d8047c42
Update cscal.c
11 months ago
Martin Kroeker
2a17540469
Update cscal.c
11 months ago
Martin Kroeker
1c3fcfdbb3
Update cscal.c
11 months ago
Martin Kroeker
3c150610b7
Update cscal.c
11 months ago
Martin Kroeker
b23efc5846
add handling of dummy2 flag
11 months ago
Martin Kroeker
0b0bb9951d
Merge pull request #5265 from guoyuanplct/develop
kernel/riscv64:Added support for omatcopy on RISCV64_ZVL256B
1 year ago
guoyuanplct
be9f7550b5
Format Code
1 year ago
guoyuanplct
4d213653d8
kernel/riscv64:Added support for omatcopy on riscv64.
1 year ago
Martin Kroeker
8afddc1a81
Merge pull request #5262 from guoyuanplct/develop
kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:
1 year ago
guoyuanplct
9a7e3f102b
kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:
1 year ago
pengxu
a978ad3180
Loongarch64: add C functions of zgemm_ncopy_16
1 year ago
pengxu
0ccb050583
Loongarch64: fixed cgemm_ncopy_16_lasx
1 year ago
Martin Kroeker
5141a90993
Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222)
* Fix ARMV9SME target and add support_sme1 code for MacOS
* make sgemm_direct unconditionally available on all arm64
* build a (dummy) sgemm_direct kernel on all arm64
* Update dynamic_arm64.c
1 year ago
Martin Kroeker
151b74284e
Merge pull request #5203 from quic/fix-sgemmdirect-sme1
Add vector registers to clobber list to prevent compiler optimization.
1 year ago
Martin Kroeker
cba32d001a
Merge pull request #5245 from guoyuanplct/develop
Optimized RVV_ZVL256B Implementation of zgemv_n
1 year ago
pengxu
f19e72c402
Loongarch64: fixed swap_lasx
1 year ago
pengxu
b471fa337b
Loongarch64: fixed snrm2_lasx
1 year ago
pengxu
57bb46bedf
Loongarch64: fixed rot_lasx
1 year ago
pengxu
6dc4ca2391
Loongarch64: fixed icamax_lasx
1 year ago
pengxu
b528b1b8ea
Loongarch64: fixed iamax_lasx
1 year ago
pengxu
ba9569e382
Loongarch64: fixed dot_lasx
1 year ago
pengxu
dc5fa29851
Loongarch64: fixed cscal_lasx
1 year ago
pengxu
a98dd6d911
Loongarch64: fixed copy_lasx
1 year ago
pengxu
d49319c2d2
Loongarch64: fixed cnrm2_lasx
1 year ago
pengxu
74c97ef814
Loongarch64: fixed cdot_lasx
1 year ago
pengxu
be525521ad
Loongarch64: fixed asum_lasx
1 year ago
pengxu
0cd5ca5527
Loongarch64: fixed amax_lasx
1 year ago
guoyuanplct
11ffc8680e
Format the code
1 year ago
guoyuanplct
7616c42095
Optimized RVV_ZVL256B Implementation of zgemv_n
The implementation of zgemv_n using RVV_ZVL256B has been optimized.
Compared to the previous implementation, it has achieved a 1.5x
performance improvement.
1 year ago
abhishek-fujitsu
9c02cdb073
optimise dot using thread throttling for NEOVERSE V1
1 year ago
Martin Kroeker
d0e8fd6d40
Merge pull request #5239 from annop-w/gemv_n_sve
Use SVE kernel for S/DGEMVN for SVE machines
1 year ago
Iha, Taisei
08b5c18d70
fixed a potential out-of-bounds on gemv.
1 year ago
Annop Wongwathanarat
e11744a411
Use SVE kernel for S/DGEMVN for SVE machines
1 year ago
Martin Kroeker
db0abfa907
Merge pull request #5238 from martin-frbg/revert5125
remove non-vectorized SGEMV transpose reduce path for POWER8, restoring optimizations from PR4880
1 year ago
Martin Kroeker
7389b6c483
Merge pull request #5237 from martin-frbg/revert5219
Fix and reinstate the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV
1 year ago
Martin Kroeker
4ec62d7f73
remove non-vectorized code path for power8, restoring PR4880
1 year ago
Martin Kroeker
1df8738f27
Merge pull request #5235 from quickwritereader/issue_unaligned_ppc64le
Explicit unaligned vector load/stores in PPC64LE GEMV kernels
1 year ago
Martin Kroeker
99d9f1ff38
Fix conditional
1 year ago
Martin Kroeker
96d80801bc
Reinstate the CooperLake microkernel
1 year ago
Martin Kroeker
2e4309315c
Merge pull request #5219 from martin-frbg/sbgemvn_cooper
Temporarily disable the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV
1 year ago
Ubuntu
0cc2485594
Explicit unaligned vector load/stores in PPC64LE GEMV kernels
1 year ago
Martin Kroeker
dd38b4e811
Merge pull request #5225 from annop-w/gemv_n
Improve performance for SGEMVN on NEOVERSEN1
1 year ago
Martin Kroeker
0241d516f6
Merge pull request #5220 from iha-taisei/sdgemv_n_unroll
Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
1 year ago