Martin Kroeker | d74eb02954 | 1 year ago
Merge pull request #5057 from martin-frbg/issue5050
Replace while loop in generic C/ZGEMM_BETA to avoid going out of bounds
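The C/ZGEMM_BETA step scales the output matrix C by a complex beta before the product is accumulated into it. Below is a minimal sketch of that step written with bounded `for` loops. It assumes a column-major complex matrix stored as interleaved (real, imaginary) doubles, with `ldc` counted in complex elements. The name `zgemm_beta_sketch` and its parameter list are hypothetical: this is not the OpenBLAS kernel and does not use its signature. Explicit `i < m` and `j < n` bounds make it impossible to step past the last column. The beta = 0 branch stores zeros instead of multiplying, which follows the BLAS convention that C need not be initialized when beta is zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: C := beta * C for an m x n column-major complex
 * matrix stored as interleaved (re, im) doubles; ldc is in complex elements. */
void zgemm_beta_sketch(size_t m, size_t n, double beta_r, double beta_i,
                       double *c, size_t ldc)
{
    assert(ldc >= m);
    for (size_t j = 0; j < n; j++) {
        double *col = c + 2 * j * ldc;          /* start of column j */
        if (beta_r == 0.0 && beta_i == 0.0) {
            /* beta = 0: overwrite, so stale values (even NaN) are discarded */
            for (size_t i = 0; i < m; i++) {
                col[2 * i] = 0.0;
                col[2 * i + 1] = 0.0;
            }
        } else {
            for (size_t i = 0; i < m; i++) {
                double re = col[2 * i], im = col[2 * i + 1];
                /* (beta_r + i*beta_i) * (re + i*im) */
                col[2 * i]     = beta_r * re - beta_i * im;
                col[2 * i + 1] = beta_r * im + beta_i * re;
            }
        }
    }
}
```

Elements between row `m` and `ldc` in each column are never touched. Only the leading m x n block is scaled.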
Martin Kroeker | 30f7a4120b | 1 year ago
Merge pull request #5056 from tingboliao/dev_omatcopy_20250108
Optimize the omatcopy_cn/zomatcopy_cn kernels with RVV 1.0 intrinsic.
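This sketch assumes the usual reading of the `?omatcopy` kernel suffixes: `c`/`r` for column-/row-major and `n`/`t` for no transpose/transpose. Under that reading, `omatcopy_cn` computes B := alpha * A for column-major matrices. Below is a scalar reference sketch. The name `somatcopy_cn_ref` is hypothetical, it is not the RVV kernel, and the complex `zomatcopy_cn` variant (complex alpha) is not shown. Each column is contiguous in both A and B, so the inner loop is a unit-stride stream. That stream is the part a vector kernel would typically cover with vector loads and stores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for the "cn" case (column-major, no transpose):
 * B := alpha * A, where A is rows x cols with leading dimension lda and
 * B has the same shape with leading dimension ldb. */
void somatcopy_cn_ref(size_t rows, size_t cols, float alpha,
                      const float *a, size_t lda, float *b, size_t ldb)
{
    assert(lda >= rows && ldb >= rows);
    for (size_t j = 0; j < cols; j++) {
        const float *src = a + j * lda;   /* column j of A */
        float *dst = b + j * ldb;         /* column j of B */
        for (size_t i = 0; i < rows; i++) /* contiguous, unit-stride run */
            dst[i] = alpha * src[i];
    }
}
```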
gxw | 20a8e48f25 | 1 year ago
LoongArch64: Update ssymv LASX version
gxw | e0748588b8 | 1 year ago
LoongArch64: Update dsymv LASX version
Martin Kroeker | d91d4fa6e9 | 1 year ago
convert the beta=0 branch to a for loop as well
Martin Kroeker | 09e75f1588 | 1 year ago
fix absurd typo
Martin Kroeker | 2891fd8d6d | 1 year ago
Replace while loop with for
tingbo.liao | 0a5dbf13d3 | 1 year ago
Optimize the omatcopy_cn and zomatcopy_cn kernels with RVV 1.0 intrinsic.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
Sergey Fedorov | 229efa42ff | 1 year ago
scal.S: use r11 on 32-bit Darwin on powerpc
Sergey Fedorov | 81e1be8d90 | 1 year ago
Revert "temporarily disable the default S/DSCAL kernel"
This reverts commit 9b9c0aa5c9.
Martin Kroeker | 9b9c0aa5c9 | 1 year ago
temporarily disable the default S/DSCAL kernel
tingbo.liao | c37509c213 | 1 year ago
Optimize the nrm2_rvv function to further improve performance.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
tingbo.liao | 0bea1cfd9d | 1 year ago
Optimize the zgemm_tcopy_4_rvv function to support vector lengths (VLEN) of 128 and 256 bits.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
tingbo.liao | d00cc400b1 | 1 year ago
Replaced __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 so the code compiles with riscv64-unknown-linux-gnu-gcc.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
Martin Kroeker | 229d8a025e | 1 year ago
Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
SushilPratap04 | 3368a4e697 | 1 year ago
Update swap_kernel_sve.c
CDAC-SSDG | dd71e4234a | 1 year ago
Added updated swap and rot SVE kernels.
CDAC-SSDG | 06ffd411a5 | 1 year ago
Update KERNEL.ARMV8SVE
CDAC-SSDG | 765850194e | 1 year ago
Delete kernel/arm64/swap_kernel_sve.c
CDAC-SSDG | c17c19fbcf | 1 year ago
Delete kernel/arm64/swap_kernel_c.c
CDAC-SSDG | f6416c0e37 | 1 year ago
Delete kernel/arm64/swap.c
CDAC-SSDG | 3b7b74664c | 1 year ago
Delete kernel/arm64/scal_kernel_sve.c
CDAC-SSDG | 95a97012e8 | 1 year ago
Delete kernel/arm64/scal_kernel_c.c
CDAC-SSDG | 5540f2121e | 1 year ago
Delete kernel/arm64/scal.c
CDAC-SSDG | f62519cc87 | 1 year ago
Delete kernel/arm64/rot_kernel_sve.c
CDAC-SSDG | 10857c9df4 | 1 year ago
Delete kernel/arm64/rot_kernel_c.c
CDAC-SSDG | b9f51a5cf7 | 1 year ago
Delete kernel/arm64/rot.c
Martin Kroeker | 81666de4ef | 1 year ago
Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
Martin Kroeker | 3345007d8f | 1 year ago
retire the thunderx2 NRM2 kernels due to reported inaccuracies and NaN
Martin Kroeker | 5fe983db29 | 1 year ago
retire the thunderx2 nrm2 kernels for now due to NaN and inaccuracies
Iha, Taisei | 4918beecbe | 1 year ago
Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1
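In the transposed GEMV, each output y_j is the dot product of column j of A with x. One common way to unroll it processes several columns per pass over x, so each loaded x[i] is reused. Below is a minimal scalar sketch unrolled by four columns. It assumes a column-major A and computes y := y + alpha * A^T x, leaving beta scaling out of scope. The name `dgemv_t_unroll4` is hypothetical, and this does not reproduce the A64FX or Neoverse V1 kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: y := y + alpha * A^T * x for an m x n column-major A
 * (leading dimension lda), processing four columns per pass over x. */
void dgemv_t_unroll4(size_t m, size_t n, double alpha, const double *a,
                     size_t lda, const double *x, double *y)
{
    assert(lda >= m);
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        const double *a0 = a + j * lda, *a1 = a0 + lda,
                     *a2 = a1 + lda,   *a3 = a2 + lda;
        double t0 = 0.0, t1 = 0.0, t2 = 0.0, t3 = 0.0;
        for (size_t i = 0; i < m; i++) {
            double xi = x[i];            /* one load of x[i] feeds four columns */
            t0 += a0[i] * xi;
            t1 += a1[i] * xi;
            t2 += a2[i] * xi;
            t3 += a3[i] * xi;
        }
        y[j]     += alpha * t0;
        y[j + 1] += alpha * t1;
        y[j + 2] += alpha * t2;
        y[j + 3] += alpha * t3;
    }
    for (; j < n; j++) {                 /* leftover columns, one at a time */
        const double *aj = a + j * lda;
        double t = 0.0;
        for (size_t i = 0; i < m; i++)
            t += aj[i] * x[i];
        y[j] += alpha * t;
    }
}
```

Each column still accumulates in index order, so this sketch gives the same results as the plain one-column-at-a-time loop. It only changes how often x is loaded.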
Juliya32 | 3b2421cba0 | 1 year ago
Add files via upload
Juliya32 | 012fe4da36 | 1 year ago
Delete kernel/arm64/rot_kernel_sve.c
Juliya32 | d90ee00f85 | 1 year ago
Delete kernel/arm64/rot_kernel_c.c
Juliya32 | 668e28adc4 | 1 year ago
Delete kernel/arm64/rot.c
SushilPratap04 | fa880ab1cf | 1 year ago
Update KERNEL.ARMV8SVE
Updated KERNEL.ARMV8SVE for the Level-1 SVE (swap, rot and scal) kernels.
SushilPratap04 | 7822ae9617 | 1 year ago
Added SVE kernels for the rot routine.
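The BLAS rot routine applies a plane (Givens) rotation to two vectors: each pair (x_i, y_i) becomes (c*x_i + s*y_i, c*y_i - s*x_i). A vectorized kernel applies the same per-element arithmetic across many lanes at once. The sketch below is scalar. It assumes positive strides only, and the name `drot_ref` is hypothetical rather than the SVE kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the BLAS rot operation: applies the rotation
 * [ c  s; -s  c ] to each pair (x_i, y_i). Positive strides only. */
void drot_ref(size_t n, double *x, size_t incx, double *y, size_t incy,
              double c, double s)
{
    assert(incx > 0 && incy > 0);
    for (size_t k = 0; k < n; k++) {
        double xi = x[k * incx], yi = y[k * incy];  /* read both before writing */
        x[k * incx] = c * xi + s * yi;
        y[k * incy] = c * yi - s * xi;
    }
}
```

With c = 0 and s = 1, the call moves y into x and -x into y, which makes a quick sanity check.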
SushilPratap04 | b8bc2a752e | 1 year ago
Added SVE-optimized kernels for the swap routine
CDAC-SSDG | 0667cf6c92 | 1 year ago
Added optimized scal routine files
gxw | 73c6a28073 | 1 year ago
x86_64: opt somatcopy_ct with AVX
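In the `ct` case (column-major, transpose) the kernel computes B := alpha * A^T. A column of A therefore becomes a row of B, and the stores are strided by `ldb`. The sketch below walks the matrix in small square tiles, so the reads from A and the strided writes to B stay within a small working set. The tile size of 4 and the name `somatcopy_ct_ref` are illustrative assumptions, and this is not the AVX or LASX kernel.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4   /* illustrative tile size, not tuned */

/* Hypothetical sketch: B := alpha * A^T. A is rows x cols, column-major,
 * leading dimension lda; B is cols x rows, column-major, leading dimension ldb. */
void somatcopy_ct_ref(size_t rows, size_t cols, float alpha,
                      const float *a, size_t lda, float *b, size_t ldb)
{
    assert(lda >= rows && ldb >= cols);
    for (size_t jj = 0; jj < cols; jj += TILE)
        for (size_t ii = 0; ii < rows; ii += TILE)
            /* transpose one TILE x TILE block, clipped at the matrix edges */
            for (size_t j = jj; j < jj + TILE && j < cols; j++)
                for (size_t i = ii; i < ii + TILE && i < rows; i++)
                    b[j + i * ldb] = alpha * a[i + j * lda];
}
```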
Ayappan Perumal | 020cce1068 | 1 year ago
Fix build issues with gcc compiler as well
Ayappan Perumal | b6ec73e77c | 1 year ago
Fix AIX build
Martin Kroeker | 016bdb9b0b | 1 year ago
Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
Chip Kerchner | ab71a1edf2 | 1 year ago
Better VSX.
gxw | bb31bbef52 | 1 year ago
LoongArch64: Opt somatcopy_ct with LASX
gxw | b37129341b | 1 year ago
LoongArch64: Opt somatcopy_cn with LASX
gxw | acf6cab304 | 1 year ago
LoongArch64: Opt somatcopy_rn with LASX
gxw | 15edb441bf | 1 year ago
LoongArch64: Opt somatcopy_rt with LASX
Chip Kerchner | 36bd3eeddf | 1 year ago
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
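A bfloat16 value holds the upper 16 bits of an IEEE binary32 float, so widening it to `float` is a 16-bit shift of the bit pattern. Below is a minimal sketch of a non-transposed GEMV with BF16 inputs and float accumulation. The `bf16_t` typedef and the names `bf16_to_float` and `sbgemv_n_ref` are hypothetical, and this is not the VSX or MMA kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;   /* hypothetical: raw bfloat16 bit pattern */

/* bfloat16 is the upper half of a binary32 value, so widening is a shift. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret bits without aliasing issues */
    return f;
}

/* Hypothetical sketch: y := y + alpha * A * x, A m x n column-major (lda),
 * with BF16 inputs and float accumulation and output. */
void sbgemv_n_ref(size_t m, size_t n, float alpha, const bf16_t *a, size_t lda,
                  const bf16_t *x, float *y)
{
    assert(lda >= m);
    for (size_t j = 0; j < n; j++) {
        float xj = alpha * bf16_to_float(x[j]);   /* scale once per column */
        const bf16_t *col = a + j * lda;
        for (size_t i = 0; i < m; i++)
            y[i] += bf16_to_float(col[i]) * xj;
    }
}
```

Accumulating in float keeps the 8-bit BF16 mantissa from limiting the precision of the sums, at the cost of a widening step per element.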
Martin Kroeker | e52d9b4cf1 | 1 year ago
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.