Annop Wongwathanarat
c0318cea6e
Simplify gemv_t_sve_v1x3 kernel
1 year ago
Martin Kroeker
87083fdbf6
[WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076)
* Protect align directives in assembly files that are currently problematic with LLVM on WoA
* Use the armv8 zdot on WoA to work around other LLVM issues
1 year ago
tingbo.liao
ef7f54b357
Optimized gemm_tcopy_8_rvv to be compatible with vlens of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
gxw
e0a8216554
LoongArch64: Update dsymv LSX version
1 year ago
gxw
a9070ba3f9
LoongArch64: Update ssymv LSX version
1 year ago
Xi Ruoyao
af10c132b8
LoongArch64: Fix dsymv and ssymv LASX version
"fmov.d $f2, $f4" leaves all the bits above the 63rd bit
unpredictable, but it's obvious that the following code uses the value of
those high bits. We actually want to replicate the lower 64 bits here,
so we should use xvreplve0.d instead.
LA464 (Loongson 3[A-Z]-5000) happens to replicate them for us due to
some uarch internal details so the issue was not detected, but for LA664
(Loongson 3[A-Z]-6000) and future uarch we need to do things correctly
or we end up getting a lot of test failures.
Closes: https://bbs.aosc.io/t/topic/302
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
1 year ago
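The register-width problem described in the commit above can be modelled in plain C. The sketch below is a hypothetical simulation, not LoongArch code: a 256-bit LASX register is represented as four 64-bit lanes, `fmov_d` defines only lane 0 (in the model the upper lanes keep stale contents, standing in for "unpredictable"), and `xvreplve0_d` broadcasts lane 0 to every lane. The type and function names are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a 256-bit LASX register as four 64-bit lanes. */
typedef struct {
    uint64_t lane[4];
} xreg256;

/* Models "fmov.d dst, src": only the low 64 bits are defined afterwards.
   Architecturally the upper lanes are unpredictable; here they simply keep
   their previous contents, so code that reads them gets stale data. */
static void fmov_d(xreg256 *dst, const xreg256 *src)
{
    dst->lane[0] = src->lane[0];
}

/* Models "xvreplve0.d dst, src": lane 0 is replicated into all four lanes. */
static void xvreplve0_d(xreg256 *dst, const xreg256 *src)
{
    uint64_t v = src->lane[0];
    for (int i = 0; i < 4; i++)
        dst->lane[i] = v;
}

/* Returns 1 if every lane holds the same value, i.e. the register is safe
   to use as a broadcast scalar in a 4-wide double-precision loop. */
static int is_broadcast(const xreg256 *r)
{
    for (int i = 1; i < 4; i++)
        if (r->lane[i] != r->lane[0])
            return 0;
    return 1;
}
```

On hardware that happens to replicate the value (the LA464 case in the message), `fmov.d` behaves like the broadcast by accident; only the explicit replicate guarantees the all-lanes-equal property the following vector code depends on.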
Martin Kroeker
d74eb02954
Merge pull request #5057 from martin-frbg/issue5050
Replace while loop in generic C/ZGEMM_BETA to avoid going out of bounds
1 year ago
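The generic C/ZGEMM_BETA kernels perform the scaling C := beta * C over an m-by-n column-major block with leading dimension ldc. As a hypothetical illustration of the bounded-`for`-loop style these commits adopt (this is not the OpenBLAS source, and the name and signature are simplified assumptions), a complex version with interleaved (real, imaginary) storage could look like this:

```c
#include <stddef.h>

/* Hypothetical, simplified complex beta scaling: C := beta * C.
   c is column-major with leading dimension ldc (in complex elements);
   each complex element occupies two consecutive floats (re, im). */
static void cgemm_beta_sketch(size_t m, size_t n,
                              float beta_r, float beta_i,
                              float *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        float *col = c + 2 * j * ldc;          /* start of column j */
        if (beta_r == 0.0f && beta_i == 0.0f) {
            /* beta == 0: store zeros instead of multiplying, so any
               NaN/Inf already in C is not propagated. */
            for (size_t i = 0; i < m; i++) {
                col[2 * i]     = 0.0f;
                col[2 * i + 1] = 0.0f;
            }
        } else {
            for (size_t i = 0; i < m; i++) {
                float re = col[2 * i], im = col[2 * i + 1];
                col[2 * i]     = beta_r * re - beta_i * im;
                col[2 * i + 1] = beta_r * im + beta_i * re;
            }
        }
    }
}
```

Because both loop counters are compared directly against m and n, the loops cannot step past the last row or column, and rows between m and ldc are never touched. The separate beta == 0 branch matches the BLAS GEMM convention that C need not be set on input when beta is zero.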
Martin Kroeker
30f7a4120b
Merge pull request #5056 from tingboliao/dev_omatcopy_20250108
Optimize the omatcopy_cn/zomatcopy_cn kernels with RVV 1.0 intrinsic.
1 year ago
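For context, the `_cn` variants copy a column-major matrix without transposing it while scaling by alpha: B := alpha * A. A scalar reference sketch is below; the function names are hypothetical. Because each column is contiguous in the `cn` case, the inner loop is a unit-stride scale-and-copy, which is the part a vectorized kernel (such as an RVV 1.0 intrinsic version) would be expected to replace.

```c
#include <stddef.h>

/* Hypothetical scalar reference for a column-major, non-transposed
   out-of-place copy: B := alpha * A, with A of size rows x cols. */
static void somatcopy_cn_ref(size_t rows, size_t cols, float alpha,
                             const float *a, size_t lda,
                             float *b, size_t ldb)
{
    for (size_t j = 0; j < cols; j++)           /* each column */
        for (size_t i = 0; i < rows; i++)       /* contiguous in memory */
            b[i + j * ldb] = alpha * a[i + j * lda];
}

/* Complex variant: elements are interleaved (re, im) and alpha is complex. */
static void zomatcopy_cn_ref(size_t rows, size_t cols,
                             double alpha_r, double alpha_i,
                             const double *a, size_t lda,
                             double *b, size_t ldb)
{
    for (size_t j = 0; j < cols; j++) {
        const double *ac = a + 2 * j * lda;
        double *bc = b + 2 * j * ldb;
        for (size_t i = 0; i < rows; i++) {
            double re = ac[2 * i], im = ac[2 * i + 1];
            bc[2 * i]     = alpha_r * re - alpha_i * im;
            bc[2 * i + 1] = alpha_r * im + alpha_i * re;
        }
    }
}
```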
gxw
20a8e48f25
LoongArch64: Update ssymv LASX version
1 year ago
gxw
e0748588b8
LoongArch64: Update dsymv LASX version
1 year ago
Martin Kroeker
d91d4fa6e9
convert the beta=0 branch to a for loop as well
1 year ago
Martin Kroeker
09e75f1588
fix absurd typo
1 year ago
Martin Kroeker
2891fd8d6d
Replace while loop with for
1 year ago
tingbo.liao
0a5dbf13d3
Optimize the omatcopy_cn and zomatcopy_cn kernels with RVV 1.0 intrinsic.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Sergey Fedorov
229efa42ff
scal.S: use r11 on 32-bit Darwin on powerpc
1 year ago
Sergey Fedorov
81e1be8d90
Revert "temporarily disable the default S/DSCAL kernel"
This reverts commit 9b9c0aa5c9.
1 year ago
Martin Kroeker
9b9c0aa5c9
temporarily disable the default S/DSCAL kernel
1 year ago
tingbo.liao
c37509c213
Optimize the nrm2_rvv function to further improve performance.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
tingbo.liao
0bea1cfd9d
Optimize the zgemm_tcopy_4_rvv function to be compatible with vector lengths (vlens) of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
tingbo.liao
d00cc400b1
Replaced __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 so that the code compiles with riscv64-unknown-linux-gnu-gcc.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Martin Kroeker
229d8a025e
Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
1 year ago
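The Level-1 routines covered by this merge (swap, rot and scal) have simple reference semantics that any SVE kernel must reproduce. The plain C sketch below restates them for double precision; the names are hypothetical, and it assumes positive increments only (the BLAS also allows negative ones).

```c
#include <stddef.h>

/* Plane rotation with BLAS ?ROT semantics (positive increments only):
   x_i' = c*x_i + s*y_i,  y_i' = c*y_i - s*x_i */
static void drot_ref(size_t n, double *x, size_t incx,
                     double *y, size_t incy, double c, double s)
{
    for (size_t i = 0; i < n; i++) {
        double xi = x[i * incx], yi = y[i * incy];
        x[i * incx] = c * xi + s * yi;
        y[i * incy] = c * yi - s * xi;
    }
}

/* Exchange x and y element by element (BLAS ?SWAP semantics). */
static void dswap_ref(size_t n, double *x, size_t incx,
                      double *y, size_t incy)
{
    for (size_t i = 0; i < n; i++) {
        double t = x[i * incx];
        x[i * incx] = y[i * incy];
        y[i * incy] = t;
    }
}

/* x := alpha * x (BLAS ?SCAL semantics). */
static void dscal_ref(size_t n, double alpha, double *x, size_t incx)
{
    for (size_t i = 0; i < n; i++)
        x[i * incx] *= alpha;
}
```

Each loop body is independent across i for unit strides, which is what allows a predicated SVE loop to process a full vector of elements per iteration.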
SushilPratap04
3368a4e697
Update swap_kernel_sve.c
1 year ago
CDAC-SSDG
dd71e4234a
Added updated swap and rot SVE kernels.
1 year ago
CDAC-SSDG
06ffd411a5
Update KERNEL.ARMV8SVE
1 year ago
CDAC-SSDG
765850194e
Delete kernel/arm64/swap_kernel_sve.c
1 year ago
CDAC-SSDG
c17c19fbcf
Delete kernel/arm64/swap_kernel_c.c
1 year ago
CDAC-SSDG
f6416c0e37
Delete kernel/arm64/swap.c
1 year ago
CDAC-SSDG
3b7b74664c
Delete kernel/arm64/scal_kernel_sve.c
1 year ago
CDAC-SSDG
95a97012e8
Delete kernel/arm64/scal_kernel_c.c
1 year ago
CDAC-SSDG
5540f2121e
Delete kernel/arm64/scal.c
1 year ago
CDAC-SSDG
f62519cc87
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
CDAC-SSDG
10857c9df4
Delete kernel/arm64/rot_kernel_c.c
1 year ago
CDAC-SSDG
b9f51a5cf7
Delete kernel/arm64/rot.c
1 year ago
Martin Kroeker
81666de4ef
Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
1 year ago
Martin Kroeker
3345007d8f
retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN
1 year ago
Martin Kroeker
5fe983db29
retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies
1 year ago
Iha, Taisei
4918beecbe
Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1
1 year ago
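A transposed GEMV computes y := y + alpha * A^T x, i.e. one dot product per column of A. The sketch below shows column unrolling in plain C: four columns are handled per pass so that each load of x[i] feeds four independent accumulators. The unroll factor and names are illustrative assumptions; the A64FX and Neoverse V1 kernels in the commit use SVE and may unroll differently.

```c
#include <stddef.h>

/* y := y + alpha * A^T x, with A m x n, column-major, leading dimension
   lda; x has length m and y has length n. Four columns per pass share
   each load of x[i]; leftover columns use a plain dot-product loop. */
static void dgemv_t_unroll4(size_t m, size_t n, double alpha,
                            const double *a, size_t lda,
                            const double *x, double *y)
{
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        const double *a0 = a + (j + 0) * lda;
        const double *a1 = a + (j + 1) * lda;
        const double *a2 = a + (j + 2) * lda;
        const double *a3 = a + (j + 3) * lda;
        double t0 = 0.0, t1 = 0.0, t2 = 0.0, t3 = 0.0;
        for (size_t i = 0; i < m; i++) {
            double xi = x[i];
            t0 += a0[i] * xi;
            t1 += a1[i] * xi;
            t2 += a2[i] * xi;
            t3 += a3[i] * xi;
        }
        y[j + 0] += alpha * t0;
        y[j + 1] += alpha * t1;
        y[j + 2] += alpha * t2;
        y[j + 3] += alpha * t3;
    }
    for (; j < n; j++) {                  /* remainder columns */
        const double *aj = a + j * lda;
        double t = 0.0;
        for (size_t i = 0; i < m; i++)
            t += aj[i] * x[i];
        y[j] += alpha * t;
    }
}
```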
Juliya32
3b2421cba0
Add files via upload
1 year ago
Juliya32
012fe4da36
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
Juliya32
d90ee00f85
Delete kernel/arm64/rot_kernel_c.c
1 year ago
Juliya32
668e28adc4
Delete kernel/arm64/rot.c
1 year ago
SushilPratap04
fa880ab1cf
Update KERNEL.ARMV8SVE
Updated KERNEL.ARMV8SVE for the Level-1 SVE kernels (swap, rot and scal).
1 year ago
SushilPratap04
7822ae9617
Added SVE kernels for the rot routine.
1 year ago
SushilPratap04
b8bc2a752e
Added SVE-optimized kernels for the swap routine
1 year ago
CDAC-SSDG
0667cf6c92
Added optimized scal routine files
1 year ago
gxw
73c6a28073
x86_64: opt somatcopy_ct with AVX
1 year ago
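The `_ct` variant copies a column-major matrix while transposing and scaling it: B := alpha * A^T. In the scalar sketch below (hypothetical name), reads from A are unit-stride but writes to B jump by ldb, which is why SIMD versions such as the AVX and LASX ones in these commits would typically work on small tiles transposed in registers rather than on single elements.

```c
#include <stddef.h>

/* Column-major transposed copy: B := alpha * A^T, where A is rows x cols
   (leading dimension lda) and B is cols x rows (leading dimension ldb). */
static void somatcopy_ct_ref(size_t rows, size_t cols, float alpha,
                             const float *a, size_t lda,
                             float *b, size_t ldb)
{
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            b[j + i * ldb] = alpha * a[i + j * lda];   /* B(j,i) = alpha*A(i,j) */
}
```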
Ayappan Perumal
020cce1068
Fix build issues with gcc compiler as well
1 year ago
Ayappan Perumal
b6ec73e77c
Fix AIX build
1 year ago
Martin Kroeker
016bdb9b0b
Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
1 year ago
Chip Kerchner
ab71a1edf2
Better VSX.
1 year ago