tingbo.liao
0bea1cfd9d
Optimize the zgemm_tcopy_4_rvv function to support vector lengths (VLENs) of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
tingbo.liao
d00cc400b1
Replaced __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 so that the code compiles with riscv64-unknown-linux-gnu-gcc.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Martin Kroeker
229d8a025e
Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
1 year ago
SushilPratap04
3368a4e697
Update swap_kernel_sve.c
1 year ago
CDAC-SSDG
dd71e4234a
Added updated swap and rot SVE kernels.
1 year ago
CDAC-SSDG
06ffd411a5
Update KERNEL.ARMV8SVE
1 year ago
CDAC-SSDG
765850194e
Delete kernel/arm64/swap_kernel_sve.c
1 year ago
CDAC-SSDG
c17c19fbcf
Delete kernel/arm64/swap_kernel_c.c
1 year ago
CDAC-SSDG
f6416c0e37
Delete kernel/arm64/swap.c
1 year ago
CDAC-SSDG
3b7b74664c
Delete kernel/arm64/scal_kernel_sve.c
1 year ago
CDAC-SSDG
95a97012e8
Delete kernel/arm64/scal_kernel_c.c
1 year ago
CDAC-SSDG
5540f2121e
Delete kernel/arm64/scal.c
1 year ago
CDAC-SSDG
f62519cc87
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
CDAC-SSDG
10857c9df4
Delete kernel/arm64/rot_kernel_c.c
1 year ago
CDAC-SSDG
b9f51a5cf7
Delete kernel/arm64/rot.c
1 year ago
Martin Kroeker
81666de4ef
Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
1 year ago
Martin Kroeker
3345007d8f
retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN
1 year ago
Martin Kroeker
5fe983db29
retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies
1 year ago
Iha, Taisei
4918beecbe
Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1
1 year ago
Juliya32
3b2421cba0
Add files via upload
1 year ago
Juliya32
012fe4da36
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
Juliya32
d90ee00f85
Delete kernel/arm64/rot_kernel_c.c
1 year ago
Juliya32
668e28adc4
Delete kernel/arm64/rot.c
1 year ago
SushilPratap04
fa880ab1cf
Update KERNEL.ARMV8SVE
Updated KERNEL.ARMV8SVE for the Level 1 SVE kernels (swap, rot and scal).
1 year ago
SushilPratap04
7822ae9617
Added sve kernels for rot routine.
1 year ago
SushilPratap04
b8bc2a752e
Added sve optimized kernels for swap routine
1 year ago
CDAC-SSDG
0667cf6c92
Added optimized scal routine files
1 year ago
gxw
73c6a28073
x86_64: opt somatcopy_ct with AVX
1 year ago
Ayappan Perumal
020cce1068
Fix build issues with gcc compiler as well
1 year ago
Ayappan Perumal
b6ec73e77c
Fix AIX build
1 year ago
Martin Kroeker
016bdb9b0b
Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
1 year ago
Chip Kerchner
ab71a1edf2
Better VSX.
1 year ago
gxw
bb31bbef52
LoongArch64: Opt somatcopy_ct with LASX
1 year ago
gxw
b37129341b
LoongArch64: Opt somatcopy_cn with LASX
1 year ago
gxw
acf6cab304
LoongArch64: Opt somatcopy_rn with LASX
1 year ago
gxw
15edb441bf
LoongArch64: Opt somatcopy_rt with LASX
1 year ago
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
1 year ago
Martin Kroeker
e52d9b4cf1
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
1 year ago
Gordon Fossum
0b7fb5c791
CGEMM & ZGEMM using C code.
1 year ago
Martin Kroeker
9783dd07ab
Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC
1 year ago
Martin Kroeker
c9e92348a6
Handle inf/nan if dummy2 flag is set
1 year ago
Martin Kroeker
d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds
1 year ago
Martin Kroeker
de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
1 year ago
Martin Kroeker
e05d98d00a
expressly use fld.d/fst.d for floating point registers instead of LD/ST macros
1 year ago
Chip Kerchner
a0aeba631d
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Chip Kerchner
083faf7556
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Chip Kerchner
75472b830a
Merge branch 'develop' into betterPowerGEMVTail
1 year ago