Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
1 year ago
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
1 year ago
Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
1 year ago
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
1 year ago
Martin Kroeker
180ba5e7d0
Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
1 year ago
Deeksha Goplani
d1bfa979f7
small gemm kernel packing modifications
1 year ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Annop Wongwathanarat
c0318cea6e
Simplify gemv_t_sve_v1x3 kernel
1 year ago
Martin Kroeker
87083fdbf6
[WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076)
* Protect align directives in assembly files that are currently problematic with LLVM on WoA
* use the armv8 zdot on WoA to work around other LLVM issues
1 year ago
Martin Kroeker
229d8a025e
Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
1 year ago
SushilPratap04
3368a4e697
Update swap_kernel_sve.c
1 year ago
CDAC-SSDG
dd71e4234a
Added Updated swap and rot sve kernels.
1 year ago
CDAC-SSDG
06ffd411a5
Update KERNEL.ARMV8SVE
1 year ago
CDAC-SSDG
765850194e
Delete kernel/arm64/swap_kernel_sve.c
1 year ago
CDAC-SSDG
c17c19fbcf
Delete kernel/arm64/swap_kernel_c.c
1 year ago
CDAC-SSDG
f6416c0e37
Delete kernel/arm64/swap.c
1 year ago
CDAC-SSDG
3b7b74664c
Delete kernel/arm64/scal_kernel_sve.c
1 year ago
CDAC-SSDG
95a97012e8
Delete kernel/arm64/scal_kernel_c.c
1 year ago
CDAC-SSDG
5540f2121e
Delete kernel/arm64/scal.c
1 year ago
CDAC-SSDG
f62519cc87
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
CDAC-SSDG
10857c9df4
Delete kernel/arm64/rot_kernel_c.c
1 year ago
CDAC-SSDG
b9f51a5cf7
Delete kernel/arm64/rot.c
1 year ago
Martin Kroeker
81666de4ef
Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
1 year ago
Martin Kroeker
3345007d8f
retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN
1 year ago
Martin Kroeker
5fe983db29
retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies
1 year ago
Iha, Taisei
4918beecbe
Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1
1 year ago
Juliya32
3b2421cba0
Add files via upload
1 year ago
Juliya32
012fe4da36
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
Juliya32
d90ee00f85
Delete kernel/arm64/rot_kernel_c.c
1 year ago
Juliya32
668e28adc4
Delete kernel/arm64/rot.c
1 year ago
SushilPratap04
fa880ab1cf
Update KERNEL.ARMV8SVE
updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels.
1 year ago
SushilPratap04
7822ae9617
Added sve kernels for rot routine.
1 year ago
SushilPratap04
b8bc2a752e
Added sve optimized kernels for swap routine
1 year ago
CDAC-SSDG
0667cf6c92
Added optimized scal routine files
1 year ago
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
1 year ago
Chris Sidebottom
ba2e989c67
Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
1 year ago
Martin Kroeker
fb7c53c5e5
Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
1 year ago
Martin Kroeker
a4e56e0452
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
1 year ago
yamazaki-mitsufumi
88caf02f62
Fix ambiguous error on Mac OS
1 year ago
Chris Sidebottom
ea4ab3b310
Better header guard around bridge
1 year ago
Chris Sidebottom
7311d93016
Unroll TT further
1 year ago
Chris Sidebottom
a9edddb695
Unroll TN further
1 year ago
Chris Sidebottom
9984c5ce9d
Clean up k2 removal more and unroll SGEMM more
1 year ago
Chris Sidebottom
b1c9fafabb
Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM
1 year ago
Martin Kroeker
eb4879e04c
make NAN handling depend on the dummy2 parameter
1 year ago
iha fujitsu
0985fdc82b
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
1 year ago
Martin Kroeker
3677b3886c
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
1 year ago
Chris Sidebottom
8c472ef7e3
Further tweak small GEMM for AArch64
1 year ago
Martin Kroeker
a2ee4b1966
Merge branch 'OpenMathLib:develop' into issue4728
1 year ago
Martin Kroeker
3ec59922b6
Add a clobber list to fix utest errors seen with gcc13 on Apple M
1 year ago