abhishek-fujitsu
9c02cdb073
optimise dot using thread throttling for NEOVERSE V1
10 months ago
Martin Kroeker
d0e8fd6d40
Merge pull request #5239 from annop-w/gemv_n_sve
Use SVE kernel for S/DGEMVN for SVE machines
9 months ago
Iha, Taisei
08b5c18d70
Fixed a potential out-of-bounds access in gemv.
9 months ago
Annop Wongwathanarat
e11744a411
Use SVE kernel for S/DGEMVN for SVE machines
9 months ago
Martin Kroeker
dd38b4e811
Merge pull request #5225 from annop-w/gemv_n
Improve performance for SGEMVN on NEOVERSEN1
9 months ago
Martin Kroeker
0241d516f6
Merge pull request #5220 from iha-taisei/sdgemv_n_unroll
Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
9 months ago
Annop Wongwathanarat
d535728803
Improve performance for SGEMVN on NEOVERSEN1
9 months ago
Usui, Tetsuzo
d711906e3e
Add symv kernels for arm64
9 months ago
Iha, Taisei
f1e628b889
Further performance improvements to [SD]GEMV.
9 months ago
Annop Wongwathanarat
ec146157d3
Use SVE kernel for S/DGEMVT for SVE machines
10 months ago
Ye Tao
f27ba5efd1
fix bugs in aarch64 sbgemv_n kernel
10 months ago
Annop Wongwathanarat
edef2e4441
Fix bug in ARM64 sbgemv_t
10 months ago
Martin Kroeker
b55ca71d5b
Merge pull request #5182 from annop-w/sgemm_ncopy
Optimize aarch64 sgemm_ncopy
10 months ago
Martin Kroeker
2f778554b8
Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16
Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
10 months ago
Annop Wongwathanarat
9807f56580
Optimize aarch64 sgemm_ncopy
10 months ago
Martin Kroeker
a3e7b16072
Merge pull request #5157 from manaalmj/feature
Optimize gemv_n_sve kernel
10 months ago
Ye Tao
4c00099ed6
Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
10 months ago
Annop Wongwathanarat
a085b6c9ec
Fix aarch64 sbgemv_t compilation error for GCC < 13
10 months ago
manjam01
5c4e38ab17
Optimize gemv_n_sve kernel
11 months ago
Martin Kroeker
1d5ed5c46b
Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
11 months ago
Ye Tao
6b8b35cdf2
Fix minor issue of redeclaration of float x0, x1 in sbgemv_n_neon.c
11 months ago
Ye Tao
38ee7c9301
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
11 months ago
Martin Kroeker
2b941c44b5
Merge branch 'develop' into sbgemv_n_neon
11 months ago
Ye Tao
35bdbca153
Add sbgemv_n_neon kernel for arm64.
11 months ago
Annop Wongwathanarat
edaf51dd99
Add sbgemv_t_bfdot kernel for ARM64
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M=N=[2,512].
11 months ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
11 months ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
1 year ago
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
1 year ago
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
1 year ago
Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
1 year ago
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
1 year ago
Martin Kroeker
180ba5e7d0
Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
1 year ago
Deeksha Goplani
d1bfa979f7
small gemm kernel packing modifications
1 year ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Annop Wongwathanarat
c0318cea6e
Simplify gemv_t_sve_v1x3 kernel
1 year ago
Martin Kroeker
87083fdbf6
[WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076)
* Protect align directives in assembly files that are currently problematic with LLVM on WoA
* use the armv8 zdot on WoA to work around other LLVM issues
1 year ago
Martin Kroeker
229d8a025e
Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
1 year ago
SushilPratap04
3368a4e697
Update swap_kernel_sve.c
1 year ago
CDAC-SSDG
dd71e4234a
Added updated swap and rot SVE kernels.
1 year ago
CDAC-SSDG
06ffd411a5
Update KERNEL.ARMV8SVE
1 year ago
CDAC-SSDG
765850194e
Delete kernel/arm64/swap_kernel_sve.c
1 year ago
CDAC-SSDG
c17c19fbcf
Delete kernel/arm64/swap_kernel_c.c
1 year ago
CDAC-SSDG
f6416c0e37
Delete kernel/arm64/swap.c
1 year ago
CDAC-SSDG
3b7b74664c
Delete kernel/arm64/scal_kernel_sve.c
1 year ago
CDAC-SSDG
95a97012e8
Delete kernel/arm64/scal_kernel_c.c
1 year ago
CDAC-SSDG
5540f2121e
Delete kernel/arm64/scal.c
1 year ago
CDAC-SSDG
f62519cc87
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
CDAC-SSDG
10857c9df4
Delete kernel/arm64/rot_kernel_c.c
1 year ago
CDAC-SSDG
b9f51a5cf7
Delete kernel/arm64/rot.c
1 year ago
Martin Kroeker
81666de4ef
Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
1 year ago