Arne Juul 5442aff218 Accumulate results in output register explicitly 11 months ago
KERNEL Further rearranged the rotm kernel for the different architectures. 1 year ago
KERNEL.A64FX Further performance improvements to [SD]GEMV. 1 year ago
KERNEL.ARMV8 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
KERNEL.ARMV8SVE Use SVE kernel for S/DGEMVN for SVE machines 1 year ago
KERNEL.ARMV9SME Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 1 year ago
KERNEL.CORTEXA53 optimize cgemm on ARM cortex A53 & cortex A55 4 years ago
KERNEL.CORTEXA55 Reduce duplication in kernel definitions 2 years ago
KERNEL.CORTEXA57 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
KERNEL.CORTEXA72 Simplifying ARMv8 build parameters 7 years ago
KERNEL.CORTEXA73 Simplifying ARMv8 build parameters 7 years ago
KERNEL.CORTEXA76 Add support for Cortex-A76 2 years ago
KERNEL.CORTEXA510 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 2 years ago
KERNEL.CORTEXA710 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 2 years ago
KERNEL.CORTEXX1 CortexX1 is ARMV8 like A7x 4 years ago
KERNEL.CORTEXX2 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 2 years ago
KERNEL.EMAG8180 Add preliminary support for EMAG8180 6 years ago
KERNEL.FALKOR Simplifying ARMv8 build parameters 7 years ago
KERNEL.FT2000 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 4 years ago
KERNEL.NEOVERSEN1 Merge pull request #5225 from annop-w/gemv_n 1 year ago
KERNEL.NEOVERSEN2 Use SVE kernel for S/DGEMVN for SVE machines 1 year ago
KERNEL.NEOVERSEV1 Further performance improvements to [SD]GEMV. 1 year ago
KERNEL.NEOVERSEV2 Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2 1 year ago
KERNEL.THUNDERX Add workaround for NVIDIA HPC 5 years ago
KERNEL.THUNDERX2T99 Add SVE implementation for sdot/ddot 3 years ago
KERNEL.THUNDERX3T110 Reduce duplication in kernel definitions 2 years ago
KERNEL.TSV110 Add workaround for NVIDIA HPC 5 years ago
KERNEL.VORTEX Use Neoverse's current mix of ThunderX2 kernels for Vortex as well 4 years ago
KERNEL.generic Further rearranged the rotm kernel for the different architectures. 1 year ago
Makefile added experimental support for ARMV8 12 years ago
amax.S ARM64: Convert all labels to local labels 8 years ago
asum.S ARM64: Convert all labels to local labels 8 years ago
axpy.S ARM64: Convert all labels to local labels 8 years ago
casum.S ARM64: Convert all labels to local labels 8 years ago
casum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 8 years ago
cgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
cgemm_kernel_8x4.S move ALPHA_I out of register 18 (reserved on OSX) 3 years ago
cgemm_kernel_8x4_cortexa53.c optimize cgemm on ARM cortex A53 & cortex A55 4 years ago
cgemm_kernel_8x4_thunderx2t99.S Move ALPHA_I out of register 18 (reserved on OSX) 3 years ago
cgemm_kernel_sve_v1x4.S Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
cgemm_ncopy_sve_v1.c Disambiguate whilelt 2 years ago
cgemm_tcopy_sve_v1.c Disambiguate whilelt 2 years ago
copy.S ARM64: Convert all labels to local labels 8 years ago
copy_thunderx2t99.c [WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076) 1 year ago
csum.S Add ARM64 implementations of ?sum 7 years ago
csum_thunderx2t99.c add csum/zsum kernels (trivially derived from the asum ones)s) 2 years ago
ctrmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
ctrmm_kernel_8x4.S Move ALPHA_I out of register 18 (reserved on OSX) 3 years ago
ctrmm_kernel_sve_v1x4.S add cgemm ctrmm sve kernels 4 years ago
dasum_thunderx2t99.c [WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076) 1 year ago
daxpy_thunderx.c aarch64 fix std=c18 compilation 5 years ago
daxpy_thunderx2t99.S ARM64: Improve DAXPY for ThunderX2 6 years ago
ddot_thunderx.c ARM64: Rename kernel files to have consistent naming 9 years ago
dgemm_beta.S Fix zero initialization for beta=0 case 6 years ago
dgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_4x4_cortexa53.c MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 4 years ago
dgemm_kernel_4x8.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_8x4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_8x4_thunderx2t99.S ARM64: Move parameters from parameter.c to param.h 7 years ago
dgemm_kernel_sve_v1x8.S some clean-up & commentary 4 years ago
dgemm_kernel_sve_v2x8.S Remove prefetches from SVE kernels 3 years ago
dgemm_ncopy_4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_ncopy_8.S ARM64: Convert all labels to local labels 8 years ago
dgemm_small_kernel_nn_sve.c Better header guard around bridge 1 year ago
dgemm_small_kernel_nt_sve.c Better header guard around bridge 1 year ago
dgemm_small_kernel_tn_sve.c small gemm kernel packing modifications 1 year ago
dgemm_small_kernel_tt_sve.c small gemm kernel packing modifications 1 year ago
dgemm_tcopy_4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_tcopy_8.S Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX) 4 years ago
dot.S ARM64: Fix utest dsdot errors 8 years ago
dot.c optimise dot using thread throttling for NEOVERSE V1 1 year ago
dot_kernel_asimd.c Accumulate results in output register explicitly 11 months ago
dot_kernel_sve.c add clobber list 1 year ago
dot_thunderx.c ARM64: Rename kernel files to have consistent naming 9 years ago
dtrmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
dtrmm_kernel_4x8.S ARM64: Convert all labels to local labels 8 years ago
dtrmm_kernel_8x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
dtrmm_kernel_sve_v1x8.S some clean-up & commentary 4 years ago
dznrm2_thunderx2t99.c remove another early exit for incx < 0 2 years ago
dznrm2_thunderx2t99_fast.c Fixed a few more unnecessary calls to num_cpu_avail. 8 years ago
gemm_ncopy_complex_sve_v1x4.c Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
gemm_ncopy_sve_v1x8.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
gemm_small_kernel_permit_sve.c Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM 1 year ago
gemm_tcopy_complex_sve_v1x4.c Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
gemm_tcopy_sve_v1x8.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
gemv_n.S ARM64: Convert all labels to local labels 8 years ago
gemv_n_sve.c Optimize gemv_n_sve kernel 1 year ago
gemv_n_sve_v1x3.c fixed a potential out-of-bounds on gemv. 1 year ago
gemv_n_sve_v4x3.c fixed a potential out-of-bounds on gemv. 1 year ago
gemv_t.S Add accumulators to AArch64 GEMV Kernels 1 year ago
gemv_t_sve.c Add accumulators to AArch64 GEMV Kernels 1 year ago
gemv_t_sve_v1x3.c Simplify gemv_t_sve_v1x3 kernel 1 year ago
gemv_t_sve_v4x3.c Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1 1 year ago
iamax.S ARM64: Convert all labels to local labels 8 years ago
iamax_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 8 years ago
izamax.S ARM64: Convert all labels to local labels 8 years ago
izamax_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 8 years ago
nrm2.S Fix accidental duplication of jump instruction 6 years ago
rot.S ARM64: Convert all labels to local labels 8 years ago
rot.c Added Updated swap and rot sve kernels. 1 year ago
rot_kernel_c.c Added Updated swap and rot sve kernels. 1 year ago
rot_kernel_sve.c Added Updated swap and rot sve kernels. 1 year ago
sasum_thunderx2t99.c [WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076) 1 year ago
sbgemm_beta_neoversen2.c neoverse n2 sbgemm: init file 3 years ago
sbgemm_beta_neoversev1.c * checkpoint sbgemm for SVE-256 1 year ago
sbgemm_kernel_4x4_neoversev1.c optimized sbgemm kernel for neoverse-v1 (sve-256) 1 year ago
sbgemm_kernel_4x4_neoversev1_impl.c optimized sbgemm kernel for neoverse-v1 (sve-256) 1 year ago
sbgemm_kernel_8x4_neoversen2.c Change file name to match the norm and delete useless code. 3 years ago
sbgemm_kernel_8x4_neoversen2_impl.c Change file name to match the norm and delete useless code. 3 years ago
sbgemm_ncopy_4_neoversen2.c Change file name to match the norm and delete useless code. 3 years ago
sbgemm_ncopy_4_neoversev1.c optimized sbgemm kernel for neoverse-v1 (sve-256) 1 year ago
sbgemm_ncopy_8_neoversen2.c bugfix for sbgemm_ncopy_8_neoversen2 3 years ago
sbgemm_tcopy_4_neoversen2.c Add sbgemm_ncopy_8 and sbgemm_tcopy_4 3 years ago
sbgemm_tcopy_4_neoversev1.c optimized sbgemm kernel for neoverse-v1 (sve-256) 1 year ago
sbgemm_tcopy_8_neoversen2.c Improve the performance of sbgemm_tcopy on neoversen2 3 years ago
sbgemv_n_neon.c fix bugs in aarch64 sbgemv_n kernel 1 year ago
sbgemv_t_bfdot.c Fix bug in ARM64 sbgemv_t 1 year ago
scal.S make NAN handling depend on the dummy2 parameter 1 year ago
scnrm2_thunderx2t99.c remove another early exit for incx < 0 2 years ago
sgemm_beta.S fix initialization to zero in the NEON SGEMM_BETA kernel as well 6 years ago
sgemm_direct_arm64_sme1.c Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) 1 year ago
sgemm_direct_sme1.S Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 1 year ago
sgemm_direct_sme1_preprocess.S Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 1 year ago
sgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_8x8.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_8x8_cortexa53.S fix INIT8x4 5 years ago
sgemm_kernel_16x4.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_16x4_thunderx2t99.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_sve_v1x8.S add sgemm kernel and copy functions for sgemm and ssymm 4 years ago
sgemm_kernel_sve_v2x8.S Remove prefetches from SVE kernels 3 years ago
sgemm_ncopy_4.S Optimize aarch64 sgemm_ncopy 1 year ago
sgemm_ncopy_8.S Optimize aarch64 sgemm_ncopy 1 year ago
sgemm_small_kernel_nn_sve.c Better header guard around bridge 1 year ago
sgemm_small_kernel_nt_sve.c Better header guard around bridge 1 year ago
sgemm_small_kernel_tn_sve.c small gemm kernel packing modifications 1 year ago
sgemm_small_kernel_tt_sve.c small gemm kernel packing modifications 1 year ago
sgemm_tcopy_8.S sgemm copy source init 6 years ago
sgemm_tcopy_16.S change line endings from CRLF to LF 3 years ago
sgemv_n_neon.c Improve performance for SGEMVN on NEONVERSEN1 1 year ago
strmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
strmm_kernel_8x8.S ARM64: Convert all labels to local labels 8 years ago
strmm_kernel_8x8_cortexa53.S use general register to speedup 6 years ago
strmm_kernel_16x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
strmm_kernel_sve_v1x8.S strmm sve v1x8 kernel 4 years ago
sum.S Add ARM64 implementations of ?sum 7 years ago
swap.S ARM64: Convert all labels to local labels 8 years ago
swap.c Added Updated swap and rot sve kernels. 1 year ago
swap_kernel_c.c Added Updated swap and rot sve kernels. 1 year ago
swap_kernel_sve.c Update swap_kernel_sve.c 1 year ago
swap_thunderx2t99.S THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations 9 years ago
symm_lcopy_sve.c Disambiguate whilelt 2 years ago
symm_ucopy_sve.c Disambiguate whilelt 2 years ago
symv_L_asimd_4x4.c Add symv kernels for arm64 1 year ago
symv_L_sve_v1x4.c Add symv kernels for arm64 1 year ago
symv_U_asimd_4x4.c Add symv kernels for arm64 1 year ago
symv_U_sve_v1x4.c Add symv kernels for arm64 1 year ago
symv_microk_asimd_4x4.c Add symv kernels for arm64 1 year ago
symv_microk_sve_v1x4.c Add symv kernels for arm64 1 year ago
trmm_lncopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trmm_ltcopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trmm_uncopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trmm_utcopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trsm_kernel_LN_sve.c add sve ztrsm 4 years ago
trsm_kernel_LT_sve.c add sve ztrsm 4 years ago
trsm_kernel_RN_sve.c add sve ztrsm 4 years ago
trsm_kernel_RT_sve.c add sve ztrsm 4 years ago
trsm_lncopy_sve.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trsm_ltcopy_sve.c Disambiguate whilelt 2 years ago
trsm_uncopy_sve.c Disambiguate whilelt 2 years ago
trsm_utcopy_sve.c Disambiguate whilelt 2 years ago
zamax.S Fix the functional bugs for zamax. 6 years ago
zasum.S ARM64: Convert all labels to local labels 8 years ago
zasum_thunderx2t99.c [WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076) 1 year ago
zaxpy.S ARM64: Convert all labels to local labels 8 years ago
zdot.S ARM64: Convert all labels to local labels 8 years ago
zdot_thunderx2t99.c Add a clobber list to fix utest errors seen with gcc13 on Apple M 1 year ago
zgemm_kernel_4x4.S move alpha to x19/x20 to leave x18 unused for OSX 4 years ago
zgemm_kernel_4x4_cortexa53.c MOD: add comments to a53 zgemm kernel 4 years ago
zgemm_kernel_4x4_thunderx2t99.S ARM64: Convert all labels to local labels 8 years ago
zgemm_kernel_sve_v1x4.S Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
zgemm_ncopy_sve_v1.c Disambiguate whilelt 2 years ago
zgemm_tcopy_sve_v1.c Disambiguate whilelt 2 years ago
zgemv_n.S ARM64: Convert all labels to local labels 8 years ago
zgemv_t.S ARM64: Convert all labels to local labels 8 years ago
zhemm_ltcopy_sve.c Fix ZHEMM copy for SVE 2 years ago
zhemm_utcopy_sve.c Fix ZHEMM copy for SVE 2 years ago
znrm2.S Remove automatic label postfixes from macro included only once 6 years ago
zrot.S ARM64: Convert all labels to local labels 8 years ago
zscal.S Fix handling of NAN 2 years ago
zsum.S Add ARM64 implementations of ?sum 7 years ago
zsum_thunderx2t99.c add csum/zsum kernels (trivially derived from the asum ones)s) 2 years ago
zsymm_lcopy_sve.c Disambiguate whilelt 2 years ago
zsymm_ucopy_sve.c Disambiguate whilelt 2 years ago
ztrmm_kernel_4x4.S Move alphaI to x22 to leave x18 unused (reserved on OSX) 4 years ago
ztrmm_kernel_sve_v1x4.S fix sve ztrmm kernel 4 years ago
ztrmm_lncopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrmm_ltcopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrmm_uncopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrmm_utcopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrsm_lncopy_sve.c Disambiguate whilelt 2 years ago
ztrsm_ltcopy_sve.c Disambiguate whilelt 2 years ago
ztrsm_uncopy_sve.c Disambiguate whilelt 2 years ago
ztrsm_utcopy_sve.c Disambiguate whilelt 2 years ago