364 Commits (1ee8879c787c19b6a6c092def2016e76e93ffefd)

Author SHA1 Message Date
  Martin Kroeker edaa73fd24 Hide the local 2VLx2VL symbol as static is insufficient for this with gcc 5 months ago
  Martin Kroeker 501728a354 adjust register 20 accesses to 21 after moving x18 5 months ago
  Martin Kroeker 05dbb54362 Delete misplaced file 5 months ago
  Martin Kroeker 0bc19a1335 Update SME kernel details 5 months ago
  Martin Kroeker ca542f319f Add VORTEXM4 5 months ago
  Martin Kroeker 53d3bb50cc Get symbol name from build system; change b.first to b.mi for AppleClang compatibility 5 months ago
  Martin Kroeker 08a00326a4 Build symbol name from build system variables 5 months ago
  Martin Kroeker 89898fc499 Add sgemm_direct_performant for switching between direct and regular kernels 5 months ago
  Martin Kroeker 22c6607db9 Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS 5 months ago
  Martin Kroeker ca22e28ca1 Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S 5 months ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 5 months ago
  yuanjia 803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 5 months ago
  Chris Sidebottom 5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 5 months ago
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 5 months ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 6 months ago
  abhishek-fujitsu 0bc79da587 add neon header 6 months ago
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 6 months ago
  Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 6 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 6 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 6 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 6 months ago
  Martin Kroeker fd37406817 Merge branch 'develop' into optimized_gemv_n_1x3 6 months ago
  Iha, Taisei f7ad906b49 Performance improvements of [SD]DOT with loop-unrolling on A64FX 7 months ago
  Martin Kroeker ee26caffb3 Merge pull request #5309 from davidz-ampere/dev-ampereone 7 months ago
  davidz-ampere aa90ab4142 Add support for Ampere AmpereOne processors 7 months ago
  Ian McInerney badef1d32e Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types 7 months ago
  davidz-ampere 84730068af reduce duplicate kernel code 7 months ago
  davidz-ampere be68ef03b4 Add support for Ampere processors 7 months ago
  Martin Kroeker 58eeb9041c fix handling of dummy2 7 months ago
  Martin Kroeker 1589d0b21e Merge pull request #5281 from martin-frbg/zscal_arm64 7 months ago
  Sharif Inamdar 8279e68805 Optimize gemv_n_sve_v1x3 kernel 7 months ago
  Arne Juul 5442aff218 Accumulate results in output register explicitly 7 months ago
  Martin Kroeker 28f8fdaf0f support flag for NaN/Inf handling and fix scaling of NaN/Inf values 8 months ago
  Martin Kroeker 5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) 8 months ago
  Martin Kroeker 151b74284e Merge pull request #5203 from quic/fix-sgemmdirect-sme1 8 months ago
  abhishek-fujitsu 9c02cdb073 optimise dot using thread throttling for NEOVERSE V1 10 months ago
  Martin Kroeker d0e8fd6d40 Merge pull request #5239 from annop-w/gemv_n_sve 9 months ago
  Iha, Taisei 08b5c18d70 fixed a potential out-of-bounds on gemv. 9 months ago
  Annop Wongwathanarat e11744a411 Use SVE kernel for S/DGEMVN for SVE machines 9 months ago
  Martin Kroeker dd38b4e811 Merge pull request #5225 from annop-w/gemv_n 9 months ago
  Martin Kroeker 0241d516f6 Merge pull request #5220 from iha-taisei/sdgemv_n_unroll 9 months ago
  Annop Wongwathanarat d535728803 Improve performance for SGEMVN on NEONVERSEN1 9 months ago
  Usui, Tetsuzo d711906e3e Add symv kernels for arm64 9 months ago
  Iha, Taisei f1e628b889 Further performance improvements to [SD]GEMV. 9 months ago
  Annop Wongwathanarat ec146157d3 Use SVE kernel for S/DGEMVT for SVE machines 10 months ago
  Vaisakh K V 04915be829 Add vector registers to clobber list to prevent compiler optimization. 10 months ago
  Ye Tao f27ba5efd1 fix bugs in aarch64 sbgemv_n kernel 10 months ago
  Annop Wongwathanarat edef2e4441 Fix bug in ARM64 sbgemv_t 10 months ago