360 Commits (4328c91e27fb75e337d23939e7ccbeaed20dd43a)

Author SHA1 Message Date
  Martin Kroeker ca542f319f Add VORTEXM4 5 months ago
  Martin Kroeker 53d3bb50cc Get symbol name from build system; change b.first to b.mi for AppleClang compatibility 5 months ago
  Martin Kroeker 08a00326a4 Build symbol name from build system variables 5 months ago
  Martin Kroeker 89898fc499 Add sgemm_direct_performant for switching between direct and regular kernels 5 months ago
  Martin Kroeker 22c6607db9 Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS 5 months ago
  Martin Kroeker ca22e28ca1 Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S 5 months ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 5 months ago
  yuanjia 803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 6 months ago
  Chris Sidebottom 5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 6 months ago
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 6 months ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 6 months ago
  abhishek-fujitsu 0bc79da587 add neon header 6 months ago
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 6 months ago
  Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 6 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 6 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 7 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 7 months ago
  Martin Kroeker fd37406817 Merge branch 'develop' into optimized_gemv_n_1x3 7 months ago
  Iha, Taisei f7ad906b49 Performance improvements of [SD]DOT with loop-unrolling on A64FX 7 months ago
  Martin Kroeker ee26caffb3 Merge pull request #5309 from davidz-ampere/dev-ampereone 7 months ago
  davidz-ampere aa90ab4142 Add support for Ampere AmpereOne processors 7 months ago
  Ian McInerney badef1d32e Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types 7 months ago
  davidz-ampere 84730068af reduce duplicate kernel code 7 months ago
  davidz-ampere be68ef03b4 Add support for Ampere processors 7 months ago
  Martin Kroeker 58eeb9041c fix handling of dummy2 8 months ago
  Martin Kroeker 1589d0b21e Merge pull request #5281 from martin-frbg/zscal_arm64 8 months ago
  Sharif Inamdar 8279e68805 Optimize gemv_n_sve_v1x3 kernel 8 months ago
  Arne Juul 5442aff218 Accumulate results in output register explicitly 8 months ago
  Martin Kroeker 28f8fdaf0f support flag for NaN/Inf handling and fix scaling of NaN/Inf values 8 months ago
  Martin Kroeker 5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) 9 months ago
  Martin Kroeker 151b74284e Merge pull request #5203 from quic/fix-sgemmdirect-sme1 9 months ago
  abhishek-fujitsu 9c02cdb073 optimise dot using thread throttling for NEOVERSE V1 10 months ago
  Martin Kroeker d0e8fd6d40 Merge pull request #5239 from annop-w/gemv_n_sve 9 months ago
  Iha, Taisei 08b5c18d70 fixed a potential out-of-bounds on gemv. 9 months ago
  Annop Wongwathanarat e11744a411 Use SVE kernel for S/DGEMVN for SVE machines 9 months ago
  Martin Kroeker dd38b4e811 Merge pull request #5225 from annop-w/gemv_n 9 months ago
  Martin Kroeker 0241d516f6 Merge pull request #5220 from iha-taisei/sdgemv_n_unroll 10 months ago
  Annop Wongwathanarat d535728803 Improve performance for SGEMVN on NEONVERSEN1 10 months ago
  Usui, Tetsuzo d711906e3e Add symv kernels for arm64 10 months ago
  Iha, Taisei f1e628b889 Further performance improvements to [SD]GEMV. 10 months ago
  Annop Wongwathanarat ec146157d3 Use SVE kernel for S/DGEMVT for SVE machines 10 months ago
  Vaisakh K V 04915be829 Add vector registers to clobber list to prevent compiler optimization. 10 months ago
  Ye Tao f27ba5efd1 fix bugs in aarch64 sbgemv_n kernel 11 months ago
  Annop Wongwathanarat edef2e4441 Fix bug in ARM64 sbgemv_t 11 months ago
  Martin Kroeker b55ca71d5b Merge pull request #5182 from annop-w/sgemm_ncopy 11 months ago
  Martin Kroeker 2f778554b8 Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16 11 months ago
  Annop Wongwathanarat 9807f56580 Optimize aarch64 sgemm_ncopy 11 months ago
  Martin Kroeker a3e7b16072 Merge pull request #5157 from manaalmj/feature 11 months ago