355 Commits (c31861ea62e140c030daf2e0296da524f82ea526)

Author SHA1 Message Date
  h-motoki 855945befb Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E 5 months ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 5 months ago
  yuanjia 803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 5 months ago
  Chris Sidebottom 5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 5 months ago
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 5 months ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 5 months ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 5 months ago
  Martin Kroeker a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 6 months ago
  abhishek-fujitsu 0bc79da587 add neon header 6 months ago
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 6 months ago
  Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 6 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 6 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 6 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 6 months ago
  Martin Kroeker fd37406817 Merge branch 'develop' into optimized_gemv_n_1x3 6 months ago
  Iha, Taisei f7ad906b49 Performance improvements of [SD]DOT with loop-unrolling on A64FX 6 months ago
  Martin Kroeker ee26caffb3 Merge pull request #5309 from davidz-ampere/dev-ampereone 7 months ago
  davidz-ampere aa90ab4142 Add support for Ampere AmpereOne processors 7 months ago
  Ian McInerney badef1d32e Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types 7 months ago
  davidz-ampere 84730068af reduce duplicate kernel code 7 months ago
  davidz-ampere be68ef03b4 Add support for Ampere processors 7 months ago
  Martin Kroeker 58eeb9041c fix handling of dummy2 7 months ago
  Martin Kroeker 1589d0b21e Merge pull request #5281 from martin-frbg/zscal_arm64 7 months ago
  Sharif Inamdar 8279e68805 Optimize gemv_n_sve_v1x3 kernel 7 months ago
  Arne Juul 5442aff218 Accumulate results in output register explicitly 7 months ago
  Martin Kroeker 28f8fdaf0f support flag for NaN/Inf handling and fix scaling of NaN/Inf values 8 months ago
  Martin Kroeker 5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) 8 months ago
  Martin Kroeker 151b74284e Merge pull request #5203 from quic/fix-sgemmdirect-sme1 8 months ago
  abhishek-fujitsu 9c02cdb073 optimise dot using thread throttling for NEOVERSE V1 10 months ago
  Martin Kroeker d0e8fd6d40 Merge pull request #5239 from annop-w/gemv_n_sve 9 months ago
  Iha, Taisei 08b5c18d70 fixed a potential out-of-bounds on gemv. 9 months ago
  Annop Wongwathanarat e11744a411 Use SVE kernel for S/DGEMVN for SVE machines 9 months ago
  Martin Kroeker dd38b4e811 Merge pull request #5225 from annop-w/gemv_n 9 months ago
  Martin Kroeker 0241d516f6 Merge pull request #5220 from iha-taisei/sdgemv_n_unroll 9 months ago
  Annop Wongwathanarat d535728803 Improve performance for SGEMVN on NEONVERSEN1 9 months ago
  Usui, Tetsuzo d711906e3e Add symv kernels for arm64 9 months ago
  Iha, Taisei f1e628b889 Further performance improvements to [SD]GEMV. 9 months ago
  Annop Wongwathanarat ec146157d3 Use SVE kernel for S/DGEMVT for SVE machines 10 months ago
  Vaisakh K V 04915be829 Add vector registers to clobber list to prevent compiler optimization. 10 months ago
  Ye Tao f27ba5efd1 fix bugs in aarch64 sbgemv_n kernel 10 months ago
  Annop Wongwathanarat edef2e4441 Fix bug in ARM64 sbgemv_t 10 months ago
  Martin Kroeker b55ca71d5b Merge pull request #5182 from annop-w/sgemm_ncopy 10 months ago
  Martin Kroeker 2f778554b8 Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16 10 months ago
  Annop Wongwathanarat 9807f56580 Optimize aarch64 sgemm_ncopy 10 months ago
  Martin Kroeker a3e7b16072 Merge pull request #5157 from manaalmj/feature 10 months ago
  Ye Tao 4c00099ed6 replace customize bf16_to_fp32 with arm neon vcvtah_f32_bf16 10 months ago
  Annop Wongwathanarat a085b6c9ec Fix aarch64 sbgemv_t compilation error for GCC < 13 10 months ago
  manjam01 5c4e38ab17 Optimize gemv_n_sve kernel 11 months ago
  Martin Kroeker 1d5ed5c46b Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2 11 months ago
  Ye Tao 6b8b35cdf2 fix minior issues of redeclaration of float x0,x1 in sbgemv_n_neon.c 11 months ago