446 Commits (05dbb54362f77cdcd6320947fa30b63a0cfd0cac)

Author SHA1 Message Date
  Martin Kroeker de91afd2ae Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direct_performant for ARM64 10 months ago
  Martin Kroeker 30d11bc92c Adjust multithreading threshold and add an intermediate step 11 months ago
  Martin Kroeker a9e8fa06bf Introduce a (crude) threshold to multithreading 11 months ago
  Martin Kroeker 965463f177 Include float-bfloat conversion functions in ONLY_CBLAS builds as well 11 months ago
  youcai 41f9701ebc Fix cmake building with cblas_bgemm 11 months ago
  Martin Kroeker 30dbca5051 fix misleading indentation to silence a gcc warning 11 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 11 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 11 months ago
  Chris Sidebottom 947d7af4c9 Fix CMake references to bscal and bgemv 11 months ago
  Chris Sidebottom e105411460 Add infrastructure for bgemv/bscal 11 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 11 months ago
  Chris Sidebottom 66d9185ebe Fix CMake support 1 year ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 1 year ago
  Usui, Tetsuzo 14107e37d9 Add parallel laed3 1 year ago
  Martin Kroeker d96daa220d Merge pull request #5290 from Srangrang/develop 1 year ago
  Srangrang ec14e1648c fix: resolve non-RISCV host build failed issue 1 year ago
  Martin Kroeker 5e393f207c fix source file used for sbgemmt/sbgemmtr 1 year ago
  Martin Kroeker 11ff18bb0f Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal 1 year ago
  gkdddd 670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B 1 year ago
  Martin Kroeker 42b7d1f897 Fix addressing of alpha in CBLAS 1 year ago
  Martin Kroeker 6680e0592f Fix conditional inclusion of SGEMM_KERNEL_DIRECT 1 year ago
  Martin Kroeker 70865a894e Merge pull request #5180 from ywwry66/openmp_use_cmake 1 year ago
  Ruiyang Wu 02fd1df10b CMake: Pass `OpenMP` compiler and linker flags through CMake targets 1 year ago
  Martin Kroeker 51c1fb1f93 Fix ?spmv build and misinterpretation of NO_LAPACK=0 1 year ago
  shubham.chaudhari 8e289ecddc Simplified thread throttling function in gemv 1 year ago
  shubham.chaudhari 189dbbc04f Add thread throttling for dynamic arch neoversev1 1 year ago
  shubham.chaudhari b6cb5ece58 Add thread throttling profile for DGEMV on NEOVERSEV1 1 year ago
  Martin Kroeker 7338a473a7 Merge pull request #5150 from Harishmcw/WoA-Experiments 1 year ago
  Martin Kroeker 09ba099461 make throttling code conditional on SMP 1 year ago
  Harishmcw 030ae1fd97 Redefined threading logic for WoA 1 year ago
  Martin Kroeker c03a81b927 Merge pull request #5141 from michalowski-arm/fork-throttle 1 year ago
  Martin Kroeker 75b958a018 Transform the B array back if necessary before returning 1 year ago
  Marek Michalowski 650a062e19 Add thread throttling profile for SGEMV on `NEOVERSEV2` 1 year ago
  Marek Michalowski b723c1b7b7 Add thread throttling profile for SGEMM on `NEOVERSEV2` 1 year ago
  Vaisakh K V f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 1 year ago
  Vaisakh K V d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 1 year ago
  Harish-Gits daf16b8229 Adjusted GESV threading logic for optimal performance on WoA 1 year ago
  Martin Kroeker 60d0be0e97 Update nrm2.c 1 year ago
  Martin Kroeker 0fd5448b2c Handle INCX=0 1 year ago
  Martin Kroeker db7e5f1fa7 Update gemmt.c 1 year ago
  Martin Kroeker ff30ac9666 Update Makefile 1 year ago
  Martin Kroeker 7c3e169b67 Update gemmt.c 1 year ago
  Martin Kroeker 09414a4187 Ensure that GEMMTR name appears in XERBLA if gemmt was called as such 1 year ago
  Marek Michalowski 838bb57e27 Merge branch 'develop' into develop 1 year ago
  Martin Kroeker a54f9a9c69 Merge pull request #5071 from annop-w/sgemm_throttling 1 year ago
  Marek Michalowski 4d5b13f765 Add thread throttling profile for SGEMV on `NEOVERSEV1` 1 year ago
  tingbo.liao 3c8df6358f Further rearranged the rotm kernel for the different architectures. 1 year ago
  gxw e114880dc4 kernel/generic: Fixed cscal and zscal 1 year ago
  Annop Wongwathanarat c8cd8da496 Add thread throttling profile for SGEMM on NEOVERSEV1 1 year ago
  Martin Kroeker a1075477c3 Merge pull request #4994 from martin-frbg/issue4886 1 year ago