9520 Commits (e82bcd27403b788bdacaaec56da839e5ccec5806)
 

Author SHA1 Message Date
  Martin Kroeker e82bcd2740 Update ARM64 sgemm_direct object generation 9 months ago
  Martin Kroeker 731f4dd686 Add VORTEXM4 settings 9 months ago
  Martin Kroeker 53d3bb50cc Get symbol name from build system; change b.first to b.mi for AppleClang compatibility 9 months ago
  Martin Kroeker 08a00326a4 Build symbol name from build system variables 9 months ago
  Martin Kroeker 89898fc499 Add sgemm_direct_performant for switching between direct and regular kernels 9 months ago
  Martin Kroeker 22c6607db9 Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS 9 months ago
  Martin Kroeker ca22e28ca1 Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S 9 months ago
  Martin Kroeker 9c43301b6d Merge pull request #5421 from reibax-marcus/develop 9 months ago
  Martin Kroeker 9d6df1dd3e Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking 9 months ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 9 months ago
  Chip Kerchner 64401b4417 Disable vectorized packing for DGEMM - since it is slower than scalar. 9 months ago
  Martin Kroeker 5e43ba948c Merge pull request #5419 from Mousius/bgemm-optimisation 9 months ago
  Chip Kerchner c00afc86a6 Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 9 months ago
  Xabier Marquiegui 3a6b79c50f fix: broken cblas installation when using makefile based builds 9 months ago
  yuanjia 803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 9 months ago
  Chris Sidebottom 5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 9 months ago
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 9 months ago
  Martin Kroeker 75c6ab4036 CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411) 9 months ago
  Martin Kroeker 5c5f852ee3 Merge pull request #5415 from martin-frbg/Fixum-5399 9 months ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 9 months ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 9 months ago
  Martin Kroeker d23680b81d Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1 9 months ago
  Martin Kroeker b4cc4be2ce Merge pull request #5410 from martin-frbg/issue5404 9 months ago
  Martin Kroeker 0968dddf1a Merge pull request #5409 from martin-frbg/issue5372 9 months ago
  Martin Kroeker eddfe1e6b3 Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings 9 months ago
  Martin Kroeker 30d11bc92c Adjust multithreading threshold and add an intermediate step 9 months ago
  Martin Kroeker a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 9 months ago
  Chip Kerchner 72f082f31d Fix bad vector zero initializer and other compiler warnings for RISC-V. 9 months ago
  Masato Nakagawa 7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 9 months ago
  Martin Kroeker 9a64b32b44 Merge pull request #5406 from martin-frbg/fixbgemmtest 9 months ago
  Martin Kroeker b66a01f909 Fix building of bgemm tests on GEMM3M-capable (x86) targets 9 months ago
  Martin Kroeker a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 9 months ago
  abhishek-fujitsu 6356190d06 fix gfortran link path in dynamic_arch.yml 9 months ago
  abhishek-fujitsu 4c8dcb3a8f Darwin/arm64: disable SVE/SME and fix gfortran link path 9 months ago
  Martin Kroeker 33b50548eb Merge pull request #5403 from martin-frbg/issue5402 9 months ago
  Martin Kroeker c504aedca1 Merge pull request #5400 from Mousius/neoversev2-target 9 months ago
  Martin Kroeker b9e107932a add NeoverseV2 9 months ago
  Martin Kroeker 2f89a5970e fix NeoverseV2 typo 9 months ago
  Martin Kroeker a9e8fa06bf Introduce a (crude) threshold to multithreading 9 months ago
  Martin Kroeker b4c2b34a45 Merge pull request #5401 from martin-frbg/followup-5397 9 months ago
  Martin Kroeker c9204f7b6f Merge pull request #5399 from Mousius/bgemm-8x4 9 months ago
  Martin Kroeker a55e65dba9 Merge pull request #5391 from martin-frbg/issue5387 9 months ago
  abhishek-fujitsu 0bc79da587 add neon header 9 months ago
  abhishek-fujitsu 720a4743b9 update contribution list 9 months ago
  abhishek-fujitsu 05fc88180c ARM64: Enable bfloat16 kernels by default 1 year ago
  Martin Kroeker 965463f177 Include float-bfloat conversion functions in ONLY_CBLAS builds as well 9 months ago
  Martin Kroeker 4272cf8c7f Merge pull request #5398 from martin-frbg/fixup-5394 9 months ago
  Chris Sidebottom 87247daadc Add NEOVERSEV2 target support 9 months ago
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 9 months ago
  Martin Kroeker a5b55f6fe3 remove CBLAS restriction on GEMM_GEMV forwarding 9 months ago