9527 Commits (b0a00fbd62e1d3d6c0be4a42201ddee3002df1ae)
 

Author SHA1 Message Date
  Martin Kroeker b0a00fbd62
Add minimal compiler flags for VORTEXM4 5 months ago
  Martin Kroeker ccfd0170fb
Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list 5 months ago
  Martin Kroeker ef0b883dff
Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker e76c39099a
Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker 202a7a0e2a
Separate VORTEXM4 from VORTEX and ARMV9SME 5 months ago
  Martin Kroeker de91afd2ae
Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker 0203657f40
Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker e82bcd2740
Update ARM64 sgemm_direct object generation 5 months ago
  Martin Kroeker 731f4dd686
Add VORTEXM4 settings 5 months ago
  Martin Kroeker 53d3bb50cc
Get symbol name from build system; change b.first to b.mi for AppleClang compatibility 5 months ago
  Martin Kroeker 08a00326a4
Build symbol name from build system variables 5 months ago
  Martin Kroeker 89898fc499
Add sgemm_direct_performant for switching between direct and regular kernels 5 months ago
  Martin Kroeker 22c6607db9
Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS 5 months ago
  Martin Kroeker ca22e28ca1
Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S 5 months ago
  Martin Kroeker 9c43301b6d
Merge pull request #5421 from reibax-marcus/develop 5 months ago
  Martin Kroeker 9d6df1dd3e
Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking 5 months ago
  Martin Kroeker f3b2a15fad
Merge pull request #5420 from yuanjia111/develop 6 months ago
  Chip Kerchner 64401b4417
Disable vectorized packing for DGEMM - since it is slower than scalar. 6 months ago
  Martin Kroeker 5e43ba948c
Merge pull request #5419 from Mousius/bgemm-optimisation 6 months ago
  Chip Kerchner c00afc86a6
Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 6 months ago
  Xabier Marquiegui 3a6b79c50f
fix: broken cblas installation when using makefile based builds 6 months ago
  yuanjia 803e8d4838
Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 6 months ago
  Chris Sidebottom 5f47b872f1
Remove older kernels for BGEMM on NEOVERSEV1 6 months ago
  Chris Sidebottom 114316f361
Optimize SBGEMM / BGEMM for NEOVERSEV1 further 6 months ago
  Martin Kroeker 75c6ab4036
CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411) 6 months ago
  Martin Kroeker 5c5f852ee3
Merge pull request #5415 from martin-frbg/Fixum-5399 6 months ago
  Martin Kroeker f1ee61ea30
Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker b3ffd5524a
Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker d23680b81d
Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1 6 months ago
  Martin Kroeker b4cc4be2ce
Merge pull request #5410 from martin-frbg/issue5404 6 months ago
  Martin Kroeker 0968dddf1a
Merge pull request #5409 from martin-frbg/issue5372 6 months ago
  Martin Kroeker eddfe1e6b3
Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings 6 months ago
  Martin Kroeker 30d11bc92c
Adjust multithreading threshold and add an intermediate step 6 months ago
  Martin Kroeker a3b9c933c5
mark xbuffer as volatile to work around gcc15.1 optimizer bug 6 months ago
  Chip Kerchner 72f082f31d
Fix bad vector zero initializer and other compiler warnings for RISC-V. 6 months ago
  Masato Nakagawa 7e29f11396
Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 6 months ago
  Martin Kroeker 9a64b32b44
Merge pull request #5406 from martin-frbg/fixbgemmtest 6 months ago
  Martin Kroeker b66a01f909
Fix building of bgemm tests on GEMM3M-capable (x86) targets 6 months ago
  Martin Kroeker a5e7c0e3e0
Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 6 months ago
  abhishek-fujitsu 6356190d06
fix gfortran link path in dynamic_arch.yml 6 months ago
  abhishek-fujitsu 4c8dcb3a8f
Darwin/arm64: disable SVE/SME and fix gfortran link path 6 months ago
  Martin Kroeker 33b50548eb
Merge pull request #5403 from martin-frbg/issue5402 6 months ago
  Martin Kroeker c504aedca1
Merge pull request #5400 from Mousius/neoversev2-target 6 months ago
  Martin Kroeker b9e107932a
add NeoverseV2 6 months ago
  Martin Kroeker 2f89a5970e
fix NeoverseV2 typo 6 months ago
  Martin Kroeker a9e8fa06bf
Introduce a (crude) threshold to multithreading 6 months ago
  Martin Kroeker b4c2b34a45
Merge pull request #5401 from martin-frbg/followup-5397 6 months ago
  Martin Kroeker c9204f7b6f
Merge pull request #5399 from Mousius/bgemm-8x4 6 months ago
  Martin Kroeker a55e65dba9
Merge pull request #5391 from martin-frbg/issue5387 6 months ago
  abhishek-fujitsu 0bc79da587
add neon header 6 months ago