9543 Commits (1ee8879c787c19b6a6c092def2016e76e93ffefd)
 

Author SHA1 Message Date
  Martin Kroeker 1ee8879c78
Add VORTEXM4 5 months ago
  Martin Kroeker edaa73fd24
Hide the local 2VLx2VL symbol as static is insufficient for this with gcc 5 months ago
  Martin Kroeker 501728a354
adjust register 20 accesses to 21 after moving x18 5 months ago
  Martin Kroeker 107c883c8a
Update SME-related kernels 5 months ago
  Martin Kroeker 05dbb54362
Delete misplaced file 5 months ago
  Martin Kroeker 4609732e69
Relax version number requirement for AppleClang 5 months ago
  Martin Kroeker bf98e448eb
Add VORTEXM4 to DYNAMIC_ARCH list 5 months ago
  Martin Kroeker 0bc19a1335
Update SME kernel details 5 months ago
  Martin Kroeker 426b5f23ed
Add compiler options for VORTEXM4 5 months ago
  Martin Kroeker 4328c91e27
relax requirements in compiler SME capability check 5 months ago
  Martin Kroeker c794d0a4ce
Add VORTEXM4 5 months ago
  Martin Kroeker a4f5fec46e
Add compiler options for VORTEXM4 5 months ago
  Martin Kroeker ca542f319f
Add VORTEXM4 5 months ago
  Martin Kroeker 18f9582f3e
Add VORTEXM4 5 months ago
  Martin Kroeker 4e2a8c18e5
Split VORTEXM4 from VORTEX target due to SME support 5 months ago
  Martin Kroeker 30970460b8
Add VORTEXM4 target 5 months ago
  Martin Kroeker b0a00fbd62
Add minimal compiler flags for VORTEXM4 5 months ago
  Martin Kroeker ccfd0170fb
Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list 5 months ago
  Martin Kroeker ef0b883dff
Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker e76c39099a
Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker 202a7a0e2a
Separate VORTEXM4 from VORTEX and ARMV9SME 5 months ago
  Martin Kroeker de91afd2ae
Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker 0203657f40
Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker e82bcd2740
Update ARM64 sgemm_direct object generation 5 months ago
  Martin Kroeker 731f4dd686
Add VORTEXM4 settings 5 months ago
  Martin Kroeker 53d3bb50cc
Get symbol name from build system; change b.first to b.mi for AppleClang compatibility 5 months ago
  Martin Kroeker 08a00326a4
Build symbol name from build system variables 5 months ago
  Martin Kroeker 89898fc499
Add sgemm_direct_performant for switching between direct and regular kernels 5 months ago
  Martin Kroeker 22c6607db9
Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS 5 months ago
  Martin Kroeker ca22e28ca1
Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S 5 months ago
  Martin Kroeker 9c43301b6d
Merge pull request #5421 from reibax-marcus/develop 5 months ago
  Martin Kroeker 9d6df1dd3e
Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking 5 months ago
  Martin Kroeker f3b2a15fad
Merge pull request #5420 from yuanjia111/develop 5 months ago
  Chip Kerchner 64401b4417
Disable vectorized packing for DGEMM - since it is slower than scalar. 5 months ago
  Martin Kroeker 5e43ba948c
Merge pull request #5419 from Mousius/bgemm-optimisation 5 months ago
  Chip Kerchner c00afc86a6
Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 5 months ago
  Xabier Marquiegui 3a6b79c50f
fix: broken cblas installation when using makefile based builds 5 months ago
  yuanjia 803e8d4838
Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 5 months ago
  Chris Sidebottom 5f47b872f1
Remove older kernels for BGEMM on NEOVERSEV1 5 months ago
  Chris Sidebottom 114316f361
Optimize SBGEMM / BGEMM for NEOVERSEV1 further 5 months ago
  Martin Kroeker 75c6ab4036
CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411) 5 months ago
  Martin Kroeker 5c5f852ee3
Merge pull request #5415 from martin-frbg/Fixum-5399 5 months ago
  Martin Kroeker f1ee61ea30
Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker b3ffd5524a
Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker d23680b81d
Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1 6 months ago
  Martin Kroeker b4cc4be2ce
Merge pull request #5410 from martin-frbg/issue5404 6 months ago
  Martin Kroeker 0968dddf1a
Merge pull request #5409 from martin-frbg/issue5372 6 months ago
  Martin Kroeker eddfe1e6b3
Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings 6 months ago
  Martin Kroeker 30d11bc92c
Adjust multithreading threshold and add an intermediate step 6 months ago
  Martin Kroeker a3b9c933c5
mark xbuffer as volatile to work around gcc15.1 optimizer bug 6 months ago