9543 Commits (1ee8879c787c19b6a6c092def2016e76e93ffefd)
 

Author SHA1 Message Date
  Martin Kroeker 1ee8879c78 Add VORTEXM4 10 months ago
  Martin Kroeker edaa73fd24 Hide the local 2VLx2VL symbol as static is insufficient for this with gcc 10 months ago
  Martin Kroeker 501728a354 adjust register 20 accesses to 21 after moving x18 10 months ago
  Martin Kroeker 107c883c8a Update SME-related kernels 10 months ago
  Martin Kroeker 05dbb54362 Delete misplaced file 10 months ago
  Martin Kroeker 4609732e69 Relax version number requirement for AppleClang 10 months ago
  Martin Kroeker bf98e448eb Add VORTEXM4 to DYNAMIC_ARCH list 10 months ago
  Martin Kroeker 0bc19a1335 Update SME kernel details 10 months ago
  Martin Kroeker 426b5f23ed Add compiler options for VORTEXM4 10 months ago
  Martin Kroeker 4328c91e27 relax requirements in compiler SME capability check 10 months ago
  Martin Kroeker c794d0a4ce Add VORTEXM4 10 months ago
  Martin Kroeker a4f5fec46e Add compiler options for VORTEXM4 10 months ago
  Martin Kroeker ca542f319f Add VORTEXM4 10 months ago
  Martin Kroeker 18f9582f3e Add VORTEXM4 10 months ago
  Martin Kroeker 4e2a8c18e5 Split VORTEXM4 from VORTEX target due to SME support 10 months ago
  Martin Kroeker 30970460b8 Add VORTEXM4 target 10 months ago
  Martin Kroeker b0a00fbd62 Add minimal compiler flags for VORTEXM4 10 months ago
  Martin Kroeker ccfd0170fb Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list 10 months ago
  Martin Kroeker ef0b883dff Add sgemm_direct_performant for ARM64 10 months ago
  Martin Kroeker e76c39099a Add sgemm_direct_performant for ARM64 10 months ago
  Martin Kroeker 202a7a0e2a Separate VORTEXM4 from VORTEX and ARMV9SME 10 months ago
  Martin Kroeker de91afd2ae Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direct_performant for ARM64 10 months ago
  Martin Kroeker 0203657f40 Add sgemm_direct_performant for ARM64 10 months ago
  Martin Kroeker e82bcd2740 Update ARM64 sgemm_direct object generation 10 months ago
  Martin Kroeker 731f4dd686 Add VORTEXM4 settings 10 months ago
  Martin Kroeker 53d3bb50cc Get symbol name from build system; change b.first to b.mi for AppleClang compatibility 10 months ago
  Martin Kroeker 08a00326a4 Build symbol name from build system variables 10 months ago
  Martin Kroeker 89898fc499 Add sgemm_direct_performant for switching between direct and regular kernels 10 months ago
  Martin Kroeker 22c6607db9 Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS 10 months ago
  Martin Kroeker ca22e28ca1 Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S 10 months ago
  Martin Kroeker 9c43301b6d Merge pull request #5421 from reibax-marcus/develop 10 months ago
  Martin Kroeker 9d6df1dd3e Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking 10 months ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 10 months ago
  Chip Kerchner 64401b4417 Disable vectorized packing for DGEMM - since it is slower than scalar. 10 months ago
  Martin Kroeker 5e43ba948c Merge pull request #5419 from Mousius/bgemm-optimisation 10 months ago
  Chip Kerchner c00afc86a6 Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 10 months ago
  Xabier Marquiegui 3a6b79c50f fix: broken cblas installation when using makefile based builds 10 months ago
  yuanjia 803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 10 months ago
  Chris Sidebottom 5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 10 months ago
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 10 months ago
  Martin Kroeker 75c6ab4036 CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411) 10 months ago
  Martin Kroeker 5c5f852ee3 Merge pull request #5415 from martin-frbg/Fixum-5399 11 months ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 11 months ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 11 months ago
  Martin Kroeker d23680b81d Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1 11 months ago
  Martin Kroeker b4cc4be2ce Merge pull request #5410 from martin-frbg/issue5404 11 months ago
  Martin Kroeker 0968dddf1a Merge pull request #5409 from martin-frbg/issue5372 11 months ago
  Martin Kroeker eddfe1e6b3 Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings 11 months ago
  Martin Kroeker 30d11bc92c Adjust multithreading threshold and add an intermediate step 11 months ago
  Martin Kroeker a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 11 months ago