9556 Commits (ef8a44d981fad2673bf32743c4153559949e31fb)
 

Author SHA1 Message Date
  Martin Kroeker ef8a44d981 Merge 2b5d8c789d into 06c09deee9 5 months ago
  Martin Kroeker 06c09deee9 Merge pull request #5426 from hideaki-motoki/issue5417_axpy_sve 5 months ago
  Martin Kroeker da7d0f4a38 Merge pull request #5427 from yuanjia111/develop 5 months ago
  Martin Kroeker 2b5d8c789d remove debugging printout 5 months ago
  Martin Kroeker 1b88c9c742 remove debugging printouts 5 months ago
  Martin Kroeker b4fc09e9e1 Add registers d8 to d15 to clobber lists as the code does not expressly save them 5 months ago
  Martin Kroeker 8e50b8d525 Add d8 to d15 to clobber lists as the code does not expressly save them 5 months ago
  Martin Kroeker 7f89c6f353 smh-based direct sgemm currently requires leading dimensions to be same as matrix dimension 5 months ago
  yuanjia c2cc7a3602 riscv64: optimize gemv_t_vector.c 5 months ago
  h-motoki e23f9c6642 Merge remote-tracking branch 'upstream/develop' into issue5417_axpy_sve 5 months ago
  Martin Kroeker b3f247ae5a Merge pull request #5425 from martin-frbg/fixup5389 5 months ago
  h-motoki 855945befb Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E 5 months ago
  Martin Kroeker 7c1839899e Increase assumed L2 sizes for RISCV X280 / ZVL256B and for SVE-capable ARM64 5 months ago
  Martin Kroeker 1ee8879c78 Add VORTEXM4 5 months ago
  Martin Kroeker edaa73fd24 Hide the local 2VLx2VL symbol as static is insufficient for this with gcc 5 months ago
  Martin Kroeker 501728a354 adjust register 20 accesses to 21 after moving x18 5 months ago
  Martin Kroeker 107c883c8a Update SME-related kernels 5 months ago
  Martin Kroeker 05dbb54362 Delete misplaced file 5 months ago
  Martin Kroeker 4609732e69 Relax version number requirement for AppleClang 5 months ago
  Martin Kroeker bf98e448eb Add VORTEXM4 to DYNAMIC_ARCH list 5 months ago
  Martin Kroeker 0bc19a1335 Update SME kernel details 5 months ago
  Martin Kroeker 426b5f23ed Add compiler options for VORTEXM4 5 months ago
  Martin Kroeker 4328c91e27 relax requirements in compiler SME capability check 5 months ago
  Martin Kroeker c794d0a4ce Add VORTEXM4 5 months ago
  Martin Kroeker a4f5fec46e Add compiler options for VORTEXM4 5 months ago
  Martin Kroeker ca542f319f Add VORTEXM4 5 months ago
  Martin Kroeker 18f9582f3e Add VORTEXM4 5 months ago
  Martin Kroeker 4e2a8c18e5 Split VORTEXM4 from VORTEX target due to SME support 5 months ago
  Martin Kroeker 30970460b8 Add VORTEXM4 target 5 months ago
  Martin Kroeker b0a00fbd62 Add minimal compiler flags for VORTEXM4 5 months ago
  Martin Kroeker ccfd0170fb Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list 5 months ago
  Martin Kroeker ef0b883dff Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker e76c39099a Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker 202a7a0e2a Separate VORTEXM4 from VORTEX and ARMV9SME 5 months ago
  Martin Kroeker de91afd2ae Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker 0203657f40 Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker e82bcd2740 Update ARM64 sgemm_direct object generation 5 months ago
  Martin Kroeker 731f4dd686 Add VORTEXM4 settings 5 months ago
  Martin Kroeker 53d3bb50cc Get symbol name from build system; change b.first to b.mi for AppleClang compatibility 5 months ago
  Martin Kroeker 08a00326a4 Build symbol name from build system variables 5 months ago
  Martin Kroeker 89898fc499 Add sgemm_direct_performant for switching between direct and regular kernels 5 months ago
  Martin Kroeker 22c6607db9 Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS 5 months ago
  Martin Kroeker ca22e28ca1 Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S 5 months ago
  Martin Kroeker 9c43301b6d Merge pull request #5421 from reibax-marcus/develop 5 months ago
  Martin Kroeker 9d6df1dd3e Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking 6 months ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 6 months ago
  Chip Kerchner 64401b4417 Disable vectorized packing for DGEMM - since it is slower than scalar. 6 months ago
  Martin Kroeker 5e43ba948c Merge pull request #5419 from Mousius/bgemm-optimisation 6 months ago
  Chip Kerchner c00afc86a6 Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 6 months ago
  Xabier Marquiegui 3a6b79c50f fix: broken cblas installation when using makefile based builds 6 months ago