2576 Commits (ef8a44d981fad2673bf32743c4153559949e31fb)

Author SHA1 Message Date
  Martin Kroeker ef8a44d981 Merge 2b5d8c789d into 06c09deee9 8 months ago
  Martin Kroeker 06c09deee9 Merge pull request #5426 from hideaki-motoki/issue5417_axpy_sve 8 months ago
  Martin Kroeker 2b5d8c789d remove debugging printout 8 months ago
  Martin Kroeker b4fc09e9e1 Add registers d8 to d15 to clobber lists as the code does not expressly save them 8 months ago
  Martin Kroeker 8e50b8d525 Add d8 to d15 to clobber lists as the code does not expressly save them 8 months ago
  yuanjia c2cc7a3602 riscv64: optimize gemv_t_vector.c 8 months ago
  h-motoki 855945befb Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E 8 months ago
  Martin Kroeker edaa73fd24 Hide the local 2VLx2VL symbol as static is insufficient for this with gcc 8 months ago
  Martin Kroeker 501728a354 adjust register 20 accesses to 21 after moving x18 8 months ago
  Martin Kroeker 107c883c8a Update SME-related kernels 8 months ago
  Martin Kroeker 05dbb54362 Delete misplaced file 8 months ago
  Martin Kroeker 0bc19a1335 Update SME kernel details 8 months ago
  Martin Kroeker ca542f319f Add VORTEXM4 8 months ago
  Martin Kroeker 0203657f40 Add sgemm_direct_performant for ARM64 8 months ago
  Martin Kroeker e82bcd2740 Update ARM64 sgemm_direct object generation 8 months ago
  Martin Kroeker 731f4dd686 Add VORTEXM4 settings 8 months ago
  Martin Kroeker 53d3bb50cc Get symbol name from build system; change b.first to b.mi for AppleClang compatibility 8 months ago
  Martin Kroeker 08a00326a4 Build symbol name from build system variables 8 months ago
  Martin Kroeker 89898fc499 Add sgemm_direct_performant for switching between direct and regular kernels 8 months ago
  Martin Kroeker 22c6607db9 Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS 8 months ago
  Martin Kroeker ca22e28ca1 Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S 8 months ago
  Martin Kroeker 9d6df1dd3e Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking 9 months ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 9 months ago
  Chip Kerchner 64401b4417 Disable vectorized packing for DGEMM - since it is slower than scalar. 9 months ago
  Chip Kerchner c00afc86a6 Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 9 months ago
  yuanjia 803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 9 months ago
  Chris Sidebottom 5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 9 months ago
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 9 months ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 9 months ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 9 months ago
  Martin Kroeker 0968dddf1a Merge pull request #5409 from martin-frbg/issue5372 9 months ago
  Martin Kroeker a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 9 months ago
  Chip Kerchner 72f082f31d Fix bad vector zero initializer and other compiler warnings for RISC-V. 9 months ago
  Martin Kroeker a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 9 months ago
  abhishek-fujitsu 0bc79da587 add neon header 9 months ago
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 9 months ago
  Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 9 months ago
  Martin Kroeker e2d941e9af Declare the "small" kernel static in addition to inline 9 months ago
  Martin Kroeker 8214700930 Declare the "small" kernel static in addition to inline 9 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 9 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 10 months ago
  Chris Sidebottom 947d7af4c9 Fix CMake references to bscal and bgemv 10 months ago
  Chris Sidebottom e105411460 Add infrastructure for bgemv/bscal 10 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 10 months ago
  Martin Kroeker 343830c26f Add BGEMM parameter tables 10 months ago
  Martin Kroeker ff614575c9 Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds 10 months ago
  Martin Kroeker 0e11537cab Merge pull request #5357 from Mousius/bgemm-init 10 months ago
  Chris Sidebottom 66d9185ebe Fix CMake support 10 months ago
  Martin Kroeker fd37406817 Merge branch 'develop' into optimized_gemv_n_1x3 10 months ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 10 months ago