2569 Commits (edaa73fd2423ca332c41fdf16966da8ddb1c5ca3)

Author SHA1 Message Date
  Martin Kroeker edaa73fd24 Hide the local 2VLx2VL symbol as static is insufficient for this with gcc 5 months ago
  Martin Kroeker 501728a354 adjust register 20 accesses to 21 after moving x18 5 months ago
  Martin Kroeker 107c883c8a Update SME-related kernels 5 months ago
  Martin Kroeker 05dbb54362 Delete misplaced file 5 months ago
  Martin Kroeker 0bc19a1335 Update SME kernel details 5 months ago
  Martin Kroeker ca542f319f Add VORTEXM4 5 months ago
  Martin Kroeker 0203657f40 Add sgemm_direct_performant for ARM64 5 months ago
  Martin Kroeker e82bcd2740 Update ARM64 sgemm_direct object generation 5 months ago
  Martin Kroeker 731f4dd686 Add VORTEXM4 settings 5 months ago
  Martin Kroeker 53d3bb50cc Get symbol name from build system; change b.first to b.mi for AppleClang compatibility 5 months ago
  Martin Kroeker 08a00326a4 Build symbol name from build system variables 5 months ago
  Martin Kroeker 89898fc499 Add sgemm_direct_performant for switching between direct and regular kernels 5 months ago
  Martin Kroeker 22c6607db9 Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS 5 months ago
  Martin Kroeker ca22e28ca1 Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S 5 months ago
  Martin Kroeker 9d6df1dd3e Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking 5 months ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 5 months ago
  Chip Kerchner 64401b4417 Disable vectorized packing for DGEMM - since it is slower than scalar. 5 months ago
  Chip Kerchner c00afc86a6 Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 5 months ago
  yuanjia 803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 5 months ago
  Chris Sidebottom 5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 5 months ago
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 5 months ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 6 months ago
  Martin Kroeker 0968dddf1a Merge pull request #5409 from martin-frbg/issue5372 6 months ago
  Martin Kroeker a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 6 months ago
  Chip Kerchner 72f082f31d Fix bad vector zero initializer and other compiler warnings for RISC-V. 6 months ago
  Martin Kroeker a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 6 months ago
  abhishek-fujitsu 0bc79da587 add neon header 6 months ago
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 6 months ago
  Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 6 months ago
  Martin Kroeker e2d941e9af Declare the "small" kernel static in addition to inline 6 months ago
  Martin Kroeker 8214700930 Declare the "small" kernel static in addition to inline 6 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 6 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 6 months ago
  Chris Sidebottom 947d7af4c9 Fix CMake references to bscal and bgemv 6 months ago
  Chris Sidebottom e105411460 Add infrastructure for bgemv/bscal 6 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 6 months ago
  Martin Kroeker 343830c26f Add BGEMM parameter tables 6 months ago
  Martin Kroeker ff614575c9 Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds 6 months ago
  Martin Kroeker 0e11537cab Merge pull request #5357 from Mousius/bgemm-init 6 months ago
  Chris Sidebottom 66d9185ebe Fix CMake support 6 months ago
  Martin Kroeker fd37406817 Merge branch 'develop' into optimized_gemv_n_1x3 6 months ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 7 months ago
  Iha, Taisei f7ad906b49 Performance improvements of [SD]DOT with loop-unrolling on A64FX 7 months ago
  Martin Kroeker d96daa220d Merge pull request #5290 from Srangrang/develop 7 months ago
  Martin Kroeker ee26caffb3 Merge pull request #5309 from davidz-ampere/dev-ampereone 7 months ago
  davidz-ampere aa90ab4142 Add support for Ampere AmpereOne processors 7 months ago
  Ian McInerney badef1d32e Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types 7 months ago
  Martin Kroeker 3318a2b904 override CDOT and ZDOT with the generic C kernel 7 months ago
  davidz-ampere 84730068af reduce duplicate kernel code 7 months ago