2541 Commits (a9e8fa06bf11f174f339151a08978bcb3d49b59e)

Author SHA1 Message Date
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 6 months ago
  Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 6 months ago
  Martin Kroeker e2d941e9af Declare the "small" kernel static in addition to inline 6 months ago
  Martin Kroeker 8214700930 Declare the "small" kernel static in addition to inline 6 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 6 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 6 months ago
  Chris Sidebottom 947d7af4c9 Fix CMake references to bscal and bgemv 6 months ago
  Chris Sidebottom e105411460 Add infrastructure for bgemv/bscal 6 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 6 months ago
  Martin Kroeker 343830c26f Add BGEMM parameter tables 6 months ago
  Martin Kroeker ff614575c9 Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds 6 months ago
  Martin Kroeker 0e11537cab Merge pull request #5357 from Mousius/bgemm-init 6 months ago
  Chris Sidebottom 66d9185ebe Fix CMake support 6 months ago
  Martin Kroeker fd37406817 Merge branch 'develop' into optimized_gemv_n_1x3 6 months ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 6 months ago
  Iha, Taisei f7ad906b49 Performance improvements of [SD]DOT with loop-unrolling on A64FX 6 months ago
  Martin Kroeker d96daa220d Merge pull request #5290 from Srangrang/develop 7 months ago
  Martin Kroeker ee26caffb3 Merge pull request #5309 from davidz-ampere/dev-ampereone 7 months ago
  davidz-ampere aa90ab4142 Add support for Ampere AmpereOne processors 7 months ago
  Ian McInerney badef1d32e Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types 7 months ago
  Martin Kroeker 3318a2b904 override CDOT and ZDOT with the generic C kernel 7 months ago
  davidz-ampere 84730068af reduce duplicate kernel code 7 months ago
  davidz-ampere be68ef03b4 Add support for Ampere processors 7 months ago
  Srangrang 9f13b2c6ac style: modify HALF to BFLOAT16 in benchmark folder 7 months ago
  Srangrang ec14e1648c fix: resolve non-RISCV host build failed issue 7 months ago
  Martin Kroeker e338d34ce1 fix path 7 months ago
  Martin Kroeker d36093d084 temporarily change default C/ZSCAL to the non-asm implementation 7 months ago
  Martin Kroeker b3c90564d7 resync with the generic arm version for inf/nan handling 7 months ago
  Martin Kroeker 6bdc7f9eb7 Merge pull request #5300 from martin-frbg/fixup5296 7 months ago
  Martin Kroeker 73af02b89f use dummy2 as Inf/NAN handling flag 7 months ago
  Martin Kroeker 549a9f1dbb Disable the default SSE kernels for CSCAL/ZSCAL for now 7 months ago
  Martin Kroeker 58eeb9041c fix handling of dummy2 7 months ago
  Martin Kroeker 7c77537b25 Merge pull request #5297 from martin-frbg/zscal_x86_sparc 7 months ago
  Martin Kroeker 63287e1855 Merge pull request #5296 from martin-frbg/zscal_riscv 7 months ago
  Martin Kroeker d2855d3dab Merge pull request #5285 from martin-frbg/zscal_zarch 7 months ago
  Martin Kroeker 1408be5fe0 Merge pull request #5282 from martin-frbg/zscal_power 7 months ago
  Martin Kroeker 1589d0b21e Merge pull request #5281 from martin-frbg/zscal_arm64 7 months ago
  Martin Kroeker a86419fb66 Merge pull request #5280 from martin-frbg/zscal_x86_64 7 months ago
  Martin Kroeker 11ff18bb0f Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal 7 months ago
  Martin Kroeker f4194fc65f Merge branch 'develop' into la64_fixed_cscal_zscal 7 months ago
  Martin Kroeker e12132abd4 Use generic C/ZSCAL kernels to address inf/nan handling for now 7 months ago
  Martin Kroeker 1cefbea7ea Use generic SCAL kernels to address inf/nan handling for now 7 months ago
  Sharif Inamdar 8279e68805 Optimize gemv_n_sve_v1x3 kernel 7 months ago
  Martin Kroeker f18b7a46bf add dummy2 flag handling for inf/nan agnostic zeroing 7 months ago
  Martin Kroeker fe220a0d7d Merge pull request #5291 from guoyuanplct/develop 7 months ago
  Arne Juul 5442aff218 Accumulate results in output register explicitly 7 months ago
  guoyuanplct 2ae019161a fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small 7 months ago
  Srangrang fb89820f20 Merge branch 'develop' of https://github.com/Srangrang/OpenBLAS into develop 7 months ago
  Srangrang 4e1a381e5b fix: resolve the compilation failure without zfh instruction 7 months ago
  gkdddd 670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B 8 months ago