261 Commits (a4e56e045266b6049e336ff27db184fc42ba7ff1)

Author SHA1 Message Date
  Martin Kroeker a4e56e0452 Merge pull request #4806 from Mousius/small-gemm 1 year ago
  yamazaki-mitsufumi 88caf02f62 Fix ambiguous error on Mac OS 1 year ago
  Chris Sidebottom ea4ab3b310 Better header guard around bridge 1 year ago
  Chris Sidebottom 7311d93016 Unroll TT further 1 year ago
  Chris Sidebottom a9edddb695 Unroll TN further 1 year ago
  Chris Sidebottom 9984c5ce9d Clean up k2 removal more and unroll SGEMM more 1 year ago
  Chris Sidebottom b1c9fafabb Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM 1 year ago
  iha fujitsu 0985fdc82b A64FX: Add support for SVE to SGEMV/DGEMV kernels. 1 year ago
  Martin Kroeker 3677b3886c Merge pull request #4702 from bashimao/detect-nv-grace 1 year ago
  Chris Sidebottom 8c472ef7e3 Further tweak small GEMM for AArch64 1 year ago
  Martin Kroeker a2ee4b1966 Merge branch 'OpenMathLib:develop' into issue4728 1 year ago
  Martin Kroeker 3ec59922b6 Add a clobber list to fix utest errors seen with gcc13 on Apple M 1 year ago
  Martin Kroeker 3d8054fb16 add clobber list 1 year ago
  Martin Kroeker c7cacd9b38 disable the shortcut for da=0 to ensure proper handling of INF and NAN 1 year ago
  Matthias Langer 0050a9660b Correctly detect ARM Neoverse V2 CPUs. 1 year ago
  Martin Kroeker 7cfd433d0c revert the C/Z NRM2 kernels to the base NEON kernel as well 1 year ago
  Martin Kroeker 441c81026e Add support for Cortex-A76 1 year ago
  Martin Kroeker 9ead81bd39 Revert S/DNRM2 to the base NEON kernel to fix precision loss 1 year ago
  Martin Kroeker 552c521353 remove another early exit for incx < 0 1 year ago
  Martin Kroeker ed532dc75b remove another early exit for incx < 0 1 year ago
  Martin Kroeker e41d01bad9 remove early exit on negative inc_x 1 year ago
  Martin Kroeker 02a025f9c1 remove early exit on negative inc_x 1 year ago
  Chris Sidebottom 7a6fa699f2 Small GEMM for AArch64 1 year ago
  Martin Kroeker 7d506984fa fix assignment of default CSUM kernel 1 year ago
  Martin Kroeker 12787775d9 add csum/zsum kernels (trivially derived from the asum ones) 1 year ago
  Martin Kroeker c9df62e883 Fix handling of NAN 2 years ago
  Chris Sidebottom ecae1389df Reduce duplication in kernel definitions 2 years ago
  Chris Sidebottom 60e66725e4 Use numeric labels to allow repeated inlining 2 years ago
  Chris Sidebottom 7a4fef4f60 Tweak SVE dot kernel 2 years ago
  Martin Kroeker 3bfa4d4dcc Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 2 years ago
  Martin Kroeker e7d05402e0 Fix up S/D GEMM copy function definitions after #4009 2 years ago
  Martin Kroeker fc8894dd98 Workaround miscompilation by NVIDIA nvc 2 years ago
  Martin Kroeker 5720fa02c5 Merge pull request #4168 from Mousius/sve-zgemm-cgemm 2 years ago
  Chris Sidebottom 84a268b6ca Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
  Chris Sidebottom 730ca04b48 Fix ZHEMM copy for SVE 2 years ago
  Martin Kroeker 849c8806b8 Merge pull request #4161 from Mousius/non-sve-kernels 2 years ago
  Chris Sidebottom 24586bc4ff Disambiguate whilelt 2 years ago
  Chris Sidebottom aea2a4622b Use latest non-SVE kernels in ARMV8SVE 2 years ago
  martin-frbg 7976deff80 Fix file permissions (issue 4095) 2 years ago
  Martin Kroeker 3d31191b0f Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
  Martin Kroeker 72caceb324 Merge pull request #4009 from Mousius/sve-gemm 2 years ago
  Chris Sidebottom ec334e69dc Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1 2 years ago
  Martin Kroeker 44164e3a3d revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw) 2 years ago
  Martin Kroeker 8be68fa7f4 move declaration of sca to really keep the compiler from throwing it out (for now) 2 years ago
  Martin Kroeker 3727672a74 Improve workaround and keep compilers from optimizing it out 2 years ago
  Martin Kroeker 108a21e47a Move ALPHA out of register 18 (reserved on OSX) 2 years ago
  Martin Kroeker 0b1acb0ba3 Move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
  Martin Kroeker c7bbad09ad Move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
  Martin Kroeker cda29633a3 move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
  Martin Kroeker 09ace3cf23 Merge pull request #3846 from lilh9598/sbgemm_opt 2 years ago