Martin Kroeker
a4e56e0452
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
1 year ago
yamazaki-mitsufumi
88caf02f62
Fix ambiguous error on Mac OS
1 year ago
Chris Sidebottom
ea4ab3b310
Better header guard around bridge
1 year ago
Chris Sidebottom
7311d93016
Unroll TT further
1 year ago
Chris Sidebottom
a9edddb695
Unroll TN further
1 year ago
Chris Sidebottom
9984c5ce9d
Clean up k2 removal more and unroll SGEMM more
1 year ago
Chris Sidebottom
b1c9fafabb
Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM
1 year ago
iha fujitsu
0985fdc82b
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
1 year ago
Martin Kroeker
3677b3886c
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
1 year ago
Chris Sidebottom
8c472ef7e3
Further tweak small GEMM for AArch64
1 year ago
Martin Kroeker
a2ee4b1966
Merge branch 'OpenMathLib:develop' into issue4728
1 year ago
Martin Kroeker
3ec59922b6
Add a clobber list to fix utest errors seen with gcc13 on Apple M
1 year ago
Martin Kroeker
3d8054fb16
add clobber list
1 year ago
Martin Kroeker
c7cacd9b38
disable the shortcut for da=0 to ensure proper handling of INF and NAN
1 year ago
Matthias Langer
0050a9660b
Correctly detect ARM Neoverse V2 CPUs.
1 year ago
Martin Kroeker
7cfd433d0c
revert the C/Z NRM2 kernels to the base NEON kernel as well
1 year ago
Martin Kroeker
441c81026e
Add support for Cortex-A76
1 year ago
Martin Kroeker
9ead81bd39
Revert S/DNRM2 to the base NEON kernel to fix precision loss
1 year ago
Martin Kroeker
552c521353
remove another early exit for incx < 0
1 year ago
Martin Kroeker
ed532dc75b
remove another early exit for incx < 0
1 year ago
Martin Kroeker
e41d01bad9
remove early exit on negative inc_x
1 year ago
Martin Kroeker
02a025f9c1
remove early exit on negative inc_x
1 year ago
Chris Sidebottom
7a6fa699f2
Small GEMM for AArch64
This is a fairly conservative addition of small matrix kernels using
SVE.
1 year ago
Martin Kroeker
7d506984fa
fix assignment of default CSUM kernel
1 year ago
Martin Kroeker
12787775d9
add csum/zsum kernels (trivially derived from the asum ones)
1 year ago
Martin Kroeker
c9df62e883
Fix handling of NAN
2 years ago
Chris Sidebottom
ecae1389df
Reduce duplication in kernel definitions
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2 years ago
Chris Sidebottom
60e66725e4
Use numeric labels to allow repeated inlining
2 years ago
Chris Sidebottom
7a4fef4f60
Tweak SVE dot kernel
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2 years ago
Martin Kroeker
3bfa4d4dcc
Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE
2 years ago
Martin Kroeker
e7d05402e0
Fix up S/D GEMM copy function definitions after #4009
2 years ago
Martin Kroeker
fc8894dd98
Workaround miscompilation by NVIDIA nvc
2 years ago
Martin Kroeker
5720fa02c5
Merge pull request #4168 from Mousius/sve-zgemm-cgemm
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2 years ago
Chris Sidebottom
84a268b6ca
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm, which improves performance much as it did for sgemm/dgemm in #3868, so I'm happy to enable this on any applicable cores.
I also replicated the copy unrolling from sgemm and dgemm.
2 years ago
Chris Sidebottom
730ca04b48
Fix ZHEMM copy for SVE
Whilst disambiguating whilelt, I inadvertently used the wrong datatype
for offsets, which can be negative. This rectifies that.
2 years ago
Martin Kroeker
849c8806b8
Merge pull request #4161 from Mousius/non-sve-kernels
Use latest non-SVE kernels in ARMV8SVE
2 years ago
Chris Sidebottom
24586bc4ff
Disambiguate whilelt
2 years ago
Chris Sidebottom
aea2a4622b
Use latest non-SVE kernels in ARMV8SVE
These are generally better and, in some cases, include threading, which helps on the cores we're targeting here.
2 years ago
martin-frbg
7976deff80
Fix file permissions (issue 4095)
2 years ago
Martin Kroeker
3d31191b0f
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
* Add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH
* add casts to disambiguate svwhilelt for clang
2 years ago
Martin Kroeker
72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2 years ago
Chris Sidebottom
ec334e69dc
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.
After #3868, the SVE kernels represent a pretty good boost.
This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2 years ago
Martin Kroeker
44164e3a3d
revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw)
2 years ago
Martin Kroeker
8be68fa7f4
move declaration of sca to really keep the compiler from throwing it out (for now)
2 years ago
Martin Kroeker
3727672a74
Improve workaround and keep compilers from optimizing it out
2 years ago
Martin Kroeker
108a21e47a
Move ALPHA out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
0b1acb0ba3
Move ALPHA_I out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
c7bbad09ad
Move ALPHA_I out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
cda29633a3
move ALPHA_I out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
09ace3cf23
Merge pull request #3846 from lilh9598/sbgemm_opt
Improve the performance of sbgemm_tcopy on neoversen2
2 years ago