CDAC-SSDG
95a97012e8
Delete kernel/arm64/scal_kernel_c.c
1 year ago
CDAC-SSDG
5540f2121e
Delete kernel/arm64/scal.c
1 year ago
CDAC-SSDG
f62519cc87
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
CDAC-SSDG
10857c9df4
Delete kernel/arm64/rot_kernel_c.c
1 year ago
CDAC-SSDG
b9f51a5cf7
Delete kernel/arm64/rot.c
1 year ago
Juliya32
3b2421cba0
Add files via upload
1 year ago
Juliya32
012fe4da36
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
Juliya32
d90ee00f85
Delete kernel/arm64/rot_kernel_c.c
1 year ago
Juliya32
668e28adc4
Delete kernel/arm64/rot.c
1 year ago
SushilPratap04
fa880ab1cf
Update KERNEL.ARMV8SVE
updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels.
1 year ago
SushilPratap04
7822ae9617
Added sve kernels for rot routine.
1 year ago
SushilPratap04
b8bc2a752e
Added sve optimized kernels for swap routine
1 year ago
CDAC-SSDG
0667cf6c92
Added optimized scal routine files
1 year ago
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
1 year ago
Chris Sidebottom
ba2e989c67
Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
1 year ago
Martin Kroeker
fb7c53c5e5
Merge pull request #4807 from martin-frbg/scalfixes
[WIP] Make NAN handling in the SCAL kernels depend on the dummy2 parameter
1 year ago
Martin Kroeker
a4e56e0452
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
1 year ago
yamazaki-mitsufumi
88caf02f62
Fix ambiguous error on Mac OS
1 year ago
Chris Sidebottom
ea4ab3b310
Better header guard around bridge
1 year ago
Chris Sidebottom
7311d93016
Unroll TT further
1 year ago
Chris Sidebottom
a9edddb695
Unroll TN further
1 year ago
Chris Sidebottom
9984c5ce9d
Clean up k2 removal more and unroll SGEMM more
1 year ago
Chris Sidebottom
b1c9fafabb
Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM
1 year ago
Martin Kroeker
eb4879e04c
make NAN handling depend on the dummy2 parameter
1 year ago
iha fujitsu
0985fdc82b
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
1 year ago
Martin Kroeker
3677b3886c
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
1 year ago
Chris Sidebottom
8c472ef7e3
Further tweak small GEMM for AArch64
1 year ago
Martin Kroeker
a2ee4b1966
Merge branch 'OpenMathLib:develop' into issue4728
1 year ago
Martin Kroeker
3ec59922b6
Add a clobber list to fix utest errors seen with gcc13 on Apple M
1 year ago
Martin Kroeker
3d8054fb16
add clobber list
1 year ago
Martin Kroeker
c7cacd9b38
disable the shortcut for da=0 to ensure proper handling of INF and NAN
1 year ago
Matthias Langer
0050a9660b
Correctly detect ARM Neoverse V2 CPUs.
1 year ago
Martin Kroeker
7cfd433d0c
revert the C/Z NRM2 kernels to the base NEON kernel as well
1 year ago
Martin Kroeker
441c81026e
Add support for Cortex-A76
1 year ago
Martin Kroeker
9ead81bd39
Revert S/DNRM2 to the base NEON kernel to fix precision loss
1 year ago
Martin Kroeker
552c521353
remove another early exit for incx < 0
1 year ago
Martin Kroeker
ed532dc75b
remove another early exit for incx < 0
1 year ago
Martin Kroeker
e41d01bad9
remove early exit on negative inc_x
1 year ago
Martin Kroeker
02a025f9c1
remove early exit on negative inc_x
1 year ago
Chris Sidebottom
7a6fa699f2
Small GEMM for AArch64
This is a fairly conservative addition of small matrix kernels using
SVE.
2 years ago
Martin Kroeker
7d506984fa
fix assignment of default CSUM kernel
1 year ago
Martin Kroeker
12787775d9
add csum/zsum kernels (trivially derived from the asum ones)
1 year ago
Martin Kroeker
c9df62e883
Fix handling of NAN
2 years ago
Chris Sidebottom
ecae1389df
Reduce duplication in kernel definitions
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2 years ago
Chris Sidebottom
60e66725e4
Use numeric labels to allow repeated inlining
2 years ago
Chris Sidebottom
7a4fef4f60
Tweak SVE dot kernel
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2 years ago
Martin Kroeker
3bfa4d4dcc
Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE
2 years ago
Martin Kroeker
e7d05402e0
Fix up S/D GEMM copy function definitions after #4009
2 years ago
Martin Kroeker
fc8894dd98
Workaround miscompilation by NVIDIA nvc
2 years ago
Martin Kroeker
5720fa02c5
Merge pull request #4168 from Mousius/sve-zgemm-cgemm
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2 years ago