Martin Kroeker | 229d8a025e | Merge pull request #4959 from CDAC-Bengaluru/level-1-sve: SVE Implementation for Level-1 BLAS Routines | 1 year ago
SushilPratap04 | 3368a4e697 | Update swap_kernel_sve.c | 1 year ago
CDAC-SSDG | dd71e4234a | Added Updated swap and rot sve kernels. | 1 year ago
CDAC-SSDG | 06ffd411a5 | Update KERNEL.ARMV8SVE | 1 year ago
CDAC-SSDG | 765850194e | Delete kernel/arm64/swap_kernel_sve.c | 1 year ago
CDAC-SSDG | c17c19fbcf | Delete kernel/arm64/swap_kernel_c.c | 1 year ago
CDAC-SSDG | f6416c0e37 | Delete kernel/arm64/swap.c | 1 year ago
CDAC-SSDG | 3b7b74664c | Delete kernel/arm64/scal_kernel_sve.c | 1 year ago
CDAC-SSDG | 95a97012e8 | Delete kernel/arm64/scal_kernel_c.c | 1 year ago
CDAC-SSDG | 5540f2121e | Delete kernel/arm64/scal.c | 1 year ago
CDAC-SSDG | f62519cc87 | Delete kernel/arm64/rot_kernel_sve.c | 1 year ago
CDAC-SSDG | 10857c9df4 | Delete kernel/arm64/rot_kernel_c.c | 1 year ago
CDAC-SSDG | b9f51a5cf7 | Delete kernel/arm64/rot.c | 1 year ago
Martin Kroeker | 81666de4ef | Merge pull request #5007 from martin-frbg/issue5006: Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version | 1 year ago
Martin Kroeker | 3345007d8f | retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN | 1 year ago
Martin Kroeker | 5fe983db29 | retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies | 1 year ago
Iha, Taisei | 4918beecbe | Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1 | 1 year ago
Juliya32 | 3b2421cba0 | Add files via upload | 1 year ago
Juliya32 | 012fe4da36 | Delete kernel/arm64/rot_kernel_sve.c | 1 year ago
Juliya32 | d90ee00f85 | Delete kernel/arm64/rot_kernel_c.c | 1 year ago
Juliya32 | 668e28adc4 | Delete kernel/arm64/rot.c | 1 year ago
SushilPratap04 | fa880ab1cf | Update KERNEL.ARMV8SVE: updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels. | 1 year ago
SushilPratap04 | 7822ae9617 | Added sve kernels for rot routine. | 1 year ago
SushilPratap04 | b8bc2a752e | Added sve optimized kernels for swap routine | 1 year ago
CDAC-SSDG | 0667cf6c92 | Added optimized scal routine files | 1 year ago
Deeksha Goplani | 4894c54055 | Improve TN case with further unrolling | 1 year ago
Chris Sidebottom | ba2e989c67 | Add accumulators to AArch64 GEMV Kernels: This helps to reduce values going missing as we accumulate. | 1 year ago
Martin Kroeker | fb7c53c5e5 | Merge pull request #4807 from martin-frbg/scalfixes: [WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter | 1 year ago
Martin Kroeker | a4e56e0452 | Merge pull request #4806 from Mousius/small-gemm: Small GEMM for AArch64 with SVE | 1 year ago
yamazaki-mitsufumi | 88caf02f62 | Fix ambiguous error on Mac OS | 1 year ago
Chris Sidebottom | ea4ab3b310 | Better header guard around bridge | 1 year ago
Chris Sidebottom | 7311d93016 | Unroll TT further | 1 year ago
Chris Sidebottom | a9edddb695 | Unroll TN further | 1 year ago
Chris Sidebottom | 9984c5ce9d | Clean up k2 removal more and unroll SGEMM more | 1 year ago
Chris Sidebottom | b1c9fafabb | Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM | 1 year ago
Martin Kroeker | eb4879e04c | make NAN handling depend on the dummy2 parameter | 1 year ago
iha fujitsu | 0985fdc82b | A64FX: Add support for SVE to SGEMV/DGEMV kernels. | 1 year ago
Martin Kroeker | 3677b3886c | Merge pull request #4702 from bashimao/detect-nv-grace: Correctly detect ARM Neoverse V2 CPUs. | 1 year ago
Chris Sidebottom | 8c472ef7e3 | Further tweak small GEMM for AArch64 | 1 year ago
Martin Kroeker | a2ee4b1966 | Merge branch 'OpenMathLib:develop' into issue4728 | 1 year ago
Martin Kroeker | 3ec59922b6 | Add a clobber list to fix utest errors seen with gcc13 on Apple M | 1 year ago
Martin Kroeker | 3d8054fb16 | add clobber list | 1 year ago
Martin Kroeker | c7cacd9b38 | disable the shortcut for da=0 to ensure proper handling of INF and NAN | 1 year ago
Matthias Langer | 0050a9660b | Correctly detect ARM Neoverse V2 CPUs. | 1 year ago
Martin Kroeker | 7cfd433d0c | revert the C/Z NRM2 kernels to the base NEON kernel as well | 1 year ago
Martin Kroeker | 441c81026e | Add support for Cortex-A76 | 1 year ago
Martin Kroeker | 9ead81bd39 | Revert S/DNRM2 to the base NEON kernel to fix precision loss | 1 year ago
Martin Kroeker | 552c521353 | remove another early exit for incx < 0 | 1 year ago
Martin Kroeker | ed532dc75b | remove another early exit for incx < 0 | 1 year ago
Martin Kroeker | e41d01bad9 | remove early exit on negative inc_x | 1 year ago