Martin Kroeker
72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2 years ago
Chris Sidebottom
ec334e69dc
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.
After #3868, the SVE kernels represent a pretty good boost.
This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2 years ago
Martin Kroeker
44164e3a3d
revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw)
2 years ago
Martin Kroeker
8be68fa7f4
move declaration of sca to really keep the compiler from throwing it out (for now)
2 years ago
Martin Kroeker
3727672a74
Improve workaround and keep compilers from optimizing it out
2 years ago
Martin Kroeker
108a21e47a
Move ALPHA out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
0b1acb0ba3
Move ALPHA_I out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
c7bbad09ad
Move ALPHA_I out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
cda29633a3
move ALPHA_I out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
09ace3cf23
Merge pull request #3846 from lilh9598/sbgemm_opt
Improve the performance of sbgemm_tcopy on neoversen2
2 years ago
Chris Sidebottom
1361229291
Remove prefetches from SVE kernels
This is a precursor to enabling the SVE kernels for Arm(R) Neoverse(TM)
V1, which has 256-bit SVE. Testing revealed that the SVE kernel was
actually worse in some cases than the existing kernel, which seemed odd.
With these prefetches removed, the underlying architecture seems to do a better job.
😸
3 years ago
lilianhuang
729af6406f
bugfix for sbgemm_ncopy_8_neoversen2
3 years ago
Chris Sidebottom
eea006a688
Wrap SVE header with __has_include check
3 years ago
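
The `__has_include` guard named in commit eea006a688 can be sketched roughly as follows. This is a minimal illustration, not the actual change: the macro `HAVE_SVE_HEADER` and the helper `have_sve_header` are hypothetical names, and the extra `__ARM_FEATURE_SVE` test is an assumption added so the sketch also compiles on non-SVE targets.

```c
/* Only pull in <arm_sve.h> when the compiler targets SVE and the header exists.
   The nested #if keeps compilers without __has_include from choking on it. */
#if defined(__ARM_FEATURE_SVE) && defined(__has_include)
#  if __has_include(<arm_sve.h>)
#    include <arm_sve.h>
#    define HAVE_SVE_HEADER 1   /* hypothetical macro, for illustration only */
#  endif
#endif

#ifndef HAVE_SVE_HEADER
#  define HAVE_SVE_HEADER 0     /* header unavailable: callers use another kernel */
#endif

/* Hypothetical helper: reports at run time what was decided at compile time. */
int have_sve_header(void) {
    return HAVE_SVE_HEADER;
}
```

Code that depends on SVE intrinsics can then be wrapped in `#if HAVE_SVE_HEADER`, so a toolchain without the header still builds the rest of the file.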
Chris Sidebottom
fd4f52c797
Add SVE implementation for sdot/ddot
This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel.
All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.
3 years ago
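
The "SVE when available, otherwise fall back" structure described in commit fd4f52c797 might look something like the sketch below. It is an assumption-laden illustration, not the OpenBLAS kernel: the function name `sdot_sketch` is hypothetical, strides are omitted, and the fallback here is a plain scalar loop rather than the Advanced SIMD kernel the commit refers to.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* Hypothetical SVE path: vector-length-agnostic single-precision dot product. */
static float sdot_sketch(const float *x, const float *y, size_t n) {
    svfloat32_t acc = svdup_f32(0.0f);
    for (size_t i = 0; i < n; i += svcntw()) {
        /* Predicate is true only for lanes with index below n (handles the tail). */
        svbool_t pg = svwhilelt_b32((uint64_t)i, (uint64_t)n);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        /* acc += vx * vy on active lanes; inactive lanes keep their old value. */
        acc = svmla_f32_m(pg, acc, vx, vy);
    }
    /* Horizontal sum over all lanes of the accumulator. */
    return svaddv_f32(svptrue_b32(), acc);
}

#else

/* Fallback path: scalar loop standing in for the non-SVE kernel. */
static float sdot_sketch(const float *x, const float *y, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}

#endif
```

The selection happens at compile time through `__ARM_FEATURE_SVE`, so each build contains exactly one of the two definitions.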
lilianhuang
fdac8a97c1
Add sbgemm_ncopy_8 and sbgemm_tcopy_4
3 years ago
lilianhuang
135718eafc
Improve the performance of sbgemm_tcopy on neoversen2
3 years ago
Chris Sidebottom
4f7b77e08a
Remove unnecessary instructions from Advanced SIMD dot
The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and, similarly, to put the result into the appropriate register.
This has an impact on smaller dots and seemed like a quick fix.
3 years ago
Martin Kroeker
1688c7da43
change line endings from CRLF to LF
3 years ago
Honglin Zhu
79066b6bf3
Change file name to match the norm and delete useless code.
3 years ago
Honglin Zhu
b00d5b9746
New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a power of 4.
3 years ago
Martin Kroeker
e12d474780
Eliminate uses of CREAL on left-hand side of assignments
3 years ago
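
For context on commit e12d474780: standard C's `creal()` returns a value, not an lvalue, so it cannot portably appear on the left-hand side of an assignment. One portable alternative is to rebuild the complex value, as in the sketch below. The helper name `set_real` is hypothetical, and this is not necessarily the approach taken in the commit.

```c
#include <complex.h>

/* Some pre-C11 environments lack CMPLX; this fallback is adequate for finite values. */
#ifndef CMPLX
#define CMPLX(re, im) ((double complex)((double)(re) + _Complex_I * (double)(im)))
#endif

/* Hypothetical helper: return z with its real part replaced by re.
   Avoids the non-portable form `creal(z) = re;`. */
static double complex set_real(double complex z, double re) {
    return CMPLX(re, cimag(z));
}
```

A call such as `z = set_real(z, 3.0);` then replaces an assignment through a real-part macro.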
Martin Kroeker
9e29598575
workaround fault with ssq=inf,scale=0
3 years ago
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
3 years ago
Honglin Zhu
bc3728475f
format code
3 years ago
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
3 years ago
Honglin Zhu
04593bb27c
neoverse n2 sbgemm: init file
3 years ago
Nursultan Zarlyk
1bb7993a97
Fix MSVC ARM64 build. Add generic kernel for ARM64
3 years ago
Martin Kroeker
115bc9b98f
CortexX1 is ARMV8 like A7x
3 years ago
Martin Kroeker
b3b4672c30
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
3 years ago
Martin Kroeker
c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
4 years ago
Bine Brank
19d435b1b3
update armv8sve + contributors
4 years ago
Bine Brank
0fb6cc07bf
fix ztrsm lt/ut copy
4 years ago
Bine Brank
f1315288a8
add sve ztrsm
4 years ago
Bine Brank
aaa2b1a861
fix sve dtrsm kernels
4 years ago
Bine Brank
8071e179f1
add remaining sve trsm copy kernels
4 years ago
Bine Brank
f87468ac91
trsm_lncopy_sve
4 years ago
Bine Brank
e8939b3d30
sve trsmRN and trsmRT
4 years ago
Bine Brank
098672b51b
add trsm_kernel_LT_sve
4 years ago
Bine Brank
be7e55880c
sve trsm_kernel_LN
4 years ago
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
4 years ago
Bine Brank
f33543d029
combine zchemm into single file
4 years ago
Bine Brank
39ab219704
sve copy functions for cgemm chemm zsymm
4 years ago
Bine Brank
18102ae8c3
add cgemm ctrmm sve kernels
4 years ago
Bine Brank
87537b8c55
modify sve zgemmcopy kernels
4 years ago
Bine Brank
d30157d891
update configuration of kernels for A64FX and ARMV8SVE
4 years ago
Bine Brank
2e2c02b762
fix sve ztrmm kernel
4 years ago
Bine Brank
68c414d3a6
ztrmm sve copy functions
4 years ago
Bine Brank
ce329ab686
add sve zhemm copy routines
4 years ago
Bine Brank
0140373802
add sve ztrmm
4 years ago
Bine Brank
f7b6912868
ztrmm sve copy kernels
4 years ago