Chris Sidebottom
1361229291
Remove prefetches from SVE kernels
This is a precursor to enabling the SVE kernels for Arm(R) Neoverse(TM)
V1, which has 256-bit SVE. Testing revealed that the SVE kernel was
actually slower in some cases than the existing kernel, which seemed odd;
with these prefetches removed, the underlying hardware seems to do a better job
😸
3 years ago
Chris Sidebottom
eea006a688
Wrap SVE header with __has_include check
3 years ago
Chris Sidebottom
fd4f52c797
Add SVE implementation for sdot/ddot
This adds an SVE implementation of sdot/ddot, used when SVE is available; where no SVE implementation exists for a kernel, the previous Advanced SIMD kernel is used instead.
All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation, so I've renamed it to better fit the feature detection.
3 years ago
Chris Sidebottom
4f7b77e08a
Remove unnecessary instructions from Advanced SIMD dot
The existing kernel issued extra instructions to move the arguments into the registers they would normally occupy, and likewise to move the result into the appropriate return register.
This affects the performance of small dot products and seemed like a quick fix.
3 years ago
Martin Kroeker
1688c7da43
change line endings from CRLF to LF
3 years ago
Honglin Zhu
79066b6bf3
Rename file to match the naming convention and delete unused code.
3 years ago
Honglin Zhu
b00d5b9746
New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather-load and scatter-store instructions to reduce latency.
2. Pad k to a multiple of 4.
3 years ago
Martin Kroeker
e12d474780
Eliminate uses of CREAL on left-hand side of assignments
3 years ago
Martin Kroeker
9e29598575
work around fault with ssq=inf, scale=0
3 years ago
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
3 years ago
Honglin Zhu
bc3728475f
format code
3 years ago
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
implement ncopy, tcopy and kernel_8x4
3 years ago
Honglin Zhu
04593bb27c
neoverse n2 sbgemm: init file
3 years ago
Nursultan Zarlyk
1bb7993a97
Fix MSVC ARM64 build. Add generic kernel for ARM64
3 years ago
Martin Kroeker
115bc9b98f
CortexX1 is ARMV8, like A7x
3 years ago
Martin Kroeker
b3b4672c30
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
3 years ago
Martin Kroeker
c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
4 years ago
Bine Brank
19d435b1b3
update armv8sve + contributors
4 years ago
Bine Brank
0fb6cc07bf
fix ztrsm lt/ut copy
4 years ago
Bine Brank
f1315288a8
add sve ztrsm
4 years ago
Bine Brank
aaa2b1a861
fix sve dtrsm kernels
4 years ago
Bine Brank
8071e179f1
add remaining sve trsm copy kernels
4 years ago
Bine Brank
f87468ac91
trsm_lncopy_sve
4 years ago
Bine Brank
e8939b3d30
sve trsmRN and trsmRT
4 years ago
Bine Brank
098672b51b
add trsm_kernel_LT_sve
4 years ago
Bine Brank
be7e55880c
sve trsm_kernel_LN
4 years ago
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
4 years ago
Bine Brank
f33543d029
combine zchemm into single file
4 years ago
Bine Brank
39ab219704
sve copy functions for cgemm chemm zsymm
4 years ago
Bine Brank
18102ae8c3
add cgemm ctrmm sve kernels
4 years ago
Bine Brank
87537b8c55
modify sve zgemmcopy kernels
4 years ago
Bine Brank
d30157d891
update configuration of kernels for A64FX and ARMV8SVE
4 years ago
Bine Brank
2e2c02b762
fix sve ztrmm kernel
4 years ago
Bine Brank
68c414d3a6
ztrmm sve copy functions
4 years ago
Bine Brank
ce329ab686
add sve zhemm copy routines
4 years ago
Bine Brank
0140373802
add sve ztrmm
4 years ago
Bine Brank
f7b6912868
ztrmm sve copy kernels
4 years ago
Bine Brank
40b14e4957
fix zgemm kernel
4 years ago
Bine Brank
6ec4aab875
zgemm sve copy routines
4 years ago
Bine Brank
878064f394
sve zgemm kernel
4 years ago
Bine Brank
683a7548bf
added macros for sve zgemm kernels
4 years ago
Bine Brank
e3c9947c0f
prepare kernel for sve zgemm
4 years ago
Jia-Chen
b610d2de37
optimize cgemm on Arm Cortex-A53 & Cortex-A55
4 years ago
Bine Brank
a8f62a347b
fix UNROLL_MN and add to targets for SVE
4 years ago
Bine Brank
a1fea1fe2a
sgemm v2x8 SVE kernel
4 years ago
Bine Brank
abe1ce3434
strmm sve v1x8 kernel
4 years ago
Bine Brank
0de36f7b5c
trmm sve copy functions for single precision
4 years ago
Bine Brank
86ae89bf33
add sgemm kernel and copy functions for sgemm and ssymm
4 years ago
Martin Kroeker
454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
Add dgemm kernel for arm64 SVE
4 years ago
Jia-Chen
5c1cd5e0c2
MOD: add comments to a53 zgemm kernel
4 years ago