Ashwin Sekhar T K
d5aeff636f
ARM64: Enable DYNAMIC_ARCH
Enable the DYNAMIC_ARCH feature on ARM64. This patch uses the CPUID
feature of the Linux kernel to detect the core type at runtime
(https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt).
If this feature is missing in the kernel, the user should set the
OPENBLAS_CORETYPE environment variable to select the desired core type.
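The detection described above can be sketched roughly as follows. This is an illustration only, not the code from the patch: the function names `core_from_midr`, `read_midr`, and `select_core` are hypothetical, the table covers just three cores named in this log, and checking OPENBLAS_CORETYPE before probing the hardware is an assumption about ordering. The sketch relies on the kernel advertising `HWCAP_CPUID`, in which case a userspace read of `MIDR_EL1` is emulated by the kernel.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11) /* arm64 hwcap bit for emulated ID register reads */
#endif
#endif

/* Hypothetical helper: map a MIDR_EL1 value to a core-type name.
 * Implementer is bits [31:24], part number is bits [15:4]. */
const char *core_from_midr(uint64_t midr) {
    unsigned implementer = (unsigned)((midr >> 24) & 0xff);
    unsigned part = (unsigned)((midr >> 4) & 0xfff);

    if (implementer == 0x41 && part == 0xd07) return "CORTEXA57";    /* ARM Ltd. */
    if (implementer == 0x43 && part == 0x0a1) return "THUNDERX";     /* Cavium */
    if (implementer == 0x43 && part == 0x0af) return "THUNDERX2T99"; /* Cavium */
    return "ARMV8"; /* unknown core: fall back to the generic target */
}

/* Returns 1 and stores MIDR_EL1 in *midr if the kernel exposes it, else 0. */
int read_midr(uint64_t *midr) {
#if defined(__aarch64__) && defined(__linux__)
    if (getauxval(AT_HWCAP) & HWCAP_CPUID) {
        uint64_t value;
        __asm__ volatile("mrs %0, MIDR_EL1" : "=r"(value));
        *midr = value;
        return 1;
    }
#endif
    (void)midr; /* not ARM64 Linux, or the kernel lacks the feature */
    return 0;
}

/* Assumed ordering: an explicit OPENBLAS_CORETYPE wins over detection. */
const char *select_core(void) {
    const char *forced = getenv("OPENBLAS_CORETYPE");
    uint64_t midr;

    if (forced != NULL && forced[0] != '\0') return forced;
    if (read_midr(&midr)) return core_from_midr(midr);
    return "ARMV8";
}
```

On a kernel without the feature, `read_midr` returns 0 and the sketch falls back to the generic target unless the environment variable is set, for example `OPENBLAS_CORETYPE=THUNDERX2T99 ./app`.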
7 years ago
Ashwin Sekhar T K
d50abc8903
ARM64: Move parameters from parameter.c to param.h
Remove the runtime setting of P, Q, R parameters for
targets ARMV8, THUNDERX2T99. Instead set them as constants
in param.h at compile time.
7 years ago
Ashwin Sekhar T K
351a0c777c
ARM64: Remove XGENE1 references
Remove the XGENE1 target, as its implementation is
incomplete. Anyone who wishes to run OpenBLAS on XGENE1
can use the generic ARMV8 target, since there are no
XGENE1-specific optimizations in OpenBLAS.
7 years ago
Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the NEON implementations
written for the THUNDERX2T99 target, which are up to 6x faster for
certain routines.
7 years ago
Ashwin Sekhar T K
caf339412f
ARM64: Remove dependency of THUNDERX2T99 Makefile on CORTEXA57 Makefile
7 years ago
Ashwin Sekhar T K
8001fdcd2a
ARM64: Remove dependency of THUNDERX Makefile on ARMV8 Makefile
7 years ago
Ashwin Sekhar T K
162e312832
ARM64: Remove dependency of CORTEXA57 Makefile on ARMV8 Makefile
7 years ago
Ashwin Sekhar T K
c3d93caa8d
ARM64: Remove dependency of XGENE1 Makefile on ARMV8 Makefile
7 years ago
Martin Kroeker
1cb7b9015e
Conditional compilation of assembly files that iOS does not like
7 years ago
Martin Kroeker
a4bd41e9f2
Fix paths to C kernels for nrm2
7 years ago
Craig Donner
c2545b0fd6
Fixed a few more unnecessary calls to num_cpu_avail.
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
7 years ago
Ashwin Sekhar T K
fa9ca65c0e
ARM64: Fix utest dsdot errors
8 years ago
Martin Kroeker
c9d408064a
Use dot.S also for DSDOT on CORTEXA57
8 years ago
Martin Kroeker
288d1a3f6e
Use dot.S also for DSDOT on ARMV8
8 years ago
Martin Kroeker
b47e6822aa
Enable most assembly kernels in the generic ARMV8 target
ref #1439
8 years ago
Ashwin Sekhar T K
a0128aa489
ARM64: Convert all labels to local labels
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.
To avoid this, all the labels within the kernels are changed to local
labels.
8 years ago
Ashwin Sekhar T K
4899d67f7d
THUNDERX2T99: Fix clang compilation
8 years ago
Ashwin Sekhar T K
67473d09dd
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
9 years ago
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
9 years ago
Ashwin Sekhar T K
a3935f0dfb
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
9 years ago
Ashwin Sekhar T K
738628e9a8
ARM64: Remove unused code
9 years ago
Ashwin Sekhar T K
ab3ffab96a
THUNDERX2T99: Add Optimized C/Z DOT Implementation
9 years ago
Ashwin Sekhar T K
f036be9ce2
THUNDERX2T99: Add Optimized SDOT Implementation
9 years ago
Ashwin Sekhar T K
faba876fda
THUNDERX2T99: Bug fix in C/Z IAMAX
9 years ago
Ashwin Sekhar T K
172a62d73e
THUNDERX2T99: Add Optimized C/Z IAMAX Implementation
9 years ago
Ashwin Sekhar T K
228c75a69c
THUNDERX2T99: Add parallel SCNRM2 Implementation
9 years ago
Ashwin Sekhar T K
8e89668f62
THUNDERX2T99: Fix bug in SNRM2
9 years ago
Ashwin Sekhar T K
f63deae9de
THUNDERX2T99: Add Optimized S/D IAMAX Implementation
9 years ago
Ashwin Sekhar T K
071a830e8b
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
9 years ago
Ashwin Sekhar T K
d09f88192c
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
9 years ago
Ashwin Sekhar T K
e58233460a
THUNDERX2T99: Add optimized D/C/Z ASUM Implementations
9 years ago
Ashwin Sekhar T K
99bd2892bf
THUNDERX2T99: Add optimized CASUM Implementation
9 years ago
Ashwin Sekhar T K
ff6f572f2e
THUNDERX2T99: Rename labels for DDOT and SNRM2
9 years ago
Ashwin Sekhar T K
e0dc5f58c5
THUNDERX2T99: Remove Duplicate Code
9 years ago
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
9 years ago
Ashwin Sekhar T K
907e286eb6
THUNDERX2T99: Add threaded SNRM2 Implementation
9 years ago
Ashwin Sekhar T K
cde3aee08b
ARM64: Rename kernel files to have consistent naming
9 years ago
Ashwin Sekhar T K
ee6ea7e988
THUNDERX2T99: Add Optimized CNRM2 Implementation
9 years ago
Ashwin Sekhar T K
ca0b36b012
THUNDERX2T99: Add Optimized SNRM2 Implementation
9 years ago
Ashwin Sekhar T K
d0a79ca6e0
THUNDERX2T99: Add threaded DDOT Implementation
9 years ago
Ashwin Sekhar T K
0c07003ccf
THUNDERX2T99: Add Optimized DDOT Implementation
9 years ago
Ashwin Sekhar T K
f33fcedb30
THUNDERX2T99: Improve SGEMM
9 years ago
Ashwin Sekhar T K
0f1d6e8b39
THUNDERX2T99: Improve DGEMM
9 years ago
Ashwin Sekhar T K
981064acc6
THUNDERX2T99: Add Optimized DAXPY Implementation
9 years ago
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
9 years ago
Ashwin Sekhar T K
759f37feba
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
9 years ago
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
9 years ago
Andrew Pinski
95649dee28
THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
10 years ago
Andrew Pinski
8fdb0655e9
THUNDERX: Add an optimized version of ddot
10 years ago
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
9 years ago