Ashwin Sekhar T K
a0128aa489
ARM64: Convert all labels to local labels
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.
To avoid this, all the labels within the kernels are changed to local
labels.
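
As an illustration only (the commit does not say how the conversion was done), a rename of this kind could be scripted roughly as below. The sketch assumes GNU assembler conventions on ELF targets, where a symbol beginning with `.L` is treated as local and is not emitted into the object file's symbol table, so tools such as perf no longer see it as a separate function. The helper name `localize_labels`, the `keep` parameter, and the sample kernel are hypothetical.

```python
import re

# A label definition at the start of a line, e.g. "loop_start:".
LABEL_DEF = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*):", re.MULTILINE)


def localize_labels(asm, keep=frozenset()):
    """Prefix every label defined in `asm` with '.L', except names in `keep`.

    `keep` should hold the symbols that must stay visible, such as the
    kernel entry point itself.
    """
    labels = set(LABEL_DEF.findall(asm)) - set(keep)
    if not labels:
        return asm
    # Match whole identifiers only, and skip names already preceded by '.'.
    names = "|".join(re.escape(name) for name in sorted(labels))
    pattern = re.compile(r"(?<![.\w])(" + names + r")(?!\w)")
    return pattern.sub(lambda m: ".L" + m.group(1), asm)


# Hypothetical kernel fragment, not taken from the repository.
SAMPLE = """\
my_kernel:
    mov x3, #4
loop_start:
    subs x3, x3, #1
    bne loop_start
loop_end:
    ret
"""

converted = localize_labels(SAMPLE, keep={"my_kernel"})
print(converted)
```

The entry point is kept global so the kernel is still reported under a single name, while the internal branch targets and their references are renamed together.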
8 years ago
Ashwin Sekhar T K
4899d67f7d
THUNDERX2T99: Fix clang compilation
8 years ago
Ashwin Sekhar T K
67473d09dd
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
9 years ago
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
9 years ago
Ashwin Sekhar T K
a3935f0dfb
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
9 years ago
Ashwin Sekhar T K
738628e9a8
ARM64: Remove unused code
9 years ago
Ashwin Sekhar T K
ab3ffab96a
THUNDERX2T99: Add Optimized C/Z DOT Implementation
9 years ago
Ashwin Sekhar T K
f036be9ce2
THUNDERX2T99: Add Optimized SDOT Implementation
9 years ago
Ashwin Sekhar T K
faba876fda
THUNDERX2T99: Bug fix in C/Z IAMAX
9 years ago
Ashwin Sekhar T K
172a62d73e
THUNDERX2T99: Add Optimized C/Z IAMAX Implementation
9 years ago
Ashwin Sekhar T K
228c75a69c
THUNDERX2T99: Add parallel SCNRM2 Implementation
9 years ago
Ashwin Sekhar T K
8e89668f62
THUNDERX2T99: Fix bug in SNRM2
9 years ago
Ashwin Sekhar T K
f63deae9de
THUNDERX2T99: Add Optimized S/D IAMAX Implementation
9 years ago
Ashwin Sekhar T K
071a830e8b
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
9 years ago
Ashwin Sekhar T K
d09f88192c
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
9 years ago
Ashwin Sekhar T K
e58233460a
THUNDERX2T99: Add optimized D/C/Z ASUM Implementations
9 years ago
Ashwin Sekhar T K
99bd2892bf
THUNDERX2T99: Add optimized CASUM Implementation
9 years ago
Ashwin Sekhar T K
ff6f572f2e
THUNDERX2T99: Rename labels for DDOT and SNRM2
9 years ago
Ashwin Sekhar T K
e0dc5f58c5
THUNDERX2T99: Remove Duplicate Code
9 years ago
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
9 years ago
Ashwin Sekhar T K
907e286eb6
THUNDERX2T99: Add threaded SNRM2 Implementation
9 years ago
Ashwin Sekhar T K
cde3aee08b
ARM64: Rename kernel files to have consistent naming
9 years ago
Ashwin Sekhar T K
ee6ea7e988
THUNDERX2T99: Add Optimized CNRM2 Implementation
9 years ago
Ashwin Sekhar T K
ca0b36b012
THUNDERX2T99: Add Optimized SNRM2 Implementation
9 years ago
Ashwin Sekhar T K
d0a79ca6e0
THUNDERX2T99: Add threaded DDOT Implementation
9 years ago
Ashwin Sekhar T K
0c07003ccf
THUNDERX2T99: Add Optimized DDOT Implementation
9 years ago
Ashwin Sekhar T K
f33fcedb30
THUNDERX2T99: Improve SGEMM
9 years ago
Ashwin Sekhar T K
0f1d6e8b39
THUNDERX2T99: Improve DGEMM
9 years ago
Ashwin Sekhar T K
981064acc6
THUNDERX2T99: Add Optimized DAXPY Implementation
9 years ago
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
9 years ago
Ashwin Sekhar T K
759f37feba
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
9 years ago
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
9 years ago
Andrew Pinski
95649dee28
THUNDERX: Add optimized version of daxpy
This is faster on a single core but makes no difference on multiple cores.
10 years ago
Andrew Pinski
8fdb0655e9
THUNDERX: Add an optimized version of ddot
10 years ago
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
9 years ago
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
9 years ago
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
9 years ago
Ashwin Sekhar T K
6085386b10
CORTEXA57: Add assembly kernels for copy routines
9 years ago
Ashwin Sekhar T K
c54a29bb48
Cortex A57: Improvements to DGEMM 8x4 kernel
9 years ago
Ashwin Sekhar T K
0a5ff9f9f9
Improvements to TRMM and GEMM kernels
9 years ago
Ashwin Sekhar T K
8a40f1355e
Improvements to GEMV kernels
9 years ago
Ashwin Sekhar T K
78782485b6
Improvements to COPY and IAMAX kernels
9 years ago
Ashwin Sekhar T K
278511ad2d
Cortex-A57: Fix clang compilation errors
9 years ago
Ashwin Sekhar T K
3b5ffb49d3
Cortex-A57: Improve DGEMM 8x4 Implementation
10 years ago
Ashwin Sekhar T K
5ac02f6dc7
Optimize Dgemm 4x4 for Cortex A57
10 years ago
Ashwin Sekhar T K
7aa1ad4923
Functional Assembly Kernels for CortexA57
Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8
10 years ago
Zhang Xianyi
74b0672223
Fix c/zaxpyc kernel bug on Cortex-A57.
10 years ago
Ashwin Sekhar T K
318f0949c3
lapack-test fixes in nrm2 kernels for Cortex A57
10 years ago
Ashwin Sekhar T K
98965da2e8
lapack-test fixes for Cortex A57
10 years ago
Ashwin Sekhar T K
c99c43d51e
Optimized trmm kernels for CORTEXA57
10 years ago