Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
9 years ago
Ashwin Sekhar T K
907e286eb6
THUNDERX2T99: Add threaded SNRM2 Implementation
9 years ago
Ashwin Sekhar T K
cde3aee08b
ARM64: Rename kernel files to have consistent naming
9 years ago
Ashwin Sekhar T K
ee6ea7e988
THUNDERX2T99: Add Optimized CNRM2 Implementation
9 years ago
Ashwin Sekhar T K
ca0b36b012
THUNDERX2T99: Add Optimized SNRM2 Implementation
9 years ago
Ashwin Sekhar T K
d0a79ca6e0
THUNDERX2T99: Add threaded DDOT Implementation
9 years ago
Ashwin Sekhar T K
0c07003ccf
THUNDERX2T99: Add Optimized DDOT Implementation
9 years ago
Ashwin Sekhar T K
f33fcedb30
THUNDERX2T99: Improve SGEMM
9 years ago
Ashwin Sekhar T K
0f1d6e8b39
THUNDERX2T99: Improve DGEMM
9 years ago
Ashwin Sekhar T K
981064acc6
THUNDERX2T99: Add Optimized DAXPY Implementation
9 years ago
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
9 years ago
Ashwin Sekhar T K
759f37feba
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
9 years ago
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
9 years ago
Andrew Pinski
95649dee28
THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
11 years ago
Andrew Pinski
8fdb0655e9
THUNDERX: Add an optimized version of ddot
11 years ago
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
9 years ago
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
9 years ago
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
9 years ago
Ashwin Sekhar T K
6085386b10
CORTEXA57: Add assembly kernels for copy routines
9 years ago
Ashwin Sekhar T K
c54a29bb48
Cortex A57: Improvements to DGEMM 8x4 kernel
9 years ago
Ashwin Sekhar T K
0a5ff9f9f9
Improvements to TRMM and GEMM kernels
9 years ago
Ashwin Sekhar T K
8a40f1355e
Improvements to GEMV kernels
9 years ago
Ashwin Sekhar T K
78782485b6
Improvements to COPY and IAMAX kernels
9 years ago
Ashwin Sekhar T K
278511ad2d
Cortex-A57: Fix clang compilation errors
10 years ago
Ashwin Sekhar T K
3b5ffb49d3
Cortex-A57: Improve DGEMM 8x4 Implementation
10 years ago
Ashwin Sekhar T K
5ac02f6dc7
Optimize Dgemm 4x4 for Cortex A57
10 years ago
Ashwin Sekhar T K
7aa1ad4923
Functional Assembly Kernels for CortexA57
Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8
10 years ago
Zhang Xianyi
74b0672223
Fix c/zaxpyc kernel bug on Cortex-A57.
10 years ago
Ashwin Sekhar T K
318f0949c3
lapack-test fixes in nrm2 kernels for Cortex A57
10 years ago
Ashwin Sekhar T K
98965da2e8
lapack-test fixes for Cortex A57
10 years ago
Ashwin Sekhar T K
c99c43d51e
Optimized trmm kernels for CORTEXA57
10 years ago
Ashwin Sekhar T K
1397b47197
Optimized zgemm kernel for CORTEXA57
10 years ago
Ashwin Sekhar T K
45f78963ac
Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
10 years ago
Ashwin Sekhar T K
402443bf9c
Optimized dgemm kernel for CORTEXA57
10 years ago
Ashwin Sekhar T K
19fdbee291
Improve the sgemm kernel for CORTEXA57
10 years ago
Ashwin Sekhar T K
3b0cdfab1e
Optimized gemv kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
46efa6a1da
Optimized swap kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
ea1465cdf8
Optimized scal kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
fb4be3b3eb
Optimized rot kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
6c2f4ddbcd
Optimized nrm2 kernels for CORTEXA57
10 years ago
Ashwin Sekhar T K
870c4d49c0
Optimized dot kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
cd7684097c
Optimized copy kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
2690b71b1f
Optimized axpy kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
3e4acedf0e
Optimized asum kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
2610752dbb
Optimized iamax kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
dbb213655e
Optimized amax kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Ashwin Sekhar T K
f2f8a0fe8b
Adding arm64 target CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Zhang Xianyi
e5b96e55a7
Fix build bug for ARM64.
11 years ago
Benedikt Huber
58c90d5937
Optimizations for APM's xgene-1 (aarch64).
1) General system updates to support ARMv8 better. "make all" did not work; one had to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C. Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.
Added Dave Nuechterlein to the contributors list.
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
12 years ago