Martin Kroeker
69fa4eb701
Merge b8c0a1f7e2 into 39eecfd20c
9 years ago
Martin Kroeker
60eea75409
Merge pull request #1076 from ashwinyes/develop_20170130_thunderx2t99
More optimized implementations for ThunderX2T99
9 years ago
Martin Kroeker
b8c0a1f7e2
Fix register clobbers
Remove PIC registers and memory from the clobber list and add vector registers to it. This fixes accidental overwriting of callee-saved registers and fixes compilation with gcc7.
Copied from patch provided by Alan Modra in #1078
9 years ago
Ashwin Sekhar T K
d09f88192c
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
9 years ago
Ashwin Sekhar T K
e58233460a
THUNDERX2T99: Add optimized D/C/Z ASUM Implementations
9 years ago
Ashwin Sekhar T K
99bd2892bf
THUNDERX2T99: Add optimized CASUM Implementation
9 years ago
Ashwin Sekhar T K
ff6f572f2e
THUNDERX2T99: Rename labels for DDOT and SNRM2
9 years ago
Ashwin Sekhar T K
e0dc5f58c5
THUNDERX2T99: Remove Duplicate Code
9 years ago
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
9 years ago
Zhang Xianyi
ff41e13385
Merge pull request #1074 from ashwinyes/develop_20170116_thunderx2t99_sgemm
Add more THUNDERX2T99 Optimized APIs
9 years ago
Ashwin Sekhar T K
907e286eb6
THUNDERX2T99: Add threaded SNRM2 Implementation
9 years ago
Ashwin Sekhar T K
cde3aee08b
ARM64: Rename kernel files to have consistent naming
9 years ago
Ashwin Sekhar T K
ee6ea7e988
THUNDERX2T99: Add Optimized CNRM2 Implementation
9 years ago
Ashwin Sekhar T K
ca0b36b012
THUNDERX2T99: Add Optimized SNRM2 Implementation
9 years ago
Ashwin Sekhar T K
d0a79ca6e0
THUNDERX2T99: Add threaded DDOT Implementation
9 years ago
Ashwin Sekhar T K
0c07003ccf
THUNDERX2T99: Add Optimized DDOT Implementation
9 years ago
Ashwin Sekhar T K
f33fcedb30
THUNDERX2T99: Improve SGEMM
9 years ago
Ashwin Sekhar T K
0f1d6e8b39
THUNDERX2T99: Improve DGEMM
9 years ago
Ashwin Sekhar T K
981064acc6
THUNDERX2T99: Add Optimized DAXPY Implementation
9 years ago
Shivraj Patil
a4d97d980f
Added rot functions.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
9 years ago
Ashwin Sekhar T K
759f37feba
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
9 years ago
Zhang Xianyi
0863a0d4b4
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
Add new targets for ARM64
9 years ago
Werner Saar
28e2fab33e
Prepared kernel/setparam-ref.c for UNROLL values that are not a power of two
9 years ago
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
9 years ago
Andrew Pinski
95649dee28
THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
10 years ago
Andrew Pinski
8fdb0655e9
THUNDERX: Add an optimized version of ddot
10 years ago
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
9 years ago
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
9 years ago
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
9 years ago
Ashwin Sekhar T K
6085386b10
CORTEXA57: Add assembly kernels for copy routines
9 years ago
kaustubh
1480f3df71
Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
9 years ago
kaustubh
88afb3bc94
Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
9 years ago
Zhang Xianyi
b678471d65
Merge branch 'z13' into develop
Conflicts:
CONTRIBUTORS.md
9 years ago
Zhang Xianyi
864e202afd
Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3
9 years ago
Abdurrauf
6418667818
dtrmm and dgemm for z13
9 years ago
Shivraj Patil
a9bf8a781a
Added prefetch to CGEMV and ZGEMV.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
kaustubh
5f93aa5f87
Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
9 years ago
kaustubh
9db451acd0
Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
9 years ago
kaustubh
3eaff85191
Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
9 years ago
kaustubh
00abce3b93
Add data prefetch in DOT and ASUM functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
9 years ago
Andrew
becf8bc7a0
remove dead code
9 years ago
kaustubh
f3419e634c
SGEMM, DGEMM, CGEMM, ZGEMM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
9 years ago
Zhang Xianyi
7472c79ea6
Merge pull request #984 from ksraste/develop
STRSM, DTRSM functions data prefetch
9 years ago
kaustubh
90e2321ac3
STRSM, DTRSM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
9 years ago
Martin Kroeker
4998e19869
Change file comments to work around clang 3.9 assembler bug
9 years ago
Martin Kroeker
91610f3835
Update zdot_msa.c
9 years ago
Martin Kroeker
6e22ecf102
Update zdot.c
9 years ago
Martin Kroeker
6221d6df5f
Update zdot.c
9 years ago
Martin Kroeker
16446d1d23
Remove explicit include of complex.h
9 years ago