Zhang Xianyi | ff41e13385 | Merge pull request #1074 from ashwinyes/develop_20170116_thunderx2t99_sgemm | 9 years ago
    Add more THUNDERX2T99 Optimized APIs
Ashwin Sekhar T K | ee6ea7e988 | THUNDERX2T99: Add Optimized CNRM2 Implementation | 9 years ago
Ashwin Sekhar T K | ca0b36b012 | THUNDERX2T99: Add Optimized SNRM2 Implementation | 9 years ago
Ashwin Sekhar T K | d0a79ca6e0 | THUNDERX2T99: Add threaded DDOT Implementation | 9 years ago
Ashwin Sekhar T K | 0c07003ccf | THUNDERX2T99: Add Optimized DDOT Implementation | 9 years ago
Ashwin Sekhar T K | f33fcedb30 | THUNDERX2T99: Improve SGEMM | 9 years ago
Ashwin Sekhar T K | 0f1d6e8b39 | THUNDERX2T99: Improve DGEMM | 9 years ago
Ashwin Sekhar T K | 981064acc6 | THUNDERX2T99: Add Optimized DAXPY Implementation | 9 years ago
Shivraj Patil | a4d97d980f | Added rot functions. | 9 years ago
    Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Ashwin Sekhar T K | f279ff4789 | THUNDERX2T99: Add Optimized SGEMM Implementation | 9 years ago
Ashwin Sekhar T K | 759f37feba | ARM64: Let target VULCAN inherit THUNDERX2T99 properties | 9 years ago
Zhang Xianyi | 0863a0d4b4 | Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch | 9 years ago
    Add new targets for ARM64
Werner Saar | 28e2fab33e | prepared kernel/setparam-ref.c for UNROLL values that are not a power of two | 9 years ago
Ashwin Sekhar T K | 4b55fae337 | ARM64: Add Cavium THUNDERX2T99 Target | 9 years ago
Andrew Pinski | 95649dee28 | THUNDERX: Add optimized version of daxpy | 10 years ago
    This is better for single core but does not change anything for multiple cores
Andrew Pinski | 8fdb0655e9 | THUNDERX: Add an optimized version of ddot | 10 years ago
Andrew Pinski | fb200c7245 | ARM64: Add Cavium THUNDERX Target | 9 years ago
Ashwin Sekhar T K | 0b8e876d89 | VULCAN: Add optimized DGEMM implementation | 9 years ago
Ashwin Sekhar T K | 4713e7c47f | ARM64: Add the VULCAN Target | 9 years ago
Ashwin Sekhar T K | 6085386b10 | CORTEXA57: Add assembly kernels for copy routines | 9 years ago
kaustubh | 1480f3df71 | Add msa optimization for AXPY, COPY, SCALE, SWAP | 9 years ago
    Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
kaustubh | 88afb3bc94 | Add msa optimization for AXPY, COPY, SCALE, SWAP | 9 years ago
    Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
Zhang Xianyi | b678471d65 | Merge branch 'z13' into develop | 9 years ago
    Conflicts:
        CONTRIBUTORS.md
Zhang Xianyi | 864e202afd | Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3 | 9 years ago
Abdurrauf | 6418667818 | dtrmm and dgemm for z13 | 9 years ago
Shivraj Patil | a9bf8a781a | Added prefetch to CGEMV and ZGEMV. | 9 years ago
    Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
kaustubh | 5f93aa5f87 | Updated data prefetch in TRSM, ASUM, DOT functions | 9 years ago
    Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
kaustubh | 9db451acd0 | Updated data prefetch in TRSM, ASUM, DOT functions | 9 years ago
    Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
kaustubh | 3eaff85191 | Updated data prefetch in TRSM, ASUM, DOT functions | 9 years ago
    Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
kaustubh | 00abce3b93 | Add data prefetch in DOT and ASUM functions | 9 years ago
    Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
Andrew | becf8bc7a0 | remove dead code | 9 years ago
kaustubh | f3419e634c | SGEMM, DGEMM, CGEMM, ZGEMM functions data prefetch | 9 years ago
    Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
Zhang Xianyi | 7472c79ea6 | Merge pull request #984 from ksraste/develop | 9 years ago
    STRSM, DTRSM functions data prefetch
kaustubh | 90e2321ac3 | STRSM, DTRSM functions data prefetch | 9 years ago
    Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
Martin Kroeker | 4998e19869 | Change file comments to work around clang 3.9 assembler bug | 9 years ago
Martin Kroeker | 91610f3835 | Update zdot_msa.c | 9 years ago
Martin Kroeker | 6e22ecf102 | Update zdot.c | 9 years ago
Martin Kroeker | 6221d6df5f | Update zdot.c | 9 years ago
Martin Kroeker | 16446d1d23 | Remove explicit include of complex.h | 9 years ago
Martin Kroeker | a6e9e0b94b | Remove explicit include of complex.h | 9 years ago
Martin Kroeker | 3178e4fea0 | Remove explicit include of complex.h | 9 years ago
Martin Kroeker | 95c245ddb0 | Remove explicit include of complex.h | 9 years ago
Martin Kroeker | 4b1b27347f | Remove explicit include of complex.h | 9 years ago
Shivraj Patil | 54747fe24a | DGEMM function split and data prefetch | 9 years ago
    Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi | 515bc56ea9 | Refs #946. Use nrm2 reference implementation for Power8. | 9 years ago
Zhang Xianyi | ae70b916f4 | Refs #929. Deal with zero and NaNs for scale. | 9 years ago
Shivraj Patil | 9687437928 | MIPS n32 ABI and build time mips simd support check | 9 years ago
    Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Shivraj Patil | d1c6469283 | MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS | 9 years ago
    Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Ashwin Sekhar T K | c54a29bb48 | Cortex A57: Improvements to DGEMM 8x4 kernel | 9 years ago
Shivraj Patil | beb1d076a4 | Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions | 9 years ago
    Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>