Martin Kroeker
|
a8bfe4f709
|
Expand gcc version check to include early 7.x
|
6 years ago |
Martin Kroeker
|
e607c67e5f
|
Expand gcc version check to include early 7.x
|
6 years ago |
Martin Kroeker
|
ae969f071d
|
Remove extraneous brace
|
6 years ago |
Martin Kroeker
|
857a1baa29
|
Remove extraneous brace
|
6 years ago |
Martin Kroeker
|
c446cc54e2
|
Remove extraneous brace
|
6 years ago |
Martin Kroeker
|
bda2a10033
|
Remove extraneous brace
|
6 years ago |
Martin Kroeker
|
1d02aebdfa
|
Work around optimizer bug in old gcc
|
6 years ago |
Martin Kroeker
|
5a262dc5e6
|
Work around optimizer bug in old gcc
|
6 years ago |
Martin Kroeker
|
520327aa11
|
Work around bug in old gcc
|
6 years ago |
Martin Kroeker
|
f195f7d2ea
|
Work around optimizer bug in old gcc
|
6 years ago |
Martin Kroeker
|
f3c314550c
|
Merge pull request #2243 from quickwritereader/develop
possible cgemv,caxpy,cdot fix
|
6 years ago |
AbdelRauf
|
847c20c9b7
|
fix uninitialized variables i
|
6 years ago |
AbdelRauf
|
4c22828812
|
caxpy and cdot are using vec_vsx_ld
|
6 years ago |
AbdelRauf
|
e79712d969
|
cgemv using vec_vsx_ld instead of letting gcc decide
|
6 years ago |
AbdelRauf
|
be09551cdf
|
aligned
|
6 years ago |
Martin Kroeker
|
11c59acfb1
|
Keep both PGI/SUN and default code paths to avoid breaking Clang/Windows
|
6 years ago |
Martin Kroeker
|
3a55dca2dc
|
Make x86_64 zdot compile with PGI and Sun C again
broken by #2222, as CREAL and CIMAG do not expand to a valid lvalue with these compilers
|
6 years ago |
Martin Kroeker
|
9ef96b32a6
|
Add multithreading support to the x86_64 zdot kernel (#2222)
* Add multithreading support
copied from the ThunderX2T99 kernel. For #2221
|
6 years ago |
Martin Kroeker
|
103b32fdb7
|
Merge pull request #2216 from martin-frbg/issue2214
Remove case-sensitivity in x86 LSAME on (AMD) CPUs without CMOV
|
6 years ago |
Martin Kroeker
|
aef9804089
|
Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV
The problem was already noticed some years ago in #238, but back then it was only corrected in one of the #ifdef branches.
Fixes #2214
|
6 years ago |
Martin Kroeker
|
dccff2e785
|
Merge pull request #2206 from martin-frbg/zen-dtrmm
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
|
6 years ago |
Martin Kroeker
|
5c3458a6e7
|
Merge pull request #2199 from martin-frbg/zen-dtrsm
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
|
6 years ago |
Martin Kroeker
|
acf6002ab2
|
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
|
6 years ago |
Martin Kroeker
|
2dfb804cb9
|
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
to improve performance on AMD Zen (#2180), applying wjc404's improvement of the DGEMM kernel from #2186
|
6 years ago |
Martin Kroeker
|
4c153ec9da
|
Merge pull request #2196 from wjc404/develop
Add vbroadcastsd kernel to dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
7eecd8e39c
|
Add files via upload
|
6 years ago |
Martin Kroeker
|
7b0b7c11d2
|
Merge pull request #2190 from martin-frbg/zdot-zen
Replace vpermpd with vpermilpd in the Haswell/Zen zdot microkernel
|
6 years ago |
Martin Kroeker
|
28e96458e5
|
Replace vpermpd with vpermilpd
to improve performance on Zen/Zen2 (as demonstrated by wjc404 in #2180)
|
6 years ago |
wjc404
|
95fb98f556
|
Update dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
4801c6d36b
|
Update dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
9440fa607d
|
Add files via upload
|
6 years ago |
wjc404
|
94db259e5b
|
Add files via upload
|
6 years ago |
wjc404
|
f49f8047ac
|
Add files via upload
|
6 years ago |
wjc404
|
825777faab
|
Update dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
9c89757562
|
Add files via upload
|
6 years ago |
wjc404
|
9b04baeaee
|
Update dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
8a074b3965
|
Update dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
211ab03b14
|
Update dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
1733f927e6
|
Update dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
182b06d6ad
|
Update dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
7a9050d681
|
Update dgemm_kernel_4x8_haswell.S
|
6 years ago |
wjc404
|
0ba29fd262
|
Update dgemm_kernel_4x8_haswell.S for zen2
replaced a bunch of vpermpd instructions with vpermilpd and vperm2f128
|
6 years ago |
Martin Kroeker
|
6b6c9b1441
|
Merge pull request #2172 from quickwritereader/develop
power9 cgemm/ctrmm. new sgemm 8x16
|
6 years ago |
AbdelRauf
|
a97b301aaa
|
cgemm/ctrmm power9
|
7 years ago |
Piotr Kubaj
|
eebfeba768
|
Fix build on FreeBSD/powerpc64.
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
|
6 years ago |
kavanabhat
|
a575f1e4c7
|
Update dtrmm_kernel_16x4_power8.S
|
7 years ago |
AbdelRauf
|
cdbfb891da
|
new sgemm 8x16
|
7 years ago |
Martin Kroeker
|
a17cf36225
|
Merge pull request #2153 from quickwritereader/develop
improved power9 zgemm,sgemm
|
7 years ago |
AbdelRauf
|
148c4cc5fd
|
conflict resolve
|
7 years ago |
AbdelRauf
|
d0c3543c3f
|
power9 zgemm ztrmm optimized
|
7 years ago |