Martin Kroeker | 5a262dc5e6 | 6 years ago | Work around optimizer bug in old gcc
Martin Kroeker | 520327aa11 | 6 years ago | Work around bug in old gcc
Martin Kroeker | f195f7d2ea | 6 years ago | Work around optimizer bug in old gcc
Martin Kroeker | f3c314550c | 6 years ago | Merge pull request #2243 from quickwritereader/develop
  possible cgemv, caxpy, cdot fix
AbdelRauf | 847c20c9b7 | 6 years ago | fix uninitialized variables i
AbdelRauf | 4c22828812 | 6 years ago | caxpy and cdot are using vec_vsx_ld
AbdelRauf | e79712d969 | 6 years ago | cgemv using vec_vsx_ld instead of letting gcc decide
AbdelRauf | be09551cdf | 6 years ago | aligned
Martin Kroeker | 6b6c9b1441 | 6 years ago | Merge pull request #2172 from quickwritereader/develop
  power9 cgemm/ctrmm. new sgemm 8x16
AbdelRauf | a97b301aaa | 7 years ago | cgemm/ctrmm power9
Piotr Kubaj | eebfeba768 | 7 years ago | Fix build on FreeBSD/powerpc64.
  Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
kavanabhat | a575f1e4c7 | 7 years ago | Update dtrmm_kernel_16x4_power8.S
AbdelRauf | cdbfb891da | 7 years ago | new sgemm 8x16
Martin Kroeker | a17cf36225 | 7 years ago | Merge pull request #2153 from quickwritereader/develop
  improved power9 zgemm, sgemm
AbdelRauf | 148c4cc5fd | 7 years ago | conflict resolve
AbdelRauf | d0c3543c3f | 7 years ago | power9 zgemm ztrmm optimized
AbdelRauf | a469b32cf4 | 7 years ago | sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
AbdelRauf | 8fe794f059 | 7 years ago | improved zgemm power9 based on power8
Martin Kroeker | 3f427c0cf9 | 7 years ago | Merge pull request #2107 from quickwritereader/develop
  sgemm/strmm kernel for power9
AbdelRauf | 47f892198c | 7 years ago | conflict resolve
AbdelRauf | 628b335e83 | 7 years ago | Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop
AbdelRauf | 0f105dd8a5 | 7 years ago | sgemm/strmm
Martin Kroeker | ccfb7ead15 | 7 years ago | Merge pull request #2072 from martin-frbg/sum
  Add (C)BLAS extension ?sum
Rashmica Gupta | bcdf1d4917 | 7 years ago | Add in runtime CPU detection for POWER.
Martin Kroeker | 706dfe263b | 7 years ago | Add POWER implementation of ?sum
  as a trivial copy of ?asum with the fabs replaced by fmr to preserve the code structure
Martin Kroeker | 7c51cc8527 | 7 years ago | Merge branch 'develop' into develop
AbdelRauf | 853a18bc17 | 7 years ago | POWER9 Makefile; dgemm based on the POWER8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using an lxv/stxv butterfly rank-1 update. Improvement from 17 to 22-23 GFLOPS. dtrmm cases were added into dgemm itself.
Martin Kroeker | 718efcec6f | 7 years ago | Fix out-of-bounds memory access in gemm_beta
  Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
Martin Kroeker | f9d67bb5e8 | 7 years ago | Fix out-of-bounds memory access in gemm_beta
  Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
Ubuntu | 498ac98581 | 7 years ago | Note for unused kernels
Ubuntu | cd9ea45463 | 7 years ago | NBMAX=4096 for gemvn; added sgemvn 8x8 for future use
Ubuntu | 4abc375a91 | 7 years ago | sgemv cgemv pairs
Ubuntu | 43a4572038 | 7 years ago | crot fix
Abdelrauf | a034e65512 | 7 years ago | Merge branch 'develop' into develop
Ubuntu | 8c3386be87 | 7 years ago | Added missing BLAS1 single-precision routines {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin};
  fixed idamin and icamin to return the index of the first occurrence among equal minima
Martin Kroeker | 961d25e9c7 | 8 years ago | Use the new zrot.c on POWER8 for crot as well
  fixes #1571 (the old zrot.S assembly does not handle incx=0 correctly)
Martin Kroeker | 8a3b6fa108 | 8 years ago | Use generic zrot.c on ppc64/POWER6 to work around utest failure from … (#1535)
  * Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469
QWR QWR | 28ca97015d | 8 years ago | power8: Added initial zgemv_(t|n), i(d|z)amax, i(d|z)amin, dgemv_t (transposed), zrot
  z13: improved zgemv_(t|n)_4, zscal, zaxpy
the mslm | 2c0a008281 | 8 years ago | dgemm_ncopy_4_ save/restore
the mslm | c5425daa6b | 8 years ago | power8 ?gemm_tcopy save/restore
martin | 7a4b3cfbf8 | 8 years ago | Add trivially optimized DSDOT for POWER8
Martin Kroeker | 9c017a2218 | 8 years ago | Save and restore VSX registers
Matt Brown | bd831a03a8 | 9 years ago | Optimise sscal for POWER9
  Use the lxvd2x instruction instead of lxvw4x.
  lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown | edc97918f8 | 9 years ago | Optimise srot for POWER9
  Use the lxvd2x instruction instead of lxvw4x.
  lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown | e0034de22d | 9 years ago | Optimise sdot for POWER9
  Use the lxvd2x instruction instead of lxvw4x.
  lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown | 32c7fe6bff | 9 years ago | Optimise sasum for POWER9
  Use the lxvd2x instruction instead of lxvw4x.
  lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown | 19bdf9d52b | 9 years ago | Optimise casum for POWER9
  Use the lxvd2x instruction instead of lxvw4x.
  lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown | 4f09030fdc | 9 years ago | Optimise cswap for POWER9
  Use the lxvd2x instruction instead of lxvw4x.
  lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown | 6f4eca5ea4 | 9 years ago | Optimise sswap for POWER9
  Use the lxvd2x instruction instead of lxvw4x.
  lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown | be55f96cbd | 9 years ago | Optimise scopy for POWER9
  Use the lxvd2x instruction instead of lxvw4x.
  lxvd2x performs far better on the new POWER architecture than lxvw4x.