Martin Kroeker
5e244d80f2
Merge pull request #2271 from quickwritereader/strmm_fix
Fixed bug in power9 strmm. BLAS-TESTER passes.
6 years ago
AbdelRauf
ede5efebab
trmm fix
6 years ago
Martin Kroeker
596a22325a
Fix prologue of power9 assembly cdot(c) kernel to provide cdotc
6 years ago
Martin Kroeker
7f58f3ad0e
Fix mis-edits in the gcc-derived power8 caxpy kernel
6 years ago
Martin Kroeker
673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)
* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy
to work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0
* Use gcc-generated assembly instead of original C sources
to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3
* Use gcc-generated assembly instead of the original C source
to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3
* Add gcc7-generated assembler version of caxpy for power8
to work around wrong code generated by gcc 8.3
* Handle CONJ define for caxpyc
* Add gcc7-generated assembly cdot for POWER9
* Use prebuilt assembly for POWER9 cdot
created with gcc 7.3.1 to work around ICE in older gcc versions
* Exclude POWER9 from DYNAMIC_ARCH when the gcc version is lower than 6
* Update Makefile.system
* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH
* Disable POWER9 with old gcc versions
6 years ago
Martin Kroeker
f3c314550c
Merge pull request #2243 from quickwritereader/develop
possible cgemv, caxpy, cdot fix
6 years ago
AbdelRauf
847c20c9b7
fix uninitialized variables i
6 years ago
AbdelRauf
4c22828812
caxpy and cdot are using vec_vsx_ld
6 years ago
AbdelRauf
e79712d969
cgemv using vec_vsx_ld instead of letting gcc decide
6 years ago
AbdelRauf
be09551cdf
aligned
6 years ago
Martin Kroeker
6b6c9b1441
Merge pull request #2172 from quickwritereader/develop
power9 cgemm/ctrmm. New sgemm 8x16
6 years ago
AbdelRauf
a97b301aaa
cgemm/ctrmm power9
6 years ago
Piotr Kubaj
eebfeba768
Fix build on FreeBSD/powerpc64.
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
6 years ago
kavanabhat
a575f1e4c7
Update dtrmm_kernel_16x4_power8.S
6 years ago
AbdelRauf
cdbfb891da
new sgemm 8x16
6 years ago
Martin Kroeker
a17cf36225
Merge pull request #2153 from quickwritereader/develop
improved power9 zgemm, sgemm
6 years ago
AbdelRauf
148c4cc5fd
conflict resolve
6 years ago
AbdelRauf
d0c3543c3f
power9 zgemm ztrmm optimized
6 years ago
AbdelRauf
a469b32cf4
sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
6 years ago
AbdelRauf
8fe794f059
improved zgemm power9 based on power8
6 years ago
Martin Kroeker
3f427c0cf9
Merge pull request #2107 from quickwritereader/develop
sgemm/strmm kernel for power9
6 years ago
AbdelRauf
47f892198c
conflict resolve
6 years ago
AbdelRauf
628b335e83
Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop
6 years ago
AbdelRauf
0f105dd8a5
sgemm/strmm
6 years ago
Martin Kroeker
ccfb7ead15
Merge pull request #2072 from martin-frbg/sum
Add (C)BLAS extension ?sum
6 years ago
Rashmica Gupta
bcdf1d4917
Add in runtime CPU detection for POWER.
6 years ago
Martin Kroeker
706dfe263b
Add POWER implementation of ?sum
as a trivial copy of ?asum, with the fabs replaced by fmr to preserve the code structure
6 years ago
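The ?sum commit above describes ?sum as ?asum with the absolute value (fabs) replaced by a plain register move (fmr). A minimal C sketch of that semantic difference follows. It is not the POWER kernel; the names ref_dasum and ref_dsum are hypothetical, and it assumes a positive stride incx counted in elements.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference loop for ?asum: sum of absolute values. */
double ref_dasum(size_t n, const double *x, size_t incx)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += fabs(x[i * incx]);   /* fabs: the step that ?sum drops */
    return acc;
}

/* Hypothetical reference loop for ?sum: same loop structure, but each
   value is used as-is (the role fmr plays in the assembly) instead of
   being passed through fabs. */
double ref_dsum(size_t n, const double *x, size_t incx)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i * incx];
    return acc;
}
```

Keeping the two loops identical except for the fabs step mirrors why the commit could reuse the ?asum code almost unchanged.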
Martin Kroeker
7c51cc8527
Merge branch 'develop' into develop
6 years ago
AbdelRauf
853a18bc17
power9 makefile. dgemm based on the power8 kernel with the following changes: 32x-unrolled 16x4 kernel and 8x4 kernel (using lxv/stxv butterfly rank-1 update). Improvement from 17 to 22-23 GFLOPS. dtrmm cases were added to dgemm itself
6 years ago
Martin Kroeker
718efcec6f
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
7 years ago
Martin Kroeker
f9d67bb5e8
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
7 years ago
Ubuntu
498ac98581
Note for unused kernels
7 years ago
Ubuntu
cd9ea45463
NBMAX=4096 for gemvn, added sgemvn 8x8 for future use
7 years ago
Ubuntu
4abc375a91
sgemv cgemv pairs
7 years ago
Ubuntu
43a4572038
crot fix
7 years ago
Abdelrauf
a034e65512
Merge branch 'develop' into develop
7 years ago
Ubuntu
8c3386be87
Added missing BLAS1 single-precision kernels {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin}.
Fixed idamin, icamin to choose the index of the first occurrence among equal minima
7 years ago
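The idamin/icamin fix above concerns tie-breaking: when several elements share the minimum magnitude, the returned 1-based index should be that of the first one. A minimal C sketch of that rule follows; ref_idamin and ref_icamin are hypothetical names, not the OpenBLAS kernels. It assumes a positive element stride incx, returns 0 for n == 0 or incx == 0 (following the reference BLAS i?amax guard), and for complex input uses |re| + |im|, the magnitude convention of reference BLAS icamax. NaN handling is not addressed.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference for idamin: 1-based index of the first element
   with the smallest |x[i]|. */
size_t ref_idamin(size_t n, const double *x, size_t incx)
{
    if (n == 0 || incx == 0) return 0;
    size_t best = 0;
    double minval = fabs(x[0]);
    for (size_t i = 1; i < n; i++) {
        double v = fabs(x[i * incx]);
        if (v < minval) {        /* strict '<' keeps the first of equal minima */
            minval = v;
            best = i;
        }
    }
    return best + 1;
}

/* Hypothetical reference for icamin on interleaved (re, im) floats;
   incx counts complex elements, magnitude is |re| + |im|. */
size_t ref_icamin(size_t n, const float *x, size_t incx)
{
    if (n == 0 || incx == 0) return 0;
    size_t best = 0;
    float minval = fabsf(x[0]) + fabsf(x[1]);
    for (size_t i = 1; i < n; i++) {
        const float *z = x + 2 * i * incx;
        float v = fabsf(z[0]) + fabsf(z[1]);
        if (v < minval) {        /* same first-occurrence rule */
            minval = v;
            best = i;
        }
    }
    return best + 1;
}
```

Using `<=` instead of `<` in either comparison would return the last of the tied elements, which is the behavior the commit describes fixing.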
Martin Kroeker
961d25e9c7
Use the new zrot.c on POWER8 for crot as well
fixes #1571 (the old zrot.S assembly does not handle incx=0 correctly)
7 years ago
Martin Kroeker
8a3b6fa108
Use generic zrot.c on ppc64/POWER6 to work around utest failure from … (#1535)
* Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469
7 years ago
QWR QWR
28ca97015d
power8: Added initial zgemv_(t|n), i(d|z)amax, i(d|z)amin, dgemv_t (transposed), zrot
z13: improved zgemv_(t|n)_4,zscal,zaxpy
8 years ago
the mslm
2c0a008281
dgemm_ncopy_4_ save/restore
8 years ago
the mslm
c5425daa6b
power8 ?gemm_tcopy save/restore
8 years ago
martin
7a4b3cfbf8
Add trivially optimized DSDOT for POWER8
8 years ago
Martin Kroeker
9c017a2218
Save and restore VSX registers
8 years ago
Matt Brown
bd831a03a8
Optimise sscal for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago
Matt Brown
edc97918f8
Optimise srot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago
Matt Brown
e0034de22d
Optimise sdot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago
Matt Brown
32c7fe6bff
Optimise sasum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago
Matt Brown
19bdf9d52b
Optimise casum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago
Matt Brown
4f09030fdc
Optimise cswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago