Chip Kerchner
2bb7ea64a1
Only vectorize 64-bit version for Power8.
2 years ago
Chip Kerchner
09bb48d1b9
Vectorize in-copy packing/copying for SGEMM - 4X faster.
2 years ago
Martin Kroeker
d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
4 years ago
Martin Kroeker
251a09ec90
Typo fix
5 years ago
Martin Kroeker
95d37e1575
Regroup the 32 and 64bit sections and restore 64bit CAXPY
5 years ago
Martin Kroeker
f308e741b2
remove debug output and revert changes to cdot and crot
5 years ago
Martin Kroeker
f8c2697701
Use POWER6 GEMM, TRMM and DTRSM on 32bit POWER8
5 years ago
Rajalakshmi Srinivasaraghavan
bd9ff820bc
Fix cmake compilation issue - POWER9
This patch removes extra space in the sgemmotcopy filename
thereby allowing it to create entry in kernel/Makefile
created by cmake.
5 years ago
Martin Kroeker
06208c8d01
Limit this fix to ELFv2 builds
5 years ago
Martin Kroeker
f5c4c28b98
Work around POWER8BE bugs on FreeBSD (ELFv2)
for #2299
5 years ago
Martin Kroeker
0b39cf95b0
Fix endianness conditionals
6 years ago
Martin Kroeker
9f39f0a2c3
Specify ismin/ismax assembly kernels for POWER8 directly
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
6 years ago
Martin Kroeker
d483e9270a
Update KERNEL.POWER8
6 years ago
Martin Kroeker
01834aee33
Merge pull request #29 from xianyi/develop
rebase
6 years ago
Martin Kroeker
d92bd5be24
Update KERNEL.POWER8
6 years ago
Martin Kroeker
46e4b12946
Update KERNEL.POWER8
6 years ago
Martin Kroeker
dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround
6 years ago
Martin Kroeker
cad0d150db
Define alternate kernels for big-endian POWER8
6 years ago
Martin Kroeker
673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions ( #2263 )
* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy
To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0
* Use gcc-generated assembly instead of original C sources
to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3
* Use gcc-generated assembly instead of the original C source
to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3
* Add gcc7-generated assembler version of caxpy for power8
to work around wrong code generated by gcc 8.3
* Handle CONJ define for caxpyc
* Handle CONJ define for caxpyc
* Add gcc7-generated assembly cdot for POWER9
* Use prebuilt assembly for POWER9 cdot
created with gcc 7.3.1 to work around ICE in older gcc versions
* Exclude POWER9 from DYNAMIC_ARCH when the gcc version is lower than 6
* Update Makefile.system
* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH
* Disable POWER9 with old gcc versions
6 years ago
Rashmica Gupta
bcdf1d4917
Add in runtime CPU detection for POWER.
6 years ago
Ubuntu
4abc375a91
sgemv cgemv pairs
7 years ago
Ubuntu
8c3386be87
Added missing Blas1 single fp {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin},
Fixed idamin, icamin choosing the first occurrence index of equal minima
7 years ago
QWR QWR
28ca97015d
power8: Added initial zgemv_(t|n), i(d|z)amax, i(d|z)amin, dgemv_t (transposed), zrot
z13: improved zgemv_(t|n)_4,zscal,zaxpy
8 years ago
martin
7a4b3cfbf8
Add trivially optimized DSDOT for POWER8
8 years ago
Zhang Xianyi
515bc56ea9
Refs #946 . Use nrm2 reference implementation for Power8.
9 years ago
Zhang Xianyi
ae70b916f4
Refs #929 . Deal with zero and NaNs for scale.
9 years ago
Werner Saar
8fb5a1aaff
added optimized dtrsm_LT kernel for POWER8
9 years ago
Werner Saar
56948dbf0f
optimized dgemm for POWER8
9 years ago
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
9 years ago
Werner Saar
a3da10662f
added sgemm_tcopy_8_power8.S
9 years ago
Werner Saar
d46f07bb4e
added cgemm_tcopy_8_power8.S
9 years ago
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
9 years ago
Werner Saar
9276c9012f
Optimized sgemm and dgemm and tested again.
9 years ago
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
9 years ago
Werner Saar
68a69c5b50
added optimized dgemv_n kernel for POWER8
9 years ago
Werner Saar
c2464a7c4a
added optimized casum kernel for POWER8
9 years ago
Werner Saar
294f933869
added optimized zasum kernel for POWER8
9 years ago
Werner Saar
f59c9bd6ef
added optimized sasum kernel for POWER8
9 years ago
Werner Saar
c53be46d78
added optimized dasum kernel for POWER8
9 years ago
Werner Saar
659ed16591
added optimized cswap and zswap kernels for POWER8
9 years ago
Werner Saar
35c98a3556
added optimized zscal kernel for POWER8
9 years ago
Werner Saar
f1a5dd06c5
added optimized sscal kernel for POWER8
9 years ago
Werner Saar
35f1f21a7f
added drot and srot kernels optimized for POWER8
9 years ago
Werner Saar
3d9a50e841
added optimized sswap kernel for POWER8
10 years ago
Werner Saar
828c849b44
added optimized ccopy kernel for POWER8
10 years ago
Werner Saar
ecc0bc9813
added optimized scopy kernel for POWER8
10 years ago
Werner Saar
12f209b7b0
added optimized zswap kernel for POWER8
10 years ago
Werner Saar
7316a87930
added optimized dswap kernel for POWER8
10 years ago
Werner Saar
0bff057a87
added optimized dcopy kernel for POWER8
10 years ago
Werner Saar
1e6cf9808c
added optimized dscal kernel for POWER8
10 years ago