Matt Brown
4f09030fdc
Optimise cswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better than lxvw4x on POWER9.
9 years ago
Matt Brown
6f4eca5ea4
Optimise sswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better than lxvw4x on POWER9.
9 years ago
Matt Brown
be55f96cbd
Optimise scopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better than lxvw4x on POWER9.
9 years ago
Matt Brown
96dd0ef4f7
Optimise ccopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better than lxvw4x on POWER9.
9 years ago
Martin Kroeker
35387edb8d
Merge pull request #1160 from gcp/extra-streamroller-cpuid
Add an extra family/model combination used by AMD Steamroller.
9 years ago
Gian-Carlo Pascutto
9c884986ad
Add an extra family/model combination used by AMD Steamroller (Godavari).
9 years ago
Martin Kroeker
f2f0e98bb5
Merge pull request #1158 from martin-frbg/force-zen
Make FORCE_ZEN option in getarch.c actually set target names to ZEN
9 years ago
Martin Kroeker
166d64eb7c
Fix FORCE_ZEN option in getarch.c
9 years ago
Martin Kroeker
e078339e8d
Merge pull request #1157 from gcp/revert-zen-param
Revert Zen param.h to Haswell values (instead of Excavator).
9 years ago
Gian-Carlo Pascutto
832a272784
Revert Zen param.h to Haswell values (instead of Excavator).
9 years ago
Martin Kroeker
356606314c
Merge pull request #1156 from SoapGentoo/cmake-fixes
Use GNUInstallDirs to allow changing target directories
9 years ago
David Seifert
ed79a29d87
Use GNUInstallDirs to allow changing target directories
* Multi-lib distributions need to change the libdir,
which is only portably possible with `GNUInstallDirs`.
* Multi-arch distributions such as Debian and Exherbo
need to be able to change the bindir.
9 years ago
Martin Kroeker
77d16ffc69
Merge pull request #1154 from sharkcz/s390x
add lapack laswp directory for zarch
9 years ago
Dan Horák
56762d5e4c
add lapack laswp for zarch
9 years ago
Zhang Xianyi
90dd190a6d
Build shared library for Android.
9 years ago
Martin Kroeker
ab9ec4ab4e
Merge pull request #1148 from gcp/fix-dynamic-zen
Fix dynamic detection for ZEN CPUs.
9 years ago
Gian-Carlo Pascutto
0cbd2d34e4
Recognize ZEN when passed as OPENBLAS_CORETYPE.
9 years ago
Gian-Carlo Pascutto
62979fd104
Fix dynamic detection for ZEN CPUs.
9 years ago
Martin Kroeker
20a413e154
Merge pull request #1142 from amodra/develop
Power8 inline assembly tweaks
9 years ago
Alan Modra
dc40bc7368
Power8 inline assembly tweaks
Further fixes on top of 9e2f316ed. Writing some doco for gcc on
inline assembly woke me up to some more errors.
- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
*y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
operand, a problem if the "=f" sum output was ever allocated a vsx
reg in the altivec set. This might be possible with inlining and
future gcc optimisation.
9 years ago
Martin Kroeker
1acfc78c8f
Merge pull request #1140 from JohannesBuchner/develop
Autodetect AMD A8-6410 as BARCELONA
9 years ago
Johannes Buchner
b4071d0d16
Autodetect AMD A8-6410 as BARCELONA
9 years ago
Martin Kroeker
7908efafc8
Fix integer overflow in LAPACK DBDSQR, SBDSQR (#1135)
* Fix integer overflow in DBDSQR
As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.
* Fix integer overflow in SBDSQR
As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.
* Fix integer overflow in threshold calculation
Related to lapack issue 135, the threshold calculation can overflow as well, since the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919.
* Fix integer overflow in threshold calculation
Related to lapack issue 135, the threshold calculation can overflow as well, since the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919.
9 years ago
Martin Kroeker
66dc10b019
Merge pull request #1133 from steckdenis/develop
Add ZEN support
9 years ago
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
9 years ago
Andrew
99880f7906
Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
9 years ago
Martin Kroeker
cd135e2b59
Merge pull request #1130 from quickwritereader/develop
BLAS 3 for single precision
9 years ago
Martin Kroeker
ad124a5e8b
Merge pull request #1126 from martin-frbg/pgi
Fix compilation with PGI by replacing direct uses of the GNU `__real__`, `__imag__` extensions and updating macro definitions for modern, C99-capable versions of the PGI compiler
9 years ago
Martin Kroeker
211d2eceb5
Update zdot.c
9 years ago
Martin Kroeker
5813ed095b
Update zdot.c
9 years ago
Martin Kroeker
e44b028fe5
Replace GNU `__real__`, `__imag__` extensions in initializers
9 years ago
Martin Kroeker
a6efabf155
Replace GNU `__real__`, `__imag__` extensions in initializers
9 years ago
Martin Kroeker
ea26b00c06
Fix CREAL, CIMAG macros for PGI
9 years ago
Abdurrauf
08786c4b95
strmm and ctrmm
9 years ago
Martin Kroeker
12e476f7a2
Merge pull request #1124 from martin-frbg/c_check-ppc
Update c_check.cmake to label ppc64 as power ARCH
9 years ago
Martin Kroeker
8de40955ad
Update c_check.cmake
9 years ago
Martin Kroeker
9b24688eed
Merge pull request #1122 from martin-frbg/zlasyf
Fix misspelling of zlasyf_aa from previous commit
9 years ago
Martin Kroeker
43224f7273
Fix misspelling of zlasyf_aa from previous commit
9 years ago
Martin Kroeker
9254a701f3
Merge pull request #1121 from staticfloat/sf/Xsymv_export
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
9 years ago
Elliot Saba
26a614fdd1
Whitespace cleanup/reformatting
9 years ago
Elliot Saba
7ae64f4f9c
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
9 years ago
Abdurrauf
82e80fa82b
Initial strmm (sgemm), not tuned yet
9 years ago
Martin Kroeker
4227049c7d
Merge pull request #1111 from martin-frbg/kaby-no-avx
Fix core detection for Kaby Lake without AVX (G4560)
9 years ago
Martin Kroeker
688267edf3
Fix core detection for Kaby Lake without AVX (G4560)
Should fix #1109
9 years ago
Martin Kroeker
d1fe040d9b
Merge pull request #1110 from quickwritereader/develop
Conventional usage of the register save area.
9 years ago
Abdurrauf
411982715c
conventional usage of the register save area
9 years ago
Abdurrauf
e831d6924e
changed to conventional register save area
9 years ago
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
Optimized Implementations for ThunderX2T99
9 years ago
Ashwin Sekhar T K
a86474c6f7
THUNDERX2T99: Performance fix for ZGEMM
9 years ago
Ashwin Sekhar T K
67473d09dd
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
9 years ago