Martin Kroeker
e8002536ec
disable quiet_make for the moment
8 years ago
Martin Kroeker
15a78d6b66
export NO_AVX512 setting
8 years ago
Martin Kroeker
38ad05bd04
Extend loop range to find SkylakeX in force_coretype
8 years ago
Martin Kroeker
b7feded85a
Propagate NO_AVX512 via CCOMMON_OPT
8 years ago
Martin Kroeker
dc9fe05ab5
Update cpuid_x86.c
8 years ago
Martin Kroeker
8be027e4c6
Update dynamic.c
8 years ago
Martin Kroeker
ac7b6e3e9a
Fix misplaced endif
8 years ago
Arjan van de Ven
89372e0993
Use AVX512 also for DGEMM
this required switching to the generic gemm_beta code (which is faster anyway on SKX)
for both DGEMM and SGEMM
The performance gain for the not-yet-retuned version is in the 30% range
8 years ago
Martin Kroeker
ef626c6824
typo fix
8 years ago
Martin Kroeker
5a51cf4576
Separate Skylake X from Skylake
8 years ago
Martin Kroeker
5a92b311e0
Separate Skylake X from Skylake
8 years ago
Martin Kroeker
a7d0f49cec
Add SKYLAKEX to DYNAMIC_CORE list only if AVX512 is available
8 years ago
Martin Kroeker
f1fb9a4745
Propagate NO_AVX512 if needed
8 years ago
Martin Kroeker
0023515733
Typo fix (misplaced parenthesis)
8 years ago
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
8 years ago
Martin Kroeker
36c4523d85
Merge pull request #1587 from matthew-brett/fix-compile-error-early-glibc
Revert "take out unused variables"
8 years ago
Matthew Brett
a8002e283a
Revert "take out unused variables"
This reverts commit e5752ff9b3.
The variables i and n are used in the `#if !__GLIBC_PREREQ(2, 7)`
branch.
Closes gh-1586.
8 years ago
Martin Kroeker
401adddb2b
Merge pull request #1585 from martin-frbg/lapack-253
Fixes from Lapack-Reference PR 253
8 years ago
Martin Kroeker
c5b13d4e10
Fixes from netlib PR 253
8 years ago
Martin Kroeker
677e42d7b0
Fixes from netlib PR 253
When minimal workspace is given in ?hesv_aa, ?sysv_aa, ?hesv_aa_2stage, ?sysv_aa_2stage, now no error is given
Quick return for ?laqr1
8 years ago
Martin Kroeker
e2a8c35e5a
Fixes from netlib PR 253
LAPACKE interfaces for Aasen's functions now call ?sytrf_aa and ?hetrf_aa instead of ?sytrf and ?hetrf
8 years ago
Martin Kroeker
1a49fb1c05
Merge pull request #1584 from martin-frbg/issue1503
Work around name clash with Windows10's winnt.h
8 years ago
Martin Kroeker
8562d5787a
Merge pull request #1583 from martin-frbg/issue1575
Handle INCX=0,INCY=0 case
8 years ago
Martin Kroeker
93f1eb09c3
Merge pull request #1582 from martin-frbg/develop-031
Update version number on the develop branch to 0.3.1.dev
8 years ago
Martin Kroeker
c90bbda3df
Merge pull request #1581 from martin-frbg/issue1574-2
Fix paths to LIN and EIG tests
8 years ago
Martin Kroeker
7df8c4f76f
typo fix
8 years ago
Martin Kroeker
2fc748bf72
Restore optimized swap kernel now that we have a proper fix
8 years ago
Martin Kroeker
a91f1587b9
Work around name clash with Windows10's winnt.h
fixes #1503
8 years ago
Martin Kroeker
d1b7be14aa
Handle INCX=0,INCY=0 case
Fixes #1575 (sswap/dswap failing the swap utest on x86) as suggested by atsampson.
8 years ago
Martin Kroeker
b491b10057
Update version to 0.3.1.dev
8 years ago
Martin Kroeker
5fae96fb70
Update version to 0.3.1.dev
8 years ago
Martin Kroeker
a7dbd4c57d
Fix paths to LIN and EIG tests
should fix #1574
8 years ago
Martin Kroeker
2cae104b5e
Merge pull request #1579 from martin-frbg/issue1574
Adapt lapack-test and blas-test to changes in netlib directory layout
8 years ago
Martin Kroeker
908d40be71
Adapt lapack-test and blas-test to changes in netlib directory layout
partial fix for #1574 - the problem with lapack_testing.py looks like an upstream bug
8 years ago
Zhang Xianyi
43e592ceb3
Add -lm for Android.
Conflicts:
exports/Makefile
8 years ago
Martin Kroeker
f0f27868d8
Merge pull request #1572 from martin-frbg/issue1571
Use the new zrot.c on POWER8 for crot as well
8 years ago
Martin Kroeker
961d25e9c7
Use the new zrot.c on POWER8 for crot as well
fixes #1571 (the old zrot.S assembly does not handle incx=0 correctly)
8 years ago
Martin Kroeker
f5959f2543
Merge pull request #1567 from martin-frbg/mipstrmm
Revert " Switch mips32 target to USE_TRMM to fix complex TRMM"
8 years ago
Martin Kroeker
82012b960b
Revert " Switch mips32 target to USE_TRMM to fix complex TRMM"
... as it was just a silly workaround for the issue seen in #1563, caused by #1419
8 years ago
Martin Kroeker
8dd3515fa2
Merge pull request #1565 from martin-frbg/mipstypo
Remove extraneous brace from previous commit of mips dsdot fix
8 years ago
Martin Kroeker
95f7f0229c
Remove extraneous brace from previous commit
8 years ago
Martin Kroeker
5082fe4306
Merge pull request #1564 from martin-frbg/issue1563
Revert changes from PR#1419
8 years ago
Martin Kroeker
7a7619af6d
Revert changes from PR#1419
at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
8 years ago
Martin Kroeker
9a400b7014
Merge pull request #1562 from martin-frbg/issue1561
Use correct data type for initializers of v2f64, v4f32
8 years ago
Martin Kroeker
893b535540
Use correct data type for initializers of v2f64, v4f32
Fixes #1561
8 years ago
Martin Kroeker
6791294312
Merge pull request #1559 from martin-frbg/buildconf
Add build-time configuration options to pkgconfig file
8 years ago
Martin Kroeker
ddb8b124de
Merge pull request #1558 from martin-frbg/instpc
Overwrite any pre-existing openblas.pc rather than append to it
8 years ago
Martin Kroeker
191746c493
Merge pull request #1557 from martin-frbg/getconfig
Add threading and OpenMP information to output
8 years ago
Martin Kroeker
eb9b021d38
Add build-time configuration options to pkgconfig file
8 years ago
Martin Kroeker
7d7564568c
Add build-time configuration options to pkgconfig file
8 years ago