Arjan van de Ven
850b73dbb9
saxpy_haswell: Add AVX512 support
AVX512 support fits nicely into the C+intrinsics code and gives a
speedup for vectors where the saxpy operation is not fully
memory-bound
7 years ago
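The saxpy commit above says AVX512 slots neatly into the C+intrinsics kernel. A minimal sketch of how that can look, assuming the kernel computes y[i] += alpha * x[i] over n floats. The name `saxpy_kernel_sketch` is hypothetical, not the OpenBLAS kernel. The 16-wide loop is compiled only when the compiler targets AVX512F (e.g. `-mavx512f`); otherwise, and for the tail, a scalar loop runs.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical sketch, not the OpenBLAS kernel: y[i] += alpha * x[i]. */
static void saxpy_kernel_sketch(size_t n, float alpha, const float *x, float *y)
{
    size_t i = 0;
#if defined(__AVX512F__)
    /* 16 floats per 512-bit register, one fused multiply-add per step. */
    __m512 valpha = _mm512_set1_ps(alpha);
    for (; i + 16 <= n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);
        __m512 vy = _mm512_loadu_ps(y + i);
        vy = _mm512_fmadd_ps(valpha, vx, vy);   /* alpha*x + y */
        _mm512_storeu_ps(y + i, vy);
    }
#endif
    /* Scalar tail, or the whole vector when AVX512 is not enabled. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

Because the wider path is just another loop ahead of the tail, adding it does not disturb the rest of the C code, which is the point the commit makes.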
Arjan van de Ven
06ea72f5a5
write saxpy_haswell kernel using C intrinsics and don't disallow inlining
the intrinsics version of saxpy is more readable than the inline asm version,
and in the intrinsics version there's no longer any reason to ban inlining
(since the compiler now has full visibility), which gives a mid-single-digit
percentage improvement in performance
7 years ago
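To illustrate why the intrinsics version no longer needs to forbid inlining: the compiler sees through intrinsics just as it does through ordinary C, so a kernel like the one below can be `static inline` and merged into its caller. This is a sketch under assumptions: the name `saxpy_haswell_sketch` is hypothetical, the 8-wide path requires AVX and FMA to be enabled at compile time (e.g. `-mavx2 -mfma`), and a scalar loop covers the remainder.

```c
#include <stddef.h>
#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#endif

/* Hypothetical sketch: y[i] += alpha * x[i]. Being plain C + intrinsics,
 * the compiler can inline it and allocate registers across the call. */
static inline void saxpy_haswell_sketch(size_t n, float alpha,
                                        const float *x, float *y)
{
    size_t i = 0;
#if defined(__AVX__) && defined(__FMA__)
    __m256 valpha = _mm256_set1_ps(alpha);
    /* Two independent 8-float FMAs per iteration to overlap latency. */
    for (; i + 16 <= n; i += 16) {
        __m256 y0 = _mm256_loadu_ps(y + i);
        __m256 y1 = _mm256_loadu_ps(y + i + 8);
        y0 = _mm256_fmadd_ps(valpha, _mm256_loadu_ps(x + i), y0);
        y1 = _mm256_fmadd_ps(valpha, _mm256_loadu_ps(x + i + 8), y1);
        _mm256_storeu_ps(y + i, y0);
        _mm256_storeu_ps(y + i + 8, y1);
    }
#endif
    /* Remaining elements (or all of them without AVX/FMA). */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```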
Arjan van de Ven
d86604687f
saxpy_haswell: Use named arguments in inline asm
Improves readability
7 years ago
Arjan van de Ven
ef30a7239c
sdot_haswell: similar to ddot: turn into intrinsics-based C code that supports AVX512
do the same thing for SDOT that the previous patches did for DDOT; the performance gain
is around 60%, so at least somewhat interesting
7 years ago
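A dot product in the same C+intrinsics style keeps a vector of partial sums and reduces it once at the end. A minimal sketch, assuming unit strides; `sdot_sketch` is a hypothetical name, the AVX512 path requires `-mavx512f` and a compiler that provides `_mm512_reduce_add_ps`, and the scalar loop handles the tail.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical sketch of an intrinsics-based SDOT: returns sum x[i]*y[i]. */
static float sdot_sketch(size_t n, const float *x, const float *y)
{
    size_t i = 0;
    float dot = 0.0f;
#if defined(__AVX512F__)
    /* 16 partial sums, one per lane, reduced horizontally at the end. */
    __m512 acc = _mm512_setzero_ps();
    for (; i + 16 <= n; i += 16)
        acc = _mm512_fmadd_ps(_mm512_loadu_ps(x + i), _mm512_loadu_ps(y + i), acc);
    dot = _mm512_reduce_add_ps(acc);
#endif
    for (; i < n; i++)
        dot += x[i] * y[i];
    return dot;
}
```

Note that the vector path sums in a different order than a plain loop, so results can differ from a scalar SDOT in the last bits.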
Arjan van de Ven
21c6220d63
fix typo in dsymv avx512 code path
7 years ago
Arjan van de Ven
34d63df4b3
Add AVX512 support to DDOT
now that it's written in C + intrinsics it's easy to add AVX512 support
for DDOT
7 years ago
Arjan van de Ven
ae38fa55c3
Use intrinsics instead of inline asm
Intrinsics-based code is generally easier to read for the non-math part
of the algorithm, and it's easier to add, say, AVX512 to it later
7 years ago
Arjan van de Ven
847bbd6f4c
use named arguments in the inline asm
makes the asm easier to read
7 years ago
Arjan van de Ven
9c29524f50
various code cleanups and comments
7 years ago
Arjan van de Ven
f2810beafb
Add AVX512 support to dsymv_L_microk_haswell-2.c
Now that the code is written with intrinsics, it's relatively easy to add AVX512 support
7 years ago
Arjan van de Ven
c202e06297
Write dsymv_kernel_4x4 for Haswell using intrinsics
intrinsics make the non-math part of the code easier to follow
than all-hand-coded asm, and they also help prepare for
adding AVX512 support
7 years ago
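For reference, this is what a lower-triangular DSYMV has to compute: y := y + alpha*A*x, where only the lower triangle of the symmetric, column-major matrix A is read. The scalar sketch below follows the reference-BLAS loop structure; it is not the 4x4-blocked Haswell kernel, and the name `dsymv_L_ref` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical scalar reference: y += alpha * A * x, A symmetric n x n,
 * column-major with leading dimension lda, only the lower triangle read. */
static void dsymv_L_ref(size_t n, double alpha, const double *a, size_t lda,
                        const double *x, double *y)
{
    for (size_t j = 0; j < n; j++) {
        double temp1 = alpha * x[j];   /* scales column j of A */
        double temp2 = 0.0;            /* row j, obtained via symmetry */
        y[j] += temp1 * a[j + j * lda];
        for (size_t i = j + 1; i < n; i++) {
            double aij = a[i + j * lda];   /* A(i,j) == A(j,i) */
            y[i] += temp1 * aij;
            temp2 += aij * x[i];
        }
        y[j] += alpha * temp2;
    }
}
```

Each stored element is used twice (once as A(i,j), once as A(j,i)), which is what a blocked kernel exploits by handling several columns per pass.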
Arjan van de Ven
0faba28adb
dsymv_L haswell: use symbolic names for inline asm
symbolic names for gcc inline assembly are much easier to read
7 years ago
Arjan van de Ven
df31ec064e
Add AVX512 support to the dgemv_n_microk_haswell-4.c kernel
Now that the kernel is written in C-with-intrinsics, adding
AVX512 support to this kernel is trivial and yields a pretty significant
performance increase
7 years ago
Arjan van de Ven
e52d01cfe7
Also make the kernel_4x2 use intrinsics for readability and consistency
7 years ago
Arjan van de Ven
4a8ae8b8aa
replace the Haswell dgemv_kernel_4x4 kernel with the same code written in intrinsics
using intrinsics is a bit easier to read (at least for the non-math part of the code)
and also lets the compiler do a better job of register allocation and of optimizing the
non-math (loop/setup) code.
It also allows the code to honor the "no fma" flag if the user so desires.
This change yields a 15% performance increase (measured for a size of 16).
It is also a step towards being able to add an AVX512 version of the code.
7 years ago
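The dgemv commit notes that the intrinsics version can honor a "no fma" setting. One way to express that, sketched under assumptions: the kernel is taken to update y[i] += alpha * (a0[i]*x[0] + a1[i]*x[1] + a2[i]*x[2] + a3[i]*x[3]) for four columns, the name `dgemv_kernel_4x4_sketch` is hypothetical, and FMA use is keyed off the compiler's `__FMA__` macro (e.g. `-mfma` vs `-mno-fma`) rather than any specific OpenBLAS build option.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>

/* acc + a*b, fused only when the compiler is allowed to emit FMA. */
static inline __m256d madd_pd(__m256d a, __m256d b, __m256d acc)
{
#if defined(__FMA__)
    return _mm256_fmadd_pd(a, b, acc);
#else
    return _mm256_add_pd(_mm256_mul_pd(a, b), acc);
#endif
}
#endif

/* Hypothetical sketch: y[i] += alpha * sum_k ap[k][i] * x[k], k = 0..3. */
static void dgemv_kernel_4x4_sketch(size_t n, const double *const ap[4],
                                    const double *x, double *y, double alpha)
{
    size_t i = 0;
#if defined(__AVX__)
    __m256d x0 = _mm256_set1_pd(x[0]), x1 = _mm256_set1_pd(x[1]);
    __m256d x2 = _mm256_set1_pd(x[2]), x3 = _mm256_set1_pd(x[3]);
    __m256d va = _mm256_set1_pd(alpha);
    for (; i + 4 <= n; i += 4) {
        __m256d t = _mm256_mul_pd(_mm256_loadu_pd(ap[0] + i), x0);
        t = madd_pd(_mm256_loadu_pd(ap[1] + i), x1, t);
        t = madd_pd(_mm256_loadu_pd(ap[2] + i), x2, t);
        t = madd_pd(_mm256_loadu_pd(ap[3] + i), x3, t);
        _mm256_storeu_pd(y + i, madd_pd(t, va, _mm256_loadu_pd(y + i)));
    }
#endif
    /* Scalar remainder (or the whole vector without AVX). */
    for (; i < n; i++) {
        double t = ap[0][i] * x[0] + ap[1][i] * x[1]
                 + ap[2][i] * x[2] + ap[3][i] * x[3];
        y[i] += alpha * t;
    }
}
```

With hand-written asm, the FMA choice is baked into the instruction text; with intrinsics it becomes a compile-time branch in one small helper.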
Arjan van de Ven
350531e76a
dgemv_n_microk_haswell: Use symbolic names for asm inputs to make the code more readable
GCC inline assembly syntax supports symbolic operand names in addition to numeric (positional) operand references;
code that uses the symbolic names is generally more readable
7 years ago
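A small illustration of the symbolic-name syntax these commits switch to, using a toy x86 add rather than any OpenBLAS kernel; `add_named` is a hypothetical function. The positional form of the same statement would be `"addl %1, %0" : "+r"(a) : "r"(b)`. Non-x86 targets fall back to plain C.

```c
#include <stdint.h>

/* Adds b to a. On x86 with a GCC-compatible compiler, operands are named
 * [dst] and [src] and referenced as %[dst] / %[src] in the template,
 * instead of positional %0 / %1. */
static uint32_t add_named(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl %[src], %[dst]"   /* AT&T operand order: dst += src */
            : [dst] "+r"(a)         /* read-write output */
            : [src] "r"(b)          /* input */
            : "cc");                /* flags are clobbered */
#else
    a += b;
#endif
    return a;
}
```

The benefit grows with operand count: in a kernel with a dozen inputs, `%[alpha]` is far easier to audit than `%7`.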
Martin Kroeker
9e65430504
Merge pull request #1703 from wsttiger/cmake_fix
Set EXPORT_NAME to match OpenBLASConfig.cmake
7 years ago
Martin Kroeker
2cfa86b406
Merge pull request #1707 from extrowerk/haiku_support
Haiku supporting patches
7 years ago
Scott Thornton
2a9a9389ef
Added target_include_directories()
7 years ago
Zoltán Mizsei
6463bffd59
Haiku supporting patches
7 years ago
Martin Kroeker
8ef7d4fb54
Merge pull request #1706 from oon3m0oo/develop
Fix #1705 where we incorrectly calculate page locations.
7 years ago
Craig Donner
6400868e55
Fix #1705 where we incorrectly calculate page locations.
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly. Now we detect if we've
found enough pages for the allocation and terminate the loop.
7 years ago
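The #1705 fix comes down to stopping the page search once enough pages cover an allocation whose size is not a multiple of PAGESIZE. A sketch of that arithmetic, with hypothetical `pages_needed` and `collect_page_offsets` helpers and an assumed 4096-byte PAGESIZE when the platform does not define one; the real run_bench code is not reproduced here.

```c
#include <stddef.h>

#ifndef PAGESIZE
#define PAGESIZE 4096   /* assumed page size for this sketch */
#endif

/* Pages needed to cover `size` bytes, rounding up so that a partial
 * last page still counts. */
static size_t pages_needed(size_t size)
{
    return (size + PAGESIZE - 1) / PAGESIZE;
}

/* Hypothetical page walk: records the byte offset of each page and stops
 * once enough pages cover the allocation (or the output array is full),
 * so it terminates whether or not `size` is a multiple of PAGESIZE. */
static size_t collect_page_offsets(size_t size, size_t *offsets, size_t max_pages)
{
    size_t need = pages_needed(size);
    size_t found = 0;
    while (found < need && found < max_pages) {
        offsets[found] = found * PAGESIZE;
        found++;
    }
    return found;
}
```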
Scott Thornton
8ebf541e97
Set EXPORT_NAME to match OpenBLASConfig.cmake
7 years ago
Martin Kroeker
b03ae3f4dc
Set version to 0.3.3.dev
7 years ago
Martin Kroeker
2cc8fb0ad2
Set version to 0.3.3.dev
7 years ago
Martin Kroeker
64826a0d7d
Merge branch 'release-0.3.0' into develop
7 years ago
Martin Kroeker
25f2d25cfe
Merge pull request #1697 from martin-frbg/issue1696
Do not treat Windows UWP builds as cross-compiling
7 years ago
Martin Kroeker
73131fa30a
Do not treat Windows UWP builds as cross-compiling
7 years ago
Martin Kroeker
66fcdd5be8
Merge pull request #1695 from martin-frbg/issue1692
Unset memory table entry, not just the local pointer to it, on shutdown
7 years ago
Martin Kroeker
43ac839c16
Unset memory table entry, not just the temporary pointer to it, on shutdown
to fix a crash with multiple instances of OpenBLAS, #1692
7 years ago
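The #1692 shutdown fix is the classic difference between clearing a local copy of a pointer and clearing the table entry it came from. A minimal sketch with a hypothetical `memory_table` array and `release_slot` function, not OpenBLAS's actual memory bookkeeping:

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_SLOTS 4

/* Hypothetical stand-in for a global table of buffer pointers. */
static void *memory_table[NUM_SLOTS];

/* Buggy pattern, for contrast:
 *     void *p = memory_table[i];
 *     free(p);
 *     p = NULL;   clears only the local copy; memory_table[i] still
 *                 holds a dangling pointer for the next user to trip on
 */

/* Fixed pattern: clear the table entry itself. */
static void release_slot(int i)
{
    free(memory_table[i]);
    memory_table[i] = NULL;
}
```

Releasing an already-cleared slot is then harmless, because `free(NULL)` is a no-op.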
Martin Kroeker
7ba5936ecd
Merge pull request #1688 from martin-frbg/issue1673
Temporarily disable special handling of OPENMP thread memory allocation
7 years ago
Martin Kroeker
b14f44d2ad
Temporarily disable special handling of OPENMP thread memory allocation
for issue #1673
7 years ago
Martin Kroeker
e71d70ba87
Merge pull request #1681 from martin-frbg/issue1671
Add cpu identification via mfpvr call for the BSDs
7 years ago
Martin Kroeker
d671870f5f
Merge pull request #1684 from martin-frbg/issue1672
Work around utest failures in the MIPS64 SICORTEX target
7 years ago
Martin Kroeker
4e103c822c
typo fix
7 years ago
Martin Kroeker
d2142760e0
Fix precision problem in DSDOT
7 years ago
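DSDOT takes single-precision vectors but is meant to accumulate in double precision. The log does not say what the precision problem was, so the sketch below (hypothetical `dsdot_sketch`) only shows the intended semantics: each product and the running sum are carried as double.

```c
#include <stddef.h>

/* DSDOT-style dot product: float inputs, double-precision products and sum. */
static double dsdot_sketch(size_t n, const float *x, const float *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)x[i] * (double)y[i];
    return sum;
}
```

For x = {1e8f, 1.0f, -1e8f} and y = {1, 1, 1} this returns 1.0. A float accumulator would round 1e8 + 1 back to 1e8 (the float spacing near 1e8 is 8) and end at 0.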
Martin Kroeker
2fbfc64da8
Use C kernels for default c/zAXPY, xROT, c/zSWAP
7 years ago
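For orientation, a generic C kernel for complex AXPY (y := y + alpha*x) on interleaved (real, imaginary) float storage might look like this. The name `caxpy_c_sketch` is hypothetical and unit strides are assumed; production kernels typically also handle non-unit strides and conjugated variants.

```c
#include <stddef.h>

/* y[i] += alpha * x[i] for n complex numbers stored as interleaved
 * (real, imag) float pairs; alpha = alpha_r + i*alpha_i. */
static void caxpy_c_sketch(size_t n, float alpha_r, float alpha_i,
                           const float *x, float *y)
{
    for (size_t i = 0; i < n; i++) {
        float xr = x[2 * i], xi = x[2 * i + 1];
        y[2 * i]     += alpha_r * xr - alpha_i * xi;   /* real part */
        y[2 * i + 1] += alpha_r * xi + alpha_i * xr;   /* imaginary part */
    }
}
```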
Martin Kroeker
8d5b33b6be
Add cpu identification via mfpvr call for the BSDs
fixes #1671
7 years ago
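On PowerPC, `mfpvr` reads the Processor Version Register, whose upper 16 bits identify the core. A sketch of reading and decoding it, assuming a GCC-compatible compiler and an OS that permits or emulates `mfpvr` in user space. The `read_pvr` and `power_generation` helpers and their short table (POWER8 and POWER9 version values only) are illustrative, not OpenBLAS's detection code.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__powerpc__) || defined(__powerpc64__))
/* Read the Processor Version Register (PowerPC only). */
static uint32_t read_pvr(void)
{
    unsigned long pvr;
    __asm__ volatile("mfpvr %0" : "=r"(pvr));
    return (uint32_t)pvr;
}
#endif

/* Illustrative decode of the PVR version field (upper 16 bits):
 * returns 8 for POWER8 variants, 9 for POWER9, 0 if not in this table. */
static int power_generation(uint32_t pvr)
{
    switch (pvr >> 16) {
    case 0x004B: /* POWER8E */
    case 0x004C: /* POWER8NVL */
    case 0x004D: /* POWER8 */
        return 8;
    case 0x004E: /* POWER9 */
        return 9;
    default:
        return 0;
    }
}
```

On a PowerPC build, `power_generation(read_pvr())` would then select the core type without relying on Linux-specific files.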
Martin Kroeker
36aea5ce2d
Merge pull request #1680 from martin-frbg/snprint
Fix wrong redefinitions of snprintf for older MSVC
7 years ago
Martin Kroeker
1309711e24
Fix declaration of snprintf for older MSVC
_snprintf_s takes an additional (size) argument, so it is not a direct replacement.
(Note that this code is currently unused: the two instances of snprintf here are within ifdef blocks that are not compiled for MSVC.)
7 years ago
Martin Kroeker
571e9de2ac
Fix definition of snprintf for MSVC
MS _snprintf_s takes an additional argument for the size of the buffer, so it is not a direct replacement (utest/ctest.h, which I copied from, was wrong)
7 years ago
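The snprintf commits hinge on the extra argument of `_snprintf_s`. A minimal sketch of a mapping that accounts for it, assuming the cutoff is Visual Studio 2015 (`_MSC_VER` 1900), the release in which MSVC gained a C99-style `snprintf`; `format_version` is a hypothetical caller.

```c
#include <stdio.h>

/* Older MSVC has no C99 snprintf. _snprintf_s(buf, size, count, fmt, ...)
 * takes an extra count argument, so it cannot simply be renamed;
 * passing _TRUNCATE as the count truncates to the buffer size
 * (it then returns -1, unlike C99 snprintf). */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf(buf, size, ...) _snprintf_s((buf), (size), _TRUNCATE, __VA_ARGS__)
#endif

static int format_version(char *buf, size_t size, int major, int minor, int patch)
{
    return snprintf(buf, size, "%d.%d.%d", major, minor, patch);
}
```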
Martin Kroeker
448ed15115
Merge pull request #1678 from martin-frbg/issue1677
Define snprintf for older versions of MSVC
7 years ago
Martin Kroeker
045fb5ea2c
Define snprintf for older versions of MSVC
for #1677
7 years ago
Martin Kroeker
4dd70d98d7
Merge pull request #1667 from xianyi/revert-1642-develop
Revert "Rewrite &= -> = and simplify the initial blocking phase."
7 years ago
Martin Kroeker
504310eeb9
Merge pull request #1665 from martin-frbg/cpuid-ryzen2
Add cpuid for AMD Ryzen 2
7 years ago
Martin Kroeker
ea1f39518f
Merge pull request #1663 from martin-frbg/issue1641
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
7 years ago
Martin Kroeker
5f2a3c05cd
Revert "Rewrite &= -> = and simplify the initial blocking phase."
7 years ago
Martin Kroeker
d0ec4325cf
Add cpuid for AMD Ryzen 2
7 years ago
Martin Kroeker
3f73e8b8cf
Add cpuid for AMD Ryzen 2
for #1664
7 years ago
Martin Kroeker
a83f01e0ee
Merge pull request #1662 from martin-frbg/cmake-avx512
Add -march=skylake-avx512 to AVX512 compile check and suppress its ou…
7 years ago