Martin Kroeker
d4d3113adc
Merge pull request #1731 from fenrus75/readme
add short blurb about avx512 and needed compiler to README
7 years ago
Martin Kroeker
375dff54fc
Merge pull request #1733 from fenrus75/dsymv
Add an AVX512 enabled DSYMV (L) function
7 years ago
Martin Kroeker
a5f165275a
Merge pull request #1732 from fenrus75/dgemv
Add an AVX512 enabled DGEMV (n) function
7 years ago
Martin Kroeker
8c13aa495a
Merge pull request #1730 from fenrus75/fix-sdot
Fix typo in sdot function
7 years ago
Martin Kroeker
1ee6d087c3
Merge pull request #1729 from fenrus75/dscal
Add an AVX512 enabled DSCAL function
7 years ago
Martin Kroeker
a95a784ab2
Merge pull request #1723 from maamountki/develop
Disable zgemv scale in gemv benchmark by default
7 years ago
Arjan van de Ven
9bec34cb67
Add an AVX512 enabled DSYMV (L) function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
7 years ago
Arjan van de Ven
87bebdbd8a
Add an AVX512 enabled DGEMV (n) function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
7 years ago
Arjan van de Ven
9493f26309
add short blurb about avx512 and needed compiler to README
7 years ago
Arjan van de Ven
36add7570a
Fix typo in sdot function
it looks like my previous pull request was short the final commit;
fix a typo in sdot
7 years ago
Arjan van de Ven
cacacc8007
Add an AVX512 enabled DSCAL function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
7 years ago
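As a rough illustration of the pattern this commit message describes (an intrinsics kernel with a fallback when AVX512 is not available at build time), here is a minimal sketch. The function name `dscal_sketch` is hypothetical, the guard on `__AVX512F__` is a simplification of the compiler-version check mentioned above, and the scalar loop merely stands in for the existing Haswell AVX2 kernel; none of this is taken from the actual patch.

```c
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: scales x[0..n) by alpha in place. */
void dscal_sketch(size_t n, double alpha, double *x)
{
    size_t i = 0;

#if defined(__AVX512F__)
    /* AVX512 path: process 8 doubles per iteration. */
    __m512d valpha = _mm512_set1_pd(alpha);
    for (; i + 8 <= n; i += 8) {
        __m512d v = _mm512_loadu_pd(x + i);
        _mm512_storeu_pd(x + i, _mm512_mul_pd(v, valpha));
    }
#endif

    /* Fallback and tail: plain C loop (assumed stand-in for the
       AVX2 kernel the commit message refers to). */
    for (; i < n; i++) {
        x[i] *= alpha;
    }
}
```

When the compiler does not target AVX512, the guarded block disappears and the whole vector goes through the plain loop, so the same source builds either way.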
Martin Kroeker
1a00ef3d27
Merge pull request #1725 from fenrus75/axpy
Add AVX512 enabled SAXPY/DAXPY functions
7 years ago
Martin Kroeker
4c0d832ec3
Merge pull request #1724 from fenrus75/sdot
Add an AVX512 enabled SDOT function
7 years ago
Martin Kroeker
fc33cbc7bb
Merge pull request #1728 from martin-frbg/changelog
Add changes from the 0.3.x releases
7 years ago
Martin Kroeker
c52a831ae4
Add changes from the 0.3.x releases
fixes #1727
7 years ago
Arjan van de Ven
2e99873ff7
Add AVX512 enabled SAXPY/DAXPY functions
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
7 years ago
Arjan van de Ven
00abaa865b
Add an AVX512 enabled SDOT function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
7 years ago
maamountki
33043f563f
Disable scal to benchmark zgemv separately by default
7 years ago
Martin Kroeker
66da7677bd
Merge pull request #1721 from fenrus75/ddot2
Add an AVX512 enabled DDOT function
7 years ago
Arjan van de Ven
7932ff3ea9
Add an AVX512 enabled DDOT function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
7 years ago
Martin Kroeker
62f4c69708
Merge pull request #1717 from martin-frbg/issue1708
Add workaround for avx512 compilations on Cygwin
7 years ago
maamountki
453bfa7e71
[ZARCH] Restore detect() function
7 years ago
maamountki
23229011db
[ZARCH] Z14 support, BLAS 1/2 single precision implementations, Some missing double precision implementations, Gemv optimization
7 years ago
Martin Kroeker
73478664d4
Add workaround for avx512 compilations on Cygwin
fixes #1708
7 years ago
Martin Kroeker
ee955757f9
Merge pull request #1715 from stevengj/patch-1
fix blasabs for windows
7 years ago
Steven G. Johnson
48610a4524
fix blasabs for windows
Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
7 years ago
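The reasoning in this fix can be sketched as follows: on LLP64 platforms such as 64-bit Windows, `long` is 32 bits wide, so a 64-bit index needs `long long` and `llabs`. The macro `DEMO_INTERFACE64` and the helper `stride_magnitude` are hypothetical names used only for this illustration; the project's real configuration macros and typedefs may be arranged differently.

```c
#include <stdlib.h>

/* DEMO_INTERFACE64 is a hypothetical switch for this sketch only. */
#if defined(DEMO_INTERFACE64)
  #if defined(_WIN64)
    /* LLP64: long is 32 bits, so use long long and llabs. */
    typedef long long blasint;
    #define blasabs(x) llabs(x)
  #else
    /* LP64: long is already 64 bits, so labs is sufficient. */
    typedef long blasint;
    #define blasabs(x) labs(x)
  #endif
#else
  typedef int blasint;
  #define blasabs(x) abs(x)
#endif

/* Hypothetical helper: absolute value of a stride. */
blasint stride_magnitude(blasint incx)
{
    return blasabs(incx);
}
```

Selecting the function together with the integer type keeps the absolute value from being computed in a narrower type than the index itself.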
Martin Kroeker
4a553e8678
Merge pull request #1713 from martin-frbg/issue1710
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
7 years ago
Martin Kroeker
e788102c10
Merge pull request #1709 from stevengj/patch-1
fabs -> fabsl
7 years ago
Martin Kroeker
165f00c159
fabs -> fabsl
7 years ago
Martin Kroeker
40c068a875
Introduce blasabs() to switch between abs() and labs() for INTERFACE64
7 years ago
Martin Kroeker
933896a1d0
Use blasabs to switch between abs and labs as needed for INTERFACE64
7 years ago
Steven G. Johnson
a4e321400b
fabs -> fabsl
Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.
7 years ago
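To make the truncation concrete, here is a small sketch contrasting the two calls. The function names are hypothetical. On platforms where `long double` is wider than `double`, the first version can lose low-order bits; where the two types have the same width, both versions behave identically.

```c
#include <math.h>

/* fabs takes a double, so a long double argument is narrowed first.
   The cast spells out the conversion that would otherwise be implicit. */
long double abs_via_fabs(long double x)
{
    return fabs((double)x);
}

/* fabsl operates on long double directly, so no precision is lost. */
long double abs_via_fabsl(long double x)
{
    return fabsl(x);
}
```

For values exactly representable in `double`, such as -2.5, both functions return the same result; the difference only appears for values that need the extra `long double` precision.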
Martin Kroeker
9e65430504
Merge pull request #1703 from wsttiger/cmake_fix
Set EXPORT_NAME to match OpenBLASConfig.cmake
7 years ago
Martin Kroeker
2cfa86b406
Merge pull request #1707 from extrowerk/haiku_support
Haiku supporting patches
7 years ago
Scott Thornton
2a9a9389ef
Added target_include_directories()
7 years ago
Zoltán Mizsei
6463bffd59
Haiku supporting patches
7 years ago
Martin Kroeker
8ef7d4fb54
Merge pull request #1706 from oon3m0oo/develop
Fix #1705 where we incorrectly calculate page locations.
7 years ago
Craig Donner
6400868e55
Fix #1705 where we incorrectly calculate page locations.
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly. Now we detect if we've
found enough pages for the allocation and terminate the loop.
7 years ago
Scott Thornton
8ebf541e97
Set EXPORT_NAME to match OpenBLASConfig.cmake
7 years ago
Martin Kroeker
b03ae3f4dc
Set version to 0.3.3.dev
7 years ago
Martin Kroeker
2cc8fb0ad2
Set version to 0.3.3.dev
7 years ago
Martin Kroeker
e8a68ef261
Merge pull request #1702 from xianyi/develop
Merge develop for 0.3.2
7 years ago
Martin Kroeker
64826a0d7d
Merge branch 'release-0.3.0' into develop
7 years ago
Martin Kroeker
25f2d25cfe
Merge pull request #1697 from martin-frbg/issue1696
Do not treat Windows UWB builds as cross-compiling
7 years ago
Martin Kroeker
73131fa30a
Do not treat Windows UWB builds as cross-compiling
7 years ago
Martin Kroeker
66fcdd5be8
Merge pull request #1695 from martin-frbg/issue1692
Unset memory table entry, not just the local pointer to it on shutdown
7 years ago
Martin Kroeker
43ac839c16
Unset memory table entry, not just the temporary pointer to it on shutdown
to fix crash with multiple instances of OpenBLAS, #1692
7 years ago
Martin Kroeker
7ba5936ecd
Merge pull request #1688 from martin-frbg/issue1673
Temporarily disable special handling of OPENMP thread memory allocation
7 years ago
Martin Kroeker
b14f44d2ad
Temporarily disable special handling of OPENMP thread memory allocation
for issue #1673
7 years ago
Martin Kroeker
e71d70ba87
Merge pull request #1681 from martin-frbg/issue1671
Add cpu identification via mfpvr call for the BSDs
7 years ago