OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Craig Donner	c39fab3483	Support arbitrary numbers of threads for memory allocation. When I originally refactored memory.c to reduce locking, I made the (incorrect) assumption that all threads were managed by OpenBLAS. The recent Issues we've seen show that really, any caller can make its own threads and call into OpenBLAS; they don't all come from blas_server. Thus we have to be able to support an arbitrary number of threads that can come in any time. The original implementation (before my changes) dealt with this by having a single allocation table, and everyone had to lock to get access to and update it, which was expensive. Moving to thread-local allocation tables was much faster, but now we have to deal with the fact that thread local storage might not be cleaned up. This change gives each thread its own local allocation table, and completely does away with the global table. We cleanup allocations using pthreads' key destructor and Win32's DLL_THREAD_DETACH. This change also removes compiler TLS, which in the end, wasn't really worth it given the issues with the glibc implementation. The overall performance impact was < 1%, anyway. Removing it also simplifies the code.	8 years ago
Martin Kroeker	66da7677bd	Merge pull request #1721 from fenrus75/ddot2 Add an AVX512 enabled DDOT function	8 years ago
Arjan van de Ven	7932ff3ea9	Add an AVX512 enabled DDOT function written in C intrinsics for best readability. (the same C code works for Haswell as well) For logistical reasons the code falls back to the existing haswell AVX2 implementation if the GCC or LLVM compiler is not new enough	8 years ago
Martin Kroeker	62f4c69708	Merge pull request #1717 from martin-frbg/issue1708 Add workaround for avx512 compilations on Cygwin	8 years ago
Martin Kroeker	73478664d4	Add workaround for avx512 compilations on Cygwin fixes #1708	8 years ago
Martin Kroeker	ee955757f9	Merge pull request #1715 from stevengj/patch-1 fix blasabs for windows	8 years ago
Steven G. Johnson	48610a4524	fix blasabs for windows Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.	8 years ago
Martin Kroeker	4a553e8678	Merge pull request #1713 from martin-frbg/issue1710 Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64	8 years ago
Martin Kroeker	e788102c10	Merge pull request #1709 from stevengj/patch-1 fabs -> fabsl	8 years ago
Martin Kroeker	165f00c159	fabs -> fabsl	8 years ago
Martin Kroeker	40c068a875	Introduce blasabs() to switch between abs() and labs() for INTERFACE64	8 years ago
Martin Kroeker	933896a1d0	Use blasabs to switch between abs and labs as needed for INTERFACE64	8 years ago
Steven G. Johnson	a4e321400b	fabs -> fabsl Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.	8 years ago
Martin Kroeker	9e65430504	Merge pull request #1703 from wsttiger/cmake_fix Set EXPORT_NAME to match OpenBLASConfig.cmake	8 years ago
Martin Kroeker	2cfa86b406	Merge pull request #1707 from extrowerk/haiku_support Haiku supporting patches	8 years ago
Scott Thornton	2a9a9389ef	Added target_include_directories()	8 years ago
Zoltán Mizsei	6463bffd59	Haiku supporting patches	8 years ago
Martin Kroeker	8ef7d4fb54	Merge pull request #1706 from oon3m0oo/develop Fix #1705 where we incorrectly calculate page locations.	8 years ago
Craig Donner	6400868e55	Fix #1705 where we incorrectly calculate page locations. Since we now use an allocation size that isn't a multiple of PAGESIZE, finding the pages for run_bench wasn't terminating properly. Now we detect if we've found enough pages for the allocation and terminate the loop.	8 years ago
Scott Thornton	8ebf541e97	Set EXPORT_NAME to match OpenBLASConfig.cmake	8 years ago
Martin Kroeker	b03ae3f4dc	Set version to 0.3.3.dev	8 years ago
Martin Kroeker	2cc8fb0ad2	Set version to 0.3.3.dev	8 years ago
Martin Kroeker	64826a0d7d	Merge branch 'release-0.3.0' into develop	8 years ago
Martin Kroeker	25f2d25cfe	Merge pull request #1697 from martin-frbg/issue1696 Do not treat WIndows UWB builds as cross-compiling	8 years ago
Martin Kroeker	73131fa30a	Do not treat WIndows UWB builds as cross-compiling	8 years ago
Martin Kroeker	66fcdd5be8	Merge pull request #1695 from martin-frbg/issue1692 Unset memory table entry, not just the local pointer to it on shutdown	8 years ago
Martin Kroeker	43ac839c16	Unset memory table entry, not just the temporary pointer to it on shutdown to fix crash with multiple instances of OpenBLAS, #1692	8 years ago
Martin Kroeker	7ba5936ecd	Merge pull request #1688 from martin-frbg/issue1673 Temporarily disable special handling of OPENMP thread memory allocation	8 years ago
Martin Kroeker	b14f44d2ad	Temporarily disable special handling of OPENMP thread memory allocation for issue #1673	8 years ago
Martin Kroeker	e71d70ba87	Merge pull request #1681 from martin-frbg/issue1671 Add cpu identification via mfpvr call for the BSDs	8 years ago
Martin Kroeker	d671870f5f	Merge pull request #1684 from martin-frbg/issue1672 Work around utest failures in the MIPS64 SICORTEX target	8 years ago
Martin Kroeker	4e103c822c	typo fix	8 years ago
Martin Kroeker	d2142760e0	Fix precision problem in DSDOT	8 years ago
Martin Kroeker	2fbfc64da8	Use C kernels for default c/zAXPY, xROT, c/zSWAP	8 years ago
Martin Kroeker	8d5b33b6be	Add cpu identification via mfpvr call for the BSDs fixes #1671	8 years ago
Martin Kroeker	36aea5ce2d	Merge pull request #1680 from martin-frbg/snprint Fix wrong redefinitions of snprintf for older MSVC	8 years ago
Martin Kroeker	1309711e24	Fix declaration of snprintf for older MSVC _snprintf_s takes an additional (size) argument, so is no direct replacement. (Note that this code is currently unused - the two instances of snprintf here are within ifdef blocks that are not compiled for MSVC)	8 years ago
Martin Kroeker	571e9de2ac	Fix definition of snprintf for MSVC MS _snprintf_s takes an additional argument for the size of the buffer, so is not a direct replacement (utest/ctest.h from which I copied was wrong)	8 years ago
Martin Kroeker	448ed15115	Merge pull request #1678 from martin-frbg/issue1677 Define snprintf for older versions of MSVC	8 years ago
Martin Kroeker	045fb5ea2c	Define snprintf for older versions of MSVC for #1677	8 years ago
Martin Kroeker	4dd70d98d7	Merge pull request #1667 from xianyi/revert-1642-develop Revert "Rewrite &= -> = and simplify the initial blocking phase."	8 years ago
Martin Kroeker	504310eeb9	Merge pull request #1665 from martin-frbg/cpuid-ryzen2 Add cpuid for AMD Ryzen 2	8 years ago
Martin Kroeker	ea1f39518f	Merge pull request #1663 from martin-frbg/issue1641 Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave	8 years ago
Martin Kroeker	5f2a3c05cd	Revert "Rewrite &= -> = and simplify the initial blocking phase."	8 years ago
Martin Kroeker	d0ec4325cf	Add cpuid for AMD Ryzen 2	8 years ago
Martin Kroeker	3f73e8b8cf	Add cpuid for AMD Ryzen 2 for #1664	8 years ago
Martin Kroeker	a83f01e0ee	Merge pull request #1662 from martin-frbg/cmake-avx512 Add -march=skylake-avx512 to AVX512 compile check and suppress its ou…	8 years ago
Martin Kroeker	a49203b48c	Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave for #1641	8 years ago
Martin Kroeker	b74aef2816	Add -march=skylake-avx512 to AVX512 compile check and suppress its output	8 years ago
Martin Kroeker	a9fa805007	Merge pull request #1660 from martin-frbg/issue1659 Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2	8 years ago
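
Commit c39fab3483 above describes giving each thread its own allocation table and cleaning it up through a pthreads key destructor. The following is a minimal sketch of that general technique only, not the code from OpenBLAS's memory.c: the names alloc_table_t, TABLE_SLOTS, get_table, destroy_table and run_demo are hypothetical and chosen purely for illustration, and the Win32 DLL_THREAD_DETACH path mentioned in the commit is not covered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define TABLE_SLOTS 4            /* hypothetical table size, for illustration */
#define MAX_DEMO_THREADS 16      /* hypothetical cap used only by run_demo() */

/* Hypothetical per-thread allocation table: each slot owns one buffer. */
typedef struct {
    void *slots[TABLE_SLOTS];
} alloc_table_t;

static pthread_key_t table_key;
static pthread_once_t table_key_once = PTHREAD_ONCE_INIT;

/* Demo-only counter so the effect of the destructor can be observed. */
static int tables_destroyed = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Key destructor: called by pthreads at thread exit with the thread's
 * non-NULL value for table_key. Frees the buffers, then the table. */
static void destroy_table(void *p)
{
    alloc_table_t *t = p;
    for (int i = 0; i < TABLE_SLOTS; i++)
        free(t->slots[i]);       /* free(NULL) is a no-op */
    free(t);
    pthread_mutex_lock(&counter_lock);
    tables_destroyed++;
    pthread_mutex_unlock(&counter_lock);
}

/* Create the key exactly once, registering the destructor. */
static void make_key(void)
{
    int rc = pthread_key_create(&table_key, destroy_table);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's table, creating it lazily on first use.
 * No lock is needed because no other thread can see this table. */
static alloc_table_t *get_table(void)
{
    pthread_once(&table_key_once, make_key);
    alloc_table_t *t = pthread_getspecific(table_key);
    if (t == NULL) {
        t = calloc(1, sizeof *t);
        if (t != NULL && pthread_setspecific(table_key, t) != 0) {
            free(t);
            t = NULL;
        }
    }
    return t;
}

/* Stand-in for an arbitrary caller-created thread calling into the library. */
static void *worker(void *arg)
{
    (void)arg;
    alloc_table_t *t = get_table();
    if (t != NULL)
        t->slots[0] = malloc(64);
    return NULL;
}

/* Start nthreads workers, join them, and return how many per-thread
 * tables were destroyed by the key destructor during this call. */
int run_demo(int nthreads)
{
    pthread_t tid[MAX_DEMO_THREADS];
    int started = 0;
    int before = tables_destroyed;

    if (nthreads > MAX_DEMO_THREADS)
        nthreads = MAX_DEMO_THREADS;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[started], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return tables_destroyed - before;
}
```

Calling run_demo(4) from a program built with -pthread should report one destroyed table per worker thread, because the destructor runs as each thread exits and before pthread_join returns. Note that the destructor only runs for threads that actually terminate through pthreads; a process that exits with threads still running would need separate cleanup.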

3099 Commits (c39fab3483301a5801dbac0425ad41ef932ddb5c)
