Martin Kroeker
03a2bf2602
Fix potential memory leak in cpu enumeration on Linux (#2008)
* Fix potential memory leak in cpu enumeration with glibc
An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. Wrong calculation of the size argument in that call increased the likelihood of that failure. Fixes #2003
7 years ago
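The leak pattern described in this commit can be illustrated with a minimal sketch. It is not the OpenBLAS code: `count_usable_cpus` is a hypothetical helper, and falling back to the configured CPU count is an assumption made for the example. It shows a single exit path that always frees the set, and a size argument taken from `CPU_ALLOC_SIZE` so that it matches the dynamically allocated set.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not the OpenBLAS function: count the CPUs this
   process may run on, falling back to the configured count on failure. */
int count_usable_cpus(void) {
    long configured = sysconf(_SC_NPROCESSORS_CONF);
    if (configured < 1) return 1;

    cpu_set_t *set = CPU_ALLOC((size_t)configured);
    if (set == NULL) return (int)configured;

    /* Size in bytes of the allocated set, not sizeof(cpu_set_t). */
    size_t size = CPU_ALLOC_SIZE((size_t)configured);

    int result;
    if (sched_getaffinity(0, size, set) != 0) {
        result = (int)configured;   /* remember the fallback, do not return yet */
    } else {
        result = CPU_COUNT_S(size, set);
        if (result < 1) result = (int)configured;
    }

    CPU_FREE(set);                  /* the only exit after allocation frees the set */
    return result;
}
```

Keeping one exit path after the allocation is a design choice for the sketch; freeing the set before each early return would work equally well.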
Martin Kroeker
69edc5bbe7
Restore dropped patches in the non-TLS branch of memory.c (#2004)
* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002, the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2, thereby dropping the commits for #1450, #1468, #1501, #1504 and #1520.
7 years ago
caiyu
29dc72889f
Add support for Hygon Dhyana
7 years ago
Martin Kroeker
dbc9a060ef
Fix missing braces in support_av() call
7 years ago
Martin Kroeker
21c0f2af7b
Merge pull request #1957 from martin-frbg/issue1954
Move TLS key deletion to openblas_quit
7 years ago
Martin Kroeker
ad2c386d6a
Move TLS key deletion to openblas_quit
fixes #1954 (as suggested by thrasibule in that issue)
7 years ago
Martin Kroeker
31ed19e8b9
Add message for SkylakeX and KNL fallbacks to Haswell
7 years ago
Martin Kroeker
e1574fa2b4
Add xcr0 (os support) check
7 years ago
Martin Kroeker
ae1d1f74f7
Query AVX2 and AVX512 capability for runtime cpu selection
7 years ago
Martin Kroeker
8643521127
Merge pull request #1943 from martin-frbg/issue1748
Re-enable loop unrolling in trmv and remove the scary warning
7 years ago
Martin Kroeker
5a720cf9ca
Re-enable loop unrolling in trmv and remove the scary warning
fixes #1748 as that half of the fix for #1332 appears to have been an overreaction on my part.
7 years ago
Martin Kroeker
ccd5945d38
Merge pull request #1942 from martin-frbg/issue1720
Delete the pthread key on cleanup in TLS mode
7 years ago
Martin Kroeker
bba1e67269
Delete the pthread key on cleanup in TLS mode
to avoid a crash when OpenBLAS was loaded via dlopen and libc tries to clean up the leaked TLS after dlclose
Fixes #1720
7 years ago
Martin Kroeker
f343ed65b5
Avoid taking the root of a negative number
Fixes #1924 where numpy 1.17+ would report the (transient) FE_INVALID exception raised for the domain error.
7 years ago
Martin Kroeker
0bf6d74e5f
Fix typo in previous commit for arm dynamic arch
7 years ago
Martin Kroeker
2b355592e3
Make sure to use the arm version of dynamic.c in ARM64 DYNAMIC_ARCH
cf. #1908
7 years ago
Andrew
2601cd58ab
Remove surplus locking code, only enabled with x86, disabled or never enabled on all others
7 years ago
Martin Kroeker
97d7298973
call it OpenBLAS not just version
7 years ago
Martin Kroeker
de0d0ed52f
Improve formatting of config output
7 years ago
Martin Kroeker
816775e309
Add version information to openblas_get_config output
7 years ago
Martin Kroeker
f72fdf525c
Merge pull request #1875 from martin-frbg/issue1851
Serialize accesses to parallelized level3 functions from multiple cal…
7 years ago
Martin Kroeker
113cb00b95
fix missing parenthesis
7 years ago
Martin Kroeker
5192651706
Add CriticalSection handling instead of mutexes for Windows
7 years ago
Martin Kroeker
2e6fae2aad
Serialize accesses to parallelized level3 functions from multiple callers
for #1851
7 years ago
Martin Kroeker
368d14f8c8
Fix harmless typo
fixes #1872
7 years ago
Martin Kroeker
0427277cef
Allow optimization for small m, large n only if it can be made threadsafe
otherwise the introduction of a static array in 8e5a108 to improve #532 breaks concurrent calls from multiple threads as seen in #1844
7 years ago
Arjan van de Ven
5b708e5eb1
sgemm/dgemm: add a way for an arch kernel to specify preferred sizes
The current gemm threading code can make very unfortunate choices, for
example on my 10 core system a 1024x1024x1024 matrix multiply ends up
chunking into blocks of 102... which is not a vector friendly size
and performance ends up horrible.
This patch adds a helper define where an architecture can specify
a preference for size multiples.
This is different from existing defines that are minimum sizes and such.
The performance increase with this patch for the 1024x1024x1024 sgemm
is 2.3x (!!)
7 years ago
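The idea of a preferred size multiple can be sketched as follows. This is an illustration only: `chunk_width` is a hypothetical helper rather than the define added by the patch, and the multiple of 16 in the usage note is an assumed value.

```c
#include <stddef.h>

/* Hypothetical helper: split `total` columns across `nthreads` workers and
   round each chunk up to a multiple of `preferred` (assumed to be > 0). */
long chunk_width(long total, long nthreads, long preferred) {
    /* Ceiling division, so the chunks together cover the whole range. */
    long width = (total + nthreads - 1) / nthreads;

    /* Round up to the next multiple of the preferred size. */
    width = ((width + preferred - 1) / preferred) * preferred;

    /* Never hand out more than there is. */
    if (width > total) width = total;
    return width;
}
```

With 1024 columns and 10 threads, `chunk_width(1024, 10, 16)` gives 112 instead of an unrounded width near 102; the last chunk is simply shorter than the others.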
Martin Kroeker
f5595d0262
Merge pull request #1843 from martin-frbg/aix_numprocs
Add get_num_procs implementation for AIX
7 years ago
Martin Kroeker
326d394a0f
Add get_num_procs implementation for AIX
(and copy HAIKU implementation to the non-TLS version of the code as well)
7 years ago
Erik M. Bray
38cf5d9364
ensure that threading has been initialized in the first place before calling openblas_set_num_threads
7 years ago
Ashwin Sekhar T K
d5aeff636f
ARM64: Enable DYNAMIC_ARCH
Enable DYNAMIC_ARCH feature on ARM64. This patch uses the cpuid
feature in linux kernel to detect the core type at runtime
(https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt).
If this feature is missing in kernel, then the user should use the
OPENBLAS_CORETYPE env variable to select the desired core type.
7 years ago
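The fallback to the OPENBLAS_CORETYPE environment variable might look roughly like the sketch below. `select_coretype` is a hypothetical helper, and both the precedence (environment variable first) and the "ARMV8" default are assumptions for the example, not a description of the actual selection logic.

```c
#include <stdlib.h>

/* Hypothetical selection helper. `detected` is whatever a runtime probe
   reported, or NULL when the kernel does not expose the cpuid feature. */
const char *select_coretype(const char *detected) {
    const char *forced = getenv("OPENBLAS_CORETYPE");

    /* Assumption: a non-empty user setting takes precedence. */
    if (forced != NULL && forced[0] != '\0') return forced;

    if (detected != NULL) return detected;

    /* Assumed generic fallback for this sketch. */
    return "ARMV8";
}
```

A user on a kernel without the feature would then run, for example, `OPENBLAS_CORETYPE=THUNDERX2T99 ./app`, where `./app` stands for any program linked against the library.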
Ashwin Sekhar T K
d50abc8903
ARM64: Move parameters from parameter.c to param.h
Remove the runtime setting of P, Q, R parameters for
targets ARMV8, THUNDERX2T99. Instead set them as constants
in param.h at compile time.
7 years ago
Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the neon implementations
written for THUNDERX2T99 target which are up to 6x faster for
certain routines.
7 years ago
Andrew
3439158dea
address #1782 2nd loop
7 years ago
Martin Kroeker
28aa94bf4b
Include thread numbers in failure message from blas_thread_init
to aid in debugging cases like #1767
7 years ago
Martin Kroeker
1ad1e79062
Catch inadvertent USE_TLS=0 declaration
for #1766
7 years ago
Martin Kroeker
b402626509
Do not use the new TLS code for non-threaded builds even if USE_TLS is set
Workaround for #1761 as that exposed a problem in the new code (which was intended to speed up multithreaded code only anyway).
7 years ago
Martin Kroeker
b55690a659
typo fix
7 years ago
Martin Kroeker
b902a40986
Rewrite glibc version check
7 years ago
Martin Kroeker
5991d1a6cd
Update memory.c
7 years ago
Martin Kroeker
b1b743f434
Merge branch 'develop' into interim033
7 years ago
Martin Kroeker
fd42ca462d
Combo of default pre-0.3.1 memory.c and band-aided version of PR1739
7 years ago
Zoltán Mizsei
6463bffd59
Haiku supporting patches
7 years ago
Martin Kroeker
8ef7d4fb54
Merge pull request #1706 from oon3m0oo/develop
Fix #1705 where we incorrectly calculate page locations.
7 years ago
Craig Donner
6400868e55
Fix #1705 where we incorrectly calculate page locations.
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly. Now we detect if we've
found enough pages for the allocation and terminate the loop.
7 years ago
Martin Kroeker
66fcdd5be8
Merge pull request #1695 from martin-frbg/issue1692
Unset memory table entry, not just the local pointer to it on shutdown
7 years ago
Martin Kroeker
43ac839c16
Unset memory table entry, not just the temporary pointer to it on shutdown
to fix crash with multiple instances of OpenBLAS, #1692
7 years ago
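The difference between clearing a temporary pointer and clearing the table entry can be shown with a small sketch. The table, its size, and `shutdown_table` are hypothetical stand-ins, not the structures used in memory.c.

```c
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 4   /* illustrative size, not the OpenBLAS value */

static void *table[TABLE_SIZE];   /* hypothetical per-process table */

/* Frees every entry and clears the table slot itself. */
void shutdown_table(void) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        void *entry = table[i];
        if (entry != NULL) {
            free(entry);
            /* Setting only `entry` to NULL would leave table[i] dangling,
               so a later shutdown or lookup would touch freed memory. */
            table[i] = NULL;
        }
    }
}
```

Because the slot itself is reset, calling `shutdown_table` a second time is harmless, which is the property that matters when more than one instance of a library runs its cleanup.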
Martin Kroeker
7ba5936ecd
Merge pull request #1688 from martin-frbg/issue1673
Temporarily disable special handling of OPENMP thread memory allocation
7 years ago
Martin Kroeker
b14f44d2ad
Temporarily disable special handling of OPENMP thread memory allocation
for issue #1673
7 years ago
Martin Kroeker
36aea5ce2d
Merge pull request #1680 from martin-frbg/snprint
Fix wrong redefinitions of snprintf for older MSVC
7 years ago