Martin Kroeker
90f890ee67
fix improper function prototypes (empty parentheses) (USE_TLS branch)
2 years ago
Martin Kroeker
cf2174fb69
fix improper function prototypes (empty parentheses)
2 years ago
Martin Kroeker
c6b1d8e7a3
fix improper function prototypes (empty parentheses)
2 years ago
Martin Kroeker
7e939fb831
Fix handling of additional buffer structures in case of overflow
2 years ago
Tiziano Müller
6a611db560
memory: show correct number of max threads
2 years ago
Martin Kroeker
c2f4bdbbb4
Merge pull request #4163 from martin-frbg/issue4017
Rework OpenMP thread count limit handling
2 years ago
Martin Kroeker
9ff84dc3f2
remove unused status variable
2 years ago
Martin Kroeker
3326b924b3
remove status variable blas_num_threads_set; initialize openmp thread maximum on startup
2 years ago
Chris Sidebottom
f971ef55f2
Add ARMV8SVE to AArch64 Dynamic Dispatch
In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this
generically detects SVE support and enables it. This should better manage
the size and complexity of dynamic dispatch rather than just copy-pasting
the same parameters.
To make `ARMV8SVE` more representative of the common 128-bit SVE case,
I've split it and similar parameters from A64FX, which has the wider
512-bit SVE.
2 years ago
Martin Kroeker
3bdcf3259d
Merge branch 'xianyi:develop' into issue4101
2 years ago
Martin Kroeker
b34f19a365
Ensure that a premature call to set_num_threads will not overwrite unrelated memory
2 years ago
Martin Kroeker
66904f8148
Ensure that a premature call will not overwrite unrelated memory
2 years ago
Martin Kroeker
5c58994eb2
Add fallback warning
2 years ago
Martin Kroeker
ca7199f249
Treat newer Neoverse as N1 if SVE unavailable (may be disabled in container/cloud env)
2 years ago
Martin Kroeker
616fdea82a
Revert "Improve Windows threading performance scaling"
3 years ago
Mark Seminatore
d6991dd230
fix missing #endif
3 years ago
Mark Seminatore
7783a9af02
attempt to fix old mingw gcc issue
3 years ago
Mark Seminatore
8caabc5982
fix #4063 remove unused pool_lock
3 years ago
Mark Seminatore
d301649430
fix #4063 threading perf issues on Windows
3 years ago
Honglin Zhu
9e80a194d6
Fix dynamic_list build and gcc version check error
3 years ago
Honglin Zhu
0b83088887
spr dynamic arch support
3 years ago
Martin Kroeker
e5538a62cb
Add suggestions to NUM_THREADS/auxiliary buffer message
3 years ago
Martin Kroeker
579bc86671
remove call to omp_set_num_threads
3 years ago
Martin Kroeker
e298d613fa
initialize status variable for openblas_set_num_threads
3 years ago
Martin Kroeker
05aa88268f
add status variable for openblas_set_num_threads
3 years ago
Martin Kroeker
e38ab079a0
Fix OpenMP thread counting returning places rather than cores
3 years ago
Martin Kroeker
d4868babbc
Fix typos
3 years ago
Martin Kroeker
18c99d3e63
Update dynamic_arm64.c
3 years ago
Martin Kroeker
186a310f92
Update dynamic_arm64.c
3 years ago
Martin Kroeker
da6e426b13
fix Cooperlake not selectable via environment variable
3 years ago
Martin Kroeker
ab6009b0b6
Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads
Add `OPENBLAS_DEFAULT_NUM_THREADS`
3 years ago
Martin Kroeker
db50ab4a72
Add BUILD_vartype defines
3 years ago
Elliot Saba
d2ce93179f
Add `OPENBLAS_DEFAULT_NUM_THREADS`
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.
The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`. Creating a new
environment variable that is openblas-specific and is not already
publicly used to control the overall number of threads of programs like
Julia seems to be the best way forward.
[0] https://github.com/JuliaLang/julia/pull/46844
3 years ago
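The fallback behaviour described in the `OPENBLAS_DEFAULT_NUM_THREADS` commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names `parse_thread_count`, `choose_thread_count` and `thread_count_from_env` are hypothetical, and the order in which the explicit variables are consulted is an assumption. The only point taken from the commit message is that the default applies when no other thread count is specified.

```c
#include <stdlib.h>

/* Parse a thread count from an environment-style string.
   Returns 0 when the string is missing, malformed, or not positive. */
int parse_thread_count(const char *text) {
    if (text == NULL || *text == '\0') return 0;
    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (*end != '\0' || value <= 0 || value > (1L << 20)) return 0;
    return (int)value;
}

/* Hypothetical selection logic: the first explicit setting that parses
   wins, the default is used only when none of them is set, and the
   detected core count is the last resort. */
int choose_thread_count(const char *const *explicit_values, int n_explicit,
                        const char *default_value, int detected_cores) {
    for (int i = 0; i < n_explicit; i++) {
        int n = parse_thread_count(explicit_values[i]);
        if (n > 0) return n;
    }
    int n = parse_thread_count(default_value);
    if (n > 0) return n;
    return detected_cores > 0 ? detected_cores : 1;
}

/* Convenience wrapper reading the process environment.
   The lookup order of the explicit variables is an assumption. */
int thread_count_from_env(int detected_cores) {
    const char *explicit_values[] = {
        getenv("OPENBLAS_NUM_THREADS"),
        getenv("GOTO_NUM_THREADS"),
        getenv("OMP_NUM_THREADS"),
    };
    return choose_thread_count(explicit_values, 3,
                               getenv("OPENBLAS_DEFAULT_NUM_THREADS"),
                               detected_cores);
}
```

With this shape, a host application such as Julia could export `OPENBLAS_DEFAULT_NUM_THREADS=1` and still have a user-supplied `OMP_NUM_THREADS` take precedence.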
Kai T. Ohlhus
84453b924f
Support CONSISTENT_FPCSR on AARCH64
3 years ago
Martin Kroeker
9402df5604
Fix missing external declaration
3 years ago
Martin Kroeker
bd30120ba7
Merge pull request #3720 from FlyGoat/mips64
Make it work on general MIPS64 processors
3 years ago
Jiaxun Yang
fae9368f14
Implement DYNAMIC_LIST for MIPS64
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
It is really dangerous to fall back to the Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Jiaxun Yang
b633eb79f2
Use $at as temporary register for mips/loongson CPUCFG read
Some compilers (namely LLVM) are not happy with clobbering
registers in inline assembly.
Use $at as temporary register and explicitly use noat
hint.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Martin Kroeker
19fefd100e
Merge pull request #3703 from martin-frbg/omp_adaptive
Add env variable OMP_ADAPTIVE to control OMP threadpool behaviour
3 years ago
Jiaxun Yang
19d4f90c44
Use auxv to detect CPUCFG on mips/loongson
It's safer and easier than SIGILL.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Martin Kroeker
d0ba257de0
Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch
LoongArch64: Add DYNAMIC_ARCH support
3 years ago
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
3 years ago
Martin Kroeker
80cdfed7b2
Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size
3 years ago
Martin Kroeker
08e3754b39
Add environment variable OMP_ADAPTIVE
3 years ago
Martin Kroeker
30473b6a9d
add openblas_getaffinity()
3 years ago
Martin Kroeker
daca01622b
fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH (#3700)
3 years ago
Honglin Zhu
d5ca477f42
Neoverse N2: DYNAMIC_ARCH
4 years ago
Martin Kroeker
69148ae795
Guard against sysconf returning zero processors
4 years ago
Martin Kroeker
e9260f5451
Guard against system call returning zero processors
4 years ago
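The two guard commits above address a processor query that reports zero. A minimal sketch of that kind of guard, assuming a POSIX-like system where `sysconf(_SC_NPROCESSORS_CONF)` is available, might look like this. The helper names `clamp_processor_count` and `detect_processor_count` are hypothetical, and the choice of `_SC_NPROCESSORS_CONF` is an assumption rather than something stated in the commit messages.

```c
#include <unistd.h>

/* Treat zero or negative results (including the -1 error return of
   sysconf) as a single processor, so later code never divides by zero
   or sizes a buffer for zero threads. */
int clamp_processor_count(long reported) {
    return reported > 0 ? (int)reported : 1;
}

/* Query the configured processor count and apply the guard.
   Which sysconf name is queried is an assumption of this sketch. */
int detect_processor_count(void) {
    return clamp_processor_count(sysconf(_SC_NPROCESSORS_CONF));
}
```

Keeping the clamp separate from the system call makes the guard easy to exercise with values that a real machine would rarely report.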