Martin Kroeker
19fefd100e
Merge pull request #3703 from martin-frbg/omp_adaptive
Add env variable OMP_ADAPTIVE to control OMP threadpool behaviour
3 years ago
Martin Kroeker
d0ba257de0
Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch
LoongArch64: Add DYNAMIC_ARCH support
3 years ago
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
3 years ago
Martin Kroeker
80cdfed7b2
Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size
3 years ago
Martin Kroeker
08e3754b39
Add environment variable OMP_ADAPTIVE
3 years ago
Martin Kroeker
30473b6a9d
add openblas_getaffinity()
3 years ago
Martin Kroeker
daca01622b
fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH (#3700)
* fix detection of Neoverse V1 and user-enforced selection of N2
3 years ago
Honglin Zhu
d5ca477f42
Neoverse N2: DYNAMIC_ARCH
3 years ago
Martin Kroeker
69148ae795
Guard against sysconf returning zero processors
3 years ago
Martin Kroeker
e9260f5451
Guard against system call returning zero processors
3 years ago
Martin Kroeker
2c62096fce
Expand cpu mapping for future Zen cpus and use feature-based fallback for unknown AMD family codes
3 years ago
Adam Niederer
69f2ac4ea2
Fix broken elif in dynamic.c
This fixes compilation in the following case:
$(MAKE) USE_OPENMP=1 USE_THREAD=1 NO_LAPACK=0 DYNAMIC_ARCH=1 \
DYNAMIC_LIST="HASWELL SKYLAKEX ATOM COOPERLAKE SAPPHIRERAPIDS ZEN"
3 years ago
Martin Kroeker
8d5a9c2f98
Merge pull request #3565 from jonaszhou1/develop
Support Zhaoxin/Centaur kh40000 as ZEN
3 years ago
Martin Kroeker
bf4642eb7e
Report USE_TLS if set
3 years ago
JonasZhou
2d0ad89b0d
Support Zhaoxin/Centaur kh40000 as ZEN
Signed-off-by: JonasZhou <JonasZhou@zhaoxin.com>
3 years ago
Martin Kroeker
fa3e9f25e6
Support AVX512-enabled Alder Lake
4 years ago
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
4 years ago
Martin Kroeker
7f0b11fbc1
Exclude some complex drivers when NO_LAPACK is set
4 years ago
Martin Kroeker
b6b024232d
Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
4 years ago
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
4 years ago
Martin Kroeker
b329e45288
Guard against omp_get_num_places returning zero
4 years ago
Martin Kroeker
07fe5b19a4
typecast function pointers
4 years ago
Martin Kroeker
6ed52576f8
Add feature-based fallback for unknown x86_64 cpus
4 years ago
Martin Kroeker
7a7fbb11c3
define "unlikely" on non-cygwin too
4 years ago
Martin Kroeker
b31349c22a
Open up delayed (re)init to non-Cygwin OS as well
4 years ago
Martin Kroeker
c8d05aa7a5
Move the threads overflow flag under the protection of the local blas lock (#3476)
* Move accesses to the overflow flag into the scope of the blas lock
4 years ago
Rafael Cardoso Fernandes Sousa
214fbcee15
Fix cmake for power
4 years ago
Martin Kroeker
4f057bffd6
Fix NULL pointer checks in blas_memory_alloc
4 years ago
Martin Kroeker
08f8bb66c0
Add CPUIDs for Alder Lake and other recent Intel cpus
4 years ago
Martin Kroeker
efb16fafb0
Fix miscounting of threadpool size on Linux with OMP_PROC_BIND=TRUE (#3437)
* return OMP places (if available, or _SC_NPROCESSORS_CONF) for maximum thread count when built with OpenMP
4 years ago
Marius Hillenbrand
77747bc536
cpuid_zarch/hwcaps: add documentation and dump hwcaps in init
Add pointers to the definition of the hardware capability flags in glibc
and describe how they relate to the levels CPU_Z13 and CPU_Z14 for
optimized kernels.
To aid in identifying available hardware capabilities and in debugging
potential build issues, dump their value in dynamic_arch_init() when
OPENBLAS_VERBOSE is set to 2 or higher.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
4 years ago
Martin Kroeker
22a616bd8f
Add model number for Tiger Lake H (mobile variant)
4 years ago
Marius Hillenbrand
44950ca173
s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice
On s390x, the run-time detection for DYNAMIC_ARCH and the compile-time
choice in cpuid_zarch use different methods for identifying the
supported CPU features. To make cpuid_zarch future-proof and easier
to maintain, switch cpuid_zarch to the same mechanism as DYNAMIC_ZARCH
(i.e., derive the supported CPU features from hwcap flags) and share
code between both (in a new header cpuid_zarch.h).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
4 years ago
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
4 years ago
Rafael Cardoso Fernandes Sousa
0e8b4adf22
Remove unused commented code (#if directive)
4 years ago
Martin Kroeker
fa8bf57768
Merge pull request #3380 from martin-frbg/structwarn
Remove extraneous qualifiers from struct definition
4 years ago
Martin Kroeker
dd09f0173e
Remove extraneous qualifiers from struct definition
4 years ago
Martin Kroeker
2f8220d757
Add sbgemm
4 years ago
Martin Kroeker
5f6a609253
Add sbgemv
4 years ago
Wangyang Guo
045ed5c91d
sbgemm: fix build error when BFLOAT16 is disabled
4 years ago
Wangyang Guo
8356a604f0
sbgemm: cooperlake: tuning for block params
4 years ago
Martin Kroeker
cd10d1c03b
Fix typo
4 years ago
Martin Kroeker
2db1a99aca
Clean up debug messages
4 years ago
Martin Kroeker
89fc5b8f4f
Fix unmap logic
4 years ago
Martin Kroeker
7fd12a5e69
Add likely() hints for gcc
4 years ago
Martin Kroeker
2ba9a567aa
Fix typo
4 years ago
Martin Kroeker
b4b952eece
Add auxiliary tracking space for thread buffer frees too
4 years ago
Martin Kroeker
7d1becc575
Allocate an auxiliary struct when running out of preconfigured threads
4 years ago
Martin Kroeker
898212efcd
Actually add the message to the TLS section
4 years ago
Martin Kroeker
210a1584c5
Rebase source and edit TLS version of the message as well
4 years ago
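
Several entries in this log guard against a processor-count query (sysconf, omp_get_num_places) returning zero. The following is a minimal sketch of that kind of guard, not the code from those commits: the helper names clamp_proc_count and guarded_num_procs are hypothetical, and the fallback value of 1 is an assumption chosen for illustration.

```c
#include <unistd.h>

/* Hypothetical helper: treat a zero or negative count as "unknown"
 * and fall back to a single processor (assumed fallback value). */
static int clamp_proc_count(long raw) {
    return raw > 0 ? (int)raw : 1;
}

/* Hypothetical wrapper: query the configured processor count and
 * pass it through the guard. sysconf() may return -1 on error, and
 * the commits listed here also guard against a result of 0. */
static int guarded_num_procs(void) {
    return clamp_proc_count(sysconf(_SC_NPROCESSORS_CONF));
}
```

A caller would size its thread pool from guarded_num_procs() rather than from the raw sysconf result, so a zero or error return can never produce an empty pool or a division by zero.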