Martin Kroeker
f14435cb4b
Merge pull request #3810 from martin-frbg/fix3800
Add fallbacks to RaptorLake entry from PR3800
3 years ago
Martin Kroeker
1865b15240
Add fallbacks to RaptorLake entry
3 years ago
Martin Kroeker
4743d80c22
Merge pull request #3800 from thrasibule/raptorlake
add raptor lake ids
3 years ago
Martin Kroeker
5d02f2e83e
Merge pull request #3806 from martin-frbg/dyn_coop
Fix OPENBLAS_CORETYPE=COOPERLAKE not working in DYNAMIC_ARCH builds
3 years ago
Martin Kroeker
da6e426b13
fix Cooperlake not selectable via environment variable
3 years ago
Martin Kroeker
62a44c9c5d
Merge pull request #3804 from martin-frbg/issue3803
Remove excess initializer (leftover from rework of PR 3793)
3 years ago
Martin Kroeker
c9d78dc3b2
Remove excess initializer (leftover from rework of PR 3793)
3 years ago
Martin Kroeker
65338a9493
Merge pull request #3799 from bartoldeman/cscal-zscal-no-fma
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
3 years ago
Martin Kroeker
03bd1157d8
Merge pull request #3793 from imzhuhl/new_sbgemm
New sbgemm implementation for Neoverse N2
3 years ago
Guillaume Horel
e27ad3a6cc
add raptor lake ids
3 years ago
Honglin Zhu
79066b6bf3
Change file name to match the norm and delete useless code.
3 years ago
Bart Oldeman
e7e3aa2948
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which
is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases, where
eigenvalue results with/without eigenvectors are compared.
Moreover, using FMAs for multiplication of complex numbers can give surprising
results; see 22aa81f for more information.
This uses the same syntax as used in 22aa81f for zarch (s390x).
3 years ago
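A minimal sketch of the idea in commit e7e3aa2948 above: disable floating-point contraction for the translation unit so the compiler does not fuse the multiplies and adds of a complex scaling loop into FMAs. The pragma spellings are the ones GCC and Clang document for this purpose and are assumed here, not copied from the commit; `cscal_sketch` is a hypothetical name, not the OpenBLAS kernel.

```c
#include <stddef.h>

/* Assumption: these pragmas turn off FP contraction (and thus FMA
 * generation) for the code that follows in this file. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC optimize ("fp-contract=off")
#endif
#if defined(__clang__)
#pragma clang fp contract(off)
#endif

/* Hypothetical reference loop: x[i] *= (alpha_r + i*alpha_i) for n
 * single-precision complex numbers stored as interleaved (re, im) pairs. */
void cscal_sketch(size_t n, float alpha_r, float alpha_i, float *x)
{
    for (size_t i = 0; i < n; i++) {
        float re = x[2 * i];
        float im = x[2 * i + 1];
        /* Each product is rounded separately before the add/subtract. */
        x[2 * i]     = alpha_r * re - alpha_i * im;
        x[2 * i + 1] = alpha_r * im + alpha_i * re;
    }
}
```

With contraction off, the real and imaginary parts are rounded the same way whether or not `-march=haswell` (or similar) is in CFLAGS, which is the consistency the commit message asks for.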
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
3 years ago
Honglin Zhu
843e9fd0b9
Fix typo
3 years ago
Honglin Zhu
b00d5b9746
New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a multiple of 4.
3 years ago
Martin Kroeker
8c10f0abba
Merge pull request #3794 from bartoldeman/benchmark-align-malloc
Benchmarks: align malloc'ed buffers.
3 years ago
Bart Oldeman
9e6b060bf3
Fix comment.
It stores the pointer, not an offset (that would be an alternative approach).
3 years ago
Bart Oldeman
9959a60873
Benchmarks: align malloc'ed buffers.
Benchmarks should allocate with cacheline (often 64 bytes) alignment
to avoid unreliable timings. This technique, which stores the original
pointer just before the aligned buffer, does not rely on C11's
aligned_alloc, so it stays compatible with older compilers.
For example, Glibc's x86_64 malloc returns 16-byte aligned buffers, which is
not sufficient for AVX/AVX2 (32-byte preferred) or AVX512 (64-byte).
3 years ago
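The alignment technique described in commit 9959a60873 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark code itself: the function names `aligned_malloc_sketch` and `aligned_free_sketch` are hypothetical, and, following the follow-up comment fix (9e6b060bf3), the sketch stores the original pointer rather than an offset.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return a buffer of `size` bytes aligned to
 * `alignment`, which must be a power of two and at least sizeof(void *). */
void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    assert(alignment >= sizeof(void *));
    assert((alignment & (alignment - 1)) == 0);

    /* Over-allocate: worst-case padding plus one slot for the raw pointer. */
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave room for the pointer slot, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* Store the pointer returned by malloc just before the aligned buffer. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Hypothetical counterpart: recover the raw pointer and free it. */
void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

A benchmark would call `aligned_malloc_sketch(bytes, 64)` for its matrices and release them with `aligned_free_sketch`. Passing the aligned pointer to plain `free` would be an error, since it is not the address `malloc` returned.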
Martin Kroeker
ad424fce08
Merge pull request #3791 from martin-frbg/issue3790
Fix pkgconfig file generation for INTERFACE64 builds
3 years ago
Martin Kroeker
5f72415f10
Suffix the pkgconfig file itself in INTERFACE64 builds
3 years ago
Martin Kroeker
747ade5adf
fix INTERFACE64/USE64BITINT reporting
3 years ago
Martin Kroeker
8bacea1254
Pass libsuffix to openblas.pc and fix passing of INTERFACE64/USE64BITINT flag
3 years ago
Martin Kroeker
b2523471c9
Add libsuffix support
3 years ago
Martin Kroeker
11b2570c13
Merge pull request #3786 from martin-frbg/issue3784
Disable the gfortran tree vectorizer for lapack-netlib
3 years ago
Martin Kroeker
ab6009b0b6
Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads
Add `OPENBLAS_DEFAULT_NUM_THREADS`
3 years ago
Martin Kroeker
32566bfb44
Disable the gfortran tree vectorizer for netlib LAPACK
3 years ago
Martin Kroeker
57809526c4
Disable the gfortran tree vectorizer for lapack-netlib
3 years ago
Martin Kroeker
eece0dfd14
Merge pull request #3781 from martin-frbg/issue3779
Fix building with only a subset of variable types on Windows
3 years ago
Martin Kroeker
db50ab4a72
Add BUILD_vartype defines
3 years ago
Martin Kroeker
a84a8a7096
Merge pull request #3778 from martin-frbg/issue3775
Fix misdetection of gfortran on Cray systems
3 years ago
Martin Kroeker
79d842047a
Move Cray case after GNU as Cray builds of gfortran have both names in the version string
3 years ago
Martin Kroeker
5e78493d95
Move Cray case after GNU as Cray builds of gfortran have both names in the version string
3 years ago
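Why the order of the checks matters in the Cray fix above can be illustrated with a small sketch. The real detection lives in the build scripts, not in C. The enum, the function name `detect_fortran_sketch`, and the version strings used with it are all hypothetical.

```c
#include <string.h>

typedef enum { FC_UNKNOWN, FC_GFORTRAN, FC_CRAY } fortran_compiler;

/* Hypothetical classifier: test for "GNU" before "Cray", because a Cray
 * build of gfortran may mention both names in its version output. */
fortran_compiler detect_fortran_sketch(const char *version_output)
{
    if (strstr(version_output, "GNU") != NULL)
        return FC_GFORTRAN;
    if (strstr(version_output, "Cray") != NULL)
        return FC_CRAY;
    return FC_UNKNOWN;
}
```

With the two checks swapped, a string containing both names would be classified as the Cray compiler, which is the misdetection the commit title describes.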
Elliot Saba
d2ce93179f
Add `OPENBLAS_DEFAULT_NUM_THREADS`
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.
The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`. Creating a new
environment variable that is openblas-specific and is not already
publicly used to control the overall number of threads of programs like
Julia seems to be the best way forward.
[0] https://github.com/JuliaLang/julia/pull/46844
3 years ago
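The fallback behaviour described in commit d2ce93179f above might look roughly like the sketch below. The helper names are hypothetical, and so is the exact order in which the explicit variables are consulted. The one point taken from the commit message is that `OPENBLAS_DEFAULT_NUM_THREADS` is used only when no other thread count is specified.

```c
#include <stdlib.h>

/* Parse a positive thread count; return 0 for unset or invalid values. */
int parse_count_sketch(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > 65536)
        return 0;
    return (int)v;
}

/* Hypothetical resolver: the first valid explicit setting wins; the
 * default is consulted only if none is set; otherwise use hw_threads. */
int resolve_num_threads_sketch(const char *const *explicit_vals, int n,
                               const char *default_val, int hw_threads)
{
    for (int i = 0; i < n; i++) {
        int v = parse_count_sketch(explicit_vals[i]);
        if (v > 0)
            return v;
    }
    int d = parse_count_sketch(default_val);
    return d > 0 ? d : hw_threads;
}

/* Wire the resolver to the environment (the variable order is assumed). */
int num_threads_from_env_sketch(int hw_threads)
{
    const char *explicit_vals[] = {
        getenv("OPENBLAS_NUM_THREADS"),
        getenv("GOTOBLAS_NUM_THREADS"),
        getenv("OMP_NUM_THREADS"),
    };
    return resolve_num_threads_sketch(explicit_vals, 3,
                                      getenv("OPENBLAS_DEFAULT_NUM_THREADS"),
                                      hw_threads);
}
```

Under this sketch, a host program such as Julia could export `OPENBLAS_DEFAULT_NUM_THREADS=1`, and a user's `OMP_NUM_THREADS=4` would still take precedence.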
Martin Kroeker
8e851160d7
Merge pull request #3772 from siko1056/develop
Support CONSISTENT_FPCSR on aarch64 systems
3 years ago
Martin Kroeker
cf132deb14
Merge pull request #3774 from sashashura/patch-1
GitHub Workflows security hardening
3 years ago
Martin Kroeker
6077d81161
Merge pull request #3777 from martin-frbg/fixmips64generic2
Fix MIPS64_GENERIC copyobj declarations for DYNAMIC_ARCH
3 years ago
Martin Kroeker
f6f35a4288
fix copyobj declarations to work with DYNAMIC_ARCH
3 years ago
Alex
c726604319
build: harden dynamic_arch.yml permissions
Signed-off-by: Alex <aleksandrosansan@gmail.com>
3 years ago
Alex
4de8e1b8f9
build: harden mips64.yml permissions
Signed-off-by: Alex <aleksandrosansan@gmail.com>
3 years ago
Alex
11cd108095
build: harden nightly-Homebrew-build.yml permissions
Signed-off-by: Alex <aleksandrosansan@gmail.com>
3 years ago
Kai T. Ohlhus
c2892f0e31
Makefile.rule: update CONSISTENT_FPCSR documentation
3 years ago
Kai T. Ohlhus
84453b924f
Support CONSISTENT_FPCSR on AARCH64
3 years ago
Martin Kroeker
667d0e0b48
Merge pull request #3771 from martin-frbg/fixmips64generic
Add KERNEL file for MIPS64_GENERIC as a copy of GENERIC
3 years ago
Martin Kroeker
b1d69fb3ac
Add MIPS64_GENERIC as a copy of GENERIC
3 years ago
Martin Kroeker
63d063cb6d
Merge pull request #3769 from XiWeiGu/mips64-test
[WIP,Testing]: Add test for mips64
3 years ago
gxw
edea1bcfaf
MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500
3 years ago
gxw
548a11b9d9
[WIP,Testing]: Add test for mips64
3 years ago
Martin Kroeker
47120f20ca
Merge pull request #3768 from martin-frbg/fixwarnings
Fix some warnings in x86_64 kernels
3 years ago
Martin Kroeker
101a2c77c3
Fix warnings
3 years ago
Martin Kroeker
7ee3cab4ff
Merge pull request #3767 from martin-frbg/decl_adaptive
Fix missing external declaration of openblas_omp_adaptive_env()
3 years ago