OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Bart Oldeman	c956271c2e	Use FMA for cscal and zscal Haswell microkernels. This patch has two benefits: 1. Using vfmaddsub231p[sd] instead of vaddsubp[sd] eliminates a vmulp[sd] instruction, giving a ~10% speedup, measured from ~33 to ~36 Gflops for cscal with 4096 elements and from ~17 to ~19 Gflops for zscal on my Kaby Lake laptop, see e.g. OPENBLAS_LOOPS=10000 benchmark/cscal.goto 4096 4096. 2. Using it for both the main loop and the tail end makes sure the same FMA instruction is used for all loop iterations, which is not the case with the current situation where the tail loop is implemented in C, if the compiler is allowed to use FMA instructions. This is important for some LAPACK eigenvalue testcases that rely on bitwise identical results independent of how many loop iterations are used.	3 years ago
Martin Kroeker	8c10f0abba	Merge pull request #3794 from bartoldeman/benchmark-align-malloc Benchmarks: align malloc'ed buffers.	3 years ago
Bart Oldeman	9e6b060bf3	Fix comment. It stores the pointer, not an offset (that would be an alternative approach).	3 years ago
Bart Oldeman	9959a60873	Benchmarks: align malloc'ed buffers. Benchmarks should allocate with cacheline (often 64 bytes) alignment to avoid unreliable timings. This technique, storing the offset in the byte before the pointer, doesn't require C11's aligned_alloc for compatibility with older compilers. For example, Glibc's x86_64 malloc returns 16-byte aligned buffers, which is not sufficient for AVX/AVX2 (32-byte preferred) or AVX512 (64-byte).	3 years ago
Martin Kroeker	ad424fce08	Merge pull request #3791 from martin-frbg/issue3790 Fix pkgconfig file generation for INTERFACE64 builds	3 years ago
Martin Kroeker	5f72415f10	Suffix the pkgconfig file itself in INTERFACE64 builds	3 years ago
Martin Kroeker	747ade5adf	fix INTERFACE64/USE64BITINT reporting	3 years ago
Martin Kroeker	8bacea1254	Pass libsuffix to openblas.pc and fix passing of INTERFACE64/USE64BITINT flag	3 years ago
Martin Kroeker	b2523471c9	Add libsuffix support	3 years ago
Martin Kroeker	11b2570c13	Merge pull request #3786 from martin-frbg/issue3784 Disable the gfortran tree vectorizer for lapack-netlib	3 years ago
Martin Kroeker	ab6009b0b6	Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads Add `OPENBLAS_DEFAULT_NUM_THREADS`	3 years ago
Martin Kroeker	32566bfb44	Disable the gfortran tree vectorizer for netlib LAPACK	3 years ago
Martin Kroeker	57809526c4	Disable the gfortran tree vectorizer for lapack-netlib	3 years ago
Martin Kroeker	eece0dfd14	Merge pull request #3781 from martin-frbg/issue3779 Fix building with only a subset of variable types on Windows	3 years ago
Martin Kroeker	db50ab4a72	Add BUILD_vartype defines	3 years ago
Martin Kroeker	a84a8a7096	Merge pull request #3778 from martin-frbg/issue3775 Fix misdetection of gfortran on Cray systems	3 years ago
Martin Kroeker	79d842047a	Move Cray case after GNU as Cray builds of gfortran have both names in the version string	3 years ago
Martin Kroeker	5e78493d95	Move Cray case after GNU as Cray builds of gfortran have both names in the version string	3 years ago
Elliot Saba	d2ce93179f	Add `OPENBLAS_DEFAULT_NUM_THREADS` This allows Julia to set a default number of threads (usually `1`) to be used when no other thread counts are specified [0], to short-circuit the default OpenBLAS thread initialization routine that spins up a different number of threads than Julia would otherwise choose. The reason to add a new environment variable is that we want to be able to configure OpenBLAS to avoid performing its initial memory allocation/thread startup, as that can consume significant amounts of memory, but we still want to be sensitive to legacy codebases that set things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`. Creating a new environment variable that is openblas-specific and is not already publicly used to control the overall number of threads of programs like Julia seems to be the best way forward. [0] https://github.com/JuliaLang/julia/pull/46844	3 years ago
Martin Kroeker	8e851160d7	Merge pull request #3772 from siko1056/develop Support CONSISTENT_FPCSR on aarch64 systems	3 years ago
Martin Kroeker	cf132deb14	Merge pull request #3774 from sashashura/patch-1 GitHub Workflows security hardening	3 years ago
Martin Kroeker	6077d81161	Merge pull request #3777 from martin-frbg/fixmips64generic2 Fix MIPS64_GENERIC copyobj declarations for DYNAMIC_ARCH	3 years ago
Martin Kroeker	f6f35a4288	fix copyobj declarations to work with DYNAMIC_ARCH	3 years ago
Alex	c726604319	build: harden dynamic_arch.yml permissions Signed-off-by: Alex <aleksandrosansan@gmail.com>	3 years ago
Alex	4de8e1b8f9	build: harden mips64.yml permissions Signed-off-by: Alex <aleksandrosansan@gmail.com>	3 years ago
Alex	11cd108095	build: harden nightly-Homebrew-build.yml permissions Signed-off-by: Alex <aleksandrosansan@gmail.com>	3 years ago
Kai T. Ohlhus	c2892f0e31	Makefile.rule: update CONSISTENT_FPCSR documentation	3 years ago
Kai T. Ohlhus	84453b924f	Support CONSISTENT_FPCSR on AARCH64	3 years ago
Martin Kroeker	667d0e0b48	Merge pull request #3771 from martin-frbg/fixmips64generic Add KERNEL file for MIPS64_GENERIC as a copy of GENERIC	3 years ago
Martin Kroeker	b1d69fb3ac	Add MIPS64_GENERIC as a copy of GENERIC	3 years ago
Martin Kroeker	63d063cb6d	Merge pull request #3769 from XiWeiGu/mips64-test [WIP,Testing]: Add test for mips64	3 years ago
gxw	edea1bcfaf	MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500	3 years ago
gxw	548a11b9d9	[WIP,Testing]: Add test for mips64	3 years ago
Martin Kroeker	47120f20ca	Merge pull request #3768 from martin-frbg/fixwarnings Fix some warnings in x86_64 kernels	3 years ago
Martin Kroeker	101a2c77c3	Fix warnings	3 years ago
Martin Kroeker	7ee3cab4ff	Merge pull request #3767 from martin-frbg/decl_adaptive Fix missing external declaration of openblas_omp_adaptive_env()	3 years ago
Martin Kroeker	9402df5604	Fix missing external declaration	3 years ago
Martin Kroeker	dd846e72ed	Merge pull request #3766 from martin-frbg/issue3640 Add (minimal) initial support for processing with the Emscripten Javascript converter	3 years ago
Martin Kroeker	b285307e18	Add a kludge for the Emscripten js converter	3 years ago
Martin Kroeker	9773a9d6b3	undefine YIELDING for the Emscripten js converter	3 years ago
Martin Kroeker	dc856de3af	Merge pull request #3765 from martin-frbg/f2cpointer Fix pointer/integer argument mismatch in the f2c-translated LAPACK	3 years ago
Martin Kroeker	91110f92d2	fix missing return type in function declaration	3 years ago
Martin Kroeker	515cf26929	Fix pointer/integer argument mismatch in calls to pow()	3 years ago
Martin Kroeker	8273ab6ee3	Merge pull request #3764 from martin-frbg/issue3757 Fix compilation of Haswell/Zen DYNAMIC_ARCH targets with Apple clang	3 years ago
Martin Kroeker	a0a4f7c447	Add -mfma to -mavx2 for clang, and add AVX2 declaration for Zen in DYNAMIC_ARCH builds	3 years ago
Martin Kroeker	23d59baaf1	Add -mfma to -mavx2 for Apple clang, and set AVX2 options for Zen as well	3 years ago
Martin Kroeker	85758aba67	Merge pull request #3763 from XiWeiGu/issue3761 MIPS64: Using the macro MTC rather than MTC1	3 years ago
gxw	365936ae1b	MIPS64: Using the macro MTC rather than MTC1	3 years ago
Martin Kroeker	fab84910ea	Merge pull request #3758 from martin-frbg/issue3755 Remove excessive quoting of arguments in Makefile.prebuild again	3 years ago
Martin Kroeker	389e378063	Remove excessive quoting of arguments from PR3722	3 years ago
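
The aligned-allocation technique described in commit 9959a60873 (and corrected in 9e6b060bf3, which notes that the pointer itself is stored rather than an offset) can be sketched roughly as follows. This is an illustrative sketch only, not the code from the OpenBLAS benchmark directory: the names aligned_malloc_sketch and aligned_free_sketch are hypothetical, and the sketch assumes the alignment is a power of two no smaller than sizeof(void *).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not the OpenBLAS benchmark code): over-allocate with
   malloc(), round the address up to the requested alignment, and store the
   pointer returned by malloc() in the slot just below the aligned address. */
static void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    /* Assumption: alignment is a power of two and at least sizeof(void *). */
    if (alignment < sizeof(void *) || (alignment & (alignment - 1)) != 0)
        return NULL;
    /* Guard against overflow of the padded allocation size. */
    if (size > SIZE_MAX - alignment - sizeof(void *))
        return NULL;

    /* Extra room: one pointer slot plus worst-case alignment padding. */
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave space for the stored pointer, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    assert(aligned % alignment == 0);

    /* Remember the original malloc() result for the matching free. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a buffer obtained from aligned_malloc_sketch(). */
static void aligned_free_sketch(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

A benchmark could then request, for example, aligned_malloc_sketch(n * sizeof(double), 64) to get a cacheline-aligned buffer without relying on C11's aligned_alloc, and release it with aligned_free_sketch.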

6675 Commits (c956271c2e2d9196872f58f09d6ee3187fa0b718)
