OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	fbda20c856	Merge pull request #94 from xianyi/develop rebase	5 years ago
Martin Kroeker	e1b7123bbe	Merge pull request #2867 from Qiyu8/usimd-floatdot Optimize the performance of dot by using universal intrinsics in X86/ARM	5 years ago
Qiyu8	f32d34a015	add sse3 compiler flag	5 years ago
Martin Kroeker	599777ecb7	Merge pull request #2879 from martin-frbg/issue2839 Default BLAS3_MEM_ALLOC_THRESHOLD on all platforms to 32	5 years ago
Martin Kroeker	a5feea6611	make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows	5 years ago
Martin Kroeker	dc8e4e1959	Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable	5 years ago
Martin Kroeker	cccd1438da	Merge pull request #93 from xianyi/develop rebase	5 years ago
Martin Kroeker	f032d8966e	Merge pull request #2874 from Flamefire/memory_fixes Avoid out of bounds access on invalid memory free	5 years ago
Martin Kroeker	f6e4cf2f9d	Merge pull request #2876 from Flamefire/omp_fork_fix Lazyly reinit threads after a fork in OMP mode	5 years ago
Martin Kroeker	9828343e12	Merge pull request #2878 from brada4/asms fix clang std=c18 compilation on aarch64	5 years ago
User User-User	d2333e7842	aarch64 fix std=c18 compilation	5 years ago
Alexander Grund	3094fc6c83	Lazyly reinit threads after a fork in OMP mode This initializes the per-thread memory buffers which get cleared/released on a fork via pthread_at_fork. Not doing so leads to each thread calling blas_memory_alloc on almost every execution which slows down the code significantly as the threads race for the memory allocation using locks to serialize that.	5 years ago
Alexander Grund	3c05f54df8	Avoid out of bounds access on invalid memory free	5 years ago
Alexander Grund	dee7c49938	Fix TABs and trailing space	5 years ago
Martin Kroeker	d3c0d6811b	Merge pull request #2873 from martin-frbg/issue2871 Check for __linux rather than linux in cpuid code and benchmarks	5 years ago
Martin Kroeker	9637cd1fd1	Merge pull request #2865 from thisch/backticks Consolidate usage of backticks for build options	5 years ago
Martin Kroeker	5464eb13ea	Change ifdef linux to __linux for C11 compatibility	5 years ago
Martin Kroeker	e1574cbc83	Change ifdef linux to __linux for C11 compatibility and add a fallback for unsupported operating systems in detect()	5 years ago
Martin Kroeker	0b2bb5696a	Change ifdef linux to __linux for C11 compatibility	5 years ago
Martin Kroeker	a7d5d0078d	Change ifdef linux to __linux for C11 compatibility	5 years ago
Martin Kroeker	be40440ec5	Change ifdef linux to __linux for C11 compatibility	5 years ago
Martin Kroeker	2bf70c8e3b	Change ifdef linux to __linux for C11 compatibility	5 years ago
Qiyu8	60e6c68e38	Adapt ARM architect	5 years ago
Martin Kroeker	64629cb5c7	Merge pull request #91 from xianyi/develop rebase	5 years ago
Qiyu8	1b1a757f5f	Optimize the performance of dot by using universal intrinsics in X86/ARM	5 years ago
Martin Kroeker	0d98ce202c	Merge pull request #2866 from RajalakshmiSR/p10_dcopy Optimize dcopy/zcopy for POWER10	5 years ago
Rajalakshmi Srinivasaraghavan	2df4235e00	Optimize dcopy/zcopy for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores. Tested in simulator and no new failures.	5 years ago
Thomas Hisch	fe8cd5ae7e	Consolidate usage of backticks for build options There were some build options in the README that were not highlighted. Now all are highlighted.	5 years ago
Martin Kroeker	ba31c8f5f9	Merge pull request #2853 from Qiyu8/usimd-daxpy Optimize the performance of daxpy by using universal intrinsics	5 years ago
Martin Kroeker	e961d4d609	Merge pull request #2864 from martin-frbg/lapack445 FIx underflow/rounding errors in LAPACK (S,D)LANV2	5 years ago
Martin Kroeker	7ed25e9e10	FIx underflow/rounding errors in LAPACK (S,D)LANV2 Reference-LAPACK PR 445, fixing their issue 263	5 years ago
Martin Kroeker	7b169379e0	Merge pull request #2863 from martin-frbg/readmefixes Readmefixes	5 years ago
Martin Kroeker	7f539fb850	Update cpu list, outline cmake build, clarify scope of set_num_threads extension	5 years ago
Martin Kroeker	caf7a12295	Merge pull request #90 from xianyi/develop rebase	5 years ago
Martin Kroeker	72b5b73647	Merge pull request #2850 from xiaojiayuan111/develop fix a bug of trmm	5 years ago
Qiyu8	881c15179f	remove default support for FMA4 on zen architect	5 years ago
Martin Kroeker	dfaafd3b55	Merge pull request #2854 from martin-frbg/travis-graviton Add an AWS-Graviton2 build to Travis CI	5 years ago
Martin Kroeker	f2e9a24e1a	Add AWS Graviton2 build	5 years ago
Martin Kroeker	61fae59298	Merge pull request #88 from xianyi/develop rebase	5 years ago
Martin Kroeker	33d22f99f1	Merge pull request #2851 from martin-frbg/travis-xcode12 Add an OSX build with xcode12	5 years ago
Martin Kroeker	5ba01dd1a8	Add an OSX build with xcode12	5 years ago
Qiyu8	14f7dad3b7	performance improved	5 years ago
y00512012	06cf73a239	fix a bug of trmm	5 years ago
Qiyu8	325b539c26	Optimize the performance of daxpy by using universal intrinsics	5 years ago
Martin Kroeker	0f112077e6	Merge pull request #2847 from mhillenibm/fixup_cscal s390x: fix cscal and zscal implementations	5 years ago
Marius Hillenbrand	22aa81f3e5	s390x: fix cscal and zscal implementations The implementation of complex scalar * vector multiplication for Z14 makes some LAPACK tests fail because the numerical differences to the reference implementation exceed the threshold (as can be seen by running make lapack-test and replacing kernel/zarch/cscal.c with a generic implementation for comparison). The complex multiplication uses terms of the form a * b + c * d for both real and imaginary parts. The assembly code (and compiler-emitted code as well) uses fused multiply add operations for the second product and sum. The results can be "surprising", for example when both terms in the imaginary part nearly cancel each other out. In that case, the second product contributes more digits to the sum than the first product that has been rounded before. One option is to use separate multiplications (which then round the same way) and a distinct add. Change the code to pursue that path, by (1) requesting the compiler not to contract the operations into FMAs and (2) replacing the assembly kernel with corresponding vectorized C code (where change 1 also applies). Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	5 years ago
Marius Hillenbrand	77ea73f5e5	s390x: for clang use fp-contract=on instead of fast Make clang slightly more cautious when contracting floating-point operations (e.g., when applying fused multiply add) by setting -ffp-contract=on (instead of fast). Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	5 years ago
Marius Hillenbrand	f91057cbad	s390x: move common vector definitions and utils into header ... to facilitate reuse beyond gemm_vec.c and avoid code duplication. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	5 years ago
Martin Kroeker	992d7ca63d	Merge pull request #2845 from martin-frbg/lapack443 Fix workspace query in LAPACK xGELQ (Reference-LAPACK 443)	5 years ago
Martin Kroeker	7e4d5c237c	Fix workspace query in xGELQ (Reference-LAPACK PR443)	5 years ago
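
The message of commit 22aa81f3e5 (s390x: fix cscal and zscal implementations) describes how fusing the second product of a * b + c * d into the add can give a "surprising" result when the two terms nearly cancel. The sketch below illustrates that effect. It is not taken from the OpenBLAS kernels: the helper names are hypothetical, and the fused multiply-add is emulated with exact rational arithmetic rather than a hardware instruction, assuming IEEE 754 double precision with round-to-nearest.

```python
from fractions import Fraction


def fma_emulated(x, y, z):
    """Emulate a fused multiply-add: form x*y + z exactly, then round once.

    Hypothetical helper for illustration; Fraction holds the exact value and
    float() performs the single final rounding.
    """
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def imag_separate(a, b, c, d):
    # Imaginary part of (a + b*i) * (c + d*i).
    # Both products are rounded to double before the add.
    return a * d + b * c


def imag_fused(a, b, c, d):
    # Same expression, but only the first product (a * d) is rounded;
    # the second product (b * c) enters the sum unrounded, as with an FMA.
    return fma_emulated(b, c, a * d)


# Inputs chosen so that the two terms cancel exactly in real arithmetic:
# (x + x*i) * (x - x*i) has imaginary part x*(-x) + x*x = 0.
x = 1.0 + 2.0 ** -30
a, b = x, x
c, d = x, -x

separate = imag_separate(a, b, c, d)
fused = imag_fused(a, b, c, d)

print("separate multiplies + add:", separate)
print("second product fused:     ", fused)
```

With separate multiplications both products round the same way and cancel to zero. With the emulated fusion, the low-order bits of the unrounded second product survive as a small nonzero residue, which is the kind of discrepancy the commit message attributes to the LAPACK test failures.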

4972 Commits (fbda20c856df3375d4ea32c98664ab2e30854248)