OpenBLAS

Commit Graph

Author	SHA1	Message	Date
gxw	73c6a28073	x86_64: opt somatcopy_ct with AVX	1 year ago
Martin Kroeker	7ca835a82c	address clang array overflow warning	1 year ago
Martin Kroeker	f1c9803f9a	add proper return statement	1 year ago
Martin Kroeker	60abcc3991	add proper return statement	1 year ago
Martin Kroeker	fb7c53c5e5	Merge pull request #4807 from martin-frbg/scalfixes [WIP] Make NAN handling in the SCAL kernels depend on the dummy2 parameter	1 year ago
Martin Kroeker	dfbc2348a8	fix NAN handling	1 year ago
Martin Kroeker	c064319ecb	fix alpha=NAN case	1 year ago
Martin Kroeker	c2ffd90e8c	make NAN handling depend on dummy2 parameter	1 year ago
gxw	f3cebb3ca3	x86: Fixed numpy CI failure when the target is ZEN.	1 year ago
Martin Kroeker	68f2501958	temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x	1 year ago
Martin Kroeker	0a744a939a	temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x	1 year ago
Martin Kroeker	a2ee4b1966	Merge branch 'OpenMathLib:develop' into issue4728	1 year ago
Martin Kroeker	dd7efcf9ef	Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748) * avoid setting nthreads higher than available	1 year ago
Martin Kroeker	1abafcd9b2	handle corner cases involving NAN and/or INF	1 year ago
Martin Kroeker	020b3e1682	fix handling of INF arguments	1 year ago
Martin Kroeker	ce130f11d2	Update zscal.c	1 year ago
Martin Kroeker	ab13cfef93	more fixes for infinite x	1 year ago
Martin Kroeker	ad2b5c67c8	fix another corner case involving infinity	1 year ago
Bart Oldeman	62f7b244ff	Replace use of FLT_MAX in x86_64 zscal.c by isinf() Commit `def4996` fixed issues with inf and nan values in zscal, but used FLT_MAX, where DBL_MAX or isinf() is more appropriate, as FLT_MAX is for single precision only. Using FLT_MAX caused test case failures in the LAPACK tests. isinf() is consistent with the later fix `969601a1`	1 year ago
Zoltán Böszörményi	ca64861ce8	Add forgotten conditional uses of PREFETCH This fixes a (cross-)compilation/linker error for PRESCOTT on Yocto. Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>	1 year ago
Martin Kroeker	8f8ef3492a	Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts)	1 year ago
Martin Kroeker	be5e18c6f9	Add kernel definitions for CSUM and ZSUM	1 year ago
gxw	969601a1dc	X86_64: Fixed bug in zscal Fixed handling of NAN and INF arguments when inc is greater than 1.	2 years ago
Martin Kroeker	5f5b7c4f45	Merge pull request #4423 from martin-frbg/issue4422 Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that	2 years ago
Martin Kroeker	995a990e24	Make AVX512 BFLOAT16 kernels conditional on compiler capability	2 years ago
Martin Kroeker	cf8b03ae8b	Use NAN rather than SNAN for portability	2 years ago
Martin Kroeker	def4996170	Fix handling of NAN and INF arguments	2 years ago
Martin Kroeker	f06b535566	Use C kernel for dgemv_t due to limitations of the old assembly one	2 years ago
Bart Oldeman	c34e2cf380	Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum for skylake kernels. This is the same method as used in [sd]asum. _mm_set1_epi64x was commented out for zasum, but has the advantage of avoiding possible undefined behaviour (using an uninitialized variable), optimized out by NVHPC and icx. The new code works fine with those compilers. For GCC 12.3 the generated code is identical; no matter what method you use, the compiler optimizes the code into a compile-time constant, there is no performance benefit using mm_cmpeq_epi8 since the corresponding instruction (VPCMPEQB) isn't actually generated!	2 years ago
Martin Kroeker	22aa401656	Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC (#4327) * Temporarily disable the C/ZASUM microkernels for any version of NVHPC	2 years ago
Bart Oldeman	f8ad5344c2	Fix casum fallback kernel. This kernel is only used on Skylake+ if the kernel with AVX512 intrinsics can't be used, but used the variable x1 incorrectly in the tail end of the loop, as it is still at the initial value instead of where x points to. This caused 55 "other error"s in the LAPACK tests (https://github.com/OpenMathLib/OpenBLAS/issues/4282) This change makes casum.c as similar as possible as zasum.c, because zasum.c does this correctly.	2 years ago
Martin Kroeker	9019bc4945	Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well	2 years ago
Martin Kroeker	675cd551da	fix improper function prototypes (empty parentheses)	2 years ago
Martin Kroeker	2c3034ff7f	Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well	2 years ago
Martin Kroeker	34da1a067d	Allow negative INCX (API change from version 3.10 of the reference implementation)	2 years ago
Martin Kroeker	4664b57e6e	use shortcut only when both incx and incy are zero	2 years ago
Martin Kroeker	6a428b5629	Update casum_microk_skylakex-2.c	2 years ago
Martin Kroeker	ebb447e32e	Update zasum_microk_skylakex-2.c	2 years ago
Martin Kroeker	9f6847583a	nvc currently miscompiles this, hopefully fixed in release 23.09	2 years ago
Martin Kroeker	fe54ee3d15	nvc currently miscompiles this, hopefully fixed in release 23.09	2 years ago
Martin Kroeker	2a62d2df96	Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3	2 years ago
Honglin Zhu	a76afdc047	Compatible with older version of GNU make	2 years ago
Honglin Zhu	0b83088887	spr dynamic arch support	2 years ago
Honglin Zhu	f249ccb741	Fix spr sbgemm error	2 years ago
Martin Kroeker	84bcf6639f	Disable gcc's tree-vectorizer pass on all operating systems	2 years ago
Martin Kroeker	c9174ae8d7	Disable gcc's tree-vectorizer pass on all operating systems	2 years ago
Martin Kroeker	c2fe9cb91f	Disable gcc's tree-vectorizer pass on all operating systems	2 years ago
Martin Kroeker	66b39b835c	Disable gcc's tree-vectorizer pass on all operating systems	2 years ago
Martin Kroeker	bb6d6735bf	Disable gcc's tree-vectorizer pass on all operating systems	2 years ago
Martin Kroeker	d18efaed20	Disable gcc's tree-vectorizer pass on all operating systems	2 years ago
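
The message of commit 62f7b244ff argues that FLT_MAX is the wrong bound for detecting infinities in a double-precision kernel and that isinf() is more appropriate. The following is a minimal sketch of that reasoning only, not the actual zscal.c code; the helper names `exceeds_flt_max` and `is_infinite` are hypothetical and chosen purely for illustration.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper (not from OpenBLAS): flags a value as "infinite"
 * by comparing its magnitude against FLT_MAX. FLT_MAX is the largest
 * finite *single-precision* value (about 3.4e38), so a large but
 * perfectly finite double is misclassified. */
static int exceeds_flt_max(double v) {
    return fabs(v) > FLT_MAX;
}

/* Hypothetical helper (not from OpenBLAS): uses the standard C99
 * isinf() classification macro, which is correct for any
 * floating-point type. */
static int is_infinite(double v) {
    return isinf(v) != 0;
}
```

Under these assumptions, a finite double such as 1.0e300 is reported as infinite by the FLT_MAX comparison but not by isinf(), while INFINITY is reported as infinite by both. A misclassification of this kind is consistent with the LAPACK test failures the commit message mentions.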

798 Commits (3a63bbabd1e032b4e0e5ef4199f7c19ff1a5594e)