OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chip Kerchner	c8f53b85ce	Merge remote-tracking branch 'origin/develop' into vectorizeBF16GEMV	1 year ago
Martin Kroeker	e52d9b4cf1	Merge pull request #4928 from austinpagan/czgemm_in_c CGEMM & ZGEMM using C code, Power only, P10 only.	1 year ago
Gordon Fossum	0b7fb5c791	CGEMM & ZGEMM using C code.	1 year ago
Chip Kerchner	d6bb8dcfd1	Common code.	1 year ago
Martin Kroeker	c9e92348a6	Handle inf/nan if dummy2 flag is set	1 year ago
Chip Kerchner	9ac0fb0111	Merge branch 'develop' into vectorizeBF16GEMV	1 year ago
Martin Kroeker	d714013ab9	change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds	1 year ago
Chip Kerchner	915a6d6e44	Add casting.	1 year ago
Chip Kerchner	7ec3c16d82	Remove beta from optimized functions.	1 year ago
Chip Kerchner	7cc00f68c9	Remove more duplicate.	1 year ago
Chip Kerchner	e238a68c03	Remove duplicate.	1 year ago
Chip Kerchner	32095b0cbb	Remove parameter.	1 year ago
Chip Kerchner	c8788208c8	Fixing block issue with transpose version.	1 year ago
Chip Kerchner	d7c0d87cd1	Small changes.	1 year ago
Chip Kerchner	eb6f3a05ef	Common MMA code.	1 year ago
Chip Kerchner	fb287d17fc	Common code.	1 year ago
Chip Kerchner	8ab6245771	Small change.	1 year ago
Chip Kerchner	df19375560	Almost final code for MMA.	1 year ago
Chip Kerchner	05aa63e738	More MMA BF16 GEMV code.	1 year ago
Chip Kerchner	c9ce37d527	Force vector pairs in clang.	1 year ago
Chip Kerchner	89a12fa083	MMA BF16 GEMV code.	1 year ago
Chip Kerchner	7947970f9d	Move common code.	1 year ago
Chip Kerchner	72216d28c2	Fix bug with inc_y adding results twice.	1 year ago
Chip Kerchner	2f142ee857	More common code.	1 year ago
Chip Kerchner	39fd29f1de	Minor improvement and turn off BF16 GEMV forwarding by default.	1 year ago
Chip Kerchner	8541b25e1d	Special case beta is one.	1 year ago
Chip Kerchner	76227e2948	Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one dimension matrices. Updated unit test to support inc_x != 1 or inc_y for GEMV.	1 year ago
Chip Kerchner	1a7b8c650d	Merge branch 'develop' into betterPowerGEMVTail	1 year ago
Martin Kroeker	f5d04318e3	Merge branch 'OpenMathLib:develop' into scalfixes	1 year ago
Martin Kroeker	73f8866ffb	make NAN handling depend on DUMMY2 parameter	1 year ago
Hong Bo Peng	db98f8753f	Try to fix LAPACK testing failures on P7. 1. Remove the FADD insn from the GEMV Transpose code. 2. Remove the FADD insn from GEMM and ZGEMM code. 3. Reorder the computation of the Imaginary part in ZGEMM code.	1 year ago
Martin Kroeker	b9bfc8ce09	make NAN handling depend on dummy2 parameter	1 year ago
Chip Kerchner	ba47c7f4f3	Vectorize reduction stage of sgemv_t.	1 year ago
Chip Kerchner	cb154832f8	Vectorize SBGEMM incopy - 4x faster.	1 year ago
Martin Kroeker	2a5fe97e3b	temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN	1 year ago
Martin Kroeker	7f8f037a36	handle INF and NAN in input	1 year ago
Martin Kroeker	f1248b849d	handle INF and NAN in input	1 year ago
Rajalakshmi Srinivasaraghavan	e112191b54	POWER: Fix issues in zscal to address lapack failures This patch fixes following lapack failures with clang compiler on POWER. zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold	2 years ago
Martin Kroeker	aa259b141d	Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix Fix regression SAXPY when compiled with OpenXL compiler.	2 years ago
Chip Kerchner	3a1417671a	POWER: Fixing endianness issue in cswap/zswap kernel for AIX	2 years ago
Amrita H S	87b3d9054f	Fix regression SAXPY when compiled with OpenXL compiler. SAXPY built with OpenXL regresses when compared to SAXPY built with gcc. OpenXL compiler doesn't know that the SAXPY inner kernel assembly is a 64 element loop and to it the remainder loop is the main loop. It vectorizes and interleaves the remainder to be a 48 elements per iteration loop. With a max of 63 iterations, a 48 element loop is mostly not going to get executed, so the 1 element scalar loop that is the remainder after that is probably mostly what gets executed. This can be fixed by adding a pragma, loop interleave_count(2) which will result in 8 element loop. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2 years ago
Chip-Kerchner	99384933ff	Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code" This reverts commit accea1555159d0928a6aa2db740c042c7e8f0dd3, reversing changes made to `b925353006`.	2 years ago
Martin Kroeker	accea15551	Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code Cgemm zgemm c code	2 years ago
austinpagan	87ba528d8b	Changed C files to straighten out indentation. Removed commented lines from other file.	2 years ago
austinpagan	ddac75e0ef	Adding .C versions of CGEMM and ZGEMM	2 years ago
Chip Kerchner	2bb7ea64a1	Only vectorize 64-bit version for Power8.	2 years ago
Chip Kerchner	09bb48d1b9	Vectorize in-copy packing/copying for SGEMM - 4X faster.	2 years ago
Chip-Kerchner	058dd2a4cb	Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.	2 years ago
barracuda156	d9653af018	KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366	2 years ago
Chip-Kerchner	4e738e561a	Replace two vector loads with one vector pair load and fix endianness of stores.	2 years ago

296 Commits (c8f53b85ceb885bd0f49fcba0ac888419fb6d3bd)
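
The SAXPY fix described in commit 87b3d9054f can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from that commit: the names `saxpy_kernel_64` and `saxpy_sketch` are hypothetical, and a plain C loop stands in for the hand-written 64-element assembly kernel. The point it shows is the pragma on the remainder loop, which handles at most 63 elements and should therefore not be widened into a 48-element loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for the 64-element assembly inner kernel.
 * n is assumed to be a multiple of 64. */
static void saxpy_kernel_64(size_t n, const float *x, float *y, float alpha)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Sketch of y := alpha * x + y for unit strides (hypothetical name). */
static void saxpy_sketch(size_t n, float alpha, const float *x, float *y)
{
    /* Largest multiple of 64 not exceeding n goes to the main kernel. */
    size_t n_main = n & ~(size_t)63;

    if (n_main > 0)
        saxpy_kernel_64(n_main, x, y, alpha);

    /* Remainder loop: at most 63 iterations. The compiler cannot see that
     * bound, so on clang-based compilers the interleave count is capped at 2
     * to keep the vectorized step small. Other compilers ignore this path. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    for (size_t i = n_main; i < n; i++)
        y[i] += alpha * x[i];
}
```

Calling `saxpy_sketch(70, 2.0f, x, y)` would send the first 64 elements through the main kernel and the last 6 through the remainder loop. The pragma is a hint only; whether it yields the 8-element loop mentioned in the commit message depends on the target vector width and the compiler.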