OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Mosè Giordano	bed01f47c4	Cast arguments of `_mm512_abs_pd` to `__m512` Argument of `_mm512_abs_pd` must be `__m512`, see https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=KNC&expand=60. Without the explicit typecast we get ``` In file included from ../kernel/x86_64/dasum.c:8: ../kernel/x86_64/dasum_microk_skylakex-2.c: In function ‘dasum_kernel’: ../kernel/x86_64/dasum_microk_skylakex-2.c:42:38: error: incompatible type for argument 1 of ‘_mm512_abs_pd’ accum_0 += _mm512_abs_pd(_mm512_load_pd(&x1[i + 0])); ^~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/immintrin.h:45, from ../kernel/x86_64/dasum_microk_skylakex-2.c:6, from ../kernel/x86_64/dasum.c:8: /opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/avx512fintrin.h:7730:23: note: expected ‘__m512’ {aka ‘__vector(16) float’} but argument is of type ‘__m512d’ {aka ‘__vector(8) double’} _mm512_abs_pd (__m512 __A) ~~~~~~~^~~ In file included from ../kernel/x86_64/dasum.c:8: ../kernel/x86_64/dasum_microk_skylakex-2.c:43:38: error: incompatible type for argument 1 of ‘_mm512_abs_pd’ accum_1 += _mm512_abs_pd(_mm512_load_pd(&x1[i + 8])); ^~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/immintrin.h:45, from ../kernel/x86_64/dasum_microk_skylakex-2.c:6, from ../kernel/x86_64/dasum.c:8: /opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/avx512fintrin.h:7730:23: note: expected ‘__m512’ {aka ‘__vector(16) float’} but argument is of type ‘__m512d’ {aka ‘__vector(8) double’} _mm512_abs_pd (__m512 __A) ~~~~~~~^~~ In file included from ../kernel/x86_64/dasum.c:8: ../kernel/x86_64/dasum_microk_skylakex-2.c:44:38: error: incompatible type for argument 1 of ‘_mm512_abs_pd’ accum_2 += _mm512_abs_pd(_mm512_load_pd(&x1[i +16])); ^~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/immintrin.h:45, 
/opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/avx512fintrin.h:7730:23: note: expected ‘__m512’ {aka ‘__vector(16) float’} but argument is of type ‘__m512d’ {aka ‘__vector(8) double’} _mm512_abs_pd (__m512 __A) ~~~~~~~^~~ ```	5 years ago
Martin Kroeker	7e9cb39a25	Merge pull request #2981 from Qiyu8/fix-sum Fix sum optimize issues	5 years ago
Qiyu8	ae0b1dea19	modify system.cmake to enable fma flag	5 years ago
Qiyu8	e0dac6b53b	fix the CI failure of target specific option mismatch	5 years ago
Qiyu8	e5c2ceb675	fix the CI failure of lack the head	5 years ago
Qiyu8	a87e537b8c	modify macro	5 years ago
Qiyu8	5bc0a7583f	only FMA3 and vector larger than 128 have positive effects.	5 years ago
Qiyu8	8c0b206d4c	Optimize the performance of rot by using universal intrinsics	5 years ago
Qiyu8	c4c591ac5a	fix sum optimize issues	5 years ago
Martin Kroeker	ff16329cb7	Merge pull request #2972 from xiegengxin/rot-intrinsic Improve the performance of rot by using AVX512 and AVX2 intrinsic	5 years ago
Martin Kroeker	110c7a6de0	Merge pull request #2979 from RajalakshmiSR/dot_power10 Optimize sdot/ddot for POWER10	5 years ago
Rajalakshmi Srinivasaraghavan	6e364981a8	Optimize sdot/ddot for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	5 years ago
Martin Kroeker	b976a0bf40	Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds	5 years ago
Martin Kroeker	ff74319ea5	Merge pull request #2977 from martin-frbg/issue2976 Fix macro name used in ifdef for POWERPC/PGI	5 years ago
Martin Kroeker	28d2dfe2b3	Fix macro name used in ifdef	5 years ago
Gengxin Xie	725ffbf041	fix typo	5 years ago
Gengxin Xie	d9ba49165a	Improve the performance of rot by using AVX512 and AVX2 intrinsic	5 years ago
Rajalakshmi Srinivasaraghavan	dd7a9cc5bf	POWER10: Change dgemm unroll factors Changing the unroll factors for dgemm to 8 shows improved performance with POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.	5 years ago
Rajalakshmi Srinivasaraghavan	b435491885	Optimize caxpy for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	5 years ago
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	5 years ago
Martin Kroeker	67f39ad813	Merge pull request #2939 from thrasibule/Makefile_cleanup reuse variables defined in Makefile.system	5 years ago
Rajalakshmi Srinivasaraghavan	c24ba8b1dd	Optimize saxpy for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	5 years ago
Martin Kroeker	6f9460f0f6	Merge pull request #2937 from martin-frbg/pwr-buffersz Increase and unify BUFFERSIZE on POWER;fix gcc inline warning	5 years ago
Guillaume Horel	1917a4e7b8	reuse variables defined in Makefile.system	5 years ago
Martin Kroeker	34c3c407ef	label always_inline function as inline to silence a gcc warning	5 years ago
Martin Kroeker	2e48d560ba	Fix compiler version check	5 years ago
Rajalakshmi Srinivasaraghavan	ad745c0bae	Optimize scopy/ccopy for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores. Also reorganized all variants of copy functions to make use of same kernel.	5 years ago
İsmail Dönmez	4a1d00f589	Fix build with -Werror=return-type dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a return 0 similar to other files.	5 years ago
Bart Oldeman	b073d759d0	x86_64: clobber all xmm registers after vzeroupper As observed using GCC 10 using -march=native -ftree-vectorize on Knights Landing, it is now smart enough to find clobbers inside non-inlined static functions. In particular, sgemv counted on a kernel to preserve the whole %ymm2 register (since it was not in the clobber list), but the top part was destroyed by vzeroupper. This caused many tests to fail. This patch makes sure all xmm (and ymm/zmm by extension) registers are listed as clobbered to avoid this happening, as most kernels already did correctly in fact.	5 years ago
Martin Kroeker	dc6e44c3f8	Merge pull request #2916 from martin-frbg/issue2911 Clean up duplicate definitions in POWER8 kernels and fix power10 option passing	5 years ago
Martin Kroeker	a61c086408	Fix spurious trailing whitespace in comment	5 years ago
Bart Oldeman	03e781b766	sgemm_direct_skylakex: fix `75eeb26` regression. The `#if defined(SKYLAKEX) || defined (COOPERLAKE)` from that commit was before #include "common.h" so caused the compiled function to be empty, returning garbage results for qualifying sgemm's on those architectures. Closes #2914	5 years ago
Martin Kroeker	f1a4071d8c	Clean up STACKSIZE redefinition	5 years ago
Martin Kroeker	97cf10062f	Clean up STACKSIZE redefinition	5 years ago
Martin Kroeker	17e288e18d	Clean up STACKSIZE redefinition	5 years ago
Martin Kroeker	c1422f3e46	Clean up STACKSIZE redefinition	5 years ago
Martin Kroeker	d85b24e103	Clean up STACKSIZE redefinition	5 years ago
Martin Kroeker	df70667043	fix core list for sse/sse2	5 years ago
Martin Kroeker	f071d1207a	add sse2	5 years ago
Martin Kroeker	dc6cefd2f5	Expressly enable -msse for 32bit DYNAMIC_ARCH kernels	5 years ago
Martin Kroeker	c339c40c01	Silence a redefinition warning	5 years ago
Martin Kroeker	10379fc83b	Use ifdef instead of if	5 years ago
Martin Kroeker	4c25910da0	Merge pull request #2896 from martin-frbg/intrin-double Add compiler flag for SSE4 where available	5 years ago
Martin Kroeker	ae6ac83991	Revert "add double precision SSE"	5 years ago
Qiyu8	4fac91ef37	adapt arm platform	5 years ago
Qiyu8	bfdf4b56da	Add double precision universal intrinsics for X86/ARM	5 years ago
Martin Kroeker	ebf0470fc2	add sse4.1 for DYNAMIC_ARCH kernels	5 years ago
Martin Kroeker	c9c3ae07af	Add double precision operations	5 years ago
Martin Kroeker	756802df61	Merge pull request #2890 from martin-frbg/s-d-sum Revert special handling of Windows xNRM2 and enable C+intrinsics kern…	5 years ago
Rajalakshmi Srinivasaraghavan	0826d68f93	POWER10: Change the packing format for bfloat16 As the new MMA instructions need the inputs in 4x2 order for bfloat16, changing the format in copy/packing code. This avoids permute instructions in the gemm kernel inner loop.	5 years ago

1557 Commits (bed01f47c483fe1270e359c14fc6999f93ead7d5)