OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	5b4b385ecf	Temporarily disable the SkylakeX sgemv_t microkernel due to LAPACK testsuite failures	5 years ago
Ma, Yu	706a08d4a0	Optimized sgemv_t for small N based on AVX512	5 years ago
Martin Kroeker	5f677e782e	Merge pull request #3196 from guowangy/skylakex-gemm-batch-k GEMM: skylake: improve the performance when m is small	5 years ago
Martin Kroeker	02087a62e7	Merge pull request #3205 from intelmy/sgemv_n_opt optimize on sgemv_n for small n	5 years ago
Martin Kroeker	8b90e5f202	Drop redundant inclusion of complex.h	5 years ago
Martin Kroeker	c0ca63ea46	Fix missing conditionals for non-SKX kernels	5 years ago
pnp	3d4ccd2a13	fix for build error	5 years ago
pnp	c59652f0ce	optimize on sgemv_n for small n	5 years ago
Wangyang Guo	aa7b3dc3db	GEMM: skylake: improve the performance when m is small	5 years ago
Martin Kroeker	3d511f0e66	replace spurious avx512 requirement with fma check	5 years ago
Martin Kroeker	2dfb24730d	Use "old" compute(24) function with clang due to register limitations	5 years ago
Martin Kroeker	7b8f580941	Merge pull request #3156 from martin-frbg/omatcopy_d Move x86_64 DOMATCOPY_RT back to the C implementation	5 years ago
Martin Kroeker	0f5e86a0d9	Remove premature entry for DOMATCOPY_RT	5 years ago
Martin Kroeker	7b294a99fd	Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time	5 years ago
Martin Kroeker	0934568d9c	Move includes under the ifdef for compilers w/o intrinsics support	5 years ago
Martin Kroeker	a9f6f7ad39	Remove spurious AVX512 requirement and add AVX2/FMA3 guard	5 years ago
Martin Kroeker	292d1af1a0	Update omatcopy_rt.c	5 years ago
Martin Kroeker	325b398e3c	Update omatcopy_rt.c	5 years ago
Martin Kroeker	6f5667b4d4	Enable optimized S/D OMATCOPY_RT	5 years ago
Martin Kroeker	cceeee7806	Add optimized omatcopy_rt	5 years ago
Martin Kroeker	47691c031f	Use Haswell optimizations for Zen as well	5 years ago
Martin Kroeker	ce7ddd8921	Use Haswell optimizations for Zen as well	5 years ago
Martin Kroeker	950c047b49	Use Haswell optimizations for Zen as well	5 years ago
Martin Kroeker	46509953a9	Use Haswell optimizations for Zen as well	5 years ago
Martin Kroeker	db348dcff2	Enable optimized srot/drot kernels from Haswell	5 years ago
Martin Kroeker	69a5558203	Merge pull request #3059 from Guobing-Chen/BF16_gemm Initial code for Cooperlake BF16 GEMM kernel	5 years ago
Alex Henrie	202fc9e8ed	Fix uninitialized argument value in dasum_k	5 years ago
Chen, Guobing	b0beb0b1ca	Initial code for Cooperlake BF16 GEMM kernel	5 years ago
Martin Kroeker	114eb159a4	Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA	5 years ago
Martin Kroeker	441c08c9ff	Merge pull request #3016 from xiegengxin/complex-asum Improve the performance of zasum and casum with AVX512 intrinsic	5 years ago
Gengxin Xie	0cb7a403b2	fix error declare function blas_level1_thread_with_return_value	5 years ago
Gengxin Xie	b766c1e9bb	Improve the performance of zasum and casum with AVX512 intrinsic	5 years ago
Martin Kroeker	f1bf040b25	Merge pull request #2988 from xiegengxin/smp-asum Improve the performance of dasum and sasum when SMP is defined	5 years ago
Gengxin Xie	d6e7e05bb3	Improve the performance of dasum and sasum when SMP is defined	5 years ago
Qiyu8	a87e537b8c	modify macro	5 years ago
Qiyu8	5bc0a7583f	only FMA3 and vector larger than 128 have positive effects.	5 years ago
Qiyu8	8c0b206d4c	Optimize the performance of rot by using universal intrinsics	5 years ago
Martin Kroeker	ff16329cb7	Merge pull request #2972 from xiegengxin/rot-intrinsic Improve the performance of rot by using AVX512 and AVX2 intrinsic	5 years ago
Gengxin Xie	725ffbf041	fix typo	5 years ago
Gengxin Xie	d9ba49165a	Improve the performance of rot by using AVX512 and AVX2 intrinsic	5 years ago
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	5 years ago
İsmail Dönmez	4a1d00f589	Fix build with -Werror=return-type dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a return 0 similar to other files.	5 years ago
Bart Oldeman	b073d759d0	x86_64: clobber all xmm registers after vzeroupper As observed using GCC 10 using -march=native -ftree-vectorize on Knights Landing, it is now smart enough to find clobbers inside non-inlined static functions. In particular, sgemv counted on a kernel to preserve the whole %ymm2 register (since it was not in the clobber list), but the top part was destroyed by vzeroupper. This caused many tests to fail. This patch makes sure all xmm (and ymm/zmm by extension) registers are listed as clobbered to avoid this happening, as most kernels already did correctly in fact.	5 years ago
Bart Oldeman	03e781b766	sgemm_direct_skylakex: fix `75eeb26` regression. The `#if defined(SKYLAKEX) || defined (COOPERLAKE)` from that commit was before #include "common.h" so caused the compiled function to be empty, returning garbage results for qualifying sgemm's on those architectures. Closes #2914	5 years ago
Martin Kroeker	c339c40c01	Silence a redefinition warning	5 years ago
Qiyu8	bfdf4b56da	Add double precision universal intrinsics for X86/ARM	5 years ago
Martin Kroeker	756802df61	Merge pull request #2890 from martin-frbg/s-d-sum Revert special handling of Windows xNRM2 and enable C+intrinsics kern…	5 years ago
Martin Kroeker	8d2df7d066	Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM	5 years ago
Martin Kroeker	08929430cd	Merge pull request #2886 from martin-frbg/issue_2767 Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix	5 years ago
Martin Kroeker	0c84ffe05f	Merge pull request #2881 from mattip/fninit add fninit to reset fpu registers before assembler routines	5 years ago


657 Commits (issue3321)