OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	91c84e1c01	Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis Add bfloat16 based dot and conversion with single/double	5 years ago
Martin Kroeker	e72430fe46	Merge pull request #2803 from xiegengxin/AVX2-asum Implementation of dasum, sasum with AVX2 & AVX512 intrinsic	5 years ago
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	5 years ago
Gengxin Xie	1b0f17eeed	align to 64, using SSE when input size is small	5 years ago
Gengxin Xie	448152cdd8	define __AVX2__ to ensure the haswell code compiled with avx2	5 years ago
Gengxin Xie	cb3c190a3a	Implementation of dasum, sasum with AVX2 & AVX512 intrinsic	5 years ago
Martin Kroeker	b2053239fc	Fix missing dummy parameter (imag part of alpha) of zdot_thread_function	5 years ago
Martin Kroeker	9ee21a0a39	Merge pull request #2780 from Guobing-Chen/CPL_build_support Enable COOPERLAKE build target	5 years ago
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782) Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h * (Conditionally) add sgemm_direct functions in setparam-ref.c	5 years ago
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	5 years ago
Martin Kroeker	81dcfdcf39	Multiply by 2 instead of left-shifting a potentially negative number fixes GCC ubsan warning in the BLAS tests	5 years ago
Martin Kroeker	0ef4b3f1f2	Multiply instead of doing a left shift of a potentially negative number fixes GCC ubsan report in the BLAS tests	5 years ago
Martin Kroeker	aa53a8a5cb	Multiply by two instead of left-shifting one place fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests	5 years ago
Martin Kroeker	aa3a1e7d8c	Multiply by two rather than left shift by one place fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests	5 years ago
Martin Kroeker	e30ad0e521	Strip UTF8 byte order marker from source	5 years ago
Martin Kroeker	93592d1260	Merge pull request #2675 from wjc404/develop AVX512 DGEMM TCOPY_16 Function	5 years ago
wjc404	086d87a302	AVX512 dgemm tcopy_16 function	5 years ago
Martin Kroeker	c3574ffe53	Merge pull request #2646 from wjc404/develop Optimize AVX512 parallel DGEMM performance	6 years ago
wjc404	0e3ac4a06b	Add files via upload	6 years ago
Martin Kroeker	2271c3506b	Work around excessive LAPACK test failures on Skylake-X Something in the plain C parts of x86_64 cscal.c and zscal.c appears to be miscompiled by both gfortran9 and ifort when compiling for skylakex-avx512, even when the optimized Haswell microkernel is not in use.	6 years ago
Martin Kroeker	90dba9f716	Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version As discussed on the original PR #2329, the "Apple Clang 11.0.3" that appears to be based on the same LLVM release produces the same miscompilation of this file.	6 years ago
Martin Kroeker	5b0093b5fe	Convert aligned moves to unaligned should have no performance impact on reasonably modern cpus and fixes occasional crashes in actual user code.	6 years ago
Martin Kroeker	567d2760e6	Merge pull request #2520 from wjc404/develop Fix avx512 sgemm performance bug when ldc is a multiple of 1024	6 years ago
wjc404	b8307768e2	Add files via upload	6 years ago
Martin Kroeker	af8a619e1f	Merge pull request #2517 from wjc404/develop Temporary fix for SKX STRSM	6 years ago
wjc404	62b9608986	Update KERNEL.SKYLAKEX	6 years ago
Martin Kroeker	a1b181cea2	Merge pull request #2516 from wjc404/develop AVX2 STRSM kernels	6 years ago
wjc404	cdc0e9011e	Update KERNEL.ZEN	6 years ago
wjc404	fa049d49c2	AVX2 STRSM kernel	6 years ago
Martin Kroeker	ea8eec5d17	Merge pull request #2422 from wjc404/develop Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM	6 years ago
wjc404	dd22eb7621	Update cgemm_kernel_8x2_haswell.c	6 years ago
wjc404	2352331e60	Update zgemm_kernel_4x2_haswell.c	6 years ago
wjc404	1b980001dd	Update zgemm_kernel_4x2_haswell.c	6 years ago
wjc404	2515e1152f	Update cgemm_kernel_8x2_haswell.c	6 years ago
wjc404	903854c168	Add files via upload	6 years ago
wjc404	a2ff577a30	Update KERNEL.ZEN	6 years ago
wjc404	97a32cb0a5	Update KERNEL.HASWELL	6 years ago
Martin Liska	aeea14ee40	Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S.	6 years ago
Martin Liska	18bcc36a69	Fix implementation of iamax_sse.S as reported in #2116. There was a typo in iamax_sse.S where one of the comparisons was cmpeqps instead of cmpeqss. That misdetected the index for sequences where the minimum value was 0.	6 years ago
wjc404	f566787e6e	Update KERNEL.SKYLAKEX	6 years ago
wjc404	e3368cbf18	AVX512 STRMM kernel	6 years ago
Bart Oldeman	7ea5e07d1c	Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408 The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they must be declared as input/output constraints, otherwise the compiler may assume the corresponding registers are not modified.	6 years ago
wjc404	3447d04eaf	Update dgemm_kernel_16x2_skylakex.c	6 years ago
wjc404	8b5cdcc64c	Update sgemm_kernel_8x4_haswell.c	6 years ago
wjc404	4e00d96a78	Update dgemm_kernel_16x2_skylakex.c	6 years ago
wjc404	096da2f51a	Update dgemm_kernel_16x2_skylakex.c	6 years ago
wjc404	081b188529	Update KERNEL.SKYLAKEX	6 years ago
wjc404	8019e70211	AVX512 16x2 DGEMM kernel	6 years ago
wjc404	e5dcdeb550	Update sgemm_direct_skylakex.c	6 years ago
wjc404	952cc2ba38	Update sgemm_kernel_16x4_skylakex_2.c	6 years ago

598 Commits (dfbc62ef7e89e448f2a57f3aaf72a11dae61bbd2)