598 Commits (dfbc62ef7e89e448f2a57f3aaf72a11dae61bbd2)

Author SHA1 Message Date
  Martin Kroeker 91c84e1c01 Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis 5 years ago
  Martin Kroeker e72430fe46 Merge pull request #2803 from xiegengxin/AVX2-asum 5 years ago
  Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double 5 years ago
  Gengxin Xie 1b0f17eeed align to 64, using SSE when input size is small 5 years ago
  Gengxin Xie 448152cdd8 define __AVX2__ to ensure the haswell code compiled with avx2 5 years ago
  Gengxin Xie cb3c190a3a Implementation of dasum, sasum with AVX2 & AVX512 intrinsic 5 years ago
  Martin Kroeker b2053239fc Fix missing dummy parameter (imag part of alpha) of zdot_thread_function 5 years ago
  Martin Kroeker 9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support 5 years ago
  Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) 5 years ago
  Chen, Guobing e740c4873d Enable COOPERLAKE build target 5 years ago
  Martin Kroeker 81dcfdcf39 Multiply by 2 instead of left-shifting a potentially negative number 5 years ago
  Martin Kroeker 0ef4b3f1f2 Multiply instead of doing a left shift of a potentially negative number 5 years ago
  Martin Kroeker aa53a8a5cb Multiply by two instead of left-shifting one place 5 years ago
  Martin Kroeker aa3a1e7d8c Multiply by two rather than left shift by one place 5 years ago
  Martin Kroeker e30ad0e521 Strip UTF8 byte order marker from source 5 years ago
  Martin Kroeker 93592d1260 Merge pull request #2675 from wjc404/develop 5 years ago
  wjc404 086d87a302 AVX512 dgemm tcopy_16 function 5 years ago
  Martin Kroeker c3574ffe53 Merge pull request #2646 from wjc404/develop 5 years ago
  wjc404 0e3ac4a06b Add files via upload 5 years ago
  Martin Kroeker 2271c3506b Work around excessive LAPACK test failures on Skylake-X 5 years ago
  Martin Kroeker 90dba9f716 Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version 5 years ago
  Martin Kroeker 5b0093b5fe Convert aligned moves to unaligned 5 years ago
  Martin Kroeker 567d2760e6 Merge pull request #2520 from wjc404/develop 5 years ago
  wjc404 b8307768e2 Add files via upload 5 years ago
  Martin Kroeker af8a619e1f Merge pull request #2517 from wjc404/develop 5 years ago
  wjc404 62b9608986 Update KERNEL.SKYLAKEX 5 years ago
  Martin Kroeker a1b181cea2 Merge pull request #2516 from wjc404/develop 5 years ago
  wjc404 cdc0e9011e Update KERNEL.ZEN 5 years ago
  wjc404 fa049d49c2 AVX2 STRSM kernel 5 years ago
  Martin Kroeker ea8eec5d17 Merge pull request #2422 from wjc404/develop 6 years ago
  wjc404 dd22eb7621 Update cgemm_kernel_8x2_haswell.c 6 years ago
  wjc404 2352331e60 Update zgemm_kernel_4x2_haswell.c 6 years ago
  wjc404 1b980001dd Update zgemm_kernel_4x2_haswell.c 6 years ago
  wjc404 2515e1152f Update cgemm_kernel_8x2_haswell.c 6 years ago
  wjc404 903854c168 Add files via upload 6 years ago
  wjc404 a2ff577a30 Update KERNEL.ZEN 6 years ago
  wjc404 97a32cb0a5 Update KERNEL.HASWELL 6 years ago
  Martin Liska aeea14ee40 Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S. 6 years ago
  Martin Liska 18bcc36a69 Fix implementation of iamax_sse.S as reported in #2116. 6 years ago
  wjc404 f566787e6e Update KERNEL.SKYLAKEX 6 years ago
  wjc404 e3368cbf18 AVX512 STRMM kernel 6 years ago
  Bart Oldeman 7ea5e07d1c Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408 6 years ago
  wjc404 3447d04eaf Update dgemm_kernel_16x2_skylakex.c 6 years ago
  wjc404 8b5cdcc64c Update sgemm_kernel_8x4_haswell.c 6 years ago
  wjc404 4e00d96a78 Update dgemm_kernel_16x2_skylakex.c 6 years ago
  wjc404 096da2f51a Update dgemm_kernel_16x2_skylakex.c 6 years ago
  wjc404 081b188529 Update KERNEL.SKYLAKEX 6 years ago
  wjc404 8019e70211 AVX512 16x2 DGEMM kernel 6 years ago
  wjc404 e5dcdeb550 Update sgemm_direct_skylakex.c 6 years ago
  wjc404 952cc2ba38 Update sgemm_kernel_16x4_skylakex_2.c 6 years ago