625 Commits (a1eecccda28cf7d00a5ffbbcd5afb4ca6ef6c6a1)

Author SHA1 Message Date
  Martin Kroeker f1bf040b25 Merge pull request #2988 from xiegengxin/smp-asum 5 years ago
  Gengxin Xie d6e7e05bb3 Improve the performance of dasum and sasum when SMP is defined 5 years ago
  Qiyu8 a87e537b8c modify macro 5 years ago
  Qiyu8 5bc0a7583f only FMA3 and vector larger than 128 have positive effects. 5 years ago
  Qiyu8 8c0b206d4c Optimize the performance of rot by using universal intrinsics 5 years ago
  Martin Kroeker ff16329cb7 Merge pull request #2972 from xiegengxin/rot-intrinsic 5 years ago
  Gengxin Xie 725ffbf041 fix typo 5 years ago
  Gengxin Xie d9ba49165a Improve the performance of rot by using AVX512 and AVX2 intrinsic 5 years ago
  Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv 5 years ago
  İsmail Dönmez 4a1d00f589 Fix build with -Werror=return-type 5 years ago
  Bart Oldeman b073d759d0 x86_64: clobber all xmm registers after vzeroupper 5 years ago
  Bart Oldeman 03e781b766 sgemm_direct_skylakex: fix 75eeb26 regression. 5 years ago
  Martin Kroeker c339c40c01 Silence a redefinition warning 5 years ago
  Qiyu8 bfdf4b56da Add double precision universal intrinsics for X86/ARM 5 years ago
  Martin Kroeker 756802df61 Merge pull request #2890 from martin-frbg/s-d-sum 5 years ago
  Martin Kroeker 8d2df7d066 Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM 5 years ago
  Martin Kroeker 08929430cd Merge pull request #2886 from martin-frbg/issue_2767 5 years ago
  Martin Kroeker 0c84ffe05f Merge pull request #2881 from mattip/fninit 5 years ago
  Matti Picus 403eb513a0 use emms instead, add WIN guards 5 years ago
  Martin Kroeker dc8a1afa63 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker fd94236042 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 68ce719fac Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c 5 years ago
  Martin Kroeker d7dd9b396c Rename shdot.c to sbdot.c 5 years ago
  Martin Kroeker 7812486091 Use generic C for D/Z nrm2 kernels on Windows to work around fpu exception bug 5 years ago
  Matti Picus a5b164946c add fninit to reset fpu registers before assembler routines 5 years ago
  Qiyu8 14f7dad3b7 performance improved 5 years ago
  Qiyu8 325b539c26 Optimize the performance of daxpy by using universal intrinsics 5 years ago
  Martin Kroeker 91c84e1c01 Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis 5 years ago
  Martin Kroeker e72430fe46 Merge pull request #2803 from xiegengxin/AVX2-asum 5 years ago
  Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double 5 years ago
  Gengxin Xie 1b0f17eeed align to 64, using SSE when input size is small 5 years ago
  Gengxin Xie 448152cdd8 define __AVX2__ to ensure the haswell code compiled with avx2 5 years ago
  Gengxin Xie cb3c190a3a Implementaion of dasum, sasum with AVX2 & AVX512 intrinsic 5 years ago
  Martin Kroeker b2053239fc Fix mssing dummy parameter (imag part of alpha) of zdot_thread_function 5 years ago
  Martin Kroeker 9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support 5 years ago
  Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) 5 years ago
  Chen, Guobing e740c4873d Enable COOPERLAKE build target 5 years ago
  Martin Kroeker 81dcfdcf39 Multiply by 2 instead of left-shifting a potentially negative number 5 years ago
  Martin Kroeker 0ef4b3f1f2 Multiply instead of doing a left shift of a potentially negative number 5 years ago
  Martin Kroeker aa53a8a5cb Multiply by two instead of left-shifting one place 5 years ago
  Martin Kroeker aa3a1e7d8c Multiply by two rather than left shift by one place 5 years ago
  Martin Kroeker e30ad0e521 Strip UTF8 byte order marker from source 5 years ago
  Martin Kroeker 93592d1260 Merge pull request #2675 from wjc404/develop 5 years ago
  wjc404 086d87a302 AVX512 dgemm tcopy_16 function 5 years ago
  Martin Kroeker c3574ffe53 Merge pull request #2646 from wjc404/develop 5 years ago
  wjc404 0e3ac4a06b Add files via upload 5 years ago
  Martin Kroeker 2271c3506b Work around excessive LAPACK test failures on Skylake-X 5 years ago
  Martin Kroeker 90dba9f716 Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version 5 years ago
  Martin Kroeker 5b0093b5fe Convert aligned moves to unaligned 5 years ago
  Martin Kroeker 567d2760e6 Merge pull request #2520 from wjc404/develop 5 years ago