648 Commits (b67a92c19f0b3287c933a526ec2ca3ad132a3763)

Author SHA1 Message Date
  Martin Kroeker 3d511f0e66 replace spurious avx512 requirement with fma check 4 years ago
  Martin Kroeker 2dfb24730d Use "old" compute(24) function with clang due to register limitations 4 years ago
  Martin Kroeker 7b8f580941 Merge pull request #3156 from martin-frbg/omatcopy_d 4 years ago
  Martin Kroeker 0f5e86a0d9 Remove premature entry for DOMATCOPY_RT 4 years ago
  Martin Kroeker 7b294a99fd Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time 4 years ago
  Martin Kroeker 0934568d9c Move includes under the ifdef for compilers w/o intrinsics support 4 years ago
  Martin Kroeker a9f6f7ad39 Remove spurious AVX512 requirement and add AVX2/FMA3 guard 4 years ago
  Martin Kroeker 292d1af1a0 Update omatcopy_rt.c 5 years ago
  Martin Kroeker 325b398e3c Update omatcopy_rt.c 5 years ago
  Martin Kroeker 6f5667b4d4 Enable optimized S/D OMATCOPY_RT 5 years ago
  Martin Kroeker cceeee7806 Add optimized omatcopy_rt 5 years ago
  Martin Kroeker 47691c031f Use Haswell optimizations for Zen as well 5 years ago
  Martin Kroeker ce7ddd8921 Use Haswell optimizations for Zen as well 5 years ago
  Martin Kroeker 950c047b49 Use Haswell optimizations for Zen as well 5 years ago
  Martin Kroeker 46509953a9 Use Haswell optimizations for Zen as well 5 years ago
  Martin Kroeker db348dcff2 Enable optimized srot/drot kernels from Haswell 5 years ago
  Martin Kroeker 69a5558203 Merge pull request #3059 from Guobing-Chen/BF16_gemm 5 years ago
  Alex Henrie 202fc9e8ed Fix uninitialized argument value in dasum_k 5 years ago
  Chen, Guobing b0beb0b1ca Initial code for Cooperlake BF16 GEMM kernel 5 years ago
  Martin Kroeker 114eb159a4 Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA 5 years ago
  Martin Kroeker 441c08c9ff Merge pull request #3016 from xiegengxin/complex-asum 5 years ago
  Gengxin Xie 0cb7a403b2 fix error declare function blas_level1_thread_with_return_value 5 years ago
  Gengxin Xie b766c1e9bb Improve the performance of zasum and casum with AVX512 intrinsic 5 years ago
  Martin Kroeker f1bf040b25 Merge pull request #2988 from xiegengxin/smp-asum 5 years ago
  Gengxin Xie d6e7e05bb3 Improve the performance of dasum and sasum when SMP is defined 5 years ago
  Qiyu8 a87e537b8c modify macro 5 years ago
  Qiyu8 5bc0a7583f only FMA3 and vector larger than 128 have positive effects. 5 years ago
  Qiyu8 8c0b206d4c Optimize the performance of rot by using universal intrinsics 5 years ago
  Martin Kroeker ff16329cb7 Merge pull request #2972 from xiegengxin/rot-intrinsic 5 years ago
  Gengxin Xie 725ffbf041 fix typo 5 years ago
  Gengxin Xie d9ba49165a Improve the performance of rot by using AVX512 and AVX2 intrinsic 5 years ago
  Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv 5 years ago
  İsmail Dönmez 4a1d00f589 Fix build with -Werror=return-type 5 years ago
  Bart Oldeman b073d759d0 x86_64: clobber all xmm registers after vzeroupper 5 years ago
  Bart Oldeman 03e781b766 sgemm_direct_skylakex: fix 75eeb26 regression. 5 years ago
  Martin Kroeker c339c40c01 Silence a redefinition warning 5 years ago
  Qiyu8 bfdf4b56da Add double precision universal intrinsics for X86/ARM 5 years ago
  Martin Kroeker 756802df61 Merge pull request #2890 from martin-frbg/s-d-sum 5 years ago
  Martin Kroeker 8d2df7d066 Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM 5 years ago
  Martin Kroeker 08929430cd Merge pull request #2886 from martin-frbg/issue_2767 5 years ago
  Martin Kroeker 0c84ffe05f Merge pull request #2881 from mattip/fninit 5 years ago
  Matti Picus 403eb513a0 use emms instead, add WIN guards 5 years ago
  Martin Kroeker dc8a1afa63 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker fd94236042 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 68ce719fac Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c 5 years ago
  Martin Kroeker d7dd9b396c Rename shdot.c to sbdot.c 5 years ago
  Martin Kroeker 7812486091 Use generic C for D/Z nrm2 kernels on Windows to work around fpu exception bug 5 years ago
  Matti Picus a5b164946c add fninit to reset fpu registers before assembler routines 5 years ago
  Qiyu8 14f7dad3b7 performance improved 5 years ago
  Qiyu8 325b539c26 Optimize the performance of daxpy by using universal intrinsics 5 years ago