Martin Kroeker
5b4b385ecf
Temporarily disable the SkylakeX sgemv_t microkernel due to LAPACK testsuite failures
4 years ago
Ma, Yu
706a08d4a0
Optimized sgemv_t for small N based on AVX512
4 years ago
Martin Kroeker
5f677e782e
Merge pull request #3196 from guowangy/skylakex-gemm-batch-k
GEMM: skylake: improve the performance when m is small
4 years ago
Martin Kroeker
02087a62e7
Merge pull request #3205 from intelmy/sgemv_n_opt
optimize on sgemv_n for small n
4 years ago
Martin Kroeker
8b90e5f202
Drop redundant inclusion of complex.h
4 years ago
Martin Kroeker
c0ca63ea46
Fix missing conditionals for non-SKX kernels
4 years ago
pnp
3d4ccd2a13
fix for build error
4 years ago
pnp
c59652f0ce
optimize on sgemv_n for small n
4 years ago
Wangyang Guo
aa7b3dc3db
GEMM: skylake: improve the performance when m is small
4 years ago
Martin Kroeker
3d511f0e66
replace spurious avx512 requirement with fma check
4 years ago
Martin Kroeker
2dfb24730d
Use "old" compute(24) function with clang due to register limitations
4 years ago
Martin Kroeker
7b8f580941
Merge pull request #3156 from martin-frbg/omatcopy_d
Move x86_64 DOMATCOPY_RT back to the C implementation
4 years ago
Martin Kroeker
0f5e86a0d9
Remove premature entry for DOMATCOPY_RT
4 years ago
Martin Kroeker
7b294a99fd
Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time
4 years ago
Martin Kroeker
0934568d9c
Move includes under the ifdef for compilers w/o intrinsics support
4 years ago
Martin Kroeker
a9f6f7ad39
Remove spurious AVX512 requirement and add AVX2/FMA3 guard
4 years ago
Martin Kroeker
292d1af1a0
Update omatcopy_rt.c
5 years ago
Martin Kroeker
325b398e3c
Update omatcopy_rt.c
5 years ago
Martin Kroeker
6f5667b4d4
Enable optimized S/D OMATCOPY_RT
5 years ago
Martin Kroeker
cceeee7806
Add optimized omatcopy_rt
5 years ago
Martin Kroeker
47691c031f
Use Haswell optimizations for Zen as well
5 years ago
Martin Kroeker
ce7ddd8921
Use Haswell optimizations for Zen as well
5 years ago
Martin Kroeker
950c047b49
Use Haswell optimizations for Zen as well
5 years ago
Martin Kroeker
46509953a9
Use Haswell optimizations for Zen as well
5 years ago
Martin Kroeker
db348dcff2
Enable optimized srot/drot kernels from Haswell
5 years ago
Martin Kroeker
69a5558203
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
5 years ago
Alex Henrie
202fc9e8ed
Fix uninitialized argument value in dasum_k
5 years ago
Chen, Guobing
b0beb0b1ca
Initial code for Cooperlake BF16 GEMM kernel
5 years ago
Martin Kroeker
114eb159a4
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA
5 years ago
Martin Kroeker
441c08c9ff
Merge pull request #3016 from xiegengxin/complex-asum
Improve the performance of zasum and casum with AVX512 intrinsic
5 years ago
Gengxin Xie
0cb7a403b2
fix erroneous declaration of function blas_level1_thread_with_return_value
5 years ago
Gengxin Xie
b766c1e9bb
Improve the performance of zasum and casum with AVX512 intrinsic
5 years ago
Martin Kroeker
f1bf040b25
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
5 years ago
Gengxin Xie
d6e7e05bb3
Improve the performance of dasum and sasum when SMP is defined
5 years ago
Qiyu8
a87e537b8c
modify macro
5 years ago
Qiyu8
5bc0a7583f
only FMA3 and vectors wider than 128 bits have positive effects.
5 years ago
Qiyu8
8c0b206d4c
Optimize the performance of rot by using universal intrinsics
5 years ago
Martin Kroeker
ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
5 years ago
Gengxin Xie
725ffbf041
fix typo
5 years ago
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsic
5 years ago
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
5 years ago
İsmail Dönmez
4a1d00f589
Fix build with -Werror=return-type
dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a
return 0 similar to other files.
5 years ago
Bart Oldeman
b073d759d0
x86_64: clobber all xmm registers after vzeroupper
As observed using GCC 10 using -march=native -ftree-vectorize
on Knights Landing, it is now smart enough to find clobbers inside
non-inlined static functions.
In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.
This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to avoid this happening, as most kernels
already did correctly in fact.
5 years ago
Bart Oldeman
03e781b766
sgemm_direct_skylakex: fix 75eeb26 regression.
The `#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit was before #include "common.h" so caused the
compiled function to be empty, returning garbage results for
qualifying sgemm's on those architectures.
Closes #2914
5 years ago
Martin Kroeker
c339c40c01
Silence a redefinition warning
5 years ago
Qiyu8
bfdf4b56da
Add double precision universal intrinsics for X86/ARM
5 years ago
Martin Kroeker
756802df61
Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
5 years ago
Martin Kroeker
8d2df7d066
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM
5 years ago
Martin Kroeker
08929430cd
Merge pull request #2886 from martin-frbg/issue_2767
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
5 years ago
Martin Kroeker
0c84ffe05f
Merge pull request #2881 from mattip/fninit
add fninit to reset fpu registers before assembler routines
5 years ago