The argument of `_mm512_abs_pd` must be `__m512`; see
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=KNC&expand=60
Without the explicit typecast, GCC 8.1 fails with:
```
In file included from ../kernel/x86_64/dasum.c:8:
../kernel/x86_64/dasum_microk_skylakex-2.c: In function ‘dasum_kernel’:
../kernel/x86_64/dasum_microk_skylakex-2.c:42:38: error: incompatible type for argument 1 of ‘_mm512_abs_pd’
accum_0 += _mm512_abs_pd(_mm512_load_pd(&x1[i + 0]));
^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/immintrin.h:45,
from ../kernel/x86_64/dasum_microk_skylakex-2.c:6,
from ../kernel/x86_64/dasum.c:8:
/opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/avx512fintrin.h:7730:23: note: expected ‘__m512’ {aka ‘__vector(16) float’} but argument is of type ‘__m512d’ {aka ‘__vector(8) double’}
_mm512_abs_pd (__m512 __A)
~~~~~~~^~~
In file included from ../kernel/x86_64/dasum.c:8:
../kernel/x86_64/dasum_microk_skylakex-2.c:43:38: error: incompatible type for argument 1 of ‘_mm512_abs_pd’
accum_1 += _mm512_abs_pd(_mm512_load_pd(&x1[i + 8]));
^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/immintrin.h:45,
from ../kernel/x86_64/dasum_microk_skylakex-2.c:6,
from ../kernel/x86_64/dasum.c:8:
/opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/avx512fintrin.h:7730:23: note: expected ‘__m512’ {aka ‘__vector(16) float’} but argument is of type ‘__m512d’ {aka ‘__vector(8) double’}
_mm512_abs_pd (__m512 __A)
~~~~~~~^~~
In file included from ../kernel/x86_64/dasum.c:8:
../kernel/x86_64/dasum_microk_skylakex-2.c:44:38: error: incompatible type for argument 1 of ‘_mm512_abs_pd’
accum_2 += _mm512_abs_pd(_mm512_load_pd(&x1[i +16]));
^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/immintrin.h:45,
/opt/x86_64-linux-gnu/lib/gcc/x86_64-linux-gnu/8.1.0/include/avx512fintrin.h:7730:23: note: expected ‘__m512’ {aka ‘__vector(16) float’} but argument is of type ‘__m512d’ {aka ‘__vector(8) double’}
_mm512_abs_pd (__m512 __A)
~~~~~~~^~~
```
1. Add a new API, `sbgemv`, to support bfloat16-based GEMV
2. Implement a generic kernel for `sbgemv`
3. Implement an AVX512-BF16-based kernel for `sbgemv`
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
This patch makes use of the new POWER10 vector pair instructions for
loads and stores. It also reorganizes all variants of the copy functions
to use the same kernel.
As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, the compiler is now smart enough to find clobbers
inside non-inlined static functions.
In particular, sgemv relied on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the upper
part was destroyed by vzeroupper. This caused many tests to fail.
This patch makes sure all xmm (and, by extension, ymm/zmm) registers
are listed as clobbered to prevent this; most kernels already did
this correctly.
The `#if defined(SKYLAKEX) || defined (COOPERLAKE)` guard from that
commit came before `#include "common.h"`, which is what makes those
architecture macros available, so the compiled function was empty and
returned garbage results for qualifying sgemm calls on those architectures.
Closes #2914
Because the new MMA instructions need bfloat16 inputs in 4x2 order,
this changes the format in the copy/packing code. This avoids permute
instructions in the GEMM kernel's inner loop.