Martin Kroeker
f73cfb7e2c
change line endings from CRLF to LF
3 years ago
Martin Kroeker
1688c7da43
change line endings from CRLF to LF
3 years ago
Bart Oldeman
6c1043eb41
Add [cz]scal microkernels for SKYLAKEX
These are as similar to dscal_microk_skylakex-2.c as possible
for consistency.
Note that before this change SKYLAKEX+ used generic C functions for
cscal/zscal via commit 2271c350 from #2610 (which is masked by
commit 086d87a30). However, #3799 now disables FMAs (in turn enabled
by `-march=skylake-avx512`) in the plain C code, which fixes excessive
LAPACK test failures more nicely.
3 years ago
Bart Oldeman
e7e3aa2948
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which
is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases, where
eigenvalue results with/without eigenvectors are compared.
Moreover, using FMAs for multiplication of complex numbers can give surprising
results; see 22aa81f for more information.
This uses the same syntax as used in 22aa81f for zarch (s390x).
3 years ago
Martin Kroeker
101a2c77c3
Fix warnings
3 years ago
Martin Kroeker
739c3c44a7
Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745)
Add pragma to disable the gcc tree-optimizer for some x86_64 S and Z kernels with gcc12 on OSX or Windows
3 years ago
Martin Kroeker
dc49edd4e6
Revert "roll back DGEMM kernel ... for DYNAMIC_ARCH"
3 years ago
Caroline Newcombe
5cc1111383
fix unsafe read of Y in assembly kernel
3 years ago
Wangyang Guo
225683218c
Small Matrix: use proper inline asm input constraint for AVX512 mask
3 years ago
Martin Kroeker
9c626e466e
really fix definition of SHUFFLE_MAGIC_NO
3 years ago
Martin Kroeker
9d7429406f
Declare SHUFFLE_MAGIC_NO as const to placate clang
3 years ago
Martin Kroeker
522f809825
Merge pull request #3542 from martin-frbg/issue3540
Fix compilation for CooperLake on Windows/clang
3 years ago
Mosè Giordano
abbc947edb
Fix compilation of Skylake AVX512 kernels with GCC 6
3 years ago
Martin Kroeker
c62f8e2c01
Prevent compiler attempts to use k0 as mask register
3 years ago
Martin Kroeker
80eb581c83
Fix non-portable u_int64_t
3 years ago
Martin Kroeker
73ffabe6ba
Guard uses of _mm512_reduce_add_p?
3 years ago
Martin Kroeker
7b146e590c
fix function typecast
4 years ago
Martin Kroeker
e9a0e52201
fix function typecast
4 years ago
Martin Kroeker
d1ee6ff73f
fix function typecasts
4 years ago
Martin Kroeker
5378046abd
roll back DGEMM kernels to 4x8 when compiling for DYNAMIC_ARCH
4 years ago
Caroline Newcombe
feeb8283a5
Fix unsafe read during final iteration of zsymv_L_sse2.S
4 years ago
Wangyang Guo
63a103ba6e
sbgemm: spr: disable small matrix path by default
4 years ago
Wangyang Guo
82194ea9d2
sbgemm: spr: implement otcopy_16
4 years ago
Wangyang Guo
8632380a96
sbgemm: spr: reuse ncopy_16 from cooperlake as incopy
4 years ago
Wangyang Guo
6bc8204ce5
sbgemm: spr: optimization for tmp_c buffer
4 years ago
Wangyang Guo
f018aa342a
sbgemm: spr: kernel handle alpha != 1.0
4 years ago
Wangyang Guo
a52456b168
sbgemm: spr: oncopy: use tile load/store instead
4 years ago
Wangyang Guo
f2485352a6
sbgemm: spr: only load A once in tail_k handling
4 years ago
Wangyang Guo
9ab33228bb
sbgemm: spr: process k2 and odd k at the same time
4 years ago
Wangyang Guo
10d52646e2
sbgemm: spr: oncopy: avoid handling too many pointers at a time
4 years ago
Wangyang Guo
88154ed02d
sbgemm: spr: reduce tile conf loading by separate tail k handling
4 years ago
Wangyang Guo
a70bfb52d5
sbgemm: spr: kernel works for NN case when alpha is 1.0
4 years ago
Wangyang Guo
6051c86741
sbgemm: spr: kernel works for m32 in NN case
4 years ago
Wangyang Guo
d0b253ac6e
sbgemm: spr: implement oncopy_16
4 years ago
Wangyang Guo
1d48b7cb16
sbgemm: spr: add dummy source files
4 years ago
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
4 years ago
Wangyang Guo
ee5ca8a328
x86_64: BFLOAT16: fix build warning
4 years ago
Martin Kroeker
8dfa61a61c
Initialize abs_mask1 with itself to silence a gcc warning
4 years ago
Martin Kroeker
99aa10b3ff
Initialize abs_mask1 with itself to silence a gcc warning
The actual initialization is via _mm_cmpeq_epi8, which I've seen claimed to be the fastest way to set an xmm register to all 1s
4 years ago
Martin Kroeker
ce036a2fc0
Add casts
4 years ago
Martin Kroeker
af8843875a
Merge pull request #3376 from martin-frbg/issue3370
Fix a few harmless compiler warnings
4 years ago
Martin Kroeker
0925dfe2c9
One instance of kernel_4x1 is used even on SKX
4 years ago
Martin Kroeker
7d873a329f
Add ifdefs around conditionally used functions
4 years ago
Martin Kroeker
d17238599b
Add casts
4 years ago
Wangyang Guo
59a1114d03
sbgemm: cooperlake: tuning for small matrix
4 years ago
Wangyang Guo
682d66555d
sbgemm: cooperlake: implement ncopy_16
4 years ago
Wangyang Guo
beccb83b16
sbgemm: cooperlake: add n24 kernel for tcopy_4
4 years ago
Wangyang Guo
5fcacad32b
sbgemm: cooperlake: implement tcopy_4
4 years ago
Wangyang Guo
bb1c4fa5bd
sbgemm: cooperlake: prefetch A & B
4 years ago
Wangyang Guo
7a2d1601ec
sbgemm: cooperlake: unroll core loop by 2
4 years ago