Caroline Newcombe | 5cc1111383 | fix unsafe read of Y in assembly kernel | 3 years ago
Wangyang Guo | 225683218c | Small Matrix: use proper inline asm input constraint for AVX512 mask | 3 years ago
Martin Kroeker | 9c626e466e | really fix definition of SHUFFLE_MAGIC_NO | 3 years ago
Martin Kroeker | 9d7429406f | Declare SHUFFLE_MAGIC_NO as const to placate clang | 3 years ago
Martin Kroeker | 522f809825 | Merge pull request #3542 from martin-frbg/issue3540: Fix compilation for CooperLake on Windows/clang | 3 years ago
Mosè Giordano | abbc947edb | Fix compilation of Skylake AVX512 kernels with GCC 6 | 3 years ago
Martin Kroeker | c62f8e2c01 | Prevent compiler attempts to use k0 as mask register | 3 years ago
Martin Kroeker | 80eb581c83 | Fix non-portable u_int64_t | 3 years ago
Martin Kroeker | 73ffabe6ba | Guard uses of _mm512_reduce_add_p? | 3 years ago
Martin Kroeker | 7b146e590c | fix function typecast | 4 years ago
Martin Kroeker | e9a0e52201 | fix function typecast | 4 years ago
Martin Kroeker | d1ee6ff73f | fix function typecasts | 4 years ago
Martin Kroeker | 5378046abd | roll back DGEMM kernels to 4x8 when compiling for DYNAMIC_ARCH | 4 years ago
Caroline Newcombe | feeb8283a5 | Fix unsafe read during final iteration of zsymv_L_sse2.S | 4 years ago
Wangyang Guo | 63a103ba6e | sbgemm: spr: disable small matrix path by default | 4 years ago
Wangyang Guo | 82194ea9d2 | sbgemm: spr: implement otcopy_16 | 4 years ago
Wangyang Guo | 8632380a96 | sbgemm: spr: reuse ncopy_16 from cooperlake as incopy | 4 years ago
Wangyang Guo | 6bc8204ce5 | sbgemm: spr: optimization for tmp_c buffer | 4 years ago
Wangyang Guo | f018aa342a | sbgemm: spr: kernel handle alpha != 1.0 | 4 years ago
Wangyang Guo | a52456b168 | sbgemm: spr: oncopy: use tile load/store instead | 4 years ago
Wangyang Guo | f2485352a6 | sbgemm: spr: only load A once in tail_k handling | 4 years ago
Wangyang Guo | 9ab33228bb | sbgemm: spr: process k2 and odd k at the same time | 4 years ago
Wangyang Guo | 10d52646e2 | sbgemm: spr: oncopy: avoid handling too many pointers at a time | 4 years ago
Wangyang Guo | 88154ed02d | sbgemm: spr: reduce tile conf loading by separating tail k handling | 4 years ago
Wangyang Guo | a70bfb52d5 | sbgemm: spr: kernel works for NN case when alpha is 1.0 | 4 years ago
Wangyang Guo | 6051c86741 | sbgemm: spr: kernel works for m32 in NN case | 4 years ago
Wangyang Guo | d0b253ac6e | sbgemm: spr: implement oncopy_16 | 4 years ago
Wangyang Guo | 1d48b7cb16 | sbgemm: spr: add dummy source files | 4 years ago
Wangyang Guo | 3dc6052c7e | initial support for Sapphire Rapids platform | 4 years ago
Wangyang Guo | ee5ca8a328 | x86_64: BFLOAT16: fix build warning | 4 years ago
Martin Kroeker | 8dfa61a61c | Initialize abs_mask1 with itself to silence a gcc warning | 4 years ago
Martin Kroeker | 99aa10b3ff | Initialize abs_mask1 with itself to silence a gcc warning; the actual initialization is via _mm_cmpeq_epi8, which I've seen claimed to be the fastest way to set an xmm register to all 1s | 4 years ago
Martin Kroeker | ce036a2fc0 | Add casts | 4 years ago
Martin Kroeker | af8843875a | Merge pull request #3376 from martin-frbg/issue3370: Fix a few harmless compiler warnings | 4 years ago
Martin Kroeker | 0925dfe2c9 | One instance of kernel_4x1 is used even on SKX | 4 years ago
Martin Kroeker | 7d873a329f | Add ifdefs around conditionally used functions | 4 years ago
Martin Kroeker | d17238599b | Add casts | 4 years ago
Wangyang Guo | 59a1114d03 | sbgemm: cooperlake: tuning for small matrix | 4 years ago
Wangyang Guo | 682d66555d | sbgemm: cooperlake: implement ncopy_16 | 4 years ago
Wangyang Guo | beccb83b16 | sbgemm: cooperlake: add n24 kernel for tcopy_4 | 4 years ago
Wangyang Guo | 5fcacad32b | sbgemm: cooperlake: implement tcopy_4 | 4 years ago
Wangyang Guo | bb1c4fa5bd | sbgemm: cooperlake: prefetch A & B | 4 years ago
Wangyang Guo | 7a2d1601ec | sbgemm: cooperlake: unroll core loop by 2 | 4 years ago
Wangyang Guo | 45fdf951b6 | sbgemm: cooperlake: reorder ptr increase for performance | 4 years ago
Wangyang Guo | cece3541ab | sbgemm: cooperlake: fix bug in m64n12 | 4 years ago
Wangyang Guo | 9df0953cde | sbgemm: cooperlake: kernel works for NN | 4 years ago
Wangyang Guo | 2ec9f3a8aa | sbgemm: cooperlake: change kernel size to 16x4 | 4 years ago
Wangyang Guo | ef8f5fecc8 | sbgemm: cooperlake: implement sbgemm_tcopy_32 | 4 years ago
Wangyang Guo | 4c294336e6 | sbgemm: cooperlake: add dummy source files | 4 years ago
Wangyang Guo | 619588fbab | sbgemm: remove unnecessary b0 files | 4 years ago