wjc404
2cd9306bb5
Update KERNEL.ZEN
6 years ago
wjc404
c418c81224
Update KERNEL.HASWELL
6 years ago
wjc404
025741f16a
Fast Haswell CGEMM kernel
6 years ago
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
6 years ago
wjc404
d573d24de7
Fast Haswell ZGEMM kernel
6 years ago
Isuru Fernando
b863b32ac5
Workaround an ICE in clang 9.0.0
This bug is not present in 8.x or in the 9.0 daily snapshot.
6 years ago
wjc404
934e601e93
Update dgemm_kernel_4x8_skylakex_2.c
6 years ago
wjc404
eb1e9c8c92
some optimizations
6 years ago
Wang, Long
bfb5fbdb4d
revised fix for Windows compatibility for #2313
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
Wang, Long
1191db1a49
For the sake of Windows compatibility, use "unsigned long long" to ensure a 64-bit length
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
Wang, Long
0caf1434c9
Fix the integer overflow issue for large matrix size
For large matrices, e.g. M=N=K with M>1290, int mnk=M*N*K overflows.
This leads to wrongly branching to the single-threaded path, and
performance degrades significantly.
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
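A minimal C sketch of the overflow described above, assuming a 32-bit `int` (true on the mainstream x86_64 ABIs): 1290^3 = 2,146,689,000 still fits below INT_MAX (2,147,483,647), while 1291^3 = 2,151,685,171 does not. Widening each factor to `unsigned long long` before multiplying (the type later chosen for Windows compatibility, since `long` is only 32 bits there) keeps the product exact. The helper names and the threshold constant are hypothetical, not the actual OpenBLAS code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical cut-off: problems with fewer multiply-adds than this run
   single-threaded. Name and value are illustrative only. */
#define HYPOTHETICAL_MT_THRESHOLD 65536ULL

/* M*N*K computed in 64-bit unsigned arithmetic. Each factor is widened
   before the multiply, so the intermediate M*N cannot wrap either. */
static unsigned long long gemm_work(int m, int n, int k) {
    assert(m >= 0 && n >= 0 && k >= 0);
    return (unsigned long long)m * (unsigned long long)n * (unsigned long long)k;
}

/* 1 if the product would not fit in an int, i.e. the case where the
   original `int mnk = M*N*K` overflowed (undefined behaviour in C). */
static int product_overflows_int(int m, int n, int k) {
    return gemm_work(m, n, k) > (unsigned long long)INT_MAX;
}

/* Branch on the exact 64-bit work count instead of a wrapped int. */
static int use_multithreading(int m, int n, int k) {
    return gemm_work(m, n, k) >= HYPOTHETICAL_MT_THRESHOLD;
}
```

With the wrapped `int`, a large problem could produce a small or negative count and fall below the threshold; the 64-bit count cannot.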
wjc404
819e852ae7
AVX512 CGEMM & ZGEMM kernels
Reaches 96-99% of the single-threaded performance of MKL 2018
6 years ago
wjc404
836c414e22
optimizations of software prefetching
6 years ago
wjc404
430c11e135
Add files via upload
6 years ago
wjc404
fbacd2605d
optimizations via software prefetches
6 years ago
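Software prefetching issues a load hint for data a fixed distance ahead of the current iteration, so that it is already in cache when the loop reaches it. A minimal sketch using the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`; the distance of 64 elements is a hypothetical value, and the hand-tuned kernels in these commits are not reproduced here. A prefetch is only a hint, so it changes timing but never the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements; real kernels tune this per CPU. */
#define PREFETCH_DIST 64

/* Sum an array while hinting the CPU to start loading the element
   PREFETCH_DIST positions ahead. rw = 0 means "for reading";
   locality = 3 asks to keep the line in all cache levels. */
static double sum_with_prefetch(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)          /* never form a pointer past the array */
            __builtin_prefetch(&x[i + PREFETCH_DIST], 0, 3);
        s += x[i];
    }
    return s;
}
```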
wjc404
1df9a2013d
new sgemm kernel for skylakex
6 years ago
wjc404
6ff013bae0
native support for icopy_4
Reaches 90% of MKL single-threaded performance.
6 years ago
wjc404
0d669e04bb
Update dgemm_kernel_8x8_skylakex.c
6 years ago
wjc404
17cdd9f9e1
some corrections
6 years ago
wjc404
6bcb06fcb1
make further changes to icopy_8 easier
6 years ago
wjc404
b7315f8401
Add files via upload
6 years ago
wjc404
9b19e9e1b0
Update dgemm_kernel_8x8_skylakex.c
6 years ago
wjc404
6bd67ddbab
Update dgemm_kernel_8x8_skylakex.c
6 years ago
wjc404
844629af57
Add files via upload
6 years ago
Martin Kroeker
11c59acfb1
Keep both PGI/SUN and default code paths to avoid breaking Clang/Windows
6 years ago
Martin Kroeker
3a55dca2dc
Make x86_64 zdot compile with PGI and Sun C again
(broken by #2222, as CREAL and CIMAG do not expand to a valid lvalue with these compilers)
6 years ago
Martin Kroeker
9ef96b32a6
Add multithreading support to the x86_64 zdot kernel (#2222)
* Add multithreading support, copied from the ThunderX2T99 kernel. For #2221
6 years ago
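The idea behind a multithreaded dot product is to split the index range into contiguous chunks, compute one partial sum per thread, and add the partial results. The sketch below models that decomposition serially in portable C (no threading library), using interleaved (re, im) storage as BLAS does; the type and function names are hypothetical, and this is not the ThunderX2T99 or x86_64 kernel code. Splitting reorders the floating-point additions, so results can differ from a single-threaded run in the last bits.

```c
#include <assert.h>
#include <stddef.h>

/* A complex result; inputs are interleaved (re, im) doubles as in BLAS. */
typedef struct { double re, im; } zpair;

/* Partial dot product over elements [begin, end):
   sum of x[i]*y[i], or conj(x[i])*y[i] when conj != 0 (zdotc). */
static zpair zdot_range(const double *x, const double *y,
                        size_t begin, size_t end, int conj) {
    zpair acc = {0.0, 0.0};
    for (size_t i = begin; i < end; i++) {
        double xr = x[2 * i], xi = conj ? -x[2 * i + 1] : x[2 * i + 1];
        double yr = y[2 * i], yi = y[2 * i + 1];
        acc.re += xr * yr - xi * yi;
        acc.im += xr * yi + xi * yr;
    }
    return acc;
}

/* Split n elements into nparts chunks; in a threaded version each chunk
   would run on its own thread, followed by this same reduction. */
static zpair zdot_split(const double *x, const double *y, size_t n,
                        size_t nparts, int conj) {
    assert(nparts > 0);
    zpair total = {0.0, 0.0};
    size_t chunk = (n + nparts - 1) / nparts;   /* ceiling division */
    for (size_t p = 0; p < nparts; p++) {
        size_t begin = p * chunk;
        if (begin >= n) break;
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        zpair part = zdot_range(x, y, begin, end, conj);
        total.re += part.re;
        total.im += part.im;
    }
    return total;
}
```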
Martin Kroeker
dccff2e785
Merge pull request #2206 from martin-frbg/zen-dtrmm
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
6 years ago
Martin Kroeker
5c3458a6e7
Merge pull request #2199 from martin-frbg/zen-dtrsm
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
6 years ago
Martin Kroeker
acf6002ab2
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
6 years ago
Martin Kroeker
2dfb804cb9
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
to improve performance on AMD Zen (#2180), applying wjc404's improvement of the DGEMM kernel from #2186
6 years ago
Martin Kroeker
4c153ec9da
Merge pull request #2196 from wjc404/develop
Add vbroadcastsd kernel to dgemm_kernel_4x8_haswell.S
6 years ago
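A broadcast-based GEMM microkernel updates a small block of C with rank-1 outer products: at each step along k, one element of B is broadcast across a vector register (vbroadcastsd on AVX2) and multiplied into a column of A with FMAs. Below is a scalar reference for a 4x8 block, assuming 4 rows by 8 columns and a hypothetical packed layout (A as k columns of 4, B as k rows of 8, C column-major with leading dimension ldc). It shows the arithmetic only, not the register allocation or scheduling of dgemm_kernel_4x8_haswell.S.

```c
#include <assert.h>

/* C[0:4, 0:8] += A_panel (4 x k) * B_panel (k x 8).
   Hypothetical packing: a[p*4 + i] = A(i, p), b[p*8 + j] = B(p, j).
   C is column-major: C(i, j) = c[i + j*ldc]. */
static void dgemm_micro_4x8_ref(int k, const double *a, const double *b,
                                double *c, int ldc) {
    assert(k >= 0 && ldc >= 4);
    for (int p = 0; p < k; p++) {
        const double *acol = a + p * 4;      /* the 4-wide vector of A */
        for (int j = 0; j < 8; j++) {
            double bj = b[p * 8 + j];        /* the scalar a broadcast would splat */
            for (int i = 0; i < 4; i++)
                c[i + j * ldc] += acol[i] * bj;  /* one FMA lane per i */
        }
    }
}
```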
wjc404
7eecd8e39c
Add files via upload
6 years ago
Martin Kroeker
7b0b7c11d2
Merge pull request #2190 from martin-frbg/zdot-zen
Replace vpermpd with vpermilpd in the Haswell/Zen zdot microkernel
6 years ago
Martin Kroeker
28e96458e5
Replace vpermpd with vpermilpd
to improve performance on Zen/Zen2 (as demonstrated by wjc404 in #2180)
6 years ago
wjc404
95fb98f556
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
4801c6d36b
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
9440fa607d
Add files via upload
6 years ago
wjc404
94db259e5b
Add files via upload
6 years ago
wjc404
f49f8047ac
Add files via upload
6 years ago
wjc404
825777faab
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
9c89757562
Add files via upload
6 years ago
wjc404
9b04baeaee
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
8a074b3965
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
211ab03b14
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
1733f927e6
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
182b06d6ad
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
7a9050d681
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
0ba29fd262
Update dgemm_kernel_4x8_haswell.S for zen2
replaced a bunch of vpermpd instructions with vpermilpd and vperm2f128
6 years ago
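Why such replacements are possible: a permutation that only exchanges elements within a 128-bit lane (for example swapping the real and imaginary parts of two packed double-complex values) can be done by the in-lane vpermilpd, and a permutation that moves data across lanes can be split into a lane swap (vperm2f128) followed by an in-lane shuffle. The scalar C models below encode the documented immediate-operand semantics of the three instructions so these equivalences can be checked without AVX hardware. They are an illustration, not code from these commits, and the immediates used in the checks are examples.

```c
#include <assert.h>
#include <string.h>

/* Scalar models of three AVX/AVX2 permutes on a 4 x double (ymm) register.
   Element 0 is the lowest; elements 0-1 form the low 128-bit lane. */

/* vpermpd dst, src, imm8 (AVX2): lane-crossing; dst[i] = src[imm[2i+1:2i]]. */
static void model_vpermpd(double dst[4], const double src[4], unsigned imm) {
    double t[4];
    for (int i = 0; i < 4; i++)
        t[i] = src[(imm >> (2 * i)) & 3u];
    memcpy(dst, t, sizeof t);
}

/* vpermilpd dst, src, imm8 (AVX): in-lane; bit i picks the low (0) or
   high (1) element of the lane that contains position i. */
static void model_vpermilpd(double dst[4], const double src[4], unsigned imm) {
    double t[4];
    for (int i = 0; i < 4; i++)
        t[i] = src[(i & ~1) + ((imm >> i) & 1u)];
    memcpy(dst, t, sizeof t);
}

/* vperm2f128 dst, a, b, imm8 (AVX): each 128-bit half of dst is chosen by a
   4-bit field (low half: imm[3:0], high half: imm[7:4]); values 0/1 select
   the low/high lane of a, 2/3 the low/high lane of b, and bit 3 zeroes it. */
static void model_vperm2f128(double dst[4], const double a[4],
                             const double b[4], unsigned imm) {
    double t[4];
    for (int half = 0; half < 2; half++) {
        unsigned ctl = (imm >> (4 * half)) & 0xFu;
        const double *s = (ctl & 2u) ? b : a;
        for (int j = 0; j < 2; j++)
            t[2 * half + j] = (ctl & 8u) ? 0.0 : s[2 * (ctl & 1u) + j];
    }
    memcpy(dst, t, sizeof t);
}
```

For example, vpermpd with imm 0xB1 equals vpermilpd with imm 0x5 (swap within each lane), and vpermpd with imm 0x1B (full reversal) equals vperm2f128 with imm 0x01 followed by vpermilpd with imm 0x5.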
Martin Kroeker
9ea30f3788
Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125)
* Mark iamax_sse.S as unsuitable for MIN due to issue #2116
* Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 targets as a workaround for #2116
6 years ago
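For reference, the routines named in this commit return a 1-based index: ISMIN the position of the smallest element, ISAMIN the position of the smallest absolute value. A plain C sketch of those semantics for single precision, assuming the conventions of the reference BLAS ISAMAX carry over (first occurrence wins on ties, 0 is returned when n < 1 or incx <= 0). The `ref_*` names are hypothetical; such a routine can serve as an oracle when testing an optimized kernel, but it says nothing about the specific defect tracked in #2116.

```c
#include <assert.h>
#include <stddef.h>

/* |v| without <math.h>, so no libm link is needed. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* 1-based index of the first element with the smallest |x[i*incx]|. */
static int ref_isamin(int n, const float *x, int incx) {
    if (n < 1 || incx <= 0) return 0;
    int best = 1;
    float bestv = abs_f(x[0]);
    for (int i = 1; i < n; i++) {
        float v = abs_f(x[(size_t)i * (size_t)incx]);
        if (v < bestv) { bestv = v; best = i + 1; }  /* strict <: earliest wins ties */
    }
    return best;
}

/* 1-based index of the first element with the smallest signed value. */
static int ref_ismin(int n, const float *x, int incx) {
    if (n < 1 || incx <= 0) return 0;
    int best = 1;
    float bestv = x[0];
    for (int i = 1; i < n; i++) {
        float v = x[(size_t)i * (size_t)incx];
        if (v < bestv) { bestv = v; best = i + 1; }
    }
    return best;
}
```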