Martin Kroeker
93592d1260
Merge pull request #2675 from wjc404/develop
AVX512 DGEMM TCOPY_16 Function
5 years ago
wjc404
086d87a302
AVX512 dgemm tcopy_16 function
5 years ago
Martin Kroeker
c3574ffe53
Merge pull request #2646 from wjc404/develop
Optimize AVX512 parallel DGEMM performance
5 years ago
wjc404
0e3ac4a06b
Add files via upload
5 years ago
Martin Kroeker
2271c3506b
Work around excessive LAPACK test failures on Skylake-X
Something in the plain C parts of x86_64 cscal.c and zscal.c appears to be miscompiled by both gfortran9 and ifort when compiling for skylakex-avx512, even when the optimized Haswell microkernel is not in use.
5 years ago
Martin Kroeker
90dba9f716
Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version
As discussed on the original PR #2329, the "Apple Clang 11.0.3" that appears to be based on the same LLVM release produces the same miscompilation of this file.
5 years ago
Martin Kroeker
5b0093b5fe
Convert aligned moves to unaligned
Should have no performance impact on reasonably modern CPUs, and fixes occasional crashes in actual user code.
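The difference between the two kinds of move can be illustrated with a minimal sketch. It uses SSE2 intrinsics rather than the assembly kernels this commit actually changed, and it assumes an x86-64 target. The helper name `sum_pair_unaligned` is hypothetical. An aligned load (`_mm_load_pd`, i.e. `movapd`) requires a 16-byte-aligned address and can fault otherwise, while the unaligned form (`_mm_loadu_pd`, i.e. `movupd`) accepts any address.

```c
#include <emmintrin.h> /* SSE2 intrinsics; always available on x86-64 */

/* Hypothetical helper: adds the two doubles starting at p.
   p may be any address. An aligned load (_mm_load_pd) could fault
   when p is not 16-byte aligned, so the unaligned form is used. */
static double sum_pair_unaligned(const double *p)
{
    __m128d v = _mm_loadu_pd(p);   /* unaligned load of p[0] and p[1] */
    double out[2];
    _mm_storeu_pd(out, v);         /* unaligned store back to memory */
    return out[0] + out[1];
}
```

For example, with `_Alignas(16) double buf[4] = {1, 2, 3, 4};`, the call `sum_pair_unaligned(buf + 1)` reads from an address that is only 8-byte aligned and returns 5.0. The same call using `_mm_load_pd` would be allowed to fault.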
5 years ago
Martin Kroeker
567d2760e6
Merge pull request #2520 from wjc404/develop
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
5 years ago
wjc404
b8307768e2
Add files via upload
5 years ago
Martin Kroeker
af8a619e1f
Merge pull request #2517 from wjc404/develop
Temporary fix for SKX STRSM
5 years ago
wjc404
62b9608986
Update KERNEL.SKYLAKEX
5 years ago
Martin Kroeker
a1b181cea2
Merge pull request #2516 from wjc404/develop
AVX2 STRSM kernels
5 years ago
wjc404
cdc0e9011e
Update KERNEL.ZEN
5 years ago
wjc404
fa049d49c2
AVX2 STRSM kernel
5 years ago
Martin Kroeker
ea8eec5d17
Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
6 years ago
wjc404
dd22eb7621
Update cgemm_kernel_8x2_haswell.c
6 years ago
wjc404
2352331e60
Update zgemm_kernel_4x2_haswell.c
6 years ago
wjc404
1b980001dd
Update zgemm_kernel_4x2_haswell.c
6 years ago
wjc404
2515e1152f
Update cgemm_kernel_8x2_haswell.c
6 years ago
wjc404
903854c168
Add files via upload
6 years ago
wjc404
a2ff577a30
Update KERNEL.ZEN
6 years ago
wjc404
97a32cb0a5
Update KERNEL.HASWELL
6 years ago
Martin Liska
aeea14ee40
Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S.
6 years ago
Martin Liska
18bcc36a69
Fix implementation of iamax_sse.S as reported in #2116.
There was a typo in iamax_sse.S where one of the comparisons
was cmpeqps instead of cmpeqss. That misdetected the index
for sequences where the minimum value was 0.
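The behavioral difference between the two instructions can be seen in a small sketch. It uses the SSE intrinsics `_mm_cmpeq_ss` (cmpeqss) and `_mm_cmpeq_ps` (cmpeqps) rather than the hand-written assembly in iamax_sse.S, and it assumes an x86-64 target. The helper names are hypothetical. The scalar form compares only lane 0. The packed form compares all four lanes, so in a step meant to test a single element it can flag extra lanes whenever several entries equal the target value, for example 0.

```c
#include <xmmintrin.h> /* SSE intrinsics; always available on x86-64 */

/* Hypothetical helper: scalar compare (cmpeqss). Only lane 0 is compared;
   lanes 1-3 of the result are copied unchanged from v, so their sign bits
   carry no equality information. */
static int scalar_eq_mask(__m128 v, float target)
{
    return _mm_movemask_ps(_mm_cmpeq_ss(v, _mm_set1_ps(target)));
}

/* Hypothetical helper: packed compare (cmpeqps). All four lanes are
   compared, and every lane equal to target sets its bit in the mask. */
static int packed_eq_mask(__m128 v, float target)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(v, _mm_set1_ps(target)));
}
```

For an all-zero vector and target 0.0f, `scalar_eq_mask` returns 1 (only lane 0 was tested) while `packed_eq_mask` returns 15 (all four lanes match). Code that expects a single-lane result would misread the packed mask.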
6 years ago
wjc404
f566787e6e
Update KERNEL.SKYLAKEX
6 years ago
wjc404
e3368cbf18
AVX512 STRMM kernel
6 years ago
Bart Oldeman
7ea5e07d1c
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
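The constraint issue described above can be shown with a minimal sketch. It uses GCC-style extended inline asm and is not the dscal_kernel_inc_8 code itself. The helper name `advance_ptr` is hypothetical, and the asm path assumes an x86-64 GCC or Clang build, with a plain C fallback elsewhere. Because `leaq` overwrites the register that holds `x`, `x` is declared with the input/output constraint `"+r"`. If it were listed only as an input (`"r"`), the compiler could assume the register still holds the original pointer afterwards.

```c
#include <assert.h>

/* Hypothetical helper (not the OpenBLAS kernel): advances x by n doubles
   using leaq inside inline asm. leaq writes its result back into the
   register holding x, so x must be an input/output operand ("+r"). */
static double *advance_ptr(double *x, long n)
{
    assert(n >= 0);
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ (
        "leaq (%0,%1,8), %0"   /* x = x + n * 8 bytes */
        : "+r" (x)             /* operand 0: read and written */
        : "r" (n)              /* operand 1: read only */
    );
#else
    x += n;                    /* portable fallback for other targets */
#endif
    return x;
}
```

Declaring the operand as `"+r"` is the same kind of fix the commit describes: it makes the modification visible to the compiler instead of relying on the register happening to be dead afterwards.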
6 years ago
wjc404
3447d04eaf
Update dgemm_kernel_16x2_skylakex.c
6 years ago
wjc404
8b5cdcc64c
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
4e00d96a78
Update dgemm_kernel_16x2_skylakex.c
6 years ago
wjc404
096da2f51a
Update dgemm_kernel_16x2_skylakex.c
6 years ago
wjc404
081b188529
Update KERNEL.SKYLAKEX
6 years ago
wjc404
8019e70211
AVX512 16x2 DGEMM kernel
6 years ago
wjc404
e5dcdeb550
Update sgemm_direct_skylakex.c
6 years ago
wjc404
952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c
6 years ago
wjc404
feaafbedd3
make skylakex sgemm code more friendly for readers
Some kernels were also adjusted to improve performance.
6 years ago
wjc404
3a100b2797
Update KERNEL.SKYLAKEX
6 years ago
wjc404
bd4c032f52
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
92b10212de
optimize AVX2 SGEMM
6 years ago
wjc404
b73bf01378
optimize AVX2 SGEMM
6 years ago
wjc404
eb3c9f1db9
optimize AVX2 SGEMM
6 years ago
wjc404
a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c
6 years ago
wjc404
700fe5b5ee
Add files via upload
6 years ago
wjc404
f60840c420
Update KERNEL.ZEN
6 years ago
wjc404
109e18cd96
Update KERNEL.HASWELL
6 years ago
wjc404
ae1579be13
Create zgemm3m_kernel_4x4_haswell.c
6 years ago
wjc404
cd765f094b
Update cgemm3m_kernel_8x4_haswell.c
6 years ago
wjc404
3a66c8cac1
Update KERNEL.ZEN
6 years ago
wjc404
ed9af2f7da
Update KERNEL.HASWELL
6 years ago