wjc404
2cd9306bb5
Update KERNEL.ZEN
6 years ago
wjc404
c418c81224
Update KERNEL.HASWELL
6 years ago
wjc404
025741f16a
Fast Haswell CGEMM kernel
6 years ago
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
6 years ago
wjc404
d573d24de7
Fast Haswell ZGEMM kernel
6 years ago
w00421467
b7cc69ee62
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
6 years ago
w00421467
aeef942c4f
use arm neon instructions to optimize gemm beta operation
6 years ago
Martin Kroeker
1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
6 years ago
Kavana Bhat
6baa9b07d7
AIX changes for Power8
6 years ago
Kavana Bhat
3938e59569
AIX changes for Power8
6 years ago
Isuru Fernando
b863b32ac5
Workaround an ICE in clang 9.0.0
This bug is not there in 8.x nor in the 9.0 daily snapshot.
6 years ago
Martin Kroeker
dd04143d4a
Merge pull request #2328 from martin-frbg/ppc9
Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version
6 years ago
Martin Kroeker
f3a6164bff
Merge pull request #2324 from antonblanchard/power9_segv
Fix SEGV in cdot_power9
6 years ago
Martin Kroeker
dedd822d1a
Fix caxpy/caxpyc naming in localentry
6 years ago
Martin Kroeker
2181fb7047
Fix caxpy/caxpyc naming in localentry
6 years ago
Martin Kroeker
a9b62c03f8
Substitute precompiled gcc7 codes only when gcc is older than 9.x
6 years ago
Martin Kroeker
97762234f9
Add variable for gcc >=9 test
used in KERNEL.POWER9
6 years ago
wjc404
934e601e93
Update dgemm_kernel_4x8_skylakex_2.c
6 years ago
Anton Blanchard
cf2a8e410c
Fix SEGV in cdot_power9
We were corrupting r2 because the local entry wasn't being
set up correctly.
6 years ago
wjc404
eb1e9c8c92
some optimizations
6 years ago
Andreas Arnez
d117dfd505
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
The ZARCH implementations of ?sum contain a cut-and-paste error: an inline
assembly argument is named "sum", but the assembly references "asum"
instead. The mismatch causes a build error. This is fixed.
6 years ago
Martin Kroeker
b09b5be0a4
Merge pull request #2315 from ewanglong/develop
Revised Windows compatibility fix for #2313
6 years ago
Wang, Long
bfb5fbdb4d
Revised Windows compatibility fix for #2313
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
Martin Kroeker
08fa83aba2
Merge pull request #2312 from martin-frbg/power8be
Further Power8 big-endian corrections
6 years ago
Wang, Long
1191db1a49
For Windows compatibility, use "unsigned long long" to ensure 64-bit length
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
Wang, Long
0caf1434c9
Fix the integer overflow issue for large matrix sizes
For large matrices, e.g. M=N=K with M>1290, int mnk=M*N*K overflows.
The overflowed value causes a wrong branch to the single-threaded path, and
performance degrades significantly.
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
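The overflow described in commit 0caf1434c9 can be reproduced with a minimal sketch like the one below. It is not the OpenBLAS code: the function names, the threshold `SMALL_WORK_LIMIT`, and the shape of the threading decision are all hypothetical and only illustrate why a 32-bit product wraps once M=N=K exceeds 1290, and why a 64-bit "unsigned long long" product avoids it.

```c
#include <stdint.h>

/* Hypothetical work threshold, for illustration only (not an OpenBLAS value). */
#define SMALL_WORK_LIMIT 1000000ULL

/* 32-bit product, mimicking "int mnk = M*N*K".
 * The multiplication is done in unsigned arithmetic so that the wraparound
 * is well defined; converting the result back to int32_t is
 * implementation-defined in C, and wraps on common two's-complement targets. */
static int32_t mnk_narrow(int m, int n, int k) {
    uint32_t p = (uint32_t)m * (uint32_t)n * (uint32_t)k;
    return (int32_t)p;
}

/* 64-bit product: every operand is widened before multiplying, so the
 * result is exact for any realistic matrix dimensions. */
static unsigned long long mnk_wide(int m, int n, int k) {
    return (unsigned long long)m * (unsigned long long)n * (unsigned long long)k;
}

/* Hypothetical threading decision based on the 64-bit product:
 * returns 1 when the problem is small enough for a single thread. */
static int use_single_thread(int m, int n, int k) {
    return mnk_wide(m, n, k) <= SMALL_WORK_LIMIT;
}
```

With M=N=K=1290 the product is 2146689000, which still fits in a signed 32-bit integer. With 1291 it is 2151685171, which exceeds INT_MAX (2147483647), so the narrow version yields a negative value and a comparison such as "mnk <= limit" would wrongly pick the single-threaded path.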
Martin Kroeker
cad0d150db
Define alternate kernels for big-endian POWER8
6 years ago
Martin Kroeker
eba0aeb7cd
Fix compilation for big-endian POWER8
6 years ago
Martin Kroeker
0c07c356c1
Define alternate kernels for big-endian PPC440
6 years ago
Martin Kroeker
3e67017ac8
Merge pull request #2309 from martin-frbg/ppc970-be
Fix PPC970 big-endian support
6 years ago
Martin Kroeker
b3ac6ee222
Define alternate kernels for big-endian PPC970
The AltiVec versions of SGEMM and CGEMM fail most tests in LAPACK-TESTING when compiled for big endian; STRSM/CTRSM even cause segfaults. The rot kernels either fail the corresponding utest or lead to failures in LAPACK-TESTING.
6 years ago
Martin Kroeker
71e96163db
Merge pull request #2305 from wjc404/develop
AVX512 CGEMM & ZGEMM kernels
6 years ago
wjc404
819e852ae7
AVX512 CGEMM & ZGEMM kernels
96-99% 1-thread performance of MKL2018
6 years ago
Martin Kroeker
4c6a457358
Merge pull request #2300 from wjc404/develop
Optimize SGEMM on SKYLAKEX CPUs
6 years ago
wjc404
836c414e22
optimizations of software prefetching
6 years ago
Martin Kroeker
3cd97f1a80
Merge pull request #2301 from martin-frbg/ppc8be
Disable IDAMIN/MAX and IZAMIN/MAX optimizations on big-endian POWER8
6 years ago
wjc404
430c11e135
Add files via upload
6 years ago
wjc404
fbacd2605d
optimizations via software prefetches
6 years ago
Martin Kroeker
68597002ea
The assembly microkernel is not safe to use on ELFv1
6 years ago
Martin Kroeker
d2a6285549
The assembly microkernel is not safe to use on ELFv1
6 years ago
Martin Kroeker
d999688d1a
The assembly microkernel is not safe to use on ELFv1
6 years ago
Martin Kroeker
928fe1b28e
The assembly microkernel is not safe to use on ELFv1
6 years ago
wjc404
1df9a2013d
new sgemm kernel for skylakex
6 years ago
Martin Kroeker
85ccdce8c4
Remove the IOS fallbacks to generic C kernels
6 years ago
wjc404
6ff013bae0
native support for icopy_4
90% MKL 1-thread performance.
6 years ago
wjc404
0d669e04bb
Update dgemm_kernel_8x8_skylakex.c
6 years ago
wjc404
17cdd9f9e1
some correction
6 years ago
wjc404
6bcb06fcb1
make further changes to icopy_8 easier
6 years ago
wjc404
b7315f8401
Add files via upload
6 years ago
wjc404
9b19e9e1b0
Update dgemm_kernel_8x8_skylakex.c
6 years ago