Martin Kroeker
08fa83aba2
Merge pull request #2312 from martin-frbg/power8be
Further Power8 big-endian corrections
6 years ago
Martin Kroeker
63d3ee8dfc
Merge pull request #2313 from ewanglong/develop
Fix the integer overflow issue for large matrix size
6 years ago
Wang, Long
1191db1a49
For Windows compatibility, use "unsigned long long" to ensure a 64-bit length
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
Wang, Long
0caf1434c9
Fix the integer overflow issue for large matrix size
For large matrices, e.g. M=N=K with M>1290, int mnk=M*N*K overflows.
This leads to wrongly branching to the single-threaded path, which
degrades performance significantly.
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
Martin Kroeker
73128f3883
Merge pull request #2310 from martin-frbg/ppc440
Fix PPC440 big-endian support and disable the QCDOC qalloc routine by default
6 years ago
Martin Kroeker
cad0d150db
Define alternate kernels for big-endian POWER8
6 years ago
Martin Kroeker
eba0aeb7cd
Fix compilation for big-endian POWER8
6 years ago
Martin Kroeker
0c07c356c1
Define alternate kernels for big-endian PPC440
6 years ago
Martin Kroeker
82b75f97e5
Disable the old QCDOC qalloc by default and copy utility functions from memory.c
1. qalloc() appears to have been a special routine written for the PPC440-based QCDOC supercomputer(s) from around 2005; its source does not seem to be readily available. So switch the #if 1 in the code to rely on standard malloc() by default.
2. Utility functions such as get_num_procs and get_num_threads, which were added to the "normally" used memory.c in the meantime, were still missing here.
6 years ago
Martin Kroeker
7887c45077
Merge pull request #17 from xianyi/develop
rebase
6 years ago
Martin Kroeker
3e67017ac8
Merge pull request #2309 from martin-frbg/ppc970-be
Fix PPC970 big-endian support
6 years ago
Martin Kroeker
b3ac6ee222
Define alternate kernels for big-endian PPC970
The altivec versions of SGEMM and CGEMM fail most tests in LAPACK-TESTING when compiled for big endian; STRSM/CTRSM even cause segfaults. The rot kernels either fail the corresponding utest or lead to failures in LAPACK-TESTING.
6 years ago
Martin Kroeker
6082e556cd
Use "generic" S/CGEMM unroll M on big-endian PPC970
as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian
6 years ago
Martin Kroeker
92315173d5
Merge pull request #2308 from martin-frbg/ctestfix
Fix potential issue in the c/z blas3 ctests
6 years ago
Martin Kroeker
351d12b94e
Fix potential spurious failure from uninitialized variable
6 years ago
Martin Kroeker
bf73aa141b
Fix potential spurious failure from uninitialized variable
6 years ago
Martin Kroeker
71e96163db
Merge pull request #2305 from wjc404/develop
AVX512 CGEMM & ZGEMM kernels
6 years ago
wjc404
819e852ae7
AVX512 CGEMM & ZGEMM kernels
96-99% of the single-thread performance of MKL 2018
6 years ago
Martin Kroeker
4e466d739c
Merge pull request #15 from xianyi/develop
rebase
6 years ago
Martin Kroeker
4c6a457358
Merge pull request #2300 from wjc404/develop
Optimize SGEMM on SKYLAKEX CPUs
6 years ago
wjc404
836c414e22
optimizations of software prefetching
6 years ago
Martin Kroeker
d403eb3c2f
Merge pull request #2302 from martin-frbg/ppc970
Disable three-operand DCBT on PPC970 regardless of operating system
6 years ago
Martin Kroeker
3cd97f1a80
Merge pull request #2301 from martin-frbg/ppc8be
Disable IDAMIN/MAX and IZAMIN/MAX optimizations on big-endian POWER8
6 years ago
Martin Kroeker
9955f0996f
Merge pull request #2294 from martin-frbg/ios-cleanup
Remove obsolete workarounds for IOS on ARMV8
6 years ago
wjc404
430c11e135
Add files via upload
6 years ago
wjc404
fbacd2605d
optimizations via software prefetches
6 years ago
Martin Kroeker
6fa89b06a1
Use the two-operand form of DCBT on all PPC970 regardless of OS
There seems to be no advantage to the three-operand form used in the earliest GotoBLAS kernels, and it causes compilation problems on platforms other than the previously special-cased ones.
6 years ago
Martin Kroeker
68597002ea
The assembly microkernel is not safe to use on ELFv1
6 years ago
Martin Kroeker
d2a6285549
The assembly microkernel is not safe to use on ELFv1
6 years ago
Martin Kroeker
d999688d1a
The assembly microkernel is not safe to use on ELFv1
6 years ago
Martin Kroeker
928fe1b28e
The assembly microkernel is not safe to use on ELFv1
6 years ago
Martin Kroeker
ccc28c6d60
Merge pull request #13 from xianyi/develop
resync with upstream
6 years ago
wjc404
ae43b75a6a
Add files via upload
6 years ago
wjc404
54fc06fd70
Add files via upload
6 years ago
wjc404
1df9a2013d
new sgemm kernel for skylakex
6 years ago
wjc404
274ff5cdb8
update sgemm_q on skylakex cpus
6 years ago
Martin Kroeker
eb2eddf241
Merge pull request #2296 from kdunee/develop
Fixed a minor cmake problem, occurring when DYNAMIC_ARCH=ON and CMAKE_C_FLAGS was empty
6 years ago
k.dunikowski
8691825944
Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty
6 years ago
Martin Kroeker
7dc8a76f60
Merge pull request #2293 from martin-frbg/pr2288
Add support for NetBSD by adding it to the existing xBSD conditionals
6 years ago
Martin Kroeker
df857551c0
Remove special parameter set for obsolete IOS/ARMV8 workaround
6 years ago
Martin Kroeker
85ccdce8c4
Remove the IOS fallbacks to generic C kernels
6 years ago
Martin Kroeker
aeabe0a83f
Fix regex to parse -R options with and without whitespace
Both forms are seen on NetBSD (#2288).
6 years ago
Martin Kroeker
1b90989662
Add NetBSD to the xBSD conditionals
6 years ago
Martin Kroeker
e3e8b5cdca
Add NetBSD
6 years ago
Martin Kroeker
69b16a894d
Merge pull request #2292 from martin-frbg/g95fixes
Improve support for g95 and non-GNU ld
6 years ago
Martin Kroeker
6782e5767d
Merge pull request #2291 from martin-frbg/gensymbol
Fix netlib 3.7/3.8 function enumeration for linktest
6 years ago
Martin Kroeker
48f5a89f92
Merge pull request #2282 from martin-frbg/issue2281
Optimize RPCC function on ARM64
6 years ago
Martin Kroeker
4ae1610f37
Merge pull request #2290 from martin-frbg/cpuidfixes
Fixup x86 cpuid changes from #2283
6 years ago
Martin Kroeker
911c3e2f4b
Improve support for g95 and non-GNU ld
Auto-add the "-fno-second-underscore" option to make LAPACKE compile (as it calls LAPACK functions that might otherwise have gotten a second underscore added). Also support -R for rpath when parsing compiler directives in f_check.
6 years ago
Martin Kroeker
fab49e49e5
Move most lapack 3.7/3.8 additions to the embedded_underscores list
to allow linktest to pass with a compiler that adds a second underscore to such names
6 years ago