Martin Kroeker
fd99b3e057
workaround for sign change warning
6 years ago
Martin Kroeker
aab5380aa8
typo fix
6 years ago
Martin Kroeker
6f2e18d5e5
Comment out SGEMM_R for POWER8 again, try if declaring P and Q as UL is sufficient to avoid int overflow
6 years ago
Martin Kroeker
66caf61a2c
Try predefining GEMM_R for POWER8
6 years ago
Martin Kroeker
188e9239a4
Increase BUFFER_SIZE and remove remnants of arm64 source
6 years ago
Martin Kroeker
0b8d69f7ae
Restore correct version
6 years ago
Martin Kroeker
07d59c0455
print the current values when buffer_size is too small
6 years ago
Martin Kroeker
f03b667dd2
Increase BUFFER_SIZE for POWER8/9
6 years ago
Martin Kroeker
053712eb1f
Increase BUFFER_SIZE
6 years ago
Martin Kroeker
db6db050de
Increase BUFFER_SIZE for POWER8/9
6 years ago
Martin Kroeker
b21ca5c96a
Increase BUFFER_SIZE for POWER8/9
6 years ago
Martin Kroeker
cab855d56e
Increase default BUFFER_SIZE for Haswell, Zen and SKX
6 years ago
Martin Kroeker
df989d7a52
Add compile-time guard for adequate buffersize
as suggested by akobotov in #2538
6 years ago
Martin Kroeker
5e3e657caa
Make BUFFER_SIZE configurable and increase its default value for TSV110 and EMAG8180
6 years ago
Martin Kroeker
7bd8624b79
Merge pull request #41 from xianyi/develop
rebase
6 years ago
Martin Kroeker
806f89166e
Make ARMV7 compile with xcode and add a CI job for it ( #2537 )
* Add an ARMV7 iOS build on Travis
* thread_local appears to be unavailable on ARMV7 iOS
* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH
* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
6 years ago
Martin Kroeker
f059e614eb
Merge pull request #2536 from martin-frbg/recurs
Add "recursive" option for LAPACK builds with ifort or pgfort as well
6 years ago
Martin Kroeker
e13b6773ee
ifort and pgfort need "recursive" for safe compilation of LAPACK as well
6 years ago
Martin Kroeker
a05243d0f2
ifort and pgfort need "recursive" for compiling LAPACK as well
as shown in Reference-LAPACK issue 401 (their PR 403)
6 years ago
Martin Kroeker
c6af9bbb32
Merge pull request #2534 from martin-frbg/issue2496
Fix zero initialization for beta=0 case
6 years ago
Martin Kroeker
144be81ca1
fix initialization to zero in the NEON SGEMM_BETA kernel as well
6 years ago
Martin Kroeker
07cdd5d05c
Fix zero initialization for beta=0 case
use immediate initialization instead of multiplication in case register content is a NaN
6 years ago
Martin Kroeker
567d2760e6
Merge pull request #2520 from wjc404/develop
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
6 years ago
Martin Kroeker
018bb3e433
Merge pull request #2533 from martin-frbg/gemmdirect2
Use runtime check for AVX512 capability in DYNAMIC_ARCH builds made on SKX
6 years ago
Martin Kroeker
79fd006c58
Expose the support_avx512 function provided in dynamic.c
6 years ago
Martin Kroeker
8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH
6 years ago
Martin Kroeker
a986d42ea6
Merge pull request #39 from xianyi/develop
rebase
6 years ago
Martin Kroeker
b6a948fbee
Merge pull request #2530 from martin-frbg/dynmsg
Add message highlighting minimum target choice at end of DYNAMIC_ARCH…
6 years ago
Martin Kroeker
0cc352417e
Merge pull request #2529 from shengyang-3390/dev1
add ctest for drotm and modified ctest for drot.
6 years ago
Martin Kroeker
fe47dc8673
Add message highlighting minimum target choice at end of DYNAMIC_ARCH builds
related to #2526
6 years ago
Martin Kroeker
9f67d03d3b
Merge pull request #2527 from martin-frbg/gemmdirect
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
6 years ago
shengyang
50f4fb2fbd
add ctest for drotm and modified ctest for drot.
make sure that test cases cover all code paths when the kernel uses loop unrolling.
6 years ago
Martin Kroeker
6a14b34c20
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
6 years ago
Martin Kroeker
8c7c1395da
Merge pull request #2521 from martin-frbg/cm-avx512
Use proper extension on the avx512 testcase filename
6 years ago
Martin Kroeker
5f6f6a2c7d
Merge pull request #2525 from andreas-schwab/develop
Fix ARCHCONFIG for Neoverse-N1
6 years ago
Andreas Schwab
71cf2acdef
Fix ARCHCONFIG for Neoverse-N1
../config_kernel.h:24:9: warning: missing whitespace after the macro name
24 | #define ARMV8-march armv8.2-a
| ^~~~~
6 years ago
Martin Kroeker
1d9773b800
Use proper extension on the avx512 testcase filename
The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)
6 years ago
Martin Kroeker
a46a8c4956
Merge pull request #2518 from shengyang-3390/dev
add ctest for srotm and modified ctest for srot.
6 years ago
Martin Kroeker
7ae737e04c
Merge pull request #2519 from martin-frbg/issue2472
Fix cmake compilation with ifort on Windows
6 years ago
wjc404
64daad4365
Update param.h
6 years ago
wjc404
b8307768e2
Add files via upload
6 years ago
Martin Kroeker
6d54c94760
Make ifort on Windows create lowercase symbols with appended underscore
tentative fix for #2472
6 years ago
Martin Kroeker
c0da205412
Merge pull request #38 from xianyi/develop
rebase
6 years ago
shengyang
a06d78556d
add ctest for srotm and modified ctest for srot.
make sure that test cases cover all code paths when the kernel uses loop unrolling.
6 years ago
Martin Kroeker
af8a619e1f
Merge pull request #2517 from wjc404/develop
Temporary fix for SKX STRSM
6 years ago
wjc404
62b9608986
Update KERNEL.SKYLAKEX
6 years ago
Martin Kroeker
717c604aeb
Merge pull request #2515 from zelong-1024/develop
[OpenBLAS]: benchmark for her/her2 LEVEL2 functions
6 years ago
Martin Kroeker
ce33da4cab
Merge pull request #2513 from aaawuanjun/develop
[OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile
6 years ago
Martin Kroeker
a1b181cea2
Merge pull request #2516 from wjc404/develop
AVX2 STRSM kernels
6 years ago
wjc404
cdc0e9011e
Update KERNEL.ZEN
6 years ago
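
Note on commit 07cdd5d05c ("Fix zero initialization for beta=0 case"): the message says immediate initialization is used instead of multiplication in case the register content is a NaN. The reason is that under IEEE 754 arithmetic, NaN multiplied by zero is still NaN, so scaling an uninitialized output by beta = 0 does not reliably clear it. The sketch below illustrates that in plain C. It is not the kernel code from the commit (which is architecture-specific), and the function names `scale_by_multiply` and `scale_with_zero_store` are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: always scales by multiplication, c[i] = beta * c[i].
 * With beta == 0 a NaN already present in c[i] survives, because
 * NaN * 0 is NaN under IEEE 754. */
static void scale_by_multiply(float *c, size_t n, float beta)
{
    for (size_t i = 0; i < n; i++) {
        c[i] *= beta;
    }
}

/* Hypothetical helper: treats beta == 0 as a special case and stores
 * zero directly, so the previous content of c[i] is never read. */
static void scale_with_zero_store(float *c, size_t n, float beta)
{
    if (beta == 0.0f) {
        for (size_t i = 0; i < n; i++) {
            c[i] = 0.0f;
        }
        return;
    }
    for (size_t i = 0; i < n; i++) {
        c[i] *= beta;
    }
}
```

Calling `scale_by_multiply` with `beta = 0.0f` on an array that contains `NAN` leaves the NaN in place, while `scale_with_zero_store` yields zeros in every element. This assumes the code is compiled without options such as `-ffast-math` that relax IEEE 754 semantics.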