Martin Kroeker
bc792904ea
use modified gemm benchmark to trigger race condition
6 years ago
Martin Kroeker
d8735bb66a
parallelize gemm benchmark to trigger races
6 years ago
Martin Kroeker
0e0681f535
Experimental barrier
6 years ago
Martin Kroeker
29a50dd048
increase nthreads to 96
6 years ago
Martin Kroeker
aa8269d472
Add g++ as dependency for dgemm_tester
6 years ago
Martin Kroeker
e1ec040b95
Try dgemm_tester instead of lapack-test
6 years ago
Martin Kroeker
9a4959997d
Add python dependency for lapack test
6 years ago
Martin Kroeker
8639c8a683
Try to get an all-core lapack test to identify barrier issues
6 years ago
Martin Kroeker
3a6d51c2fd
Merge pull request #44 from xianyi/develop
Add a Z13 build to the Travis configuration (#2542)
6 years ago
Martin Kroeker
1c7771df96
Merge pull request #43 from martin-frbg/revert-42-z12ci
Revert 42 z12ci to keep forked develop clean
6 years ago
Martin Kroeker
a56c9ec52a
Revert "Add IBM Z to Travis configuration (#42)"
This reverts commit 7972beb375.
6 years ago
Martin Kroeker
4ae6d1a01b
Add a Z13 build to the Travis configuration (#2542)
* Add IBM Z to Travis configuration
6 years ago
Martin Kroeker
7972beb375
Add IBM Z to Travis configuration (#42)
* Add IBM Z to Travis configuration
6 years ago
Martin Kroeker
7bd8624b79
Merge pull request #41 from xianyi/develop
rebase
6 years ago
Martin Kroeker
806f89166e
Make ARMV7 compile with xcode and add a CI job for it (#2537)
* Add an ARMV7 iOS build on Travis
* thread_local appears to be unavailable on ARMV7 iOS
* Add no-thumb option for ARMV7 iOS build to get it to accept DMB ISH
* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
6 years ago
Martin Kroeker
f059e614eb
Merge pull request #2536 from martin-frbg/recurs
Add "recursive" option for LAPACK builds with ifort or pgfort as well
6 years ago
Martin Kroeker
e13b6773ee
ifort and pgfort need "recursive" for safe compilation of LAPACK as well
6 years ago
Martin Kroeker
a05243d0f2
ifort and pgfort need "recursive" for compiling LAPACK as well
as shown in Reference-LAPACK issue 401 (their PR 403)
6 years ago
Martin Kroeker
c6af9bbb32
Merge pull request #2534 from martin-frbg/issue2496
Fix zero initialization for beta=0 case
6 years ago
Martin Kroeker
144be81ca1
fix initialization to zero in the NEON SGEMM_BETA kernel as well
6 years ago
Martin Kroeker
07cdd5d05c
Fix zero initialization for beta=0 case
Use immediate initialization instead of multiplication, in case the register content is a NaN
6 years ago
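The reason for the beta=0 fix can be illustrated in scalar C. Under IEEE 754 arithmetic, 0.0 * NaN (and 0.0 * Inf) is NaN, so a beta=0 path that multiplies the existing contents of C by beta cannot guarantee zeros, even though reference BLAS states that C need not be set on input when beta is zero. This is a minimal sketch, not the actual SGEMM_BETA kernels (which are architecture-specific); both function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Naive variant (hypothetical): scales c by beta even when beta == 0.
   If c already holds NaN (e.g. uninitialized memory), 0.0f * NaN is NaN,
   so the "zeroed" output still contains NaN. */
void scale_by_multiplication(float *c, size_t n, float beta) {
    for (size_t i = 0; i < n; i++)
        c[i] = beta * c[i];
}

/* Fixed variant (hypothetical): for beta == 0, store zero directly instead
   of multiplying, so prior contents of c (including NaN or Inf) are ignored. */
void scale_with_zero_fill(float *c, size_t n, float beta) {
    if (beta == 0.0f) {
        for (size_t i = 0; i < n; i++)
            c[i] = 0.0f;
        return;
    }
    for (size_t i = 0; i < n; i++)
        c[i] = beta * c[i];
}
```

Filling a buffer with {1.0f, NAN, 2.0f} and calling each function with beta = 0 leaves NaN in the second element after scale_by_multiplication, but all zeros after scale_with_zero_fill.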
Martin Kroeker
567d2760e6
Merge pull request #2520 from wjc404/develop
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
6 years ago
Martin Kroeker
018bb3e433
Merge pull request #2533 from martin-frbg/gemmdirect2
Use runtime check for AVX512 capability in DYNAMIC_ARCH builds made on SKX
6 years ago
Martin Kroeker
79fd006c58
Expose the support_avx512 function provided in dynamic.c
6 years ago
Martin Kroeker
8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH
6 years ago
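A runtime capability check of this kind can be sketched as follows. The sketch makes several assumptions. It uses the GCC/Clang builtin `__builtin_cpu_supports` rather than OpenBLAS's own `support_avx512` from dynamic.c. The enum and the selector function are hypothetical names that stand in for the choice between the sgemm_direct path and the regular SGEMM path.

```c
/* Returns 1 if the running CPU reports AVX-512F, else 0.
   Assumption: GCC or Clang on x86; other targets simply report 0. */
int cpu_has_avx512f(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* make sure the CPU model data is initialized */
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical identifiers for the two code paths. */
enum sgemm_path { SGEMM_GENERIC = 0, SGEMM_DIRECT = 1 };

/* Choose the path from the runtime result, not from how the library
   was compiled. */
enum sgemm_path select_sgemm_path(int has_avx512) {
    return has_avx512 ? SGEMM_DIRECT : SGEMM_GENERIC;
}
```

The reason to decide at run time is that a DYNAMIC_ARCH binary built on an SKX machine may later run on a CPU without AVX512. A compile-time choice would then select instructions that the CPU cannot execute.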
Martin Kroeker
a986d42ea6
Merge pull request #39 from xianyi/develop
rebase
6 years ago
Martin Kroeker
b6a948fbee
Merge pull request #2530 from martin-frbg/dynmsg
Add message highlighting minimum target choice at end of DYNAMIC_ARCH…
6 years ago
Martin Kroeker
0cc352417e
Merge pull request #2529 from shengyang-3390/dev1
add ctest for drotm and modified ctest for drot.
6 years ago
Martin Kroeker
fe47dc8673
Add message highlighting minimum target choice at end of DYNAMIC_ARCH builds
related to #2526
6 years ago
Martin Kroeker
9f67d03d3b
Merge pull request #2527 from martin-frbg/gemmdirect
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
6 years ago
shengyang
50f4fb2fbd
add ctest for drotm and modified ctest for drot.
Make sure that the test cases cover all code paths when the kernel uses loop unrolling.
6 years ago
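The coverage concern can be sketched in C. An unrolled kernel has a main loop and a tail loop, and the tail loop only runs when the length is not a multiple of the unroll factor. Tests therefore need lengths with every remainder. The code below assumes unit stride and an unroll factor of 4. `drot_ref`, `drot_unroll4` and `check_all_lengths` are hypothetical and are not the OpenBLAS kernel or ctest code.

```c
#include <math.h>
#include <stddef.h>

/* Reference plane rotation, unit stride:
   x[i] <- c*x[i] + s*y[i],  y[i] <- c*y[i] - s*x[i]. */
void drot_ref(size_t n, double *x, double *y, double c, double s) {
    for (size_t i = 0; i < n; i++) {
        double xi = x[i], yi = y[i];
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}

/* Same operation with the main loop unrolled by 4 and a scalar tail loop
   that handles the remaining n % 4 elements. */
void drot_unroll4(size_t n, double *x, double *y, double c, double s) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double x0 = x[i], x1 = x[i + 1], x2 = x[i + 2], x3 = x[i + 3];
        double y0 = y[i], y1 = y[i + 1], y2 = y[i + 2], y3 = y[i + 3];
        x[i]     = c * x0 + s * y0;  y[i]     = c * y0 - s * x0;
        x[i + 1] = c * x1 + s * y1;  y[i + 1] = c * y1 - s * x1;
        x[i + 2] = c * x2 + s * y2;  y[i + 2] = c * y2 - s * x2;
        x[i + 3] = c * x3 + s * y3;  y[i + 3] = c * y3 - s * x3;
    }
    for (; i < n; i++) {  /* tail: 0 to 3 leftover elements */
        double xi = x[i], yi = y[i];
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}

/* Runs both versions for every length 0..max_n (max_n <= 64), so that every
   remainder n % 4 is exercised. Returns 1 if all results agree. */
int check_all_lengths(size_t max_n) {
    double xr[64], yr[64], xu[64], yu[64];
    if (max_n > 64)
        return 0;
    for (size_t n = 0; n <= max_n; n++) {
        for (size_t i = 0; i < n; i++) {
            xr[i] = xu[i] = (double)i + 1.0;
            yr[i] = yu[i] = 2.0 * (double)i - 3.0;
        }
        drot_ref(n, xr, yr, 0.6, 0.8);
        drot_unroll4(n, xu, yu, 0.6, 0.8);
        for (size_t i = 0; i < n; i++)
            if (fabs(xr[i] - xu[i]) > 1e-12 || fabs(yr[i] - yu[i]) > 1e-12)
                return 0;
    }
    return 1;
}
```

`check_all_lengths(17)` runs lengths 0 through 17, which covers every remainder modulo 4 several times. A test that only used n = 16 would never enter the tail loop.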
Martin Kroeker
6a14b34c20
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
6 years ago
Martin Kroeker
8c7c1395da
Merge pull request #2521 from martin-frbg/cm-avx512
Use proper extension on the avx512 testcase filename
6 years ago
Martin Kroeker
5f6f6a2c7d
Merge pull request #2525 from andreas-schwab/develop
Fix ARCHCONFIG for Neoverse-N1
6 years ago
Andreas Schwab
71cf2acdef
Fix ARCHCONFIG for Neoverse-N1
../config_kernel.h:24:9: warning: missing whitespace after the macro name
24 | #define ARMV8-march armv8.2-a
| ^~~~~
6 years ago
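The warning occurs because a macro name is an identifier, and it ends at the first character that cannot belong to one, here the '-'. The generated line therefore defines a macro ARMV8 whose replacement text is `-march armv8.2-a`, which was presumably not the intent. The sketch below writes that effective definition with the whitespace the preprocessor expects and stringifies its expansion. The `STR` helpers and `armv8_expansion_is` are illustrative and are not part of OpenBLAS.

```c
#include <string.h>

/* Two-level stringification so that the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* What "#define ARMV8-march armv8.2-a" effectively defines: the macro name
   stops at the '-', and "-march armv8.2-a" becomes the replacement text. */
#define ARMV8 -march armv8.2-a

/* Returns 1 if the expansion of ARMV8, as a string, equals expected. */
int armv8_expansion_is(const char *expected) {
    return strcmp(STR(ARMV8), expected) == 0;
}
```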
Martin Kroeker
1d9773b800
Use proper extension on the avx512 testcase filename
The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)
6 years ago
Martin Kroeker
a46a8c4956
Merge pull request #2518 from shengyang-3390/dev
add ctest for srotm and modified ctest for srot.
6 years ago
Martin Kroeker
7ae737e04c
Merge pull request #2519 from martin-frbg/issue2472
Fix cmake compilation with ifort on Windows
6 years ago
wjc404
64daad4365
Update param.h
6 years ago
wjc404
b8307768e2
Add files via upload
6 years ago
Martin Kroeker
6d54c94760
Make ifort on Windows create lowercase symbols with appended underscore
tentative fix for #2472
6 years ago
Martin Kroeker
c0da205412
Merge pull request #38 from xianyi/develop
rebase
6 years ago
shengyang
a06d78556d
add ctest for srotm and modified ctest for srot.
Make sure that the test cases cover all code paths when the kernel uses loop unrolling.
6 years ago
Martin Kroeker
af8a619e1f
Merge pull request #2517 from wjc404/develop
Temporary fix for SKX STRSM
6 years ago
wjc404
62b9608986
Update KERNEL.SKYLAKEX
6 years ago
Martin Kroeker
717c604aeb
Merge pull request #2515 from zelong-1024/develop
[OpenBLAS]: benchmark for her/her2 LEVEL2 functions
6 years ago
Martin Kroeker
ce33da4cab
Merge pull request #2513 from aaawuanjun/develop
[OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile
6 years ago
Martin Kroeker
a1b181cea2
Merge pull request #2516 from wjc404/develop
AVX2 STRSM kernels
6 years ago
wjc404
cdc0e9011e
Update KERNEL.ZEN
6 years ago
wjc404
fa049d49c2
AVX2 STRSM kernel
6 years ago