Martin Kroeker
d8bdd4f236
revert previous, num_buffers is not a makefile variable
6 years ago
Martin Kroeker
ff23bd09f4
Update gemm.c
6 years ago
Martin Kroeker
1d12a33a9d
print num_buffers at end of build just to be sure
6 years ago
Martin Kroeker
c00b960009
Update .drone.yml
6 years ago
Martin Kroeker
417eb28517
Update .drone.yml
6 years ago
Martin Kroeker
54973cca1b
Update .drone.yml
6 years ago
Martin Kroeker
5d2cf4ec19
Update gemm.c
6 years ago
Martin Kroeker
4ffe9d788f
Update .drone.yml
6 years ago
Martin Kroeker
f10c9a99a3
Delete azure-pipelines.yml
6 years ago
Martin Kroeker
b7fa8fe694
Delete appveyor.yml
6 years ago
Martin Kroeker
71b8e284e6
Delete .travis.yml
6 years ago
Martin Kroeker
8290b6787f
Update .drone.yml
6 years ago
Martin Kroeker
67de70813c
remove thread count from pragma as drone.io HW varies
6 years ago
Martin Kroeker
35036d9b61
reduce NUM_PARALLEL to 1
6 years ago
Martin Kroeker
ce95853101
limit dgemm benchmark to just 10,10,0
6 years ago
Martin Kroeker
11528f3afe
Update gemm.c
6 years ago
Martin Kroeker
9ed53824d9
Update gemm.c
6 years ago
Martin Kroeker
3778b91657
Update gemm.c
6 years ago
Martin Kroeker
626e98028d
Update gemm.c
6 years ago
Martin Kroeker
aa170123e6
fix accidental deletion
6 years ago
Martin Kroeker
353e996d1d
Merge branch 'develop' into dronethunder2
6 years ago
Martin Kroeker
bc792904ea
use modified gemm benchmark to trigger race condition
6 years ago
Martin Kroeker
d8735bb66a
parallelize gemm benchmark to trigger races
6 years ago
Martin Kroeker
69f277f8ee
Add another memory barrier for ARM and a multicore test run on ThunderX to help detect such issues (#2544)
* Add another memory barrier in memory.c to prevent races in memory slot allocation
* Add an all-core test on Drone.io's ThunderX platform and modify dgemm_tester to use all 96 cores
6 years ago
Martin Kroeker
0e0681f535
Experimental barrier
6 years ago
Martin Kroeker
29a50dd048
increase nthreads to 96
6 years ago
Martin Kroeker
aa8269d472
Add g++ as dependency for dgemm_tester
6 years ago
Martin Kroeker
e1ec040b95
Try dgemm_tester instead of lapack-test
6 years ago
Martin Kroeker
9a4959997d
Add python dependency for lapack test
6 years ago
Martin Kroeker
8639c8a683
Try to get an all-core lapack test to identify barrier issues
6 years ago
Martin Kroeker
3a6d51c2fd
Merge pull request #44 from xianyi/develop
Add a Z13 build to the Travis configuration (#2542)
6 years ago
Martin Kroeker
1c7771df96
Merge pull request #43 from martin-frbg/revert-42-z12ci
Revert 42 z12ci to keep forked develop clean
6 years ago
Martin Kroeker
a56c9ec52a
Revert "Add IBM Z to Travis configuration (#42)"
This reverts commit 7972beb375.
6 years ago
Martin Kroeker
4ae6d1a01b
Add a Z13 build to the Travis configuration (#2542)
* Add IBM Z to Travis configuration
6 years ago
Martin Kroeker
7972beb375
Add IBM Z to Travis configuration (#42)
* Add IBM Z to Travis configuration
6 years ago
Martin Kroeker
7bd8624b79
Merge pull request #41 from xianyi/develop
rebase
6 years ago
Martin Kroeker
806f89166e
Make ARMV7 compile with xcode and add a CI job for it (#2537)
* Add an ARMV7 iOS build on Travis
* thread_local appears to be unavailable on ARMV7 iOS
* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH
* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
6 years ago
Martin Kroeker
f059e614eb
Merge pull request #2536 from martin-frbg/recurs
Add "recursive" option for LAPACK builds with ifort or pgfort as well
6 years ago
Martin Kroeker
e13b6773ee
ifort and pgfort need "recursive" for safe compilation of LAPACK as well
6 years ago
Martin Kroeker
a05243d0f2
ifort and pgfort need "recursive" for compiling LAPACK as well
as shown in Reference-LAPACK issue 401 (their PR 403)
6 years ago
Martin Kroeker
c6af9bbb32
Merge pull request #2534 from martin-frbg/issue2496
Fix zero initialization for beta=0 case
6 years ago
Martin Kroeker
144be81ca1
fix initialization to zero in the NEON SGEMM_BETA kernel as well
6 years ago
Martin Kroeker
07cdd5d05c
Fix zero initialization for beta=0 case
use immediate initialization instead of multiplication in case register content is a NaN
6 years ago
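The reasoning in the commit message above (07cdd5d05c) can be illustrated with a small sketch. It is not the kernel code from the commit, which operates on registers; it only shows, in plain Python with hypothetical function names, why multiplying by beta=0 does not reliably produce zeros under IEEE 754 arithmetic, while writing the zero directly does.

```python
import math

def scale_naive(c, beta):
    """Hypothetical helper: scale every entry of c by plain multiplication."""
    return [beta * x for x in c]

def scale_beta_aware(c, beta):
    """Hypothetical helper: write literal zeros when beta == 0,
    instead of relying on 0 * x being 0."""
    if beta == 0.0:
        return [0.0] * len(c)
    return [beta * x for x in c]

# A buffer whose previous content includes a NaN and an infinity.
c = [1.0, float("nan"), float("inf")]

naive = scale_naive(c, 0.0)       # 0 * NaN and 0 * inf both yield NaN
fixed = scale_beta_aware(c, 0.0)  # every entry is exactly 0.0

print("naive:", naive)
print("fixed:", fixed)
```

With the multiplication-only version, stale NaN or infinite values in the output buffer survive a beta=0 update; the beta-aware version overwrites them regardless of prior content.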
Martin Kroeker
567d2760e6
Merge pull request #2520 from wjc404/develop
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
6 years ago
Martin Kroeker
018bb3e433
Merge pull request #2533 from martin-frbg/gemmdirect2
Use runtime check for AVX512 capability in DYNAMIC_ARCH builds made on SKX
6 years ago
Martin Kroeker
79fd006c58
Expose the support_avx512 function provided in dynamic.c
6 years ago
Martin Kroeker
8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH
6 years ago
Martin Kroeker
a986d42ea6
Merge pull request #39 from xianyi/develop
rebase
6 years ago
Martin Kroeker
b6a948fbee
Merge pull request #2530 from martin-frbg/dynmsg
Add message highlighting minimum target choice at end of DYNAMIC_ARCH…
6 years ago
Martin Kroeker
0cc352417e
Merge pull request #2529 from shengyang-3390/dev1
add ctest for drotm and modified ctest for drot.
6 years ago