af8084906
Add sgemm_direct by
2020-08-17 18:54:28 +0200
aa286e301
(refs/pull/2784/head)
Add typedef for bfloat16 if needed by
2020-08-17 15:32:14 +0200
9f0ef9cdf
Merge pull request #77 from xianyi/develop by
2020-08-17 15:28:15 +0200
6bfc66663
(refs/pull/2781/head)
revert by
2020-08-17 15:20:41 +0200
a8c6fb9e1
revert by
2020-08-17 15:20:16 +0200
5ec8f716c
revert by
2020-08-17 15:19:40 +0200
54e02aaf1
Update gemm.c by
2020-08-16 20:45:20 +0200
a83cb3966
Refactor sgemm_direct by
2020-08-16 19:01:43 +0200
5a74bd45f
remove include as sgemm_direct is handled at the makefile level now by
2020-08-16 09:20:44 +0200
56d4d4f84
Move sgemm_direct_performant helper to separate file by
2020-08-16 09:19:34 +0200
2586b26e2
Add direct_sgemm by
2020-08-16 09:16:52 +0200
86e3455d0
Add sgemm_direct targets by
2020-08-16 09:15:56 +0200
774029af3
move sgemm_direct function declarations by
2020-08-16 09:13:39 +0200
82f8a0aeb
Update .drone.yml by
2020-08-15 15:46:18 +0200
d57d503c1
Update Makefile by
2020-08-15 14:46:26 +0200
37ac23e8a
Add simple MT sgemm precision test and INTERFACE64 build by
2020-08-15 13:38:05 +0200
6a93e3b2b
Add simple sgemm precision test by
2020-08-15 13:33:52 +0200
47ce1dd08
Update gemm64.cpp by
2020-08-15 13:31:28 +0200
f5fcc5bae
Add trivial gemm test for multithread consistency by
2020-08-15 13:30:29 +0200
597010a96
(refs/pull/2778/head)
Fix incorrect argument to SLASET by
2020-08-14 00:41:56 +0200
d64f1ef26
Fix incorrect argument to SLASET by
2020-08-14 00:40:24 +0200
c62aad62e
Fix incorrect calls to DLASET by
2020-08-14 00:35:45 +0200
e740c4873
Enable COOPERLAKE build target by
2020-08-13 06:17:34 +0800
efdd237a9
Add a dedicated POWER9 build to the Travis CI (#2774) by
2020-08-12 23:08:38 +0200
8f1111f4c
(refs/pull/2774/head)
Update .travis.yml by
2020-08-12 22:35:29 +0200
b05289dd2
Switch p9 to Ubuntu 18 container to ensure P9 hosting by
2020-08-12 19:57:38 +0200
7632a561d
use autodetection for power9 in case there are still power8 boxes in the mix by
2020-08-12 18:05:14 +0200
941339824
Update .travis.yml by
2020-08-12 16:54:06 +0200
ef2db95f5
add the script back... by
2020-08-12 13:57:39 +0200
5137146d5
use plain apt commands rather than addon on ppc64le by
2020-08-12 12:50:55 +0200
072f68dbc
Update .travis.yml by
2020-08-12 10:54:10 +0200
f7bd46483
Update .travis.yml by
2020-08-11 21:13:48 +0200
93e748d67
(refs/pull/2771/head)
Change BFLOAT16 data type/API support naming by
2020-08-11 09:27:29 +0800
4573cb2f4
Merge pull request #2765 from martin-frbg/issue2760 by
2020-08-11 22:40:17 +0200
2a4bb797d
Merge pull request #2773 from martin-frbg/issue2770 by
2020-08-11 21:02:55 +0200
72f8d8f44
Update .travis.yml by
2020-08-11 18:34:22 +0200
cbbe38bb8
Merge pull request #2772 from mhillenibm/s390x_gemm_tuning by
2020-08-11 18:14:09 +0200
4f9fb930e
Update .travis.yml by
2020-08-11 18:06:18 +0200
22f746786
Update .travis.yml by
2020-08-11 17:57:16 +0200
780bd896b
Update .travis.yml by
2020-08-11 17:49:59 +0200
7dd3ccf79
Bump gcc version for POWER9 build by
2020-08-11 17:37:36 +0200
8ccd6831d
Add dedicated POWER9 build by
2020-08-11 16:12:49 +0200
619343278
(refs/pull/2773/head)
Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0 by
2020-08-11 13:40:40 +0200
fee361ae6
fix another source of NO_CBLAS=0 surprise by
2020-08-11 13:27:19 +0200
62f4c84f2
Merge pull request #76 from xianyi/develop by
2020-08-11 13:25:12 +0200
e115c97e0
(refs/pull/2772/head)
s390x/SGEMM: adjust default P and Q to multiples of M by
2020-08-11 12:55:59 +0200
07c334e7b
s390x: Factor out small block sizes for SGEMM/DGEMM on z14 by
2020-08-11 12:55:53 +0200
e2828e30a
s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving by
2020-08-11 12:55:42 +0200
7219c9cb8
Merge pull request #2764 from martin-frbg/lapacktests by
2020-08-10 13:27:51 +0200
c9d32674e
(refs/pull/2765/head)
Add memory barrier to the blas_lock implementation for Linux by
2020-08-09 19:17:04 +0200
64259d521
(refs/pull/2764/head)
Fix use of unallocated array in workspace query and wrong type of argument to xSCAL by
2020-08-09 13:02:27 +0200
6f5ca44c1
Expand TAU array as SGEMQR/DGEMQR read elements 2 and 3 by
2020-08-09 12:59:20 +0200
d28b3f277
Create Jenkinsfile for OSUOSL PowerCI by
2020-08-08 18:05:20 +0200
ba3f7b3ac
Merge pull request #2761 from RajalakshmiSR/Makefile_err by
2020-08-08 12:20:04 +0200
475b5c95b
(refs/pull/2761/head)
Remove extra symbol in Makefile by
2020-08-07 15:27:44 -0500
cd60080d4
Merge pull request #2758 from martin-frbg/undef_shift by
2020-08-03 23:30:26 +0200
4847bfddd
Merge pull request #2757 from martin-frbg/cmake64 by
2020-08-02 23:05:21 +0200
81dcfdcf3
(refs/pull/2758/head)
Multiply by 2 instead of left-shifting a potentially negative number by
2020-08-02 18:29:56 +0200
0ef4b3f1f
Multiply instead of doing a left shift of a potentially negative number by
2020-08-02 18:27:40 +0200
aa53a8a5c
Multiply by two instead of left-shifting one place by
2020-08-02 18:25:09 +0200
aa3a1e7d8
Multiply by two rather than left shift by one place by
2020-08-02 18:22:31 +0200
aaf1a1716
(refs/pull/2757/head)
Apply current library name suffix by
2020-08-02 17:58:33 +0200
53add6a80
Apply library name suffix to openblas if any by
2020-08-02 17:57:12 +0200
9eb897cc0
Merge pull request #75 from xianyi/develop by
2020-08-02 17:50:06 +0200
7cead5625
Merge pull request #2753 from martin-frbg/issue2751 by
2020-08-02 15:32:46 +0200
6794ac341
(refs/pull/2753/head)
Add SYMBOLPREFIX and/or -SUFFIX to cblas.h if needed by
2020-08-02 11:20:08 +0200
ecf4b9e0f
Improve substitution rules for SYMBOLPREFIX and -SUFFIX addition by
2020-08-01 17:06:03 +0200
dfe5d0964
Merge pull request #2756 from martin-frbg/issue2755 by
2020-08-01 15:19:02 +0200
60cd5e55f
(refs/pull/2756/head)
Protect against inadvertent activation of USE_CUDA by
2020-08-01 12:31:39 +0200
da9e2a7ad
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes by
2020-07-31 16:03:33 +0200
c88cbc5e0
Merge pull request #2752 from kadler/cpuid_aix by
2020-07-31 12:52:24 +0200
589c74aed
(refs/pull/2752/head)
Use systemcfg APIs for CPU detection on AIX by
2020-07-30 20:52:16 -0500
104aa678b
Fix inadvertent version number reversal to 0.3.9.dev caused by #2710 by
2020-07-30 11:40:52 +0200
c6b48e039
Merge pull request #2749 from martin-frbg/make_ppc by
2020-07-30 11:35:53 +0200
492725129
Merge pull request #2750 from RajalakshmiSR/dgemv_p10 by
2020-07-30 10:13:19 +0200
f77b6a83f
(refs/pull/2750/head)
dgemv optimization for POWER10 by
2020-07-29 18:59:32 -0500
39724e812
(refs/pull/2749/head)
Separate OpenMP handling and allow compilation of Power9 code with older gcc by
2020-07-30 01:14:08 +0200
525db5401
Merge pull request #74 from xianyi/develop by
2020-07-30 01:04:09 +0200
cb097beba
Merge pull request #2741 from martin-frbg/issue2739 by
2020-07-29 10:01:14 +0200
7c02f4b1f
Merge pull request #2744 from martin-frbg/issue2738 by
2020-07-28 19:32:04 +0200
383262035
Merge pull request #2740 from RajalakshmiSR/clang-power by
2020-07-28 18:15:25 +0200
5fa581c87
Put hint to use git develop rather than master branch in README by
2020-07-28 14:22:41 +0000
12918358a
(refs/pull/2744/head)
Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2 by
2020-07-28 13:53:17 +0000
200f5c44c
Add AMD Renoir models and preliminary support for ZEN3 as ZEN2 by
2020-07-28 13:45:23 +0000
c4176105d
(refs/pull/2743/head)
Fix accidental deletion by
2020-07-28 10:08:41 +0000
ba27936ce
Add cpuid detection of AMD Zen2 Matisse and Renoir by
2020-07-28 09:03:52 +0000
afdca268a
Add AMD Matisse and Renoir Zen2 variants by
2020-07-28 09:00:12 +0000
64e2e4aaf
(refs/pull/2741/head)
missing braces by
2020-07-27 20:19:22 +0000
921ec4e9e
Adjust A53 SGEMM parameters to reflect move to 8x8 kernel by
2020-07-27 19:54:46 +0000
d557584b7
(refs/pull/2740/head)
Fix compilation issues with clang on POWER by
2020-07-27 14:11:07 -0500
a4ceb1ade
Merge pull request #2737 from ashwinyes/add_thunderx3_target by
2020-07-27 15:19:47 +0200
4e1be0e48
(refs/pull/2737/head)
ARM64: Add THUNDERX3T110 Target by
2020-06-11 04:12:49 -0700
49b83e00b
Merge pull request #2735 from martin-frbg/move_potrf by
2020-07-26 19:54:11 +0200
769ed9ffa
Merge pull request #2734 from RajalakshmiSR/p10_fix by
2020-07-25 09:02:32 +0200
f194ad59e
(refs/pull/2735/head)
Use _Atomic instead of volatile where available (file moved from ../getrf) by
2020-07-25 08:52:24 +0200
4fda217f9
Delete potrf_parallel.c (moving it to ../potrf) by
2020-07-25 06:42:39 +0000
9be2688c7
(refs/pull/2734/head)
Fix to store results in correct order for POWER10 GEMM kernels by
2020-07-24 23:08:11 -0500
6a2a60038
Merge pull request #2720 from martin-frbg/issue2694 by
2020-07-24 23:19:45 +0200
251a09ec9
(refs/pull/2720/head)
Typo fix by
2020-07-24 16:04:58 +0000
95d37e157
Regroup the 32 and 64bit sections and restore 64bit CAXPY by
2020-07-24 10:13:46 +0000