ea5bdc3f7
split cortex-a53 param to match 8x8 kernel by
2020-05-20 22:34:47 +0800
9df79ae9a
update sgemm and strmm kernel selection strategy by
2020-05-20 21:57:12 +0800
a1fc6041c
use general registers to speed up by
2020-05-20 21:55:32 +0800
edb423d77
align general register usage with strmm_kernel_8x8 by
2020-05-20 21:52:49 +0800
0e6eb8c24
sgemm kernel use sgemm_kernel_8x8_cortexa53 by
2020-05-18 16:51:33 +0800
d475db29c
optimized for cortex-a53 by
2020-05-18 16:47:33 +0800
729ac6bd4
Merge pull request #2623 from mhillenibm/zarch_dgemm_z14 by
2020-05-20 14:51:04 +0200
89fe17f20
(refs/pull/2623/head)
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 by
2020-05-19 14:56:34 +0200
bdd795ed0
s390x/GEMM: replace 0-init with peeled first iteration by
2020-05-19 14:30:44 +0200
e1038ea83
Merge pull request #2622 from martin-frbg/issue2619 by
2020-05-19 23:07:22 +0200
6baa9a778
(refs/pull/2622/head)
Improve declaration of LAPACKE_get_nancheck by
2020-05-19 17:59:31 +0200
cf46c9f84
Merge pull request #2617 from martin-frbg/issue2616 by
2020-05-18 13:23:58 +0200
55602fce5
(refs/pull/2617/head)
Ignore spurious all-numeric library names derived from mishandled jobserver flags by
2020-05-17 15:28:14 +0200
3d5e159e7
Ignore spurious all-numeric library names derived from mishandled jobserver flags by
2020-05-17 15:26:57 +0200
2931feb57
Merge pull request #58 from xianyi/develop by
2020-05-17 15:23:32 +0200
20245ded5
Merge pull request #2615 from mhillenibm/z14_alignment_hints by
2020-05-14 21:06:34 +0200
2840432e4
(refs/pull/2615/head)
s390x: improvise vector alignment hints for older compilers by
2020-05-13 17:48:50 +0200
ea78106c7
Merge pull request #2614 from mhillenibm/gemm_vec_z14 by
2020-05-13 15:09:23 +0200
cb9dc36dd
(refs/pull/2614/head)
Update CONTRIBUTORS.md by
2020-05-12 16:14:00 +0200
1b0b4349a
s390x/Z14: Change register blocking for SGEMM to 16x4 by
2020-05-12 15:06:38 +0200
71b6eaf45
s390x: Use new sgemm kernel also for strmm on Z14 and newer by
2020-05-12 14:40:30 +0200
43c0d4f31
s390x: Add vectorized sgemm kernel for Z14 and newer by
2020-05-12 14:13:54 +0200
d7c1677c2
(refs/pull/2613/head)
Update CONTRIBUTORS.md, adding myself by
2020-05-12 11:09:28 +0200
0dbe61a61
s390x: choose SIMD kernels at run-time based on OS and compiler support by
2020-05-11 13:00:10 +0200
62cf391cb
s390x: only build kernels supported by gcc with dynamic arch support by
2020-05-11 18:37:04 +0200
8c338616f
s390x: gate dynamic arch detection on gcc version and add generic by
2020-05-11 12:37:21 +0200
f94c53ec0
Merge pull request #2612 from RajalakshmiSR/testshgemm by
2020-05-12 08:34:02 +0200
8efba9b7c
(refs/pull/2612/head)
Improve shgemm test by
2020-05-11 17:15:10 -0500
4fffa556d
Merge pull request #2611 from RajalakshmiSR/bench_half by
2020-05-11 21:08:41 +0200
ce90e2bd3
(refs/pull/2611/head)
Include shgemm in benchtest by
2020-05-11 09:57:46 -0500
948b6712b
Merge pull request #2610 from martin-frbg/issue2552-3 by
2020-05-10 13:10:31 +0200
2271c3506
(refs/pull/2610/head)
Work around excessive LAPACK test failures on Skylake-X by
2020-05-09 23:49:18 +0200
db00b2144
Merge pull request #2609 from martin-frbg/issue2552-2 by
2020-05-09 21:33:02 +0200
58d26b444
(refs/pull/2609/head)
Correct ifort options by
2020-05-09 17:15:36 +0200
8e47d1405
Merge pull request #2608 from martin-frbg/issue2604 by
2020-05-09 16:36:14 +0200
cd10b35fe
(refs/pull/2608/head)
Handle trailing spaces and empty condition variables by
2020-05-09 13:42:33 +0200
9472dd99c
Merge pull request #57 from xianyi/develop by
2020-05-09 13:20:44 +0200
718166545
Merge pull request #2605 from RajalakshmiSR/cmake-power by
2020-05-09 11:29:28 +0200
bd9ff820b
(refs/pull/2605/head)
Fix cmake compilation issue - POWER9 by
2020-05-08 20:31:56 -0500
63e45def7
Merge pull request #2603 from martin-frbg/issue2552 by
2020-05-08 22:08:39 +0200
ec0f22863
(refs/pull/2603/head)
Add FFLAGS_DRV to the generated make.inc to fix lapack-test on x86_64 with icc/ifort by
2020-05-08 18:06:12 +0200
90e2941c6
Merge pull request #56 from xianyi/develop by
2020-05-07 22:43:48 +0200
10d5f3c87
Merge pull request #2602 from ashwinyes/thunderx2_develop by
2020-05-07 22:06:41 +0200
8353cb245
(refs/pull/2602/head)
ARM64: Improve DAXPY for ThunderX2 by
2020-05-07 09:14:05 -0700
ec2dd7b87
Merge pull request #2601 from martin-frbg/issue818 by
2020-05-07 10:12:33 +0200
4e82eb9f8
(refs/pull/2601/head)
Undefine ASMNAME/NAME/CNAME before defining them by
2020-05-07 00:31:32 +0200
61300bb73
Merge pull request #55 from xianyi/develop by
2020-05-07 00:27:14 +0200
33e9b1246
Merge pull request #2597 from martin-frbg/appleclang by
2020-05-05 13:55:08 +0200
90dba9f71
(refs/pull/2597/head)
Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version by
2020-05-05 10:44:50 +0200
4d0fd365a
(refs/pull/2594/head)
Update common_x86_64.h by
2020-05-02 20:29:25 +0200
4abb651af
fix format specifier for unsigned by
2020-05-02 16:10:49 +0200
b5d3e46e6
more debugging by
2020-05-02 15:21:13 +0200
ccdf81ecc
and back to unsigned to run another test... by
2020-05-02 14:22:32 +0200
20f2f6fc8
revert last change, blas_quickdivide returns a signed int again by
2020-05-01 21:12:11 +0200
6b96e6dfa
make blas_quickdivide actually return unsigned (to placate clang) by
2020-05-01 16:01:42 +0200
94487c02d
Delete extra semicolon after brace to make clang happy by
2020-05-01 15:56:17 +0200
c3c00380d
Delete spurious copy of common_param.h by
2020-05-01 15:34:56 +0200
2de3fff4f
Move some declarations for pre-C99 compatibility by
2020-05-01 15:25:32 +0200
424d551e0
Merge pull request #53 from xianyi/develop by
2020-05-01 15:18:46 +0200
596f5df9e
Merge pull request #2591 from RajalakshmiSR/testhalf by
2020-05-01 09:59:39 +0200
5dd14e3d4
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) by
2020-05-01 09:58:30 +0200
924cc7e58
(refs/pull/2590/head)
typo fix by
2020-04-29 22:11:42 +0200
4297e2ed8
fix shgemm parameter references in arm64 branch by
2020-04-29 22:09:23 +0200
a54e35e78
Merge pull request #2586 from martin-frbg/miscfixes by
2020-04-29 22:01:41 +0200
564b0d39e
(refs/pull/2591/head)
Add test for shgemm by
2020-04-29 13:40:34 -0500
254a934b5
ifdef another group of shgemm parameters by
2020-04-29 20:25:33 +0200
9acf45c67
Fix overlooked shgemm parameters by
2020-04-29 19:25:13 +0200
8d4042d89
Make shgemm parameters conditional on BUILD_HALF by
2020-04-29 18:46:16 +0200
33059ad1d
make bfloat16 functions conditional on BUILD_HALF by
2020-04-29 18:31:24 +0200
137781096
fix endif by
2020-04-29 18:30:41 +0200
b2f6f76a5
Pass BUILD_HALF as a compiler define for dynamic_arch builds by
2020-04-29 18:30:10 +0200
84e5b0c4f
typo by
2020-04-29 16:07:27 +0200
75e0495a7
Make shgemm kernels conditional on BUILD_HALF by
2020-04-29 15:58:59 +0200
fd267b58b
make shgemm kernels conditional on BUILD_HALF by
2020-04-29 14:48:37 +0200
f881c697f
pass the BUILD_HALF option to gensymbol by
2020-04-29 14:47:09 +0200
48e26bc31
make bfloat16 functions conditional on BUILD_HALF by
2020-04-29 14:46:13 +0200
34e64d57a
make shgemm functions conditional on BUILD_HALF by
2020-04-29 14:44:53 +0200
45881fab5
make shgemm functions conditional on BUILD_HALF by
2020-04-29 14:44:07 +0200
7bf186565
make building the bfloat16 functions conditional on BUILD_HALF by
2020-04-29 14:42:35 +0200
3c37071ee
make bfloat16 kernels conditional on BUILD_HALF by
2020-04-29 14:40:17 +0200
5d58b1110
Merge pull request #52 from xianyi/develop by
2020-04-29 14:36:15 +0200
d394d4e67
Merge pull request #2585 from martin-frbg/mips64fix by
2020-04-28 19:47:55 +0200
9d3a317ab
Refs #2587 Fix typos. by
2020-04-29 00:19:19 +0800
92372c70f
Fix gemm interface bug for small matrix. by
2020-04-28 23:15:20 +0800
43bef4aaa
Add alpha=1.0 beta=0.0 for small gemm. by
2020-04-28 22:35:36 +0800
aae6af94b
Add small matrix optimization kernel interface. by
2020-04-28 19:01:36 +0800
f4248af26
(refs/pull/2586/head)
Fix compiler warnings by
2020-04-28 10:43:12 +0200
2d89603e9
(refs/pull/2585/head)
Increase BUFFER_SIZE on mips64 to match SGEMM parameters by
2020-04-28 10:40:40 +0200
26bc15258
Merge pull request #51 from xianyi/develop by
2020-04-28 10:38:50 +0200
141998dce
Merge pull request #2584 from martin-frbg/issue2583 by
2020-04-28 10:35:12 +0200
3bd56846b
(refs/pull/2584/head)
Silence a debug message by
2020-04-27 16:27:09 +0200
e7bbdfdf8
Have CMAKE parse conditional lines in KERNEL files by
2020-04-27 15:20:03 +0200
b6795db73
Merge pull request #2582 from martin-frbg/mips32fix by
2020-04-27 09:18:34 +0200
5e0dbf8df
(refs/pull/2582/head)
Increase default BUFFER_SIZE to accommodate SGEMM parameters by
2020-04-26 22:21:05 +0200
955d73127
Merge pull request #50 from xianyi/develop by
2020-04-26 22:17:56 +0200
a8c1bea7a
Merge pull request #2581 from martin-frbg/raji by
2020-04-25 19:57:10 +0200
e43b49e06
(refs/pull/2581/head)
Drop the set -e from travis scripts by
2020-04-25 16:18:54 +0200
3e28db7f3
Update CONTRIBUTORS.md by
2020-04-25 13:51:44 +0200
4b69ee31a
Merge pull request #2580 from martin-frbg/issue2538-3 by
2020-04-25 00:28:18 +0200
03ff213c5
(refs/pull/2580/head)
Increase POWER8 ZGEMM_R and use same R values for POWER9 by
2020-04-24 21:46:54 +0200