0f27a0360
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:39:35 +0100
c2a8ebfe6
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:38:51 +0100
43aac5bac
Support NVIDIA HPC compiler
2021-01-12 16:36:12 +0100
bff2b7c94
Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options)
2021-01-12 16:34:18 +0100
2d45a262d
Support compilation with nvfortran
2021-01-12 16:32:29 +0100
ed652d813
(refs/pull/3062/head)
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h.
2021-01-11 21:13:53 -0500
6fe0f1fab
(refs/pull/3060/head)
Label get_cpu_ftr as volatile to keep gcc from rearranging the code
2021-01-11 19:05:29 +0100
f725ef29d
(refs/pull/3058/head)
Loop the OpenMP test 20 times
2021-01-10 23:14:14 +0100
b0beb0b1c
(refs/pull/3059/head)
Initial code for Cooperlake BF16 GEMM kernel
2021-01-11 02:15:21 +0800
5bcc7bcb0
add include path for cblas.h
2021-01-10 18:55:14 +0100
14381868b
Add another OpenMP test for EPYC and ARM server
2021-01-10 17:16:25 +0100
f3ad15df5
Add another OpenMP test variant
2021-01-10 17:14:25 +0100
0930b2bab
Add another OpenMP test
2021-01-10 17:11:45 +0100
f88a337f9
Create test_gemm_omp.cc
2021-01-10 17:11:00 +0100
018dec858
Merge pull request #7 from xianyi/develop
2021-01-10 17:09:46 +0100
5d6209e1f
Merge pull request #3055 from RajalakshmiSR/swapp10
2021-01-09 00:11:44 +0100
601b711c7
(refs/pull/3055/head)
Optimize swap function for POWER10
2021-01-08 08:01:36 -0600
78702753f
Merge pull request #3053 from pkubaj/patch-1
2021-01-02 16:14:07 +0100
7aa1ff8ff
(refs/pull/3053/head)
Fix build on FreeBSD/powerpc64le
2021-01-01 21:19:57 +0000
d6c97cf01
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2
2021-01-01 15:51:07 +0100
1b2508362
(refs/pull/3052/head)
arm64: Fix nrm2 for input vectors with Inf
2021-01-01 02:09:40 -0800
ca3f7bad1
(zhbmv_smp)
Enable zhbmv smp implementation.
2020-12-31 10:05:00 +0800
cd898af59
Merge pull request #3050 from aurel32/riscv64-openblas-supported
2020-12-29 21:59:40 +0100
0a535e58d
(refs/pull/3050/head)
getarch.c: define OPENBLAS_SUPPORTED for riscv64
2020-12-29 12:06:39 +0000
9ce9e295f
Merge pull request #3049 from martin-frbg/readme
2020-12-27 22:54:20 +0100
9a38592c7
(refs/pull/3049/head)
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers
2020-12-27 21:55:08 +0100
9b3965b08
Merge pull request #6 from xianyi/develop
2020-12-27 21:28:10 +0100
531cb4f67
Merge pull request #3035 from Joshua-Ashton/patch-1
2020-12-27 21:26:52 +0100
3559c5d7a
Merge pull request #3048 from martin-frbg/issue2998
2020-12-21 13:30:08 +0100
8631e2976
(refs/pull/3048/head)
Temporarily revert to the old nrm2 kernels
2020-12-21 07:45:13 +0100
2768bc176
Temporarily revert to the old nrm2 kernels
2020-12-21 07:42:51 +0100
6f4698ee1
Temporarily revert to the old nrm2 kernel
2020-12-21 07:41:18 +0100
85e5165e9
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc
2020-12-20 11:55:53 +0100
17c16f2a7
(refs/pull/3046/head)
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers
2020-12-19 23:21:22 +0100
91c3f86c2
NVIDIA compiler does not yet support POWER10
2020-12-19 23:19:05 +0100
75b1f3bec
Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers
2020-12-19 23:17:40 +0100
07c5e549b
Merge pull request #3045 from martin-frbg/nvidiasdk
2020-12-19 23:14:02 +0100
114eb159a
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA
2020-12-19 22:15:58 +0100
005cce550
(refs/pull/3045/head)
Amend SkylakeX options to support the NVIDIA compiler
2020-12-19 22:11:49 +0100
b859b6e79
Add nvfortran
2020-12-19 22:09:57 +0100
b212a2fb9
Add/modify "PGI" compiler options for NVIDIA SDK 20.11
2020-12-19 22:08:37 +0100
e40416567
Add version printout for PGI/NVIDIA compiler
2020-12-19 22:06:56 +0100
b37e5fa2f
Merge pull request #5 from xianyi/develop
2020-12-19 20:11:06 +0100
326469ef4
Merge pull request #3042 from martin-frbg/develop
2020-12-19 20:04:19 +0100
a3cac9cca
Update sgemm kernel 1x4 for C910.
2020-12-18 11:53:23 +0800
c73d8ee40
(refs/pull/3042/head)
Conditionally add -mfma to compiler options where needed
2020-12-17 11:34:05 +0100
abef2ea77
Move -fma option setting to kernel/Makefile.L1
2020-12-17 11:32:27 +0100
b26e32c3a
Merge pull request #3040 from martin-frbg/fixfcheck
2020-12-16 00:05:04 +0100
7822eff93
Merge pull request #3038 from martin-frbg/issue3037
2020-12-16 00:04:45 +0100
865676682
(refs/pull/3051/head)
Add Intel Rocket Lake
2020-12-14 22:40:23 +0100
0f7776af0
Add Intel Rocket Lake
2020-12-14 22:30:36 +0100
b03dc011b
(refs/pull/3040/head)
Fix undefined CC variable in clang check
2020-12-14 19:21:52 +0100
77460ac25
(small_matrices)
Fix gemm_batch bug for SMALL_MATRIX_OPT=1.
2020-12-12 18:59:07 +0800
88e6806e3
Init cblas_?gemm_batch implementation.
2020-12-12 17:05:14 +0800
00ce35336
(refs/pull/3038/head)
Fix spurious removal of a trailing character from the hostarch string on x86_64
2020-12-13 21:28:01 +0100
723776ddf
Merge pull request #4 from xianyi/develop
2020-12-13 21:22:41 +0100
5a77ec7f1
Merge pull request #3036 from RajalakshmiSR/p10copyalign
2020-12-13 21:21:34 +0100
2fb11f873
(refs/pull/3036/head)
POWER10: Improve copy performance
2020-12-13 10:41:45 -0600
ad6364744
(refs/pull/3035/head)
Define BLAS acronym in README
2020-12-13 09:06:14 +0000
87315e8a8
Update version to 0.3.13.dev
2020-12-12 23:28:49 +0100
9031ebd7d
Update version to 0.3.13.dev
2020-12-12 23:28:20 +0100
12b41d559
Merge pull request #3034 from xianyi/release-0.3.0
2020-12-12 23:27:40 +0100
d2b11c477
(tag: v0.3.13, refs/pull/3034/head)
Merge pull request #3033 from xianyi/develop
2020-12-12 18:19:29 +0100
7bc0e4a2e
(refs/pull/3033/head)
Update version to 0.3.13 for release
2020-12-12 18:15:33 +0100
d3ec787f7
Update version to 0.3.13 for release
2020-12-12 18:14:49 +0100
2c309c235
Merge pull request #3031 from martin-frbg/changelog13
2020-12-12 18:13:23 +0100
3dec81200
(refs/pull/3031/head)
Update Changelog.txt
2020-12-12 14:27:37 +0100
737724607
Merge pull request #3030 from martin-frbg/fix2994
2020-12-12 10:01:45 +0100
77edf82c7
Update Changelog.txt for 0.3.13
2020-12-12 01:25:20 +0100
6232237db
(refs/pull/3030/head)
Make fallback from P10 to P9 conditional on suitable compiler
2020-12-11 23:41:17 +0100
7d81acc76
Merge pull request #3 from xianyi/develop
2020-12-11 23:38:42 +0100
18d8a6748
Merge pull request #2994 from antonblanchard/power10-fixes
2020-12-11 23:37:30 +0100
043128cbe
Merge pull request #3029 from RajalakshmiSR/axpyp10
2020-12-10 22:49:28 +0100
3331ca492
Merge pull request #3021 from austinpagan/trsm_p10
2020-12-10 19:42:54 +0100
346e30a46
(refs/pull/3029/head)
POWER10: Improve axpy performance
2020-12-10 11:51:42 -0600
83de62c20
Merge pull request #3026 from martin-frbg/revert747
2020-12-10 16:29:41 +0100
658da9a76
Merge pull request #3027 from gxw-loongson/develop
2020-12-10 16:27:30 +0100
be24c66a7
(refs/pull/3027/head)
Keep LOONGSON3A and LOONGSON3B for loongson
2020-12-10 10:48:53 +0800
4b548857d
Add msa support for loongson
2020-11-26 14:59:41 +0800
d71fe4ed4
(refs/pull/3026/head)
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)
2020-12-08 21:07:57 +0100
a55471243
remove extra/intermediate size step for min_jj introduced in PR747
2020-12-08 21:01:36 +0100
5d26223f4
remove extra/intermediate size step of min_jj from PR747
2020-12-08 20:59:56 +0100
980ab349b
Merge pull request #2 from xianyi/develop
2020-12-08 20:53:35 +0100
d67babf34
Remove gcc unrecognized option '-msched-weight' when check msa
2020-12-08 19:16:39 +0800
7f11e33e8
Merge pull request #3025 from TiredNotTear/develop
2020-12-08 09:39:27 +0100
7834c10e2
(ck860v)
Add PingTouGe contribution credit.
2020-12-07 16:55:05 +0800
53e083780
Merge pull request #3022 from jinboson/develop
2020-12-07 08:09:11 +0100
ad38bd0e8
(refs/pull/3025/head)
Fix failed cgemv and zgemv test case after using msa optimization
2020-12-07 10:18:51 +0800
47b639cc9
Fix failed sswap and dswap case by using msa optimization
2020-12-07 10:04:00 +0800
8fef5876d
Merge pull request #3024 from martin-frbg/sparc
2020-12-06 22:34:36 +0100
6c7d557a1
(refs/pull/3024/head)
Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio
2020-12-06 19:20:50 +0100
b660008c7
Work around DOT and SWAP test failures
2020-12-06 19:15:37 +0100
f8346603c
Fix compilation with SolarisStudio
2020-12-06 19:14:16 +0100
93473174d
Fix utest build with SolarisStudio compilers
2020-12-06 19:12:56 +0100
b0b14f4e9
Change comments to C style for compatibility
2020-12-06 19:12:02 +0100
3a1b1b7c8
Fix complex ABI for 32bit SolarisStudio builds
2020-12-06 19:08:43 +0100
da6d5d675
Fix hostarch detection for sparc
2020-12-06 19:07:45 +0100
04fa17322
Fix build options for SolarisStudio compilers
2020-12-06 19:05:27 +0100
3853014ea
Merge pull request #1 from xianyi/develop
2020-12-06 18:52:51 +0100
65de6f595
(refs/pull/3022/head)
Fix test errors reported by cblas_cgemm & cblas_ctrmm
2020-12-05 15:06:12 +0800