Commit Graph

  • 0f27a0360 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels by Martin Kroeker 2021-01-12 16:39:35 +0100
  • c2a8ebfe6 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels by Martin Kroeker 2021-01-12 16:38:51 +0100
  • 43aac5bac Support NVIDIA HPC compiler by Martin Kroeker 2021-01-12 16:36:12 +0100
  • bff2b7c94 Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options) by Martin Kroeker 2021-01-12 16:34:18 +0100
  • 2d45a262d Support compilation with nvfortran by Martin Kroeker 2021-01-12 16:32:29 +0100
  • ed652d813 (refs/pull/3062/head) Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h. by Gordon Fossum 2021-01-11 21:13:53 -0500
  • 6fe0f1fab (refs/pull/3060/head) Label get_cpu_ftr as volatile to keep gcc from rearranging the code by Martin Kroeker 2021-01-11 19:05:29 +0100
  • f725ef29d (refs/pull/3058/head) Loop the OpenMP test 20 times by Martin Kroeker 2021-01-10 23:14:14 +0100
  • b0beb0b1c (refs/pull/3059/head) Initial code for Cooperlake BF16 GEMM kernel by Chen, Guobing 2021-01-11 02:15:21 +0800
  • 5bcc7bcb0 add include path for cblas.h by Martin Kroeker 2021-01-10 18:55:14 +0100
  • 14381868b Add another OpenMP test for EPYC and ARM server by Martin Kroeker 2021-01-10 17:16:25 +0100
  • f3ad15df5 Add another OpenMP test variant by Martin Kroeker 2021-01-10 17:14:25 +0100
  • 0930b2bab Add another OpenMP test by Martin Kroeker 2021-01-10 17:11:45 +0100
  • f88a337f9 Create test_gemm_omp.cc by Martin Kroeker 2021-01-10 17:11:00 +0100
  • 018dec858 Merge pull request #7 from xianyi/develop by Martin Kroeker 2021-01-10 17:09:46 +0100
  • 5d6209e1f Merge pull request #3055 from RajalakshmiSR/swapp10 by Martin Kroeker 2021-01-09 00:11:44 +0100
  • 601b711c7 (refs/pull/3055/head) Optimize swap function for POWER10 by Rajalakshmi Srinivasaraghavan 2021-01-08 08:01:36 -0600
  • 78702753f Merge pull request #3053 from pkubaj/patch-1 by Martin Kroeker 2021-01-02 16:14:07 +0100
  • 7aa1ff8ff (refs/pull/3053/head) Fix build on FreeBSD/powerpc64le by pkubaj 2021-01-01 21:19:57 +0000
  • d6c97cf01 Merge pull request #3052 from ashwinyes/arm64_fix_nrm2 by Martin Kroeker 2021-01-01 15:51:07 +0100
  • 1b2508362 (refs/pull/3052/head) arm64: Fix nrm2 for input vectors with Inf by Ashwin Sekhar T K 2021-01-01 02:09:40 -0800
  • ca3f7bad1 (zhbmv_smp) Enable zhbmv smp implementation. by Zhang Xianyi 2020-12-31 10:05:00 +0800
  • cd898af59 Merge pull request #3050 from aurel32/riscv64-openblas-supported by Martin Kroeker 2020-12-29 21:59:40 +0100
  • 0a535e58d (refs/pull/3050/head) getarch.c: define OPENBLAS_SUPPORTED for riscv64 by Aurelien Jarno 2020-12-29 12:06:39 +0000
  • 9ce9e295f Merge pull request #3049 from martin-frbg/readme by Martin Kroeker 2020-12-27 22:54:20 +0100
  • 9a38592c7 (refs/pull/3049/head) Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers by Martin Kroeker 2020-12-27 21:55:08 +0100
  • 9b3965b08 Merge pull request #6 from xianyi/develop by Martin Kroeker 2020-12-27 21:28:10 +0100
  • 531cb4f67 Merge pull request #3035 from Joshua-Ashton/patch-1 by Martin Kroeker 2020-12-27 21:26:52 +0100
  • 3559c5d7a Merge pull request #3048 from martin-frbg/issue2998 by Martin Kroeker 2020-12-21 13:30:08 +0100
  • 8631e2976 (refs/pull/3048/head) Temporarily revert to the old nrm2 kernels by Martin Kroeker 2020-12-21 07:45:13 +0100
  • 2768bc176 Temporarily revert to the old nrm2 kernels by Martin Kroeker 2020-12-21 07:42:51 +0100
  • 6f4698ee1 Temporarily revert to the old nrm2 kernel by Martin Kroeker 2020-12-21 07:41:18 +0100
  • 85e5165e9 Merge pull request #3046 from martin-frbg/nvidiasdk-ppc by Martin Kroeker 2020-12-20 11:55:53 +0100
  • 17c16f2a7 (refs/pull/3046/head) Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers by Martin Kroeker 2020-12-19 23:21:22 +0100
  • 91c3f86c2 NVIDIA compiler does not yet support POWER10 by Martin Kroeker 2020-12-19 23:19:05 +0100
  • 75b1f3bec Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers by Martin Kroeker 2020-12-19 23:17:40 +0100
  • 07c5e549b Merge pull request #3045 from martin-frbg/nvidiasdk by Martin Kroeker 2020-12-19 23:14:02 +0100
  • 114eb159a Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA by Martin Kroeker 2020-12-19 22:15:58 +0100
  • 005cce550 (refs/pull/3045/head) Amend SkylakeX options to support the NVIDIA compiler by Martin Kroeker 2020-12-19 22:11:49 +0100
  • b859b6e79 Add nvfortran by Martin Kroeker 2020-12-19 22:09:57 +0100
  • b212a2fb9 Add/modify "PGI" compiler options for NVIDIA SDK 20.11 by Martin Kroeker 2020-12-19 22:08:37 +0100
  • e40416567 Add version printout for PGI/NVIDIA compiler by Martin Kroeker 2020-12-19 22:06:56 +0100
  • b37e5fa2f Merge pull request #5 from xianyi/develop by Martin Kroeker 2020-12-19 20:11:06 +0100
  • 326469ef4 Merge pull request #3042 from martin-frbg/develop by Martin Kroeker 2020-12-19 20:04:19 +0100
  • a3cac9cca Update sgemm kernel 1x4 for C910. by Xianyi Zhang 2020-12-18 11:53:23 +0800
  • c73d8ee40 (refs/pull/3042/head) Conditionally add -mfma to compiler options where needed by Martin Kroeker 2020-12-17 11:34:05 +0100
  • abef2ea77 Move -fma option setting to kernel/Makefile.L1 by Martin Kroeker 2020-12-17 11:32:27 +0100
  • b26e32c3a Merge pull request #3040 from martin-frbg/fixfcheck by Martin Kroeker 2020-12-16 00:05:04 +0100
  • 7822eff93 Merge pull request #3038 from martin-frbg/issue3037 by Martin Kroeker 2020-12-16 00:04:45 +0100
  • 865676682 (refs/pull/3051/head) Add Intel Rocket Lake by Martin Kroeker 2020-12-14 22:40:23 +0100
  • 0f7776af0 Add Intel Rocket Lake by Martin Kroeker 2020-12-14 22:30:36 +0100
  • b03dc011b (refs/pull/3040/head) Fix undefined CC variable in clang check by Martin Kroeker 2020-12-14 19:21:52 +0100
  • 77460ac25 (small_matrices) Fix gemm_batch bug for SMALL_MATRIX_OPT=1. by Zhang Xianyi 2020-12-12 18:59:07 +0800
  • 88e6806e3 Init cblas_?gemm_batch implementation. by Zhang Xianyi 2020-12-12 17:05:14 +0800
  • 00ce35336 (refs/pull/3038/head) Fix spurious removal of a trailing character from the hostarch string on x86_64 by Martin Kroeker 2020-12-13 21:28:01 +0100
  • 723776ddf Merge pull request #4 from xianyi/develop by Martin Kroeker 2020-12-13 21:22:41 +0100
  • 5a77ec7f1 Merge pull request #3036 from RajalakshmiSR/p10copyalign by Martin Kroeker 2020-12-13 21:21:34 +0100
  • 2fb11f873 (refs/pull/3036/head) POWER10: Improve copy performance by Rajalakshmi Srinivasaraghavan 2020-12-13 10:41:45 -0600
  • ad6364744 (refs/pull/3035/head) Define BLAS acronym in README by Joshie 2020-12-13 09:06:14 +0000
  • 87315e8a8 Update version to 0.3.13.dev by Martin Kroeker 2020-12-12 23:28:49 +0100
  • 9031ebd7d Update version to 0.3.13.dev by Martin Kroeker 2020-12-12 23:28:20 +0100
  • 12b41d559 Merge pull request #3034 from xianyi/release-0.3.0 by Martin Kroeker 2020-12-12 23:27:40 +0100
  • d2b11c477 (tag: v0.3.13, refs/pull/3034/head) Merge pull request #3033 from xianyi/develop by Martin Kroeker 2020-12-12 18:19:29 +0100
  • 7bc0e4a2e (refs/pull/3033/head) Update version to 0.3.13 for release by Martin Kroeker 2020-12-12 18:15:33 +0100
  • d3ec787f7 Update version to 0.3.13 for release by Martin Kroeker 2020-12-12 18:14:49 +0100
  • 2c309c235 Merge pull request #3031 from martin-frbg/changelog13 by Martin Kroeker 2020-12-12 18:13:23 +0100
  • 3dec81200 (refs/pull/3031/head) Update Changelog.txt by Martin Kroeker 2020-12-12 14:27:37 +0100
  • 737724607 Merge pull request #3030 from martin-frbg/fix2994 by Martin Kroeker 2020-12-12 10:01:45 +0100
  • 77edf82c7 Update Changelog.txt for 0.3.13 by Martin Kroeker 2020-12-12 01:25:20 +0100
  • 6232237db (refs/pull/3030/head) Make fallback from P10 to P9 conditional on suitable compiler by Martin Kroeker 2020-12-11 23:41:17 +0100
  • 7d81acc76 Merge pull request #3 from xianyi/develop by Martin Kroeker 2020-12-11 23:38:42 +0100
  • 18d8a6748 Merge pull request #2994 from antonblanchard/power10-fixes by Martin Kroeker 2020-12-11 23:37:30 +0100
  • 043128cbe Merge pull request #3029 from RajalakshmiSR/axpyp10 by Martin Kroeker 2020-12-10 22:49:28 +0100
  • 3331ca492 Merge pull request #3021 from austinpagan/trsm_p10 by Martin Kroeker 2020-12-10 19:42:54 +0100
  • 346e30a46 (refs/pull/3029/head) POWER10: Improve axpy performance by Rajalakshmi Srinivasaraghavan 2020-12-10 11:51:42 -0600
  • 83de62c20 Merge pull request #3026 from martin-frbg/revert747 by Martin Kroeker 2020-12-10 16:29:41 +0100
  • 658da9a76 Merge pull request #3027 from gxw-loongson/develop by Martin Kroeker 2020-12-10 16:27:30 +0100
  • be24c66a7 (refs/pull/3027/head) Keep LOONGSON3A and LOONGSON3B for loongson by gxw 2020-12-10 10:48:53 +0800
  • 4b548857d Add msa support for loongson by gxw 2020-11-26 14:59:41 +0800
  • d71fe4ed4 (refs/pull/3026/head) Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747) by Martin Kroeker 2020-12-08 21:07:57 +0100
  • a55471243 remove extra/intermediate size step for min_jj introduced in PR747 by Martin Kroeker 2020-12-08 21:01:36 +0100
  • 5d26223f4 remove extra/intermediate size step of min_jj from PR747 by Martin Kroeker 2020-12-08 20:59:56 +0100
  • 980ab349b Merge pull request #2 from xianyi/develop by Martin Kroeker 2020-12-08 20:53:35 +0100
  • d67babf34 Remove gcc unrecognized option '-msched-weight' when check msa by gxw 2020-12-08 19:16:39 +0800
  • 7f11e33e8 Merge pull request #3025 from TiredNotTear/develop by Martin Kroeker 2020-12-08 09:39:27 +0100
  • 7834c10e2 (ck860v) Add PingTouGe contribution credit. by Xianyi Zhang 2020-12-07 16:55:05 +0800
  • 53e083780 Merge pull request #3022 from jinboson/develop by Martin Kroeker 2020-12-07 08:09:11 +0100
  • ad38bd0e8 (refs/pull/3025/head) Fix failed cgemv and zgemv test case after using msa optimization by Hao Chen 2020-12-07 10:18:51 +0800
  • 47b639cc9 Fix failed sswap and dswap case by using msa optimization by Hao Chen 2020-12-07 10:04:00 +0800
  • 8fef5876d Merge pull request #3024 from martin-frbg/sparc by Martin Kroeker 2020-12-06 22:34:36 +0100
  • 6c7d557a1 (refs/pull/3024/head) Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio by Martin Kroeker 2020-12-06 19:20:50 +0100
  • b660008c7 Work around DOT and SWAP test failures by Martin Kroeker 2020-12-06 19:15:37 +0100
  • f8346603c Fix compilation with SolarisStudio by Martin Kroeker 2020-12-06 19:14:16 +0100
  • 93473174d Fix utest build with SolarisStudio compilers by Martin Kroeker 2020-12-06 19:12:56 +0100
  • b0b14f4e9 Change comments to C style for compatibility by Martin Kroeker 2020-12-06 19:12:02 +0100
  • 3a1b1b7c8 Fix complex ABI for 32bit SolarisStudio builds by Martin Kroeker 2020-12-06 19:08:43 +0100
  • da6d5d675 Fix hostarch detection for sparc by Martin Kroeker 2020-12-06 19:07:45 +0100
  • 04fa17322 Fix build options for SolarisStudio compilers by Martin Kroeker 2020-12-06 19:05:27 +0100
  • 3853014ea Merge pull request #1 from xianyi/develop by Martin Kroeker 2020-12-06 18:52:51 +0100
  • 65de6f595 (refs/pull/3022/head) Fix test errors reported by cblas_cgemm & cblas_ctrmm by Jin Bo 2020-12-05 15:06:12 +0800