1593 Commits (7aa1ff8ff6d3f151292eeb86c629e4077b867ae0)

Author SHA1 Message Date
  Ashwin Sekhar T K 1b2508362b arm64: Fix nrm2 for input vectors with Inf 5 years ago
  Martin Kroeker 3559c5d7a2 Merge pull request #3048 from martin-frbg/issue2998 5 years ago
  Martin Kroeker 8631e2976a Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 2768bc1764 Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 6f4698ee1f Temporarily revert to the old nrm2 kernel 5 years ago
  Martin Kroeker 114eb159a4 Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA 5 years ago
  Martin Kroeker 005cce5507 Amend SkylakeX options to support the NVIDIA compiler 5 years ago
  Martin Kroeker c73d8ee40d Conditionally add -mfma to compiler options where needed 5 years ago
  Rajalakshmi Srinivasaraghavan 2fb11f873b POWER10: Improve copy performance 5 years ago
  Martin Kroeker 043128cbe5 Merge pull request #3029 from RajalakshmiSR/axpyp10 5 years ago
  Martin Kroeker 3331ca492d Merge pull request #3021 from austinpagan/trsm_p10 5 years ago
  Rajalakshmi Srinivasaraghavan 346e30a46a POWER10: Improve axpy performance 5 years ago
  gxw 4b548857d6 Add msa support for loongson 5 years ago
  Martin Kroeker 7f11e33e8d Merge pull request #3025 from TiredNotTear/develop 5 years ago
  Martin Kroeker 53e0837809 Merge pull request #3022 from jinboson/develop 5 years ago
  Hao Chen ad38bd0e89 Fix failed cgemv and zgemv test case after using msa optimization 5 years ago
  Hao Chen 47b639cc9b Fix failed sswap and dswap case by using msa optimization 5 years ago
  Martin Kroeker b660008c7e Work around DOT and SWAP test failures 5 years ago
  Martin Kroeker f8346603cf Fix compilation with SolarisStudio 5 years ago
  Jin Bo 65de6f5957 Fix test errors reported by cblas_cgemm & cblas_ctrmm 5 years ago
  Gordon Fossum 213c0e7abb Added special unrolled vectorized versions of "Solve" for specific sizes, 5 years ago
  Martin Kroeker 441c08c9ff Merge pull request #3016 from xiegengxin/complex-asum 5 years ago
  Gengxin Xie 0cb7a403b2 fix error declare function blas_level1_thread_with_return_value 5 years ago
  Gengxin Xie b766c1e9bb Improve the performance of zasum and casum with AVX512 intrinsic 5 years ago
  Rajalakshmi Srinivasaraghavan 7d46e31de1 POWER10: Optimize dgemv_n 5 years ago
  Martin Kroeker f1bf040b25 Merge pull request #2988 from xiegengxin/smp-asum 5 years ago
  Xianyi Zhang 7037849498 Merge branch 'develop' into risc-v 5 years ago
  Martin Kroeker 7e9cb39a25 Merge pull request #2981 from Qiyu8/fix-sum 5 years ago
  Gengxin Xie d6e7e05bb3 Improve the performance of dasum and sasum when SMP is defined 5 years ago
  Qiyu8 ae0b1dea19 modify system.cmake to enable fma flag 5 years ago
  Qiyu8 e0dac6b53b fix the CI failure of target specific option mismatch 5 years ago
  Qiyu8 e5c2ceb675 fix the CI failure of lack the head 5 years ago
  Qiyu8 a87e537b8c modify macro 5 years ago
  Qiyu8 5bc0a7583f only FMA3 and vector larger than 128 have positive effects. 5 years ago
  Qiyu8 8c0b206d4c Optimize the performance of rot by using universal intrinsics 5 years ago
  Qiyu8 c4c591ac5a fix sum optimize issues 5 years ago
  Xianyi Zhang fc35b72ae1 Refs #2899 5 years ago
  Xianyi Zhang 913cc9a4ca Merge branch 'develop' into risc-v 5 years ago
  Martin Kroeker ff16329cb7 Merge pull request #2972 from xiegengxin/rot-intrinsic 5 years ago
  Martin Kroeker 110c7a6de0 Merge pull request #2979 from RajalakshmiSR/dot_power10 5 years ago
  Rajalakshmi Srinivasaraghavan 6e364981a8 Optimize sdot/ddot for POWER10 5 years ago
  Martin Kroeker b976a0bf40 Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds 5 years ago
  Martin Kroeker ff74319ea5 Merge pull request #2977 from martin-frbg/issue2976 5 years ago
  Martin Kroeker 28d2dfe2b3 Fix macro name used in ifdef 5 years ago
  Gengxin Xie 725ffbf041 fix typo 5 years ago
  Gengxin Xie d9ba49165a Improve the performance of rot by using AVX512 and AVX2 intrinsic 5 years ago
  Rajalakshmi Srinivasaraghavan dd7a9cc5bf POWER10: Change dgemm unroll factors 5 years ago
  Rajalakshmi Srinivasaraghavan b435491885 Optimize caxpy for POWER10 5 years ago
  Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv 5 years ago
  Martin Kroeker 67f39ad813 Merge pull request #2939 from thrasibule/Makefile_cleanup 5 years ago