1617 Commits (0a4546b742104580cee77fe8f01d9cbb20d4161b)

Author SHA1 Message Date
  Martin Kroeker 0a4546b742 Typo fix 5 years ago
  Martin Kroeker b1eed27a54 Replace naive omatcopy_rt with 4x4 blocked implementation 5 years ago
  Martin Kroeker 47691c031f Use Haswell optimizations for Zen as well 5 years ago
  Martin Kroeker ce7ddd8921 Use Haswell optimizations for Zen as well 5 years ago
  Martin Kroeker 950c047b49 Use Haswell optimizations for Zen as well 5 years ago
  Martin Kroeker 46509953a9 Use Haswell optimizations for Zen as well 5 years ago
  Martin Kroeker db348dcff2 Enable optimized srot/drot kernels from Haswell 5 years ago
  Rajalakshmi Srinivasaraghavan 2056ffc227 Optimize cscal function for POWER10 5 years ago
  Rajalakshmi Srinivasaraghavan 3ede843d50 Optimize s/dscal function for POWER10 5 years ago
  Martin Kroeker 69a5558203 Merge pull request #3059 from Guobing-Chen/BF16_gemm 5 years ago
  Martin Kroeker d6905403e3 Merge pull request #3068 from alexhenrie/scan-build 5 years ago
  Rajalakshmi Srinivasaraghavan 439b93f6d2 Optimize s/drot function for POWER10 5 years ago
  Rajalakshmi Srinivasaraghavan eff7c9166e Optimize cdot function for POWER10 5 years ago
  Alex Henrie 202fc9e8ed Fix uninitialized argument value in dasum_k 5 years ago
  Martin Kroeker e378b24487 Merge pull request #3067 from albertziegenhagel/fix-generic-cmake 5 years ago
  Albert Ziegenhagel e3f4063683 Fix building "generic" TRMM kernel with CMake 5 years ago
  Martin Kroeker b716c0ef01 Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 2efa3b70dc Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 49959d4f1c Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 0f27a03607 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
  Martin Kroeker c2a8ebfe69 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
  Martin Kroeker 43aac5bacc Support NVIDIA HPC compiler 5 years ago
  Chen, Guobing b0beb0b1ca Initial code for Cooperlake BF16 GEMM kernel 5 years ago
  Rajalakshmi Srinivasaraghavan 601b711c78 Optimize swap function for POWER10 5 years ago
  Ashwin Sekhar T K 1b2508362b arm64: Fix nrm2 for input vectors with Inf 5 years ago
  Martin Kroeker 3559c5d7a2 Merge pull request #3048 from martin-frbg/issue2998 5 years ago
  Martin Kroeker 8631e2976a Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 2768bc1764 Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 6f4698ee1f Temporarily revert to the old nrm2 kernel 5 years ago
  Martin Kroeker 114eb159a4 Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA 5 years ago
  Martin Kroeker 005cce5507 Amend SkylakeX options to support the NVIDIA compiler 5 years ago
  Martin Kroeker c73d8ee40d Conditionally add -mfma to compiler options where needed 5 years ago
  Rajalakshmi Srinivasaraghavan 2fb11f873b POWER10: Improve copy performance 5 years ago
  Martin Kroeker 043128cbe5 Merge pull request #3029 from RajalakshmiSR/axpyp10 5 years ago
  Martin Kroeker 3331ca492d Merge pull request #3021 from austinpagan/trsm_p10 5 years ago
  Rajalakshmi Srinivasaraghavan 346e30a46a POWER10: Improve axpy performance 5 years ago
  gxw 4b548857d6 Add msa support for loongson 5 years ago
  Martin Kroeker 7f11e33e8d Merge pull request #3025 from TiredNotTear/develop 5 years ago
  Martin Kroeker 53e0837809 Merge pull request #3022 from jinboson/develop 5 years ago
  Hao Chen ad38bd0e89 Fix failed cgemv and zgemv test case after using msa optimization 5 years ago
  Hao Chen 47b639cc9b Fix failed sswap and dswap case by using msa optimization 5 years ago
  Martin Kroeker b660008c7e Work around DOT and SWAP test failures 5 years ago
  Martin Kroeker f8346603cf Fix compilation with SolarisStudio 5 years ago
  Jin Bo 65de6f5957 Fix test errors reported by cblas_cgemm & cblas_ctrmm 5 years ago
  Gordon Fossum 213c0e7abb Added special unrolled vectorized versions of "Solve" for specific sizes, 5 years ago
  Martin Kroeker 441c08c9ff Merge pull request #3016 from xiegengxin/complex-asum 5 years ago
  Gengxin Xie 0cb7a403b2 fix error declare function blas_level1_thread_with_return_value 5 years ago
  Gengxin Xie b766c1e9bb Improve the performance of zasum and casum with AVX512 intrinsic 5 years ago
  Rajalakshmi Srinivasaraghavan 7d46e31de1 POWER10: Optimize dgemv_n 5 years ago
  Martin Kroeker f1bf040b25 Merge pull request #2988 from xiegengxin/smp-asum 5 years ago