544 Commits (018dec858852fb6859c4c1909c8b79374eae2a4f)

Author SHA1 Message Date
  Martin Kroeker 17c16f2a71
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers 5 years ago
  Martin Kroeker 6232237dba
Make fallback from P10 to P9 conditional on suitable compiler 5 years ago
  Martin Kroeker 18d8a67485
Merge pull request #2994 from antonblanchard/power10-fixes 5 years ago
  Martin Kroeker 83de62c20d
Merge pull request #3026 from martin-frbg/revert747 5 years ago
  gxw 4b548857d6 Add msa support for loongson 5 years ago
  Martin Kroeker a554712439
remove extra/intermediate size step for min_jj introduced in PR747 5 years ago
  Martin Kroeker 5d26223f4a
remove extra/intermediate size step of min_jj from PR747 5 years ago
  Martin Kroeker bc5b1ddf0d
Merge pull request #3004 from martin-frbg/bsd_getauxval 5 years ago
  Martin Kroeker e7bf8ced6c
Build fix for systems that do not support getauxval 5 years ago
  Martin Kroeker 5fa305172a
Use ifeq instead of ifdef for user-definable options 5 years ago
  Martin Kroeker d3ff1f889f
Convert ifndefs to ifneq 5 years ago
  Alexander Grund 60005eb47b
Don't overwrite blas_thread_buffer if already set 5 years ago
  Anton Blanchard 043f3d6faa POWER10: Use POWER9 as a fallback 5 years ago
  Martin Kroeker ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic 5 years ago
  Gengxin Xie d9ba49165a Improve the performance of rot by using AVX512 and AVX2 intrinsic 5 years ago
  Martin Kroeker aa21cb5217
Merge pull request #2960 from thrasibule/avx2_detection 5 years ago
  Guillaume Horel 1f564d729b fix avx2 detection 5 years ago
  Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv 5 years ago
  Martin Kroeker 2207a16235
Merge pull request #2952 from martin-frbg/issue2931 5 years ago
  Martin Kroeker b937d78a6d
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails 5 years ago
  Martin Kroeker fd7da56965
Move definitions that are neither needed nor supported on SUNOS 5 years ago
  Martin Kroeker ff65952e46
Move HAVE_P10_SUPPORT to the build system 5 years ago
  Rajalakshmi Srinivasaraghavan b5d30b390d Fix build issues with bfloat16 5 years ago
  Martin Kroeker 006c7f6671
Change "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 85154c2e18
Change "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 887e00fd7f
Adapt for supporting only a subset of variable types 5 years ago
  Martin Kroeker 886a8e3190
Adapt for supporting only a subset of variable types 5 years ago
  Martin Kroeker ac653c94f3
Merge branch 'develop' into issue2588-cmake 5 years ago
  Martin Kroeker f032d8966e
Merge pull request #2874 from Flamefire/memory_fixes 5 years ago
  Martin Kroeker f6e4cf2f9d
Merge pull request #2876 from Flamefire/omp_fork_fix 5 years ago
  User User-User d2333e7842 aarch64 fix std=c18 compilation 5 years ago
  Alexander Grund 3094fc6c83
Lazyly reinit threads after a fork in OMP mode 5 years ago
  Alexander Grund 3c05f54df8
Avoid out of bounds access on invalid memory free 5 years ago
  Alexander Grund dee7c49938
Fix TABs and trailing space 5 years ago
  Martin Kroeker 896bbd55e1
Add support for building only selected variable types 5 years ago
  Martin Kroeker 357bff06b5
Add BUILD_vartype defines 5 years ago
  Martin Kroeker 988a6f429e
Add BUILD_vartype defines 5 years ago
  Martin Kroeker e5e2fbd593
Support building only selected types 5 years ago
  Martin Kroeker 3287848c8f
Support building only seleced types 5 years ago
  y00512012 06cf73a239 fix a bug of trmm 5 years ago
  Martin Kroeker ddec244a5a
Merge pull request #2838 from austinpagan/gordon_trmm 5 years ago
  fossum dfeca46098 Adding performance patch for trmm, just like #2836 5 years ago
  fossum 274d6e015b Fixing a performance bug in trsm_[LR].c. 5 years ago
  Martin Kroeker 91c84e1c01
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis 5 years ago
  Marius Hillenbrand a55fe06f25 s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions 5 years ago
  Marius Hillenbrand 4f34bcfb5e s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code 5 years ago
  Martin Kroeker 330044d821
Fix potentiol domain error in sqrt 5 years ago
  Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double 5 years ago
  Chen, Guobing 0c1c903f1e Fix OMP num specify issue 5 years ago
  Chen, Guobing e740c4873d Enable COOPERLAKE build target 5 years ago