289 Commits (107c883c8a434ad5f24606876c5f5885ad8db113)

Author SHA1 Message Date
  Martin Kroeker a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 10 months ago
  Martin Kroeker cf06250d36 add handling of dummy2 flag 1 year ago
  Martin Kroeker 4ec62d7f73 remove non-vectorized code path for power8, restoring PR4880 1 year ago
  Ubuntu 0cc2485594 Explicit unaligned vector load/stores in PPC64LE GEMV kernels 1 year ago
  Martin Kroeker 77fba0f400 Fix "dummy2" flag handling 1 year ago
  Martin Kroeker 81eed868b6 Restore the non-vectorized code from before PR4880 for POWER8 1 year ago
  Martin Kroeker 98b5ef929c Restore the non-vectorized code from before PR4880 for POWER8 1 year ago
  Martin Kroeker d7036cfd74 Remove trailing blanks that break the cmake parser 1 year ago
  tingbo.liao 3c8df6358f Further rearranged the rotm kernel for the different architectures. 1 year ago
  Sergey Fedorov 229efa42ff scal.S: use r11 on 32-bit Darwin on powerpc 1 year ago
  Sergey Fedorov 81e1be8d90 Revert "temporarily disable the default S/DSCAL kernel" 1 year ago
  Martin Kroeker 9b9c0aa5c9 temporarily disable the default S/DSCAL kernel 1 year ago
  Ayappan Perumal 020cce1068 Fix build issues with gcc compiler as well 1 year ago
  Ayappan Perumal b6ec73e77c Fix AIX build 1 year ago
  Chip Kerchner ab71a1edf2 Better VSX. 1 year ago
  Chip Kerchner 36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 1 year ago
  Martin Kroeker e52d9b4cf1 Merge pull request #4928 from austinpagan/czgemm_in_c 1 year ago
  Gordon Fossum 0b7fb5c791 CGEMM & ZGEMM using C code. 1 year ago
  Martin Kroeker c9e92348a6 Handle inf/nan if dummy2 flag is set 1 year ago
  Martin Kroeker d714013ab9 change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 1 year ago
  Chip Kerchner 1a7b8c650d Merge branch 'develop' into betterPowerGEMVTail 1 year ago
  Martin Kroeker f5d04318e3 Merge branch 'OpenMathLib:develop' into scalfixes 1 year ago
  Martin Kroeker 73f8866ffb make NAN handling depend on DUMMY2 parameter 1 year ago
  Hong Bo Peng db98f8753f Try to fix LAPACK testing failures on P7. 1 year ago
  Martin Kroeker b9bfc8ce09 make NAN handling depend on dummy2 parameter 1 year ago
  Chip Kerchner ba47c7f4f3 Vectorize reduction stage of sgemv_t. 1 year ago
  Chip Kerchner cb154832f8 Vectorize SBGEMM incopy - 4x faster. 1 year ago
  Martin Kroeker 2a5fe97e3b temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN 1 year ago
  Martin Kroeker 7f8f037a36 handle INF and NAN in input 1 year ago
  Martin Kroeker f1248b849d handle INF and NAN in input 1 year ago
  Rajalakshmi Srinivasaraghavan e112191b54 POWER: Fix issues in zscal to address lapack failures 2 years ago
  Martin Kroeker aa259b141d Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix 2 years ago
  Chip Kerchner 3a1417671a POWER: Fixing endianness issue in cswap/zswap kernel for AIX 2 years ago
  Amrita H S 87b3d9054f Fix regression SAXPY when compiler with OpenXL compiler. 2 years ago
  Chip-Kerchner 99384933ff Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code" 2 years ago
  Martin Kroeker accea15551 Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code 2 years ago
  austinpagan 87ba528d8b Changed C files to straighten out indentation. Removed commented lines from other file. 2 years ago
  austinpagan ddac75e0ef Adding .C versions of CGEMM and ZGEMM 2 years ago
  Chip Kerchner 2bb7ea64a1 Only vectorize 64-bit version for Power8. 2 years ago
  Chip Kerchner 09bb48d1b9 Vectorize in-copy packing/copying for SGEMM - 4X faster. 2 years ago
  Chip-Kerchner 058dd2a4cb Replace two vector loads with one vector pair load and fix endianess of stores - DGEMM versions. 2 years ago
  barracuda156 d9653af018 KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing 2 years ago
  Chip-Kerchner 4e738e561a Replace two vector loads with one vector pair load and fix endianess of stores. 2 years ago
  Rajalakshmi Srinivasaraghavan 980f702f72 POWER: AIX: Make use of power10 optimization 2 years ago
  Rajalakshmi Srinivasaraghavan 82fc29a57a POWER10: Fallback to POWER8 functions 2 years ago
  Martin Kroeker 8e6d93359d Merge pull request #4196 from TiborGY/obsolete_inlines 2 years ago
  Ian McInerney 79c15db348 Fix power10 gcc intrinsic check 2 years ago
  TGY b5ba95a6c0 Modernize obsolete inline order 2 years ago
  Martin Kroeker 54d3246fc6 Allow negative INCX (API change from version 3.10 of the reference implementation) 2 years ago
  Manjul Mohan 58b88aa5f0 POWER10: Fix compiler warnings 2 years ago