2075 Commits (fbd612f8c4f3eceb16a645c8a5366af35e7b6a2e)

Author SHA1 Message Date
  Hao Chen fbd612f8c4 loongarch64: Add ic/zamin optimization functions. 2 years ago
  Hao Chen d97272cb35 loongarch64: Add c/zdot optimization functions. 2 years ago
  Hao Chen 65a0aeb128 loongarch64: Add c/zcopy optimization functions. 2 years ago
  Hao Chen 2a34fb4b80 loongarch64: Add and refine scal optimization functions. 2 years ago
  Hao Chen 8785e948b5 loongarch64: Add camin optimization function. 2 years ago
  Hao Chen 0753848e03 loongarch64: Refine and add axpy optimization functions. 2 years ago
  Hao Chen 06fd5b5995 loongarch64: Add and Refine asum optimization functions. 2 years ago
  guxiwei e771be185e Optimize copy functions with lsx. 2 years ago
  Hao Chen 179ed51d3b Add dgemm_kernel_8x4.S file. 2 years ago
  Hao Chen 173a65d4e6 loongarch64: Add and refine iamax optimization functions. 2 years ago
  zhoupeng ea70e165c7 loongarch64: Refine rot optimization. 2 years ago
  zhoupeng 116aee7527 loongarch64: Refine imin optimization. 2 years ago
  zhoupeng 8be2654193 loongarch64: Refine imax optimization. 2 years ago
  zhoupeng 154baad454 loongarch64: Refine iamin optimization. 2 years ago
  Shiyou Yin 36c12c4971 loongarch64: Refine copy,swap,nrm2,sum optimization. 2 years ago
  Shiyou Yin c6996a80e9 loongarch64: Refine amax,amin,max,min optimization. 2 years ago
  Martin Kroeker f06b535566 Use C kernel for dgemv_t due to limitations of the old assembly one 2 years ago
  barracuda156 d9653af018 KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing 2 years ago
  Chip-Kerchner 93747fb377 Merge remote-tracking branch 'origin/develop' into power10Copies 2 years ago
  Chip-Kerchner 4e738e561a Replace two vector loads with one vector pair load and fix endianess of stores. 2 years ago
  yancheng d32f38fb37 loongarch64: Add optimizations for nrm2. 2 years ago
  yancheng f9b468990e loongarch64: Add optimizations for rot. 2 years ago
  yancheng c80e7e27d1 loongarch64: Add optimizations for sum and asum. 2 years ago
  yancheng d4c96a35a8 loongarch64: Add optimizations for axpy and axpby. 2 years ago
  yancheng 360acc0a41 loongarch64: Add optimizations for swap. 2 years ago
  yancheng 174c25766b loongarch64: Add optimizations for copy. 2 years ago
  yancheng 49829b2b7d loongarch64: Add optimizations for iamin. 2 years ago
  yancheng be83f5e4e0 loongarch64: Add optimizations for iamax. 2 years ago
  yancheng e3fb2b5afa loongarch64: Add optimizations for imin. 2 years ago
  yancheng e46b48e372 loongarch64: Add optimizations for imax. 2 years ago
  yancheng 702fc1d56d loongarch64: Add optimization for min. 2 years ago
  yancheng 346b384d1c loongarch64: Add optimization for max. 2 years ago
  yancheng ff2ecc6cda loongarch64: Add optimization for amin. 2 years ago
  yancheng 265b5f2e80 loongarch64: Add optimizations for amax. 2 years ago
  yancheng 993ede7c70 loongarch64: Add optimizations for scal. 2 years ago
  Martin Kroeker 39bf8ece20 Merge pull request #4340 from yinshiyou/la-dev 2 years ago
  Shiyou Yin 9fe07d82fd loongarch: Add LSX optimization for dot. 2 years ago
  Shiyou Yin 13b8c44b44 loongarch: Add optimization for dsdot kernel. 2 years ago
  Shiyou Yin 3def6a8143 loongarch: Add LASX optimization for dot. 2 years ago
  Bart Oldeman c34e2cf380 Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum 2 years ago
  Martin Kroeker 22aa401656 Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC (#4327) 2 years ago
  Bart Oldeman f8ad5344c2 Fix casum fallback kernel. 2 years ago
  Martin Kroeker 04bc801999 (Re)apply fixes for supporting only a subset of precision types from PR 3915 2 years ago
  Martin Kroeker 9019bc4945 Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well 2 years ago
  Martin Kroeker 3bfa4d4dcc Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 2 years ago
  Rajalakshmi Srinivasaraghavan 980f702f72 POWER: AIX: Make use of power10 optimization 2 years ago
  Rajalakshmi Srinivasaraghavan 9f42570e33 POWER: Increase macro size limit for AIX 2 years ago
  Martin Kroeker 9f49aef91b Merge pull request #4255 from RajalakshmiSR/AIX-P10 2 years ago
  Martin Kroeker e7d05402e0 Fix up S/D GEMM copy function definitions after #4009 2 years ago
  Rajalakshmi Srinivasaraghavan 71d733e5f7 POWER: Avoid m4 conversions for C files 2 years ago