1475 Commits (754dc9ffb9f49412bdde6d820a3717c62af5b83b)

Author SHA1 Message Date
  Wangyang Guo 754dc9ffb9 Small Matrix: skylakex: add sgemm tn kernel 5 years ago
  Wangyang Guo 967df074b7 Small Matrix: skylakex: sgemm nt: optimize for M < 12 5 years ago
  Wangyang Guo fdd2d0fc7b Small Matrix: skylakex: add sgemm nt kernel 5 years ago
  Wangyang Guo 5f91668904 Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4 5 years ago
  Wangyang Guo 0ecaa99fc2 Small Matrix: skylakex: sgemm nn: fix error when beta not zero 5 years ago
  Wangyang Guo a1835c8ca2 Small Matrix: skylakex: sgemm nn: add n6 to improve performance 5 years ago
  Wangyang Guo 50bd888c73 Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time 5 years ago
  Wangyang Guo 95912941ca Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time 5 years ago
  Wangyang Guo 8a4bb07453 Small Matrix: skylakex: sgemm nn: clean up unused code 5 years ago
  Wangyang Guo 04ac9c7a13 Small Matrix: skylakex: sgemm_nn: optimize for M <= 8 5 years ago
  Wangyang Guo 20befbb2f9 Optimize M < 16 using AVX512 mask 5 years ago
  Wangyang Guo c3e4c4db47 small matrix: SkylakeX: add SGEMM NN kernel 5 years ago
  Xianyi Zhang 4130d1732e Refs #2587 fix small matrix c/zgemm bug. 5 years ago
  Xianyi Zhang 255b6dd0fa Merge branch 'develop' into small_matrices 5 years ago
  Xianyi Zhang 741d6c5cb8 Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 5 years ago
  Xianyi Zhang 712ca43069 Change a1b0 gemm to b0 gemm. 5 years ago
  Martin Kroeker b2053239fc Fix mssing dummy parameter (imag part of alpha) of zdot_thread_function 5 years ago
  Martin Kroeker 9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support 5 years ago
  Martin Kroeker 6f4dc7445d Fix typo 5 years ago
  Martin Kroeker 81fbe8d088 -march=cooperlake only available in gcc >= 10 5 years ago
  Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) 5 years ago
  Chen, Guobing e740c4873d Enable COOPERLAKE build target 5 years ago
  Martin Kroeker cbbe38bb88 Merge pull request #2772 from mhillenibm/s390x_gemm_tuning 5 years ago
  Marius Hillenbrand 07c334e7be s390x: Factor out small block sizes for SGEMM/DGEMM on z14 5 years ago
  Marius Hillenbrand e2828e30aa s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving 5 years ago
  Rajalakshmi Srinivasaraghavan 475b5c95b9 Remove extra symbol in Makefile 5 years ago
  Martin Kroeker 81dcfdcf39 Multiply by 2 instead of left-shifting a potentially negative number 5 years ago
  Martin Kroeker 0ef4b3f1f2 Multiply instead of doing a left shift of a potentially negative number 5 years ago
  Martin Kroeker aa53a8a5cb Multiply by two instead of left-shifting one place 5 years ago
  Martin Kroeker aa3a1e7d8c Multiply by two rather than left shift by one place 5 years ago
  Rajalakshmi Srinivasaraghavan f77b6a83f4 dgemv optimization for POWER10 5 years ago
  Rajalakshmi Srinivasaraghavan d557584b71 Fix compilation issues with clang on POWER 5 years ago
  Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 6 years ago
  Rajalakshmi Srinivasaraghavan 9be2688c78 Fix to store results in correct order for POWER10 GEMM kernels 5 years ago
  Martin Kroeker 6a2a60038c Merge pull request #2720 from martin-frbg/issue2694 5 years ago
  Martin Kroeker 251a09ec90 Typo fix 5 years ago
  Martin Kroeker 95d37e1575 Regroup the 32 and 64bit sections and restore 64bit CAXPY 5 years ago
  Martin Kroeker 3523bb778e Merge pull request #2721 from martin-frbg/p8align 5 years ago
  Martin Kroeker bf1f0734ff Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only 5 years ago
  Martin Kroeker ca3561cab9 Add ifdefs around call to altivec microkernel 5 years ago
  Martin Kroeker 21072e502a Typo fix 5 years ago
  Martin Kroeker 7c6e56b5df Rewrite assignment to complex for better portability 5 years ago
  Martin Kroeker 661c6bfa5a Exclude altivec code paths if the compiler does not support them 5 years ago
  Martin Kroeker 0033f8be0d Use vec_vsx_ld/st to fix misaligned accesses flagged by asan 5 years ago
  Martin Kroeker f308e741b2 remove debug output and revert changes to cdot and crot 5 years ago
  Martin Kroeker da17abec87 fix trailing whitespace 5 years ago
  Martin Kroeker f8c2697701 Use POWER6 GEMM, TRMM and DTRSM on 32bit POWER8 5 years ago
  Martin Kroeker b144423f0f Do not define USE_TRMM for 32bit POWER8 5 years ago
  Martin Kroeker ed7e155c35 Merge branch 'develop' into aix 5 years ago
  EGuesnet 634e1305f9 Update cgemm_kernel_8x4_power8.S 5 years ago