4847 Commits (7d3ecc92cb503df4e732b8d77ba7eb34304b44c4)
 

Author SHA1 Message Date
  Wangyang Guo 7d3ecc92cb Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel 5 years ago
  Wangyang Guo 96a4f950ef Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrics case 5 years ago
  Wangyang Guo 754dc9ffb9 Small Matrix: skylakex: add sgemm tn kernel 5 years ago
  Wangyang Guo 967df074b7 Small Matrix: skylakex: sgemm nt: optimize for M < 12 5 years ago
  Wangyang Guo fdd2d0fc7b Small Matrix: skylakex: add sgemm nt kernel 5 years ago
  Wangyang Guo 5f91668904 Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4 5 years ago
  Wangyang Guo 0ecaa99fc2 Small Matrix: skylakex: sgemm nn: fix error when beta not zero 5 years ago
  Wangyang Guo a1835c8ca2 Small Matrix: skylakex: sgemm nn: add n6 to improve performance 5 years ago
  Wangyang Guo 50bd888c73 Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time 5 years ago
  Wangyang Guo 95912941ca Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time 5 years ago
  Wangyang Guo 8a4bb07453 Small Matrix: skylakex: sgemm nn: clean up unused code 5 years ago
  Wangyang Guo 04ac9c7a13 Small Matrix: skylakex: sgemm_nn: optimize for M <= 8 5 years ago
  Wangyang Guo 20befbb2f9 Optimize M < 16 using AVX512 mask 5 years ago
  Wangyang Guo c3e4c4db47 small matrix: SkylakeX: add SGEMM NN kernel 5 years ago
  Zhang Xianyi 77460ac255 Fix gemm_batch bug for SMALL_MATRIX_OPT=1. 5 years ago
  Zhang Xianyi 88e6806e3f Init cblas_?gemm_batch implementation. 5 years ago
  Xianyi Zhang 4130d1732e Refs #2587 fix small matrix c/zgemm bug. 5 years ago
  Xianyi Zhang 255b6dd0fa Merge branch 'develop' into small_matrices 5 years ago
  Xianyi Zhang 741d6c5cb8 Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 5 years ago
  Martin Kroeker 514a3d7d63 Merge pull request #2798 from kadler/aix-cpuid 5 years ago
  Kevin Adler 085aae8bdb Fix compile error on AIX cpuid detection 5 years ago
  Xianyi Zhang 712ca43069 Change a1b0 gemm to b0 gemm. 5 years ago
  Martin Kroeker 5c6c2cd4f6 Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify 5 years ago
  Martin Kroeker e54be4ba1c Merge pull request #2792 from pkubaj/patch-1 5 years ago
  pkubaj 48a1364e10 Add aliases for armv6, armv7 5 years ago
  Chen, Guobing 0c1c903f1e Fix OMP num specify issue 5 years ago
  Martin Kroeker a073fa870e Merge pull request #2791 from martin-frbg/issue2787 5 years ago
  Martin Kroeker b2053239fc Fix mssing dummy parameter (imag part of alpha) of zdot_thread_function 5 years ago
  Martin Kroeker b11bb6e728 Merge pull request #2790 from martin-frbg/issue2789 5 years ago
  Martin Kroeker 1840bc5b52 Add OpenMP dependency to pkgconfig file if needed 5 years ago
  Martin Kroeker 7c0977c267 Add OpenMP dependency to pkgconfig file if needed 5 years ago
  Martin Kroeker fb3d80c42a Merge pull request #78 from xianyi/develop 5 years ago
  Martin Kroeker 9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support 5 years ago
  Martin Kroeker bd3207b4b4 Update system.cmake 5 years ago
  Martin Kroeker b8ebfc9335 Update system.cmake 5 years ago
  Martin Kroeker 7c1986640b fallback from cooperlake to skylake if gcc<10 5 years ago
  Martin Kroeker 71d33c952d Typo fix 5 years ago
  Martin Kroeker 6a3c074786 -march=cooperlake requires gcc10 5 years ago
  Martin Kroeker 430f741b30 -march=cooperlake requires gcc10 5 years ago
  Martin Kroeker 6f4dc7445d Fix typo 5 years ago
  Martin Kroeker 81fbe8d088 -march=cooperlake only available in gcc >= 10 5 years ago
  Martin Kroeker bb9cf766f5 make march=cooperlake option conditional on gcc >= 10.1 5 years ago
  Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) 5 years ago
  Martin Kroeker 2c72972570 Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config 5 years ago
  Albert Ziegenhagel 6b731d917f Do not require pkg-config to generate the *.pc file 5 years ago
  Martin Kroeker 5dcf47cd97 Merge pull request #2784 from martin-frbg/issue2783 5 years ago
  Martin Kroeker aa286e301b Add typedef for bfloat16 if needed 5 years ago
  Martin Kroeker 9f0ef9cdfc Merge pull request #77 from xianyi/develop 5 years ago
  Martin Kroeker 6bfc66663c revert 5 years ago
  Martin Kroeker a8c6fb9e1c revert 5 years ago