174 Commits (ce329ab6869bd958cde05c1dcd39ce7c6bc02cd9)

Author SHA1 Message Date
  Bine Brank ce329ab686 add sve zhemm copy routines 4 years ago
  Bine Brank 0140373802 add sve ztrmm 4 years ago
  Bine Brank f7b6912868 ztrmm sve copy kernels 4 years ago
  Bine Brank 40b14e4957 fix zgemm kernel 4 years ago
  Bine Brank 6ec4aab875 zgemm sve copy routines 4 years ago
  Bine Brank 878064f394 sve zgemm kernel 4 years ago
  Bine Brank 683a7548bf added macros for sve zgemm kernels 4 years ago
  Bine Brank e3c9947c0f prepare kernel for sve zgemm 4 years ago
  Jia-Chen b610d2de37 optimize cgemm on ARM cortex A53 & cortex A55 4 years ago
  Bine Brank a8f62a347b fix UNROLL_MN and add to targets for SVE 4 years ago
  Bine Brank a1fea1fe2a sgemm v2x8 SVE kernel 4 years ago
  Bine Brank abe1ce3434 strmm sve v1x8 kernel 4 years ago
  Bine Brank 0de36f7b5c trmm sve copy fucntions for single precision 4 years ago
  Bine Brank 86ae89bf33 add sgemm kernel and copy functions for sgemm and ssymm 4 years ago
  Martin Kroeker 454edd741c Merge pull request #3425 from binebrank/arm_sve_dgemm 4 years ago
  Jia-Chen 5c1cd5e0c2 MOD: add comments to a53 zgemm kernel 4 years ago
  Jia-Chen 9f59b19fcd MOD: optimize zgemm on cortex-A53/cortex-A55 4 years ago
  Bine Brank 531a28b6a0 removed unused code (compiler warnings) 4 years ago
  Bine Brank 9b9cb90bb1 modify Makefile for SVE copy 4 years ago
  Bine Brank b58d4f31ab some clean-up & commentary 4 years ago
  Bine Brank e6ed4be02e symm SVE copy rutines 4 years ago
  Jia-Chen 302f22693a MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 4 years ago
  Bine Brank 3c7eed0e53 add remaining trmm copy rutines for SVE 4 years ago
  Bine Brank 7d996b1c36 dtrmm_utcopy sve function 4 years ago
  Bine Brank ab7917910d add v2x8 kernel + fix sve dtrmm 4 years ago
  Bine Brank 7093372e32 add ARMV8SVE target 4 years ago
  Bine Brank a8fbdbac34 fix sve dgemm kernel + sve dtrmm 4 years ago
  Bine Brank 746b4f0f17 added SVE ncopy and tcopy 4 years ago
  Bine Brank 1a10d3e09d add sve dgemm prototype 4 years ago
  Martin Kroeker 22bf5c27ba Add basic support for the Fujitsu A64FX (#3415) 4 years ago
  Martin Kroeker 8c20ca345a Use Neoverse's current mix of ThunderX2 kernels for Vortex as well 4 years ago
  Martin Kroeker 90cc944625 Move alphaI to x22 to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 590fbff06e move alpha to x19/x20 to leave x18 unused for OSX 4 years ago
  Martin Kroeker 380940271b Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 7d75177446 Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 0a4ac4b585 Use x21 for I to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 7d4a221579 Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX) 4 years ago
  User User-User 39ef0880ae copy conf 5 years ago
  Gilles Gouaillardet 9d292d37b2 arm64: add the missing d9 register to the clobber list 5 years ago
  CodesWithWolves d2bda3b56a Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro 5 years ago
  Martin Kroeker b716c0ef01 Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 2efa3b70dc Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 49959d4f1c Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 0f27a03607 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
  Martin Kroeker c2a8ebfe69 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
  Ashwin Sekhar T K 1b2508362b arm64: Fix nrm2 for input vectors with Inf 5 years ago
  Martin Kroeker 8631e2976a Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 2768bc1764 Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 6f4698ee1f Temporarily revert to the old nrm2 kernel 5 years ago
  Martin Kroeker e1b7123bbe Merge pull request #2867 from Qiyu8/usimd-floatdot 5 years ago