167 Commits (afa0cece5cbca7ce9c749b3101ac36b15518508e)

Author SHA1 Message Date
  Sunita Nadampalli 19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 4 years ago
  Jia-Chen b610d2de37 optimize cgemm on ARM cortex A53 & cortex A55 4 years ago
  Bine Brank a8f62a347b fix UNROLL_MN and add to targets for SVE 4 years ago
  Bine Brank a1fea1fe2a sgemm v2x8 SVE kernel 4 years ago
  Bine Brank abe1ce3434 strmm sve v1x8 kernel 4 years ago
  Bine Brank 0de36f7b5c trmm sve copy fucntions for single precision 4 years ago
  Bine Brank 86ae89bf33 add sgemm kernel and copy functions for sgemm and ssymm 4 years ago
  Martin Kroeker 454edd741c Merge pull request #3425 from binebrank/arm_sve_dgemm 4 years ago
  Jia-Chen 5c1cd5e0c2 MOD: add comments to a53 zgemm kernel 4 years ago
  Jia-Chen 9f59b19fcd MOD: optimize zgemm on cortex-A53/cortex-A55 4 years ago
  Bine Brank 531a28b6a0 removed unused code (compiler warnings) 4 years ago
  Bine Brank 9b9cb90bb1 modify Makefile for SVE copy 4 years ago
  Bine Brank b58d4f31ab some clean-up & commentary 4 years ago
  Bine Brank e6ed4be02e symm SVE copy rutines 4 years ago
  Jia-Chen 302f22693a MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 4 years ago
  Bine Brank 3c7eed0e53 add remaining trmm copy rutines for SVE 4 years ago
  Bine Brank 7d996b1c36 dtrmm_utcopy sve function 4 years ago
  Bine Brank ab7917910d add v2x8 kernel + fix sve dtrmm 4 years ago
  Bine Brank 7093372e32 add ARMV8SVE target 4 years ago
  Bine Brank a8fbdbac34 fix sve dgemm kernel + sve dtrmm 4 years ago
  Bine Brank 746b4f0f17 added SVE ncopy and tcopy 4 years ago
  Bine Brank 1a10d3e09d add sve dgemm prototype 4 years ago
  Martin Kroeker 22bf5c27ba Add basic support for the Fujitsu A64FX (#3415) 4 years ago
  Martin Kroeker 8c20ca345a Use Neoverse's current mix of ThunderX2 kernels for Vortex as well 4 years ago
  Martin Kroeker 90cc944625 Move alphaI to x22 to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 590fbff06e move alpha to x19/x20 to leave x18 unused for OSX 4 years ago
  Martin Kroeker 380940271b Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 7d75177446 Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 0a4ac4b585 Use x21 for I to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 7d4a221579 Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX) 4 years ago
  User User-User 39ef0880ae copy conf 4 years ago
  Gilles Gouaillardet 9d292d37b2 arm64: add the missing d9 register to the clobber list 4 years ago
  CodesWithWolves d2bda3b56a Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro 4 years ago
  Martin Kroeker b716c0ef01 Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 2efa3b70dc Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 49959d4f1c Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 0f27a03607 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
  Martin Kroeker c2a8ebfe69 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
  Ashwin Sekhar T K 1b2508362b arm64: Fix nrm2 for input vectors with Inf 5 years ago
  Martin Kroeker 8631e2976a Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 2768bc1764 Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 6f4698ee1f Temporarily revert to the old nrm2 kernel 5 years ago
  Martin Kroeker e1b7123bbe Merge pull request #2867 from Qiyu8/usimd-floatdot 5 years ago
  User User-User d2333e7842 aarch64 fix std=c18 compilation 5 years ago
  Qiyu8 60e6c68e38 Adapt ARM architect 5 years ago
  Martin Kroeker 775a87242d Rename KERNEL.SILICON to KERNEL.VORTEX 5 years ago
  Martin Kroeker 80794fe8fd Create KERNEL.SILICON 5 years ago
  Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 5 years ago
  ZhangDanfeng bc6fd20a40 fix INIT8x4 5 years ago
  ZhangDanfeng 9b7877ccf1 sgemm copy source init 5 years ago