148 Commits (bcfbdc81b2caed97032e63a23fd992c8ce3d0490)

Author SHA1 Message Date
  Jia-Chen 5c1cd5e0c2 MOD: add comments to a53 zgemm kernel 4 years ago
  Jia-Chen 9f59b19fcd MOD: optimize zgemm on cortex-A53/cortex-A55 4 years ago
  Jia-Chen 302f22693a MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 4 years ago
  Martin Kroeker 22bf5c27ba Add basic support for the Fujitsu A64FX (#3415) 4 years ago
  Martin Kroeker 8c20ca345a Use Neoverse's current mix of ThunderX2 kernels for Vortex as well 4 years ago
  Martin Kroeker 90cc944625 Move alphaI to x22 to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 590fbff06e move alpha to x19/x20 to leave x18 unused for OSX 4 years ago
  Martin Kroeker 380940271b Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 7d75177446 Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 0a4ac4b585 Use x21 for I to leave x18 unused (reserved on OSX) 4 years ago
  Martin Kroeker 7d4a221579 Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX) 4 years ago
  User User-User 39ef0880ae copy conf 4 years ago
  Gilles Gouaillardet 9d292d37b2 arm64: add the missing d9 register to the clobber list 4 years ago
  CodesWithWolves d2bda3b56a Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro 4 years ago
  Martin Kroeker b716c0ef01 Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 2efa3b70dc Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 49959d4f1c Add workaround for NVIDIA HPC 5 years ago
  Martin Kroeker 0f27a03607 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
  Martin Kroeker c2a8ebfe69 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 5 years ago
  Ashwin Sekhar T K 1b2508362b arm64: Fix nrm2 for input vectors with Inf 5 years ago
  Martin Kroeker 8631e2976a Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 2768bc1764 Temporarily revert to the old nrm2 kernels 5 years ago
  Martin Kroeker 6f4698ee1f Temporarily revert to the old nrm2 kernel 5 years ago
  Martin Kroeker e1b7123bbe Merge pull request #2867 from Qiyu8/usimd-floatdot 5 years ago
  User User-User d2333e7842 aarch64 fix std=c18 compilation 5 years ago
  Qiyu8 60e6c68e38 Adapt ARM architect 5 years ago
  Martin Kroeker 775a87242d Rename KERNEL.SILICON to KERNEL.VORTEX 5 years ago
  Martin Kroeker 80794fe8fd Create KERNEL.SILICON 5 years ago
  Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 5 years ago
  ZhangDanfeng bc6fd20a40 fix INIT8x4 5 years ago
  ZhangDanfeng 9b7877ccf1 sgemm copy source init 5 years ago
  ZhangDanfeng f82fa802d1 Insert prefetch 5 years ago
  张丹枫 9df79ae9a3 update sgemm and strmm kernel selecting strategy 5 years ago
  张丹枫 a1fc6041cd use general register to speedup 5 years ago
  张丹枫 edb423d772 align general register using to strmm_kernel_8x8 5 years ago
  zhangdanfeng 0e6eb8c247 sgemm kernel use sgemm_kernel_8x8_cortexa53 5 years ago
  zhangdanfeng d475db29c6 optimized for cortex-a53 5 years ago
  Ashwin Sekhar T K 8353cb245a ARM64: Improve DAXPY for ThunderX2 5 years ago
  Martin Kroeker 144be81ca1 fix initialization to zero in the NEON SGEMM_BETA kernel as well 5 years ago
  Martin Kroeker 07cdd5d05c Fix zero initialization for beta=0 case 5 years ago
  s00548429 bec7923a0d Fix the functional bugs for zamax. 5 years ago
  Ali Saidi c623a965f9 Add Neoverse-N1 core 6 years ago
  Martin Kroeker e57b11acca Add preliminary support for EMAG8180 6 years ago
  Martin Kroeker 456ee2e1f0 Merge pull request #2357 from chenxuqiang/dgemm_beta_zero 6 years ago
  shengyang 80db5f11e1 update 6 years ago
  chenxuqiang 52de4cc8fd kernel/arm64/dgemm_beta.S: add beta == zero branch 6 years ago
  Martin Kroeker 44028581cc Merge pull request #2355 from Zeyiii/dev-zeyi2 6 years ago
  Martin Kroeker 86ab939936 Merge pull request #2354 from ZuoQ3/develop 6 years ago
  shengyang 8d84403205 Use arm neon instructions to optimize ncopy operation 6 years ago
  w00421467 0833a4846a Use arm neon instructions to optimize sgemm_beta operation 6 years ago