1237 Commits (274ff5cdb884f869c8cb99afceb56b1e8f59f87f)

Author SHA1 Message Date
  wjc404 6ff013bae0
native support for icopy_4 6 years ago
  wjc404 0d669e04bb
Update dgemm_kernel_8x8_skylakex.c 6 years ago
  wjc404 17cdd9f9e1
some correction 6 years ago
  wjc404 6bcb06fcb1
make further changes to icopy_8 easier 6 years ago
  wjc404 b7315f8401
Add files via upload 6 years ago
  wjc404 9b19e9e1b0
Update dgemm_kernel_8x8_skylakex.c 6 years ago
  wjc404 6bd67ddbab
Update dgemm_kernel_8x8_skylakex.c 6 years ago
  wjc404 844629af57
Add files via upload 6 years ago
  Martin Kroeker a448884a63
Remove automatic label postfixes from macro included only once 6 years ago
  Martin Kroeker 3a2df19db6
Fix accidental duplication of jump instruction 6 years ago
  Martin Kroeker d2093a40d3
Merge pull request #2277 from martin-frbg/issue2275 6 years ago
  Martin Kroeker 56837e9d92
Make local labels in macro compatible with the xcode assembler 6 years ago
  Martin Kroeker 5e244d80f2
Merge pull request #2271 from quickwritereader/strmm_fix 6 years ago
  AbdelRauf ede5efebab
trmm fix 6 years ago
  Martin Kroeker 596a22325a
Fix prologue of power9 assembly cdot(c) kernel to provide cdotc 6 years ago
  Martin Kroeker 7f58f3ad0e
Fix mis-edits in the gcc-derived power8 caxpy kernel 6 years ago
  Martin Kroeker 673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263) 6 years ago
  Martin Kroeker e7c4d6705a
Revert #2051 and replace with a better fix (#2261) 6 years ago
  Martin Kroeker f3c314550c
Merge pull request #2243 from quickwritereader/develop 6 years ago
  AbdelRauf 847c20c9b7
fix uninitialized variables i 6 years ago
  AbdelRauf 4c22828812
caxpy and cdot are using vec_vsx_ld 6 years ago
  AbdelRauf e79712d969
cgemv using vec_vsx_ld instead of letting gcc to decide 6 years ago
  AbdelRauf be09551cdf
aligned 6 years ago
  Martin Kroeker 11c59acfb1
Keep both PGI/SUN and default code paths to avoid breaking Clang/WIndows 6 years ago
  Martin Kroeker 3a55dca2dc
Make x86_64 zdot compile with PGI and Sun C again 6 years ago
  Martin Kroeker 9ef96b32a6
Add multithreading support to the x86_64 zdot kernel (#2222) 6 years ago
  Martin Kroeker 103b32fdb7
Merge pull request #2216 from martin-frbg/issue2214 6 years ago
  Martin Kroeker aef9804089
Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV 6 years ago
  Martin Kroeker dccff2e785
Merge pull request #2206 from martin-frbg/zen-dtrmm 6 years ago
  Martin Kroeker 5c3458a6e7
Merge pull request #2199 from martin-frbg/zen-dtrsm 6 years ago
  Martin Kroeker acf6002ab2
Replace most vpermpd calls in the Haswell DTRSM_RN kernel 6 years ago
  Martin Kroeker 2dfb804cb9
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel 6 years ago
  Martin Kroeker 4c153ec9da
Merge pull request #2196 from wjc404/develop 6 years ago
  wjc404 7eecd8e39c
Add files via upload 6 years ago
  Martin Kroeker 7b0b7c11d2
Merge pull request #2190 from martin-frbg/zdot-zen 6 years ago
  Martin Kroeker 28e96458e5
Replace vpermpd with vpermilpd 6 years ago
  wjc404 95fb98f556
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 4801c6d36b
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 9440fa607d
Add files via upload 6 years ago
  wjc404 94db259e5b
Add files via upload 6 years ago
  wjc404 f49f8047ac
Add files via upload 6 years ago
  wjc404 825777faab
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 9c89757562
Add files via upload 6 years ago
  wjc404 9b04baeaee
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 8a074b3965
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 211ab03b14
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 1733f927e6
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 182b06d6ad
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 7a9050d681
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 0ba29fd262
Update dgemm_kernel_4x8_haswell.S for zen2 6 years ago