531 Commits (2cd9306bb5138f8ec796964fa578b2ea1b73e921)

Author SHA1 Message Date
  wjc404 2cd9306bb5
Update KERNEL.ZEN 6 years ago
  wjc404 c418c81224
Update KERNEL.HASWELL 6 years ago
  wjc404 025741f16a
Fast Haswell CGEMM kernel 6 years ago
  wjc404 f41d52665d
Fast Haswell ZGEMM kernel 6 years ago
  wjc404 d573d24de7
Fast Haswell ZGEMM kernel 6 years ago
  Isuru Fernando b863b32ac5
Workaround an ICE in clang 9.0.0 6 years ago
  wjc404 934e601e93
Update dgemm_kernel_4x8_skylakex_2.c 6 years ago
  wjc404 eb1e9c8c92
some optimizations 6 years ago
  Wang, Long bfb5fbdb4d
revised fix windows compatible for #2313 6 years ago
  Wang, Long 1191db1a49
For the sake of windows compatible, used "unsigned long long" to ensure 64-bit length 6 years ago
  Wang, Long 0caf1434c9
Fix the integer overflow issue for large matrix size 6 years ago
  wjc404 819e852ae7
AVX512 CGEMM & ZGEMM kernels 6 years ago
  wjc404 836c414e22
optimizations of software prefetching 6 years ago
  wjc404 430c11e135
Add files via upload 6 years ago
  wjc404 fbacd2605d
optimizations via software prefetches 6 years ago
  wjc404 1df9a2013d
new sgemm kernel for skylakex 6 years ago
  wjc404 6ff013bae0
native support for icopy_4 6 years ago
  wjc404 0d669e04bb
Update dgemm_kernel_8x8_skylakex.c 6 years ago
  wjc404 17cdd9f9e1
some correction 6 years ago
  wjc404 6bcb06fcb1
make further changes to icopy_8 easier 6 years ago
  wjc404 b7315f8401
Add files via upload 6 years ago
  wjc404 9b19e9e1b0
Update dgemm_kernel_8x8_skylakex.c 6 years ago
  wjc404 6bd67ddbab
Update dgemm_kernel_8x8_skylakex.c 6 years ago
  wjc404 844629af57
Add files via upload 6 years ago
  Martin Kroeker 11c59acfb1
Keep both PGI/SUN and default code paths to avoid breaking Clang/WIndows 6 years ago
  Martin Kroeker 3a55dca2dc
Make x86_64 zdot compile with PGI and Sun C again 6 years ago
  Martin Kroeker 9ef96b32a6
Add multithreading support to the x86_64 zdot kernel (#2222) 6 years ago
  Martin Kroeker dccff2e785
Merge pull request #2206 from martin-frbg/zen-dtrmm 6 years ago
  Martin Kroeker 5c3458a6e7
Merge pull request #2199 from martin-frbg/zen-dtrsm 6 years ago
  Martin Kroeker acf6002ab2
Replace most vpermpd calls in the Haswell DTRSM_RN kernel 6 years ago
  Martin Kroeker 2dfb804cb9
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel 6 years ago
  Martin Kroeker 4c153ec9da
Merge pull request #2196 from wjc404/develop 6 years ago
  wjc404 7eecd8e39c
Add files via upload 6 years ago
  Martin Kroeker 7b0b7c11d2
Merge pull request #2190 from martin-frbg/zdot-zen 6 years ago
  Martin Kroeker 28e96458e5
Replace vpermpd with vpermilpd 6 years ago
  wjc404 95fb98f556
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 4801c6d36b
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 9440fa607d
Add files via upload 6 years ago
  wjc404 94db259e5b
Add files via upload 6 years ago
  wjc404 f49f8047ac
Add files via upload 6 years ago
  wjc404 825777faab
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 9c89757562
Add files via upload 6 years ago
  wjc404 9b04baeaee
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 8a074b3965
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 211ab03b14
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 1733f927e6
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 182b06d6ad
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 7a9050d681
Update dgemm_kernel_4x8_haswell.S 6 years ago
  wjc404 0ba29fd262
Update dgemm_kernel_4x8_haswell.S for zen2 6 years ago
  Martin Kroeker 9ea30f3788
Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125) 6 years ago