1263 Commits (934e601e934f5cf930382dfd9d7e92b937d1d2ed)

Author SHA1 Message Date
  wjc404 934e601e93 Update dgemm_kernel_4x8_skylakex_2.c 6 years ago
  wjc404 eb1e9c8c92 some optimizations 6 years ago
  Andreas Arnez d117dfd505 Change bad usage of "asum" to "sum" in ZARCH versions of ?sum 6 years ago
  Martin Kroeker b09b5be0a4 Merge pull request #2315 from ewanglong/develop 6 years ago
  Wang, Long bfb5fbdb4d revised fix windows compatible for #2313 6 years ago
  Martin Kroeker 08fa83aba2 Merge pull request #2312 from martin-frbg/power8be 6 years ago
  Wang, Long 1191db1a49 For the sake of windows compatible, used "unsigned long long" to ensure 64-bit length 6 years ago
  Wang, Long 0caf1434c9 Fix the integer overflow issue for large matrix size 6 years ago
  Martin Kroeker cad0d150db Define alternate kernels for big-endian POWER8 6 years ago
  Martin Kroeker eba0aeb7cd Fix compilation for big-endian POWER8 6 years ago
  Martin Kroeker 0c07c356c1 Define alternate kernels for big-endian PPC440 6 years ago
  Martin Kroeker 3e67017ac8 Merge pull request #2309 from martin-frbg/ppc970-be 6 years ago
  Martin Kroeker b3ac6ee222 Define alternate kernels for big-endian PPC970 6 years ago
  Martin Kroeker 71e96163db Merge pull request #2305 from wjc404/develop 6 years ago
  wjc404 819e852ae7 AVX512 CGEMM & ZGEMM kernels 6 years ago
  Martin Kroeker 4c6a457358 Merge pull request #2300 from wjc404/develop 6 years ago
  wjc404 836c414e22 optimizations of software prefetching 6 years ago
  Martin Kroeker 3cd97f1a80 Merge pull request #2301 from martin-frbg/ppc8be 6 years ago
  wjc404 430c11e135 Add files via upload 6 years ago
  wjc404 fbacd2605d optimizations via software prefetches 6 years ago
  Martin Kroeker 68597002ea The assembly microkernel is not safe to use on ELFv1 6 years ago
  Martin Kroeker d2a6285549 The assembly microkernel is not safe to use on ELFv1 6 years ago
  Martin Kroeker d999688d1a The assembly microkernel is not safe to use on ELFv1 6 years ago
  Martin Kroeker 928fe1b28e The assembly microkernel is not safe to use on ELFv1 6 years ago
  wjc404 1df9a2013d new sgemm kernel for skylakex 6 years ago
  Martin Kroeker 85ccdce8c4 Remove the IOS fallbacks to generic C kernels 6 years ago
  wjc404 6ff013bae0 native support for icopy_4 6 years ago
  wjc404 0d669e04bb Update dgemm_kernel_8x8_skylakex.c 6 years ago
  wjc404 17cdd9f9e1 some correction 6 years ago
  wjc404 6bcb06fcb1 make further changes to icopy_8 easier 6 years ago
  wjc404 b7315f8401 Add files via upload 6 years ago
  wjc404 9b19e9e1b0 Update dgemm_kernel_8x8_skylakex.c 6 years ago
  wjc404 6bd67ddbab Update dgemm_kernel_8x8_skylakex.c 6 years ago
  wjc404 844629af57 Add files via upload 6 years ago
  Martin Kroeker a448884a63 Remove automatic label postfixes from macro included only once 6 years ago
  Martin Kroeker 3a2df19db6 Fix accidental duplication of jump instruction 6 years ago
  Martin Kroeker d2093a40d3 Merge pull request #2277 from martin-frbg/issue2275 6 years ago
  Martin Kroeker 56837e9d92 Make local labels in macro compatible with the xcode assembler 6 years ago
  Martin Kroeker 5e244d80f2 Merge pull request #2271 from quickwritereader/strmm_fix 6 years ago
  AbdelRauf ede5efebab trmm fix 6 years ago
  Martin Kroeker 596a22325a Fix prologue of power9 assembly cdot(c) kernel to provide cdotc 6 years ago
  Martin Kroeker 7f58f3ad0e Fix mis-edits in the gcc-derived power8 caxpy kernel 6 years ago
  Martin Kroeker 673e5a0495 Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263) 6 years ago
  Martin Kroeker e7c4d6705a Revert #2051 and replace with a better fix (#2261) 6 years ago
  Martin Kroeker f3c314550c Merge pull request #2243 from quickwritereader/develop 6 years ago
  AbdelRauf 847c20c9b7 fix uninitialized variables i 6 years ago
  AbdelRauf 4c22828812 caxpy and cdot are using vec_vsx_ld 6 years ago
  AbdelRauf e79712d969 cgemv using vec_vsx_ld instead of letting gcc to decide 6 years ago
  AbdelRauf be09551cdf aligned 6 years ago
  Martin Kroeker 11c59acfb1 Keep both PGI/SUN and default code paths to avoid breaking Clang/WIndows 6 years ago