457 Commits (f4b82d7bc4c20da29c19b2eece602002bd5fe4af)

Author SHA1 Message Date
  Martin Kroeker 32b0f1168e Fix declaration of input arguments in the Sandybridge GER microkernels (#1967) 7 years ago
  Martin Kroeker b495e54310 Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 7 years ago
  Martin Kroeker d5e6940253 Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 7 years ago
  Arjan van de Ven 795285c587 Fix thinko in skylake beta handling 7 years ago
  Arjan van de Ven d321448a63 dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell 7 years ago
  Arjan van de Ven c43331ad0a dgemm: Use the skylakex beta function also for haswell 7 years ago
  Arjan van de Ven 69d206440a Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support 7 years ago
  Arjan van de Ven 0586899a10 Use sgemm_ncopy_4_skylakex.c also for Haswell 7 years ago
  Arjan van de Ven 00dc09ad19 Use the skylake sgemm beta code also for haswell 7 years ago
  Arjan van de Ven cdc668d82b Add a "sgemm direct" mode for small matrixes 7 years ago
  Martin Kroeker 701ea88347 Use p2align instead of align for OSX compatibility 7 years ago
  Andrew 19c4bdd8b3 Add return value so that freebsd system clang does not err out 7 years ago
  Arjan van de Ven dcc5d6291e skylakex: Make the sgemm/dgemm beta code robust for a N=0 or M=0 case 7 years ago
  Arjan van de Ven 55b244ca0d enable the SGEMM/SKX C based kernel 7 years ago
  Arjan van de Ven d4bad73834 Add a C+intrinsics version of the SGEMM/skylakex kernel 7 years ago
  Arjan van de Ven 582c589727 dgemm/skylakex: replace discrete mul/add with fma 7 years ago
  Arjan van de Ven adbf6afa25 Add vector optimizations for ncopy as well for dgemm/skylakex 7 years ago
  Arjan van de Ven 32bec8afbb add a skylakex optimized dgemm beta function 7 years ago
  Arjan van de Ven 20c5d668fe dgemm/avx512 simplify and speed up the 4x4 kernel 7 years ago
  Arjan van de Ven 6d43c51ccf undo slow dgemm/skylake microoptimization 7 years ago
  Arjan van de Ven d74dc39b0f Add optimized *copy versions for skylakex 7 years ago
  Arjan van de Ven 66b43affbc Add a 24x8 kernel to the skylakex dgemm implementation 7 years ago
  Arjan van de Ven 1938819c25 skylake dgemm: Add a 16x8 kernel 7 years ago
  Martin Kroeker b7496c3638 Function name needs to be CNAME, set from outside to allow suffixing for dynamic_arch 7 years ago
  Arjan van de Ven 45fe8cb0c5 Create a AVX512 enabled version of DGEMM 7 years ago
  Martin Kroeker 375dff54fc Merge pull request #1733 from fenrus75/dsymv 7 years ago
  Martin Kroeker a5f165275a Merge pull request #1732 from fenrus75/dgemv 7 years ago
  Martin Kroeker 8c13aa495a Merge pull request #1730 from fenrus75/fix-sdot 7 years ago
  Arjan van de Ven 9bec34cb67 Add an AVX512 enabled DSYMV (L) function 7 years ago
  Arjan van de Ven 87bebdbd8a Add an AVX512 enabled DGEMV (n) function 7 years ago
  Arjan van de Ven 36add7570a Fix typo in sdot function 7 years ago
  Arjan van de Ven cacacc8007 Add an AVX512 enabled DSCAL function 7 years ago
  Martin Kroeker 1a00ef3d27 Merge pull request #1725 from fenrus75/axpy 7 years ago
  Arjan van de Ven 2e99873ff7 Add a AVX512 enabled SAXPY/DAXPY functions 7 years ago
  Arjan van de Ven 00abaa865b Add an AVX512 enabled SDOT function 7 years ago
  Arjan van de Ven 7932ff3ea9 Add an AVX512 enabled DDOT function 7 years ago
  Martin Kroeker 6e54b0a027 Disable the 16x2 DTRMM kernel on SkylakeX as well 7 years ago
  Martin Kroeker f0a8dc2eec Disable the AVX512 DGEMM kernel for now 7 years ago
  Craig Donner c2545b0fd6 Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
  Arjan van de Ven 89372e0993 Use AVX512 also for DGEMM 7 years ago
  Arjan van de Ven 99c7bba8e4 Initial support for SkylakeX / AVX512 7 years ago
  Martin Kroeker 840e01061f Merge pull request #1491 from martin-frbg/ddot_mt 7 years ago
  Martin Kroeker a55694dd5b Declare dot_compute static to avoid conflicts in multiarch builds 7 years ago
  Martin Kroeker 85a41e9cdb Add multithreading support for Haswell DDOT 7 years ago
  Martin Kroeker 81215711a2 Re-enable DAXPY microkernels for x86_64 8 years ago
  Martin Kroeker 497f0c3d8a Replace .align with .p2align in the Nehalem microkernels 8 years ago
  Martin Kroeker ea37db828e Convert .align to .p2align for OSX compatibility 8 years ago
  Martin Kroeker 7c1925acec Use .p2align instead of .align for compatibility on Sandybridge as well 8 years ago
  Martin Kroeker 2359c7c1a9 Use .p2align instead of .align for portability 8 years ago
  Martin Kroeker e388459a27 Merge pull request #1419 from brada4/develop 8 years ago