444 Commits (76a66eaac8aaa795dddc26af4e43acb455654a18)

Author SHA1 Message Date
  Arjan van de Ven 55b244ca0d enable the SGEMM/SKX C based kernel 7 years ago
  Arjan van de Ven d4bad73834 Add a C+intrinsics version of the SGEMM/skylakex kernel 7 years ago
  Arjan van de Ven 582c589727 dgemm/skylakex: replace discrete mul/add with fma 7 years ago
  Arjan van de Ven adbf6afa25 Add vector optimizations for ncopy as well for dgemm/skylakex 7 years ago
  Arjan van de Ven 32bec8afbb add a skylakex optimized dgemm beta function 7 years ago
  Arjan van de Ven 20c5d668fe dgemm/avx512 simplify and speed up the 4x4 kernel 7 years ago
  Arjan van de Ven 6d43c51ccf undo slow dgemm/skylake microoptimization 7 years ago
  Arjan van de Ven d74dc39b0f Add optimized *copy versions for skylakex 7 years ago
  Arjan van de Ven 66b43affbc Add a 24x8 kernel to the skylakex dgemm implementation 7 years ago
  Arjan van de Ven 1938819c25 skylake dgemm: Add a 16x8 kernel 7 years ago
  Martin Kroeker b7496c3638 Function name needs to be CNAME, set from outside to allow suffixing for dynamic_arch 7 years ago
  Arjan van de Ven 45fe8cb0c5 Create a AVX512 enabled version of DGEMM 7 years ago
  Martin Kroeker 375dff54fc Merge pull request #1733 from fenrus75/dsymv 7 years ago
  Martin Kroeker a5f165275a Merge pull request #1732 from fenrus75/dgemv 7 years ago
  Martin Kroeker 8c13aa495a Merge pull request #1730 from fenrus75/fix-sdot 7 years ago
  Arjan van de Ven 9bec34cb67 Add an AVX512 enabled DSYMV (L) function 7 years ago
  Arjan van de Ven 87bebdbd8a Add an AVX512 enabled DGEMV (n) function 7 years ago
  Arjan van de Ven 36add7570a Fix typo in sdot function 7 years ago
  Arjan van de Ven cacacc8007 Add an AVX512 enabled DSCAL function 7 years ago
  Martin Kroeker 1a00ef3d27 Merge pull request #1725 from fenrus75/axpy 7 years ago
  Arjan van de Ven 2e99873ff7 Add a AVX512 enabled SAXPY/DAXPY functions 7 years ago
  Arjan van de Ven 00abaa865b Add an AVX512 enabled SDOT function 7 years ago
  Arjan van de Ven 7932ff3ea9 Add an AVX512 enabled DDOT function 7 years ago
  Martin Kroeker 6e54b0a027 Disable the 16x2 DTRMM kernel on SkylakeX as well 7 years ago
  Martin Kroeker f0a8dc2eec Disable the AVX512 DGEMM kernel for now 7 years ago
  Craig Donner c2545b0fd6 Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
  Arjan van de Ven 89372e0993 Use AVX512 also for DGEMM 7 years ago
  Arjan van de Ven 99c7bba8e4 Initial support for SkylakeX / AVX512 7 years ago
  Martin Kroeker 840e01061f Merge pull request #1491 from martin-frbg/ddot_mt 7 years ago
  Martin Kroeker a55694dd5b Declare dot_compute static to avoid conflicts in multiarch builds 7 years ago
  Martin Kroeker 85a41e9cdb Add multithreading support for Haswell DDOT 7 years ago
  Martin Kroeker 81215711a2 Re-enable DAXPY microkernels for x86_64 8 years ago
  Martin Kroeker 497f0c3d8a Replace .align with .p2align in the Nehalem microkernels 8 years ago
  Martin Kroeker ea37db828e Convert .align to .p2align for OSX compatibility 8 years ago
  Martin Kroeker 7c1925acec Use .p2align instead of .align for compatibility on Sandybridge as well 8 years ago
  Martin Kroeker 2359c7c1a9 Use .p2align instead of .align for portability 8 years ago
  Martin Kroeker e388459a27 Merge pull request #1419 from brada4/develop 8 years ago
  Andrew 4938faa822 core.IdenticalExpr clang501 checker 8 years ago
  Martin Kroeker 42285d8e70 Merge pull request #1410 from brada4/develop 8 years ago
  Andrew 4d0b005e5b Eliminate remaining unused results in kernels (clang5 analyzer) 8 years ago
  Martin Kroeker b81656936f Merge pull request #1409 from martin-frbg/issue1292-2 8 years ago
  Martin Kroeker b973990df2 Tag %1 and %2 as both input and output operands 8 years ago
  Martin Kroeker 1e31124eb0 Merge pull request #1406 from martin-frbg/issue1292 8 years ago
  Martin Kroeker 723f396a20 Tag %1 and %2 as both input and output 8 years ago
  Martin Kroeker 43c0622e7b Retire Piledriver/Steamroller/Excavator daxpy microkernels as well 8 years ago
  Martin Kroeker 0623636c98 Use Sandybridge daxpy kernel on Haswell and Zen for now 8 years ago
  Andrew 281a2b952f warning cleanup (#1380) 8 years ago
  Martin Kroeker 6c77b5f267 Merge pull request #1369 from martin-frbg/dsdot 8 years ago
  Martin Kroeker c92cd6d162 Add trivially optimized dsdot based on sdot 8 years ago
  Martin Kroeker cae5d9a20b Add trivially optimized dsdot based on sdot 8 years ago