479 Commits (a6a8cc2b7fa30f46fdaa4fb6e50c19da8c11e335)

Author SHA1 Message Date
  Martin Kroeker c04a729081 Add ?sum definitions for generic kernel 6 years ago
  Martin Kroeker 9d717cb5ee Add x86_64 implementation of ?sum 6 years ago
  Martin Kroeker 32c7063cb0 Merge pull request #2061 from martin-frbg/martin-frbg-patch-1 6 years ago
  Martin Kroeker e608d4f7fe Disable the AVX512 DGEMM kernel (again) 7 years ago
  Celelibi b7f59da42d Fix crash in sgemm SSE/nano kernel on x86_64 7 years ago
  Andrew 6eee1beac5 move fix to right place 7 years ago
  Martin Kroeker e12cdf58ef Merge pull request #2024 from martin-frbg/gcc9fixes4 7 years ago
  Martin Kroeker 1860c9456d Merge pull request #2023 from martin-frbg/gcc9fixes3 7 years ago
  Martin Kroeker f9bb76d29a Fix inline assembly constraints in Bulldozer TRSM kernels 7 years ago
  Martin Kroeker efb9038f72 Fix inline assembly constraints 7 years ago
  Martin Kroeker e976557d29 Fix inline assembly constraints 7 years ago
  Martin Kroeker 9d8be15789 Fix inline assembly constraints 7 years ago
  Martin Kroeker d752799a0f Merge pull request #2021 from martin-frbg/gcc9fixes2 7 years ago
  Martin Kroeker c26c0b77a7 Fix wrong constraints in inline assembly 7 years ago
  Martin Kroeker 1c6da2d03c Merge pull request #2019 from martin-frbg/gcc9fixes 7 years ago
  Martin Kroeker 4255a58cd2 Rename operands to put lda on the input/output constraint list 7 years ago
  Martin Kroeker 46e415b140 Save and restore input argument 8 (lda4) 7 years ago
  Bart Oldeman 69a97ca7b9 dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3 7 years ago
  Martin Kroeker ab1630f9fa Fix declaration of arguments in inline assembly 7 years ago
  Martin Kroeker b824fa70eb Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 7 years ago
  Martin Kroeker 91481a3e4e Fix declaration of input arguments in inline assembly 7 years ago
  Martin Kroeker dc6ac9eab0 Fix declaration of input arguments in the x86_64 s/dGEMV_T and s/dGEMV_N kernels 7 years ago
  Martin Kroeker 32b0f1168e Fix declaration of input arguments in the Sandybridge GER microkernels (#1967) 7 years ago
  Martin Kroeker b495e54310 Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 7 years ago
  Martin Kroeker d5e6940253 Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 7 years ago
  Arjan van de Ven 795285c587 Fix thinko in skylake beta handling 7 years ago
  Arjan van de Ven d321448a63 dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell 7 years ago
  Arjan van de Ven c43331ad0a dgemm: Use the skylakex beta function also for haswell 7 years ago
  Arjan van de Ven 69d206440a Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support 7 years ago
  Arjan van de Ven 0586899a10 Use sgemm_ncopy_4_skylakex.c also for Haswell 7 years ago
  Arjan van de Ven 00dc09ad19 Use the skylake sgemm beta code also for haswell 7 years ago
  Arjan van de Ven cdc668d82b Add a "sgemm direct" mode for small matrixes 7 years ago
  Martin Kroeker 701ea88347 Use p2align instead of align for OSX compatibility 7 years ago
  Andrew 19c4bdd8b3 Add return value so that freebsd system clang does not err out 7 years ago
  Arjan van de Ven dcc5d6291e skylakex: Make the sgemm/dgemm beta code robust for a N=0 or M=0 case 7 years ago
  Arjan van de Ven 55b244ca0d enable the SGEMM/SKX C based kernel 7 years ago
  Arjan van de Ven d4bad73834 Add a C+intrinsics version of the SGEMM/skylakex kernel 7 years ago
  Arjan van de Ven 582c589727 dgemm/skylakex: replace discrete mul/add with fma 7 years ago
  Arjan van de Ven adbf6afa25 Add vector optimizations for ncopy as well for dgemm/skylakex 7 years ago
  Arjan van de Ven 32bec8afbb add a skylakex optimized dgemm beta function 7 years ago
  Arjan van de Ven 20c5d668fe dgemm/avx512 simplify and speed up the 4x4 kernel 7 years ago
  Arjan van de Ven 6d43c51ccf undo slow dgemm/skylake microoptimization 7 years ago
  Arjan van de Ven d74dc39b0f Add optimized *copy versions for skylakex 7 years ago
  Arjan van de Ven 66b43affbc Add a 24x8 kernel to the skylakex dgemm implementation 7 years ago
  Arjan van de Ven 1938819c25 skylake dgemm: Add a 16x8 kernel 7 years ago
  Martin Kroeker b7496c3638 Function name needs to be CNAME, set from outside to allow suffixing for dynamic_arch 7 years ago
  Arjan van de Ven 45fe8cb0c5 Create a AVX512 enabled version of DGEMM 7 years ago
  Martin Kroeker 375dff54fc Merge pull request #1733 from fenrus75/dsymv 7 years ago
  Martin Kroeker a5f165275a Merge pull request #1732 from fenrus75/dgemv 7 years ago
  Martin Kroeker 8c13aa495a Merge pull request #1730 from fenrus75/fix-sdot 7 years ago