133 Commits (a135f5d9ed3ce118bd0f9ddee8f920864756d7df)

Author SHA1 Message Date
  wernsaar a135f5d9ed added gemm_tcopy_2_bulldozer.S 12 years ago
  wernsaar d0b6299b13 added dgemm_tcopy_8_bulldozer.S 12 years ago
  wernsaar 9e58dd509e added gemm_ncopy_2_bulldozer.S 12 years ago
  wernsaar 7c8227101b cleanup of dgemv_n_bulldozer.S and optimization of inner loop 12 years ago
  wernsaar f67fa62851 added dgemv_n_bulldozer.S 12 years ago
  wernsaar 0ded1fcc1c performance optimizations in sgemm_kernel_16x2_bulldozer.S 12 years ago
  wernsaar a789b588cd added cgemm_kernel_4x2_bulldozer.S 12 years ago
  wernsaar 8eaa04acbb added zgemm_kernel_2x2_bulldozer.S 12 years ago
  wernsaar d854b30ae6 Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3 12 years ago
  wernsaar d65bbec99b added new sgemm kernel for BULLDOZER 12 years ago
  wernsaar e4c39c7c26 changed stack touching 12 years ago
  wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 12 years ago
  wernsaar 69aa6c8fb1 bad performance with some data 12 years ago
  wernsaar 60b263f3d2 removed trsm_kernel_RT_4x4_bulldozer.S. wrong results 12 years ago
  wernsaar 7ac306e0da added trsm_kernel_RT_4x4_bulldozer.S 12 years ago
  wernsaar 4cb454cdf2 added trsm_kernel_LT_4x4_bulldozer.S 12 years ago
  wernsaar 19ad2fb128 prefetch improved. Defined 2 different kernels for inner loop 12 years ago
  wernsaar 6821677489 minor improvements and code cleanup 12 years ago
  wernsaar 7641f6e253 Merged some improvements into dgemm_kernel_4x4_bulldozer.S. 12 years ago
  wernsaar 6e3f6f25a5 New version of dgemm_kernel_4x4_bulldozer.S 12 years ago
  wernsaar f300ce3df5 new optimization of dgemm kernel for bulldozer: 10% performance increase 13 years ago
  wernsaar 66e64131ed optimized again bulldozer dgemm kernel 13 years ago
  wernsaar 9405f26f4b new dgemm_kernel for bulldozer 13 years ago
  Zhang Xianyi 5c8bf6ae0e Merge branch 'bulldozer' into develop 13 years ago
  Zhang Xianyi d311236dfd Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64. 13 years ago
  Zhang Xianyi 0b08f7479e Refs #154. Fixed gemv_t bug about overflow 16MB buffer on x86. 13 years ago
  Zhang Xianyi 99d1978df7 Fixed #180. the typos in kernel/x86_64/sgemv_t.S 13 years ago
  Zhang Xianyi 08bf6674d5 Refs #177. Fixed sgemv_t compiling bug on Win64. 13 years ago
  Zhang Xianyi 69200884e1 Refs #173. Fixed overflow internal buffer bug of gemv_n on x86 13 years ago
  Zhang Xianyi 0d1518add9 Refs #173. Fixed overflow internal buffer bug of sgemv_t on x86 13 years ago
  Zhang Xianyi 91ed4e4450 Refs #171. Prevent loading the dirty number from the buffer in sgemv_t x86 kernel. 13 years ago
  Zhang Xianyi fd3046b32a Refs #173. Fixed overflow internal buffer bug of gemv_t on x86. 13 years ago
  Julian Taylor 9fb341a9f8 set parameters for CORE_ATHLON 13 years ago
  Zhang Xianyi f19af5ecc0 Refs #54. Added AMD Bulldozer x86_64 dgemm kernel developed by Werner Saar <wernsaar at googlemail.com> 13 years ago
  Zhang Xianyi bfaaa975e6 Added BULLDOZER target. So far it uses barcelona kernels. 13 years ago
  Zhang Xianyi b7c0fa6bd2 Init AMD Bulldozer codebase. 13 years ago
  Zhang Xianyi cea1a885b5 Refs #154. Fixed the build bug of dgemv_t on MinW64. 13 years ago
  Zhang Xianyi 5f0117385e Refs #154. Fixed a SEGFAULT bug of dgemv_t when m is very large. 13 years ago
  Zhang Xianyi 2573311308 refs #140. Fixed zdot incompatibility ABI issue with GCC 4.7 on Win 32. 13 years ago
  Jameson Nash d0e731e8b8 provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line 13 years ago
  Xianyi Zhang 25f1a573fd Fixed the build bug when DYNAMIC_ARCH=0. 13 years ago
  wangqian 857a0fa0df Fixed the issue of mixing AVX and SSE codes in S/D/C/ZGEMM. 13 years ago
  wangqian d34fce56e4 Refs #83 Fixed S/DGEMM calling conventions bug on windows. 13 years ago
  wangqian 6cfcb54a28 Fixed align problem in S and C precision GEMM kernels. 13 years ago
  wangqian 3ef96aa567 Fixed bug in MOVQ redefine and ALIGN SIZE problem. 13 years ago
  wangqian f76f952547 Refs #83 #53. Adding Intel Sandy Bridge (AVX supported) kernel codes for BLAS level 3 functions. 13 years ago
  Zhang Xianyi eefd30881c Refs #113. Fixed the build bug on AMD Bobcat 64-bit OS. 13 years ago
  Zhang Xianyi d3b67d0bd8 Refs #113. Fixed the typo BOBCATE -> BOBCAT 13 years ago
  Zhang Xianyi d6cab3f37e Refs #113. Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX. 13 years ago
  Xianyi Zhang a53c6e2440 Merge branch 'develop' into sandybridge 13 years ago