72 Commits (e45a347cd2d2dc36f4425fa2796c0f2e731bea51)

Author SHA1 Message Date
  wernsaar e45a347cd2 repaired trmm bug in sgemm_kernel_16x2_bulldozer.S 13 years ago
  wernsaar 99727ac013 repaired trmm bug in cgemm_kernel_4x2_bulldozer.S 13 years ago
  wernsaar 6e0a2fbc0c repaired trmm bug in zgemm_kernel_2x2_bulldozer.S 13 years ago
  wernsaar 0a22f99c58 repaired trmm bug in dgemm_kernel_8x2_bulldozer.S 13 years ago
  wernsaar 84bd0aabaa added dtrsm_kernel_LT_8x2_bulldozer.S 13 years ago
  Zhang Xianyi 886cbaf4e4 Support AMD Piledriver by bulldozer kernels. 13 years ago
  Zhang Xianyi 57944538b6 Use ALIGN_5 instead of .align 32 in assembly kernel. Added ALIGN_5 for 32-bit OSX. 13 years ago
  Zhang Xianyi fb298b34ae Merge pull request #235 from wernsaar/develop 13 years ago
  wernsaar 16012767f4 added dcopy_bulldozer.S 13 years ago
  wernsaar bcbac31b47 added ddot_bulldozer.S 13 years ago
  wernsaar 8dc0c72583 added daxpy_bulldozer.S 13 years ago
  wernsaar 89405a1a0b cleanup of dgemm_ncopy_8_bulldozer.S 13 years ago
  wernsaar 4f2b12b8a8 added dgemv_t_bulldozer.S 13 years ago
  Zhang Xianyi 646e168d26 Merge pull request #233 from wernsaar/develop 13 years ago
  wernsaar 93dbbe1fb8 added dgemm_ncopy_8_bulldozer.S 13 years ago
  wernsaar a135f5d9ed added gemm_tcopy_2_bulldozer.S 13 years ago
  wernsaar d0b6299b13 added dgemm_tcopy_8_bulldozer.S 13 years ago
  wernsaar 9e58dd509e added gemm_ncopy_2_bulldozer.S 13 years ago
  wernsaar 7c8227101b cleanup of dgemv_n_bulldozer.S and optimization of inner loop 13 years ago
  wernsaar f67fa62851 added dgemv_n_bulldozer.S 13 years ago
  Zhang Xianyi cd1d473ba0 Merge pull request #230 from wernsaar/develop 13 years ago
  wernsaar 0ded1fcc1c performance optimizations in sgemm_kernel_16x2_bulldozer.S 13 years ago
  wernsaar a789b588cd added cgemm_kernel_4x2_bulldozer.S 13 years ago
  wernsaar 8eaa04acbb added zgemm_kernel_2x2_bulldozer.S 13 years ago
  wernsaar d65bbec99b added new sgemm kernel for BULLDOZER 13 years ago
  wernsaar e4c39c7c26 changed stack touching 13 years ago
  wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 13 years ago
  Zhang Xianyi 9f59f384d8 Refs #223. Fixed s/dgemv bug on windows. 13 years ago
  wangqian 23965f164c Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86_64. 13 years ago
  wernsaar 69aa6c8fb1 bad performance with some data 13 years ago
  wernsaar 60b263f3d2 removed trsm_kernel_RT_4x4_bulldozer.S. wrong results 13 years ago
  wernsaar 7ac306e0da added trsm_kernel_RT_4x4_bulldozer.S 13 years ago
  wernsaar 4cb454cdf2 added trsm_kernel_LT_4x4_bulldozer.S 13 years ago
  wernsaar 19ad2fb128 prefetch improved. Defined 2 different kernels for inner loop 13 years ago
  wernsaar 6821677489 minor improvements and code cleanup 13 years ago
  Zhang Xianyi 3326f3152c Merge pull request #213 from wernsaar/develop 13 years ago
  wernsaar 7641f6e253 Merged some improvements into dgemm_kernel_4x4_bulldozer.S. 13 years ago
  Zhang Xianyi 3ad29452d1 Merge pull request #211 from wernsaar/develop 13 years ago
  wernsaar 6e3f6f25a5 New version of dgemm_kernel_4x4_bulldozer.S 13 years ago
  Zhang Xianyi 724ae159ce Fixed the Windows x86_64 ABI bug in s/daxpy kernels. 13 years ago
  wernsaar f300ce3df5 new optimization of dgemm kernel for bulldozer: 10% performance increase 13 years ago
  wernsaar 66e64131ed optimized again bulldozer dgemm kernel 13 years ago
  wernsaar 9405f26f4b new dgemm_kernel for bulldozer 13 years ago
  Zhang Xianyi 5c8bf6ae0e Merge branch 'bulldozer' into develop 13 years ago
  Zhang Xianyi d311236dfd Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64. 13 years ago
  Zhang Xianyi 99d1978df7 Fixed #180. the typos in kernel/x86_64/sgemv_t.S 13 years ago
  Zhang Xianyi 08bf6674d5 Refs #177. Fixed sgemv_t compiling bug on Win64. 13 years ago
  Zhang Xianyi fd3046b32a Refs #173. Fixed overflow internal buffer bug of gemv_t on x86. 13 years ago
  Zhang Xianyi f19af5ecc0 Refs #54. Added AMD Bulldozer x86_64 dgemm kernel developed by Werner Saar <wernsaar at googlemail.com> 13 years ago
  Zhang Xianyi bfaaa975e6 Added BULLDOZER target. So far it uses barcelona kernels. 13 years ago