457 Commits (a135f5d9ed3ce118bd0f9ddee8f920864756d7df)
 

Author SHA1 Message Date
  wernsaar a135f5d9ed added gemm_tcopy_2_bulldozer.S 12 years ago
  wernsaar d0b6299b13 added dgemm_tcopy_8_bulldozer.S 12 years ago
  wernsaar 9e58dd509e added gemm_ncopy_2_bulldozer.S 12 years ago
  wernsaar 7c8227101b cleanup of dgemv_n_bulldozer.S and optimization of inner loop 12 years ago
  wernsaar f67fa62851 added dgemv_n_bulldozer.S 12 years ago
  wernsaar 0ded1fcc1c performance optimizations in sgemm_kernel_16x2_bulldozer.S 12 years ago
  wernsaar a789b588cd added cgemm_kernel_4x2_bulldozer.S 12 years ago
  wernsaar 8eaa04acbb added zgemm_kernel_2x2_bulldozer.S 12 years ago
  wernsaar d854b30ae6 Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3 12 years ago
  wernsaar d65bbec99b added new sgemm kernel for BULLDOZER 12 years ago
  wernsaar e4c39c7c26 changed stack touching 12 years ago
  wernsaar ba800f0883 correct GEMM_THREAD in param.h 12 years ago
  wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 12 years ago
  wernsaar 731220f870 changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit 12 years ago
  wernsaar 69aa6c8fb1 bad performance with some data 12 years ago
  wernsaar 60b263f3d2 removed trsm_kernel_RT_4x4_bulldozer.S. wrong results 12 years ago
  wernsaar 7ac306e0da added trsm_kernel_RT_4x4_bulldozer.S 12 years ago
  wernsaar 4cb454cdf2 added trsm_kernel_LT_4x4_bulldozer.S 12 years ago
  wernsaar 19ad2fb128 prefetch improved. Defined 2 different kernels for inner loop 12 years ago
  wernsaar 6821677489 minor improvements and code cleanup 12 years ago
  wernsaar 7641f6e253 Merged some improvements into dgemm_kernel_4x4_bulldozer.S. 13 years ago
  wernsaar 6e3f6f25a5 New version of dgemm_kernel_4x4_bulldozer.S 13 years ago
  wernsaar f300ce3df5 new optimization of dgemm kernel for bulldozer: 10% performance increase 13 years ago
  wernsaar 66e64131ed optimized again bulldozer dgemm kernel 13 years ago
  wernsaar 9405f26f4b new dgemm_kernel for bulldozer 13 years ago
  Zhang Xianyi 54e7b37630 Merge branch 'develop' 13 years ago
  Zhang Xianyi 529f1b5006 Refs#194. Export the missing LAPACK s/dlamc3 functions. 13 years ago
  Zhang Xianyi e5ac3007e0 Merge branch 'develop' 13 years ago
  Zhang Xianyi 0d0405b434 Updated the doc for 0.2.6 version. 13 years ago
  Zhang Xianyi f1ce74ffdd Improved the print when OS don't support AVX. 13 years ago
  Zhang Xianyi d744c9590a In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly. 13 years ago
  Zhang Xianyi 3cc6ae793e Refs #174. Return sb pointer when OpenMP or Windows. 13 years ago
  Zhang Xianyi 4c2123c334 Fixed the overflowing bug in single thread cholesky factorization. 13 years ago
  Zhang Xianyi 5155e3f509 Refs #174. Fixed the overflowing buffer bug of multithreading hbmv and sbmv. 13 years ago
  Zhang Xianyi 5c8bf6ae0e Merge branch 'bulldozer' into develop 13 years ago
  Zaheer Chothia a9500d0079 Missing line continuation -- follow-up to last commit (64ad8b9809). 13 years ago
  Zaheer Chothia 64ad8b9809 Refs #193. Don't use C99 complex numbers when building C++ code. 13 years ago
  Zaheer Chothia 875d520ccf Refs #193. cblas: move #include out of extern "C" block. 13 years ago
  Zhang Xianyi d311236dfd Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64. 13 years ago
  Zhang Xianyi 36e0982966 Refs #187. Use perl to generate cblas_noconst.h instead of sed. 13 years ago
  Zhang Xianyi 8cdb795438 Refs #187. Use binary code for xgetbv, which is compatible with old compiler. 13 years ago
  Zaheer Chothia 4db6660de4 Refs #185. Add missing 'const' to declarations in <cblas.h>. Thanks to Dan Povey! 13 years ago
  Zhang Xianyi 0b08f7479e Refs #154. Fixed gemv_t bug about overflow 16MB buffer on x86. 13 years ago
  Zaheer Chothia 200e4acf15 cblas: typedef enums for improved compatibility with Intel MKL. 13 years ago
  Zhang Xianyi 99d1978df7 Fixed #180. the typos in kernel/x86_64/sgemv_t.S 13 years ago
  Zhang Xianyi 08bf6674d5 Refs #177. Fixed sgemv_t compiling bug on Win64. 13 years ago
  Zhang Xianyi 8b122ff9dc Refs #176. Fixed make.inc overriding RANLIB bug when cross-compiling LAPACK. 13 years ago
  Zhang Xianyi 69200884e1 Refs #173. Fixed overflow internal buffer bug of gemv_n on x86 13 years ago
  Zhang Xianyi 0d1518add9 Refs #173. Fixed overflow internal buffer bug of sgemv_t on x86 13 years ago
  Zhang Xianyi 91ed4e4450 Refs #171. Prevent loading the dirty number from the buffer in sgemv_t x86 kernel. 13 years ago