a135f5d9e
added gemm_tcopy_2_bulldozer.S by
2013-06-18 11:01:33 +0200
d0b6299b1
added dgemm_tcopy_8_bulldozer.S by
2013-06-17 14:19:09 +0200
9e58dd509
added gemm_ncopy_2_bulldozer.S by
2013-06-17 12:55:12 +0200
7c8227101
cleanup of dgemv_n_bulldozer.S and optimization of inner loop by
2013-06-16 12:50:45 +0200
f67fa6285
added dgemv_n_bulldozer.S by
2013-06-15 16:42:37 +0200
cd1d473ba
Merge pull request #230 from wernsaar/develop by
2013-06-13 07:29:27 -0700
b2ebf211e
(refs/pull/230/merge)
Merge 0ded1fcc1c into 56f160134d by
2013-06-13 07:29:02 -0700
56f160134
Refs #231. Change the default C compiler to clang on Mac OSX. by
2013-06-13 22:15:19 +0800
0ded1fcc1
(refs/pull/230/head)
performance optimizations in sgemm_kernel_16x2_bulldozer.S by
2013-06-13 11:35:15 +0200
a789b588c
added cgemm_kernel_4x2_bulldozer.S by
2013-06-12 15:55:27 +0200
8eaa04acb
added zgemm_kernel_2x2_bulldozer.S by
2013-06-11 12:00:49 +0200
d854b30ae
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3 by
2013-06-09 17:26:42 +0200
d65bbec99
added new sgemm kernel for BULLDOZER by
2013-06-09 15:57:42 +0200
e4c39c7c2
changed stack touching by
2013-06-08 10:43:08 +0200
ba800f088
correct GEMM_THREAD in param.h by
2013-06-08 10:03:59 +0200
25491e42f
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S by
2013-06-08 09:40:17 +0200
960b0c88a
Refs #227. Detected LLVM/Clang compiler. by
2013-06-06 23:43:40 +0800
65ffead0c
Refs #124. Check XSAVE flag on x86 CPU. by
2013-06-06 22:50:43 +0800
f2fb8c703
Change LIBSUFFIX from .lib to .a on windows. by
2013-06-04 16:05:28 +0800
9f59f384d
Refs #223. Fixed s/dgemv bug on windows. by
2013-06-04 16:01:05 +0800
23965f164
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86_64. by
2013-05-29 19:48:31 +0800
6a7284094
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86. by
2013-05-29 13:23:12 +0800
947457fb7
Fixed the bug in testing for the existence of the LAPACK tar package. by
2013-05-24 15:52:35 +0800
79120bf9a
Refs #205. Merge boegel's codes about downloading LAPACK. by
2013-05-24 15:29:10 +0800
acb11905d
Fixed #199. Saved USE_THREAD switch for make install. by
2013-05-24 15:15:52 +0800
109500178
Refs #220. Support Power7 by old Power6 kernels. by
2013-05-21 22:59:45 +0800
e50a66486
Refs #215. Fixed the compatibility between <complex.h> and <complex> in C++. by
2013-05-17 16:41:05 +0800
357078b93
Refs #216. Revert the default value of GEMM_MULTITHREAD_THRESHOLD to 4. by
2013-05-03 09:08:54 +0800
731220f87
changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit by
2013-04-30 10:07:17 +0200
69aa6c8fb
bad performance with some data by
2013-04-28 11:14:23 +0200
60b263f3d
removed trsm_kernel_RT_4x4_bulldozer.S. wrong results by
2013-04-27 17:23:08 +0200
7ac306e0d
added trsm_kernel_RT_4x4_bulldozer.S by
2013-04-27 16:48:48 +0200
4cb454cdf
added trsm_kernel_LT_4x4_bulldozer.S by
2013-04-27 14:30:00 +0200
19ad2fb12
prefetch improved. Defined 2 different kernels for inner loop by
2013-04-27 13:40:49 +0200
5d96e4f22
Refs #210. Disable checking /lib/libpthread.so*. by
2013-04-27 15:02:04 +0800
682167748
minor improvements and code cleanup by
2013-04-26 20:05:42 +0200
dbbda55e6
Updated the mailing list for OpenBLAS. by
2013-04-25 00:45:42 +0800
6c34a7f43
Updated the mailing list for OpenBLAS. by
2013-04-25 00:44:22 +0800
3326f3152
Merge pull request #213 from wernsaar/develop by
2013-04-17 23:56:09 -0700
c7fdc692c
(refs/pull/213/merge)
Merge 7641f6e253 into 48bdc1ad3b by
2013-04-16 10:15:33 -0700
7641f6e25
(refs/pull/213/head)
Merged some improvements into dgemm_kernel_4x4_bulldozer.S. Changed the copy functions to generic to solve prefetch conflicts by
2013-04-16 19:05:06 +0200
48bdc1ad3
Added NO_PARALLEL_MAKE flag to disable parallel make. by
2013-04-15 21:37:30 +0800
3ad29452d
Merge pull request #211 from wernsaar/develop by
2013-04-15 00:20:55 -0700
e2fc2344c
(refs/pull/211/merge)
Merge 6e3f6f25a5 into a068d54981 by
2013-04-12 09:11:40 -0700
6e3f6f25a
(refs/pull/211/head)
New version of dgemm_kernel_4x4_bulldozer.S. The peak performance with 8 cores is now 90 GFlops by
2013-04-12 17:55:51 +0200
986d542ac
Merge branch 'loongson3a' into loongson3b by
2013-04-11 16:07:59 +0800
990efcab6
Merge branch 'loongson3b' into loongson3a by
2013-04-11 16:11:03 +0000
75a5dc397
Added the configure for the host loongcc compiling on Loongson3. by
2013-04-11 16:10:47 +0000
6958c1a1a
Fixed the SEGFAULT bug with Loongcc and Loongson3. by
2013-04-11 15:33:43 +0800
a068d5498
Refs #209. Export the missing cblas_cdotc_sub functions. by
2013-04-08 23:21:28 +0800
d692ee07f
Merge branch 'loongson3a' into loongson3b by
2013-04-08 14:56:39 +0800
1a57717b1
Added the configuration of Loongcc compiler for Loongson 3 CPU. by
2013-04-07 15:42:07 +0800
6b01d5871
Disable the optimization of multi-threading gemm on the Loongson3A. by
2013-03-30 20:12:43 +0000
35b943f17
Merge branch 'develop' into loongson3a by
2013-03-27 14:36:15 +0000
e02924287
Merge pull request #206 from wlbksy/patch-1 by
2013-03-23 09:57:41 -0700
f8c889529
(refs/pull/206/merge)
Merge 7a9b94b519 into f4846afbad by
2013-03-22 23:41:47 -0700
7a9b94b51
(refs/pull/206/head)
Fix #204 by
2013-03-23 14:41:26 +0800
e3c21da90
(refs/pull/205/merge)
Merge 66b919d99f into f4846afbad by
2013-03-22 11:47:05 -0700
66b919d99
(refs/pull/205/head)
adjusted Makefile to allow for provided required LAPACK source files rather than downloading them by
2013-03-22 19:45:11 +0100
f4846afba
Merge pull request #201 from Explorer09/develop by
2013-03-18 07:31:30 -0700
17176ae7e
(refs/pull/201/merge)
Merge 53588bc786 into d831b2ff8b by
2013-03-17 08:16:26 -0700
53588bc78
(refs/pull/201/head)
getarch.c: Minor re-ordering of architecture list by
2013-03-17 23:09:23 +0800
b47f13ee4
getarch.c: Minor re-ordering of architecture list by
2013-03-17 23:07:48 +0800
309f90e56
TargetList.txt: minor re-ordering by
2013-03-17 23:03:05 +0800
773c01f49
Typo correction in README.md by
2013-03-17 22:48:24 +0800
d831b2ff8
Override CFLAGS in LAPACK make.in. by
2013-03-10 01:01:16 +0800
724ae159c
Fixed the Windows x86_64 ABI bug in s/daxpy kernels. by
2013-03-08 22:28:34 +0800
2c9a203bd
Merge pull request #198 from wernsaar/develop by
2013-03-06 13:39:53 -0800
65e54956d
(refs/pull/198/merge)
Merge f300ce3df5 into e2c7c75715 by
2013-03-06 09:04:11 -0800
f300ce3df
(refs/pull/198/head)
new optimization of dgemm kernel for bulldozer: 10% performance increase by
2013-03-06 17:26:03 +0100
e2c7c7571
Merge pull request #197 from wernsaar/develop by
2013-03-06 01:11:08 -0800
059e985db
(refs/pull/197/merge)
Merge 66e64131ed into 5900b1462e by
2013-03-05 10:59:43 -0800
66e64131e
(refs/pull/197/head)
optimized again bulldozer dgemm kernel by
2013-03-05 19:51:37 +0100
5900b1462
Merge pull request #195 from wernsaar/develop by
2013-03-05 05:35:42 -0800
901230f0d
(refs/pull/195/merge)
Merge 9405f26f4b into 529f1b5006 by
2013-03-04 08:59:38 -0800
9405f26f4
(refs/pull/195/head)
new dgemm_kernel for bulldozer by
2013-03-04 17:37:38 +0100
54e7b3763
(tag: v0.2.6)
Merge branch 'develop' by
2013-03-02 14:42:06 +0800
529f1b500
Refs #194. Export the missing LAPACK s/dlamc3 functions. by
2013-03-02 14:41:18 +0800
e5ac3007e
Merge branch 'develop' by
2013-03-02 14:24:23 +0800
0d0405b43
Updated the doc for 0.2.6 version. by
2013-03-02 14:22:27 +0800
f1ce74ffd
Improved the message printed when the OS doesn't support AVX. by
2013-03-02 14:15:54 +0800
d744c9590
In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly. by
2013-03-01 14:36:47 +0800
3cc6ae793
Refs #174. Return sb pointer when OpenMP or Windows. by
2013-02-26 00:48:21 +0800
4c2123c33
Fixed the overflowing bug in single thread cholesky factorization. by
2013-02-23 12:51:13 +0800
5155e3f50
Refs #174. Fixed the overflowing buffer bug of multithreading hbmv and sbmv. by
2013-02-13 16:05:58 +0800
5c8bf6ae0
Merge branch 'bulldozer' into develop by
2013-02-10 01:19:42 +0800
6ae2f868f
Set the affinity. Only use 1 core of each module on bulldozer. by
2013-02-09 18:18:55 +0100
a1ead62f2
Disable the warning of sgemm bulldozer kernel. by
2013-02-09 17:03:13 +0100
013358014
Used sgemm bulldozer kernel on 64 bit. by
2013-02-09 16:29:14 +0100
274246651
Merge branch 'bulldozer' of git://github.com/wernsaar/OpenBLAS into bulldozer by
2013-02-09 16:25:07 +0100
299b5a44d
Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer by
2013-02-09 16:22:04 +0100
a9500d007
Missing line continuation -- follow-up to last commit (64ad8b9809). by
2013-02-01 09:34:12 +0100
64ad8b980
Refs #193. Don't use C99 complex numbers when building C++ code. by
2013-02-01 09:24:44 +0100
875d520cc
Refs #193. cblas: move #include out of extern "C" block. by
2013-01-31 08:48:27 +0100
d311236df
Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64. by
2013-01-25 16:18:27 +0800
36e098296
Refs #187. Use perl to generate cblas_noconst.h instead of sed. by
2013-01-22 00:29:54 +0800
8cdb79543
Refs #187. Use binary code for xgetbv, which is compatible with old compiler. by
2013-01-22 00:18:21 +0800
4db6660de
Refs #185. Add missing 'const' to declarations in <cblas.h>. Thanks to Dan Povey! by
2013-01-20 21:53:52 +0100
0b08f7479
Refs #154. Fixed gemv_t bug about overflow 16MB buffer on x86. by
2013-01-20 21:22:12 +0800
200e4acf1
cblas: typedef enums for improved compatibility with Intel MKL. by
2012-06-25 13:51:46 +0200