wangqian
|
6a72840945
|
Fixed the internal buffer overflow bug in (s/d/c/z)gemv on x86.
|
13 years ago |
Zhang Xianyi
|
947457fb7c
|
Fixed the bug in testing for the existence of the LAPACK tar package.
|
13 years ago |
Zhang Xianyi
|
79120bf9a0
|
Refs #205. Merged boegel's code for downloading LAPACK.
|
13 years ago |
Zhang Xianyi
|
acb11905d5
|
Fixed #199. Saved USE_THREAD switch for make install.
|
13 years ago |
Zhang Xianyi
|
109500178c
|
Refs #220. Support Power7 using the old Power6 kernels.
|
13 years ago |
Zhang Xianyi
|
e50a664865
|
Refs #215. Fixed the compatibility between <complex.h> and <complex> in C++.
|
13 years ago |
Zhang Xianyi
|
357078b93e
|
Refs #216. Revert the default value of GEMM_MULTITHREAD_THRESHOLD to 4.
|
13 years ago |
Zhang Xianyi
|
5d96e4f224
|
Refs #210. Disable checking /lib/libpthread.so*.
|
13 years ago |
Xianyi Zhang
|
dbbda55e67
|
Updated the mailing list for OpenBLAS.
|
13 years ago |
Xianyi Zhang
|
6c34a7f43c
|
Updated the mailing list for OpenBLAS.
|
13 years ago |
Zhang Xianyi
|
3326f3152c
|
Merge pull request #213 from wernsaar/develop
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
|
13 years ago |
wernsaar
|
7641f6e253
|
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
|
13 years ago |
Zhang Xianyi
|
48bdc1ad3b
|
Added NO_PARALLEL_MAKE flag to disable parallel make.
|
13 years ago |
Zhang Xianyi
|
3ad29452d1
|
Merge pull request #211 from wernsaar/develop
New version of dgemm_kernel_4x4_bulldozer.S
|
13 years ago |
wernsaar
|
6e3f6f25a5
|
New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
|
13 years ago |
Zhang Xianyi
|
a068d54981
|
Refs #209. Export the missing cblas_cdotc_sub functions.
|
13 years ago |
Zhang Xianyi
|
e029242870
|
Merge pull request #206 from wlbksy/patch-1
Fix #204: wget in mingw/msys sometimes downloads the file with a trailing name,
|
13 years ago |
wlbksy
|
7a9b94b519
|
Fix #204
|
13 years ago |
Kenneth Hoste
|
66b919d99f
|
Adjusted the Makefile to allow providing the required LAPACK source files rather than downloading them.
|
13 years ago |
Zhang Xianyi
|
f4846afbad
|
Merge pull request #201 from Explorer09/develop
|
13 years ago |
Explorer09
|
53588bc786
|
getarch.c: Minor re-ordering of architecture list
|
13 years ago |
Explorer09
|
b47f13ee4c
|
getarch.c: Minor re-ordering of architecture list
|
13 years ago |
Explorer09
|
309f90e563
|
TargetList.txt: minor re-ordering
|
13 years ago |
Explorer09
|
773c01f496
|
Typo correction in README.md
|
13 years ago |
Zhang Xianyi
|
d831b2ff8b
|
Override CFLAGS in LAPACK make.in.
|
13 years ago |
Zhang Xianyi
|
724ae159ce
|
Fixed the Windows x86_64 ABI bug in s/daxpy kernels.
|
13 years ago |
Zhang Xianyi
|
2c9a203bd1
|
Merge pull request #198 from wernsaar/develop
new optimization of dgemm kernel for bulldozer: 10% performance increase
|
13 years ago |
wernsaar
|
f300ce3df5
|
new optimization of dgemm kernel for bulldozer: 10% performance increase
|
13 years ago |
Zhang Xianyi
|
e2c7c75715
|
Merge pull request #197 from wernsaar/develop
Optimized the bulldozer dgemm kernel again
|
13 years ago |
wernsaar
|
66e64131ed
|
Optimized the bulldozer dgemm kernel again
|
13 years ago |
Zhang Xianyi
|
5900b1462e
|
Merge pull request #195 from wernsaar/develop
Develop dgemm for bulldozer
|
13 years ago |
wernsaar
|
9405f26f4b
|
new dgemm_kernel for bulldozer
|
13 years ago |
Zhang Xianyi
|
54e7b37630
|
Merge branch 'develop'
|
13 years ago |
Zhang Xianyi
|
529f1b5006
|
Refs #194. Export the missing LAPACK s/dlamc3 functions.
|
13 years ago |
Zhang Xianyi
|
e5ac3007e0
|
Merge branch 'develop'
|
13 years ago |
Zhang Xianyi
|
0d0405b434
|
Updated the doc for 0.2.6 version.
|
13 years ago |
Zhang Xianyi
|
f1ce74ffdd
|
Improved the message printed when the OS doesn't support AVX.
|
13 years ago |
Zhang Xianyi
|
d744c9590a
|
In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly.
|
13 years ago |
Zhang Xianyi
|
3cc6ae793e
|
Refs #174. Return the sb pointer when using OpenMP or on Windows.
|
13 years ago |
Zhang Xianyi
|
4c2123c334
|
Fixed the overflow bug in single-threaded Cholesky factorization.
|
13 years ago |
Zhang Xianyi
|
5155e3f509
|
Refs #174. Fixed the buffer overflow bug in multithreaded hbmv and sbmv.
Instead of using the thread 0 buffer, each thread now uses its own sb buffer,
which avoids overflowing the thread 0 buffer.
|
13 years ago |
Zhang Xianyi
|
5c8bf6ae0e
|
Merge branch 'bulldozer' into develop
|
13 years ago |
Zaheer Chothia
|
a9500d0079
|
Missing line continuation -- follow-up to last commit (64ad8b9809).
|
13 years ago |
Zaheer Chothia
|
64ad8b9809
|
Refs #193. Don't use C99 complex numbers when building C++ code.
|
13 years ago |
Zaheer Chothia
|
875d520ccf
|
Refs #193. cblas: move #include out of extern "C" block.
Standard headers may contain C++ templates which are not permitted inside an
extern "C" block. This might be the case when we include <complex.h>.
|
13 years ago |
Zhang Xianyi
|
d311236dfd
|
Refs #189. Fixed the s/cdot bug of invalidly reading NaN on x86_64.
|
13 years ago |
Zhang Xianyi
|
36e0982966
|
Refs #187. Use perl to generate cblas_noconst.h instead of sed.
Thanks to Dan Povey for the patch. https://github.com/xianyi/OpenBLAS/issues/187
|
13 years ago |
Zhang Xianyi
|
8cdb795438
|
Refs #187. Use binary code for xgetbv, which is compatible with old compilers.
|
13 years ago |
Zaheer Chothia
|
4db6660de4
|
Refs #185. Add missing 'const' to declarations in <cblas.h>. Thanks to Dan Povey!
The 'const' modifications were done automatically using these scripts:
https://kaldi.svn.sourceforge.net/svnroot/kaldi/sandbox/dan/tools/for_openblas
|
13 years ago |
Zhang Xianyi
|
0b08f7479e
|
Refs #154. Fixed the gemv_t bug that overflowed the 16MB buffer on x86.
|
13 years ago |