traz
|
708d2b6255
|
Fix compute error in ztrmm.
|
15 years ago |
traz
|
e72113f06a
|
Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G.
|
15 years ago |
traz
|
14f81da375
|
Change prefetch length of A and B, the performance is 2.1G now.
|
15 years ago |
traz
|
1c96d345e2
|
Improve zgemm performance from 1G to 1.8G, change block size in param.h.
|
15 years ago |
traz
|
88d94d0ec8
|
Fixed #30 strmm computational error on Loongson3A.
|
15 years ago |
traz
|
fc84909115
|
Modify single-precision compile conditions and add single-precision kernel code on Loongson3A.
|
15 years ago |
traz
|
5ca4e51df0
|
Remove the useless code, modify code comments and format.
|
15 years ago |
traz
|
a9320f896e
|
Fixed #25 dtrmm and dtrsm computational error on Loongson3A.
|
15 years ago |
traz
|
29dce62b8f
|
Finish dtrsm_kernel_Rx.S on Loongson3A.
|
15 years ago |
traz
|
432c309f63
|
Finish dtrsm_kernel_Lx.S on Loongson3A.
|
15 years ago |
traz
|
d2f351d819
|
Modify dtrsm compiler options
|
15 years ago |
traz
|
5a991b7149
|
Fixed #24 dtrmm error on Loongson3A
|
15 years ago |
traz
|
9320933520
|
Complete the dtrmm function.
|
15 years ago |
traz
|
921caefa56
|
Added handling for the trmm part, without edge handling. Test sizes (M and N) must be multiples of 4.
|
15 years ago |
traz
|
ecd4c1f3d9
|
Modify prefetching C.
|
15 years ago |
traz
|
ab9e4ce351
|
Adjust kc size from 112 to 116.
|
15 years ago |
Xianyi Zhang
|
921e040b15
|
Changed default page size to 16KB on Loongson 3A.
|
15 years ago |
traz
|
1aa9a298e1
|
Change BLOCK SIZE of LOONGSON3A TARGET.
|
15 years ago |
traz
|
782205a693
|
Add dgemm compiler Options in KERNEL.LOONGSON3A.
|
15 years ago |
traz
|
ac494c0d04
|
New kernel in LOONGSON3A.
|
15 years ago |
Xianyi Zhang
|
141091f528
|
Merge branch 'master' of github.com:xianyi/OpenBLAS into x86
|
15 years ago |
Xianyi Zhang
|
e4bb6f2482
|
Fixed the detection bug on Intel Core i5. Thanks to ggl329 for the patch.
|
15 years ago |
Xianyi Zhang
|
0edcdd470e
|
Updated the developing version to v0.1 alpha2.
|
15 years ago |
Xianyi Zhang
|
d672491122
|
Init Changelog file for next release version(v0.1alpha2).
|
15 years ago |
Xianyi Zhang
|
972062903c
|
OpenBLAS 0.1 alpha version 1.
|
15 years ago |
Xianyi Zhang
|
d9aa359e69
|
Merge remote branch 'origin/loongson3a' into x86
|
15 years ago |
Xianyi Zhang
|
04769bdf54
|
Merge remote branch 'origin/loongson3a' into x86
|
15 years ago |
Xianyi Zhang
|
6f058487ab
|
Detect Intel Core Clarkdale & Arrandale
|
15 years ago |
Xianyi Zhang
|
f405b5bcc5
|
Fixed the bug about Loongson3A gsLQC1 & gsSQC1 instructions in daxpy kernel. Now daxpy is correct.
|
15 years ago |
Xianyi Zhang
|
2b8643e0de
|
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
|
15 years ago |
Xianyi Zhang
|
c84f8be453
|
Supported detecting new kernel(2.6.36) & new Loongson3A03 CPU.
|
15 years ago |
Wang Qian
|
d5cffd506a
|
Modified the default kernel makefile in MIPS64 arch.
|
15 years ago |
Xianyi Zhang
|
5838f12995
|
Support unaligned addresses in daxpy with SIMD on Loongson3A.
|
15 years ago |
Xianyi Zhang
|
5444a3f8f7
|
Unroll to 16 in daxpy on loongson3a.
|
15 years ago |
Xianyi Zhang
|
88cbfcc5b5
|
Merge commit 'origin/x86' into loongson3a
|
15 years ago |
Xianyi Zhang
|
ce78abe37e
|
Merge branch 'x86' of github.com:xianyi/OpenBLAS into x86
|
15 years ago |
Xianyi Zhang
|
8f1090d32a
|
Support NO_LAPACK=1 to build the lib without LAPACK functions.
|
15 years ago |
Xianyi
|
272f62a2b6
|
Changed the movlps macro name to uppercase in the x86/zdot_sse2.S file.
|
15 years ago |
Xianyi
|
36016fe349
|
On 32-bit x86, gcc 4.4.3 generated wrong code (movsd) from movlps in zdot_sse2.S line 191.
This would cause zdotu & zdotc failures. Instead, use movlpd to work around it. Fixed #8. Fixed #9.
|
15 years ago |
Xianyi Zhang
|
44acb7503e
|
Added zdotu with x & y offset=1 test case.
|
15 years ago |
Xianyi Zhang
|
6eb02bbb9c
|
Merge remote branch 'origin/x86' into loongson3a
|
15 years ago |
Xianyi Zhang
|
0e782b9bd3
|
updated the changelog.
|
15 years ago |
Xianyi Zhang
|
588737210d
|
Fixed random SEGFAULT when nodemask==NULL on Linux 2.6.34 and above. Fixed #12. Thanks to Mr. Ei-ji Nakama for providing this patch.
|
15 years ago |
Xianyi Zhang
|
cdf33edac3
|
Added Changelog. Fixed #11.
|
15 years ago |
Xianyi Zhang
|
f7a5e049e2
|
Enable Debug flags in memory alloc and init functions.
|
15 years ago |
Xianyi Zhang
|
1b97ec1a7c
|
Added DEBUG option in Makefile.rule. Fixed DEBUG typos.
|
15 years ago |
Xianyi Zhang
|
36b3a730d3
|
Merge branch 'x86' of github.com:xianyi/OpenBLAS into x86
|
15 years ago |
Xianyi Zhang
|
128418f49b
|
Fixed #10. Supported GOTO_NUM_THREADS & GOTO_THREADS_TIMEOUT environment variables.
|
15 years ago |
Xianyi
|
12214e1d0f
|
Fixed #7. Modified the axpy kernel code to avoid loop unrolling when incx==0 or incy==0 on 32-bit x86.
|
15 years ago |
Xianyi Zhang
|
cd2cbabecc
|
Added unit test case (zdotu, N=1).
|
15 years ago |