1c96d345e
Improve zgemm performance from 1G to 1.8G, change block size in param.h. by
2011-06-21 22:16:23 +0000
82f527482
Refs #39. It's unnecessary to include sys/mman.h file in blas_server_omp.c. by
2011-06-22 01:52:20 +0800
e568df0da
Refs #38. Prepare the docs with v0.1alpha2. by
2011-06-21 18:06:13 +0800
c4efde771
Merge branch 'loongson3a' into release-v0.1alpha2 by
2011-06-21 17:50:00 +0800
7a1e6202e
Merge branch 'add_install_target' into develop by
2011-06-21 17:40:16 +0800
32353a9d3
Refs #20. Fixed the installation bug with DYNAMIC_ARCH=1. by
2011-06-21 17:39:08 +0800
2e6e9272f
Merge branch 'add_install_target' into develop by
2011-06-20 18:40:05 +0800
d978436c4
Refs #20. Updated the docs. by
2011-06-20 18:36:29 +0800
fab36f1ad
Fixed #20. Added install target in makefile. You can use "make install PREFIX=your_installation_directory". by
2011-06-20 18:35:35 +0800
7945919f2
Updated gitignore file. by
2011-06-19 12:07:31 +0800
c642b61d4
Merge branch 'master' of github.com:xianyi/OpenBLAS into develop by
2011-06-19 11:59:38 +0800
aeed8d622
Fixed #27. Temporarily work around axpy's low performance issue with small input size & multithreads. by
2011-06-19 11:55:29 +0800
1a4181afd
Merge pull request #36 from pipping/master by
2011-06-11 05:59:00 -0700
a36468f5c
(refs/pull/36/merge)
Merge 49742cb2d3 into 8cc628a953 by
2011-06-11 05:56:41 -0700
49742cb2d
(refs/pull/36/head)
Make USE_OPENMP=0 disable openmp by
2011-06-11 14:36:16 +0200
b3d188774
Fixed #35 a build bug with NO_LAPACK=1 DYNAMIC_ARCH=1 FC=gfortran. I forgot to test it with gfortran in the last bug-fix commit. by
2011-06-09 22:59:49 +0800
8d50a9fd1
Fixed #35 a build bug with NO_LAPACK=1 & DYNAMIC_ARCH=1. by
2011-06-09 11:38:59 +0800
149638322
Print the wall time (cycles) when FUNCTION_PROFILE is enabled. by
2011-06-09 10:40:15 +0800
4335bca2f
Fixed #33 ztrmm bug on Nehalem. by
2011-06-07 12:53:25 +0800
31040e4d8
Fixed #32 a SEGFAULT bug with gcc-4.6. According to the i386 calling convention, the called function should remove the hidden return value address from the stack. by
2011-06-03 13:19:54 +0800
3d7e62eb8
Fixed #31 Shared library placement on Mac. Thanks to Mr. Viral B. Shah for this patch. by
2011-05-30 12:42:17 +0800
88d94d0ec
Fixed #30 strmm computational error on Loongson3A. by
2011-05-28 09:48:34 +0000
af40551c9
Fixed the makefile bug about openblas_set_num_threads. by
2011-05-27 21:15:30 +0800
c30c22a76
Fixed a bug about detecting underscore prefix in c_check. by
2011-05-27 18:16:19 +0800
cc09e6ef3
Ignore *.obj files in git. by
2011-05-27 18:12:45 +0800
fc8490911
Modify single precision compiler conditions, increasing single precision kernel code on Loongson3a. by
2011-05-27 09:47:17 +0000
5ca4e51df
Remove the useless code, modify code comments and format. by
2011-05-18 10:54:51 +0000
fcb5ce011
Fixed #28. Convert the result to double precision in MIPS64 dsdot_k kernel. by
2011-05-17 21:24:00 +0000
a9320f896
Fixed #25 dtrmm and dtrsm computational error on Loongson3A. by
2011-05-14 22:00:57 +0000
830a823be
Added missed testing codes for dsdot. by
2011-05-13 02:41:39 +0800
b206fc707
Fixed #28. Convert the result to double precision in the end of dsdot kernel. by
2011-05-13 02:34:30 +0800
1d6051095
Added the unit testcase for dsdot. by
2011-05-13 02:19:55 +0800
03272a606
Added the unit test for drotmg. by
2011-05-13 01:21:39 +0800
0dc9eca36
Merge branch 'hotfix-readme_about_branches' into develop by
2011-05-12 19:06:31 +0800
8cc628a95
Merge branch 'hotfix-readme_about_branches' by
2011-05-12 19:06:02 +0800
bbc517292
Added the spec of git branches about this project. by
2011-05-12 19:05:20 +0800
29dce62b8
Finish dtrsm_kernel_Rx.S on Loongson3A. by
2011-05-11 10:44:23 +0000
fa8e4fd87
Fixed #26 the wrong result of rotmg. Used fabs() instead of abs(). by
2011-05-11 01:12:32 +0800
432c309f6
Finish dtrsm_kernel_Lx.S on Loongson3A. by
2011-05-10 12:48:43 +0000
d2f351d81
Modify dtrsm compiler options by
2011-05-09 17:31:58 +0000
5a991b714
Fixed #24 drmm error on Loongson3A by
2011-05-09 17:28:20 +0000
417b8ec79
Added openblas_set_num_threads for Fortran. by
2011-05-06 17:03:35 +0800
7dcf4eeee
Fixed #23. Fixed a bug of f_check script about generating link flags. by
2011-05-04 13:03:10 +0800
1acf5ace2
Fixed a bug when detecting Intel CPU. by
2011-05-03 17:19:36 +0800
fcf9b82f1
Fixed a build bug with NO_LAPACK=1 and SANITY_CHECK=1. by
2011-05-03 14:42:11 +0800
2aab238c6
Fixed #16. Print the user-friendly message when detecting CPU failed. by
2011-04-22 22:14:06 +0800
b8d93812f
Added docs for make TARGET=your_cpu_target. by
2011-04-22 22:07:46 +0800
ff6ae89d3
Fixed #19. Provided an error msg when the arch is not supported. by
2011-04-22 20:21:42 +0800
0a45e5495
Fixed #21. Added extern C to support C++. Thanks to Tasio for the patch. by
2011-04-20 13:41:38 +0800
932093352
Completed the dtrmm function. by
2011-04-17 20:26:49 +0000
921caefa5
Added the trmm handling part, without edge handling. Test sizes (M and N) must be multiples of 4. by
2011-04-15 21:56:25 +0000
ecd4c1f3d
Modify prefetching C. by
2011-04-11 22:46:36 +0000
ab9e4ce35
Adjust kc size from 112 to 116. by
2011-04-11 22:17:57 +0000
921e040b1
Changed default page size to 16KB on Loongson 3A. by
2011-04-11 21:46:48 +0000
00ef0cd43
Supported goto_set_num_threads & openblas_set_num_threads functions when USE_OPENMP=1. by
2011-04-07 14:52:35 +0800
989c6f8b0
Fixed #14 the SEGFAULT bug on 64 cores. On SMP server, the number of CPUs or cores should be less than or equal to 64. by
2011-03-28 10:58:39 +0800
552f31dbb
Fixed #13. Fixed blasint undefined bug in <cblas.h> file. by
2011-03-25 01:16:12 +0800
5452ba385
Updated the developing version to v0.1 alpha2. by
2011-03-20 23:35:31 +0800
54745902b
Init Changelog file for next release version(v0.1alpha2). by
2011-03-20 23:30:09 +0800
1aa9a298e
Change BLOCK SIZE of LOONGSON3A TARGET. by
2011-04-06 10:39:31 +0000
782205a69
Add dgemm compiler Options in KERNEL.LOONGSON3A. by
2011-04-06 10:38:34 +0000
ac494c0d0
New kernel in LOONGSON3A. by
2011-04-06 10:36:44 +0000
85f99d476
Fixed #14 the SEGFAULT bug on 64 cores. On SMP server, the number of CPUs or cores should be less than or equal to 64. by
2011-03-28 10:58:39 +0800
5e7f29b19
Fixed #13. Fixed blasint undefined bug in <cblas.h> file. by
2011-03-25 01:16:12 +0800
141091f52
Merge branch 'master' of github.com:xianyi/OpenBLAS into x86 by
2011-03-22 14:16:18 +0800
e4bb6f248
(tag: v0.1alpha1)
Fixed the detection bug on Intel Core i5. Thanks to ggl329 for the patch. by
2011-03-22 14:09:47 +0800
0edcdd470
Updated the developing version to v0.1 alpha2. by
2011-03-20 23:35:31 +0800
d67249112
Init Changelog file for next release version(v0.1alpha2). by
2011-03-20 23:30:09 +0800
972062903
OpenBLAS 0.1 alpha version 1. by
2011-03-20 22:44:57 +0800
d9aa359e6
Merge remote branch 'origin/loongson3a' into x86 by
2011-03-20 21:57:58 +0800
04769bdf5
Merge remote branch 'origin/loongson3a' into x86 by
2011-03-20 21:57:09 +0800
6f058487a
Detect Intel Core Clarkdale & Arrandale by
2011-03-20 21:56:40 +0800
f405b5bcc
Fixed the bug about Loongson3A gsLQC1 & gsSQC1 instructions in daxpy kernel. Now daxpy is correct. by
2011-03-18 23:05:56 +0000
2b8643e0d
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a by
2011-03-18 01:20:15 +0000
c84f8be45
Supported detecting new kernel(2.6.36) & new Loongson3A03 CPU. by
2011-03-18 01:10:58 +0000
d5cffd506
Modified the default kernel makefile in MIPS64 arch. by
2011-03-07 11:22:32 +0000
5838f1299
Support unaligned addresses in daxpy on loongson3a simd. by
2011-03-05 10:17:10 +0800
5444a3f8f
Unroll to 16 in daxpy on loongson3a. by
2011-03-04 17:50:17 +0800
88cbfcc5b
Merge commit 'origin/x86' into loongson3a by
2011-03-04 14:11:52 +0000
ce78abe37
Merge branch 'x86' of github.com:xianyi/OpenBLAS into x86 by
2011-03-04 11:53:04 +0800
8f1090d32
Support NO_LAPACK=1 to build the lib without LAPACK functions. by
2011-03-04 11:51:32 +0800
272f62a2b
Changed movlps macro name in capital in x86/zdot_sse2.S file. by
2011-03-03 00:46:39 +0800
36016fe34
On x86 32bits, gcc 4.4.3 generated wrong code (movsd) from movlps in zdot_sse2.S line 191. This would cause zdotu & zdotc failures. Instead, use movlpd to work around it. Fixed #8. Fixed #9. by
2011-03-02 18:45:30 +0800
44acb7503
Added zdotu with x & y offset=1 test case. by
2011-03-02 18:03:40 +0800
6eb02bbb9
Merge remote branch 'origin/x86' into loongson3a by
2011-03-02 13:52:05 +0800
0e782b9bd
updated the changelog. by
2011-03-02 13:40:55 +0800
588737210
Fixed a random SEGFAULT when nodemask==NULL on Linux above 2.6.34. Fixed #12. Thanks to Mr. Ei-ji Nakama for providing this patch. by
2011-03-02 13:38:32 +0800
cdf33edac
Added Changelog. Fixed #11. by
2011-02-26 12:27:56 +0800
f7a5e049e
Enable Debug flags in memory alloc and init functions. by
2011-02-26 11:51:39 +0800
1b97ec1a7
Added DEBUG option in Makefile.rule. Fixed DEBUG typos. by
2011-02-26 11:19:54 +0800
36b3a730d
Merge branch 'x86' of github.com:xianyi/OpenBLAS into x86 by
2011-02-24 17:02:52 +0800
128418f49
Fixed #10. Supported GOTO_NUM_THREADS & GOTO_THREADS_TIMEOUT environment variables. by
2011-02-24 15:16:21 +0800
12214e1d0
Fixed #7. Modified axpy kernel codes to avoid unloop with incx==0 or incy==0 in x86 32bits arch. by
2011-02-23 20:08:34 +0800
cd2cbabec
Added unit test case (zdotu, N=1). by
2011-02-22 14:16:46 +0800
854137e0f
Supported building debug version. by
2011-02-22 13:40:40 +0800
afbe3c979
Improved the code quality of the unit tests. Thanks to José Luis García Pallero. by
2011-02-21 00:42:46 +0800
0cfd29a81
Fixed #7. 1) Disabled multi-threading and 2) modified kernel codes to avoid unloop in axpy function when incx==0 or incy==0. by
2011-02-21 00:24:21 +0800
109b86d00
Added axpy unit test with incx==0 and incy==0. by
2011-02-21 00:17:33 +0800
78da0e0a0
Fixed #6. Disable multi-thread swap when incx==0 or incy==0. by
2011-02-20 17:14:38 +0800
8dd3fd7f2
Added swap unit test with incx==0 and incy==0. by
2011-02-20 17:13:12 +0800