0a696bd4c
Improved the makefile for Intel compiler. by
2012-02-20 23:36:58 +0800
fda39c6cb
(tag: v0.1alpha2.5)
Updated the Changelog. by
2012-02-20 09:06:43 +0800
875da22a4
Merge pull request #77 from nolta/master by
2012-02-19 16:44:35 -0800
765387af1
(refs/pull/77/merge)
Merge 363a563ec2 into 0caa5616f2 by
2012-02-19 11:08:43 -0800
363a563ec
(refs/pull/77/head)
fix #49 by
2012-02-19 14:07:34 -0500
8da6fdc2c
Merge branch 'hotfix-0.1alpha2.5' into develop by
2012-02-19 23:11:06 +0800
0caa5616f
Merge branch 'hotfix-0.1alpha2.5' by
2012-02-19 22:56:06 +0800
727e6d83c
Released 0.1 alpha 2.5. Updated the documents. by
2012-02-19 22:55:31 +0800
da3f101a7
Merge branch 'develop' into hotfix-0.1alpha2.5 by
2012-02-19 22:31:09 +0800
fe613de8e
refs #69. Auto-detect Intel Core i5/i7 (Sandy Bridge) CPUs and use the Nehalem assembly kernels. by
2012-02-13 19:20:35 +0800
142e99d4e
Merge branch 'master' into develop by
2012-01-20 21:32:13 +0800
7af0139a0
Modify P Q R size of Loongson3b. by
2012-01-11 16:05:39 +0000
8e53b57bb
Add gemmkernel and trmmkernel C code in kernel/generic; this code can be used on a new platform that does not have an optimized assembly kernel. by
2012-01-10 17:16:13 +0000
0d3647c39
Merge pull request #76 from StefanKarpinski/patch-1 by
2012-01-01 05:57:25 -0800
233c7e4be
(refs/pull/76/merge)
Merge 0d76196a09 into fe7a932ab8 by
2011-12-28 20:54:45 -0800
0d76196a0
(refs/pull/76/head)
Fix #68: don't require SystemStubs on OS X. by
2011-12-28 23:53:20 -0500
b281f3dee
Merge remote branch 'origin/loongson3a' into loongson3b by
2011-12-06 13:49:39 +0000
a4292976e
Adding detection of complex situations in symm.c; otherwise the buffer address of sb will overlap the end of sa. by
2011-12-05 14:54:25 +0000
c2dad58ad
Adding n32 multiple threads condition. by
2011-12-01 16:33:11 +0000
d5a6d789e
Fixed a typo in Makefile. by
2011-11-28 15:31:46 +0800
875dde437
Merge branch 'lapack_3.4.0' into develop by
2011-11-28 15:28:54 +0800
5be22ca80
Refs #72. Upgraded LAPACK to 3.4.0 version. by
2011-11-28 15:28:22 +0800
66904fc4e
BLAS3 used standard MIPS instructions without extensions on Loongson 3B. by
2011-11-25 11:20:25 +0000
8163ab7e5
Change the block size on Loongson 3B. by
2011-11-23 18:40:35 +0000
ef6f7f32a
Fixed mbind bug on Loongson 3B. Check the return value of my_mbind function. by
2011-11-23 17:17:41 +0000
285e69e2d
Disable using simple thread level3 to fix a bug on Loongson 3B. by
2011-11-17 16:46:26 +0000
d1baf14a6
Enable thread affinity on Loongson 3B. Fixed the bug of reading cycle counter. by
2011-11-11 17:49:41 +0000
0884f6b78
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3b by
2011-11-11 14:26:49 +0000
2d78fb05c
Add conjugate condition to gemv. by
2011-11-10 15:38:48 +0000
b95ad4cfa
Support detecting ICT Loongson-3B CPU. by
2011-11-09 19:28:22 +0000
3bbe3ddb3
Merge branch 'develop' of github.com:xianyi/OpenBLAS into loongson3b by
2011-11-09 19:08:29 +0000
a32e56500
Fix the compute error of gemv when incx and incy are negative numbers. by
2011-11-04 19:32:21 +0000
c1e618ea2
Add complete gemv function on Loongson3a platform. by
2011-11-03 13:53:48 +0000
19f5b5c13
Fixed #66 the bug in zgemv kernel with transpose matrix on 64-bit MingW (Windows). by
2011-10-18 18:44:23 +0800
c852ce398
Ref #65. Fixed 64-bit Windows calling convention bug in cdot and zdot. by
2011-10-18 10:23:17 +0800
ba31b19c0
Ref #62. In the OpenMP implementation, check the return value of omp_get_max_threads(). This ensures the return value is the same as blas_cpu_numbers, an internal global variable that stores the number of threads in OpenBLAS. by
2011-10-16 22:56:19 +0800
66a3c6df4
Ref #63. Fixed the DLL generation bug on mingw-w64. by
2011-10-09 17:25:44 +0800
57658a8c1
ref #62. Added a user-friendly message when USE_OPENMP=1. Users should use OMP_NUM_THREADS. by
2011-10-09 15:14:48 +0800
9fe3049de
Adding conditional compilation (#if defined(LOONGSON3A)) to avoid affecting the performance of other platforms. by
2011-09-26 15:21:45 +0000
831858b88
Modify the aligned addresses of sa and sb to improve multi-threaded performance. by
2011-09-23 20:59:48 +0000
8de2ba67d
Merge branch 'hotfix-0.1alpha2.4' into develop by
2011-09-18 17:00:29 +0800
fe7a932ab
(tag: v0.1alpha2.4)
Merge branch 'hotfix-0.1alpha2.4' by
2011-09-18 16:57:28 +0800
1d31c79dc
Prepared the document for 0.1 alpha 2.4 version. by
2011-09-18 05:46:08 +0800
d40e5621e
Change the installation folder into /include and /lib. by
2011-09-18 05:07:00 +0800
bcc795621
Refs #57. Continue to fix absolute path issue about shared library on Mac OSX. by
2011-09-18 01:35:12 +0800
821cbb299
Updated the document for 0.1 alpha 2.4. by
2011-09-17 07:55:59 +0800
74fa79035
Merge branch 'develop' into hotfix-0.1alpha2.4 by
2011-09-17 07:32:10 +0800
756477bfe
Output the installation tip after the build completes. by
2011-09-17 07:21:11 +0800
864c68ffc
Bump the version number. by
2011-09-17 03:05:26 +0800
68cae521d
Refs #57. The bug about absolute path of shared library on Mac OSX. by
2011-09-17 02:58:01 +0800
d0152ec8c
Fixed #61, a build bug when setting TARGET and DYNAMIC_ARCH at the same time. by
2011-09-17 02:27:56 +0800
e08cfaf9c
Complete all the complex single-precision functions of level3, but the performance needs further improvement. by
2011-09-16 17:50:40 +0000
ee4bb8bd2
Add ctrmm part in cgemm_kernel_loongson3a_4x2_ps.S. by
2011-09-16 16:08:39 +0000
7fa3d23dd
Complete cgemm function, but no optimization. by
2011-09-15 16:08:23 +0000
9679dd077
Fix some compute errors. by
2011-09-14 20:00:35 +0000
048742f38
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a by
2011-09-14 16:32:36 +0000
7b410b7f0
Fixed #58 zdot SEGFAULT bug with GCC-4.6. Thanks to Mr. John for this patch. by
2011-09-14 23:52:51 +0800
d238a768a
Use ps instructions in cgemm. by
2011-09-14 15:32:25 +0000
260db9fb9
Merge branch 'hotfix-0.1alpha2.3' into develop by
2011-09-09 00:57:47 +0800
e27b761d7
(tag: v0.1alpha2.3)
Merge branch 'hotfix-0.1alpha2.3' by
2011-09-09 00:55:04 +0800
16fc08332
Refs #47. Fixed the parameter-setting bug on the Loongson 3A single-thread version. by
2011-09-08 16:39:34 +0000
3c856c0c1
Check the return value of pthread_create. Update the docs with known issue on Loongson 3A. by
2011-09-06 18:27:33 +0000
dc9c69db9
Merge branch 'develop' into loongson3a by
2011-09-06 18:19:50 +0000
b1fe26c45
refs #55. Changed DTB_ENTRIES to DTB_DEFAULT_ENTRIES in x86 gemv_n kernel codes. by
2011-09-06 14:14:07 +0800
0389b631f
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a by
2011-09-05 16:31:40 +0000
64fa709d1
Fixed #46. Initialize variables in cblat3.f and zblat3.f. by
2011-09-05 16:30:55 +0000
4727fe8ab
Refs #47. On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads. by
2011-09-05 15:13:05 +0000
90481ce74
Updated the doc about 0.1alpha2.3. by
2011-09-05 17:40:55 +0800
9fc6764fa
refs #55. Added DTB_ENTRIES to the dynamic arch setting parameters. Now it can read DTB_ENTRIES at runtime. by
2011-09-05 17:37:07 +0800
74d4cdb81
Fix an illegal instruction for strmm_RTLU. by
2011-09-02 19:41:06 +0000
790614683
Fix an error for strmm_LLTN. by
2011-09-02 16:57:33 +0000
3274ff47b
Fix an error for strmm_LLTN. by
2011-09-02 16:50:50 +0000
a059c553a
Fix a compute error for strmm. by
2011-09-02 16:00:04 +0000
23e182ca7
Fix stack-pointer bug for strmm. by
2011-09-02 15:28:01 +0000
a15bc9582
Add strmm part. by
2011-09-02 09:15:09 +0000
74a3f6348
Tuning mb, kb, nb size to get the best performance. by
2011-09-01 17:15:28 +0000
09f49fa89
Using PS instructions to improve the performance of sgemm; it is 4.2 GFlops now. by
2011-08-31 21:24:03 +0000
b9d89f8aa
Fixed the bug about installation. f77blas.h works OK now. by
2011-08-31 18:21:37 +0800
cb0214787
Modify compile options. by
2011-08-30 20:57:00 +0000
2e8cdd154
Using ps instruction. by
2011-08-30 20:54:19 +0000
b29d327d1
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a by
2011-07-18 17:06:53 +0000
c8360e3ae
Complete all the plura single precision functions of level3 on Loongson3a, the performance is 2.3GFlops. by
2011-07-18 17:03:38 +0000
19d2ab485
Merge branch 'hotfix-0.1alpha2.2' into develop by
2011-07-14 01:09:21 +0800
12d77deee
(tag: v0.1alpha2.2)
Merge branch 'hotfix-0.1alpha2.2' by
2011-07-14 01:03:09 +0800
043927c7d
Update the documents for 0.1alpha2.2 version. by
2011-07-14 01:02:19 +0800
30947ea2d
Fixed #44 a makefile bug when DYNAMIC_ARCH=1 and INTERFACE64=1. by
2011-07-14 00:54:23 +0800
33313b022
Merge branch 'develop' into loongson3a by
2011-07-07 14:25:51 +0800
a5300420e
Merge branch 'hotfix-0.1alpha2.1' into develop by
2011-06-28 15:46:55 +0800
9b46bf1eb
(tag: v0.1alpha2.1)
Merge branch 'hotfix-0.1alpha2.1' by
2011-06-28 15:43:08 +0800
c06b7be32
Refs #42. Output the error message when detecting fortran compiler failed. by
2011-06-28 15:42:09 +0800
68532fa9e
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a by
2011-06-24 09:28:12 +0000
708d2b625
Fix compute error in ztrmm. by
2011-06-24 09:27:41 +0000
e72113f06
Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G. by
2011-06-23 21:11:00 +0000
fc21f7ad2
Merge branch 'release-v0.1alpha2' into loongson3a by
2011-06-23 16:08:23 +0800
14f81da37
Change prefetch length of A and B, the performance is 2.1G now. by
2011-06-23 10:46:58 +0000
ca8bf5abb
Merge branch 'release-v0.1alpha2' into develop by
2011-06-23 16:07:34 +0800
4a73f5c5e
(tag: v0.1alpha2)
Merge branch 'release-v0.1alpha2' by
2011-06-23 15:18:40 +0800
6a0762949
Fixed #38. Released v0.1 alpha2. by
2011-06-23 15:16:24 +0800
859b71645
Refs #37. Updated README about the compatibility issue with the EKOPath compiler. by
2011-06-23 15:09:34 +0800
078bfd0b4
Refs #39. Moved the shared lib (dll) to top directory in MingW64 compiler environment. by
2011-06-22 13:19:39 +0800