Martin Kroeker
fd2ff2714f
Merge pull request #2359 from martin-frbg/lapack-pr330
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
6 years ago
Martin Kroeker
2ea2bd99c7
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
from Reference-LAPACK PR 330
6 years ago
Martin Kroeker
fbb894948c
Merge pull request #22 from xianyi/develop
rebase
6 years ago
Martin Kroeker
e711659c90
Merge pull request #2358 from shengyang-3390/develop
Test all 7 declared values of N in float and double cblas3 tests
6 years ago
shengyang
893e6e57c4
modified: ctest/din3 ctest/sin3
6 years ago
Martin Kroeker
456ee2e1f0
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
6 years ago
Martin Kroeker
9998f8ed8b
Merge pull request #2356 from shengyang-3390/develop
Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)
6 years ago
shengyang
80db5f11e1
update
6 years ago
chenxuqiang
52de4cc8fd
kernel/arm64/dgemm_beta.S: add beta == zero branch
Added a beta == zero branch, so the C matrix no longer needs to be loaded in that case.
Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
6 years ago
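The idea behind the beta == zero branch in the commit above can be sketched in plain C. This is only an illustration, not the arm64 assembly from kernel/arm64/dgemm_beta.S: the function name `scale_c_by_beta`, its signature, and the column-major layout with leading dimension `ldc` are assumptions made for the example. The point it shows is that when beta is zero the result is written without reading C at all, which saves the loads and also keeps stale NaN or Inf values in C from leaking into the output.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only (not the OpenBLAS kernel):
 * scales an m x n column-major matrix C in place by beta.
 * ldc is the leading dimension, i.e. the stride between columns. */
void scale_c_by_beta(size_t m, size_t n, double beta, double *c, size_t ldc)
{
    if (beta == 0.0) {
        /* beta == zero branch: store zeros without loading C.
         * Multiplying instead would turn NaN or Inf entries into NaN. */
        for (size_t j = 0; j < n; j++) {
            for (size_t i = 0; i < m; i++) {
                c[j * ldc + i] = 0.0;
            }
        }
        return;
    }

    /* General case: load each element of C, scale it, store it back. */
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            c[j * ldc + i] *= beta;
        }
    }
}
```

Calling `scale_c_by_beta(2, 2, 0.0, c, 2)` on a buffer that contains NaN leaves all four entries equal to zero, whereas a plain multiply by 0.0 would leave the NaN in place.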
Martin Kroeker
44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
6 years ago
Martin Kroeker
86ab939936
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
6 years ago
Martin Kroeker
375b1875c8
[WIP] Update LAPACK to 3.9.0 (#2353)
* Update make.inc entries for LAPACK 3.9.0
Reference-LAPACK PR 347 changed some variable names and relative paths
* Update LAPACK to 3.9.0
* Add new functions from LAPACK 3.9.0
* Add new functions from LAPACK 3.9.0
* Restore LOADER command
as it makes it easier to specify pthread as needed
* Restore LOADER
* Restore EIG/LIN prefixes in cmdbase
* add binary path to lapack_testing.py call
* Restore OpenMP version check
* Restore OpenMP version check
* Restore fix for out-of-bounds array accesses
from #2096
6 years ago
Martin Kroeker
6c85cb1869
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
6 years ago
Martin Kroeker
995768bbc5
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
6 years ago
int_13h
96ad579428
add in runtime cpu detection for zarch (#2349)
add in runtime cpu detection for zarch
6 years ago
shengyang
8d84403205
Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
6 years ago
shengyang
8729db117c
modified: ctest/din3
modified: ctest/sin3
6 years ago
w00421467
0833a4846a
Use arm neon instructions to optimize sgemm_beta operation
6 years ago
zq
50f7fc1401
[WIP] Use arm neon instructions to optimize tcopy operation
6 years ago
w00421467
d1b53806be
Merge remote-tracking branch 'pub/develop' into develop
6 years ago
wjc404
a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c
6 years ago
wjc404
700fe5b5ee
Add files via upload
6 years ago
wjc404
bb2729c855
Update CONTRIBUTORS.md
6 years ago
wjc404
aae44d040d
Update CONTRIBUTORS.md
6 years ago
wjc404
6362c34ee6
Update param.h
6 years ago
wjc404
f60840c420
Update KERNEL.ZEN
6 years ago
wjc404
109e18cd96
Update KERNEL.HASWELL
6 years ago
wjc404
ae1579be13
Create zgemm3m_kernel_4x4_haswell.c
6 years ago
w00421467
3ccf8885ac
prefetching for dgemm_beta
6 years ago
Martin Kroeker
454847588e
Update LAPACK to 3.9.0
6 years ago
Martin Kroeker
0257f26488
Merge pull request #21 from xianyi/develop
rebase
6 years ago
Martin Kroeker
c45b7aef14
Merge pull request #2348 from wjc404/develop
AVX2 CGEMM3M kernel
6 years ago
wjc404
312060d0d6
Update CONTRIBUTORS.md
6 years ago
wjc404
cd765f094b
Update cgemm3m_kernel_8x4_haswell.c
6 years ago
wjc404
64639f440f
Update param.h
6 years ago
wjc404
3a66c8cac1
Update KERNEL.ZEN
6 years ago
wjc404
4c35b8dbaa
Update gemm3m_level3.c
6 years ago
wjc404
ed9af2f7da
Update KERNEL.HASWELL
6 years ago
wjc404
5fd1edead9
Create cgemm3m_kernel_8x4_haswell.c
6 years ago
Martin Kroeker
26478eb0d0
Merge pull request #2345 from wjc404/develop
Optimize AVX2 CGEMM
6 years ago
wjc404
eeecd623d8
Update cgemm_kernel_8x2_haswell.c
6 years ago
wjc404
3ce6bcdb5f
Update CONTRIBUTORS.md
6 years ago
wjc404
6fbe51072b
Update CONTRIBUTORS.md
6 years ago
wjc404
611445c7f8
Update param.h
6 years ago
wjc404
2cd9306bb5
Update KERNEL.ZEN
6 years ago
wjc404
c418c81224
Update KERNEL.HASWELL
6 years ago
wjc404
025741f16a
Fast Haswell CGEMM kernel
6 years ago
Martin Kroeker
0ae49d2990
Merge pull request #2344 from wjc404/develop
Optimize AVX2 ZGEMM
6 years ago
wjc404
105e26e12a
Adjust Haswell ZGEMM blocking parameters
6 years ago
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
6 years ago