Martin Kroeker
995768bbc5
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
6 years ago
int_13h
96ad579428
add in runtime cpu detection for zarch (#2349)
add in runtime cpu detection for zarch
6 years ago
shengyang
8d84403205
Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
6 years ago
shengyang
8729db117c
modified: ctest/din3
modified: ctest/sin3
6 years ago
w00421467
0833a4846a
Use arm neon instructions to optimize sgemm_beta operation
6 years ago
zq
50f7fc1401
[WIP] Use arm neon instructions to optimize tcopy operation
6 years ago
w00421467
d1b53806be
Merge remote-tracking branch 'pub/develop' into develop
6 years ago
wjc404
a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c
6 years ago
wjc404
700fe5b5ee
Add files via upload
6 years ago
wjc404
bb2729c855
Update CONTRIBUTORS.md
6 years ago
wjc404
aae44d040d
Update CONTRIBUTORS.md
6 years ago
wjc404
6362c34ee6
Update param.h
6 years ago
wjc404
f60840c420
Update KERNEL.ZEN
6 years ago
wjc404
109e18cd96
Update KERNEL.HASWELL
6 years ago
wjc404
ae1579be13
Create zgemm3m_kernel_4x4_haswell.c
6 years ago
w00421467
3ccf8885ac
prefetching for dgemm_beta
6 years ago
Martin Kroeker
454847588e
Update LAPACK to 3.9.0
6 years ago
Martin Kroeker
0257f26488
Merge pull request #21 from xianyi/develop
rebase
6 years ago
Martin Kroeker
c45b7aef14
Merge pull request #2348 from wjc404/develop
AVX2 CGEMM3M kernel
6 years ago
wjc404
312060d0d6
Update CONTRIBUTORS.md
6 years ago
wjc404
cd765f094b
Update cgemm3m_kernel_8x4_haswell.c
6 years ago
wjc404
64639f440f
Update param.h
6 years ago
wjc404
3a66c8cac1
Update KERNEL.ZEN
6 years ago
wjc404
4c35b8dbaa
Update gemm3m_level3.c
6 years ago
wjc404
ed9af2f7da
Update KERNEL.HASWELL
6 years ago
wjc404
5fd1edead9
Create cgemm3m_kernel_8x4_haswell.c
6 years ago
Martin Kroeker
26478eb0d0
Merge pull request #2345 from wjc404/develop
Optimize AVX2 CGEMM
6 years ago
wjc404
eeecd623d8
Update cgemm_kernel_8x2_haswell.c
6 years ago
wjc404
3ce6bcdb5f
Update CONTRIBUTORS.md
6 years ago
wjc404
6fbe51072b
Update CONTRIBUTORS.md
6 years ago
wjc404
611445c7f8
Update param.h
6 years ago
wjc404
2cd9306bb5
Update KERNEL.ZEN
6 years ago
wjc404
c418c81224
Update KERNEL.HASWELL
6 years ago
wjc404
025741f16a
Fast Haswell CGEMM kernel
6 years ago
Martin Kroeker
0ae49d2990
Merge pull request #2344 from wjc404/develop
Optimize AVX2 ZGEMM
6 years ago
wjc404
105e26e12a
Adjust Haswell ZGEMM blocking parameters
6 years ago
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
6 years ago
wjc404
d573d24de7
Fast Haswell ZGEMM kernel
6 years ago
Martin Kroeker
31d6c2eb7d
Merge pull request #2340 from Zeyiii/develop
[WIP] Use arm neon instructions to optimize gemm beta operation
6 years ago
w00421467
b7cc69ee62
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
6 years ago
w00421467
aeef942c4f
use arm neon instructions to optimize gemm beta operation
6 years ago
Martin Kroeker
445ca2f418
Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout
driver: more reasonable thread wait timeout on Windows.
6 years ago
Jehan
13226e3101
driver: more reasonable thread wait timeout on Windows.
It used to be 5ms, which might not be long enough in some cases for the
thread to exit cleanly, but when set to 5000 (5s), it would slow down
any program depending on OpenBLAS.
Let's just set it to 50ms, which is 10 times longer than
originally, but still reasonable in case of failed thread termination.
6 years ago
Martin Kroeker
1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
6 years ago
Martin Kroeker
c6ecb195e6
Merge pull request #2337 from martin-frbg/issue2336
Support two-digit version numbers in gcc version check
6 years ago
Martin Kroeker
b28db31429
Support two-digit version numbers in gcc version check
fixes #2336 (non-recognition of gcc 10) with patch provided by JeffreyALaw.
6 years ago
Kavana Bhat
6baa9b07d7
AIX changes for Power8
6 years ago
Martin Kroeker
a4896b5538
Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
6 years ago
Kavana Bhat
3938e59569
AIX changes for Power8
6 years ago
Martin Kroeker
9d5079008f
Merge pull request #2334 from martin-frbg/fix2228
Remove misplaced file
6 years ago