shengyang
80db5f11e1
update
6 years ago
Martin Kroeker
44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
6 years ago
Martin Kroeker
86ab939936
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
6 years ago
Martin Kroeker
6c85cb1869
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
6 years ago
Martin Kroeker
995768bbc5
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
6 years ago
int_13h
96ad579428
add in runtime cpu detection for zarch (#2349)
6 years ago
shengyang
8d84403205
Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
6 years ago
w00421467
0833a4846a
Use arm neon instructions to optimize sgemm_beta operation
6 years ago
zq
50f7fc1401
[WIP] Use arm neon instructions to optimize tcopy operation
6 years ago
w00421467
d1b53806be
Merge remote-tracking branch 'pub/develop' into develop
6 years ago
wjc404
a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c
6 years ago
wjc404
700fe5b5ee
Add files via upload
6 years ago
wjc404
f60840c420
Update KERNEL.ZEN
6 years ago
wjc404
109e18cd96
Update KERNEL.HASWELL
6 years ago
wjc404
ae1579be13
Create zgemm3m_kernel_4x4_haswell.c
6 years ago
w00421467
3ccf8885ac
prefetching for dgemm_beta
6 years ago
wjc404
cd765f094b
Update cgemm3m_kernel_8x4_haswell.c
6 years ago
wjc404
3a66c8cac1
Update KERNEL.ZEN
6 years ago
wjc404
ed9af2f7da
Update KERNEL.HASWELL
6 years ago
wjc404
5fd1edead9
Create cgemm3m_kernel_8x4_haswell.c
6 years ago
wjc404
eeecd623d8
Update cgemm_kernel_8x2_haswell.c
6 years ago
wjc404
2cd9306bb5
Update KERNEL.ZEN
6 years ago
wjc404
c418c81224
Update KERNEL.HASWELL
6 years ago
wjc404
025741f16a
Fast Haswell CGEMM kernel
6 years ago
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
6 years ago
wjc404
d573d24de7
Fast Haswell ZGEMM kernel
6 years ago
w00421467
b7cc69ee62
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
6 years ago
w00421467
aeef942c4f
use arm neon instructions to optimize gemm beta operation
6 years ago
Martin Kroeker
1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
6 years ago
Kavana Bhat
6baa9b07d7
AIX changes for Power8
6 years ago
Kavana Bhat
3938e59569
AIX changes for Power8
6 years ago
Isuru Fernando
b863b32ac5
Workaround an ICE in clang 9.0.0
This bug is not present in 8.x or in the 9.0 daily snapshot.
6 years ago
Martin Kroeker
dd04143d4a
Merge pull request #2328 from martin-frbg/ppc9
Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version
6 years ago
Martin Kroeker
f3a6164bff
Merge pull request #2324 from antonblanchard/power9_segv
Fix SEGV in cdot_power9
6 years ago
Martin Kroeker
dedd822d1a
Fix caxpy/caxpyc naming in localentry
6 years ago
Martin Kroeker
2181fb7047
Fix caxpy/caxpyc naming in localentry
6 years ago
Martin Kroeker
a9b62c03f8
Substitute precompiled gcc7 codes only when gcc is older than 9.x
6 years ago
Martin Kroeker
97762234f9
Add variable for gcc >=9 test
used in KERNEL.POWER9
6 years ago
wjc404
934e601e93
Update dgemm_kernel_4x8_skylakex_2.c
6 years ago
Anton Blanchard
cf2a8e410c
Fix SEGV in cdot_power9
We were corrupting r2 because the local entry wasn't being
set up correctly.
6 years ago
wjc404
eb1e9c8c92
some optimizations
6 years ago
Andreas Arnez
d117dfd505
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
The ZARCH implementations of ?sum contain a cut-and-paste error: an inline
assembly argument is named "sum", but the assembly references "asum"
instead. The mismatch causes a build error. This is fixed.
6 years ago
Martin Kroeker
b09b5be0a4
Merge pull request #2315 from ewanglong/develop
revised Windows compatibility fix for #2313
6 years ago
Wang, Long
bfb5fbdb4d
revised Windows compatibility fix for #2313
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
Martin Kroeker
08fa83aba2
Merge pull request #2312 from martin-frbg/power8be
Further Power8 big-endian corrections
6 years ago
Wang, Long
1191db1a49
For Windows compatibility, used "unsigned long long" to ensure a 64-bit length
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
Wang, Long
0caf1434c9
Fix the integer overflow issue for large matrix size
For large matrices, e.g. M=N=K with M>1290, int mnk=M*N*K will overflow.
This leads to wrongly branching to the single-threaded path, which
degrades performance significantly.
Signed-off-by: Wang, Long <long1.wang@intel.com>
6 years ago
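The M>1290 threshold in the overflow commit follows from 32-bit arithmetic: 1290^3 = 2,146,689,000 still fits under INT_MAX (2,147,483,647), while 1291^3 = 2,151,685,171 does not. Below is a minimal C sketch of the widened computation, assuming non-negative dimensions and a 32-bit int. The helper names mnk_product and mnk_fits_in_int are hypothetical and are not taken from the OpenBLAS source. The sketch never evaluates the overflowing int product itself, because signed overflow is undefined behavior in C.

```c
#include <limits.h>

/* Hypothetical helper (not OpenBLAS code): compute M*N*K without int overflow.
   Converting each operand to unsigned long long before multiplying makes the
   whole product at least 64 bits wide on both LP64 (Linux/macOS) and LLP64
   (Windows), where plain `long` is only 32 bits.
   Assumes m, n, k are non-negative. */
static unsigned long long mnk_product(int m, int n, int k)
{
    return (unsigned long long)m * (unsigned long long)n * (unsigned long long)k;
}

/* Hypothetical helper: returns 1 if M*N*K fits in a signed int, 0 otherwise.
   Writing `int mnk = m * n * k;` directly is undefined behavior once the
   product exceeds INT_MAX, so the check is done on the 64-bit value. */
static int mnk_fits_in_int(int m, int n, int k)
{
    return mnk_product(m, n, k) <= (unsigned long long)INT_MAX;
}

/* Example values for square problems (M = N = K), with a 32-bit int:
     mnk_product(1290, 1290, 1290) == 2146689000  -> fits (INT_MAX is 2147483647)
     mnk_product(1291, 1291, 1291) == 2151685171  -> does not fit in int */
```

Widening to unsigned long long rather than long matters on Windows, where long stays 32 bits even in 64-bit builds, which is the rationale given in the "unsigned long long" commit.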
Martin Kroeker
cad0d150db
Define alternate kernels for big-endian POWER8
6 years ago
Martin Kroeker
eba0aeb7cd
Fix compilation for big-endian POWER8
6 years ago
Martin Kroeker
0c07c356c1
Define alternate kernels for big-endian PPC440
6 years ago