Xianyi Zhang
4aa2d89217
Merge branch 'develop' into risc-v
6 years ago
Martin Kroeker
ddcbed6690
Merge pull request #2437 from martin-frbg/issue2434
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
6 years ago
Martin Kroeker
07454bf4d5
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
6 years ago
Martin Kroeker
4046985913
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
6 years ago
Martin Kroeker
e57b11acca
Add preliminary support for EMAG8180
6 years ago
Martin Kroeker
0b39cf95b0
Fix endianness conditionals
6 years ago
Martin Kroeker
9f39f0a2c3
Specify ismin/ismax assembly kernels for POWER8 directly
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
6 years ago
Martin Liska
aeea14ee40
Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S.
6 years ago
Martin Liska
18bcc36a69
Fix implementation of iamax_sse.S as reported in #2116.
There was a typo in iamax_sse.S where one of the comparisons
was cmpeqps instead of cmpeqss. That misdetected the index
for sequences where the minimum value was 0.
6 years ago
Martin Liska
0e7f43c898
Add missing USE_MIN in kernel/CMakeLists.txt.
6 years ago
Martin Kroeker
cafdd999b8
Update caxpy_power8.S
6 years ago
Martin Kroeker
92ca92a46c
Update caxpy_power8.S
6 years ago
Martin Kroeker
486c35c5dc
Update icamin_power8.S
6 years ago
Martin Kroeker
5ba3699f41
Update isamin_power8.S
6 years ago
Martin Kroeker
8eefa530cd
Update isamax_power8.S
6 years ago
Martin Kroeker
de40d47edf
Update isamin_power8.S
6 years ago
Martin Kroeker
7c162b8a21
Update isamax_power8.S
6 years ago
Martin Kroeker
0544cbc806
Fix syntax of endianness conditional
6 years ago
Martin Kroeker
120d20731f
Fix syntax of endianness conditional
6 years ago
Martin Kroeker
dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround
6 years ago
Bart Oldeman
7ea5e07d1c
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
6 years ago
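The constraint issue described in the dscal fix above can be illustrated with a minimal sketch. It is not the actual dscal_kernel_inc_8 code: the function name `scale_strided`, its parameters, and the instruction sequence are hypothetical, chosen only to show a pointer that is advanced by `leaq` inside the asm body and is therefore declared as an input/output operand (`"+r"`). A plain C fallback is assumed for compilers or targets without GCC-style x86-64 inline asm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: multiply n elements of x, spaced inc_bytes
 * apart, by alpha. Not the kernel from the commit. */
static void scale_strided(double *x, size_t n, ptrdiff_t inc_bytes, double alpha)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__) && defined(__GNUC__)
        __asm__ __volatile__(
            "movsd (%[x]), %%xmm0      \n\t" /* load one element           */
            "mulsd %[alpha], %%xmm0    \n\t" /* scale it                   */
            "movsd %%xmm0, (%[x])      \n\t" /* store it back              */
            "leaq  (%[x],%[inc]), %[x] \n\t" /* advance x: this WRITES x   */
            : [x] "+r"(x)                    /* so x is input AND output   */
            : [inc] "r"(inc_bytes), [alpha] "x"(alpha)
            : "xmm0", "memory");
#else
        /* Portable fallback with the same effect. */
        *x *= alpha;
        x = (double *)((char *)x + inc_bytes);
#endif
    }
}
```

If `x` were listed as a plain input (`"r"`), the compiler would be entitled to assume the register still holds the original pointer after the asm statement, which is the kind of miscompilation the commit message describes.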
Martin Kroeker
7e5cbb6f35
Fix bad conditional syntax that caused spurious application of USE_TRMM
6 years ago
wjc404
3447d04eaf
Update dgemm_kernel_16x2_skylakex.c
6 years ago
wjc404
8b5cdcc64c
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
4e00d96a78
Update dgemm_kernel_16x2_skylakex.c
6 years ago
wjc404
096da2f51a
Update dgemm_kernel_16x2_skylakex.c
6 years ago
wjc404
081b188529
Update KERNEL.SKYLAKEX
6 years ago
wjc404
8019e70211
AVX512 16x2 DGEMM kernel
6 years ago
Qiyu8
ff42e68652
Optimize general Gemm Beta
6 years ago
Martin Kroeker
70f45749b9
Merge pull request #2367 from wjc404/develop
Improve parallel SGEMM performance on SKYLAKEX CPUs
6 years ago
wjc404
e5dcdeb550
Update sgemm_direct_skylakex.c
6 years ago
wjc404
952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c
6 years ago
wjc404
feaafbedd3
make skylakex sgemm code more friendly for readers
BTW some kernels were adjusted to improve performance
6 years ago
Martin Kroeker
b36018be6d
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
6 years ago
wjc404
3a100b2797
Update KERNEL.SKYLAKEX
6 years ago
Martin Kroeker
38742d5547
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
6 years ago
wjc404
bd4c032f52
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
92b10212de
optimize AVX2 SGEMM
6 years ago
wjc404
b73bf01378
optimize AVX2 SGEMM
6 years ago
wjc404
eb3c9f1db9
optimize AVX2 SGEMM
6 years ago
Martin Kroeker
456ee2e1f0
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
6 years ago
shengyang
80db5f11e1
update
6 years ago
chenxuqiang
52de4cc8fd
kernel/arm64/dgemm_beta.S: add beta == zero branch
Added a beta == zero branch, which avoids loading the C matrix.
Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
6 years ago
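The idea behind the beta == zero branch above can be sketched in portable C. This is a reference-style illustration only, not the arm64 assembly from the commit: the function name `scale_c_by_beta`, the column-major layout, and the parameter list are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine: C := beta * C for an m-by-n
 * column-major matrix with leading dimension ldc. */
static void scale_c_by_beta(double *c, size_t m, size_t n, size_t ldc, double beta)
{
    assert(c != NULL && ldc >= m);
    for (size_t j = 0; j < n; j++) {
        double *col = c + j * ldc;
        if (beta == 0.0) {
            /* Store-only path: C is never read, so no loads are issued. */
            for (size_t i = 0; i < m; i++)
                col[i] = 0.0;
        } else {
            /* General path: load, scale, store. */
            for (size_t i = 0; i < m; i++)
                col[i] *= beta;
        }
    }
}
```

Besides saving the loads, the store-only path overwrites whatever was in C, so values such as NaN in the old contents are not propagated as they would be by a multiply with zero.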
Martin Kroeker
44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
6 years ago
Martin Kroeker
86ab939936
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
6 years ago
Martin Kroeker
6c85cb1869
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
6 years ago
Martin Kroeker
995768bbc5
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
6 years ago
int_13h
96ad579428
add in runtime cpu detection for zarch ( #2349 )
6 years ago
shengyang
8d84403205
Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
6 years ago