Martin Kroeker
7353ea5afc
Delete KERNEL.24K
5 years ago
Martin Kroeker
6a04efb122
Rename KERNEL files to include MIPS prefix
5 years ago
Martin Kroeker
d712ea724c
Add MIPS24K support
5 years ago
Martin Kroeker
5b0093b5fe
Convert aligned moves to unaligned
Should have no performance impact on reasonably modern CPUs, and fixes occasional crashes in actual user code.
5 years ago
Martin Kroeker
e9bfa2291a
Fix parameter overflow
5 years ago
gxw
8d07cf9b67
Fix compilation problem on loongson platform
Using "make TARGET=GENERIC" on loongson platform will get the following
error messages:
"make[1]: *** No rule to make target 'sgemm_incopy.o', needed by 'libs'"
Add kernel/mips64/KERNEL.generic to solve the problem.
5 years ago
Martin Kroeker
806f89166e
Make ARMV7 compile with xcode and add a CI job for it (#2537)
* Add an ARMV7 iOS build on Travis
* thread_local appears to be unavailable on ARMV7 iOS
* Add no-thumb option for ARMV7 iOS build to get it to accept DMB ISH
* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
5 years ago
Martin Kroeker
c6af9bbb32
Merge pull request #2534 from martin-frbg/issue2496
Fix zero initialization for beta=0 case
5 years ago
Martin Kroeker
144be81ca1
fix initialization to zero in the NEON SGEMM_BETA kernel as well
5 years ago
Martin Kroeker
07cdd5d05c
Fix zero initialization for beta=0 case
use immediate initialization instead of multiplication in case register content is a NaN
5 years ago
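The reasoning behind this fix can be illustrated with a minimal C sketch. It is not the actual kernel code: the function names `scale_by_multiply` and `scale_with_zero_store` are hypothetical and exist only for this illustration. Under IEEE 754 arithmetic, NaN times zero is still NaN, so scaling by beta=0 through multiplication leaves a NaN in place, while storing an immediate zero does not depend on the previous contents.

```c
#include <math.h>

/* Hypothetical helper: scales c[0..n) by beta using multiplication only.
 * If c[i] is NaN, c[i] * 0.0f is still NaN, so beta == 0 does not clear it. */
static void scale_by_multiply(float *c, int n, float beta) {
    for (int i = 0; i < n; i++) {
        c[i] *= beta;
    }
}

/* Hypothetical helper: treats beta == 0 as "overwrite with zero",
 * so the previous contents (NaN or not) never enter the result. */
static void scale_with_zero_store(float *c, int n, float beta) {
    if (beta == 0.0f) {
        for (int i = 0; i < n; i++) {
            c[i] = 0.0f;
        }
        return;
    }
    for (int i = 0; i < n; i++) {
        c[i] *= beta;
    }
}
```

Calling both helpers with beta=0 on a buffer that holds a NaN leaves the NaN in place in the first case and yields a clean zero in the second.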
Martin Kroeker
567d2760e6
Merge pull request #2520 from wjc404/develop
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
5 years ago
wjc404
b8307768e2
Add files via upload
5 years ago
Martin Kroeker
af8a619e1f
Merge pull request #2517 from wjc404/develop
Temporary fix for SKX STRSM
5 years ago
wjc404
62b9608986
Update KERNEL.SKYLAKEX
5 years ago
Martin Kroeker
a1b181cea2
Merge pull request #2516 from wjc404/develop
AVX2 STRSM kernels
5 years ago
wjc404
cdc0e9011e
Update KERNEL.ZEN
5 years ago
wjc404
fa049d49c2
AVX2 STRSM kernel
5 years ago
s00548429
bec7923a0d
Fix the functional bugs for zamax.
5 years ago
Rajalakshmi Srinivasaraghavan
2afc074803
Fix DYNAMIC_ARCH build for POWER9
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks. This patch fixes some of the macros that are used
to check compiler version. On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
5 years ago
Martin Kroeker
4f371b0fbf
Use POWER8 kernels on big-endian POWER9 for now
6 years ago
Martin Kroeker
ea8eec5d17
Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
6 years ago
Ali Saidi
c623a965f9
Add Neoverse-N1 core
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a
6 years ago
wjc404
dd22eb7621
Update cgemm_kernel_8x2_haswell.c
6 years ago
wjc404
2352331e60
Update zgemm_kernel_4x2_haswell.c
6 years ago
wjc404
1b980001dd
Update zgemm_kernel_4x2_haswell.c
6 years ago
wjc404
2515e1152f
Update cgemm_kernel_8x2_haswell.c
6 years ago
Martin Kroeker
ddcbed6690
Merge pull request #2437 from martin-frbg/issue2434
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
6 years ago
wjc404
903854c168
Add files via upload
6 years ago
wjc404
a2ff577a30
Update KERNEL.ZEN
6 years ago
wjc404
97a32cb0a5
Update KERNEL.HASWELL
6 years ago
Martin Kroeker
07454bf4d5
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
6 years ago
Martin Kroeker
4046985913
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
6 years ago
Martin Kroeker
e57b11acca
Add preliminary support for EMAG8180
6 years ago
Martin Kroeker
0b39cf95b0
Fix endianness conditionals
6 years ago
Martin Kroeker
9f39f0a2c3
Specify ismin/ismax assembly kernels for POWER8 directly
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
6 years ago
Martin Liska
aeea14ee40
Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S.
6 years ago
Martin Liska
18bcc36a69
Fix implementation of iamax_sse.S as reported in #2116.
There was a typo in iamax_sse.S where one of the comparisons
was cmpeqps instead of cmpeqss. That misdetected the index
for sequences where the minimum value was 0.
6 years ago
Martin Liska
0e7f43c898
Add missing USE_MIN in kernel/CMakeLists.txt.
6 years ago
wjc404
f566787e6e
Update KERNEL.SKYLAKEX
6 years ago
wjc404
e3368cbf18
AVX512 STRMM kernel
6 years ago
Martin Kroeker
cafdd999b8
Update caxpy_power8.S
6 years ago
Martin Kroeker
92ca92a46c
Update caxpy_power8.S
6 years ago
Martin Kroeker
486c35c5dc
Update icamin_power8.S
6 years ago
Martin Kroeker
5ba3699f41
Update isamin_power8.S
6 years ago
Martin Kroeker
8eefa530cd
Update isamax_power8.S
6 years ago
Martin Kroeker
de40d47edf
Update isamin_power8.S
6 years ago
Martin Kroeker
7c162b8a21
Update isamax_power8.S
6 years ago
Martin Kroeker
0544cbc806
Fix syntax of endianness conditional
6 years ago
Martin Kroeker
120d20731f
Fix syntax of endianness conditional
6 years ago
Martin Kroeker
dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround
6 years ago