Martin Kroeker
28e96458e5
Replace vpermpd with vpermilpd
to improve performance on Zen/Zen2 (as demonstrated by wjc404 in #2180)
6 years ago
Martin Kroeker
6b6c9b1441
Merge pull request #2172 from quickwritereader/develop
power9 cgemm/ctrmm. new sgemm 8x16
6 years ago
AbdelRauf
a97b301aaa
cgemm/ctrmm power9
6 years ago
Piotr Kubaj
eebfeba768
Fix build on FreeBSD/powerpc64.
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
6 years ago
kavanabhat
a575f1e4c7
Update dtrmm_kernel_16x4_power8.S
6 years ago
AbdelRauf
cdbfb891da
new sgemm 8x16
6 years ago
Martin Kroeker
a17cf36225
Merge pull request #2153 from quickwritereader/develop
improved power9 zgemm,sgemm
6 years ago
AbdelRauf
148c4cc5fd
conflict resolve
6 years ago
AbdelRauf
d0c3543c3f
power9 zgemm ztrmm optimized
6 years ago
AbdelRauf
a469b32cf4
sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
6 years ago
AbdelRauf
8fe794f059
improved zgemm power9 based on power8
6 years ago
Martin Kroeker
74c10b57c6
Use generic kernels for complex (I)AMAX to support softfp
6 years ago
Martin Kroeker
c5495d2056
Ensure correct output for DAMAX with softfp
6 years ago
Martin Kroeker
c70496b108
Separate implementations of AMAX and IAMAX on arm
As noted in #1912 and in a comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp, where they would have to share the return register
6 years ago
Martin Kroeker
9ea30f3788
Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125)
* Mark iamax_sse.S as unsuitable for MIN due to issue #2116
* Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116
6 years ago
Martin Kroeker
6a8b4269b5
Merge pull request #2111 from martin-frbg/issue1955
Disable the SkyLakeX DGEMMIxCOPY kernels as well
6 years ago
Martin Kroeker
b1561ecc68
Disable DGEMMINCOPY as well for now
#1955
6 years ago
Martin Kroeker
7ed8431527
Disable the SkyLakeX DGEMMITCOPY kernel as well
as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955
6 years ago
Martin Kroeker
3f427c0cf9
Merge pull request #2107 from quickwritereader/develop
sgemm/strmm kernel for power9
6 years ago
AbdelRauf
47f892198c
conflict resolve
6 years ago
AbdelRauf
628b335e83
Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop
6 years ago
AbdelRauf
0f105dd8a5
sgemm/strmm
6 years ago
Martin Kroeker
ccfb7ead15
Merge pull request #2072 from martin-frbg/sum
Add (C)BLAS extension ?sum
6 years ago
Rashmica Gupta
bcdf1d4917
Add in runtime CPU detection for POWER.
6 years ago
Martin Kroeker
c04a729081
Add ?sum definitions for generic kernel
6 years ago
Martin Kroeker
100d94f94e
Add ?sum
6 years ago
Martin Kroeker
246ca29679
Add ZARCH implementation of ?sum
as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
6 years ago
Martin Kroeker
9d717cb5ee
Add x86_64 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
6 years ago
Martin Kroeker
e3bc83f2a8
Add x86 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
6 years ago
Martin Kroeker
70f2a4e0d7
Add SPARC implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure
6 years ago
Martin Kroeker
706dfe263b
Add POWER implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
6 years ago
Martin Kroeker
688fa9201c
Add MIPS64 implementation of ?sum
as trivial copy of ?asum with the fabs replaced by mov to preserve code structure
6 years ago
Martin Kroeker
cdbe0f0235
Add MIPS implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
6 years ago
Martin Kroeker
f8b82bc6dc
Add ia64 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
6 years ago
Martin Kroeker
3e3ccb9011
Add ARM64 implementations of ?sum
as trivial copies of the respective ?asum kernels with the fabs calls removed
6 years ago
Martin Kroeker
94ab4e6fb2
Add ARM implementations of ?sum
(trivial copies of the respective ?asum with the fabs calls removed)
6 years ago
Martin Kroeker
c3cfc6986b
Add implementations of ssum/dsum and csum/zsum
as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure
6 years ago
Martin Kroeker
b9f4943a14
Add ?sum
6 years ago
Martin Kroeker
32c7063cb0
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
Disable the AVX512 DGEMM kernel (again)
6 years ago
Martin Kroeker
7c51cc8527
Merge branch 'develop' into develop
6 years ago
AbdelRauf
853a18bc17
power9 makefile. dgemm based on power8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). Improvement from 17 to 22-23 GFLOPS. dtrmm cases were added into dgemm itself
6 years ago
Martin Kroeker
e608d4f7fe
Disable the AVX512 DGEMM kernel (again)
Due to as yet unresolved errors seen in #1955 and #2029
6 years ago
Martin Kroeker
03d7110900
Merge pull request #2042 from maomao194313/develop
add TARGET support for HiSilicon tsv110 CPUs
6 years ago
Martin Kroeker
f18ab6c17b
Merge pull request #2051 from martin-frbg/issue2048
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
6 years ago
Martin Kroeker
5b95534afc
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
for issue #2048
6 years ago
Celelibi
b7f59da42d
Fix crash in sgemm SSE/nano kernel on x86_64
Fix bug #2047.
Signed-off-by: Celelibi <celelibi@gmail.com>
7 years ago
maomao194313
783ba8058f
HiSilicon tsv110 CPUs optimization branch
add HiSilicon tsv110 CPUs optimization branch
7 years ago
Andrew
6eee1beac5
move fix to right place
7 years ago
Martin Kroeker
e12cdf58ef
Merge pull request #2024 from martin-frbg/gcc9fixes4
Fix inline assembly constraints in Bulldozer TRSM kernels
7 years ago
Martin Kroeker
1860c9456d
Merge pull request #2023 from martin-frbg/gcc9fixes3
Fix inline assembly constraints in various x86_64 GEMVN kernels
7 years ago