Martin Kroeker
dccff2e785
Merge pull request #2206 from martin-frbg/zen-dtrmm
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
6 years ago
Martin Kroeker
5c3458a6e7
Merge pull request #2199 from martin-frbg/zen-dtrsm
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
6 years ago
Martin Kroeker
acf6002ab2
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
6 years ago
Martin Kroeker
2dfb804cb9
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
to improve performance on AMD Zen (#2180), applying wjc404's improvement of the DGEMM kernel from #2186
6 years ago
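The swap works under one condition, which this minimal sketch illustrates. It assumes the shuffle being replaced only exchanges neighbouring doubles; the actual immediates in the DTRMM kernel are not shown in this log. vpermpd (`_mm256_permute4x64_pd`, AVX2) can move elements across the two 128-bit lanes of a ymm register. vpermilpd (`_mm256_permute_pd`, AVX) only shuffles within each lane. When a pattern never crosses a lane, the two instructions give identical results, so the in-lane form can be used. Zen 1 executes 256-bit operations as two 128-bit halves, which is the usual explanation for lane-crossing shuffles being comparatively expensive there. The helper names are hypothetical.

```c
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* vpermpd (AVX2): may move elements across the two 128-bit lanes. */
__attribute__((target("avx2")))
static void swap_pairs_vpermpd(const double *in, double *out)
{
    __m256d v = _mm256_loadu_pd(in);
    /* imm 0xB1 = 0b10'11'00'01 selects source elements {1,0,3,2} */
    _mm256_storeu_pd(out, _mm256_permute4x64_pd(v, 0xB1));
}

/* vpermilpd (AVX): permutes only within each 128-bit lane. */
__attribute__((target("avx")))
static void swap_pairs_vpermilpd(const double *in, double *out)
{
    __m256d v = _mm256_loadu_pd(in);
    /* imm 0x5 = 0b0101: in each lane, take element 1 first, then element 0 */
    _mm256_storeu_pd(out, _mm256_permute_pd(v, 0x5));
}
#endif

/* Returns 1 if both forms agree, 0 if they differ, and -1 if the check
   cannot run here (non-x86 target or no AVX2 at runtime). */
int check_inlane_swap(void)
{
#if defined(__x86_64__) || defined(__i386__)
    if (!__builtin_cpu_supports("avx2"))
        return -1;
    const double in[4] = {1.0, 2.0, 3.0, 4.0};
    double a[4], b[4];
    swap_pairs_vpermpd(in, a);
    swap_pairs_vpermilpd(in, b);
    return memcmp(a, b, sizeof a) == 0;
#else
    return -1;
#endif
}
```

On hardware without AVX2, or on non-x86 targets, `check_inlane_swap` returns -1 instead of running the comparison.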
Martin Kroeker
4c153ec9da
Merge pull request #2196 from wjc404/develop
Add vbroadcastsd kernel to dgemm_kernel_4x8_haswell.S
6 years ago
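This scalar model shows what a broadcast-based 4x8 DGEMM micro-kernel computes. The packed layouts for the A and B panels are hypothetical; the real packing used by `dgemm_kernel_4x8_haswell.S` is not described here. For each k, the model loads one 4-wide vector of A. It then broadcasts each of the eight B values (the job vbroadcastsd would do) and multiply-adds it into its own accumulator column. The function name and layout are illustrative assumptions.

```c
#include <stddef.h>

/* Scalar model of a broadcast-style 4x8 DGEMM micro-kernel.
   Assumed (hypothetical) packing:
     A panel: for each k, 4 consecutive doubles  a[k*4 + i], i = 0..3
     B panel: for each k, 8 consecutive doubles  b[k*8 + j], j = 0..7
   C is a 4x8 block, column-major, with leading dimension ldc. */
void dgemm_4x8_broadcast_sketch(size_t K, double alpha,
                                const double *a, const double *b,
                                double *c, size_t ldc)
{
    double acc[8][4] = {{0}};            /* one 4-wide accumulator per column */
    for (size_t k = 0; k < K; k++) {
        const double *av = a + k * 4;    /* models a 4-double vector load of A */
        for (size_t j = 0; j < 8; j++) {
            double bj = b[k * 8 + j];    /* the scalar a broadcast would splat */
            for (size_t i = 0; i < 4; i++)
                acc[j][i] += av[i] * bj; /* one lane of a fused multiply-add */
        }
    }
    for (size_t j = 0; j < 8; j++)
        for (size_t i = 0; i < 4; i++)
            c[j * ldc + i] += alpha * acc[j][i];
}
```

In a vectorised version, each `acc[j]` would plausibly map to one ymm register, so eight accumulators would cover the 4x8 tile.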
wjc404
7eecd8e39c
Add files via upload
6 years ago
Martin Kroeker
7b0b7c11d2
Merge pull request #2190 from martin-frbg/zdot-zen
Replace vpermpd with vpermilpd in the Haswell/Zen zdot microkernel
6 years ago
Martin Kroeker
28e96458e5
Replace vpermpd with vpermilpd
to improve performance on Zen/Zen2 (as demonstrated by wjc404 in #2180)
6 years ago
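This scalar sketch shows why a complex dot product needs a neighbour swap at all. It assumes interleaved (re, im) storage, as in BLAS complex vectors. Half of the products pair each real part of x with the imaginary part of y, and each imaginary part of x with the real part of y. That is the same as multiplying by y with each (re, im) pair swapped. The swap stays inside a 128-bit lane, so vpermilpd can perform it. The accumulator split below is an assumption about the kernel's structure, not a transcription of it, and `zdot_sketch` is a hypothetical name.

```c
#include <stddef.h>

/* Complex dot product over interleaved (re, im) doubles, organised the way a
   SIMD kernel might: one accumulator pair for x*y, one for x*swap(y). */
void zdot_sketch(size_t n, const double *x, const double *y, int conjugate,
                 double *res_re, double *res_im)
{
    double a0 = 0, a1 = 0;  /* sum xr*yr, sum xi*yi  (y as loaded)   */
    double b0 = 0, b1 = 0;  /* sum xr*yi, sum xi*yr  (y pair-swapped) */
    for (size_t i = 0; i < n; i++) {
        double xr = x[2 * i], xi = x[2 * i + 1];
        double yr = y[2 * i], yi = y[2 * i + 1];
        a0 += xr * yr;  a1 += xi * yi;   /* elementwise x * y       */
        b0 += xr * yi;  b1 += xi * yr;   /* elementwise x * swap(y) */
    }
    if (!conjugate) {        /* x . y       = (a0 - a1) + i(b0 + b1) */
        *res_re = a0 - a1;
        *res_im = b0 + b1;
    } else {                 /* conj(x) . y = (a0 + a1) + i(b0 - b1) */
        *res_re = a0 + a1;
        *res_im = b0 - b1;
    }
}
```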
wjc404
95fb98f556
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
4801c6d36b
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
9440fa607d
Add files via upload
6 years ago
wjc404
94db259e5b
Add files via upload
6 years ago
wjc404
f49f8047ac
Add files via upload
6 years ago
wjc404
825777faab
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
9c89757562
Add files via upload
6 years ago
wjc404
9b04baeaee
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
8a074b3965
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
211ab03b14
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
1733f927e6
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
182b06d6ad
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
7a9050d681
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
0ba29fd262
Update dgemm_kernel_4x8_haswell.S for zen2
replaced a bunch of vpermpd instructions with vpermilpd and vperm2f128
6 years ago
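When a shuffle really does cross the lane boundary, vperm2f128 can take over part of the work. This sketch assumes the patterns involved are a half swap and a full reversal; the commit does not list them. `_mm256_permute2f128_pd` with immediate 0x01 exchanges the two 128-bit halves. Following it with the in-lane `_mm256_permute_pd(v, 0x5)` reverses all four doubles. Together they match vpermpd with immediates 0x4E and 0x1B. Which sequence is faster on a given core is a benchmarking question this sketch does not answer. The helper names are hypothetical.

```c
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

__attribute__((target("avx2")))
static void with_vpermpd(const double *in, double *swap, double *rev)
{
    __m256d v = _mm256_loadu_pd(in);
    _mm256_storeu_pd(swap, _mm256_permute4x64_pd(v, 0x4E)); /* {2,3,0,1} */
    _mm256_storeu_pd(rev,  _mm256_permute4x64_pd(v, 0x1B)); /* {3,2,1,0} */
}

__attribute__((target("avx")))
static void with_vperm2f128(const double *in, double *swap, double *rev)
{
    __m256d v = _mm256_loadu_pd(in);
    /* imm 0x01: low half <- high half of v, high half <- low half of v */
    __m256d halves = _mm256_permute2f128_pd(v, v, 0x01);
    _mm256_storeu_pd(swap, halves);
    /* swapping neighbours inside each lane then completes the reversal */
    _mm256_storeu_pd(rev, _mm256_permute_pd(halves, 0x5));
}
#endif

/* 1 = both sequences agree, 0 = mismatch, -1 = cannot run here. */
int check_crosslane(void)
{
#if defined(__x86_64__) || defined(__i386__)
    if (!__builtin_cpu_supports("avx2"))
        return -1;
    const double in[4] = {1.0, 2.0, 3.0, 4.0};
    double s1[4], r1[4], s2[4], r2[4];
    with_vpermpd(in, s1, r1);
    with_vperm2f128(in, s2, r2);
    return memcmp(s1, s2, sizeof s1) == 0 && memcmp(r1, r2, sizeof r1) == 0;
#else
    return -1;
#endif
}
```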
Martin Kroeker
9ea30f3788
Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125)
* Mark iamax_sse.S as unsuitable for MIN due to issue #2116
* Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116
6 years ago
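This minimal sketch pins down the results ISMIN and ISAMIN are expected to return. It assumes the usual BLAS I?AMAX conventions carry over: a 1-based index, the first occurrence wins, and 0 is returned when n < 1 or incx < 1. The details of #2116 are not recorded in this log; the sketch only shows the semantics a MIN kernel must reproduce. ISMIN compares signed values, while ISAMIN compares absolute values. `ismin_ref` and `isamin_ref` are hypothetical names, and NaN handling is deliberately left out.

```c
#include <math.h>

/* Shared search loop: 1-based index of the first minimum, 0 for empty input. */
static int imin_impl(int n, const float *x, int incx, int use_abs)
{
    if (n < 1 || incx < 1)
        return 0;
    float best = use_abs ? fabsf(x[0]) : x[0];
    int best_i = 1;
    for (int i = 1; i < n; i++) {
        float v = x[(long)i * incx];
        if (use_abs)
            v = fabsf(v);
        if (v < best) {        /* strict '<' keeps the first occurrence */
            best = v;
            best_i = i + 1;    /* BLAS indices are 1-based */
        }
    }
    return best_i;
}

int ismin_ref(int n, const float *x, int incx)  { return imin_impl(n, x, incx, 0); }
int isamin_ref(int n, const float *x, int incx) { return imin_impl(n, x, incx, 1); }
```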
Martin Kroeker
b1561ecc68
Disable DGEMMINCOPY as well for now
#1955
6 years ago
Martin Kroeker
7ed8431527
Disable the SkyLakeX DGEMMITCOPY kernel as well
as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955
6 years ago
Martin Kroeker
c04a729081
Add ?sum definitions for generic kernel
6 years ago
Martin Kroeker
9d717cb5ee
Add x86_64 implementation of ?sum
as a trivial copy of ?asum with the fabs calls removed
6 years ago
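The relationship between the two routines can be sketched in C for the double-precision case. `dasum_ref` and `dsum_ref` are hypothetical reference versions, not the x86_64 kernels themselves. As the commit message describes, the only difference is the `fabs` call.

```c
#include <math.h>

/* ?asum: sum of |x[i]| over n elements with stride incx. */
double dasum_ref(int n, const double *x, int incx)
{
    double s = 0.0;
    if (n < 1 || incx < 1)
        return 0.0;
    for (int i = 0; i < n; i++)
        s += fabs(x[(long)i * incx]);
    return s;
}

/* ?sum: the same loop with the fabs call removed, so signs are kept. */
double dsum_ref(int n, const double *x, int incx)
{
    double s = 0.0;
    if (n < 1 || incx < 1)
        return 0.0;
    for (int i = 0; i < n; i++)
        s += x[(long)i * incx];
    return s;
}
```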
Martin Kroeker
32c7063cb0
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
Disable the AVX512 DGEMM kernel (again)
6 years ago
Martin Kroeker
e608d4f7fe
Disable the AVX512 DGEMM kernel (again)
Due to as yet unresolved errors seen in #1955 and #2029
6 years ago
Celelibi
b7f59da42d
Fix crash in sgemm SSE/nano kernel on x86_64
Fix bug #2047.
Signed-off-by: Celelibi <celelibi@gmail.com>
7 years ago
Andrew
6eee1beac5
move fix to right place
7 years ago
Martin Kroeker
e12cdf58ef
Merge pull request #2024 from martin-frbg/gcc9fixes4
Fix inline assembly constraints in Bulldozer TRSM kernels
7 years ago
Martin Kroeker
1860c9456d
Merge pull request #2023 from martin-frbg/gcc9fixes3
Fix inline assembly constraints in various x86_64 GEMVN kernels
7 years ago
Martin Kroeker
f9bb76d29a
Fix inline assembly constraints in Bulldozer TRSM kernels
rework indices to allow marking i, as and bs as both input and output (operand n1 marked as well, for simplicity). For #2009
7 years ago
Martin Kroeker
efb9038f72
Fix inline assembly constraints
7 years ago
Martin Kroeker
e976557d29
Fix inline assembly constraints
rework indices to allow marking argument lda as input and output.
7 years ago
Martin Kroeker
9d8be15789
Fix inline assembly constraints
rework indices to allow marking argument lda4 as input and output. For #2009
7 years ago
Martin Kroeker
d752799a0f
Merge pull request #2021 from martin-frbg/gcc9fixes2
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
7 years ago
Martin Kroeker
c26c0b77a7
Fix wrong constraints in inline assembly
for #2009
7 years ago
Martin Kroeker
1c6da2d03c
Merge pull request #2019 from martin-frbg/gcc9fixes
Fix unannounced modification of input operand 8 (lda4) in Haswell GEMVN microkernel
7 years ago
Martin Kroeker
4255a58cd2
Rename operands to put lda on the input/output constraint list
7 years ago
Martin Kroeker
46e415b140
Save and restore input argument 8 (lda4)
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
7 years ago
Bart Oldeman
69a97ca7b9
dgemv_kernel_4x4 (Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2
See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
7 years ago
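This minimal GCC/Clang extended-asm sketch shows what a missing clobber means, assuming an x86_64 target. The asm below uses xmm0 as scratch and overwrites it, so "xmm0" must appear in the clobber list. Without that entry, the compiler is free to keep an unrelated live value in xmm0 across the statement. That is the kind of silent corruption that only shows up at certain optimization settings. `add_via_xmm0` is a hypothetical example, not the dgemv kernel.

```c
/* Adds two doubles using xmm0 as a scratch register (x86_64 only). */
double add_via_xmm0(double a, double b)
{
#if defined(__x86_64__)
    double r;
    __asm__ (
        "movsd %1, %%xmm0\n\t"   /* xmm0 = a        */
        "addsd %2, %%xmm0\n\t"   /* xmm0 += b       */
        "movsd %%xmm0, %0\n\t"   /* r = xmm0        */
        : "=m"(r)
        : "m"(a), "m"(b)
        : "xmm0");               /* declare the overwritten register */
    return r;
#else
    return a + b;                /* portable fallback */
#endif
}
```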
Martin Kroeker
ab1630f9fa
Fix declaration of arguments in inline assembly
Argument 0 is modified, so it should be declared as both input and output
7 years ago
Martin Kroeker
b824fa70eb
Fix declaration of assembly arguments in SSYMV and DSYMV microkernels
Arguments 0 and 1 are both input and output
7 years ago
Martin Kroeker
91481a3e4e
Fix declaration of input arguments in inline assembly
Argument 0 is modified as it doubles as a counter
7 years ago
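This minimal sketch shows the operand-declaration rule behind this group of commits, assuming GCC/Clang extended asm on x86_64. Any operand that the asm modifies, such as a pointer it advances or a count it decrements, must be declared with the "+r" (read-write) constraint. If it is declared as a plain input ("r"), the compiler may assume the variable still holds its old value afterwards and reuse it. That is the class of miscompilation newer gcc versions exposed. `sum_i64` is a hypothetical helper, not an OpenBLAS kernel.

```c
#include <stdint.h>

/* Sums n 64-bit integers. The asm advances p and counts n down to zero, so
   both are read-write operands ("+r"), just like the accumulator. */
int64_t sum_i64(const int64_t *p, int64_t n)
{
    int64_t acc = 0;
#if defined(__x86_64__)
    if (n <= 0)
        return 0;
    __asm__ (
        "1:\n\t"
        "addq (%1), %0\n\t"      /* acc += *p              */
        "addq $8, %1\n\t"        /* p++ (8-byte elements)  */
        "decq %2\n\t"            /* n-- doubles as counter */
        "jnz 1b\n\t"
        : "+r"(acc), "+r"(p), "+r"(n)
        :
        : "cc", "memory");       /* flags change; memory at p is read */
#else
    for (int64_t i = 0; i < n; i++)
        acc += p[i];
#endif
    return acc;
}
```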
Martin Kroeker
dc6ac9eab0
Fix declaration of input arguments in the x86_64 s/dGEMV_T and s/dGEMV_N kernels
Arguments 0 and 1 need to be tagged as both input and output
7 years ago
Martin Kroeker
32b0f1168e
Fix declaration of input arguments in the Sandybridge GER microkernels (#1967)
* Tag arguments 0 and 1 as both input and output
7 years ago
Martin Kroeker
b495e54310
Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966)
* Tag arguments 0 and 1 as both input and output (see #1964)
7 years ago
Martin Kroeker
d5e6940253
Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965)
* Tag operands 0 and 1 as both input and output
For #1964 (basically a continuation of coding problems first seen in #1292)
7 years ago