Martin Kroeker
f9c5023e04
Merge pull request #1994 from quickwritereader/develop
sgemv cgemv pairs
7 years ago
Ubuntu
4abc375a91
sgemv cgemv pairs
7 years ago
Martin Kroeker
874df65491
Fix incorrect sgemv results for IBM z14
part of PR #1993 that was inadvertently misplaced into the toplevel directory
7 years ago
Martin Kroeker
877023e1e1
Fix precision of zarch DSDOT
from patch provided by aarnez in #991
7 years ago
Martin Kroeker
265142edd5
Fix typo in the zarch min/max kernels
from patch provided by aarnez in #991
7 years ago
Martin Kroeker
885a3c4350
USE_TRMM on Z14
from patch provided by aarnez in #991
7 years ago
maamountki
82124729af
Merge branch 'develop' into z14
7 years ago
maamountki
29416cb5a3
[ZARCH] Add Z13 version for max/min functions
7 years ago
maamountki
48b9b94f7f
[ZARCH] Improve loading performance for camax/icamax
7 years ago
Martin Kroeker
86a824c97f
Fix wrong comparison that made IMIN identical to IMAX
as reported by aarnez in #1990
7 years ago
Martin Kroeker
808410c2c7
Fix wrong comparison that made IMIN identical to IMAX
as suggested in #1990
7 years ago
maamountki
fcd814a8d2
[ZARCH] Fix bug in max/min functions
7 years ago
maamountki
dc4d3bccd5
[ZARCH] Fix icamax/icamin
7 years ago
maamountki
c7143c1019
[ZARCH] Fix iamax/imax single precision
7 years ago
maamountki
04873bb174
[ZARCH] Undo the last commit
7 years ago
maamountki
c8ef9fb220
[ZARCH] Fix bug in iamax/iamin/imax/imin
7 years ago
maamountki
b111829226
[ZARCH] Update max/min functions
7 years ago
Martin Kroeker
32b0f1168e
Fix declaration of input arguments in the Sandybridge GER microkernels (#1967)
* Tag arguments 0 and 1 as both input and output
7 years ago
Martin Kroeker
b495e54310
Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966)
* Tag arguments 0 and 1 as both input and output (see #1964)
7 years ago
Martin Kroeker
d5e6940253
Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965)
* Tag operands 0 and 1 as both input and output
For #1964 (basically a continuation of coding problems first seen in #1292)
7 years ago
Ubuntu
43a4572038
crot fix
7 years ago
Abdelrauf
a034e65512
Merge branch 'develop' into develop
7 years ago
Ubuntu
8c3386be87
Added missing BLAS1 single fp {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin},
Fixed idamin, icamin choosing the first occurrence index of equal minima
7 years ago
maamountki
b815a04c87
[ZARCH] fix a bug in max/min functions
7 years ago
maamountki
1a7925b3a3
[ZARCH] Update dgemv_n_4.c
7 years ago
maamountki
406f835f00
[ZARCH] update cgemv_n_4.c
7 years ago
maamountki
621dedb37b
[ZARCH] Update cgemv_t_4.c
7 years ago
maamountki
b731e8246f
Update sgemv_t_4.c
7 years ago
maamountki
ecc31b743f
Update dgemv_t_4.c
7 years ago
maamountki
5d89d6b143
[ZARCH] fix sgemv_n_4.c
7 years ago
maamountki
67432b23c2
[ZARCH] fix cgemv_n_4.c
7 years ago
maamountki
be66f5d5c2
[ZARCH] fix data prefetch type in sdot
7 years ago
maamountki
c2ffef8156
[ZARCH] fix data prefetch type in ddot
7 years ago
maamountki
e7455f500c
[ZARCH] fix dsdot.c
7 years ago
maamountki
3eafcfa650
[ZARCH] fix cgemv_n_4.c
7 years ago
maamountki
94cd946b96
[ZARCH] fix cgemv_n_4.c
7 years ago
maamountki
1aa840a0a2
[ZARCH] fix sgemv_t_4.c
7 years ago
Arjan van de Ven
795285c587
Fix thinko in skylake beta handling
casting ints is cheaper, but it has a rounding effect, not a memory casting effect, resulting in
an invalid outcome
7 years ago
Arjan van de Ven
d321448a63
dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell
The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives
a nice performance boost for medium sized matrices
7 years ago
Arjan van de Ven
c43331ad0a
dgemm: Use the skylakex beta function also for haswell
it's more efficient for certain tall/skinny matrices
7 years ago
Martin Kroeker
c4e23dd016
Update Makefile
7 years ago
Martin Kroeker
cfc4acc221
typo
7 years ago
Martin Kroeker
545c2b1bbb
Add -mavx2 on Haswell only if the compiler supports it
7 years ago
Arjan van de Ven
69d206440a
Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support
7 years ago
Martin Kroeker
3843e3e017
use -mavx2 on haswell
7 years ago
Martin Kroeker
fbcb14a74b
should be core-avx2
7 years ago
Martin Kroeker
2a3190dc76
fix elseifeq and use older option core2-avx for compatibility
7 years ago
Martin Kroeker
1ebe5c0f49
Add -march=haswell to HASWELL part of DYNAMIC_ARCH build
7 years ago
Arjan van de Ven
0586899a10
Use sgemm_ncopy_4_skylakex.c also for Haswell
sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the
real perf win happens; this also works great for Haswell.
This gives double digit percentage gains on small and skinny matrices
7 years ago
Arjan van de Ven
00dc09ad19
Use the skylake sgemm beta code also for haswell
with a few small changes it's possible to use the skylake sgemm code
also for haswell; this gives a modest gain (10% range) for smallish
matrices but does wonders for very skinny matrices
7 years ago
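
Several of the commits listed above concern the index-of-extremum kernels: a wrong comparison made IMIN behave identically to IMAX, and idamin/icamin were fixed with respect to choosing the first occurrence among equal minima. The following is a minimal scalar sketch of that behaviour, not the code from any of these commits. The helper names `abs_d`, `ref_idamax` and `ref_idamin` are hypothetical, and the sketch assumes unit stride, real double-precision input without NaNs, and the BLAS convention of a 1-based result with 0 returned for an empty vector.

```c
#include <stddef.h>

/* Hypothetical reference helpers for illustration only; these are
   not OpenBLAS kernels and assume unit stride and no NaN inputs. */

static double abs_d(double v)
{
    return v < 0.0 ? -v : v;
}

/* 1-based index of the first element with the largest absolute value;
   returns 0 when n == 0. */
size_t ref_idamax(size_t n, const double *x)
{
    if (n == 0)
        return 0;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        /* Strict '>' keeps the earliest index among equal maxima. */
        if (abs_d(x[i]) > abs_d(x[best]))
            best = i;
    }
    return best + 1;
}

/* 1-based index of the first element with the smallest absolute value;
   returns 0 when n == 0. */
size_t ref_idamin(size_t n, const double *x)
{
    if (n == 0)
        return 0;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        /* The only difference from ref_idamax is the direction of the
           comparison; using '>' here would silently turn the minimum
           search into a maximum search. Strict '<' keeps the earliest
           index among equal minima. */
        if (abs_d(x[i]) < abs_d(x[best]))
            best = i;
    }
    return best + 1;
}
```

For the input {3, -1, 4, 1, -4}, `ref_idamax` returns 3 and `ref_idamin` returns 2: both pick the earlier of the two tied elements. A vectorized kernel has to reproduce the same tie-breaking when it reduces across lanes, which is where a scalar reference like this can serve as a comparison baseline.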