Martin Kroeker
bbc30700e8
Update saxpy_microk_nehalem-2.c
7 years ago
Martin Kroeker
e8d835ea46
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
715b1f263d
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
b6f4ef5aea
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
f78531a9ec
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
af29c99c85
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
2b542d1036
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
0cfb647a57
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
0172c51829
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
c931bb8172
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
ba9f792e75
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
cd3a35ee79
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
d384880da5
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
922e448978
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
6fcb55b22f
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
2bd18c7b73
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
b13f3c3bcf
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
3f1719a98d
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
dc15f3b5a7
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
00aff05c40
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
c9078eb8b4
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
de207d10c1
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
c23c17163f
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
c18c2c9d9b
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
ca02ac724f
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
9d46f84f24
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
6008f65318
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
d94e7da701
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
7af8f34df4
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
bb16456fe1
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
2f5a7c1656
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
30a7bd8e15
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
47e2b4592e
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
a671e19dd2
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
663eef3b66
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
4e6f8fec31
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
8a6bbf5a5b
Tag operands 0 and 1 as both input and output
7 years ago
Martin Kroeker
f0dd058430
Tag operands 0 and 1 as both input and output
For #1964 (basically a continuation of coding problems first seen in #1292)
7 years ago
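The operand tagging named in the commits above can be illustrated with a minimal sketch in GCC-style extended inline assembly. It is not the code from any of these kernels: the function name `axpy_sketch`, the scalar loop body, and the operand names are all hypothetical. It assumes that the affected operands are a loop index and a remaining-element counter that the assembly modifies, which is why they are declared with the `+r` (read and write) constraint rather than as plain inputs.

```c
#include <assert.h>

/* Hypothetical scalar kernel: y[k] += alpha * x[k] for k in [0, n). */
static void axpy_sketch(long long n, const float *x, float *y, float alpha)
{
    assert(n > 0); /* the asm loop below runs at least once */
#if defined(__GNUC__) && defined(__x86_64__)
    long long i = 0;
    __asm__ __volatile__(
        "1:\n\t"
        "movss (%[x],%[i],4), %%xmm0\n\t"  /* xmm0 = x[i]            */
        "mulss %[alpha], %%xmm0\n\t"       /* xmm0 *= alpha          */
        "addss (%[y],%[i],4), %%xmm0\n\t"  /* xmm0 += y[i]           */
        "movss %%xmm0, (%[y],%[i],4)\n\t"  /* y[i] = xmm0            */
        "addq $1, %[i]\n\t"                /* the asm writes i ...   */
        "subq $1, %[n]\n\t"                /* ... and n              */
        "jnz 1b\n\t"
        /* i and n are modified, so they are tagged "+r" (input AND output).
         * Listing them as inputs only would let the compiler assume the
         * registers still hold their original values afterwards. */
        : [i] "+r"(i), [n] "+r"(n)
        : [x] "r"(x), [y] "r"(y), [alpha] "x"(alpha)
        : "cc", "memory", "xmm0");
#else
    /* Portable fallback so the sketch also builds on other targets. */
    for (long long k = 0; k < n; k++)
        y[k] += alpha * x[k];
#endif
}
```

Calling `axpy_sketch(3, x, y, 2.0f)` with `x = {1, 2, 3}` and `y = {10, 20, 30}` leaves `y` holding `{12, 24, 36}`.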
Arjan van de Ven
795285c587
Fix thinko in skylake beta handling
Casting to int is cheaper, but it rounds the value instead of reinterpreting the memory,
resulting in an invalid outcome.
7 years ago
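The distinction behind the thinko in 795285c587 can be sketched as follows. This is only an illustration of the general C behaviour, not the actual beta-handling code: the helper names `value_cast` and `bit_cast` are hypothetical, and the sketch assumes an IEEE-754 `double` of the same size as `int64_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value conversion: a C cast converts the number and drops the fraction,
 * so any beta with magnitude below 1 becomes 0. */
static int64_t value_cast(double d)
{
    return (int64_t)d;
}

/* Bit reinterpretation: copies the 8 bytes of the encoding unchanged,
 * so the result is 0 only if every bit of the double is 0. */
static int64_t bit_cast(double d)
{
    int64_t bits;
    assert(sizeof bits == sizeof d); /* assumption stated above */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

For `beta = 0.5`, `value_cast` returns 0 while `bit_cast` returns a nonzero pattern, so a zero test built on the value cast would wrongly treat 0.5 as zero.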
Arjan van de Ven
d321448a63
dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell
The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives
a nice performance boost for medium-sized matrices
7 years ago
Arjan van de Ven
c43331ad0a
dgemm: Use the skylakex beta function also for haswell
it's more efficient for certain tall/skinny matrices
7 years ago
Arjan van de Ven
69d206440a
Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support
7 years ago
Arjan van de Ven
0586899a10
Use sgemm_ncopy_4_skylakex.c also for Haswell
sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the
real perf win happens; this also works great for Haswell.
This gives double-digit percentage gains on small and skinny matrices
7 years ago
Arjan van de Ven
00dc09ad19
Use the skylake sgemm beta code also for haswell
With a few small changes it's possible to use the skylake sgemm code
also for haswell. This gives a modest gain (10% range) for smallish
matrices but does wonders for very skinny matrices.
7 years ago
Arjan van de Ven
cdc668d82b
Add a "sgemm direct" mode for small matrices
OpenBLAS has a fancy algorithm for copying the input data while laying
it out in a more CPU friendly memory layout.
This is great for large matrices; the cost of the copy is easily
amortized by the gains from the better memory layout.
But for small matrices (on CPUs that can do efficient unaligned loads) this
copy can be a net loss.
This patch adds (for SKYLAKEX initially) a "sgemm direct" mode that bypasses
the whole copy machinery for ALPHA=1/BETA=0/... standard arguments,
for small matrices only.
What is small? For the non-threaded case this has been measured to be
in the M*N*K = 28 * 512 * 512 range, while in the threaded case it's
less, around M*N*K = 1 * 512 * 512.
7 years ago
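The dispatch idea described in cdc668d82b can be sketched roughly as below. Everything here is an assumption made for illustration: the names `use_direct_path` and `sgemm_direct_sketch`, the inclusive comparison against the thresholds, and the naive column-major triple loop standing in for the real kernel. Only the ALPHA=1, BETA=0 and size conditions come from the commit message; the further conditions it elides with "..." are not modeled.

```c
#include <assert.h>

/* Thresholds quoted in the commit message (measured on SKYLAKEX). */
#define DIRECT_LIMIT_SINGLE   (28LL * 512 * 512)
#define DIRECT_LIMIT_THREADED (1LL * 512 * 512)

/* Hypothetical predicate: returns 1 if the copy-free path should be used. */
static int use_direct_path(long long m, long long n, long long k,
                           float alpha, float beta, int threaded)
{
    long long limit = threaded ? DIRECT_LIMIT_THREADED : DIRECT_LIMIT_SINGLE;
    if (alpha != 1.0f || beta != 0.0f)
        return 0;                 /* only the standard arguments qualify */
    return m * n * k <= limit;    /* inclusive bound is an assumption */
}

/* Naive stand-in for a direct kernel: C = A * B, column-major,
 * reading A and B in place without any packing/copy step. */
static void sgemm_direct_sketch(long long m, long long n, long long k,
                                const float *a, long long lda,
                                const float *b, long long ldb,
                                float *c, long long ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (long long j = 0; j < n; j++) {
        for (long long i = 0; i < m; i++) {
            float acc = 0.0f;     /* BETA=0: old contents of C are ignored */
            for (long long p = 0; p < k; p++)
                acc += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = acc; /* ALPHA=1: no extra scaling */
        }
    }
}
```

A caller would test `use_direct_path(...)` first and fall back to the regular copy-based sgemm whenever it returns 0.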
Martin Kroeker
701ea88347
Use p2align instead of align for OSX compatibility
fixes #1902
7 years ago
Andrew
19c4bdd8b3
Add return value so that freebsd system clang does not err out
7 years ago
Arjan van de Ven
dcc5d6291e
skylakex: Make the sgemm/dgemm beta code robust for a N=0 or M=0 case
In the threading code there are cases where N or M can become 0,
and the optimized beta code did not handle this well, leading
to a crash.
During the audit for the crash a few edge conditions on the if statements
were found and fixed as well.
7 years ago
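The kind of guard described in dcc5d6291e can be sketched like this. It is a plain C stand-in, not the optimized skylakex code: the function name `scale_by_beta` and its signature are hypothetical, and the column-major layout is assumed.

```c
/* Hypothetical beta step: C = beta * C for an m-by-n column-major matrix. */
static void scale_by_beta(long long m, long long n, float beta,
                          float *c, long long ldc)
{
    /* Empty work items can reach this code from the threading layer;
     * return before touching memory or computing any loop bounds. */
    if (m <= 0 || n <= 0)
        return;

    for (long long j = 0; j < n; j++) {
        float *col = c + j * ldc;
        for (long long i = 0; i < m; i++) {
            /* beta == 0 stores zeros instead of multiplying, so stale
             * NaN or Inf values in C do not propagate. */
            col[i] = (beta == 0.0f) ? 0.0f : beta * col[i];
        }
    }
}
```

With the early return in place, a call such as `scale_by_beta(0, 4, 2.0f, NULL, 0)` is a no-op rather than a crash.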
Arjan van de Ven
55b244ca0d
enable the SGEMM/SKX C based kernel
In QA the final bug was found, so now the skylakex sgemm C based kernel can
be activated.
7 years ago
Arjan van de Ven
d4bad73834
Add a C+intrinsics version of the SGEMM/skylakex kernel
for most sizes this is 1.2x to 1.4x faster than the current code
7 years ago