Bart Oldeman
5ceca1a4d8
Add sscal.c + microkernels for Haswell, Zen, Skylake and newer.
Unlike [dcz]scal, sscal still used the original GotoBLAS SSE code from scal_sse.S.
This code follows dscal as closely as possible, except for the inc_x > 1 code
for which a plain C loop is used much like the one in cscal.c, instead of an
adaptation of the SSE2 asm code of dscal.c (I tried but the performance wasn't
better than the plain C loop).
3 years ago
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsics
5 years ago
Gengxin Xie
cb3c190a3a
Implementation of dasum, sasum with AVX2 & AVX512 intrinsics
5 years ago
wjc404
fa049d49c2
AVX2 STRSM kernel
5 years ago
wjc404
97a32cb0a5
Update KERNEL.HASWELL
6 years ago
wjc404
b73bf01378
optimize AVX2 SGEMM
6 years ago
wjc404
109e18cd96
Update KERNEL.HASWELL
6 years ago
wjc404
ed9af2f7da
Update KERNEL.HASWELL
6 years ago
wjc404
c418c81224
Update KERNEL.HASWELL
6 years ago
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
6 years ago
Arjan van de Ven
d321448a63
dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell
The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives
a nice performance boost for medium sized matrices
7 years ago
Arjan van de Ven
c43331ad0a
dgemm: Use the skylakex beta function also for haswell
it's more efficient for certain tall/skinny matrices
7 years ago
Arjan van de Ven
0586899a10
Use sgemm_ncopy_4_skylakex.c also for Haswell
sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the
real perf win happens; this also works great for Haswell.
This gives double digit percentage gains on small and skinny matrices
7 years ago
Arjan van de Ven
00dc09ad19
Use the skylake sgemm beta code also for haswell
With a few small changes it's possible to use the skylake sgemm code
also for haswell; this gives a modest gain (10% range) for smallish
matrices but does wonders for very skinny matrices
7 years ago
Martin Kroeker
28c3fa8950
Add dsdot
8 years ago
Werner Saar
c8f2c5d636
added optimized trsm_kernels
10 years ago
Werner Saar
e7c969e164
added optimized dtrmm_kernel for haswell
10 years ago
Werner Saar
9bd962f655
modified haswell parameter dgemm_unroll_n
10 years ago
Werner Saar
31c9e399e9
added optimized cscal kernel for haswell
10 years ago
Werner Saar
d63034303b
added optimized zscal kernel for haswell
10 years ago
Werner Saar
02e772c7e4
added optimized dscal kernel for haswell
10 years ago
Werner Saar
1c4b0eeae3
added optimized ssymv kernels for haswell
10 years ago
Werner Saar
3814bf60d3
added optimized dsymv kernels for haswell
10 years ago
Werner Saar
6d0db0151f
added optimized zaxpy-kernels
10 years ago
Werner Saar
248c9340c3
added optimized caxpy-kernel for haswell
10 years ago
Werner Saar
fd838c75bc
add optimized cdot- and zdot-kernel for haswell
10 years ago
Werner Saar
53bb924287
added optimized saxpy- and daxpy-kernel for haswell
10 years ago
Werner Saar
701b9d7556
added optimized sdot- and ddot-kernel for HASWELL
10 years ago
wernsaar
8f100a14f2
optimized cgemv_t kernel for haswell
11 years ago
wernsaar
1a352b24e6
updated KERNEL.HASWELL
11 years ago
wernsaar
e0192a6914
bugfix in zgemv_n_4.c
11 years ago
wernsaar
baa46e4fba
added and tested optimized dgemv_n kernel for haswell
11 years ago
wernsaar
debc6d1a05
bugfix in KERNEL.HASWELL
11 years ago
wernsaar
e73a0113ec
added optimized gemv kernels
11 years ago
wernsaar
80f7786875
enabled optimized sgemv kernels for piledriver
11 years ago
wernsaar
d143f84dd2
added optimized sgemv_n kernel for haswell
11 years ago
wernsaar
11eab4c019
added optimized cgemv_n for haswell
11 years ago
wernsaar
4568d32b6b
added optimized cgemv_t kernel for haswell
11 years ago
wernsaar
8c582d362d
optimized zgemv_t_microk_haswell-2.c
11 years ago
wernsaar
11e34ddd1b
bugfix for zgemv_n_microk_haswell-2.c
11 years ago
wernsaar
58b075daef
added optimized zgemv_t kernel for haswell
11 years ago
wernsaar
dbc2eff029
disabled optimized haswell zgemv_n kernel for windows (bad rounding)
11 years ago
wernsaar
462b4885ff
added optimized zgemv_n kernel for haswell
11 years ago
wernsaar
006ef3ea01
added optimized dgemv_t kernel for haswell
11 years ago
wernsaar
60f17628cc
added optimized dgemv_n kernel for haswell
11 years ago
wernsaar
7aa43c8928
enabled optimized sgemv kernels for windows
11 years ago
wernsaar
95a8caa2f3
added optimized sgemv_t kernel
11 years ago
wernsaar
2bab92961f
enabled optimized sgemv_n kernels for windows
11 years ago
wernsaar
3fbc13eb65
modified sgemv_n for haswell
11 years ago
wernsaar
6acbafe45b
added sgemv_n microkernel for haswell
11 years ago