Werner Saar
3814bf60d3
added optimized dsymv kernels for haswell
10 years ago
Werner Saar
6d0db0151f
added optimized zaxpy-kernels
10 years ago
Zhang Xianyi
37b9033c90
Merge pull request #543 from jeromerobert/develop
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
10 years ago
Werner Saar
13889515b3
added optimized caxpy-kernel for sandybridge
10 years ago
Werner Saar
248c9340c3
added optimized caxpy-kernel for haswell
10 years ago
Werner Saar
e9f33b4ca7
added optimized caxpy-kernel for steamroller
10 years ago
Werner Saar
f5d847122a
updated caxpy_microk_bulldozer-2.c and caxpy.c
10 years ago
Jerome Robert
a4c96eca67
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
Refs #478, #482, 9798481, fd9fd42
10 years ago
Werner Saar
baa0363ea2
add optimized ddot-kernel for piledriver
10 years ago
Werner Saar
34ba66606a
add optimized daxpy-kernel for piledriver
10 years ago
Werner Saar
f615dc7603
added optimized saxpy kernel for steamroller
10 years ago
Werner Saar
331c417637
optimized saxpy for piledriver
10 years ago
Werner Saar
d7a17ad85d
optimized sdot-kernel for piledriver
10 years ago
Werner Saar
d35f6c63c2
add optimized daxpy-kernel for steamroller
10 years ago
Werner Saar
166d76e864
added optimized sdot-kernel for steamroller
10 years ago
Werner Saar
f9f127d838
added optimized ddot kernel for steamroller
10 years ago
wernsaar
62231ab337
Merge pull request #538 from wernsaar/develop
Added optimized cdot- and zdot-kernels
10 years ago
Werner Saar
3119def9a7
updated cdot and zdot
10 years ago
Werner Saar
33b332372a
add optimized cdot- and zdot-kernel for sandybridge
10 years ago
Werner Saar
fd838c75bc
add optimized cdot- and zdot-kernel for haswell
10 years ago
Werner Saar
b57a60dac8
updated cdot and zdot for piledriver
10 years ago
Werner Saar
5c51163972
added optimized cdot- and zdot-kernel for steamroller
10 years ago
Werner Saar
9299d8cfd6
added optimized cdot- and zdot-kernels for bulldozer
10 years ago
Zhang Xianyi
0a3d3b945d
Refs #535. Fix the wrong vector instruction in the sgemm Sandy Bridge kernel.
10 years ago
Werner Saar
60c6dec6e6
updated some lines for bulldozer
10 years ago
Werner Saar
47898cca35
added optimized saxpy- and daxpy-kernel for sandybridge
10 years ago
Werner Saar
53bb924287
added optimized saxpy- and daxpy-kernel for haswell
10 years ago
Werner Saar
a901b065d3
added optimized ddot-kernel for sandybridge
10 years ago
Werner Saar
3937e2a0a0
add optimized sdot-kernel for sandybridge
10 years ago
Werner Saar
9707d608d5
removed double definition line
10 years ago
Werner Saar
701b9d7556
added optimized sdot- and ddot-kernel for HASWELL
10 years ago
Zhang Xianyi
e5b96e55a7
Fix build bug for ARM64.
11 years ago
Zhang Xianyi
ea7f9dacf4
Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1.
11 years ago
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
11 years ago
Zhang Xianyi
229ce2ccd1
Add cortex-a9 and cortex-a15 targets.
11 years ago
Zhang Xianyi
41aad0407f
Merge pull request #482 from jeromerobert/develop
Allow to do gemv and ger buffer allocation on the stack
11 years ago
Werner Saar
ddf983d643
added optimizations for steamroller
11 years ago
Werner Saar
4319769b79
added target processor STEAMROLLER
11 years ago
Jerome Robert
e9d9a8eae3
Allow to do gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be large enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
11 years ago
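The stack-or-heap fallback described in the commit above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not OpenBLAS's actual code: the function scratch_sum is hypothetical, plain malloc/free stand in for blas_memory_alloc/blas_memory_free, and MAX_STACK_ALLOC is treated as a byte budget, as the commit message states. The sketch converts that byte budget into an element count before sizing the buffer, so the stack array never exceeds MAX_STACK_ALLOC bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Byte budget for the on-stack scratch buffer (assumed default, for illustration). */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/*
 * Hypothetical routine: copies x into a scratch buffer and sums it.
 * Returns 1 if the stack buffer was used, 0 if the heap was used,
 * and -1 if the heap allocation failed.
 */
int scratch_sum(const double *x, size_t n, double *out)
{
    if (n == 0) {
        *out = 0.0;
        return 1;
    }

    /* Compare element counts derived from the byte budget, not raw bytes. */
    int on_stack = n <= MAX_STACK_ALLOC / sizeof(double);

    /* C99 VLA; keep at least one element so the declaration is always valid. */
    double stack_buf[on_stack ? n : 1];
    double *buf = on_stack ? stack_buf : malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;

    memcpy(buf, x, n * sizeof *buf);

    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    *out = s;

    /* Only the heap path owns memory that must be released. */
    if (!on_stack)
        free(buf);
    return on_stack;
}
```

With the assumed 2048-byte budget, up to 256 doubles fit on the stack and anything larger falls back to the heap, which avoids the lock-protected allocator for small problems, as the commit intends.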
Werner Saar
587e16fba3
Ref #458: Backport, sandybridge uses nehalem zgemm kernel
11 years ago
Werner Saar
6261342de3
small optimization on dgemm_kernel for N=1
11 years ago
Werner Saar
bc5fff7085
changed inline assembler labels to short form
11 years ago
Zhang Xianyi
0cf29ba6d2
Fixed a bug in the sgemm Sandy Bridge kernel.
Reported by the Julia project. JuliaLang/julia#9084
11 years ago
Zhang Xianyi
2fb02626da
Update organization info.
11 years ago
Zhang Xianyi
a85c2785ae
Refs #467. Added generic kernel file for x86_64.
11 years ago
Benedikt Huber
58c90d5937
Optimizations for APM's xgene-1 (aarch64).
1) General system updates to better support ARMv8. Previously, make all did not work; one needed to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C. Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.
Added Dave Nuechterlein to the contributors list.
11 years ago
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
11 years ago
wernsaar
b7c9566eea
removed obsolete gemv kernel files
11 years ago
wernsaar
6df1b0be81
optimized zgemv_n_microk_sandy-4.c
11 years ago
wernsaar
2ac1e076c1
added optimized zgemv_n kernel for sandybridge
11 years ago