wernsaar
cf5544b417
optimization for small size
11 years ago
wernsaar
d143f84dd2
added optimized sgemv_n kernel for haswell
11 years ago
wernsaar
7794237475
undef WHEREAMI
11 years ago
wernsaar
a64fe9bcc9
added optimized sgemv_n kernel for sandybridge
11 years ago
wernsaar
2021d0f9d6
experimentally removed expensive function calls
11 years ago
wernsaar
6df7a88930
optimized sgemv_t for sandybridge
11 years ago
wernsaar
53de943690
bugfix for sgemv_n_4.c
11 years ago
wernsaar
7f910010a0
optimized sgemv_n kernel for small sizes
11 years ago
wernsaar
3a5d8dbff9
optimized sgemv_n_4.c
11 years ago
wernsaar
2a60c6d4b0
optimized sgemv_n for small sizes
11 years ago
wernsaar
0fc560ba23
bugfix for buffer overflow
11 years ago
wernsaar
d1800397f5
optimized interface/gemv.c for multithreading
11 years ago
wernsaar
f4ff889491
updated interface/gemv.c for multithreading
11 years ago
wernsaar
210bec9111
added plot-header to compare multithreading
11 years ago
wernsaar
f3b50dcf5b
removed obsolete instructions from sgemv_t_4.c
11 years ago
wernsaar
93eaba959d
optimized sgemv_t for bulldozer
11 years ago
wernsaar
9570e56965
optimized sgemv_t_4.c for small sizes
11 years ago
wernsaar
d7f91f8b4f
extended gemv.c benchmark
11 years ago
wernsaar
53f1277b6b
modified benchmark/gemv.c
11 years ago
wernsaar
bc99faef1b
optimized sgemv_t_4.c for uneven sizes
11 years ago
wernsaar
848c0f16f7
optimized sgemv_t_4.c for small size
11 years ago
wernsaar
e2fc8c8c2c
changed 1 test value (bug in lapack-testing?)
11 years ago
wernsaar
53e6dbf6ca
optimized sgemv_t kernel for small sizes
11 years ago
Zhang Xianyi
868f8a8756
Merge pull request #443 from idunham/fix
Workaround PIC limitations in cpuid.
11 years ago
Isaac Dunham
db7e6366cd
Workaround PIC limitations in cpuid.
cpuid uses register ebx, but ebx is reserved in PIC.
So save ebx, swap ebx & edi, and return edi.
Copied from Igor Pavlov's equivalent fix for 7zip (in CpuArch.c),
which is public domain and thus OK license-wise.
11 years ago
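The workaround described in the commit above can be sketched roughly as follows. This is not the actual patch (nor the 7zip code it was copied from), only a minimal illustration assuming GCC- or clang-style inline assembly on x86; the names `cpuid_pic_safe` and `cpuid_supported` are hypothetical. The asm statement never names ebx as an output or clobber, so the compiler's PIC register is left alone: ebx is saved into edi, cpuid runs, and the two registers are swapped back so that edi carries the value cpuid produced in ebx.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if this sketch can execute cpuid, 0 otherwise. */
static int cpuid_supported(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical wrapper: run cpuid for `leaf` without exposing ebx to the
 * compiler as an output or clobber. On unsupported targets all outputs
 * are left at zero. */
static void cpuid_pic_safe(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                           uint32_t *ecx, uint32_t *edx)
{
    uint32_t a = 0, b = 0, c = 0, d = 0;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ __volatile__(
        "mov  %%rbx, %%rdi\n\t"   /* save rbx in rdi */
        "cpuid\n\t"               /* overwrites eax, ebx, ecx, edx */
        "xchg %%rbx, %%rdi"       /* restore rbx; rdi now holds cpuid's ebx */
        : "=a"(a), "=&D"(b), "=c"(c), "=d"(d)
        : "0"(leaf), "2"(0u));
#elif defined(__GNUC__) && defined(__i386__)
    __asm__ __volatile__(
        "mov  %%ebx, %%edi\n\t"   /* save ebx (the PIC register) in edi */
        "cpuid\n\t"
        "xchg %%ebx, %%edi"       /* restore ebx; edi now holds cpuid's ebx */
        : "=a"(a), "=&D"(b), "=c"(c), "=d"(d)
        : "0"(leaf), "2"(0u));
#else
    (void)leaf;                   /* no cpuid on this target */
#endif
    *eax = a;
    *ebx = b;
    *ecx = c;
    *edx = d;
}
```

Calling `cpuid_pic_safe(0, &a, &b, &c, &d)` on an x86 machine should return the highest supported leaf in `a` and the vendor string split across `b`, `d` and `c`.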
Zhang Xianyi
2702323f7d
Merge pull request #440 from wernsaar/develop
optimizations for level1 and level2 blas functions
11 years ago
wernsaar
20cd850125
modification for clang compiler
11 years ago
wernsaar
5fa6158731
removed flag no-integrated-as, because not working on macosx
11 years ago
wernsaar
84badf8086
EXPERIMENTAL: added the flag -no-integrated-as for clang compiler in Makefile.system
11 years ago
Zhang Xianyi
c8cc4a0d22
Fixed the typo in Changelog.txt
11 years ago
wernsaar
3885eebdb8
added optimized zaxpy bulldozer kernel
11 years ago
wernsaar
ee74445155
added optimized caxpy kernel for bulldozer
11 years ago
wernsaar
9d2ace8bac
added optimized daxpy kernel for bulldozer
11 years ago
wernsaar
b55f997302
added optimized daxpy kernel for nehalem
11 years ago
wernsaar
29125864b3
updated gemm.c
11 years ago
wernsaar
e45c960c2c
added optimized saxpy kernel for nehalem
11 years ago
wernsaar
55e81da379
added axpy benchmark-test
11 years ago
wernsaar
ac76b6267f
added optimized dgemv_n kernel for nehalem
11 years ago
wernsaar
f1b96c4846
added optimized ddot kernel for bulldozer
11 years ago
wernsaar
16d6be852d
added optimized ddot kernel for nehalem
11 years ago
wernsaar
53ec5789e2
bugfix for Makefile
11 years ago
wernsaar
95a707ced3
update of KERNEL.BULLDOZER
11 years ago
wernsaar
5d97b0754c
added optimized sdot kernel for nehalem
11 years ago
wernsaar
8a9e868919
added optimized sdot for bulldozer
11 years ago
wernsaar
7e404de3de
bugfix in Makefile
11 years ago
wernsaar
e4472ad850
added sdot and ddot benchmarks
11 years ago
wernsaar
fb0b4552a5
added hemv benchmark
11 years ago
wernsaar
6f73ffc114
added benchmarks for csymv and zsymv
11 years ago
wernsaar
c8b0645266
added optimized symv_L kernels for nehalem
11 years ago
wernsaar
ec05ff3f64
added optimized ssymv_L kernel for bulldozer
11 years ago