Martin Kroeker
|
6c77b5f267
|
Merge pull request #1369 from martin-frbg/dsdot
Add optimized dsdot to all other x86_64 kernels that use sdot.c
|
8 years ago |
Martin Kroeker
|
c92cd6d162
|
Add trivially optimized dsdot based on sdot
|
8 years ago |
Martin Kroeker
|
cae5d9a20b
|
Add trivially optimized dsdot based on sdot
|
8 years ago |
Martin Kroeker
|
3d891c3106
|
Add trivially optimized dsdot based on sdot
|
8 years ago |
Martin Kroeker
|
4fbdcfa823
|
Add trivially optimized dsdot based on sdot
|
8 years ago |
Martin Kroeker
|
1bb6a96ebc
|
Add trivially optimized dsdot based on sdot
|
8 years ago |
Martin Kroeker
|
6bd163f37a
|
Add trivially optimized dsdot based on sdot
|
8 years ago |
Martin Kroeker
|
f0333333d1
|
Add trivially optimized dsdot based on sdot
|
8 years ago |
Andrew
|
e89b979b2c
|
fix spurious compiler warning fix (no code change)
|
8 years ago |
Andrew
|
7e9b29b9b8
|
fix spurious compiler warning (no code change)
|
8 years ago |
Martin Kroeker
|
6157d0902a
|
Merge pull request #1358 from martin-frbg/unused_vars
Clean up spurious unused variables in the kernels
|
8 years ago |
Martin Kroeker
|
3fea849bbf
|
Remove unused variables from Haswell dtrmm and Bulldozer dtrsm
|
8 years ago |
Martin Kroeker
|
8f177621bc
|
Remove unused variables at0...at3 from ?symv_U
|
8 years ago |
Martin Kroeker
|
5f402b7759
|
Remove unused (loop?) variable j from the gemv_n_4 implementations
|
8 years ago |
Martin Kroeker
|
a07807caac
|
Eliminate loop code when called as/from dsdot
|
8 years ago |
Martin Kroeker
|
5e3e91d0fc
|
Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision
|
8 years ago |
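The commit above describes splitting the workload so that dsdot reuses the single-precision sdot microkernel, but only on chunks of 32 floats, which limits how much rounding error builds up in float before it is widened. Below is a minimal C sketch of that idea. It is an illustration only, not the actual OpenBLAS kernel. It makes three assumptions: unit strides, a plain scalar loop standing in for the vectorized sdot microkernel, and the hypothetical names `sdot_chunk`, `dsdot_chunked` and `DSDOT_CHUNK`. Each chunk is summed in float, and the chunk results are summed in double.

```c
#include <stddef.h>

/* Hypothetical stand-in for an sdot microkernel: accumulates in float. */
static float sdot_chunk(size_t n, const float *x, const float *y) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Assumed chunk length, taken from the commit message (32 floats). */
#define DSDOT_CHUNK 32

/* Sketch of dsdot built on the float kernel: each chunk of at most
   DSDOT_CHUNK elements is summed in single precision, then widened
   and added to a double-precision running total. Unit stride only. */
double dsdot_chunked(size_t n, const float *x, const float *y) {
    double total = 0.0;
    size_t i = 0;
    while (i < n) {
        size_t len = (n - i < DSDOT_CHUNK) ? (n - i) : DSDOT_CHUNK;
        total += (double)sdot_chunk(len, x + i, y + i);
        i += len;
    }
    return total;
}
```

This keeps the fast float inner loop unchanged. The accuracy gain comes only from the double accumulation across chunks. Within a chunk, products are still rounded to float, so the result is not identical to a fully double-precision dsdot.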
Martin Kroeker
|
28c3fa8950
|
Add dsdot
|
8 years ago |
Martin Kroeker
|
8ac87c1cb6
|
Implement DSDOT with unchanged sdot microkernels
|
8 years ago |
Isuru Fernando
|
505b218829
|
Merge remote-tracking branch 'upstream/develop' into dyn
|
8 years ago |
Isuru Fernando
|
1d1854032b
|
Add missing EXCAVATOR
|
8 years ago |
Isuru Fernando
|
2c51a990ac
|
Fix extra whitespaces. CMake parser macro fails with it
TODO: Fix the parser macro to strip trailing whitespaces
|
8 years ago |
Isuru Fernando
|
ca17b4b75c
|
Fix complex support for MSVC headers
|
8 years ago |
Denis Steckelmacher
|
c9ff735da6
|
Add ZEN support (tested for auto-detected static backend)
|
9 years ago |
Martin Kroeker
|
a6efabf155
|
Replace gnu _real_ , _imag_ extensions in initializers
|
9 years ago |
Martin Kroeker
|
dc34a0da96
|
Merge pull request #915 from mdong/small_fix_for_icc
remove input from clobbered list
|
9 years ago |
Martin Kroeker
|
4998e19869
|
Change file comments to work around clang 3.9 assembler bug
|
9 years ago |
Martin Kroeker
|
16446d1d23
|
Remove explicit include of complex.h
|
9 years ago |
mdong
|
098d8ec5d6
|
remove input from clobbered list
|
10 years ago |
Werner Saar
|
298b13bba4
|
updated some kernel files for EXCAVATOR
|
10 years ago |
Zhang Xianyi
|
f24d5307cf
|
Refs #834. Fix zgemv config bug on Steamroller.
|
10 years ago |
Zhang Xianyi
|
d4380c1fe4
|
Refs xianyi/OpenBLAS-CI#10 , Fix sdot for scipy test_iterative.test_convergence test failure on AMD bulldozer and piledriver.
|
10 years ago |
Werner Saar
|
faa5e2e5e3
|
FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c
|
10 years ago |
Werner Saar
|
fdf291be30
|
Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller
|
10 years ago |
Werner Saar
|
c99cc41cbd
|
Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller
|
10 years ago |
Werner Saar
|
acdff55a6a
|
Bugfix for ztrmv
|
10 years ago |
Zhang Xianyi
|
7d6b68eb4a
|
Refs #786. Revert to default assembly kernel.
|
10 years ago |
Zhang Xianyi
|
8f758eeff9
|
Refs #786. avoid old assembly c/zgemv kernels.
|
10 years ago |
Zhang Xianyi
|
efa4f5c936
|
Refs #695 #783. Replace default x86_64 cgemv_t
asm kernel by C kernel.
|
10 years ago |
Zhang Xianyi
|
6e7be06e07
|
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.
|
10 years ago |
Zhang Xianyi
|
962376664d
|
Refs #768. Swap the result of zdot x87 fp kernel.
|
10 years ago |
Zhang Xianyi
|
c44ff4d648
|
Refs #714. avoid compiler warnings.
|
10 years ago |
Werner Saar
|
c8f2c5d636
|
added optimized trsm_kernels
|
10 years ago |
Zhang Xianyi
|
69363622a8
|
Fix DYNAMIC_ARCH=1 bug.
|
10 years ago |
Zhang Xianyi
|
f874465bb8
|
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
Disable CBLAS and LAPACK.
|
10 years ago |
Zhang Xianyi
|
ab0a0a75fc
|
Merge branch 'develop' into cmake
|
10 years ago |
Zhang Xianyi
|
1cf2b10224
|
Use pure C generic target on x86 and x86_64.
make TARGET=GENERIC
?gemm3m is unimplemented on generic target.
|
10 years ago |
Zhang Xianyi
|
7ac7e147d4
|
Fixed cmake building bugs on Linux. Disable LAPACK by default.
|
10 years ago |
Werner Saar
|
e7c969e164
|
added optimized dtrmm_kernel for haswell
|
11 years ago |
Werner Saar
|
9bd962f655
|
modified haswell parameter dgemm_unroll_n
|
11 years ago |
Werner Saar
|
24f58c8bb1
|
added optimized cscal and zscal kernels for steamroller
|
11 years ago |