xoviat
2758d0cbb6
[gensymbol] switch symbol
8 years ago
xoviat
6cc42a4caa
[appveyor] move copy
8 years ago
xoviat
9b981b9b7c
[appveyor] fix
8 years ago
xoviat
51da07f676
[appveyor] copy libs
8 years ago
xoviat
09c8fce898
[appveyor] find ws2_32.lib
8 years ago
xoviat
f12152b9ea
[cmake] add library directories
8 years ago
xoviat
de1695648e
[f_check] revert changes
8 years ago
xoviat
c0f50ced48
[f_check] add library directories
8 years ago
xoviat
2ad344a046
[cmake] remove force compiler
8 years ago
xoviat
b498dada26
[appveyor] fix conda
8 years ago
xoviat
29ea2568cf
[appveyor] activate conda
8 years ago
xoviat
5b5bd87ee3
[appveyor] add flang
8 years ago
Martin Kroeker
ab87ee6b48
Merge pull request #1329 from martin-frbg/dsdot
(Trivial) optimized dsdot implementation for HASWELL
8 years ago
Martin Kroeker
a07807caac
Eliminate loop code when called as/from dsdot
8 years ago
Martin Kroeker
b71f4fe681
Merge pull request #1334 from ashwinyes/develop_aarch64_20171024_addlocallabels
ARM64: Convert all labels to local labels
8 years ago
Ashwin Sekhar T K
a0128aa489
ARM64: Convert all labels to local labels
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.
To avoid this, all the labels within the kernels are changed to local
labels.
8 years ago
Martin Kroeker
627133f9ad
Merge pull request #1333 from martin-frbg/haswell32
Fix 32bit HASWELL builds
8 years ago
Martin Kroeker
0e2cf102e1
Fix 32bit HASWELL
8 years ago
Martin Kroeker
5e3e91d0fc
Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision
8 years ago
Martin Kroeker
28c3fa8950
Add dsdot
8 years ago
Martin Kroeker
8ac87c1cb6
Implement DSDOT with unchanged sdot microkernels
8 years ago
Martin Kroeker
b7cee00455
Merge pull request #1327 from martin-frbg/cmake-relapack
Make ReLAPACK available in cmake builds
8 years ago
Martin Kroeker
962b20a9bb
Optionally add ReLAPACK to LIB_COMPONENTS
8 years ago
Martin Kroeker
fbf83f4833
Add cmake build list file for ReLAPACK
8 years ago
Martin Kroeker
78cec6209c
Add ReLAPACK option
8 years ago
Martin Kroeker
c460027dbe
Merge pull request #1325 from grisuthedragon/patch-1
Update README.md to include POWER8
8 years ago
Martin Köhler
bfa9b9f6b2
Update README.md
Add POWER 8 to the list of additional architectures.
8 years ago
Martin Kroeker
c7a8512d12
Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323)
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
8 years ago
Martin Kroeker
db72ad8f6a
Merge pull request #1320 from timmoon10/develop
2D thread distribution for multi-threaded GEMMs
8 years ago
Martin Kroeker
97ecd4996a
Merge pull request #1319 from martin-frbg/issue601
Fix out-of-bounds memory accesses exposed by xccblat3 testcase
8 years ago
Martin Kroeker
1eb43cccad
Merge pull request #1317 from martin-frbg/power8-asm
Save and restore VSX registers
8 years ago
Martin Kroeker
9d92f526dd
Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
8 years ago
Martin Kroeker
514d237257
Merge pull request #1279 from xsacha/develop
CMake improvements
8 years ago
Tim Moon
30486a356c
Reduce number of data partitions in n.
8 years ago
Martin Kroeker
e1b2502840
Merge pull request #1316 from timmoon10/develop
Variable thread count for multi-threaded GEMMs
8 years ago
Tim Moon
9de52b489a
Cleaning up and documenting multi-threaded GEMM code.
8 years ago
Tim Moon
860dcfc703
Use 2D thread distribution for small GEMMs.
Allows maximum use of available cores if one of M and N is small and the other is large.
8 years ago
Martin Kroeker
f96afd94b0
Fix out-of-bounds accesses where the data should be zero anyway
8 years ago
Martin Kroeker
ebe84215e4
Merge pull request #1318 from pv/potrf-smoketest
Add trivial smoketest for xpotrf
8 years ago
Pauli Virtanen
845e6d750f
Add trivial smoketest for xpotrf
8 years ago
Tim Moon
a89d6711c6
Increasing flexibility of GEMM benchmark.
m, n, and k can be set to arbitrary constants. A and B matrices can be transposed independently.
8 years ago
Martin Kroeker
9c017a2218
Save and restore VSX registers
8 years ago
Tim Moon
0e6b11b708
Merge https://github.com/timmoon10/OpenBLAS into develop
8 years ago
Tim Moon
6aaa107865
Reducing threads for multi-threaded GEMMs on small matrices.
8 years ago
Martin Kroeker
00c42dc815
Merge pull request #1314 from martin-frbg/nofortran-fix-2
Rewrite NOFORTRAN conditionals
8 years ago
Martin Kroeker
79e754e548
Rewrite NOFORTRAN conditionals
... so that they do not trigger accidentally when NOFORTRAN is empty/unset
8 years ago
Martin Kroeker
2ccd7f6e0c
Merge pull request #1310 from sva-img/develop
Added mips I6500 core
8 years ago
Shivraj Patil
e3d844b062
Added mips I6500 core
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
8 years ago
Martin Kroeker
def146efed
Merge pull request #1308 from sebastien-villemot/develop
Add support for TARGET=ZARCH_GENERIC and TARGET=Z13
8 years ago
Sébastien Villemot
7543e578a4
Add support for TARGET=ZARCH_GENERIC and TARGET=Z13
8 years ago
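
Note on commit 5e3e91d0fc ("Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision"): the idea can be illustrated with a minimal sketch. This is not the OpenBLAS kernel code. The function names, the `struct`-based emulation of single precision, and the test data are all assumptions made for illustration; only the chunk size of 32 comes from the commit message. Each chunk is accumulated in single precision, as an unchanged sdot microkernel would do, and the per-chunk partial sums are then combined in double precision.

```python
import struct


def f32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def sdot_f32(x, y):
    """Dot product with every product and partial sum rounded to single precision."""
    acc = 0.0
    for a, b in zip(x, y):
        acc = f32(acc + f32(f32(a) * f32(b)))
    return acc


def dsdot_chunked(x, y, chunk=32):
    """Single-precision partial sums per chunk, combined in double precision.

    chunk=32 follows the commit message; the rest is a hypothetical sketch.
    """
    total = 0.0  # Python floats are doubles, so this accumulator is double precision
    for start in range(0, len(x), chunk):
        total += sdot_f32(x[start:start + chunk], y[start:start + chunk])
    return total


def dsdot_reference(x, y):
    """Reference result: single-precision inputs, products and sum in double."""
    return sum(f32(a) * f32(b) for a, b in zip(x, y))


if __name__ == "__main__":
    n = 100000
    x = [1.0] * n
    y = [0.1] * n
    ref = dsdot_reference(x, y)
    print("plain single-precision error:", abs(sdot_f32(x, y) - ref))
    print("chunked (32 floats) error:   ", abs(dsdot_chunked(x, y) - ref))
```

Because each single-precision accumulator only ever sees 32 terms, its rounding error stays small relative to the running total, while the long accumulation happens in double precision.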