Martin Kroeker
e011ad820a
Merge pull request #2372 from martin-frbg/winexit
Do not run any cleanup if the program is exiting anyway
6 years ago
Martin Kroeker
23f322f997
Do not run any cleanup if the program is exiting anyway
From keno's PR #2350. This avoids the potential hang in blas_thread_shutdown, where we may wait for threads to exit while they are waiting on the loader lock from DllMain.
6 years ago
Martin Kroeker
093d37de8d
Merge pull request #2371 from martin-frbg/issue2370
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
6 years ago
Martin Kroeker
d65e9a2bbd
Merge pull request #2253 from thrasibule/xerbla
fix error messages
6 years ago
Martin Kroeker
78100b8093
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
as suggested by hjmndv in #2370
6 years ago
Martin Kroeker
70f45749b9
Merge pull request #2367 from wjc404/develop
Improve paralleled SGEMM performance on SKYLAKEX CPUs
6 years ago
wjc404
e5dcdeb550
Update sgemm_direct_skylakex.c
6 years ago
wjc404
952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c
6 years ago
wjc404
feaafbedd3
make skylakex sgemm code more friendly for readers
BTW some kernels were adjusted to improve performance
6 years ago
wjc404
1c67567008
improve skylakex paralleled sgemm performance
6 years ago
Martin Kroeker
4e979bf75b
Merge pull request #2366 from martin-frbg/install390
Add new file lapack.h from LAPACK 3.9.0 to installable headers
6 years ago
Martin Kroeker
daa4310db5
Install new lapack.h
new file in LAPACK 3.9.0, split off from lapacke.h
6 years ago
Martin Kroeker
b8f3605132
Merge pull request #23 from xianyi/develop
rebase
6 years ago
Martin Kroeker
b36018be6d
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
6 years ago
wjc404
3a100b2797
Update KERNEL.SKYLAKEX
6 years ago
Martin Kroeker
38742d5547
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
6 years ago
wjc404
bd4c032f52
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
9f5cdc49d4
Update CONTRIBUTORS.md
6 years ago
wjc404
b7b408a120
optimize AVX2 SGEMM
6 years ago
wjc404
92b10212de
optimize AVX2 SGEMM
6 years ago
wjc404
b73bf01378
optimize AVX2 SGEMM
6 years ago
wjc404
eb3c9f1db9
optimize AVX2 SGEMM
6 years ago
Martin Kroeker
fd2ff2714f
Merge pull request #2359 from martin-frbg/lapack-pr330
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
6 years ago
Martin Kroeker
2ea2bd99c7
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
from Reference-LAPACK PR 330
6 years ago
Martin Kroeker
fbb894948c
Merge pull request #22 from xianyi/develop
rebase
6 years ago
Martin Kroeker
e711659c90
Merge pull request #2358 from shengyang-3390/develop
Test all 7 declared values of N in float and double cblas3 tests
6 years ago
shengyang
893e6e57c4
modified: ctest/din3 ctest/sin3
6 years ago
Martin Kroeker
456ee2e1f0
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
6 years ago
Martin Kroeker
9998f8ed8b
Merge pull request #2356 from shengyang-3390/develop
Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)
6 years ago
shengyang
80db5f11e1
update
6 years ago
chenxuqiang
52de4cc8fd
kernel/arm64/dgemm_beta.S: add beta == zero branch
Added a beta == zero branch, which does not need to load the C matrix.
Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
6 years ago
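The beta == zero branch described in the commit above can be illustrated with a minimal scalar sketch in C. This is not the arm64 assembly from kernel/arm64/dgemm_beta.S; the function name `scale_c_by_beta` and its signature are hypothetical, and a column-major layout with leading dimension `ldc` is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for C := beta * C on a column-major
 * m x n matrix with leading dimension ldc. Not the OpenBLAS kernel. */
static void scale_c_by_beta(size_t m, size_t n, double beta,
                            double *c, size_t ldc)
{
    assert(ldc >= m);
    for (size_t j = 0; j < n; j++) {
        double *col = c + j * ldc;
        if (beta == 0.0) {
            /* Store-only path: C is never read, so stale NaN/Inf values
             * in C cannot leak into the result. */
            for (size_t i = 0; i < m; i++)
                col[i] = 0.0;
        } else {
            /* General path: load, scale, store. */
            for (size_t i = 0; i < m; i++)
                col[i] *= beta;
        }
    }
}
```

Besides skipping the loads, the separate branch keeps the result well defined when C holds uninitialized data, since multiplying a NaN by zero would otherwise yield NaN.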
Martin Kroeker
44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
6 years ago
Martin Kroeker
86ab939936
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
6 years ago
Martin Kroeker
375b1875c8
[WIP] Update LAPACK to 3.9.0 (#2353)
* Update make.inc entries for LAPACK 3.9.0
Reference-LAPACK PR 347 changed some variable names and relative paths
* Update LAPACK to 3.9.0
* Add new functions from LAPACK 3.9.0
* Add new functions from LAPACK 3.9.0
* Restore LOADER command
as it makes it easier to specify pthread as needed
* Restore LOADER
* Restore EIG/LIN prefixes in cmdbase
* add binary path to lapack_testing.py call
* Restore OpenMP version check
* Restore OpenMP version check
* Restore fix for out-of-bounds array accesses
from #2096
6 years ago
Martin Kroeker
6c85cb1869
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
6 years ago
Martin Kroeker
995768bbc5
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
6 years ago
int_13h
96ad579428
add in runtime cpu detection for zarch (#2349)
add in runtime cpu detection for zarch
6 years ago
shengyang
8d84403205
Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
6 years ago
shengyang
8729db117c
modified: ctest/din3
modified: ctest/sin3
6 years ago
w00421467
0833a4846a
Use arm neon instructions to optimize sgemm_beta operation
6 years ago
zq
50f7fc1401
[WIP] Use arm neon instructions to optimize tcopy operation
6 years ago
w00421467
d1b53806be
Merge remote-tracking branch 'pub/develop' into develop
6 years ago
wjc404
a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c
6 years ago
wjc404
700fe5b5ee
Add files via upload
6 years ago
wjc404
bb2729c855
Update CONTRIBUTORS.md
6 years ago
wjc404
aae44d040d
Update CONTRIBUTORS.md
6 years ago
wjc404
6362c34ee6
Update param.h
6 years ago
wjc404
f60840c420
Update KERNEL.ZEN
6 years ago
wjc404
109e18cd96
Update KERNEL.HASWELL
6 years ago