wjc404
f6fcbd7906
Fix performance bug when LDC is a multiple of 1024
6 years ago
wjc404
b0558c11b9
Update param.h
6 years ago
wjc404
f566787e6e
Update KERNEL.SKYLAKEX
6 years ago
wjc404
e3368cbf18
AVX512 STRMM kernel
6 years ago
wjc404
3447d04eaf
Update dgemm_kernel_16x2_skylakex.c
6 years ago
wjc404
8b5cdcc64c
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
4e00d96a78
Update dgemm_kernel_16x2_skylakex.c
6 years ago
wjc404
096da2f51a
Update dgemm_kernel_16x2_skylakex.c
6 years ago
wjc404
2f96a2c55b
Update trmm_R.c
6 years ago
wjc404
833bd0f8ff
Update trmm_L.c
6 years ago
wjc404
77b8f49556
Update level3_thread.c
6 years ago
wjc404
1c3e20ce48
Update level3.c
6 years ago
wjc404
83b6be7976
Update param.h
6 years ago
wjc404
081b188529
Update KERNEL.SKYLAKEX
6 years ago
wjc404
f3f969f681
Update param.h
6 years ago
wjc404
8019e70211
AVX512 16x2 DGEMM kernel
6 years ago
Martin Kroeker
8d2a796f49
Merge pull request #2378 from martin-frbg/issue2377
Add -march option for AVX512 in cmake as well
6 years ago
Martin Kroeker
8dc9fd4dfe
Add -march option for AVX512
6 years ago
Martin Kroeker
abc67bdd74
Merge pull request #2375 from ewanglong/master
fix a few performance drops in some matrix sizes per data type
6 years ago
Martin Kroeker
1f62a82789
Merge pull request #2376 from wjc404/develop
Fix remaining bugs in parallel GEMM3M
6 years ago
wjc404
e9fb8f62b1
Update level3_gemm3m_thread.c
6 years ago
Wang,Long
fbf4f48f4a
fix a few performance drops in some matrix sizes per data type
Signed-off-by: Wang,Long <long1.wang@intel.com>
6 years ago
Martin Kroeker
b9ad450295
Merge pull request #2373 from Qiyu8/optimize#gemmbeta
Optimize general Gemm Beta
6 years ago
Martin Kroeker
e011ad820a
Merge pull request #2372 from martin-frbg/winexit
Do not run any cleanup if the program is exiting anyway
6 years ago
Qiyu8
ff42e68652
Optimize general Gemm Beta
6 years ago
Martin Kroeker
23f322f997
Do not run any cleanup if the program is exiting anyway
From keno's PR #2350: this avoids the potential hang in blas_thread_shutdown, where we may wait for threads to exit while they are waiting on the loader lock from DllMain
6 years ago
Martin Kroeker
093d37de8d
Merge pull request #2371 from martin-frbg/issue2370
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
6 years ago
Martin Kroeker
d65e9a2bbd
Merge pull request #2253 from thrasibule/xerbla
fix error messages
6 years ago
Martin Kroeker
78100b8093
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
as suggested by hjmndv in #2370
6 years ago
Martin Kroeker
70f45749b9
Merge pull request #2367 from wjc404/develop
Improve paralleled SGEMM performance on SKYLAKEX CPUs
6 years ago
wjc404
e5dcdeb550
Update sgemm_direct_skylakex.c
6 years ago
wjc404
952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c
6 years ago
wjc404
feaafbedd3
make skylakex sgemm code more friendly for readers
BTW some kernels were adjusted to improve performance
6 years ago
wjc404
1c67567008
improve skylakex paralleled sgemm performance
6 years ago
Martin Kroeker
4e979bf75b
Merge pull request #2366 from martin-frbg/install390
Add new file lapack.h from LAPACK 3.9.0 to installable headers
6 years ago
Martin Kroeker
daa4310db5
Install new lapack.h
new file in LAPACK 3.9.0, split off from lapacke.h
6 years ago
Martin Kroeker
b8f3605132
Merge pull request #23 from xianyi/develop
rebase
6 years ago
Martin Kroeker
b36018be6d
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
6 years ago
wjc404
3a100b2797
Update KERNEL.SKYLAKEX
6 years ago
Martin Kroeker
38742d5547
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
6 years ago
wjc404
bd4c032f52
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c
6 years ago
wjc404
9f5cdc49d4
Update CONTRIBUTORS.md
6 years ago
wjc404
b7b408a120
optimize AVX2 SGEMM
6 years ago
wjc404
92b10212de
optimize AVX2 SGEMM
6 years ago
wjc404
b73bf01378
optimize AVX2 SGEMM
6 years ago
wjc404
eb3c9f1db9
optimize AVX2 SGEMM
6 years ago
Martin Kroeker
fd2ff2714f
Merge pull request #2359 from martin-frbg/lapack-pr330
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
6 years ago
Martin Kroeker
2ea2bd99c7
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
from Reference-LAPACK PR 330
6 years ago
Martin Kroeker
fbb894948c
Merge pull request #22 from xianyi/develop
rebase
6 years ago