Rohit Goswami
|
f30f33bed3
|
MAINT: Add more symbols for the test
|
1 year ago |
Rohit Goswami
|
a91f69940b
|
MAINT: Quick and dirty working set of symbols
"Working" in the sense that it has enough symbols for the import; it is
currently failing NumPy tests, including dot matrix multiplications.
|
1 year ago |
Rohit Goswami
|
91feee93e2
|
ENH: Add TRMM_KERNEL bindings
|
1 year ago |
Rohit Goswami
|
43eff1b4f8
|
BLD: Add L3 driver symbols
Bringing us up to 2247 symbols.
|
1 year ago |
Rohit Goswami
|
fa33385f05
|
MAINT: Add syrk
|
1 year ago |
Rohit Goswami
|
54cb61b0ae
|
MAINT: Rework to use the ext_l3 mapping
|
1 year ago |
Rohit Goswami
|
83c6c673b2
|
MAINT: Use syrk as an exception
|
1 year ago |
Rohit Goswami
|
076e80fd63
|
ENH: Start adding L3 driver symbols
|
1 year ago |
Rohit Goswami
|
325c8721b5
|
MAINT: Start adding L3
|
1 year ago |
Rohit Goswami
|
1178c0f53c
|
ENH: Compile all L2 drivers
|
1 year ago |
Rohit Goswami
|
b0b39f30ae
|
ENH: Add in the rest of the level2 symbols
|
1 year ago |
Rohit Goswami
|
b7b42ac7b8
|
MAINT: Start working on kernels and driver L2
|
1 year ago |
Martin Kroeker
|
d0b9948b23
|
Guard against invalid thread_status.queue
|
1 year ago |
Martin Kroeker
|
7e9a4ba427
|
Merge pull request #4741 from shivammonaka/Pthread_Scalability_Improvement
Enhancing Core Utilization in BLAS Calls: A Scalable Architecture
|
1 year ago |
Martin Kroeker
|
9b2a0c79cb
|
Add Zhaoxin KX7000
|
1 year ago |
shivammonaka
|
9e22d70957
|
Dynamic locking in Pthread Backend to allow multiple BLAS calls to be executed in parallel
|
1 year ago |
Martin Kroeker
|
db070a9223
|
add gemm_batch drivers
|
1 year ago |
Martin Kroeker
|
d0794f88dc
|
add gemm_batch driver
|
1 year ago |
Martin Kroeker
|
0073affe63
|
Merge pull request #4693 from goplanid/locks-improvement
Lock Management Improvements for Memory Allocation Efficiency
|
1 year ago |
Martin Kroeker
|
6ca9ffa7f5
|
Merge pull request #4655 from yamazakimitsufumi/update_2d_thread_distribution
Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance
|
1 year ago |
Deeksha Goplani
|
0dc80a5c8d
|
locks improvement
|
1 year ago |
Martin Kroeker
|
8da6f7e5f2
|
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
|
1 year ago |
gxw
|
637c650f4f
|
loongarch64: Add buffer offset for target LOONGSON3R5
|
1 year ago |
Martin Kroeker
|
5500b4ab26
|
Merge pull request #4680 from theAeon/develop
Expose whether locking is enabled in get_config
|
1 year ago |
Martin Kroeker
|
f0f1ff7820
|
fix HUGETLB allocation for TLS mode as well
|
1 year ago |
Andrew Robbins
|
edfe1aa471
|
Expose whether locking is enabled in get_config
|
1 year ago |
Martin Kroeker
|
dc99b61380
|
sort unwanted interdependencies of alloc_shm and alloc_hugetlb
|
1 year ago |
Martin Kroeker
|
ddcd7d6fa8
|
Merge branch 'develop' into Threading_Callback
|
1 year ago |
yamazaki-mitsufumi
|
51ab1903e7
|
Expanding the scope of 2D thread distribution
|
1 year ago |
gxw
|
d8c4ea8793
|
loongarch: Optimizing the performance of the GEMM on servers
|
1 year ago |
shivammonaka
|
7102367fde
|
Introduced callback to Pthread, Win32 and OpenMP backend
|
2 years ago |
Mark Seminatore
|
b0ad8a78ff
|
code to fix lost work in case of re-entrant calls to exec_blas_async()
|
1 year ago |
Martin Kroeker
|
88b5330ae7
|
Restore outer loop of blas_buffer_inuse setup
|
1 year ago |
shivammonaka
|
d49ebc54e1
|
Merge branch 'shivam-develop' into shivam-Locks
|
1 year ago |
shivammonaka
|
bc191015e3
|
Using OpenMP locks with NUM_PARALLEL
|
1 year ago |
Mark Seminatore
|
b29fd48998
|
Merge branch 'develop' into win_tidy
|
2 years ago |
Mark Seminatore
|
98c56a7314
|
more cleanup
|
2 years ago |
Chip Kerchner
|
d408ecedba
|
Add environment variable to display coretype for dynamic arch.
|
2 years ago |
Chip Kerchner
|
ac6b4b7aa4
|
Make sure CPU ID works for all POWER_10 conditions
|
2 years ago |
Chip Kerchner
|
08ce6b1c1c
|
Add missing CPU ID definitions for old versions of AIX.
|
2 years ago |
Martin Kroeker
|
a4fde2c5ac
|
Merge pull request #4451 from martin-frbg/overflow_reset
Reset "buffer management structure overflowed" state and free auxiliary struct on blas_shutdown
|
2 years ago |
Martin Kroeker
|
e61d96303d
|
Fix missing NO_AVX2 fallback for SapphireRapids
|
2 years ago |
Mark Seminatore
|
42cb567f0f
|
more cleanup
|
2 years ago |
Mark Seminatore
|
0d7fe5ea61
|
clean up whitespace
|
2 years ago |
Martin Kroeker
|
d938aed7fe
|
reset "mem structure overflowed" state on shutdown
|
2 years ago |
Chris Sidebottom
|
aaf65210cc
|
Add dynamic support for Arm(R) Neoverse(TM) V2 processor
Whilst I figure out how best to map the L2 parameters without
duplicating all of `ARMV8SVE`, let's just map this to `NEOVERSEV1`.
|
2 years ago |
Martin Kroeker
|
152a6c43b6
|
Add blas_omp_threads_local
|
2 years ago |
Martin Kroeker
|
8a9d492af7
|
Add default for blas_omp_threads_local
|
2 years ago |
Martin Kroeker
|
87d31af2ae
|
Add openblas_set_num_threads_local()
|
2 years ago |
Martin Kroeker
|
e7a895e714
|
Add Apple M as NeoverseN1
|
2 years ago |