Martin Kroeker
51c1fb1f93
Fix ?spmv build and misinterpretation of NO_LAPACK=0
1 year ago
shubham.chaudhari
8e289ecddc
Simplified thread throttling function in gemv
1 year ago
shubham.chaudhari
189dbbc04f
Add thread throttling for dynamic arch neoversev1
1 year ago
shubham.chaudhari
b6cb5ece58
Add thread throttling profile for DGEMV on NEOVERSEV1
1 year ago
Martin Kroeker
7338a473a7
Merge pull request #5150 from Harishmcw/WoA-Experiments
Redefined threading logic for GESV and GEMV on WoA
1 year ago
Martin Kroeker
09ba099461
make throttling code conditional on SMP
1 year ago
Harishmcw
030ae1fd97
Redefined threading logic for WoA
1 year ago
Martin Kroeker
c03a81b927
Merge pull request #5141 from michalowski-arm/fork-throttle
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
1 year ago
Martin Kroeker
75b958a018
Transform the B array back if necessary before returning
1 year ago
Marek Michalowski
650a062e19
Add thread throttling profile for SGEMV on `NEOVERSEV2`
1 year ago
Marek Michalowski
b723c1b7b7
Add thread throttling profile for SGEMM on `NEOVERSEV2`
1 year ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
1 year ago
Vaisakh K V
d23eb3b93e
Support for SME1-based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
1 year ago
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
1 year ago
Martin Kroeker
60d0be0e97
Update nrm2.c
1 year ago
Martin Kroeker
0fd5448b2c
Handle INCX=0
1 year ago
Martin Kroeker
db7e5f1fa7
Update gemmt.c
1 year ago
Martin Kroeker
ff30ac9666
Update Makefile
1 year ago
Martin Kroeker
7c3e169b67
Update gemmt.c
1 year ago
Martin Kroeker
09414a4187
Ensure that GEMMTR name appears in XERBLA if gemmt was called as such
1 year ago
Marek Michalowski
838bb57e27
Merge branch 'develop' into develop
1 year ago
Martin Kroeker
a54f9a9c69
Merge pull request #5071 from annop-w/sgemm_throttling
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago
Marek Michalowski
4d5b13f765
Add thread throttling profile for SGEMV on `NEOVERSEV1`
1 year ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Annop Wongwathanarat
c8cd8da496
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago
Martin Kroeker
a1075477c3
Merge pull request #4994 from martin-frbg/issue4886
Disable multithreading in ?TRTRI for small workloads
1 year ago
Martin Kroeker
0c440f8a27
disable multithreading for small workloads
1 year ago
Martin Kroeker
2a290dfc2c
forward GEMM3M calls for GENERIC targets to the regular C/ZGEMM for now
1 year ago
Martin Kroeker
0cf656fd3e
Add copies of GEMMT under its new name GEMMTR
1 year ago
Chris Daley
cb48505251
optimize gemv forwarding on ARM64 systems
1 year ago
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
1 year ago
Chip Kerchner
1d51ca5798
Change multi-threading logic for SBGEMV to be the same as SGEMV.
1 year ago
Martin Kroeker
9762464718
Fix CBLAS interface filling in the wrong triangle for Row-Major
1 year ago
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
Martin Kroeker
7878976236
disable forwarding from SBGEMM to SBGEMV for now
1 year ago
Chris Sidebottom
b26424c6a2
Allow opt into GEMM -> GEMV forwarding
1 year ago
Chris Sidebottom
90eb863d4b
Re-add accidental removal
1 year ago
Chris Sidebottom
28b5334f22
Complete implementation of GEMV forwarding
1 year ago
Martin Kroeker
3db5dbc88e
forward to GEMV when one argument is actually a vector
1 year ago
gxw
f3cebb3ca3
x86: Fixed numpy CI failure when the target is ZEN.
1 year ago
Martin Kroeker
2f12a47405
fix build options for CAXPYC/ZAXPYC
1 year ago
Martin Kroeker
db9f7bc552
fix float array types to include bfloat16
1 year ago
Martin Kroeker
076766df4e
Update CMakeLists.txt
1 year ago
Martin Kroeker
ff6670cb83
don't generate non-cblas files for gemm_batch
1 year ago
Martin Kroeker
362a063396
remove return value
1 year ago
Martin Kroeker
89c7bbcba6
add cblas_?gemm_batch
1 year ago
Martin Kroeker
2957281275
Introduce a lower limit for multithreading
2 years ago
Martin Kroeker
5fd871d7ea
Introduce a lower limit for multithreading
2 years ago
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
2 years ago
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2 years ago