Martin Kroeker
965463f177
Include float-bfloat conversion functions in ONLY_CBLAS builds as well
6 months ago
youcai
41f9701ebc
Fix cmake building with cblas_bgemm
6 months ago
Martin Kroeker
30dbca5051
fix misleading indentation to silence a gcc warning
6 months ago
Martin Kroeker
39c90f9859
Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta
SME1 based direct kernel (with alpha and beta) for cblas_sgemm level 3
6 months ago
Rajendra Prasad Matcha
eae0abfdb6
SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API.
6 months ago
Chris Sidebottom
947d7af4c9
Fix CMake references to bscal and bgemv
6 months ago
Chris Sidebottom
e105411460
Add infrastructure for bgemv/bscal
- Sets up all the various entrypoints for `bgemv`
- Adds `bscal` for use in the `bgemv` interface
- Adds test cases for comparing `sgemv` and `bgemv`
- Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate
enough to pass the above tests
6 months ago
Chris Sidebottom
740efd71c4
Add optimized BGEMM kernel for NEOVERSEV1 target
This also improves the testing and generic kernel by re-using the BF16
conversion functions.
Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
6 months ago
Chris Sidebottom
66d9185ebe
Fix CMake support
6 months ago
Chris Sidebottom
f95e7b0e32
Add infrastructure for BGEMM
Sets up all the infrastructure for BGEMM support in OpenBLAS; hopefully I found all the right places.
Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
6 months ago
Usui, Tetsuzo
14107e37d9
Add parallel laed3
7 months ago
Martin Kroeker
d96daa220d
Merge pull request #5290 from Srangrang/develop
Add support for FP16 to OpenBLAS and shgemm on RISCV
7 months ago
Srangrang
ec14e1648c
fix: resolve build failure on non-RISCV hosts
- adjust interface to disable "small matrix" pathway
- separate HFLOAT16 from BFLOAT16
- remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions
Related to PR#5290
Co-authored-by Martin
7 months ago
Martin Kroeker
5e393f207c
fix source file used for sbgemmt/sbgemmtr
7 months ago
Martin Kroeker
11ff18bb0f
Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal
kernel/generic: Fixed cscal and zscal
7 months ago
gkdddd
670ec6f757
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0
Related to issue #5279
Co-authored-by Linjin Li <linjin_li@163.com>
7 months ago
Martin Kroeker
42b7d1f897
Fix addressing of alpha in CBLAS
8 months ago
Martin Kroeker
6680e0592f
Fix conditional inclusion of SGEMM_KERNEL_DIRECT
8 months ago
Martin Kroeker
70865a894e
Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
9 months ago
Ruiyang Wu
02fd1df10b
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
10 months ago
Martin Kroeker
51c1fb1f93
Fix ?spmv build and misinterpretation of NO_LAPACK=0
10 months ago
shubham.chaudhari
8e289ecddc
Simplified thread throttling function in gemv
10 months ago
shubham.chaudhari
189dbbc04f
Add thread throttling for dynamic arch neoversev1
11 months ago
shubham.chaudhari
b6cb5ece58
Add thread throttling profile for DGEMV on NEOVERSEV1
11 months ago
Martin Kroeker
7338a473a7
Merge pull request #5150 from Harishmcw/WoA-Experiments
Redefined threading logic for GESV and GEMV on WoA
11 months ago
Martin Kroeker
09ba099461
make throttling code conditional on SMP
11 months ago
Harishmcw
030ae1fd97
Redefined threading logic for WoA
11 months ago
Martin Kroeker
c03a81b927
Merge pull request #5141 from michalowski-arm/fork-throttle
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
11 months ago
Martin Kroeker
75b958a018
Transform the B array back if necessary before returning
11 months ago
Marek Michalowski
650a062e19
Add thread throttling profile for SGEMV on `NEOVERSEV2`
11 months ago
Marek Michalowski
b723c1b7b7
Add thread throttling profile for SGEMM on `NEOVERSEV2`
11 months ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
11 months ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
1 year ago
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
11 months ago
Martin Kroeker
60d0be0e97
Update nrm2.c
11 months ago
Martin Kroeker
0fd5448b2c
Handle INCX=0
11 months ago
Martin Kroeker
db7e5f1fa7
Update gemmt.c
11 months ago
Martin Kroeker
ff30ac9666
Update Makefile
11 months ago
Martin Kroeker
7c3e169b67
Update gemmt.c
11 months ago
Martin Kroeker
09414a4187
Ensure that GEMMTR name appears in XERBLA if gemmt was called as such
11 months ago
Marek Michalowski
838bb57e27
Merge branch 'develop' into develop
1 year ago
Martin Kroeker
a54f9a9c69
Merge pull request #5071 from annop-w/sgemm_throttling
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago
Marek Michalowski
4d5b13f765
Add thread throttling profile for SGEMV on `NEOVERSEV1`
1 year ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
gxw
e114880dc4
kernel/generic: Fixed cscal and zscal
1 year ago
Annop Wongwathanarat
c8cd8da496
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago
Martin Kroeker
a1075477c3
Merge pull request #4994 from martin-frbg/issue4886
Disable multithreading in ?TRTRI for small workloads
1 year ago
Martin Kroeker
0c440f8a27
disable multithreading for small workloads
1 year ago
Martin Kroeker
2a290dfc2c
forward GEMM3M calls for GENERIC targets to the regular C/ZGEMM for now
1 year ago
Martin Kroeker
0cf656fd3e
Add copies of GEMMT under its new name GEMMTR
1 year ago