Chip Kerchner
9ac0fb0111
Merge branch 'develop' into vectorizeBF16GEMV
1 year ago
Martin Kroeker
624e9d110e
Merge pull request #4916 from martin-frbg/issue4901
Fix SIGILL/SIGSEGV in PPCG4 SGEMM and fix NAN handling in PPCG4 SSCAL/DSCAL
1 year ago
Martin Kroeker
d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds
1 year ago
Martin Kroeker
7c4f3638fd
switch PPCG4 SGEMM kernel to 4x4
1 year ago
Chip Kerchner
915a6d6e44
Add casting.
1 year ago
Chip Kerchner
7ec3c16d82
Remove beta from optimized functions.
1 year ago
Martin Kroeker
54afc24e4d
Merge pull request #4906 from XiWeiGu/arm64_cmake_small_matrix_opt
ARM64: Enable SMALL_MATRIX_OPT when compiling with CMake
1 year ago
Martin Kroeker
b4495a8fb8
Merge branch 'develop' into arm64_cmake_small_matrix_opt
1 year ago
Martin Kroeker
68eefe60b9
Merge pull request #4915 from martin-frbg/issue4907
Support LoongArch64 compilation with LLVM
1 year ago
Martin Kroeker
4f00f02567
Do not add -mabi flags for Loongson when the compiler is flang
1 year ago
Martin Kroeker
f817f26062
Add simpler EPILOGUE for clang
1 year ago
Martin Kroeker
a492181665
filter out Loongarch -mabi options for flang-new
1 year ago
Martin Kroeker
de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
1 year ago
Martin Kroeker
edaf5933c4
Merge pull request #4913 from martin-frbg/issue4912
Declare the input array in CBLAS_?GEADD as const in cblas.h
1 year ago
Martin Kroeker
71131406ae
Declare the input array in CBLAS_?GEADD as const
1 year ago
Chip Kerchner
7cc00f68c9
Remove more duplicate.
1 year ago
Chip Kerchner
e238a68c03
Remove duplicate.
1 year ago
Martin Kroeker
f10d47c4bb
Merge pull request #4910 from martin-frbg/issue4908
fix placement of -fopenmp in the pkgconfig file
1 year ago
Chip Kerchner
32095b0cbb
Remove parameter.
1 year ago
Martin Kroeker
a1073f5eed
Merge pull request #4900 from XiWeiGu/la64_core_rename
LoongArch64: Rename core
1 year ago
Martin Kroeker
fa77561396
add openmp option to pkgconfig template
1 year ago
Martin Kroeker
176107d23a
Add -fopenmp to cflags in pkgconfig file if set
1 year ago
Martin Kroeker
0228d36211
move -fopenmp to CFLAGS
1 year ago
gxw
7087b0a7d0
ARM64: Enable SMALL_MATRIX_OPT when compiling with CMake
1 year ago
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
Chip Kerchner
c8788208c8
Fixing block issue with transpose version.
1 year ago
Chip Kerchner
d7c0d87cd1
Small changes.
1 year ago
Chip Kerchner
eb6f3a05ef
Common MMA code.
1 year ago
Chip Kerchner
fb287d17fc
Common code.
1 year ago
Chip Kerchner
8ab6245771
Small change.
1 year ago
Chip Kerchner
df19375560
Almost final code for MMA.
1 year ago
Chip Kerchner
05aa63e738
More MMA BF16 GEMV code.
1 year ago
Chip Kerchner
c9ce37d527
Force vector pairs in clang.
1 year ago
Chip Kerchner
89a12fa083
MMA BF16 GEMV code.
1 year ago
Martin Kroeker
92f7a2dc3e
Merge pull request #4899 from martin-frbg/flangmtune
Strip any mtune option from FFLAGS if the compiler is flang-new
1 year ago
Martin Kroeker
969bb949b1
Strip any mtune option from FFLAGS if the compiler is flang-new
1 year ago
Martin Kroeker
fca86e359c
Merge pull request #4887 from goplanid/develop
Small GEMM improvements for AArch64 with SVE
1 year ago
Chip Kerchner
7947970f9d
Move common code.
1 year ago
Martin Kroeker
60c1519e01
Merge pull request #4896 from martin-frbg/update_azure_mac_hpc
AzureCI: Update Intel oneAPI download for Mac to final version
1 year ago
Martin Kroeker
c8313d9d80
Merge pull request #4895 from martin-frbg/update_homebrewjob
CI: Update nightly-homebrew workflow
1 year ago
Martin Kroeker
b588e922a1
Update oneAPI download location for Mac to final
1 year ago
Martin Kroeker
4178905fa7
Update version of upload-artifacts following deprecation
1 year ago
Martin Kroeker
5f70e245a2
Merge pull request #4894 from martin-frbg/issue4893
Fix function definition in the f2c-converted ctest and remove suppression of gcc14 error
1 year ago
Martin Kroeker
383e0b133e
remove suppression of gcc14's incompatible pointer error
1 year ago
Martin Kroeker
869a169c57
Fix ZAXPYTEST prototype
1 year ago
Chip Kerchner
72216d28c2
Fix bug with inc_y adding results twice.
1 year ago
Chip Kerchner
2f142ee857
More common code.
1 year ago
Chip Kerchner
39fd29f1de
Minor improvement and turn off BF16 GEMV forwarding by default.
1 year ago
Chip Kerchner
8541b25e1d
Special case beta is one.
1 year ago