Martin Kroeker
518e376820
Merge pull request #5090 from martin-frbg/cmakeutils
Fix CMake interpretation of KERNEL file variables relevant to WoA
1 year ago
Martin Kroeker
111c9b0733
Add translations for C_COMPILER and OSNAME
1 year ago
Martin Kroeker
4924319c50
fix position of srotm, qrotm
1 year ago
Martin Kroeker
b58cba9eb6
fix qrotm build rules
1 year ago
Marek Michalowski
4d5b13f765
Add thread throttling profile for SGEMV on `NEOVERSEV1`
1 year ago
gxw
2da86b80c9
LoongArch64: Fixed scalar version of cscal and zscal
1 year ago
gxw
5392f6df69
LoongArch64: Fixed LASX version of cscal and zscal
1 year ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Annop Wongwathanarat
c0318cea6e
Simplify gemv_t_sve_v1x3 kernel
1 year ago
gxw
b2117bb2ca
LoongArch64: Fixed LSX version of cscal and zscal
1 year ago
gxw
6954845d8d
utest: Add utest for {c/z}scal and {c/z}gemv
1 year ago
gxw
e114880dc4
kernel/generic: Fixed cscal and zscal
1 year ago
Martin Kroeker
76db346f7e
Merge pull request #5082 from martin-frbg/woa_cpuid
Get ARM64 TARGET information from the registry on Windows
1 year ago
Martin Kroeker
5f7b03a441
Merge pull request #5083 from martin-frbg/fixmips64ci
MIPS64 CI: fix breakage from inadvertent line join in yml file
1 year ago
Martin Kroeker
100e74d4d6
restore deleted line break
1 year ago
Martin Kroeker
ca3e1c8f9c
Get TARGET information from the registry on Windows
1 year ago
Martin Kroeker
87083fdbf6
[WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076)
* Protect align directives in assembly files that are currently problematic with LLVM on WoA
* Use the armv8 zdot on WoA to work around other LLVM issues
1 year ago
Martin Kroeker
2954dc1a70
CI: Add NeoverseN2 build on the new Cobalt-100 (#5080)
* Add NeoverseN2 build
1 year ago
Martin Kroeker
7c3a920a81
CI: Update ubuntu-latest runners to fix side effects of switch to 24.04 (#5079)
1 year ago
Martin Kroeker
a7483d181b
Merge pull request #5074 from tingboliao/develop
Optimize gemm_tcopy_8_rvv to be compatible with VLENs of 128 and 256.
1 year ago
tingbo.liao
ef7f54b357
Optimized gemm_tcopy_8_rvv to be compatible with VLENs of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Martin Kroeker
eba7338484
Merge pull request #5073 from XiWeiGu/la64_update_symv_lsx_version
LoongArch64: Update symv lsx version
1 year ago
gxw
e0a8216554
LoongArch64: Update dsymv LSX version
1 year ago
gxw
a9070ba3f9
LoongArch64: Update ssymv LSX version
1 year ago
Martin Kroeker
9b981035db
Merge pull request #5070 from xry111/xry111/lasx-la664
LoongArch64: Fix dsymv and ssymv LASX version
1 year ago
Martin Kroeker
fee353e63d
Merge pull request #5072 from martin-frbg/azureosx13
Azure CI: update deprecated macos-12 jobs to macos-13 image
1 year ago
Martin Kroeker
0c0112dfef
update deprecated macos-12 jobs to macos-13 image
1 year ago
Annop Wongwathanarat
c8cd8da496
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago
Xi Ruoyao
af10c132b8
LoongArch64: Fix dsymv and ssymv LASX version
"fmov.d $f2, $f4" leaves all bits above bit 63 unpredictable, but the
following code clearly uses the value of those high bits. We actually
want to replicate the lower 64 bits here, so we should use xvreplve0.d
instead.
LA464 (Loongson 3[A-Z]-5000) happens to replicate them for us due to
some uarch internal details, so the issue was not detected, but for LA664
(Loongson 3[A-Z]-6000) and future uarchs we need to do things correctly
or we end up with many test failures.
Closes: https://bbs.aosc.io/t/topic/302
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
1 year ago
Martin Kroeker
4e817f804c
Update version to 0.3.29.dev
1 year ago
Martin Kroeker
8a316e68a5
Update version to 0.3.29.dev
1 year ago
Martin Kroeker
07756abb3e
Merge pull request #5067 from OpenMathLib/release-0.3.0
merge release 0.3.29 back into develop to copy tag
1 year ago
Martin Kroeker
8795fc7985
set version to 0.3.29
1 year ago
Martin Kroeker
e0c134e1f6
set version to 0.3.29
1 year ago
Martin Kroeker
9207052d85
Merge pull request #5066 from OpenMathLib/develop
Merge changes from develop in preparation for the 0.3.29 release
1 year ago
Martin Kroeker
7f5b703a80
Merge pull request #5065 from martin-frbg/changelog0329
Update the Changelog for version 0.3.29
1 year ago
Martin Kroeker
20f6114e98
add descriptions of build/runtime vars to 0.3.29 improvements
1 year ago
Martin Kroeker
f422845b6d
Merge pull request #5064 from martin-frbg/lapack1080
Replace LAPACK ?LARFT with a recursive implementation (Reference-LAPACK PR 1080)
1 year ago
Martin Kroeker
ce66ffe7bb
Update the Changelog for version 0.3.29
1 year ago
Martin Kroeker
d035e80d33
move the original non-recursive ?LARFT here (Reference-LAPACK PR 1080)
1 year ago
Martin Kroeker
459fa8102b
Create subdirectory for the old non-recursive ?larft
1 year ago
Martin Kroeker
0c4b4cd78c
move the non-recursive original ?larft here (Reference-LAPACK PR 1080)
1 year ago
Martin Kroeker
ed516994d6
replace ?larft with a recursive implementation (Reference-LAPACK PR 1080)
1 year ago
Martin Kroeker
5527eda561
Merge pull request #5063 from martin-frbg/lapack1062
Remove comparison that is always false (Reference-LAPACK PR 1062)
1 year ago
Martin Kroeker
4c1a23673a
Remove comparison that is always false (Reference-LAPACK PR 1062)
1 year ago
Martin Kroeker
d74eb02954
Merge pull request #5057 from martin-frbg/issue5050
Replace while loop in generic C/ZGEMM_BETA to avoid going out of bounds
1 year ago
Martin Kroeker
30f7a4120b
Merge pull request #5056 from tingboliao/dev_omatcopy_20250108
Optimize the omatcopy_cn/zomatcopy_cn kernels with RVV 1.0 intrinsics.
1 year ago
Martin Kroeker
0b9de3ef7d
Merge pull request #5042 from tingboliao/develop
Add rot test cases to improve unit-test coverage for rot_rvv.
1 year ago
Martin Kroeker
c31f148c76
Merge pull request #5061 from XiWeiGu/la64_update_symv
LoongArch64: Update symv
1 year ago
gxw
20a8e48f25
LoongArch64: Update ssymv LASX version
1 year ago