Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
1 year ago
Martin Kroeker
9b11fd5802
Merge pull request #5088 from michalowski-arm/develop
Add thread throttling profile for SGEMV on `NEOVERSEV1`
1 year ago
Martin Kroeker
5930c162ef
Merge pull request #5097 from matthew-brett/fix-woa-cmd
Fix Windows on ARM build instructions
1 year ago
Marek Michalowski
838bb57e27
Merge branch 'develop' into develop
1 year ago
Matthew Brett
252c43265d
Fix Windows on ARM build instructions
The command as merged uses the compiler target as the compiler path.
I have run and tested a build with this command.
@Mugundanmcw - is this correct?
1 year ago
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
1 year ago
Martin Kroeker
a54f9a9c69
Merge pull request #5071 from annop-w/sgemm_throttling
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago
Martin Kroeker
9f2319b46d
Merge pull request #5094 from martin-frbg/issue5093
Fix "make install" operation when CPP_THREAD_SAFETY_TEST is selected
1 year ago
Martin Kroeker
9faebb3c97
fix lost indentation in the rules for the thread safety test
1 year ago
Martin Kroeker
262018f14c
Merge pull request #5092 from XiWeiGu/la64_fixed_cmake
LoongArch64: Fixed cmake
1 year ago
Martin Kroeker
180ba5e7d0
Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
1 year ago
gxw
1ebcbdbab3
LoongArch64: Fixed the issue of using the old-style TARGET in cmake builds
1 year ago
Deeksha Goplani
d1bfa979f7
small gemm kernel packing modifications
1 year ago
Martin Kroeker
1a6a9fb22f
add another generator line for rotm
1 year ago
Martin Kroeker
518e376820
Merge pull request #5090 from martin-frbg/cmakeutils
Fix CMake interpretation of KERNEL file variables relevant to WoA
1 year ago
Martin Kroeker
111c9b0733
Add translations for C_COMPILER and OSNAME
1 year ago
Martin Kroeker
4924319c50
fix position of srotm, qrotm
1 year ago
Martin Kroeker
b58cba9eb6
fix qrotm build rules
1 year ago
Marek Michalowski
4d5b13f765
Add thread throttling profile for SGEMV on `NEOVERSEV1`
1 year ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Annop Wongwathanarat
c0318cea6e
Simplify gemv_t_sve_v1x3 kernel
1 year ago
Martin Kroeker
76db346f7e
Merge pull request #5082 from martin-frbg/woa_cpuid
Get ARM64 TARGET information from the registry on Windows
1 year ago
Martin Kroeker
5f7b03a441
Merge pull request #5083 from martin-frbg/fixmips64ci
MIPS64 CI: fix breakage from inadvertent line join in yml file
1 year ago
Martin Kroeker
100e74d4d6
restore deleted line break
1 year ago
Martin Kroeker
ca3e1c8f9c
Get TARGET information from the registry on Windows
1 year ago
Martin Kroeker
87083fdbf6
[WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076)
* Protect align directives in assembly files that are currently problematic with LLVM on WoA
* use the armv8 zdot on WoA to work around other LLVM issues
1 year ago
Martin Kroeker
2954dc1a70
CI: Add NeoverseN2 build on the new Cobalt-100 (#5080)
* Add NeoverseN2 build
1 year ago
Martin Kroeker
7c3a920a81
CI: Update ubuntu-latest runners to fix side effects of switch to 24.04 (#5079)
1 year ago
Martin Kroeker
a7483d181b
Merge pull request #5074 from tingboliao/develop
Optimize the gemm_tcopy_8_rvv to be compatible with the vlens 128 and 256.
1 year ago
tingbo.liao
ef7f54b357
Optimized the gemm_tcopy_8_rvv to be compatible with the vlens 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Martin Kroeker
eba7338484
Merge pull request #5073 from XiWeiGu/la64_update_symv_lsx_version
LoongArch64: Update symv lsx version
1 year ago
gxw
e0a8216554
LoongArch64: Update dsymv LSX version
1 year ago
gxw
a9070ba3f9
LoongArch64: Update ssymv LSX version
1 year ago
Martin Kroeker
9b981035db
Merge pull request #5070 from xry111/xry111/lasx-la664
LoongArch64: Fix dsymv and ssymv LASX version
1 year ago
Martin Kroeker
fee353e63d
Merge pull request #5072 from martin-frbg/azureosx13
Azure CI: update deprecated macos-12 jobs to macos-13 image
1 year ago
Martin Kroeker
0c0112dfef
update deprecated macos-12 jobs to macos-13 image
1 year ago
Annop Wongwathanarat
c8cd8da496
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago
Xi Ruoyao
af10c132b8
LoongArch64: Fix dsymv and ssymv LASX version
"fmov.d $f2, $f4" leaves all the bits higher than the 63-th bit
unpredictable but it's obvious that the following code uses the value of
those high bits. We actually want to replicate the lower 64 bits here,
so we should use xvreplve0.d instead.
LA464 (Loongson 3[A-Z]-5000) happens to replicate them for us due to
some uarch internal details so the issue was not detected, but for LA664
(Loongson 3[A-Z]-6000) and future uarch we need to do things correctly
or we end up getting a lot of test failures.
Closes: https://bbs.aosc.io/t/topic/302
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
1 year ago
Martin Kroeker
4e817f804c
Update version to 0.3.29.dev
1 year ago
Martin Kroeker
8a316e68a5
Update version to 0.3.29.dev
1 year ago
Martin Kroeker
07756abb3e
Merge pull request #5067 from OpenMathLib/release-0.3.0
merge release 0.3.29 back into develop to copy tag
1 year ago
Martin Kroeker
8795fc7985
set version to 0.3.29
1 year ago
Martin Kroeker
e0c134e1f6
set version to 0.3.29
1 year ago
Martin Kroeker
9207052d85
Merge pull request #5066 from OpenMathLib/develop
Merge changes from develop in preparation of the 0.3.29 release
1 year ago
Martin Kroeker
7f5b703a80
Merge pull request #5065 from martin-frbg/changelog0329
Update the Changelog for version 0.3.29
1 year ago
Martin Kroeker
20f6114e98
add descriptions of build/runtime vars to 0.3.29 improvements
1 year ago
Martin Kroeker
f422845b6d
Merge pull request #5064 from martin-frbg/lapack1080
Replace LAPACK ?LARFT with a recursive implementation (Reference-LAPACK PR 1080)
1 year ago
Martin Kroeker
ce66ffe7bb
Update the Changelog for version 0.3.29
1 year ago
Martin Kroeker
d035e80d33
move the original non-recursive ?LARFT here (Reference-LAPACK PR 1080)
1 year ago
Martin Kroeker
459fa8102b
Create subdirectory for the old non-recursive ?larft
1 year ago