Martin Kroeker
09414a4187
Ensure that GEMMTR name appears in XERBLA if gemmt was called as such
1 year ago
Martin Kroeker
c139b63342
Merge pull request #5107 from jhgit/develop
fix signedness of pointer to integer type passed to blas_lock()
1 year ago
John Hein
6cd9bbe531
fix signedness of pointer to integer type passed to blas_lock()
1 year ago
Martin Kroeker
5de5072940
Improve flang-new identification and add CI job for it on OSX-x86_64 (#5103)
* AzureCI: Add LLVM/flang-new build on OSX-x86_64
* distinguish classic flang from flang-new in name based recognition
1 year ago
Martin Kroeker
1f74fb9a07
Merge pull request #5101 from martin-frbg/issue5100
Fix CMake build for PPCG4 breaking due to unparsable KERNEL file
1 year ago
Martin Kroeker
d7036cfd74
Remove trailing blanks that break the cmake parser
1 year ago
Martin Kroeker
3375a0c990
Merge pull request #5099 from martin-frbg/issue5097-2
Simplify build instructions for Windows on Arm
1 year ago
Martin Kroeker
7a27e2b00d
Simplify build instructions for Windows on Arm
1 year ago
Martin Kroeker
fdeac17237
Merge pull request #5098 from martin-frbg/issue5095
Fix compilation with BUILD_BFLOAT16 enabled
1 year ago
Martin Kroeker
1829ac5b44
Add (dummy) declaration of SBROT_M
1 year ago
Martin Kroeker
53d20a83f3
Merge pull request #5089 from annop-w/gemv_t
Simplify gemv_t_sve_v1x3 kernel
1 year ago
Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
1 year ago
Martin Kroeker
9b11fd5802
Merge pull request #5088 from michalowski-arm/develop
Add thread throttling profile for SGEMV on `NEOVERSEV1`
1 year ago
Martin Kroeker
5930c162ef
Merge pull request #5097 from matthew-brett/fix-woa-cmd
Fix Windows on ARM build instructions
1 year ago
Marek Michalowski
838bb57e27
Merge branch 'develop' into develop
1 year ago
Matthew Brett
252c43265d
Fix Windows on ARM build instructions
The command as merged uses the compiler target as the compiler path.
I have run and tested a build with this command.
@Mugundanmcw - is this correct?
1 year ago
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
1 year ago
Martin Kroeker
a54f9a9c69
Merge pull request #5071 from annop-w/sgemm_throttling
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago
Martin Kroeker
9f2319b46d
Merge pull request #5094 from martin-frbg/issue5093
Fix "make install" operation when CPP_THREAD_SAFETY_TEST is selected
1 year ago
Martin Kroeker
9faebb3c97
fix lost indentation in the rules for the thread safety test
1 year ago
Martin Kroeker
262018f14c
Merge pull request #5092 from XiWeiGu/la64_fixed_cmake
LoongArch64: Fixed cmake
1 year ago
Martin Kroeker
180ba5e7d0
Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
1 year ago
gxw
1ebcbdbab3
LoongArch64: Fixed the issue of using the old-style TARGET in cmake builds
1 year ago
Deeksha Goplani
d1bfa979f7
small gemm kernel packing modifications
1 year ago
Martin Kroeker
1a6a9fb22f
add another generator line for rotm
1 year ago
Martin Kroeker
518e376820
Merge pull request #5090 from martin-frbg/cmakeutils
Fix CMake interpretation of KERNEL file variables relevant to WoA
1 year ago
Martin Kroeker
111c9b0733
Add translations for C_COMPILER and OSNAME
1 year ago
Martin Kroeker
4924319c50
fix position of srotm, qrotm
1 year ago
Martin Kroeker
b58cba9eb6
fix qrotm build rules
1 year ago
Marek Michalowski
4d5b13f765
Add thread throttling profile for SGEMV on `NEOVERSEV1`
1 year ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Annop Wongwathanarat
c0318cea6e
Simplify gemv_t_sve_v1x3 kernel
1 year ago
Martin Kroeker
76db346f7e
Merge pull request #5082 from martin-frbg/woa_cpuid
Get ARM64 TARGET information from the registry on Windows
1 year ago
Martin Kroeker
5f7b03a441
Merge pull request #5083 from martin-frbg/fixmips64ci
MIPS64 CI: fix breakage from inadvertent line join in yml file
1 year ago
Martin Kroeker
100e74d4d6
restore deleted line break
1 year ago
Martin Kroeker
ca3e1c8f9c
Get TARGET information from the registry on Windows
1 year ago
Martin Kroeker
87083fdbf6
[WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076)
* Protect align directives in assembly files that are currently problematic with LLVM on WoA
* use the armv8 zdot on WoA to work around other LLVM issues
1 year ago
Martin Kroeker
2954dc1a70
CI: Add NeoverseN2 build on the new Cobalt-100 (#5080)
* Add NeoverseN2 build
1 year ago
Martin Kroeker
7c3a920a81
CI: Update ubuntu-latest runners to fix side effects of switch to 24.04 (#5079)
1 year ago
Martin Kroeker
a7483d181b
Merge pull request #5074 from tingboliao/develop
Optimize the gemm_tcopy_8_rvv to be compatible with the vlens 128 and 256.
1 year ago
tingbo.liao
ef7f54b357
Optimized the gemm_tcopy_8_rvv to be compatible with the vlens 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
1 year ago
Martin Kroeker
eba7338484
Merge pull request #5073 from XiWeiGu/la64_update_symv_lsx_version
LoongArch64: Update symv lsx version
1 year ago
gxw
e0a8216554
LoongArch64: Update dsymv LSX version
1 year ago
gxw
a9070ba3f9
LoongArch64: Update ssymv LSX version
1 year ago
Martin Kroeker
9b981035db
Merge pull request #5070 from xry111/xry111/lasx-la664
LoongArch64: Fix dsymv and ssymv LASX version
1 year ago
Martin Kroeker
fee353e63d
Merge pull request #5072 from martin-frbg/azureosx13
Azure CI: update deprecated macos-12 jobs to macos-13 image
1 year ago
Martin Kroeker
0c0112dfef
update deprecated macos-12 jobs to macos-13 image
1 year ago
Annop Wongwathanarat
c8cd8da496
Add thread throttling profile for SGEMM on NEOVERSEV1
1 year ago
Xi Ruoyao
af10c132b8
LoongArch64: Fix dsymv and ssymv LASX version
"fmov.d $f2, $f4" leaves all the bits higher than the 63rd bit
unpredictable but it's obvious that the following code uses the value of
those high bits. We actually want to replicate the lower 64 bits here,
so we should use xvreplve0.d instead.
LA464 (Loongson 3[A-Z]-5000) happens to replicate them for us due to
some uarch internal details, so the issue was not detected, but for LA664
(Loongson 3[A-Z]-6000) and future uarch we need to do things correctly
or we end up getting a lot of test failures.
Closes: https://bbs.aosc.io/t/topic/302
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
1 year ago
Martin Kroeker
4e817f804c
Update version to 0.3.29.dev
1 year ago