Martin Kroeker
b8f66ba0ee
Merge pull request #5367 from Mousius/bgemm-init
Temporarily disable test_bgemm
10 months ago
Martin Kroeker
cdebb4fd4b
Merge pull request #5365 from martin-frbg/issue5324
Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds using CMake
10 months ago
Martin Kroeker
ff614575c9
Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds
10 months ago
Martin Kroeker
0e11537cab
Merge pull request #5357 from Mousius/bgemm-init
Add infrastructure for BGEMM
10 months ago
Chris Sidebottom
8cd4be8d47
Temporarily disable test_bgemm
10 months ago
Chris Sidebottom
66d9185ebe
Fix CMake support
10 months ago
Martin Kroeker
98aefb70b4
Merge pull request #5292 from isharif168/optimized_gemv_n_1x3
Optimize gemv_n_sve_v1x3 kernel
10 months ago
Martin Kroeker
fd37406817
Merge branch 'develop' into optimized_gemv_n_1x3
10 months ago
Chris Sidebottom
48394384ef
Use correct constants for per-target BGEMM/SBGEMM
This fixes the build and tests on `NEOVERSEV1` target, which was failing
with specific constants for `SBGEMM`
Co-authored-by: Ye Tao <ye.tao@arm.com>
10 months ago
Chris Sidebottom
73bf0b941a
Add bgemm to gensymbol
10 months ago
Chris Sidebottom
f95e7b0e32
Add infrastructure for BGEMM
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.
Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
10 months ago
Martin Kroeker
15d6e58510
Merge pull request #5364 from martin-frbg/blashalf
change BLAS_HALF to BLAS_BFLOAT16 in parallelized POTRF (another missed rename)
10 months ago
Martin Kroeker
04bb5acd79
change BLAS_HALF to BLAS_BFLOAT16 (another missed rename)
10 months ago
Martin Kroeker
3d31887073
Merge pull request #5362 from Mousius/fix-bf16
Fix SBGEMM BFLOAT16 build
10 months ago
Martin Kroeker
0ddf8ebd42
Merge pull request #5354 from pratiklp00/p11
Add Support for POWER11
10 months ago
Martin Kroeker
d2ea9bbb6d
Merge pull request #5363 from guoyuanplct/develop
Update CONTRIBUTORS.md
10 months ago
guoyuanplct
4ff549a450
Update CONTRIBUTORS.md
10 months ago
guoyuanplct
309c48e327
Update CONTRIBUTORS.md
10 months ago
Chris Sidebottom
552e1c7a7a
Correct compiler flags for NEOVERSEV1 target
10 months ago
Chris Sidebottom
46b9b7a080
Also enable BFLOAT16 for make cirun
10 months ago
Chris Sidebottom
eaaa628af2
Enable BUILD_BFLOAT16 in cirun
10 months ago
Chris Sidebottom
7a97c4ca97
Rename HALF -> BFLOAT16 in some more places
10 months ago
Martin Kroeker
ee6560c89f
Merge pull request #5360 from sertonix/cpuid-arm
Fix cpuid.S on arm
10 months ago
Sertonix
8d11e4630c
Fix cpuid.S on arm
The ARM assembly syntax differs a bit
Fixes 61b9339d3a getarch/cpuid.S: Fix warning about executable stack
Signed-off-by: Sertonix <sertonix@posteo.net>
10 months ago
Martin Kroeker
03a4afcf14
Merge pull request #5359 from martin-frbg/gitign_isnan
update gitignore configuration
10 months ago
Martin Kroeker
901de8f33a
remove lapacke_mangling.h and add la_xisnan.mod
10 months ago
Martin Kroeker
ce6991780a
Merge pull request #5356 from ilina-linaro/ilina-woa
Update README.md to include Windows on Arm64
10 months ago
Martin Kroeker
df013c5e28
Merge pull request #5358 from iha-taisei/dot_unroll
Performance improvements of [SD]DOT with loop-unrolling on A64FX
10 months ago
Iha, Taisei
f7ad906b49
Performance improvements of [SD]DOT with loop-unrolling on A64FX
10 months ago
Lina Iyer
7f360001f9
Update README.md to include Windows on Arm64
Update README.md to indicate that binaries are available for Windows on ARM64
10 months ago
Martin Kroeker
36c2589d3a
Merge pull request #5355 from tetsuzo-usui/add_parallel_laed3
Improve [SD]SYEVD performance by parallelizing [SD]LAED3
10 months ago
Usui, Tetsuzo
14107e37d9
Add parallel laed3
10 months ago
Martin Kroeker
a06bcf836b
Merge pull request #5353 from nakagawa-fj/feature/gemm_divide_rate_for_A64FX
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX
10 months ago
Masato Nakagawa
5253c8f165
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
10 months ago
Martin Kroeker
8f0a1a3f82
Merge pull request #5303 from martin-frbg/issue5289
Exit if memory allocation keeps failing, instead of retrying forever
10 months ago
Martin Kroeker
2c0dd2468e
Merge pull request #5350 from martin-frbg/issue5341
Declare the server_lock mutex volatile in addition to static
10 months ago
Martin Kroeker
7ae24d0b85
Merge pull request #5351 from martin-frbg/lapack1140
Fix documentation error and ordering bug in ?LAED/?LASD (Reference-LAPACK PR 1140)
10 months ago
Martin Kroeker
5aeca597fe
Fix documentation error and ordering bug (Reference-LAPACK PR 1140)
10 months ago
Martin Kroeker
dcb289539b
Merge pull request #5344 from MaartenBaert/fix-dlasd7
LAPACK: Fix documentation error and ordering bug in DLASD7
10 months ago
Martin Kroeker
9bcffbd655
Declare the server_lock mutex volatile in addition to static
10 months ago
Martin Kroeker
334cd242d4
Merge pull request #5348 from hideaki-motoki/issue5343_prefered_size_for_a64fx
Setting `GEMM_PREFERED_SIZE` parameter for `A64FX`
10 months ago
h-motoki
bba75d5e45
GEMM_PREFERED_SIZE parameter has been changed for A64FX.
10 months ago
Martin Kroeker
4062c10370
Merge pull request #5345 from OpenMathLib/revert-5251-issue5250
Revert "Fix out-of-bounds accesses in ?/SCAL/?GEEV triggered by preceding errrors/invalid inputs"
10 months ago
Martin Kroeker
b78d1dc0ae
Merge pull request #5342 from martin-frbg/cmake_ampere
Add CMake build settings for the Ampere One cpu
10 months ago
Martin Kroeker
83a01d29ca
Revert "Fix out-of-bounds accesses in ?/SCAL/?GEEV triggered by preceding errrors/invalid inputs"
10 months ago
Martin Kroeker
560fa88c96
Add cross-build parameters for Ampere One
10 months ago
Martin Kroeker
55bb5ef867
Add compiler options for Ampere One
10 months ago
Maarten Baert
b37889e52d
Merge branch 'OpenMathLib:develop' into fix-dlasd7
10 months ago
pratiklp00
1dde4a13c0
p11 changes
10 months ago
Martin Kroeker
11ce79a4f0
Merge pull request #5329 from foxtran/fix/docs
Update FAQ
10 months ago