Martin Kroeker
06c09deee9
Merge pull request #5426 from hideaki-motoki/issue5417_axpy_sve
Implementing SVE in `[SD]AXPY` Kernels for `A64FX` and `Graviton3E`
8 months ago
yuanjia
c2cc7a3602
riscv64: optimize gemv_t_vector.c
8 months ago
h-motoki
855945befb
Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E
8 months ago
Martin Kroeker
9d6df1dd3e
Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking
Add and use vectorized packing in ZVL128B and ZVL256B for RISCV
8 months ago
Martin Kroeker
f3b2a15fad
Merge pull request #5420 from yuanjia111/develop
Move the value assignment of vector x in gemv_n_sve.c to the outermos…
8 months ago
Chip Kerchner
64401b4417
Disable vectorized packing for DGEMM - since it is slower than scalar.
8 months ago
Chip Kerchner
c00afc86a6
Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions.
9 months ago
yuanjia
803e8d4838
Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval.
1. Verify correctness using BLAS-Tester.
2. Verify performance using the built-in benchmark: float and double performance improved by about 60% and about 40%, respectively. The test commands are:
export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100
export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100
9 months ago
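The hoisting described in the commit above can be illustrated with a plain C sketch. This is not the code from gemv_n_sve.c: the function name gemv_n_hoisted, the column-major layout, and the scalar inner loop are assumptions made for illustration, and the real kernel uses SVE intrinsics. The point is only that the x element is read and scaled once per column in the outer loop rather than on every inner iteration.

```c
#include <stddef.h>

/* Hypothetical illustration: y += alpha * A * x for a column-major
 * m x n matrix A with leading dimension lda. */
void gemv_n_hoisted(size_t m, size_t n, float alpha,
                    const float *a, size_t lda,
                    const float *x, float *y)
{
    for (size_t j = 0; j < n; j++) {
        /* x[j] is loaded and scaled once here, in the outermost loop,
         * instead of being fetched again for every row. */
        const float xj = alpha * x[j];
        const float *col = a + j * lda;
        for (size_t i = 0; i < m; i++) {
            y[i] += xj * col[i];
        }
    }
}
```

In a vectorized kernel the same idea would presumably apply to the broadcast of the x value into a vector register, which then stays live across the whole inner loop.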
Chris Sidebottom
5f47b872f1
Remove older kernels for BGEMM on NEOVERSEV1
9 months ago
Chris Sidebottom
114316f361
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
This changes the kernels to pack full SVE vectors and reduces the
overall complexity of the inner GEMM loop.
9 months ago
Martin Kroeker
f1ee61ea30
Include NEON header for the bfloat conversion functions
9 months ago
Martin Kroeker
b3ffd5524a
Include NEON header for the bfloat conversion functions
9 months ago
Martin Kroeker
0968dddf1a
Merge pull request #5409 from martin-frbg/issue5372
Work around gcc15.1 on POWER misoptimizing DGEMV at -O3
9 months ago
Martin Kroeker
a3b9c933c5
mark xbuffer as volatile to work around gcc15.1 optimizer bug
9 months ago
Chip Kerchner
72f082f31d
Fix bad vector zero initializer and other compiler warnings for RISC-V.
9 months ago
Martin Kroeker
a5e7c0e3e0
Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16
ARM64: Enable bfloat16 kernels by default
9 months ago
abhishek-fujitsu
0bc79da587
add neon header
9 months ago
Chris Sidebottom
ea2faf0c9a
Add optimized BGEMM for NEOVERSEN2 target
This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.
9 months ago
Chris Sidebottom
2c3cdaf74e
Optimized BGEMV for NEOVERSEV1 target
- Adds bgemv T based off of sbgemv T kernel
- Adds bgemv N, which is slightly altered so that it does not use Y as an
accumulator, because the output is bf16 and accumulating in it would
lose precision
- Enables BGEMM_GEMV_FORWARD to proxy BGEMM to BGEMV with new kernels
9 months ago
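The precision concern mentioned in the commit above can be shown with a small C sketch. It is not the NEOVERSEV1 kernel: the names bf16_t, bf16_to_f32, f32_to_bf16, bgemv_n_bf16_acc, and bgemv_n_f32_acc are hypothetical, alpha and beta are omitted, and the conversion helper ignores NaN handling. The first routine rounds the running sum back to bf16 after every column, as happens when Y itself is the accumulator; the second keeps a float accumulator and rounds once at the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;   /* hypothetical storage type: upper 16 bits of a float */

static float bf16_to_f32(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Round to nearest even; NaN inputs are not treated specially here. */
static bf16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (bf16_t)(bits >> 16);
}

/* Variant 1: y (bf16) is the accumulator, so every update is rounded. */
void bgemv_n_bf16_acc(size_t m, size_t n, const bf16_t *a, size_t lda,
                      const bf16_t *x, bf16_t *y)
{
    for (size_t j = 0; j < n; j++) {
        const float xj = bf16_to_f32(x[j]);
        for (size_t i = 0; i < m; i++) {
            float t = bf16_to_f32(y[i]) + xj * bf16_to_f32(a[i + j * lda]);
            y[i] = f32_to_bf16(t);
        }
    }
}

/* Variant 2: accumulate in float, convert to bf16 once per output element. */
void bgemv_n_f32_acc(size_t m, size_t n, const bf16_t *a, size_t lda,
                     const bf16_t *x, bf16_t *y)
{
    for (size_t i = 0; i < m; i++) {
        float acc = bf16_to_f32(y[i]);
        for (size_t j = 0; j < n; j++) {
            acc += bf16_to_f32(x[j]) * bf16_to_f32(a[i + j * lda]);
        }
        y[i] = f32_to_bf16(acc);
    }
}
```

With a single row of 512 ones multiplied by a vector of 512 ones, the first variant stalls at 256 because 257 is not representable in bf16 and rounds back down, while the second yields 512.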
Martin Kroeker
e2d941e9af
Declare the "small" kernel static in addition to inline
9 months ago
Martin Kroeker
8214700930
Declare the "small" kernel static in addition to inline
9 months ago
Martin Kroeker
39c90f9859
Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta
SME1 based direct kernel (with alpha and beta) for cblas_sgemm level 3
9 months ago
Rajendra Prasad Matcha
eae0abfdb6
SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API.
10 months ago
Chris Sidebottom
947d7af4c9
Fix CMake references to bscal and bgemv
9 months ago
Chris Sidebottom
e105411460
Add infrastructure for bgemv/bscal
- Sets up all the various entrypoints for `bgemv`
- Adds `bscal` for use in the `bgemv` interface
- Adds test cases for comparing `sgemv` and `bgemv`
- Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate
enough to pass above tests
10 months ago
Chris Sidebottom
740efd71c4
Add optimized BGEMM kernel for NEOVERSEV1 target
This also improves the testing and generic kernel by re-using the BF16
conversion functions.
Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
10 months ago
Martin Kroeker
343830c26f
Add BGEMM parameter tables
10 months ago
Martin Kroeker
ff614575c9
Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds
10 months ago
Martin Kroeker
0e11537cab
Merge pull request #5357 from Mousius/bgemm-init
Add infrastructure for BGEMM
10 months ago
Chris Sidebottom
66d9185ebe
Fix CMake support
10 months ago
Martin Kroeker
fd37406817
Merge branch 'develop' into optimized_gemv_n_1x3
10 months ago
Chris Sidebottom
f95e7b0e32
Add infrastructure for BGEMM
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.
Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
10 months ago
Iha, Taisei
f7ad906b49
Performance improvements of [SD]DOT with loop-unrolling on A64FX
10 months ago
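As a rough illustration of the loop-unrolling technique named in the commit above, the following C sketch unrolls a dot product by four with independent partial sums. It is not the A64FX kernel: the function name ddot_unrolled4 and the unroll factor are assumptions, and the actual implementation would use SVE rather than scalar code.

```c
#include <stddef.h>

/* Hypothetical illustration: dot product unrolled by 4. The four
 * independent accumulators let consecutive multiply-adds overlap
 * instead of waiting on a single dependency chain. */
double ddot_unrolled4(size_t n, const double *x, const double *y)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++) {
        s0 += x[i] * y[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Because the partial sums are combined in a different order than a simple sequential loop, results can differ in the last bits for general floating-point inputs.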
Martin Kroeker
d96daa220d
Merge pull request #5290 from Srangrang/develop
Add support for FP16 to OpenBLAS and shgemm on RISCV
10 months ago
Martin Kroeker
ee26caffb3
Merge pull request #5309 from davidz-ampere/dev-ampereone
Add support for Ampere AmpereOne processors
10 months ago
davidz-ampere
aa90ab4142
Add support for Ampere AmpereOne processors
10 months ago
Ian McInerney
badef1d32e
Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types
10 months ago
Martin Kroeker
3318a2b904
override CDOT and ZDOT with the generic C kernel
10 months ago
davidz-ampere
84730068af
reduce duplicate kernel code
10 months ago
davidz-ampere
be68ef03b4
Add support for Ampere processors
10 months ago
Srangrang
9f13b2c6ac
style: modify HALF to BFLOAT16 in benchmark folder
10 months ago
Srangrang
ec14e1648c
fix: resolve build failure on non-RISCV hosts
- adjust interface to disable "small matrix" pathway
- separate HFLOAT16 from BFLOAT16
- remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions
Related to PR#5290
Co-authored-by Martin
10 months ago
Martin Kroeker
e338d34ce1
fix path
11 months ago
Martin Kroeker
d36093d084
temporarily change default C/ZSCAL to the non-asm implementation
11 months ago
Martin Kroeker
b3c90564d7
resync with the generic arm version for inf/nan handling
11 months ago
Martin Kroeker
6bdc7f9eb7
Merge pull request #5300 from martin-frbg/fixup5296
kernel/riscv64: Fix cscal/zscal for riscv64_generic
11 months ago
Martin Kroeker
73af02b89f
use dummy2 as Inf/NAN handling flag
11 months ago
Martin Kroeker
549a9f1dbb
Disable the default SSE kernels for CSCAL/ZSCAL for now
11 months ago
Martin Kroeker
58eeb9041c
fix handling of dummy2
11 months ago
Martin Kroeker
7c77537b25
Merge pull request #5297 from martin-frbg/zscal_x86_sparc
kernel/(x86|sparc): Fix cscal and zscal by reverting to the generic C kernels
11 months ago