OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	ef8a44d981	Merge `2b5d8c789d` into `06c09deee9`	9 months ago
Martin Kroeker	2b5d8c789d	remove debugging printout	9 months ago
Martin Kroeker	b4fc09e9e1	Add registers d8 to d15 to clobber lists as the code does not expressly save them	9 months ago
Martin Kroeker	8e50b8d525	Add d8 to d15 to clobber lists as the code does not expressly save them	9 months ago
h-motoki	855945befb	Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E	9 months ago
Martin Kroeker	edaa73fd24	Hide the local 2VLx2VL symbol as static is insufficient for this with gcc	9 months ago
Martin Kroeker	501728a354	adjust register 20 accesses to 21 after moving x18	9 months ago
Martin Kroeker	05dbb54362	Delete misplaced file	9 months ago
Martin Kroeker	0bc19a1335	Update SME kernel details	9 months ago
Martin Kroeker	ca542f319f	Add VORTEXM4	9 months ago
Martin Kroeker	53d3bb50cc	Get symbol name from build system; change b.first to b.mi for AppleClang compatibility	9 months ago
Martin Kroeker	08a00326a4	Build symbol name from build system variables	9 months ago
Martin Kroeker	89898fc499	Add sgemm_direct_performant for switching between direct and regular kernels	9 months ago
Martin Kroeker	22c6607db9	Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS	9 months ago
Martin Kroeker	ca22e28ca1	Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S	9 months ago
Martin Kroeker	f3b2a15fad	Merge pull request #5420 from yuanjia111/develop Move the value assignment of vector x in gemv_n_sve.c to the outermos…	9 months ago
yuanjia	803e8d4838	Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 1. Verify correctness using BLAS-Tester 2. Using the built-in benchmark to verify performance, the performance of float and double type improved by about 60% and about 40% respectively. The test command is: export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100 export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100	9 months ago
Chris Sidebottom	5f47b872f1	Remove older kernels for BGEMM on NEOVERSEV1	9 months ago
Chris Sidebottom	114316f361	Optimize SBGEMM / BGEMM for NEOVERSEV1 further This changes the kernels to pack full SVE vectors and reduces the overall complexity of the inner GEMM loop.	9 months ago
Martin Kroeker	f1ee61ea30	Include NEON header for the bfloat conversion functions	9 months ago
Martin Kroeker	b3ffd5524a	Include NEON header for the bfloat conversion functions	9 months ago
Martin Kroeker	a5e7c0e3e0	Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 ARM64: Enable bfloat16 kernels by default	9 months ago
abhishek-fujitsu	0bc79da587	add neon header	10 months ago
Chris Sidebottom	ea2faf0c9a	Add optimized BGEMM for NEOVERSEN2 target This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.	10 months ago
Chris Sidebottom	2c3cdaf74e	Optimized BGEMV for NEOVERSEV1 target - Adds bgemv T based off of sbgemv T kernel - Adds bgemv N which is slightly altered to not use Y as an accumulator due to the output being bf16 which results in loss of precision - Enables BGEMM_GEMV_FORWARD to proxy BGEMM to BGEMV with new kernels	10 months ago
Martin Kroeker	39c90f9859	Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta SME1 based direct kernel (with alpha and beta) for cblas_sgemm level 3	10 months ago
Rajendra Prasad Matcha	eae0abfdb6	SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API.	10 months ago
Chris Sidebottom	740efd71c4	Add optimized BGEMM kernel for NEOVERSEV1 target This also improves the testing and generic kernel by re-using the BF16 conversion functions. Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	10 months ago
Martin Kroeker	fd37406817	Merge branch 'develop' into optimized_gemv_n_1x3	10 months ago
Iha, Taisei	f7ad906b49	Performance improvements of [SD]DOT with loop-unrolling on A64FX	10 months ago
Martin Kroeker	ee26caffb3	Merge pull request #5309 from davidz-ampere/dev-ampereone Add support for Ampere AmpereOne processors	11 months ago
davidz-ampere	aa90ab4142	Add support for Ampere AmpereOne processors	11 months ago
Ian McInerney	badef1d32e	Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types	11 months ago
davidz-ampere	84730068af	reduce duplicate kernel code	11 months ago
davidz-ampere	be68ef03b4	Add support for Ampere processors	11 months ago
Martin Kroeker	58eeb9041c	fix handling of dummy2	11 months ago
Martin Kroeker	1589d0b21e	Merge pull request #5281 from martin-frbg/zscal_arm64 kernel/arm64: fixed cscal and zscal	11 months ago
Sharif Inamdar	8279e68805	Optimize gemv_n_sve_v1x3 kernel - Calculate predicate outside the loop - Divide matrix in blocks of 3	11 months ago
Arne Juul	5442aff218	Accumulate results in output register explicitly	11 months ago
Martin Kroeker	28f8fdaf0f	support flag for NaN/Inf handling and fix scaling of NaN/Inf values	1 year ago
Martin Kroeker	5141a90993	Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) * Fix ARMV9SME target and add support_sme1 code for MacOS * make sgemm_direct unconditionally available on all arm64 * build a (dummy) sgemm_direct kernel on all arm64 * Update dynamic_arm64.c	1 year ago
Martin Kroeker	151b74284e	Merge pull request #5203 from quic/fix-sgemmdirect-sme1 Add vector registers to clobber list to prevent compiler optimization.	1 year ago
abhishek-fujitsu	9c02cdb073	optimise dot using thread throttling for NEOVERSE V1	1 year ago
Martin Kroeker	d0e8fd6d40	Merge pull request #5239 from annop-w/gemv_n_sve Use SVE kernel for S/DGEMVN for SVE machines	1 year ago
Iha, Taisei	08b5c18d70	fixed a potential out-of-bounds on gemv.	1 year ago
Annop Wongwathanarat	e11744a411	Use SVE kernel for S/DGEMVN for SVE machines	1 year ago
Martin Kroeker	dd38b4e811	Merge pull request #5225 from annop-w/gemv_n Improve performance for SGEMVN on NEOVERSEN1	1 year ago
Martin Kroeker	0241d516f6	Merge pull request #5220 from iha-taisei/sdgemv_n_unroll Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.	1 year ago
Annop Wongwathanarat	d535728803	Improve performance for SGEMVN on NEOVERSEN1	1 year ago
Usui, Tetsuzo	d711906e3e	Add symv kernels for arm64	1 year ago

369 Commits (ef8a44d981fad2673bf32743c4153559949e31fb)