OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chris Sidebottom	f95e7b0e32	Add infrastructure for BGEMM Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places. Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	7 months ago
Masato Nakagawa	5253c8f165	Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.	7 months ago
h-motoki	bba75d5e45	GEMM_PREFERED_SIZE parameter has been changed for A64FX.	7 months ago
Martin Kroeker	d96daa220d	Merge pull request #5290 from Srangrang/develop Add support for FP16 to openBLAS and shgemm on RISCV	7 months ago
davidz-ampere	aa90ab4142	Add support for Ampere AmpereOne processors	7 months ago
davidz-ampere	be68ef03b4	Add support for Ampere processors	7 months ago
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B Added HFLOAT16 support for RISCV64 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16 The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0 Related to issue #5279 Co-authored-by Linjin Li <linjin_li@163.com>	8 months ago
Srangrang	0a967797a1	Add FP16 support for RISCV	8 months ago
Martin Kroeker	a34b487f22	Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN	9 months ago
Vaisakh K V	f66ca05b31	Merge branch 'develop' into topic/sgemm_direct_sme1	11 months ago
Vaisakh K V	d23eb3b93e	Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API * Added ARMV9SME target * Added SGEMM_DIRECT kernel based on SME1	1 year ago
Ye Tao	c748e6a338	optimized sbgemm kernel for neoverse-v1 (sve-256) Signed-off-by: Ye Tao <ye.tao@arm.com>	1 year ago
Aditya Tewari	4379a6fbe3	* checkpoint sbgemm for SVE-256	1 year ago
Martin Kroeker	926e56e389	Align GEMM3M parameters for GENERIC with ZGEMM and add P/Q/R	1 year ago
Martin Kroeker	a47b3c8867	Fix unroll parameter selection for MIPS64_GENERIC	1 year ago
Martin Kroeker	7c4f3638fd	switch PPCG4 SGEMM kernel to 4x4	1 year ago
gxw	48698b2b1d	LoongArch64: Rename core Use microarchitecture name instead of meaningless strings to name the core, the legacy core is still retained. 1. Rename LOONGSONGENERIC to LA64_GENERIC 2. Rename LOONGSON3R5 to LA464 3. Rename LOONGSON2K1000 to LA264	1 year ago
Chip Kerchner	b1737698db	Fix DEFAULTS in SBGEMM for POWER10. Also, comparisons for the SBGEMM unit test cannot be exact due to epsilon differences.	1 year ago
Piotr Kubaj	4c12090776	Fix build on FreeBSD/powerpc64*	1 year ago
gxw	6017ad7146	loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6	1 year ago
Usui, Tetsuzo	ca673ca774	Add GEMM_PREFERED_SIZE parameter for Neoverse V1	1 year ago
Martin Kroeker	93d975d8fd	Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset loongarch: Optimizing the performance of the GEMM on servers	1 year ago
gxw	d8c4ea8793	loongarch: Optimizing the performance of the GEMM on servers	1 year ago
Martin Kroeker	ba6d485102	Adjust SWITCH_RATIO for ZEN and apply GEMM_PREFERRED_SIZE	1 year ago
Martin Kroeker	584e87661d	set SWITCH_RATIO for Cortex-A76	1 year ago
Martin Kroeker	b925f61fb0	Add support for Cortex-A76	1 year ago
Rajalakshmi Srinivasaraghavan	f5b2a877e2	POWER9: Use default param values from POWER8 on AIX AIX uses KERNEL.POWER8 optimization on POWER9 and changing the default GEMM parameters in param.h to use POWER8 values on POWER9.	1 year ago
pengxu	4787a55c64	Optimized cgemm kernel 16x4 LASX for LoongArch	1 year ago
pengxu	fe3da43b7d	Optimized zgemm kernel 8x4 LASX, 4x4 LSX and cgemm kernel 8x4 LSX for LoongArch	2 years ago
Martin Kroeker	e5d2725e5a	Merge pull request #4185 from XiWeiGu/mips_enable_msa MIPS: Enable MSA	2 years ago
Sergei Lewis	1093def0d1	Merge branch 'risc-v' into develop	2 years ago
Martin Kroeker	889c5d026a	Merge pull request #4456 from kseniyazaytseva/riscv-rvv10 Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics	2 years ago
kseniyazaytseva	b193ea3d7b	Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics * Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores) * Fixed nrm2, axpby, ncopy, zgemv and scal kernels * Added zero size checks	2 years ago
Dirreke	ec89466e14	Add CSKY support	2 years ago
Martin Kroeker	504f9b0c5e	Increase S/D GEMM PQ to match typical L2 size as for NeoverseV1	2 years ago
Martin Kroeker	2802478449	revert change to Loongson2k1000 zgemm	2 years ago
Martin Kroeker	44b5b9e39f	Update C/ZGEMM MN for Loongson2k1000	2 years ago
Martin Kroeker	519b40fad9	Merge pull request #4398 from yinshiyou/la-dev Add Optimizations for LoongArch.	2 years ago
pengxu	a5d0d21378	loongarch64: Add zgemm and cgemm optimization	2 years ago
Hao Chen	179ed51d3b	Add dgemm_kernel_8x4.S file.	2 years ago
Darshan Patel	dab0da8243	Update GEMM param for NEOVERSEV1	2 years ago
Octavian Maghiar	e4586e81b8	[RISC-V] Add RISC-V Vector 128-bit target Current RVV x280 target depends on vlen=512-bits for Level 3 operations. Commit adds generic target that supports vlen=128-bits. New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations. Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.	2 years ago
Rajalakshmi Srinivasaraghavan	980f702f72	POWER: AIX: Make use of power10 optimization POWER10 optimizations are disabled when using default AIX assembler. As we have fixed many issues recently, enabling optimization path for default assembler.	2 years ago
gxw	553cc1372f	LoongArch64: Add sgemm_kernel	2 years ago
gxw	4d0f000db6	MIPS: Enable MSA	2 years ago
gxw	d46772e037	LoongArch64: Add compiler feature checks	2 years ago
Chris Sidebottom	84a268b6ca	Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core This patch removes the prefetches from cgemm/zgemm, which improves performance much as the same change did for sgemm/dgemm in #3868, so I'm happy to enable this on any applicable cores. I also replicated the unrolling of the copies from sgemm and dgemm.	2 years ago
Chris Sidebottom	f971ef55f2	Add ARMV8SVE to AArch64 Dynamic Dispatch In order to enable support for future cores which have similar tunings (in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should better manage the size and complexity of dynamic dispatch rather than just copy-pasting the same parameters. To make `ARMV8SVE` more representative of the common 128-bit SVE case, I've split it and similar parameters from A64FX, which has the wider 512-bit SVE.	2 years ago
Martin Kroeker	72caceb324	Merge pull request #4009 from Mousius/sve-gemm Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1	2 years ago
Martin Kroeker	437c0bf2b4	Merge pull request #3843 from Mousius/switch-ratio Propagate SWITCH_RATIO to DYNAMIC_ARCH builds	2 years ago

312 Commits (cdebb4fd4b2bbbf856e5abdcedbe9a5cf348ef8e)