OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Mark Ryan	ce79fe12fd	disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1 The compiler options that enable 16 bit floating point instructions should not be enabled by default when building the RISCV64_ZVL128B and RISCV64_ZVL256B targets. The zfh and zvfh extensions are not part of the 'V' extension and are not required by any of the RVA profiles. There's no guarantee that kernels built with zfh and zvfh will work correctly on fully compliant RVA23U64 devices. To fix the issue we only build the RISCV64_ZVL128B and RISCV64_ZVL256B kernels with the half float flags if BUILD_HFLOAT16=1. We also update the RISC-V dynamic detection code to disable the RISCV64_ZVL128B and RISCV64_ZVL256B kernels at runtime if we've built with DYNAMIC_ARCH=1 and BUILD_HFLOAT16=1 and are running on a device that does not support both Zfh and Zvfh. Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/5428	5 months ago
Martin Kroeker	06c09deee9	Merge pull request #5426 from hideaki-motoki/issue5417_axpy_sve Implementing SVE in `[SD]AXPY` Kernels for `A64FX` and `Graviton3E`	5 months ago
Martin Kroeker	da7d0f4a38	Merge pull request #5427 from yuanjia111/develop Optimize the gemv_t_vector.c kernel for RISCV64_ZVL256B target	5 months ago
yuanjia	c2cc7a3602	riscv64: optimize gemv_t_vector.c	5 months ago
h-motoki	e23f9c6642	Merge remote-tracking branch 'upstream/develop' into issue5417_axpy_sve	5 months ago
Martin Kroeker	b3f247ae5a	Merge pull request #5425 from martin-frbg/fixup5389 Increase L2 defaults for RISCV X280 / ZVL256B and ARM SVE targets in CMake cross-compilation	5 months ago
h-motoki	855945befb	Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E	5 months ago
Martin Kroeker	7c1839899e	Increase assumed L2 sizes for RISCV X280 / ZVL256B and for SVE-capable ARM64	5 months ago
Martin Kroeker	9c43301b6d	Merge pull request #5421 from reibax-marcus/develop fix: broken cblas installation when using makefile based builds	5 months ago
Martin Kroeker	9d6df1dd3e	Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking Add and use vectorized packing in ZVL128B and ZVL256B for RISCV	5 months ago
Martin Kroeker	f3b2a15fad	Merge pull request #5420 from yuanjia111/develop Move the value assignment of vector x in gemv_n_sve.c to the outermos…	5 months ago
Chip Kerchner	64401b4417	Disable vectorized packing for DGEMM - since it is slower than scalar.	5 months ago
Martin Kroeker	5e43ba948c	Merge pull request #5419 from Mousius/bgemm-optimisation Optimize SBGEMM / BGEMM for NEOVERSEV1 further	5 months ago
Chip Kerchner	c00afc86a6	Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions.	5 months ago
Xabier Marquiegui	3a6b79c50f	fix: broken cblas installation when using makefile based builds Fix cblas.h missing from target directory if NO_CBLAS is defined but has a value that indicates you do want cblas built and installed.	5 months ago
yuanjia	803e8d4838	Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 1. Verify correctness using BLAS-Tester. 2. Using the built-in benchmark to verify performance, the performance of the float and double types improved by about 60% and about 40% respectively. The test commands are: export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100 export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100	5 months ago
Chris Sidebottom	5f47b872f1	Remove older kernels for BGEMM on NEOVERSEV1	5 months ago
Chris Sidebottom	114316f361	Optimize SBGEMM / BGEMM for NEOVERSEV1 further This changes the kernels to pack full SVE vectors and reduces the overall complexity of the inner GEMM loop.	5 months ago
Martin Kroeker	75c6ab4036	CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411) * Update to 20.1.8 * fix PATH to avoid the obsolete LLVM19 that appeared in the preinstalled msvc folder hierarchy	5 months ago
Martin Kroeker	5c5f852ee3	Merge pull request #5415 from martin-frbg/Fixum-5399 Fix compilation of the NeoverseN2 SBGEMM kernel	6 months ago
Martin Kroeker	f1ee61ea30	Include NEON header for the bfloat conversion functions	6 months ago
Martin Kroeker	b3ffd5524a	Include NEON header for the bfloat conversion functions	6 months ago
Martin Kroeker	d23680b81d	Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1 Multi-thread Performance Improvement of GEMM on NeoverseV1 with DIVIDE_RATE=1	6 months ago
Martin Kroeker	b4cc4be2ce	Merge pull request #5410 from martin-frbg/issue5404 Adjust multithreading threshold in S/DGER and add an intermediate step	6 months ago
Martin Kroeker	0968dddf1a	Merge pull request #5409 from martin-frbg/issue5372 Work around gcc15.1 on POWER misoptimizing DGEMV at -O3	6 months ago
Martin Kroeker	eddfe1e6b3	Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings Fix bad vector zero initializer and other compiler warnings for RISC-V.	6 months ago
Martin Kroeker	30d11bc92c	Adjust multithreading threshold and add an intermediate step	6 months ago
Martin Kroeker	a3b9c933c5	mark xbuffer as volatile to work around gcc15.1 optimizer bug	6 months ago
Chip Kerchner	72f082f31d	Fix bad vector zero initializer and other compiler warnings for RISC-V.	6 months ago
Masato Nakagawa	7e29f11396	Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)	6 months ago
Martin Kroeker	9a64b32b44	Merge pull request #5406 from martin-frbg/fixbgemmtest Fix building of bgemm tests on GEMM3M-capable (x86) targets	6 months ago
Martin Kroeker	b66a01f909	Fix building of bgemm tests on GEMM3M-capable (x86) targets	6 months ago
Martin Kroeker	a5e7c0e3e0	Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 ARM64: Enable bfloat16 kernels by default	6 months ago
abhishek-fujitsu	6356190d06	fix gfortran link path in dynamic_arch.yml	6 months ago
abhishek-fujitsu	4c8dcb3a8f	Darwin/arm64: disable SVE/SME and fix gfortran link path	6 months ago
Martin Kroeker	33b50548eb	Merge pull request #5403 from martin-frbg/issue5402 Introduce a (crude) threshold to multithreading in STRMV/DTRMV	6 months ago
Martin Kroeker	c504aedca1	Merge pull request #5400 from Mousius/neoversev2-target Add NEOVERSEV2 target support	6 months ago
Martin Kroeker	b9e107932a	add NeoverseV2	6 months ago
Martin Kroeker	2f89a5970e	fix NeoverseV2 typo	6 months ago
Martin Kroeker	a9e8fa06bf	Introduce a (crude) threshold to multithreading	6 months ago
Martin Kroeker	b4c2b34a45	Merge pull request #5401 from martin-frbg/followup-5397 Include float-bfloat conversion functions in ONLY_CBLAS builds as well	6 months ago
Martin Kroeker	c9204f7b6f	Merge pull request #5399 from Mousius/bgemm-8x4 Add optimized BGEMM for NEOVERSEN2 target	6 months ago
Martin Kroeker	a55e65dba9	Merge pull request #5391 from martin-frbg/issue5387 Use OpenBLAS_ROOT_DIR in OpenBLASConfig.cmake generation only if set	6 months ago
abhishek-fujitsu	0bc79da587	add neon header	6 months ago
abhishek-fujitsu	720a4743b9	update contribution list	6 months ago
abhishek-fujitsu	05fc88180c	ARM64: Enable bfloat16 kernels by default	8 months ago
Martin Kroeker	965463f177	Include float-bfloat conversion functions in ONLY_CBLAS builds as well	6 months ago
Martin Kroeker	4272cf8c7f	Merge pull request #5398 from martin-frbg/fixup-5394 Update ?GEMM-to-?GEMV forwarding settings for CMake	6 months ago
Chris Sidebottom	87247daadc	Add NEOVERSEV2 target support Did a quick run around to make `TARGET=NEOVERSEV2` build successfully. Fixes #5385	6 months ago
Chris Sidebottom	ea2faf0c9a	Add optimized BGEMM for NEOVERSEN2 target This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.	6 months ago


9521 Commits (ce79fe12fdacdfd5d48c4a61a08f86aa6170eae9)