OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Srangrang	ec14e1648c	fix: resolve non-RISCV host build failed issue - adjust interface to disable "small matrix" pathway - separate HFLOAT16 from BFLOAT16 - remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions Related to PR#5290 Co-authored-by Martin	7 months ago
Srangrang	fb89820f20	Merge branch 'develop' of https://github.com/Srangrang/OpenBLAS into develop	7 months ago
Srangrang	4e1a381e5b	fix: resolve the compilation failure without zfh instruction - modify the macro conditions in Makefile.system - Delete development test code Related to issue#5279	7 months ago
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B Added HFLOAT16 support for RISCV64 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16 The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0 Related to issue #5279 Co-authored-by Linjin Li <linjin_li@163.com>	7 months ago
guoyuanplct	d2003dc886	del lines	8 months ago
guoyuanplct	45fd2d9b07	Optimized the axpby function.	8 months ago
Srangrang	2996c25c94	add shgemm for RISCV_ZVL128B	8 months ago
Martin Kroeker	0b0bb9951d	Merge pull request #5265 from guoyuanplct/develop kernel/riscv64:Added support for omatcopy on RISCV64_ZVL256B	8 months ago
guoyuanplct	be9f7550b5	Format Code	8 months ago
guoyuanplct	4d213653d8	kernel/riscv64:Added support for omatcopy on riscv64.	8 months ago
Martin Kroeker	8afddc1a81	Merge pull request #5262 from guoyuanplct/develop kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:	8 months ago
guoyuanplct	9a7e3f102b	kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:	8 months ago
pengxu	a978ad3180	Loongarch64: add C functions of zgemm_ncopy_16	8 months ago
pengxu	0ccb050583	Loongarch64: fixed cgemm_ncopy_16_lasx	8 months ago
Martin Kroeker	5141a90993	Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222 ) * Fix ARMV9SME target and add support_sme1 code for MacOS * make sgemm_direct unconditionally available on all arm64 * build a (dummy) sgemm_direct kernel on all arm64 * Update dynamic_arm64.c	8 months ago
Martin Kroeker	151b74284e	Merge pull request #5203 from quic/fix-sgemmdirect-sme1 Add vector registers to clobber list to prevent compiler optimization.	8 months ago
Martin Kroeker	cba32d001a	Merge pull request #5245 from guoyuanplct/develop Optimized RVV_ZVL256B Implementation of zgemv_n	9 months ago
pengxu	f19e72c402	Loongarch64: fixed swap_lasx	9 months ago
pengxu	b471fa337b	Loongarch64: fixed snrm2_lasx	9 months ago
pengxu	57bb46bedf	Loongarch64: fixed rot_lasx	9 months ago
pengxu	6dc4ca2391	Loongarch64: fixed icamax_lasx	9 months ago
pengxu	b528b1b8ea	Loongarch64: fixed iamax_lasx	9 months ago
pengxu	ba9569e382	Loongarch64: fixed dot_lasx	9 months ago
pengxu	dc5fa29851	Loongarch64: fixed cscal_lasx	9 months ago
pengxu	a98dd6d911	Loongarch64: fixed copy_lasx	9 months ago
pengxu	d49319c2d2	Loongarch64: fixed cnrm2_lasx	9 months ago
pengxu	74c97ef814	Loongarch64: fixed cdot_lasx	9 months ago
pengxu	be525521ad	Loongarch64: fixed asum_lasx	9 months ago
pengxu	0cd5ca5527	Loongarch64: fixed amax_lasx	9 months ago
guoyuanplct	11ffc8680e	Format the code	9 months ago
guoyuanplct	7616c42095	Optimized RVV_ZVL256B Implementation of zgemv_n The implementation of zgemv_n using RVV_ZVL256B has been optimized. Compared to the previous implementation, it has achieved a 1.5x performance improvement.	9 months ago
abhishek-fujitsu	9c02cdb073	optimise dot using thread throttling for NEOVERSE V1	10 months ago
Martin Kroeker	d0e8fd6d40	Merge pull request #5239 from annop-w/gemv_n_sve Use SVE kernel for S/DGEMVN for SVE machines	9 months ago
Iha, Taisei	08b5c18d70	fixed a potential out-of-bounds on gemv.	9 months ago
Annop Wongwathanarat	e11744a411	Use SVE kernel for S/DGEMVN for SVE machines	9 months ago
Martin Kroeker	db0abfa907	Merge pull request #5238 from martin-frbg/revert5125 remove non-vectorized SGEMV transpose reduce path for POWER8, restoring optimizations from PR4880	9 months ago
Martin Kroeker	7389b6c483	Merge pull request #5237 from martin-frbg/revert5219 Fix and reinstate the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV	9 months ago
Martin Kroeker	4ec62d7f73	remove non-vectorized code path for power8, restoring PR4880	9 months ago
Martin Kroeker	1df8738f27	Merge pull request #5235 from quickwritereader/issue_unaligned_ppc64le Explicit unaligned vector load/stores in PPC64LE GEMV kernels	9 months ago
Martin Kroeker	99d9f1ff38	Fix conditional	9 months ago
Martin Kroeker	96d80801bc	Reinstate the CooperLake microkernel	9 months ago
Martin Kroeker	2e4309315c	Merge pull request #5219 from martin-frbg/sbgemvn_cooper Temporarily disable the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV	9 months ago
Ubuntu	0cc2485594	Explicit unaligned vector load/stores in PPC64LE GEMV kernels	9 months ago
Martin Kroeker	dd38b4e811	Merge pull request #5225 from annop-w/gemv_n Improve performance for SGEMVN on NEONVERSEN1	9 months ago
Martin Kroeker	0241d516f6	Merge pull request #5220 from iha-taisei/sdgemv_n_unroll Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.	9 months ago
Annop Wongwathanarat	d535728803	Improve performance for SGEMVN on NEONVERSEN1	9 months ago
Usui, Tetsuzo	d711906e3e	Add symv kernels for arm64	9 months ago
Iha, Taisei	f1e628b889	Further performance improvements to [SD]GEMV.	9 months ago
Martin Kroeker	211dfd0754	disable the CooperLake microkernel as it produces wrong results	9 months ago
Martin Kroeker	b30dc9701f	Merge pull request #5215 from annop-w/gemv_t Use SVE kernel for S/DGEMVT for SVE machines	9 months ago

2487 Commits (ec14e1648cff986c3f4b5852ea94b8a1bec1b2ee)