OpenBLAS

Commit Graph

Author	SHA1	Message	Date
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B. Added HFLOAT16 support for RISCV64. Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16. The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0. Related to issue #5279. Co-authored-by: Linjin Li <linjin_li@163.com>	1 year ago
Srangrang	2996c25c94	add shgemm for RISCV_ZVL128B	1 year ago
Martin Kroeker	0b0bb9951d	Merge pull request #5265 from guoyuanplct/develop kernel/riscv64:Added support for omatcopy on RISCV64_ZVL256B	1 year ago
guoyuanplct	be9f7550b5	Format Code	1 year ago
guoyuanplct	4d213653d8	kernel/riscv64:Added support for omatcopy on riscv64.	1 year ago
Martin Kroeker	8afddc1a81	Merge pull request #5262 from guoyuanplct/develop kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:	1 year ago
guoyuanplct	9a7e3f102b	kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:	1 year ago
pengxu	a978ad3180	Loongarch64: add C functions of zgemm_ncopy_16	1 year ago
pengxu	0ccb050583	Loongarch64: fixed cgemm_ncopy_16_lasx	1 year ago
Martin Kroeker	5141a90993	Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222): fix ARMV9SME target and add support_sme1 code for MacOS; make sgemm_direct unconditionally available on all arm64; build a (dummy) sgemm_direct kernel on all arm64; update dynamic_arm64.c	1 year ago
Martin Kroeker	151b74284e	Merge pull request #5203 from quic/fix-sgemmdirect-sme1 Add vector registers to clobber list to prevent compiler optimization.	1 year ago
Martin Kroeker	cba32d001a	Merge pull request #5245 from guoyuanplct/develop Optimized RVV_ZVL256B Implementation of zgemv_n	1 year ago
pengxu	f19e72c402	Loongarch64: fixed swap_lasx	1 year ago
pengxu	b471fa337b	Loongarch64: fixed snrm2_lasx	1 year ago
pengxu	57bb46bedf	Loongarch64: fixed rot_lasx	1 year ago
pengxu	6dc4ca2391	Loongarch64: fixed icamax_lasx	1 year ago
pengxu	b528b1b8ea	Loongarch64: fixed iamax_lasx	1 year ago
pengxu	ba9569e382	Loongarch64: fixed dot_lasx	1 year ago
pengxu	dc5fa29851	Loongarch64: fixed cscal_lasx	1 year ago
pengxu	a98dd6d911	Loongarch64: fixed copy_lasx	1 year ago
pengxu	d49319c2d2	Loongarch64: fixed cnrm2_lasx	1 year ago
pengxu	74c97ef814	Loongarch64: fixed cdot_lasx	1 year ago
pengxu	be525521ad	Loongarch64: fixed asum_lasx	1 year ago
pengxu	0cd5ca5527	Loongarch64: fixed amax_lasx	1 year ago
guoyuanplct	11ffc8680e	Format the code	1 year ago
guoyuanplct	7616c42095	Optimized RVV_ZVL256B Implementation of zgemv_n. The implementation of zgemv_n using RVV_ZVL256B has been optimized. Compared to the previous implementation, it has achieved a 1.5x performance improvement.	1 year ago
abhishek-fujitsu	9c02cdb073	optimise dot using thread throttling for NEOVERSE V1	1 year ago
Martin Kroeker	d0e8fd6d40	Merge pull request #5239 from annop-w/gemv_n_sve Use SVE kernel for S/DGEMVN for SVE machines	1 year ago
Iha, Taisei	08b5c18d70	fixed a potential out-of-bounds on gemv.	1 year ago
Annop Wongwathanarat	e11744a411	Use SVE kernel for S/DGEMVN for SVE machines	1 year ago
Martin Kroeker	db0abfa907	Merge pull request #5238 from martin-frbg/revert5125 remove non-vectorized SGEMV transpose reduce path for POWER8, restoring optimizations from PR4880	1 year ago
Martin Kroeker	7389b6c483	Merge pull request #5237 from martin-frbg/revert5219 Fix and reinstate the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV	1 year ago
Martin Kroeker	4ec62d7f73	remove non-vectorized code path for power8, restoring PR4880	1 year ago
Martin Kroeker	1df8738f27	Merge pull request #5235 from quickwritereader/issue_unaligned_ppc64le Explicit unaligned vector load/stores in PPC64LE GEMV kernels	1 year ago
Martin Kroeker	99d9f1ff38	Fix conditional	1 year ago
Martin Kroeker	96d80801bc	Reinstate the CooperLake microkernel	1 year ago
Martin Kroeker	2e4309315c	Merge pull request #5219 from martin-frbg/sbgemvn_cooper Temporarily disable the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV	1 year ago
Ubuntu	0cc2485594	Explicit unaligned vector load/stores in PPC64LE GEMV kernels	1 year ago
Martin Kroeker	dd38b4e811	Merge pull request #5225 from annop-w/gemv_n Improve performance for SGEMVN on NEOVERSEN1	1 year ago
Martin Kroeker	0241d516f6	Merge pull request #5220 from iha-taisei/sdgemv_n_unroll Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.	1 year ago
Annop Wongwathanarat	d535728803	Improve performance for SGEMVN on NEOVERSEN1	1 year ago
Usui, Tetsuzo	d711906e3e	Add symv kernels for arm64	1 year ago
Iha, Taisei	f1e628b889	Further performance improvements to [SD]GEMV.	1 year ago
Martin Kroeker	211dfd0754	disable the CooperLake microkernel as it produces wrong results	1 year ago
Martin Kroeker	b30dc9701f	Merge pull request #5215 from annop-w/gemv_t Use SVE kernel for S/DGEMVT for SVE machines	1 year ago
Martin Kroeker	2893d0add4	Merge pull request #5211 from guoyuanplct/develop Optimizing the Implementation of GEMV on the RISC-V V Extension	1 year ago
Annop Wongwathanarat	ec146157d3	Use SVE kernel for S/DGEMVT for SVE machines	1 year ago
Martin Kroeker	70865a894e	Merge pull request #5180 from ywwry66/openmp_use_cmake CMake: Pass `OpenMP` compiler and linker flags through CMake targets	1 year ago
lglglglgy	1ff303f36e	Optimizing the Implementation of GEMV on the RISC-V V Extension. Specialized some scenarios, performed loop unrolling, and reduced the number of multiplications.	1 year ago
ColumbusAI	7bf848454d	Update zsum.c -- fixed a spelling error so that the file compiles: zsum_kernel is used where it should be zasum_kernel. Will not compile without the fix.	1 year ago


2482 Commits (670ec6f7576ecc74fff96be7c00ec8fffed8647b)