OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	cba32d001a	Merge pull request #5245 from guoyuanplct/develop Optimized RVV_ZVL256B Implementation of zgemv_n	1 year ago
pengxu	f19e72c402	Loongarch64: fixed swap_lasx	1 year ago
pengxu	b471fa337b	Loongarch64: fixed snrm2_lasx	1 year ago
pengxu	57bb46bedf	Loongarch64: fixed rot_lasx	1 year ago
pengxu	6dc4ca2391	Loongarch64: fixed icamax_lasx	1 year ago
pengxu	b528b1b8ea	Loongarch64: fixed iamax_lasx	1 year ago
pengxu	ba9569e382	Loongarch64: fixed dot_lasx	1 year ago
pengxu	dc5fa29851	Loongarch64: fixed cscal_lasx	1 year ago
pengxu	a98dd6d911	Loongarch64: fixed copy_lasx	1 year ago
pengxu	d49319c2d2	Loongarch64: fixed cnrm2_lasx	1 year ago
pengxu	74c97ef814	Loongarch64: fixed cdot_lasx	1 year ago
pengxu	be525521ad	Loongarch64: fixed asum_lasx	1 year ago
pengxu	0cd5ca5527	Loongarch64: fixed amax_lasx	1 year ago
guoyuanplct	11ffc8680e	Format the code	1 year ago
guoyuanplct	7616c42095	Optimized RVV_ZVL256B Implementation of zgemv_n The implementation of zgemv_n using RVV_ZVL256B has been optimized. Compared to the previous implementation, it has achieved a 1.5x performance improvement.	1 year ago
abhishek-fujitsu	9c02cdb073	optimise dot using thread throttling for NEOVERSE V1	1 year ago
Martin Kroeker	d0e8fd6d40	Merge pull request #5239 from annop-w/gemv_n_sve Use SVE kernel for S/DGEMVN for SVE machines	1 year ago
Iha, Taisei	08b5c18d70	fixed a potential out-of-bounds on gemv.	1 year ago
Annop Wongwathanarat	e11744a411	Use SVE kernel for S/DGEMVN for SVE machines	1 year ago
Martin Kroeker	db0abfa907	Merge pull request #5238 from martin-frbg/revert5125 remove non-vectorized SGEMV transpose reduce path for POWER8, restoring optimizations from PR4880	1 year ago
Martin Kroeker	7389b6c483	Merge pull request #5237 from martin-frbg/revert5219 Fix and reinstate the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV	1 year ago
Martin Kroeker	4ec62d7f73	remove non-vectorized code path for power8, restoring PR4880	1 year ago
Martin Kroeker	1df8738f27	Merge pull request #5235 from quickwritereader/issue_unaligned_ppc64le Explicit unaligned vector load/stores in PPC64LE GEMV kernels	1 year ago
Martin Kroeker	99d9f1ff38	Fix conditional	1 year ago
Martin Kroeker	96d80801bc	Reinstate the CooperLake microkernel	1 year ago
Martin Kroeker	2e4309315c	Merge pull request #5219 from martin-frbg/sbgemvn_cooper Temporarily disable the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV	1 year ago
Ubuntu	0cc2485594	Explicit unaligned vector load/stores in PPC64LE GEMV kernels	1 year ago
Martin Kroeker	dd38b4e811	Merge pull request #5225 from annop-w/gemv_n Improve performance for SGEMVN on NEOVERSEN1	1 year ago
Martin Kroeker	0241d516f6	Merge pull request #5220 from iha-taisei/sdgemv_n_unroll Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.	1 year ago
Annop Wongwathanarat	d535728803	Improve performance for SGEMVN on NEOVERSEN1	1 year ago
Usui, Tetsuzo	d711906e3e	Add symv kernels for arm64	1 year ago
Iha, Taisei	f1e628b889	Further performance improvements to [SD]GEMV.	1 year ago
Martin Kroeker	211dfd0754	disable the CooperLake microkernel as it produces wrong results	1 year ago
Martin Kroeker	b30dc9701f	Merge pull request #5215 from annop-w/gemv_t Use SVE kernel for S/DGEMVT for SVE machines	1 year ago
Martin Kroeker	2893d0add4	Merge pull request #5211 from guoyuanplct/develop Optimizing the Implementation of GEMV on the RISC-V V Extension	1 year ago
Annop Wongwathanarat	ec146157d3	Use SVE kernel for S/DGEMVT for SVE machines	1 year ago
Martin Kroeker	70865a894e	Merge pull request #5180 from ywwry66/openmp_use_cmake CMake: Pass `OpenMP` compiler and linker flags through CMake targets	1 year ago
lglglglgy	1ff303f36e	Optimizing the Implementation of GEMV on the RISC-V V Extension Specialized some scenarios, performed loop unrolling, and reduced the number of multiplications.	1 year ago
ColumbusAI	7bf848454d	Update zsum.c -- fixed spelling error to successfully compile spelling error where zsum_kernel is used and it should be zasum_kernel. Will not compile without fix.	1 year ago
Egbert Eich	ea6515c4b3	On zarch don't produce objects from assembler with a writable stack section On z-series, the current version of the GNU toolchain produces warnings such as: ``` /usr/lib64/gcc/[...]/s390x-suse-linux/bin/ld: warning: ztrmm_kernel_RC_Z14.o: missing .note.GNU-stack section implies executable stack /usr/lib64/[...]/s390x-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker ``` To prevent this message and make sure we are future proof, add ``` .section .note.GNU-stack,"",@progbits ``` Also add the `.size` bit to give the asm defined functions a proper size in the symbol table. Signed-off-by: Egbert Eich <eich@suse.com>	1 year ago
Ruiyang Wu	02fd1df10b	CMake: Pass `OpenMP` compiler and linker flags through CMake targets Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than passing the compiler and linker flags manually. Furthermore, it allows the user to customize those flags by setting `OpenMP_LANG_FLAGS`, `OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.	1 year ago
Ye Tao	f27ba5efd1	fix bugs in aarch64 sbgemv_n kernel	1 year ago
Annop Wongwathanarat	edef2e4441	Fix bug in ARM64 sbgemv_t	1 year ago
Martin Kroeker	b55ca71d5b	Merge pull request #5182 from annop-w/sgemm_ncopy Optimize aarch64 sgemm_ncopy	1 year ago
Martin Kroeker	2f778554b8	Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16 replace customize bf16_to_fp32 with arm neon vcvtah_f32_bf16	1 year ago
Annop Wongwathanarat	9807f56580	Optimize aarch64 sgemm_ncopy	1 year ago
Martin Kroeker	a3e7b16072	Merge pull request #5157 from manaalmj/feature Optimize gemv_n_sve kernel	1 year ago
Ye Tao	4c00099ed6	replace customize bf16_to_fp32 with arm neon vcvtah_f32_bf16	1 year ago
Annop Wongwathanarat	a085b6c9ec	Fix aarch64 sbgemv_t compilation error for GCC < 13	1 year ago
manjam01	5c4e38ab17	Optimize gemv_n_sve kernel	1 year ago


2470 Commits (3c878f3e706e0b718dfc097dd86f511754cbcd65)