nihui/ncnn - ncnn - Open-Source Collaborative Cloud-Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	44e0d95c0d	x86 sse2/xop/avx/avx2/avx512/vnni/vnniint8 optimization for gemm int8 (#5763 ) * skip round problem * sde on ubuntu24	1 year ago
nihui	19caca3140	port rvv intrinsic 1.0+ (#5642 ) * zfh zvfh xtheadvector infra * dispatch for rvv and xtheadvector * dispatch for non-vector zfh * port xtheadvector recp rsqrt trunc * general rvv gemm * c906 and c910 ci * old tuple code clean * update riscv64 ci * update build doc * drop old th1520 toolchain	1 year ago
nihui	8d2ac57824	fix missing asimdfhm target macro in ndk-r21 (#5804 )	1 year ago
nihui	0734b657d9	spectrogram and inverse spectrogram (#5779 ) * only supports hann, hamming and all-one window * inverse spectrogram does not support length parameter * spectrogram always returns torch.view_as_real(out) as ncnn does not support complex typed mat yet * inverse spectrogram always accepts torch.view_as_complex(in) as ncnn does not support complex typed mat yet	1 year ago
nihui	9cefe9a624	avx vnni int8, avx vnni int16, avx ne convert infrastructure (#5749 )	1 year ago
nihui	c32442aa09	disable x86 auto recip optimization for potential precision loss (#5762 )	1 year ago
nihui	e7602a206b	fix gemm arm int8 scales descales offset (#5750 )	1 year ago
nihui	8fe62812c9	arm neon optimization for layernorm fp32/bf16s/fp16s (#5746 )	1 year ago
Upliner Mikhalych	cbd17cd062	Fix #5741 don't crash when vkCreateDevice fails (#5742 )	1 year ago
nihui	73d3519326	layernorm x86 optimization, re (#5745 )	1 year ago
nihui	bd1f39ed82	blacklist mesa vulkan cooperative matrix feature (#5739 ) ref https://gitlab.freedesktop.org/mesa/mesa/-/issues/10847	1 year ago
nihui	8105c75120	improve compatibility of harmonyos cpu topology abi (#5740 )	1 year ago
nihui	121b1fecd5	apply code-format changes	1 year ago
nihui	66b54cbea2	multiheadattention int8 quantization (#5733 ) * x86 vulkan fallback * comment about bf16s	1 year ago
nihui	1c7af00499	gemm int8 quantization (#5706 ) * quantize gemm * write gemm quantize scales * update doc * less openmp args * x86 riscv fallback * skip gemm vulkan int8 * fix noint8 test, fix arm bf16 test * enable vfpv4 on neon build only * fix gemm vulkan without C * fp16 pack8 output * enable elempack=8 only for asimdhp+ * tiled gemm int8 test * opt arm64 tiles, fix asimdhp dispatch	1 year ago
nihui	204583ba52	x86 sse2/avx/avx512 optimization for rmsnorm (#5672 )	1 year ago
nihui	8077d340a9	arm neon optimization for rmsnorm (#5668 )	1 year ago
nihui	5df5413c81	embed int8 quantization and add embed test (#5667 )	1 year ago
nihui	70310e951e	fix out of range read in convolution im2col aarch64 (#5631 )	1 year ago
nihui	fdf0df3079	RMSNorm (#5630 )	1 year ago
佰阅	391152f500	c_api support set_vulkan_device (#5610 )	2 years ago
quink	92e0b8253b	arm/convolution_3x3_pack1to8_fp16s: prefer ldr/str over ld1/st1 (#5603 ) Depending on the arch, ldr/str can be faster than ld1/st1, especially when loading into a single lane. For example, on Cortex A75: 1. the execution latency of 'ldr q0' and 'ldr h0' is 5; 2. the execution latency of 'ld1 {v0.16b}' is 6; 3. the execution latency of 'ld1 {v0.h}[0]' is 8. On Cortex X3: 1. the execution latency of 'ldr q0' and 'ldr h0' is 6; 2. the execution latency of 'ld1 {v0.16b}' is 6; 3. the execution latency of 'ld1 {v0.h}[0]' is 8. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2 years ago
nihui	997c8926d7	use ruapu detection only on windows arm, enable cpu powerinfo with mingw compiler (#5593 )	2 years ago
zhangyang2057	081a9c39c8	Fix tanh typo for rvv. (#5584 ) * Fix tanh typo for rvv. * Fix tanh for rvv fp16.	2 years ago
nihui	3752d71200	fix potential fp16s bf16s conflicts on arm vfpv4 (#5578 ) * fix potential fp16s bf16s conflicts on armv7 vfpv4 * but prefer fp16 on armv8.2	2 years ago
quink	a05c113f80	Add wasi support (#5534 ) * benchmark: Don't use std::thread when NCNN_THREADS is OFF * Add wasi support: `cmake -B build -DCMAKE_BUILD_TYPE=Release --toolchain ${wasi_sdk}/share/cmake/wasi-sdk.cmake -DNCNN_RUNTIME_CPU=OFF -DNCNN_DISABLE_EXCEPTION=ON -DNCNN_THREADS=OFF` then `cmake --build build`. After building, you can run benchncnn from the command line with wasmtime: `wasmtime --dir . benchncnn`	2 years ago
nihui	4c3debae2d	multiheadattention scale param (#5526 ) * update swiftshader * skip vs2017 swiftshader	2 years ago
Xyzhao	fbd6690d6c	fix: add NCNN_PLATFORM_API macro for VkAndroidHardwareBufferImageAllocator (#5521 )	2 years ago
nihui	8235cad999	mha allow qdim differs from embed_dim (#5519 ) * test mha oom	2 years ago
nihui	39c27de47b	test concat oom (#5502 )	2 years ago
nihui	093c516898	test slice oom (#5501 )	2 years ago
Wei Wu	bb54d575a0	Update ruapu.h to the latest version. (#5499 ) The updated ruapu adds support for multiple architectures such as RISC-V, MIPS, and Loongson, and can detect more Arm features. The latest version is `10b02b3755`.	2 years ago
nihui	da7d1a10f7	test x86 arm convolution oom (#5492 ) * skip mips loongarch riscv oom test atm * test softmax oom	2 years ago
nihui	102e98970f	fix unexpected abs error on powerpc vsx (#5498 )	2 years ago
nihui	19ea54f266	more x86 vnni optimization for lstm (#5496 ) * workaround vs2019 crash	2 years ago
nihui	debc33fee2	arm handle allocation failures (#5490 )	2 years ago
nihui	b4379630fb	x86 handle allocation failures (#5489 )	2 years ago
TianZer	b0de947b32	fix mingw64 avx crash and termux build issue (#5464 ) * Remove two potential warnings for Visual Studio * fix mingw64 avx crash * fix build issue in termux	2 years ago
Asd-g	bbf2e5d533	create_gpu_instance: do not perform destroy_gpu_instance() (#5437 ) When performing destroy_gpu_instance(), g_instance.created is always 0.	2 years ago
nihui	c4a007406d	windows clang ci (#5469 ) * windows clang ci * clang msvc use x86intrin.h for xop * test arm64 compiler features	2 years ago
nihui	08b7d99a75	rnn/lstm/gru dynamic quantization (#5435 )	2 years ago
Tabbleman	b8fefb977d	clear warning: unused variable while building on x86-wsl platform (#5444 )	2 years ago
quink	e31be492d5	c_api: Fix function prototypes with no argument (#5436 ) There is a big difference between C and C++: foo() in C means that the function takes an unspecified number of arguments, while foo(void) means the function takes no arguments. Fix -Wstrict-prototypes warning. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2 years ago
nihui	6cdd7110be	fix instruction extension dispatch (#5427 )	2 years ago
nihui	9ce7930413	x86 optimization for convolution tiled gemm (#5426 )	2 years ago
nihui	0fd25d6c70	fix arm riscv build with NCNN_BF16=OFF (#5422 )	2 years ago
nihui	db035d602d	update ncnnoptimize layers, lightmode=false keeps original weight (#5414 )	2 years ago
nihui	056509a034	fix create_pipeline crash in vulkan-enabled layer without calling load_param/load_model first (#5410 )	2 years ago
張小凡	3b048d1923	destroy_gpu_instance() now waits for all devices to be idle before destruction (#4763 ) * destroy_gpu_instance() will internally ensure that all vulkan devices are idle before proceeding with destruction.	2 years ago
nihui	69640594f7	unified macos ios ci, drop 32bit support, drop ios arm64e, default to ios 13 (#5403 )	2 years ago


1857 Commits (9f67ff1e9e4159854384002cf3de032df4351da0)
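The c_api commit (#5436) notes that `foo()` and `foo(void)` mean different things in C. Here is a minimal sketch of the prototype style the commit moves to, using a hypothetical function `default_num_threads` that is not part of ncnn's C API. Since C23, an empty parameter list also means "no parameters", so the distinction matters for C17 and earlier.

```c
#include <assert.h>

/* Hypothetical function for illustration; not part of ncnn's C API. */

/* Before C23, a declaration written as
       int default_num_threads();
   declares the function WITHOUT a prototype: its parameters are
   unspecified, so a compiler is not required to diagnose a call such as
   default_num_threads(1, 2) when only that declaration is visible, and
   -Wstrict-prototypes warns about the declaration itself. */

/* Writing (void) declares a prototype with no parameters, so passing
   any argument becomes a compile-time error. */
int default_num_threads(void);

int default_num_threads(void)
{
    return 4; /* arbitrary value for the sketch */
}
```

In C++ the two spellings are equivalent, which is why the difference is easy to miss in a C API header that is also consumed from C++.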