nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	6987efd950	fix scale avx512 (#4580 )	3 years ago
nihui	dabc4c065f	arm convolution winograd unified elempack (#4556 ) * update f43 coeffs * arm convolution winograd unified elempack * disable bf16s test atm * test gnu inline asm off	3 years ago
WuJinxuan	ff80ac2955	[ARM] Multiheadattention (#4463 )	3 years ago
nihui	d0c2738043	update riscv winograd f43 coeffs and fix some warnings (#4537 ) * update winograd f43 coeffs * rvv tanh rework * fix warnings * rebuild qemu	3 years ago
WuJinxuan	6572da3533	[x86] GroupNorm (#4471 ) Co-authored-by: EdVince <EdVince@users.noreply.github.com>	3 years ago
nihui	1832da8292	concat 4d (#4528 )	3 years ago
nihui	fb9cf7982d	eltwise 4d (#4529 )	3 years ago
nihui	32e2de015e	slice 4d (#4525 )	3 years ago
nihui	fc6ce4a641	copyto operator (#4522 )	3 years ago
nihui	242e775d21	pnnx convert torch log10, pow 2 as square (#4518 )	3 years ago
nihui	246e71c526	implement atan2 (#4516 )	3 years ago
Fangjun Kuang	92e75105c9	Support torch.cumsum (#4505 )	3 years ago
nihui	ab4cfbf5b0	enrich ncnn binary broadcast rules (#4513 )	3 years ago
nihui	dfbcd3e69b	improve vulkan winograd f43 fp16 numerical stability (#4492 )	3 years ago
nihui	fed99fd35b	gemm output transpose, prepack c (#4479 ) * mha is now permute and reshape free * gemm user defined tile mnk param	3 years ago
nihui	2e3e680d77	x86 optimization for packed convolution unified elempack (#4469 )	3 years ago
nihui	88274827da	x86 optimization for winograd unified elempack (#4456 )	3 years ago
nihui	15761fc1a6	arm vfpv4 asimdhp asimdfhm optimization for gemm (#4432 )	3 years ago
nihui	c5640a16c3	gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only (#4407 ) * gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only * relax mnk multiple restrictions * make square tiles in each thread * sanitize num_threads changes	3 years ago
nihui	fd1ac3c7a0	x86 optimization for gemm unified elempack (#4387 )	3 years ago
nihui	0736c5b658	Fix c api allocator (#4360 ) * add some c_api interfaces related to allocator setup. * fix errors in allocator parameters in c_api. * test c api allocator Co-authored-by: zhangtongshe <yuyuyezi@vip.qq.com>	3 years ago
nihui	057b5bb515	split tests (#4354 )	3 years ago
nihui	eceac35a7f	implement MultiheadAttention kdim vdim (#4347 )	3 years ago
nihui	498ca7341b	squeeze and expanddims 4d (#4346 )	3 years ago
Lry89757	6a47f8d15c	gridsample op support (#4288 ) Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com> Co-authored-by: nihuini <nihuini@tencent.com> Co-authored-by: nihui <shuizhuyuanluo@126.com>	3 years ago
nihui	5b28c1730e	implement ncnn fold and unfold (#4326 )	3 years ago
nihui	6e49fa30dc	groupnorm 1d/2d/4d (#4312 )	3 years ago
Fangjun Kuang	5281d51535	implement GLU and pnnx conversion (#4283 )	3 years ago
nihui	0b591b0d1f	implement layer feature disabled bit (#4278 )	3 years ago
miemie2013	b13c2a16ce	Optimize x86 DeformableConv2D (#4128 )	3 years ago
nihui	77eda4c19f	implement lstm proj_size (#4263 )	3 years ago
Lry89757	5eb56b2ea5	[Gelu x86] Finish intrinsic with elempack merged(fast version) (#4144 ) * Finish the gelu x86 intrinsics * Finish the fast tanh x86 simd impl	3 years ago
Lry89757	9f59711338	[Prelu x86] Finish intrinsic with elempack merged (#4177 )	3 years ago
Lry89757	9278f90114	[Elu x86] Finish intrinsic with elempack merged (#4153 )	3 years ago
LinHe	03f2ad38ce	Layer Norm x86 SIMD Optimizations (#4065 ) Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com> Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com> Co-authored-by: nihuini <nihuini@tencent.com>	3 years ago
miemie2013	720f3c9aab	Add DeformableConv2D (#4070 ) * Add DeformableConv2D * add unittest and docs * pnnx torchvision deformconv2d conversion Co-authored-by: miemie2013 <miemie2013@users.noreply.github.com> Co-authored-by: nihui <shuizhuyuanluo@126.com>	3 years ago
Lry89757	13a9533984	[BatchNorm Optimize x86] AVX512 intrinsic (#4061 ) * Add the test samples for elempack==16 * Add the AVX512 Support for batchnorm	3 years ago
nihui	76849cede4	armv8.4 i8mm optimization for convolution gemm int8 (#4034 )	3 years ago
nihui	dd86cebab8	armv8.6 ci and coverage (#4025 ) * asimdfhm in fc * move neon bf16 conversion function to arm_usability header * fix cmake option * fix build with newer gcc * arm84 coverage * arm asimdfhm optimization for innerproduct gemm fp16s	3 years ago
nihui	706831f8a9	arm vfpv4 optimization for innerproduct (#3950 )	3 years ago
nihui	440bfdd2cc	x86 f16c optimization for innerproduct (#3944 )	3 years ago
nihui	067e8e1d92	mips unified elempack for elementwise layers (#3928 )	4 years ago
nihui	1377acf945	avx512 bf16 fp16 infrastructure (#3926 )	4 years ago
nihui	20a14bf5ae	arm convolution winograd dot function, adjust arm convolution winograd strategy (#3915 )	4 years ago
nihui	ca0ba4b25f	fine grained winograd options, adjust x86 convolution winograd strategy (#3908 ) * fine grained winograd options * x86 optimization for convolution winograd f23 pack4/pack8/pack16 * fix avx512 and t4 ci * fix fast direct conv path * winograd63 is actually slower than winograd43 on very large channel	4 years ago
Evgeny Proydakov	184c479b64	Added a simple unittest for Power layer. (#3893 )	4 years ago
nihui	241524ffce	discard weight memory for x86 arm vulkan (#3865 ) * discard weight memory for x86 and vulkan * drop arm innerproduct weight * drop arm convolution weight * drop arm convolutiondepthwise weight * drop x86 vulkan deconvolution deconvolutiondepthwise weight * drop arm deconvolution deconvolutiondepthwise weight * arm neon assembly optimization for innerproduct pack4	4 years ago
tpoisonooo	6fd801b6d7	feat(src/layer): add vision_transformer benchmark (#3730 ) * feat(src/layer): add vision_transformer benchmark and relative layer * refactor(testutil.h): add para for RandomMat	4 years ago
NaLan ZeYu	5388f9f312	test: fix printf arguments mismatch (#3774 )	4 years ago
nihui	f9c1787de9	implement einsum layer and pnnx conversion (#3768 )	4 years ago


289 Commits (a961ab992e2e4bf1cb950423bd7c2e2d40eb4ea2)