nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	c5640a16c3	gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only (#4407 ) * gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only * relax mnk multiple restrictions * make square tiles in each thread * sanitize num_threads changes	3 years ago
nihui	fd1ac3c7a0	x86 optimization for gemm unified elempack (#4387 )	3 years ago
nihui	0736c5b658	Fix c api allocator (#4360 ) * add some c_api interfaces related to allocator setup. * fix errors in allocator parameters in c_api. * test c api allocator Co-authored-by: zhangtongshe <yuyuyezi@vip.qq.com>	3 years ago
nihui	057b5bb515	split tests (#4354 )	3 years ago
nihui	eceac35a7f	implement MultiheadAttention kdim vdim (#4347 )	3 years ago
nihui	498ca7341b	squeeze and expanddims 4d (#4346 )	3 years ago
Lry89757	6a47f8d15c	gridsample op support (#4288 ) Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com> Co-authored-by: nihuini <nihuini@tencent.com> Co-authored-by: nihui <shuizhuyuanluo@126.com>	3 years ago
nihui	5b28c1730e	implement ncnn fold and unfold (#4326 )	3 years ago
nihui	6e49fa30dc	groupnorm 1d/2d/4d (#4312 )	3 years ago
Fangjun Kuang	5281d51535	implement GLU and pnnx conversion (#4283 )	3 years ago
nihui	0b591b0d1f	implement layer feature disabled bit (#4278 )	3 years ago
miemie2013	b13c2a16ce	Optimize x86 DeformableConv2D (#4128 )	3 years ago
nihui	77eda4c19f	implement lstm proj_size (#4263 )	3 years ago
Lry89757	5eb56b2ea5	[Gelu x86] Finish intrinsic with elempack merged(fast version) (#4144 ) * Finish the gelu x86 intrinsics * Finish the fast tanh x86 simd impl	3 years ago
Lry89757	9f59711338	[Prelu x86] Finish intrinsic with elempack merged (#4177 )	3 years ago
Lry89757	9278f90114	[Elu x86] Finish intrinsic with elempack merged (#4153 )	3 years ago
LinHe	03f2ad38ce	Layer Norm x86 SIMD Optimizations (#4065 ) Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com> Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com> Co-authored-by: nihuini <nihuini@tencent.com>	3 years ago
miemie2013	720f3c9aab	Add DeformableConv2D (#4070 ) * Add DeformableConv2D * add unittest and docs * pnnx torchvision deformconv2d conversion Co-authored-by: miemie2013 <miemie2013@users.noreply.github.com> Co-authored-by: nihui <shuizhuyuanluo@126.com>	3 years ago
Lry89757	13a9533984	[BatchNorm Optimize x86] AVX512 intrinsic (#4061 ) * Add the test samples for elempack==16 * Add the AVX512 Support for batchnorm	3 years ago
nihui	76849cede4	armv8.4 i8mm optimization for convolution gemm int8 (#4034 )	3 years ago
nihui	dd86cebab8	armv8.6 ci and coverage (#4025 ) * asimdfhm in fc * move neon bf16 conversion function to arm_usability header * fix cmake option * fix build with newer gcc * arm84 coverage * arm asimdfhm optimization for innerproduct gemm fp16s	3 years ago
nihui	706831f8a9	arm vfpv4 optimization for innerproduct (#3950 )	4 years ago
nihui	440bfdd2cc	x86 f16c optimization for innerproduct (#3944 )	4 years ago
nihui	067e8e1d92	mips unified elempack for elementwise layers (#3928 )	4 years ago
nihui	1377acf945	avx512 bf16 fp16 infrastructure (#3926 )	4 years ago
nihui	20a14bf5ae	arm convolution winograd dot function, adjust arm convolution winograd strategy (#3915 )	4 years ago
nihui	ca0ba4b25f	fine grained winograd options, adjust x86 convolution winograd strategy (#3908 ) * fine grained winograd options * x86 optimization for convolution winograd f23 pack4/pack8/pack16 * fix avx512 and t4 ci * fix fast direct conv path * winograd63 is actually slower than winograd43 on very large channel	4 years ago
Evgeny Proydakov	184c479b64	Added a simple unittest for Power layer. (#3893 )	4 years ago
nihui	241524ffce	discard weight memory for x86 arm vulkan (#3865 ) * discard weight memory for x86 and vulkan * drop arm innerproduct weight * drop arm convolution weight * drop arm convolutiondepthwise weight * drop x86 vulkan deconvolution deconvolutiondepthwise weight * drop arm deconvolution deconvolutiondepthwise weight * arm neon assembly optimization for innerproduct pack4	4 years ago
tpoisonooo	6fd801b6d7	feat(src/layer): add vision_transformer benchmark (#3730 ) * feat(src/layer): add vision_transformer benchmark and relative layer * refactor(testutil.h): add para for RandomMat	4 years ago
NaLan ZeYu	5388f9f312	test: fix printf arguments mismatch (#3774 )	4 years ago
nihui	f9c1787de9	implement einsum layer and pnnx conversion (#3768 )	4 years ago
nihui	ee6402553c	layernorm for vector and mat along w, pnnx convnext end2end test (#3764 )	4 years ago
jasonZhang	e62d674e5d	Add unittest and SSE&AVX optimized for BNLL (#3759 )	4 years ago
nihui	308965b7e9	sanitize cooperative matrix option in tests	4 years ago
nihui	0ea327b557	x86 sse/avx/avx512 optimization for softmax (#3712 )	4 years ago
nihui	131f3d1323	x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694 ) * convolution winograd pack16to1 * x86 deconvolution and deconvolutiondepthwise * simpleomp allow 32 arguments * drop shadow variable workaround * less winograd test error	4 years ago
nihui	dadc640c66	x86 avx512 optimization (#3581 ) * unified relu avx512 * unifed clip avx512 * unaryop avx512 * sigmoid avx512 * binaryop avx512 * padding convolution avx512 * convolutiondepthwise avx512 * innerproduct avx512 * reshape avx512 * slice avx512 * hardsigmoid hardswish avx512 * swish avx512 * pooling avx512 * crop avx512 * convolution sgemm pack16 * convolution 3x3 winograd pack16 * interp avx512 * convolution sgemm pack1to16 * convolution sgemm pack16to8 * convolution sgemm pack8to16 * convolution sgemm pack16to4 * fix vulkan permute pack8 * fix vulkan convolution gemm pack8to1	4 years ago
nihui	559e5b23f9	vulkan tensorcore optimization (#3628 ) * query and enable cooperative matrix * fix build with old vulkan sdk * implement cooperative matrix optimization * add nvidia-t4 coverage * adjust test option for more coverage	4 years ago
nihui	002c07d4ec	mix vulkan winograd f23 and f43 (#3639 ) * mix vulkan winograd f23 and f43 * larget epsilon for winograd optimization test	4 years ago
nihui	d42e048b56	pnnx convert torch.addmm (#3634 )	4 years ago
nihui	6e19ab26ba	massive vulkan optimization (#3602 ) * vulkan deconvolution sgemm col2im * vulkan convolution winograd43 * improve fp16s numeric stablity * vulkan convolution im2col sgemm * check squeezenet top2, as top3 vs top4 score too close..	4 years ago
nihui	2880eff264	deconv1d deconv3d (#3584 ) * fix sigmoid returns nan with very large input	4 years ago
nihui	920aa79f04	drop x86 avx2 fp16 (#3568 )	4 years ago
nihui	d452eca28f	convert torch.matmul, eliminate noop pad and identity op, fuse transpose matmul, fuse select to unbind (#3554 )	4 years ago
Yuzhong Yan	681141ff42	[YZ] Fix bug in unit test (#3556 )	4 years ago
nihui	33e225f173	fix c api test	4 years ago
nihui	c5d7f963b9	layer tile (#3491 )	4 years ago
Xiaohan Liu	3daabd515d	add missing doffset (#3475 )	4 years ago
nihui	922f8b33c1	reduction4d, merge keepdims arg, add test (#3469 )	4 years ago


271 Commits (da44ec5b140bebd565d029b60f2959f02f96ebf2)