nihui/ncnn - ncnn - Open-Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	11cffce114	armv8.2 infrastructure (#1856) * runtime cpu dispatch * force thread one * disable openmp for coverage * simplify test layer * print NCNN_TARGET_ARCH * less ci build variants * weight fp16 storage option * test convdw int8 * apple a12 a13 * ncnn_add_layer ncnn_add_shader cmake macro	5 years ago
nihui	b5e288b521	layer creator function is not necessary for built-in layers	5 years ago
Tijmen Verhulsdonck	66618340ac	x86 fp16 weight storage optimizations (#1871) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io>	5 years ago
Tijmen Verhulsdonck	988e8088ea	Fix benchmark (#1864)	6 years ago
nihui	01b8b79ed2	packing layout option respect support_packing property	6 years ago
Tijmen Verhulsdonck	d1b5711791	X86 Elempack 8 AVX implementations. (#1853) * added avx implementations of FC and Max pool * Specify AVX2 * Small fixes and using Fused avx activations * fix type casting * fixing some CI errors * Fix code format * fix pooling test * remove vector typedef * More compile fixes * remove vector typedef * set c++ version to 17 * Force c++ 17 * Fixing mathfun * Try and workaround typedef issues * typefix * Remove typedef * switch to static inline * attempting to fix msvc bug * Verified MSVX FIX * Fixing clang build * commit before switch * More avx and packing implementation * Fix ctest * starting the depthwise pack 8 implementation * Unrolled loop * add depthwise pack 8 implementations * Working 1x1 pack 8 implementation added * revert incorrect changes * added conact elempack 8 * more elempack enabled layers added and started on the conversion of the winograd pack4 conv 3x3 * Added code formatting * fix styling * Unroll loops * unrolling loops * Added more elempac layers for mobilenet v3 * revert commit * fix code style * remove arm neon references * remove pack4 references * More cleanup * added packing avx code * fixing linux build ctests * remove usage of aligned loads * More aligned mem ops removed * Cleanup, revert some files and remove not working winograd and shufflechannel implementation * add stackoverflow referal * Fix windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * implement requested chaanges * remove reshape * revert arm file change * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * fix unterminated directive Co-authored-by: Restyled.io <commits@restyled.io>	6 years ago
nihui	3ef995ed1e	format code style and setup restyled.io (#1840)	6 years ago
zhiliu6	63d7e2c88d	Add support for darknet EfficientNetB0-Yolov3 conversion. (#1821) * fuse mish and depthwise convolution * darknet2ncnn: support enet-coco conversion.	6 years ago
zhiliu6	3bfabf1d6a	Add fused convolution and mish layer support. (#1761)	6 years ago
nihui	e14716dfef	convolution and pooling make padding helper, flatten innerproduct pooling bf16s neon	6 years ago
xieydd	b760e22da2	fix requant relu6 bug (#1590) * fix requant relu6 bug * fix * delete pipeline change in forward/forward_inplace avoid race in multithreading	6 years ago
nihui	6f2ef1932d	int8 code refactoring wip, add int8 test	6 years ago
nihuini	336d1c1edd	remove the ncnn namespace for in source Option	6 years ago
nihuini	cd4be6d0fa	call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks	6 years ago
nihuini	c0a4ffcf66	convolution pad_value param	6 years ago
nihuini	e4b44d293e	more autopad SAME_LOWER	6 years ago
nihuini	9a6ee37eef	asymmetric padding parameter for convolution and deconvolution family	6 years ago
nihui	b4c388a72a	Mat misc function accept option parameter, deconvolution pack4 arm neon	6 years ago
BUG1989	bcfe9f453f	initial the ncnn post training quantization tools (#1067) * initial the ncnn post training quantization tools * clear some comments of tools * fix the Travis ci compiler error	7 years ago
nihuini	838c5df839	option api changes	7 years ago
nihui	3e003ffd98	fuse sigmoid	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihuini	3f85cafc08	fuse relu leakyrelu clip into convolution/deconvolution/innerproduct	7 years ago
BUG1989	780c7d9a72	merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867) * optimize the conv sgemm int8 on arm64-v8a platform * optimize int8 arm64-v8a with sadalp ins * merge requantize op into latest conv layer * merge requantize op into conv-int8 op * update the mobilenet.param in the benchmark * Update README.md update Kirin970 and RK3399 * try to fix the travis build error	7 years ago
BUG1989	8e337d440e	fix the bug with convdw7x7 op working on int8 mode (#818)	7 years ago
BUG1989	8ff831f7cd	fix the segmentation fault when load int8 model (#811)	7 years ago
BUG1989	df3d224484	new int8 implement,better accuracy (#749) * add the armv7a conv3x3s1 implement without overflow,remove old codes * fix the bug of conv3x3s2 packed int8 * new int8 implement,weight quant by perchanel,better accuracy~ * fix the bug of conv3x3s1 packed int8 neon * add the naive c fp32 and int8 winograd F(2,3) * add the neon intrinsic int8 winograd F(2,3) * optimize the armv7a int8 winograd F(2,3) with neon assembly * optimize the armv7a int8 winograd F(2,3) input transform with assembly. * add the requantize layer and int8 relu implement. * add graph optimize conv1x1s2 -> conv1x1s1,begin optimize int8 aarch64. * fix int8 bugs * add the c naive im2col with sgemm * add aarch64 int8 winograd f23, conv3x3s2 naive implement * add the int8 sgemm conv7x7s2 on x86/armv7a platform * optimize the int8 sgemm by neon intrinsic and packed kernel * optimize the int8 sgemm with packed data * optimize the int8 sgemm with armv7a neon assembly * add the int8 sgemm on arm64-v8a platform * perpare to merge latest codes from master * add the int8 param files * In the Class Net,add the fuse_network method	7 years ago
nihuini	6f1b0b0a61	quantized padding in convolution, use range sweets	7 years ago
nihuini	2dbaf6f7b7	store int8 scale in binary	7 years ago
nihui	2fe7ada4d8	add arm int8 convolution stub, preload group op for x86	7 years ago
nihui	eac7c66a97	fix fp32 group convolution on x86	7 years ago
nihui	5d04a3a45c	layer holds bottom blob scale, depthwise convolution read group scales	7 years ago
nihuini	6b536701c3	sub-mat shall be allocator-aware	7 years ago
nihuini	4be27a0a89	int8 inference on x86	7 years ago
nihui	a169cec363	core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table	7 years ago
nihui	9706cd1447	implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469	7 years ago
nihuini	9ac305e160	create 3-dim sub blob for group convolution, fix #315	8 years ago
nihui	6c4c810fda	decouple modelbin of different input types, simplify timestamp function	8 years ago
nihuini	76a55693a6	decouple convolutiondepthwise and convolution, reduce binary size by 10%, fix #254	8 years ago
nihuini	03621aa7f9	more x86 stub for convolution and convolutiondepthwise	8 years ago

40 Commits (4d2d625432e8fdaaaa33042f31ceb6071eef6809)