nihui/ncnn - ncnn - Open-source Collaborative Cloud-Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	7655b9e4e9	fix build on armv7 again ...	6 years ago
nihui	a97439988f	fix build on armv7	6 years ago
nihuini	81a5dfe76b	general convolution and convolutiondepthwise arm neon pack4, wip	6 years ago
BUG1989	bcfe9f453f	initial the ncnn post training quantization tools (#1067) * initial the ncnn post training quantization tools * clear some comments of tools * fix the Travis ci compiler error	7 years ago
BUG1989	d9f269fa3d	use sgemm fp32 on arm platform,optimize conv1x1s2 (#1031)	7 years ago
nihuini	4de4078779	move platform includes out of namespace	7 years ago
nihui	3e003ffd98	fuse sigmoid	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihuini	3f85cafc08	fuse relu leakyrelu clip into convolution/deconvolution/innerproduct	7 years ago
BUG1989	780c7d9a72	merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867) * optimize the conv sgemm int8 on arm64-v8a platform * optimize int8 arm64-v8a with sadalp ins * merge requantize op into latest conv layer * merge requantize op into conv-int8 op * update the mobilenet.param in the benchmark * Update README.md update Kirin970 and RK3399 * try to fix the travis build error	7 years ago
BUG1989	ff38053321	[WIP] arm64-v8a int8 optimization (#823) * requantize layer arm64-v8a neon implement * convdw3x3s1 arm64-v8a neon implement * convdw3x3s2 arm64-v8a neon implement * conv1x1s1 arm64-v8a is optimized by neon assembly * conv sgemm int8 optimized with neon assembly,kernel transform is offline * conv conv winograd int8 optimized with neon assembly,fix ci build failed * conv3x3s2 int8 arm64-v8a optimized with neon assembly,remove old codes.	7 years ago
BUG1989	8e337d440e	fix the bug with convdw7x7 op working on int8 mode (#818)	7 years ago
BUG1989	8ff831f7cd	fix the segmentation fault when load int8 model (#811)	7 years ago
BUG1989	df3d224484	new int8 implement,better accuracy (#749) * add the armv7a conv3x3s1 implement without overflow,remove old codes * fix the bug of conv3x3s2 packed int8 * new int8 implement,weight quant by perchanel,better accuracy~ * fix the bug of conv3x3s1 packed int8 neon * add the naive c fp32 and int8 winograd F(2,3) * add the neon intrinsic int8 winograd F(2,3) * optimize the armv7a int8 winograd F(2,3) with neon assembly * optimize the armv7a int8 winograd F(2,3) input transform with assembly. * add the requantize layer and int8 relu implement. * add graph optimize conv1x1s2 -> conv1x1s1,begin optimize int8 aarch64. * fix int8 bugs * add the c naive im2col with sgemm * add aarch64 int8 winograd f23, conv3x3s2 naive implement * add the int8 sgemm conv7x7s2 on x86/armv7a platform * optimize the int8 sgemm by neon intrinsic and packed kernel * optimize the int8 sgemm with packed data * optimize the int8 sgemm with armv7a neon assembly * add the int8 sgemm on arm64-v8a platform * perpare to merge latest codes from master * add the int8 param files * In the Class Net,add the fuse_network method	7 years ago
nihuini	8fda293f91	neon optimize for depthwise convolution 5x5 :P	7 years ago
nihuini	ef36d79b7e	implement the missing dequantize image on armv7, prefer neon-optimized 3-dim dequantize, fix #547	7 years ago
nihuini	6f1b0b0a61	quantized padding in convolution, use range sweets	7 years ago
nihuini	2dbaf6f7b7	store int8 scale in binary	7 years ago
nihui	2fe7ada4d8	add arm int8 convolution stub, preload group op for x86	7 years ago
nihuini	6b536701c3	sub-mat shall be allocator-aware	7 years ago
nihui	a169cec363	core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table	7 years ago
nihui	9706cd1447	implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469	8 years ago
nihuini	0ce0c11851	load sub-op in advance for group convolution	8 years ago
nihuini	9ac305e160	create 3-dim sub blob for group convolution, fix #315	8 years ago
nihui	6c4c810fda	decouple modelbin of different input types, simplify timestamp function	8 years ago
nihuini	76a55693a6	decouple convolutiondepthwise and convolution, reduce binary size by 10%, fix #254	8 years ago
Lamply	6612178960	correct arm convolution depthwise mistakes (#246)	8 years ago
nihuini	a84ba8fc0f	element type storage support in Mat, move data member the first so that a pointer to Mat is a pointer to data, convenient index access for float vector	8 years ago
nihuini	57df1076ff	neon optimize for depthwise convolution 3x3, about 20%~35% speed gain	8 years ago
nihui	bdb70a2010	padding w h in convolution and deconvolution	8 years ago
nihui	44b4519307	non-square convolution and deconvolution kernel stride dilation	8 years ago
nihuini	47218db6e5	fix minus padding SAME, fix #116	8 years ago
nihuini	7830b3da42	fix potential overread when bias_term is zero	8 years ago
nihuini	b4e3615ee4	depth-wise optimize	8 years ago
nihuini	934f48cb5e	arm neon optimize for group convolution	8 years ago

35 Commits (7655b9e4e9265b817ad103aeb52bb6a302ed445a)