nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihuini	cd4be6d0fa	call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks	6 years ago
nihuini	c0a4ffcf66	convolution pad_value param	6 years ago
tpoisonooo	8dbafe7764	constraint input value to [-127, +127] (#1258 ) * constraint input value to [-127, +127] * keep new line at the end	6 years ago
nihuini	c4bebc6371	x86 conv3x3s1 winograd43 produce wrong result, revert to the good-old winograd23 version	6 years ago
nihuini	e4b44d293e	more autopad SAME_LOWER	6 years ago
nihuini	9a6ee37eef	asymmetric padding parameter for convolution and deconvolution family	6 years ago
nihui	b4c388a72a	Mat misc function accept option parameter, deconvolution pack4 arm neon	6 years ago
nihuini	eced9c81c6	fix crash on x86 conv7x7s1 dilation > 1, fix #1110 , fix #1117	7 years ago
xue	cb14d1bbf3	bug: convolution out-of-bounds access in x86 mode when SSE is not used	7 years ago
hanson.young	698acd28f6	fix AVX2 compilation issue under Windows x64 (#1076 ) * fix windows avx2 error in convolution_3x3.h and convolution_sgemm.h * fix compatibility of avx2 under Linux and Windows * Update CMakeLists.txt Space instead of tab * Update convolution_3x3.h Space instead of tab * Update convolution_sgemm.h Space instead of tab	7 years ago
BUG1989	bcfe9f453f	initial the ncnn post training quantization tools (#1067 ) * initial the ncnn post training quantization tools * clear some comments of tools * fix the Travis ci compiler error	7 years ago
BUG1989	c2022f4501	optimize conv sgemm with sse on intel platform (#1035 ) * optimize conv sgemm with sse * Update convolution_x86.cpp	7 years ago
nihuini	838c5df839	option api changes	7 years ago
nihuini	4de4078779	move platform includes out of namespace	7 years ago
BUG1989	b53541e8f9	fix arm winograd int8,optimize winograd x86 (#1025 )	7 years ago
BUG1989	01b3804828	optimization the x86 convolution layer with avx2 (#1019 ) * add the "Tu Fa" conv sgemm fp32 with avx2 for x86 * add avx2 cmake option * fix some bugs of avx2 pull request	7 years ago
nihui	3e003ffd98	fuse sigmoid	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihuini	3f85cafc08	fuse relu leakyrelu clip into convolution/deconvolution/innerproduct	7 years ago
BUG1989	93a34a897d	add int8 winograd F(4,3) with neon assembly optimization (#891 ) * add the implement of int8 winograd F(4,3) * add int8 winograd F(4,3) naive c to arm64-v8a platform * optimize int8 winograd F(4,3) with neon * merge dequant op into int8 winograd F(4,3) * enable int8 wino F(4,3) case with all size	7 years ago
BUG1989	780c7d9a72	merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867 ) * optimize the conv sgemm int8 on arm64-v8a platform * optimize int8 arm64-v8a with sadalp ins * merge requantize op into latest conv layer * merge requantize op into conv-int8 op * update the mobilenet.param in the benchmark * Update README.md update Kirin970 and RK3399 * try to fix the travis build error	7 years ago
BUG1989	ff38053321	[WIP] arm64-v8a int8 optimization (#823 ) * requantize layer arm64-v8a neon implement * convdw3x3s1 arm64-v8a neon implement * convdw3x3s2 arm64-v8a neon implement * conv1x1s1 arm64-v8a is optimized by neon assembly * conv sgemm int8 optimized with neon assembly,kernel transform is offline * conv conv winograd int8 optimized with neon assembly,fix ci build failed * conv3x3s2 int8 arm64-v8a optimized with neon assembly,remove old codes.	7 years ago
BUG1989	8e337d440e	fix the bug with convdw7x7 op working on int8 mode (#818 )	7 years ago
BUG1989	8ff831f7cd	fix the segmentation fault when load int8 model (#811 )	7 years ago
BUG1989	df3d224484	new int8 implement,better accuracy (#749 ) * add the armv7a conv3x3s1 implement without overflow,remove old codes * fix the bug of conv3x3s2 packed int8 * new int8 implement,weight quant by perchanel,better accuracy~ * fix the bug of conv3x3s1 packed int8 neon * add the naive c fp32 and int8 winograd F(2,3) * add the neon intrinsic int8 winograd F(2,3) * optimize the armv7a int8 winograd F(2,3) with neon assembly * optimize the armv7a int8 winograd F(2,3) input transform with assembly. * add the requantize layer and int8 relu implement. * add graph optimize conv1x1s2 -> conv1x1s1,begin optimize int8 aarch64. * fix int8 bugs * add the c naive im2col with sgemm * add aarch64 int8 winograd f23, conv3x3s2 naive implement * add the int8 sgemm conv7x7s2 on x86/armv7a platform * optimize the int8 sgemm by neon intrinsic and packed kernel * optimize the int8 sgemm with packed data * optimize the int8 sgemm with armv7a neon assembly * add the int8 sgemm on arm64-v8a platform * perpare to merge latest codes from master * add the int8 param files * In the Class Net,add the fuse_network method	7 years ago
BUG1989	3489d02037	fix the dequantize arm bug (#580 ) * innerproduce layer with int8 impl,the type of top_blob shoud be integer. * fix the dequantize_arm bug	7 years ago
BUG1989	7d2d18d31f	innerproduce layer with int8 impl,the type of top_blob shoud be integer. (#578 )	7 years ago
nihuini	6f1b0b0a61	quantized padding in convolution, use range sweets	7 years ago
nihui	72411b7a6c	restore the old conv3x3s2 as reference, fast dilation convolution fails on striding	7 years ago
nihuini	2dbaf6f7b7	store int8 scale in binary	7 years ago
nihui	fe14037777	more sub op preload	7 years ago
nihui	2fe7ada4d8	add arm int8 convolution stub, preload group op for x86	7 years ago
nihui	eac7c66a97	fix fp32 group convolution on x86	7 years ago
nihui	5d04a3a45c	layer holds bottom blob scale, depthwise convolution read group scales	7 years ago
nihuini	6b536701c3	sub-mat shall be allocator-aware	7 years ago
nihuini	4be27a0a89	int8 inference on x86	8 years ago
nihui	a169cec363	core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table	8 years ago
nihui	9706cd1447	implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469	8 years ago
Hyungsuk Yoon	8f56e00b4b	make convolution with dilation fast	8 years ago
nihuini	9ac305e160	create 3-dim sub blob for group convolution, fix #315	8 years ago
nihui	7d1e49584d	call Innerproduct for convolution on flattened blob	8 years ago
nihui	6c4c810fda	decouple modelbin of different input types, simplify timestamp function	8 years ago
nihuini	76a55693a6	decouple convolutiondepthwise and convolution, reduce binary size by 10%, fix #254	8 years ago
nihuini	03621aa7f9	more x86 stub for convolution and convolutiondepthwise	8 years ago
nihui	bdb70a2010	padding w h in convolution and deconvolution	8 years ago
nihui	44b4519307	non-square convolution and deconvolution kernel stride dilation	8 years ago
Hyungsuk Yoon	c641db8034	Fix bug for convolution on x86	8 years ago
Hyungsuk Yoon	574b010ca4	Rename mismatched signatures	8 years ago
nihuini	47218db6e5	fix minus padding SAME, fix #116	8 years ago
nihuini	23630b14b9	implement tensorflow style padding SAME type for convolution and pooling, second try	9 years ago

52 Commits (cd4be6d0fadd6d01635a4fd3934d97e90e6f71ff)