nihui/ncnn - ncnn - Open-Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihuini	4de4078779	move platform includes out of namespace	7 years ago
nihui	3e003ffd98	fuse sigmoid	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihuini	3f85cafc08	fuse relu leakyrelu clip into convolution/deconvolution/innerproduct	7 years ago
BUG1989	93a34a897d	add int8 winograd F(4,3) with neon assembly optimization (#891) * add the implement of int8 winograd F(4,3) * add int8 winograd F(4,3) naive c to arm64-v8a platform * optimize int8 winograd F(4,3) with neon * merge dequant op into int8 winograd F(4,3) * enable int8 wino F(4,3) case with all size	7 years ago
BUG1989	780c7d9a72	merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867) * optimize the conv sgemm int8 on arm64-v8a platform * optimize int8 arm64-v8a with sadalp ins * merge requantize op into latest conv layer * merge requantize op into conv-int8 op * update the mobilenet.param in the benchmark * Update README.md update Kirin970 and RK3399 * try to fix the travis build error	7 years ago
BUG1989	2f4c4a8202	fix the compile error when using armv7a without neon (#835)	7 years ago
BUG1989	ff38053321	[WIP] arm64-v8a int8 optimization (#823) * requantize layer arm64-v8a neon implement * convdw3x3s1 arm64-v8a neon implement * convdw3x3s2 arm64-v8a neon implement * conv1x1s1 arm64-v8a is optimized by neon assembly * conv sgemm int8 optimized with neon assembly, kernel transform is offline * conv winograd int8 optimized with neon assembly, fix ci build failed * conv3x3s2 int8 arm64-v8a optimized with neon assembly, remove old codes.	7 years ago
BUG1989	8e337d440e	fix the bug with convdw7x7 op working on int8 mode (#818)	7 years ago
BUG1989	df3d224484	new int8 implement, better accuracy (#749) * add the armv7a conv3x3s1 implement without overflow, remove old codes * fix the bug of conv3x3s2 packed int8 * new int8 implement, weight quant by per-channel, better accuracy~ * fix the bug of conv3x3s1 packed int8 neon * add the naive c fp32 and int8 winograd F(2,3) * add the neon intrinsic int8 winograd F(2,3) * optimize the armv7a int8 winograd F(2,3) with neon assembly * optimize the armv7a int8 winograd F(2,3) input transform with assembly. * add the requantize layer and int8 relu implement. * add graph optimize conv1x1s2 -> conv1x1s1, begin optimize int8 aarch64. * fix int8 bugs * add the c naive im2col with sgemm * add aarch64 int8 winograd f23, conv3x3s2 naive implement * add the int8 sgemm conv7x7s2 on x86/armv7a platform * optimize the int8 sgemm by neon intrinsic and packed kernel * optimize the int8 sgemm with packed data * optimize the int8 sgemm with armv7a neon assembly * add the int8 sgemm on arm64-v8a platform * prepare to merge latest codes from master * add the int8 param files * In the class Net, add the fuse_network method	7 years ago
BUG1989	229f8fd8db	add the armv7a conv3x3s2, convdw3x3s1/s2 int8 implement without overflow	7 years ago
BUG1989	4289c64090	add the armv7a conv1x1s1 sgemm int8 implement without overflow : )	7 years ago
nihuini	19ad4cf284	fix build without neon	7 years ago
nihuini	8526a69777	packed int8 convolution 3x3 stride 1 for armv7, 7%~25% faster than vanilla one, but God knows how hard I try :|	7 years ago
nihuini	6f1b0b0a61	quantized padding in convolution, use range sweets	7 years ago
nihui	72411b7a6c	restore the old conv3x3s2 as reference, fast dilation convolution fails on striding	7 years ago
nihui	1f20eb4e8c	pack weight and more unroll makes improvement, ~20% faster for conv3x3s2	7 years ago
nihui	fe14037777	more sub op preload	7 years ago
nihui	2fe7ada4d8	add arm int8 convolution stub, preload group op for x86	7 years ago
nihui	5d04a3a45c	layer holds bottom blob scale, depthwise convolution read group scales	7 years ago
nihuini	da352916fe	fix pd using flag condition	7 years ago
nihuini	e34aa7786a	armv7 int8 quantize/dequantize and conv1x1s1	7 years ago
nihui	a169cec363	core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table	7 years ago
nihui	9706cd1447	implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469	8 years ago
nihui	5879cb4d15	sgemm outperform direct conv on large channel	8 years ago
nihui	56a667472a	sgemm is always faster on common channel size	8 years ago
nihuini	d172a34329	direct assembly port, enable convolution 1x1 sgemm on armv7	8 years ago
nihuini	0fdb8da60e	sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq	8 years ago
nihui	72bb261e7a	switch to winograd5	8 years ago
Hyungsuk Yoon	8f56e00b4b	make convolution with dilation fast	8 years ago
nihui	7d1e49584d	call Innerproduct for convolution on flattened blob	8 years ago
nihui	03c1f63c2e	switch to winograd4	8 years ago
nihui	a181d25098	new model load api, fix #215	8 years ago
nihui	bdb70a2010	padding w h in convolution and deconvolution	8 years ago
nihui	44b4519307	non-square convolution and deconvolution kernel stride dilation	8 years ago
nihuini	964040fe3c	more runtime decisions for winograd path	8 years ago
nihui	c77ca16468	enable conv3x3s1 winograd optimization, two paths for small image on armv7 and all for aarch64	8 years ago
nihui	790829bc62	partition dot tiles and reuse kernel register, over 20% improvement for tiny image	8 years ago
nihui	eea3ca577a	disable winograd atm ...	8 years ago
nihui	0385d8e8ad	implement winograd64 optimization for convolution 3x3s1	8 years ago
nihuini	47218db6e5	fix minus padding SAME, fix #116	8 years ago
nihuini	23630b14b9	implement tensorflow style padding SAME type for convolution and pooling, second try	8 years ago
nihuini	b7db8be4f6	add ncnn source qwq	9 years ago


93 Commits (90e6be457b8094fcb63d219691cdf0c41fe01fc0)