nihui/ncnn - ncnn - Open-Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihuini	cd4be6d0fa	call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks	6 years ago
nihuini	c0a4ffcf66	convolution pad_value param	6 years ago
nihuini	296e0022df	deconvolution output adj and output shape	6 years ago
nihuini	0e26e3094e	autopad SAME_LOWER	6 years ago
nihuini	9a6ee37eef	asymmetric padding parameter for convolution and deconvolution family	6 years ago
nihui	b4c388a72a	Mat misc function accept option parameter, deconvolution pack4 arm neon	6 years ago
tpoisonooo	1ca4387c9c	Auto choose conv implementation (#1085) * add relative README_CN.md; * obtain time cost with op->forward().	6 years ago
BUG1989	bcfe9f453f	initial the ncnn post training quantization tools (#1067) * initial the ncnn post training quantization tools * clear some comments of tools * fix the Travis ci compiler error	7 years ago
nihuini	838c5df839	option api changes	7 years ago
nihui	3e003ffd98	fuse sigmoid	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihuini	528fe8e9e3	gpu convolution/deconvolution/innerproduct fuse activation	7 years ago
nihuini	3f85cafc08	fuse relu leakyrelu clip into convolution/deconvolution/innerproduct	7 years ago
nihui	274392eb80	convolution padding same on gpu	7 years ago
BUG1989	780c7d9a72	merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867) * optimize the conv sgemm int8 on arm64-v8a platform * optimize int8 arm64-v8a with sadalp ins * merge requantize op into latest conv layer * merge requantize op into conv-int8 op * update the mobilenet.param in the benchmark * Update README.md update Kirin970 and RK3399 * try to fix the travis build error	7 years ago
nihuini	b2e41bf83d	fallback convolution to cpu path for pad -233	7 years ago
nihuini	433a92401a	auto barrier in pipeline and copy command	7 years ago
BUG1989	8e337d440e	fix the bug with convdw7x7 op working on int8 mode (#818)	7 years ago
BUG1989	df3d224484	new int8 implement,better accuracy (#749 ) * add the armv7a conv3x3s1 implement without overflow,remove old codes * fix the bug of conv3x3s2 packed int8 * new int8 implement,weight quant by perchanel,better accuracy~ * fix the bug of conv3x3s1 packed int8 neon * add the naive c fp32 and int8 winograd F(2,3) * add the neon intrinsic int8 winograd F(2,3) * optimize the armv7a int8 winograd F(2,3) with neon assembly * optimize the armv7a int8 winograd F(2,3) input transform with assembly. * add the requantize layer and int8 relu implement. * add graph optimize conv1x1s2 -> conv1x1s1,begin optimize int8 aarch64. * fix int8 bugs * add the c naive im2col with sgemm * add aarch64 int8 winograd f23, conv3x3s2 naive implement * add the int8 sgemm conv7x7s2 on x86/armv7a platform * optimize the int8 sgemm by neon intrinsic and packed kernel * optimize the int8 sgemm with packed data * optimize the int8 sgemm with armv7a neon assembly * add the int8 sgemm on arm64-v8a platform * perpare to merge latest codes from master * add the int8 param files * In the Class Net,add the fuse_network method	7 years ago
nihui	cc4376d8e6	do not upload unnecessary pack1 weight, reduce gpu memory usage	7 years ago
nihui	0ad0c07526	drop duplicated weight data in convolution-fc, use the more light-weight pipelines	7 years ago
nihuini	e213605cd4	reduce memory usage of weight packing	7 years ago
nihui	f4e12101c0	fix convolution typed innerproduct pack4	7 years ago
nihui	7ee3216fff	add convolution pack1to4 pack4to1	7 years ago
nihui	ad68e1e0e6	enable googlenet alexnet vulkan benchmark, fix build on msvc	7 years ago
nihui	f0b4933eac	massive simd optimize in compute shader (#772) * init vec4 shader * more vec4 shader ... * convolutiondepthwise is depthwise * pooling pack4, fix global pooling * dropout pack4, relu pack4 * softmax pack4 * more shader vec4 .. * fix staging remap, remove layer pipeline member, add destroy_pipeline interface, add pack4 glue code * eltwise pack4 glue code * add binary pack4, unary pack4 * add binaryop unaryop pack4 glue code	7 years ago
nihui	10b8ac68cc	[WIP] vulkan compute (#618) * vulkan infrastructure * vkallocator and vkmat * layer interface for vulkan compute * wip... * default vulkan device, command wrapper, upload model weight in load_model to simplify layer interface * simplify command api, vkmat holds staging buffer, relu works * initialize specialization constant, simplify command dispatch, fix staging buffer copy with different shape, convolution works * init extension functions * dynamic local size and group count * group count=1 is invalid * regard device max workgroup size limit * fix relu oooops * decouple command record and staging allocation * create result blob * add pooling shader * buffer is faster than image :) * fix pooling shader * add innerproduct shader * readonly writeonly decoration * simplify buffer creation * decouple command and layer, VK_KHR_descriptor_update_template extension makes descriptor binding update easy :D * fix vulkan building issues in visual studio (#1) * fix building issues on visual studio * ignore benchmark * cancel changes * ... ... * decouple paramdict and vulkandevice * fix staging buffer destroy in model loading * remove vkdev member in option * add padding shader * simplify vulkan layer creation, simplify convolution and pooling shader for no padding, less debug output * add convolutiondepthwise and softmax shader * specialization float type, add leakyrelu * add dropout shader * add batchnorm shader * split vulkan forward * add scale shader * push constant type can be int or float * set_optimal_local_size_xyz * add eltwise shader * concat vulkan forward * fix convolution without bias * add dummy shader for concat and split, more fix ... * optional VK_KHR_descriptor_update_template and VK_KHR_push_descriptor * check VK_KHR_push_descriptor for vkCmdPushDescriptorSetWithTemplateKHR * binaryop and unaryop shader * hide raw command buffer * simple vkbenchncnn benchmark * create device with transfer queue * rename command to vkcompute, add vktransfer and layer upload_model interface * external VkMat, copy and map wrt buffer offset * command copy respect offset and size * decouple weight upload and load, simplify upload weight api, use one big staging buffer for uploading weights * fix build on android * binding count can not vary :( * barrier check state, fix sub-op destruction * declare local_size_xyz constant, fix crash on radv * fix local_size_xyz, second try * more barrier and state fix * fix softmax * reconstruct buffer memory allocator, reuse blob buffer, less verbose output * find unified memory type index * weight staging buffer allocator and weight buffer allocator, respect descriptor buffer offset alignment * use VK_KHR_descriptor_update_template for faster descriptor update if available, multithread pipeline creation * find more useful vulkan extensions and enable them * fix msvc build * respect VK_KHR_dedicated_allocation for weight buffer allocation * fix android build * fix bias name conflicts with metal * decouple pipeline and layer, building shader sources into shader module, dedicated create_pipeline api, simplify pipeline recording * drop dummy shader, inplace softmax, multiple shader module works * fix unique queue family index error * flatten support vulkan * mnasnet run * find shader module by name, each entry point per shader module, fix attribute/id conflict on moltenvk * some minor changes * add some high level api * use dedicated transfer queue to upload weight model * prefer mappable buffer on unified memory * global pooling and convolution fc, reuse staging buffer * implement ring-buffer style blob allocator, add VkBufferMemory capacity * use blob allocator for workspace blob, it works fine :) * vulkan option off * Update layer.cpp * fix build with vulkan off * less verbose output, fix crash on vulkan_compute off * merge benchncnn tool * allocator clear api, use new weight buffer allocator per net * add default locked allocator * mapped mat ptr api, persistent mapped memory works generally :) * travis ci linux vulkan * travis ci vulkan wip ... * more gpu wip ... * more gpu wip ... * wip... * wip... * wip... ... * wip... ios vulkan build... * find glslangValidator on ios build * use dynamic moltenvk library * travis ci wip ... * ios simulator does not support metal at all * fix cpu only extractor * optimize workgroup size, first try * optimize workgroup size, second try * conv1x1s1d1 vec4 * revert build system * fix ncnn2mem build * fix ncnn2mem build	7 years ago
nihuini	23de61fd07	as we already have the int8_scale_term switch, do not have to rely on the actual scale value	7 years ago
nihuini	6f1b0b0a61	quantized padding in convolution, use range sweets	7 years ago
nihuini	2dbaf6f7b7	store int8 scale in binary	7 years ago
nihui	fe14037777	more sub op preload	7 years ago
nihui	2fe7ada4d8	add arm int8 convolution stub, preload group op for x86	7 years ago
nihui	5d04a3a45c	layer holds bottom blob scale, depthwise convolution read group scales	7 years ago
nihui	a169cec363	core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table	7 years ago
nihui	9706cd1447	implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469	7 years ago
nihui	7d1e49584d	call Innerproduct for convolution on flattened blob	8 years ago
nihuini	a84ba8fc0f	element type storage support in Mat, move data member the first so that a pointer to Mat is a pointer to data, convenient index access for float vector	8 years ago
nihui	a181d25098	new model load api, fix #215	8 years ago
nihui	bdb70a2010	padding w h in convolution and deconvolution	8 years ago
nihui	44b4519307	non-square convolution and deconvolution kernel stride dilation	8 years ago
nihui	1e2265dd99	new param load api	8 years ago
nihuini	47218db6e5	fix minus padding SAME, fix #116	8 years ago
nihuini	23630b14b9	implement tensorflow style padding SAME type for convolution and pooling, second try	8 years ago
nihuini	320cbca902	implement tensorflow style padding SAME type for convolution and pooling	8 years ago
nihuini	9bba77aa8e	fix dilation convolution, fix #64 fix #75	8 years ago
nihuini	b7db8be4f6	add ncnn source qwq	9 years ago

46 Commits (cd4be6d0fadd6d01635a4fd3934d97e90e6f71ff)