nihui/ncnn - ncnn - Open-Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	556b79ce4d	create layer decoupled (#5258 ) * create layer decoupled * no more virtual public * allow build test with shared library * decouple cpu vulkan * drop old scripts	2 years ago
nihui	f1ea792b26	fix too many microtask error in old libomp runtime (#4002 )	3 years ago
Evgeny Proydakov	86a785c4aa	Fixed linux-gcc noint8t build: (#3888 )	4 years ago
nihui	131f3d1323	x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694 ) * convolution winograd pack16to1 * x86 deconvolution and deconvolutiondepthwise * simpleomp allow 32 arguments * drop shadow variable workaround * less winograd test error	4 years ago
nihui	c0a94cd9ca	fix armv7 without neon (#3514 )	4 years ago
nihui	999e640d43	dynamic convolution weight (#3408 )	4 years ago
zhiliu6	814f89ef1a	Fuse HardSwish activation into Convolution and InnerProduct (#3233 ) * add general fused activation * add NCNN_FORCE_INLINE option	4 years ago
nihui	7e1aaa5828	cmake option NCNN_INT8 (#2839 )	5 years ago
nihui	5fe75f19ef	architecture changes for int8 packing (#2771 ) * quantize and dequantize tests * unify activation and usability function * drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build * benchmark use requantize int8 model	5 years ago
Leo	5afd318b86	Support remove libstdc++ denpendency (#2030 ) * [build] add toolchain file w/o stdcxx dependency * [build] link m and gcc lib explicitly * [ncnn] complete simple stl impl * [ncnn] adapt for ncnn simplestl * [test] adapt for ncnn simplestl * [ncnn] fix missing algorithm and list when simplestl disabled * [ncnn] fix guard for operator new and delete * [style] fix the code style * [build] fix build failed on darwin and emscripten * [ci] do not import cxx to avoid operator conflict * [ncnn] add temporary partial_sort impl using bubble sort heap sort should be used for better perf. * [ncnn] add std greater and less function * [ncnn] fix placement new operator overload * [ncnn] add operator delete with size info * [build] disable exception, rtti, example and tools when simplestl on * [build] add toolchain for arm simplestl * [build] add toolchain for aarch64 simplestl * [ncnn] move initializer to constructor * [ncnn] use deteiled type instead of auto * [ncnn] use plain lib name in target_link_libraries	5 years ago
nihui	11cffce114	armv8.2 infrastructure (#1856 ) * runtime cpu dispatch * force thread one * disable openmp for coverage * simplify test layer * print NCNN_TARGET_ARCH * less ci build variants * weight fp16 storage option * test convdw int8 * apple a12 a13 * ncnn_add_layer ncnn_add_shader cmake macro	6 years ago
nihui	b5e288b521	layer creator function is not necessary for built-in layers	6 years ago
nihui	3ef995ed1e	format code style and setup restyled.io (#1840 )	6 years ago
zhiliu6	3bfabf1d6a	Add fused convolution and mish layer support. (#1761 )	6 years ago
Naiyang Lin	ceef2470a5	Add logger.h (#1753 )	6 years ago
nihui	e14716dfef	convolution and pooling make padding helper, flatten innerproduct pooling bf16s neon	6 years ago
nihui	6f2ef1932d	int8 code refactoring wip, add int8 test	6 years ago
Sungmann Cho	9bfc554bc9	Fix warnings on Visual Studio (#1431 ) * Fix warnings C4244, C4267 in src/layer/yolov3detectionoutput.cpp C4244: '=': conversion from 'int' to 'float', possible loss of data C4244: 'initializing': conversion from 'float' to 'int', possible loss of data C4244: 'initializing': conversion from 'double' to 'float', possible loss of data C4244: 'return': conversion from 'double' to 'float', possible loss of data C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data * Fix warnings C4244, C4267 in src/layer/yolodetectionoutput.cpp C4244: '=': conversion from 'int' to 'float', possible loss of data C4244: 'initializing': conversion from 'float' to 'int', possible loss of data C4244: 'initializing': conversion from 'double' to 'float', possible loss of data C4244: 'return': conversion from 'double' to 'float', possible loss of data C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data * Fix warning C4244 in src/layer/quantize.cpp C4244: 'initializing': conversion from 'double' to 'int', possible loss of data * Fix warnings C4244, C4267 in src/layer/detectionoutput.cpp C4244: '=': conversion from 'int' to 'float', possible loss of data C4244: 'initializing': conversion from 'double' to 'float', possible loss of data C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data * Fix warning C4244 in src/layer/roipooling.cpp C4244: 'initializing': conversion from 'double' to 'int', possible loss of data * Fix warning C4244 in src/layer/sigmoid.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4267 in src/layer/slice.cpp C4267: '=': conversion from 'size_t' to 'int', possible loss of data C4267: 'initializing': conversion 
from 'size_t' to 'int', possible loss of data * Fix warning C4267 in src/layer/softmax.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/interp.cpp C4244: '=': conversion from 'float' to 'int', possible loss of data C4244: 'initializing': conversion from 'double' to 'int', possible loss of data * Fix warning C4244 in src/layer/instancenorm.cpp C4244: 'initializing': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/deconvolutiondepthwise.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/convolutiondepthwise.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/net.cpp C4244: 'return': conversion from '__int64' to 'int', possible loss of data C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data C4267: 'return': conversion from 'size_t' to 'int', possible loss of data * Fix warning C4244 in src/layer/bnll.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4267 in src/layer/concat.cpp C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data * Fix warning C4267 in tools/mxnet/mxnet2ncnn.cpp C4244: 'initializing': conversion from 'double' to 'float', possible loss of data C4267: '=': conversion from 'size_t' to 'int', possible loss of data C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data C4305: 'initializing': truncation from 'double' to 'float'	6 years ago
nihuini	336d1c1edd	remove the ncnn namespace for in source Option	6 years ago
nihuini	567e2bd501	a dirty hack for resolving int8 pack4 crash	6 years ago
nihuini	cd4be6d0fa	call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks	6 years ago
nihuini	c0a4ffcf66	convolution pad_value param	6 years ago
Xu Yang	31cf7f3c5b	fix ConvolutionDepthWise int8_requantize (#1233 )	6 years ago
nihuini	296e0022df	deconvolution output adj and output shape	6 years ago
nihuini	0e26e3094e	autopad SAME_LOWER	6 years ago
nihuini	9a6ee37eef	asymmetric padding parameter for convolution and deconvolution family	6 years ago
nihui	b4c388a72a	Mat misc function accept option parameter, deconvolution pack4 arm neon	6 years ago
BUG1989	bcfe9f453f	initial the ncnn post training quantization tools (#1067 ) * initial the ncnn post training quantization tools * clear some comments of tools * fix the Travis ci compiler error	7 years ago
nihuini	838c5df839	option api changes	7 years ago
nihui	3e003ffd98	fuse sigmoid	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihuini	528fe8e9e3	gpu convolution/deconvolution/innerproduct fuse activation	7 years ago
nihuini	3f85cafc08	fuse relu leakyrelu clip into convolution/deconvolution/innerproduct	7 years ago
nihui	274392eb80	convolution padding same on gpu	7 years ago
BUG1989	780c7d9a72	merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867 ) * optimize the conv sgemm int8 on arm64-v8a platform * optimize int8 arm64-v8a with sadalp ins * merge requantize op into latest conv layer * merge requantize op into conv-int8 op * update the mobilenet.param in the benchmark * Update README.md update Kirin970 and RK3399 * try to fix the travis build error	7 years ago
nihuini	b2e41bf83d	fallback convolution to cpu path for pad -233	7 years ago
nihuini	433a92401a	auto barrier in pipeline and copy command	7 years ago
BUG1989	8e337d440e	fix the bug with convdw7x7 op working on int8 mode (#818 )	7 years ago
BUG1989	8ff831f7cd	fix the segmentation fault when load int8 model (#811 )	7 years ago
BUG1989	df3d224484	new int8 implement,better accuracy (#749 ) * add the armv7a conv3x3s1 implement without overflow,remove old codes * fix the bug of conv3x3s2 packed int8 * new int8 implement,weight quant by perchanel,better accuracy~ * fix the bug of conv3x3s1 packed int8 neon * add the naive c fp32 and int8 winograd F(2,3) * add the neon intrinsic int8 winograd F(2,3) * optimize the armv7a int8 winograd F(2,3) with neon assembly * optimize the armv7a int8 winograd F(2,3) input transform with assembly. * add the requantize layer and int8 relu implement. * add graph optimize conv1x1s2 -> conv1x1s1,begin optimize int8 aarch64. * fix int8 bugs * add the c naive im2col with sgemm * add aarch64 int8 winograd f23, conv3x3s2 naive implement * add the int8 sgemm conv7x7s2 on x86/armv7a platform * optimize the int8 sgemm by neon intrinsic and packed kernel * optimize the int8 sgemm with packed data * optimize the int8 sgemm with armv7a neon assembly * add the int8 sgemm on arm64-v8a platform * perpare to merge latest codes from master * add the int8 param files * In the Class Net,add the fuse_network method	7 years ago
nihui	cc4376d8e6	do not upload unnecessary pack1 weight, reduce gpu memory usage	7 years ago
nihuini	43c4b57201	group deconvolution packing family	7 years ago
nihuini	8547864b6f	group convolution packing family	7 years ago
nihuini	37413ea95c	implement depthwise deconvolution vulkan, fix top blob state	7 years ago
nihuini	e213605cd4	reduce memory usage of weight packing	7 years ago
nihui	960ffa1a50	optimize workgroup size for convolution depthwise and innerproduct pack4	7 years ago
nihui	ad68e1e0e6	enable googlenet alexnet vulkan benchmark, fix build on msvc	7 years ago
nihui	f0b4933eac	massive simd optimize in compute shader (#772 ) * init vec4 shader * more vec4 shader ... * convolutiondepthwise is depthwise * pooling pack4, fix global pooling * dropout pack4, relu pack4 * softmax pack4 * more shader vec4 .. * fix staging remap, remove layer pipeline member, add destroy_pipeline interface, add pack4 glue code * eltwise pack4 glue code * add binary pack4, unary pack4 * add binaryop unaryop pack4 glue code	7 years ago
nihui	10b8ac68cc	[WIP] vulkan compute (#618 ) * vulkan infrastructure * vkallocator and vkmat * layer interface for vulkan compute * wip... * default vulkan device, command wrapper, upload model weight in load_model to simplify layer interface * simplify command api, vkmat holds staging buffer, relu works * initialize specialization constant, simplify command dispatch, fix staging buffer copy with different shape, convolution works * init extension functions * dynamic local size and group count * group count=1 is invalid * regard device max workgroup size limit * fix relu oooops * decouple command record and staging allocation * create result blob * add pooling shader * buffer is faster than image :) * fix pooling shader * add innerproduct shader * readonly writeonly decoration * simplify buffer creation * decouple command and layer, VK_KHR_descriptor_update_template extension makes descriptor binding update easy :D * fix vulkan building issues in visual studio (#1) * fix building issues on visual studio * ignore benchmark * cancel changes * ... ... * decouple paramdict and vulkandevice * fix staging buffer destroy in model loading * remove vkdev member in option * add padding shader * simplify vulkan layer creation, simplify convolution and pooling shader for no padding, less debug output * add convolutiondepthwise and softmax shader * specialization float type, add leakyrelu * add dropout shader * add batchnorm shader * split vulkan forward * add scale shader * push constant type can be int or float * set_optimal_local_size_xyz * add eltwise shader * concat vulkan forward * fix convolution without bias * add dummy shader for concat and split, more fix ... 
* optional VK_KHR_descriptor_update_template and VK_KHR_push_descriptor * check VK_KHR_push_descriptor for vkCmdPushDescriptorSetWithTemplateKHR * binaryop and unaryop shader * hide raw command buffer * simple vkbenchncnn benchmark * create device with transfer queue * rename command to vkcompute, add vktransfer and layer upload_model interface * external VkMat, copy and map wrt buffer offset * command copy respect offset and size * decouple weight upload and load, simplify upload weight api, use one big staging buffer for uploading weights * fix build on android * binding count can not vary :( * barrier check state, fix sub-op destruction * declare local_size_xyz constant, fix crash on radv * fix local_size_xyz, second try * more barrier and state fix * fix softmax * reconstruct buffer memory allocator, reuse blob buffer, less verbose output * find unified memory type index * weight staging buffer allocator and weight buffer allocator, respect descriptor buffer offset alignment * use VK_KHR_descriptor_update_template for faster descriptor update if available, multithread pipeline creation * find more useful vulkan extensions and enable them * fix msvc build * respect VK_KHR_dedicated_allocation for weight buffer allocation * fix android build * fix bias name conflicts with metal * decouple pipeline and layer, building shader sources into shader module, dedicated create_pipeline api, simplify pipeline recording * drop dummy shader, inplace softmax, multiple shader module works * fix unique queue family index error * flatten support vulkan * mnasnet run * find shader module by name, each entry point per shader module, fix attribute/id conflict on moltenvk * some minor changes * add some high level api * use dedicated transfer queue to upload weight model * prefer mappable buffer on unified memory * global pooling and convolution fc, reuse staging buffer * implement ring-buffer style blob allocator, add VkBufferMemory capacity * use blob allocator for workspace blob, 
it works fine :) * vulkan option off * Update layer.cpp * fix build with vulkan off * less verbose output, fix crash on vulkan_compute off * merge benchncnn tool * allocator clear api, use new weight buffer allocator per net * add default locked allocator * mapped mat ptr api, persistent mapped memory works generally :) * travis ci linux vulkan * travis ci vulkan wip ... * more gpu wip ... * more gpu wip ... * wip... * wip... * wip... ... * wip... ios vulkan build... * find glslangValidator on ios build * use dynamic moltenvk library * travis ci wip ... * ios simulator does not support metal at all * fix cpu only extractor * optimize workgroup size, first try * optimize workgroup size, second try * conv1x1s1d1 vec4 * revert build system * fix ncnn2mem build * fix ncnn2mem build	7 years ago
nihuini	ef36d79b7e	implement the missing dequantize image on armv7, prefer neon-optimized 3-dim dequantize, fix #547	7 years ago

69 Commits (1e75a2df21b8fc6c19444a6660fcd67c793f45a2)