nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	11cffce114	armv8.2 infrastructure (#1856) * runtime cpu dispatch * force thread one * disable openmp for coverage * simplify test layer * print NCNN_TARGET_ARCH * less ci build variants * weight fp16 storage option * test convdw int8 * apple a12 a13 * ncnn_add_layer ncnn_add_shader cmake macro	5 years ago
nihuini	0d6cc01d55	innerproduct handle mish activation, fix naive C testing, fix #1930	5 years ago
nihui	b5e288b521	layer creator function is not necessary for built-in layers	5 years ago
Tijmen Verhulsdonck	a91a18b901	AVX innerproduct and pooling 2x2 versions (#1839) * added avx implementations of FC and Max pool * Specify AVX2 * Small fixes and using Fused avx activations * fix type casting * fixing some CI errors * Fix code format * fix pooling test * remove vector typedef * More compile fixes * remove vector typedef * set c++ version to 17 * Force c++ 17 * Fixing mathfun * Try and workaround typedef issues * typefix * Remove typedef * switch to static inline * attempting to fix msvc bug * Verified MSVC FIX * Fixing clang build * fix pooling on MSVC * Remove c++ 17 * Implement requested changes * small cleanup * fix alignment * fix pooling import * revert "Implement requested changes" This reverts commit `5d8cc9494b`. * fix import order * Undo aligned load	6 years ago
nihui	3ef995ed1e	format code style and setup restyled.io (#1840)	6 years ago
Xu Yang	dbd9cbab4a	fix layer innerproduct when build with requant option on (#1624)	6 years ago
nihui	6f2ef1932d	int8 code refactoring wip, add int8 test	6 years ago
Sungmann Cho	c62e2702b3	Fix warnings on Visual Studio (#1456) * Fix warning C4244 in src/layer/convolution.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/convolution_sgemm_int8.h C4244: 'initializing': conversion from 'double' to 'int', possible loss of data * Fix warning C4244 in src/layer/deconvolution.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/elu.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4267 in src/layer/embed.cpp C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data * Fix warning C4244 in src/layer/exp.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/innerproduct.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/log.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data C4244: 'initializing': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/lrn.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/mvn.cp C4244: 'initializing': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/power.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warnings C4244 and C4267 in src/layer/proposal.cpp C4244: 'initializing': conversion from 'double' to 'float', possible loss of data C4244: 'initializing': conversion from 'double' to 'int', possible loss of data C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data * Fix warning C4244 in src/layer/reduction.cpp C4244: 'return': conversion from 'double' to 'T', possible loss of data * Fix warning C4244 in src/layer/tanh.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src/layer/binaryop.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warnings C4244 and C4267 in src/layer/unaryop.cpp C4244: 'return': conversion from 'double' to 'T', possible loss of data C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data * Fix warning C4244 in src/layer/x86/convolutiondepthwise_3x3_int8.h C4244: 'initializing': conversion from 'double' to 'int', possible loss of data	6 years ago
nihuini	336d1c1edd	remove the ncnn namespace for in source Option	6 years ago
nihuini	567e2bd501	a dirty hack for resolving int8 pack4 crash	6 years ago
nihuini	cd4be6d0fa	call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks	6 years ago
BUG1989	bcfe9f453f	initial the ncnn post training quantization tools (#1067) * initial the ncnn post training quantization tools * clear some comments of tools * fix the Travis ci compiler error	7 years ago
nihuini	838c5df839	option api changes	7 years ago
nihui	3e003ffd98	fuse sigmoid	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihui	be81ecf1f6	fix build on msvc	7 years ago
nihuini	528fe8e9e3	gpu convolution/deconvolution/innerproduct fuse activation	7 years ago
nihuini	3f85cafc08	fuse relu leakyrelu clip into convolution/deconvolution/innerproduct	7 years ago
nihuini	433a92401a	auto barrier in pipeline and copy command	7 years ago
BUG1989	df3d224484	new int8 implement, better accuracy (#749) * add the armv7a conv3x3s1 implement without overflow, remove old codes * fix the bug of conv3x3s2 packed int8 * new int8 implement, weight quant by per-channel, better accuracy~ * fix the bug of conv3x3s1 packed int8 neon * add the naive c fp32 and int8 winograd F(2,3) * add the neon intrinsic int8 winograd F(2,3) * optimize the armv7a int8 winograd F(2,3) with neon assembly * optimize the armv7a int8 winograd F(2,3) input transform with assembly. * add the requantize layer and int8 relu implement. * add graph optimize conv1x1s2 -> conv1x1s1, begin optimize int8 aarch64. * fix int8 bugs * add the c naive im2col with sgemm * add aarch64 int8 winograd f23, conv3x3s2 naive implement * add the int8 sgemm conv7x7s2 on x86/armv7a platform * optimize the int8 sgemm by neon intrinsic and packed kernel * optimize the int8 sgemm with packed data * optimize the int8 sgemm with armv7a neon assembly * add the int8 sgemm on arm64-v8a platform * prepare to merge latest codes from master * add the int8 param files * In the Class Net, add the fuse_network method	7 years ago
nihui	cc4376d8e6	do not upload unnecessary pack1 weight, reduce gpu memory usage	7 years ago
nihuini	e213605cd4	reduce memory usage of weight packing	7 years ago
nihui	960ffa1a50	optimize workgroup size for convolution depthwise and innerproduct pack4	7 years ago
nihui	a15b389d86	fix innerproduct pack1to4 pack4to1 weight upload	7 years ago
nihui	9480dcbc36	fix innerproduct out packing	7 years ago
nihui	f9dc551081	add innerproduct pack1to4 pack4to1 glue code	7 years ago
nihui	303996af4c	auto flatten before innerproduct	7 years ago
nihui	f0b4933eac	massive simd optimize in compute shader (#772) * init vec4 shader * more vec4 shader ... * convolutiondepthwise is depthwise * pooling pack4, fix global pooling * dropout pack4, relu pack4 * softmax pack4 * more shader vec4 .. * fix staging remap, remove layer pipeline member, add destroy_pipeline interface, add pack4 glue code * eltwise pack4 glue code * add binary pack4, unary pack4 * add binaryop unaryop pack4 glue code	7 years ago
nihui	10b8ac68cc	[WIP] vulkan compute (#618) * vulkan infrastructure * vkallocator and vkmat * layer interface for vulkan compute * wip... * default vulkan device, command wrapper, upload model weight in load_model to simplify layer interface * simplify command api, vkmat holds staging buffer, relu works * initialize specialization constant, simplify command dispatch, fix staging buffer copy with different shape, convolution works * init extension functions * dynamic local size and group count * group count=1 is invalid * regard device max workgroup size limit * fix relu oooops * decouple command record and staging allocation * create result blob * add pooling shader * buffer is faster than image :) * fix pooling shader * add innerproduct shader * readonly writeonly decoration * simplify buffer creation * decouple command and layer, VK_KHR_descriptor_update_template extension makes descriptor binding update easy :D * fix vulkan building issues in visual studio (#1) * fix building issues on visual studio * ignore benchmark * cancel changes * ... ... * decouple paramdict and vulkandevice * fix staging buffer destroy in model loading * remove vkdev member in option * add padding shader * simplify vulkan layer creation, simplify convolution and pooling shader for no padding, less debug output * add convolutiondepthwise and softmax shader * specialization float type, add leakyrelu * add dropout shader * add batchnorm shader * split vulkan forward * add scale shader * push constant type can be int or float * set_optimal_local_size_xyz * add eltwise shader * concat vulkan forward * fix convolution without bias * add dummy shader for concat and split, more fix ... * optional VK_KHR_descriptor_update_template and VK_KHR_push_descriptor * check VK_KHR_push_descriptor for vkCmdPushDescriptorSetWithTemplateKHR * binaryop and unaryop shader * hide raw command buffer * simple vkbenchncnn benchmark * create device with transfer queue * rename command to vkcompute, add vktransfer and layer upload_model interface * external VkMat, copy and map wrt buffer offset * command copy respect offset and size * decouple weight upload and load, simplify upload weight api, use one big staging buffer for uploading weights * fix build on android * binding count can not vary :( * barrier check state, fix sub-op destruction * declare local_size_xyz constant, fix crash on radv * fix local_size_xyz, second try * more barrier and state fix * fix softmax * reconstruct buffer memory allocator, reuse blob buffer, less verbose output * find unified memory type index * weight staging buffer allocator and weight buffer allocator, respect descriptor buffer offset alignment * use VK_KHR_descriptor_update_template for faster descriptor update if available, multithread pipeline creation * find more useful vulkan extensions and enable them * fix msvc build * respect VK_KHR_dedicated_allocation for weight buffer allocation * fix android build * fix bias name conflicts with metal * decouple pipeline and layer, building shader sources into shader module, dedicated create_pipeline api, simplify pipeline recording * drop dummy shader, inplace softmax, multiple shader module works * fix unique queue family index error * flatten support vulkan * mnasnet run * find shader module by name, each entry point per shader module, fix attribute/id conflict on moltenvk * some minor changes * add some high level api * use dedicated transfer queue to upload weight model * prefer mappable buffer on unified memory * global pooling and convolution fc, reuse staging buffer * implement ring-buffer style blob allocator, add VkBufferMemory capacity * use blob allocator for workspace blob, it works fine :) * vulkan option off * Update layer.cpp * fix build with vulkan off * less verbose output, fix crash on vulkan_compute off * merge benchncnn tool * allocator clear api, use new weight buffer allocator per net * add default locked allocator * mapped mat ptr api, persistent mapped memory works generally :) * travis ci linux vulkan * travis ci vulkan wip ... * more gpu wip ... * more gpu wip ... * wip... * wip... * wip... ... * wip... ios vulkan build... * find glslangValidator on ios build * use dynamic moltenvk library * travis ci wip ... * ios simulator does not support metal at all * fix cpu only extractor * optimize workgroup size, first try * optimize workgroup size, second try * conv1x1s1d1 vec4 * revert build system * fix ncnn2mem build * fix ncnn2mem build	7 years ago
BUG1989	7d2d18d31f	innerproduct layer with int8 impl, the type of top_blob should be integer. (#578)	7 years ago
nihuini	23de61fd07	as we already have the int8_scale_term switch, do not have to rely on the actual scale value	7 years ago
nihuini	2dbaf6f7b7	store int8 scale in binary	7 years ago
nihui	5d04a3a45c	layer holds bottom blob scale, depthwise convolution read group scales	7 years ago
nihui	a169cec363	core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table	7 years ago
nihui	9706cd1447	implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469	7 years ago
nihui	08e261f423	innerproduct produce continuous blob, fix #236	8 years ago
nihuini	a84ba8fc0f	element type storage support in Mat, move data member the first so that a pointer to Mat is a pointer to data, convenient index access for float vector	8 years ago
nihui	a181d25098	new model load api, fix #215	8 years ago
nihui	1e2265dd99	new param load api	8 years ago
nihuini	b7db8be4f6	add ncnn source qwq	9 years ago

40 Commits (4d2d625432e8fdaaaa33042f31ceb6071eef6809)