nihui/ncnn - ncnn - Open-Source Collaborative Cloud Brain Ecosystem Support System

Author	SHA1	Message	Date
nihui	ac8c56d6fe	update qcom410 and imx7d benchmark	6 years ago
nihuini	2e59da35a9	fill input and weight data with zero	6 years ago
nihuini	02b07b3e43	update qcom810 and iphone5s benchmark	6 years ago
nihui	b77f92dab9	update qcom410 and imx7d benchmark	6 years ago
新无止竞博客	4379f30e23	fix benchmark compilation errors (#1334) * fix benchmark compilation errors * some arm-linux toolchain file also defines ANDROID, it will break its build for missing libandroid.so	6 years ago
nihuini	64333429bb	data reader wrapper, fix #1325	6 years ago
nihuini	567e2bd501	a dirty hack for resolving int8 pack4 crash	6 years ago
nihuini	f8caef7691	add shufflenet_v2 benchmark	6 years ago
nihui	49206ea03c	eliminate useless reshape for mbv3	6 years ago
nihuini	9d4255a4a4	add mobilenet_v3 benchmark	6 years ago
Eric Liu	f5eee84185	Update mobilenetv2-yolov3 (#1165) * Add mobilenetv2_yolov3 ,and remove old model * Recovery yolov2 * Update link	6 years ago
Natsu	6d1944f2c3	CMake improvement (#1115) * CMake improvement * Fix bugs * Fix typo * Propagate vulkan dependency * import vulkan * add config files, now exported target cmake should be able to find packages * Propagate no-rtti and no-exception * Provide a option to control rtti and exception in mobile platform * Make cmake clean * Resolve conflicts * Update CMake PIE is propagated by INTERFACE_POSITION_INDEPENDENT_CODE * Remove bad things	6 years ago
BUG1989	bcfe9f453f	initial the ncnn post training quantization tools (#1067) * initial the ncnn post training quantization tools * clear some comments of tools * fix the Travis ci compiler error	7 years ago
nihuini	21b5508c96	shared locked vkallocator cannot prevent concurrent accessing during actual gpu inference, use seperated vkallocator for each queue	7 years ago
BUG1989	06a63467e9	update qcom675 benchmark (#1062)	7 years ago
BUG1989	bb0d8360dc	update RK3399 benchmark (#1059)	7 years ago
Howave	0bd8ba9505	update Snapdragon 835 benchmark (Xiaomi 6 with non performance mode) (#1055)	7 years ago
nihui	1273d69c20	update qcom410 imx7d benchmark	7 years ago
nihuini	040a8d2427	set vulkan device by gpu index	7 years ago
nihuini	9f9ac56538	update qcom810 and iphone5s benchmark	7 years ago
nihuini	838c5df839	option api changes	7 years ago
nihuini	9b33e647bd	use fixed blob names for benchmark	7 years ago
nihuini	8cb107e78c	apply model optimize	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihuini	60e8261812	input param whc order	7 years ago
nihui	a75d45fa9a	chmod -x	7 years ago
nihui	2fe769f314	update fused param files, enable ncnnoptimize tool build	7 years ago
Shuai Yuan	822e6ccefe	Add CPU, GPU benchmark of Qualcomm Snapdragon 835 (#916 ) * Add CPU benchmark of Qualcomm Snapdragon 835 Add CPU benchmark of Qualcomm Snapdragon 835 * Add CPU benchmark of Qualcomm Snapdragon 835 * Add CPU benchmark of Qualcomm Snapdragon 835 * Add CPU, GPU benchmark of Qualcomm Snapdragon 835 Qualcomm MSM8998 Snapdragon 835 (Kyro 2.45GHz x 4 + Kyro 1.9GHz x 4 + Adreno 540) * Update README.md * Update README.md	7 years ago
BUG1989	780c7d9a72	merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867 ) * optimize the conv sgemm int8 on arm64-v8a platform * optimize int8 arm64-v8a with sadalp ins * merge requantize op into latest conv layer * merge requantize op into conv-int8 op * update the mobilenet.param in the benchmark * Update README.md update Kirin970 and RK3399 * try to fix the travis build error	7 years ago
nihuini	20fb006282	coverage never works without proper unittest	7 years ago
nihuini	d263cd507c	gpu packing and unpacking	7 years ago
BUG1989	2f4c4a8202	fix the compile error when using armv7a without neon (#835)	7 years ago
nihuini	1f4bdd91b5	uint32_t typed workgroup size	7 years ago
nihuini	a88b6bfbb8	update softmax version	7 years ago
BUG1989	df3d224484	new int8 implement,better accuracy (#749 ) * add the armv7a conv3x3s1 implement without overflow,remove old codes * fix the bug of conv3x3s2 packed int8 * new int8 implement,weight quant by perchanel,better accuracy~ * fix the bug of conv3x3s1 packed int8 neon * add the naive c fp32 and int8 winograd F(2,3) * add the neon intrinsic int8 winograd F(2,3) * optimize the armv7a int8 winograd F(2,3) with neon assembly * optimize the armv7a int8 winograd F(2,3) input transform with assembly. * add the requantize layer and int8 relu implement. * add graph optimize conv1x1s2 -> conv1x1s1,begin optimize int8 aarch64. * fix int8 bugs * add the c naive im2col with sgemm * add aarch64 int8 winograd f23, conv3x3s2 naive implement * add the int8 sgemm conv7x7s2 on x86/armv7a platform * optimize the int8 sgemm by neon intrinsic and packed kernel * optimize the int8 sgemm with packed data * optimize the int8 sgemm with armv7a neon assembly * add the int8 sgemm on arm64-v8a platform * perpare to merge latest codes from master * add the int8 param files * In the Class Net,add the fuse_network method	7 years ago
nihui	d85775fbcd	fix softmax axis order on 3-dim, fix caffe reshape conversion, regenerate ssd param	7 years ago
nihui	182c340b3a	enable ssd vulkan benchmark	7 years ago
nihuini	f162de7263	drop deprecated hack	7 years ago
nihuini	83efa73cf6	fallback to cpu forward if layer not support vulkan, automatically!	7 years ago
nihuini	ab4c94aea9	fix cpu-only build	7 years ago
nihuini	b54e115f6e	enable mobilenet-yolo mobilenet-yolov3 vulkan benchmark	7 years ago
nihui	723d326760	enable shufflenet and vgg16 in vulkan benchmark	7 years ago
nihui	ad68e1e0e6	enable googlenet alexnet vulkan benchmark, fix build on msvc	7 years ago
nihui	b98db6da47	gpu device option	7 years ago
Eric Liu	e6b1412217	Increase a few performance of yolov3 and change tab to space (#767) * Fixed a yolov3 resolution bug * Set yolo defalut mean to 1.0 * Fix coding style and increase a few performance * Update mobilenet yolov3 benchmark param	7 years ago
nihui	10b8ac68cc	[WIP] vulkan compute (#618 ) * vulkan infrastructure * vkallocator and vkmat * layer interface for vulkan compute * wip... * default vulkan device, command wrapper, upload model weight in load_model to simplify layer interface * simplify command api, vkmat holds staging buffer, relu works * initialize specialization constant, simplify command dispatch, fix staging buffer copy with different shape, convolution works * init extension functions * dynamic local size and group count * group count=1 is invalid * regard device max workgroup size limit * fix relu oooops * decouple command record and staging allocation * create result blob * add pooling shader * buffer is faster than image :) * fix pooling shader * add innerproduct shader * readonly writeonly decoration * simplify buffer creation * decouple command and layer, VK_KHR_descriptor_update_template extension makes descriptor binding update easy :D * fix vulkan building issues in visual studio (#1) * fix building issues on visual studio * ignore benchmark * cancel changes * ... ... * decouple paramdict and vulkandevice * fix staging buffer destroy in model loading * remove vkdev member in option * add padding shader * simplify vulkan layer creation, simplify convolution and pooling shader for no padding, less debug output * add convolutiondepthwise and softmax shader * specialization float type, add leakyrelu * add dropout shader * add batchnorm shader * split vulkan forward * add scale shader * push constant type can be int or float * set_optimal_local_size_xyz * add eltwise shader * concat vulkan forward * fix convolution without bias * add dummy shader for concat and split, more fix ... 
* optional VK_KHR_descriptor_update_template and VK_KHR_push_descriptor * check VK_KHR_push_descriptor for vkCmdPushDescriptorSetWithTemplateKHR * binaryop and unaryop shader * hide raw command buffer * simple vkbenchncnn benchmark * create device with transfer queue * rename command to vkcompute, add vktransfer and layer upload_model interface * external VkMat, copy and map wrt buffer offset * command copy respect offset and size * decouple weight upload and load, simplify upload weight api, use one big staging buffer for uploading weights * fix build on android * binding count can not vary :( * barrier check state, fix sub-op destruction * declare local_size_xyz constant, fix crash on radv * fix local_size_xyz, second try * more barrier and state fix * fix softmax * reconstruct buffer memory allocator, reuse blob buffer, less verbose output * find unified memory type index * weight staging buffer allocator and weight buffer allocator, respect descriptor buffer offset alignment * use VK_KHR_descriptor_update_template for faster descriptor update if available, multithread pipeline creation * find more useful vulkan extensions and enable them * fix msvc build * respect VK_KHR_dedicated_allocation for weight buffer allocation * fix android build * fix bias name conflicts with metal * decouple pipeline and layer, building shader sources into shader module, dedicated create_pipeline api, simplify pipeline recording * drop dummy shader, inplace softmax, multiple shader module works * fix unique queue family index error * flatten support vulkan * mnasnet run * find shader module by name, each entry point per shader module, fix attribute/id conflict on moltenvk * some minor changes * add some high level api * use dedicated transfer queue to upload weight model * prefer mappable buffer on unified memory * global pooling and convolution fc, reuse staging buffer * implement ring-buffer style blob allocator, add VkBufferMemory capacity * use blob allocator for workspace blob, 
it works fine :) * vulkan option off * Update layer.cpp * fix build with vulkan off * less verbose output, fix crash on vulkan_compute off * merge benchncnn tool * allocator clear api, use new weight buffer allocator per net * add default locked allocator * mapped mat ptr api, persistent mapped memory works generally :) * travis ci linux vulkan * travis ci vulkan wip ... * more gpu wip ... * more gpu wip ... * wip... * wip... * wip... ... * wip... ios vulkan build... * find glslangValidator on ios build * use dynamic moltenvk library * travis ci wip ... * ios simulator does not support metal at all * fix cpu only extractor * optimize workgroup size, first try * optimize workgroup size, second try * conv1x1s1d1 vec4 * revert build system * fix ncnn2mem build * fix ncnn2mem build	7 years ago
nihui	aaa0765ded	Update README.md	7 years ago
nihuini	884e4f76ac	add mobilenet-yolov3 benchmark	7 years ago
nihuini	4b9abf4d6a	mnasnet and proxylessnasnet benchmark	7 years ago
nihuini	4efc3edc4a	fix mobilenet-yolo benchmark, multiscale feature and depthwise convolution for the last blocks	7 years ago

82 Commits (61ba8ec68e0c361c929bb47b391af1702fc0ab4e)