nihui/ncnn - ncnn - Open-Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	171b9d1bba	use spdx license header, copyright Tencent (#6152 )	11 months ago
nihui	24a3b99f1f	drop layer support_image_storage and option use_image_storage (#6126 ) * fix pyncnn build	1 year ago
nihui	abf0de4488	update ruapu to detect zfh zvfh xtheadvector (#5841 ) * always prefer xtheadvector * update ci toolchain	1 year ago
nihui	211e238639	drop layer forward vkimagemat (#6124 ) vkimagemat was originally used as a mat storage in the hope of improving performance on old adreno gpus, but in fact it is slower than the cpu in most cases and is no longer suitable for the latest adreno architecture and large shapes	1 year ago
nihui	a8e4db713b	initialize layer featmask zero (#6078 )	1 year ago
nihui	19caca3140	port rvv intrinsic 1.0+ (#5642 ) * zfh zvfh xtheadvector infra * dispatch for rvv and xtheadvector * dispatch for non-vector zfh * port xtheadvector recp rsqrt trunc * general rvv gemm * c906 and c910 ci * old tuple code clean * update riscv64 ci * update build doc * drop old th1520 toolchain	1 year ago
nihui	056509a034	fix create_pipeline crash in vulkan-enabled layer without calling load_param/load_model first (#5410 )	2 years ago
nihui	556b79ce4d	create layer decoupled (#5258 ) * create layer decoupled * no more virtual public * allow build test with shared library * decouple cpu vulkan * drop old scripts	2 years ago
邓实诚	a1e3ebf8e5	implement simplemath (#4905 ) * complete abs, fmod and sin function in simplemath.h * remove some unused variables in simplemath.cpp * modify test-coverage.yml and add some functions to simplemath.cpp * modify erf.cpp which included math.h * include platform.h for NCNN_SIMPLEMATH definition * move utility constants and functions in simplemath.h to simplemath.cpp * guard simplemath functions with extern "C" * add NCNN_EXPORT macro in simplemath.h * include plateform.h and guard all declarations with NCNN_SIMPLEMATH * clean unused code in test_unaryop.cpp * guard #include <vector> with NCNN_SIMPLEMATH in benchncnn.cpp * add 'static' to guard functions that not declarated in header file * modify sin and cos with better implementation --------- Co-authored-by: HonestDeng <HonestDeng@users.noreply.github.com>	2 years ago
nihui	6c21b08727	check loongarch lasx and enable (#4820 )	3 years ago
junchao-loongson	279222c2c9	add vector optimization for loongarch64 (#4242 )	3 years ago
nihui	7886e90c65	split arm82 source for smaller binary and memory footprint (#3877 ) * split arm82 source, wip * check compiler arm82 only for arm 64bit target * drop arm82 registery * strict check compiler support arm82	4 years ago
nihui	72c467d1d9	mips msa optimization for quantize dequantize requantize (#3672 )	4 years ago
nihui	920aa79f04	drop x86 avx2 fp16 (#3568 )	4 years ago
nihui	4654030541	decouple x86 fma avx2 (#3560 )	4 years ago
nihui	930c36ebe2	avx512 infrastructure (#3407 )	4 years ago
nihui	878cb713d5	optional arm82 dot source (#3415 )	4 years ago
nihuini	11794675f3	apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path	4 years ago
nihuini	affbefe311	some space cleanup, blob clone from allocator	4 years ago
Tijmen Verhulsdonck	eaa7e24db6	Added ability to switch AVX/AVX2 during runtime (#3076 )	4 years ago
nihui	1c31ac2549	runtime cpu dispatch for mips msa and loongson mmi	5 years ago
nihui	2f70343aec	cmake clean (#3032 )	5 years ago
nihui	bcbb55f033	apple device always has armv8.2 dot (#2963 )	5 years ago
nihuini	afc02d57f9	runtime detect armv8.2 dotprod	5 years ago
nihui	11958424c2	runtime riscv v and zfh dispatch, riscv v optimization for cast	5 years ago
nihui	5fe75f19ef	architecture changes for int8 packing (#2771 ) * quantize and dequantize tests * unify activation and usability function * drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build * benchmark use requantize int8 model	5 years ago
nihui	54c0a13b9f	build shared library (#2525 ) * build shared lib and enable lto * reserved for layer and option * allocator pimpl * datareader pimpl * paramdict pimpl, disable copy assign for allocator and datareader * modelbin pimpl * net extractor pimpl * gpu pimple * disable copy assign vulkandevice, code format * command pimpl, dummy image readonly * pipeline pipelinecache pimpl, export platform class * code format, export simple family * update ci * disable lto on android armv7, merge webassembly ci * link libgcc, fix macos dylib version * pipeline pimpl, gpu info pimpl * destroy gpu info after vulkan device * ignore msvc stl class warning * fix ncnn_paramdict_get_float return type * fix vktransfer upload fp16 without flatten, add command test	5 years ago
nihui	1040f40c8b	update c api for custom allocator datareader modelbin and layer registration, add cookie userdata to layer	5 years ago
Cai Shanli	a9df4f6c59	add custom layer destroyer (#2481 ) * add custom layer destroyer * set default layer destroyer with 0	5 years ago
Leo	5afd318b86	Support remove libstdc++ denpendency (#2030 ) * [build] add toolchain file w/o stdcxx dependency * [build] link m and gcc lib explicitly * [ncnn] complete simple stl impl * [ncnn] adapt for ncnn simplestl * [test] adapt for ncnn simplestl * [ncnn] fix missing algorithm and list when simplestl disabled * [ncnn] fix guard for operator new and delete * [style] fix the code style * [build] fix build failed on darwin and emscripten * [ci] do not import cxx to avoid operator conflict * [ncnn] add temporary partial_sort impl using bubble sort heap sort should be used for better perf. * [ncnn] add std greater and less function * [ncnn] fix placement new operator overload * [ncnn] add operator delete with size info * [build] disable exception, rtti, example and tools when simplestl on * [build] add toolchain for arm simplestl * [build] add toolchain for aarch64 simplestl * [ncnn] move initializer to constructor * [ncnn] use deteiled type instead of auto * [ncnn] use plain lib name in target_link_libraries	5 years ago
nihui	54e79a62d7	fix crash on non-arm82 build	5 years ago
nihui	bb5bfe3841	avx2 infrastructure (#1943 )	5 years ago
nihui	11cffce114	armv8.2 infrastructure (#1856 ) * runtime cpu dispatch * force thread one * disable openmp for coverage * simplify test layer * print NCNN_TARGET_ARCH * less ci build variants * weight fp16 storage option * test convdw int8 * apple a12 a13 * ncnn_add_layer ncnn_add_shader cmake macro	5 years ago
nihui	3ef995ed1e	format code style and setup restyled.io (#1840 )	6 years ago
Naiyang Lin	ceef2470a5	Add logger.h (#1753 )	6 years ago
nihui	62da1228e1	adreno image shader + fp16 + fp16a (#1714 ) * wip * wip * fix * image and imageview can not be destroyed until command execution ends * fast copy path for tightly packed data * wip * texture load works * 1d 3d image * record clone image, multiple commands share one image reference * upload download image * layer forward accept vkimagemat * vkimagemat graph works * staging vkimagemat for passing dynamic parameters, macro for fp32+image shader, padding image shader * vkimagemat elemsize * convolution test pass * conv1x1s1 image shader * fast staging image allocator from host memory, pooling image shader * convolutiondepthwise image shader * innerproduct image shader * packing image shader * crop deconvolution image shader * resolve spirv binding types * image fp16 and fp16a, cast image shader * eltwise image shader * wip * absval image shader * deconvolutiondepthwise image shader * concat image shader, squeezenet works * noop split image shader * uniform precision hint * layer support_image_storage * wip * vulkan device utility operator * command is storage and packing option aware * fallback to cpu on image allocation failed, mobilenetssd works * flatten image shader, enable more test * ci test * check imgfp32 imgfp16 imgfp16a features * fix ci test * fix ci test * upgrade swiftshader * wip * opt aggressive * imgfp16p * opt none * convolution winograd image shader * fix flush range, fast copy path for continous buffer * minor fix * fix innerproduct * wip ... * wip * cast fix * packing test * wip * image fp16p is fp16p * wip * silence * more line info * code clean * softmax image shader	6 years ago
nihui	7365bb80a2	vkmat and command api breaks (#1689 ) * vkmat and command api breaks * always use compute queue for compute buffer transfer * no barrier for readonly weight buffer * record clone, drop queue_owner * bring back layer forward * fix validation errors * lifecycle inside command makes life easier * update doc * record_import_android_hardware_buffer	6 years ago
nihui	7d1eec3d5d	the use_bf16_storage option	6 years ago
nihuini	6935b78926	new layer attribute support_packing	7 years ago
Howave	123ca35e00	fix compile warnings (#1042 )	7 years ago
nihuini	e09607bc22	add option to upload model function, pipeline creation honors option use flags, setting allocator per extractor do not make much sense	7 years ago
nihuini	838c5df839	option api changes	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihuini	433a92401a	auto barrier in pipeline and copy command	7 years ago
nihuini	2672cd437f	add layer type index member	7 years ago
nihuini	85a28959e4	fix binaryop shader binding, use shared buffer state, fix blob copy in non-light mode, fix #817	7 years ago
nihui	b49cb56ad9	constify vulkan device handle, use default local vulkan device if not specified	7 years ago
nihui	f0b4933eac	massive simd optimize in compute shader (#772 ) * init vec4 shader * more vec4 shader ... * convolutiondepthwise is depthwise * pooling pack4, fix global pooling * dropout pack4, relu pack4 * softmax pack4 * more shader vec4 .. * fix staging remap, remove layer pipeline member, add destroy_pipeline interface, add pack4 glue code * eltwise pack4 glue code * add binary pack4, unary pack4 * add binaryop unaryop pack4 glue code	7 years ago
nihui	10b8ac68cc	[WIP] vulkan compute (#618 ) * vulkan infrastructure * vkallocator and vkmat * layer interface for vulkan compute * wip... * default vulkan device, command wrapper, upload model weight in load_model to simplify layer interface * simplify command api, vkmat holds staging buffer, relu works * initialize specialization constant, simplify command dispatch, fix staging buffer copy with different shape, convolution works * init extension functions * dynamic local size and group count * group count=1 is invalid * regard device max workgroup size limit * fix relu oooops * decouple command record and staging allocation * create result blob * add pooling shader * buffer is faster than image :) * fix pooling shader * add innerproduct shader * readonly writeonly decoration * simplify buffer creation * decouple command and layer, VK_KHR_descriptor_update_template extension makes descriptor binding update easy :D * fix vulkan building issues in visual studio (#1) * fix building issues on visual studio * ignore benchmark * cancel changes * ... ... * decouple paramdict and vulkandevice * fix staging buffer destroy in model loading * remove vkdev member in option * add padding shader * simplify vulkan layer creation, simplify convolution and pooling shader for no padding, less debug output * add convolutiondepthwise and softmax shader * specialization float type, add leakyrelu * add dropout shader * add batchnorm shader * split vulkan forward * add scale shader * push constant type can be int or float * set_optimal_local_size_xyz * add eltwise shader * concat vulkan forward * fix convolution without bias * add dummy shader for concat and split, more fix ... 
* optional VK_KHR_descriptor_update_template and VK_KHR_push_descriptor * check VK_KHR_push_descriptor for vkCmdPushDescriptorSetWithTemplateKHR * binaryop and unaryop shader * hide raw command buffer * simple vkbenchncnn benchmark * create device with transfer queue * rename command to vkcompute, add vktransfer and layer upload_model interface * external VkMat, copy and map wrt buffer offset * command copy respect offset and size * decouple weight upload and load, simplify upload weight api, use one big staging buffer for uploading weights * fix build on android * binding count can not vary :( * barrier check state, fix sub-op destruction * declare local_size_xyz constant, fix crash on radv * fix local_size_xyz, second try * more barrier and state fix * fix softmax * reconstruct buffer memory allocator, reuse blob buffer, less verbose output * find unified memory type index * weight staging buffer allocator and weight buffer allocator, respect descriptor buffer offset alignment * use VK_KHR_descriptor_update_template for faster descriptor update if available, multithread pipeline creation * find more useful vulkan extensions and enable them * fix msvc build * respect VK_KHR_dedicated_allocation for weight buffer allocation * fix android build * fix bias name conflicts with metal * decouple pipeline and layer, building shader sources into shader module, dedicated create_pipeline api, simplify pipeline recording * drop dummy shader, inplace softmax, multiple shader module works * fix unique queue family index error * flatten support vulkan * mnasnet run * find shader module by name, each entry point per shader module, fix attribute/id conflict on moltenvk * some minor changes * add some high level api * use dedicated transfer queue to upload weight model * prefer mappable buffer on unified memory * global pooling and convolution fc, reuse staging buffer * implement ring-buffer style blob allocator, add VkBufferMemory capacity * use blob allocator for workspace blob, 
it works fine :) * vulkan option off * Update layer.cpp * fix build with vulkan off * less verbose output, fix crash on vulkan_compute off * merge benchncnn tool * allocator clear api, use new weight buffer allocator per net * add default locked allocator * mapped mat ptr api, persistent mapped memory works generally :) * travis ci linux vulkan * travis ci vulkan wip ... * more gpu wip ... * more gpu wip ... * wip... * wip... * wip... ... * wip... ios vulkan build... * find glslangValidator on ios build * use dynamic moltenvk library * travis ci wip ... * ios simulator does not support metal at all * fix cpu only extractor * optimize workgroup size, first try * optimize workgroup size, second try * conv1x1s1d1 vec4 * revert build system * fix ncnn2mem build * fix ncnn2mem build	7 years ago
nihui	9706cd1447	implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469	8 years ago


57 Commits (d74f1e5654da4e405c10f5e1c2dacc68239dbd2f)