nihui/ncnn - ncnn - Open-Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	e80fcbca8f	prefer faster and larger device local only memory on amd integrated graphics, heap budget value follows the same strategy as blob allocator (#4936)	2 years ago
nihui	249b264336	workaround moltenvk error on spec const composite op (#4714) * workaround moltenvk error on spec const composite op * workaround moltenvk crying on binding image with memory offset	3 years ago
LinHe	9426e21166	Memory Pool Improvement For Variadic Sized Inputs (#4190) * Simple miss count for better space efficiency * Simple double ended greedy; * Add size drop threshold setter; * set workspace allocator cr to zero as we had some sort of recylcing capability :P Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com> Co-authored-by: nihuini <nihuini@tencent.com>	3 years ago
nihui	0ea7a672fa	fix undefined reference to vkGetAndroidHardwareBufferPropertiesANDROID, add android-29 shared ci (#4056)	3 years ago
nihui	54c0a13b9f	build shared library (#2525) * build shared lib and enable lto * reserved for layer and option * allocator pimpl * datareader pimpl * paramdict pimpl, disable copy assign for allocator and datareader * modelbin pimpl * net extractor pimpl * gpu pimple * disable copy assign vulkandevice, code format * command pimpl, dummy image readonly * pipeline pipelinecache pimpl, export platform class * code format, export simple family * update ci * disable lto on android armv7, merge webassembly ci * link libgcc, fix macos dylib version * pipeline pimpl, gpu info pimpl * destroy gpu info after vulkan device * ignore msvc stl class warning * fix ncnn_paramdict_get_float return type * fix vktransfer upload fp16 without flatten, add command test	5 years ago
Evgeny Proydakov	80cd5f3ed5	Fixed compile warning [-Wunused-variable] in src/allocator.cpp for linux-gcc-nostdio-nostring build. (#2455)	5 years ago
nihui	cf3cf83cd3	unified image shader storage type (#2231) * drop bug_layout_binding_id_alias flag	5 years ago
Evgeny Proydakov	2b66348b62	Fixed compile warnings for gcc linux gpu 64 build. [-Wunused-parameter] [-Wunused-variable] (#2215)	5 years ago
Leo	5afd318b86	Support remove libstdc++ denpendency (#2030) * [build] add toolchain file w/o stdcxx dependency * [build] link m and gcc lib explicitly * [ncnn] complete simple stl impl * [ncnn] adapt for ncnn simplestl * [test] adapt for ncnn simplestl * [ncnn] fix missing algorithm and list when simplestl disabled * [ncnn] fix guard for operator new and delete * [style] fix the code style * [build] fix build failed on darwin and emscripten * [ci] do not import cxx to avoid operator conflict * [ncnn] add temporary partial_sort impl using bubble sort heap sort should be used for better perf. * [ncnn] add std greater and less function * [ncnn] fix placement new operator overload * [ncnn] add operator delete with size info * [build] disable exception, rtti, example and tools when simplestl on * [build] add toolchain for arm simplestl * [build] add toolchain for aarch64 simplestl * [ncnn] move initializer to constructor * [ncnn] use deteiled type instead of auto * [ncnn] use plain lib name in target_link_libraries	5 years ago
nihui	3ef995ed1e	format code style and setup restyled.io (#1840)	6 years ago
xfan1024	f9a66465c2	fix compile error (#1775)	6 years ago
nihui	ce6abe24b8	memory type required by buffer and image can be different	6 years ago
Naiyang Lin	ceef2470a5	Add logger.h (#1753)	6 years ago
nihui	e8688b042f	fuse packing cast storage, binaryop image shader, dummy buffer and image, device-wide utility packing converter operators, fix multi-blob layer test	6 years ago
nihui	62da1228e1	adreno image shader + fp16 + fp16a (#1714) * wip * wip * fix * image and imageview can not be destroyed until command execution ends * fast copy path for tightly packed data * wip * texture load works * 1d 3d image * record clone image, multiple commands share one image reference * upload download image * layer forward accept vkimagemat * vkimagemat graph works * staging vkimagemat for passing dynamic parameters, macro for fp32+image shader, padding image shader * vkimagemat elemsize * convolution test pass * conv1x1s1 image shader * fast staging image allocator from host memory, pooling image shader * convolutiondepthwise image shader * innerproduct image shader * packing image shader * crop deconvolution image shader * resolve spirv binding types * image fp16 and fp16a, cast image shader * eltwise image shader * wip * absval image shader * deconvolutiondepthwise image shader * concat image shader, squeezenet works * noop split image shader * uniform precision hint * layer support_image_storage * wip * vulkan device utility operator * command is storage and packing option aware * fallback to cpu on image allocation failed, mobilenetssd works * flatten image shader, enable more test * ci test * check imgfp32 imgfp16 imgfp16a features * fix ci test * fix ci test * upgrade swiftshader * wip * opt aggressive * imgfp16p * opt none * convolution winograd image shader * fix flush range, fast copy path for continous buffer * minor fix * fix innerproduct * wip ... * wip * cast fix * packing test * wip * image fp16p is fp16p * wip * silence * more line info * code clean * softmax image shader	6 years ago
nihui	7365bb80a2	vkmat and command api breaks (#1689) * vkmat and command api breaks * always use compute queue for compute buffer transfer * no barrier for readonly weight buffer * record clone, drop queue_owner * bring back layer forward * fix validation errors * lifecycle inside command makes life easier * update doc * record_import_android_hardware_buffer	6 years ago
nihuini	ee118e7d70	reconstruct import android hardwarebuffer api, wip	6 years ago
nihui	61ae6e865e	fix vkallocator flush invalidate size	6 years ago
nihuini	b361b24832	do not enforce coherent memory type, queue transfer after uploading model weight	6 years ago
nihui	038666e049	the initial auto test (#1464) * cpu test * wip * ci run test * travis ci for arm64 * arm64 ctest * copy vulkan loader * wip * run * Update ccpp.yml * gpu test * swiftshader * cache macos swiftshader * try MoltenVK * try vulkaninfo * give swiftshader another try * disable failed macos gpu test * more conv test, fix conv3x3s1 gpu test fail * fix deconvolution test * dilation test * cmake option to build tests * ncnn_add_layer_test macro * host barrier before upload and after download, handle packing layout option * test packing layout * wip * wip * merge deconvolution packing and non-packing code * merge convolution packing and non-packing code * pass top_blob_count param * fix build * take care of non-coherent mappable memory	6 years ago
nihuini	a86c2f44c3	vkimagemat, vkimageallocator, convenient construct from android hardware buffer	6 years ago
nihui	a867d96822	dynamic memory type querying, respect memory requirement memory type bits	6 years ago
volvet	ecd64fb36b	Fixed lots of compile warnings (#1286) * Fixed lots of compile warnings * refine the unused warning change	6 years ago
nihuini	21b5508c96	shared locked vkallocator cannot prevent concurrent accessing during actual gpu inference, use seperated vkallocator for each queue	7 years ago
nihuini	4729ea3505	bottom blob memory never alias, reuse blob memory more elegantly relying on refcount	7 years ago
nihui	8724440c59	bind wait barrier count member to memory, fix #932	7 years ago
nihuini	85a28959e4	fix binaryop shader binding, use shared buffer state, fix blob copy in non-light mode, fix #817	7 years ago
nihui	68afd1fa17	reset fence	7 years ago
nihui	559183904b	fix random crash on dedicated allocation	7 years ago
nihui	10b8ac68cc	[WIP] vulkan compute (#618) * vulkan infrastructure * vkallocator and vkmat * layer interface for vulkan compute * wip... * default vulkan device, command wrapper, upload model weight in load_model to simplify layer interface * simplify command api, vkmat holds staging buffer, relu works * initialize specialization constant, simplify command dispatch, fix staging buffer copy with different shape, convolution works * init extension functions * dynamic local size and group count * group count=1 is invalid * regard device max workgroup size limit * fix relu oooops * decouple command record and staging allocation * create result blob * add pooling shader * buffer is faster than image :) * fix pooling shader * add innerproduct shader * readonly writeonly decoration * simplify buffer creation * decouple command and layer, VK_KHR_descriptor_update_template extension makes descriptor binding update easy :D * fix vulkan building issues in visual studio (#1) * fix building issues on visual studio * ignore benchmark * cancel changes * ... ... * decouple paramdict and vulkandevice * fix staging buffer destroy in model loading * remove vkdev member in option * add padding shader * simplify vulkan layer creation, simplify convolution and pooling shader for no padding, less debug output * add convolutiondepthwise and softmax shader * specialization float type, add leakyrelu * add dropout shader * add batchnorm shader * split vulkan forward * add scale shader * push constant type can be int or float * set_optimal_local_size_xyz * add eltwise shader * concat vulkan forward * fix convolution without bias * add dummy shader for concat and split, more fix ... * optional VK_KHR_descriptor_update_template and VK_KHR_push_descriptor * check VK_KHR_push_descriptor for vkCmdPushDescriptorSetWithTemplateKHR * binaryop and unaryop shader * hide raw command buffer * simple vkbenchncnn benchmark * create device with transfer queue * rename command to vkcompute, add vktransfer and layer upload_model interface * external VkMat, copy and map wrt buffer offset * command copy respect offset and size * decouple weight upload and load, simplify upload weight api, use one big staging buffer for uploading weights * fix build on android * binding count can not vary :( * barrier check state, fix sub-op destruction * declare local_size_xyz constant, fix crash on radv * fix local_size_xyz, second try * more barrier and state fix * fix softmax * reconstruct buffer memory allocator, reuse blob buffer, less verbose output * find unified memory type index * weight staging buffer allocator and weight buffer allocator, respect descriptor buffer offset alignment * use VK_KHR_descriptor_update_template for faster descriptor update if available, multithread pipeline creation * find more useful vulkan extensions and enable them * fix msvc build * respect VK_KHR_dedicated_allocation for weight buffer allocation * fix android build * fix bias name conflicts with metal * decouple pipeline and layer, building shader sources into shader module, dedicated create_pipeline api, simplify pipeline recording * drop dummy shader, inplace softmax, multiple shader module works * fix unique queue family index error * flatten support vulkan * mnasnet run * find shader module by name, each entry point per shader module, fix attribute/id conflict on moltenvk * some minor changes * add some high level api * use dedicated transfer queue to upload weight model * prefer mappable buffer on unified memory * global pooling and convolution fc, reuse staging buffer * implement ring-buffer style blob allocator, add VkBufferMemory capacity * use blob allocator for workspace blob, it works fine :) * vulkan option off * Update layer.cpp * fix build with vulkan off * less verbose output, fix crash on vulkan_compute off * merge benchncnn tool * allocator clear api, use new weight buffer allocator per net * add default locked allocator * mapped mat ptr api, persistent mapped memory works generally :) * travis ci linux vulkan * travis ci vulkan wip ... * more gpu wip ... * more gpu wip ... * wip... * wip... * wip... ... * wip... ios vulkan build... * find glslangValidator on ios build * use dynamic moltenvk library * travis ci wip ... * ios simulator does not support metal at all * fix cpu only extractor * optimize workgroup size, first try * optimize workgroup size, second try * conv1x1s1d1 vec4 * revert build system * fix ncnn2mem build * fix ncnn2mem build	7 years ago
RogerOu	add45371de	fix warning: allocator non-virtual destructor	7 years ago
nihui	9706cd1447	implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469	7 years ago

32 Commits (db035d602de6ec0cd3bdd191cb21f4b73e7599be)