* [build] add toolchain file w/o stdcxx dependency
* [build] link m and gcc lib explicitly
* [ncnn] complete simple stl impl
* [ncnn] adapt for ncnn simplestl
* [test] adapt for ncnn simplestl
* [ncnn] fix missing algorithm and list when simplestl disabled
* [ncnn] fix guard for operator new and delete
* [style] fix the code style
* [build] fix build failure on darwin and emscripten
* [ci] do not import cxx to avoid operator conflict
* [ncnn] add temporary partial_sort impl using bubble sort
heap sort should be used for better perf.
* [ncnn] add std greater and less function
* [ncnn] fix placement new operator overload
* [ncnn] add operator delete with size info
* [build] disable exception, rtti, example and tools when simplestl on
* [build] add toolchain for arm simplestl
* [build] add toolchain for aarch64 simplestl
* [ncnn] move initializer to constructor
* [ncnn] use detailed type instead of auto
* [ncnn] use plain lib name in target_link_libraries
* Fix warning C4244 in src/layer/convolution.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/convolution_sgemm_int8.h
C4244: 'initializing': conversion from 'double' to 'int', possible loss of data
* Fix warning C4244 in src/layer/deconvolution.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/elu.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4267 in src/layer/embed.cpp
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
* Fix warning C4244 in src/layer/exp.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/innerproduct.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/log.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/lrn.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/mvn.cpp
C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/power.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warnings C4244 and C4267 in src/layer/proposal.cpp
C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
C4244: 'initializing': conversion from 'double' to 'int', possible loss of data
C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
* Fix warning C4244 in src/layer/reduction.cpp
C4244: 'return': conversion from 'double' to 'T', possible loss of data
* Fix warning C4244 in src/layer/tanh.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/binaryop.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warnings C4244 and C4267 in src/layer/unaryop.cpp
C4244: 'return': conversion from 'double' to 'T', possible loss of data
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
* Fix warning C4244 in src/layer/x86/convolutiondepthwise_3x3_int8.h
C4244: 'initializing': conversion from 'double' to 'int', possible loss of data
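The C4244 and C4267 entries above are MSVC warnings about implicit narrowing conversions. The log does not record how each site was changed. The sketch below shows two common ways to resolve such warnings, using hypothetical helper names (`elu_value`, `element_count`) that are not taken from the ncnn sources.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only.
// Calling the float overload std::exp(float) keeps the whole expression
// in float, so there is no 'double' to 'float' narrowing (C4244).
inline float elu_value(float x, float alpha)
{
    return x >= 0.f ? x : alpha * (std::exp(x) - 1.f);
}

// Hypothetical helper, for illustration only.
// size() returns size_t; an explicit cast documents the intended
// narrowing to int and silences C4267 on 64-bit builds.
inline int element_count(const std::vector<float>& v)
{
    return static_cast<int>(v.size());
}
```

An explicit cast only hides the warning, so it is appropriate where the value is known to fit in the target type.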
std::{unary|binary}_function<> are deprecated in C++11 and removed in C++17.
These base classes were never strictly necessary. In the C++98/03 era, many
user-defined function object classes derived from them in an attempt to
imitate STL conventions. However, STL containers and algorithms have never
required such inheritance (or the typedefs that it provides). Only the
function object adaptors (like bind1st()) needed such typedefs. Currently,
there are no uses of function object adaptors in our codebase, so we can
eliminate the inheritance completely.
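As a minimal sketch of the point above, the function objects below derive from nothing and still work with standard algorithms. The names `descending`, `is_negative`, `sorted_descending`, and `count_negative` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <vector>

// Plain binary predicate: no std::binary_function base class.
struct descending
{
    bool operator()(int a, int b) const { return a > b; }
};

// Plain unary predicate: no std::unary_function base class.
struct is_negative
{
    bool operator()(int x) const { return x < 0; }
};

// std::sort only needs a callable comparator.
inline std::vector<int> sorted_descending(std::vector<int> v)
{
    std::sort(v.begin(), v.end(), descending());
    return v;
}

// std::count_if only needs a callable predicate.
inline int count_negative(const std::vector<int>& v)
{
    return static_cast<int>(std::count_if(v.begin(), v.end(), is_negative()));
}
```

Because neither struct uses the removed base classes, this compiles in C++98 through C++17 and later.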
* vulkan infrastructure
* vkallocator and vkmat
* layer interface for vulkan compute
* wip...
* default vulkan device, command wrapper, upload model weight in load_model to simplify layer interface
* simplify command api, vkmat holds staging buffer, relu works
* initialize specialization constant, simplify command dispatch, fix staging buffer copy with different shape, convolution works
* init extension functions
* dynamic local size and group count
* group count=1 is invalid
* regard device max workgroup size limit
* fix relu oooops
* decouple command record and staging allocation
* create result blob
* add pooling shader
* buffer is faster than image :)
* fix pooling shader
* add innerproduct shader
* readonly writeonly decoration
* simplify buffer creation
* decouple command and layer, VK_KHR_descriptor_update_template extension makes descriptor binding update easy :D
* fix vulkan building issues in visual studio (#1)
* fix building issues on visual studio
* ignore benchmark
* cancel changes
* ... ...
* decouple paramdict and vulkandevice
* fix staging buffer destroy in model loading
* remove vkdev member in option
* add padding shader
* simplify vulkan layer creation, simplify convolution and pooling shader for no padding, less debug output
* add convolutiondepthwise and softmax shader
* specialization float type, add leakyrelu
* add dropout shader
* add batchnorm shader
* split vulkan forward
* add scale shader
* push constant type can be int or float
* set_optimal_local_size_xyz
* add eltwise shader
* concat vulkan forward
* fix convolution without bias
* add dummy shader for concat and split, more fix ...
* optional VK_KHR_descriptor_update_template and VK_KHR_push_descriptor
* check VK_KHR_push_descriptor for vkCmdPushDescriptorSetWithTemplateKHR
* binaryop and unaryop shader
* hide raw command buffer
* simple vkbenchncnn benchmark
* create device with transfer queue
* rename command to vkcompute, add vktransfer and layer upload_model interface
* external VkMat, copy and map wrt buffer offset
* command copy respect offset and size
* decouple weight upload and load, simplify upload weight api, use one big staging buffer for uploading weights
* fix build on android
* binding count can not vary :(
* barrier check state, fix sub-op destruction
* declare local_size_xyz constant, fix crash on radv
* fix local_size_xyz, second try
* more barrier and state fix
* fix softmax
* reconstruct buffer memory allocator, reuse blob buffer, less verbose output
* find unified memory type index
* weight staging buffer allocator and weight buffer allocator, respect descriptor buffer offset alignment
* use VK_KHR_descriptor_update_template for faster descriptor update if available, multithread pipeline creation
* find more useful vulkan extensions and enable them
* fix msvc build
* respect VK_KHR_dedicated_allocation for weight buffer allocation
* fix android build
* fix bias name conflicts with metal
* decouple pipeline and layer, building shader sources into shader module, dedicated create_pipeline api, simplify pipeline recording
* drop dummy shader, inplace softmax, multiple shader module works
* fix unique queue family index error
* flatten support vulkan
* mnasnet run
* find shader module by name, one entry point per shader module, fix attribute/id conflict on moltenvk
* some minor changes
* add some high level api
* use dedicated transfer queue to upload model weights
* prefer mappable buffer on unified memory
* global pooling and convolution fc, reuse staging buffer
* implement ring-buffer style blob allocator, add VkBufferMemory capacity
* use blob allocator for workspace blob, it works fine :)
* vulkan option off
* Update layer.cpp
* fix build with vulkan off
* less verbose output, fix crash on vulkan_compute off
* merge benchncnn tool
* allocator clear api, use new weight buffer allocator per net
* add default locked allocator
* mapped mat ptr api, persistent mapped memory works generally :)
* travis ci linux vulkan
* travis ci vulkan wip ...
* more gpu wip ...
* more gpu wip ...
* wip...
* wip...
* wip... ...
* wip... ios vulkan build...
* find glslangValidator on ios build
* use dynamic moltenvk library
* travis ci wip ...
* ios simulator does not support metal at all
* fix cpu only extractor
* optimize workgroup size, first try
* optimize workgroup size, second try
* conv1x1s1d1 vec4
* revert build system
* fix ncnn2mem build
* fix ncnn2mem build
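
Several entries above mention a dynamic local size, the group count, and the device's maximum workgroup size. The log does not show the actual computation. The sketch below illustrates one common way to derive a compute dispatch size, assuming hypothetical helpers `clamp_local_size` and `group_count` that are not the ncnn implementation.

```cpp
#include <algorithm>

// Hypothetical helper: keep the requested local size within the
// device limit, and never below 1.
inline int clamp_local_size(int requested, int device_max)
{
    return std::max(1, std::min(requested, device_max));
}

// Hypothetical helper: round up so that
// group_count * local_size covers every element.
inline int group_count(int size, int local_size)
{
    return (size + local_size - 1) / local_size;
}
```

With rounding up, the last workgroup may cover indices past the end of the data, so a shader written this way would normally bounds-check its invocation index.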