* convolution winograd pack16to1
* x86 deconvolution and deconvolutiondepthwise
* simpleomp allow 32 arguments
* drop shadow variable workaround
* less winograd test error
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
* [build] add toolchain file w/o stdcxx dependency
* [build] link m and gcc lib explicitly
* [ncnn] complete simple stl impl
* [ncnn] adapt for ncnn simplestl
* [test] adapt for ncnn simplestl
* [ncnn] fix missing algorithm and list when simplestl disabled
* [ncnn] fix guard for operator new and delete
* [style] fix the code style
* [build] fix build failed on darwin and emscripten
* [ci] do not import cxx to avoid operator conflict
* [ncnn] add temporary partial_sort impl using bubble sort
heap sort should be used for better perf.
* [ncnn] add std greater and less function
* [ncnn] fix placement new operator overload
* [ncnn] add operator delete with size info
* [build] disable exception, rtti, example and tools when simplestl on
* [build] add toolchain for arm simplestl
* [build] add toolchain for aarch64 simplestl
* [ncnn] move initializer to constructor
* [ncnn] use detailed type instead of auto
* [ncnn] use plain lib name in target_link_libraries
* runtime cpu dispatch
* force thread one
* disable openmp for coverage
* simplify test layer
* print NCNN_TARGET_ARCH
* less ci build variants
* weight fp16 storage option
* test convdw int8
* apple a12 a13
* ncnn_add_layer ncnn_add_shader cmake macro
* Fix warnings C4244, C4267 in src/layer/yolov3detectionoutput.cpp
C4244: '=': conversion from 'int' to 'float', possible loss of data
C4244: 'initializing': conversion from 'float' to 'int', possible loss of data
C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
C4244: 'return': conversion from 'double' to 'float', possible loss of data
C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
* Fix warnings C4244, C4267 in src/layer/yolodetectionoutput.cpp
C4244: '=': conversion from 'int' to 'float', possible loss of data
C4244: 'initializing': conversion from 'float' to 'int', possible loss of data
C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
C4244: 'return': conversion from 'double' to 'float', possible loss of data
C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
* Fix warning C4244 in src/layer/quantize.cpp
C4244: 'initializing': conversion from 'double' to 'int', possible loss of data
* Fix warnings C4244, C4267 in src/layer/detectionoutput.cpp
C4244: '=': conversion from 'int' to 'float', possible loss of data
C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
* Fix warning C4244 in src/layer/roipooling.cpp
C4244: 'initializing': conversion from 'double' to 'int', possible loss of data
* Fix warning C4244 in src/layer/sigmoid.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4267 in src/layer/slice.cpp
C4267: '=': conversion from 'size_t' to 'int', possible loss of data
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
* Fix warning C4244 in src/layer/softmax.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/interp.cpp
C4244: '=': conversion from 'float' to 'int', possible loss of data
C4244: 'initializing': conversion from 'double' to 'int', possible loss of data
* Fix warning C4244 in src/layer/instancenorm.cpp
C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/deconvolutiondepthwise.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4244 in src/layer/convolutiondepthwise.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warnings C4244, C4267 in src/net.cpp
C4244: 'return': conversion from '__int64' to 'int', possible loss of data
C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
* Fix warning C4244 in src/layer/bnll.cpp
C4244: '=': conversion from 'double' to 'float', possible loss of data
* Fix warning C4267 in src/layer/concat.cpp
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
* Fix warnings C4244, C4267, C4305 in tools/mxnet/mxnet2ncnn.cpp
C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
C4267: '=': conversion from 'size_t' to 'int', possible loss of data
C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
C4305: 'initializing': truncation from 'double' to 'float'
* optimize the conv sgemm int8 on arm64-v8a platform
* optimize int8 arm64-v8a with sadalp instruction
* merge requantize op into latest conv layer
* merge requantize op into conv-int8 op
* update the mobilenet.param in the benchmark
* Update README.md
update Kirin970 and RK3399
* try to fix the travis build error
* add the armv7a conv3x3s1 implementation without overflow, remove old code
* fix the bug of conv3x3s2 packed int8
* new int8 implementation, per-channel weight quantization, better accuracy
* fix the bug of conv3x3s1 packed int8 neon
* add the naive c fp32 and int8 winograd F(2,3)
* add the neon intrinsic int8 winograd F(2,3)
* optimize the armv7a int8 winograd F(2,3) with neon assembly
* optimize the armv7a int8 winograd F(2,3) input transform with assembly.
* add the requantize layer and int8 relu implementation.
* add graph optimization conv1x1s2 -> conv1x1s1, begin optimizing int8 on aarch64.
* fix int8 bugs
* add the c naive im2col with sgemm
* add aarch64 int8 winograd f23, conv3x3s2 naive implementation
* add the int8 sgemm conv7x7s2 on x86/armv7a platform
* optimize the int8 sgemm by neon intrinsic and packed kernel
* optimize the int8 sgemm with packed data
* optimize the int8 sgemm with armv7a neon assembly
* add the int8 sgemm on arm64-v8a platform
* prepare to merge latest code from master
* add the int8 param files
* add the fuse_network method to class Net
* vulkan infrastructure
* vkallocator and vkmat
* layer interface for vulkan compute
* wip...
* default vulkan device, command wrapper, upload model weight in load_model to simplify layer interface
* simplify command api, vkmat holds staging buffer, relu works
* initialize specialization constant, simplify command dispatch, fix staging buffer copy with different shape, convolution works
* init extension functions
* dynamic local size and group count
* group count=1 is invalid
* regard device max workgroup size limit
* fix relu oooops
* decouple command record and staging allocation
* create result blob
* add pooling shader
* buffer is faster than image :)
* fix pooling shader
* add innerproduct shader
* readonly writeonly decoration
* simplify buffer creation
* decouple command and layer, VK_KHR_descriptor_update_template extension makes descriptor binding update easy :D
* fix vulkan building issues in visual studio (#1)
* fix building issues on visual studio
* ignore benchmark
* cancel changes
* ... ...
* decouple paramdict and vulkandevice
* fix staging buffer destroy in model loading
* remove vkdev member in option
* add padding shader
* simplify vulkan layer creation, simplify convolution and pooling shader for no padding, less debug output
* add convolutiondepthwise and softmax shader
* specialization float type, add leakyrelu
* add dropout shader
* add batchnorm shader
* split vulkan forward
* add scale shader
* push constant type can be int or float
* set_optimal_local_size_xyz
* add eltwise shader
* concat vulkan forward
* fix convolution without bias
* add dummy shader for concat and split, more fix ...
* optional VK_KHR_descriptor_update_template and VK_KHR_push_descriptor
* check VK_KHR_push_descriptor for vkCmdPushDescriptorSetWithTemplateKHR
* binaryop and unaryop shader
* hide raw command buffer
* simple vkbenchncnn benchmark
* create device with transfer queue
* rename command to vkcompute, add vktransfer and layer upload_model interface
* external VkMat, copy and map wrt buffer offset
* command copy respect offset and size
* decouple weight upload and load, simplify upload weight api, use one big staging buffer for uploading weights
* fix build on android
* binding count can not vary :(
* barrier check state, fix sub-op destruction
* declare local_size_xyz constant, fix crash on radv
* fix local_size_xyz, second try
* more barrier and state fix
* fix softmax
* reconstruct buffer memory allocator, reuse blob buffer, less verbose output
* find unified memory type index
* weight staging buffer allocator and weight buffer allocator, respect descriptor buffer offset alignment
* use VK_KHR_descriptor_update_template for faster descriptor update if available, multithread pipeline creation
* find more useful vulkan extensions and enable them
* fix msvc build
* respect VK_KHR_dedicated_allocation for weight buffer allocation
* fix android build
* fix bias name conflicts with metal
* decouple pipeline and layer, building shader sources into shader module, dedicated create_pipeline api, simplify pipeline recording
* drop dummy shader, inplace softmax, multiple shader module works
* fix unique queue family index error
* flatten support vulkan
* mnasnet run
* find shader module by name, each entry point per shader module, fix attribute/id conflict on moltenvk
* some minor changes
* add some high level api
* use dedicated transfer queue to upload weight model
* prefer mappable buffer on unified memory
* global pooling and convolution fc, reuse staging buffer
* implement ring-buffer style blob allocator, add VkBufferMemory capacity
* use blob allocator for workspace blob, it works fine :)
* vulkan option off
* Update layer.cpp
* fix build with vulkan off
* less verbose output, fix crash on vulkan_compute off
* merge benchncnn tool
* allocator clear api, use new weight buffer allocator per net
* add default locked allocator
* mapped mat ptr api, persistent mapped memory works generally :)
* travis ci linux vulkan
* travis ci vulkan wip ...
* more gpu wip ...
* more gpu wip ...
* wip...
* wip...
* wip... ...
* wip... ios vulkan build...
* find glslangValidator on ios build
* use dynamic moltenvk library
* travis ci wip ...
* ios simulator does not support metal at all
* fix cpu only extractor
* optimize workgroup size, first try
* optimize workgroup size, second try
* conv1x1s1d1 vec4
* revert build system
* fix ncnn2mem build
* fix ncnn2mem build