* runtime cpu dispatch
* force thread one
* disable openmp for coverage
* simplify test layer
* print NCNN_TARGET_ARCH
* less ci build variants
* weight fp16 storage option
* test convdw int8
* apple a12 a13
* ncnn_add_layer ncnn_add_shader cmake macro
* added avx implementations of FC and Max pool
* Specify AVX2
* Small fixes and using Fused avx activations
* fix type casting
* fixing some CI errors
* Fix code format
* fix pooling test
* remove vector typedef
* More compile fixes
* remove vector typedef
* set c++ version to 17
* Force c++ 17
* Fixing mathfun
* Try and workaround typedef issues
* typefix
* Remove typedef
* switch to static inline
* attempting to fix msvc bug
* Verified MSVX FIX
* Fixing clang build
* fix pooling on MSVC
* Remove c++ 17
* Implement requested changes
* small cleanup
* fix aligment
* fix pooling import
* revert "Implement requested changes"
This reverts commit 5d8cc9494b.
* fix import order
* Undo aligned load
* use Mat class for Shape description
* shape specialization constant in compute shader
* wip
* wip
* test forward_inplace, add binaryop unaryop sigmoid test
* fix arm unaryop test
* fix arm binaryop test
* make shape hint optional, cast int8 to fp32, add cast test
* wip
* follow the good and old local size setting for conv1x1
* the optimal local size rewrite
* fix build on msvc
* add permute shader for all packing layout, add permute test
* concat and slice patial shape constant, slice test
* fix slice test
* interp test
* add lrn test, test packing layout implicitly
* add eltwise test
* add normalize test
* add instancenorm test
* reorg shape constant
* simple local group size partition
* add shape constant param