vkimagemat was originally used as mat storage in the hope of improving performance on old Adreno GPUs, but in practice it is slower than the CPU in most cases and is no longer suitable for the latest Adreno architecture or for large shapes
* added avx implementations of FC and Max pool
* Specify AVX2
* Small fixes and using Fused avx activations
* fix type casting
* fixing some CI errors
* Fix code format
* fix pooling test
* remove vector typedef
* More compile fixes
* remove vector typedef
* set c++ version to 17
* Force c++ 17
* Fixing mathfun
* Try to work around typedef issues
* typefix
* Remove typedef
* switch to static inline
* attempting to fix msvc bug
* Verified MSVC fix
* Fixing clang build
* commit before switch
* More avx and packing implementation
* Fix ctest
* starting the depthwise pack 8 implementation
* Unrolled loop
* add depthwise pack 8 implementations
* Working 1x1 pack 8 implementation added
* revert incorrect changes
* added concat elempack 8
* more elempack enabled layers added and started on the conversion of the winograd pack4 conv 3x3
* Added code formatting
* fix styling
* Unroll loops
* unrolling loops
* Added more elempack layers for mobilenet v3
* revert commit
* fix code style
* remove arm neon references
* remove pack4 references
* More cleanup
* added packing avx code
* fixing linux build ctests
* remove usage of aligned loads
* More aligned mem ops removed
* Cleanup, revert some files and remove non-working winograd and shufflechannel implementations
* add stackoverflow referral
* Fix windows build
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* implement requested changes
* remove reshape
* revert arm file change
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* fix unterminated directive
Co-authored-by: Restyled.io <commits@restyled.io>
* use Mat class for Shape description
* shape specialization constant in compute shader
* wip
* wip
* test forward_inplace, add binaryop unaryop sigmoid test
* fix arm unaryop test
* fix arm binaryop test
* make shape hint optional, cast int8 to fp32, add cast test
* wip
* follow the good old local size setting for conv1x1
* the optimal local size rewrite
* fix build on msvc
* add permute shader for all packing layout, add permute test
* concat and slice partial shape constant, slice test
* fix slice test
* interp test
* add lrn test, test packing layout implicitly
* add eltwise test
* add normalize test
* add instancenorm test
* reorg shape constant
* simple local group size partition
* add shape constant param