* added avx implementations of FC and Max pool
* Specify AVX2
* Small fixes and using Fused avx activations
* fix type casting
* fixing some CI errors
* Fix code format
* fix pooling test
* remove vector typedef
* More compile fixes
* remove vector typedef
* set c++ version to 17
* Force c++ 17
* Fixing mathfun
* Try to work around typedef issues
* typefix
* Remove typedef
* switch to static inline
* attempting to fix msvc bug
* Verified MSVC fix
* Fixing clang build
* commit before switch
* More avx and packing implementation
* Fix ctest
* starting the depthwise pack 8 implementation
* Unrolled loop
* add depthwise pack 8 implementations
* Working 1x1 pack 8 implementation added
* revert incorrect changes
* added concat elempack 8
* more elempack enabled layers added and started on the conversion of the winograd pack4 conv 3x3
* Added code formatting
* fix styling
* Unroll loops
* unrolling loops
* Added more elempack layers for mobilenet v3
* revert commit
* fix code style
* remove arm neon references
* remove pack4 references
* More cleanup
* added packing avx code
* fixing linux build ctests
* remove usage of aligned loads
* More aligned mem ops removed
* Cleanup, revert some files and remove not working winograd and shufflechannel implementation
* add stackoverflow referral
* Fix windows build
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* implement requested changes
* remove reshape
* revert arm file change
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* fix unterminated directive
Co-authored-by: Restyled.io <commits@restyled.io>
* use Mat class for Shape description
* shape specialization constant in compute shader
* wip
* wip
* test forward_inplace, add binaryop unaryop sigmoid test
* fix arm unaryop test
* fix arm binaryop test
* make shape hint optional, cast int8 to fp32, add cast test
* wip
* follow the good old local size setting for conv1x1
* the optimal local size rewrite
* fix build on msvc
* add permute shader for all packing layout, add permute test
* concat and slice partial shape constant, slice test
* fix slice test
* interp test
* add lrn test, test packing layout implicitly
* add eltwise test
* add normalize test
* add instancenorm test
* reorg shape constant
* simple local group size partition
* add shape constant param