* added avx implementations of FC and Max pool
* Specify AVX2
* Small fixes and using Fused avx activations
* fix type casting
* fixing some CI errors
* Fix code format
* fix pooling test
* remove vector typedef
* More compile fixes
* remove vector typedef
* set c++ version to 17
* Force c++ 17
* Fixing mathfun
* Try and workaround typedef issues
* typefix
* Remove typedef
* switch to static inline
* attempting to fix msvc bug
* Verified MSVC fix
* Fixing clang build
* commit before switch
* More avx and packing implementation
* Fix ctest
* starting the depthwise pack 8 implementation
* Unrolled loop
* add depthwise pack 8 implementations
* Working 1x1 pack 8 implementation added
* revert incorrect changes
* added concat elempack 8
* more elempack enabled layers added and started on the conversion of the winograd pack4 conv 3x3
* Added code formatting
* fix styling
* Unroll loops
* unrolling loops
* Added more elempack layers for mobilenet v3
* revert commit
* fix code style
* remove arm neon references
* remove pack4 references
* More cleanup
* added packing avx code
* fixing linux build ctests
* remove usage of aligned loads
* More aligned mem ops removed
* Cleanup, revert some files and remove not working winograd and shufflechannel implementation
* add Stack Overflow referral
* Fix windows build
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* implement requested changes
* remove reshape
* revert arm file change
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* fix unterminated directive
Co-authored-by: Restyled.io <commits@restyled.io>
* add the armv7a conv3x3s1 implementation without overflow, remove old code
* fix the bug in packed int8 conv3x3s2
* new int8 implementation, weights quantized per channel, better accuracy
* fix the bug in packed int8 conv3x3s1 neon
* add the naive C fp32 and int8 winograd F(2,3)
* add the neon intrinsic int8 winograd F(2,3)
* optimize the armv7a int8 winograd F(2,3) with neon assembly
* optimize the armv7a int8 winograd F(2,3) input transform with assembly.
* add the requantize layer and int8 relu implementation.
* add graph optimization conv1x1s2 -> conv1x1s1, begin optimizing int8 on aarch64.
* fix int8 bugs
* add the naive C im2col with sgemm
* add aarch64 int8 winograd F(2,3) and conv3x3s2 naive implementation
* add the int8 sgemm conv7x7s2 on x86/armv7a platform
* optimize the int8 sgemm by neon intrinsic and packed kernel
* optimize the int8 sgemm with packed data
* optimize the int8 sgemm with armv7a neon assembly
* add the int8 sgemm on arm64-v8a platform
* prepare to merge latest code from master
* add the int8 param files
* In class Net, add the fuse_network method
* print benchmark information for every layer, especially for CONVOLUTION
* print benchmark information for every layer, especially for CONVOLUTION, for cross-platform.
* move the function implementation to cpp file to avoid multiple definitions