* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
* feat: add denormal options
Flush-To-Zero(FTZ) and Denormals-Are-Zero(DAZ) are modes that bypass IEEE754 methods of dealing with denormal floating-point numbers on x86_64 and some x86 CPUs.
* feat: Integrate `flush_denormals` into `Extractor::extract`
* chore: replace global variable with `ThreadLocalStorage`
* add bridge image for adreno image storage upload and download
* enable sbn1, print bugbilz flag
* blacklist old adreno
* let user choose use_image_storage option even when bug_storage_buffer_no_l1
* SSE2: BatchNorm
* Fixed batch norm in AVX configuration
* Optimized register size switch
* Attempt to pass CI
* Attempt to pass CI
* Bias op
* Element wise ops
* Support packing on x86 by default
* Fixed macro range in bias
* Use aligned read for packed data
* Update testutil.h
* Update pooling_x86.cpp
* Support wasn SIMD
* Fix emscripten compiler flags
* fix build
* more ci fix
* concat x86 pack4
* flatten x86 pack4
* more x86 pack4
* ci pass
* fix
* enable sse2 mathfun
* enable --experimental-wasm-simd
Co-authored-by: nihui <shuizhuyuanluo@126.com>
Co-authored-by: nihuini <nihuini@tencent.com>
* runtime cpu dispatch
* force thread one
* disable openmp for coverage
* simplify test layer
* print NCNN_TARGET_ARCH
* less ci build variants
* weight fp16 storage option
* test convdw int8
* apple a12 a13
* ncnn_add_layer ncnn_add_shader cmake macro
* added avx implementations of FC and Max pool
* Specify AVX2
* Small fixes and using Fused avx activations
* fix type casting
* fixing some CI errors
* Fix code format
* fix pooling test
* remove vector typedef
* More compile fixes
* remove vector typedef
* set c++ version to 17
* Force c++ 17
* Fixing mathfun
* Try and workaround typedef issues
* typefix
* Remove typedef
* switch to static inline
* attempting to fix msvc bug
* Verified MSVX FIX
* Fixing clang build
* commit before switch
* More avx and packing implementation
* Fix ctest
* starting the depthwise pack 8 implementation
* Unrolled loop
* add depthwise pack 8 implementations
* Working 1x1 pack 8 implementation added
* revert incorrect changes
* added conact elempack 8
* more elempack enabled layers added and started on the conversion of the winograd pack4 conv 3x3
* Added code formatting
* fix styling
* Unroll loops
* unrolling loops
* Added more elempac layers for mobilenet v3
* revert commit
* fix code style
* remove arm neon references
* remove pack4 references
* More cleanup
* added packing avx code
* fixing linux build ctests
* remove usage of aligned loads
* More aligned mem ops removed
* Cleanup, revert some files and remove not working winograd and shufflechannel implementation
* add stackoverflow referal
* Fix windows build
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* implement requested chaanges
* remove reshape
* revert arm file change
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* fix unterminated directive
Co-authored-by: Restyled.io <commits@restyled.io>