* Support Ingenic X2000 & T40
The T40 is claimed to support MIPS32R2 with MSA, but MSA did not work on my unit in testing, so the target is set to mips32r2 without MSA.
* Add Ingenic T40XP benchmark result
* SSE2: BatchNorm
* Fixed batch norm in AVX configuration
* Optimized register size switch
* Attempt to pass CI
* Attempt to pass CI
* Bias op
* Element wise ops
* Support packing on x86 by default
* Fixed macro range in bias
* Use aligned read for packed data
* Update testutil.h
* Update pooling_x86.cpp
* Support wasm SIMD
* Fix emscripten compiler flags
* fix build
* more ci fix
* concat x86 pack4
* flatten x86 pack4
* more x86 pack4
* ci pass
* fix
* enable sse2 mathfun
* enable --experimental-wasm-simd
Co-authored-by: nihui <shuizhuyuanluo@126.com>
Co-authored-by: nihuini <nihuini@tencent.com>
* [build] add toolchain file w/o stdcxx dependency
* [build] link m and gcc lib explicitly
* [ncnn] complete simple stl impl
* [ncnn] adapt for ncnn simplestl
* [test] adapt for ncnn simplestl
* [ncnn] fix missing algorithm and list when simplestl disabled
* [ncnn] fix guard for operator new and delete
* [style] fix the code style
* [build] fix build failure on darwin and emscripten
* [ci] do not import cxx to avoid operator conflict
* [ncnn] add temporary partial_sort impl using bubble sort
Heap sort should be used for better performance.
* [ncnn] add std greater and less function
* [ncnn] fix placement new operator overload
* [ncnn] add operator delete with size info
* [build] disable exception, rtti, example and tools when simplestl on
* [build] add toolchain for arm simplestl
* [build] add toolchain for aarch64 simplestl
* [ncnn] move initializer to constructor
* [ncnn] use detailed type instead of auto
* [ncnn] use plain lib name in target_link_libraries
* runtime cpu dispatch
* force thread one
* disable openmp for coverage
* simplify test layer
* print NCNN_TARGET_ARCH
* less ci build variants
* weight fp16 storage option
* test convdw int8
* apple a12 a13
* ncnn_add_layer ncnn_add_shader cmake macro
* added fp16 weight storage version
* Small changes
* Fixed fp16 weight storage layers
* fix innerproduct
* fix loop error
* Fix windows build.
Disable fp16 conversion when detecting int8 weights.
Implement requested changes.
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* Update option.cpp
Set fp16 storage based on whether Vulkan is used.
* added ability for storing state in lstm layer
* added avx lstm
* added arm lstm
* fix innerproduct activation location and add 4 parallel channel version
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* revert arm file
* commit before switch
* implement requested changes
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* More x86 optimized implementations of common layers.
Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy
Added fp16 innerproduct for arm
* fix non avx build
* Add fp16 arm compiler and cpu checks. Remove statefulness from LSTM implementation.
* Fix build check for fp16 arm
* Bypass lstm_fp16 if not supported
* Build order was incorrect
* fix std::min missing in windows build
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* attempt to fix GNU build by enabling -mfp16-format=ieee, which provides the missing __fp16 type
* remove double "fix"
* Specify ieee fp16 format
* implement requested changes
* fix arm non-fp16 build
* fix arm lstm
* Restyled/pull 1881 (#15)
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
Co-authored-by: Restyled.io <commits@restyled.io>
* Check blob size on arm lstm
* fix styling
Co-authored-by: Restyled.io <commits@restyled.io>