test_lstm.cpp 25 kB

LSTM arm/x86 + fp16 innerproduct arm (#1881)

* added fp16 weight storage version
* Small changes
* Fixed fp16 weight storage layers
* fix innerproduct
* fix loop error
* Fix windows build.
  Disable fp 16 conversion when detecting int8 weights.
  Implement requested changes.
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* Update option.cpp
  Set fp16 storage based on vulkan being used or not.
* added ability for storing state in lstm layer
* added avx lstm
* added arm lstm
* fix innerproduct activation location and add 4 parallel channel version
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* revert arm file
* commit before switch
* implement requested changes
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* More x86 optimized implementations of common layers.
  Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy
  Added fp16 innerproduct for arm
* fix non avx build
* Add fp16 arm compiler and cpu checks.
  Remove statefullness from LSTM implementation.
* Fix build check for fp16 arm
* Bypass lstm_fp16 if not supported
* Build order was incorrect
* fix std::min missing in windows build
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type
* remove double "fix"
* Specify ieee fp16 format
* implement requested changes
* fix arm non-fp16 build
* fix arm lstm
* Restyled/pull 1881 (#15)
  * Restyled by clang-format
  * Restyled by astyle
  * Restyled by clang-format
  * Restyled by astyle
  Co-authored-by: Restyled.io <commits@restyled.io>
* Check blob size on arm lstm
* fix styling

Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614
// Tencent is pleased to support the open source community by making ncnn available.
//
// Copyright (C) 2020 THL A29 Limited, a Tencent company. All rights reserved.
//
// Licensed under the BSD 3-Clause License (the "License"); you may not use this file except
// in compliance with the License. You may obtain a copy of the License at
//
// https://opensource.org/licenses/BSD-3-Clause
//
// Unless required by applicable law or agreed to in writing, software distributed
// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
// CONDITIONS OF ANY KIND, either express or implied. See the License for the
// specific language governing permissions and limitations under the License.

#include "testutil.h"

// Runs the LSTM layer on input a with no initial or final states.
// hidden_size == 0 means "use outch", i.e. no projection weights.
static int test_lstm(const ncnn::Mat& a, int outch, int direction, int hidden_size = 0)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);                                         // number of outputs
    pd.set(1, hidden_size * input_size * 4 * num_directions); // input weight element count
    pd.set(2, direction);                                     // 2 selects both directions
    pd.set(3, hidden_size);

    // A fourth blob (projection) is only present when hidden_size differs from outch.
    std::vector<ncnn::Mat> weights(hidden_size == outch ? 3 : 4);
    weights[0] = RandomMat(hidden_size * input_size * 4 * num_directions); // input weights
    weights[1] = RandomMat(hidden_size * 4 * num_directions);              // biases
    weights[2] = RandomMat(outch * hidden_size * 4 * num_directions);      // recurrent weights
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions); // projection weights
    }

    int ret = test_layer("LSTM", pd, weights, a);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm failed a.dims=%d a=(%d %d %d) outch=%d direction=%d hidden_size=%d\n", a.dims, a.w, a.h, a.c, outch, direction, hidden_size);
    }

    return ret;
}

static int test_lstm_with_hidden(const ncnn::Mat& a, int outch, int direction, int hidden_size = 0)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * input_size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 3 : 4);
    weights[0] = RandomMat(hidden_size * input_size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomMat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
    }

    // initial hidden state
    ncnn::Mat hidden = RandomMat(outch, num_directions);

    // initial cell state
    ncnn::Mat cell = RandomMat(hidden_size, num_directions);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_with_hidden failed a.dims=%d a=(%d %d %d) outch=%d direction=%d hidden_size=%d\n", a.dims, a.w, a.h, a.c, outch, direction, hidden_size);
    }

    return ret;
}

static int test_lstm_with_hidden_input(const ncnn::Mat& a, int outch, int direction, int hidden_size = 0)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * input_size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 3 : 4);
    weights[0] = RandomMat(hidden_size * input_size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomMat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
    }

    // initial hidden state
    ncnn::Mat hidden = RandomMat(outch, num_directions);

    // initial cell state
    ncnn::Mat cell = RandomMat(hidden_size, num_directions);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer("LSTM", pd, weights, as, 1);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_with_hidden_input failed a.dims=%d a=(%d %d %d) outch=%d direction=%d hidden_size=%d\n", a.dims, a.w, a.h, a.c, outch, direction, hidden_size);
    }

    return ret;
}

static int test_lstm_with_hidden_output(const ncnn::Mat& a, int outch, int direction, int hidden_size = 0)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * input_size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 3 : 4);
    weights[0] = RandomMat(hidden_size * input_size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomMat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
    }

    std::vector<ncnn::Mat> as(1);
    as[0] = a;

    int ret = test_layer("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_with_hidden_output failed a.dims=%d a=(%d %d %d) outch=%d direction=%d hidden_size=%d\n", a.dims, a.w, a.h, a.c, outch, direction, hidden_size);
    }

    return ret;
}

static int test_lstm_0()
{
    return 0
           || test_lstm(RandomMat(4, 1), 2, 2)
           || test_lstm(RandomMat(8, 2), 2, 2)
           || test_lstm(RandomMat(16, 8), 7, 2)
           || test_lstm(RandomMat(17, 8), 8, 2)
           || test_lstm(RandomMat(19, 15), 8, 2)
           || test_lstm(RandomMat(5, 16), 16, 2)
           || test_lstm(RandomMat(3, 16), 8, 2)
           || test_lstm(RandomMat(8, 16), 16, 2)
           || test_lstm(RandomMat(31, 3), 31, 2)
           || test_lstm(RandomMat(2, 5), 17, 2, 15);
}

static int test_lstm_1()
{
    return 0
           || test_lstm_with_hidden(RandomMat(4, 4), 1, 2)
           || test_lstm_with_hidden(RandomMat(8, 2), 2, 2)
           || test_lstm_with_hidden(RandomMat(16, 8), 7, 2)
           || test_lstm_with_hidden(RandomMat(17, 8), 8, 2)
           || test_lstm_with_hidden(RandomMat(19, 15), 8, 2)
           || test_lstm_with_hidden(RandomMat(5, 16), 16, 2)
           || test_lstm_with_hidden(RandomMat(3, 16), 8, 2)
           || test_lstm_with_hidden(RandomMat(2, 5), 79, 2, 33)

           || test_lstm_with_hidden(RandomMat(4, 4), 1, 1)
           || test_lstm_with_hidden(RandomMat(8, 2), 2, 1)
           || test_lstm_with_hidden(RandomMat(16, 8), 7, 1)
           || test_lstm_with_hidden(RandomMat(17, 8), 8, 1)
           || test_lstm_with_hidden(RandomMat(19, 15), 8, 1)
           || test_lstm_with_hidden(RandomMat(5, 16), 16, 1)
           || test_lstm_with_hidden(RandomMat(3, 16), 8, 1)
           || test_lstm_with_hidden(RandomMat(2, 5), 79, 1, 33)

           || test_lstm_with_hidden(RandomMat(4, 2), 1, 0)
           || test_lstm_with_hidden(RandomMat(8, 2), 2, 0)
           || test_lstm_with_hidden(RandomMat(16, 8), 7, 0)
           || test_lstm_with_hidden(RandomMat(17, 8), 8, 0)
           || test_lstm_with_hidden(RandomMat(19, 15), 8, 0)
           || test_lstm_with_hidden(RandomMat(5, 16), 16, 0)
           || test_lstm_with_hidden(RandomMat(3, 16), 8, 0)
           || test_lstm_with_hidden(RandomMat(2, 5), 17, 0, 15)

           || test_lstm_with_hidden_input(RandomMat(4, 4), 1, 2)
           || test_lstm_with_hidden_input(RandomMat(8, 2), 2, 2)
           || test_lstm_with_hidden_input(RandomMat(16, 8), 7, 2)
           || test_lstm_with_hidden_input(RandomMat(17, 8), 8, 2)
           || test_lstm_with_hidden_input(RandomMat(19, 15), 8, 2)
           || test_lstm_with_hidden_input(RandomMat(5, 16), 16, 2)
           || test_lstm_with_hidden_input(RandomMat(3, 16), 8, 2)
           || test_lstm_with_hidden_input(RandomMat(2, 5), 79, 2, 33)

           || test_lstm_with_hidden_input(RandomMat(4, 4), 1, 1)
           || test_lstm_with_hidden_input(RandomMat(8, 2), 2, 1)
           || test_lstm_with_hidden_input(RandomMat(16, 8), 7, 1)
           || test_lstm_with_hidden_input(RandomMat(17, 8), 8, 1)
           || test_lstm_with_hidden_input(RandomMat(19, 15), 8, 1)
           || test_lstm_with_hidden_input(RandomMat(5, 16), 16, 1)
           || test_lstm_with_hidden_input(RandomMat(3, 16), 8, 1)
           || test_lstm_with_hidden_input(RandomMat(2, 5), 79, 1, 33)

           || test_lstm_with_hidden_input(RandomMat(4, 2), 1, 0)
           || test_lstm_with_hidden_input(RandomMat(8, 2), 2, 0)
           || test_lstm_with_hidden_input(RandomMat(16, 8), 7, 0)
           || test_lstm_with_hidden_input(RandomMat(17, 8), 8, 0)
           || test_lstm_with_hidden_input(RandomMat(19, 15), 8, 0)
           || test_lstm_with_hidden_input(RandomMat(5, 16), 16, 0)
           || test_lstm_with_hidden_input(RandomMat(3, 16), 8, 0)
           || test_lstm_with_hidden_input(RandomMat(2, 5), 17, 0, 15)

           || test_lstm_with_hidden_output(RandomMat(4, 4), 1, 2)
           || test_lstm_with_hidden_output(RandomMat(8, 2), 2, 2)
           || test_lstm_with_hidden_output(RandomMat(16, 8), 7, 2)
           || test_lstm_with_hidden_output(RandomMat(17, 8), 8, 2)
           || test_lstm_with_hidden_output(RandomMat(19, 15), 8, 2)
           || test_lstm_with_hidden_output(RandomMat(5, 16), 16, 2)
           || test_lstm_with_hidden_output(RandomMat(3, 16), 8, 2)
           || test_lstm_with_hidden_output(RandomMat(2, 5), 79, 2, 33)

           || test_lstm_with_hidden_output(RandomMat(4, 4), 1, 1)
           || test_lstm_with_hidden_output(RandomMat(8, 2), 2, 1)
           || test_lstm_with_hidden_output(RandomMat(16, 8), 7, 1)
           || test_lstm_with_hidden_output(RandomMat(17, 8), 8, 1)
           || test_lstm_with_hidden_output(RandomMat(19, 15), 8, 1)
           || test_lstm_with_hidden_output(RandomMat(5, 16), 16, 1)
           || test_lstm_with_hidden_output(RandomMat(3, 16), 8, 1)
           || test_lstm_with_hidden_output(RandomMat(2, 5), 79, 1, 33)

           || test_lstm_with_hidden_output(RandomMat(4, 2), 1, 0)
           || test_lstm_with_hidden_output(RandomMat(8, 2), 2, 0)
           || test_lstm_with_hidden_output(RandomMat(16, 8), 7, 0)
           || test_lstm_with_hidden_output(RandomMat(17, 8), 8, 0)
           || test_lstm_with_hidden_output(RandomMat(19, 15), 8, 0)
           || test_lstm_with_hidden_output(RandomMat(5, 16), 16, 0)
           || test_lstm_with_hidden_output(RandomMat(3, 16), 8, 0)
           || test_lstm_with_hidden_output(RandomMat(2, 5), 17, 0, 15);
}

static int test_lstm_2()
{
    return 0
           || test_lstm(RandomMat(4, 1), 1, 0)
           || test_lstm(RandomMat(8, 2), 2, 0)
           || test_lstm(RandomMat(16, 8), 7, 0)
           || test_lstm(RandomMat(17, 8), 8, 0)
           || test_lstm(RandomMat(19, 15), 8, 0)
           || test_lstm(RandomMat(5, 16), 16, 0)
           || test_lstm(RandomMat(3, 16), 8, 0)
           || test_lstm(RandomMat(8, 16), 16, 0)
           || test_lstm(RandomMat(2, 5), 17, 0, 15);
}

static int test_lstm_3()
{
    return 0
           || test_lstm(RandomMat(4, 1), 1, 1)
           || test_lstm(RandomMat(8, 2), 2, 1)
           || test_lstm(RandomMat(16, 8), 7, 1)
           || test_lstm(RandomMat(17, 8), 8, 1)
           || test_lstm(RandomMat(19, 15), 8, 1)
           || test_lstm(RandomMat(5, 16), 16, 1)
           || test_lstm(RandomMat(3, 16), 8, 1)
           || test_lstm(RandomMat(8, 16), 16, 1)
           || test_lstm(RandomMat(2, 5), 17, 1, 15);
}

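// The floating-point helpers above all size their weight blobs the same way.
// A minimal sketch of that arithmetic, assuming a standalone helper named
// lstm_weight_sizes (hypothetical, not part of ncnn or testutil.h) that
// returns the element counts in the order the tests fill weights[]:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only: mirrors the RandomMat sizes used by the
// floating-point test helpers, without depending on ncnn.
static std::vector<std::size_t> lstm_weight_sizes(int input_size, int outch, int direction, int hidden_size = 0)
{
    const std::size_t num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch; // same default as the test helpers

    const std::size_t in = static_cast<std::size_t>(input_size);
    const std::size_t out = static_cast<std::size_t>(outch);
    const std::size_t hid = static_cast<std::size_t>(hidden_size);

    std::vector<std::size_t> sizes;
    sizes.push_back(hid * in * 4 * num_directions);  // weights[0]
    sizes.push_back(hid * 4 * num_directions);       // weights[1]
    sizes.push_back(out * hid * 4 * num_directions); // weights[2]
    if (hid != out)
        sizes.push_back(hid * out * num_directions); // weights[3], only when sizes differ
    return sizes;
}
```

// With these assumptions, a call such as lstm_weight_sizes(2, 17, 0, 15)
// returns four entries, because hidden_size differs from outch.
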
#if NCNN_INT8
static int test_lstm_int8(const ncnn::Mat& a, int outch, int direction, int hidden_size = 0)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * input_size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);
    pd.set(8, 2); // int8_scale_term

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 5 : 6);
    weights[0] = RandomS8Mat(hidden_size * input_size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomS8Mat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[5] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }
    else
    {
        weights[3] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }

    int ret = test_layer("LSTM", pd, weights, a);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_int8 failed a.dims=%d a=(%d %d %d) outch=%d direction=%d hidden_size=%d\n", a.dims, a.w, a.h, a.c, outch, direction, hidden_size);
    }

    return ret;
}

static int test_lstm_int8_with_hidden(const ncnn::Mat& a, int outch, int direction, int hidden_size = 0)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * input_size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);
    pd.set(8, 2); // int8_scale_term

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 5 : 6);
    weights[0] = RandomS8Mat(hidden_size * input_size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomS8Mat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[5] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }
    else
    {
        weights[3] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }

    // initial hidden state
    ncnn::Mat hidden = RandomMat(outch, num_directions);

    // initial cell state
    ncnn::Mat cell = RandomMat(hidden_size, num_directions);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_int8_with_hidden failed a.dims=%d a=(%d %d %d) outch=%d direction=%d hidden_size=%d\n", a.dims, a.w, a.h, a.c, outch, direction, hidden_size);
    }

    return ret;
}

static int test_lstm_int8_with_hidden_input(const ncnn::Mat& a, int outch, int direction, int hidden_size = 0)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * input_size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);
    pd.set(8, 2); // int8_scale_term

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 5 : 6);
    weights[0] = RandomS8Mat(hidden_size * input_size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomS8Mat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[5] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }
    else
    {
        weights[3] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }

    // initial hidden state
    ncnn::Mat hidden = RandomMat(outch, num_directions);

    // initial cell state
    ncnn::Mat cell = RandomMat(hidden_size, num_directions);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer("LSTM", pd, weights, as, 1);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_int8_with_hidden_input failed a.dims=%d a=(%d %d %d) outch=%d direction=%d hidden_size=%d\n", a.dims, a.w, a.h, a.c, outch, direction, hidden_size);
    }

    return ret;
}


static int test_lstm_int8_with_hidden_output(const ncnn::Mat& a, int outch, int direction, int hidden_size = 0)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * input_size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);
    pd.set(8, 2); // int8_scale_term

    // one extra weight blob is needed when hidden_size != outch
    std::vector<ncnn::Mat> weights(hidden_size == outch ? 5 : 6);
    weights[0] = RandomS8Mat(hidden_size * input_size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomS8Mat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[5] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }
    else
    {
        weights[3] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }

    // a single input blob, with three output blobs requested
    std::vector<ncnn::Mat> as(1);
    as[0] = a;

    int ret = test_layer("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_int8_with_hidden_output failed a.dims=%d a=(%d %d %d) outch=%d direction=%d hidden_size=%d\n", a.dims, a.w, a.h, a.c, outch, direction, hidden_size);
    }

    return ret;
}

static int test_lstm_4()
{
    return 0
           || test_lstm_int8(RandomMat(4, 1), 2, 2)
           || test_lstm_int8(RandomMat(8, 2), 2, 2)
           || test_lstm_int8(RandomMat(16, 8), 7, 2)
           || test_lstm_int8(RandomMat(17, 8), 8, 2)
           || test_lstm_int8(RandomMat(19, 15), 8, 2)
           || test_lstm_int8(RandomMat(5, 16), 16, 2)
           || test_lstm_int8(RandomMat(3, 16), 8, 2)
           || test_lstm_int8(RandomMat(8, 16), 16, 2)
           || test_lstm_int8(RandomMat(31, 3), 31, 2)
           || test_lstm_int8(RandomMat(2, 5), 17, 2, 15);
}

static int test_lstm_5()
{
    return 0
           || test_lstm_int8_with_hidden(RandomMat(4, 4), 1, 2)
           || test_lstm_int8_with_hidden(RandomMat(8, 2), 2, 2)
           || test_lstm_int8_with_hidden(RandomMat(16, 8), 7, 2)
           || test_lstm_int8_with_hidden(RandomMat(17, 8), 8, 2)
           || test_lstm_int8_with_hidden(RandomMat(19, 15), 8, 2)
           || test_lstm_int8_with_hidden(RandomMat(5, 16), 16, 2)
           || test_lstm_int8_with_hidden(RandomMat(3, 16), 8, 2)
           || test_lstm_int8_with_hidden(RandomMat(2, 5), 79, 2, 33)

           || test_lstm_int8_with_hidden(RandomMat(4, 4), 1, 1)
           || test_lstm_int8_with_hidden(RandomMat(8, 2), 2, 1)
           || test_lstm_int8_with_hidden(RandomMat(16, 8), 7, 1)
           || test_lstm_int8_with_hidden(RandomMat(17, 8), 8, 1)
           || test_lstm_int8_with_hidden(RandomMat(19, 15), 8, 1)
           || test_lstm_int8_with_hidden(RandomMat(5, 16), 16, 1)
           || test_lstm_int8_with_hidden(RandomMat(3, 16), 8, 1)
           || test_lstm_int8_with_hidden(RandomMat(2, 5), 79, 1, 33)

           || test_lstm_int8_with_hidden(RandomMat(4, 2), 1, 0)
           || test_lstm_int8_with_hidden(RandomMat(8, 2), 2, 0)
           || test_lstm_int8_with_hidden(RandomMat(16, 8), 7, 0)
           || test_lstm_int8_with_hidden(RandomMat(17, 8), 8, 0)
           || test_lstm_int8_with_hidden(RandomMat(19, 15), 8, 0)
           || test_lstm_int8_with_hidden(RandomMat(5, 16), 16, 0)
           || test_lstm_int8_with_hidden(RandomMat(3, 16), 8, 0)
           || test_lstm_int8_with_hidden(RandomMat(2, 5), 17, 0, 15)

           || test_lstm_int8_with_hidden_input(RandomMat(4, 4), 1, 2)
           || test_lstm_int8_with_hidden_input(RandomMat(8, 2), 2, 2)
           || test_lstm_int8_with_hidden_input(RandomMat(16, 8), 7, 2)
           || test_lstm_int8_with_hidden_input(RandomMat(17, 8), 8, 2)
           || test_lstm_int8_with_hidden_input(RandomMat(19, 15), 8, 2)
           || test_lstm_int8_with_hidden_input(RandomMat(5, 16), 16, 2)
           || test_lstm_int8_with_hidden_input(RandomMat(3, 16), 8, 2)
           || test_lstm_int8_with_hidden_input(RandomMat(2, 5), 79, 2, 33)

           || test_lstm_int8_with_hidden_input(RandomMat(4, 4), 1, 1)
           || test_lstm_int8_with_hidden_input(RandomMat(8, 2), 2, 1)
           || test_lstm_int8_with_hidden_input(RandomMat(16, 8), 7, 1)
           || test_lstm_int8_with_hidden_input(RandomMat(17, 8), 8, 1)
           || test_lstm_int8_with_hidden_input(RandomMat(19, 15), 8, 1)
           || test_lstm_int8_with_hidden_input(RandomMat(5, 16), 16, 1)
           || test_lstm_int8_with_hidden_input(RandomMat(3, 16), 8, 1)
           || test_lstm_int8_with_hidden_input(RandomMat(2, 5), 79, 1, 33)

           || test_lstm_int8_with_hidden_input(RandomMat(4, 2), 1, 0)
           || test_lstm_int8_with_hidden_input(RandomMat(8, 2), 2, 0)
           || test_lstm_int8_with_hidden_input(RandomMat(16, 8), 7, 0)
           || test_lstm_int8_with_hidden_input(RandomMat(17, 8), 8, 0)
           || test_lstm_int8_with_hidden_input(RandomMat(19, 15), 8, 0)
           || test_lstm_int8_with_hidden_input(RandomMat(5, 16), 16, 0)
           || test_lstm_int8_with_hidden_input(RandomMat(3, 16), 8, 0)
           || test_lstm_int8_with_hidden_input(RandomMat(2, 5), 17, 0, 15)

           || test_lstm_int8_with_hidden_output(RandomMat(4, 4), 1, 2)
           || test_lstm_int8_with_hidden_output(RandomMat(8, 2), 2, 2)
           || test_lstm_int8_with_hidden_output(RandomMat(16, 8), 7, 2)
           || test_lstm_int8_with_hidden_output(RandomMat(17, 8), 8, 2)
           || test_lstm_int8_with_hidden_output(RandomMat(19, 15), 8, 2)
           || test_lstm_int8_with_hidden_output(RandomMat(5, 16), 16, 2)
           || test_lstm_int8_with_hidden_output(RandomMat(3, 16), 8, 2)
           || test_lstm_int8_with_hidden_output(RandomMat(2, 5), 79, 2, 33)

           || test_lstm_int8_with_hidden_output(RandomMat(4, 4), 1, 1)
           || test_lstm_int8_with_hidden_output(RandomMat(8, 2), 2, 1)
           || test_lstm_int8_with_hidden_output(RandomMat(16, 8), 7, 1)
           || test_lstm_int8_with_hidden_output(RandomMat(17, 8), 8, 1)
           || test_lstm_int8_with_hidden_output(RandomMat(19, 15), 8, 1)
           || test_lstm_int8_with_hidden_output(RandomMat(5, 16), 16, 1)
           || test_lstm_int8_with_hidden_output(RandomMat(3, 16), 8, 1)
           || test_lstm_int8_with_hidden_output(RandomMat(2, 5), 79, 1, 33)

           || test_lstm_int8_with_hidden_output(RandomMat(4, 2), 1, 0)
           || test_lstm_int8_with_hidden_output(RandomMat(8, 2), 2, 0)
           || test_lstm_int8_with_hidden_output(RandomMat(16, 8), 7, 0)
           || test_lstm_int8_with_hidden_output(RandomMat(17, 8), 8, 0)
           || test_lstm_int8_with_hidden_output(RandomMat(19, 15), 8, 0)
           || test_lstm_int8_with_hidden_output(RandomMat(5, 16), 16, 0)
           || test_lstm_int8_with_hidden_output(RandomMat(3, 16), 8, 0)
           || test_lstm_int8_with_hidden_output(RandomMat(2, 5), 17, 0, 15);
}

static int test_lstm_6()
{
    return 0
           || test_lstm_int8(RandomMat(4, 1), 1, 0)
           || test_lstm_int8(RandomMat(8, 2), 2, 0)
           || test_lstm_int8(RandomMat(16, 8), 7, 0)
           || test_lstm_int8(RandomMat(17, 8), 8, 0)
           || test_lstm_int8(RandomMat(19, 15), 8, 0)
           || test_lstm_int8(RandomMat(5, 16), 16, 0)
           || test_lstm_int8(RandomMat(3, 16), 8, 0)
           || test_lstm_int8(RandomMat(8, 16), 16, 0)
           || test_lstm_int8(RandomMat(2, 5), 17, 0, 15);
}

static int test_lstm_7()
{
    return 0
           || test_lstm_int8(RandomMat(4, 1), 1, 1)
           || test_lstm_int8(RandomMat(8, 2), 2, 1)
           || test_lstm_int8(RandomMat(16, 8), 7, 1)
           || test_lstm_int8(RandomMat(17, 8), 8, 1)
           || test_lstm_int8(RandomMat(19, 15), 8, 1)
           || test_lstm_int8(RandomMat(5, 16), 16, 1)
           || test_lstm_int8(RandomMat(3, 16), 8, 1)
           || test_lstm_int8(RandomMat(8, 16), 16, 1)
           || test_lstm_int8(RandomMat(2, 5), 17, 1, 15);
}
#endif

int main()
{
    SRAND(7767517);

#if NCNN_INT8
    return 0
           || test_lstm_0()
           || test_lstm_1()
           || test_lstm_2()
           || test_lstm_3()
           || test_lstm_4()
           || test_lstm_5()
           || test_lstm_6()
           || test_lstm_7();
#else
    return 0
           || test_lstm_0()
           || test_lstm_1()
           || test_lstm_2()
           || test_lstm_3();
#endif
}