test_lstm.cpp 24 kB

LSTM arm/x86 + fp16 innerproduct arm (#1881)

* added fp16 weight storage version
* Small changes
* Fixed fp16 weight storage layers
* fix innerproduct
* fix loop error
* Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes.
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* Update option.cpp
  Set fp16 storage based on vulkan being used or not.
* added ability for storing state in lstm layer
* added avx lstm
* added arm lstm
* fix innerproduct activation location and add 4 parallel channel version
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* revert arm file
* commit before switch
* implement requested changes
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* More x86 optimized implementations of common layers.
  Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy
  Added fp16 innerproduct for arm
* fix non avx build
* Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation.
* Fix build check for fp16 arm
* Bypass lstm_fp16 if not supported
* Build order was incorrect
* fix std::min missing in windows build
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
* attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type
* remove double "fix"
* Specify ieee fp16 format
* implement requested changes
* fix arm non-fp16 build
* fix arm lstm
* Restyled/pull 1881 (#15)
  * Restyled by clang-format
  * Restyled by astyle
  * Restyled by clang-format
  * Restyled by astyle

  Co-authored-by: Restyled.io <commits@restyled.io>
* Check blob size on arm lstm
* fix styling

Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663
// Copyright 2020 Tencent
// SPDX-License-Identifier: BSD-3-Clause

#include "testutil.h"

static int test_lstm(int size, int T, int outch, int direction, int hidden_size = 0)
{
    ncnn::Mat a = RandomMat(size, T);

    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 3 : 4);
    weights[0] = RandomMat(hidden_size * size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomMat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
    }

    int ret = test_layer("LSTM", pd, weights, a);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm failed size=%d T=%d outch=%d direction=%d hidden_size=%d\n", size, T, outch, direction, hidden_size);
    }

    return ret;
}
static int test_lstm_with_hidden(int size, int T, int outch, int direction, int hidden_size = 0)
{
    ncnn::Mat a = RandomMat(size, T);

    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 3 : 4);
    weights[0] = RandomMat(hidden_size * size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomMat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
    }

    // initial hidden state
    ncnn::Mat hidden = RandomMat(outch, num_directions);

    // initial cell state
    ncnn::Mat cell = RandomMat(hidden_size, num_directions);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_with_hidden failed size=%d T=%d outch=%d direction=%d hidden_size=%d\n", size, T, outch, direction, hidden_size);
    }

    return ret;
}
static int test_lstm_with_hidden_input(int size, int T, int outch, int direction, int hidden_size = 0)
{
    ncnn::Mat a = RandomMat(size, T);

    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 3 : 4);
    weights[0] = RandomMat(hidden_size * size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomMat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
    }

    // initial hidden state
    ncnn::Mat hidden = RandomMat(outch, num_directions);

    // initial cell state
    ncnn::Mat cell = RandomMat(hidden_size, num_directions);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer("LSTM", pd, weights, as, 1);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_with_hidden_input failed size=%d T=%d outch=%d direction=%d hidden_size=%d\n", size, T, outch, direction, hidden_size);
    }

    return ret;
}
static int test_lstm_with_hidden_output(int size, int T, int outch, int direction, int hidden_size = 0)
{
    ncnn::Mat a = RandomMat(size, T);

    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 3 : 4);
    weights[0] = RandomMat(hidden_size * size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomMat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
    }

    std::vector<ncnn::Mat> as(1);
    as[0] = a;

    int ret = test_layer("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_with_hidden_output failed size=%d T=%d outch=%d direction=%d hidden_size=%d\n", size, T, outch, direction, hidden_size);
    }

    return ret;
}
static int test_lstm_0()
{
    return 0
           || test_lstm(4, 1, 2, 2)
           || test_lstm(8, 2, 2, 2)
           || test_lstm(16, 8, 7, 2)
           || test_lstm(17, 8, 8, 2)
           || test_lstm(19, 15, 8, 2)
           || test_lstm(5, 16, 16, 2)
           || test_lstm(3, 16, 8, 2)
           || test_lstm(8, 16, 16, 2)
           || test_lstm(31, 3, 31, 2)
           || test_lstm(2, 5, 17, 2, 15);
}
static int test_lstm_1()
{
    return 0
           || test_lstm_with_hidden(4, 4, 1, 2)
           || test_lstm_with_hidden(8, 2, 2, 2)
           || test_lstm_with_hidden(16, 8, 7, 2)
           || test_lstm_with_hidden(17, 8, 8, 2)
           || test_lstm_with_hidden(19, 15, 8, 2)
           || test_lstm_with_hidden(5, 16, 16, 2)
           || test_lstm_with_hidden(3, 16, 8, 2)
           || test_lstm_with_hidden(2, 5, 79, 2, 33)
           || test_lstm_with_hidden(4, 4, 1, 1)
           || test_lstm_with_hidden(8, 2, 2, 1)
           || test_lstm_with_hidden(16, 8, 7, 1)
           || test_lstm_with_hidden(17, 8, 8, 1)
           || test_lstm_with_hidden(19, 15, 8, 1)
           || test_lstm_with_hidden(5, 16, 16, 1)
           || test_lstm_with_hidden(3, 16, 8, 1)
           || test_lstm_with_hidden(2, 5, 79, 1, 33)
           || test_lstm_with_hidden(4, 2, 1, 0)
           || test_lstm_with_hidden(8, 2, 2, 0)
           || test_lstm_with_hidden(16, 8, 7, 0)
           || test_lstm_with_hidden(17, 8, 8, 0)
           || test_lstm_with_hidden(19, 15, 8, 0)
           || test_lstm_with_hidden(5, 16, 16, 0)
           || test_lstm_with_hidden(3, 16, 8, 0)
           || test_lstm_with_hidden(2, 5, 17, 0, 15)

           || test_lstm_with_hidden_input(4, 4, 1, 2)
           || test_lstm_with_hidden_input(8, 2, 2, 2)
           || test_lstm_with_hidden_input(16, 8, 7, 2)
           || test_lstm_with_hidden_input(17, 8, 8, 2)
           || test_lstm_with_hidden_input(19, 15, 8, 2)
           || test_lstm_with_hidden_input(5, 16, 16, 2)
           || test_lstm_with_hidden_input(3, 16, 8, 2)
           || test_lstm_with_hidden_input(2, 5, 79, 2, 33)
           || test_lstm_with_hidden_input(4, 4, 1, 1)
           || test_lstm_with_hidden_input(8, 2, 2, 1)
           || test_lstm_with_hidden_input(16, 8, 7, 1)
           || test_lstm_with_hidden_input(17, 8, 8, 1)
           || test_lstm_with_hidden_input(19, 15, 8, 1)
           || test_lstm_with_hidden_input(5, 16, 16, 1)
           || test_lstm_with_hidden_input(3, 16, 8, 1)
           || test_lstm_with_hidden_input(2, 5, 79, 1, 33)
           || test_lstm_with_hidden_input(4, 2, 1, 0)
           || test_lstm_with_hidden_input(8, 2, 2, 0)
           || test_lstm_with_hidden_input(16, 8, 7, 0)
           || test_lstm_with_hidden_input(17, 8, 8, 0)
           || test_lstm_with_hidden_input(19, 15, 8, 0)
           || test_lstm_with_hidden_input(5, 16, 16, 0)
           || test_lstm_with_hidden_input(3, 16, 8, 0)
           || test_lstm_with_hidden_input(2, 5, 17, 0, 15)

           || test_lstm_with_hidden_output(4, 4, 1, 2)
           || test_lstm_with_hidden_output(8, 2, 2, 2)
           || test_lstm_with_hidden_output(16, 8, 7, 2)
           || test_lstm_with_hidden_output(17, 8, 8, 2)
           || test_lstm_with_hidden_output(19, 15, 8, 2)
           || test_lstm_with_hidden_output(5, 16, 16, 2)
           || test_lstm_with_hidden_output(3, 16, 8, 2)
           || test_lstm_with_hidden_output(2, 5, 79, 2, 33)
           || test_lstm_with_hidden_output(4, 4, 1, 1)
           || test_lstm_with_hidden_output(8, 2, 2, 1)
           || test_lstm_with_hidden_output(16, 8, 7, 1)
           || test_lstm_with_hidden_output(17, 8, 8, 1)
           || test_lstm_with_hidden_output(19, 15, 8, 1)
           || test_lstm_with_hidden_output(5, 16, 16, 1)
           || test_lstm_with_hidden_output(3, 16, 8, 1)
           || test_lstm_with_hidden_output(2, 5, 79, 1, 33)
           || test_lstm_with_hidden_output(4, 2, 1, 0)
           || test_lstm_with_hidden_output(8, 2, 2, 0)
           || test_lstm_with_hidden_output(16, 8, 7, 0)
           || test_lstm_with_hidden_output(17, 8, 8, 0)
           || test_lstm_with_hidden_output(19, 15, 8, 0)
           || test_lstm_with_hidden_output(5, 16, 16, 0)
           || test_lstm_with_hidden_output(3, 16, 8, 0)
           || test_lstm_with_hidden_output(2, 5, 17, 0, 15);
}
static int test_lstm_2()
{
    return 0
           || test_lstm(4, 1, 1, 0)
           || test_lstm(8, 2, 2, 0)
           || test_lstm(16, 8, 7, 0)
           || test_lstm(17, 8, 8, 0)
           || test_lstm(19, 15, 8, 0)
           || test_lstm(5, 16, 16, 0)
           || test_lstm(3, 16, 8, 0)
           || test_lstm(8, 16, 16, 0)
           || test_lstm(2, 5, 17, 0, 15);
}

static int test_lstm_3()
{
    return 0
           || test_lstm(4, 1, 1, 1)
           || test_lstm(8, 2, 2, 1)
           || test_lstm(16, 8, 7, 1)
           || test_lstm(17, 8, 8, 1)
           || test_lstm(19, 15, 8, 1)
           || test_lstm(5, 16, 16, 1)
           || test_lstm(3, 16, 8, 1)
           || test_lstm(8, 16, 16, 1)
           || test_lstm(2, 5, 17, 1, 15);
}
#if NCNN_INT8
static void RandomizeA(ncnn::Mat& m, float absmax)
{
    // round absmax so that it is exactly representable in both fp16 and bf16
    absmax = ncnn::float16_to_float32(ncnn::float32_to_float16(absmax));
    absmax = ncnn::bfloat16_to_float32(ncnn::float32_to_bfloat16(absmax));

    const int h = m.h;
    for (int i = 0; i < h; i++)
    {
        float* p = m.row(i);
        for (int j = 0; j < m.w; j++)
        {
            p[j] = RandomFloat(-absmax, absmax);

            // resample while the fractional part of the value scaled to the int8 range
            // lies in (0.45, 0.55), checked for the fp32, fp16 and bf16 representations
            float v = p[j] * (127.f / absmax);
            float vv = fabs(v - (int)v);

            float hp = ncnn::float16_to_float32(ncnn::float32_to_float16(p[j]));
            float hv = hp * (127.f / absmax);
            float hvv = fabs(hv - (int)hv);

            float bp = ncnn::bfloat16_to_float32(ncnn::float32_to_bfloat16(p[j]));
            float bv = bp * (127.f / absmax);
            float bvv = fabs(bv - (int)bv);

            while ((vv > 0.45f && vv < 0.55f) || (hvv > 0.45f && hvv < 0.55f) || (bvv > 0.45f && bvv < 0.55f))
            {
                p[j] = RandomFloat(-absmax, absmax);
                v = p[j] * (127.f / absmax);
                vv = fabs(v - (int)v);

                hp = ncnn::float16_to_float32(ncnn::float32_to_float16(p[j]));
                hv = hp * (127.f / absmax);
                hvv = fabs(hv - (int)hv);

                bp = ncnn::bfloat16_to_float32(ncnn::float32_to_bfloat16(p[j]));
                bv = bp * (127.f / absmax);
                bvv = fabs(bv - (int)bv);
            }
        }
    }

    // write -absmax and absmax at two randomly chosen positions
    m.row(RandomInt(0, h - 1))[RandomInt(0, m.w - 1)] = -absmax;
    m.row(RandomInt(0, h - 1))[RandomInt(0, m.w - 1)] = absmax;
}
static int test_lstm_int8(int size, int T, int outch, int direction, int hidden_size = 0)
{
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);
    pd.set(8, 2); // int8_scale_term

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 5 : 6);
    weights[0] = RandomS8Mat(hidden_size * size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomS8Mat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[5] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }
    else
    {
        weights[3] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }

    ncnn::Mat a(size, T);
    RandomizeA(a, 10.f);

    int ret = test_layer("LSTM", pd, weights, a);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_int8 failed size=%d T=%d outch=%d direction=%d hidden_size=%d\n", size, T, outch, direction, hidden_size);
    }

    return ret;
}
static int test_lstm_int8_with_hidden(int size, int T, int outch, int direction, int hidden_size = 0)
{
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);
    pd.set(8, 2); // int8_scale_term

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 5 : 6);
    weights[0] = RandomS8Mat(hidden_size * size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomS8Mat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[5] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }
    else
    {
        weights[3] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }

    ncnn::Mat a(size, T);
    RandomizeA(a, 10.f);

    // initial hidden state
    ncnn::Mat hidden(outch, num_directions);
    RandomizeA(hidden, 10.f);

    // initial cell state
    ncnn::Mat cell(hidden_size, num_directions);
    RandomizeA(cell, 10.f);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_int8_with_hidden failed size=%d T=%d outch=%d direction=%d hidden_size=%d\n", size, T, outch, direction, hidden_size);
    }

    return ret;
}
static int test_lstm_int8_with_hidden_input(int size, int T, int outch, int direction, int hidden_size = 0)
{
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);
    pd.set(8, 2); // int8_scale_term

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 5 : 6);
    weights[0] = RandomS8Mat(hidden_size * size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomS8Mat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[5] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }
    else
    {
        weights[3] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }

    ncnn::Mat a(size, T);
    RandomizeA(a, 10.f);

    // initial hidden state
    ncnn::Mat hidden(outch, num_directions);
    RandomizeA(hidden, 10.f);

    // initial cell state
    ncnn::Mat cell(hidden_size, num_directions);
    RandomizeA(cell, 10.f);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer("LSTM", pd, weights, as, 1);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_int8_with_hidden_input failed size=%d T=%d outch=%d direction=%d hidden_size=%d\n", size, T, outch, direction, hidden_size);
    }

    return ret;
}
static int test_lstm_int8_with_hidden_output(int size, int T, int outch, int direction, int hidden_size = 0)
{
    int num_directions = direction == 2 ? 2 : 1;
    if (hidden_size == 0)
        hidden_size = outch;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, hidden_size * size * 4 * num_directions);
    pd.set(2, direction);
    pd.set(3, hidden_size);
    pd.set(8, 2); // int8_scale_term

    std::vector<ncnn::Mat> weights(hidden_size == outch ? 5 : 6);
    weights[0] = RandomS8Mat(hidden_size * size * 4 * num_directions);
    weights[1] = RandomMat(hidden_size * 4 * num_directions);
    weights[2] = RandomS8Mat(outch * hidden_size * 4 * num_directions);
    if (hidden_size != outch)
    {
        weights[3] = RandomMat(hidden_size * outch * num_directions);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[5] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }
    else
    {
        weights[3] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
        weights[4] = RandomMat(hidden_size * 4 * num_directions, 100.f, 200.f);
    }

    ncnn::Mat a(size, T);
    RandomizeA(a, 10.f);

    std::vector<ncnn::Mat> as(1);
    as[0] = a;

    int ret = test_layer("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_int8_with_hidden_output failed size=%d T=%d outch=%d direction=%d hidden_size=%d\n", size, T, outch, direction, hidden_size);
    }

    return ret;
}

static int test_lstm_4()
{
    return 0
           || test_lstm_int8(4, 1, 2, 2)
           || test_lstm_int8(8, 2, 2, 2)
           || test_lstm_int8(16, 8, 7, 2)
           || test_lstm_int8(17, 8, 8, 2)
           || test_lstm_int8(19, 15, 8, 2)
           || test_lstm_int8(5, 16, 16, 2)
           || test_lstm_int8(3, 16, 8, 2)
           || test_lstm_int8(8, 16, 16, 2)
           || test_lstm_int8(31, 3, 31, 2)
           || test_lstm_int8(2, 5, 17, 2, 15);
}

static int test_lstm_5()
{
    return 0
           || test_lstm_int8_with_hidden(4, 4, 1, 2)
           || test_lstm_int8_with_hidden(8, 2, 2, 2)
           || test_lstm_int8_with_hidden(16, 8, 7, 2)
           || test_lstm_int8_with_hidden(17, 8, 8, 2)
           || test_lstm_int8_with_hidden(19, 15, 8, 2)
           || test_lstm_int8_with_hidden(5, 16, 16, 2)
           || test_lstm_int8_with_hidden(3, 16, 8, 2)
           || test_lstm_int8_with_hidden(2, 5, 79, 2, 33)
           || test_lstm_int8_with_hidden(4, 4, 1, 1)
           || test_lstm_int8_with_hidden(8, 2, 2, 1)
           || test_lstm_int8_with_hidden(16, 8, 7, 1)
           || test_lstm_int8_with_hidden(17, 8, 8, 1)
           || test_lstm_int8_with_hidden(19, 15, 8, 1)
           || test_lstm_int8_with_hidden(5, 16, 16, 1)
           || test_lstm_int8_with_hidden(3, 16, 8, 1)
           || test_lstm_int8_with_hidden(2, 5, 79, 1, 33)
           || test_lstm_int8_with_hidden(4, 2, 1, 0)
           || test_lstm_int8_with_hidden(8, 2, 2, 0)
           || test_lstm_int8_with_hidden(16, 8, 7, 0)
           || test_lstm_int8_with_hidden(17, 8, 8, 0)
           || test_lstm_int8_with_hidden(19, 15, 8, 0)
           || test_lstm_int8_with_hidden(5, 16, 16, 0)
           || test_lstm_int8_with_hidden(3, 16, 8, 0)
           || test_lstm_int8_with_hidden(2, 5, 17, 0, 15)

           || test_lstm_int8_with_hidden_input(4, 4, 1, 2)
           || test_lstm_int8_with_hidden_input(8, 2, 2, 2)
           || test_lstm_int8_with_hidden_input(16, 8, 7, 2)
           || test_lstm_int8_with_hidden_input(17, 8, 8, 2)
           || test_lstm_int8_with_hidden_input(19, 15, 8, 2)
           || test_lstm_int8_with_hidden_input(5, 16, 16, 2)
           || test_lstm_int8_with_hidden_input(3, 16, 8, 2)
           || test_lstm_int8_with_hidden_input(2, 5, 79, 2, 33)
           || test_lstm_int8_with_hidden_input(4, 4, 1, 1)
           || test_lstm_int8_with_hidden_input(8, 2, 2, 1)
           || test_lstm_int8_with_hidden_input(16, 8, 7, 1)
           || test_lstm_int8_with_hidden_input(17, 8, 8, 1)
           || test_lstm_int8_with_hidden_input(19, 15, 8, 1)
           || test_lstm_int8_with_hidden_input(5, 16, 16, 1)
           || test_lstm_int8_with_hidden_input(3, 16, 8, 1)
           || test_lstm_int8_with_hidden_input(2, 5, 79, 1, 33)
           || test_lstm_int8_with_hidden_input(4, 2, 1, 0)
           || test_lstm_int8_with_hidden_input(8, 2, 2, 0)
           || test_lstm_int8_with_hidden_input(16, 8, 7, 0)
           || test_lstm_int8_with_hidden_input(17, 8, 8, 0)
           || test_lstm_int8_with_hidden_input(19, 15, 8, 0)
           || test_lstm_int8_with_hidden_input(5, 16, 16, 0)
           || test_lstm_int8_with_hidden_input(3, 16, 8, 0)
           || test_lstm_int8_with_hidden_input(2, 5, 17, 0, 15)

           || test_lstm_int8_with_hidden_output(4, 4, 1, 2)
           || test_lstm_int8_with_hidden_output(8, 2, 2, 2)
           || test_lstm_int8_with_hidden_output(16, 8, 7, 2)
           || test_lstm_int8_with_hidden_output(17, 8, 8, 2)
           || test_lstm_int8_with_hidden_output(19, 15, 8, 2)
           || test_lstm_int8_with_hidden_output(5, 16, 16, 2)
           || test_lstm_int8_with_hidden_output(3, 16, 8, 2)
           || test_lstm_int8_with_hidden_output(2, 5, 79, 2, 33)
           || test_lstm_int8_with_hidden_output(4, 4, 1, 1)
           || test_lstm_int8_with_hidden_output(8, 2, 2, 1)
           || test_lstm_int8_with_hidden_output(16, 8, 7, 1)
           || test_lstm_int8_with_hidden_output(17, 8, 8, 1)
           || test_lstm_int8_with_hidden_output(19, 15, 8, 1)
           || test_lstm_int8_with_hidden_output(5, 16, 16, 1)
           || test_lstm_int8_with_hidden_output(3, 16, 8, 1)
           || test_lstm_int8_with_hidden_output(2, 5, 79, 1, 33)
           || test_lstm_int8_with_hidden_output(4, 2, 1, 0)
           || test_lstm_int8_with_hidden_output(8, 2, 2, 0)
           || test_lstm_int8_with_hidden_output(16, 8, 7, 0)
           || test_lstm_int8_with_hidden_output(17, 8, 8, 0)
           || test_lstm_int8_with_hidden_output(19, 15, 8, 0)
           || test_lstm_int8_with_hidden_output(5, 16, 16, 0)
           || test_lstm_int8_with_hidden_output(3, 16, 8, 0)
           || test_lstm_int8_with_hidden_output(2, 5, 17, 0, 15);
}

static int test_lstm_6()
{
    return 0
           || test_lstm_int8(4, 1, 1, 0)
           || test_lstm_int8(8, 2, 2, 0)
           || test_lstm_int8(16, 8, 7, 0)
           || test_lstm_int8(17, 8, 8, 0)
           || test_lstm_int8(19, 15, 8, 0)
           || test_lstm_int8(5, 16, 16, 0)
           || test_lstm_int8(3, 16, 8, 0)
           || test_lstm_int8(8, 16, 16, 0)
           || test_lstm_int8(2, 5, 17, 0, 15);
}

static int test_lstm_7()
{
    return 0
           || test_lstm_int8(4, 1, 1, 1)
           || test_lstm_int8(8, 2, 2, 1)
           || test_lstm_int8(16, 8, 7, 1)
           || test_lstm_int8(17, 8, 8, 1)
           || test_lstm_int8(19, 15, 8, 1)
           || test_lstm_int8(5, 16, 16, 1)
           || test_lstm_int8(3, 16, 8, 1)
           || test_lstm_int8(8, 16, 16, 1)
           || test_lstm_int8(2, 5, 17, 1, 15);
}
#endif

int main()
{
    SRAND(7767517);

#if NCNN_INT8
    return 0
           || test_lstm_0()
           || test_lstm_1()
           || test_lstm_2()
           || test_lstm_3()
           || test_lstm_4()
           || test_lstm_5()
           || test_lstm_6()
           || test_lstm_7();
#else
    return 0
           || test_lstm_0()
           || test_lstm_1()
           || test_lstm_2()
           || test_lstm_3();
#endif
}