test_lstm.cpp 11 kB

LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
// Tencent is pleased to support the open source community by making ncnn available.
//
// Copyright (C) 2020 THL A29 Limited, a Tencent company. All rights reserved.
//
// Licensed under the BSD 3-Clause License (the "License"); you may not use this file except
// in compliance with the License. You may obtain a copy of the License at
//
// https://opensource.org/licenses/BSD-3-Clause
//
// Unless required by applicable law or agreed to in writing, software distributed
// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
// CONDITIONS OF ANY KIND, either express or implied. See the License for the
// specific language governing permissions and limitations under the License.

#include "layer/lstm.h"
#include "testutil.h"

// Single input blob, single output blob.
static int test_lstm(const ncnn::Mat& a, int outch, int direction)
{
    int input_size = a.w;
    // direction == 2 runs both directions, so every weight blob is doubled
    int num_directions = direction == 2 ? 2 : 1;

    ncnn::ParamDict pd;
    pd.set(0, outch);                                   // number of outputs
    pd.set(1, outch * input_size * 4 * num_directions); // weight data size
    pd.set(2, direction);

    std::vector<ncnn::Mat> weights(3);
    weights[0] = RandomMat(outch * input_size * 4 * num_directions); // input-to-hidden weights
    weights[1] = RandomMat(outch * 4 * num_directions);              // biases
    weights[2] = RandomMat(outch * outch * 4 * num_directions);      // hidden-to-hidden weights

    int ret = test_layer<ncnn::LSTM>("LSTM", pd, weights, a);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm failed a.dims=%d a=(%d %d %d) outch=%d, direction = %d \n", a.dims, a.w, a.h, a.c, outch, direction);
    }

    return ret;
}

// Three input blobs (data, hidden state, cell state) and three output blobs.
int test_lstm_layer_with_hidden(const ncnn::Mat& a, int outch, int direction)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, outch * input_size * 4 * num_directions);
    pd.set(2, direction);

    std::vector<ncnn::Mat> weights(3);
    weights[0] = RandomMat(outch * input_size * 4 * num_directions);
    weights[1] = RandomMat(outch * 4 * num_directions);
    weights[2] = RandomMat(outch * outch * 4 * num_directions);

    // initial hidden state
    ncnn::Mat hidden = RandomMat(outch, num_directions);

    // initial cell state
    ncnn::Mat cell = RandomMat(outch, num_directions);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer<ncnn::LSTM>("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_layer_with_hidden failed a.dims=%d a=(%d %d %d) outch=%d, direction = %d \n", a.dims, a.w, a.h, a.c, outch, direction);
    }

    return ret;
}

// Three input blobs (data, hidden state, cell state) but only one output blob.
int test_lstm_layer_with_hidden_input(const ncnn::Mat& a, int outch, int direction)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, outch * input_size * 4 * num_directions);
    pd.set(2, direction);

    std::vector<ncnn::Mat> weights(3);
    weights[0] = RandomMat(outch * input_size * 4 * num_directions);
    weights[1] = RandomMat(outch * 4 * num_directions);
    weights[2] = RandomMat(outch * outch * 4 * num_directions);

    // initial hidden state
    ncnn::Mat hidden = RandomMat(outch, num_directions);

    // initial cell state
    ncnn::Mat cell = RandomMat(outch, num_directions);

    std::vector<ncnn::Mat> as(3);
    as[0] = a;
    as[1] = hidden;
    as[2] = cell;

    int ret = test_layer<ncnn::LSTM>("LSTM", pd, weights, as, 1);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_layer_with_hidden_input failed a.dims=%d a=(%d %d %d) outch=%d, direction = %d \n", a.dims, a.w, a.h, a.c, outch, direction);
    }

    return ret;
}

// One input blob (data only) but three output blobs.
int test_lstm_layer_with_hidden_output(const ncnn::Mat& a, int outch, int direction)
{
    int input_size = a.w;
    int num_directions = direction == 2 ? 2 : 1;

    ncnn::ParamDict pd;
    pd.set(0, outch);
    pd.set(1, outch * input_size * 4 * num_directions);
    pd.set(2, direction);

    std::vector<ncnn::Mat> weights(3);
    weights[0] = RandomMat(outch * input_size * 4 * num_directions);
    weights[1] = RandomMat(outch * 4 * num_directions);
    weights[2] = RandomMat(outch * outch * 4 * num_directions);

    std::vector<ncnn::Mat> as(1);
    as[0] = a;

    int ret = test_layer<ncnn::LSTM>("LSTM", pd, weights, as, 3);
    if (ret != 0)
    {
        fprintf(stderr, "test_lstm_layer_with_hidden_output failed a.dims=%d a=(%d %d %d) outch=%d, direction = %d \n", a.dims, a.w, a.h, a.c, outch, direction);
    }

    return ret;
}

static int test_lstm_0()
{
    return 0
           || test_lstm(RandomMat(4, 1), 2, 2)
           || test_lstm(RandomMat(8, 2), 2, 2)
           || test_lstm(RandomMat(16, 8), 7, 2)
           || test_lstm(RandomMat(17, 8), 8, 2)
           || test_lstm(RandomMat(19, 15), 8, 2)
           || test_lstm(RandomMat(5, 16), 16, 2)
           || test_lstm(RandomMat(3, 16), 8, 2)
           || test_lstm(RandomMat(8, 16), 16, 2)
           || test_lstm(RandomMat(2, 5), 17, 2);
}

static int test_lstm_1()
{
    return 0
           || test_lstm_layer_with_hidden(RandomMat(4, 4), 1, 2)
           || test_lstm_layer_with_hidden(RandomMat(8, 2), 2, 2)
           || test_lstm_layer_with_hidden(RandomMat(16, 8), 7, 2)
           || test_lstm_layer_with_hidden(RandomMat(17, 8), 8, 2)
           || test_lstm_layer_with_hidden(RandomMat(19, 15), 8, 2)
           || test_lstm_layer_with_hidden(RandomMat(5, 16), 16, 2)
           || test_lstm_layer_with_hidden(RandomMat(3, 16), 8, 2)
           || test_lstm_layer_with_hidden(RandomMat(2, 5), 99, 2)
           || test_lstm_layer_with_hidden(RandomMat(4, 4), 1, 1)
           || test_lstm_layer_with_hidden(RandomMat(8, 2), 2, 1)
           || test_lstm_layer_with_hidden(RandomMat(16, 8), 7, 1)
           || test_lstm_layer_with_hidden(RandomMat(17, 8), 8, 1)
           || test_lstm_layer_with_hidden(RandomMat(19, 15), 8, 1)
           || test_lstm_layer_with_hidden(RandomMat(5, 16), 16, 1)
           || test_lstm_layer_with_hidden(RandomMat(3, 16), 8, 1)
           || test_lstm_layer_with_hidden(RandomMat(2, 5), 99, 1)
           || test_lstm_layer_with_hidden(RandomMat(4, 2), 1, 0)
           || test_lstm_layer_with_hidden(RandomMat(8, 2), 2, 0)
           || test_lstm_layer_with_hidden(RandomMat(16, 8), 7, 0)
           || test_lstm_layer_with_hidden(RandomMat(17, 8), 8, 0)
           || test_lstm_layer_with_hidden(RandomMat(19, 15), 8, 0)
           || test_lstm_layer_with_hidden(RandomMat(5, 16), 16, 0)
           || test_lstm_layer_with_hidden(RandomMat(3, 16), 8, 0)
           || test_lstm_layer_with_hidden(RandomMat(2, 5), 17, 0)

           || test_lstm_layer_with_hidden_input(RandomMat(4, 4), 1, 2)
           || test_lstm_layer_with_hidden_input(RandomMat(8, 2), 2, 2)
           || test_lstm_layer_with_hidden_input(RandomMat(16, 8), 7, 2)
           || test_lstm_layer_with_hidden_input(RandomMat(17, 8), 8, 2)
           || test_lstm_layer_with_hidden_input(RandomMat(19, 15), 8, 2)
           || test_lstm_layer_with_hidden_input(RandomMat(5, 16), 16, 2)
           || test_lstm_layer_with_hidden_input(RandomMat(3, 16), 8, 2)
           || test_lstm_layer_with_hidden_input(RandomMat(2, 5), 99, 2)
           || test_lstm_layer_with_hidden_input(RandomMat(4, 4), 1, 1)
           || test_lstm_layer_with_hidden_input(RandomMat(8, 2), 2, 1)
           || test_lstm_layer_with_hidden_input(RandomMat(16, 8), 7, 1)
           || test_lstm_layer_with_hidden_input(RandomMat(17, 8), 8, 1)
           || test_lstm_layer_with_hidden_input(RandomMat(19, 15), 8, 1)
           || test_lstm_layer_with_hidden_input(RandomMat(5, 16), 16, 1)
           || test_lstm_layer_with_hidden_input(RandomMat(3, 16), 8, 1)
           || test_lstm_layer_with_hidden_input(RandomMat(2, 5), 99, 1)
           || test_lstm_layer_with_hidden_input(RandomMat(4, 2), 1, 0)
           || test_lstm_layer_with_hidden_input(RandomMat(8, 2), 2, 0)
           || test_lstm_layer_with_hidden_input(RandomMat(16, 8), 7, 0)
           || test_lstm_layer_with_hidden_input(RandomMat(17, 8), 8, 0)
           || test_lstm_layer_with_hidden_input(RandomMat(19, 15), 8, 0)
           || test_lstm_layer_with_hidden_input(RandomMat(5, 16), 16, 0)
           || test_lstm_layer_with_hidden_input(RandomMat(3, 16), 8, 0)
           || test_lstm_layer_with_hidden_input(RandomMat(2, 5), 17, 0)

           || test_lstm_layer_with_hidden_output(RandomMat(4, 4), 1, 2)
           || test_lstm_layer_with_hidden_output(RandomMat(8, 2), 2, 2)
           || test_lstm_layer_with_hidden_output(RandomMat(16, 8), 7, 2)
           || test_lstm_layer_with_hidden_output(RandomMat(17, 8), 8, 2)
           || test_lstm_layer_with_hidden_output(RandomMat(19, 15), 8, 2)
           || test_lstm_layer_with_hidden_output(RandomMat(5, 16), 16, 2)
           || test_lstm_layer_with_hidden_output(RandomMat(3, 16), 8, 2)
           || test_lstm_layer_with_hidden_output(RandomMat(2, 5), 99, 2)
           || test_lstm_layer_with_hidden_output(RandomMat(4, 4), 1, 1)
           || test_lstm_layer_with_hidden_output(RandomMat(8, 2), 2, 1)
           || test_lstm_layer_with_hidden_output(RandomMat(16, 8), 7, 1)
           || test_lstm_layer_with_hidden_output(RandomMat(17, 8), 8, 1)
           || test_lstm_layer_with_hidden_output(RandomMat(19, 15), 8, 1)
           || test_lstm_layer_with_hidden_output(RandomMat(5, 16), 16, 1)
           || test_lstm_layer_with_hidden_output(RandomMat(3, 16), 8, 1)
           || test_lstm_layer_with_hidden_output(RandomMat(2, 5), 99, 1)
           || test_lstm_layer_with_hidden_output(RandomMat(4, 2), 1, 0)
           || test_lstm_layer_with_hidden_output(RandomMat(8, 2), 2, 0)
           || test_lstm_layer_with_hidden_output(RandomMat(16, 8), 7, 0)
           || test_lstm_layer_with_hidden_output(RandomMat(17, 8), 8, 0)
           || test_lstm_layer_with_hidden_output(RandomMat(19, 15), 8, 0)
           || test_lstm_layer_with_hidden_output(RandomMat(5, 16), 16, 0)
           || test_lstm_layer_with_hidden_output(RandomMat(3, 16), 8, 0)
           || test_lstm_layer_with_hidden_output(RandomMat(2, 5), 17, 0);
}

static int test_lstm_2()
{
    return 0
           || test_lstm(RandomMat(4, 1), 1, 0)
           || test_lstm(RandomMat(8, 2), 2, 0)
           || test_lstm(RandomMat(16, 8), 7, 0)
           || test_lstm(RandomMat(17, 8), 8, 0)
           || test_lstm(RandomMat(19, 15), 8, 0)
           || test_lstm(RandomMat(5, 16), 16, 0)
           || test_lstm(RandomMat(3, 16), 8, 0)
           || test_lstm(RandomMat(8, 16), 16, 0)
           || test_lstm(RandomMat(2, 5), 17, 0);
}

static int test_lstm_3()
{
    return 0
           || test_lstm(RandomMat(4, 1), 1, 1)
           || test_lstm(RandomMat(8, 2), 2, 1)
           || test_lstm(RandomMat(16, 8), 7, 1)
           || test_lstm(RandomMat(17, 8), 8, 1)
           || test_lstm(RandomMat(19, 15), 8, 1)
           || test_lstm(RandomMat(5, 16), 16, 1)
           || test_lstm(RandomMat(3, 16), 8, 1)
           || test_lstm(RandomMat(8, 16), 16, 1)
           || test_lstm(RandomMat(2, 5), 17, 1);
}

int main()
{
    SRAND(7767517);

    return 0 || test_lstm_0() || test_lstm_1() || test_lstm_2() || test_lstm_3();
}