diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/README.md b/model_zoo/official/cv/crnn_seq2seq_ocr/README.md
new file mode 100755
index 0000000000..668d20bf87
--- /dev/null
+++ b/model_zoo/official/cv/crnn_seq2seq_ocr/README.md
@@ -0,0 +1,195 @@
+# Contents
+
+- [Contents](#contents)
+    - [CRNN-Seq2Seq-OCR Description](#crnn-seq2seq-ocr-description)
+    - [Model Architecture](#model-architecture)
+    - [Dataset](#dataset)
+    - [Environment Requirements](#environment-requirements)
+    - [Quick Start](#quick-start)
+    - [Script Description](#script-description)
+        - [Script and Sample Code](#script-and-sample-code)
+        - [Script Parameters](#script-parameters)
+            - [Training Script Parameters](#training-script-parameters)
+            - [Parameters Configuration](#parameters-configuration)
+        - [Dataset Preparation](#dataset-preparation)
+    - [Training Process](#training-process)
+        - [Training](#training)
+            - [Distributed Training](#distributed-training)
+    - [Evaluation Process](#evaluation-process)
+        - [Evaluation](#evaluation)
+    - [Model Description](#model-description)
+        - [Performance](#performance)
+            - [Training Performance](#training-performance)
+            - [Evaluation Performance](#evaluation-performance)
+
+## [CRNN-Seq2Seq-OCR Description](#contents)
+
+CRNN-Seq2Seq-OCR is a neural network model for image-based sequence recognition tasks, such as scene text recognition and optical character recognition (OCR). Its architecture combines a CNN with an attention-based sequence-to-sequence model.
+
+## [Model Architecture](#contents)
+
+CRNN-Seq2Seq-OCR applies a VGG-style CNN to extract features from the preprocessed images, followed by an attention-based encoder and decoder, and finally uses the negative log-likelihood (NLL) loss. See `src/attention_ocr.py` for details.
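To make the VGG feature extractor concrete, the sketch below traces the feature-map height and width through the layer stack defined in `src/cnn.py` for the 128×512 input size set in `config.py`. It is plain arithmetic rather than MindSpore code, and it assumes MindSpore's default padding modes (`'same'` for the 3×3 `nn.Conv2d` layers, `'valid'` for `nn.MaxPool2d`); the helper names are hypothetical.

```python
# Hypothetical shape tracer for the CNN in src/cnn.py (plain arithmetic, no MindSpore).
# Assumption: 3x3 convolutions use 'same' padding (size unchanged) and pooling uses 'valid'.


def valid_out(size, kernel, stride):
    """Output length along one axis for a 'valid' pooling or convolution."""
    return (size - kernel) // stride + 1


def trace_cnn(height, width):
    """Return (stage, height, width) after each size-changing stage of the CNN."""
    stages = []
    # maxpool1: kernel (2, 2), stride (2, 2) halves both axes
    height, width = valid_out(height, 2, 2), valid_out(width, 2, 2)
    stages.append(("maxpool1", height, width))
    # maxpool2..maxpool6: kernel (2, 1), stride (2, 1) halve the height only
    for name in ("maxpool2", "maxpool3", "maxpool4", "maxpool5", "maxpool6"):
        height = valid_out(height, 2, 2)
        stages.append((name, height, width))
    # nn.Pad(((0, 0), (0, 0), (0, 0), (0, 1))) adds one column on the right
    width += 1
    stages.append(("pad", height, width))
    # convBNRelu5: 2x2 kernel, stride 1, pad_mode='valid'
    height, width = valid_out(height, 2, 1), valid_out(width, 2, 1)
    stages.append(("convBNRelu5", height, width))
    return stages


for stage, h, w in trace_cnn(128, 512):
    print(f"{stage:12s} H={h:3d} W={w:3d}")
```

Under these assumptions the final feature map has height 1, which is why `P.Squeeze(2)` can drop that axis and hand a 256-step feature sequence to the encoder.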
+
+## [Dataset](#contents)
+
+For training and evaluation, we use the French Street Name Signs (FSNS) dataset released by Google, which contains approximately 1 million training images and their corresponding ground-truth words.
+
+## [Environment Requirements](#contents)
+
+- Hardware (Ascend)
+    - Prepare a hardware environment with Ascend processors. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. You will be granted access to the related resources once your application is approved.
+- Framework
+    - [MindSpore](https://gitee.com/mindspore/mindspore)
+- For more information, please check the resources below:
+    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
+
+## [Quick Start](#contents)
+
+- After the dataset is prepared, you can start training or evaluation with the following scripts:
+
+    - Running on Ascend
+
+    ```shell
+    # distributed training example on Ascend
+    $ bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]
+
+    # evaluation example on Ascend
+    $ bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]
+
+    # standalone training example on Ascend
+    $ bash run_standalone_train.sh [DATASET_PATH]
+    ```
+
+    For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
+
+    Please follow the instructions in the link below:
+    [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
+
+## [Script Description](#contents)
+
+### [Script and Sample Code](#contents)
+
+```shell
+crnn-seq2seq-ocr
+├── README.md                         # Descriptions about CRNN-Seq2Seq-OCR
+├── scripts
+│   ├── run_distribute_train.sh       # Launch distributed training on Ascend (8 devices)
+│   ├── run_eval_ascend.sh            # Launch evaluation on Ascend
+│   └── run_standalone_train.sh       # Launch standalone training on Ascend (1 device)
+├── src
+│   ├── attention_ocr.py              # CRNN-Seq2Seq-OCR model, loss and training wrapper
+│   ├── cnn.py                        # VGG network
+│   ├── config.py                     # Parameter configuration
+│   ├── create_mindrecord_files.py    # Create MindRecord files from images and ground truth
+│   ├── dataset.py                    # Data preprocessing for training and evaluation
+│   ├── gru.py                        # GRU cell wrapper
+│   ├── logger.py                     # Logger configuration
+│   ├── lstm.py                       # LSTM cell wrapper
+│   ├── seq2seq.py                    # CRNN-Seq2Seq-OCR model structure
+│   ├── utils.py                      # Utility functions for training and data preprocessing
+│   └── weight_init.py                # Weight initialization of LSTM and GRU
+├── eval.py                           # Evaluation script
+└── train.py                          # Training script
+```
+
+### [Script Parameters](#contents)
+
+#### Training Script Parameters
+
+```shell
+# distributed training on Ascend ([DATASET_PATH] is a MindRecord file)
+Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]
+
+# standalone training on Ascend ([DATASET_PATH] is a MindRecord file)
+Usage: bash run_standalone_train.sh [DATASET_PATH]
+```
+
+#### Parameters Configuration
+
+Parameters for both training and evaluation can be set in `config.py`.
+
+### [Dataset Preparation](#contents)
+
+- Use `src/create_mindrecord_files.py` to convert the FSNS images and their annotation files into MindRecord files, or prepare your own text image dataset and convert it in the same way.
+
+## [Training Process](#contents)
+
+- Set options in `config.py`, including the learning rate and other network hyperparameters. See the [MindSpore dataset preparation tutorial](https://www.mindspore.cn/tutorial/training/zh-CN/master/use/data_preparation.html) for more information about datasets.
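`src/create_mindrecord_files.py` reads annotation files in which each line has the form `<image file name>\t<label>`, and turns every label into a sequence of token ids framed by `go_id` (1) and `eos_id` (2) from `config.characters_dictionary`; labels containing an out-of-vocabulary character are dropped. The sketch below mirrors that logic in plain Python with a made-up vocabulary. `parse_annotation_line`, `serialize_label` and `toy_vocab` are hypothetical names, and the real script additionally skips labels listed in `config.labels_not_use` and returns a NumPy `int32` array instead of a list.

```python
# Hypothetical re-implementation of the label handling in src/create_mindrecord_files.py.
GO_ID, EOS_ID = 1, 2  # values from config.characters_dictionary


def parse_annotation_line(line):
    """Split '<file name>\\t<label>\\n' into (file_name, label)."""
    file_name = line.split('\t')[0]
    label = line.split('\t')[1].split('\n')[0]
    return file_name, label


def serialize_label(label, vocab):
    """Return [GO_ID, char ids..., EOS_ID], or None if a character is out of vocabulary."""
    ids = [GO_ID]
    for ch in label:
        if ch not in vocab:
            return None  # the original script also drops such samples
        ids.append(vocab[ch])
    ids.append(EOS_ID)
    return ids


# Hypothetical vocabulary; the real one is loaded from config.vocab_path.
toy_vocab = {"R": 4, "u": 5, "e": 6}
name, label = parse_annotation_line("img_0001.png\tRue\n")
print(name, serialize_label(label, toy_vocab))  # img_0001.png [1, 4, 5, 6, 2]
print(serialize_label("Rue X", toy_vocab))      # None (space and 'X' are not in toy_vocab)
```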
+
+### [Training](#contents)
+
+- Run `run_standalone_train.sh` for non-distributed training of the CRNN-Seq2Seq-OCR model. Only Ascend is currently supported.
+
+``` bash
+bash run_standalone_train.sh [DATASET_PATH]
+```
+
+#### [Distributed Training](#contents)
+
+- Run `run_distribute_train.sh` for distributed training of the CRNN-Seq2Seq-OCR model on Ascend.
+
+``` bash
+bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]
+```
+
+Check `train_parallel0/log` and you should see output like the following:
+
+```shell
+epoch: 20 step: 4080, loss is 1.56112
+epoch: 20 step: 4081, loss is 1.6368448
+epoch time: 1559886.096 ms, per step time: 382.231 ms
+```
+
+## [Evaluation Process](#contents)
+
+### [Evaluation](#contents)
+
+- Run `run_eval_ascend.sh` for evaluation on Ascend. `[DATASET_PATH]` is the directory that contains `fsns.mindrecord0`.
+
+``` bash
+bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]
+```
+
+Check `eval/log` and you should see output like the following:
+
+```shell
+character precision = 0.967522
+
+Annotation precision = 0.635204
+```
+
+# Model Description
+
+## Performance
+
+### Training Performance
+
+| Parameters                 | Ascend                                                      |
+| -------------------------- | ----------------------------------------------------------- |
+| Model Version              | V1                                                          |
+| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G             |
+| Uploaded Date              | 02/11/2021 (month/day/year)                                 |
+| MindSpore Version          | 1.2.0                                                       |
+| Dataset                    | FSNS                                                        |
+| Training Parameters        | epoch=20, batch_size=32                                     |
+| Optimizer                  | SGD                                                         |
+| Loss Function              | Negative Log Likelihood                                     |
+| Speed                      | 1pc: 355 ms/step; 8pcs: 385 ms/step                         |
+| Total time                 | 1pc: 64 hours; 8pcs: 9 hours                                |
+| Parameters (M)             | 12                                                          |
+| Scripts                    | [crnn_seq2seq_ocr script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/crnn_seq2seq_ocr) |
+
+### Evaluation Performance
+
+| Parameters          | Ascend                                                  |
+| ------------------- | ------------------------------------------------------- |
+| Model Version       | V1                                                      |
+| Resource            | Ascend 910                                              |
+| Uploaded Date       | 02/11/2021 (month/day/year)                             |
+| MindSpore Version   | 1.2.0                                                   |
+| Dataset             | FSNS                                                    |
+| Batch Size          | 32                                                      |
+| Outputs             | Annotation Precision, Character Precision               |
+| Accuracy            | Annotation Precision=63.52%, Character Precision=96.75% |
+| Model for inference | 12M (.ckpt file)                                        |
diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/eval.py b/model_zoo/official/cv/crnn_seq2seq_ocr/eval.py
new file mode 100755
index 0000000000..3178e65122
--- /dev/null
+++ b/model_zoo/official/cv/crnn_seq2seq_ocr/eval.py
@@ -0,0 +1,181 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+CRNN-Seq2Seq-OCR Evaluation.
+
+"""
+
+import os
+import codecs
+import argparse
+import numpy as np
+
+import mindspore.ops.operations as P
+import mindspore.common.dtype as mstype
+
+from mindspore.common import set_seed
+from mindspore import context, Tensor
+from mindspore.train.serialization import load_checkpoint, load_param_into_net
+
+from src.config import config
+from src.utils import initialize_vocabulary
+from src.dataset import create_ocr_val_dataset
+from src.attention_ocr import AttentionOCRInfer
+
+
+set_seed(1)
+
+
+def text_standardization(text_in):
+    """
+    Strip and collapse whitespace, then replace some particular characters.
+    """
+    stand_text = text_in.strip()
+    stand_text = ' '.join(stand_text.split())
+    stand_text = stand_text.replace(u'(', u'(')
+    stand_text = stand_text.replace(u')', u')')
+    stand_text = stand_text.replace(u':', u':')
+    return stand_text
+
+
+def LCS_length(str1, str2):
+    """
+    Calculate the length of the longest common subsequence between str1 and str2.
+    """
+    if str1 is None or str2 is None:
+        return 0
+
+    len1 = len(str1)
+    len2 = len(str2)
+    if len1 == 0 or len2 == 0:
+        return 0
+
+    lcs = [[0 for _ in range(len2 + 1)] for _ in range(2)]
+    for i in range(1, len1 + 1):
+        for j in range(1, len2 + 1):
+            if str1[i - 1] == str2[j - 1]:
+                lcs[i % 2][j] = lcs[(i - 1) % 2][j - 1] + 1
+            else:
+                if lcs[i % 2][j - 1] >= lcs[(i - 1) % 2][j]:
+                    lcs[i % 2][j] = lcs[i % 2][j - 1]
+                else:
+                    lcs[i % 2][j] = lcs[(i - 1) % 2][j]
+
+    return lcs[len1 % 2][-1]
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description="CRNN-Seq2Seq-OCR Evaluation")
+    parser.add_argument("--dataset_path", type=str, default="",
+                        help="Directory containing the test MindRecord files")
+    parser.add_argument("--checkpoint_path", type=str, default=None,
+                        help="Checkpoint of AttentionOCR (Default:None).")
+    parser.add_argument("--device_target", type=str, default="Ascend",
+                        help="Device where the code will run, default is Ascend")
+    parser.add_argument("--device_id", type=int, default=0, help="Device id, default: 0.")
+
+    args = parser.parse_args()
+
+    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, device_id=args.device_id)
+
+    prefix = "fsns.mindrecord"
+    mindrecord_dir = args.dataset_path
+    mindrecord_file = os.path.join(mindrecord_dir, prefix + "0")
+    print("mindrecord_file", mindrecord_file)
+    dataset = create_ocr_val_dataset(mindrecord_file, config.eval_batch_size)
+    data_loader = dataset.create_dict_iterator(num_epochs=1, output_numpy=True)
+    print("Dataset creation Done!")
+
+    # Network
+    network = AttentionOCRInfer(config.eval_batch_size,
+                                int(config.img_width / 4),
+                                config.encoder_hidden_size,
+                                config.decoder_hidden_size,
+                                config.decoder_output_size,
+                                config.max_length,
+                                config.dropout_p)
+
+    ckpt = load_checkpoint(args.checkpoint_path)
+    load_param_into_net(network, ckpt)
+    network.set_train(False)
+    print("Checkpoint loading Done!")
+
+    vocab, rev_vocab = initialize_vocabulary(config.vocab_path)
+    eos_id = config.characters_dictionary.get("eos_id")
+    sos_id = config.characters_dictionary.get("go_id")
+
+    num_correct_char = 0
+    num_total_char = 0
+    num_correct_word = 0
+    num_total_word = 0
+
+    correct_file = 'result_correct.txt'
+    incorrect_file = 'result_incorrect.txt'
+
+    with codecs.open(correct_file, 'w', encoding='utf-8') as fp_output_correct, \
+            codecs.open(incorrect_file, 'w', encoding='utf-8') as fp_output_incorrect:
+
+        for data in data_loader:
+            images = Tensor(data["image"])
+            decoder_inputs = Tensor(data["decoder_input"])
+            decoder_targets = Tensor(data["decoder_target"])
+
+            decoder_hidden = Tensor(np.zeros((1, config.eval_batch_size, config.decoder_hidden_size),
+                                             dtype=np.float16), mstype.float16)
+            decoder_input = Tensor((np.ones((config.eval_batch_size, 1))*sos_id).astype(np.int32))
+            encoder_outputs = network.encoder(images)
+            batch_decoded_label = []
+
+            for di in range(decoder_inputs.shape[1]):
+                decoder_output, decoder_hidden, _ = network.decoder(decoder_input, decoder_hidden, encoder_outputs)
+                topi = P.Argmax()(decoder_output)
+                ni = P.ExpandDims()(topi, 1)
+                decoder_input = ni
+                topi_id = topi.asnumpy()
+                batch_decoded_label.append(topi_id)
+
+            for b in range(config.eval_batch_size):
+                text = data["annotation"][b].decode("utf8")
+                text = text_standardization(text)
+                decoded_label = list(np.array(batch_decoded_label)[:, b])
+                decoded_words = []
+                for idx in decoded_label:
+                    if idx == eos_id:
+                        break
+                    else:
+                        decoded_words.append(rev_vocab[idx])
+                predict = text_standardization("".join(decoded_words))
+
+                if predict == text:
+                    num_correct_word += 1
+                    fp_output_correct.write('\t\t' + text + '\n')
+                    fp_output_correct.write('\t\t' + predict + '\n\n')
+                    print('correctly predicted : pred: {}, gt: {}'.format(predict, text))
+
+                else:
+                    fp_output_incorrect.write('\t\t' + text + '\n')
+                    fp_output_incorrect.write('\t\t' + predict + '\n\n')
+                    print('incorrectly predicted : pred: {}, gt: {}'.format(predict, text))
+
+                num_total_word += 1
+                num_correct_char += 2 * LCS_length(text, predict)
+                num_total_char += len(text) + len(predict)
+
+    print('\nnum of correct characters = %d' % (num_correct_char))
+    print('\nnum of total characters = %d' % (num_total_char))
+    print('\nnum of correct words = %d' % (num_correct_word))
+    print('\nnum of total words = %d' % (num_total_word))
+    print('\ncharacter precision = %f' % (float(num_correct_char) / num_total_char))
+    print('\nAnnotation precision = %f' % (float(num_correct_word) / num_total_word))
diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/scripts/run_distribute_train.sh b/model_zoo/official/cv/crnn_seq2seq_ocr/scripts/run_distribute_train.sh
new file mode 100755
index 0000000000..38fb92f024
--- /dev/null
+++ b/model_zoo/official/cv/crnn_seq2seq_ocr/scripts/run_distribute_train.sh
@@ -0,0 +1,66 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -ne 2 ]
+then
+    echo "Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]"
+exit 1
+fi
+
+get_real_path(){
+  if [ "${1:0:1}" == "/" ]; then
+    echo "$1"
+  else
+    echo "$(realpath -m $PWD/$1)"
+  fi
+}
+
+PATH1=$(get_real_path $1)
+echo $PATH1
+
+if [ ! -f $PATH1 ]
+then
+    echo "error: RANK_TABLE_FILE=$PATH1 is not a file"
+exit 1
+fi
+
+PATH2=$(get_real_path $2)
+echo $PATH2
+if [ ! -f $PATH2 ]
+then
+    echo "error: DATASET_PATH=$PATH2 is not a file"
+exit 1
+fi
+
+ulimit -u unlimited
+export DEVICE_NUM=8
+export RANK_SIZE=8
+export RANK_TABLE_FILE=$PATH1
+
+for((i=0; i<${DEVICE_NUM}; i++))
+do
+    export RANK_ID=$i
+    export DEVICE_ID=$i
+    rm -rf ./train_parallel$i
+    mkdir ./train_parallel$i
+    cp ../*.py ./train_parallel$i
+    cp -r ../src ./train_parallel$i
+    cd ./train_parallel$i || exit
+    echo "start training for rank $RANK_ID, device $DEVICE_ID"
+    env > env.log
+    python train.py --device_id=$DEVICE_ID --rank_id=$RANK_ID --is_distribute=1 --device_num=$DEVICE_NUM --mindrecord_file=$PATH2 &> log &
+    cd ..
+done
diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/scripts/run_eval_ascend.sh b/model_zoo/official/cv/crnn_seq2seq_ocr/scripts/run_eval_ascend.sh
new file mode 100755
index 0000000000..25d5489caf
--- /dev/null
+++ b/model_zoo/official/cv/crnn_seq2seq_ocr/scripts/run_eval_ascend.sh
@@ -0,0 +1,64 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# != 2 ]
+then
+    echo "Usage: bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]"
+exit 1
+fi
+
+get_real_path(){
+  if [ "${1:0:1}" == "/" ]; then
+    echo "$1"
+  else
+    echo "$(realpath -m $PWD/$1)"
+  fi
+}
+PATH1=$(get_real_path $1)
+PATH2=$(get_real_path $2)
+echo $PATH1
+echo $PATH2
+
+if [ ! -d $PATH1 ]
+then
+    echo "error: DATASET_PATH=$PATH1 is not a folder"
+exit 1
+fi
+
+if [ ! -f $PATH2 ]
+then
+    echo "error: CHECKPOINT_PATH=$PATH2 is not a file"
+exit 1
+fi
+
+export DEVICE_NUM=1
+export RANK_SIZE=$DEVICE_NUM
+export DEVICE_ID=0
+export RANK_ID=0
+
+if [ -d "eval" ];
+then
+    rm -rf ./eval
+fi
+mkdir ./eval
+cp ../*.py ./eval
+cp *.sh ./eval
+cp -r ../src ./eval
+cd ./eval || exit
+env > env.log
+echo "start eval for device $DEVICE_ID"
+python eval.py --device_target="Ascend" --device_id=$DEVICE_ID --dataset_path=$PATH1 --checkpoint_path=$PATH2 &> log &
+cd ..
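`eval.py`, launched by the script above, decodes each prediction until the first `eos_id` and then reports two metrics: annotation precision, the fraction of test samples whose decoded string exactly matches the ground truth, and character precision, computed as the sum of `2 * LCS(gt, pred)` divided by the sum of `len(gt) + len(pred)` over the test set. The sketch below reproduces that arithmetic on toy strings using the same two-row LCS dynamic program; the function names, the `eos_id=2` default and the sample data are illustrative assumptions, not part of the original code.

```python
# Hypothetical re-implementation of the metric arithmetic in eval.py.


def lcs_length(a, b):
    """Length of the longest common subsequence, keeping only two DP rows."""
    if not a or not b:
        return 0
    rows = [[0] * (len(b) + 1) for _ in range(2)]
    for i in range(1, len(a) + 1):
        cur, prev = rows[i % 2], rows[(i - 1) % 2]
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(cur[j - 1], prev[j])
    return rows[len(a) % 2][-1]


def decode_ids(ids, rev_vocab, eos_id=2):
    """Turn greedy output ids into a string, stopping at the first eos_id."""
    chars = []
    for idx in ids:
        if idx == eos_id:
            break
        chars.append(rev_vocab[idx])
    return "".join(chars)


def precisions(pairs):
    """pairs: list of (ground_truth, prediction). Returns (character, annotation) precision."""
    correct_char = total_char = correct_word = 0
    for gt, pred in pairs:
        correct_word += int(gt == pred)
        correct_char += 2 * lcs_length(gt, pred)
        total_char += len(gt) + len(pred)
    return correct_char / total_char, correct_word / len(pairs)


rev_vocab = {4: "R", 5: "u", 6: "e"}  # hypothetical toy vocabulary
pred = decode_ids([4, 5, 6, 2, 6, 6], rev_vocab)  # tokens after eos_id are ignored
char_p, word_p = precisions([("Rue", pred), ("Rue", "Ru")])
print(pred, round(char_p, 3), word_p)  # Rue 0.909 0.5
```

Because the numerator counts matched characters from both strings, a sample with a perfect prediction contributes equally to numerator and denominator, while an empty prediction contributes only to the denominator.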
diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/scripts/run_standalone_train.sh b/model_zoo/official/cv/crnn_seq2seq_ocr/scripts/run_standalone_train.sh
new file mode 100755
index 0000000000..abb0d4b834
--- /dev/null
+++ b/model_zoo/official/cv/crnn_seq2seq_ocr/scripts/run_standalone_train.sh
@@ -0,0 +1,58 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -ne 1 ]
+then
+    echo "Usage: bash run_standalone_train.sh [DATASET_PATH]"
+exit 1
+fi
+
+get_real_path(){
+  if [ "${1:0:1}" == "/" ]; then
+    echo "$1"
+  else
+    echo "$(realpath -m $PWD/$1)"
+  fi
+}
+
+PATH1=$(get_real_path $1)
+echo $PATH1
+
+if [ ! -f $PATH1 ]
+then
+    echo "error: DATASET_PATH=$PATH1 is not a file"
+exit 1
+fi
+
+ulimit -u unlimited
+export DEVICE_NUM=1
+export DEVICE_ID=1
+export RANK_ID=0
+export RANK_SIZE=1
+
+if [ -d "train" ];
+then
+    rm -rf ./train
+fi
+mkdir ./train
+cp ../*.py ./train
+cp *.sh ./train
+cp -r ../src ./train
+cd ./train || exit
+echo "start training for device $DEVICE_ID"
+env > env.log
+python train.py --device_id=$DEVICE_ID --mindrecord_file=$PATH1 --is_distributed=0 &> log &
+cd ..
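The training scripts drive `AttentionOCR` and `AttentionOCRWithLossCell` from `src/attention_ocr.py`. At each decoding step, `P.Select` feeds the ground-truth token back in wherever `teacher_force` is true and the argmax prediction otherwise; the loss cell then applies `NLLLoss` to every step, averages over the batch, and sums the per-step means with `P.AddN`. The NumPy sketch below reproduces that arithmetic. It assumes the decoder emits log-probabilities (the `Decoder` in `src/seq2seq.py` is not shown here), because `-sum(logits * one_hot)` is a negative log-likelihood only in that case; all function names and shapes are hypothetical.

```python
import numpy as np


def nll_per_step(log_probs, targets):
    """NLLLoss as written in attention_ocr.py: -sum(log_probs * one_hot(targets)) over classes."""
    one_hot = np.eye(log_probs.shape[-1])[targets]  # (batch, classes)
    return -(log_probs * one_hot).sum(axis=1)       # (batch,)


def sequence_loss(step_log_probs, targets):
    """Batch mean at each step, summed over steps (AttentionOCRWithLossCell).

    step_log_probs: (steps, batch, classes), matching AttentionOCR's concatenated outputs.
    targets: (batch, steps) integer token ids.
    """
    return sum(nll_per_step(step_log_probs[t], targets[:, t]).mean()
               for t in range(targets.shape[1]))


def next_decoder_input(teacher_force, target_t, log_probs_t):
    """Mirror P.Select: ground truth where teacher_force is True, argmax prediction otherwise."""
    predicted = log_probs_t.argmax(axis=-1)[:, None]  # (batch, 1)
    return np.where(teacher_force, target_t, predicted)


rng = np.random.default_rng(0)
batch, steps, classes = 2, 3, 5
logits = rng.normal(size=(steps, batch, classes))
# Assumed log-softmax output of the decoder
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
targets = rng.integers(0, classes, size=(batch, steps))
print(sequence_loss(log_probs, targets))
```

Summing rather than averaging over steps means the loss scale grows with the padded sequence length (`max_length` in `config.py`), which is worth keeping in mind when comparing loss values across configurations.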
diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/attention_ocr.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/attention_ocr.py
new file mode 100755
index 0000000000..0f6ed69d37
--- /dev/null
+++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/attention_ocr.py
@@ -0,0 +1,178 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+CRNN-Seq2Seq-OCR model.
+
+"""
+
+import numpy as np
+
+import mindspore as ms
+import mindspore.nn as nn
+import mindspore.ops.operations as P
+import mindspore.common.dtype as mstype
+from mindspore import context, Tensor
+from mindspore.nn.loss.loss import _Loss
+from mindspore.ops import composite as C
+from mindspore.ops import functional as F
+from mindspore.context import ParallelMode
+from mindspore.communication.management import get_group_size
+from mindspore.parallel._auto_parallel_context import auto_parallel_context
+
+from src.seq2seq import Encoder, Decoder
+
+
+class NLLLoss(_Loss):
+    def __init__(self, reduction='mean'):
+        super(NLLLoss, self).__init__(reduction)
+        self.one_hot = P.OneHot()
+        self.reduce_sum = P.ReduceSum()
+
+    def construct(self, logits, label):
+        label_one_hot = self.one_hot(label, F.shape(logits)[-1], F.scalar_to_array(1.0), F.scalar_to_array(0.0))
+        loss = self.reduce_sum(-1.0 * logits * label_one_hot, (1,))
+        return self.get_loss(loss)
+
+
+class AttentionOCRInfer(nn.Cell):
+    def __init__(self, batch_size, conv_out_dim, encoder_hidden_size, decoder_hidden_size,
+                 decoder_output_size, max_length, dropout_p=0.1):
+        super(AttentionOCRInfer, self).__init__()
+
+        self.encoder = Encoder(batch_size=batch_size,
+                               conv_out_dim=conv_out_dim,
+                               hidden_size=encoder_hidden_size)
+
+        self.decoder = Decoder(hidden_size=decoder_hidden_size,
+                               output_size=decoder_output_size,
+                               max_length=max_length,
+                               dropout_p=dropout_p)
+
+    def construct(self, img, decoder_input, decoder_hidden):
+        '''
+        get token output
+        '''
+        encoder_outputs = self.encoder(img)
+        decoder_output, decoder_hidden, decoder_attention = self.decoder(
+            decoder_input, decoder_hidden, encoder_outputs)
+        return decoder_output, decoder_hidden, decoder_attention
+
+
+class AttentionOCR(nn.Cell):
+    def __init__(self, batch_size, conv_out_dim, encoder_hidden_size, decoder_hidden_size,
+                 decoder_output_size, max_length, dropout_p=0.1):
+        super(AttentionOCR, self).__init__()
+        self.encoder = Encoder(batch_size=batch_size,
+                               conv_out_dim=conv_out_dim,
+                               hidden_size=encoder_hidden_size)
+        self.decoder = Decoder(hidden_size=decoder_hidden_size,
+                               output_size=decoder_output_size,
+                               max_length=max_length,
+                               dropout_p=dropout_p)
+        self.init_decoder_hidden = Tensor(np.zeros((1, batch_size, decoder_hidden_size),
+                                                   dtype=np.float16), mstype.float16)
+        self.shape = P.Shape()
+        self.split = P.Split(axis=1, output_num=max_length)
+        self.concat = P.Concat()
+        self.expand_dims = P.ExpandDims()
+        self.argmax = P.Argmax()
+        self.select = P.Select()
+
+    def construct(self, img, decoder_inputs, decoder_targets, teacher_force):
+        encoder_outputs = self.encoder(img)
+        _, text_len = self.shape(decoder_inputs)
+        decoder_outputs = ()
+        decoder_input_tuple = self.split(decoder_inputs)
+        decoder_target_tuple = self.split(decoder_targets)
+        decoder_input = decoder_input_tuple[0]
+        decoder_hidden = self.init_decoder_hidden
+
+        for i in range(text_len):
+            decoder_output, decoder_hidden, _ = self.decoder(decoder_input, decoder_hidden, encoder_outputs)
+            topi = self.argmax(decoder_output)
+            decoder_input_top = self.expand_dims(topi, 1)
+            decoder_input = self.select(teacher_force, decoder_target_tuple[i], decoder_input_top)
+            decoder_output = self.expand_dims(decoder_output, 0)
+            decoder_outputs += (decoder_output,)
+        outputs = self.concat(decoder_outputs)
+        return outputs
+
+
+class AttentionOCRWithLossCell(nn.Cell):
+    """AttentionOCR with Loss"""
+    def __init__(self, network, max_length):
+        super(AttentionOCRWithLossCell, self).__init__()
+        self.network = network
+        self.loss = NLLLoss()
+        self.shape = P.Shape()
+        self.add = P.AddN()
+        self.mean = P.ReduceMean()
+        self.split = P.Split(axis=0, output_num=max_length)
+        self.squeeze = P.Squeeze()
+        self.cast = P.Cast()
+
+    def construct(self, img, decoder_inputs, decoder_targets, teacher_force):
+        decoder_outputs = self.network(img, decoder_inputs, decoder_targets, teacher_force)
+        decoder_outputs = self.cast(decoder_outputs, mstype.float32)
+        _, text_len = self.shape(decoder_targets)
+        loss_total = ()
+        decoder_output_tuple = self.split(decoder_outputs)
+        for i in range(text_len):
+            loss = self.loss(self.squeeze(decoder_output_tuple[i]), decoder_targets[:, i])
+            loss = self.mean(loss)
+            loss_total += (loss,)
+        loss_output = self.add(loss_total)
+        return loss_output
+
+
+grad_scale = C.MultitypeFuncGraph("grad_scale")
+@grad_scale.register("Tensor", "Tensor")
+def tensor_grad_scale(scale, grad):
+    return grad * P.Reciprocal()(scale)
+
+
+class TrainingWrapper(nn.Cell):
+    def __init__(self, network, optimizer, sens=1.0):
+        super(TrainingWrapper, self).__init__(auto_prefix=False)
+        self.network = network
+        self.network.set_grad()
+        self.weights = ms.ParameterTuple(network.trainable_params())
+        self.optimizer = optimizer
+        self.grad = C.GradOperation(get_by_list=True, sens_param=True)
+        self.sens = sens
+        self.reducer_flag = False
+        self.grad_reducer = None
+
+        # Set parallel_mode
+        self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
+        if self.parallel_mode in [ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL]:
+            self.reducer_flag = True
+        if self.reducer_flag:
+            mean = context.get_auto_parallel_context("gradients_mean")
+            if auto_parallel_context().get_device_num_is_set():
+                degree = context.get_auto_parallel_context("device_num")
+            else:
+                degree = get_group_size()
+            self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
+        self.hyper_map = C.HyperMap()
+
+    def construct(self, *args):
+        weights = self.weights
+        loss = self.network(*args)
+        sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
+        grads = self.grad(self.network, weights)(*args, sens)
+        if self.reducer_flag:
+            grads = self.grad_reducer(grads)
+        return F.depend(loss, self.optimizer(grads))
diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/cnn.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/cnn.py
new file mode 100755
index 0000000000..55da29e62b
--- /dev/null
+++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/cnn.py
@@ -0,0 +1,195 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+CRNN-Seq2Seq-OCR CNN model.
+
+"""
+
+import math
+import numpy as np
+
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+
+
+def calculate_gain(nonlinearity, param=None):
+    """Return the recommended initialization gain for the given nonlinearity."""
+    linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']
+    res = 0
+    if nonlinearity in linear_fns or nonlinearity == 'sigmoid':
+        res = 1
+    elif nonlinearity == 'tanh':
+        res = 5.0 / 3
+    elif nonlinearity == 'relu':
+        res = math.sqrt(2.0)
+    elif nonlinearity == 'leaky_relu':
+        if param is None:
+            negative_slope = 0.01
+        elif not isinstance(param, bool) and isinstance(param, (int, float)):
+            negative_slope = param
+        else:
+            raise ValueError("negative_slope {} not a valid number".format(param))
+        res = math.sqrt(2.0 / (1 + negative_slope ** 2))
+    else:
+        raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))
+    return res
+
+
+def _calculate_fan_in_and_fan_out(tensor):
+    """Compute (fan_in, fan_out) from a weight shape such as (out, in, kh, kw)."""
+    dimensions = len(tensor)
+    if dimensions < 2:
+        raise ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions")
+    if dimensions == 2:
+        fan_in = tensor[1]
+        fan_out = tensor[0]
+    else:
+        num_input_fmaps = tensor[1]
+        num_output_fmaps = tensor[0]
+        receptive_field_size = 1
+        if dimensions > 2:
+            receptive_field_size = tensor[2] * tensor[3]
+        fan_in = num_input_fmaps * receptive_field_size
+        fan_out = num_output_fmaps * receptive_field_size
+    return fan_in, fan_out
+
+
+def _calculate_correct_fan(tensor, mode):
+    mode = mode.lower()
+    valid_modes = ['fan_in', 'fan_out']
+    if mode not in valid_modes:
+        raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes))
+    fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
+    return fan_in if mode == 'fan_in' else fan_out
+
+
+def kaiming_normal(inputs_shape, gain_param=0, mode='fan_in', nonlinearity='leaky_relu'):
+    fan = _calculate_correct_fan(inputs_shape, mode)
+    gain = calculate_gain(nonlinearity, gain_param)
+    std = gain / math.sqrt(fan)
+    return np.random.normal(0, std, size=inputs_shape).astype(np.float32)
+
+
+class ConvRelu(nn.Cell):
+    """
+    Convolution Layer followed by Relu Layer
+
+    """
+    def __init__(self, in_channels, out_channels, kernel_size=(1, 1), stride=1):
+        super(ConvRelu, self).__init__()
+        shape = (out_channels, in_channels, kernel_size[0], kernel_size[1])
+        self.conv = nn.Conv2d(in_channels,
+                              out_channels,
+                              kernel_size,
+                              stride,
+                              weight_init=Tensor(kaiming_normal(shape)))
+        self.relu = nn.ReLU()
+
+    def construct(self, x):
+        x = self.conv(x)
+        x = self.relu(x)
+        return x
+
+
+class ConvBNRelu(nn.Cell):
+    """
+    Convolution Layer followed by Batch Normalization and Relu Layer
+
+    """
+    def __init__(self, in_channels, out_channels, kernel_size=(1, 1), stride=1, pad_mode='same'):
+        super(ConvBNRelu, self).__init__()
+        shape = (out_channels, in_channels, kernel_size[0], kernel_size[1])
+        self.conv = nn.Conv2d(in_channels,
+                              out_channels,
+                              kernel_size, stride,
+                              pad_mode=pad_mode,
+                              weight_init=Tensor(kaiming_normal(shape)))
+        self.bn = nn.BatchNorm2d(out_channels)
+        self.relu = nn.ReLU()
+
+    def construct(self, x):
+        x = self.conv(x)
+        x = self.bn(x)
+        x = self.relu(x)
+        return x
+
+
+class CNN(nn.Cell):
+    """
+    CNN Class for OCR
+
+    """
+
+    def __init__(self, conv_out_dim):
+        super(CNN, self).__init__()
+        self.convRelu1 = ConvRelu(3, 64, (3, 3))
+        self.maxpool1 = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
+
+        self.convRelu2 = ConvRelu(64, 128, (3, 3))
+        self.maxpool2 = nn.MaxPool2d(kernel_size=(2, 1), stride=(2, 1))
+
+        self.convBNRelu1 = ConvBNRelu(128, 256, (3, 3))
+        self.convRelu3 = ConvRelu(256, 256, (3, 3))
+        self.maxpool3 = nn.MaxPool2d(kernel_size=(2, 1), stride=(2, 1))
+
+        self.convBNRelu2 = ConvBNRelu(256, 384, (3, 3))
+        self.convRelu4 = ConvRelu(384, 384, (3, 3))
+        self.maxpool4 = nn.MaxPool2d(kernel_size=(2, 1), stride=(2, 1))
+
+        self.convBNRelu3 = ConvBNRelu(384, 384, (3, 3))
+        self.convRelu5 = ConvRelu(384, 384, (3, 3))
+        self.maxpool5 = nn.MaxPool2d(kernel_size=(2, 1), stride=(2, 1))
+
+        self.convBNRelu4 = ConvBNRelu(384, 384, (3, 3))
+        self.convRelu6 = ConvRelu(384, 384, (3, 3))
+        self.maxpool6 = nn.MaxPool2d(kernel_size=(2, 1), stride=(2, 1))
+
+        self.pad = nn.Pad(paddings=((0, 0), (0, 0), (0, 0), (0, 1)))
+        self.convBNRelu5 = ConvBNRelu(384, conv_out_dim, (2, 2), pad_mode='valid')
+        self.dropout = nn.Dropout(keep_prob=0.5)
+
+        self.squeeze = P.Squeeze(2)
+        self.cast = P.Cast()
+
+    def construct(self, x):
+        x = self.convRelu1(x)
+        x = self.maxpool1(x)
+
+        x = self.convRelu2(x)
+        x = self.maxpool2(x)
+
+        x = self.convBNRelu1(x)
+        x = self.convRelu3(x)
+        x = self.maxpool3(x)
+
+        x = self.convBNRelu2(x)
+        x = self.convRelu4(x)
+        x = self.maxpool4(x)
+
+        x = self.convBNRelu3(x)
+        x = self.convRelu5(x)
+        x = self.maxpool5(x)
+
+        x = self.convBNRelu4(x)
+        x = self.convRelu6(x)
+        x = self.maxpool6(x)
+
+        x = self.pad(x)
+        x = self.convBNRelu5(x)
+        x = self.dropout(x)
+        x = self.squeeze(x)
+
+        return x
diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/config.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/config.py
new file mode 100755
index 0000000000..b660535ef6
--- /dev/null
+++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/config.py
@@ -0,0 +1,61 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================ +"""Config parameters for CRNN-Seq2Seq-OCR model.""" + +from easydict import EasyDict as ed + + +config = ed({ + + # dataset-related + "mindrecord_dir": "", + "data_root": "", + "annotation_file": "", + + "val_data_root": "", + "val_annotation_file": "", + "data_json": "", + + "characters_dictionary": {"pad_id": 0, "go_id": 1, "eos_id": 2, "unk_id": 3}, + "labels_not_use": [u'%#�?%', u'%#背景#%', u'%#不识�?%', u'#%不识�?#', u'%#模糊#%', u'%#模糊#%'], + "vocab_path": "./general_chars.txt", + + # model-related + "img_width": 512, + "img_height": 128, + "channel_size": 3, + "conv_out_dim": 384, + "encoder_hidden_size": 128, + "decoder_hidden_size": 128, + "decoder_output_size": 10000, # vocab_size is the decoder_output_size, characters_class+1, last 9999 is the space + "dropout_p": 0.1, + "max_length": 64, + "attn_num_layers": 1, + "teacher_force_ratio": 0.5, + + # optimizer-related + "lr": 0.0008, + "adam_beta1": 0.5, + "adam_beta2": 0.999, + "loss_scale": 1024, + + # train-related + "batch_size": 32, + "num_epochs": 20, + "keep_checkpoint_max": 20, + + # eval-related + "eval_batch_size": 32 +}) diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/create_mindrecord_files.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/create_mindrecord_files.py new file mode 100755 index 0000000000..644f739698 --- /dev/null +++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/create_mindrecord_files.py @@ -0,0 +1,245 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Create FSNS MindRecord files.""" + +import os +import numpy as np + +from mindspore.mindrecord import FileWriter + +from src.config import config +from src.utils import initialize_vocabulary + + +def serialize_annotation(img_path, lex, vocab): + """Map a text label to [GO] + vocab ids + [EOS]; return None if the label is unusable.""" + go_id = config.characters_dictionary.get("go_id") + eos_id = config.characters_dictionary.get("eos_id") + + word = [go_id] + for special_label in config.labels_not_use: + if lex == special_label: + if config.print_no_train_label: + print("label for image %s is a special label (%s), skipping ..." % (img_path, lex)) + return None + + for c in lex: + if c not in vocab: + return None + + c_idx = vocab.get(c) + word.append(c_idx) + + word.append(eos_id) + word = np.array(word, dtype=np.int32) + return word + +def create_fsns_label(image_dir, anno_file_dirs): + """Get image path and annotation.""" + + if not os.path.isdir(image_dir): + raise ValueError(f'Cannot find {image_dir} dataset path.') + + image_files_dict = {} + image_anno_dict = {} + images = [] + img_id = 0 + + for anno_file_dir in anno_file_dirs: + + with open(anno_file_dir, 'r') as f: + anno_file = f.readlines() + for line in anno_file: + + file_name = line.split('\t')[0] + labels = line.split('\t')[1].split('\n')[0] + image_path = os.path.join(image_dir, file_name) + + if not os.path.isfile(image_path): + print(f'Cannot find image {image_path} according to annotations.') + continue + + if labels: + images.append(img_id) + image_files_dict[img_id] = image_path + image_anno_dict[img_id] = labels + img_id += 1 + + return images, image_files_dict, image_anno_dict + + +def fsns_train_data_to_mindrecord(mindrecord_dir, prefix="data_ocr.mindrecord", file_num=8): + """Write the FSNS training split to MindRecord files.""" + anno_file_dirs = [config.train_annotation_file] + images, image_path_dict, image_anno_dict = 
create_fsns_label(image_dir=config.data_root, + anno_file_dirs=anno_file_dirs) + vocab, _ = initialize_vocabulary(config.vocab_path) + + data_schema = {"image": {"type": "bytes"}, + "label": {"type": "int32", "shape": [-1]}, + "decoder_input": {"type": "int32", "shape": [-1]}, + "decoder_mask": {"type": "int32", "shape": [-1]}, + "decoder_target": {"type": "int32", "shape": [-1]}, + "annotation": {"type": "string"}} + + mindrecord_path = os.path.join(mindrecord_dir, prefix) + + writer = FileWriter(mindrecord_path, file_num) + writer.add_schema(data_schema, "ocr") + + for img_id in images: + + image_path = image_path_dict[img_id] + annotation = image_anno_dict[img_id] + + label_max_len = config.max_text_len + text_max_len = config.max_text_len - 2 + + if len(annotation) > text_max_len: + continue + label = serialize_annotation(image_path, annotation, vocab) + + if label is None: + continue + + label_len = len(label) + decoder_input_len = label_max_len + + if label_len <= decoder_input_len: + label = np.concatenate((label, np.zeros(decoder_input_len - label_len, dtype=np.int32))) + one_mask_len = label_len - config.go_shift + target_weight = np.concatenate((np.ones(one_mask_len, dtype=np.float32), + np.zeros(decoder_input_len - one_mask_len, dtype=np.float32))) + else: + continue + + decoder_input = (np.array(label).T).astype(np.int32) + target_weight = (np.array(target_weight).T).astype(np.int32) + + if not len(decoder_input) == len(target_weight): + continue + + target = [decoder_input[i + 1] for i in range(len(decoder_input) - 1)] + target = (np.array(target)).astype(np.int32) + + + with open(image_path, 'rb') as f: + img = f.read() + + row = {"image": img, + "label": label, + "decoder_input": decoder_input, + "decoder_mask": target_weight, + "decoder_target": target, + "annotation": str(annotation)} + + writer.write_raw_data([row]) + writer.commit() + + +def fsns_val_data_to_mindrecord(mindrecord_dir, prefix="data_ocr.mindrecord", file_num=8): + + anno_file_dirs 
= [config.val_annotation_file] + images, image_path_dict, image_anno_dict = create_fsns_label(image_dir=config.val_data_root, + anno_file_dirs=anno_file_dirs) + vocab, _ = initialize_vocabulary(config.vocab_path) + + data_schema = {"image": {"type": "bytes"}, + "decoder_input": {"type": "int32", "shape": [-1]}, + "decoder_target": {"type": "int32", "shape": [-1]}, + "annotation": {"type": "string"}} + + mindrecord_path = os.path.join(mindrecord_dir, prefix) + + writer = FileWriter(mindrecord_path, file_num) + writer.add_schema(data_schema, "ocr") + + for img_id in images: + + image_path = image_path_dict[img_id] + annotation = image_anno_dict[img_id] + + label_max_len = config.max_text_len + text_max_len = config.max_text_len - 2 + + if len(annotation) > text_max_len: + continue + label = serialize_annotation(image_path, annotation, vocab) + + if label is None: + continue + + label_len = len(label) + decoder_input_len = label_max_len + + if label_len <= decoder_input_len: + label = np.concatenate((label, np.zeros(decoder_input_len - label_len, dtype=np.int32))) + else: + continue + + decoder_input = (np.array(label).T).astype(np.int32) + + target = [decoder_input[i + 1] for i in range(len(decoder_input) - 1)] + target = (np.array(target)).astype(np.int32) + + + with open(image_path, 'rb') as f: + img = f.read() + + row = {"image": img, + "decoder_input": decoder_input, + "decoder_target": target, + "annotation": str(annotation)} + + writer.write_raw_data([row]) + writer.commit() + +def create_mindrecord(dataset="fsns", prefix="fsns.mindrecord", is_training=True): + print("Start creating dataset!") + if is_training: + mindrecord_dir = os.path.join(config.mindrecord_dir, "train") + mindrecord_files = [os.path.join(mindrecord_dir, prefix + "0")] + + if not os.path.exists(mindrecord_files[0]): + if not os.path.isdir(mindrecord_dir): + os.makedirs(mindrecord_dir) + if dataset == "fsns": + if os.path.isdir(config.data_root): + print("Create FSNS Mindrecord files for train 
pipeline.") + fsns_train_data_to_mindrecord(mindrecord_dir=mindrecord_dir, prefix=prefix, file_num=8) + print("Create FSNS Mindrecord files for train pipeline Done, at {}".format(mindrecord_dir)) + else: + print("{} does not exist!".format(config.data_root)) + else: + print("{} dataset is not defined!".format(dataset)) + + if not is_training: + mindrecord_dir = os.path.join(config.mindrecord_dir, "val") + mindrecord_files = [os.path.join(mindrecord_dir, prefix + "0")] + + if not os.path.exists(mindrecord_files[0]): + if not os.path.isdir(mindrecord_dir): + os.makedirs(mindrecord_dir) + if dataset == "fsns": + if os.path.isdir(config.val_data_root): + print("Create FSNS Mindrecord files for val pipeline.") + fsns_val_data_to_mindrecord(mindrecord_dir=mindrecord_dir, prefix=prefix) + print("Create FSNS Mindrecord files for val pipeline Done, at {}".format(mindrecord_dir)) + else: + print("{} does not exist!".format(config.val_data_root)) + else: + print("{} dataset is not defined!".format(dataset)) + + return mindrecord_files diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/dataset.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/dataset.py new file mode 100755 index 0000000000..3b72459c7a --- /dev/null +++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/dataset.py @@ -0,0 +1,144 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# ============================================================================ +"""FSNS dataset""" + +import cv2 +import numpy as np +from PIL import Image + +import mindspore.dataset as de +import mindspore.dataset.vision.c_transforms as C +import mindspore.dataset.vision.py_transforms as P +import mindspore.dataset.transforms.c_transforms as ops +import mindspore.common.dtype as mstype + +from src.config import config + + +class AugmentationOps(): + def __init__(self, min_area_ratio=0.8, aspect_ratio_range=(0.8, 1.2), brightness=32./255., + contrast=0.5, saturation=0.5, hue=0.2, img_tile_shape=(150, 150)): + self.min_area_ratio = min_area_ratio + self.aspect_ratio_range = aspect_ratio_range + self.img_tile_shape = img_tile_shape + self.random_image_distortion_ops = P.RandomColorAdjust(brightness=brightness, + contrast=contrast, + saturation=saturation, + hue=hue) + + def __call__(self, img): + img_h = self.img_tile_shape[0] + img_w = self.img_tile_shape[1] + img_new = np.zeros([128, 512, 3]) + + for i in range(4): + img_tile = img[:, (i*150):((i+1)*150), :] + # Random crop cut from the street sign image, resized to the same size. + # Assures that the crop covers at least 0.8 area of the input image. + # Aspect ratio of cropped image is within [0.8,1.2] range. + h = img_h + 1 + w = img_w + 1 + + while (w >= img_w or h >= img_h): + aspect_ratio = np.random.uniform(self.aspect_ratio_range[0], + self.aspect_ratio_range[1]) + h_low = np.ceil(np.sqrt(self.min_area_ratio * img_h * img_w / aspect_ratio)) + h_high = np.floor(np.sqrt(img_h * img_w / aspect_ratio)) + h = np.random.randint(h_low, h_high) + w = int(h * aspect_ratio) + + y = np.random.randint(img_w - w) + x = np.random.randint(img_h - h) + img_tile = img_tile[x:(x+h), y:(y+w), :] + # Randomly chooses one of the 4 interpolation resize methods. 
+ interpolation = np.random.choice([cv2.INTER_LINEAR, + cv2.INTER_CUBIC, + cv2.INTER_AREA, + cv2.INTER_NEAREST]) + img_tile = cv2.resize(img_tile, (128, 128), interpolation=interpolation) + # Random color distortion ops. + img_tile_pil = Image.fromarray(img_tile) + img_tile_pil = self.random_image_distortion_ops(img_tile_pil) + img_tile = np.array(img_tile_pil) + img_new[:, (i*128):((i+1)*128), :] = img_tile + + img_new = 2 * (img_new / 255.) - 1 + return img_new + + +class ImageResizeWithRescale(): + def __init__(self, standard_img_height, standard_img_width, channel_size=3): + self.standard_img_height = standard_img_height + self.standard_img_width = standard_img_width + self.channel_size = channel_size + + def __call__(self, img): + img = cv2.resize(img, (self.standard_img_width, self.standard_img_height)) + img = 2 * (img / 255.) - 1 + return img + + +def random_teacher_force(images, source_ids, target_ids): + teacher_force = np.random.random() < config.teacher_force_ratio + teacher_force_array = np.array([teacher_force], dtype=bool) + return images, source_ids, target_ids, teacher_force_array + + +def create_ocr_train_dataset(mindrecord_file, batch_size=32, rank_size=1, rank_id=0, + is_training=True, num_parallel_workers=4, use_multiprocessing=True): + ds = de.MindDataset(mindrecord_file, + columns_list=["image", "decoder_input", "decoder_target"], + num_shards=rank_size, + shard_id=rank_id, + num_parallel_workers=num_parallel_workers, + shuffle=is_training) + aug_ops = AugmentationOps() + transforms = [C.Decode(), + aug_ops, + C.HWC2CHW()] + ds = ds.map(operations=transforms, input_columns=["image"], python_multiprocessing=use_multiprocessing, + num_parallel_workers=num_parallel_workers) + ds = ds.map(operations=ops.PadEnd([config.max_length], 0), input_columns=["decoder_target"]) + ds = ds.map(operations=random_teacher_force, input_columns=["image", "decoder_input", "decoder_target"], + output_columns=["image", "decoder_input", "decoder_target", 
"teacher_force"], + column_order=["image", "decoder_input", "decoder_target", "teacher_force"]) + type_cast_op_bool = ops.TypeCast(mstype.bool_) + ds = ds.map(operations=type_cast_op_bool, input_columns="teacher_force") + print("Train dataset size= %s" % (int(ds.get_dataset_size()))) + ds = ds.batch(batch_size, drop_remainder=True) + return ds + + +def create_ocr_val_dataset(mindrecord_file, batch_size=32, rank_size=1, rank_id=0, + num_parallel_workers=4, use_multiprocessing=True): + ds = de.MindDataset(mindrecord_file, + columns_list=["image", "annotation", "decoder_input", "decoder_target"], + num_shards=rank_size, + shard_id=rank_id, + num_parallel_workers=num_parallel_workers, + shuffle=False) + resize_rescale_op = ImageResizeWithRescale(standard_img_height=128, standard_img_width=512) + transforms = [C.Decode(), + resize_rescale_op, + C.HWC2CHW()] + ds = ds.map(operations=transforms, input_columns=["image"], python_multiprocessing=use_multiprocessing, + num_parallel_workers=num_parallel_workers) + ds = ds.map(operations=ops.PadEnd([config.max_length], 0), input_columns=["decoder_target"], + python_multiprocessing=use_multiprocessing, num_parallel_workers=8) + ds = ds.map(operations=ops.PadEnd([config.max_length], 0), input_columns=["decoder_input"], + python_multiprocessing=use_multiprocessing, num_parallel_workers=8) + ds = ds.batch(batch_size, drop_remainder=True) + print("Val dataset size= %s" % (str(int(ds.get_dataset_size())*batch_size))) + return ds diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/gru.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/gru.py new file mode 100644 index 0000000000..4d83a2d3fe --- /dev/null +++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/gru.py @@ -0,0 +1,55 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +GRU cell +""" +import mindspore.nn as nn +import mindspore.ops.operations as P +import mindspore.common.dtype as mstype +from src.weight_init import gru_default_state + + +class GRU(nn.Cell): + ''' + GRU model + + Args: + input_size: The number of expected features in the input + hidden_size: The number of features in the hidden state + ''' + def __init__(self, input_size, hidden_size): + super(GRU, self).__init__() + self.input_size = input_size + self.hidden_size = hidden_size + self.weight_i, self.weight_h, self.bias_i, self.bias_h = gru_default_state(self.input_size, self.hidden_size) + self.rnn = P.DynamicGRUV2() + self.cast = P.Cast() + + def construct(self, x, h): + ''' + GRU construction + + Args: + x(Tensor): GRU input + h(Tensor): GRU hidden state + + Returns: + output(Tensor): rnn output + hidden(Tensor): hidden state + ''' + x = self.cast(x, mstype.float16) + h = self.cast(h, mstype.float16) + y1, h1, _, _, _, _ = self.rnn(x, self.weight_i, self.weight_h, self.bias_i, self.bias_h, None, h) + return y1, h1 diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/logger.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/logger.py new file mode 100755 index 0000000000..a9deb929ad --- /dev/null +++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/logger.py @@ -0,0 +1,80 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Custom Logger.""" +import os +import sys +import logging +from datetime import datetime + + +class LOGGER(logging.Logger): + """ + Logger. + + Args: + logger_name: String. Logger name. + rank: Integer. Rank id. + """ + def __init__(self, logger_name, rank=0): + super(LOGGER, self).__init__(logger_name) + self.rank = rank + if rank % 8 == 0: + console = logging.StreamHandler(sys.stdout) + console.setLevel(logging.INFO) + formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') + console.setFormatter(formatter) + self.addHandler(console) + + def setup_logging_file(self, log_dir, rank=0): + """Setup logging file.""" + self.rank = rank + if not os.path.exists(log_dir): + os.makedirs(log_dir, exist_ok=True) + log_name = datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S') + '_rank_{}.log'.format(rank) + self.log_fn = os.path.join(log_dir, log_name) + fh = logging.FileHandler(self.log_fn) + fh.setLevel(logging.INFO) + formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') + fh.setFormatter(formatter) + self.addHandler(fh) + + def info(self, msg, *args, **kwargs): + if self.isEnabledFor(logging.INFO): + self._log(logging.INFO, msg, args, **kwargs) + + def save_args(self, args): + self.info('Args:') + args_dict = vars(args) + for key in args_dict.keys(): + self.info('--> %s: %s', key, args_dict[key]) + self.info('') + + def important_info(self, msg, *args, **kwargs): + if self.isEnabledFor(logging.INFO) and self.rank == 0: + line_width = 2 + important_msg 
= '\n' + important_msg += ('*'*70 + '\n')*line_width + important_msg += ('*'*line_width + '\n')*2 + important_msg += '*'*line_width + ' '*8 + msg + '\n' + important_msg += ('*'*line_width + '\n')*2 + important_msg += ('*'*70 + '\n')*line_width + self.info(important_msg, *args, **kwargs) + + +def get_logger(path, rank): + """Get Logger.""" + logger = LOGGER('crnn-seq2seq-ocr', rank) + logger.setup_logging_file(path, rank) + return logger diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/lstm.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/lstm.py new file mode 100644 index 0000000000..00f9c0db7d --- /dev/null +++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/lstm.py @@ -0,0 +1,196 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""lstm""" +import math +import numpy as np +from mindspore import nn, context, Tensor, Parameter, ParameterTuple +import mindspore.common.dtype as mstype +from mindspore.common.initializer import initializer +from mindspore.ops.primitive import constexpr +from mindspore.ops import operations as P + + +@constexpr +def _create_sequence_length(shape): + num_step, batch_size, _ = shape + sequence_length = Tensor(np.ones(batch_size, np.int32) * num_step, mstype.int32) + return sequence_length + +class LSTM(nn.Cell): + """ + Stacked LSTM (Long Short-Term Memory) layers.
+ + Args: + input_size (int): Number of features of input. + hidden_size (int): Number of features of hidden layer. + num_layers (int): Number of layers of stacked LSTM. Default: 1. + has_bias (bool): Whether the cell has bias `b_ih` and `b_hh`. Default: True. + batch_first (bool): Specifies whether the first dimension of input is batch_size. Default: False. + dropout (float, int): If not 0, append `Dropout` layer on the outputs of each + LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0]. + bidirectional (bool): Specifies whether it is a bidirectional LSTM. Default: False. + + Inputs: + - **input** (Tensor) - Tensor of shape (seq_len, batch_size, `input_size`) or + (batch_size, seq_len, `input_size`). + - **hx** (tuple) - A tuple of two Tensors (h_0, c_0) both of data type mindspore.float32 or + mindspore.float16 and shape (num_directions * `num_layers`, batch_size, `hidden_size`). + Data type of `hx` must be the same as `input`. + + Outputs: + Tuple, a tuple containing (`output`, (`h_n`, `c_n`)). + + - **output** (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * `hidden_size`). + - **hx_n** (tuple) - A tuple of two Tensors (h_n, c_n) both of shape + (num_directions * `num_layers`, batch_size, `hidden_size`).
+ """ + + def __init__(self, + input_size, + hidden_size, + num_layers=1, + has_bias=True, + batch_first=False, + dropout=0, + bidirectional=False): + super(LSTM, self).__init__() + self.is_ascend = context.get_context("device_target") == "Ascend" + + self.batch_first = batch_first + self.transpose = P.Transpose() + self.num_layers = num_layers + self.bidirectional = bidirectional + self.dropout = dropout + self.lstm = P.LSTM(input_size=input_size, + hidden_size=hidden_size, + num_layers=num_layers, + has_bias=has_bias, + bidirectional=bidirectional, + dropout=float(dropout)) + + weight_size = 0 + gate_size = 4 * hidden_size + stdv = 1 / math.sqrt(hidden_size) + num_directions = 2 if bidirectional else 1 + if self.is_ascend: + self.reverse_seq = P.ReverseSequence(batch_dim=1, seq_dim=0) + self.concat = P.Concat(axis=0) + self.concat_2dim = P.Concat(axis=2) + self.cast = P.Cast() + self.shape = P.Shape() + if dropout < 0 or dropout > 1: + raise ValueError("For LSTM, dropout must be a number in range [0, 1], but got {}".format(dropout)) + if dropout == 1: + self.dropout_op = P.ZerosLike() + else: + self.dropout_op = nn.Dropout(float(1 - dropout)) + b0 = np.zeros(gate_size, dtype=np.float32) + self.w_list = [] + self.b_list = [] + self.rnns_fw = P.DynamicRNN(forget_bias=0.0) + self.rnns_bw = P.DynamicRNN(forget_bias=0.0) + + for layer in range(num_layers): + w_shape = input_size if layer == 0 else (num_directions * hidden_size) + w_np = np.random.uniform(-stdv, stdv, (w_shape + hidden_size, gate_size)).astype(np.float32) + self.w_list.append(Parameter( + initializer(Tensor(w_np), [w_shape + hidden_size, gate_size]), name='weight_fw' + str(layer))) + if has_bias: + b_np = np.random.uniform(-stdv, stdv, gate_size).astype(np.float32) + self.b_list.append(Parameter(initializer(Tensor(b_np), [gate_size]), name='bias_fw' + str(layer))) + else: + self.b_list.append(Parameter(initializer(Tensor(b0), [gate_size]), name='bias_fw' + str(layer))) + if bidirectional: + w_bw_np = 
np.random.uniform(-stdv, stdv, (w_shape + hidden_size, gate_size)).astype(np.float32) + self.w_list.append(Parameter(initializer(Tensor(w_bw_np), [w_shape + hidden_size, gate_size]), + name='weight_bw' + str(layer))) + b_bw_np = np.random.uniform(-stdv, stdv, (4 * hidden_size)).astype(np.float32) if has_bias else b0 + self.b_list.append(Parameter(initializer(Tensor(b_bw_np), [gate_size]), + name='bias_bw' + str(layer))) + self.w_list = ParameterTuple(self.w_list) + self.b_list = ParameterTuple(self.b_list) + else: + for layer in range(num_layers): + input_layer_size = input_size if layer == 0 else hidden_size * num_directions + increment_size = gate_size * input_layer_size + increment_size += gate_size * hidden_size + if has_bias: + increment_size += 2 * gate_size + weight_size += increment_size * num_directions + w_np = np.random.uniform(-stdv, stdv, (weight_size, 1, 1)).astype(np.float32) + self.weight = Parameter(initializer(Tensor(w_np), [weight_size, 1, 1]), name='weight') + + def _stacked_bi_dynamic_rnn(self, x, init_h, init_c, weight, bias): + """stacked bidirectional dynamic_rnn""" + x_shape = self.shape(x) + sequence_length = _create_sequence_length(x_shape) + pre_layer = x + hn = () + cn = () + output = x + for i in range(self.num_layers): + offset = i * 2 + weight_fw, weight_bw = weight[offset], weight[offset + 1] + bias_fw, bias_bw = bias[offset], bias[offset + 1] + init_h_fw, init_h_bw = init_h[offset:offset + 1, :, :], init_h[offset + 1:offset + 2, :, :] + init_c_fw, init_c_bw = init_c[offset:offset + 1, :, :], init_c[offset + 1:offset + 2, :, :] + bw_x = self.reverse_seq(pre_layer, sequence_length) + y, h, c, _, _, _, _, _ = self.rnns_fw(pre_layer, weight_fw, bias_fw, None, init_h_fw, init_c_fw) + y_bw, h_bw, c_bw, _, _, _, _, _ = self.rnns_bw(bw_x, weight_bw, bias_bw, None, init_h_bw, init_c_bw) + y_bw = self.reverse_seq(y_bw, sequence_length) + output = self.concat_2dim((y, y_bw)) + pre_layer = self.dropout_op(output) if self.dropout else output + 
hn += (h[-1:, :, :],) + hn += (h_bw[-1:, :, :],) + cn += (c[-1:, :, :],) + cn += (c_bw[-1:, :, :],) + status_h = self.concat(hn) + status_c = self.concat(cn) + return output, status_h, status_c + + def _stacked_dynamic_rnn(self, x, init_h, init_c, weight, bias): + """stacked multi-layer dynamic_rnn""" + pre_layer = x + hn = () + cn = () + y = 0 + for i in range(self.num_layers): + weight_fw, bias_fw = weight[i], bias[i] + init_h_fw, init_c_fw = init_h[i:i + 1, :, :], init_c[i:i + 1, :, :] + y, h, c, _, _, _, _, _ = self.rnns_fw(pre_layer, weight_fw, bias_fw, None, init_h_fw, init_c_fw) + pre_layer = self.dropout_op(y) if self.dropout else y + hn += (h[-1:, :, :],) + cn += (c[-1:, :, :],) + status_h = self.concat(hn) + status_c = self.concat(cn) + return y, status_h, status_c + + def construct(self, x, hx): + if self.batch_first: + x = self.transpose(x, (1, 0, 2)) + h, c = hx + if self.is_ascend: + x = self.cast(x, mstype.float16) + h = self.cast(h, mstype.float16) + c = self.cast(c, mstype.float16) + if self.bidirectional: + x, h, c = self._stacked_bi_dynamic_rnn(x, h, c, self.w_list, self.b_list) + else: + x, h, c = self._stacked_dynamic_rnn(x, h, c, self.w_list, self.b_list) + else: + x, h, c, _, _ = self.lstm(x, h, c, self.weight) + if self.batch_first: + x = self.transpose(x, (1, 0, 2)) + return x, (h, c) diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/seq2seq.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/seq2seq.py new file mode 100755 index 0000000000..4bd4dc7a95 --- /dev/null +++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/seq2seq.py @@ -0,0 +1,165 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +Seq2Seq_OCR model. + +""" +import mindspore.nn as nn +import mindspore.ops.operations as P +import mindspore.common.dtype as mstype + +from src.cnn import CNN +from src.gru import GRU +from src.lstm import LSTM +from src.weight_init import lstm_default_state + + +class BidirectionalLSTM(nn.Cell): + """Bidirectional LSTM with a Dense layer + + Args: + batch_size(int): batch size of input data + input_size(int): the number of features at each input time step + hidden_size(int): the hidden size of LSTM layers + output_size(int): the output size of the dense layer + """ + def __init__(self, batch_size, input_size, hidden_size, output_size): + super(BidirectionalLSTM, self).__init__() + self.rnn = LSTM(input_size=input_size, hidden_size=hidden_size, bidirectional=True).to_float(mstype.float16) + self.h, self.c = lstm_default_state(batch_size, hidden_size, bidirectional=True) + self.embedding = nn.Dense(hidden_size * 2, output_size).to_float(mstype.float16) + self.shape = P.Shape() + self.reshape = P.Reshape() + self.cast = P.Cast() + + def construct(self, inputs): + inputs = self.cast(inputs, mstype.float16) + recurrent, _ = self.rnn(inputs, (self.h, self.c)) + T, b, h = self.shape(recurrent) + t_rec = self.reshape(recurrent, (T * b, h)) + output = self.embedding(t_rec) + output = self.reshape(output, (T, b, -1)) + return output + + +class AttnDecoderRNN(nn.Cell): + """Attention Decoder Structure with a one-layer GRU + + Args: + hidden_size(int): the hidden size + output_size(int): the output size + 
max_length(int): max time step of the decoder + dropout_p(float): dropout probability, default is 0.1 + """ + def __init__(self, hidden_size, output_size, max_length, dropout_p=0.1): + super(AttnDecoderRNN, self).__init__() + self.hidden_size = hidden_size + self.output_size = output_size + self.dropout_p = dropout_p + self.max_length = max_length + self.embedding = nn.Embedding(self.output_size, self.hidden_size) + self.attn = nn.Dense(in_channels=self.hidden_size * 2, out_channels=self.max_length).to_float(mstype.float16) + self.attn_combine = nn.Dense(in_channels=self.hidden_size * 2, + out_channels=self.hidden_size).to_float(mstype.float16) + self.dropout = nn.Dropout(keep_prob=1.0 - self.dropout_p) + self.gru = GRU(hidden_size, hidden_size).to_float(mstype.float16) + self.out = nn.Dense(in_channels=self.hidden_size, out_channels=self.output_size).to_float(mstype.float16) + self.transpose = P.Transpose() + self.concat = P.Concat(axis=2) + self.concat1 = P.Concat(axis=1) + self.softmax = P.Softmax(axis=1) + self.relu = P.ReLU() + self.log_softmax = P.LogSoftmax(axis=1) + self.bmm = P.BatchMatMul() + self.unsqueeze = P.ExpandDims() + self.squeeze = P.Squeeze(1) + self.squeeze1 = P.Squeeze(0) + self.cast = P.Cast() + + def construct(self, inputs, hidden, encoder_outputs): + embedded = self.embedding(inputs) + embedded = self.transpose(embedded, (1, 0, 2)) + embedded = self.dropout(embedded) + embedded = self.cast(embedded, mstype.float16) + + embedded_concat = self.concat((embedded, hidden)) + embedded_concat = self.squeeze1(embedded_concat) + attn_weights = self.softmax(self.attn(embedded_concat)) + attn_weights = self.unsqueeze(attn_weights, 1) + perm_encoder_outputs = self.transpose(encoder_outputs, (1, 0, 2)) + attn_applied = self.bmm(attn_weights, perm_encoder_outputs) + attn_applied = self.squeeze(attn_applied) + embedded_squeeze = self.squeeze1(embedded) + + output = self.concat1((embedded_squeeze, attn_applied)) + output = self.attn_combine(output) + 
output = self.unsqueeze(output, 0) + output = self.relu(output) + + gru_hidden = self.squeeze1(hidden) + output, hidden, _, _, _, _ = self.gru(output, gru_hidden) + output = self.squeeze1(output) + output = self.log_softmax(self.out(output)) + + return output, hidden, attn_weights + + +class Encoder(nn.Cell): + """Encoder with a CNN and two BidirectionalLSTM layers + + Args: + batch_size(int): batch size of input data + conv_out_dim(int): the output dimension of the cnn layer + hidden_size(int): the hidden size of LSTM layers + """ + def __init__(self, batch_size, conv_out_dim, hidden_size): + super(Encoder, self).__init__() + self.cnn = CNN(int(conv_out_dim/4)) + self.lstm1 = BidirectionalLSTM(batch_size, conv_out_dim, hidden_size, hidden_size).to_float(mstype.float16) + self.lstm2 = BidirectionalLSTM(batch_size, hidden_size, hidden_size, hidden_size).to_float(mstype.float16) + self.transpose = P.Transpose() + self.cast = P.Cast() + self.split = P.Split(axis=3, output_num=4) + self.concat = P.Concat(axis=1) + + def construct(self, inputs): + inputs = self.cast(inputs, mstype.float32) + (x1, x2, x3, x4) = self.split(inputs) + conv1 = self.cnn(x1) + conv2 = self.cnn(x2) + conv3 = self.cnn(x3) + conv4 = self.cnn(x4) + conv = self.concat((conv1, conv2, conv3, conv4)) + conv = self.transpose(conv, (2, 0, 1)) + output = self.lstm1(conv) + output = self.lstm2(output) + return output + + +class Decoder(nn.Cell): + """Decoder wrapping AttnDecoderRNN + + Args: + hidden_size(int): the hidden size + output_size(int): the output size + max_length(int): max time step of the decoder + dropout_p(float): dropout probability, default is 0.1 + """ + def __init__(self, hidden_size, output_size, max_length, dropout_p=0.1): + super(Decoder, self).__init__() + self.decoder = AttnDecoderRNN(hidden_size, output_size, max_length, dropout_p) + + def construct(self, inputs, hidden, encoder_outputs): + return self.decoder(inputs, hidden, encoder_outputs) diff --git
a/model_zoo/official/cv/crnn_seq2seq_ocr/src/utils.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/utils.py new file mode 100755 index 0000000000..270da4fdbc --- /dev/null +++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/utils.py @@ -0,0 +1,51 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Utility functions.""" + +from __future__ import absolute_import, division, print_function, unicode_literals + +import os +import codecs +import logging + + +def initialize_vocabulary(vocabulary_path): + """ + Initialize the vocabulary from a file.
+ Assumes the vocabulary file stores one item per line. + """ + characters_class = 9999 + + if os.path.exists(vocabulary_path): + rev_vocab = [] + with codecs.open(vocabulary_path, 'r', encoding='utf-8') as voc_file: + rev_vocab = [line.strip() for line in voc_file] + + vocab = {x: y for (y, x) in enumerate(rev_vocab)} + + reserved_char_size = characters_class - len(rev_vocab) + if reserved_char_size < 0: + raise ValueError("Number of characters in vocabulary is larger than characters_class") + + for _ in range(reserved_char_size): + rev_vocab.append('') + + # put space at the last position + vocab[' '] = len(rev_vocab) + rev_vocab.append(' ') + logging.info("Initializing vocabulary ends: %s", vocabulary_path) + return vocab, rev_vocab + + raise ValueError("Vocabulary file not found: %s" % vocabulary_path) diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/src/weight_init.py b/model_zoo/official/cv/crnn_seq2seq_ocr/src/weight_init.py new file mode 100644 index 0000000000..5b8d3b8a2e --- /dev/null +++ b/model_zoo/official/cv/crnn_seq2seq_ocr/src/weight_init.py @@ -0,0 +1,41 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# ============================================================================ +""" +Weight initialization. +""" +import math +import numpy as np +from mindspore import Tensor, Parameter + + +def lstm_default_state(batch_size, hidden_size, bidirectional, num_layers=1): + """Initialize zero-valued default hidden and cell states for an LSTM.""" + num_directions = 2 if bidirectional else 1 + h = Tensor(np.zeros((num_layers * num_directions, batch_size, hidden_size)).astype(np.float32)) + c = Tensor(np.zeros((num_layers * num_directions, batch_size, hidden_size)).astype(np.float32)) + return h, c + + +def gru_default_state(input_size, hidden_size): + stdv = 1 / math.sqrt(hidden_size) + weight_i = Parameter(Tensor(np.random.uniform(-stdv, stdv, (input_size, 3*hidden_size)).astype(np.float32)), + name='weight_i') + weight_h = Parameter(Tensor(np.random.uniform(-stdv, stdv, (hidden_size, 3*hidden_size)).astype(np.float32)), + name='weight_h') + bias_i = Parameter(Tensor(np.random.uniform(-stdv, stdv, (3*hidden_size)).astype(np.float32)), + name='bias_i') + bias_h = Parameter(Tensor(np.random.uniform(-stdv, stdv, (3*hidden_size)).astype(np.float32)), + name='bias_h') + return weight_i, weight_h, bias_i, bias_h diff --git a/model_zoo/official/cv/crnn_seq2seq_ocr/train.py b/model_zoo/official/cv/crnn_seq2seq_ocr/train.py new file mode 100755 index 0000000000..d83f2b5e25 --- /dev/null +++ b/model_zoo/official/cv/crnn_seq2seq_ocr/train.py @@ -0,0 +1,158 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +CRNN-Seq2Seq-OCR train. + +""" + +import os +import argparse +import datetime + +import mindspore.nn as nn +import mindspore.common.dtype as mstype +from mindspore.train.model import Model +from mindspore.context import ParallelMode +from mindspore.common import set_seed +from mindspore import Tensor +from mindspore import context +from mindspore.communication.management import init +from mindspore.train.callback import ModelCheckpoint +from mindspore.train.callback import CheckpointConfig, LossMonitor, TimeMonitor + +from mindspore.train.serialization import load_checkpoint, load_param_into_net + +from src.config import config +from src.dataset import create_ocr_train_dataset +from src.logger import get_logger +from src.attention_ocr import AttentionOCR, AttentionOCRWithLossCell, TrainingWrapper + + +set_seed(1) + + +def parse_args(): + """Parse train arguments.""" + parser = argparse.ArgumentParser('mindspore CRNN-Seq2Seq-OCR training') + + # device related + parser.add_argument("--device_target", type=str, default="Ascend", + help="Device on which the code runs. Default: Ascend") + parser.add_argument("--device_id", type=int, default=0, help="Device id. Default: 0") + + # distributed related + parser.add_argument('--is_distributed', type=int, default=0, + help='Whether to run distributed training, 1 for yes, 0 for no. Default: 0') + parser.add_argument('--rank_id', type=int, default=0, help='Local rank in distributed training. Default: 0') + parser.add_argument('--device_num', type=int, default=1, help='Number of devices (world size). Default: 1') + + # dataset related + parser.add_argument('--mindrecord_file', type=str, default='', help='Path to the training MindRecord file.') + + # logging related + parser.add_argument('--log_interval', type=int, default=100, help='Logging interval steps.
Default: 100') + parser.add_argument('--ckpt_path', type=str, default='outputs/', help='Checkpoint save location. Default: outputs/') + parser.add_argument('--pre_checkpoint_path', type=str, default='', help='Pretrained checkpoint to load before training. Default: none') + parser.add_argument('--ckpt_interval', type=int, default=None, help='Checkpoint save interval in steps. Default: None') + + parser.add_argument('--is_save_on_master', type=int, default=0, + help='Save checkpoints on the master rank only or on all ranks, 1 for master, 0 for all ranks. Default: 0') + + args, _ = parser.parse_known_args() + + # logger + args.outputs_dir = os.path.join(args.ckpt_path, + datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S')) + + return args + + +def train(): + """Train function.""" + args = parse_args() + context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, device_id=args.device_id) + + if args.is_distributed: + rank = args.rank_id + device_num = args.device_num + context.set_auto_parallel_context(device_num=device_num, + parallel_mode=ParallelMode.DATA_PARALLEL, + gradients_mean=True) + init() + else: + rank = 0 + device_num = 1 + + # Logger + args.logger = get_logger(args.outputs_dir, rank) + args.rank_save_ckpt_flag = 0 + if args.is_save_on_master: + if rank == 0: + args.rank_save_ckpt_flag = 1 + else: + args.rank_save_ckpt_flag = 1 + + # DATASET + dataset = create_ocr_train_dataset(args.mindrecord_file, + config.batch_size, + rank_size=device_num, + rank_id=rank) + args.steps_per_epoch = dataset.get_dataset_size() + args.logger.info('Finished loading dataset') + + if not args.ckpt_interval: + args.ckpt_interval = args.steps_per_epoch + args.logger.save_args(args) + + network = AttentionOCR(config.batch_size, + int(config.img_width / 4), + config.encoder_hidden_size, + config.decoder_hidden_size, + config.decoder_output_size, + config.max_length, + config.dropout_p) + + if args.pre_checkpoint_path: + param_dict = load_checkpoint(args.pre_checkpoint_path) + load_param_into_net(network, param_dict) +
network = AttentionOCRWithLossCell(network, config.max_length) + + lr = Tensor(config.lr, mstype.float32) + opt = nn.Adam(network.trainable_params(), lr, beta1=config.adam_beta1, beta2=config.adam_beta2, + loss_scale=config.loss_scale) + + network = TrainingWrapper(network, opt, sens=config.loss_scale) + + args.logger.info('Finished building network') + + callback = [TimeMonitor(data_size=1), LossMonitor()] + if args.rank_save_ckpt_flag: + ckpt_config = CheckpointConfig(save_checkpoint_steps=args.ckpt_interval, + keep_checkpoint_max=config.keep_checkpoint_max) + save_ckpt_path = os.path.join(args.outputs_dir, 'ckpt_' + str(rank) + '/') + ckpt_cb = ModelCheckpoint(config=ckpt_config, + directory=save_ckpt_path, + prefix="crnn_seq2seq_ocr") + callback.append(ckpt_cb) + + model = Model(network) + model.train(config.num_epochs, dataset, callbacks=callback, dataset_sink_mode=False) + + args.logger.info('==========Training Done===============') + + +if __name__ == "__main__": + train()
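The attention step in `AttnDecoderRNN.construct` (src/attention_ocr.py) is easier to follow with the tensor shapes written out. The following is a minimal NumPy sketch of that shape flow, assuming time-major tensors as produced by `Encoder`. `attention_step`, `w_attn`, and `b_attn` are hypothetical stand-ins for the `attn` Dense layer. Dropout, the float16 casts, `attn_combine`, and the GRU are left out. Because `attn` emits exactly `max_length` scores that are batch-multiplied with the encoder outputs, the sketch (like the original) only works when the encoder output length equals `max_length`.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_step(embedded, hidden, encoder_outputs, w_attn, b_attn):
    """One attention step following the shape flow of AttnDecoderRNN.construct.

    embedded:        (1, batch, hidden)  embedded previous token, time-major
    hidden:          (1, batch, hidden)  previous decoder hidden state
    encoder_outputs: (max_length, batch, hidden)  time-major encoder output
    w_attn, b_attn:  hypothetical weights of the `attn` Dense layer (2*hidden -> max_length)
    """
    # Concatenate on the feature axis, then drop the leading time axis: (batch, 2*hidden).
    embedded_concat = np.concatenate([embedded, hidden], axis=2)[0]
    # One normalized score per encoder position: (batch, max_length).
    attn_weights = softmax(embedded_concat @ w_attn + b_attn, axis=1)
    # Batch-major encoder outputs: (batch, max_length, hidden).
    perm = encoder_outputs.transpose(1, 0, 2)
    # Batched matmul (batch, 1, max_length) x (batch, max_length, hidden) -> (batch, 1, hidden).
    attn_applied = np.matmul(attn_weights[:, None, :], perm)[:, 0, :]
    # Embedded token concatenated with the context vector: the input to `attn_combine`.
    combined = np.concatenate([embedded[0], attn_applied], axis=1)
    return combined, attn_weights


rng = np.random.default_rng(0)
batch, hidden_size, max_length = 2, 8, 5
embedded = rng.standard_normal((1, batch, hidden_size))
hidden = rng.standard_normal((1, batch, hidden_size))
encoder_outputs = rng.standard_normal((max_length, batch, hidden_size))
w_attn = rng.standard_normal((2 * hidden_size, max_length))
b_attn = np.zeros(max_length)

combined, attn_weights = attention_step(embedded, hidden, encoder_outputs, w_attn, b_attn)
print(combined.shape, attn_weights.shape)  # (2, 16) (2, 5)
```

Each row of `attn_weights` sums to 1, so the context vector is a convex combination of the encoder outputs at the `max_length` positions.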
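`initialize_vocabulary` in src/utils.py pads the index-to-character table to `characters_class` entries and then appends a space. The table therefore always ends up with `characters_class + 1` entries, with the space at index `characters_class`. The pure-Python sketch below reproduces that layout on an in-memory list instead of a file. `build_vocab` is a hypothetical helper, and the small `characters_class=6` is chosen only to keep the output readable.

```python
def build_vocab(chars, characters_class):
    """Mirror the table layout produced by initialize_vocabulary in src/utils.py.

    chars: vocabulary items, one per line of the file, already stripped.
    characters_class: number of slots before the trailing space (9999 in utils.py).
    """
    rev_vocab = list(chars)
    # Character -> index; a duplicated item keeps its last index.
    vocab = {ch: idx for idx, ch in enumerate(rev_vocab)}

    reserved_char_size = characters_class - len(rev_vocab)
    if reserved_char_size < 0:
        raise ValueError("vocabulary has more entries than characters_class")

    # Pad the index -> character table with empty placeholders.
    rev_vocab.extend([''] * reserved_char_size)

    # The space character always takes the last position.
    vocab[' '] = len(rev_vocab)
    rev_vocab.append(' ')
    return vocab, rev_vocab


vocab, rev_vocab = build_vocab(['a', 'b', 'c'], characters_class=6)
print(vocab)      # {'a': 0, 'b': 1, 'c': 2, ' ': 6}
print(rev_vocab)  # ['a', 'b', 'c', '', '', '', ' ']
```

A vocabulary with exactly `characters_class` items is accepted and gets no padding. Only a longer one raises.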
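src/weight_init.py returns zero LSTM states of shape `(num_layers * num_directions, batch_size, hidden_size)` and draws GRU parameters from U(-1/sqrt(hidden_size), 1/sqrt(hidden_size)). The NumPy sketch below checks those shapes without MindSpore. `lstm_zero_state` and `gru_uniform_init` are hypothetical names, and plain arrays stand in for `Tensor` and `Parameter`. The sketch assumes the usual convention that the input weight multiplies x_t and the recurrent weight multiplies h_{t-1}. Under that convention, the recurrent weight needs `hidden_size` rows rather than `input_size` rows. The two layouts only coincide when `input_size == hidden_size`, as in the `GRU(hidden_size, hidden_size)` built by `AttnDecoderRNN`.

```python
import math

import numpy as np


def lstm_zero_state(batch_size, hidden_size, bidirectional, num_layers=1):
    """Zero h and c states with shape (num_layers * num_directions, batch, hidden)."""
    num_directions = 2 if bidirectional else 1
    shape = (num_layers * num_directions, batch_size, hidden_size)
    return np.zeros(shape, np.float32), np.zeros(shape, np.float32)


def gru_uniform_init(input_size, hidden_size, seed=0):
    """Draw GRU parameters from U(-1/sqrt(H), 1/sqrt(H)) as float32 arrays."""
    rng = np.random.default_rng(seed)
    stdv = 1 / math.sqrt(hidden_size)

    def uniform(*shape):
        return rng.uniform(-stdv, stdv, shape).astype(np.float32)

    # 3 * hidden_size columns: one block per GRU gate computation.
    weight_i = uniform(input_size, 3 * hidden_size)   # applied to x_t
    weight_h = uniform(hidden_size, 3 * hidden_size)  # applied to h_{t-1}
    bias_i = uniform(3 * hidden_size)
    bias_h = uniform(3 * hidden_size)
    return weight_i, weight_h, bias_i, bias_h


h, c = lstm_zero_state(batch_size=4, hidden_size=16, bidirectional=True)
w_i, w_h, b_i, b_h = gru_uniform_init(input_size=32, hidden_size=16)

# Gate pre-activations for one step: both projections must land on (batch, 3H).
x_t = np.ones((4, 32), np.float32)
h_prev = np.zeros((4, 16), np.float32)  # previous GRU hidden state
gates_x = x_t @ w_i + b_i
gates_h = h_prev @ w_h + b_h
print(h.shape, w_i.shape, w_h.shape, gates_x.shape, gates_h.shape)
# (2, 4, 16) (32, 48) (16, 48) (4, 48) (4, 48)
```

With `input_size=32` and `hidden_size=16`, a recurrent weight of shape `(32, 48)` could not be multiplied with a `(4, 16)` hidden state. That mismatch is why its row count follows `hidden_size`.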