# TextRCNN

## Contents

- [TextRCNN Description](#textrcnn-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [ModelZoo Homepage](#modelzoo-homepage)
## [TextRCNN Description](#contents)

TextRCNN is a text classification model proposed by researchers at the Chinese Academy of Sciences in 2015.
TextRCNN combines an RNN and a CNN: it first uses a bidirectional RNN to capture the contextual semantic and syntactic information of the input text,
then applies max pooling to automatically select the most important features,
and finally feeds the result to a fully connected layer for classification.
The TextCNN network structure consists of a convolutional layer and a pooling layer. In RCNN, the feature extraction role of the convolutional layer is taken over by an RNN, so the overall structure consists of an RNN followed by a pooling layer, hence the name RCNN.

[Paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/download/9745/9552): Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao: Recurrent Convolutional Neural Networks for Text Classification. AAAI 2015: 2267-2273
## [Model Architecture](#contents)

TextRCNN is mainly composed of three parts: a recurrent structure layer, a max-pooling layer, and a fully connected layer. In the paper, the word vector dimension is $|e| = 50$, the context vector dimension is $|c| = 50$, the hidden layer size is $H = 100$, the learning rate is $\alpha = 0.01$, and the vocabulary size is $|V|$. The input is a sequence of words, and the output is a vector of class scores.
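For illustration, the following is a minimal sketch of this structure as a MindSpore `nn.Cell` (hypothetical and simplified; the actual implementation lives in ```src/textrcnn.py```, also supports vanilla RNN and GRU cells, and API details may vary across MindSpore versions):

```python
import mindspore.nn as nn
import mindspore.ops as ops

class TextRCNNSketch(nn.Cell):
    """Illustrative only: bidirectional LSTM -> max-pool over time -> Dense."""

    def __init__(self, vocab_size, embed_size=300, hidden_size=100, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # the bidirectional LSTM plays the role of the recurrent structure layer
        self.rnn = nn.LSTM(embed_size, hidden_size,
                           batch_first=True, bidirectional=True)
        self.reduce_max = ops.ReduceMax()
        self.fc = nn.Dense(2 * hidden_size, num_classes)

    def construct(self, x, h0, c0):
        # x: (batch, seq_len); h0/c0: (num_directions, batch, hidden_size)
        emb = self.embedding(x)             # (batch, seq_len, embed_size)
        out, _ = self.rnn(emb, (h0, c0))    # (batch, seq_len, 2 * hidden_size)
        pooled = self.reduce_max(out, 1)    # element-wise max over the time axis
        return self.fc(pooled)              # (batch, num_classes)
```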
## [Dataset](#contents)

Dataset used: [Sentence polarity dataset v1.0](<http://www.cs.cornell.edu/people/pabo/movie-review-data/>)

- Dataset size: 10662 movie reviews in 2 classes; 9596 reviews for the training set and 1066 for the test set.
- Data format: text files. The processed data is in ```./data/```.
## [Environment Requirements](#contents)

- Hardware: Ascend
- Framework: [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below: [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html), [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html).
## [Quick Start](#contents)

- Preparing the environment

```bash
# download the pretrained GoogleNews-vectors-negative300.bin and put it into /tmp;
# you can download it from https://code.google.com/archive/p/word2vec/
# or from https://pan.baidu.com/s/1NC2ekA_bJ0uSL7BF3SjhIg, code: yk9a
mv /tmp/GoogleNews-vectors-negative300.bin ./word2vec/
```
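Optionally, you can sanity-check the downloaded embedding file before training. This step is not part of the repository's scripts and assumes gensim is installed:

```python
# optional sanity check (assumes gensim is installed; not part of this repo)
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    './word2vec/GoogleNews-vectors-negative300.bin', binary=True)
print(vectors.vector_size)  # should print 300, matching embed_size in config.py
```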
- Preparing data

```bash
# split the dataset with the following commands
mkdir -p data/test && mkdir -p data/train
python data_helpers.py --task dataset_split --data_dir dataset_dir
```
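Conceptually, the split reads the positive and negative review files and writes roughly 90% of the sentences to ```data/train``` and 10% to ```data/test```, matching the dataset sizes above. The sketch below is a hedged illustration of that idea; ```data_helpers.py``` is the authoritative implementation, and the function and file names here are hypothetical:

```python
# illustrative sketch of a polarity-file split; not the actual data_helpers.py
import codecs

def split_polarity_file(src, train_dst, test_dst, test_ratio=0.1):
    # the rt-polarity source files are latin-1 encoded
    with codecs.open(src, encoding='latin-1') as f:
        lines = [line.strip() for line in f if line.strip()]
    n_test = int(len(lines) * test_ratio)
    with open(test_dst, 'w') as f:
        f.write('\n'.join(lines[:n_test]))
    with open(train_dst, 'w') as f:
        f.write('\n'.join(lines[n_test:]))

# hypothetical file names, for illustration only
split_polarity_file('data_src/rt-polaritydata/rt-polarity.pos',
                    'data/train/pos.txt', 'data/test/pos.txt')
```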
- Modify the MindSpore source code in ```mindspore/train/model.py```, line 173: add "O3" to the list of amp levels, so that the evaluation network also casts outputs back to fp32 when the "O3" mixed-precision level is used.

```python
self._eval_network = nn.WithEvalCell(self._network, self._loss_fn, self._amp_level in ["O2", "O3"])
```
- Running on Ascend

```bash
# run training
DEVICE_ID=7 python train.py

# or use the shell script to train in the background
bash scripts/run_train.sh

# run evaluation
DEVICE_ID=7 python eval.py --ckpt_path ./ckpt/lstm-10_149.ckpt

# or use the shell script to evaluate in the background
bash scripts/run_eval.sh
```
## [Script Description](#contents)

### [Script and Sample Code](#contents)

```text
├── model_zoo
    ├── README.md                      // descriptions about all the models
    ├── textrcnn
        ├── README.md                  // descriptions about TextRCNN
        ├── data_src
        │   ├── rt-polaritydata        // directory to save the source data
        │   ├── rt-polaritydata.README.1.0.txt  // readme file of the dataset
        ├── scripts
        │   ├── run_train.sh           // shell script for training on Ascend
        │   ├── run_eval.sh            // shell script for evaluation on Ascend
        │   ├── sample.txt             // example invocations of the two scripts above
        ├── src
        │   ├── dataset.py             // dataset creation
        │   ├── textrcnn.py            // TextRCNN architecture
        │   ├── config.py              // parameter configuration
        ├── train.py                   // training script
        ├── eval.py                    // evaluation script
        ├── data_helpers.py            // dataset split script
        ├── sample.txt                 // commands to train and eval the model without the scripts
```
### [Script Parameters](#contents)

Parameters for both training and evaluation can be set in ```config.py```.

- config for TextRCNN, Sentence polarity dataset v1.0:

```python
'num_epochs': 10,                   # total training epochs
'lstm_num_epochs': 15,              # total training epochs when using lstm
'batch_size': 64,                   # training batch size
'cell': 'gru',                      # the RNN cell type; can be 'vanilla', 'gru' or 'lstm'
'ckpt_folder_path': './ckpt',       # the path to save the checkpoints
'preprocess_path': './preprocess',  # the directory to save the processed data
'preprocess': 'false',              # whether to preprocess the data
'data_path': './data/',             # the path of the split data
'lr': 1e-3,                         # the training learning rate
'lstm_lr_init': 2e-3,               # initial learning rate when using lstm
'lstm_lr_end': 5e-4,                # final learning rate when using lstm
'lstm_lr_max': 3e-3,                # maximum learning rate when using lstm
'lstm_lr_warm_up_epochs': 2,        # number of warm-up epochs when using lstm
'lstm_lr_adjust_epochs': 9,         # the lr is adjusted during these epochs; afterwards it stays at lstm_lr_end
'emb_path': './word2vec',           # the directory of the embedding file
'embed_size': 300,                  # the dimension of the word embedding
'save_checkpoint_steps': 149,       # save a checkpoint every this many steps
'keep_checkpoint_max': 10,          # maximum number of checkpoints to keep
'momentum': 0.9                     # the momentum rate
```
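Taken together, the ```lstm_lr_*``` settings describe a warm-up-then-decay schedule: the learning rate rises from ```lstm_lr_init``` to ```lstm_lr_max``` over the warm-up epochs, then falls to ```lstm_lr_end``` by ```lstm_lr_adjust_epochs``` and stays there. A minimal sketch of such a schedule follows (an assumption based on the parameter comments; the repository's actual schedule may differ in shape):

```python
# hedged sketch of the lstm_lr_* schedule described above; the actual
# implementation in the repository may use a different curve
def lstm_lr(step, steps_per_epoch, lr_init=2e-3, lr_max=3e-3, lr_end=5e-4,
            warmup_epochs=2, adjust_epochs=9):
    epoch = step / steps_per_epoch
    if epoch < warmup_epochs:        # linear warm-up from lr_init to lr_max
        return lr_init + (lr_max - lr_init) * epoch / warmup_epochs
    if epoch < adjust_epochs:        # linear decay from lr_max to lr_end
        frac = (epoch - warmup_epochs) / (adjust_epochs - warmup_epochs)
        return lr_max + (lr_end - lr_max) * frac
    return lr_end                    # constant tail at lr_end
```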
### Performance

| Parameters        | MindSpore + Ascend             | TensorFlow + GPU               |
| ----------------- | ------------------------------ | ------------------------------ |
| Resource          | Ascend 910                     | NV SMX2 V100-32G               |
| Framework version | 1.0.1                          | 1.4.0                          |
| Dataset           | Sentence polarity dataset v1.0 | Sentence polarity dataset v1.0 |
| batch_size        | 64                             | 64                             |
| Accuracy          | 0.78                           | 0.78                           |
| Speed             | 35 ms/step                     | 77 ms/step                     |
## [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).