# Contents

- [DS-CNN Description](#ds-cnn-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Train Performance](#train-performance)
        - [Inference Performance](#inference-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [DS-CNN Description](#contents)

DS-CNN (depthwise separable convolutional neural network) was first applied to keyword spotting (KWS) in 2017. KWS applications have a highly constrained power budget and typically run on tiny microcontrollers with limited memory and compute capability. Depthwise separable convolutions are more efficient in both the number of parameters and the number of operations, which makes deeper and wider architectures possible even on resource-constrained microcontroller devices.

[Paper](https://arxiv.org/abs/1711.07128): Zhang, Yundong, Naveen Suda, Liangzhen Lai, and Vikas Chandra. "Hello Edge: Keyword Spotting on Microcontrollers." arXiv preprint arXiv:1711.07128 (2017).
# [Model Architecture](#contents)

The overall network architecture of DS-CNN is described in the paper:

[Link](https://arxiv.org/abs/1711.07128)
# [Dataset](#contents)

Dataset used: [Speech commands dataset version 1](https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html)

- Dataset size: 2.02 GiB, 65,000 one-second-long utterances of 30 short words by thousands of different people
    - Train: 80%
    - Val: 10%
    - Test: 10%
- Data format: WAVE files, with the sample data encoded as linear 16-bit single-channel PCM values at a 16 kHz rate
- Note: Data will be processed in download_process_data.py

Dataset used: [Speech commands dataset version 2](https://arxiv.org/abs/1804.03209)

- Dataset size: 8.17 GiB, 105,829 one-second (or shorter) utterances of 35 words by 2,618 speakers
    - Train: 80%
    - Val: 10%
    - Test: 10%
- Data format: WAVE files, with the sample data encoded as linear 16-bit single-channel PCM values at a 16 kHz rate; a quick format check is sketched after this list
- Note: Data will be processed in download_process_data.py
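Both versions share the same audio format. As a quick sanity check of a downloaded clip, here is a minimal sketch using soundfile (one of the third-party packages listed under Environment Requirements); the file path below is a hypothetical example of the dataset's naming scheme, so adjust it to your data_dir layout:

```python
import soundfile as sf

# Hypothetical path to one downloaded utterance; adjust to your data_dir layout.
path = 'data/yes/0a7c2a8d_nohash_0.wav'

info = sf.info(path)
print(info.samplerate, info.channels, info.subtype)  # expect: 16000 1 PCM_16

# Read the samples; a full one-second clip holds 16000 values.
samples, rate = sf.read(path)
assert rate == 16000 and samples.ndim == 1
```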
# [Environment Requirements](#contents)

- Hardware (Ascend/GPU)
    - Prepare the hardware environment with an Ascend or GPU processor.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- Third-party open source packages (if any)
    - numpy
    - soundfile
    - python_speech_features
- For more information, please check the resources below:
    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

First, set the configuration for data, training, and evaluation in src/config.py.

- download and process the dataset

```bash
python src/download_process_data.py
```

- running on Ascend

```bash
# run the training example
python train.py

# run the evaluation example
# to evaluate a specific model, set model_dir to the ckpt path:
python eval.py --model_dir your_ckpt_path

# to evaluate all the models you saved, set model_dir to the folder where they are saved:
python eval.py --model_dir your_models_folder_path
```
# [Script Description](#contents)

## [Script and Sample Code](#contents)

```text
├── dscnn
    ├── README.md                        // description of DS-CNN
    ├── scripts
    │   ├── run_download_process_data.sh // shell script to download the dataset and prepare features and labels
    │   ├── run_train_ascend.sh          // shell script for training on Ascend
    │   ├── run_eval_ascend.sh           // shell script for evaluation on Ascend
    ├── src
    │   ├── callback.py                  // callbacks
    │   ├── config.py                    // parameter configuration of data, train, and eval
    │   ├── dataset.py                   // dataset creation
    │   ├── download_process_data.py     // download and prepare train, val, and test data
    │   ├── ds_cnn.py                    // DS-CNN architecture
    │   ├── log.py                       // logging class
    │   ├── loss.py                      // loss function
    │   ├── lr_scheduler.py              // learning-rate scheduler
    │   ├── models.py                    // checkpoint loading
    │   ├── utils.py                     // utility functions for preparing data
    ├── train.py                         // training script
    ├── eval.py                          // evaluation script
    ├── export.py                        // export checkpoint files to AIR/GEIR
    ├── requirements.txt                 // third-party open source packages
```
## [Script Parameters](#contents)

Parameters for both training and evaluation can be set in config.py.

- config for the Speech commands dataset version 1

```python
'data_url': 'http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz'
# Location of the speech training data archive on the web
'data_dir': 'data' # Where to download the dataset
'feat_dir': 'feat' # Where to save the features and labels of the audio
'background_volume': 0.1 # How loud the background noise should be, between 0 and 1
'background_frequency': 0.8 # What fraction of the training samples have background noise mixed in
'silence_percentage': 10.0 # How much of the training data should be silence
'unknown_percentage': 10.0 # How much of the training data should be unknown words
'time_shift_ms': 100.0 # Range to randomly shift the training audio by in time
'testing_percentage': 10 # What percentage of wavs to use as a test set
'validation_percentage': 10 # What percentage of wavs to use as a validation set
'wanted_words': 'yes,no,up,down,left,right,on,off,stop,go'
# Words to use (others will be added to an unknown label)
'sample_rate': 16000 # Expected sample rate of the wavs
'device_id': 1000 # Device ID used to train or evaluate the dataset
'clip_duration_ms': 10 # Expected duration in milliseconds of the wavs
'window_size_ms': 40.0 # How long each spectrogram timeslice is
'window_stride_ms': 20.0 # How far to move in time between spectrogram timeslices
'dct_coefficient_count': 20 # How many bins to use for the MFCC fingerprint
```
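The window and DCT settings above map directly onto an MFCC computation. As an illustration only (not the actual code in download_process_data.py), python_speech_features can produce such a fingerprint; the input array and the nfilt/nfft values here are assumptions:

```python
import numpy as np
from python_speech_features import mfcc

sample_rate = 16000
wav = np.random.uniform(-1.0, 1.0, sample_rate)  # placeholder one-second clip

# window_size_ms=40.0 and window_stride_ms=20.0 become seconds here;
# dct_coefficient_count=20 is the number of cepstral coefficients kept.
# nfilt and nfft are illustrative choices, not values from this repo.
features = mfcc(wav, samplerate=sample_rate, winlen=0.040, winstep=0.020,
                numcep=20, nfilt=40, nfft=1024)
print(features.shape)  # (49, 20): 49 frames of 20 coefficients for one second
```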
- config for DS-CNN and training parameters of the Speech commands dataset version 1

```python
'model_size_info': [6, 276, 10, 4, 2, 1, 276, 3, 3, 2, 2, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1]
# Model dimensions - different for various models
'drop': 0.9 # dropout
'pretrained': '' # model_path, local pretrained model to load
'use_graph_mode': 1 # use graph mode (1) or PyNative mode (0)
'val_interval': 1 # validation interval
'per_batch_size': 100 # batch size per device
'lr_scheduler': 'multistep' # lr scheduler, options: multistep, cosine_annealing
'lr': 0.1 # learning rate of the training
'lr_epochs': '20,40,60,80' # epochs at which the lr changes
'lr_gamma': 0.1 # factor by which the multistep scheduler decreases the lr
'eta_min': 0 # eta_min in the cosine_annealing scheduler
'T_max': 80 # T_max in the cosine_annealing scheduler
'max_epoch': 80 # max number of epochs to train the model
'warmup_epochs': 0 # warmup epochs
'weight_decay': 0.001 # weight decay
'momentum': 0.98 # momentum
'log_interval': 100 # logging interval
'ckpt_path': 'train_outputs' # where checkpoints and logs will be saved
'ckpt_interval': 100 # interval for saving checkpoints
```
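The model_size_info list is compact. Following the layout used by the Hello Edge reference code (an assumption about this repository's encoding), the first entry is the number of conv layers, and each layer then contributes five values: feature-map count, kernel height and width, and stride height and width. A minimal sketch of that decoding:

```python
def parse_model_size_info(info):
    """Decode [num_layers, (features, kt, kf, st, sf) * num_layers]."""
    num_layers, rest = info[0], info[1:]
    layers = []
    for i in range(num_layers):
        features, kt, kf, st, sf = rest[5 * i: 5 * i + 5]
        layers.append({'features': features, 'kernel': (kt, kf), 'stride': (st, sf)})
    return layers

info = [6, 276, 10, 4, 2, 1, 276, 3, 3, 2, 2, 276, 3, 3, 1, 1,
        276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1]
for layer in parse_model_size_info(info):
    print(layer)  # first layer: 276 features, 10x4 kernel, 2x1 stride
```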
- config for DS-CNN and evaluation parameters of the Speech commands dataset version 1

```python
'feat_dir': 'feat' # Where to save the features of the audio
'model_dir': '' # folder containing saved models, or the path of a specific checkpoint
'wanted_words': 'yes,no,up,down,left,right,on,off,stop,go'
# Words to use (others will be added to an unknown label)
'sample_rate': 16000 # Expected sample rate of the wavs
'device_id': 1000 # Device ID used to train or evaluate the dataset
'clip_duration_ms': 10 # Expected duration in milliseconds of the wavs
'window_size_ms': 40.0 # How long each spectrogram timeslice is
'window_stride_ms': 20.0 # How far to move in time between spectrogram timeslices
'dct_coefficient_count': 20 # How many bins to use for the MFCC fingerprint
'model_size_info': [6, 276, 10, 4, 2, 1, 276, 3, 3, 2, 2, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1]
# Model dimensions - different for various models
'pre_batch_size': 100 # batch size for evaluation
'drop': 0.9 # dropout used in training
'log_path': 'eval_outputs' # path to save the eval log
```
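Since model_dir may name either a single checkpoint or a folder of them, here is a hypothetical sketch of how such an argument can be expanded (list_checkpoints is illustrative, not a function in this repository):

```python
import glob
import os

def list_checkpoints(model_dir):
    """Return all .ckpt paths, whether model_dir is a file or a folder."""
    if model_dir.endswith('.ckpt'):
        return [model_dir]
    return sorted(glob.glob(os.path.join(model_dir, '**', '*.ckpt'),
                            recursive=True))
```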
## [Training Process](#contents)

### Training

- running on Ascend

For the shell script:

```bash
# sh scripts/run_train_ascend.sh [device_id]
sh scripts/run_train_ascend.sh 0
```

For the Python script:

```bash
# python train.py --device_id [device_id]
python train.py --device_id 0
```

You can see the arguments, loss, and accuracy info on your screen; you can also view the results in the train_outputs folder:

```text
epoch[1], iter[443], loss:0.73811543, mean_wps:12102.26 wavs/sec
Eval: top1_cor:737, top5_cor:1699, tot:3000, acc@1=24.57%, acc@5=56.63%
epoch[2], iter[665], loss:0.381568, mean_wps:12107.45 wavs/sec
Eval: top1_cor:1355, top5_cor:2615, tot:3000, acc@1=45.17%, acc@5=87.17%
...
...
Best epoch:41 acc:93.73%
```

The checkpoints and logs will be saved in train_outputs.
## [Evaluation Process](#contents)

### Evaluation

- evaluation on Speech commands dataset version 1 when running on Ascend

Before running the command below, please check the checkpoint path used for evaluation. Set model_dir in config.py or pass it on the command line.

For the shell script:

```bash
# sh scripts/run_eval_ascend.sh device_id model_dir
sh scripts/run_eval_ascend.sh 0 train_outputs/*/*.ckpt
# or
sh scripts/run_eval_ascend.sh 0 train_outputs/*/
```

For the Python script:

```bash
# python eval.py --device_id device_id --model_dir model_dir
python eval.py --device_id 0 --model_dir train_outputs/*/*.ckpt
# or
python eval.py --device_id 0 --model_dir train_outputs/*
```

You can view the results on the screen or in the logs in the eval_outputs folder. The accuracy on the test dataset will be as follows:

```text
Eval: top1_cor:2805, top5_cor:2963, tot:3000, acc@1=93.50%, acc@5=98.77%
Best model:train_outputs/*/epoch41-1_223.ckpt acc:93.50%
```
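The acc@1 and acc@5 numbers in these logs are plain top-k accuracies. For reference, here is a minimal NumPy sketch of that computation; the random logits and labels are placeholders, and the 12-class count (10 wanted words plus silence and unknown) is the usual label set for this task:

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of rows whose true label is among the k highest scores."""
    topk = np.argsort(logits, axis=1)[:, -k:]
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))

logits = np.random.randn(3000, 12)      # 12 classes: 10 words + silence + unknown
labels = np.random.randint(0, 12, 3000)
print(f'acc@1={topk_accuracy(logits, labels, 1):.2%}, '
      f'acc@5={topk_accuracy(logits, labels, 5):.2%}')
```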
# [Model Description](#contents)

## [Performance](#contents)

### Train Performance

| Parameters                 | Ascend                                                         |
| -------------------------- | -------------------------------------------------------------- |
| Model Version              | DS-CNN                                                         |
| Resource                   | Ascend 910; CPU 2.60 GHz, 56 cores; Memory 314 GB; OS Euler2.8 |
| Uploaded Date              | 09/27/2020 (month/day/year)                                    |
| MindSpore Version          | 1.0.0                                                          |
| Dataset                    | Speech commands dataset version 1                              |
| Training Parameters        | epoch=80, batch_size=100, lr=0.1                               |
| Optimizer                  | Momentum                                                       |
| Loss Function              | Softmax Cross Entropy                                          |
| Outputs                    | probability                                                    |
| Loss                       | 0.0019                                                         |
| Speed                      | 2 s/epoch                                                      |
| Total time                 | 4 min                                                          |
| Parameters (K)             | 500K                                                           |
| Checkpoint for Fine tuning | 3.3M (.ckpt file)                                              |
| Script                     | [Link]()                                                       |
### Inference Performance

| Parameters                     | Ascend                            |
| ------------------------------ | --------------------------------- |
| Model Version                  | DS-CNN                            |
| Resource                       | Ascend 910; OS Euler2.8           |
| Uploaded Date                  | 09/27/2020 (month/day/year)       |
| MindSpore Version              | 1.0.0                             |
| Dataset                        | Speech commands dataset version 1 |
| Training Parameters            | src/config.py                     |
| Outputs                        | probability                       |
| Accuracy                       | 93.96%                            |
| Total time                     | 3 min                             |
| Parameters (K)                 | 500K                              |
| Checkpoint for Fine tuning (M) | 3.3M                              |
# [Description of Random Situation](#contents)

In download_process_data.py, we set the seed for splitting the train, val, and test sets.
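For context, the original TensorFlow speech_commands code makes the split itself deterministic by hashing each file name, so a recording never migrates between sets when more data is added. A condensed sketch of that idea (simplified from the original which_set function):

```python
import hashlib

def which_set(filename, validation_percentage=10, testing_percentage=10):
    """Deterministically assign a wav file to train/val/test by name hash."""
    # Recordings of one speaker share the prefix before '_nohash_',
    # so they always land in the same set.
    base = filename.split('_nohash_')[0]
    digest = hashlib.sha1(base.encode('utf-8')).hexdigest()
    percent = int(digest, 16) % 100
    if percent < validation_percentage:
        return 'validation'
    if percent < validation_percentage + testing_percentage:
        return 'testing'
    return 'training'

print(which_set('0a7c2a8d_nohash_0.wav'))
```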
# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).