# Contents

- [FCN-4 Description](#fcn-4-description)
- [Model Architecture](#model-architecture)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
- [ModelZoo Homepage](#modelzoo-homepage)
## [FCN-4 Description](#contents)

This repository provides a script and recipe to train the FCN-4 model to achieve state-of-the-art accuracy.

[Paper](https://arxiv.org/abs/1606.00298): Keunwoo Choi, George Fazekas, and Mark Sandler, "Automatic tagging using deep convolutional neural networks," in International Society for Music Information Retrieval Conference. ISMIR, 2016.
## [Model Architecture](#contents)

FCN-4 is a convolutional neural network architecture; its name comes from the fact that it has 4 layers. Its layers consist of convolutional layers, max-pooling layers, activation layers, and fully connected layers.
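
To make the layer layout concrete, here is a minimal, hypothetical MindSpore sketch of a 4-block FCN for 50-tag audio tagging. The channel counts, pooling sizes, and input shape are illustrative assumptions, not the repository's exact values; the actual model is defined in src/musictagger.py.

```python
# Illustrative sketch only: a 4-block fully convolutional tagger.
# Channel counts and pool sizes are assumptions, not the repo's values.
import mindspore.nn as nn

class FCN4Sketch(nn.Cell):
    def __init__(self, num_classes=50):
        super(FCN4Sketch, self).__init__()
        blocks = []
        in_ch = 1
        # (out_channels, pool_size) per convolutional block -- illustrative
        for out_ch, pool in [(64, (2, 4)), (128, (4, 5)),
                             (256, (3, 8)), (512, (4, 8))]:
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3),  # pad_mode='same' by default
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=pool, stride=pool),
            ]
            in_ch = out_ch
        self.features = nn.SequentialCell(blocks)
        self.flatten = nn.Flatten()
        self.classifier = nn.Dense(in_ch, num_classes)
        self.sigmoid = nn.Sigmoid()  # multi-label output: one probability per tag

    def construct(self, x):
        # x: (batch, 1, 96, 1366) mel-spectrogram -> (batch, num_classes)
        x = self.features(x)
        return self.sigmoid(self.classifier(self.flatten(x)))
```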
## [Features](#contents)

### Mixed Precision

The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both single-precision and half-precision data formats, while maintaining the network accuracy achieved with single-precision training. Mixed precision training can accelerate the computation, reduce memory usage, and enable larger models or batch sizes to be trained on specific hardware.

For FP16 operators, if the input data type is FP32, the MindSpore backend will automatically handle it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and searching for "reduce precision".
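
As a hedged illustration (not necessarily the exact wiring in train.py), MindSpore's high-level `Model` API can enable mixed precision through its `amp_level` argument; `net`, `loss`, and `opt` below are placeholders:

```python
# Illustrative only: enable automatic mixed precision via the Model API.
# amp_level="O2" casts the network to FP16 while keeping BatchNorm and the
# loss in FP32; "O0" keeps everything in FP32.
from mindspore import Model

model = Model(net, loss_fn=loss, optimizer=opt, amp_level="O2")
```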
## [Environment Requirements](#contents)

- Hardware (Ascend)
    - If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get access to the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
## [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

### 1. Download and preprocess the dataset

1. Download a classification dataset (for instance, the MagnaTagATune dataset or the Million Song Dataset).
2. Extract the dataset.
3. The information file should contain the label and path of each clip; see the sketch after this list. Please refer to annotations_final.csv in the MagnaTagATune dataset.
4. The provided pre-processing script uses the MagnaTagATune dataset as an example. Please modify the code according to your own needs.
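
As a hypothetical illustration of the expected layout, MagnaTagATune's annotations_final.csv is a tab-separated file pairing binary tag columns with a clip path per row; the column names below are assumptions to adapt for your own annotation file:

```python
# Hypothetical sketch: peek at labels and clip paths in annotations_final.csv.
# Assumes a tab-separated layout with 'clip_id' and 'mp3_path' columns plus
# one binary column per tag, as in MagnaTagATune; adjust for other datasets.
import pandas as pd

df = pd.read_csv('annotations_final.csv', sep='\t')
paths = df['mp3_path']                                   # path of each audio clip
labels = df.drop(columns=['clip_id', 'mp3_path'])        # one 0/1 column per tag
print(labels.sum().sort_values(ascending=False).head())  # most frequent tags
```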
### 2. Set up parameters (src/config.py)

### 3. Train

After preparing your dataset, first convert the audio clips into a MindRecord dataset with the following command:

```shell
python pre_process_data.py --device_id 0
```

Then you can start training the model:

```shell
SLOG_PRINT_TO_STDOUT=1 python train.py --device_id 0
```

### 4. Test

Then you can test your model:

```shell
SLOG_PRINT_TO_STDOUT=1 python eval.py --device_id 0
```
## [Script Description](#contents)

### [Script and Sample Code](#contents)

```shell
├── model_zoo
    ├── README.md                   // descriptions about all the models
    ├── music_auto_tagging
        ├── README.md               // descriptions about FCN-4
        ├── scripts
        │   ├── run_train.sh        // shell script for distributed training on Ascend
        │   ├── run_eval.sh         // shell script for evaluation on Ascend
        │   ├── run_process_data.sh // shell script for converting audio clips to MindRecord
        ├── src
        │   ├── dataset.py          // dataset creation
        │   ├── pre_process_data.py // dataset pre-processing
        │   ├── musictagger.py      // FCN-4 architecture
        │   ├── config.py           // parameter configuration
        │   ├── loss.py             // loss function
        │   ├── tag.txt             // tag name for each index
        ├── train.py                // training script
        ├── eval.py                 // evaluation script
        ├── export.py               // export model in AIR format
```
### [Script Parameters](#contents)

Parameters for both training and evaluation can be set in config.py.

- config for FCN-4

```python
'num_classes': 50,                # number of tagging classes
'num_consumer': 4,                # file number for MindRecord
'get_npy': 1,                     # mode for converting audio clips to npy, default 1 in this case
'get_mindrecord': 1,              # mode for converting npy files into MindRecord files, default 1 in this case
'audio_path': "/dev/data/Music_Tagger_Data/fea/",  # path to audio clips
'npy_path': "/dev/data/Music_Tagger_Data/fea/",    # path to numpy files
'info_path': "/dev/data/Music_Tagger_Data/fea/",   # path to info_name, which provides the label of each audio clip
'info_name': 'annotations_final.csv',              # info_name
'device_target': 'Ascend',        # device running the program
'device_id': 0,                   # device ID used to train or evaluate the dataset; ignore it when you use run_train.sh for distributed training
'mr_path': '/dev/data/Music_Tagger_Data/fea/',     # path to MindRecord data
'mr_name': ['train', 'val'],      # MindRecord names
'pre_trained': False,             # whether to train from a pre-trained model
'lr': 0.0005,                     # learning rate
'batch_size': 32,                 # training batch size
'epoch_size': 10,                 # total training epochs
'loss_scale': 1024.0,             # loss scale
'num_consumer': 4,                # file number for MindRecord
'mixed_precision': False,         # whether to use mixed-precision calculation
'train_filename': 'train.mindrecord0',  # file name of the training MindRecord data
'val_filename': 'val.mindrecord0',      # file name of the evaluation MindRecord data
'data_dir': '/dev/data/Music_Tagger_Data/fea/',    # directory of MindRecord data
'device_target': 'Ascend',        # device running the program
'device_id': 0,                   # device ID used to train or evaluate the dataset; ignore it when you use run_train.sh for distributed training
'keep_checkpoint_max': 10,        # only keep the last keep_checkpoint_max checkpoints
'save_step': 2000,                # interval in steps for saving checkpoints
'checkpoint_path': '/dev/data/Music_Tagger_Data/model/',  # absolute path for saving checkpoint files
'prefix': 'MusicTagger',          # prefix of checkpoint files
'model_name': 'MusicTagger_3-50_543.ckpt',  # checkpoint name
```
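
As a hedged sketch of how parameters such as `data_dir`, `train_filename`, `num_consumer`, and `batch_size` might be consumed (the actual logic lives in src/dataset.py):

```python
# Illustrative only: build a training pipeline from the MindRecord file
# referenced by the configuration above.
import os
import mindspore.dataset as ds

data_dir = '/dev/data/Music_Tagger_Data/fea/'            # cf. 'data_dir'
train_set = ds.MindDataset(os.path.join(data_dir, 'train.mindrecord0'),
                           num_parallel_workers=4)       # cf. 'num_consumer'
train_set = train_set.batch(32, drop_remainder=True)     # cf. 'batch_size'
```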
### [Training Process](#contents)

#### Training

- running on Ascend

```shell
python train.py > train.log 2>&1 &
```

The python command above will run in the background; you can view the results through the file `train.log`.

After training, you will get some checkpoint files under the script folder by default. The loss values will look as follows:

```shell
# grep "loss is " train.log
epoch: 1 step: 100, loss is 0.23264095
epoch: 1 step: 200, loss is 0.2013525
...
```

The model checkpoint will be saved in the configured directory.
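
As a hedged sketch of how the checkpoint parameters (`save_step`, `keep_checkpoint_max`, `prefix`, `checkpoint_path`) map onto MindSpore's checkpoint callback (the actual wiring is in train.py):

```python
# Illustrative only: save a checkpoint every 2000 steps, keeping the last 10.
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig

ckpt_cfg = CheckpointConfig(save_checkpoint_steps=2000,  # cf. 'save_step'
                            keep_checkpoint_max=10)      # cf. 'keep_checkpoint_max'
ckpt_cb = ModelCheckpoint(prefix='MusicTagger',          # cf. 'prefix'
                          directory='/dev/data/Music_Tagger_Data/model/',
                          config=ckpt_cfg)
# Pass ckpt_cb in the callbacks list of model.train(...).
```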
### [Evaluation Process](#contents)

#### Evaluation
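
- running on Ascend

As in the Quick Start section, evaluation can be launched with:

```shell
SLOG_PRINT_TO_STDOUT=1 python eval.py --device_id 0
```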
## [Model Description](#contents)

### [Performance](#contents)

#### Evaluation Performance

| Parameters                 | Ascend                                                       |
| -------------------------- | ------------------------------------------------------------ |
| Model Version              | FCN-4                                                        |
| Resource                   | Ascend 910; CPU 2.60 GHz, 56 cores; memory 314 GB            |
| Uploaded Date              | 09/11/2020 (month/day/year)                                  |
| MindSpore Version          | r0.7.0                                                       |
| Training Parameters        | epoch=10, steps=534, batch_size=32, lr=0.005                 |
| Optimizer                  | Adam                                                         |
| Loss Function              | Binary cross entropy                                         |
| Outputs                    | probability                                                  |
| Loss                       | AUC 0.909                                                    |
| Speed                      | 1 pc: 160 samples/sec                                        |
| Total Time                 | 1 pc: 20 mins                                                |
| Checkpoint for Fine Tuning | 198.73 MB (.ckpt file)                                       |
| Scripts                    | [music_auto_tagging script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/audio/fcn-4) |

## [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).