# Recommendation Model

## Overview

This is an implementation of Wide&Deep as described in the [Wide & Deep Learning for Recommender Systems](https://arxiv.org/pdf/1606.07792.pdf) paper.

The Wide&Deep model jointly trains a wide linear model and a deep neural network, combining the benefits of memorization and generalization for recommender systems.
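As a rough illustration of this joint architecture, the sketch below sums a wide (linear) logit and a deep (MLP) logit before the sigmoid. It is a minimal sketch, not the repository's `src/wide_and_deep.py`: in the paper the wide part consumes (crossed) sparse features and the deep part consumes dense embeddings, whereas here both consume the same input, and the layer sizes are placeholder assumptions.

```
import mindspore.nn as nn

class WideDeepSketch(nn.Cell):
    """Minimal sketch of the joint wide + deep forward pass."""

    def __init__(self, input_dim, deep_dims=(1024, 512, 256)):
        super().__init__()
        # Wide component: a single linear layer (memorization).
        self.wide = nn.Dense(input_dim, 1)
        # Deep component: a feed-forward tower (generalization).
        layers, in_dim = [], input_dim
        for out_dim in deep_dims:
            layers += [nn.Dense(in_dim, out_dim), nn.ReLU()]
            in_dim = out_dim
        layers.append(nn.Dense(in_dim, 1))
        self.deep = nn.SequentialCell(layers)
        self.sigmoid = nn.Sigmoid()

    def construct(self, x):
        # Joint training: the two logits are summed, then squashed
        # into a click probability.
        return self.sigmoid(self.wide(x) + self.deep(x))
```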
## Requirements

- Install [MindSpore](https://www.mindspore.cn/install/en).
- Download the dataset and convert it to MindRecord format with the following command:

```
python src/preprocess_data.py
```

Arguments:

* `--data_path`: Dataset storage path (Default: ./criteo_data/).
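For example, to preprocess data stored in the default location:

```
python src/preprocess_data.py --data_path=./criteo_data/
```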
## Dataset

The Criteo dataset is used for model training and evaluation.
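Each line of the raw Criteo data is tab-separated: a 0/1 click label, 13 integer features, and 26 hashed categorical features. The sketch below shows this layout; it is illustrative only, and the repository's actual conversion logic lives in `src/preprocess_data.py`.

```
def parse_criteo_line(line):
    """Split one raw Criteo record into label, dense, and sparse parts."""
    fields = line.rstrip("\n").split("\t")
    label = int(fields[0])    # 0/1 click label
    dense = fields[1:14]      # 13 integer features (possibly empty strings)
    sparse = fields[14:40]    # 26 hashed categorical features
    # Simple missing-value handling for the integer columns.
    dense = [int(v) if v else 0 for v in dense]
    return label, dense, sparse
```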
## Running Code

### Code Structure

The code structure is as follows:
```
|--- wide_and_deep/
    train_and_eval.py                "Entry point for Wide&Deep model training and evaluation"
    eval.py                          "Entry point for Wide&Deep model evaluation"
    train.py                         "Entry point for Wide&Deep model training"
    train_and_eval_multinpu.py       "Entry point for Wide&Deep data parallel training and evaluation"
    train_and_eval_auto_parallel.py  "Entry point for Wide&Deep auto parallel training and evaluation"
    |--- src/                        "Source code"
        config.py                    "Parameter configuration"
        dataset.py                   "Dataset loader class"
        process_data.py              "Process the dataset"
        preprocess_data.py           "Preprocess the dataset"
        wide_and_deep.py             "Model structure"
        callbacks.py                 "Callback classes for training and evaluation"
        metrics.py                   "Metric class"
    |--- script/                     "Shell scripts"
        run_multinpu_train.sh        "Run data parallel training"
        run_auto_parallel_train.sh   "Run auto parallel training"
```
### Train and Evaluate the Model

To train and evaluate the model, run the following command (an illustrative full invocation follows the argument list):

```
python train_and_eval.py
```

Arguments:

* `--device_target`: Device where the code will be implemented (Default: Ascend).
* `--data_path`: The dataset path. Set this to the same directory given to `preprocess_data.py`'s `--data_path` argument.
* `--epochs`: Total train epochs.
* `--batch_size`: Training batch size.
* `--eval_batch_size`: Eval batch size.
* `--field_size`: The number of features.
* `--vocab_size`: The total number of features in the dataset (vocabulary size).
* `--emb_dim`: The dense embedding dimension of the sparse features.
* `--deep_layers_dim`: The dimension of all deep layers.
* `--deep_layers_act`: The activation function of all deep layers.
* `--dropout_flag`: Whether to use dropout.
* `--keep_prob`: The keep probability of the dropout layer.
* `--ckpt_path`: The location of the checkpoint file.
* `--eval_file_name`: Eval output file.
* `--loss_file_name`: Loss output file.
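For example (the flag values here are illustrative placeholders, not necessarily the defaults in `src/config.py`):

```
# illustrative values; see src/config.py for the actual defaults
python train_and_eval.py --data_path=./criteo_data/ --epochs=15 --batch_size=16000
```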
To train the model on a single device, run the following command:

```
python train.py
```

Arguments:

* `--device_target`: Device where the code will be implemented (Default: Ascend).
* `--data_path`: The dataset path. Set this to the same directory given to `preprocess_data.py`'s `--data_path` argument.
* `--epochs`: Total train epochs.
* `--batch_size`: Training batch size.
* `--eval_batch_size`: Eval batch size.
* `--field_size`: The number of features.
* `--vocab_size`: The total number of features in the dataset (vocabulary size).
* `--emb_dim`: The dense embedding dimension of the sparse features.
* `--deep_layers_dim`: The dimension of all deep layers.
* `--deep_layers_act`: The activation function of all deep layers.
* `--dropout_flag`: Whether to use dropout.
* `--keep_prob`: The keep probability of the dropout layer.
* `--ckpt_path`: The location of the checkpoint file.
* `--eval_file_name`: Eval output file.
* `--loss_file_name`: Loss output file.
To train the model in distributed mode, use the scripts under `script/`. For data parallel training:

```
# configure the environment path before training
bash run_multinpu_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE
```

For auto parallel training:

```
# configure the environment path before training
bash run_auto_parallel_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE
```
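For example, an 8-device data parallel run for 15 epochs might look like this (the dataset and rank table paths are placeholders):

```
# 8 devices, 15 epochs; dataset and rank table paths are placeholders
bash run_multinpu_train.sh 8 15 ./criteo_data/ ./rank_table_8p.json
```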
To evaluate the model, run the following command (an illustrative invocation follows the argument list):

```
python eval.py
```

Arguments:

* `--device_target`: Device where the code will be implemented (Default: Ascend).
* `--data_path`: The dataset path. Set this to the same directory given to `preprocess_data.py`'s `--data_path` argument.
* `--epochs`: Total train epochs.
* `--batch_size`: Training batch size.
* `--eval_batch_size`: Eval batch size.
* `--field_size`: The number of features.
* `--vocab_size`: The total number of features in the dataset (vocabulary size).
* `--emb_dim`: The dense embedding dimension of the sparse features.
* `--deep_layers_dim`: The dimension of all deep layers.
* `--deep_layers_act`: The activation function of all deep layers.
* `--keep_prob`: The keep probability of the dropout layer.
* `--ckpt_path`: The location of the checkpoint file.
* `--eval_file_name`: Eval output file.
* `--loss_file_name`: Loss output file.
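For example (the checkpoint filename is a placeholder):

```
# the checkpoint path below is a placeholder
python eval.py --data_path=./criteo_data/ --ckpt_path=./widedeep.ckpt
```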
There are other arguments related to the model and the training process. Use the `--help` or `-h` flag to get a full list of possible arguments with detailed descriptions.