# Recommendation Model

## Overview

This is an implementation of Wide&Deep as described in the paper [Wide & Deep Learning for Recommender Systems](https://arxiv.org/pdf/1606.07792.pdf).

The Wide&Deep model jointly trains a wide linear model and a deep neural network, combining the benefits of memorization (the wide part) and generalization (the deep part) for recommender systems.

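For orientation, the joint wide + deep forward pass can be sketched in a few lines of NumPy. The dimension names mirror the command-line arguments below (`field_size`, `vocab_size`, `emb_dim`); the layer sizes and random weights are purely illustrative, not the repository's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; the real values live in src/config.py).
field_size, vocab_size, emb_dim = 4, 100, 8
batch = 2

# Each sample is field_size categorical feature IDs into a shared vocabulary,
# with a per-feature value (weight).
ids = rng.integers(0, vocab_size, size=(batch, field_size))
vals = np.ones((batch, field_size))

# Wide part: a linear model over the sparse IDs (one weight per vocab entry).
w_wide = rng.normal(0, 0.01, size=vocab_size)
wide_out = (w_wide[ids] * vals).sum(axis=1)  # shape (batch,)

# Deep part: embed each ID, concatenate the embeddings, run a small MLP.
emb_table = rng.normal(0, 0.01, size=(vocab_size, emb_dim))
deep_in = (emb_table[ids] * vals[..., None]).reshape(batch, -1)

def relu(x):
    return np.maximum(x, 0.0)

w1 = rng.normal(0, 0.1, size=(field_size * emb_dim, 16))
w2 = rng.normal(0, 0.1, size=(16, 1))
deep_out = (relu(deep_in @ w1) @ w2).squeeze(-1)  # shape (batch,)

# Joint output: the two logits are summed and squashed into a click probability.
prob = 1.0 / (1.0 + np.exp(-(wide_out + deep_out)))
print(prob.shape)  # (2,)
```

Both parts feed a single logit, so the wide and deep weights are trained jointly against the same loss, which is what distinguishes Wide&Deep from an ensemble of two separately trained models.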
## Dataset

The Criteo dataset is used for model training and evaluation.

## Running Code

### Code structure

The code is organized as follows:

```
|--- wide_and_deep/
    train_and_test.py          "Entrance of Wide&Deep model training and evaluation"
    test.py                    "Entrance of Wide&Deep model evaluation"
    train.py                   "Entrance of Wide&Deep model training"
    train_and_test_multinpu.py "Entrance of Wide&Deep model data parallel training and evaluation"
    |--- src/                  "Training and evaluation sources"
        config.py              "Parameter configuration"
        dataset.py             "Dataset loader class"
        process_data.py        "Dataset processing"
        preprocess_data.py     "Dataset pre-processing"
        WideDeep.py            "Model structure"
        callbacks.py           "Callback classes for training and evaluation"
        metrics.py             "Metric class"
```

### Train and evaluate the model

To train and evaluate the model, run:

```
python train_and_test.py
```

Arguments:

* `--data_path`: Set this to the same directory given to the data download script's `data_dir` argument.
* `--epochs`: Total number of training epochs.
* `--batch_size`: Training batch size.
* `--eval_batch_size`: Evaluation batch size.
* `--field_size`: The number of feature fields per sample.
* `--vocab_size`: The total number of distinct features in the dataset.
* `--emb_dim`: The dense embedding dimension of the sparse features.
* `--deep_layers_dim`: The dimensions of the deep layers.
* `--deep_layers_act`: The activation function of the deep layers.
* `--keep_prob`: The keep rate of the dropout layer.
* `--ckpt_path`: The location of the checkpoint file.
* `--eval_file_name`: Evaluation output file.
* `--loss_file_name`: Loss output file.

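As a hypothetical sketch, the flags above could be declared with `argparse` along the following lines. The flag names come from the list above, but every default shown here is illustrative; the repository's actual defaults live in `src/config.py` and may differ.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the documented flags; defaults are
    # illustrative placeholders, not the repository's real values.
    p = argparse.ArgumentParser(description="Wide&Deep training/evaluation")
    p.add_argument("--data_path", type=str, default="./data/")
    p.add_argument("--epochs", type=int, default=15)
    p.add_argument("--batch_size", type=int, default=16000)
    p.add_argument("--eval_batch_size", type=int, default=16000)
    p.add_argument("--field_size", type=int, default=39)
    p.add_argument("--vocab_size", type=int, default=184968)
    p.add_argument("--emb_dim", type=int, default=80)
    p.add_argument("--deep_layers_dim", type=int, nargs="+",
                   default=[1024, 512, 256, 128])
    p.add_argument("--deep_layers_act", type=str, default="relu")
    p.add_argument("--keep_prob", type=float, default=1.0)
    p.add_argument("--ckpt_path", type=str, default="./checkpoints/")
    p.add_argument("--eval_file_name", type=str, default="eval.log")
    p.add_argument("--loss_file_name", type=str, default="loss.log")
    return p

# Example: override two flags, keep the rest at their defaults.
args = build_parser().parse_args(["--epochs", "5", "--batch_size", "1000"])
print(args.epochs, args.batch_size)  # 5 1000
```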
To train the model on a single device, run:

```
python train.py
```

`train.py` accepts the same arguments as `train_and_test.py` above.

To train the model in distributed (data parallel) mode, run:

```
# configure the environment path and the RANK_TABLE_FILE, RANK_SIZE and
# MINDSPORE_HCCL_CONFIG_PATH variables before training
bash run_multinpu_train.sh
```

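As an illustration, the environment setup might look like the following. The variable names come from the comment above; the file paths and device count are placeholders for your own cluster configuration, not values shipped with the repository.

```shell
# Illustrative setup; replace the paths with your cluster's rank-table file.
export RANK_TABLE_FILE=/path/to/rank_table.json
export MINDSPORE_HCCL_CONFIG_PATH=/path/to/rank_table.json
export RANK_SIZE=8   # total number of devices participating in training

bash run_multinpu_train.sh
```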
To evaluate the model, run:

```
python test.py
```

`test.py` accepts the same arguments as `train_and_test.py` above.

There are other arguments for the model and the training process; use the `--help` or `-h` flag to get a full list with detailed descriptions.