Wide&Deep is a classic model in the recommendation and click-through-rate prediction area. This is an implementation of Wide&Deep as described in the paper Wide & Deep Learning for Recommender Systems.
The Wide&Deep model jointly trains a wide linear model and a deep neural network, combining the benefits of memorization and generalization for recommender systems.
Currently we support host-device mode with column partition and parameter server mode.
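To make the joint design concrete, here is a minimal NumPy sketch of the forward pass: the wide part sums per-id weights (memorization), the deep part runs embeddings through an MLP (generalization), and their logits are added before the sigmoid. This is illustrative only, with scaled-down sizes, and is not the repo's src/wide_and_deep.py:

```python
import numpy as np

# scaled-down stand-ins for the documented defaults (field_size=39,
# vocab_size=200000, emb_dim=80, first deep layer 1024)
FIELD, VOCAB, EMB, HIDDEN = 39, 20000, 8, 1024

rng = np.random.default_rng(0)
wide_w = rng.normal(0, 0.01, VOCAB)              # wide part: one weight per feature id
deep_emb = rng.normal(0, 0.01, (VOCAB, EMB))     # deep part: dense embedding table
W1 = rng.normal(0, 0.01, (FIELD * EMB, HIDDEN))  # first deep layer (real model has more)
b1 = np.zeros(HIDDEN)
W_out = rng.normal(0, 0.01, (HIDDEN, 1))

def forward(ids):
    """ids: (batch, FIELD) integer feature ids -> predicted CTR probabilities."""
    wide_logit = wide_w[ids].sum(axis=1)                      # memorization
    x = deep_emb[ids].reshape(ids.shape[0], -1)               # (batch, FIELD*EMB)
    h = np.maximum(x @ W1 + b1, 0.0)                          # relu deep layer
    deep_logit = (h @ W_out).squeeze(-1)                      # generalization
    return 1.0 / (1.0 + np.exp(-(wide_logit + deep_logit)))   # sigmoid of summed logits

print(forward(rng.integers(0, VOCAB, size=(4, FIELD))))
```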
```bash
git clone https://gitee.com/mindspore/mindspore.git
cd mindspore/model_zoo/official/recommend/wide_and_deep
```
Please refer to [1] to obtain the download link
```bash
mkdir -p data/origin_data && cd data/origin_data
wget DATA_LINK
tar -zxvf dac.tar.gz
```
```bash
python src/preprocess_data.py --data_path=./data/ --dense_dim=13 --slot_dim=26 --threshold=100 --train_line_count=45840617 --skip_id_convert=0
```
Once the dataset is ready, the model can be trained and evaluated on a single device (Ascend) with the following command:
```bash
python train_and_eval.py --data_path=./data/mindrecord --dataset_type=mindrecord
```
To evaluate the model, run the following command:
```bash
python eval.py --data_path=./data/mindrecord --dataset_type=mindrecord
```
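eval.py reports an AUC score on the evaluation set. As a rough sketch of the metric itself (scikit-learn here is an illustrative choice, not a stated dependency of this repo):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# toy stand-ins for true click labels and the model's predicted CTRs
labels = np.array([0, 0, 1, 1, 0, 1])
pred = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.65])
print("AUC:", roc_auc_score(labels, pred))  # 1.0 would mean perfect ranking
```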
```
└── wide_and_deep
    ├── eval.py
    ├── README.md
    ├── script
    │   ├── cluster_32p.json
    │   ├── common.sh
    │   ├── deploy_cluster.sh
    │   ├── run_auto_parallel_train_cluster.sh
    │   ├── run_auto_parallel_train.sh
    │   ├── run_multigpu_train.sh
    │   ├── run_multinpu_train.sh
    │   ├── run_parameter_server_train_cluster.sh
    │   ├── run_parameter_server_train.sh
    │   ├── run_standalone_train_for_gpu.sh
    │   └── start_cluster.sh
    ├── src
    │   ├── callbacks.py
    │   ├── config.py
    │   ├── datasets.py
    │   ├── generate_synthetic_data.py
    │   ├── __init__.py
    │   ├── metrics.py
    │   ├── preprocess_data.py
    │   ├── process_data.py
    │   └── wide_and_deep.py
    ├── train_and_eval_auto_parallel.py
    ├── train_and_eval_distribute.py
    ├── train_and_eval_parameter_server.py
    ├── train_and_eval.py
    ├── train.py
    └── export.py
```
The parameters are the same for train.py, train_and_eval.py, train_and_eval_distribute.py and train_and_eval_auto_parallel.py:
```text
usage: train.py [-h] [--device_target {Ascend,GPU}] [--data_path DATA_PATH]
[--epochs EPOCHS] [--full_batch FULL_BATCH]
[--batch_size BATCH_SIZE] [--eval_batch_size EVAL_BATCH_SIZE]
[--field_size FIELD_SIZE] [--vocab_size VOCAB_SIZE]
[--emb_dim EMB_DIM]
[--deep_layer_dim DEEP_LAYER_DIM [DEEP_LAYER_DIM ...]]
[--deep_layer_act DEEP_LAYER_ACT] [--keep_prob KEEP_PROB]
[--dropout_flag DROPOUT_FLAG] [--output_path OUTPUT_PATH]
[--ckpt_path CKPT_PATH] [--eval_file_name EVAL_FILE_NAME]
[--loss_file_name LOSS_FILE_NAME]
[--host_device_mix HOST_DEVICE_MIX]
[--dataset_type DATASET_TYPE]
[--parameter_server PARAMETER_SERVER]
optional arguments:
--device_target {Ascend,GPU} device where the code will be implemented. (Default:Ascend)
--data_path DATA_PATH This should be set to the same directory given to the
data_download's data_dir argument
--epochs EPOCHS Total train epochs. (Default:15)
--full_batch FULL_BATCH Enable loading the full batch. (Default:False)
--batch_size BATCH_SIZE Training batch size.(Default:16000)
--eval_batch_size Eval batch size.(Default:16000)
--field_size The number of features.(Default:39)
--vocab_size The total features of dataset.(Default:200000)
--emb_dim The dense embedding dimension of sparse feature.(Default:80)
--deep_layer_dim The dimension of all deep layers.(Default:[1024,512,256,128])
--deep_layer_act The activation function of all deep layers.(Default:'relu')
--keep_prob The keep rate in dropout layer.(Default:1.0)
--dropout_flag Enable dropout.(Default:0)
--output_path Deprecated
--ckpt_path The location of the checkpoint file. If the checkpoint file
is a slice of weight, multiple checkpoint files need to be
transferred. Use ';' to separate them and sort them in sequence
like "./checkpoints/0.ckpt;./checkpoints/1.ckpt".
(Default:./checkpoints/)
--eval_file_name Eval output file.(Default:eval.log)
--loss_file_name Loss output file.(Default:loss.log)
--host_device_mix Enable host device mode or not.(Default:0)
--dataset_type The data type of the training files, chosen from tfrecord/mindrecord/hd5.(Default:tfrecord)
--parameter_server Whether to open parameter server or not.(Default:0)
```
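For reference, a minimal argparse setup consistent with the usage text above might look as follows (only a subset of the flags is shown, using the documented defaults; this is a sketch, not the repo's actual definition):

```python
import argparse

parser = argparse.ArgumentParser(description="WideDeep training")
parser.add_argument("--device_target", type=str, default="Ascend",
                    choices=["Ascend", "GPU"])
parser.add_argument("--data_path", type=str,
                    help="same directory given to data_download's data_dir argument")
parser.add_argument("--epochs", type=int, default=15, help="total train epochs")
parser.add_argument("--batch_size", type=int, default=16000, help="training batch size")
parser.add_argument("--emb_dim", type=int, default=80,
                    help="dense embedding dimension of sparse features")
parser.add_argument("--deep_layer_dim", type=int, nargs="+",
                    default=[1024, 512, 256, 128])
parser.add_argument("--dataset_type", type=str, default="tfrecord",
                    choices=["tfrecord", "mindrecord", "hd5"])

args = parser.parse_args()
print(args)
```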
```text
usage: generate_synthetic_data.py [-h] [--output_file OUTPUT_FILE]
[--label_dim LABEL_DIM]
[--number_examples NUMBER_EXAMPLES]
[--dense_dim DENSE_DIM]
[--slot_dim SLOT_DIM]
[--vocabulary_size VOCABULARY_SIZE]
[--random_slot_values RANDOM_SLOT_VALUES]
optional arguments:
--output_file The output path of the generated file.(Default: ./train.txt)
--label_dim The label category. (Default:2)
--number_examples The row numbers of the generated file. (Default:4000000)
--dense_dim The number of continuous features.(Default:13)
--slot_dim The number of the category features.(Default:26)
--vocabulary_size The vocabulary size of the total dataset.(Default:400000000)
--random_slot_values 0 or 1. If 1, the id is generated randomly. If 0, the id is set by row_index mod part_size, where part_size is the vocab size for each slot.
```
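The row_index mod part_size rule is easy to illustrate. A small sketch, under the assumption (made here for illustration, not taken from the script) that each slot owns a contiguous partition of the vocabulary:

```python
import numpy as np

vocabulary_size, slot_dim = 400000000, 26
part_size = vocabulary_size // slot_dim   # vocab share assumed per categorical slot
rng = np.random.default_rng(0)

def slot_ids(row_index, random_slot_values):
    if random_slot_values:
        # ids drawn uniformly at random within each slot's partition
        return [s * part_size + int(rng.integers(part_size)) for s in range(slot_dim)]
    # deterministic: every slot reuses row_index mod part_size
    return [s * part_size + row_index % part_size for s in range(slot_dim)]

print(slot_ids(12345, random_slot_values=0)[:3])
```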
```text
usage: preprocess_data.py [-h]
[--data_path DATA_PATH] [--dense_dim DENSE_DIM]
[--slot_dim SLOT_DIM] [--threshold THRESHOLD]
[--train_line_count TRAIN_LINE_COUNT]
[--skip_id_convert {0,1}]
--data_path The path of the data file.
--dense_dim The number of your continuous fields.(default: 13)
--slot_dim The number of your sparse fields, also called category features.(default: 26)
--threshold Word frequency below this value will be regarded as OOV. It aims to reduce the vocab size. (default: 100)
--train_line_count The number of examples in your dataset.
--skip_id_convert 0 or 1. If set to 1, the id conversion is skipped and the original id is regarded as the final id.(default: 0)
```
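What the --threshold option implies for vocabulary construction can be sketched as follows (illustrative logic only; the actual implementation lives in src/preprocess_data.py):

```python
from collections import Counter

threshold = 100
# toy column of raw categorical values
values = ["a"] * 150 + ["b"] * 99 + ["c"] * 300

counts = Counter(values)
vocab = {}
for v, c in sorted(counts.items()):
    if c >= threshold:
        vocab[v] = len(vocab) + 1   # frequent values get their own ids; 0 is reserved for OOV

ids = [vocab.get(v, 0) for v in values]   # rare values collapse into the OOV id
print(len(vocab), "in-vocab values;", ids.count(0), "rows mapped to OOV")
```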
```bash
mkdir -p data/origin_data && cd data/origin_data
wget DATA_LINK
tar -zxvf dac.tar.gz
```
Please refer to [1] to obtain the download link
```bash
python src/preprocess_data.py --data_path=./data/ --dense_dim=13 --slot_dim=26 --threshold=100 --train_line_count=45840617 --skip_id_convert=0
```
"label\tdense_feature[0]\tdense_feature[1]...\tsparse_feature[0]\tsparse_feature[1]...".
```bash
mkdir -p syn_data/origin_data
python src/generate_synthetic_data.py --output_file=syn_data/origin_data/train.txt --number_examples=40000000 --dense_dim=13 --slot_dim=51 --vocabulary_size=2000000000 --random_slot_values=0
python src/preprocess_data.py --data_path=./syn_data/ --dense_dim=13 --slot_dim=51 --threshold=0 --train_line_count=40000000 --skip_id_convert=1
```
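A quick way to sanity-check the generated file against that row format (the path and dimensions match the commands above; the variable names are illustrative):

```python
dense_dim, slot_dim = 13, 51   # as passed to generate_synthetic_data.py above

with open("syn_data/origin_data/train.txt") as f:
    fields = f.readline().rstrip("\n").split("\t")

label = float(fields[0])
dense = [float(x) for x in fields[1:1 + dense_dim]]
sparse = fields[1 + dense_dim:]
assert len(fields) == 1 + dense_dim + slot_dim
print(label, dense[:3], sparse[:3])
```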
To train and evaluate the model, run the following command:
```bash
python train_and_eval.py
```
To train the model with distributed data parallelism, run the following command:

```bash
# configure environment path before training
bash run_multinpu_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE
```
To train the model with model parallelism, run the following command:

```bash
# configure environment path before training
bash run_auto_parallel_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE
```
To train the model on clusters, run the following command:

```bash
# deploy the wide&deep scripts to the clusters
# CLUSTER_CONFIG is a json file; a sample is in script/.
# EXECUTE_PATH is the path of the scripts after deployment.
bash deploy_cluster.sh CLUSTER_CONFIG_PATH EXECUTE_PATH

# enter EXECUTE_PATH and execute start_cluster.sh as follows.
# MODE: "host_device_mix"
bash start_cluster.sh CLUSTER_CONFIG_PATH EPOCH_SIZE VOCAB_SIZE EMB_DIM \
    DATASET ENV_SH RANK_TABLE_FILE MODE
```
To train and evaluate the model in parameter server mode, run the following command:

```bash
# SERVER_NUM is the number of parameter servers for this task.
# SCHED_HOST is the IP address of the scheduler.
# SCHED_PORT is the port of the scheduler.
# The number of workers is the same as RANK_SIZE.
bash run_parameter_server_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE SERVER_NUM SCHED_HOST SCHED_PORT
```
To evaluate the model, run the following command:

```bash
python eval.py
```
| Parameters | Single Ascend | Single GPU | Data-Parallel-8P | Host-Device-mode-8P |
|---|---|---|---|---|
| Resource | Ascend 910; OS Euler2.8 | Tesla V100-PCIE 32G | Ascend 910; OS Euler2.8 | Ascend 910; OS Euler2.8 |
| Uploaded Date | 08/21/2020 (month/day/year) | 08/21/2020 (month/day/year) | 08/21/2020 (month/day/year) | 08/21/2020 (month/day/year) |
| MindSpore Version | 1.0 | 1.0 | 1.0 | 1.0 |
| Dataset | [1] | [1] | [1] | [1] |
| Training Parameters | Epoch=15, batch_size=16000 | Epoch=15, batch_size=16000 | Epoch=15, batch_size=16000 | Epoch=15, batch_size=16000 |
| Optimizer | FTRL, Adam | FTRL, Adam | FTRL, Adam | FTRL, Adam |
| Loss Function | SigmoidCrossEntropy | SigmoidCrossEntropy | SigmoidCrossEntropy | SigmoidCrossEntropy |
| AUC Score | 0.80937 | 0.80971 | 0.80862 | 0.80834 |
| Speed | 20.906 ms/step | 24.465 ms/step | 27.388 ms/step | 236.506 ms/step |
| Loss | wide: 0.433, deep: 0.444 | wide: 0.444, deep: 0.456 | wide: 0.437, deep: 0.448 | wide: 0.444, deep: 0.444 |
| Params (M) | 75.84 | 75.84 | 75.84 | 75.84 |
| Checkpoint for inference | 233 MB (.ckpt file) | 230 MB (.ckpt file) | 233 MB (.ckpt file) | 233 MB (.ckpt file) |
All executable scripts can be found in the script directory.
Note: The GPU results were tested under the master version. The parameter server mode of the Wide&Deep model is still under development.
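Both parts are trained against a sigmoid cross-entropy loss on the click label, as the Loss Function row above states. A minimal NumPy sketch of that loss applied to the summed logits (variable names are illustrative, not from the repo):

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    # numerically stable form of -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)),
    # averaged over the batch
    return np.mean(np.maximum(logits, 0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))

rng = np.random.default_rng(0)
wide_logit = rng.normal(size=16000)   # batch_size=16000, as in the table
deep_logit = rng.normal(size=16000)
labels = rng.integers(0, 2, 16000).astype(float)

print(sigmoid_cross_entropy(wide_logit + deep_logit, labels))
```

The Loss row above tracks the wide and deep losses separately, and the Optimizer row lists FTRL and Adam; conventionally FTRL optimizes the wide part and Adam the deep part.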
| Parameters | Wide&Deep |
|---|---|
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 10/27/2020 (month/day/year) |
| MindSpore Version | 1.0 |
| Dataset | [1] |
| Batch Size | 16000 |
| Outputs | AUC |
| Accuracy | AUC=0.809 |
Since v1.1.1, MindSpore supports the NUMA bind feature for better performance. The numa library needs to be installed first (typically the numactl-devel package).
From v1.1.1, the feature can be enabled through the config interface:

```python
import mindspore.dataset as de
de.config.set_numa_enable(True)
```
From v1.2.0, it can additionally be enabled through an environment variable:

```bash
export DATASET_ENABLE_NUMA=True
```
There are three random situations:

- Shuffle of the dataset.
- Initialization of some model weights.
- Dropout operations.
Please check the official homepage.