Wide&Deep is a classic model in the recommendation and click-through-rate prediction area. This is an implementation of Wide&Deep as described in the paper Wide & Deep Learning for Recommender Systems.
The Wide&Deep model jointly trains a wide linear model and a deep neural network, combining the benefits of memorization and generalization for recommender systems.
Currently we support host-device mode with column partition and parameter server mode.
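To make the joint design concrete, here is a minimal NumPy sketch of the forward pass: the wide part sums per-id weights (memorization), the deep part runs embeddings through an MLP (generalization), and their logits are added before the sigmoid. This is illustrative only, with scaled-down sizes, and is not the repo's src/wide_and_deep.py:

```python
import numpy as np

# scaled-down stand-ins for the documented defaults (field_size=39,
# vocab_size=200000, emb_dim=80, first deep layer 1024)
FIELD, VOCAB, EMB, HIDDEN = 39, 20000, 8, 1024

rng = np.random.default_rng(0)
wide_w = rng.normal(0, 0.01, VOCAB)              # wide part: one weight per feature id
deep_emb = rng.normal(0, 0.01, (VOCAB, EMB))     # deep part: dense embedding table
W1 = rng.normal(0, 0.01, (FIELD * EMB, HIDDEN))  # first deep layer (real model has more)
b1 = np.zeros(HIDDEN)
W_out = rng.normal(0, 0.01, (HIDDEN, 1))

def forward(ids):
    """ids: (batch, FIELD) integer feature ids -> predicted CTR probabilities."""
    wide_logit = wide_w[ids].sum(axis=1)                      # memorization
    x = deep_emb[ids].reshape(ids.shape[0], -1)               # (batch, FIELD*EMB)
    h = np.maximum(x @ W1 + b1, 0.0)                          # relu deep layer
    deep_logit = (h @ W_out).squeeze(-1)                      # generalization
    return 1.0 / (1.0 + np.exp(-(wide_logit + deep_logit)))   # sigmoid of summed logits

print(forward(rng.integers(0, VOCAB, size=(4, FIELD))))
```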
```bash
git clone https://gitee.com/mindspore/mindspore.git
cd mindspore/model_zoo/official/recommend/wide_and_deep
```
Please refer to [1] to obtain the download link
```bash
mkdir -p data/origin_data && cd data/origin_data
wget DATA_LINK
tar -zxvf dac.tar.gz
```
```bash
python src/preprocess_data.py --data_path=./data/ --dense_dim=13 --slot_dim=26 --threshold=100 --train_line_count=45840617 --skip_id_convert=0
```
Once the dataset is ready, the model can be trained and evaluated on a single device (Ascend) with the following command:
```bash
python train_and_eval.py --data_path=./data/mindrecord --dataset_type=mindrecord
```
To evaluate the model, run the following command:
```bash
python eval.py --data_path=./data/mindrecord --dataset_type=mindrecord
```
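eval.py reports an AUC score on the evaluation set. As a rough sketch of the metric itself (scikit-learn here is an illustrative choice, not a stated dependency of this repo):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# toy stand-ins for true click labels and the model's predicted CTRs
labels = np.array([0, 0, 1, 1, 0, 1])
pred = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.65])
print("AUC:", roc_auc_score(labels, pred))  # 1.0 would mean perfect ranking
```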
```
└── wide_and_deep
    ├── eval.py
    ├── README.md
    ├── script
    │   ├── cluster_32p.json
    │   ├── common.sh
    │   ├── deploy_cluster.sh
    │   ├── run_auto_parallel_train_cluster.sh
    │   ├── run_auto_parallel_train.sh
    │   ├── run_multigpu_train.sh
    │   ├── run_multinpu_train.sh
    │   ├── run_parameter_server_train_cluster.sh
    │   ├── run_parameter_server_train.sh
    │   ├── run_standalone_train_for_gpu.sh
    │   └── start_cluster.sh
    ├── src
    │   ├── callbacks.py
    │   ├── config.py
    │   ├── datasets.py
    │   ├── generate_synthetic_data.py
    │   ├── __init__.py
    │   ├── metrics.py
    │   ├── preprocess_data.py
    │   ├── process_data.py
    │   └── wide_and_deep.py
    ├── train_and_eval_auto_parallel.py
    ├── train_and_eval_distribute.py
    ├── train_and_eval_parameter_server.py
    ├── train_and_eval.py
    ├── train.py
    └── export.py
```
The parameters are the same for train.py, train_and_eval.py, train_and_eval_distribute.py and train_and_eval_auto_parallel.py:
```text
usage: train.py [-h] [--device_target {Ascend,GPU}] [--data_path DATA_PATH]
[--epochs EPOCHS] [--full_batch FULL_BATCH]
[--batch_size BATCH_SIZE] [--eval_batch_size EVAL_BATCH_SIZE]
[--field_size FIELD_SIZE] [--vocab_size VOCAB_SIZE]
[--emb_dim EMB_DIM]
[--deep_layer_dim DEEP_LAYER_DIM [DEEP_LAYER_DIM ...]]
[--deep_layer_act DEEP_LAYER_ACT] [--keep_prob KEEP_PROB]
[--dropout_flag DROPOUT_FLAG] [--output_path OUTPUT_PATH]
[--ckpt_path CKPT_PATH] [--eval_file_name EVAL_FILE_NAME]
[--loss_file_name LOSS_FILE_NAME]
[--host_device_mix HOST_DEVICE_MIX]
[--dataset_type DATASET_TYPE]
[--parameter_server PARAMETER_SERVER]
optional arguments:
--device_target {Ascend,GPU} device where the code will be implemented. (Default:Ascend)
--data_path DATA_PATH This should be set to the same directory given to the
data_download's data_dir argument
--epochs EPOCHS Total train epochs. (Default:15)
--full_batch FULL_BATCH Enable loading the full batch. (Default:False)
--batch_size BATCH_SIZE Training batch size.(Default:16000)
--eval_batch_size Eval batch size.(Default:16000)
--field_size The number of features.(Default:39)
--vocab_size The total features of dataset.(Default:200000)
--emb_dim The dense embedding dimension of sparse feature.(Default:80)
--deep_layer_dim The dimension of all deep layers.(Default:[1024,512,256,128])
--deep_layer_act The activation function of all deep layers.(Default:'relu')
--keep_prob The keep rate in dropout layer.(Default:1.0)
--dropout_flag Enable dropout.(Default:0)
--output_path Deprecated
--ckpt_path The location of the checkpoint file. If the checkpoint file
is a slice of weight, multiple checkpoint files need to be
transferred. Use ';' to separate them and sort them in sequence
like "./checkpoints/0.ckpt;./checkpoints/1.ckpt".
(Default:./checkpoints/)
--eval_file_name Eval output file.(Default:eval.log)
--loss_file_name Loss output file.(Default:loss.log)
--host_device_mix Enable host device mode or not.(Default:0)
--dataset_type The data type of the training files, chosen from tfrecord/mindrecord/hd5.(Default:tfrecord)
--parameter_server Whether to open parameter server or not.(Default:0)
```
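For reference, a minimal argparse setup consistent with the usage text above might look as follows (only a subset of the flags is shown, using the documented defaults; this is a sketch, not the repo's actual definition):

```python
import argparse

parser = argparse.ArgumentParser(description="WideDeep training")
parser.add_argument("--device_target", type=str, default="Ascend",
                    choices=["Ascend", "GPU"])
parser.add_argument("--data_path", type=str,
                    help="same directory given to data_download's data_dir argument")
parser.add_argument("--epochs", type=int, default=15, help="total train epochs")
parser.add_argument("--batch_size", type=int, default=16000, help="training batch size")
parser.add_argument("--emb_dim", type=int, default=80,
                    help="dense embedding dimension of sparse features")
parser.add_argument("--deep_layer_dim", type=int, nargs="+",
                    default=[1024, 512, 256, 128])
parser.add_argument("--dataset_type", type=str, default="tfrecord",
                    choices=["tfrecord", "mindrecord", "hd5"])

args = parser.parse_args()
print(args)
```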
```text
usage: generate_synthetic_data.py [-h] [--output_file OUTPUT_FILE]
[--label_dim LABEL_DIM]
[--number_examples NUMBER_EXAMPLES]
[--dense_dim DENSE_DIM]
[--slot_dim SLOT_DIM]
[--vocabulary_size VOCABULARY_SIZE]
[--random_slot_values RANDOM_SLOT_VALUES]
optional arguments:
--output_file The output path of the generated file.(Default: ./train.txt)
--label_dim The label category. (Default:2)
--number_examples The row numbers of the generated file. (Default:4000000)
--dense_dim The number of continuous features.(Default:13)
--slot_dim The number of the category features.(Default:26)
--vocabulary_size The vocabulary size of the total dataset.(Default:400000000)
--random_slot_values 0 or 1. If 1, the id is generated randomly. If 0, the id is set by row_index mod part_size, where part_size is the vocab size for each slot.
```
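The row_index mod part_size rule is easy to illustrate. A small sketch, under the assumption (made here for illustration, not taken from the script) that each slot owns a contiguous partition of the vocabulary:

```python
import numpy as np

vocabulary_size, slot_dim = 400000000, 26
part_size = vocabulary_size // slot_dim   # vocab share assumed per categorical slot
rng = np.random.default_rng(0)

def slot_ids(row_index, random_slot_values):
    if random_slot_values:
        # ids drawn uniformly at random within each slot's partition
        return [s * part_size + int(rng.integers(part_size)) for s in range(slot_dim)]
    # deterministic: every slot reuses row_index mod part_size
    return [s * part_size + row_index % part_size for s in range(slot_dim)]

print(slot_ids(12345, random_slot_values=0)[:3])
```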
```text
usage: preprocess_data.py [-h]
[--data_path DATA_PATH] [--dense_dim DENSE_DIM]
[--slot_dim SLOT_DIM] [--threshold THRESHOLD]
[--train_line_count TRAIN_LINE_COUNT]
[--skip_id_convert {0,1}]
--data_path The path of the data file.
--dense_dim The number of your continuous fields.(default: 13)
--slot_dim The number of your sparse fields, also called category features.(default: 26)
--threshold Word frequency below this value will be regarded as OOV. It aims to reduce the vocab size. (default: 100)
--train_line_count The number of examples in your dataset.
--skip_id_convert 0 or 1. If set to 1, the id conversion is skipped and the original id is regarded as the final id.(default: 0)
```
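What the --threshold option implies for vocabulary construction can be sketched as follows (illustrative logic only; the actual implementation lives in src/preprocess_data.py):

```python
from collections import Counter

threshold = 100
# toy column of raw categorical values
values = ["a"] * 150 + ["b"] * 99 + ["c"] * 300

counts = Counter(values)
vocab = {}
for v, c in sorted(counts.items()):
    if c >= threshold:
        vocab[v] = len(vocab) + 1   # frequent values get their own ids; 0 is reserved for OOV

ids = [vocab.get(v, 0) for v in values]   # rare values collapse into the OOV id
print(len(vocab), "in-vocab values;", ids.count(0), "rows mapped to OOV")
```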
```bash
mkdir -p data/origin_data && cd data/origin_data
wget DATA_LINK
tar -zxvf dac.tar.gz
```
Please refer to [1] to obtain the download link
```bash
python src/preprocess_data.py --data_path=./data/ --dense_dim=13 --slot_dim=26 --threshold=100 --train_line_count=45840617 --skip_id_convert=0
```
"label\tdense_feature[0]\tdense_feature[1]...\tsparse_feature[0]\tsparse_feature[1]...".
```bash
mkdir -p syn_data/origin_data
python src/generate_synthetic_data.py --output_file=syn_data/origin_data/train.txt --number_examples=40000000 --dense_dim=13 --slot_dim=51 --vocabulary_size=2000000000 --random_slot_values=0
python src/preprocess_data.py --data_path=./syn_data/ --dense_dim=13 --slot_dim=51 --threshold=0 --train_line_count=40000000 --skip_id_convert=1
```
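A quick way to sanity-check the generated file against that row format (the path and dimensions match the commands above; the variable names are illustrative):

```python
dense_dim, slot_dim = 13, 51   # as passed to generate_synthetic_data.py above

with open("syn_data/origin_data/train.txt") as f:
    fields = f.readline().rstrip("\n").split("\t")

label = float(fields[0])
dense = [float(x) for x in fields[1:1 + dense_dim]]
sparse = fields[1 + dense_dim:]
assert len(fields) == 1 + dense_dim + slot_dim
print(label, dense[:3], sparse[:3])
```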
To train and evaluate the model, run the following command:
```bash
python train_and_eval.py
```
To train the model with distributed data parallelism, run the following command:

```bash
# configure environment path before training
bash run_multinpu_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE
```
To train the model with model parallelism, run the following command:

```bash
# configure environment path before training
bash run_auto_parallel_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE
```
To train the model on clusters, run the following command:

```bash
# deploy the wide&deep scripts to the clusters
# CLUSTER_CONFIG is a json file; a sample is in script/.
# EXECUTE_PATH is the path of the scripts after deployment.
bash deploy_cluster.sh CLUSTER_CONFIG_PATH EXECUTE_PATH

# enter EXECUTE_PATH and execute start_cluster.sh as follows.
# MODE: "host_device_mix"
bash start_cluster.sh CLUSTER_CONFIG_PATH EPOCH_SIZE VOCAB_SIZE EMB_DIM \
    DATASET ENV_SH RANK_TABLE_FILE MODE
```
To train and evaluate the model in parameter server mode, run the following command:

```bash
# SERVER_NUM is the number of parameter servers for this task.
# SCHED_HOST is the IP address of the scheduler.
# SCHED_PORT is the port of the scheduler.
# The number of workers is the same as RANK_SIZE.
bash run_parameter_server_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE SERVER_NUM SCHED_HOST SCHED_PORT
```
To evaluate the model, run the following command:

```bash
python eval.py
```
| Parameters | Single Ascend | Single GPU | Data-Parallel-8P | Host-Device-mode-8P |
|---|---|---|---|---|
| Resource | Ascend 910; OS Euler2.8 | Tesla V100-PCIE 32G | Ascend 910; OS Euler2.8 | Ascend 910; OS Euler2.8 |
| Uploaded Date | 08/21/2020 (month/day/year) | 08/21/2020 (month/day/year) | 08/21/2020 (month/day/year) | 08/21/2020 (month/day/year) |
| MindSpore Version | 1.0 | 1.0 | 1.0 | 1.0 |
| Dataset | [1] | [1] | [1] | [1] |
| Training Parameters | Epoch=15, batch_size=16000 | Epoch=15, batch_size=16000 | Epoch=15, batch_size=16000 | Epoch=15, batch_size=16000 |
| Optimizer | FTRL, Adam | FTRL, Adam | FTRL, Adam | FTRL, Adam |
| Loss Function | SigmoidCrossEntropy | SigmoidCrossEntropy | SigmoidCrossEntropy | SigmoidCrossEntropy |
| AUC Score | 0.80937 | 0.80971 | 0.80862 | 0.80834 |
| Speed | 20.906 ms/step | 24.465 ms/step | 27.388 ms/step | 236.506 ms/step |
| Loss | wide: 0.433, deep: 0.444 | wide: 0.444, deep: 0.456 | wide: 0.437, deep: 0.448 | wide: 0.444, deep: 0.444 |
| Params (M) | 75.84 | 75.84 | 75.84 | 75.84 |
| Checkpoint for inference | 233 MB (.ckpt file) | 230 MB (.ckpt file) | 233 MB (.ckpt file) | 233 MB (.ckpt file) |
All executable scripts can be found in the script directory.
Note: The GPU results were tested under the master version. The parameter server mode of the Wide&Deep model is still under development.
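Both parts are trained against a sigmoid cross-entropy loss on the click label, as the Loss Function row above states. A minimal NumPy sketch of that loss applied to the summed logits (variable names are illustrative, not from the repo):

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    # numerically stable form of -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)),
    # averaged over the batch
    return np.mean(np.maximum(logits, 0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))

rng = np.random.default_rng(0)
wide_logit = rng.normal(size=16000)   # batch_size=16000, as in the table
deep_logit = rng.normal(size=16000)
labels = rng.integers(0, 2, 16000).astype(float)

print(sigmoid_cross_entropy(wide_logit + deep_logit, labels))
```

The Loss row above tracks the wide and deep losses separately, and the Optimizer row lists FTRL and Adam; conventionally FTRL optimizes the wide part and Adam the deep part.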
| Parameters | Wide&Deep |
|---|---|
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 10/27/2020 (month/day/year) |
| MindSpore Version | 1.0 |
| Dataset | [1] |
| Batch Size | 16000 |
| Outputs | AUC |
| Accuracy | AUC=0.809 |
Since v1.1.1, MindSpore supports the NUMA bind feature for better performance. The numa library needs to be installed first (typically the numactl-devel package).
From v1.1.1, the feature can be enabled through the config interface:

```python
import mindspore.dataset as de
de.config.set_numa_enable(True)
```
From v1.2.0, it can additionally be enabled through an environment variable:

```bash
export DATASET_ENABLE_NUMA=True
```
There are three random situations:

- Shuffle of the dataset.
- Initialization of some model weights.
- Dropout operations.
Please check the official homepage.