Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture.
Paper: Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
DeepFM consists of two components. The FM component is a factorization machine, which learns low-order feature interactions for recommendation. The deep component is a feed-forward neural network, which is used to learn high-order feature interactions.
The FM and deep component share the same input raw feature vector, which enables DeepFM to learn low- and high-order feature interactions simultaneously from the input raw features.
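To make the shared-input design concrete, the sketch below shows the forward computation in plain NumPy. It is illustrative only: the shapes, layer sizes, and function names are assumptions for the sketch, not the implementation in src/deepfm.py.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def deepfm_forward(x, w, V, hidden, w_out):
    """Illustrative DeepFM forward pass.

    x: (d,) raw feature vector      w: (d,) first-order weights
    V: (d, k) latent factor matrix  hidden: list of (W, b) MLP layers
    w_out: weights of the final scalar unit of the deep component
    """
    # FM component: first-order term plus pairwise (second-order) interactions,
    # using the identity sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    xv = x[:, None] * V                                   # (d, k) shared embeddings
    y_fm = w @ x + 0.5 * np.sum(np.sum(xv, 0) ** 2 - np.sum(xv ** 2, 0))

    # Deep component: a feed-forward network over the same embeddings,
    # modelling high-order interactions.
    h = xv.reshape(-1)
    for W, b in hidden:
        h = relu(W @ h + b)
    y_deep = w_out @ h

    # Both components see the same raw features/embeddings, and their outputs
    # are summed before the sigmoid to give the CTR estimate.
    return 1.0 / (1.0 + np.exp(-(y_fm + y_deep)))
```

For example, with d raw features and k latent factors, `deepfm_forward(x, w, V, [(W1, b1)], w_out)` returns a click probability in (0, 1).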
After installing MindSpore via the official website, you can start training and evaluation as follows:
running on Ascend
# run training example
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target='Ascend' \
--do_eval=True > ms_log/output.log 2>&1 &
# run distributed training example
sh scripts/run_distribute_train.sh 8 /dataset_path /rank_table_8p.json
# run evaluation example
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target='Ascend' > ms_log/eval_output.log 2>&1 &
OR
sh scripts/run_eval.sh 0 Ascend /dataset_path /checkpoint_path/deepfm.ckpt
For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
Please follow the instructions in the link below:
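While the linked instructions are authoritative, the sketch below shows roughly what a single-server, 8-device rank table (such as rank_table_8p.json) can look like. All IP addresses are placeholders and the exact schema may vary across Ascend driver and MindSpore versions.

```python
import json

# Hedged sketch of a single-server, 8-device rank table; server_id and device_ip
# values are placeholders, and the schema should be checked against the official docs.
rank_table = {
    "version": "1.0",
    "server_count": "1",
    "server_list": [{
        "server_id": "10.0.0.1",                       # host IP (placeholder)
        "device": [
            {"device_id": str(i), "device_ip": f"192.168.100.{101 + i}", "rank_id": str(i)}
            for i in range(8)
        ],
        "host_nic_ip": "reserve",
    }],
    "status": "completed",
}

with open("rank_table_8p.json", "w") as f:
    json.dump(rank_table, f, indent=4)
```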
running on GPU
For running on GPU, please change device_target from Ascend to GPU in the configuration file src/config.py.
# run training example
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target='GPU' \
--do_eval=True > ms_log/output.log 2>&1 &
# run distributed training example
sh scripts/run_distribute_train_gpu.sh 8 /dataset_path
# run evaluation example
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target='GPU' > ms_log/eval_output.log 2>&1 &
OR
sh scripts/run_eval.sh 0 GPU /dataset_path /checkpoint_path/deepfm.ckpt
running on CPU
# run training example
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target='CPU' \
--do_eval=True > ms_log/output.log 2>&1 &
# run evaluation example
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target='CPU' > ms_log/eval_output.log 2>&1 &
.
└─deepfm
  ├─README.md
  ├─mindspore_hub_conf.py           # config for mindspore hub
  ├─scripts
    ├─run_standalone_train.sh       # launch standalone training(1p) in Ascend or GPU
    ├─run_distribute_train.sh       # launch distributed training(8p) in Ascend
    ├─run_distribute_train_gpu.sh   # launch distributed training(8p) in GPU
    └─run_eval.sh                   # launch evaluating in Ascend or GPU
  ├─src
    ├─__init__.py                   # python init file
    ├─config.py                     # parameter configuration
    ├─callback.py                   # define callback function
    ├─deepfm.py                     # deepfm network
    └─dataset.py                    # create dataset for deepfm
  ├─eval.py                         # eval net
  └─train.py                        # train net
Parameters for both training and evaluation can be set in config.py
train parameters
optional arguments:
-h, --help show this help message and exit
--dataset_path DATASET_PATH
Dataset path
--ckpt_path CKPT_PATH
Checkpoint path
--eval_file_name EVAL_FILE_NAME
Auc log file path. Default: "./auc.log"
--loss_file_name LOSS_FILE_NAME
Loss log file path. Default: "./loss.log"
--do_eval DO_EVAL Do evaluation or not. Default: True
--device_target DEVICE_TARGET
Ascend or GPU. Default: Ascend
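The training option list above corresponds roughly to an argparse parser like the sketch below. This is illustrative only; the actual train.py may declare these arguments differently (for example, in how booleans or required options are handled).

```python
import argparse

# Hedged sketch of a parser matching the training options listed above;
# the real train.py may define these differently.
parser = argparse.ArgumentParser(description='DeepFM training')
parser.add_argument('--dataset_path', type=str, required=True, help='Dataset path')
parser.add_argument('--ckpt_path', type=str, default='./checkpoint', help='Checkpoint path')
parser.add_argument('--eval_file_name', type=str, default='./auc.log', help='AUC log file path')
parser.add_argument('--loss_file_name', type=str, default='./loss.log', help='Loss log file path')
parser.add_argument('--do_eval', type=lambda s: s.lower() == 'true', default=True,
                    help='Do evaluation or not')
parser.add_argument('--device_target', type=str, default='Ascend',
                    choices=['Ascend', 'GPU', 'CPU'], help='Ascend, GPU or CPU')
args = parser.parse_args()
```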
eval parameters
optional arguments:
-h, --help show this help message and exit
--checkpoint_path CHECKPOINT_PATH
Checkpoint file path
--dataset_path DATASET_PATH
Dataset path
--device_target DEVICE_TARGET
Ascend or GPU. Default: Ascend
running on Ascend
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target='Ascend' \
--do_eval=True > ms_log/output.log 2>&1 &
The python command above will run in the background; you can view the results in the file ms_log/output.log.
After training, you will find checkpoint files under the ./checkpoint folder by default. The loss values are saved in the loss.log file.
2020-05-27 15:26:29 epoch: 1 step: 41257, loss is 0.498953253030777
2020-05-27 15:32:32 epoch: 2 step: 41257, loss is 0.45545706152915955
...
The model checkpoint will be saved in the current directory.
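Since loss values are appended to loss.log in the format shown above, a small script like the following can extract them for inspection or plotting. The file name and line format are assumed from the sample output above.

```python
import re

# Parse lines like "2020-05-27 15:26:29 epoch: 1 step: 41257, loss is 0.4989..."
pattern = re.compile(r'epoch:\s*(\d+)\s+step:\s*(\d+),\s*loss is\s*([\d.eE+-]+)')

losses = []
with open('loss.log') as f:
    for line in f:
        m = pattern.search(line)
        if m:
            losses.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))

for epoch, step, loss in losses:
    print(f'epoch {epoch:>3}  step {step:>6}  loss {loss:.6f}')
```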
running on GPU
To do.
running on Ascend
sh scripts/run_distribute_train.sh 8 /dataset_path /rank_table_8p.json
The above shell script will run distributed training in the background. You can view the results in the file log[X]/output.log. The loss values are saved in the loss.log file.
running on GPU
To do.
evaluation on dataset when running on Ascend
Before running the command below, please check the checkpoint path used for evaluation.
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target='Ascend' > ms_log/eval_output.log 2>&1 &
OR
sh scripts/run_eval.sh 0 Ascend /dataset_path /checkpoint_path/deepfm.ckpt
The above command will run in the background. You can view the results in the file ms_log/eval_output.log. The accuracy is saved in the auc.log file.
{'result': {'AUC': 0.8057789065281104, 'eval_time': 35.64779996871948}}
evaluation on dataset when running on GPU
To do.
| Parameters | Ascend | GPU |
|---|---|---|
| Model Version | DeepFM | To do |
| Resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB; OS Euler 2.8 | To do |
| Uploaded Date | 09/15/2020 (month/day/year) | To do |
| MindSpore Version | 1.0.0 | To do |
| Dataset | [1] | To do |
| Training Parameters | epoch=15, batch_size=1000, lr=1e-5 | To do |
| Optimizer | Adam | To do |
| Loss Function | Sigmoid Cross Entropy With Logits | To do |
| outputs | Accuracy | To do |
| Loss | 0.45 | To do |
| Speed | 1pc: 8.16 ms/step; | To do |
| Total time | 1pc: 90 mins; | To do |
| Parameters (M) | 16.5 | To do |
| Checkpoint for Fine tuning | 190M (.ckpt file) | To do |
| Scripts | deepfm script | To do |
| Parameters | Ascend | GPU |
|---|---|---|
| Model Version | DeepFM | To do |
| Resource | Ascend 910; OS Euler2.8 | To do |
| Uploaded Date | 05/27/2020 (month/day/year) | To do |
| MindSpore Version | 0.3.0-alpha | To do |
| Dataset | [1] | To do |
| batch_size | 1000 | To do |
| outputs | accuracy | To do |
| Accuracy | 1pc: 80.55%; | To do |
| Model for inference | 190M (.ckpt file) | To do |
We set the random seed before training in train.py.
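For reference, seeding in MindSpore typically looks like the sketch below; the exact call and seed value used in train.py may differ.

```python
# Hedged example of fixing the random seed before training;
# train.py may use a different call or seed value.
from mindspore.common import set_seed

set_seed(1)
```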
Please check the official homepage.