DQN is the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.
Paper: Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
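Training regresses the Bellman target from the paper: for a transition $(s, a, r, s')$ the target is

$$y = r + \gamma \max_{a'} Q(s', a'; \theta^-),$$

where $\gamma$ is the discount factor (`gamma` in the configuration below) and $\theta^-$ is a periodically refreshed copy of the network weights; the loss is the mean squared error between $Q(s, a; \theta)$ and $y$.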
The overall network architecture of DQN is shown below:
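As a minimal sketch (layer names and the hidden width are illustrative assumptions, not the exact contents of src/dqn.py), the network is a small fully connected Q-network mapping a state vector to one Q-value per action:

```python
import mindspore.nn as nn

class DQN(nn.Cell):
    """Fully connected Q-network: state in, one Q-value per action out."""

    def __init__(self, input_size, hidden_size, output_size):
        super(DQN, self).__init__()
        self.linear1 = nn.Dense(input_size, hidden_size)   # state -> hidden features
        self.relu = nn.ReLU()
        self.linear2 = nn.Dense(hidden_size, output_size)  # hidden -> Q-values

    def construct(self, x):
        x = self.relu(self.linear1(x))
        return self.linear2(x)
```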
- Hardware (Ascend/GPU/CPU)
- Framework: [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below: the MindSpore tutorials and the MindSpore Python API documentation.
Third-party libraries:

```bash
pip install gym
```
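As a quick sanity check (assuming the classic CartPole environment, which matches the state and action dimensions configured below):

```python
import gym

env = gym.make('CartPole-v0')
print(env.observation_space.shape)  # (4,)  -> state_space_dim = 4
print(env.action_space.n)           # 2     -> action_space_dim = 2
```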
```text
├── dqn
    ├── README.md                           # description of DQN
    ├── scripts
    │   ├── run_standalone_eval_ascend.sh   # shell script for evaluation with Ascend
    │   ├── run_standalone_eval_gpu.sh      # shell script for evaluation with GPU
    │   ├── run_standalone_train_ascend.sh  # shell script for training with Ascend
    │   └── run_standalone_train_gpu.sh     # shell script for training with GPU
    ├── src
    │   ├── agent.py                        # DQN agent
    │   ├── config.py                       # parameter configuration
    │   └── dqn.py                          # DQN network architecture
    ├── train.py                            # training script
    └── eval.py                             # evaluation script
```
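For orientation, agent.py conventionally pairs an epsilon-greedy policy with an experience replay buffer of the configured capacity. The following is a hedged sketch of such a buffer (class and method names are illustrative, not the actual agent.py API):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=512):
        # Uniform sampling decorrelates consecutive environment steps.
        return random.sample(self.buffer, batch_size)
```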
```python
'gamma': 0.8            # discount factor applied to the next state value
'epsi_high': 0.9        # highest exploration rate
'epsi_low': 0.05        # lowest exploration rate
'decay': 200            # time constant of the exploration-rate decay
'lr': 0.001             # learning rate
'capacity': 100000      # capacity of the replay buffer
'batch_size': 512       # training batch size
'state_space_dim': 4    # dimension of the environment state space
'action_space_dim': 2   # dimension of the action space
```
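Assuming the common exponential annealing schedule for epsilon-greedy exploration (the exact formula lives in src/agent.py), `epsi_high`, `epsi_low`, and `decay` interact as follows:

```python
import math

def epsilon(steps, epsi_high=0.9, epsi_low=0.05, decay=200):
    # Anneal from epsi_high toward epsi_low; after `decay` steps
    # the remaining gap has shrunk by a factor of e.
    return epsi_low + (epsi_high - epsi_low) * math.exp(-steps / decay)

print(round(epsilon(0), 3))    # 0.9
print(round(epsilon(200), 3))  # ~0.363
```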
```bash
# training example
  python:
    Ascend: python train.py --device_target Ascend --ckpt_path ckpt > log.txt 2>&1 &
    GPU: python train.py --device_target GPU --ckpt_path ckpt > log.txt 2>&1 &
  shell:
    Ascend: sh run_standalone_train_ascend.sh ckpt
    GPU: sh run_standalone_train_gpu.sh ckpt
```
```bash
# evaluation example
  python:
    Ascend: python eval.py --device_target Ascend --ckpt_path ./ckpt/checkpoint_dqn.ckpt
    GPU: python eval.py --device_target GPU --ckpt_path ./ckpt/checkpoint_dqn.ckpt
  shell:
    Ascend: sh run_standalone_eval_ascend.sh ./ckpt/checkpoint_dqn.ckpt
    GPU: sh run_standalone_eval_gpu.sh ./ckpt/checkpoint_dqn.ckpt
```
| Parameters | DQN |
|---|---|
| Resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB; EulerOS 2.8 |
| Uploaded Date | 03/10/2021 (month/day/year) |
| MindSpore Version | 1.1.0 |
| Training Parameters | batch_size = 512, lr = 0.001 |
| Optimizer | RMSProp |
| Loss Function | MSELoss |
| Outputs | Q-value per action |
| Params | 7.3K |
| Scripts | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/rl/dqn |
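As a hedged illustration of the optimizer/loss pairing reported above (the hidden width of 256 is an assumption; the real wiring is in src/agent.py and src/config.py):

```python
import mindspore.nn as nn

# Q-network: 4-dimensional state in, one Q-value per action out.
net = nn.SequentialCell([nn.Dense(4, 256), nn.ReLU(), nn.Dense(256, 2)])
loss_fn = nn.MSELoss()
optimizer = nn.RMSProp(net.trainable_params(), learning_rate=0.001)
```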
We use a random seed in train.py.
Please check the official homepage.