DQN is the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.
Paper: Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
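Training regresses the Bellman target from the paper: for a transition $(s, a, r, s')$ the target is

$$y = r + \gamma \max_{a'} Q(s', a'; \theta^-),$$

where $\gamma$ is the discount factor (`gamma` in the configuration below) and $\theta^-$ is a periodically refreshed copy of the network weights; the loss is the mean squared error between $Q(s, a; \theta)$ and $y$.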
The overall network architecture of DQN is shown below:
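As a minimal sketch (layer names and the hidden width are illustrative assumptions, not the exact contents of src/dqn.py), the network is a small fully connected Q-network mapping a state vector to one Q-value per action:

```python
import mindspore.nn as nn

class DQN(nn.Cell):
    """Fully connected Q-network: state in, one Q-value per action out."""

    def __init__(self, input_size, hidden_size, output_size):
        super(DQN, self).__init__()
        self.linear1 = nn.Dense(input_size, hidden_size)   # state -> hidden features
        self.relu = nn.ReLU()
        self.linear2 = nn.Dense(hidden_size, output_size)  # hidden -> Q-values

    def construct(self, x):
        x = self.relu(self.linear1(x))
        return self.linear2(x)
```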
- Hardware (Ascend/GPU/CPU)
- Framework: [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below: the MindSpore tutorials and the MindSpore Python API documentation.
Third-party libraries:

```bash
pip install gym
```
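As a quick sanity check (assuming the classic CartPole environment, which matches the state and action dimensions configured below):

```python
import gym

env = gym.make('CartPole-v0')
print(env.observation_space.shape)  # (4,)  -> state_space_dim = 4
print(env.action_space.n)           # 2     -> action_space_dim = 2
```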
```text
├── dqn
    ├── README.md                           # description of DQN
    ├── scripts
    │   ├── run_standalone_eval_ascend.sh   # shell script for evaluation with Ascend
    │   ├── run_standalone_eval_gpu.sh      # shell script for evaluation with GPU
    │   ├── run_standalone_train_ascend.sh  # shell script for training with Ascend
    │   └── run_standalone_train_gpu.sh     # shell script for training with GPU
    ├── src
    │   ├── agent.py                        # DQN agent
    │   ├── config.py                       # parameter configuration
    │   └── dqn.py                          # DQN network architecture
    ├── train.py                            # training script
    └── eval.py                             # evaluation script
```
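For orientation, agent.py conventionally pairs an epsilon-greedy policy with an experience replay buffer of the configured capacity. The following is a hedged sketch of such a buffer (class and method names are illustrative, not the actual agent.py API):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=512):
        # Uniform sampling decorrelates consecutive environment steps.
        return random.sample(self.buffer, batch_size)
```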
```python
'gamma': 0.8            # discount factor applied to the next state value
'epsi_high': 0.9        # highest exploration rate
'epsi_low': 0.05        # lowest exploration rate
'decay': 200            # time constant of the exploration-rate decay
'lr': 0.001             # learning rate
'capacity': 100000      # capacity of the replay buffer
'batch_size': 512       # training batch size
'state_space_dim': 4    # dimension of the environment state space
'action_space_dim': 2   # dimension of the action space
```
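Assuming the common exponential annealing schedule for epsilon-greedy exploration (the exact formula lives in src/agent.py), `epsi_high`, `epsi_low`, and `decay` interact as follows:

```python
import math

def epsilon(steps, epsi_high=0.9, epsi_low=0.05, decay=200):
    # Anneal from epsi_high toward epsi_low; after `decay` steps
    # the remaining gap has shrunk by a factor of e.
    return epsi_low + (epsi_high - epsi_low) * math.exp(-steps / decay)

print(round(epsilon(0), 3))    # 0.9
print(round(epsilon(200), 3))  # ~0.363
```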
```bash
# training example
  python:
    Ascend: python train.py --device_target Ascend --ckpt_path ckpt > log.txt 2>&1 &
    GPU: python train.py --device_target GPU --ckpt_path ckpt > log.txt 2>&1 &
  shell:
    Ascend: sh run_standalone_train_ascend.sh ckpt
    GPU: sh run_standalone_train_gpu.sh ckpt
```
```bash
# evaluation example
  python:
    Ascend: python eval.py --device_target Ascend --ckpt_path ./ckpt/checkpoint_dqn.ckpt
    GPU: python eval.py --device_target GPU --ckpt_path ./ckpt/checkpoint_dqn.ckpt
  shell:
    Ascend: sh run_standalone_eval_ascend.sh ./ckpt/checkpoint_dqn.ckpt
    GPU: sh run_standalone_eval_gpu.sh ./ckpt/checkpoint_dqn.ckpt
```
| Parameters | DQN |
|---|---|
| Resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB; EulerOS 2.8 |
| Uploaded Date | 03/10/2021 (month/day/year) |
| MindSpore Version | 1.1.0 |
| Training Parameters | batch_size = 512, lr = 0.001 |
| Optimizer | RMSProp |
| Loss Function | MSELoss |
| Outputs | Q-value per action |
| Params | 7.3K |
| Scripts | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/rl/dqn |
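As a hedged illustration of the optimizer/loss pairing reported above (the hidden width of 256 is an assumption; the real wiring is in src/agent.py and src/config.py):

```python
import mindspore.nn as nn

# Q-network: 4-dimensional state in, one Q-value per action out.
net = nn.SequentialCell([nn.Dense(4, 256), nn.ReLU(), nn.Dense(256, 2)])
loss_fn = nn.MSELoss()
optimizer = nn.RMSProp(net.trainable_params(), learning_rate=0.001)
```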
We use a random seed in train.py.
Please check the official homepage.