A highly modularized implementation of popular deep RL algorithms in PyTorch. My principle here is to reuse as many components as I can across different algorithms, use as few tricks as possible, and switch easily between classical control tasks like CartPole and Atari games with raw pixel inputs.
Implemented algorithms:
- Deep Q-Learning (DQN)
- Double DQN (a sketch of its target computation follows this list)
- Dueling DQN
- Asynchronous Advantage Actor-Critic (A3C)
- Asynchronous One-Step Q-Learning
- Asynchronous One-Step Sarsa
- Asynchronous N-Step Q-Learning
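
As one example of how these variants differ, here is a minimal sketch of the Double DQN target computation in PyTorch. The function name and arguments are illustrative, not the repo's actual API; `online_net` and `target_net` are assumed to be `nn.Module` Q-networks mapping a batch of states to per-action Q-values.

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    # Hypothetical helper, not the repo's actual code.
    with torch.no_grad():
        # Vanilla DQN lets the target net both select and evaluate the next action.
        # Double DQN decouples them: the online net selects the argmax action...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...and the target net evaluates it, which reduces overestimation bias.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    # dones is a 0/1 float mask; terminal transitions get no bootstrapped value.
    return rewards + gamma * (1 - dones) * next_q
```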
Training curves for CartPole are trivial, so I didn't include them here.
The network and hyperparameters are exactly the same as in the DeepMind Nature paper. The training curve is smoothed with a window of size 100. All the models were trained on a server with a Xeon E5-2620 v3 and a Titan X. For Breakout, a test is triggered every 1000 episodes with 50 repetitions; in total, 16M frames took about 4 days and 10 hours. For Pong, a test is triggered every 10 episodes with no repetition; in total, 4M frames took about 18 hours.
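
For reference, a minimal sketch of that Nature network in PyTorch, assuming the standard Atari preprocessing of 4 stacked 84 x 84 grayscale frames. The class name is illustrative, not the repo's actual code.

```python
import torch.nn as nn

class NatureDQN(nn.Module):
    # Architecture from "Human Level Control through Deep Reinforcement Learning".
    def __init__(self, num_actions):
        super(NatureDQN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 84x84 input -> 7x7 feature maps
            nn.Linear(512, num_actions),
        )

    def forward(self, x):
        x = self.features(x / 255.0)  # scale raw pixels to [0, 1]
        return self.head(x.view(x.size(0), -1))
```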
The network I used here is the same as the DQN network, except that the activation function is ELU rather than ReLU. The optimizer is Adam with non-shared parameters (each worker keeps its own Adam statistics). To the best of my knowledge, this architecture is not the most suitable one for A3C: if you use a 42 * 42 input and add an LSTM layer at the end, you will get much better training speed than this. GAE can also improve performance.
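
A minimal sketch of that setup, assuming the Nature DQN trunk with ELU activations plus separate policy and value heads. The class name and layer names are illustrative, not the repo's actual code.

```python
import torch.nn as nn
import torch.nn.functional as F

class A3CNet(nn.Module):
    # Hypothetical actor-critic network: DQN trunk with ELU instead of ReLU.
    def __init__(self, num_actions):
        super(A3CNet, self).__init__()
        self.conv1 = nn.Conv2d(4, 32, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.fc = nn.Linear(64 * 7 * 7, 512)
        self.policy = nn.Linear(512, num_actions)  # actor head (action logits)
        self.value = nn.Linear(512, 1)             # critic head (state value)

    def forward(self, x):
        x = F.elu(self.conv1(x / 255.0))
        x = F.elu(self.conv2(x))
        x = F.elu(self.conv3(x))
        x = F.elu(self.fc(x.view(x.size(0), -1)))
        return self.policy(x), self.value(x)
```

With non-shared Adam, each worker process would construct its own `torch.optim.Adam` over the shared model's parameters, so only the model weights are shared HOGWILD!-style while the optimizer statistics stay local to each worker.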
The first 15M frames took about 5 hours (16 processes) on a server with two Xeon E5-2620 v3 CPUs. This is the test curve; a test is triggered in a separate deterministic test process every 50K frames.
- OpenAI Gym
- PyTorch
- PIL (pip install Pillow)
- Python 2.7 (I didn't test with Python 3)
Detailed usage and all training details can be found in `main.py`.
- Human Level Control through Deep Reinforcement Learning
- Asynchronous Methods for Deep Reinforcement Learning
- Deep Reinforcement Learning with Double Q-learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Playing Atari with Deep Reinforcement Learning
- HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- transedward/pytorch-dqn
- ikostrikov/pytorch-a3c