A set of reinforcement learning techniques used to design controllers for running and evaluating a simulation of the CartPole problem in Webots.
The CartPole environment is best explained by reviewing the original source code from OpenAI Gym, available here.
In this environment, a pole is attached to a cart (a mobile system with minimal friction), and the goal is to keep the pole balanced upright for as long as possible. A reward of +1 is given for every timestep the pole stays up, and the episode ends if the pole tilts more than 15 degrees from the vertical (so there are no negative rewards). At every timestep there are only two possible actions: pushing the cart left or right.
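For reference, this is a minimal sketch of interacting with the Gym version of CartPole using a random policy. It assumes the classic (pre-0.26) Gym `reset`/`step` API and is separate from the Webots/Deepbots implementation in this repository; it is only meant to illustrate the observation, action, and reward interface described above.

```python
import gym

# Classic Gym API: step() returns (observation, reward, done, info).
env = gym.make("CartPole-v1")

for episode in range(3):
    observation = env.reset()  # [cart position, cart velocity, pole angle, pole angular velocity]
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()               # random action: 0 = push left, 1 = push right
        observation, reward, done, info = env.step(action)
        total_reward += reward                            # +1 for every timestep the pole stays up
    print(f"Episode {episode}: cumulative reward = {total_reward}")

env.close()
```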
This environment has been solved, in the sense of reaching the maximum reward (and therefore the goal), using three deep reinforcement learning techniques. All three use a neural network function approximator with the same architecture, mapping the state to an action/policy, and each was trained for 1,000 episodes.
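As a rough illustration of what such a state-to-policy network looks like, here is a minimal Keras sketch. The layer sizes and activations are assumptions for illustration only; the architecture actually used is defined in the controller code under CartPole/controllers.

```python
import tensorflow as tf

STATE_DIM = 4    # cart position, cart velocity, pole angle, pole angular velocity
NUM_ACTIONS = 2  # push left, push right

def build_policy_network(hidden_units=16):
    """Small fully connected network mapping a state to action probabilities.

    Hidden layer size and activations are illustrative assumptions, not the
    repository's actual settings.
    """
    return tf.keras.Sequential([
        tf.keras.layers.Dense(hidden_units, activation="relu", input_shape=(STATE_DIM,)),
        tf.keras.layers.Dense(NUM_ACTIONS, activation="softmax"),  # policy over the two actions
    ])

policy_net = build_policy_network()
policy_net.summary()
```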
The programs here are developed to be run in the Webots simulator; more information about Webots can be found here. To use RL algorithms in the CartPole environment, I used the Deepbots framework, more about which can be found here. The robot and the CartPole environment are implemented as given in the Deepbots tutorial. All controller algorithms (Deep Q-Learning, REINFORCE, and Advantage Actor-Critic (A2C)) are implemented from scratch. The implementation follows the robot-supervisor scheme of Deepbots.
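In the robot-supervisor scheme, a single controller subclasses Deepbots' robot-supervisor class and exposes a Gym-like interface (observation space, action space, reward, done flag) on top of the Webots robot. The skeleton below is only a hedged sketch of that structure: the import path, base-class name, and overridden methods follow the Deepbots tutorial as best I recall them and may differ between Deepbots versions, so the actual controllers in CartPole/controllers are the authoritative reference. It must be run inside Webots as a controller, not as a standalone script.

```python
# NOTE: class/method names below are assumptions based on the Deepbots tutorial
# and may vary across Deepbots versions.
from deepbots.supervisor.controllers.robot_supervisor import RobotSupervisor
from gym.spaces import Box, Discrete
import numpy as np

class CartPoleRobotSupervisor(RobotSupervisor):
    """Gym-like wrapper around the Webots CartPole robot (robot-supervisor scheme)."""

    def __init__(self):
        super().__init__()
        # Observation: cart position, cart velocity, pole angle, pole angular velocity.
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float64)
        self.action_space = Discrete(2)  # 0 = push left, 1 = push right

    def get_observations(self):
        # Read the robot's sensors / node fields here; zeros keep the sketch self-contained.
        return [0.0, 0.0, 0.0, 0.0]

    def get_default_observation(self):
        return [0.0, 0.0, 0.0, 0.0]

    def apply_action(self, action):
        # Set the wheel motor velocities according to the chosen action.
        pass

    def get_reward(self, action=None):
        return 1.0  # +1 for every timestep the pole stays up

    def is_done(self):
        # Episode ends when the pole tilts more than ~15 degrees from vertical.
        return False

    def get_info(self):
        return {}
```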
To view the controller code, go to the CartPole/controllers directory and open the directory of the relevant controller. The CartPole/worlds directory contains the robot design as given by the Deepbots framework.
To run the controllers and view the simulation, you need installations of Webots, Deepbots, Python 3, and TensorFlow. The versions I ran and tested with (which work) are:
- Webots R2022a
- Deepbots 0.1.3 (currently the dev version)
- Python 3.1.0
- TensorFlow 2.3.1
A sample run of the simulation is shown in CartPoleWorldAnimation.mp4.
Three RL algorithms have been implemented and tested. Listed in order of effectiveness (based on visual simulation results and cumulative reward), they are (a sketch of the A2C update step follows the list):
- Advantage Actor-Critic (A2C)
- Deep Q-Learning
- REINFORCE (Policy Gradient)
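As an illustration of the first technique, the following is a minimal, self-contained sketch of a single one-step Advantage Actor-Critic update for CartPole-style inputs, written against TensorFlow 2. The network sizes, discount factor, and learning rate are illustrative assumptions and do not reflect the hyperparameters used by the controllers in this repository.

```python
import numpy as np
import tensorflow as tf

STATE_DIM, NUM_ACTIONS, GAMMA = 4, 2, 0.99  # assumed dimensions and discount factor

# Separate actor (policy) and critic (state-value) networks; sizes are illustrative.
actor = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(STATE_DIM,)),
    tf.keras.layers.Dense(NUM_ACTIONS, activation="softmax"),
])
critic = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(STATE_DIM,)),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # assumed learning rate

def a2c_update(state, action, reward, next_state, done):
    """One-step A2C update: advantage = r + gamma * V(s') - V(s)."""
    state = tf.convert_to_tensor([state], dtype=tf.float32)
    next_state = tf.convert_to_tensor([next_state], dtype=tf.float32)
    with tf.GradientTape() as tape:
        value = critic(state)[0, 0]
        next_value = critic(next_state)[0, 0]
        # Semi-gradient TD target: no gradient through the bootstrapped next value.
        td_target = reward + GAMMA * tf.stop_gradient(next_value) * (1.0 - float(done))
        advantage = td_target - value

        probs = actor(state)[0]
        log_prob = tf.math.log(probs[action] + 1e-8)

        actor_loss = -log_prob * tf.stop_gradient(advantage)  # policy gradient with advantage baseline
        critic_loss = tf.square(advantage)                    # squared TD error for the value function
        loss = actor_loss + critic_loss
    variables = actor.trainable_variables + critic.trainable_variables
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return float(advantage)

# Example call with dummy transition data.
a2c_update(np.zeros(STATE_DIM), action=1, reward=1.0, next_state=np.zeros(STATE_DIM), done=False)
```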