RL-based-Robot-Controller-for-the-CartPole-Problem

A set of reinforcement learning techniques used to design controllers to run and evaluate the simulation of the CartPole Problem on Webots

The CartPole environment can be better understood by reviewing the original source code by OpenAI Gym here.

In this environment, a pole is attached to a cart on a low-friction track, and the goal is to keep the pole balanced upright for as long as possible. The agent receives a reward of +1 for every timestep the pole stays up, and the episode ends once the pole tilts more than 15 degrees from the vertical (so there are no negative rewards). There are only two possible actions, applied every timestep: pushing the cart left or right.
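The reward and termination logic described above can be sketched as follows (a simplified illustration only; in this project the actual check happens inside the Webots supervisor, and the function name and step cap here are hypothetical):

```python
import math

def run_episode(angle_trajectory, max_steps=500):
    """Accumulate +1 reward per step until the pole tilts past 15 degrees."""
    threshold = math.radians(15)  # termination threshold from the description above
    total_reward = 0
    for angle in angle_trajectory[:max_steps]:
        if abs(angle) > threshold:
            break  # episode ends; no negative reward is given
        total_reward += 1
    return total_reward
```

For example, a trajectory whose third angle is 0.3 rad (about 17 degrees) earns a reward of 2 before terminating.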

This environment has been solved with the objective of reaching the maximum reward (and thus the final goal) using three deep reinforcement learning techniques. All three use a neural network function approximator with the same architecture, mapping from state to action/policy, and each is trained for 1,000 episodes.
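As an illustration of such a state-to-policy mapping, a small MLP forward pass might look like the sketch below. The layer sizes and weights here are placeholders; the actual architecture is defined in the repository's controller code, which uses TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4-dim CartPole state -> 16 hidden units -> 2 actions
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)

def policy(state):
    """Map a CartPole state to a probability distribution over the 2 actions."""
    h = np.maximum(0.0, state @ W1 + b1)   # ReLU hidden layer
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

# State: [cart position, cart velocity, pole angle, pole angular velocity]
probs = policy(np.array([0.0, 0.1, -0.05, 0.2]))
```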

The programs here are developed with the intent of being run in the Webots simulator. More information about Webots can be found here. To use RL algorithms in the CartPole environment, I have used the Deepbots framework, more about which can be found here. The robot and the CartPole environment have been implemented as given in the Deepbots tutorial. All controller algorithms (Deep Q-Learning, REINFORCE, and Advantage Actor-Critic (A2C)) have been implemented from scratch. The implementation follows the robot-supervisor scheme of Deepbots.

To view the controller code, go to the CartPole/controllers directory and open the directory of the relevant controller. The CartPole/worlds directory contains the robot design as given by the Deepbots framework.

To run the controllers and view the simulation, you will need installations of Webots, Deepbots, Python 3, and TensorFlow. The versions I ran and tested them on (which work) are:

  • Webots R2022a
  • Deepbots 0.1.3 (currently the dev version)
  • Python 3.1.0
  • TensorFlow 2.3.1

A training simulation of the configuration is shown in CartPoleWorldAnimation.mp4.

Three RL algorithms have been implemented and tested. They (in the order of their effectiveness based on visual simulation results and cumulative reward) are:

  • Advantage Actor-Critic (A2C)
  • Deep Q-Learning
  • REINFORCE Policy Gradient
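A common ingredient across these methods is the discounted return used to weight updates (directly in REINFORCE, and via value targets in DQN and A2C). A minimal sketch of computing it, assuming a hypothetical discount factor (the actual hyperparameters live in the controller code):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1}, working backwards from episode end."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For instance, with per-step rewards [1, 1, 1] and gamma = 0.9, the returns are [2.71, 1.9, 1.0].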
