
Biped Robot with Deep Reinforcement Learning

This project explores how to teach a low-cost biped robot to walk using deep reinforcement learning. The entire pipeline is designed to be accessible, reproducible, and modular, combining simple hardware (12 servo-driven degrees of freedom and a Jetson Nano) with simulation in MuJoCo and training in PyTorch.

We compare and evaluate four state-of-the-art RL algorithms for continuous control:

  • DDPG (Deep Deterministic Policy Gradient)
  • D4PG (Distributed Distributional DDPG)
  • SAC (Soft Actor-Critic)
  • MPO (Maximum a Posteriori Policy Optimization)

The project aims to democratize humanoid robotics by making it feasible to train bipedal locomotion using affordable tools and open-source code.

(Photo: the robot.)


📂 Project Structure

.
├── models/             # PyTorch models: actor-critic networks, DDPG, D4PG, SAC, MPO
├── env/                # MuJoCo simulation environments
├── src/                # Training & evaluation scripts
├── config/             # YAML configs per experiment
├── video/              # Example walking videos
├── data/               # Data and graphs from training runs
├── utils/              # Miscellaneous utilities
├── checkpoints_final/  # Trained models
├── requirements.txt    # Required Python libraries
└── README.md

⚙️ How to Run

1. Set up the environment

The project requires Python 3.11 or newer.

# Recommended: create a virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install dependencies
pip install -r requirements.txt
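
Before training, it can help to confirm that the core dependencies import correctly. A minimal sanity check, assuming torch and mujoco are among the packages in requirements.txt:

```python
# Quick sanity check for the two core dependencies (assumed to be in requirements.txt)
import torch
import mujoco

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MuJoCo :", mujoco.__version__)
```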

2. Train an agent

# Create a new .yaml config file in the config/ folder
# or use an existing one, e.g., config/final/train_config_sac.yaml
# Modify train.py to point to your config file (TODO: add it as a command-line argument; a possible sketch follows below)
# Example command to train using SAC on the flat environment
python src/train.py
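
Until the TODO above is resolved, one lightweight way to pass the config path on the command line is an argparse block at the top of src/train.py. This is only a sketch: the argument name and default are assumptions, and the keys inside the YAML depend on your experiment.

```python
# Hypothetical sketch for selecting the experiment config from the command line
import argparse
import yaml

parser = argparse.ArgumentParser(description="Train a walking agent")
parser.add_argument("--config", default="config/final/train_config_sac.yaml",
                    help="Path to the YAML experiment config")
args = parser.parse_args()

with open(args.config) as f:
    cfg = yaml.safe_load(f)  # dictionary with algorithm, environment, and hyperparameter settings
```

Training would then be launched as `python src/train.py --config config/final/train_config_sac.yaml`.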

3. Evaluate a trained model

# After training, you can test the model
# Modify src/test_model.py to use your trained model (TODO: add it as a command line argument)
python src/test_model.py
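
The real entry point is src/test_model.py; the snippet below is only an illustration of a typical evaluation loop. The make_env factory, the checkpoint path, and the reset/step signatures are assumptions, not the project's actual API.

```python
# Hypothetical evaluation loop; adapt names to the code in src/ and env/
import torch
from env import make_env  # assumed factory for the MuJoCo walking environment

env = make_env("flat")  # assumed: build the flat-terrain environment
actor = torch.load("checkpoints_final/sac_actor.pt", map_location="cpu")  # illustrative path
actor.eval()

obs, episode_return, done = env.reset(), 0.0, False
while not done:
    with torch.no_grad():
        action = actor(torch.as_tensor(obs, dtype=torch.float32))
    obs, reward, done, _ = env.step(action.numpy())
    episode_return += reward
print(f"Episode return: {episode_return:.1f}")
```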

4. Visualize results

Training data is automatically saved to the folder specified in the config file. To visualize the results, use TensorBoard to plot the training curves.
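
For example, if your config writes logs to data/runs (the exact folder is whatever you set in the YAML), the curves can be opened with:

tensorboard --logdir data/runs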


📊 Results

Training Curves

(Training plots: episode score, episode length, mean reward, and steps per second.)


🎥 Walking Demo

(Video: robot walking demo.)


🧠 Algorithms Summary

| Algorithm | Type       | Strengths                       | Weaknesses                          |
|-----------|------------|---------------------------------|-------------------------------------|
| DDPG      | Off-policy | Fast, simple                    | Unstable, outdated                  |
| D4PG      | Off-policy | Fast, stable, great for tuning  | Not powerful enough                 |
| SAC       | Off-policy | Robust, best final performance  | Slow training                       |
| MPO       | Off-policy | Theoretically grounded          | Sensitive, complex hyperparameters  |
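
As a concrete reference for the SAC row above, here is a minimal sketch of its entropy-regularized actor update. It is not the project's implementation in models/; policy, q1, q2, and alpha are placeholder names for the usual squashed-Gaussian actor, the twin critics, and the temperature.

```python
# Minimal sketch of the SAC actor update (not the code in models/)
import torch

def sac_actor_loss(policy, q1, q2, obs, alpha):
    """Entropy-regularized policy loss: prefer high-Q actions while staying stochastic."""
    action, log_prob = policy.sample(obs)                 # reparameterized action and its log-probability
    q_min = torch.min(q1(obs, action), q2(obs, action))   # clipped double-Q to curb overestimation
    return (alpha * log_prob - q_min).mean()              # minimize: entropy-weighted log-prob minus Q
```

The temperature alpha weights the entropy bonus and controls how exploratory the policy stays during training.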

🧪 Future Work

  • Sim-to-real transfer with domain randomization
  • Use of temporal models (LSTM, 1D CNN)
  • Add direction, rotation, and velocity control

📜 License

MIT License. Feel free to fork, modify, and contribute!


🤖 Contact

Project by Pablo Gómez Martínez
Contact: pablodiegogomez@gmail.com