Commit d1002bd

Reorganize README.md (#93)
* Reorganize README.md
* push docs change
* quick fix
* update docs
1 parent e761845 commit d1002bd

6 files changed, +32 -99 lines changed

.github/workflows/docs.yaml

+1 -2
@@ -1,10 +1,9 @@
-name: ci
+name: docs
 on:
   push:
     branches:
       - master
       - main
-      - documentation-site
 jobs:
   deploy:
     runs-on: ubuntu-latest

README.md

+19 -88
@@ -1,14 +1,13 @@
 # CleanRL (Clean Implementation of RL Algorithms)

-[<img src="https://img.shields.io/badge/discord-cleanrl-green?label=Discord&logo=discord&logoColor=ffffff&labelColor=7289DA&color=2c2f33">](https://discord.gg/D6RCjA6sVT)
-[![Meeting Recordings : cleanrl](https://img.shields.io/badge/meeting%20recordings-cleanrl-green?logo=youtube&logoColor=ffffff&labelColor=FF0000&color=282828&style=flat?label=healthinesses)](https://www.youtube.com/watch?v=dm4HdGujpPs&list=PLQpKd36nzSuMynZLU2soIpNSMeXMplnKP&index=2)
-[<img src="https://github.com/vwxyzjn/cleanrl/workflows/build/badge.svg">](
-https://github.com/vwxyzjn/cleanrl/actions)
+<img src="
+https://img.shields.io/github/license/vwxyzjn/cleanrl">
+[![tests](https://github.com/vwxyzjn/cleanrl/actions/workflows/tests.yaml/badge.svg)](https://github.com/vwxyzjn/cleanrl/actions/workflows/tests.yaml)
+[![ci](https://github.com/vwxyzjn/cleanrl/actions/workflows/docs.yaml/badge.svg)](https://github.com/vwxyzjn/cleanrl/actions/workflows/docs.yaml)
+[<img src="https://img.shields.io/discord/767863440248143916?label=discord">](https://discord.gg/D6RCjA6sVT)
 [<img src="https://badge.fury.io/py/cleanrl.svg">](
 https://pypi.org/project/cleanrl/)
-
-
-
+[<img src="https://img.shields.io/youtube/channel/views/UCDdC6BIFRI0jvcwuhi3aI6w?style=social">](https://www.youtube.com/channel/UCDdC6BIFRI0jvcwuhi3aI6w/videos)


 CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementation with research-friendly features. The implementation is clean and simple, yet we can scale it to run thousands of experiments using AWS Batch. The highlight features of CleanRL are:
@@ -24,6 +23,8 @@ CleanRL is a Deep Reinforcement Learning library that provides high-quality sing
 * 🧫 Experiment Management with [Weights and Biases](https://wandb.ai/site)
 * 💸 Cloud Integration with docker and AWS

+You can read more about CleanRL in our [technical paper]((https://arxiv.org/abs/2111.08819)) and [documentation](https://docs.cleanrl.dev/).
+
 Good luck have fun :rocket:

 ## Get started
@@ -90,104 +91,32 @@ python cleanrl/ppg_procgen.py --gym-id starpilot
 python cleanrl/ppg_procgen_impala_cnn.py --gym-id starpilot
 ```

+You may also use a prebuilt development environment hosted in Gitpod:
+
+[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/vwxyzjn/cleanrl/tree/gitpod)

 ## Algorithms Implemented
 - [x] Deep Q-Learning (DQN)
-    * [dqn.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn.py)
-        * For discrete action space.
-    * [dqn_atari.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn_atari.py)
-        * For playing Atari games. It uses convolutional layers and common atari-based pre-processing techniques.
 - [x] Categorical DQN (C51)
-    * [c51.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/c51.py)
-        * For discrete action space.
-    * [c51_atari.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/c51_atari.py)
-        * For playing Atari games. It uses convolutional layers and common atari-based pre-processing techniques.
-    * [c51_atari_visual.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/c51_atari_visual.py)
-        * Adds return and q-values visulization for `dqn_atari.py`.
 - [x] Proximal Policy Gradient (PPO)
-    * All of the PPO implementations below are augmented with some code-level optimizations. See https://costa.sh/blog-the-32-implementation-details-of-ppo.html for more details
-    * [ppo.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo.py)
-        * For discrete action space.
-    * [ppo_continuous_action.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_continuous_action.py)
-        * For continuous action space. Also implemented Mujoco-specific code-level optimizations
-    * [ppo_atari.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py)
-        * For playing Atari games. It uses convolutional layers and common atari-based pre-processing techniques.
 - [x] Soft Actor Critic (SAC)
-    * [sac_continuous_action.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/sac_continuous_action.py)
-        * For continuous action space.
 - [x] Deep Deterministic Policy Gradient (DDPG)
-    * [ddpg_continuous_action.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ddpg_continuous_action.py)
-        * For continuous action space.
 - [x] Twin Delayed Deep Deterministic Policy Gradient (TD3)
-    * [td3_continuous_action.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py)
-        * For continuous action space.
 - [x] Apex Deep Q-Learning (Apex-DQN)
-    * [apex_dqn_atari_visual.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/apex_dqn_atari_visual.py)
-        * For playing Atari games. It uses convolutional layers and common atari-based pre-processing techniques.

 ## Open RL Benchmark

-[Open RL Benchmark](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Open-RL-Benchmark-0-5-0---Vmlldzo0MDcxOA) by [CleanRL](https://github.com/vwxyzjn/cleanrl) is a comprehensive, interactive and reproducible benchmark of deep Reinforcement Learning (RL) algorithms. It uses Weights and Biases to keep track of the experiment data of popular deep RL algorithms (e.g. DQN, PPO, DDPG, TD3) in a variety of games (e.g. Atari, Mujoco, PyBullet, Procgen, Griddly, MicroRTS). The experiment data includes:
-
-- reproducibility info:
-    - [source code](https://app.wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/code?workspace=user-costa-huang) and [requirements.txt](https://app.wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/files/requirements.txt)
-    - [](https://app.wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/code?workspace=user-costa-huang)[hyper-parameters](https://app.wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/overview?workspace=user-costa-huang) and [the exact command to reproduce results](https://app.wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/overview?workspace=user-costa-huang)
-- metrics:
-    - [training metrics and videos of the agents playing the game](https://app.wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg?workspace=user-costa-huang)
-    - [system metrics](https://app.wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/system?workspace=user-costa-huang) and [logs](https://app.wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/logs?workspace=user-costa-huang)
-
-[Open RL Benchmark](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Open-RL-Benchmark-0-5-0---Vmlldzo0MDcxOA) has over 1000+ experiments including runs from other projects, which is overwhelming to present in a single report. Instead, we present the results in separate reports. Please click on the links below to access them.
-
-- [Atari results](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Atari--VmlldzoxMTExNTI)
-- [Mujoco results](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Mujoco--VmlldzoxODE0NjE)
-- [PyBullet results](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/PyBullet-and-Other-Continuous-Action-Tasks--VmlldzoxODE0NzY)
-- [Procgen results](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Procgen-New--Vmlldzo0NDUyMTg)
-- [Griddly results](https://wandb.ai/griddly/griddly-paper-generalize?workspace=user-costa-huang)
-- [Gym-μRTS results](https://wandb.ai/vwxyzjn/gym-microrts-paper/reports/Gym-RTS-Toward-Affordable-Deep-Reinforcement-Learning-Research-in-Real-time-Strategy-Games--Vmlldzo2MDIzMTg)
-- [Slimevolleygym results](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Slimevolleygym--Vmlldzo0ODA1MjA)
-- [PySC2 results](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Gym-pysc2-Benchmark--VmlldzoyNTEyMTc)
-- [CarRacing-v0](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/CarRacing-v0--VmlldzoyNDUwMzU)
-- [Montezuma Revenge results](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Montezuma-Revenge--Vmlldzo1MDYxNTk)
-
-
-We hope it could bring a new level of transparency, openness, and reproducibility. Our plan is to
-benchmark as many algorithms and games as possible. If you are interested, please join us and contribute
-more algorithms and games. To get started, check out our [contribution guide](https://github.com/vwxyzjn/cleanrl/blob/master/CONTRIBUTING.md) and our [roadmap for the Open RL Benchmark](https://github.com/vwxyzjn/cleanrl/projects/1)
-
+CleanRL has a sub project called Open RL Benchmark (https://benchmark.cleanrl.dev/), where we have tracked thousands of experiments across domains. The benchmark is interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. Here are some screenshots.

-## Cloud integration
-
-Check out the documentation [here](https://github.com/vwxyzjn/cleanrl/tree/master/cloud)
+![](docs/static/o2.png)
+![](docs/static/o3.png)
+![](docs/static/o1.png)


 ## Support and get involved

 We have a [Discord Community](https://discord.gg/D6RCjA6sVT) for support. Feel free to ask questions. Posting in [Github Issues](https://github.com/vwxyzjn/cleanrl/issues) and PRs are also welcome. Also our past video recordings are available at [YouTube](https://www.youtube.com/watch?v=dm4HdGujpPs&list=PLQpKd36nzSuMynZLU2soIpNSMeXMplnKP&index=2)

-<!-- In addition, we also have a monthly development cycle to implement new RL algorithms. Feel free to participate or ask questions there, too. You can sign up for our mailing list at our [Google Groups](https://groups.google.com/forum/#!forum/rlimplementation/join) to receive event RVSP which contains the Hangout video call address every week. -->
-
-## Contribution
-
-We have a short contribution guide here https://github.com/vwxyzjn/cleanrl/blob/master/CONTRIBUTING.md. Consider adding new algorithms
-or test new games on the Open RL Benchmark (https://benchmark.cleanrl.dev)
-
-Big thanks to all the contributors of CleanRL!
-
-## References
-
-I have been heavily inspired by the many repos and blog posts. Below contains a incomplete list of them.
-
-* http://inoryy.com/post/tensorflow2-deep-reinforcement-learning/
-* https://github.com/seungeunrho/minimalRL
-* https://github.com/Shmuma/Deep-Reinforcement-Learning-Hands-On
-* https://github.com/hill-a/stable-baselines
-
-The following ones helped me a lot with the continuous action space handling:
-
-* https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail
-* https://github.com/zhangchuheng123/Reinforcement-Implementation/blob/master/code/ppo.py
-
-
 ## Citing CleanRL

 If you use CleanRL in your work, please cite our technical [paper](https://arxiv.org/abs/2111.08819):
@@ -197,6 +126,8 @@ If you use CleanRL in your work, please cite our technical [paper](https://arxiv
     title={CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
     author={Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga},
     year={2021},
-    journal={arXiv preprint arXiv:2111.08819},
-    url={https://arxiv.org/abs/2111.08819}
+    eprint={2111.08819},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG}
 }
+```

docs/index.md

+12 -9
@@ -1,26 +1,29 @@
 # CleanRL

-[<img src="https://img.shields.io/badge/discord-cleanrl-green?label=Discord&logo=discord&logoColor=ffffff&labelColor=7289DA&color=2c2f33">](https://discord.gg/D6RCjA6sVT)
-[![Meeting Recordings : cleanrl](https://img.shields.io/badge/meeting%20recordings-cleanrl-green?logo=youtube&logoColor=ffffff&labelColor=FF0000&color=282828&style=flat?label=healthinesses)](https://www.youtube.com/watch?v=dm4HdGujpPs&list=PLQpKd36nzSuMynZLU2soIpNSMeXMplnKP&index=2)
-[<img src="https://github.com/vwxyzjn/cleanrl/workflows/build/badge.svg">](
-https://github.com/vwxyzjn/cleanrl/actions)
+<img src="
+https://img.shields.io/github/license/vwxyzjn/cleanrl">
+[![tests](https://github.com/vwxyzjn/cleanrl/actions/workflows/tests.yaml/badge.svg)](https://github.com/vwxyzjn/cleanrl/actions/workflows/tests.yaml)
+[![ci](https://github.com/vwxyzjn/cleanrl/actions/workflows/docs.yaml/badge.svg)](https://github.com/vwxyzjn/cleanrl/actions/workflows/docs.yaml)
+[<img src="https://img.shields.io/discord/767863440248143916?label=discord">](https://discord.gg/D6RCjA6sVT)
 [<img src="https://badge.fury.io/py/cleanrl.svg">](
 https://pypi.org/project/cleanrl/)
+[<img src="https://img.shields.io/youtube/channel/views/UCDdC6BIFRI0jvcwuhi3aI6w?style=social">](https://www.youtube.com/channel/UCDdC6BIFRI0jvcwuhi3aI6w/videos)

 ## Overview

 CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementation with research-friendly features. The implementation is clean and simple, yet we can scale it to run thousands of experiments using AWS Batch. The highlight features of CleanRL are:


-* Single-file Implementation <br>
-    **Every detail about an algorithm is put into the algorithm's own file.** Therefore, it's easier for you to fully understand an algorithm and do research with it.
-* Benchmarked Implementation <br>
-    [Details](https://benchmark.cleanrl.dev) on 7+ algorithms and 34+ games
+* Single-file Implementation
+    * **Every detail about an algorithm is put into the algorithm's own file.** Therefore, it's easier for you to fully understand an algorithm and do research with it.
+* Benchmarked Implementation on 7+ algorithms and 34+ games
 * Tensorboard Logging
 * Local Reproducibility via Seeding
 * Videos of Gameplay Capturing
 * Experiment Management with [Weights and Biases](https://wandb.ai/site)
-* [Cloud Integration](cloud.md) with Docker and AWS
+* Cloud Integration with Docker and AWS
+
+You can read more about CleanRL in our [technical paper]((https://arxiv.org/abs/2111.08819)) and [documentation](https://docs.cleanrl.dev/).

 Good luck have fun 🚀


docs/static/o1.png

718 KB

docs/static/o2.png

960 KB

docs/static/o3.png

853 KB