
Commit 0a247b6

committed: update documentation

1 parent 02ab41e

29 files changed: +21 −28 lines

docs/rl-algorithms/c51.md (+6 −6)
@@ -67,9 +67,9 @@ Below are the average episodic returns for `c51_atari.py`.

| Environment | `c51_atari.py` 10M steps | (Bellemare et al., 2017, Figure 14)[^1] 50M steps | (Hessel et al., 2017, Figure 5)[^3]
| ----------- | ----------- | ----------- | ---- |
- | BreakoutNoFrameskip-v4 | 467.00 ± 96.11 | 748 | ~500 at 10M steps, ~600 at 50M steps
- | PongNoFrameskip-v4 | 19.32 ± 0.92 | 20.9 | ~20 at 10M steps, ~20 at 50M steps
- | BeamRiderNoFrameskip-v4 | 9986.96 ± 1953.30 | 14,074 | ~12000 at 10M steps, ~14000 at 50M steps
+ | BreakoutNoFrameskip-v4 | 461.86 ± 69.65 | 748 | ~500 at 10M steps, ~600 at 50M steps
+ | PongNoFrameskip-v4 | 19.46 ± 0.70 | 20.9 | ~20 at 10M steps, ~20 at 50M steps
+ | BeamRiderNoFrameskip-v4 | 9592.90 ± 2270.15 | 14,074 | ~12000 at 10M steps, ~14000 at 50M steps

Note that we save computational time by reducing timesteps from 50M to 10M, but our `c51_atari.py` scores the same or higher than (Bellemare et al., 2017)[^1] in 10M steps.
@@ -156,9 +156,9 @@ Below are the average episodic returns for `c51.py`.

| Environment | `c51.py` |
| ----------- | ----------- |
- | CartPole-v1 | 498.51 ± 1.77 |
- | Acrobot-v1 | -88.81 ± 8.86 |
- | MountainCar-v0 | -167.71 ± 26.85 |
+ | CartPole-v1 | 481.20 ± 20.53 |
+ | Acrobot-v1 | -87.70 ± 5.52 |
+ | MountainCar-v0 | -166.38 ± 27.94 |

Note that C51 has no official benchmark on classic control environments, so we did not include a comparison. That said, our `c51.py` was able to achieve near-perfect scores in `CartPole-v1` and `Acrobot-v1`; further, it can obtain successful runs in the sparse environment `MountainCar-v0`.
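The `<mean> ± <std>` entries in the tables above are average episodic returns aggregated over several training runs. Below is a minimal sketch of how such a table entry could be reproduced from per-seed results; the seed count, the return values, and the helper name are illustrative assumptions, not part of this commit.

```python
# Minimal sketch: turn per-seed episodic returns into a "mean ± std" table entry.
# The three seed runs below are made-up numbers for illustration; the actual
# benchmark values come from the tracked experiment logs.
import numpy as np

def summarize(returns_per_seed):
    """Average each seed's episodic returns, then report mean ± std across seeds."""
    per_seed_means = np.array([np.mean(r) for r in returns_per_seed])
    return per_seed_means.mean(), per_seed_means.std()

# Hypothetical final episodic returns from three seeds on CartPole-v1.
seed_returns = [
    [500.0, 472.0, 455.0],
    [500.0, 500.0, 490.0],
    [430.0, 500.0, 487.0],
]
mean, std = summarize(seed_returns)
print(f"| CartPole-v1 | {mean:.2f} ± {std:.2f} |")
```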

docs/rl-algorithms/c51/Acrobot-v1.png and other C51 learning-curve images: binary files updated.

docs/rl-algorithms/ddpg.md (+3 −3)
@@ -169,9 +169,9 @@ Below are the average episodic returns for [`ddpg_continuous_action.py`](https:/

| Environment | [`ddpg_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ddpg_continuous_action.py) | [`OurDDPG.py`](https://github.com/sfujim/TD3/blob/master/OurDDPG.py) (Fujimoto et al., 2018, Table 1)[^2] | [`DDPG.py`](https://github.com/sfujim/TD3/blob/master/DDPG.py) using settings from (Lillicrap et al., 2016)[^1] in (Fujimoto et al., 2018, Table 1)[^2] |
| ----------- | ----------- | ----------- | ----------- |
- | HalfCheetah | 9260.485 ± 643.088 | 8577.29 | 3305.60 |
- | Walker2d | 1728.72 ± 758.33 | 3098.11 | 1843.85 |
- | Hopper | 1404.44 ± 544.78 | 1860.02 | 2020.46 |
+ | HalfCheetah | 9382.32 ± 1395.52 | 8577.29 | 3305.60 |
+ | Walker2d | 1598.35 ± 862.66 | 3098.11 | 1843.85 |
+ | Hopper | 1313.43 ± 684.46 | 1860.02 | 2020.46 |
docs/rl-algorithms/ddpg/Hopper-v2.png and other DDPG learning-curve images: binary files updated.

docs/rl-algorithms/dqn.md (+6 −6)
@@ -92,9 +92,9 @@ Below are the average episodic returns for `dqn_atari.py`.

| Environment | `dqn_atari.py` 10M steps | (Mnih et al., 2015)[^1] 50M steps | (Hessel et al., 2017, Figure 5)[^3]
| ----------- | ----------- | ----------- | ---- |
- | BreakoutNoFrameskip-v4 | 337.64 ± 69.47 | 401.2 ± 26.9 | ~230 at 10M steps, ~300 at 50M steps
- | PongNoFrameskip-v4 | 20.293 ± 0.37 | 18.9 ± 1.3 | ~20 at 10M steps, ~20 at 50M steps
- | BeamRiderNoFrameskip-v4 | 6207.41 ± 1019.96 | 6846 ± 1619 | ~6000 at 10M steps, ~7000 at 50M steps
+ | BreakoutNoFrameskip-v4 | 366.928 ± 39.89 | 401.2 ± 26.9 | ~230 at 10M steps, ~300 at 50M steps
+ | PongNoFrameskip-v4 | 20.25 ± 0.41 | 18.9 ± 1.3 | ~20 at 10M steps, ~20 at 50M steps
+ | BeamRiderNoFrameskip-v4 | 6673.24 ± 1434.37 | 6846 ± 1619 | ~6000 at 10M steps, ~7000 at 50M steps

Note that we save computational time by reducing timesteps from 50M to 10M, but our `dqn_atari.py` scores the same or higher than (Mnih et al., 2015)[^1] in 10M steps.
@@ -179,9 +179,9 @@ Below are the average episodic returns for `dqn.py`.

| Environment | `dqn.py` |
| ----------- | ----------- |
- | CartPole-v1 | 471.21 ± 43.45 |
- | Acrobot-v1 | -93.37 ± 8.46 |
- | MountainCar-v0 | -170.51 ± 26.22 |
+ | CartPole-v1 | 488.69 ± 16.11 |
+ | Acrobot-v1 | -91.54 ± 7.20 |
+ | MountainCar-v0 | -194.95 ± 8.48 |

Note that DQN has no official benchmark on classic control environments, so we did not include a comparison. That said, our `dqn.py` was able to achieve near-perfect scores in `CartPole-v1` and `Acrobot-v1`; further, it can obtain successful runs in the sparse environment `MountainCar-v0`.
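On reading the `MountainCar-v0` rows: the environment yields −1 per step and truncates episodes at 200 steps, so a return of −200 means the goal was never reached, while anything above −200 means the car reached the flag early. A small sketch of that check follows; the episodic returns are hypothetical, not taken from this commit's runs.

```python
# Sketch: count "successful" MountainCar-v0 episodes, i.e. episodes whose return
# exceeds the -200 floor set by the 200-step time limit (reward is -1 per step).
# The episodic returns below are hypothetical, not taken from this commit.
def successful_episodes(episodic_returns, floor=-200.0):
    return [r for r in episodic_returns if r > floor]

returns = [-200.0, -167.0, -200.0, -154.0, -189.0]
solved = successful_episodes(returns)
print(f"{len(solved)}/{len(returns)} episodes reached the goal")
```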

docs/rl-algorithms/dqn/Acrobot-v1.png and other DQN learning-curve images: binary files updated.

docs/rl-algorithms/sac.md (+3 −10)
@@ -197,9 +197,9 @@ The table below compares the results of CleanRL's [`sac_continuous_action.py`](h

| Environment | [`sac_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/sac_continuous_action.py) | [SAC: Algorithms and Applications](https://arxiv.org/abs/1812.05905) @ 1M steps |
| --------------- | ------------------ | ---------------- |
- | HalfCheetah-v2 | 9,063 ± 1381 | ~11,250 |
- | Walker2d-v2 | 4554 ± 296 | ~4,800 |
- | Hopper-v2 | 2347 ± 538 | ~3,250 |
+ | HalfCheetah-v2 | 10310.37 ± 1873.21 | ~11,250 |
+ | Walker2d-v2 | 4418.15 ± 592.82 | ~4,800 |
+ | Hopper-v2 | 2685.76 ± 762.16 | ~3,250 |

### Learning curves
@@ -212,17 +212,10 @@ The table below compares the results of CleanRL's [`sac_continuous_action.py`](h

<div></div>

- <div class="grid-container">
-   <img src="../sac/HalfCheetahBulletEnv-v0.png">
-   <img src="../sac/Walker2DBulletEnv-v0.png">
-   <img src="../sac/HopperBulletEnv-v0.png">
- </div>

### Tracked experiments and gameplay videos

<iframe src="https://wandb.ai/openrlbenchmark/openrlbenchmark/reports/MuJoCo-CleanRL-s-SAC--VmlldzoxNzI1NDM0" style="width:100%; height:1200px" title="MuJoCo: CleanRL's SAC"></iframe>

- <iframe src="https://wandb.ai/openrlbenchmark/openrlbenchmark/reports/PyBullet-CleanRL-s-SAC--VmlldzoxNzI1NDQw" style="width:100%; height:1200px" title="PyBullet: CleanRL's SAC"></iframe>
-
[^1]: Diederik P Kingma, Max Welling (2013). Auto-Encoding Variational Bayes. ArXiv, abs/1312.6114. https://arxiv.org/abs/1312.6114
docs/rl-algorithms/sac/Hopper-v2.png and other SAC figures: binary image files added, updated, and removed.

docs/rl-algorithms/td3.md (+3 −3)
@@ -76,9 +76,9 @@ Below are the average episodic returns for [`td3_continuous_action.py`](https://

| Environment | [`td3_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py) | [`TD3.py`](https://github.com/sfujim/TD3/blob/master/TD3.py) (Fujimoto et al., 2018, Table 1)[^2] |
| ----------- | ----------- | ----------- |
- | HalfCheetah | 9391.52 ± 448.54 | 9636.95 ± 859.065 |
- | Walker2d | 3895.80 ± 333.89 | 4682.82 ± 539.64 |
- | Hopper | 3379.25 ± 200.22 | 3564.07 ± 114.74 |
+ | HalfCheetah | 9018.31 ± 1078.31 | 9636.95 ± 859.065 |
+ | Walker2d | 4246.07 ± 1210.84 | 4682.82 ± 539.64 |
+ | Hopper | 3391.78 ± 232.21 | 3564.07 ± 114.74 |
docs/rl-algorithms/td3/Hopper-v2.png and other TD3 learning-curve images: binary files updated.
