
Commit 0a247b6

committed: update documentation

1 parent 02ab41e

29 files changed: +21 −28 lines

docs/rl-algorithms/c51.md (+6 −6)
@@ -67,9 +67,9 @@ Below are the average episodic returns for `c51_atari.py`.

| Environment | `c51_atari.py` 10M steps | (Bellemare et al., 2017, Figure 14)[^1] 50M steps | (Hessel et al., 2017, Figure 5)[^3]
| ----------- | ----------- | ----------- | ---- |
- | BreakoutNoFrameskip-v4 | 467.00 ± 96.11 | 748 | ~500 at 10M steps, ~600 at 50M steps
- | PongNoFrameskip-v4 | 19.32 ± 0.92 | 20.9 | ~20 at 10M steps, ~20 at 50M steps
- | BeamRiderNoFrameskip-v4 | 9986.96 ± 1953.30 | 14,074 | ~12000 at 10M steps, ~14000 at 50M steps
+ | BreakoutNoFrameskip-v4 | 461.86 ± 69.65 | 748 | ~500 at 10M steps, ~600 at 50M steps
+ | PongNoFrameskip-v4 | 19.46 ± 0.70 | 20.9 | ~20 at 10M steps, ~20 at 50M steps
+ | BeamRiderNoFrameskip-v4 | 9592.90 ± 2270.15 | 14,074 | ~12000 at 10M steps, ~14000 at 50M steps

Note that we save computational time by reducing timesteps from 50M to 10M, but our `c51_atari.py` scores the same or higher than (Bellemare et al., 2017)[^1] in 10M steps.
@@ -156,9 +156,9 @@ Below are the average episodic returns for `c51.py`.

| Environment | `c51.py` |
| ----------- | ----------- |
- | CartPole-v1 | 498.51 ± 1.77 |
- | Acrobot-v1 | -88.81 ± 8.86 |
- | MountainCar-v0 | -167.71 ± 26.85 |
+ | CartPole-v1 | 481.20 ± 20.53 |
+ | Acrobot-v1 | -87.70 ± 5.52 |
+ | MountainCar-v0 | -166.38 ± 27.94 |

Note that C51 has no official benchmark on classic control environments, so we did not include a comparison. That said, our `c51.py` was able to achieve near-perfect scores in `CartPole-v1` and `Acrobot-v1`; further, it can obtain successful runs in the sparse environment `MountainCar-v0`.
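The `<mean> ± <std>` entries in the tables above are average episodic returns aggregated over several training runs. Below is a minimal sketch of how such a table entry could be reproduced from per-seed results; the seed count, the return values, and the helper name are illustrative assumptions, not part of this commit.

```python
# Minimal sketch: turn per-seed episodic returns into a "mean ± std" table entry.
# The three seed runs below are made-up numbers for illustration; the actual
# benchmark values come from the tracked experiment logs.
import numpy as np

def summarize(returns_per_seed):
    """Average each seed's episodic returns, then report mean ± std across seeds."""
    per_seed_means = np.array([np.mean(r) for r in returns_per_seed])
    return per_seed_means.mean(), per_seed_means.std()

# Hypothetical final episodic returns from three seeds on CartPole-v1.
seed_returns = [
    [500.0, 472.0, 455.0],
    [500.0, 500.0, 490.0],
    [430.0, 500.0, 487.0],
]
mean, std = summarize(seed_returns)
print(f"| CartPole-v1 | {mean:.2f} ± {std:.2f} |")
```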

docs/rl-algorithms/c51/Acrobot-v1.png and other C51 learning-curve images: binary files updated.

docs/rl-algorithms/ddpg.md (+3 −3)
@@ -169,9 +169,9 @@ Below are the average episodic returns for [`ddpg_continuous_action.py`](https:/

| Environment | [`ddpg_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ddpg_continuous_action.py) | [`OurDDPG.py`](https://github.com/sfujim/TD3/blob/master/OurDDPG.py) (Fujimoto et al., 2018, Table 1)[^2] | [`DDPG.py`](https://github.com/sfujim/TD3/blob/master/DDPG.py) using settings from (Lillicrap et al., 2016)[^1] in (Fujimoto et al., 2018, Table 1)[^2] |
| ----------- | ----------- | ----------- | ----------- |
- | HalfCheetah | 9260.485 ± 643.088 | 8577.29 | 3305.60 |
- | Walker2d | 1728.72 ± 758.33 | 3098.11 | 1843.85 |
- | Hopper | 1404.44 ± 544.78 | 1860.02 | 2020.46 |
+ | HalfCheetah | 9382.32 ± 1395.52 | 8577.29 | 3305.60 |
+ | Walker2d | 1598.35 ± 862.66 | 3098.11 | 1843.85 |
+ | Hopper | 1313.43 ± 684.46 | 1860.02 | 2020.46 |
docs/rl-algorithms/ddpg/Hopper-v2.png and other DDPG learning-curve images: binary files updated.

docs/rl-algorithms/dqn.md (+6 −6)
@@ -92,9 +92,9 @@ Below are the average episodic returns for `dqn_atari.py`.

| Environment | `dqn_atari.py` 10M steps | (Mnih et al., 2015)[^1] 50M steps | (Hessel et al., 2017, Figure 5)[^3]
| ----------- | ----------- | ----------- | ---- |
- | BreakoutNoFrameskip-v4 | 337.64 ± 69.47 | 401.2 ± 26.9 | ~230 at 10M steps, ~300 at 50M steps
- | PongNoFrameskip-v4 | 20.293 ± 0.37 | 18.9 ± 1.3 | ~20 at 10M steps, ~20 at 50M steps
- | BeamRiderNoFrameskip-v4 | 6207.41 ± 1019.96 | 6846 ± 1619 | ~6000 at 10M steps, ~7000 at 50M steps
+ | BreakoutNoFrameskip-v4 | 366.928 ± 39.89 | 401.2 ± 26.9 | ~230 at 10M steps, ~300 at 50M steps
+ | PongNoFrameskip-v4 | 20.25 ± 0.41 | 18.9 ± 1.3 | ~20 at 10M steps, ~20 at 50M steps
+ | BeamRiderNoFrameskip-v4 | 6673.24 ± 1434.37 | 6846 ± 1619 | ~6000 at 10M steps, ~7000 at 50M steps

Note that we save computational time by reducing timesteps from 50M to 10M, but our `dqn_atari.py` scores the same or higher than (Mnih et al., 2015)[^1] in 10M steps.
@@ -179,9 +179,9 @@ Below are the average episodic returns for `dqn.py`.

| Environment | `dqn.py` |
| ----------- | ----------- |
- | CartPole-v1 | 471.21 ± 43.45 |
- | Acrobot-v1 | -93.37 ± 8.46 |
- | MountainCar-v0 | -170.51 ± 26.22 |
+ | CartPole-v1 | 488.69 ± 16.11 |
+ | Acrobot-v1 | -91.54 ± 7.20 |
+ | MountainCar-v0 | -194.95 ± 8.48 |

Note that DQN has no official benchmark on classic control environments, so we did not include a comparison. That said, our `dqn.py` was able to achieve near-perfect scores in `CartPole-v1` and `Acrobot-v1`; further, it can obtain successful runs in the sparse environment `MountainCar-v0`.
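On reading the `MountainCar-v0` rows: the environment yields −1 per step and truncates episodes at 200 steps, so a return of −200 means the goal was never reached, while anything above −200 means the car reached the flag early. A small sketch of that check follows; the episodic returns are hypothetical, not taken from this commit's runs.

```python
# Sketch: count "successful" MountainCar-v0 episodes, i.e. episodes whose return
# exceeds the -200 floor set by the 200-step time limit (reward is -1 per step).
# The episodic returns below are hypothetical, not taken from this commit.
def successful_episodes(episodic_returns, floor=-200.0):
    return [r for r in episodic_returns if r > floor]

returns = [-200.0, -167.0, -200.0, -154.0, -189.0]
solved = successful_episodes(returns)
print(f"{len(solved)}/{len(returns)} episodes reached the goal")
```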

docs/rl-algorithms/dqn/Acrobot-v1.png and other DQN learning-curve images: binary files updated.

docs/rl-algorithms/sac.md (+3 −10)
@@ -197,9 +197,9 @@ The table below compares the results of CleanRL's [`sac_continuous_action.py`](h

| Environment | [`sac_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/sac_continuous_action.py) | [SAC: Algorithms and Applications](https://arxiv.org/abs/1812.05905) @ 1M steps |
| --------------- | ------------------ | ---------------- |
- | HalfCheetah-v2 | 9,063 ± 1381 | ~11,250 |
- | Walker2d-v2 | 4554 ± 296 | ~4,800 |
- | Hopper-v2 | 2347 ± 538 | ~3,250 |
+ | HalfCheetah-v2 | 10310.37 ± 1873.21 | ~11,250 |
+ | Walker2d-v2 | 4418.15 ± 592.82 | ~4,800 |
+ | Hopper-v2 | 2685.76 ± 762.16 | ~3,250 |

### Learning curves
@@ -212,17 +212,10 @@ The table below compares the results of CleanRL's [`sac_continuous_action.py`](h

<div></div>

- <div class="grid-container">
-   <img src="../sac/HalfCheetahBulletEnv-v0.png">
-   <img src="../sac/Walker2DBulletEnv-v0.png">
-   <img src="../sac/HopperBulletEnv-v0.png">
- </div>

### Tracked experiments and gameplay videos

<iframe src="https://wandb.ai/openrlbenchmark/openrlbenchmark/reports/MuJoCo-CleanRL-s-SAC--VmlldzoxNzI1NDM0" style="width:100%; height:1200px" title="MuJoCo: CleanRL's SAC"></iframe>

- <iframe src="https://wandb.ai/openrlbenchmark/openrlbenchmark/reports/PyBullet-CleanRL-s-SAC--VmlldzoxNzI1NDQw" style="width:100%; height:1200px" title="PyBullet: CleanRL's SAC"></iframe>
-
[^1]: Diederik P Kingma, Max Welling (2013). Auto-Encoding Variational Bayes. ArXiv, abs/1312.6114. https://arxiv.org/abs/1312.6114
docs/rl-algorithms/sac/Hopper-v2.png and other SAC figures: binary image files added, updated, and removed.

docs/rl-algorithms/td3.md (+3 −3)
@@ -76,9 +76,9 @@ Below are the average episodic returns for [`td3_continuous_action.py`](https://

| Environment | [`td3_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py) | [`TD3.py`](https://github.com/sfujim/TD3/blob/master/TD3.py) (Fujimoto et al., 2018, Table 1)[^2] |
| ----------- | ----------- | ----------- |
- | HalfCheetah | 9391.52 ± 448.54 | 9636.95 ± 859.065 |
- | Walker2d | 3895.80 ± 333.89 | 4682.82 ± 539.64 |
- | Hopper | 3379.25 ± 200.22 | 3564.07 ± 114.74 |
+ | HalfCheetah | 9018.31 ± 1078.31 | 9636.95 ± 859.065 |
+ | Walker2d | 4246.07 ± 1210.84 | 4682.82 ± 539.64 |
+ | Hopper | 3391.78 ± 232.21 | 3564.07 ± 114.74 |
docs/rl-algorithms/td3/Hopper-v2.png and other TD3 learning-curve images: binary files updated.
