+ Also note that our `ppo_procgen.py` closely matches the implementation details of `openai/baselines`' PPO, which might not be the same as `openai/phasic-policy-gradient`'s PPO. We took the reported results from (Cobbe et al., 2020)[^1] and (Cobbe et al., 2021)[^2] and compared them in a [Google Sheet](https://docs.google.com/spreadsheets/d/1ZC_D2WPL6-PzhecM4ZFQWQ6nY6dkXeQDOIgRHVp1BNU/edit?usp=sharing) (screenshot shown below). As shown, the reported performance diverges somewhat between the two papers. We also note that (Cobbe et al., 2020)[^1] used [`procgen==0.9.2`](https://github.com/openai/train-procgen/blob/1a2ae2194a61f76a733a39339530401c024c3ad8/environment.yml#L10) and (Cobbe et al., 2021)[^2] used [`procgen==0.10.4`](https://github.com/openai/phasic-policy-gradient/blob/7295473f0185c82f9eb9c1e17a373135edd8aacc/environment.yml#L10), which could also cause performance differences. For this reason, we ran our own `openai/phasic-policy-gradient` experiments on the `easy` distribution for comparison, but this also means it is challenging to compare our results against those in the original PPG paper (Cobbe et al., 2021)[^2].