Prototype Envpool Support #100
cleanrl/ppo_atari_envpool.py
self.num_envs = getattr(env, "num_envs", 1)
self.episode_returns = None
self.episode_lengths = None
self.is_vector_env = True
is_vector_env is not referenced except in line 185 which is a comment.
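For readers without the diff open, here is a minimal sketch of what a wrapper holding `num_envs`, `episode_returns` and `episode_lengths` might do with those fields. This is an assumption-laden illustration, not the code in this PR: the class name `EpisodeStatsTracker` and its `update` method are hypothetical, and plain Python lists are used instead of arrays to keep it self-contained.

```python
class EpisodeStatsTracker:
    """Hypothetical helper: accumulates per-env episode returns and lengths."""

    def __init__(self, num_envs=1):
        self.num_envs = num_envs
        # One running total per sub-environment.
        self.episode_returns = [0.0] * num_envs
        self.episode_lengths = [0] * num_envs

    def update(self, rewards, dones):
        """Record one step; return (env_index, return, length) for finished episodes."""
        finished = []
        for i in range(self.num_envs):
            self.episode_returns[i] += rewards[i]
            self.episode_lengths[i] += 1
            if dones[i]:
                finished.append((i, self.episode_returns[i], self.episode_lengths[i]))
                # Reset the counters for the env that just finished.
                self.episode_returns[i] = 0.0
                self.episode_lengths[i] = 0
        return finished


tracker = EpisodeStatsTracker(num_envs=2)
tracker.update([1.0, 0.0], [False, False])
finished = tracker.update([1.0, -1.0], [True, False])
print(finished)
```

Nothing in this sketch needs an `is_vector_env` flag, which is consistent with the observation that the attribute is unused.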
I wonder how the performance compares with https://github.com/NVlabs/cule when using a large number of envs.
Ran a hyperparameter sweep (sweeps/nfrd091p) overnight; now I can solve Pong in ~5 mins, according to runs/opk2dmta, with hyper parameters
While I am not too familiar with envpool, I gather that it is essentially related to the environments, and the PPO logic does not seem to have changed.
My attempt converged in around 30 minutes, but it used a weaker CPU server than yours, so I suspect the wall-time efficiency is highly dependent on hardware. Nevertheless, it is still faster than ppo_atari.py, which does not use envpool (using the same hyperparameters) and has yet to converge stably after 1h30.
For reference, ppo_atari_envpool.py has an SPS of around 1729, while ppo_atari.py has an SPS of around 489.
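As a quick sanity check on those two numbers, the throughput ratio can be worked out directly. The variable names are just for illustration; the SPS values are the ones reported above on this particular machine, so the ratio should not be read as a general benchmark.

```python
# Steps per second (SPS) reported above for the two scripts.
sps_envpool = 1729   # ppo_atari_envpool.py
sps_baseline = 489   # ppo_atari.py

# Relative throughput of the envpool variant over the baseline.
speedup = sps_envpool / sps_baseline
print(f"envpool variant is about {speedup:.2f}x faster in SPS")
```

So on this hardware the envpool script processes roughly three and a half times as many environment steps per second.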
In any case, this PR looks good to me.
Great work.
CPU specs:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 1
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
Stepping: 1
CPU MHz: 1200.179
CPU max MHz: 3400.0000
CPU min MHz: 1200.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 640 KiB
L1i cache: 640 KiB
L2 cache: 5 MiB
L3 cache: 50 MiB
GPU specs: 1080
This time it seems to take around 50 minutes for Pong. Is the 5 min PPO run solving Pong-v5 really due to the hyperparameters mentioned above? Also, I noticed that you used the same machine for all the runs, so I was wondering whether running the training scripts concurrently could have some impact on the overall performance too ...
@dosssman it was largely a bit of hyperparameter tuning. Also, I was running these scripts one at a time, so there were no concurrency issues.
This PR adds an envpool example. Interestingly, after increasing num_envs to 32, I was able to solve Pong in 10 mins :D See the tracked experiment in costa-huang/cleanRL/runs/3rx432mj