Prototype Envpool Support #100

vwxyzjn · 2022-01-09T03:30:22Z

This PR adds an envpool example. Interestingly, after increasing num_envs to 32, I was able to solve Pong in 10 mins :D

See the tracked experiment in costa-huang/cleanRL/runs/3rx432mj

yooceii · 2022-01-09T05:35:58Z

cleanrl/ppo_atari_envpool.py

+        self.num_envs = getattr(env, "num_envs", 1)
+        self.episode_returns = None
+        self.episode_lengths = None
+        self.is_vector_env = True


is_vector_env is not referenced anywhere except on line 185, which is a comment.
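For context, here is a minimal sketch of the kind of per-environment bookkeeping that fields such as num_envs, episode_returns and episode_lengths usually support. It is not the wrapper from this PR: the class name EpisodeStatsTracker and its step method are hypothetical, and plain Python lists stand in for arrays. Note that the sketch needs no is_vector_env flag.

```python
class EpisodeStatsTracker:
    """Hypothetical helper (not the PR's wrapper): accumulates returns and
    lengths for each sub-environment of a vectorized environment."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.episode_returns = [0.0] * num_envs
        self.episode_lengths = [0] * num_envs

    def step(self, rewards, dones):
        """Update running statistics with one batched step.

        Returns a list of (env_index, episode_return, episode_length)
        tuples for every sub-environment whose episode just ended.
        """
        finished = []
        for i, (reward, done) in enumerate(zip(rewards, dones)):
            self.episode_returns[i] += reward
            self.episode_lengths[i] += 1
            if done:
                finished.append((i, self.episode_returns[i], self.episode_lengths[i]))
                # Reset the counters so the next episode starts from zero.
                self.episode_returns[i] = 0.0
                self.episode_lengths[i] = 0
        return finished
```

Calling step once per batched environment step gives the finished-episode statistics that a training script would typically log.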

yooceii · 2022-01-09T05:38:08Z

I wonder how the performance compares with https://github.com/NVlabs/cule when using a large number of envs.

vwxyzjn · 2022-01-09T14:23:57Z

Ran a hyper-parameter sweep (sweeps/nfrd091p) overnight. Now I can solve Pong in ~5 mins, according to runs/opk2dmta, with these hyper-parameters:

--clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3

See also vwxyzjn/cleanrl#100
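To make the flags above concrete, here is a small sketch of how they could translate into rollout and minibatch sizes. It assumes the common PPO convention that the batch size is num_envs * num_steps and that the batch is split evenly into num_minibatches; the helper names parse_args and derived_sizes are illustrative and not taken from the script in this PR.

```python
import argparse


def parse_args(argv):
    """Parse only the flags quoted in the comment above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--clip-coef", type=float, default=0.2)
    parser.add_argument("--num-envs", type=int, default=16)
    parser.add_argument("--num-minibatches", type=int, default=8)
    parser.add_argument("--num-steps", type=int, default=128)
    parser.add_argument("--update-epochs", type=int, default=3)
    return parser.parse_args(argv)


def derived_sizes(args):
    """Assumed convention: one rollout collects num_envs * num_steps
    transitions, which are split evenly into num_minibatches."""
    batch_size = args.num_envs * args.num_steps
    minibatch_size = batch_size // args.num_minibatches
    return batch_size, minibatch_size


if __name__ == "__main__":
    flags = "--clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3"
    print(derived_sizes(parse_args(flags.split())))  # (2048, 256)
```

Under that assumption, each policy update would see 2048 transitions per rollout, processed in minibatches of 256 for 3 epochs.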

dosssman

While I am not too familiar with envpool, I gather that it is essentially related to the environments, and the PPO logic does not seem to have changed.

My attempt converged in around 30 minutes, but I used a weaker CPU server than yours, so I suspect the wall-time efficiency is highly dependent on hardware. Nevertheless, it is still faster than ppo_atari.py, which does not use envpool (using the same hyper-parameters) and has yet to converge stably after 1h30.

For reference, ppo_atari_envpool.py has an SPS of around 1729, while ppo_atari.py has an SPS of 489.
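As a quick check on those two throughput numbers, the ratio works out to roughly a 3.5x speedup in steps per second. The helper name speedup below is illustrative only.

```python
def speedup(new_sps, old_sps):
    """Return how many times faster new_sps is than old_sps."""
    if old_sps <= 0:
        raise ValueError("old_sps must be positive")
    return new_sps / old_sps


if __name__ == "__main__":
    # SPS values reported above for the envpool and non-envpool scripts.
    print(f"{speedup(1729, 489):.2f}x")  # 3.54x
```

This compares raw throughput only; it says nothing about how many steps either script needs to converge.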

In any case, this PR looks good to me.
Great work.

CPU specs:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          20
On-line CPU(s) list:             0-19
Thread(s) per core:              1
Core(s) per socket:              10
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
Stepping:                        1
CPU MHz:                         1200.179
CPU max MHz:                     3400.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4800.00
Virtualization:                  VT-x
L1d cache:                       640 KiB
L1i cache:                       640 KiB
L2 cache:                        5 MiB
L3 cache:                        50 MiB

GPU specs: 1080

vwxyzjn · 2022-01-17T14:40:44Z

https://wandb.ai/vwxyzjn/ppo-details/reports/Envpool--VmlldzoxNDM3ODQz @dosssman

dosssman · 2022-01-19T04:32:15Z

This time it seems to take around 50 minutes for Pong.

Is PPO solving Pong-v5 in 5 minutes really due to the hyper-parameters mentioned above?

Also, I noticed that you used the same machine for all the runs, so I was wondering whether running the training scripts concurrently could have some impact on the overall performance too ...

vwxyzjn · 2022-01-19T04:42:21Z

@dosssman it was largely a bit of hyperparameter tuning. Also, I was running these scripts one at a time, so there were no concurrency issues.

vwxyzjn mentioned this pull request Jan 9, 2022

Add CleanRL examples: PPO solve Pong in 5 mins sail-sg/envpool#48

Merged

vwxyzjn requested review from dosssman and yooceii January 9, 2022 03:52

yooceii approved these changes Jan 9, 2022

dosssman approved these changes Jan 17, 2022

vwxyzjn added 3 commits February 8, 2022 11:24

push changes

3c08650

pushing changes

68e2fa0

Update lock files and docs

ae91eac

vwxyzjn force-pushed the new-envpool branch from 2613097 to ae91eac Compare February 8, 2022 17:03

vwxyzjn added 2 commits February 8, 2022 12:04

Update docs

3cd4026

Update docs

66e301a

vwxyzjn merged commit 57fdf35 into master Feb 8, 2022

vwxyzjn deleted the new-envpool branch February 8, 2022 21:51
