
Proper multi-gpu support with PPO #178
Merged · 15 commits · May 29, 2022

Conversation

@vwxyzjn (Owner) commented May 4, 2022

Description

This is a follow up on #162
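
(Editor's aside: for readers skimming the diff, a rough sketch of the data-parallel pattern a multi-GPU PPO variant like this typically follows: one process per GPU, gradients averaged across processes with torch.distributed. The helper name average_gradients and the torchrun launch are illustrative assumptions; the PR's exact code may differ.)

    import os
    import torch
    import torch.distributed as dist

    # One process per GPU, launched e.g. with `torchrun --nproc_per_node=N`,
    # which sets LOCAL_RANK and the rendezvous environment variables.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")
    dist.init_process_group(backend="gloo")  # or "nccl" when each process owns a GPU
    world_size = dist.get_world_size()

    def average_gradients(model: torch.nn.Module) -> None:
        # After loss.backward() on each process's local rollout, average the
        # gradients so every optimizer step reflects the combined batch.
        for param in model.parameters():
            if param.grad is not None:
                dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
                param.grad /= world_size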

Types of changes

  • New algorithm

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained noteworthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (e.g., the original paper or another reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).

@vercel bot commented May 4, 2022

cleanrl: ✅ Ready (preview updated May 29, 2022 at 3:25 PM UTC)

Comment on lines +99 to +101
if capture_video:
    if idx == 0:
        env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
@yooceii (Collaborator) commented May 29, 2022

Suggested change:

- if capture_video:
-     if idx == 0:
-         env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
+ if capture_video and idx == 0:
+     env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")

Comment on lines +162 to +163
args.batch_size = int(args.num_envs * args.num_steps)
args.minibatch_size = int(args.batch_size // args.num_minibatches)
@yooceii (Collaborator):

Duplicate of lines 89-90?

@vwxyzjn (Owner, author):

Feel free to remove lines 89-90

Comment on lines +199 to +202
args.seed += local_rank
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed - local_rank)
@yooceii (Collaborator):

Why not

Suggested change:

- args.seed += local_rank
- random.seed(args.seed)
- np.random.seed(args.seed)
- torch.manual_seed(args.seed - local_rank)
+ torch.manual_seed(args.seed)
+ args.seed += local_rank
+ random.seed(args.seed)
+ np.random.seed(args.seed)

@vwxyzjn (Owner, author):

The seeding trick done here ensures the same seed is used to initialize the agent's parameters in every process: see "Adjust seed per process" at https://docs.cleanrl.dev/rl-algorithms/ppo/#implementation-details_6. A more elegant way would be to use an API to somehow broadcast the Agent's parameters from rank 0 to the other ranks, but I haven't found such an API.
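
(Editor's aside: torch.distributed does expose a broadcast primitive that could sync rank-0's weights after construction. A minimal sketch, assuming the default process group is already initialized; the helper name broadcast_agent_parameters is hypothetical and this is not necessarily what the PR should do:)

    import torch
    import torch.distributed as dist

    def broadcast_agent_parameters(agent: torch.nn.Module, src_rank: int = 0) -> None:
        # Overwrite each non-source process's weights (and buffers) in place with
        # rank-0's copies so every rank starts from identical agent parameters.
        for param in agent.parameters():
            dist.broadcast(param.data, src=src_rank)
        for buf in agent.buffers():
            dist.broadcast(buf, src=src_rank)

With something like this, each process could keep a fully distinct per-rank seed instead of juggling torch.manual_seed around the agent's construction.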

assert isinstance(envs.single_action_space, gym.spaces.Discrete), "only discrete action space is supported"

agent = Agent(envs).to(device)
torch.manual_seed(args.seed)
@yooceii (Collaborator):

Duplicate of line 201?

@vwxyzjn (Owner, author):

See comment above.

# TRY NOT TO MODIFY: start the game
global_step = 0
start_time = time.time()
next_obs = torch.Tensor(envs.reset()).to(device)
@yooceii (Collaborator):

Suggested change:

- next_obs = torch.Tensor(envs.reset()).to(device)
+ next_obs = torch.tensor(envs.reset()).to(device)

per https://discuss.pytorch.org/t/difference-between-torch-tensor-and-torch-tensor/30786/2

@vwxyzjn (Owner, author):

Hmm, in the past I've had weird issues with torch.tensor. I'd also rather avoid changing it just in ppo_atari_multigpu.py while leaving the other files unchanged. Bottom line, I don't think this would be a big issue or cause performance differences, but I'm happy to change it if evidence shows otherwise :)
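
(Editor's aside on the suggestion above: the practical difference the linked thread describes is that torch.Tensor is the legacy float32 constructor, while torch.tensor infers or preserves the input's dtype. A small illustration, assuming float64 NumPy observations for the sake of the example:)

    import numpy as np
    import torch

    obs = np.zeros((4, 84, 84), dtype=np.float64)

    a = torch.Tensor(obs)  # legacy constructor: always torch.float32
    b = torch.tensor(obs)  # copies and preserves dtype: torch.float64

    print(a.dtype, b.dtype)  # torch.float32 torch.float64

So switching to torch.tensor could silently change dtypes unless an explicit cast (e.g. .float() or dtype=torch.float32) is added, which might be related to the "weird issues" mentioned above.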

# TRY NOT TO MODIFY: execute the game and log data.
next_obs, reward, done, info = envs.step(action.cpu().numpy())
rewards[step] = torch.tensor(reward).to(device).view(-1)
next_obs, next_done = torch.Tensor(next_obs).to(device), torch.Tensor(done).to(device)
@yooceii (Collaborator):

Same as line 236.

@vwxyzjn (Owner, author):

See comment above.

Comment on lines +382 to +384
y_pred, y_true = b_values.cpu().numpy(), b_returns.cpu().numpy()
var_y = np.var(y_true)
explained_var = np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y
@yooceii (Collaborator):

Nit: probably move this under line 387.

@vwxyzjn (Owner, author):

This just follows the structure used in the other files.
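
(Editor's aside on the quoted metric: explained variance close to 1 means the value head explains most of the variance in the empirical returns, 0 means it does no better than predicting the mean return, and negative values mean it does worse. A tiny self-contained check, with made-up numbers for illustration:)

    import numpy as np

    y_true = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical returns
    y_pred = np.array([1.1, 1.9, 3.2, 3.8])   # hypothetical value predictions

    var_y = np.var(y_true)
    explained_var = np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y
    print(explained_var)  # 0.98, close to 1.0 since the predictions track the returns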

@yooceii (Collaborator) commented May 29, 2022

Sorry, I didn't have time to review before merging.

@vwxyzjn (Owner, author) left a review comment:

Thank you @yooceii for the review! I left a few comments. Feel free to open a PR to fix applicable issues :)
