Proper multi-gpu support with PPO #178
Conversation
```python
if capture_video:
    if idx == 0:
        env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
```

Suggested change:

```python
if capture_video and idx == 0:
    env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
```
```python
args.batch_size = int(args.num_envs * args.num_steps)
args.minibatch_size = int(args.batch_size // args.num_minibatches)
```
Duplicate of lines 89-90?
Feel free to remove lines 89-90
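For reference, the batch-size arithmetic in these lines can be sketched as follows (the `Args` dataclass and its default values are hypothetical, chosen only for the demo):

```python
from dataclasses import dataclass

@dataclass
class Args:
    num_envs: int = 8        # environments per process
    num_steps: int = 128     # rollout length
    num_minibatches: int = 4

args = Args()
# Each rank collects num_envs * num_steps transitions per rollout,
# then splits them into num_minibatches chunks for the PPO update.
args.batch_size = int(args.num_envs * args.num_steps)
args.minibatch_size = int(args.batch_size // args.num_minibatches)
assert args.batch_size == 1024
assert args.minibatch_size == 256
```

Since each process computes its own local batch, removing the duplicated computation changes nothing about the result.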
```python
args.seed += local_rank
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed - local_rank)
```
Why not:

```python
torch.manual_seed(args.seed)
args.seed += local_rank
random.seed(args.seed)
np.random.seed(args.seed)
```
The seeding trick done here ensures that the same seed is used to initialize the agent's parameters on every process: see "Adjust seed per process" in https://docs.cleanrl.dev/rl-algorithms/ppo/#implementation-details_6. The more elegant way to do it would be an API that broadcasts the Agent's parameters from rank 0 to the other ranks, but I haven't found such an API.
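The trick can be sketched with stdlib `random` standing in for the torch/numpy RNGs (a toy model, not CleanRL's actual code): seeding the "parameter" RNG with the shared base seed on every rank yields identical initial weights, while the per-rank seed drives rollout randomness.

```python
import random

def init_rngs(base_seed: int, local_rank: int):
    # Per-rank seed: each process gets different rollout randomness.
    rank_seed = base_seed + local_rank
    rollout_rng = random.Random(rank_seed)
    # Shared seed: every rank draws identical "initial parameters",
    # mimicking torch.manual_seed(args.seed - local_rank) in the PR.
    init_rng = random.Random(base_seed)
    params = [init_rng.random() for _ in range(4)]
    return params, rollout_rng

params0, rng0 = init_rngs(1, local_rank=0)
params1, rng1 = init_rngs(1, local_rank=1)
assert params0 == params1  # identical "weights" on every rank
```

This is why the suggested reordering above would break the trick: seeding torch before the `+= local_rank` adjustment is exactly what keeps the parameter initialization identical across ranks.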
```python
assert isinstance(envs.single_action_space, gym.spaces.Discrete), "only discrete action space is supported"

agent = Agent(envs).to(device)
torch.manual_seed(args.seed)
```
Duplicate of line 201?
See comment above.
```python
# TRY NOT TO MODIFY: start the game
global_step = 0
start_time = time.time()
next_obs = torch.Tensor(envs.reset()).to(device)
```
Suggested change:

```python
next_obs = torch.tensor(envs.reset()).to(device)
```

per https://discuss.pytorch.org/t/difference-between-torch-tensor-and-torch-tensor/30786/2
Hmm, in the past I've had weird issues with `torch.tensor`. I'd also avoid changing it just in `ppo_atari_multigpu.py` but not in the other files. Bottom line: I don't think this would be a huge issue or cause performance differences, but I'm happy to change it if evidence shows otherwise :)
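For context, the difference the linked thread describes (assuming PyTorch is installed): `torch.Tensor` is the legacy constructor that always produces the default `float32` dtype, while `torch.tensor` infers the dtype from its input.

```python
import torch

obs = [1, 2, 3]
a = torch.Tensor(obs)  # legacy constructor: always the default float32
b = torch.tensor(obs)  # factory function: infers dtype from the data

assert a.dtype == torch.float32
assert b.dtype == torch.int64
# For float inputs (e.g. env observations) the two agree on dtype:
assert torch.tensor([1.0, 2.0]).dtype == torch.float32
```

Since `envs.reset()` returns float observations here, either spelling ends up float32, which is consistent with the "not a huge issue" conclusion.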
```python
# TRY NOT TO MODIFY: execute the game and log data.
next_obs, reward, done, info = envs.step(action.cpu().numpy())
rewards[step] = torch.tensor(reward).to(device).view(-1)
next_obs, next_done = torch.Tensor(next_obs).to(device), torch.Tensor(done).to(device)
```
Same as line 236.
See comment above.
```python
y_pred, y_true = b_values.cpu().numpy(), b_returns.cpu().numpy()
var_y = np.var(y_true)
explained_var = np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y
```
nit: probably move this under line 387.
This just follows the structure of the other files.
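The explained-variance metric in that snippet can be re-implemented with only the stdlib (a sketch; the PR itself uses numpy, and the sample values below are made up):

```python
from statistics import pvariance

def explained_variance(y_pred, y_true):
    # Fraction of the variance in the returns that the value
    # predictions account for; NaN when the returns are constant.
    var_y = pvariance(y_true)
    if var_y == 0:
        return float("nan")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - pvariance(residuals) / var_y

# Perfect value predictions explain all of the variance:
assert explained_variance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 1.0
# A constant predictor at the mean of the returns explains none:
assert explained_variance([2.0, 2.0, 2.0], [1.0, 2.0, 3.0]) == 0.0
```

A value near 1 means the critic tracks the empirical returns well; near 0 (or negative) suggests the value function is uninformative.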
Sorry, didn't have time to review before merging.
Thank you @yooceii for the review! I left a few comments. Feel free to open a PR to fix applicable issues :)
Description

This is a follow-up on #162.

Types of changes

Checklist:
- `pre-commit run --all-files` passes (required).
- I have updated the documentation and previewed the changes via `mkdocs serve`.

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
- I have added videos of the agents with the `--capture-video` flag toggled on (required).
- I have added documentation and previewed the changes via `mkdocs serve`.
- I have added learning curves (with `width=500` and `height=300`).