
Question on algorithm itself #8

Open
QiXuanWang opened this issue Jul 12, 2019 · 2 comments

@QiXuanWang

Usually PPO is used for continuous actions, but for OpenAI Five, shouldn't the actions be discrete? What technique makes PPO applicable to Dota 2 actions?

@alexis-jacq
Owner

PPO is just a trick to regularize policy updates, and it can be used with any kind of state/action space. If the actions are discrete, just replace the multivariate Gaussian with a softmax (categorical) distribution over actions; that only changes the way log_probs and entropies are computed. You can find an example of a PPO implementation with both options, discrete and continuous actions, here: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/model.py
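As a minimal sketch of what this swap looks like in PyTorch (the class names, network sizes, and hyperparameters below are illustrative, not taken from this repo or the linked one):

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class DiscretePolicy(nn.Module):
    """Policy head for discrete actions: logits -> Categorical."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return Categorical(logits=self.net(obs))

class ContinuousPolicy(nn.Module):
    """Policy head for continuous actions: mean + learned log-std -> Normal."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        return Normal(self.mu(obs), self.log_std.exp())

# The PPO clipped objective is identical in both cases; only the
# distribution (and hence log_prob / entropy) differs.
def ppo_loss(dist, actions, old_log_probs, advantages,
             clip_eps=0.2, ent_coef=0.01):
    log_probs = dist.log_prob(actions)
    entropy = dist.entropy()
    # Continuous distributions return per-dimension values: sum them.
    if log_probs.dim() > 1:
        log_probs = log_probs.sum(-1)
        entropy = entropy.sum(-1)
    ratio = (log_probs - old_log_probs).exp()
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    surrogate = torch.min(ratio * advantages, clipped).mean()
    return -(surrogate + ent_coef * entropy.mean())
```

Sampling and the update then look the same for both heads: `dist = policy(obs)`, `actions = dist.sample()`, `loss = ppo_loss(dist, actions, old_log_probs, advantages)`.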

@QiXuanWang
Author

Thanks for this. Later I found some information that says much the same as you did. Thanks for the pointer.
