
Question on algorithm itself #8

Open
QiXuanWang opened this issue Jul 12, 2019 · 2 comments

@QiXuanWang

Usually PPO is used for continuous actions, but for OpenAI Five, shouldn't the actions be discrete? What technique makes PPO applicable to Dota 2 actions?

@alexis-jacq
Owner

PPO is just a trick to regularize policy updates, and it can be used with any kind of state/action space. If the actions are discrete, just replace the multivariate Gaussian with a softmax (categorical) distribution over actions; that only changes the way log_probs and entropies are computed. You can find an example of a PPO implementation with both options, discrete and continuous actions, here: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/model.py
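As a minimal sketch of what this swap looks like in PyTorch (the class names, network sizes, and hyperparameters below are illustrative, not taken from this repo or the linked one):

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class DiscretePolicy(nn.Module):
    """Policy head for discrete actions: logits -> Categorical."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return Categorical(logits=self.net(obs))

class ContinuousPolicy(nn.Module):
    """Policy head for continuous actions: mean + learned log-std -> Normal."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        return Normal(self.mu(obs), self.log_std.exp())

# The PPO clipped objective is identical in both cases; only the
# distribution (and hence log_prob / entropy) differs.
def ppo_loss(dist, actions, old_log_probs, advantages,
             clip_eps=0.2, ent_coef=0.01):
    log_probs = dist.log_prob(actions)
    entropy = dist.entropy()
    # Continuous distributions return per-dimension values: sum them.
    if log_probs.dim() > 1:
        log_probs = log_probs.sum(-1)
        entropy = entropy.sum(-1)
    ratio = (log_probs - old_log_probs).exp()
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    surrogate = torch.min(ratio * advantages, clipped).mean()
    return -(surrogate + ent_coef * entropy.mean())
```

Sampling and the update then look the same for both heads: `dist = policy(obs)`, `actions = dist.sample()`, `loss = ppo_loss(dist, actions, old_log_probs, advantages)`.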

@QiXuanWang
Author

Thanks for this. Later I found some information that says much the same as you did. Thanks for the pointer.
