Usually PPO is used for continuous actions, but for OpenAI Five, shouldn't the actions be discrete? What's the technique that makes PPO applicable to Dota 2 actions?
PPO is just a trick to regularize policy updates, and it can be used with any kind of state/action space. If the actions are discrete, just replace the multivariate Gaussian with a softmax distribution over actions; this only changes the way log_probs and entropies are computed. You can find an example PPO implementation with both options (discrete and continuous actions) here: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/model.py
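To make the point concrete, here is a minimal sketch (not the linked repo's actual code) of a policy head that works with either a diagonal Gaussian (continuous actions) or a Categorical/softmax distribution (discrete actions). The class and function names are made up for illustration; the key takeaway is that the PPO clipped surrogate loss is identical in both cases, and only the distribution used for `log_prob` and `entropy` changes.

```python
# Hypothetical minimal example: same PPO loss, swappable action distribution.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class PolicyHead(nn.Module):
    def __init__(self, obs_dim, act_dim, discrete=True, hidden=64):
        super().__init__()
        self.discrete = discrete
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        if discrete:
            # Logits over a finite action set -> softmax (Categorical) distribution.
            self.logits = nn.Linear(hidden, act_dim)
        else:
            # Mean of a diagonal Gaussian; log-std as a learned parameter.
            self.mu = nn.Linear(hidden, act_dim)
            self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        h = self.body(obs)
        if self.discrete:
            return Categorical(logits=self.logits(h))
        return Normal(self.mu(h), self.log_std.exp())

def ppo_clip_loss(policy, obs, actions, old_log_probs, advantages, clip=0.2):
    """Standard PPO clipped surrogate; unchanged for discrete vs continuous."""
    dist = policy.dist(obs)
    log_probs = dist.log_prob(actions)
    entropy = dist.entropy()
    if not policy.discrete:
        # Sum over action dimensions for the (diagonal) multivariate Gaussian.
        log_probs = log_probs.sum(-1)
        entropy = entropy.sum(-1)
    ratio = (log_probs - old_log_probs).exp()
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return -(torch.min(unclipped, clipped)).mean() - 0.01 * entropy.mean()
```

For the discrete case, `actions` is a tensor of integer action indices and `Categorical.log_prob` already returns one value per sample, so no summing is needed; that is the only practical difference when switching between the two heads.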