- 2. Background
- 2.1 Reinforcement Learning
- 2.2 Multi-Agent Settings
- 2.3 Centralized vs Decentralized Control
- 2.5 Cooperative, Zero-Sum, and General-Sum
- 2.5 Partial Observability
- 2.6 Centralized Training, Decentralized Execution
- 2.7 Value Functions
- 2.8 Nash Equilibria
- 2.9 Deep Learning for MARL
- 2.10 Q-Learning and DQN
- 2.11 REINFORCE and Actor-Critic
- 3. Counterfactual Multi-Agent Policy Gradients
- 4. Multi-Agent Common Knowledge Reinforcement Learning
- 5. Stabilizing Experience Replay
- 6. Learning to Communicate with Deep Multi-Agent Reinforcement Learning
- 7. Bayesian Action Decoder
- 8. Learning with Opponent-Learning Awareness
- 9. DiCE: The Infinitely Differentiable Monte Carlo Estimator