Multi-Agent Reinforcement Learning

Reinforcement learning (RL) can be extended to multi-agent systems (MAS) by replacing the single-agent Markov decision process (MDP) formulation with a more general Markov game, also known as a stochastic game (Shapley, 1953). A Markov game, introduced to RL by Littman (1994) as an extension of the MDP formalisation, is a mathematical framework that uses game theory to reason about multiple interacting agents in a shared environment. It is formalised (Yang and Wang, 2021) by the tuple:

$$\left\langle N, \mathbb{S}, \{\mathbb{A}^i\}_{i \in \{1, \ldots, N\}}, \mathcal{P}, \{\mathcal{R}^i\}_{i \in \{1, \ldots, N\}}, \gamma \right\rangle$$

where:

- $N$ is the number of agents;
- $\mathbb{S}$ is the set of environment states shared by all agents;
- $\mathbb{A}^i$ is the action space of agent $i$, with joint action space $\mathbb{A} = \mathbb{A}^1 \times \cdots \times \mathbb{A}^N$;
- $\mathcal{P}: \mathbb{S} \times \mathbb{A} \rightarrow \Delta(\mathbb{S})$ is the state transition function, where $\Delta(\cdot)$ denotes a probability distribution over a set;
- $\mathcal{R}^i: \mathbb{S} \times \mathbb{A} \times \mathbb{S} \rightarrow \mathbb{R}$ is the reward function of agent $i$;
- $\gamma \in [0, 1)$ is the discount factor.

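To make the tuple concrete, here is a minimal sketch of a randomly generated two-player Markov game in Python. All sizes, probabilities, and rewards are illustrative assumptions, not taken from any cited work; the point is only how the components of the tuple map onto data structures.

```python
import numpy as np

# Illustrative sizes for a toy two-player Markov game (assumptions, not
# from any benchmark or cited paper).
N_AGENTS = 2
N_STATES = 3          # |S|
N_ACTIONS = [2, 2]    # |A^i| for each agent i
GAMMA = 0.9           # discount factor

rng = np.random.default_rng(0)

# Transition function P: S x A^1 x A^2 -> Delta(S).
# P[s, a1, a2] is a probability distribution over next states.
P = rng.random((N_STATES, N_ACTIONS[0], N_ACTIONS[1], N_STATES))
P /= P.sum(axis=-1, keepdims=True)

# One reward function per agent: R[i][s, a1, a2] is agent i's reward.
R = [rng.standard_normal((N_STATES, N_ACTIONS[0], N_ACTIONS[1]))
     for _ in range(N_AGENTS)]

def step(state, actions):
    """Sample the next state and per-agent rewards for a joint action."""
    a1, a2 = actions
    next_state = rng.choice(N_STATES, p=P[state, a1, a2])
    rewards = [R[i][state, a1, a2] for i in range(N_AGENTS)]
    return next_state, rewards
```

Note that both the transition function and each agent's reward depend on the joint action, which is what couples the agents' learning problems together.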
The goal of each agent $i$ in a multi-agent reinforcement learning (MARL) scenario is to solve the Markov game by finding a behavioural policy $\pi^i \in \Pi^i$, where $\pi^i: \mathbb{S} \rightarrow \Delta(\mathbb{A}^i)$ (known as a mixed strategy in game theory), that maximises its expected discounted cumulative reward. Finding an optimal policy requires each agent to strategically consider the actions of other agents when determining its own policy (Yang and Wang, 2021). The type of game is determined by the reward structure (Gronauer and Diepold, 2022):

- fully cooperative: all agents share a common reward function or team objective, so maximising an individual return also maximises the group's return;
- fully competitive: agents have opposing objectives, typically modelled as a two-player zero-sum game in which one agent's gain is the other's loss;
- mixed: rewards are neither fully aligned nor fully opposed (general-sum), combining cooperative and competitive incentives.
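Continuing the sketch above (reusing `rng`, `GAMMA`, `N_AGENTS`, `N_STATES`, `N_ACTIONS`, and `step`), the following is a minimal independent Q-learning loop: each agent learns its own action values while treating the others as part of the environment. The hyperparameters are arbitrary assumptions, and this is a simple illustrative baseline rather than the method of any cited paper.

```python
# Hyperparameters (arbitrary, for illustration only).
ALPHA = 0.1        # learning rate
EPSILON = 0.1      # exploration rate
N_EPISODES = 200
EPISODE_LEN = 50

# One Q-table per agent: Q[i][s, a] estimates agent i's discounted return.
Q = [np.zeros((N_STATES, N_ACTIONS[i])) for i in range(N_AGENTS)]

def epsilon_greedy(q_row):
    """Sample an action: mostly greedy, occasionally uniformly random."""
    if rng.random() < EPSILON:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

for _ in range(N_EPISODES):
    state = 0
    for _ in range(EPISODE_LEN):
        actions = [epsilon_greedy(Q[i][state]) for i in range(N_AGENTS)]
        next_state, rewards = step(state, actions)
        for i in range(N_AGENTS):
            # Standard one-step Q-learning update, applied per agent.
            td_target = rewards[i] + GAMMA * Q[i][next_state].max()
            Q[i][state, actions[i]] += ALPHA * (td_target - Q[i][state, actions[i]])
        state = next_state
```

Because each agent's transitions and rewards depend on the other agents' changing policies, the environment is nonstationary from any individual learner's perspective; this is precisely why solving a Markov game requires strategic reasoning about the other agents rather than treating them as fixed.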

References

Gronauer, S. and Diepold, K. (2022). Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review.

Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning (ICML), pp. 157-163.

Shapley, L. S. (1953). Stochastic games. Proceedings of the National Academy of Sciences, 39(10), 1095-1100.

Yang, Y. and Wang, J. (2021). An overview of multi-agent reinforcement learning from game theoretical perspective. arXiv preprint arXiv:2011.00583.