RESEARCH ARTICLE   Open Access    

Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

The Knowledge Engineering Review, Volume 27 (2012) | doi: 10.1017/S0269888912000057

Abstract: In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn-fast policy hill climbing. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
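To give a concrete flavor of one of the surveyed variants: hysteretic Q-learning uses two learning rates, a larger one for positive temporal-difference errors and a smaller one for negative errors, so that an independent learner is only weakly penalized by low payoffs caused by exploring teammates. The following is a minimal tabular sketch in Python; the state/action table sizes and the rate values are illustrative choices, not taken from the paper.

```python
import numpy as np

def hysteretic_q_update(Q, s, a, r, s_next,
                        alpha=0.1, beta=0.01, gamma=0.9):
    """One tabular hysteretic Q-learning step.

    Uses the increase rate alpha when the TD error is non-negative
    and the smaller decrease rate beta (< alpha) when it is negative,
    so bad outcomes due to teammates' exploration are partly forgiven.
    """
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    rate = alpha if delta >= 0 else beta
    Q[s, a] += rate * delta
    return Q

# Toy usage on a 2-state, 2-action table (illustrative values only)
Q = np.zeros((2, 2))
Q = hysteretic_q_update(Q, s=0, a=1, r=1.0, s_next=1)
```

With beta = alpha this reduces to ordinary decentralized Q-learning; with beta = 0 it behaves like an optimistic learner that never lowers its estimates, which is the trade-off the survey evaluates against stochastic rewards.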

    • This work is partially supported by the Smart Surface ANR (French National Research Agency) project (ANR_06_ROBO_0009_03).

    • Markov games are also called stochastic games; we use the term Markov games to avoid confusion with stochastic (non-deterministic) games.

    • A fully cooperative Markov game is also called an identical payoff stochastic game (Peshkin et al., 2000) or a multi-agent Markov decision process (Boutilier, 1999).

    • The greedy policy based on Qi picks for every state the action with the highest Q-value.
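That footnote can be sketched in a few lines; assuming a tabular Q indexed by state then action (names and values here are illustrative):

```python
import numpy as np

def greedy_policy(Q):
    """For each state, pick the action with the highest Q-value."""
    return {s: int(np.argmax(Q[s])) for s in range(Q.shape[0])}

Q = np.array([[0.2, 0.9],
              [0.5, 0.1]])
# greedy_policy(Q) -> {0: 1, 1: 0}
```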

    • Library of Simulink tools for reinforcement learning.

    • 12 and 6 are received with equal probabilities in the stochastic mode.

    • This concept is close to that of off-policy algorithms for single-agent problems; see Sutton and Barto (1998), for example.

    • With the exception of distributed Q-learning that complies with its theoretical guarantees.

    • Copyright © Cambridge University Press 2012
  • About this article
    Cite this article
    Laetitia Matignon, Guillaume J. Laurent, Nadine Le Fort-Piat. 2012. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review. 27:57 doi: 10.1017/S0269888912000057