RESEARCH ARTICLE   Open Access    

Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

The Knowledge Engineering Review, Volume 27 (2012) | doi: 10.1017/S0269888912000057

Abstract: In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn-fast policy hill climbing. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
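To give a concrete flavor of one of the surveyed variants: hysteretic Q-learning uses two learning rates, a larger one for positive temporal-difference errors and a smaller one for negative errors, so that an independent learner is only weakly penalized by low payoffs caused by exploring teammates. The following is a minimal tabular sketch in Python; the state/action table sizes and the rate values are illustrative choices, not taken from the paper.

```python
import numpy as np

def hysteretic_q_update(Q, s, a, r, s_next,
                        alpha=0.1, beta=0.01, gamma=0.9):
    """One tabular hysteretic Q-learning step.

    Uses the increase rate alpha when the TD error is non-negative
    and the smaller decrease rate beta (< alpha) when it is negative,
    so bad outcomes due to teammates' exploration are partly forgiven.
    """
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    rate = alpha if delta >= 0 else beta
    Q[s, a] += rate * delta
    return Q

# Toy usage on a 2-state, 2-action table (illustrative values only)
Q = np.zeros((2, 2))
Q = hysteretic_q_update(Q, s=0, a=1, r=1.0, s_next=1)
```

With beta = alpha this reduces to ordinary decentralized Q-learning; with beta = 0 it behaves like an optimistic learner that never lowers its estimates, which is the trade-off the survey evaluates against stochastic rewards.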

    • This work is partially supported by the Smart Surface ANR (French National Research Agency) project (ANR_06_ROBO_0009_03).

    • Markov games are also called stochastic games; we use the term Markov games to avoid confusion with stochastic (non-deterministic) games.

    • A fully cooperative Markov game is also called an identical payoff stochastic game (Peshkin et al., 2000) or a multi-agent Markov decision process (Boutilier, 1999).

    • The greedy policy based on Qi picks for every state the action with the highest Q-value.
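That footnote can be sketched in a few lines; assuming a tabular Q indexed by state then action (names and values here are illustrative):

```python
import numpy as np

def greedy_policy(Q):
    """For each state, pick the action with the highest Q-value."""
    return {s: int(np.argmax(Q[s])) for s in range(Q.shape[0])}

Q = np.array([[0.2, 0.9],
              [0.5, 0.1]])
# greedy_policy(Q) -> {0: 1, 1: 0}
```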

    • Library of Simulink tools for reinforcement learning.

    • 12 and 6 are received with equal probabilities in the stochastic mode.

    • This concept is close to that of off-policy algorithms for single-agent problems; see Sutton and Barto (1998), for example.

    • With the exception of distributed Q-learning that complies with its theoretical guarantees.

    • Copyright © Cambridge University Press 2012
  • About this article
    Cite this article
    Laetitia Matignon, Guillaume J. Laurent, Nadine Le Fort-Piat. 2012. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review. 27:57 doi: 10.1017/S0269888912000057