RESEARCH ARTICLE   Open Access    

Context-sensitive reward shaping for sparse interaction multi-agent systems

Abstract: Potential-based reward shaping is a commonly used approach in reinforcement learning for directing exploration based on prior knowledge. In both single- and multi-agent settings, this technique speeds up learning without sacrificing any theoretical convergence guarantees. However, if speed-ups through reward shaping are to be achieved in multi-agent environments, a different shaping signal should be used for each context in which agents have a different subgoal or are involved in a different interaction situation. This paper describes the use of context-aware potential functions in a multi-agent system in which the interactions between agents are sparse: unknown to the agents a priori, interactions occur only sporadically in certain regions of the state space. During these interactions, agents need to coordinate in order to reach the globally optimal solution. We demonstrate how different reward shaping functions can be used on top of Future Coordinating Q-learning (FCQ-learning), an algorithm capable of automatically detecting when agents should take each other into consideration. Using FCQ-learning, coordination problems can even be anticipated before they actually occur, allowing them to be solved in a timely manner. We evaluate our approach on a range of gridworld problems, as well as on a simulation of air traffic control.
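The shaping signal in potential-based reward shaping (Ng et al. 1999, cited below) takes the form F(s, s') = γΦ(s') − Φ(s) and is simply added to the environment reward. The following is a minimal sketch of this idea in tabular Q-learning; the one-dimensional gridworld and the distance-to-goal potential are illustrative assumptions, not the paper's FCQ-learning setup.

```python
# Sketch: potential-based reward shaping on top of tabular Q-learning.
# Environment: states 0..GOAL on a line, actions -1/+1, reward 1 at the goal.
import random

GAMMA, ALPHA = 0.9, 0.1
GOAL = 4


def potential(s):
    # Heuristic potential: states closer to the goal get higher potential.
    return -abs(GOAL - s)


def step(s, a):
    # Deterministic transition; episode ends on reaching the goal.
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in (-1, 1)}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < 0.2:
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)
            # Shaping term F(s, s') = gamma * Phi(s') - Phi(s), added to the
            # environment reward; this preserves the optimal policy.
            f = GAMMA * potential(s2) - potential(s)
            target = r + f + (0.0 if done else GAMMA * max(Q[(s2, -1)], Q[(s2, 1)]))
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
    return Q
```

Because F is a difference of potentials over a transition, the shaped task has the same optimal policy as the original one; only the speed of learning changes. The paper's contribution is to make Φ depend on the interaction context of each agent.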
References (20)
Boutilier, C. 1996. Planning, learning and coordination in multiagent decision processes. In Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, 195–210.

Claus, C. & Boutilier, C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th National Conference on Artificial Intelligence, 746–752. AAAI Press.

De Hauwere, Y.-M., Vrancx, P. & Nowé, A. 2010. Learning multi-agent state space representations. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, 715–722.

De Hauwere, Y.-M., Vrancx, P. & Nowé, A. 2011a. Adaptive state representations for multi-agent reinforcement learning. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, 181–189.

De Hauwere, Y.-M., Vrancx, P. & Nowé, A. 2011b. Solving delayed coordination problems in MAS (extended abstract). In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1115–1116.

De Hauwere, Y.-M., Vrancx, P. & Nowé, A. 2011c. Solving sparse delayed coordination problems in multi-agent reinforcement learning. In Adaptive Agents and Multi-Agent Systems V, Lecture Notes in Artificial Intelligence 7113, 45–52. Springer-Verlag.

Devlin, S. & Kudenko, D. 2011. Theoretical considerations of potential-based reward shaping for multiagent systems. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems, Volume 1, 225–232.

Devlin, S. & Kudenko, D. (in press). Plan-based reward shaping for multi-agent reinforcement learning. Knowledge Engineering Review.

Greenwald, A. & Hall, K. 2003. Correlated-Q learning. In AAAI Spring Symposium, 242–249. AAAI Press.

Grzes, M. & Kudenko, D. 2008. Plan-based reward shaping for reinforcement learning. In 4th International IEEE Conference on Intelligent Systems (IS'08), 2, 10-22–10-29.

Hu, J. & Wellman, M. 2003. Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4, 1039–1069.

Kok, J., 't Hoen, P., Bakker, B. & Vlassis, N. 2005. Utile coordination: learning interdependencies among cooperative agents. In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG'05), 29–36.

Melo, F. & Veloso, M. 2009. Learning of coordination: exploiting sparse interactions in multiagent systems. In Proceedings of the 8th International Conference on Autonomous Agents and Multi-Agent Systems, 773–780.

Melo, F. & Veloso, M. 2010. Local Multiagent Coordination in Decentralised MDPs with Sparse Interactions. Technical report CMU-CS-10-133, School of Computer Science, Carnegie Mellon University.

Ng, A. Y., Harada, D. & Russell, S. 1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287. Morgan Kaufmann.

Randløv, J. & Alstrøm, P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning (ICML'98), 463–471. Morgan Kaufmann.

Tsitsiklis, J. 1994. Asynchronous stochastic approximation and Q-learning. Machine Learning 16(3), 185–202.

Tumer, K. & Khani, N. 2009. Learning from actions not taken in multiagent systems. Advances in Complex Systems 12(4–5), 455–473.

Vrancx, P., Verbeeck, K. & Nowé, A. 2008. Decentralized learning in Markov games. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 38(4), 976–981.

Watkins, C. 1989. Learning from Delayed Rewards. PhD thesis, University of Cambridge.

  • Cite this article

    Yann-Michaël de Hauwere, Sam Devlin, Daniel Kudenko & Ann Nowé. 2016. Context-sensitive reward shaping for sparse interaction multi-agent systems. The Knowledge Engineering Review 31(1), 59–76. doi: 10.1017/S0269888915000193



    • The authors would like to thank the anonymous reviewers as well as the editors of the journal for their constructive comments.

    • For a two-agent environment with nine local states each.

    • © Cambridge University Press, 2016