RESEARCH ARTICLE   Open Access    

Plan-based reward shaping for multi-agent reinforcement learning

Abstract: Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
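
The mechanism behind these results can be summarised briefly. In potential-based reward shaping (Ng, Harada & Russell 1999), each transition earns an extra reward F(s, s') = γΦ(s') − Φ(s), which leaves the optimal policy unchanged in the single-agent case and, by the multi-agent results the abstract refers to (Devlin & Kudenko 2011), does not alter the Nash equilibria of the system. Plan-based shaping (Grześ & Kudenko 2008b) sets the potential Φ of a state to its progress through a STRIPS plan. The sketch below is a minimal, hypothetical illustration of that idea; the plan, state names, discount factor and scaling constant are invented for the example and are not the authors' implementation.

```python
# Minimal sketch of plan-based potential shaping. Illustrative only:
# the plan, state names, GAMMA and OMEGA are assumptions, not values
# taken from the paper.

GAMMA = 0.99   # discount factor (assumed value)
OMEGA = 10.0   # scaling constant for the potential (assumed value)

# Hypothetical abstract plan: each key is an abstract state the agent's
# STRIPS plan passes through; the value is its index in the plan.
plan_step = {"at_start": 0, "holding_key": 1, "door_open": 2, "at_goal": 3}

def potential(abstract_state: str) -> float:
    """Potential = scaled progress through the plan; off-plan states get 0."""
    return OMEGA * plan_step.get(abstract_state, 0)

def shaping_reward(s: str, s_next: str) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)
    (Ng, Harada & Russell 1999), added to the environment reward."""
    return GAMMA * potential(s_next) - potential(s)

# Moving one step along the plan yields a positive shaping reward,
# regressing yields a negative one.
print(shaping_reward("at_start", "holding_key"))   # ~ +9.9
print(shaping_reward("holding_key", "at_start"))   # -10.0
```

In the multi-agent setting studied here, the potential is derived either from one joint plan or from each agent's individual plan, which is where the conflicts mentioned in the abstract arise.
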
References

Asmuth, J., Littman, M. & Zinkov, R. 2008. Potential-based shaping in model-based reinforcement learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 604–609.

Babes, M., de Cote, E. & Littman, M. 2008. Social reward shaping in the prisoner's dilemma. In Proceedings of the Seventh Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 3, 1389–1392.

Bertsekas, D. P. 2007. Dynamic Programming and Optimal Control, 3rd edition. Athena Scientific.

Claus, C. & Boutilier, C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artificial Intelligence, 746–752.

De Hauwere, Y., Vrancx, P. & Nowé, A. 2011. Solving delayed coordination problems in MAS (extended abstract). In The 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1115–1116.

Devlin, S., Grześ, M. & Kudenko, D. 2011. An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems 14(2), 251–278.

Devlin, S. & Kudenko, D. 2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the Tenth Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS).

Devlin, S. & Kudenko, D. 2012. Dynamic potential-based reward shaping. In Proceedings of the Eleventh Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS).

De Weerdt, M., Ter Mors, A. & Witteveen, C. 2005. Multi-agent planning: an introduction to planning and coordination. Technical report, Delft University of Technology.

Grześ, M. 2010. Improving exploration in reinforcement learning through domain knowledge and parameter analysis. Technical report, University of York.

Grześ, M. & Kudenko, D. 2008a. Multigrid reinforcement learning with reward shaping. In Artificial Neural Networks (ICANN), Lecture Notes in Computer Science 5163, 357–366. Springer.

Grześ, M. & Kudenko, D. 2008b. Plan-based reward shaping for reinforcement learning. In Proceedings of the 4th IEEE International Conference on Intelligent Systems (IS'08), 22–29. IEEE.

Grześ, M. & Kudenko, D. 2009. Improving optimistic exploration in model-free reinforcement learning. In Adaptive and Natural Computing Algorithms, Lecture Notes in Computer Science 5495, 360–369. Springer.

Marthi, B. 2007. Automatic shaping and decomposition of reward functions. In Proceedings of the 24th International Conference on Machine Learning, 608. ACM.

Nash, J. 1951. Non-cooperative games. Annals of Mathematics 54(2), 286–295.

Ng, A. Y., Harada, D. & Russell, S. J. 1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287.

Peot, M. & Smith, D. 1992. Conditional nonlinear planning. In Artificial Intelligence Planning Systems: Proceedings of the First International Conference, 189. Morgan Kaufmann Publishers.

Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Inc.

Randløv, J. & Alstrom, P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning, 463–471.

Rosenschein, J. 1982. Synchronization of multi-agent plans. In Proceedings of the National Conference on Artificial Intelligence, 115–119.

Shoham, Y., Powers, R. & Grenager, T. 2007. If multi-agent learning is the answer, what is the question? Artificial Intelligence 171(7), 365–377.

Shoham, Y. & Tennenholtz, M. 1995. On social laws for artificial agent societies: off-line design. Artificial Intelligence 73(1–2), 231–252.

Sutton, R. S. 1984. Temporal credit assignment in reinforcement learning. PhD thesis, Department of Computer Science, University of Massachusetts.

Sutton, R. S. & Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.

Ziparo, V. 2005. Multi-agent planning. Technical report, University of Rome.


Footnotes

    • Experiments with a negative reward on each time step and γ = 1 made no significant change in the behaviour of the agents.

    • Please note that the joint-plan-based agents' performance illustrated in Figure 2 does not reach 600, because the value presented is discounted over the time the agents take to complete the episode (see the worked example after this list).

    • For the individual-plan-based agents p = 0.064; for all others p < 0.05.

    • Consequently MaxReward now equals 1200.

    • © Cambridge University Press, 2016.
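
To make the Figure 2 footnote concrete, here is a small worked example of the discounting effect. The discount factor and episode length used are illustrative assumptions (neither value is stated in this excerpt); the point is only that for any γ < 1, a return of 600 collected after T steps is presented as a value strictly below 600.

```latex
% Worked example (illustrative values: gamma = 0.99 and T = 50 are
% assumptions, not taken from the paper).
\[
  V_{\text{presented}} = \gamma^{T} \cdot 600,
  \qquad
  \gamma = 0.99,\ T = 50
  \;\Longrightarrow\;
  V_{\text{presented}} \approx 0.605 \cdot 600 \approx 363 < 600.
\]
```
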
Cite this article: Devlin, S. & Kudenko, D. 2016. Plan-based reward shaping for multi-agent reinforcement learning. The Knowledge Engineering Review 31(1), 44–58. doi:10.1017/S0269888915000181