Plan-based reward shaping for multi-agent reinforcement learning

Sam Devlin; Daniel Kudenko

doi:10.1017/S0269888915000181

2016 Volume 31

RESEARCH ARTICLE Open Access

Sam Devlin¹,
Daniel Kudenko¹

Department of Computer Science

Published online: 11 February 2016
The Knowledge Engineering Review 31 (2016), doi:10.1017/S0269888915000181

Abstract

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question of how to generate a useful potential function remains open. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
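The abstract builds on potential-based reward shaping, in which an agent receives the extra reward F(s, s') = γΦ(s') − Φ(s) on each transition (Ng et al., 1999). As a minimal sketch, assuming STRIPS-style states represented as sets of true facts and a plan given as a sequence of fact sets (one per step), a plan-based potential can score a state by the furthest plan step it has reached. The names `PlanPotential`, `shaping_reward` and `q_update`, the linear scaling by `max_reward`, and the example facts are hypothetical illustrations, not the authors' implementation. A joint plan would use one such potential over the joint state; individual plans would give each agent its own instance.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Sequence, Tuple

# Hypothetical representation: a state is the set of STRIPS facts that currently hold.
State = FrozenSet[str]


@dataclass(frozen=True)
class PlanPotential:
    """Hypothetical plan-based potential: Phi(s) = (furthest plan step reached) * omega."""

    plan: Sequence[FrozenSet[str]]  # facts that hold once each plan step is achieved
    max_reward: float  # assumed scale: the potential reached when the whole plan is complete

    @property
    def omega(self) -> float:
        # Spread max_reward evenly over the plan steps.
        return self.max_reward / len(self.plan)

    def step(self, state: State) -> int:
        # 1-based index of the furthest plan step whose facts all hold in `state`; 0 if none.
        reached = 0
        for i, facts in enumerate(self.plan, start=1):
            if facts <= state:
                reached = i
        return reached

    def __call__(self, state: State) -> float:
        return self.step(state) * self.omega


def shaping_reward(phi: PlanPotential, s: State, s_next: State, gamma: float) -> float:
    # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    return gamma * phi(s_next) - phi(s)


QTable = Dict[Tuple[State, Hashable], float]


def q_update(q: QTable, s: State, a: Hashable, r: float, s_next: State,
             actions: Sequence[Hashable], phi: PlanPotential,
             alpha: float = 0.1, gamma: float = 0.99) -> None:
    # One tabular Q-learning step on the shaped reward r + F(s, s').
    # With independent learners in MARL, each agent would call this with its own q and phi.
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    target = r + shaping_reward(phi, s, s_next, gamma) + gamma * best_next
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
```

For example, with a hypothetical three-step plan and `max_reward=600.0`, each step is worth 200, so reaching the first step from the empty state yields F = 0.99 × 200 − 0 = 198 when γ = 0.99. Scoring by the furthest step reached keeps the potential a function of the current state only, which is what the potential-based form requires.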
© Cambridge University Press, 2016

References

Asmuth J., Littman M. & Zinkov R. 2008. Potential-based shaping in model-based reinforcement learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 604–609.

Babes M., de Cote E. & Littman M. 2008. Social reward shaping in the prisoner’s dilemma. In Proceedings of the Seventh Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 3, 1389–1392.

Bertsekas D. P. 2007. Dynamic Programming and Optimal Control, 3rd edition. Athena Scientific.

Claus C. & Boutilier C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artificial Intelligence, 746–752.

De Hauwere Y., Vrancx P. & Nowé A. 2011. Solving delayed coordination problems in MAS (extended abstract). In The 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1115–1116.

Devlin S., Grześ M. & Kudenko D. 2011. An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems 14(2), 251–278.

Devlin S. & Kudenko D. 2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the Tenth Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS).

Devlin S. & Kudenko D. 2012. Dynamic potential-based reward shaping. In Proceedings of the Eleventh Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS).

De Weerdt M., Ter Mors A. & Witteveen C. 2005. Multi-agent planning: an introduction to planning and coordination. Technical report, Delft University of Technology.

Grześ M. 2010. Improving exploration in reinforcement learning through domain knowledge and parameter analysis. Technical report, University of York.

Grześ M. & Kudenko D. 2008a. Multigrid reinforcement learning with reward shaping. In Artificial Neural Networks: ICANN, Lecture Notes in Computer Science 5163, 357–366. Springer.

Grześ M. & Kudenko D. 2008b. Plan-based reward shaping for reinforcement learning. In Proceedings of the 4th IEEE International Conference on Intelligent Systems (IS'08), 22–29. IEEE.

Grześ M. & Kudenko D. 2009. Improving optimistic exploration in model-free reinforcement learning. Adaptive and Natural Computing Algorithms 5495, 360–369.

Marthi B. 2007. Automatic shaping and decomposition of reward functions. In Proceedings of the 24th International Conference on Machine Learning, 608. ACM.

Nash J. 1951. Non-cooperative games. Annals of Mathematics 54(2), 286–295.

Ng A. Y., Harada D. & Russell S. J. 1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287.

Peot M. & Smith D. 1992. Conditional nonlinear planning. In Artificial Intelligence Planning Systems: Proceedings of the First International Conference, 189. Morgan Kaufmann Publishers.

Puterman M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Inc.

Randløv J. & Alstrom P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning, 463–471.

Rosenschein J. 1982. Synchronization of multi-agent plans. In Proceedings of the National Conference on Artificial Intelligence, 115–119.

Shoham Y., Powers R. & Grenager T. 2007. If multi-agent learning is the answer, what is the question? Artificial Intelligence 171(7), 365–377.

Shoham Y. & Tennenholtz M. 1995. On social laws for artificial agent societies: off-line design. Artificial Intelligence 73(1–2), 231–252.

Sutton R. S. 1984. Temporal credit assignment in reinforcement learning. PhD thesis, Department of Computer Science, University of Massachusetts.

Sutton R. S. & Barto A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.

Ziparo V. 2005. Multi-agent planning. Technical report, University of Rome.

Cite this article

Sam Devlin, Daniel Kudenko. 2016. Plan-based reward shaping for multi-agent reinforcement learning. The Knowledge Engineering Review. 31:181 doi: 10.1017/S0269888915000181


Notes

Experiments with a negative reward on each time step and γ=1 made no significant change in the behaviour of the agents.
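For background: with potential-based shaping of the form F(s, s') = γΦ(s') − Φ(s) (Ng et al., 1999), the discounted shaping terms collected over an episode s_0, …, s_T telescope. For any γ, including the undiscounted case γ = 1 mentioned in the note above, the shaped return therefore differs from the unshaped return only by γ^T Φ(s_T) − Φ(s_0). This is a standard identity, shown here as an illustration rather than a result reported in this article:

```latex
\sum_{t=0}^{T-1} \gamma^{t} F(s_t, s_{t+1})
  = \sum_{t=0}^{T-1} \left( \gamma^{t+1} \Phi(s_{t+1}) - \gamma^{t} \Phi(s_t) \right)
  = \gamma^{T} \Phi(s_T) - \Phi(s_0)
  \quad \xrightarrow{\;\gamma = 1\;} \quad \Phi(s_T) - \Phi(s_0).
```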

The performance of the joint-plan-based agents illustrated in Figure 2 does not reach 600 because the value presented is discounted by the time the agents take to complete the episode.
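The note above attributes the gap below 600 to discounting by episode length. As a minimal sketch, assuming the plotted value is a standard discounted return with a single reward of 600 at the end of the episode (the note gives neither γ nor the episode lengths, so γ = 0.99 and the 50-step episode below are hypothetical), the effect can be computed as:

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted return: sum over t of gamma**t * r_t, with t counted from 0."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Hypothetical episode: the reward of 600 arrives on the 50th step, nothing before it.
episode = [0.0] * 49 + [600.0]
print(discounted_return(episode, gamma=0.99))  # roughly 367, well below 600
print(discounted_return(episode, gamma=1.0))   # 600.0 when nothing is discounted
```

The longer an episode takes, the larger the factor γ^t applied to the final reward, so slower teams report lower values even when they collect the same reward.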

For the individual-plan-based agents, p = 0.064; for all others, p < 0.05.

Consequently, MaxReward now equals 1200.
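The note does not restate how MaxReward enters the shaping. Assuming, as a hypothetical reading, that the potential grows linearly with plan progress, Φ(s) = CurrentStepInPlan(s) × ω with ω = MaxReward / NumStepsInPlan (illustrative names), the arithmetic for a hypothetical 12-step plan would be:

```latex
\omega = \frac{\mathrm{MaxReward}}{\mathrm{NumStepsInPlan}} = \frac{1200}{12} = 100,
\qquad
\Phi(s) = \mathrm{CurrentStepInPlan}(s)\,\omega \in \{0, 100, 200, \dots, 1200\}.
```

Under this reading, raising MaxReward scales every potential, and hence every shaping reward, by the same factor.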
