Asmuth J., Littman M. & Zinkov R. 2008. Potential-based shaping in model-based reinforcement learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 604–609.

Babes M., de Cote E. & Littman M. 2008. Social reward shaping in the prisoner's dilemma. In Proceedings of the Seventh Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 3, 1389–1392.

Bertsekas D. P. 2007. Dynamic Programming and Optimal Control, 3rd edition. Athena Scientific.

Claus C. & Boutilier C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artificial Intelligence, 746–752.

De Hauwere Y., Vrancx P. & Nowé A. 2011. Solving delayed coordination problems in MAS (extended abstract). In The 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1115–1116.

Devlin S., Grześ M. & Kudenko D. 2011. An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems 14(2), 251–278.

Devlin S. & Kudenko D. 2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the Tenth Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS).

Devlin S. & Kudenko D. 2012. Dynamic potential-based reward shaping. In Proceedings of the Eleventh Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS).

De Weerdt M., Ter Mors A. & Witteveen C. 2005. Multi-agent planning: an introduction to planning and coordination. Technical report, Delft University of Technology.

Grześ M. 2010. Improving exploration in reinforcement learning through domain knowledge and parameter analysis. Technical report, University of York.

Grześ M. & Kudenko D. 2008a. Multigrid reinforcement learning with reward shaping. In Artificial Neural Networks – ICANN, Lecture Notes in Computer Science 5163, 357–366. Springer.

Grześ M. & Kudenko D. 2008b. Plan-based reward shaping for reinforcement learning. In Proceedings of the 4th IEEE International Conference on Intelligent Systems (IS'08), 22–29. IEEE.

Grześ M. & Kudenko D. 2009. Improving optimistic exploration in model-free reinforcement learning. Adaptive and Natural Computing Algorithms 5495, 360–369.

Marthi B. 2007. Automatic shaping and decomposition of reward functions. In Proceedings of the 24th International Conference on Machine Learning, 608. ACM.

Nash J. 1951. Non-cooperative games. Annals of Mathematics 54(2), 286–295.

Ng A. Y., Harada D. & Russell S. J. 1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287.

Peot M. & Smith D. 1992. Conditional nonlinear planning. In Artificial Intelligence Planning Systems: Proceedings of the First International Conference, 189. Morgan Kaufmann Publishers.

Puterman M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Inc.

Randløv J. & Alstrøm P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning, 463–471.

Rosenschein J. 1982. Synchronization of multi-agent plans. In Proceedings of the National Conference on Artificial Intelligence, 115–119.

Shoham Y., Powers R. & Grenager T. 2007. If multi-agent learning is the answer, what is the question? Artificial Intelligence 171(7), 365–377.

Shoham Y. & Tennenholtz M. 1995. On social laws for artificial agent societies: off-line design. Artificial Intelligence 73(1–2), 231–252.

Sutton R. S. 1984. Temporal credit assignment in reinforcement learning. PhD thesis, Department of Computer Science, University of Massachusetts.

Sutton R. S. & Barto A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.

Ziparo V. 2005. Multi-agent planning. Technical report, University of Rome.