|
Asmuth J., Littman M. & Zinkov R. 2008. Potential-based shaping in model-based reinforcement learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 604–609. |
|
Babes M., de Cote E. & Littman M. 2008. Social reward shaping in the prisoner's dilemma. In Proceedings of the Seventh Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 3, 1389–1392. |
|
Bertsekas D. P. 2007. Dynamic Programming and Optimal Control, 3rd edition. Athena Scientific. |
|
Claus C. & Boutilier C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artificial Intelligence, 746–752. |
|
De Hauwere Y., Vrancx P. & Nowé A. 2011. Solving delayed coordination problems in MAS (extended abstract). In The 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1115–1116. |
|
Devlin S., Grześ M. & Kudenko D. 2011. An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems 14(2), 251–278. |
|
Devlin S. & Kudenko D. 2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the Tenth Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS). |
|
Devlin S. & Kudenko D. 2012. Dynamic potential-based reward shaping. In Proceedings of the Eleventh Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS). |
|
De Weerdt M., Ter Mors A. & Witteveen C. 2005. Multi-agent planning – an introduction to planning and coordination. Technical report, Delft University of Technology. |
|
Grześ M. 2010. Improving exploration in reinforcement learning through domain knowledge and parameter analysis. Technical report, University of York. |
|
Grześ M. & Kudenko D. 2008a. Multigrid reinforcement learning with reward shaping. In Artificial Neural Networks – ICANN, Lecture Notes in Computer Science 5163, 357–366. Springer. |
|
Grześ M. & Kudenko D. 2008b. Plan-based reward shaping for reinforcement learning. In Proceedings of the 4th IEEE International Conference on Intelligent Systems (IS'08), 22–29. IEEE. |
|
Grześ M. & Kudenko D. 2009. Improving optimistic exploration in model-free reinforcement learning. In Adaptive and Natural Computing Algorithms, Lecture Notes in Computer Science 5495, 360–369. |
|
Marthi B. 2007. Automatic shaping and decomposition of reward functions. In Proceedings of the 24th International Conference on Machine Learning, 608. ACM. |
|
Nash J. 1951. Non-cooperative games. Annals of Mathematics 54(2), 286–295. |
|
Ng A. Y., Harada D. & Russell S. J. 1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287. |
|
Peot M. & Smith D. 1992. Conditional nonlinear planning. In Artificial Intelligence Planning Systems: Proceedings of the First International Conference, 189. Morgan Kaufmann Publishers. |
|
Puterman M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Inc. |
|
Randløv J. & Alstrøm P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning, 463–471. |
|
Rosenschein J. 1982. Synchronization of multi-agent plans. In Proceedings of the National Conference on Artificial Intelligence, 115–119. |
|
Shoham Y., Powers R. & Grenager T. 2007. If multi-agent learning is the answer, what is the question? Artificial Intelligence 171(7), 365–377. |
|
Shoham Y. & Tennenholtz M. 1995. On social laws for artificial agent societies: off-line design. Artificial Intelligence 73(1–2), 231–252. |
|
Sutton R. S. 1984. Temporal credit assignment in reinforcement learning. PhD thesis, Department of Computer Science, University of Massachusetts. |
|
Sutton R. S. & Barto A. G. 1998. Reinforcement Learning: An Introduction. MIT Press. |
|
Ziparo V. 2005. Multi-agent planning. Technical report, University of Rome. |