Abstract: Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used to guide the agent towards the optimal behaviour. However, if the provided knowledge is wrong, it has been shown that the agent will take longer to learn the optimal policy. In previous work, it was sometimes better to ignore all prior knowledge, even though only part of it was incorrect. This paper introduces a novel use of knowledge revision to overcome incorrect domain knowledge provided to an agent receiving plan-based reward shaping. Empirical results show that an agent using this method can outperform an agent receiving plan-based reward shaping without knowledge revision.
Asmuth J., Littman M. & Zinkov R.2008. Potential-based shaping in model-based reinforcement learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 604–609.
Devlin S., Grześ M. & Kudenko D.2011. An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems.
Devlin S. & Kudenko D.2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of The Tenth Annual International Conference on Autonomous Agents and Multiagent Systems.
Devlin S. & Kudenko D.2012. Dynamic potential-based reward shaping. In Proceedings of The Eleventh Annual International Conference on Autonomous Agents and Multiagent Systems.
Efthymiadis K. & Kudenko D.2013. Using plan-based reward shaping to learn strategies in StarCraft: Brood War. In Computational Intelligence and Games (CIG). IEEE.
Fikes R. E. & Nilsson N. J.1972. STRIPS: a new approach to the application of theorem proving to problem solving. Artificial Intelligence2(3), 189–208.
Grześ M. & Kudenko D.2008b. Plan-based reward shaping for reinforcement learning. In Proceedings of the 4th IEEE International Conference on Intelligent Systems (IS’08), 22–29. IEEE.
Marthi B.2007. Automatic shaping and decomposition of reward functions. In Proceedings of the 24th International Conference on Machine Learning, 608. ACM.
Ng A. Y., Harada D. & Russell S. J.1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287.
Randløv J. & Alstrom P.1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning, 463–471.
Acknowledgments
This study was partially sponsored by QinetiQ under the EPSRC ICASE project ‘Planning and belief revision in reinforcement learning’.
Note that one step in the plan maps to many low-level states. Therefore, even when provided with the correct knowledge, the agent must still learn how to execute the plan at the low level.
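To make this mapping concrete, the following is a minimal sketch of a plan-based potential function in the spirit of Grześ & Kudenko (2008b): the potential of a low-level state grows with the agent's progress through the plan, and the shaping term follows the potential-based form of Ng, Harada & Russell (1999). The `abstraction` function, the plan representation and the constant `OMEGA` are illustrative assumptions, not the authors' implementation.

```python
# Sketch of plan-based reward shaping: the potential of a low-level state
# is proportional to the plan step its abstract (STRIPS-level) state
# corresponds to. OMEGA is an assumed scaling constant, typically tuned
# per domain.

OMEGA = 100.0

def plan_step(state, abstraction, plan):
    """Return how far through the plan the agent's abstract state is."""
    abstract_state = abstraction(state)  # lift low-level state to plan level
    step = 0
    for i, plan_state in enumerate(plan, start=1):
        if plan_state == abstract_state:
            step = i  # credit the latest matching plan step
    return step

def potential(state, abstraction, plan):
    """Phi(s): potential grows with progress through the plan."""
    return OMEGA * plan_step(state, abstraction, plan)

def shaping_reward(s, s_next, gamma, abstraction, plan):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)
    (Ng, Harada & Russell, 1999), added to the environment reward."""
    return (gamma * potential(s_next, abstraction, plan)
            - potential(s, abstraction, plan))
```

Because many low-level states map to the same plan step, the shaping term is zero within a step and only rewards transitions that advance (or penalizes ones that regress) the plan, which is why the agent must still learn the low-level execution of each step.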
A rule $\phi$, along with its consequences, is retracted from a set of beliefs $K$. To retain logical closure, other rules might need to be retracted as well. The contracted belief base is denoted $K \dot{-} \phi$ (Gärdenfors, 1992).
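For illustration, the following toy sketch contracts a belief base represented as propositional atoms plus Horn rules. The representation and the retraction strategy are assumptions for exposition: it over-approximates by dropping every base atom that could re-derive $\phi$, whereas a real (e.g. AGM-style) contraction operator would retract a minimal set of beliefs.

```python
# Toy contraction of a belief base by an atom phi. Rules are pairs
# (frozenset(body), head). Removing phi alone is not enough if a rule
# can re-derive it, so supporting base atoms are removed too; any
# consequence supported only by phi then drops out of the closure.

def closure(base, rules):
    """Forward-chain the Horn rules to the set of all derivable atoms."""
    derived = set(base)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived

def ancestors(target, rules):
    """Atoms from which `target` is reachable through the rules
    (conjunctive structure ignored, so this over-approximates)."""
    found, frontier = set(), {target}
    while frontier:
        node = frontier.pop()
        for body, head in rules:
            if head == node:
                fresh = body - found
                found |= fresh
                frontier |= fresh
    return found

def contract(base, rules, phi):
    """K -. phi: drop phi and every base atom that could support it."""
    return (set(base) - {phi}) - ancestors(phi, rules)

# Example: with a |- phi and phi |- c, contracting by phi removes its
# support a, and its consequence c disappears from the closure.
base = {"a", "b"}
rules = [(frozenset({"a"}), "phi"), (frozenset({"phi"}), "c")]
contracted = contract(base, rules, "phi")  # {"b"}
assert "phi" not in closure(contracted, rules)
assert "c" not in closure(contracted, rules)
```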