RESEARCH ARTICLE   Open Access    

Overcoming incorrect knowledge in plan-based reward shaping

  • Abstract: Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used to guide the agent towards the optimal behaviour. However, if the provided knowledge is wrong, it has been shown that the agent takes longer to learn the optimal policy; in some cases it was previously better to ignore all prior knowledge, even though it was only partially incorrect. This paper introduces a novel use of knowledge revision to overcome incorrect domain knowledge provided to an agent receiving plan-based reward shaping. Empirical results show that an agent using this method can outperform the previous agent receiving plan-based reward shaping without knowledge revision.
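    The shaping scheme the abstract refers to can be sketched in the potential-based form of Ng et al. (1999), with the potential of a state growing with the furthest plan step it satisfies, as in Grześ & Kudenko (2008b). This is a minimal illustrative sketch, not the paper’s implementation; the names `plan_step_of`, `OMEGA` and the predicate-based plan encoding are assumptions.

    ```python
    # Illustrative sketch of plan-based reward shaping in the
    # potential-based form F(s, s') = gamma * phi(s') - phi(s),
    # which preserves the optimal policy (Ng et al., 1999).
    # The potential rises with plan progress, so the agent is
    # rewarded for moving along the STRIPS plan.

    GAMMA = 0.99   # discount factor (illustrative value)
    OMEGA = 10.0   # scaling factor for the potential (illustrative value)

    def plan_step_of(state, plan):
        """Index of the last plan step this low-level state satisfies.
        Many low-level states map to the same step, so the agent must
        still learn how to execute each step at the low level."""
        step = 0
        for i, predicate in enumerate(plan, start=1):
            if predicate(state):
                step = i
        return step

    def potential(state, plan):
        return OMEGA * plan_step_of(state, plan)

    def shaped_reward(r, s, s_next, plan):
        """Environment reward plus the potential-based shaping term."""
        return r + GAMMA * potential(s_next, plan) - potential(s, plan)
    ```

    If the plan encodes incorrect knowledge, the potential rewards progress towards the wrong states, which is the failure mode the paper’s knowledge revision is designed to correct.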
  • Asmuth, J., Littman, M. & Zinkov, R. 2008. Potential-based shaping in model-based reinforcement learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 604–609.
  • Bertsekas, D. P. 2007. Dynamic Programming and Optimal Control (2 Vol Set), 3rd edition. Athena Scientific.
  • Devlin, S., Grześ, M. & Kudenko, D. 2011. An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems.
  • Devlin, S. & Kudenko, D. 2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the Tenth Annual International Conference on Autonomous Agents and Multiagent Systems.
  • Devlin, S. & Kudenko, D. 2012. Dynamic potential-based reward shaping. In Proceedings of the Eleventh Annual International Conference on Autonomous Agents and Multiagent Systems.
  • Efthymiadis, K. & Kudenko, D. 2013. Using plan-based reward shaping to learn strategies in StarCraft: Brood War. In Computational Intelligence and Games (CIG). IEEE.
  • Fikes, R. E. & Nilsson, N. J. 1972. STRIPS: a new approach to the application of theorem proving to problem solving. Artificial Intelligence 2(3), 189–208.
  • Gärdenfors, P. 1992. Belief revision: an introduction. Belief Revision 29, 1–28.
  • Grześ, M. & Kudenko, D. 2008a. Multigrid reinforcement learning with reward shaping. In Artificial Neural Networks – ICANN 2008, 357–366.
  • Grześ, M. & Kudenko, D. 2008b. Plan-based reward shaping for reinforcement learning. In Proceedings of the 4th IEEE International Conference on Intelligent Systems (IS’08), 22–29. IEEE.
  • Marthi, B. 2007. Automatic shaping and decomposition of reward functions. In Proceedings of the 24th International Conference on Machine Learning, 608. ACM.
  • Ng, A. Y., Harada, D. & Russell, S. J. 1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287.
  • Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Inc.
  • Randløv, J. & Alstrom, P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning, 463–471.
  • Sutton, R. S. & Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.

  • Cite this article

    Kyriakos Efthymiadis, Sam Devlin, Daniel Kudenko. 2016. Overcoming incorrect knowledge in plan-based reward shaping. The Knowledge Engineering Review 31(1), 31–43, doi: 10.1017/S026988891500017X



    • This study was partially sponsored by QinetiQ under the EPSRC ICASE project ‘Planning and belief revision in reinforcement learning’.

    • Please note that one step in the plan will map to many low-level states. Therefore, even when provided with the correct knowledge, the agent must learn how to execute this plan at the low level.

    • A rule $\phi$, along with its consequences, is retracted from a set of beliefs $K$. To retain logical closure, other rules might need to be retracted as well. The contracted belief set is denoted $K \dot{-} \phi$ (Gärdenfors, 1992).

    • © Cambridge University Press, 2016
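The contraction operation noted in the footnotes can be illustrated with a small sketch. It assumes, purely for illustration, a belief base of atomic facts plus Horn rules; Gärdenfors (1992) treats general belief sets, and the names `derives` and `contract` are not from the paper.

```python
# Minimal sketch of contraction: remove phi, then retract enough
# supporting facts that phi can no longer be re-derived, so the
# contracted base stays closed without phi. A Horn-clause belief
# base is assumed here for simplicity.

def derives(rules, facts, goal):
    """Forward-chain Horn rules (premises, conclusion) from facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return goal in known

def contract(rules, facts, phi):
    """Retract phi and supporting facts until phi is underivable."""
    facts = set(facts) - {phi}
    while derives(rules, facts, phi):
        # Retract a fact whose removal blocks a derivation of phi
        # (assumes a single retraction suffices at each step).
        support = next(f for f in sorted(facts)
                       if not derives(rules, facts - {f}, phi))
        facts.discard(support)
    return facts
```

Only facts actually supporting a derivation of $\phi$ are retracted; unrelated beliefs survive the contraction, mirroring the footnote’s point that other rules are removed only when closure requires it.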