Abstract: Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used to guide the agent towards the optimal behaviour. However, if the provided knowledge is wrong, it has been shown that the agent will take longer to learn the optimal policy. In previous work, it was sometimes better to ignore all prior knowledge, even though only part of it was incorrect. This paper introduces a novel use of knowledge revision to overcome incorrect domain knowledge provided to an agent receiving plan-based reward shaping. Empirical results show that an agent using this method can outperform an agent receiving plan-based reward shaping without knowledge revision.
Asmuth J., Littman M. & Zinkov R.2008. Potential-based shaping in model-based reinforcement learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 604–609.
Devlin S., Grześ M. & Kudenko D.2011. An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems.
Devlin S. & Kudenko D.2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of The Tenth Annual International Conference on Autonomous Agents and Multiagent Systems.
Devlin S. & Kudenko D.2012. Dynamic potential-based reward shaping. In Proceedings of The Eleventh Annual International Conference on Autonomous Agents and Multiagent Systems.
Efthymiadis K. & Kudenko D.2013. Using plan-based reward shaping to learn strategies in StarCraft: Brood War. In Computational Intelligence and Games (CIG). IEEE.
Fikes R. E. & Nilsson N. J.1972. STRIPS: a new approach to the application of theorem proving to problem solving. Artificial Intelligence2(3), 189–208.
Grześ M. & Kudenko D.2008b. Plan-based reward shaping for reinforcement learning. In Proceedings of the 4th IEEE International Conference on Intelligent Systems (IS’08), 22–29. IEEE.
Marthi B.2007. Automatic shaping and decomposition of reward functions. In Proceedings of the 24th International Conference on Machine Learning, 608. ACM.
Ng A. Y., Harada D. & Russell S. J.1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287.
Randløv J. & Alstrom P.1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning, 463–471.
Acknowledgments
This study was partially sponsored by QinetiQ under the EPSRC ICASE project ‘Planning and belief revision in reinforcement learning’.
Note that one step in the plan maps to many low-level states. Therefore, even when provided with the correct knowledge, the agent must still learn how to execute the plan at the low level.
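To make this mapping concrete, the following is a minimal sketch of a plan-based potential function in the spirit of Grześ & Kudenko (2008b): the potential of a low-level state grows with the agent's progress through the plan, and the shaping term follows the potential-based form of Ng, Harada & Russell (1999). The `abstraction` function, the plan representation and the constant `OMEGA` are illustrative assumptions, not the authors' implementation.

```python
# Sketch of plan-based reward shaping: the potential of a low-level state
# is proportional to the plan step its abstract (STRIPS-level) state
# corresponds to. OMEGA is an assumed scaling constant, typically tuned
# per domain.

OMEGA = 100.0

def plan_step(state, abstraction, plan):
    """Return how far through the plan the agent's abstract state is."""
    abstract_state = abstraction(state)  # lift low-level state to plan level
    step = 0
    for i, plan_state in enumerate(plan, start=1):
        if plan_state == abstract_state:
            step = i  # credit the latest matching plan step
    return step

def potential(state, abstraction, plan):
    """Phi(s): potential grows with progress through the plan."""
    return OMEGA * plan_step(state, abstraction, plan)

def shaping_reward(s, s_next, gamma, abstraction, plan):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)
    (Ng, Harada & Russell, 1999), added to the environment reward."""
    return (gamma * potential(s_next, abstraction, plan)
            - potential(s, abstraction, plan))
```

Because many low-level states map to the same plan step, the shaping term is zero within a step and only rewards transitions that advance (or penalizes ones that regress) the plan, which is why the agent must still learn the low-level execution of each step.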
A rule $\phi$, along with its consequences, is retracted from a set of beliefs $K$. To retain logical closure, other rules might need to be retracted as well. The contracted belief base is denoted $K \dot{-} \phi$ (Gärdenfors, 1992).
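For illustration, the following toy sketch contracts a belief base represented as propositional atoms plus Horn rules. The representation and the retraction strategy are assumptions for exposition: it over-approximates by dropping every base atom that could re-derive $\phi$, whereas a real (e.g. AGM-style) contraction operator would retract a minimal set of beliefs.

```python
# Toy contraction of a belief base by an atom phi. Rules are pairs
# (frozenset(body), head). Removing phi alone is not enough if a rule
# can re-derive it, so supporting base atoms are removed too; any
# consequence supported only by phi then drops out of the closure.

def closure(base, rules):
    """Forward-chain the Horn rules to the set of all derivable atoms."""
    derived = set(base)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived

def ancestors(target, rules):
    """Atoms from which `target` is reachable through the rules
    (conjunctive structure ignored, so this over-approximates)."""
    found, frontier = set(), {target}
    while frontier:
        node = frontier.pop()
        for body, head in rules:
            if head == node:
                fresh = body - found
                found |= fresh
                frontier |= fresh
    return found

def contract(base, rules, phi):
    """K -. phi: drop phi and every base atom that could support it."""
    return (set(base) - {phi}) - ancestors(phi, rules)

# Example: with a |- phi and phi |- c, contracting by phi removes its
# support a, and its consequence c disappears from the closure.
base = {"a", "b"}
rules = [(frozenset({"a"}), "phi"), (frozenset({"phi"}), "c")]
contracted = contract(base, rules, "phi")  # {"b"}
assert "phi" not in closure(contracted, rules)
assert "c" not in closure(contracted, rules)
```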