ORIGINAL RESEARCH   Open Access

Leveraging human knowledge in tabular reinforcement learning: a study of human subjects

  • Abstract: Reinforcement learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied, in practice, by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named State Action Similarity Solutions (SASS), which is based on the notion of similarities in the agent’s state–action space. Through this human study, consisting of 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical reward shaping technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. However, we further find that our proposed method SASS can be effectively and efficiently combined with reward shaping, and provides a beneficial alternative to using only a single speed-up method, with minimal overhead in human designer effort.
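
To make the two speed-up methods concrete, the sketch below combines tabular Q-learning with potential-based reward shaping (Ng et al., 1999) and SASS-style spreading of each update across similar state–action pairs. This is a minimal illustration only, assuming a discrete environment with the classic Gym step API; the `potential` and `similar_pairs` functions stand in for designer-supplied knowledge and do not reproduce the authors' implementation.

```python
import numpy as np
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def q_learning_episode(env, Q, potential, similar_pairs):
    """Run one episode of tabular Q-learning, updating Q in place.

    potential(s)        -- designer-supplied shaping potential Phi(s)
    similar_pairs(s, a) -- yields ((s2, a2), sigma) pairs, sigma in [0, 1]
    """
    s = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection over the discrete action set.
        if np.random.rand() < EPSILON:
            a = env.action_space.sample()
        else:
            a = max(range(env.action_space.n), key=lambda b: Q[(s, b)])

        s_next, r, done, _ = env.step(a)

        # Potential-based reward shaping (Ng et al., 1999):
        # F(s, s') = gamma * Phi(s') - Phi(s) preserves the optimal policy.
        r_shaped = r + GAMMA * potential(s_next) - potential(s)

        best_next = 0.0 if done else max(
            Q[(s_next, b)] for b in range(env.action_space.n))
        delta = r_shaped + GAMMA * best_next - Q[(s, a)]

        # Standard temporal-difference update ...
        Q[(s, a)] += ALPHA * delta
        # ... plus SASS-style spreading: similar state-action pairs
        # receive the same update, scaled by their similarity sigma.
        for (s2, a2), sigma in similar_pairs(s, a):
            Q[(s2, a2)] += ALPHA * sigma * delta

        s = s_next

Q = defaultdict(float)  # tabular Q-function, keyed by (state, action)
```

With `potential` returning 0 and `similar_pairs` yielding nothing, the sketch reduces to plain Q-learning, so each speed-up method can be enabled or ablated independently.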
References (43)

Albus J. S. 1981. Brains, Behavior and Robotics. McGraw-Hill Inc.
Benda M. 1985. On Optimal Cooperation of Knowledge Sources. Technical report BCS-G2010-28.
Bianchi R. A., Martins M. F., Ribeiro C. H. & Costa A. H. 2014. Heuristically-accelerated multiagent reinforcement learning. IEEE Transactions on Cybernetics 44(2), 252–265.
Brockman G., Cheung V., Pettersson L., Schneider J., Schulman J., Tang J. & Zaremba W. 2016. OpenAI Gym. https://gym.openai.com (accessed 24 October 2017).
Bruner J. S. 1957. Going beyond the information given. Contemporary Approaches to Cognition 1(1), 119–160.
Brys T., Harutyunyan A., Suay H. B., Chernova S., Taylor M. E. & Nowé A. 2015. Reinforcement learning from demonstration through shaping. In IJCAI, 3352–3358.
Brys T., Nowé A., Kudenko D. & Taylor M. E. 2014. Combining multiple correlated reward and shaping signals by measuring confidence. In AAAI, 1687–1693.
Busoniu L., Babuska R., De Schutter B. & Ernst D. 2010. Reinforcement Learning and Dynamic Programming Using Function Approximators, 39. CRC Press.
Devlin S., Grześ M. & Kudenko D. 2011. Multi-agent, reward shaping for RoboCup KeepAway. In The 10th International Conference on Autonomous Agents and Multiagent Systems – Volume 3, 1227–1228. International Foundation for Autonomous Agents and Multiagent Systems.
Geramifard A., Klein R. H., Dann C., Dabney W. & How J. P. 2013. RLPy: The Reinforcement Learning Library for Education and Research. http://acl.mit.edu/RLPy.
Girgin S., Polat F. & Alhajj R. 2007. Positive impact of state similarity on reinforcement learning performance. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 37(5), 1256–1270.
Hart S. G. & Staveland L. E. 1988. Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. Advances in Psychology 52, 139–183.
Hester T. & Stone P. 2013. TEXPLORE: real-time sample-efficient reinforcement learning for robots. Machine Learning 90(3), 385–429.
Jong N. K. & Stone P. 2007. Model-based function approximation in reinforcement learning. In AAMAS, 95. ACM.
Karakovskiy S. & Togelius J. 2012. The Mario AI benchmark and competitions. IEEE Transactions on Computational Intelligence and AI in Games 4(1), 55–67.
Kelly G. 1955. Personal Construct Psychology. Norton.
Knox W. B. & Stone P. 2010. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In Proceedings of AAMAS.
Leffler B. R., Littman M. L. & Edmunds T. 2007. Efficient reinforcement learning with relocatable action models. In AAAI, 572–577.
Littman M. L. 1994. Markov games as a framework for multi-agent reinforcement learning. In ICML, 157–163.
Martins M. F. & Bianchi R. A. 2013. Heuristically-accelerated reinforcement learning: a comparative analysis of performance. In Conference Towards Autonomous Robotic Systems, 15–27. Springer.
Mataric M. J. 1994. Reward functions for accelerated learning. In Machine Learning: Proceedings of the Eleventh International Conference, 181–189.
Mnih V., Kavukcuoglu K., Silver D., Rusu A. A., Veness J., Bellemare M. G., Graves A., Riedmiller M., Fidjeland A. K., Ostrovski G., Petersen S., Beattie C., Sadik A., Antonoglou I., King H., Kumaran D., Wierstra D., Legg S. & Hassabis D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540), 529–533.
Narayanamurthy S. M. & Ravindran B. 2008. On the hardness of finding symmetries in Markov decision processes. In ICML, 688–695.
Ng A. Y., Harada D. & Russell S. 1999. Policy invariance under reward transformations: theory and application to reward shaping. In ICML, 278–287.
Peng B., MacGlashan J., Loftin R., Littman M. L., Roberts D. L. & Taylor M. E. 2016. A need for speed: adapting agent action speed to improve task learning from non-expert humans. In AAMAS, 957–965.
Randløv J. & Alstrøm P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In ICML, 463–471.
Ribeiro C. H. 1995. Attentional mechanisms as a strategy for generalisation in the Q-learning algorithm. In Proceedings of ICANN, 455–460.
Ribeiro C. & Szepesvári C. 1996. Q-learning combined with spreading: convergence and results. In Proceedings of the ISRF-IEE International Conference on Intelligent and Cognitive Systems (Neural Networks Symposium), 32–36.
Rosenfeld A. & Kraus S. 2018. Predicting human decision-making: from prediction to action. Synthesis Lectures on Artificial Intelligence and Machine Learning 12(1), 1–150.
Rosenfeld A., Taylor M. E. & Kraus S. 2017a. Leveraging human knowledge in tabular reinforcement learning: a study of human subjects. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19–25, 2017, 3823–3830.
Rosenfeld A., Taylor M. E. & Kraus S. 2017b. Speeding up tabular reinforcement learning using state–action similarities. In AAMAS, 1722–1724.
Schaul T., Bayer J., Wierstra D., Sun Y., Felder M., Sehnke F., Rückstieß T. & Schmidhuber J. 2010. PyBrain. Journal of Machine Learning Research 11, 743–746.
Sequeira P., Melo F. S. & Paiva A. 2013. An associative state-space metric for learning in factored MDPs. In Portuguese Conference on Artificial Intelligence, 163–174. Springer.
Skinner B. F. 1958. Reinforcement today. American Psychologist 13(3), 94.
Stone P., Kuhlmann G., Taylor M. E. & Liu Y. 2006. Keepaway soccer: from machine learning testbed to benchmark. In RoboCup-2005: Robot Soccer World Cup IX, Noda I., Jacoff A., Bredenfeld A. & Takahashi Y. (eds), 4020, 93–105. Springer Verlag.
Suay H. B., Brys T., Taylor M. E. & Chernova S. 2016. Learning from demonstration for shaping through inverse reinforcement learning. In AAMAS, 429–437.
Sutton R. S. & Barto A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.
Szepesvári C. & Littman M. L. 1999. A unified analysis of value-function-based reinforcement-learning algorithms. Neural Computation 11(8), 2017–2060.
Tamassia M., Zambetta F., Raffe W., Mueller F. & Li X. 2016. Dynamic choice of state abstraction in Q-learning. In ECAI.
Tanner B. & White A. 2009. RL-Glue: language-independent software for reinforcement-learning experiments. Journal of Machine Learning Research 10, 2133–2136.
Watkins C. J. C. H. 1989. Learning from Delayed Rewards. PhD thesis, University of Cambridge.
Witten I. H., Frank E., Hall M. A. & Pal C. J. 2016. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
Zinkevich M. & Balch T. 2001. Symmetry in Markov decision processes and its implications for single agent and multi agent learning. In ICML.
  • Cite this article

    Ariel Rosenfeld, Moshe Cohen, Matthew E. Taylor, Sarit Kraus. 2018. Leveraging human knowledge in tabular reinforcement learning: a study of human subjects. The Knowledge Engineering Review 33(1). doi: 10.1017/S0269888918000206

Notes

    • This article extends our previous reports from AAMAS 2017 (Rosenfeld et al., 2017b, short paper) and IJCAI 2017 (Rosenfeld et al., 2017a, full paper) in several major respects. In the former, the SASS approach was presented and tested by three experts, as described in Section 4.4. In Rosenfeld et al. (2017a), the study was extended to include an additional 16 non-expert designers who implemented the $QS$-learning and $QA$-learning conditions discussed in Experiment 1 (Section 4.2). In this article, we almost triple our participant pool by recruiting 32 additional participants and perform an additional experiment (Experiment 2, Section 4.3). This addition allows us to investigate the RS condition, which was not examined in previous reports, and to provide a broader and more in-depth investigation of human designers; it also strengthens the credibility and validity of our previously reported results and yields new insights not previously observed. An extended version of Rosenfeld et al. (2017b), entitled ‘Speeding up Tabular Reinforcement Learning Using State–Action Similarities’, was presented at the Fifteenth Adaptive Learning Agents (ALA) workshop at AAMAS 2017, where it received the Best Paper Award. This research was funded in part by MAFAT. It also took place at the Intelligent Robot Learning (IRL) Lab, which is supported in part by NASA NNX16CD07C, NSF IIS-1734558, and USDA 2014-67021-22174.

    • All experiments were approved by the corresponding institutional review board.

    • The opponent was given a hand-coded policy, similar to the one used in the original paper: while it has the ball, it avoids colliding with the other player and attempts to score a goal; while defending, it chases its opponent and tries to steal the ball. A minimal sketch of this logic is given below.
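
      The following is a self-contained sketch of such a hand-coded policy on a grid; the function names, coordinate representation, and collision-avoidance tie-breaking are illustrative assumptions, not the authors' actual implementation.

```python
def step_toward(src, dst):
    """One-step grid move (dx, dy) from src toward dst."""
    sx, sy = src
    dx, dy = dst
    return ((dx > sx) - (dx < sx), (dy > sy) - (dy < sy))

def apply_move(pos, move):
    return (pos[0] + move[0], pos[1] + move[1])

def opponent_policy(opp_pos, player_pos, goal_pos, opp_has_ball):
    """Hand-coded opponent: attack the goal while avoiding collisions
    when holding the ball, otherwise chase the ball carrier."""
    if opp_has_ball:
        # Attack: head toward the goal ...
        move = step_toward(opp_pos, goal_pos)
        if apply_move(opp_pos, move) == player_pos:
            # ... but avoid colliding with the defender: try the
            # axis-aligned alternatives and keep whichever is free.
            for alt in [(move[0], 0), (0, move[1]), (0, 0)]:
                if apply_move(opp_pos, alt) != player_pos:
                    return alt
        return move
    # Defend: chase the ball carrier and try to steal the ball.
    return step_toward(opp_pos, player_pos)
```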

    • © Cambridge University Press, 2018