Bikramjit Banerjee and Sneha Racharla
The University of Southern Mississippi, 118 College Drive #5106, Hattiesburg, MS 39406, USA
E-mails: Bikramjit.Banerjee@usm.edu, Sneha.Racharla@usm.edu
The Knowledge Engineering Review, Volume 36 (2021)
RESEARCH ARTICLE   Open Access    

Human–agent transfer from observations

Abstract: Learning from human demonstration (LfD), one of many speedup techniques for reinforcement learning (RL), has seen many successful applications. We consider one LfD technique called human–agent transfer (HAT), where a model of the human demonstrator's decision function is induced via supervised learning and used as an initial bias for RL. Some recent work in LfD has investigated learning from observations only, that is, when only the demonstrator's states (and not its actions) are available to the learner. Since HAT treats the demonstrator's actions as supervised labels, supervised learning becomes untenable in their absence. We adapt the idea of learning an inverse dynamics model from the data acquired by the learner's interactions with the environment and deploy it to fill in the missing actions of the demonstrator. The resulting version of HAT, called state-only HAT (SoHAT), is experimentally shown to preserve some advantages of HAT in benchmark domains with both discrete and continuous actions. This paper also establishes principled modifications of an existing baseline algorithm, A3C, to create the HAT and SoHAT variants used in our experiments.
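
To make the state-only pipeline concrete, below is a minimal sketch of the idea the abstract describes: fit an inverse dynamics model on (state, action, next state) transitions gathered by the learner's own interaction with the environment, use it to infer the demonstrator's missing actions from consecutive observed states, and treat the inferred pairs as supervised labels exactly as HAT treats demonstrated ones. This is an illustrative sketch, not the authors' implementation: the two-layer PyTorch network, the discrete-action assumption, and all names (InverseDynamicsModel, fit_inverse_dynamics, label_demonstration) are hypothetical.

# Illustrative sketch of SoHAT's core step (hypothetical names, not the
# authors' code). Assumes a discrete action space and vector-valued states.
import numpy as np
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """Predicts which action caused the transition (s, s')."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, s, s_next):
        # Concatenate the two states; output unnormalized action logits.
        return self.net(torch.cat([s, s_next], dim=-1))

def fit_inverse_dynamics(model, transitions, epochs=50, lr=1e-3):
    # transitions: list of (s, a, s') tuples collected by the learner's
    # own (e.g., exploratory) interaction with the environment.
    states, actions, next_states = zip(*transitions)
    s = torch.as_tensor(np.array(states), dtype=torch.float32)
    a = torch.as_tensor(np.array(actions), dtype=torch.long)
    s_next = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(s, s_next), a).backward()
        opt.step()
    return model

def label_demonstration(model, demo_states):
    # demo_states: the demonstrator's observed state sequence (no actions).
    # Returns (state, inferred action) pairs usable as supervised labels.
    s = torch.as_tensor(np.array(demo_states[:-1]), dtype=torch.float32)
    s_next = torch.as_tensor(np.array(demo_states[1:]), dtype=torch.float32)
    with torch.no_grad():
        inferred = model(s, s_next).argmax(dim=-1)
    return list(zip(demo_states[:-1], inferred.tolist()))

The pairs produced by label_demonstration can then train a cloned policy by ordinary supervised learning, which serves as the initial bias for the RL phase, just as demonstrated (state, action) pairs do in HAT.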

Cite this article:
Bikramjit Banerjee & Sneha Racharla. 2021. Human–agent transfer from observations. The Knowledge Engineering Review 36(1). doi: 10.1017/S0269888920000387


Acknowledgments: We are grateful to the anonymous reviewers for helpful comments and suggestions. This work was supported in part by National Science Foundation grant IIS-1526813.

© The Author(s), 2020. Published by Cambridge University Press.