Bikramjit Banerjee and Sneha Racharla
The University of Southern Mississippi, 118 College Drive #5106, Hattiesburg, MS 39406, USA
E-mails: Bikramjit.Banerjee@usm.edu, Sneha.Racharla@usm.edu
The Knowledge Engineering Review, Volume 36 (2021)
RESEARCH ARTICLE   Open Access    

Human–agent transfer from observations

Abstract: Learning from human demonstration (LfD), one of many speedup techniques for reinforcement learning (RL), has seen many successful applications. We consider one LfD technique called human–agent transfer (HAT), where a model of the human demonstrator's decision function is induced via supervised learning and used as an initial bias for RL. Some recent work in LfD has investigated learning from observations only, that is, when only the demonstrator's states (and not its actions) are available to the learner. Since HAT treats the demonstrator's actions as supervised labels, supervised learning becomes untenable in their absence. We adapt the idea of learning an inverse dynamics model from the data acquired by the learner's interactions with the environment and deploy it to fill in the missing actions of the demonstrator. The resulting version of HAT, called state-only HAT (SoHAT), is experimentally shown to preserve some advantages of HAT in benchmark domains with both discrete and continuous actions. This paper also establishes principled modifications of an existing baseline algorithm, A3C, to create the HAT and SoHAT variants used in our experiments.
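
To make the state-only pipeline concrete, below is a minimal sketch of the idea the abstract describes: fit an inverse dynamics model on (state, action, next state) transitions gathered by the learner's own interaction with the environment, use it to infer the demonstrator's missing actions from consecutive observed states, and treat the inferred pairs as supervised labels exactly as HAT treats demonstrated ones. This is an illustrative sketch, not the authors' implementation: the two-layer PyTorch network, the discrete-action assumption, and all names (InverseDynamicsModel, fit_inverse_dynamics, label_demonstration) are hypothetical.

# Illustrative sketch of SoHAT's core step (hypothetical names, not the
# authors' code). Assumes a discrete action space and vector-valued states.
import numpy as np
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """Predicts which action caused the transition (s, s')."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, s, s_next):
        # Concatenate the two states; output unnormalized action logits.
        return self.net(torch.cat([s, s_next], dim=-1))

def fit_inverse_dynamics(model, transitions, epochs=50, lr=1e-3):
    # transitions: list of (s, a, s') tuples collected by the learner's
    # own (e.g., exploratory) interaction with the environment.
    states, actions, next_states = zip(*transitions)
    s = torch.as_tensor(np.array(states), dtype=torch.float32)
    a = torch.as_tensor(np.array(actions), dtype=torch.long)
    s_next = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(s, s_next), a).backward()
        opt.step()
    return model

def label_demonstration(model, demo_states):
    # demo_states: the demonstrator's observed state sequence (no actions).
    # Returns (state, inferred action) pairs usable as supervised labels.
    s = torch.as_tensor(np.array(demo_states[:-1]), dtype=torch.float32)
    s_next = torch.as_tensor(np.array(demo_states[1:]), dtype=torch.float32)
    with torch.no_grad():
        inferred = model(s, s_next).argmax(dim=-1)
    return list(zip(demo_states[:-1], inferred.tolist()))

The pairs produced by label_demonstration can then train a cloned policy by ordinary supervised learning, which serves as the initial bias for the RL phase, just as demonstrated (state, action) pairs do in HAT.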

Cite this article:
Bikramjit Banerjee & Sneha Racharla. 2021. Human–agent transfer from observations. The Knowledge Engineering Review 36(1). doi: 10.1017/S0269888920000387


Acknowledgments: We are grateful to the anonymous reviewers for helpful comments and suggestions. This work was supported in part by National Science Foundation grant IIS-1526813.

© The Author(s), 2020. Published by Cambridge University Press.