Argall, B. D., Chernova, S., Veloso, M. & Browning, B. 2009. A survey of robot learning from demonstration. Robotics and Autonomous Systems 57(5), 469–483.

Bojarski, M., Testa, D., et al. 2016. End to end learning for self-driving cars. arXiv preprint.

Chernova, S. & Veloso, M. 2007. Confidence-based policy learning from demonstration using Gaussian mixture models. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 233, 1–8. ACM.

Daftry, S., Bagnell, J. & Hebert, M. 2016. Learning transferable policies for monocular reactive MAV control. In Proceedings of the International Symposium on Experimental Robotics, 3–11.

Da Silva, F. L. & Reali Costa, A. H. 2019. A survey on transfer learning for multiagent reinforcement learning systems. Journal of Artificial Intelligence Research 64, 645–703.

de la Cruz Jr, G. V., Du, Y. & Taylor, M. E. 2017. Pre-training neural networks with human demonstrations for deep reinforcement learning. arXiv preprint.

Fernandez, F., Garcia, J. & Veloso, M. 2010. Probabilistic policy reuse for inter-task transfer learning. Robotics and Autonomous Systems 58(7), 866–871.

Giusti, A., Guzzi, J., et al. 2016. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters 1(2), 661–667.

Ho, J. & Ermon, S. 2016. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems, 4565–4573.

Jain, V., Doshi, P. & Banerjee, B. 2019. Model-free IRL using maximum likelihood estimation. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, HI, 3951–3958.

Judah, K., Fern, A. & Dietterich, T. G. 2012. Active imitation learning via reduction to I.I.D. active learning. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), 428–437.

Karakovskiy, S. & Togelius, J. 2012. The Mario AI benchmark and competitions. IEEE Transactions on Computational Intelligence and AI in Games 4(1), 55–67.

Kingma, D. P. & Ba, J. 2015. Adam: a method for stochastic optimization. In Proceedings of the 3rd International Conference for Learning Representations.

Kolter, J. Z., Abbeel, P. & Ng, A. Y. 2008. Hierarchical apprenticeship learning with application to quadruped locomotion. In Advances in Neural Information Processing Systems (NIPS), 769–776.

Liu, Y., Gupta, A., Abbeel, P. & Levine, S. 2018. Imitation from observation: learning to imitate behaviors from raw video via context translation. In Proceedings of the International Conference on Robotics and Automation (ICRA-18).

Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D. & Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research 48, PMLR, New York, New York, USA, Balcan, M. F. and Weinberger, K. Q. (eds), 1928–1937.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518, 529–533.

Niekum, S., Osentoski, S., Konidaris, G., Chitta, S., Marthi, B. & Barto, A. G. 2015. Learning grounded finite-state representations from unstructured demonstrations. International Journal of Robotics Research 34(2), 131–157.

Ramachandran, D. & Amir, E. 2007. Bayesian inverse reinforcement learning. In Proceedings of the International Joint Conference on Artificial Intelligence, 2586–2591.
|
Ross, S., Gordon, G. & Bagnell, J. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 627–635.
|
Russell, S. 1998. Learning agents for uncertain environments (extended abstract). In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 101–103.
|
Schaal, S. 1997. Learning from demonstration. In Advances in Neural Information Processing Systems (NIPS), 1040–1046.

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489.

Subramanian, K., Isbell Jr, C. L. & Thomaz, A. L. 2016. Exploration from demonstration for interactive reinforcement learning. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 447–456.

Sutton, R. & Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.

Sutton, R. S., McAllester, D., Singh, S. & Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12. MIT Press, 1057–1063.

Tamassia, M., Zambetta, F., Raffe, W., Mueller, F. & Li, X. 2017. Learning options from demonstrations: a Pac-Man case study. IEEE Transactions on Computational Intelligence and AI in Games 10(1), 91–96.

Taylor, M. E. & Stone, P. 2009. Transfer learning for reinforcement learning domains: a survey. Journal of Machine Learning Research 10(1), 1633–1685.

Taylor, M. E., Suay, H. B. & Chernova, S. 2011. Integrating reinforcement learning with human demonstrations of varying ability. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS).

Torabi, F., Warnell, G. & Stone, P. 2018. Behavioral cloning from observation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), 4950–4957.

Uchibe, E. 2018. Model-free deep inverse reinforcement learning by logistic regression. Neural Processing Letters 47(3), 891–905.
|
Vroman, M. C. 2014. Maximum Likelihood Inverse Reinforcement Learning. PhD thesis, Rutgers University.
|
Walsh, T. J., Hewlett, D. K. & Morrison, C. T. 2011. Blending autonomous exploration and apprenticeship learning. In Advances in Neural Information Processing Systems (NIPS), 2258–2266.
|
Wang, Z. & Taylor, M. E. 2017. Improving reinforcement learning with confidence-based demonstrations. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI).
|
Wang, Z. & Taylor, M. E. 2019. Interactive reinforcement learning with dynamic reuse of prior knowledge from human and agent demonstrations. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI-19), 3820–3827.

Williams, R. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3–4), 229–256.

Ziebart, B. D., Maas, A., Bagnell, J. A. & Dey, A. K. 2008. Maximum entropy inverse reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 1433–1438.