
Amodei, D., Olah, C., Steinhardt, J., Christiano, P. F., Schulman, J. & Mané, D. 2016. Concrete problems in AI safety. CoRR.

Bacon, P.-L., Harb, J. & Precup, D. 2017. The option-critic architecture. In AAAI, 1726–1734.

Barreto, A., Borsa, D., Hou, S., Comanici, G., Aygün, E., Hamel, P., Toyama, D., Mourad, S., Silver, D., Precup, D., et al. 2019. The option keyboard: combining skills in reinforcement learning. In Advances in Neural Information Processing Systems, 13052–13062.

Barto, A. G. & Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13(4), 341–379.

Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. 2013. The arcade learning environment: an evaluation platform for general agents. Journal of Artificial Intelligence Research 47, 253–279.

Borkar, V. S. & Meyn, S. P. 2002. Risk-sensitive optimal control for Markov decision processes with monotone cost. Mathematics of Operations Research 27(1), 192–209.

Daniel, C., Van Hoof, H., Peters, J. & Neumann, G. 2016. Probabilistic inference for determining options in reinforcement learning. Machine Learning 104(2–3), 337–357.

Dietterich, T. G. 2000. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13, 227–303.

Fikes, R. E., Hart, P. E. & Nilsson, N. J. 1972. Learning and executing generalized robot plans. Artificial Intelligence 3, 251–288.

Fikes, R. E., Hart, P. E. & Nilsson, N. J. 1981. Learning and executing generalized robot plans. In Readings in Artificial Intelligence. Elsevier, 231–249.

Future of Life Institute 2017. Asilomar AI principles.

García, J. & Fernández, F. 2015. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16(1), 1437–1480.

Gehring, C. & Precup, D. 2013. Smart exploration in reinforcement learning using absolute temporal difference errors. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS 2013, 1037–1044.

Geibel, P. & Wysotzki, F. 2005. Risk-sensitive reinforcement learning applied to control under constraints. Journal of Artificial Intelligence Research 24, 81–108.

Harb, J., Bacon, P.-L., Klissarov, M. & Precup, D. 2018. When waiting is not an option: learning options with a deliberation cost. In AAAI.

Heger, M. 1994. Consideration of risk in reinforcement learning. In Machine Learning Proceedings 1994. Elsevier, 105–111.

Howard, R. A. & Matheson, J. E. 1972. Risk-sensitive Markov decision processes. Management Science 18(7), 356–369.

Iba, G. A. 1989. A heuristic approach to the discovery of macro-operators. Machine Learning 3(4), 285–317.

Iyengar, G. N. 2005. Robust dynamic programming. Mathematics of Operations Research 30(2), 257–280.

Jain, A., Patil, G., Jain, A., Khetarpal, K. & Precup, D. 2021. Variance penalized on-policy and off-policy actor-critic. arXiv preprint arXiv:2102.01985.

Jain, A. & Precup, D. 2018. Eligibility traces for options. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 1008–1016.

Khetarpal, K., Klissarov, M., Chevalier-Boisvert, M., Bacon, P.-L. & Precup, D. 2020. Options of interest: temporal abstraction with interest functions. In Proceedings of the AAAI Conference on Artificial Intelligence, 34, 4444–4451.

Konidaris, G. & Barto, A. G. 2007. Building portable options: skill transfer in reinforcement learning. In IJCAI, 7, 895–900.

Konidaris, G., Kuindersma, S., Grupen, R. A. & Barto, A. G. 2011. Autonomous skill acquisition on a mobile manipulator. In AAAI.

Korf, R. E. 1983. Learning to Solve Problems by Searching for Macro-operators. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA. AAI8425820.

Kulkarni, T. D., Narasimhan, K., Saeedi, A. & Tenenbaum, J. 2016. Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. In Advances in Neural Information Processing Systems, 3675–3683.

Law, E. L., Coggan, M., Precup, D. & Ratitch, B. 2005. Risk-directed exploration in reinforcement learning. In Planning and Learning in A Priori Unknown or Dynamic Domains, 97.

Lim, S. H., Xu, H. & Mannor, S. 2013. Reinforcement learning in robust Markov decision processes. Advances in Neural Information Processing Systems 26, 701–709.

Machado, M. C., Bellemare, M. G., Talvitie, E., Veness, J., Hausknecht, M. & Bowling, M. 2017. Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. arXiv e-prints.

Mankowitz, D. J., Mann, T. A. & Mannor, S. 2016. Adaptive skills adaptive partitions (ASAP). In Advances in Neural Information Processing Systems, 1588–1596.

McGovern, A. & Barto, A. G. 2001. Automatic discovery of subgoals in reinforcement learning using diverse density. In ICML, 1, 361–368.

Menache, I., Mannor, S. & Shimkin, N. 2002. Q-cut: dynamic discovery of sub-goals in reinforcement learning. In European Conference on Machine Learning. Springer, 295–306.

Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D. & Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, 1928–1937.

Nair, A., Srinivasan, P., Blackwell, S., Alcicek, C., Fearon, R., Maria, A. D., Panneershelvam, V., Suleyman, M., Beattie, C., Petersen, S., Legg, S., Mnih, V., Kavukcuoglu, K. & Silver, D. 2015. Massively parallel methods for deep reinforcement learning. CoRR.

Nilim, A. & El Ghaoui, L. 2005. Robust control of Markov decision processes with uncertain transition matrices. Operations Research 53(5), 780–798.

Parr, R. & Russell, S. J. 1998. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems, 1043–1049.

Precup, D. 2000. Temporal Abstraction in Reinforcement Learning. PhD thesis, University of Massachusetts Amherst.

Riemer, M., Liu, M. & Tesauro, G. 2018. Learning abstract options. In Advances in Neural Information Processing Systems, 10424–10434.

Sherstan, C., Ashley, D. R., Bennett, B., Young, K., White, A., White, M. & Sutton, R. S. 2018. Comparing direct and indirect temporal-difference methods for estimating the variance of the return. In Proceedings of Uncertainty in Artificial Intelligence, 63–72.

Stolle, M. & Precup, D. 2002. Learning options in reinforcement learning. In International Symposium on Abstraction, Reformulation & Approximation. Springer, 212–223.

Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3(1), 9–44.

Sutton, R. S. & Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA, 1st edition.

Sutton, R. S., McAllester, D. A., Singh, S. P. & Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 1057–1063.

Sutton, R. S., Precup, D. & Singh, S. 1999. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1–2), 181–211.

Tamar, A., Di Castro, D. & Mannor, S. 2012. Policy gradients with variance related risk criteria. In Proceedings of the Twenty-Ninth International Conference on Machine Learning, 387–396.

Tamar, A., Di Castro, D. & Mannor, S. 2016. Learning the variance of the reward-to-go. Journal of Machine Learning Research 17(13), 1–36.

Tamar, A., Xu, H. & Mannor, S. 2013. Scaling up robust MDPs by reinforcement learning. arXiv preprint arXiv:1306.6189.

Van Hasselt, H., Guez, A. & Silver, D. 2016. Deep reinforcement learning with double Q-learning. In AAAI, 16, 2094–2100.

Vezhnevets, A., Mnih, V., Osindero, S., Graves, A., Vinyals, O., Agapiou, J., et al. 2016. Strategic attentive writer for learning macro-actions. In Advances in Neural Information Processing Systems, 3486–3494.

Wang, Z., de Freitas, N. & Lanctot, M. 2015. Dueling network architectures for deep reinforcement learning. CoRR.

White, D. 1994. A mathematical programming approach to a problem in variance penalised Markov decision processes. Operations-Research-Spektrum 15(4), 225–230.