|
Albus J. S. 1981. Brains, Behavior and Robotics. McGraw-Hill Inc.
Google Scholar
|
|
Benda M. 1985. On Optimal Cooperation of Knowledge Sources. Technical report BCS-G2010-28.
Google Scholar
|
|
Bianchi R. A., Martins M. F., Ribeiro C. H. & Costa A. H. 2014. ‘Heuristically-accelerated multiagent reinforcement learning’. IEEE Transactions on Cybernetics 44(2), 252–265.
Google Scholar
|
|
Brockman G., Cheung V., Pettersson L., Schneider J., Schulman J., Tang J. & Zaremba W. 2016. Openai gym. https://gym.openai.com (accessed 24 October 2017).
Google Scholar
|
|
Bruner J. S. 1957. Going beyond the information given. Contemporary Approaches to Cognition 1(1), 119–160.
Google Scholar
|
|
Brys T., Harutyunyan A., Suay H. B., Chernova S., Taylor M. E. & Nowé A. 2015. Reinforcement learning from demonstration through shaping. In IJCAI, 3352–3358.
Google Scholar
|
|
Brys T., Nowé A., Kudenko D. & Taylor M. E. 2014. Combining multiple correlated reward and shaping signals by measuring confidence. In AAAI, 1687–1693.
Google Scholar
|
|
Busoniu L., Babuska R., De Schutter B. & Ernst D. 2010. Reinforcement Learning and Dynamic Programming Using Function Approximators, 39. CRC Press.
Google Scholar
|
|
Devlin S., Grze´s M. & Kudenko D. 2011. Multi-agent, reward shaping for robocup keepaway. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 3, 1227–1228. International Foundation for Autonomous Agents and Multiagent Systems.
Google Scholar
|
|
Geramifard A., Klein R. H., Dann C., Dabney W. & How J. P. 2013. RLPy: The Reinforcement Learning Library for Education and Research. http://acl.mit.edu/RLPy.
Google Scholar
|
|
Girgin S., Polat F. & Alhajj R. 2007. Positive impact of state similarity on reinforcement learning performance. IEEE Transactions on Cybernetics 37(5), 1256–1270.
Google Scholar
|
|
Hart S. G. & Staveland L. E. 1988. Development of NASA-TLX (task load index): results of empirical and theoretical research. Advances in Psychology 52, 139–183.
Google Scholar
|
|
Hester T. & Stone P. 2013. Texplore: real-time sample-efficient reinforcement learning for robots. Machine Learning 90(3), 385–429.
Google Scholar
|
|
Jong N. K. & Stone P. 2007. Model-based function approximation in reinforcement learning. In AAMAS, 95. ACM.
Google Scholar
|
|
Karakovskiy S. & Togelius J. 2012. The Mario AI benchmark and competitions. IEEE Transactions on Computational Intelligence and AI in Games 4(1), 55–67.
Google Scholar
|
|
Kelly G. 1955. Personal Construct Psychology. Norton.
Google Scholar
|
|
Knox W. B. & Stone P. 2010. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In Proceedings of AAMAS.
Google Scholar
|
|
Leffler B. R., Littman M. L. & Edmunds T. 2007. Efficient reinforcement learning with relocatable action models. AAAI 7, 572–577.
Google Scholar
|
|
Littman M. L. 1994. Markov games as a framework for multi-agent reinforcement learning. ICML 157, 157–163.
Google Scholar
|
|
Martins M. F. & Bianchi R. A. 2013. Heuristically-accelerated reinforcement learning: a comparative analysis of performance. In Conference Towards Autonomous Robotic Systems, 15–27. Springer.
Google Scholar
|
|
Mataric M. J. 1994. Reward functions for accelerated learning. In Machine Learning: Proceedings of the Eleventh International Conference, 181–189.
Google Scholar
|
|
Mnih V., Kavukcuoglu K., Silver D., Rusu A. A., Veness J., Bellemare M. G., Graves A., Riedmiller M., Fidjeland A. K., Ostrovski G., Petersen S., Beattie C., Sadik A., Antonoglou I., King H., Kumaran D., Wierstra D., Legg S. & Hassabis D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540), 529–533.
Google Scholar
|
|
Narayanamurthy S. M. & Ravindran B. 2008. On the hardness of finding symmetries in Markov decision processes. In ICML, 688–695.
Google Scholar
|
|
Ng A. Y., Harada D. & Russell S. 1999. Policy invariance under reward transformations: theory and application to reward shaping. ICML. 99, 278–287.
Google Scholar
|
|
Peng B., MacGlashan J., Loftin R., Littman M. L., Roberts D. L. & Taylor M. E. 2016. A need for speed: adapting agent action speed to improve task learning from non-expert humans. In AAMAS, 957–965.
Google Scholar
|
|
Randløv J. & Alstrøm P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. ICML 98, 463–471.
Google Scholar
|
|
Ribeiro C. H. 1995. Attentional mechanisms as a strategy for generalisation in the q-learning algorithm. Proceedings of ICANN 95, 455–460.
Google Scholar
|
|
Ribeiro C. & Szepesv´ari C. 1996. Q-learning combined with spreading: convergence and results. In Proceedings of the ISRF-IEE International Conference on Intelligent and Cognitive Systems (Neural Networks Symposium), 32–36.
Google Scholar
|
|
Rosenfeld A. & Kraus S. 2018. Predicting human decision-making: from prediction to action. Synthesis Lectures on Artificial Intelligence and Machine Learning 12(1), 1–150.
Google Scholar
|
|
Rosenfeld A., Taylor M. E. & Kraus S. 2017a. Leveraging human knowledge in tabular reinforcement learning: a study of human subjects. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19–25, 2017, 3823–3830.
Google Scholar
|
|
Rosenfeld A., Taylor M. E. & Kraus S. 2017b. Speeding up tabular reinforcement learning using stateaction similarities. In AAMAS, 1722–1724.
Google Scholar
|
|
Schaul T., Bayer J., Wierstra D., Sun Y., Felder M., Sehnke F., Rückstieß T & Schmidhuber J. 2010. PyBrain, Journal of Machine Learning Research 11, 743–746.
Google Scholar
|
|
Sequeira P., Melo F. S. & Paiva A. 2013. An associative state-space metric for learning in factored mdps. In Portuguese Conference on Artificial Intelligence, 163–174. Springer.
Google Scholar
|
|
Skinner B. F. 1958. Reinforcement today. American Psychologist 13(3), 94.
Google Scholar
|
|
Stone P., Kuhlmann G., Taylor M. E. & Liu Y. 2006. Keepaway soccer: from machine learning testbed to benchmark. In RoboCup-2005: Robot Soccer World Cup IX, I. Noda, A. Jacoff, A. Bredenfeld & Y. Takahashi (eds). Springer Verlag 4020, 93–105.
Google Scholar
|
|
Suay H. B., Brys T., Taylor M. E. & Chernova S. 2016. Learning from demonstration for shaping through inverse reinforcement learning. In AAMAS, 429–437.
Google Scholar
|
|
Sutton R. S. & Barto A. G. 1998. Reinforcement Learning: An Introduction. MIT press.
Google Scholar
|
|
Szepesvári C. & Littman M. L. 1999. ‘A unified analysis of value-function-based reinforcementlearning algorithms’. Neural Computation 11(8), 2017–2060.
Google Scholar
|
|
Tamassia M., Zambetta F., Raffe W., Mueller F. & Li X. 2016. Dynamic choice of state abstraction in q-learning. In ECAI.
Google Scholar
|
|
Tanner B. & White A. 2009. RL-Glue : language-independent software for reinforcement-learning experiments. Journal of Machine Learning Research 10, 2133–2136.
Google Scholar
|
|
Watkins C. J. C. H. 1989. Learning from Delayed Rewards. PhD thesis, University of Cambridge.
Google Scholar
|
|
Witten I. H., Frank E., Hall M. A. & Pal C. J. 2016. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
Google Scholar
|
|
Zinkevich M. & Balch T. 2001. Symmetry in Markov decision processes and its implications for single agent and multi agent learning. In ICML.
Google Scholar
|