doi:10.1017/S026988891500020X

Bush R. & Mosteller F. 1955. Stochastic Models for Learning. Wiley.

Castelletti A., Pianosi F. & Restelli M. 2012. Tree-based fitted Q-iteration for multi-objective Markov decision problems. In IJCNN, 1–8. IEEE.

Claus C. & Boutilier C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of National Conference on Artificial Intelligence (AAAI-98), 746–752.

Hilgard E. 1948. Theories of Learning. Appleton-Century-Crofts.

Hilgard E. & Bower B. 1966. Theories of Learning. Prentice Hall.

Howell M. & Best M. 2000. On-line PID tuning for engine idle-speed control using continuous action reinforcement learning automata. Control Engineering Practice 8(2), 147–154.

Howell M., Frost G., Gordon T. & Wu Q. 1997. Continuous action reinforcement learning applied to vehicle suspension control. Mechatronics 7(3), 263–276.

Kapetanakis S., Kudenko D. & Strens M. 2003. Learning to coordinate using commitment sequences in cooperative multiagent-systems. In Proceedings of the Third Symposium on Adaptive Agents and Multiagent Systems (AAMAS-03), 2004.

Parzen E. 1960. Modern Probability Theory and Its Applications, Wiley Classics Edition. Wiley-Interscience.

Rodríguez A., Grau R. & Nowé A. 2011. Continuous action reinforcement learning automata. Performance and convergence. In Proceedings of the Third International Conference on Agents and Artificial Intelligence, Filipe, J. & Fred, A. (eds). SciTePress, 473–478.

Thathachar M. & Sastry P. 2004. Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic Publishers.

Tsetlin M. 1961. The behavior of finite automata in random media. Avtomatika i Telemekhanika 22, 1345–1354.

Tsetlin M. 1962. The behavior of finite automata in random media. Avtomatika i Telemekhanika 22, 1210–1219.

Tsypkin Y. 1971. Adaptation and Learning in Automatic Systems. Academic Press.

Tsypkin Y. 1973. Foundations of the Theory of Learning Systems. Academic Press.

Veelen M. & Spreij P. 2009. Evolution in games with a continuous action space. Economic Theory 39(3), 355–376.

Verbeeck K. 2004. Coordinated Exploration in Multi-Agent Reinforcement Learning. PhD thesis, Vrije Universiteit Brussel, Faculteit Wetenschappen, DINF, Computational Modeling Lab, September.

Vrabie D., Pastravanu O., Abu-Khalaf M. & Lewis F. 2009. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 45(2), 477–484.