Bush R. & Mosteller F. 1955. Stochastic Models for Learning. Wiley.

Castelletti A., Pianosi F. & Restelli M. 2012. Tree-based fitted Q-iteration for multi-objective Markov decision problems. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE.

Claus C. & Boutilier C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artificial Intelligence (AAAI-98), 746–752.

Hilgard E. 1948. Theories of Learning. Appleton-Century-Crofts.

Hilgard E. & Bower G. 1966. Theories of Learning. Prentice Hall.

Howell M. & Best M. 2000. On-line PID tuning for engine idle-speed control using continuous action reinforcement learning automata. Control Engineering Practice 8(2), 147–154.

Howell M., Frost G., Gordon T. & Wu Q. 1997. Continuous action reinforcement learning applied to vehicle suspension control. Mechatronics 7(3), 263–276.

Kapetanakis S., Kudenko D. & Strens M. 2003. Learning to coordinate using commitment sequences in cooperative multiagent-systems. In Proceedings of the Third Symposium on Adaptive Agents and Multiagent Systems (AAMAS-03), 2004.

Parzen E. 1960. Modern Probability Theory and Its Applications, Wiley Classics Edition. Wiley-Interscience.

Rodríguez A., Grau R. & Nowé A. 2011. Continuous action reinforcement learning automata. Performance and convergence. In Proceedings of the Third International Conference on Agents and Artificial Intelligence, Filipe J. & Fred A. (eds). SciTePress, 473–478.

Thathachar M. & Sastry P. 2004. Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic Publishers.

Tsetlin M. 1961. The behavior of finite automata in random media. Avtomatika i Telemekhanika 22, 1345–1354.

Tsetlin M. 1962. The behavior of finite automata in random media. Avtomatika i Telemekhanika 22, 1210–1219.

Tsypkin Y. 1971. Adaptation and Learning in Automatic Systems. Academic Press.

Tsypkin Y. 1973. Foundations of the Theory of Learning Systems. Academic Press.

Veelen M. & Spreij P. 2009. Evolution in games with a continuous action space. Economic Theory 39(3), 355–376.

Verbeeck K. 2004. Coordinated Exploration in Multi-Agent Reinforcement Learning. PhD thesis, Vrije Universiteit Brussel, Faculteit Wetenschappen, DINF, Computational Modeling Lab, September.

Vrabie D., Pastravanu O., Abu-Khalaf M. & Lewis F. 2009. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 45(2), 477–484.