2016 Volume 31
RESEARCH ARTICLE   Open Access    

A reinforcement learning approach to coordinate exploration with limited communication in continuous action games

Abstract: Learning automata are reinforcement learners belonging to the class of policy iterators. They have already been shown to exhibit nice convergence properties in a wide range of discrete action game settings. Recently, a new formulation for continuous action reinforcement learning automata (CARLA) was proposed. In this paper, we study the behavior of these CARLA in continuous action games and propose a novel method for coordinated exploration of the joint-action space. Our method allows a team of independent learners, using CARLA, to find the optimal joint action in common interest settings. We first show that independent agents using CARLA will converge to a local optimum of the continuous action game. We then introduce a method for coordinated exploration which allows the team of agents to find the global optimum of the game. We validate our approach in a number of experiments.
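The single-agent learning scheme underlying the abstract can be sketched as follows: a CARLA-style learner maintains a probability density over a continuous action interval, samples an action from it, and reinforces a neighbourhood of that action in proportion to the received reward. This is an illustrative simplification on a discretized grid, not the paper's exact update rule; the function name `carla_run` and the parameters `spread` and `rate` are assumptions for the sketch.

```python
import numpy as np

def carla_run(reward_fn, n_steps=2000, n_bins=200, spread=0.02, rate=0.1, seed=0):
    """Simplified CARLA-style loop on the normalized action interval [0, 1].

    Maintains a discretized action density; after each pull, a Gaussian
    bump centred on the chosen action is added, scaled by the reward,
    and the density is renormalised.  Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_bins)
    density = np.ones(n_bins) / n_bins            # start from a uniform density

    for _ in range(n_steps):
        # sample an action via the inverse CDF of the current density
        cdf = np.cumsum(density)
        idx = min(np.searchsorted(cdf, rng.random()), n_bins - 1)
        action = grid[idx]
        reward = reward_fn(action)                # environment feedback in [0, 1]
        # reinforce a neighbourhood of the chosen action
        bump = np.exp(-0.5 * ((grid - action) / spread) ** 2)
        density = density + rate * reward * bump
        density /= density.sum()                  # keep it a valid distribution

    return grid[np.argmax(density)]               # current most likely action

# Single-agent example with a unimodal reward that peaks at a = 0.7.
best = carla_run(lambda a: np.exp(-50.0 * (a - 0.7) ** 2))
```

With a unimodal reward the density concentrates around the optimum; the paper's point is that with independent learners and multimodal joint rewards, such a scheme only reaches a local optimum, which is what the proposed coordinated exploration addresses.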
References

Bush, R. & Mosteller, F. 1955. Stochastic Models for Learning. Wiley.

Castelletti, A., Pianosi, F. & Restelli, M. 2012. Tree-based fitted Q-iteration for multi-objective Markov decision problems. In IJCNN, 1–8. IEEE.

Claus, C. & Boutilier, C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artificial Intelligence (AAAI-98), 746–752.

Hilgard, E. 1948. Theories of Learning. Appleton-Century-Crofts.

Hilgard, E. & Bower, B. 1966. Theories of Learning. Prentice Hall.

Howell, M. & Best, M. 2000. On-line PID tuning for engine idle-speed control using continuous action reinforcement learning automata. Control Engineering Practice 8(2), 147–154.

Howell, M., Frost, G., Gordon, T. & Wu, Q. 1997. Continuous action reinforcement learning applied to vehicle suspension control. Mechatronics 7(3), 263–276.

Kapetanakis, S., Kudenko, D. & Strens, M. 2003. Learning to coordinate using commitment sequences in cooperative multiagent systems. In Proceedings of the Third Symposium on Adaptive Agents and Multiagent Systems (AAMAS-03), 2004.

Parzen, E. 1960. Modern Probability Theory and Its Applications, Wiley Classics Edition. Wiley-Interscience.

Rodríguez, A., Grau, R. & Nowé, A. 2011. Continuous action reinforcement learning automata: performance and convergence. In Proceedings of the Third International Conference on Agents and Artificial Intelligence, Filipe, J. & Fred, A. (eds). SciTePress, 473–478.

Thathachar, M. & Sastry, P. 2004. Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic Publishers.

Tsetlin, M. 1961. The behavior of finite automata in random media. Avtomatika i Telemekhanika 22, 1345–1354.

Tsetlin, M. 1962. The behavior of finite automata in random media. Avtomatika i Telemekhanika 22, 1210–1219.

Tsypkin, Y. 1971. Adaptation and Learning in Automatic Systems. Academic Press.

Tsypkin, Y. 1973. Foundations of the Theory of Learning Systems. Academic Press.

Veelen, M. & Spreij, P. 2009. Evolution in games with a continuous action space. Economic Theory 39(3), 355–376.

Verbeeck, K. 2004. Coordinated Exploration in Multi-Agent Reinforcement Learning. PhD thesis, Vrije Universiteit Brussel, Faculteit Wetenschappen, DINF, Computational Modeling Lab, September.

Vrabie, D., Pastravanu, O., Abu-Khalaf, M. & Lewis, F. 2009. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 45(2), 477–484.

• Cite this article

    Abdel Rodríguez, Peter Vrancx, Ricardo Grau & Ann Nowé. 2016. A reinforcement learning approach to coordinate exploration with limited communication in continuous action games. The Knowledge Engineering Review 31(1), 77–95. doi: 10.1017/S026988891500020X


Footnotes

    • This is a common assumption in control applications.

    • The original formulation is J(Λ) = ∫_Λ R(a, Λ) df(a), but for consistency we adapt this formulation to the notation used in this paper.

    • A conflicting-interest version of ESRL also exists; however, as we only use common interest settings in this paper, we describe only the common interest version.

    • ESRL can also handle games with stochastic payoffs, but we illustrate the idea on a deterministic example.

    • © Cambridge University Press, 2016