School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
RESEARCH ARTICLE   Open Access    

Reinforcement actor-critic learning as a rehearsal in MicroRTS

  • Abstract: Real-time strategy (RTS) games have provided a fertile ground for AI research, with notable recent successes based on deep reinforcement learning (RL). However, RL remains a data-hungry approach with high sample complexity. In this paper, we focus on a sample complexity reduction technique called reinforcement learning as a rehearsal (RLaR), and on the RTS game of MicroRTS to formulate and evaluate it. RLaR has previously been formulated in the context of action-value function based RL. Here, we formulate it for a different RL framework, called actor-critic RL. We show that, on the one hand, the actor-critic framework allows RLaR to be much simpler, but on the other hand, it leaves room for a key component of RLaR: a prediction function that relates a learner's observations with those of its opponent. This function, when leveraged for exploration, accelerates RL, as our experiments in MicroRTS show. Further experiments provide evidence that RLaR may reduce actor noise compared to a variant that does not utilize RLaR's exploration. This study provides the first evaluation of RLaR's efficacy in a domain with a large strategy space.
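
    The following minimal sketch (in Python) illustrates one reading of the idea summarized above; it is not the paper's implementation, and all names, sizes, and the specific exploration rule are assumptions. During training ("rehearsal") the learner is assumed to also see the opponent's observation, so the critic can evaluate the joint observation while the actor conditions only on the learner's own; a prediction function learns to reconstruct the opponent's observation from the learner's, and its error is used here as a softmax-temperature signal that encourages exploration where the opponent's side of the state is poorly predicted.

    import numpy as np

    # Illustrative sketch of actor-critic RL as a rehearsal (RLaR); all names,
    # sizes, and update rules below are assumptions, not the paper's method.
    rng = np.random.default_rng(0)
    OBS, OPP_OBS, N_ACTS = 16, 16, 8                 # assumed observation/action sizes

    W_actor = rng.normal(0, 0.1, (N_ACTS, OBS))      # actor: own observation only
    W_critic = rng.normal(0, 0.1, OBS + OPP_OBS)     # critic: joint observation (rehearsal only)
    W_pred = rng.normal(0, 0.1, (OPP_OBS, OBS))      # prediction function: own obs -> opponent obs

    def act(obs, temperature=1.0):
        """Sample from a softmax policy over the learner's own observation."""
        logits = (W_actor @ obs) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(N_ACTS, p=probs)

    def rehearsal_step(obs, opp_obs, lr=1e-2):
        """One rehearsal-time update; the usual actor/critic gradient steps are elided."""
        global W_pred
        value = W_critic @ np.concatenate([obs, opp_obs])  # critic sees both sides
        pred_err = opp_obs - W_pred @ obs                  # surprise about opponent's view
        W_pred += lr * np.outer(pred_err, obs)             # regress opponent obs from own obs
        # One way to leverage the prediction function for exploration: act with a
        # higher (more exploratory) temperature where the prediction error is large.
        temperature = 1.0 + float(np.linalg.norm(pred_err))
        return value, temperature

    obs, opp_obs = rng.normal(size=OBS), rng.normal(size=OPP_OBS)
    value, temp = rehearsal_step(obs, opp_obs)
    action = act(obs, temp)  # execution uses only the learner's own observation

    At execution time the opponent's observation is no longer required: only the actor is used, which is precisely what rehearsal-based training is designed to permit.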
  • References (36)

    Balla, R.-K. & Fern, A. 2009. UCT for tactical assault planning in real-time strategy games. In International Joint Conference on Artificial Intelligence (IJCAI), San Francisco, CA, USA, 40–45.

    Barriga, N. A., Stanescu, M., Besoain, F. & Buro, M. 2019. Improving RTS game AI by supervised policy learning, tactical search, and deep reinforcement learning. IEEE Computational Intelligence Magazine 14(3), 8–18.

    Buro, M. 2003. Real-time strategy games: A new AI research challenge. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1534–1535.

    Chung, M., Buro, M. & Schaeffer, J. 2005. Monte Carlo planning in RTS games. In IEEE Symposium on Computational Intelligence and Games (CIG).

    Churchill, D. & Buro, M. 2011. Build order optimization in StarCraft. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 7(1), 14–19.

    Churchill, D., Saffidine, A. & Buro, M. 2012. Fast heuristic search for RTS game combat scenarios. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 8(1), 112–117.

    Critch, L. & Churchill, D. 2020. Combining influence maps with heuristic search for executing sneak-attacks in RTS games. In Proceedings of the 2020 IEEE Conference on Games (CoG), 740–743.

    Forbus, K., Mahoney, J. & Dill, K. 2002. How qualitative spatial reasoning can improve strategy game AIs. IEEE Intelligent Systems 17(4), 25–30.

    Hochreiter, S. & Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8), 1735–1780.

    Huang, S., Ontañón, S., Bamford, C. & Grela, L. 2021. Gym-μRTS: Toward affordable full game real-time strategy games research with deep reinforcement learning. In 2021 IEEE Conference on Games (CoG). https://arxiv.org/abs/2105.13807

    Jaidee, U. & Muñoz-Avila, H. 2012. CLASSQ-L: A Q-learning algorithm for adversarial real-time strategy games. In Eighth Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE).

    Kelly, R. & Churchill, D. 2020. Transfer learning between RTS combat scenarios using component-action deep reinforcement learning. https://ceur-ws.org/Vol-2862/paper28.pdf

    Kraemer, L. & Banerjee, B. 2016. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing 190, 82–94.

    Marthi, B., Russell, S. J., Latham, D. & Guestrin, C. 2005. Concurrent hierarchical reinforcement learning. In International Joint Conference on Artificial Intelligence (IJCAI), 779–785.

    Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518, 529–533.

    Ng, A. Y., Harada, D. & Russell, S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287. Morgan Kaufmann.

    Nguyen, T. & Banerjee, B. 2021. Reinforcement learning as a rehearsal for swarm foraging. Swarm Intelligence 16(1), 29–58.

    Niel, R., Krebbers, J., Drugan, M. M. & Wiering, M. A. 2018. Hierarchical reinforcement learning for real-time strategy games. In Proceedings of ICAART-2018, 470–477.

    Oh, J., Guo, Y., Singh, S. & Lee, H. 2018. Self-imitation learning. In Proceedings of the International Conference on Machine Learning (ICML).

    Ontañón, S., Mishra, K., Sugandh, N. & Ram, A. 2008. Learning from demonstration and case-based planning for real-time strategy games. In Soft Computing Applications in Industry, 293–310. Springer.

    Ontañón, S. 2013. The combinatorial multi-armed bandit problem and its application to real-time strategy games. In Proceedings of the Ninth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), AAAI, Boston, MA, 58–64.

    Ontañón, S., Synnaeve, G., Uriarte, A., Richoux, F., Churchill, D. & Preuss, M. 2013. A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games 5(4), 293–311.

    Perkins, L. 2010. Terrain analysis in real-time strategy games: An integrated approach to choke point detection and region decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 6, 168–173.

    Ray, D. & Sturtevant, N. R. 2023. Navigation in adversarial environments guided by PRA* and a local RL planner. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-23) 19(1), 343–351. https://ojs.aaai.org/index.php/AIIDE/article/view/27530

    Richoux, F. 2020. MicroPhantom: Playing MicroRTS under uncertainty and chaos. In 2020 IEEE Conference on Games (CoG), 670–677.

    Sharma, M., Holmes, M., Santamaría, J., Irani, A., Isbell Jr., C. & Ram, A. 2007. Transfer learning in real-time strategy games using hybrid CBR/RL. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1041–1046.

    Sohn, K., Lee, H. & Yan, X. 2015. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems 28.

    Sutton, R. & Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.

    Sutton, R. S., McAllester, D., Singh, S. & Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, 1057–1063. MIT Press.

    Synnaeve, G. & Bessière, P. 2011. A Bayesian model for plan recognition in RTS games applied to StarCraft. In Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE 2011), 79–84.

    Vinyals, O., Babuschkin, I., Czarnecki, W., Mathieu, M., Dudzik, A., Chung, J., Choi, D., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J., Jaderberg, M. & Silver, D. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354.

    Watkins, C. & Dayan, P. 1992. Q-learning. Machine Learning 8(3–4), 279–292.

    Weber, B. G., Mateas, M. & Jhala, A. 2011a. Building human-level AI for real-time strategy games. In AAAI Fall Symposium Series.

    Weber, B., Mateas, M. & Jhala, A. 2011b. A particle model for state estimation in real-time strategy games. In Proceedings of AIIDE, 103–108.

    Wender, S. & Watson, I. 2012. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft: Broodwar. In 2012 IEEE Conference on Computational Intelligence and Games (CIG), 402–408. IEEE.

    Williams, R. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3–4), 229–256.

  • Cite this article

    Shiron Manandhar, Bikramjit Banerjee. 2024. Reinforcement actor-critic learning as a rehearsal in MicroRTS. The Knowledge Engineering Review 39(1). doi: 10.1017/S0269888924000092


    • We gratefully acknowledge constructive feedback from anonymous reviewers on previous drafts of this manuscript. This work was supported in part by Air Force Research Lab grant FA8750-20-1-0105.

    • Due to discounting and long trajectories, the backup values of early states would be very small if the standard sparse terminal rewards of +1 and –1 were used. Moreover, the score from the game appears to be limited to the range $[-1000, 1000]$, thus making the terminal rewards of drawn games comparable to our chosen win/loss rewards (see the sketch after these notes).

    • This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
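
    The second note above can be checked with back-of-envelope arithmetic. In the snippet below, the discount factor (0.99), the horizon (2000 steps), the win/loss reward magnitude (+/-1000) and the draw score (250) are all assumed values for illustration, not taken from the paper:

    # Assumed discount factor and horizon, for illustration only.
    gamma, horizon = 0.99, 2000

    # A sparse terminal reward of +/-1, discounted back to the first state:
    print(f"{gamma**horizon:.2e}")          # ~1.86e-09: effectively invisible early on

    # With win/loss rewards of +/-1000, a drawn game's bounded score in
    # [-1000, 1000] (here a hypothetical 250) backs up at a comparable magnitude:
    print(f"{1000 * gamma**horizon:.2e}")   # discounted win reward
    print(f"{250 * gamma**horizon:.2e}")    # discounted hypothetical draw score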