School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
RESEARCH ARTICLE   Open Access    

Reinforcement actor-critic learning as a rehearsal in MicroRTS

  • Abstract: Real-time strategy (RTS) games have provided a fertile ground for AI research, with notable recent successes based on deep reinforcement learning (RL). However, RL remains a data-hungry approach with high sample complexity. In this paper, we focus on a sample complexity reduction technique called reinforcement learning as a rehearsal (RLaR), and on the RTS game of MicroRTS to formulate and evaluate it. RLaR has previously been formulated in the context of action-value function based RL. Here, we formulate it for a different RL framework, called actor-critic RL. We show that, on the one hand, the actor-critic framework allows RLaR to be much simpler, but on the other hand, it leaves room for a key component of RLaR: a prediction function that relates a learner's observations with those of its opponent. This function, when leveraged for exploration, accelerates RL, as our experiments in MicroRTS show. Further experiments provide evidence that RLaR may reduce actor noise compared to a variant that does not utilize RLaR's exploration. This study provides the first evaluation of RLaR's efficacy in a domain with a large strategy space.
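
    The following minimal sketch (in Python) illustrates one reading of the idea summarized above; it is not the paper's implementation, and all names, sizes, and the specific exploration rule are assumptions. During training ("rehearsal") the learner is assumed to also see the opponent's observation, so the critic can evaluate the joint observation while the actor conditions only on the learner's own; a prediction function learns to reconstruct the opponent's observation from the learner's, and its error is used here as a softmax-temperature signal that encourages exploration where the opponent's side of the state is poorly predicted.

    import numpy as np

    # Illustrative sketch of actor-critic RL as a rehearsal (RLaR); all names,
    # sizes, and update rules below are assumptions, not the paper's method.
    rng = np.random.default_rng(0)
    OBS, OPP_OBS, N_ACTS = 16, 16, 8                 # assumed observation/action sizes

    W_actor = rng.normal(0, 0.1, (N_ACTS, OBS))      # actor: own observation only
    W_critic = rng.normal(0, 0.1, OBS + OPP_OBS)     # critic: joint observation (rehearsal only)
    W_pred = rng.normal(0, 0.1, (OPP_OBS, OBS))      # prediction function: own obs -> opponent obs

    def act(obs, temperature=1.0):
        """Sample from a softmax policy over the learner's own observation."""
        logits = (W_actor @ obs) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(N_ACTS, p=probs)

    def rehearsal_step(obs, opp_obs, lr=1e-2):
        """One rehearsal-time update; the usual actor/critic gradient steps are elided."""
        global W_pred
        value = W_critic @ np.concatenate([obs, opp_obs])  # critic sees both sides
        pred_err = opp_obs - W_pred @ obs                  # surprise about opponent's view
        W_pred += lr * np.outer(pred_err, obs)             # regress opponent obs from own obs
        # One way to leverage the prediction function for exploration: act with a
        # higher (more exploratory) temperature where the prediction error is large.
        temperature = 1.0 + float(np.linalg.norm(pred_err))
        return value, temperature

    obs, opp_obs = rng.normal(size=OBS), rng.normal(size=OPP_OBS)
    value, temp = rehearsal_step(obs, opp_obs)
    action = act(obs, temp)  # execution uses only the learner's own observation

    At execution time the opponent's observation is no longer required: only the actor is used, which is precisely what rehearsal-based training is designed to permit.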
  • References (36)

    Balla, R.-K. & Fern, A. 2009. UCT for tactical assault planning in real-time strategy games. In International Joint Conference on Artificial Intelligence (IJCAI), San Francisco, CA, USA, 40–45.

    Barriga, N. A., Stanescu, M., Besoain, F. & Buro, M. 2019. Improving RTS game AI by supervised policy learning, tactical search, and deep reinforcement learning. IEEE Computational Intelligence Magazine 14(3), 8–18.

    Buro, M. 2003. Real-time strategy games: A new AI research challenge. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1534–1535.

    Chung, M., Buro, M. & Schaeffer, J. 2005. Monte Carlo planning in RTS games. In IEEE Symposium on Computational Intelligence and Games (CIG).

    Churchill, D. & Buro, M. 2011. Build order optimization in StarCraft. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 7(1), 14–19.

    Churchill, D., Saffidine, A. & Buro, M. 2012. Fast heuristic search for RTS game combat scenarios. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 8(1), 112–117.

    Critch, L. & Churchill, D. 2020. Combining influence maps with heuristic search for executing sneak-attacks in RTS games. In Proceedings of the 2020 IEEE Conference on Games (CoG), 740–743.

    Forbus, K., Mahoney, J. & Dill, K. 2002. How qualitative spatial reasoning can improve strategy game AIs. IEEE Intelligent Systems 17(4), 25–30.

    Hochreiter, S. & Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8), 1735–1780.

    Huang, S., Ontañón, S., Bamford, C. & Grela, L. 2021. Gym-μRTS: Toward affordable full game real-time strategy games research with deep reinforcement learning. In 2021 IEEE Conference on Games (CoG). https://arxiv.org/abs/2105.13807

    Jaidee, U. & Muñoz-Avila, H. 2012. CLASSQ-L: A Q-learning algorithm for adversarial real-time strategy games. In Eighth Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE).

    Kelly, R. & Churchill, D. 2020. Transfer learning between RTS combat scenarios using component-action deep reinforcement learning. https://ceur-ws.org/Vol-2862/paper28.pdf

    Kraemer, L. & Banerjee, B. 2016. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing 190, 82–94.

    Marthi, B., Russell, S. J., Latham, D. & Guestrin, C. 2005. Concurrent hierarchical reinforcement learning. In International Joint Conference on Artificial Intelligence (IJCAI), 779–785.

    Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518, 529–533.

    Ng, A. Y., Harada, D. & Russell, S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278–287. Morgan Kaufmann.

    Nguyen, T. & Banerjee, B. 2021. Reinforcement learning as a rehearsal for swarm foraging. Swarm Intelligence 16(1), 29–58.

    Niel, R., Krebbers, J., Drugan, M. M. & Wiering, M. A. 2018. Hierarchical reinforcement learning for real-time strategy games. In Proceedings of ICAART-2018, 470–477.

    Oh, J., Guo, Y., Singh, S. & Lee, H. 2018. Self-imitation learning. In Proceedings of the International Conference on Machine Learning (ICML).

    Ontañón, S., Mishra, K., Sugandh, N. & Ram, A. 2008. Learning from demonstration and case-based planning for real-time strategy games. In Soft Computing Applications in Industry, 293–310. Springer.

    Ontañón, S. 2013. The combinatorial multi-armed bandit problem and its application to real-time strategy games. In Proceedings of the Ninth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), AAAI, Boston, MA, 58–64.

    Ontañón, S., Synnaeve, G., Uriarte, A., Richoux, F., Churchill, D. & Preuss, M. 2013. A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games 5(4), 293–311.

    Perkins, L. 2010. Terrain analysis in real-time strategy games: An integrated approach to choke point detection and region decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 6, 168–173.

    Ray, D. & Sturtevant, N. R. 2023. Navigation in adversarial environments guided by PRA* and a local RL planner. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-23) 19(1), 343–351. https://ojs.aaai.org/index.php/AIIDE/article/view/27530

    Richoux, F. 2020. MicroPhantom: Playing MicroRTS under uncertainty and chaos. In 2020 IEEE Conference on Games (CoG), 670–677.

    Sharma, M., Holmes, M., Santamaría, J., Irani, A., Isbell Jr., C. & Ram, A. 2007. Transfer learning in real-time strategy games using hybrid CBR/RL. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1041–1046.

    Sohn, K., Lee, H. & Yan, X. 2015. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems 28.

    Sutton, R. & Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.

    Sutton, R. S., McAllester, D., Singh, S. & Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, 1057–1063. MIT Press.

    Synnaeve, G. & Bessière, P. 2011. A Bayesian model for plan recognition in RTS games applied to StarCraft. In Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE 2011), 79–84.

    Vinyals, O., Babuschkin, I., Czarnecki, W., Mathieu, M., Dudzik, A., Chung, J., Choi, D., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J., Jaderberg, M. & Silver, D. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354.

    Watkins, C. & Dayan, P. 1992. Q-learning. Machine Learning 8(3–4), 279–292.

    Weber, B. G., Mateas, M. & Jhala, A. 2011a. Building human-level AI for real-time strategy games. In AAAI Fall Symposium Series.

    Weber, B., Mateas, M. & Jhala, A. 2011b. A particle model for state estimation in real-time strategy games. In Proceedings of AIIDE, 103–108.

    Wender, S. & Watson, I. 2012. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft: Broodwar. In 2012 IEEE Conference on Computational Intelligence and Games (CIG), 402–408. IEEE.

    Williams, R. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3–4), 229–256.

  • Cite this article

    Shiron Manandhar, Bikramjit Banerjee. 2024. Reinforcement actor-critic learning as a rehearsal in MicroRTS. The Knowledge Engineering Review 39(1). doi: 10.1017/S0269888924000092


    • We gratefully acknowledge constructive feedback from anonymous reviewers on previous drafts of this manuscript. This work was supported in part by Air Force Research Lab grant FA8750-20-1-0105.

    • Due to discounting and long trajectories, the backup values of early states would be very small if the standard sparse terminal rewards of +1 and –1 were used. Moreover, the score from the game appears to be limited to the range $[-1000, 1000]$, thus making the terminal rewards of drawn games comparable to our chosen win/loss rewards (see the sketch after these notes).

    • This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
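
    The second note above can be checked with back-of-envelope arithmetic. In the snippet below, the discount factor (0.99), the horizon (2000 steps), the win/loss reward magnitude (+/-1000) and the draw score (250) are all assumed values for illustration, not taken from the paper:

    # Assumed discount factor and horizon, for illustration only.
    gamma, horizon = 0.99, 2000

    # A sparse terminal reward of +/-1, discounted back to the first state:
    print(f"{gamma**horizon:.2e}")          # ~1.86e-09: effectively invisible early on

    # With win/loss rewards of +/-1000, a drawn game's bounded score in
    # [-1000, 1000] (here a hypothetical 250) backs up at a comparable magnitude:
    print(f"{1000 * gamma**horizon:.2e}")   # discounted win reward
    print(f"{250 * gamma**horizon:.2e}")    # discounted hypothetical draw score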