doi:10.1017/S0269888923000012

Allis, L. V. 1988. A knowledge-based approach of connect-four. Journal of the International Computer Games Association 11, 165.

Applebaum, A., Miller, D., Strom, B., Korban, C. & Wolf, R. 2016. Intelligent, automated red team emulation. In Proceedings of the 32nd Annual Conference on Computer Security Applications, 363–373.

Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. 2017. Deep reinforcement learning: a brief survey. IEEE Signal Processing Magazine 34(6), 26–38.

Backes, M., Hoffmann, J., Künnemann, R., Speicher, P. & Steinmetz, M. 2017. Simulated penetration testing and mitigation analysis. ArXiv abs/1705.05088.

Baillie, C., Standen, M., Schwartz, J., Docking, M., Bowman, D. & Kim, J. 2020. CybORG: an autonomous cyber operations research gym. ArXiv abs/2002.10667.

Bräm, T., Brunner, G., Richter, O. & Wattenhofer, R. 2020. Attentive multi-task deep reinforcement learning. In Machine Learning and Knowledge Discovery in Databases, Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M. & Robardet, C. (eds). Springer International Publishing, 134–149.

Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J. & Zaremba, W. 2016. OpenAI Gym. ArXiv abs/1606.01540.

Corporation, T. M. (n.d.a). MITRE ATT&CK. https://attack.mitre.org.

Corporation, T. M. (n.d.b). MITRE Engage. https://engage.mitre.org.

Droste, S., Jansen, T. & Wegener, I. 2002. On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science 276(1), 51–81. https://www.sciencedirect.com/science/article/pii/S0304397501001827.

Engström, V. & Lagerström, R. 2022. Two decades of cyberattack simulations: a systematic literature review. Computers & Security, 102681.

Falco, G., Viswanathan, A., Caldera, C. & Shrobe, H. 2018. A master attack methodology for an AI-based automated attack planner for smart cities. IEEE Access 6, 48360–48373.

Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning, 1st edition. Addison-Wesley Longman Publishing Co., Inc.

Grondman, I., Buşoniu, L., Lopes, G. & Babuška, R. 2012. A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42, 1291–1307.

Group, A. 2022. Adversarial agent-learning for cybersecurity. https://github.com/ALFA-group/adversarial_agent_learning_for_cybersecurity.

Hansen, N. 2016. The CMA evolution strategy: a tutorial. ArXiv abs/1604.00772.

Harris, S. N. & Tauritz, D. R. 2021. Competitive Coevolution for Defense and Security: Elo-Based Similar-Strength Opponent Sampling. Association for Computing Machinery, 1898–1906. https://doi.org/10.1145/3449726.3463193.

Huang, L. & Zhu, Q. 2020. A dynamic games approach to proactive defense strategies against advanced persistent threats in cyber-physical systems. Computers & Security 89, 101660.

Jiménez, S., De La Rosa, T., Fernández, S., Fernández, F. & Borrajo, D. 2012. A review of machine learning for automated planning. The Knowledge Engineering Review 27(4), 433–467.

Klijn, D. & Eiben, A. E. 2021. A Coevolutionary Approach to Deep Multi-Agent Reinforcement Learning. Association for Computing Machinery, 283–284. https://doi.org/10.1145/3449726.3459576.

Lee, K., Lee, B.-U., Shin, U. & Kweon, I. S. 2020. An efficient asynchronous method for integrating evolutionary and gradient-based policy search. In Advances in Neural Information Processing Systems, Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F. & Lin, H. (eds), 33. Curran Associates, Inc., 10124–10135. https://proceedings.neurips.cc/paper/2020/file/731309c4bb223491a9f67eac5214fb2e-Paper.pdf.

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N. M. O., Erez, T., Tassa, Y., Silver, D. & Wierstra, D. 2016. Continuous control with deep reinforcement learning. ArXiv abs/1509.02971.

Liu, J., Pérez-Liébana, D. & Lucas, S. M. 2016. Rolling horizon coevolutionary planning for two-player video games. In 2016 8th Computer Science and Electronic Engineering (CEEC), 174–179.

Liu, L., Yasin Chouhan, A., Li, T., Fatima, R. & Wang, J. 2018. Improving software security awareness using a serious game. IET Software 13, 159–169.

Luh, R., Temper, M., Tjoa, S., Schrittwieser, S. & Janicke, H. 2019. Penquest: a gamified attacker/defender meta model for cyber security assessment and education. Journal of Computer Virology and Hacking Techniques 16, 19–61.

Macua, S. V., Davies, I., Tukiainen, A. & De Cote, E. M. 2021. Fully distributed actor-critic architecture for multitask deep reinforcement learning. The Knowledge Engineering Review 36, e6.

Metz, L., Ibarz, J., Jaitly, N. & Davidson, J. 2017. Discrete sequential prediction of continuous actions for deep RL. ArXiv abs/1705.05035.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M. A., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518, 529–533.

Molina-Markham, A., Winder, R. K. & Ridley, A. 2021. Network defense is not a game. ArXiv abs/2104.10262.

Nguyen, T. T. & Reddi, V. J. 2021. Deep reinforcement learning for cyber security. IEEE Transactions on Neural Networks and Learning Systems, 1–17.

Olesen, T. V. A. N., Nguyen, D. T. T., Palm, R. B. & Risi, S. 2021. Evolutionary planning in latent space. In Applications of Evolutionary Computation, Castillo, P. A. & Jiménez Laredo, J. L. (eds). Springer International Publishing, 522–536.

Palangi, H., Deng, L., Shen, Y., Gao, J., He, X., Chen, J., Song, X. & Ward, R. 2016. Deep sentence embedding using long short-term memory networks: analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech, and Language Processing 24, 694–707.

Panait, L. & Luke, S. 2002. A comparison of two competitive fitness functions. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, GECCO’02, Morgan Kaufmann Publishers Inc., 503–511.

Partalas, I., Vrakas, D. & Vlahavas, I. 2012. Reinforcement learning and automated planning: a survey. In Artificial Intelligence for Advanced Problem Solving Techniques.

Popovici, E., Bucci, A., Wiegand, R. P. & Jong, E. D. 2012. Coevolutionary principles. In Handbook of Natural Computing.

Potter, M. A. & Jong, K. A. D. 2000. Cooperative coevolution: an architecture for evolving coadapted subcomponents. Evolutionary Computation 8, 1–29.

Pourchot, A. & Sigaud, O. 2018. CEM-RL: combining evolutionary and gradient-based methods for policy search. ArXiv abs/1810.01222.

Prince, M. H., McGehee, A. J. & Tauritz, D. R. 2021. EDM-DRL: toward stable reinforcement learning through ensembled directed mutation. In Applications of Evolutionary Computation, Castillo, P. A. & Jiménez Laredo, J. L. (eds). Springer International Publishing, 275–290.

Rechenberg, I. 1989. Evolution strategy: nature’s way of optimization. In Optimization: Methods and Applications, Possibilities and Limitations, Bergmann, H. W. (ed.). Springer Berlin Heidelberg, 106–126.

Reinstadler, B. 2021. AI Attack Planning for Emulated Networks. Master’s thesis, Massachusetts Institute of Technology.

Rush, G., Tauritz, D. R. & Kent, A. D. 2015. Coevolutionary agent-based network defense lightweight event system (CANDLES). In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation. GECCO Companion’15. Association for Computing Machinery, 859–866. https://doi.org/10.1145/2739482.2768429.

Salimans, T., Ho, J., Chen, X. & Sutskever, I. 2017. Evolution strategies as a scalable alternative to reinforcement learning. ArXiv abs/1703.03864.

Schaul, T. & Schmidhuber, J. 2008. A scalable neural network architecture for board games. In 2008 IEEE Symposium on Computational Intelligence and Games, CIG 2008, 357–364.

Sigaud, O. & Stulp, F. 2019. Policy search in continuous action domains: an overview. Neural Networks: The Official Journal of the International Neural Network Society 113, 28–40.

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K. & Hassabis, D. 2018. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419), 1140–1144. https://www.science.org/doi/abs/10.1126/science.aar6404.

Simione, L. & Nolfi, S. 2017. Achieving long-term progress in competitive co-evolution. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI), 1–8.

Team, M. D. R. 2021. CyberBattleSim. https://github.com/microsoft/cyberbattlesim. Created by Christian Seifert, Michael Betser, William Blum, James Bono, Kate Farris, Emily Goren, Justin Grana, Kristian Holsheimer, Brandon Marken, Joshua Neil, Nicole Nichols, Jugal Parikh, Haoran Wei.

The MITRE Corporation 2020. Caldera. https://github.com/mitre/caldera.

Vinyals, O., Babuschkin, I., Czarnecki, W., Mathieu, M., Dudzik, A., Chung, J., Choi, D., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J., Jaderberg, M., Vezhnevets, A., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wünsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C. & Silver, D. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782), 350–354.

Walter, E. C., Ferguson-Walter, K. J. & Ridley, A. D. 2021. Incorporating deception into CyberBattleSim for autonomous defense. ArXiv abs/2108.13980.

Yang, L.-X., Pengdeng, L., Zhang, Y., Yang, X., Xiang, Y. & Zhou, W. 2018. Effective repair strategy against advanced persistent threat: a differential game approach. IEEE Transactions on Information Forensics and Security 14(7), 1713–1728.

Zhu, Q. & Rass, S. 2018. On multi-phase and multi-stage game-theoretic modeling of advanced persistent threats. IEEE Access 6, 13958–13971.