Williams College, Williamstown, MA 01267, USA; e-mail: aes7@williams.edu
MIT CSAIL, Cambridge, MA 02139, USA; e-mails: hembergerik@csail.mit.edu, mtulla@mit.edu, unamay@csail.mit.edu
2023 Volume 38
RESEARCH ARTICLE   Open Access    

Adversarial agent-learning for cybersecurity: a comparison of algorithms

  • Abstract: We investigate artificial intelligence and machine learning methods for optimizing the adversarial behavior of agents in cybersecurity simulations. Our cybersecurity simulations integrate the modeling of agents launching Advanced Persistent Threats (APTs) with the modeling of agents using detection and mitigation mechanisms against APTs. This simulates the phenomenon of attacks and defenses coevolving. The simulations and machine learning are used to search for optimal agent behaviors. The central question is: under what circumstances is one training method more advantageous than another? We adapt and compare a variety of deep reinforcement learning (DRL), evolution strategies (ES), and Monte Carlo Tree Search methods within Connect 4, a baseline game environment, and on two cybersecurity simulations: SNAPT, which supports a simple APT threat model, and CyberBattleSim, an open-source cybersecurity simulation. Our results show that attackers trained by DRL, by ES, and by the two algorithms used in alternation can all effectively choose complex exploits that thwart a defense. The algorithm that combines DRL and ES achieves the best comparative performance when attackers and defenders are trained simultaneously, rather than when each is trained against its non-learning counterpart.
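The combined training scheme the abstract describes can be illustrated with a minimal, hypothetical sketch: a toy attacker parameter vector is improved against a fixed defender by alternating a (1+1)-ES mutation step with a finite-difference ascent step (standing in for a DRL policy-gradient update). The quadratic payoff, function names, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import random

# Toy adversarial payoff: attacker params x vs. defender params y (illustrative).
# The attacker maximizes payoff; payoff peaks at 0 when x matches y.
def payoff(x, y):
    return -sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def es_step(x, y, sigma=0.1):
    """(1+1)-ES style mutation: keep the Gaussian-perturbed candidate if it scores better."""
    cand = [xi + random.gauss(0, sigma) for xi in x]
    return cand if payoff(cand, y) > payoff(x, y) else x

def grad_step(x, y, lr=0.05, eps=1e-4):
    """Finite-difference ascent, a stand-in for a gradient-based (DRL-like) update."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((payoff(xp, y) - payoff(xm, y)) / (2 * eps))
    return [xi + lr * gi for xi, gi in zip(x, grad)]

def train_alternating(x, y, rounds=200):
    """Alternate ES and gradient-style updates against a fixed defender."""
    for t in range(rounds):
        x = es_step(x, y) if t % 2 == 0 else grad_step(x, y)
    return x

random.seed(0)
defender = [1.0, -2.0, 0.5]           # fixed defender parameters (toy)
attacker = train_alternating([0.0, 0.0, 0.0], defender)
```

In the paper's setting both sides learn, so the defender's parameters would themselves be updated between attacker rounds; here the defender is frozen only to keep the sketch short.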

  • Cite this article

    Alexander Shashkov, Erik Hemberg, Miguel Tulla, Una-May O’Reilly. 2023. Adversarial agent-learning for cybersecurity: a comparison of algorithms. The Knowledge Engineering Review 38(1), doi: 10.1017/S0269888923000012



    • https://en.wikipedia.org/wiki/Connect_Four.

    • © The Author(s), 2023. Published by Cambridge University Press.