KTH, Brinellvägen 8, Stockholm 114 28, Sweden; E-mail: aniset@kth.se
Ericsson, Torshamnsgatan 23, Stockholm 164 83, Sweden; E-mails: alexandros.nikou@ericsson.com, marios.daoutis@ericsson.com
2022 Volume 37
RESEARCH ARTICLE   Open Access    

A scalable species-based genetic algorithm for reinforcement learning problems

  • Abstract: Reinforcement Learning (RL) methods often rely on gradient estimates to learn an optimal policy for control problems. These expensive computations result in long training times, poor convergence, and sample inefficiency when applied to real-world problems with large state and action spaces. Evolutionary Computation (EC) techniques offer a gradient-free alternative for training deep neural networks on RL problems. In this work, we leverage the benefits of EC and propose SP-GA, a novel genetic-algorithm variant that uses a species-inspired weight initialization strategy and trains a population of deep neural networks, each estimating the Q-function for the RL problem. We also propose a memory-efficient encoding of a neural network that provides an intuitive mechanism for applying Gaussian mutations and single-point crossover. Results on Atari 2600 games show performance comparable to gradient-based algorithms such as Deep Q-Network (DQN) and Asynchronous Advantage Actor Critic (A3C), and to gradient-free algorithms such as Evolution Strategy (ES) and a simple Genetic Algorithm (GA), while requiring far fewer hyperparameters to tune. The algorithm also improved certain Key Performance Indicators (KPIs) when applied to a Remote Electrical Tilt (RET) optimization task in the telecommunication domain.
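The abstract's encoding claim can be illustrated with a minimal sketch. The paper's exact SP-GA encoding is not given in this excerpt, so the flat list-of-floats "genome" below, along with the function names and parameters, are illustrative assumptions: a common way to make Gaussian mutation and single-point crossover apply directly to a network's weights.

```python
import random

def mutate(genome, sigma=0.02, rate=0.1, rng=random):
    """Perturb each gene with Gaussian noise (std `sigma`) with probability `rate`.
    `sigma` and `rate` are assumed hyperparameters, not values from the paper."""
    return [w + rng.gauss(0.0, sigma) if rng.random() < rate else w
            for w in genome]

def crossover(parent_a, parent_b, rng=random):
    """Single-point crossover: child takes a prefix of one parent's
    weight vector and the suffix of the other's."""
    point = rng.randint(1, len(parent_a) - 1)  # split strictly inside the genome
    return parent_a[:point] + parent_b[point:]
```

Under this representation, a child genome always mixes genes from both parents, e.g. `crossover([0.0] * 8, [1.0] * 8)` yields a vector with a run of zeros followed by a run of ones; the actual SP-GA operators may differ in detail.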

  • Cite this article

    Anirudh Seth, Alexandros Nikou, Marios Daoutis. 2022. A scalable species-based genetic algorithm for reinforcement learning problems. The Knowledge Engineering Review 37(1), doi: 10.1017/S0269888922000042



    • https://cloud.google.com/tpu.

    • http://www.swedishepa.se/Environmental-objectives-and-cooperation/Swedish-environmental-work/Work-areas/Climate/How-can-I-reduce-my-carbon-footprint-/.

    • https://gym.openai.com/.

    • https://docs.ray.io/en/master/rllib.html.

    • https://github.com/openai/atari-py.

    • https://en.wikipedia.org/wiki/RSRP.

    • https://en.wikipedia.org/wiki/Radio_Resource_Control.

    • https://github.com/ray-project/rl-experiments.

    • This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the reused or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.