KTH, Brinellvägen 8, Stockholm 114 28, Sweden; E-mail: aniset@kth.se
Ericsson, Torshamnsgatan 23, Stockholm 164 83, Sweden; E-mails: alexandros.nikou@ericsson.com, marios.daoutis@ericsson.com
2022 Volume 37
RESEARCH ARTICLE   Open Access    

A scalable species-based genetic algorithm for reinforcement learning problems

  • Abstract: Reinforcement Learning (RL) methods often rely on gradient estimates to learn an optimal policy for control problems. These expensive computations result in long training times, poor convergence, and sample inefficiency when applied to real-world problems with large state and action spaces. Evolutionary Computation (EC) techniques offer a gradient-free alternative for training deep neural networks on RL problems. In this work, we leverage the benefits of EC and propose SP-GA, a novel genetic-algorithm variant that uses a species-inspired weight initialization strategy and trains a population of deep neural networks, each estimating the Q-function for the RL problem. We also propose a memory-efficient encoding of a neural network that provides an intuitive mechanism for applying Gaussian mutations and single-point crossover. Results on Atari 2600 games show performance comparable to gradient-based algorithms such as Deep Q-Network (DQN) and Asynchronous Advantage Actor Critic (A3C), and to gradient-free algorithms such as Evolution Strategy (ES) and a simple Genetic Algorithm (GA), while requiring far fewer hyperparameters to tune. The algorithm also improved certain Key Performance Indicators (KPIs) when applied to a Remote Electrical Tilt (RET) optimization task in the telecommunication domain.
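The abstract's encoding claim can be illustrated with a minimal sketch. The paper's exact SP-GA encoding is not given in this excerpt, so the flat list-of-floats "genome" below, along with the function names and parameters, are illustrative assumptions: a common way to make Gaussian mutation and single-point crossover apply directly to a network's weights.

```python
import random

def mutate(genome, sigma=0.02, rate=0.1, rng=random):
    """Perturb each gene with Gaussian noise (std `sigma`) with probability `rate`.
    `sigma` and `rate` are assumed hyperparameters, not values from the paper."""
    return [w + rng.gauss(0.0, sigma) if rng.random() < rate else w
            for w in genome]

def crossover(parent_a, parent_b, rng=random):
    """Single-point crossover: child takes a prefix of one parent's
    weight vector and the suffix of the other's."""
    point = rng.randint(1, len(parent_a) - 1)  # split strictly inside the genome
    return parent_a[:point] + parent_b[point:]
```

Under this representation, a child genome always mixes genes from both parents, e.g. `crossover([0.0] * 8, [1.0] * 8)` yields a vector with a run of zeros followed by a run of ones; the actual SP-GA operators may differ in detail.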

  • Cite this article

    Anirudh Seth, Alexandros Nikou, Marios Daoutis. 2022. A scalable species-based genetic algorithm for reinforcement learning problems. The Knowledge Engineering Review 37(1), doi: 10.1017/S0269888922000042



    • https://cloud.google.com/tpu.

    • http://www.swedishepa.se/Environmental-objectives-and-cooperation/Swedish-environmental-work/Work-areas/Climate/How-can-I-reduce-my-carbon-footprint-/.

    • https://gym.openai.com/.

    • https://docs.ray.io/en/master/rllib.html.

    • https://github.com/openai/atari-py.

    • https://en.wikipedia.org/wiki/RSRP.

    • https://en.wikipedia.org/wiki/Radio_Resource_Control.

    • https://github.com/ray-project/rl-experiments.

    • This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the reused or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.