|
Abbeel, P., Coates, A., Quigley, M. & Ng, A. 2006. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems, 1–8.

Amdahl, G. M. 1967. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18–20, 1967, Spring Joint Computer Conference. AFIPS’67 (Spring). Atlantic City, New Jersey. Association for Computing Machinery, 483–485. ISBN:9781450378956. doi: 10.1145/1465482.1465560.

Baluja, S. & Caruana, R. 1995. Removing the genetics from the standard genetic algorithm. In Proceedings of ICML’95. Morgan Kaufmann Publishers, 38–46.

Barr, K. 2007. ASIC Design in the Silicon Sandbox: A Complete Guide to Building Mixed-Signal Integrated Circuits. McGraw-Hill Education. ISBN:9780071481618. https://www.accessengineeringlibrary.com/content/book/9780071481618.

Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. 2013. The arcade learning environment: an evaluation platform for general agents. Journal of Artificial Intelligence Research 47(1), 253–279. ISSN:1076-9757. doi: 10.1613/jair.3912.

Bellman, R. 1958. Dynamic programming and stochastic control processes. Information and Control 1(3), 228–239. ISSN:0019-9958. doi: 10.1016/S0019-9958(58)80003-0.

Bellman, R. E. 1954. The Theory of Dynamic Programming. RAND Corporation.
|
Bottou, L. 1998. Online Learning and Stochastic Approximations. In On-Line Learning in Neural Networks, Saad, D. (ed). Cambridge University Press.
|
Boutilier, C., Dean, T. & Hanks, S. 1999. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research 11. doi: 10.1613/jair.575.
|
Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J. & Zaremba, W. 2016. OpenAI Gym. arXiv:1606.01540 [cs.LG].
|
Cho, K., et al. 2014. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. arXiv:1409.1259 [cs.CL].
|
Conti, E., Madhavan, V., Such, F. P., Lehman, J., Stanley, K. O. & Clune, J. 2018. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3–8, 2018, Montréal, Canada, 5032–5043.
|
Cully, A., et al. 2015. Robots that can adapt like animals. Nature 521, 503–507. doi: 10.1038/nature14422.

Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2, 303–314.

Da Ronco, C. C. & Benini, E. 2014. A simplex-crossover-based multi-objective evolutionary algorithm. In Kim, H. K. et al. (eds), 583–598. doi: 10.1007/978-94-007-6818-5_41.
|
Darwin, C. 1859. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. John Murray.
|
Dong, Y. & Zou, X. 2020. Mobile robot path planning based on improved DDPG reinforcement learning algorithm. In 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), 52–56. doi: 10.1109/ICSESS49938.2020.9237641.

Fitch, F. B. 1944. Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, vol. 5 (1943), pp. 115–133. Journal of Symbolic Logic 9(2), 49–50. doi: 10.2307/2268029.

Gangwani, T. & Peng, J. 2018. Policy Optimization by Genetic Distillation. arXiv:1711.01012 [stat.ML].
|
Glorot, X. & Bengio, Y. 2010. Understanding the difficulty of training deep feedforward neural networks. Journal of Machine Learning Research - Proceedings Track 9, 249–256.
|
Goodfellow, I., Bengio, Y. & Courville, A. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Hansen, N. & Auger, A. 2011. CMA-ES: evolution strategies and covariance matrix adaptation. In Proceedings of the 13th Annual Conference Companion on Genetic and Evolutionary Computation. GECCO’11, Dublin, Ireland. Association for Computing Machinery, 991–1010. ISBN:9781450306904. doi: 10.1145/2001858.2002123.
|
Hasselt, H. 2010. Double Q-learning. In Advances in Neural Information Processing Systems, Lafferty, J. et al. (eds), vol. 23. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2010/file/091d584fced301b442654dd8c23b3fc9-Paper.pdf.
|
Haupt, S. & Haupt, R. 2003. Genetic algorithms and their applications in environmental sciences. In 3rd Conference on Artificial Intelligence Applications to the Environmental Science, vol. 23, 49–62.

He, K., et al. 2015a. Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs.CV].
|
He, K., et al. 2015b. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In IEEE International Conference on Computer Vision (ICCV 2015), 1026–1034. doi: 10.1109/ICCV.2015.123.
|
Herculano-Houzel, S. 2009. The human brain in numbers: a linearly scaled-up primate brain. Frontiers in Human Neuroscience 3, 31. ISSN:1662-5161. doi: 10.3389/neuro.09.031.2009.
|
Hochreiter, S. & Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8), 1735–1780. ISSN:0899-7667. doi: 10.1162/neco.1997.9.8.1735.

Holland, J. H. 1992a. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. The MIT Press. ISBN:9780262275552. doi: 10.7551/mitpress/1090.001.0001.

Holland, J. H. 1992b. Genetic algorithms. Scientific American 267(1), 66–73. http://www.jstor.org/stable/24939139.

Jaderberg, M., et al. 2017. Population Based Training of Neural Networks. arXiv:1711.09846 [cs.LG].

Jouppi, N., et al. 2017. In-datacenter performance analysis of a tensor processing unit. ACM SIGARCH Computer Architecture News 45, 1–12. doi: 10.1145/3140659.3080246.

Kalashnikov, D., et al. 2018. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation. arXiv:1806.10293 [cs.LG].

Kavalerov, M., Likhacheva, Y. & Shilova, Y. 2017. A reinforcement learning approach to network routing based on adaptive learning rates and route memory. In SoutheastCon 2017, 1–6. doi: 10.1109/SECON.2017.7925316.

Khadka, S., Majumdar, S., et al. 2019. Collaborative evolutionary reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning, Chaudhuri, K. & Salakhutdinov, R. (eds), vol. 97. Proceedings of Machine Learning Research. PMLR, 3341–3350. https://proceedings.mlr.press/v97/khadka19a.html.

Khadka, S. & Tumer, K. 2018. Evolution-Guided Policy Gradient in Reinforcement Learning. arXiv:1805.07917 [cs.LG].

Kullback, S. & Leibler, R. A. 1951. On information and sufficiency. Annals of Mathematical Statistics 22(1), 79–86.

Larranaga, P., et al. 1996. Learning Bayesian network structures by searching for the best ordering with genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 26(4), 487–493. doi: 10.1109/3468.508827.

Larranaga, P., et al. 1999. Genetic algorithms for the travelling salesman problem: a review of representations and operators. Artificial Intelligence Review 13, 129–170. doi: 10.1023/A:1006529012972.

Lehman, J. & Stanley, K. 2008. Exploiting open-endedness to solve problems through the search for novelty. In ALIFE.
|
Lillicrap, T., et al. 2015. Continuous control with deep reinforcement learning. CoRR, arXiv:1509.02971 [cs.LG].
|
Liu, H., et al. 2017. Hierarchical representations for efficient architecture search. arXiv:1711.00436 [cs.LG].

Liu, H., et al. 2018. Hierarchical Representations for Efficient Architecture Search. arXiv:1711.00436 [cs.LG].

Luong, N. C., et al. 2019. Applications of deep reinforcement learning in communications and networking: a survey. IEEE Communications Surveys & Tutorials 21(4), 3133–3174. doi: 10.1109/COMST.2019.2916583.

Ma, S., et al. 2020. Image and video compression with neural networks: a review. IEEE Transactions on Circuits and Systems for Video Technology 30(6), 1683–1698. ISSN:1558-2205. doi: 10.1109/tcsvt.2019.2910119.

Mania, H., Guy, A. & Recht, B. 2018. Simple random search provides a competitive approach to reinforcement learning. arXiv:1803.07055 [cs.LG].
|
Markov, A. A. 1906. Rasprostranenie zakona bol’shih chisel na velichiny, zavisyaschie drug ot druga [Extension of the law of large numbers to quantities dependent on each other]. Izvestiya Fiziko-matematicheskogo obschestva pri Kazanskom universitete 15, 135–156.
|
Markov, A. A. 2006. An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. Science in Context 19(4), 591–600. doi: 10.1017/S0269889706001074.

Melo, F. S., Meyn, S. P. & Ribeiro, M. I. 2008. An analysis of reinforcement learning with function approximation. In Proceedings of the 25th International Conference on Machine Learning. ICML’08. Helsinki, Finland. Association for Computing Machinery, 664–671. ISBN:9781605582054. doi: 10.1145/1390156.1390240.

Mitchell, M. 1996. An Introduction to Genetic Algorithms. MIT Press. ISBN:0262133164.

Mnih, V., Badia, A. P., et al. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of The 33rd International Conference on Machine Learning, Balcan, M. F. & Weinberger, K. Q. (eds), vol. 48. Proceedings of Machine Learning Research. New York, New York, USA, PMLR, 1928–1937. http://proceedings.mlr.press/v48/mniha16.html.

Mnih, V., Kavukcuoglu, K., et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 529–533. doi: 10.1038/nature14236.

Munemasa, I., et al. 2018. Deep reinforcement learning for recommender systems. In 2018 International Conference on Information and Communications Technology (ICOIACT), 226–233. doi: 10.1109/ICOIACT.2018.8350761.

Nair, A., Srinivasan, P., Blackwell, S., Alcicek, C., Fearon, R., De Maria, A., Panneershelvam, V., Suleyman, M., Beattie, C., Petersen, S. & Legg, S. 2015. Massively parallel methods for deep reinforcement learning. arXiv:1507.04296.

Neglia, G., et al. 2019. The role of network topology for distributed machine learning. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2350–2358. doi: 10.1109/INFOCOM.2019.8737602.
|
Nikou, A., et al. to appear. Symbolic reinforcement learning for safe RAN control. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
|
Proshansky, H. & Murphy, G. 1942. The effects of reward and punishment on perception. The Journal of Psychology: Interdisciplinary and Applied 13, 295–305. doi: 10.1080/00223980.1942.9917097.

Pugh, J., Soros, L. & Stanley, K. 2016. Quality diversity: a new frontier for evolutionary computation. Frontiers in Robotics and AI 3. doi: 10.3389/frobt.2016.00040.

Purwins, H., et al. 2019. Deep learning for audio signal processing. IEEE Journal of Selected Topics in Signal Processing 13(2), 206–219. ISSN:1941-0484. doi: 10.1109/jstsp.2019.2908700.

Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1st edition. John Wiley & Sons, Inc. ISBN:0471619779.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. & Sutskever, I. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9.
|
Rall, L. B. 1981. Automatic Differentiation: Techniques and Applications. Lecture Notes in Computer Science, vol. 120. Springer.
|
Robbins, H. & Monro, S. 1951. A stochastic approximation method. The Annals of Mathematical Statistics 22(3), 400–407. doi: 10.1214/aoms/1177729586.
|
Rumelhart, D., Hinton, G. & McClelland, J. 1986. A general framework for parallel distributed processing. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1.
|
Salimans, T., et al. 2017. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv:1703.03864 [stat.ML].

Schaul, T., Quan, J., Antonoglou, I. & Silver, D. 2015. Prioritized experience replay. arXiv:1511.05952 [cs.LG].

Silver, D., Huang, A., et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489. doi: 10.1038/nature16961.

Silver, D., Hubert, T., et al. 2017. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815 [cs.AI].
|
Sousa, C. 2016. An overview on weight initialization methods for feedforward neural networks. In 2016 International Joint Conference on Neural Networks (IJCNN). doi: 10.1109/IJCNN.2016.7727180.
|
Srivastava, N., et al. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(56), 1929–1958. http://jmlr.org/papers/v15/srivastava14a.html.

Stanley, K. 2007. Compositional pattern producing networks: a novel abstraction of development. Genetic Programming and Evolvable Machines 8, 131–162. doi: 10.1007/s10710-007-9028-8.

Stanley, K., D’Ambrosio, D. & Gauci, J. 2009. A hypercube-based encoding for evolving large-scale neural networks. Artificial Life 15, 185–212. doi: 10.1162/artl.2009.15.2.15202.

Stanley, K. O. & Miikkulainen, R. 2002. Evolving neural networks through augmenting topologies. Evolutionary Computation 10(2), 99–127. ISSN:1063-6560. doi: 10.1162/106365602320169811.
|
Strubell, E., Ganesh, A. & McCallum, A. 2019. Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3645–3650. doi: 10.18653/v1/P19-1355.
|
Such, F. P., et al. 2018. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning. arXiv:1712.06567 [cs.NE].
|
Sutton, R. S. & Barto, A. G. 2018. Reinforcement Learning: An Introduction. 2nd edition. A Bradford Book, MIT Press. ISBN:0262039249.
|
Tassa, Y., Erez, T. & Todorov, E. 2012. Synthesis and stabilization of complex behaviors through online trajectory optimization. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 4906–4913. ISBN:978-1-4673-1737-5. doi: 10.1109/IROS.2012.6386025.
|
Van Hasselt, H. 2013. Reinforcement learning in continuous state and action spaces. In Reinforcement Learning: State of the Art. Springer. doi: 10.1007/978-3-642-27645-3_7.
|
van Hasselt, H., Guez, A. & Silver, D. 2016. Deep reinforcement learning with double Q-learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI’16. Phoenix, Arizona. AAAI Press, 2094–2100.
|
Vannella, F., et al. 2021. Remote Electrical Tilt Optimization via Safe Reinforcement Learning. arXiv:2010.05842 [cs.LG].
|
Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M. & de Freitas, N. 2016. Dueling network architectures for deep reinforcement learning. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48. ICML’16. New York, NY, USA. JMLR.org, 1995–2003.
|
Whiteson, S. 2012. Evolutionary computation for reinforcement learning. In Reinforcement Learning: State of the Art. doi: 10.1007/978-3-642-27645-3_10.

Wierstra, D., et al. 2008. Natural evolution strategies. In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), 3381–3387. doi: 10.1109/CEC.2008.4631255.

Xu, K., et al. 2016. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv:1502.03044 [cs.LG].

Yajnanarayana, V., Ryden, H. & Hevizi, L. 2020. 5G handover using reinforcement learning. In 2020 IEEE 3rd 5G World Forum (5GWF). doi: 10.1109/5gwf49715.2020.9221072.

Yu, Y. 2018. Towards sample efficient reinforcement learning. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. IJCAI’18. Stockholm, Sweden. AAAI Press, 5739–5743. ISBN:9780999241127.

Zheng, G., Zhang, F., Zheng, Z., Xiang, Y., Yuan, N. J., Xie, X. & Li, Z. 2018. DRN: a deep reinforcement learning framework for news recommendation. In WWW’18: Proceedings of the 2018 World Wide Web Conference, 167–176.