doi:10.1017/S026988892000020X

Anthony , T., Tian , Z. & Barber , D.2017. Thinking fast and slow with deep learning and tree search. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, 5366–5376.

Auer , P., Cesa-Bianchi , N. & Fischer , P.2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning47(2), 235–256.

Auger , D., Couetoux , A. & Teytaud , O.2013. Continuous upper confidence trees with polynomial exploration - consistency. In ECML/PKDD (1), Lecture Notes in Computer Science8188, 194–209. Springer.

Battaglia , P. W., Hamrick , J. B., Bapst , V., Sanchez-Gonzalez , A., Zambaldi , V., Malinowski , M., Tacchetti , A., Raposo , D., Santoro , A., Faulkner , R., Gulcehre , C., Song , F., Ballard , A, Gilmer , J., Dahl , G., Vaswani , A., Allen , K., Nash , C., Langston , V., Dyer , C., Heess , N, Wierstra , D., Kohli , P., Botvinick , M., Vinyals , O., Li , Y. & Pascanu , R.2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint .

Bello , I., Pham , H., Le , Q. V., Norouzi , M. & Bengio , S.2016. Neural combinatorial optimization with reinforcement learning. arXiv preprint .

Bjornsson , Y. & Finnsson , H.2009. Cadiaplayer: a simulation-based general game player. IEEE Transactions on Computational Intelligence and AI in Games1(1), 4–15.

Browne , C., Powley , E. J., Whitehouse , D., Lucas , S. M., Cowling , P. I., Rohlfshagen , P., Tavener , S., Liebana , D. P., Samothrakis , S. & Colton , S.2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games4(1), 1–43.

Fujikawa , Y. & Min , M.2013. A new environment for algorithm research using gamification. IEEE International Conference on Electro-Information Technology , EIT 2013, Rapid City, SD, 1–6.

Genesereth , M., Love , N. & Pell , B.2005. General game playing: Overview of the AAAI competition. AI Magazine26(2), 62.

Gilmer , J., Schoenholz , S. S., Riley , P. F., Vinyals , O. & Dahl , G. E.2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning 70, 1263–1272. JMLR. org.

Hintikka , J.1982. Game-theoretical semantics: insights and prospects. Notre Dame Journal of Formal Logic23(2), 219–241.

Khalil , E., Dai , H., Zhang , Y., Dilkina , B. & Song , L.2017. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, 6348–6358.

Kocsis , L. & Szepesvári , C.2006. Bandit based Monte-Carlo planning. In Proceedings of the 17th European Conference on Machine Learning, ECML ’06, 282–293. Springer-Verlag.

Laterre , A., Fu , Y., Jabri , M. K., Cohen , A.-S., Kas , D., Hajjar , K., Dahl , T. S., Kerkeni , A. & Beguir , K.2018. Ranked reward: enabling self-play reinforcement learning for combinatorial optimization. arXiv preprint .

Mnih , V., Kavukcuoglu , K., Silver , D., Rusu , A. A., Veness , J., Bellemare , M. G., Graves , A., Riedmiller , M., Fidjeland , A. K., Ostrovski , G., Petersen , S., Beattie , C., Sadik , A., Antonoglou , I., King , H., Kumaran , D., Wierstra , D., Legg , S. & Hassabis , D.2015. Human-level control through deep reinforcement learning. Nature518(7540), 529–533.

Novikov , F. & Katsman , V.2018. Gamification of problem solving process based on logical rules. In Informatics in Schools. Fundamentals of Computer Science and Software Engineering, Pozdniakov , S. N. & Dagienė , V. (eds). Springer International Publishing, 369–380.

Racanière , S., Weber , T., Reichert , D. P., Buesing , L., Guez , A., Rezende , D., Badia , A. P., Vinyals , O., Heess , N., Li , Y., Pascanu , R., Battaglia , P., Hassabis , D., Silver , D. & Wierstra , D.2017. Imagination-augmented agents for deep reinforcement learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, 5694–5705. Curran Associates Inc.

Rezende , M. & Chaimowicz , L.2017. A methodology for creating generic game playing agents for board games. In 2017 16th Brazilian Symposium on Computer Games and Digital Entertainment (SBGames), 19–28. IEEE.

Selsam , D., Lamm , M., Bunz , B., Liang , P., de Moura , L. & Dill , D. L.2018. Learning a sat solver from single-bit supervision. arXiv preprint .

Silver , D., Hubert , T., Schrittwieser , J., Antonoglou , I., Lai , M., Guez , A., Lanctot , M., Sifre , L., Kumaran , D., Graepel , T., Lillicrap , T. P., Simonyan , K. & Hassabis , D.2017a. Mastering Chess and Shogi by self-play with a general reinforcement learning algorithm. CoRR .

Silver , D., Hubert , T., Schrittwieser , J., Antonoglou , I., Lai , M., Guez , A., Lanctot , M., Sifre , L., Kumaran , D., Graepel , T., Lillicrap , T., Simonyan , K. & Hassabis , D.2018. A general reinforcement learning algorithm that masters Chess, Shogi, and Go through self-play. Science362(6419), 1140–1144.

Silver , D., Schrittwieser , J., Simonyan , K., Antonoglou , I., Huang , A., Guez , A., Hubert , T., Baker , L., Lai , M., Bolton , A., Chen , Y., Lillicrap , T., Hui , F., Sifre , L., van den Driessche , G., Graepel , T. & Hassabis , D.2017b. Mastering the game of Go without human knowledge. Nature550, 354.

Sniedovich , M.2003. OR/MS games: 4. The joy of egg-dropping in Braunschweig and Hong Kong. INFORMS Transactions on Education4(1), 48–64.

Vinyals , O., Fortunato , M. & Jaitly , N.2015. Pointer networks. In Advances in Neural Information Processing Systems, Cortes , C., Lawrence , N. D., Lee , D. D., Sugiyama , M. & Garnett , R. (eds), 28. Curran Associates, Inc., 2692–2700.

Williams , R. J.1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning8, 229–256.