Northeastern University Khoury College of Computer Sciences, Boston, MA, USA. E-mails: ruiyang@ccs.neu.edu, lieber@ccs.neu.edu
2020 Volume 35
RESEARCH ARTICLE   Open Access    

Learning self-play agents for combinatorial optimization problems

  • Abstract: Recent progress in reinforcement learning (RL) using self-play has shown remarkable performance with several board games (e.g., Chess and Go) and video games (e.g., Atari games and Dota2). It is plausible to hypothesize that RL, starting from zero knowledge, might be able to gradually approach a winning strategy after a certain amount of training. In this paper, we explore neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka’s Game-Theoretical Semantics, we propose the Zermelo Gamification to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems. A specially designed neural MCTS algorithm is then introduced to train Zermelo game agents. We use a prototype problem for which the ground-truth policy is efficiently computable to demonstrate that neural MCTS is promising.
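The Zermelo Gamification rests on Zermelo's theorem: in a finite, two-player, perfect-information game without draws, one of the players has a winning strategy, computable in principle by backward induction over the game tree. As a minimal illustrative sketch of that determinacy argument (a toy subtraction game of our own choosing, not the paper's prototype problem):

```python
from functools import lru_cache

# Toy Zermelo game: players alternately remove 1 or 2 tokens from a
# pile of n; whoever takes the last token wins. Backward induction
# decides the winner from every position.
@lru_cache(maxsize=None)
def current_player_wins(n):
    # The player to move wins iff some legal move leaves the opponent
    # in a losing position; with no tokens left, the mover has lost.
    return any(not current_player_wins(n - k) for k in (1, 2) if k <= n)
```

The winning strategy itself falls out by recording which move realizes the `any`. Neural MCTS plays the role of this exhaustive recursion for games whose trees are far too large to enumerate.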
    Anthony, T., Tian, Z. & Barber, D. 2017. Thinking fast and slow with deep learning and tree search. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS '17, 5366–5376.

    Auer, P., Cesa-Bianchi, N. & Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2), 235–256.

    Auger, D., Couetoux, A. & Teytaud, O. 2013. Continuous upper confidence trees with polynomial exploration - consistency. In ECML/PKDD (1), Lecture Notes in Computer Science 8188, 194–209. Springer.

    Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A., Gilmer, J., Dahl, G., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N., Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y. & Pascanu, R. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint.

    Bello, I., Pham, H., Le, Q. V., Norouzi, M. & Bengio, S. 2016. Neural combinatorial optimization with reinforcement learning. arXiv preprint.

    Bjornsson, Y. & Finnsson, H. 2009. CadiaPlayer: a simulation-based general game player. IEEE Transactions on Computational Intelligence and AI in Games 1(1), 4–15.

    Browne, C., Powley, E. J., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Liebana, D. P., Samothrakis, S. & Colton, S. 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1), 1–43.

    Fujikawa, Y. & Min, M. 2013. A new environment for algorithm research using gamification. In IEEE International Conference on Electro-Information Technology, EIT 2013, Rapid City, SD, 1–6.

    Genesereth, M., Love, N. & Pell, B. 2005. General game playing: overview of the AAAI competition. AI Magazine 26(2), 62.

    Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning 70, 1263–1272. JMLR.org.

    Hintikka, J. 1982. Game-theoretical semantics: insights and prospects. Notre Dame Journal of Formal Logic 23(2), 219–241.

    Khalil, E., Dai, H., Zhang, Y., Dilkina, B. & Song, L. 2017. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, 6348–6358.

    Kocsis, L. & Szepesvári, C. 2006. Bandit based Monte-Carlo planning. In Proceedings of the 17th European Conference on Machine Learning, ECML '06, 282–293. Springer-Verlag.

    Laterre, A., Fu, Y., Jabri, M. K., Cohen, A.-S., Kas, D., Hajjar, K., Dahl, T. S., Kerkeni, A. & Beguir, K. 2018. Ranked reward: enabling self-play reinforcement learning for combinatorial optimization. arXiv preprint.

    Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540), 529–533.

    Novikov, F. & Katsman, V. 2018. Gamification of problem solving process based on logical rules. In Informatics in Schools. Fundamentals of Computer Science and Software Engineering, Pozdniakov, S. N. & Dagienė, V. (eds). Springer International Publishing, 369–380.

    Racanière, S., Weber, T., Reichert, D. P., Buesing, L., Guez, A., Rezende, D., Badia, A. P., Vinyals, O., Heess, N., Li, Y., Pascanu, R., Battaglia, P., Hassabis, D., Silver, D. & Wierstra, D. 2017. Imagination-augmented agents for deep reinforcement learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS '17, 5694–5705. Curran Associates Inc.

    Rezende, M. & Chaimowicz, L. 2017. A methodology for creating generic game playing agents for board games. In 2017 16th Brazilian Symposium on Computer Games and Digital Entertainment (SBGames), 19–28. IEEE.

    Selsam, D., Lamm, M., Bünz, B., Liang, P., de Moura, L. & Dill, D. L. 2018. Learning a SAT solver from single-bit supervision. arXiv preprint.

    Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T. P., Simonyan, K. & Hassabis, D. 2017a. Mastering Chess and Shogi by self-play with a general reinforcement learning algorithm. CoRR.

    Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K. & Hassabis, D. 2018. A general reinforcement learning algorithm that masters Chess, Shogi, and Go through self-play. Science 362(6419), 1140–1144.

    Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T. & Hassabis, D. 2017b. Mastering the game of Go without human knowledge. Nature 550, 354.

    Sniedovich, M. 2003. OR/MS games: 4. The joy of egg-dropping in Braunschweig and Hong Kong. INFORMS Transactions on Education 4(1), 48–64.

    Vinyals, O., Fortunato, M. & Jaitly, N. 2015. Pointer networks. In Advances in Neural Information Processing Systems 28, Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M. & Garnett, R. (eds). Curran Associates, Inc., 2692–2700.

    Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256.

  • Cite this article

    Ruiyang Xu, Karl Lieberherr. 2020. Learning self-play agents for combinatorial optimization problems. The Knowledge Engineering Review 35(1), doi: 10.1017/S026988892000020X



    • Our implementation is based on an open-source, lightweight framework, AlphaZero General: https://github.com/suragnair/alpha-zero-general.
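For readers who want to plug a new Zermelo game into that framework, the main integration point is a subclass of its `Game` base class. The method names below reflect the repository's interface as we understand it at the time of writing and may change; the one-pile game is a toy stand-in of our own, not the paper's prototype problem:

```python
# Sketch of the Game interface that alpha-zero-general expects a new
# problem to implement (method names per the repo's Game base class;
# treat them as assumptions, not a stable API).
class OnePileGame:
    """Toy stand-in: a pile of n tokens, remove 1 or 2, last take wins."""

    def __init__(self, n=10):
        self.n = n

    def getInitBoard(self):
        return self.n                      # state: tokens remaining

    def getActionSize(self):
        return 2                           # action 0: take 1, action 1: take 2

    def getNextState(self, board, player, action):
        return board - (action + 1), -player

    def getValidMoves(self, board, player):
        return [1 if k + 1 <= board else 0 for k in range(2)]

    def getGameEnded(self, board, player):
        # The previous player took the last token, so `player` has lost.
        return -1 if board == 0 else 0

    def getCanonicalForm(self, board, player):
        return board                       # state is already player-agnostic

    def stringRepresentation(self, board):
        return str(board)                  # hashable key for the MCTS tables
```

The framework's `Coach` then drives self-play, training, and arena evaluation against such a class; consult the repository's README for the exact contract before relying on these signatures.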

    • Theoretically, the exploratory term should be $\sqrt{\frac{\sum_{a^{\prime}}N(s_{i-1},a^{\prime})}{N(s_{i-1},a)+1}}$; however, AlphaZero used the variant $\frac{\sqrt{\sum_{a^{\prime}} N(s_{i-1},a^{\prime})}}{N(s_{i-1},a)+1}$ without any explanation. We tried both in our implementation, and it turns out that only the AlphaZero one works, while the other one quickly converges to an incorrect strategy.
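The difference between the two terms can be made concrete (the function name and the stripped-down, prior-free form are illustrative; only the two formulas come from the footnote above):

```python
import math

def exploration_terms(counts, a):
    """Return (theoretical, alphazero) exploratory terms for action a.

    counts: visit counts N(s, a') for every action a' at the parent.
    theoretical: sqrt( sum(N) / (N[a] + 1) )  -- the UCT-style form
    alphazero:   sqrt( sum(N) ) / (N[a] + 1)  -- the variant AlphaZero uses
    """
    total = sum(counts)
    theoretical = math.sqrt(total / (counts[a] + 1))
    alphazero = math.sqrt(total) / (counts[a] + 1)
    return theoretical, alphazero
```

For an unvisited action (N(s, a) = 0) the two coincide; as N(s, a) grows, the AlphaZero term decays like 1/N rather than 1/sqrt(N), so heavily visited actions lose their exploration bonus much faster.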

    • In our paper, we call the Verifier the Proponent (P) and the Falsifier the Opponent (OP).

    • The experiments reported in this paper use TensorFlow 1.15 with Graphics Processing Unit (GPU) support for training the neural networks. To test reproducibility, we also ran the experiments on TensorFlow 2.0, which did not yet have GPU support in our setup. To our surprise, our algorithms stopped converging to the winning strategy on TensorFlow 2.0. We note this reproducibility problem for researchers who plan to build on our work. We are still trying to find its root cause, which is most likely an implementation bug in TensorFlow 2.0.

    • © Cambridge University Press, 2020