
by Jan Frejlak

AlphaGo


In the 1990s the Taiwanese industrialist Ing Chang-ki offered a 1,400,000 dollar prize for the first computer Go-playing program to beat a Chinese Taipei Go professional before the year 2000. Nobody won the prize. The view that Go would be a far harder problem to solve than chess was common among AI researchers. To illustrate the difference, it was often pointed out that a chess game has on average around 38 moves with about 34 choices per move, while a typical Go game has about 180 moves with an average of around 240 choices per move. Raising the branching factor to the power of the game length gives a crude game-tree estimate of roughly 34^38 ≈ 10^58 positions for chess against 240^180 ≈ 10^428 for Go, and the number of imaginable Go games is far greater than the total number of atoms in the visible universe (about 10^80). What also makes computer Go much more difficult is the lack of any obvious evaluation heuristic. When asked how they evaluate a position, Go masters usually answer that they just “feel” whether it is good or bad. Thanks to geometric imagination, the ability to construct abstract concepts and creativity, human players can reduce a multitude of choices to a few candidates worth considering and then find the best one.

All Go-playing programs developed before 2000 were very weak. Even if some of them could beat human beginners in a few games, they were eventually “cracked”: each had a “blind spot” which, once discovered, allowed the human to win every game.

The situation changed slightly towards the end of the first decade of the 21st century, when MCTS (Monte Carlo Tree Search) technology was applied to computer Go. Several programs started to play at the level of strong amateurs, and they were not easy to “crack” either. However, the gap between amateur and professional players in Go is so enormous that developing master-level software still seemed impossible.
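To make the idea concrete, here is a minimal sketch of the UCT variant of MCTS with random playouts. It is written against a toy game (Nim) rather than Go, and the class and function names are our own illustration, not code from any of the programs mentioned above.

```python
import math
import random


class Nim:
    """Toy game: players alternately take 1-3 stones; whoever takes the last stone wins."""

    def __init__(self, stones=9, player=1):
        self.stones, self.player = stones, player

    def legal_moves(self):
        return list(range(1, min(3, self.stones) + 1))

    def play(self, move):
        return Nim(self.stones - move, -self.player)

    def winner(self):
        # The player who took the last stone (the one who just moved) wins.
        return None if self.stones > 0 else -self.player


class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                    # move -> Node
        self.untried = state.legal_moves()    # moves not yet expanded
        self.visits, self.wins = 0, 0.0

    def uct_child(self, c=1.4):
        # UCB1: balance the observed win rate against an exploration bonus.
        return max(self.children.values(),
                   key=lambda n: n.wins / n.visits
                   + c * math.sqrt(math.log(self.visits) / n.visits))


def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend with UCB1 until a node with untried moves (or a leaf).
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one child for a randomly chosen untried move.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            node.children[move] = Node(node.state.play(move), parent=node)
            node = node.children[move]
        # 3. Simulation: play random moves until the game ends.
        state = node.state
        while state.winner() is None:
            state = state.play(random.choice(state.legal_moves()))
        winner = state.winner()
        # 4. Backpropagation: a node's wins are counted for the player who moved into it.
        while node is not None:
            node.visits += 1
            if winner == -node.state.player:
                node.wins += 1
            node = node.parent
    # The move leading to the most visited child is the program's choice.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


if __name__ == "__main__":
    # With perfect play the winning move always leaves a multiple of 4 stones,
    # so from 9 stones the correct move is to take 1.
    print(mcts(Nim(9)))
```

Even this bare-bones version needs no hand-crafted evaluation function: the random playouts alone provide the signal, which is exactly what made MCTS attractive for Go.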

DeepMind's success was possible thanks to an innovative combination of MCTS and DNN (Deep Neural Network) technologies. The researchers started by training networks on a set of 200 thousand strong-amateur games downloaded from the Kiseido Go Server. Next, they cleverly used the pretrained DNNs together with MCTS to improve the networks further. The AlphaGo playing program combines MCTS with two DNNs. The first, called the “policy network”, predicts the moves most likely to be played by human experts in a given position. The “value network” estimates the expected game result (the value of a given game state). The policy network acts like a human expert's “feeling for shape” when the moves to analyse are selected: moves that make “better shape” have a better chance of entering the search tree. The value network supports the Monte Carlo playouts in estimating the value of a “leaf” and behaves like a master's “probabilistic” intuition. MCTS itself plays the role of “creativity” in the learning and playing process: even weird-looking moves get their opportunity “to prove that they are good”.
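The sketch below shows, in simplified form, how such networks can plug into the search: children are chosen with a PUCT-style rule that mixes the policy prior with backed-up value estimates, and the value network stands in for the random playout at the leaf. The names `policy_net`, `value_net`, `PUCTNode` and `puct_search` are our own, the two “networks” are placeholder stubs rather than DeepMind's models, and the code reuses the Nim toy game from the previous sketch.

```python
import math


def policy_net(state):
    """Placeholder policy network: a prior probability for every legal move."""
    moves = state.legal_moves()
    return {m: 1.0 / len(moves) for m in moves}   # uniform prior as a stand-in


def value_net(state):
    """Placeholder value network: expected result in [-1, 1] for the side to move."""
    return 0.0                                    # "no opinion" stand-in


class PUCTNode:
    def __init__(self, state, prior=1.0, parent=None):
        self.state, self.prior, self.parent = state, prior, parent
        self.children = {}                        # move -> PUCTNode
        self.visits, self.value_sum = 0, 0.0

    def q(self):
        # Mean backed-up value, from the point of view of the player who moved into this node.
        return self.value_sum / self.visits if self.visits else 0.0

    def select_child(self, c_puct=1.5):
        # PUCT rule: the policy prior biases exploration towards "good shape" moves,
        # while Q supplies the exploitation term.
        total = math.sqrt(self.visits)
        return max(self.children.values(),
                   key=lambda n: n.q() + c_puct * n.prior * total / (1 + n.visits))

    def expand(self):
        # Create one child per legal move, tagged with its policy prior,
        # and return the value network's estimate of this position.
        for move, p in policy_net(self.state).items():
            self.children[move] = PUCTNode(self.state.play(move), prior=p, parent=self)
        return value_net(self.state)


def puct_search(root_state, simulations=800):
    root = PUCTNode(root_state)
    root.expand()
    for _ in range(simulations):
        node = root
        # Selection: follow the PUCT rule down to an unexpanded node.
        while node.children:
            node = node.select_child()
        # Evaluation: the value network replaces the random playout
        # (AlphaGo actually blended both; here only the network is kept).
        if node.state.winner() is None:
            value = node.expand()
        else:
            value = 1.0 if node.state.winner() == node.state.player else -1.0
        # Backup: each node accumulates values from the point of view of the
        # player who moved into it, so the sign flips at every ply.
        while node is not None:
            node.visits += 1
            node.value_sum -= value   # the mover into `node` is the opponent of the side to move
            value = -value
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


if __name__ == "__main__":
    # Reuses the Nim class from the previous sketch; the winning move from
    # 5 stones leaves a multiple of 4, i.e. take 1.
    print(puct_search(Nim(5)))
```

With real networks the uniform prior would concentrate the search on a handful of promising moves and the value estimate would spare it from playing every game out to the end, which is what lets the combined system scale to Go.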

Our goal is to use a similar approach to tackle a more general problem and develop an AI system that game developers can use to improve the skills of NPCs in various kinds of games (classical, board, card, RPG).