Learning and Go

I must admit I hadn’t taken much notice of DeepMind’s self-imposed challenge to defeat a human Go player before now. Today, their algorithm AlphaGo beat Lee Sedol for the second time, and that made me wonder how they did it. According to Nature, “the average 150-move game contains more possible board configurations — 10^170 — than there are atoms in the Universe, so it can’t be solved by algorithms that search exhaustively for the best move”.

From the same Nature article,

AlphaGo plays in a human way, says Fan. “If no one told me, maybe I would think the player was a little strange, but a very strong player, a real person.” The program seems to have developed a conservative (rather than aggressive) style, adds Toby Manning, a lifelong Go player who refereed the match.

You know how a magic trick inspires wonder? Well, just as knowing how the trick is done spoils it, studying machine learning takes the marvel out of the human-machine Go duel. On the other hand, understanding machine learning makes you admire the achievement even more, knowing that, one, some of the very best minds must have worked on the algorithm; two, the real ‘wonder’ application is ultimately not the game of Go; and three, machine learning is to data what Jethro Tull was to the agricultural revolution.

According to Wikipedia, Jethro Tull “perfected the horse-drawn seed drill in 1701 that economically sowed the seeds in neat rows. He later developed a horse-drawn hoe. Tull’s methods were adopted by many great land owners and helped to provide the basis for modern agriculture”. If you think about it, it’s not so very different. For machine learning, the data has to be prepped first into neat little rows and columns and later ‘harvested’ for ideas.

Pondering it, Go naturally lends itself to the classification problem, more specifically to tree search and the boosting method. Now, game trees are not a new thing; they have been used for chess and for puzzle games such as Rush Hour, the car-parking game app my sister is so good at solving. According to this write-up, minimax tree search can recognise that a node of play gives the opponent a high probability of winning and abandon it, otherwise known as alpha-beta pruning. In some cases, however, “it is simply too expensive to exhaustively search the game tree. The search tree grows approximately as M^N, where M is the number of moves available per turn, and N is the number of total turns in a game”.
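To make the pruning idea concrete, here is a minimal sketch in Python. The game-state interface (legal_moves, play, is_terminal, score) is a hypothetical placeholder rather than any real Go library; the point is how a whole branch gets abandoned once the opponent is guaranteed something better elsewhere.

```python
# A minimal alpha-beta pruning sketch over a generic game tree.
# `state` and its methods are hypothetical placeholders, not a real Go API.

def alphabeta(state, depth, alpha=float("-inf"), beta=float("inf"),
              maximizing=True):
    """Return the minimax value of `state`, skipping branches that
    cannot change the outcome (alpha-beta pruning)."""
    if depth == 0 or state.is_terminal():
        return state.score()                  # heuristic position evaluation
    if maximizing:
        value = float("-inf")
        for move in state.legal_moves():
            value = max(value, alphabeta(state.play(move), depth - 1,
                                         alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                 # opponent never allows this line
                break                         # abandon the node: prune
        return value
    else:
        value = float("inf")
        for move in state.legal_moves():
            value = min(value, alphabeta(state.play(move), depth - 1,
                                         alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break
        return value
```

Pruning helps, but with roughly M ≈ 250 legal moves per turn and N ≈ 150 turns, M^N is still far beyond reach for Go.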

Thus, more than brute force is required. This is where the old (but back in fashion) idea of deep neural networks comes in. What we do is build a ‘matrix of positions’ and another of the ‘probability of winning’. Mapping the positions onto the probability of winning gives a guide to the best next move through a simple binary classification, without any lookahead. This is what’s known as a value network.
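As a toy illustration of what a value network does, here is a sketch in numpy. The board encoding, layer sizes and random weights are invented for illustration; AlphaGo’s actual value network is a deep convolutional network trained on vast numbers of positions.

```python
import numpy as np

# Toy "value network": map a board position to a win probability.
# Encoding and sizes are made up; this is only the shape of the idea.

rng = np.random.default_rng(0)
BOARD = 19 * 19                         # one input per intersection

W1 = rng.normal(0, 0.1, (BOARD, 64))    # hidden-layer weights
b1 = np.zeros(64)
W2 = rng.normal(0, 0.1, 64)             # output weights
b2 = 0.0

def value(position):
    """position: length-361 vector (+1 black, -1 white, 0 empty).
    Returns P(win for the player to move), a number in (0, 1)."""
    h = np.tanh(position @ W1 + b1)      # hidden features
    logit = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid: binary classification

# With such a function, the "guide" is simply: evaluate the position
# after each candidate move and prefer the highest win probability.
```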

My suspicion is that, on top of that, they classify local areas and map those to the probability of winning as well. Doing so would speed up the runtime – but this is just my guess. Is it then all about mapping? No, it’s a search problem, and search is not about maps. Search is about making choices as you explore the tree. This is what’s known as a policy network. Hence, AlphaGo is a hybrid of the two networks.
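A matching toy sketch of a policy network: rather than one win probability, it outputs a probability distribution over all 361 points, telling the search which choices are worth exploring. Again, the single layer and random weights are purely illustrative.

```python
import numpy as np

# Toy "policy network": map a position to move probabilities.

rng = np.random.default_rng(1)
BOARD = 19 * 19

Wp = rng.normal(0, 0.1, (BOARD, BOARD))
bp = np.zeros(BOARD)

def policy(position):
    """Return a length-361 vector of move probabilities (softmax)."""
    logits = position @ Wp + bp
    logits -= logits.max()               # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# During tree search, expand only the few moves the policy rates highly,
# shrinking the effective branching factor M instead of searching all ~250.
```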

For those who wish to understand this in depth, do refer to this. I’ve blogged before on gradient descent here. On the search algorithm, here. From their paper published in Nature, “Mastering the game of Go with deep neural networks and tree search”,

Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play.

That, in a nutshell, is how it is done.
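For the curious, here is that sentence rendered as a schematic in Python: a supervised step that nudges a policy towards a human expert’s move, and a reinforcement step that reinforces every move of a self-play game if it ended in a win. The single-layer policy and the ±1 reward are my own toy stand-ins, not the paper’s architecture.

```python
import numpy as np

# Schematic of the two training stages: supervised learning from expert
# moves, then reinforcement learning from self-play. Illustrative only.

rng = np.random.default_rng(2)
BOARD, LR = 19 * 19, 0.01
W = rng.normal(0, 0.1, (BOARD, BOARD))    # single-layer policy weights

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sl_step(position, expert_move):
    """Supervised step: cross-entropy gradient towards the expert's move."""
    global W
    p = softmax(position @ W)
    grad = np.outer(position, p)          # d(loss)/dW for softmax + CE
    grad[:, expert_move] -= position
    W -= LR * grad

def rl_step(positions, moves, won):
    """REINFORCE step: after a self-play game, nudge every move played
    up if the game was won, down if it was lost."""
    global W
    sign = 1.0 if won else -1.0
    for position, move in zip(positions, moves):
        p = softmax(position @ W)
        grad = np.outer(position, p)
        grad[:, move] -= position
        W -= LR * sign * grad             # reward-weighted policy gradient
```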

To sum up, what did we use to do with data? We used to do the thinking first, build a model, and apply that thinking to the data. With machine learning, we let the computer learn from the data what the best way is to achieve the goal we have set. With one subtle shift of this responsibility from human to machine, many things become possible, including playing Go.

There is still some apprehension in many places about these techniques, often on the grounds of over-fitting. In fact, machine learning is much more transparent about this problem, with many techniques to mitigate it, whereas human ‘priors’ are regarded as pure rather than as the product of previous experience that vastly reduces the true degrees of freedom or introduces look-ahead bias.
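To give one concrete example of those mitigation techniques: the most basic safeguard is to judge the model on data it never trained on, often alongside a penalty that shrinks the model’s freedom. A minimal sketch, on synthetic data invented for illustration:

```python
import numpy as np

# Over-fitting safeguards in miniature: a held-out validation set,
# plus a ridge penalty that shrinks the weights. Synthetic data only.

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = X @ true_w + rng.normal(scale=0.5, size=1000)

split = 800                               # hold out 20% for validation
X_train, y_train = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]

lam = 1.0                                 # ridge penalty strength
w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(20),
                    X_train.T @ y_train)

train_mse = np.mean((X_train @ w - y_train) ** 2)
val_mse = np.mean((X_val @ w - y_val) ** 2)
print(f"train MSE {train_mse:.3f} vs validation MSE {val_mse:.3f}")
# A validation error far above the training error is the over-fitting
# signal: the model is judged on data it could not have memorised.
```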

While sceptics won’t change their minds, many others are quietly ploughing on and utilising its benefits, some for their own private advancement and some in the hope of the betterment of our society.
