DeepMind’s AlphaGo Zero AI program just became the Go champion of the world without human data or guidance. This new system marks a significant technological leap beyond the AlphaGo program that beat Go champion Lee Sedol in 2016.
The game of Go has been played for more than 2,500 years and is widely viewed as not only a game, but a complex art form. And a popular one at that. When the artificially intelligent AlphaGo from DeepMind played its first game against Sedol in March 2016, 60 million viewers tuned in to watch in China alone. AlphaGo went on to win four of five games, surprising the world and signifying a major achievement in AI research.
Unlike the chess match between Deep Blue and Garry Kasparov in 1997, AlphaGo did not win by brute force computing alone. The more complex programming of AlphaGo amazed viewers not only with the excellence of its play, but also with its creativity. The famous “move 37” in game two was described by Go player Fan Hui as “So beautiful.” It was also so unusual that one of the commentators thought it was a mistake. Fan Hui explained, “It’s not a human move. I’ve never seen a human play this move.”
In other words, AlphaGo not only signified an iconic technological achievement, but also shook deeply held social and cultural beliefs about mastery and creativity. Yet, it turns out that AlphaGo was only the beginning. Today, DeepMind announced AlphaGo Zero.
Unlike AlphaGo, AlphaGo Zero was not shown a single human game of Go from which to learn. AlphaGo Zero learned entirely from playing against itself, with no prior knowledge of the game. Although its first games were random, the system improved each time it played, using what DeepMind is calling a novel form of reinforcement learning that combines a neural network with a powerful search algorithm.
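To make the self-play idea concrete, here is a toy sketch in Python. It is emphatically not DeepMind’s actual algorithm (which uses a deep neural network and Monte Carlo tree search); it only illustrates the core loop the article describes: an agent starts with no knowledge, plays against itself with initially random moves, and reinforces whatever the winner did. The game here is simple Nim (players alternately take 1–3 stones; whoever takes the last stone wins), and the update rule is a crude multiplicative weighting invented for this example.

```python
# Toy illustration of learning purely from self-play (NOT DeepMind's method).
# Game: Nim with a pile of 10 stones; take 1-3 per turn; last stone wins.
import random

random.seed(0)

PILE = 10
# Preference table: prefs[pile][take] -> weight for taking `take` stones.
# All weights start equal, so the first games are effectively random.
prefs = {p: {t: 1.0 for t in (1, 2, 3) if t <= p} for p in range(1, PILE + 1)}

def choose(pile):
    """Sample a move in proportion to current preferences."""
    moves, weights = zip(*prefs[pile].items())
    return random.choices(moves, weights)[0]

def self_play():
    """Play one game against itself; return the move history and the winner."""
    pile, player, history = PILE, 0, []
    while pile > 0:
        move = choose(pile)
        history.append((player, pile, move))
        pile -= move
        player = 1 - player
    return history, 1 - player  # whoever took the last stone wins

# Reinforcement step: strengthen the winner's moves, weaken the loser's.
for _ in range(20000):
    history, winner = self_play()
    for player, pile, move in history:
        prefs[pile][move] *= 1.05 if player == winner else 0.95
        prefs[pile][move] = min(max(prefs[pile][move], 1e-6), 1e6)

# In Nim the optimal opening from 10 is to take 2, leaving a multiple of 4;
# with enough self-play the agent tends to drift toward that choice.
best = max(prefs[10], key=prefs[10].get)
print(best)
```

The point of the sketch is the shape of the loop, not the update rule: no human games are ever consulted, and the only training signal is who won.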
In a DeepMind blog about the announcement, the authors write, “This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself.”
Though previous AIs from DeepMind have mastered Atari games without human input, as the authors of the Nature article note, “the game of Go, widely viewed as the grand challenge for artificial intelligence, [requires] a precise and sophisticated lookahead in vast search spaces.” While the old Atari games were far more straightforward, AlphaGo Zero had to master not only the strategy for immediate moves, but also how to anticipate moves that might be played far into the future.
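“Lookahead” simply means reasoning about how a game could unfold many moves ahead. For a tiny game this can be done exhaustively, as in the minimax-style sketch below (again using simple Nim, where players take 1–3 stones and the last stone wins). Go’s search space is far too vast for this brute-force approach, which is why AlphaGo Zero pairs its lookahead search with a neural network that prunes and guides it.

```python
# Exhaustive lookahead on tiny Nim: can the player to move force a win?
# This brute-force recursion is only feasible because the game is tiny.
from functools import lru_cache

@lru_cache(maxsize=None)
def wins(pile):
    """True if the player to move can force a win (taking the last stone)."""
    # Try every legal move; we win if ANY move leaves the opponent losing.
    return any(not wins(pile - t) for t in (1, 2, 3) if t <= pile)

print(wins(10))  # the first player can force a win from a pile of 10
```

This recursion recovers the known theory of Nim: positions that are a multiple of 4 are losses for the player to move, and everything else is a win.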
That this was all done without human demonstrations takes the program a step beyond the original AlphaGo systems. In addition, this new system learned with fewer input features than its predecessors, and while the original AlphaGo systems required two separate neural networks, AlphaGo Zero was built with only one.
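The consolidation from two networks to one can be pictured as a single shared “trunk” feeding two output heads: one suggesting which moves to consider (the old policy network’s job) and one evaluating who is likely to win (the old value network’s job). The sketch below is a deliberately simplified, hypothetical illustration in plain Python with random untrained weights; the real system is a deep residual network.

```python
# Hypothetical sketch of a single network with two heads (policy + value),
# illustrating the AlphaGo Zero design in miniature. Sizes are arbitrary.
import math
import random

random.seed(1)

BOARD_FEATURES = 9   # e.g. a flattened 3x3 toy board (Go uses far more)
HIDDEN = 16
MOVES = 9

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# One shared trunk, two heads -- instead of two entirely separate networks.
W_shared = rand_matrix(HIDDEN, BOARD_FEATURES)
W_policy = rand_matrix(MOVES, HIDDEN)
W_value = rand_matrix(1, HIDDEN)

def forward(board):
    """Return (move probabilities, position evaluation) from ONE network."""
    h = [max(0.0, z) for z in matvec(W_shared, board)]    # shared representation
    logits = matvec(W_policy, h)
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    policy = [e / sum(exps) for e in exps]                # which moves look good
    value = math.tanh(matvec(W_value, h)[0])              # who is likely to win
    return policy, value

policy, value = forward([0.0] * BOARD_FEATURES)  # an empty toy board
```

Sharing one representation means everything the network learns about a position serves both predictions at once, which is part of why the single-network design is more efficient.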
AlphaGo Zero is not marginally better than its predecessor; it is in an entirely new class of “superhuman performance,” with an intelligence that is notably more general. After just three days of playing against itself (4.9 million games), AlphaGo Zero beat AlphaGo by 100 games to 0. It independently rediscovered the ancient strategies of the masters, but also chose moves and developed strategies never before seen among human players.
Co-founder and CEO of DeepMind, Demis Hassabis, said: “It’s amazing to see just how far AlphaGo has come in only two years. AlphaGo Zero is now the strongest version of our program and shows how much progress we can make even with less computing power and zero use of human data.”
Hassabis continued, “Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems like protein folding or designing new materials. If we can make the same progress on these problems that we have with AlphaGo, it has the potential to drive forward human understanding and positively impact all of our lives.”