Podcast: Top AI Breakthroughs, with Ian Goodfellow and Richard Mallah

2016 saw some significant AI developments. To talk about the AI progress of the last year, we turned to Richard Mallah and Ian Goodfellow. Richard is the director of AI projects at FLI, a senior advisor to multiple AI companies, and the creator of the highest-rated enterprise text analytics platform. Ian is a research scientist at OpenAI, the lead author of the Deep Learning textbook, and a lead inventor of Generative Adversarial Networks.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety above or read the full transcript here.

Ariel: Two events stood out to me in 2016. The first was AlphaGo, which beat the world’s top Go champion, Lee Sedol, last March. What is AlphaGo, and why was this such an incredible achievement?

Ian: AlphaGo was DeepMind’s system for playing the game of Go. It’s a game where you place stones on a board with two players, the object being to capture as much territory as possible. But there are hundreds of different positions where we can place a stone on each turn. It’s not even remotely possible to use a computer to simulate every possible Go game and figure out how the game will progress in the future. The computer needs to rely on intuition the same way that human Go players can look at a board and get kind of a sixth sense that tells them whether the game is going well or poorly for them, and where they ought to put the next stone. It’s computationally infeasible to explicitly calculate what each player should do next.

Richard: The DeepMind team has one network for what’s called value learning and another deep network for policy learning. The policy is, basically, which places should I evaluate for the next piece. The value network is how good that state is, in terms of the probability that the agent will be winning. And then they do a Monte Carlo tree search, which means it has some randomness and many different paths — on the order of thousands of evaluations. So it’s much more like a human considering a handful of different moves and trying to determine how good those moves would be.
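
As a rough illustration of the idea Richard describes, the sketch below combines a policy prior (which moves look worth evaluating) with running value estimates (how well each move has done so far) in a single-position, PUCT-style selection rule. It is a toy under stated assumptions, not DeepMind's implementation: there is no tree expansion, no neural network, and the names `PRIORS`, `WIN_RATE`, `select_move`, and `search` are hypothetical.

```python
import math
import random

def select_move(priors, visits, value_sums, c_puct=1.0):
    """Choose the move with the highest value estimate plus exploration bonus."""
    total_visits = sum(visits.values())

    def score(move):
        # Average evaluation so far (0.0 until the move has been tried).
        q = value_sums[move] / visits[move] if visits[move] else 0.0
        # Bonus favours moves the policy likes and that have few visits.
        u = c_puct * priors[move] * math.sqrt(total_visits + 1) / (1 + visits[move])
        return q + u

    return max(priors, key=score)

def search(priors, evaluate, n_simulations=1000, seed=0):
    """Run repeated noisy evaluations and return the most-visited move."""
    rng = random.Random(seed)
    visits = {move: 0 for move in priors}
    value_sums = {move: 0.0 for move in priors}
    for _ in range(n_simulations):
        move = select_move(priors, visits, value_sums)
        value_sums[move] += evaluate(move, rng)
        visits[move] += 1
    return max(visits, key=visits.get), visits

# Hypothetical numbers: the policy prefers A, but B actually wins more often.
PRIORS = {"A": 0.6, "B": 0.3, "C": 0.1}
WIN_RATE = {"A": 0.4, "B": 0.7, "C": 0.2}

def noisy_evaluation(move, rng):
    # Stand-in for a value estimate or playout: 1.0 for a win, 0.0 for a loss.
    return 1.0 if rng.random() < WIN_RATE[move] else 0.0

best_move, visit_counts = search(PRIORS, noisy_evaluation)
```

The prior decides where the first evaluations go, and the accumulated evaluations gradually take over, so the search spends most of its budget on the few moves that keep looking good rather than on every legal move.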

Ian: From 2012 to 2015 we saw a lot of breakthroughs where the exciting thing was that AI was able to copy a human ability. In 2016, we started to see breakthroughs that were all about exceeding human performance. Part of what was so exciting about AlphaGo was that AlphaGo did not only learn how to predict what a human expert Go player would do, AlphaGo also improved beyond that by practicing playing games against itself and learning how to be better than the best human player. So we’re starting to see AI move beyond what humans can tell the computer to do.

Ariel: So how will this be applied to applications that we’ll interact with on a regular basis? How will we start to see these technologies and techniques in action ourselves?

Richard: A lot of these techniques are research systems. It’s not necessarily that they’re going to directly go down the pipeline towards productization, but they are helping the models that are implicitly learned inside of AI systems and machine learning systems to get much better.

Ian: There are other strategies for generating new experiences that resemble previously seen experiences. One of them is called WaveNet. It’s a model produced by DeepMind in 2016 for generating speech. If you provide a sentence, just written down, and you’d like to hear that sentence spoken aloud, WaveNet can create an audio waveform that sounds very realistically like a human pronouncing that sentence written down. The main drawback to WaveNet right now is that it’s fairly slow. It has to generate the audio waveform one piece at a time. I believe it takes WaveNet two minutes to produce one second of audio, so it’s not able to make the audio fast enough to hold an interactive conversation.
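
To see why generating "one piece at a time" is slow, here is a minimal sketch of an autoregressive loop, in which every new sample depends on the samples already produced and so cannot be computed in parallel. The predictor `toy_next_sample` is a hypothetical stand-in for a trained model, and the 16,000 samples-per-second rate is an assumption used only for the arithmetic.

```python
def generate(next_sample, n_samples, context=4):
    """Produce samples sequentially; each step sees only earlier output."""
    audio = []
    for _ in range(n_samples):
        history = audio[-context:]           # the most recent samples
        audio.append(next_sample(history))   # one model call per sample
    return audio

def toy_next_sample(history):
    # Hypothetical stand-in for a trained network: half the recent mean plus 1.
    mean = sum(history) / len(history) if history else 0.0
    return 0.5 * mean + 1.0

samples = generate(toy_next_sample, 3)

# Rough arithmetic, assuming 16,000 samples per second of audio and the
# "two minutes per second of audio" figure quoted above.
SAMPLE_RATE = 16_000
SECONDS_OF_COMPUTE_PER_SECOND_OF_AUDIO = 120
seconds_per_sample = SECONDS_OF_COMPUTE_PER_SECOND_OF_AUDIO / SAMPLE_RATE
```

Under those assumptions each sample costs about 7.5 milliseconds, and real-time speech would need the loop to run roughly 120 times faster.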

Richard: And similarly, we’ve seen applications to colorizing black and white photos, or turning sketches into somewhat photo-realistic images, being able to turn text into images.

Ian: Yeah, one thing that really highlights how far we’ve come is that in 2014, one of the big breakthroughs was the ability to take a photo and produce a sentence summarizing what was in the photo. In 2016, we saw different methods for taking a sentence and producing a photo that contains the imagery described by the sentence. It’s much more complicated to go from a few words to a very realistic image containing thousands or millions of pixels than it is to go from the image to the words.

Another thing that was very exciting in 2016 was the use of generative models for drug discovery. Instead of imagining new images, the model could actually imagine new molecules that are intended to have specific medicinal effects.

Richard: And this is pretty exciting because this is being applied towards cancer research, developing potential new cancer treatments.

Ariel: And then there was Google’s language translation program, Google Neural Machine Translation. Can you talk about what that did and why it was a big deal?

Ian: It’s a big deal for two different reasons. First, Google Neural Machine Translation is a lot better than previous approaches to machine translation. Google Neural Machine Translation removes a lot of the human design elements, and just has a neural network figure out what to do.

The other thing that’s really exciting about Google Neural Machine Translation is that the machine translation models have developed what we call an “Interlingua.” It used to be that if you wanted to translate from Japanese to Korean, you had to find a lot of sentences that had been translated from Japanese to Korean before, and then you could train a machine learning model to copy that translation procedure. But now, if you already know how to translate from English to Korean, and you know how to translate from English to Japanese, in the middle, you have Interlingua. So you translate from English to Interlingua and then to Japanese, English to Interlingua and then to Korean. You can also just translate Japanese to Interlingua and Korean to Interlingua and then Interlingua to Japanese or Korean, and you never actually have to get translated sentences from every pair of languages.
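
The benefit of a shared intermediate representation can be sketched with a deliberately simple toy. Here each language maps words to shared concept IDs, so Japanese to Korean works without any Japanese-Korean examples. This is only an illustration of the idea: the real system is a neural network, not a lookup table, and the vocabularies and function names below are hypothetical.

```python
# Hypothetical toy vocabularies: every language maps words to shared concept IDs.
TO_CONCEPT = {
    "en": {"cat": 0, "water": 1},
    "ja": {"猫": 0, "水": 1},
    "ko": {"고양이": 0, "물": 1},
}
# Invert each table so concepts can be rendered back into words.
FROM_CONCEPT = {
    lang: {concept: word for word, concept in table.items()}
    for lang, table in TO_CONCEPT.items()
}

def translate(words, src, tgt):
    """Go source -> shared concepts -> target, with no src/tgt-specific table."""
    concepts = [TO_CONCEPT[src][word] for word in words]
    return [FROM_CONCEPT[tgt][concept] for concept in concepts]

def directed_pairs(n_languages):
    # Separate systems needed if every ordered language pair is trained directly.
    return n_languages * (n_languages - 1)

def shared_components(n_languages):
    # One "into the middle" and one "out of the middle" mapping per language.
    return 2 * n_languages

japanese_to_korean = translate(["猫", "水"], "ja", "ko")
```

With 10 languages, the pairwise approach needs 90 directed systems while the shared-middle approach needs 20 mappings, which is the saving Ian is pointing at.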

Ariel: How can the techniques that are used for language apply elsewhere? How do you anticipate seeing this developed in 2017 and onward?

Richard: So I think what we’ve learned from the approach is that deep learning systems are able to create extremely rich models of the world that can actually express what we can think, which is a pretty exciting milestone. Being able to combine that Interlingua with more structured information about the world is something that a variety of teams are working on — it is a big, open area for the coming years.

Ian: At OpenAI one of our largest projects, Universe, allows a reinforcement learning agent to play many different computer games, and it interacts with these games in the same way that a human does, by sending key presses or mouse strokes to the actual game engine. The same reinforcement learning agent is able to interact with basically anything that a human can interact with on a computer. By having one agent that can do all of these different things we will really exercise our ability to create general artificial intelligence instead of application-specific artificial intelligence. And projects like Google’s Interlingua have shown us that there’s a lot of reason to believe that this will work.

Ariel: What else happened this year that you guys think is important to mention?

Richard: One-shot [learning] is when you see just a little bit of data, potentially just one data point, regarding some new task or some new category, and you’re then able to deduce what that class should look like or what that function should look like in general. So being able to train systems on very little data from just general background knowledge will be pretty exciting.

Ian: One thing that I’m very excited about is this new area called machine learning security where an attacker can trick a machine learning system into taking the wrong action. For example, we’ve seen that it’s very easy to fool an object-recognition system. We can show it an image that looks a lot like a panda and it gets recognized as being a school bus, or vice versa. It’s actually possible to fool machine learning systems with physical objects. There was a paper called Accessorize to a Crime that showed that by wearing unusually colored glasses it’s possible to thwart a face recognition system. And my own collaborators at Google Brain and I wrote a paper called Adversarial Examples in the Physical World, where we showed that we can make images that look kind of grainy and noisy, but when viewed through a camera we can control how an object-recognition system will respond to those images.
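
A minimal sketch of why such attacks can be easy, using a toy linear classifier rather than a real object-recognition network. It applies a gradient-sign style perturbation (a technique not named in the interview): every "pixel" moves by a tiny amount in the direction that lowers the class score, and the many tiny changes add up. The weights, image, and `epsilon` are hypothetical values chosen for illustration.

```python
def score(weights, pixels, bias=0.0):
    """Linear class score: positive means class 1, otherwise class 0."""
    return sum(w * x for w, x in zip(weights, pixels)) + bias

def predict(weights, pixels):
    return 1 if score(weights, pixels) > 0 else 0

def sign(value):
    return (value > 0) - (value < 0)

def perturb(weights, pixels, epsilon):
    """Lower the score by moving each pixel at most epsilon against its weight.

    For a linear model the gradient of the score with respect to the input
    is just the weight vector, so sign(w) is the sign of the gradient.
    """
    return [x - epsilon * sign(w) for w, x in zip(weights, pixels)]

# Hypothetical 100-pixel "image" that the toy model assigns to class 1.
WEIGHTS = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
IMAGE = [0.5 + 0.01 * w for w in WEIGHTS]
ADVERSARIAL = perturb(WEIGHTS, IMAGE, epsilon=0.02)

largest_change = max(abs(a - b) for a, b in zip(IMAGE, ADVERSARIAL))
```

No pixel changes by more than 0.02, yet the score moves from about +1 to about -1 and the predicted class flips, because the per-pixel nudges all push in the same direction.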

Ariel: Is there anything else that you thought was either important for 2016 or looking forward to 2017?

Richard: Yeah, looking forward to 2017 I think there will be more focus on unsupervised learning. Most of the world is not annotated by humans. There aren’t little sticky notes on things around the house saying what they are. Being able to process [the world] in a more unsupervised way will unlock a plethora of new applications.

Ian: It will also make AI more democratic. Right now, if you want to use really advanced AI you need to have not only a lot of computers but also a lot of data. That’s part of why it’s mostly very large companies that are competitive in the AI space. If you want to get really good at a task you basically become good at that task by showing the computer a million different examples. In the future, we’ll have AI that can learn much more like a human learns, where just showing it a few examples is enough. Once we have machine learning systems that are able to get the general idea of what’s going on very quickly, in the way that humans do, it won’t be necessary to build these gigantic data sets anymore.

Richard: One application area I think will be important this coming year is automatic detection of fake news, fake audio, fake images, and fake video. Some of the applications this past year have actually focused on generating additional frames of video. As those get better, as the photo generation that we talked about earlier gets better, and also as audio templating gets better… I think it was Adobe that demoed what they called Photoshop for Voice, where you can type something in and select a person, and it will sound like that person saying whatever it is that you typed. So we’ll need ways of detecting that, since this whole concept of fake news is quite at the fore these days.

Ian: It’s worth mentioning that there are other ways of addressing the spread of fake news. Email spam filtering uses a lot of different clues that it can statistically associate with whether people mark the email as spam or not. We can do a lot without needing to advance the underlying AI systems at all.
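
One simple way to "statistically associate clues" with user reports is a word-count model in the style of naive Bayes, sketched below. It is an assumption about how such a filter could work, not a description of any particular product: the training messages are made up, class priors are ignored, and the function names are hypothetical.

```python
import math
from collections import Counter

def train(labeled_messages):
    """Count how often each word appears in spam and in non-spam messages."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in labeled_messages:
        for word in text.lower().split():
            counts[is_spam][word] += 1
            totals[is_spam] += 1
    return counts, totals

def spam_log_odds(text, counts, totals):
    """Sum per-word log ratios; positive leans spam, negative leans non-spam."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    total = 0.0
    for word in text.lower().split():
        # Add-one smoothing keeps unseen words from producing zero probabilities.
        p_spam = (counts[True][word] + 1) / (totals[True] + vocab_size)
        p_ham = (counts[False][word] + 1) / (totals[False] + vocab_size)
        total += math.log(p_spam / p_ham)
    return total

# Made-up examples of what users marked as spam (True) or not (False).
REPORTS = [
    ("win free money now", True),
    ("free prize click now", True),
    ("meeting notes for monday", False),
    ("lunch on monday", False),
]
COUNTS, TOTALS = train(REPORTS)
```

Calling `spam_log_odds("free money", COUNTS, TOTALS)` gives a positive score and `spam_log_odds("monday meeting", COUNTS, TOTALS)` a negative one; the same kind of counting could use other clues, such as the sender or links, in place of words.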

Ariel: Is there anything that you’re worried about, based on advances that you’ve seen in the last year?

Ian: The employment issue. As we’re able to automate our tasks in the future, how will we make sure that everyone benefits from that automation? The way that society is structured right now, increasing automation seems to lead to increasing concentration of wealth, and there are winners and losers to every advance. My concern is that automating jobs that are done by millions of people will create very many losers and a small number of winners who really win big.

Richard: I’m also slightly concerned with the speed at which we’re approaching additional generality. It’s extremely cool to see systems be able to do lots of different things, and being able to do tasks that they’ve either seen very little of or none of before. But it raises questions as to when we implement different types of safety techniques. I don’t think that we’re at that point yet, but it raises the issue.

Ariel: To end on a positive note: looking back on what you saw last year, what has you most hopeful for our future?

Ian: I think it’s really great that AI is starting to be used for things like medicine. In the last year we’ve seen a lot of different machine learning algorithms that could exceed human abilities at some tasks, and we’ve also started to see the application of AI to life-saving application areas like designing new medicines. And this makes me very hopeful that we’re going to start seeing superhuman drug design, and other kinds of applications of AI to just really make life better for a lot of people in ways that we would not have been able to do without it.

Richard: Various kinds of tasks that people find to be drudgery within their jobs will be automatable. That will lead them to be open to working on more value-added things with more creativity, and potentially be able to work in more interesting areas of their field or across different fields. I think the future is wide open and it’s really what we make of it, which is exciting in itself.