[beginning of recorded material]
Ariel: I’m Ariel Conn with the Future of Life Institute. If you’ve been following FLI at all, you know that we’re both very concerned but also very excited about the potential impact artificial intelligence will have on our future. Naturally, we’ve been following the big breakthroughs that occur in the field of AI each year, and 2016 saw some significant developments. To talk about the AI progress we saw last year, I have with me Richard Mallah and Ian Goodfellow. Richard is the director of AI projects at FLI, he’s the Senior Advisor to multiple AI companies, and he created the highest-rated enterprise text analytics platform. Ian is a research scientist at OpenAI, he’s the lead author of a deep learning textbook, and he’s the inventor of Generative Adversarial Networks.
Richard and Ian, thank you so much for joining us.
Richard: Thank you for having us.
Ian: Yeah, thank you for inviting me.
Ariel: So first, before we get into 2016, I wanted to go over a quick recap of a bit about AI history in general, and what some of the past AI breakthroughs have been over the last 50 years or so to give us an idea of what 2016 looked like. So Richard, why don’t we start with you – if you could give us a quick review of what AI has looked like over the years.
Richard: Sure, so I guess we can say that AI really started with Alan Turing in the 1940s, when he discussed the possibilities of these new machines that he was describing as he laid the foundations for computer science – what they would be capable of doing in the future, and their limits. It was really in the mid-1950s that there was a push to actually make this a field. Initially they significantly underestimated how much work it would be to have something that was approaching human-level or able to do things that were impressive. But year by year it became easier because people were building on the body of work from prior years. So actually, in a way, progress has been growing geometrically, by being able to combine things and being able to do some tweaks and new additions and inventions, and improving on everything that came before. Jumping to the 1980s, people were doing a lot of hand-coded features for things like computer vision and for speech recognition. We had early neural networks then, but they were very small and people essentially thought they were toys back then because, mainly, the hardware was pretty slow. So developments in the actual algorithms came not nearly as quickly as we’re seeing them now.
Ariel: Can you explain quickly what a neural net is?
Richard: Actually, Ian, you wrote the book on deep nets so maybe you can have a more clear and concise description than I will.
Ian: So a deep neural network is a metaphor that is designed to capture some of our understanding of the brain in a computer. It’s not an attempt to exactly replicate the way that the brain functions, but we understand that the brain has billions of different neurons. Each of those neurons responds to patterns in its input, and when a long chain of neurons each respond to different patterns that they see, eventually we get neurons that can respond to quite complicated patterns in our sensory experiences of the world. We have a neuron that activates when we see a picture of a specific celebrity, for example. The deep neural networks in computers capture this basic idea of having many different computational units each of which is not very intelligent, but the system as a whole is able to respond to complicated patterns.
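The idea of many simple units combining into something that responds to a more complicated pattern can be sketched in a few lines of Python. This is only an illustration: the function names and the hand-picked weights are hypothetical, and real networks learn their weights from data rather than having them written by hand.

```python
import math

def neuron(inputs, weights, bias):
    """One unit: a weighted sum of its inputs squashed into the range 0..1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """A layer is several units looking at the same inputs side by side."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def network(inputs, layers):
    """Depth comes from feeding each layer's outputs into the next layer."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Hypothetical hand-picked weights: 2 inputs -> 2 hidden units -> 1 output.
toy_layers = [
    ([[4.0, 4.0], [-4.0, -4.0]], [-2.0, 6.0]),
    ([[6.0, 6.0]], [-9.0]),
]
output = network([1.0, 0.0], toy_layers)
```

With these weights the output unit is active only when exactly one of the two inputs is on, a pattern that no single unit of this kind can detect by itself.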
Ariel: Ok, so keeping the history of this progress in mind, how do these deep neural nets compare to the neural nets of the 80s?
Ian: I’m not sure that there’s that big of a dividing line between 1980s neural nets and modern neural nets; the convolutional networks of the late 1980s, especially, were reasonably deep.
Ariel: Ok. So let’s stay in the 80s for just a moment. What else is helpful to know in order to understand where we are today with AI?
Ian: So I guess one trend that we often see in the historical development of AI is that ideas can be quite old, and their realization, their actual execution at very high standards of performance and their commercialization can come decades later. So in the case of the neural network algorithms that we’re seeing be very popular today, the basic model family of the modern convolutional network was developed by Fukushima in the early 1980s, but he did not have a training algorithm that could fit this model to data very effectively. And then Yann LeCun combined that kind of architecture that was inspired by the visual system of the brain with the backpropagation algorithm and was then able to very effectively train this kind of deep network, and that was the birth of the modern convolutional network. But it still took decades of advances in computing power and collection of training data before we got to the point that convolutional networks were able to recognize objects at near-human level the way that we’ve seen in the last five years.
Richard: And it’s not just convolutional networks, but neural networks and deep nets in general that don’t necessarily involve convolution. Convolution is a way of combining different signals in a way that one passes over the other, sort of modifying its shape as per the structure of what we call a kernel, which is essentially a transform. So this is primarily useful for signals that are either one-dimensional like sound or two-dimensional like images, but things that have data points that are very close to each other, it operates locally in that regard. But there are some other things, some other types of data that don’t really have that property. So if we’re analyzing things like document structure, these may be more like long-term conditional dependencies, where something like a recurrent neural network is more appropriate, that probably lacks convolution.
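As a rough illustration of a kernel passing over a signal and operating locally, here is a minimal one-dimensional sketch in Python. The signal and kernel values are made up for the example, and the function follows the common deep learning convention of not flipping the kernel.

```python
def convolve_1d(signal, kernel):
    """Slide the kernel along the signal, taking a weighted sum at each position.

    Each output value depends only on a small neighborhood of the input.
    """
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]

signal = [0, 0, 1, 1, 1, 0, 0]   # a made-up signal with one "bump"
edge_kernel = [-1, 1]            # responds where neighboring values differ
edges = convolve_1d(signal, edge_kernel)
```

Here the kernel produces a nonzero value only where the signal rises or falls, which is the kind of local pattern detection that suits sound and images.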
Ariel: And so then, in sort of the mid-2000s, it sounds like there was some sort of breakthrough that helped speed up, or make more powerful, AI capabilities. Can you talk a little bit about what happened then?
Ian: So one of the main limitations of neural networks was that we could not make them very deep. By depth I mean that the number of different neurons that would be involved in a sequence when we processed any particular input would be relatively limited. So there might be many different neurons arranged side-by-side next to each other all processing the same input. We would not have a long step where one neuron is the input to another neuron and that is the input to another neuron. You can think of this as saying that neural networks were only able to learn very short computer programs. And what we’d really like to do is have neural networks that chain many different neurons together so they can learn longer programs. Until 2006, we had only really been able to train deep networks if we used some special restrictions on the architecture of the network, like using a convolutional network, for example. In 2006, Geoff Hinton and his collaborators figured out a way to train a network that was three layers deep, and did not have any particular restrictions on its connectivity. That ended up forming a revolution that was maybe more based on people’s expectations of what was possible, than on the technology itself. Today we often train networks that are over a thousand layers deep. That’s not necessarily the best-performing network but it can easily be done. We don’t use the same techniques that were developed in 2006, but prior to 2006 many people believed that it was just mathematically infeasible to study deep neural networks, that we should focus our attention on other models that were easier to understand with concrete theories that could predict exactly what the model would learn. Deep neural networks don’t have much of a theoretical underpinning that tells us exactly what the model will learn when we apply it to a particular training set. 
And because computers used to be relatively slow it was difficult to run several different experiments with them, and if the first few experiments failed, then people concluded that neural networks were just brittle things that didn’t really work. A lot of the time in sports we see that as soon as someone sets a record, many other people are able to match that record. For example, the first time that someone ran a four-minute mile that inspired many other people to go and run a four-minute mile. Training deep networks was somewhat similar in the sense that as soon as we had seen a demonstration that it was possible, many other people began to work on the same model family and we found that they weren’t nearly as difficult to use as everyone had previously believed.
Ariel: Ok, and then, this might be my own personal bias because I became more interested in AI in 2014 and 2015, but it seemed like more was happening then. Was that just sort of this general progression, or were there more breakthroughs in those couple of years?
Richard: So there’s sort of constant progression, in fact in this geometric manner, which is mostly behind the scenes or within the field. People in the field are aware of these developments, both evolutionary and revolutionary, but it’s only when something beats a human, or some milestone is reported externally, that the public sees a breakthrough. So once we got to the level of deep learning, around 2006-7 or so, every year there were impressive improvements on the architectures, on the algorithms, and of course on the hardware as well. We would get these things that are actually evolutionary on one level but at another level they look revolutionary, in terms of the quality of the results. So that’s been building up pretty steadily since then.
Ian: I would say that between 2006-2012 there was a lot of energy and excitement, but a lot of the things that were developed in that time period ended up being discarded. Geoff Hinton himself described a lot of the work, even work that he himself did, as being a distraction. To some extent that’s the nature of research; we try out different things and some research ideas end up bearing fruit and go on to form a long line of future advances, and some of the branches of the tree that we explore turn out not to be as useful after we’ve followed them for a little while. Between 2006 and 2012 there were some branches of the tree that we pushed further down than we had ever pushed before, but ultimately we’ve ended up mostly abandoning a lot of those newer branches. And the deep neural networks of 2017 look a lot more like the deep neural networks of the 1980s than they look like the deep neural networks of 2011.
Ariel: Ok, so then that makes it sound like there was stuff happening in 2012, but then also 2013, 2014, 2015. What were some of the big things that happened in those years?
Ian: Yes, definitely. So I would say that 2012 was really a landmark year where everything changed, and somewhat began the trajectory that we’re on today. That was when Ilya [Sutskever], Alex [Krizhevsky] and Geoff [Hinton] won the ImageNet Object Recognition Contest using deep convolutional networks. And when they won that contest, they didn’t just come in first place, they got half the error rate of the second place team. So it was a very clear victory with a very large margin. And they did it using a technique that most people in computer vision had not taken seriously, until relatively recently beforehand. Yann LeCun had famously written letters to organizers of conferences complaining about the difficulty of being allowed to publish convolutional networks in computer vision conferences, and after this breakthrough contest victory by the team from the University of Toronto, computer vision is now nearly entirely the study of convolutional networks. Since that time, people have continued to make lots of advances solving different application areas using more or less the same core technologies that were used for this 2012 victory. And those core technologies are the backpropagation algorithm, deep neural networks, and the gradient descent algorithm. All of those algorithms existed in the 1980s.
Ariel: And so can you tell us a bit about the backpropagation algorithm and the gradient descent algorithm?
Ian: So gradient descent is the idea that if we can write down a mathematical formula that describes how well we are doing, if we can describe what we are doing right now as a list of coordinates on a map somewhere, then we can think of the function that tells us how well we’re doing as being like a valley, and we’re trying to find our way to the bottom of the valley. The lower we go in the valley the better we’re doing. So gradient descent just means that we look at the patch of ground that we’re standing on right now and we figure out which direction is the steepest way to go downhill. And we just keep taking more downhill steps until eventually we end up at the bottom of the valley. That basic algorithmic idea goes back to at least as far as the 1840s and probably further back than that. We know that Augustin Cauchy described this algorithm in a letter to the Royal French Academy of Sciences, but I would not be surprised if Isaac Newton, for example, knew about this algorithm centuries earlier. The difficulty in training a deep neural network is that we can write down an expression that describes how well the network is doing at solving some task like recognizing images. That expression can be something like, the probability that the network assigns the right label to the image. But to figure out which way we go downhill can be complicated because the neural network involves so many different neurons. The backpropagation algorithm is basically just an algorithm for solving the calculus problem that says, ‘which direction is downhill’ for this neural network. How should we adjust all of the connection strengths between neurons in order to make a little tiny step that improves the performance of the neural network? And that backpropagation algorithm was first really articulated and put to the test in the 1980s, even though it’s based on calculus ideas that go back to the 1600s, and algorithm design ideas that go back to the mid-20th century.
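Both ideas fit in a short Python sketch. The valley function, the two-neuron chain, and all names below are assumptions chosen for illustration; real networks have far more weights, but the downhill step and the chain-rule bookkeeping have the same shape.

```python
import math

def descend(grad, start, step_size=0.1, steps=200):
    """Repeatedly step in the steepest downhill direction (minus the gradient)."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - step_size * gi for p, gi in zip(point, g)]
    return point

# A bowl-shaped valley with its bottom at (3, -1): f(x, y) = (x-3)^2 + (y+1)^2.
def valley_grad(p):
    return [2 * (p[0] - 3), 2 * (p[1] + 1)]

bottom = descend(valley_grad, start=[0.0, 0.0])

# Backpropagation on a two-neuron chain: y = w2 * tanh(w1 * x).
def loss(w1, w2, x, target):
    return (w2 * math.tanh(w1 * x) - target) ** 2

def backprop(w1, w2, x, target):
    """Return (dloss/dw1, dloss/dw2) by applying the chain rule backwards."""
    h = math.tanh(w1 * x)            # forward pass, keeping intermediate values
    y = w2 * h
    dloss_dy = 2 * (y - target)      # backward pass starts at the output
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2         # pass the error signal back one neuron
    dloss_dw1 = dloss_dh * (1 - h * h) * x
    return dloss_dw1, dloss_dw2
```

The gradients returned by `backprop` are what `descend` would use to adjust the connection strengths: they say which direction is downhill for each weight.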
Ariel: So using that, let’s go ahead and move into 2016 now. For me, there were two events that did a good job of attracting major news sources and making big headlines. The first was AlphaGo, which beat the world’s top champion in Go last March. I was wondering if you both could talk a little bit about what AlphaGo was, why it was such an incredible achievement to beat Lee Sedol, who was the champion. And one of the things that I thought was interesting is that most AI researchers thought it would still be quite a few years before we could create something that could beat a world champion at Go, so this seemed to surprise a lot of experts. And I’m curious what this means for the future of AI: how does this sort of achievement affect what will happen in 2017 and moving forward? Can AlphaGo’s technology be applied to everyday life?
Ian: So AlphaGo was DeepMind’s system for playing the game of Go. It’s a game where you place stones on a board with two players, the object being to capture as much territory as possible, following a very simple set of rules. But the board is very large, which means that there are hundreds of different positions where we can place a stone on each turn. Compare that to Chess, where for Chess it’s possible to move a piece maybe thirty or so different ways. Because there are hundreds of ways of placing a stone on each turn, there are far more possible games of Go, especially if we consider that the length of an entire game of Go is actually also much longer than the length of an average Chess game. Because there’s this explosion in the number of possible ways that a game can play out, it’s not even remotely possible to use a computer to simulate many different Go games and figure out how the game will progress in the future. The computer needs to rely on intuition the same way that human Go players can look at a board and get kind of a sixth sense that tells them whether the game is going well or poorly for them, and where they ought to put the next stone. It’s just computationally infeasible to tackle this problem by explicitly calculating what each player should do next.
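The scale of that explosion can be estimated with back-of-the-envelope arithmetic. The figures below (about 35 moves per turn over 80 turns for Chess, about 250 moves per turn over 150 turns for Go) are assumed round numbers for illustration, not values given in the conversation.

```python
import math

def log10_game_count(moves_per_turn, turns_per_game):
    """log10 of moves_per_turn ** turns_per_game, a crude count of possible games."""
    return turns_per_game * math.log10(moves_per_turn)

# Assumed round numbers, for illustration only.
chess_digits = log10_game_count(moves_per_turn=35, turns_per_game=80)
go_digits = log10_game_count(moves_per_turn=250, turns_per_game=150)

print(f"chess: about 10^{chess_digits:.0f} games, Go: about 10^{go_digits:.0f} games")
```

Under these assumptions the count for Go has roughly three times as many digits as the count for Chess, which is why brute-force enumeration is out of the question.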
Richard: So what the DeepMind team does is they have one network for what’s called value learning and another deep network for policy learning. So the policy is basically, given the state of the board, which places should I evaluate for putting the next piece. The value network is more like, looking at the state of a board, how good is that state of the world, in terms of, essentially, probability that the agent will be winning. And then they essentially do a tree search, a Monte Carlo tree search, which basically means it has some randomness to it and they try many different paths through this. But it’s on the order of thousands; it’s not nearly the order of evaluations that Deep Blue had for Chess evaluation. So it’s much more like a human considering a handful of different moves and trying to determine how good those moves – perhaps a few moves out – would be, instead of trying to exhaustively search some space.
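The division of labor between a policy and a value estimate can be sketched on a toy game. The sketch below is not AlphaGo's method: it uses a tiny stone-taking game (take 1 to 3 stones, whoever takes the last stone wins), a placeholder `policy` that simply lists legal moves, and a `value` function that stands in for the value network by finishing the game with random moves. It is a flat Monte Carlo search, a much simpler relative of Monte Carlo tree search.

```python
import random

def legal_moves(stones):
    return [take for take in (1, 2, 3) if take <= stones]

def policy(stones, top_k=3):
    """Stand-in for a policy network: propose which moves are worth evaluating."""
    return legal_moves(stones)[:top_k]

def value(stones, rng, playouts=2000):
    """Stand-in for a value network: estimate the chance that the player about
    to move wins, by finishing the game with random moves many times."""
    wins = 0
    for _ in range(playouts):
        remaining, mover_is_us = stones, True
        while remaining > 0:
            remaining -= rng.choice(legal_moves(remaining))
            if remaining == 0 and mover_is_us:
                wins += 1            # we took the last stone
            mover_is_us = not mover_is_us
    return wins / playouts

def choose_move(stones, rng):
    """Pick the candidate whose resulting position looks worst for the opponent."""
    scores = {take: 1.0 - value(stones - take, rng) for take in policy(stones)}
    return max(scores, key=scores.get), scores

best, scores = choose_move(5, random.Random(0))
```

Only a few candidate moves are scored, each by a limited number of random playouts, rather than by enumerating every possible continuation.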
Ian: From 2012 to 2015 we saw a lot of breakthroughs where the exciting thing was that AI was able to copy a human ability. So for example, we saw a lot of work on object recognition where humans would take millions of different images, label what each of those images were – saying, you know, this is a photo of a dog, this is a photo of a cat, and so on. And then the neural network could be trained to copy what the human had done. And we used to get very excited when neural networks copying human demonstrations could get close to human level accuracy on these tasks. In 2016, we started to see breakthroughs that were all about exceeding human performance. Part of what was so exciting about AlphaGo was that AlphaGo did not only learn how to predict what a human expert Go player would do, AlphaGo also improved beyond that by practicing playing games against itself and learning how to be better than the best human player. So we’re starting to see AI move beyond what humans can tell the computer to do. And the AI can actually figure out how to do something better than the best human.
Richard: Playing against itself was a very powerful technique in order to let it improve and figure out its weaknesses. It’s actually kind of similar in a way to generative adversarial networks in that there are different adversarial components.
Ariel: Can you explain both generative adversarial networks but also what adversarial is?
Richard: So I’ll give this to Ian since he invented them.
Ian: So the idea of an AI getting better through self-play dates back very far, in fact at least as far as 1959, when a researcher named Arthur Samuel developed an AI that could play Checkers against itself. That same basic idea is the way that AlphaGo is able to improve on its own abilities and exceed human performance. But the fact that this strategy has been around since 1959, and we were not able to use it to beat the best human player at Go until 2016 shows a lot about how difficult it is to hone our implementation of a core idea, and how important it was to build up data sets, computing infrastructure, and really refine the specifics of how we implement the idea. And when you consider that AlphaGo was the culmination of that many decades of refinement, you see what a difficult problem it was they solved, and how impressive their victory is. This idea of self-play has also been important in other parts of machine learning. I myself am best known for inventing an algorithm called generative adversarial networks. The idea of generative adversarial networks is to create new experiences that resemble past experiences the AI has had. For example, if you train the AI to look at a bunch of images it can imagine new images that appear realistic but have never been seen before. The word adversarial means that there are two different players that are adversaries. One of them is the generator network that actually creates the new images. The other one is the discriminator network. You can think of the discriminator network as being like an art critic that looks at the image and says whether it thinks it’s real or fake. The generator and the discriminator have to play a game against each other where the generator wants to convince the discriminator that everything it produces is real, and the discriminator wants to correctly identify which images are real and came from the training data, and which images are fake and came from the generator.
So just as AlphaGo performs really well because it plays Go against itself for a long time, generative adversarial networks are able to produce very complicated images that exceed the quality of images produced by previous AIs because they’re able to improve through this same process of playing a game against another AI.
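The game between the two networks is usually written as a single score that the discriminator tries to raise and the generator tries to lower. The Python sketch below computes that score for made-up discriminator outputs; the function name and the example probabilities are assumptions, and no actual networks are trained here.

```python
import math

def gan_value(d_on_real, d_on_fake):
    """V(D, G) = mean(log D(x)) + mean(log(1 - D(G(z)))).

    d_on_real and d_on_fake are the discriminator's "probability this is real"
    outputs on training images and on generated images, respectively.
    """
    real_term = sum(math.log(p) for p in d_on_real) / len(d_on_real)
    fake_term = sum(math.log(1.0 - p) for p in d_on_fake) / len(d_on_fake)
    return real_term + fake_term

# A critic that tells real from fake almost perfectly scores close to zero.
sharp_critic = gan_value(d_on_real=[0.99, 0.98], d_on_fake=[0.01, 0.02])

# A critic that can only guess (0.5 everywhere) scores 2 * log(0.5).
fooled_critic = gan_value(d_on_real=[0.5, 0.5], d_on_fake=[0.5, 0.5])
```

The discriminator improves by pushing the score toward the first case, while the generator improves by making images that drag it toward the second.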
Ariel: So how will this be applied to applications that we’ll interact with on a regular basis? AlphaGo makes the news when it achieves some feat, such as beating Lee Sedol, but how will we start to see these technologies and techniques in action ourselves?
Richard: So with a lot of these techniques, a lot of them are research systems. So it’s not necessarily that they’re going to directly go down the pipeline towards productization, but what they are doing is helping the models that are implicitly learned inside of AI systems and machine learning systems to get much better. So even something like these generative adversarial networks that are learning really compact and pretty realistic models of the world, or some slice of the world that you’re interested in, parts of that can then be repurposed. Actually, Ian, do you know of near-term applications for, for instance, GANs-
Ian: -for generative modeling, sure. So I should be clear that there are other kinds of generative models besides generative adversarial networks. We’ve discussed generative adversarial networks because they resemble AlphaGo in the sense that it’s an AI playing a game. There are other strategies for generating new experiences that resemble previously seen experiences. One of them is called WaveNet. It’s a model produced by DeepMind in 2016 for generating speech. So if you provide a sentence, just written down, and you’d like to hear that sentence spoken aloud, WaveNet can create an audio waveform that sounds very realistically like a human pronouncing that sentence written down. The main drawback to WaveNet that prevents it from being deployed on your Android phone right now is that it’s fairly slow. It has to generate the audio waveform one piece at a time. The audio waveform is made up of something like, I forget exactly how many, but tens of thousands of different values per second describing the shape of the waveform that needs to be played over the speakers for you to hear. And each of those tens of thousands of samples needs to be generated one at a time. We can imagine that over the next year or two, we’ll probably see other kinds of generative models, maybe generative adversarial networks, maybe variational auto encoders, figure out a way of producing the audio waveform all in one shot so that you can actually have an interactive conversation with your smartphone. Right now I believe it takes WaveNet two minutes to produce one second of audio, so it’s not able to make the audio fast enough to hold an interactive conversation with you.
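The reason sample-by-sample generation is slow can be seen in a toy loop. This is not WaveNet: the `predict_next` function below is a hypothetical stand-in that uses a simple recurrence to produce a pure tone, and the sample rate of 16,000 per second is an assumed figure. What it shares with the description above is the structure, in which each new sample needs the previous ones, so the model must be called once per sample.

```python
import math

SAMPLE_RATE = 16_000   # assumed samples per second, for illustration
TONE_HZ = 440.0        # hypothetical tone the stand-in model produces
STEP = 2 * math.pi * TONE_HZ / SAMPLE_RATE

def predict_next(history):
    """Hypothetical stand-in for the model: predict the next sample from the
    two previous ones (an exact recurrence for a pure sine tone)."""
    return 2 * math.cos(STEP) * history[-1] - history[-2]

def generate(seconds):
    samples = [0.0, math.sin(STEP)]            # two seed values
    while len(samples) < int(seconds * SAMPLE_RATE):
        samples.append(predict_next(samples))  # one model call per sample
    return samples

audio = generate(0.01)
model_calls_per_second = SAMPLE_RATE
```

Even in this toy, one second of audio takes 16,000 sequential calls that cannot be run in parallel, because each call waits on the output of the one before it.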
Richard: So actually there were some interesting applications of conditional generative adversarial networks this past year that might actually have some immediate commercial applications.
Ariel: And can you explain what the conditional part means as well?
Richard: So for that part I’ll turn it to Ian since he is the master of GANs.
Ian: All right, so a conditional model is a generative model that can produce some specific kind of experience given some input. So WaveNet is an example of a conditional generative model. You give a text sentence and it gives you a spoken waveform that contains the same sentence that was written down in the text you provided as input.
Richard: And similarly, we’ve seen applications to colorizing black and white photos, or turning sketches into somewhat photo-realistic images, being able to turn text into photos. When I say ‘photos’ there, the photos were never taken by any camera; they’re essentially computationally imagined by these generative models.
Ian: Yeah one thing that really highlights how far we’ve come is that in 2014 one of the big breakthroughs that year was several different groups developed the ability to take a photo and produce a sentence summarizing what was in the photo. In 2016, we saw really a lot of people using different methods for taking a sentence and producing a photo that contains the imagery described by the sentence. It’s much more complicated to go from a few words to a very realistic image containing thousands or millions of pixels than it is to go from the image to the words, because when you just write down the word ‘car’, you don’t need to imagine all the specific details of the car. When you need to actually generate an image of the car, you can’t really skimp on any of the details. You need to figure out what color paint the car has, how does the light reflect off of the car, what kind of road is it driving on – all of these many different details that force the model to know a lot more about the world. And 2016 was the year that we started to see those methods actually kind of work. They were there in 2015, but 2016 was when they really started working.
Richard: And I just wanted to interject, we did see some of this last year but the resolution was very, very low. It was something like 16×16, or perhaps 32×32, and it looked kind of fuzzy. Whereas what we’re seeing today are really crisp-looking, almost realistic-looking images at, I think, 256×256.
Ian: Another thing that was very exciting with conditional generative models in 2016 was the use of generative models for drug discovery. That, instead of imagining new images, the model could actually imagine new molecules that are intended to have specific medicinal effects. I don’t know enough about chemistry to tell you very much about that in detail, but we’re starting to see generative models actually get picked up by industries other than the software industry, and used to affect things in the real world. I’ve actually met two researchers from Insilico Medicine who have themselves taken molecules that were developed from generative adversarial networks, and they’ve published a paper about the way that they were able to design these molecules.
Richard: Yeah, and so specifically, generative adversarial auto encoders have been successful at generating new molecular fingerprints given some parameters that we want to have true for the given new molecule. And this is pretty exciting because this is being applied towards cancer research, so developing potential new cancer treatments.
Ariel: Yeah that was one that I heard about that was actually sort of amazing to me. I’m wondering, can you guys foresee other areas like this where we can apply AI that may not seem obvious initially? I wouldn’t have naturally jumped to – it would go from creating images to creating new molecules.
Richard: Frankly, it’s a wide open area. Whatever humans do could in theory be done by, not necessarily today’s systems, but by advanced machine learning and AI systems in general. But in terms of specific tasks that are probably near term, it’s probably whatever people want to focus on. So if it’s things like, similarly designing molecules for building materials instead of drugs. So finding things with superior physical properties…
Ian: One thing we’ve actually started to see in 2016 is turning neural networks back on themselves, using neural networks to design new neural networks. There’s a paper from Google Brain about using reinforcement learning to design the structure of neural networks and make them perform better.
Richard: Yeah, there’s actually a whole bunch of papers this past year on what’s called ‘Auto ML’ or Automatic Machine Learning – also called learning to learn. So, teams from Google Brain, from MIT, from OpenAI, and other places have applied deep learning to deep learning and have applied reinforcement learning to reinforcement learning, and a combination of those. So we’ve actually been able to generate models that are superior to the models that are designed by humans – or I should say, architecture superior to the architecture designed by humans for certain types of problems. And that’s actually new this past year. The whole learning to learn concept had existed before that, but actually getting high quality results like that is new this past year.
Ian: One comment about generating molecules with generative models is that the generative models don’t actually do quite what we would like them to do yet. What they do is they give us more molecules that behave similarly to molecules that we have seen in the past. So if we can come up with a list of ten molecules that we liked a lot, the generative model will give us as many more molecules as we want that it thinks will have similar properties. What we would really like to do is have a machine learning model that gives us much better molecules that are more effective. The research paper from Insilico Medicine partially addresses this idea by having the generative model try to understand the concentration at which the molecule is effective and then they can tell the generative model – imagine that the effective concentration is very low, “Create me some molecules that work at this concentration.” But it isn’t quite yet going for this procedure of optimization, rather than copying past experiences. I think in future years, we’ll start to actually see neural networks that can design totally new things that outperform what they’ve seen in the past, rather than generating new things that replicate the performance of examples they’ve seen in the past.
Ariel: Ok, that’ll be exciting. So, I also wanted to go back to the other one that I thought seemed to make really big headlines, and that was just this past November, with Google’s language translation program– Google Neural Machine Translation. Can you guys talk a little bit about what that did and why that was a big deal?
Ian: Yeah, I am happy to. It’s a big deal for two different reasons. First, Google Neural Machine Translation is quite a lot better than previous, more hand-designed approaches to machine translation. It used to be that we would, as human beings, sit down and look at the structure of the problem and try to write down a piece of software that could translate things relatively effectively, using our ideas about how language is structured. Google Neural Machine Translation removes a lot of the human design elements, and just has a neural network figure out what to do. This is more or less the same pattern that we saw where neural networks replaced hand-designed approaches to object recognition in computer vision, and also to speech recognition. Those are both application areas where the old approaches have more or less all been replaced by neural networks, and now we’re seeing the same thing happen to machine translation. What’s especially exciting is that the results are really good. If you looked at the gap in the score for the translation quality of the old, hand-designed machine translation systems, and measured how far they had to go to get to human-level scores on the translation task, Google Neural Machine Translation moves a little bit more than halfway across that gap. So that was a pretty big step to make all at once.
The other thing that’s really exciting about Google Neural Machine Translation is that the machine translation models have developed what we call an “Interlingua”. The activities of the neurons in the model are essentially a new language that can be used to represent the idea that was contained in any human language – whether you express the idea in English or in French, it becomes encoded in almost the same set of neural activations. And then that set of neural activations can be decoded to any other language that the translation system has been trained on. It used to be that if you wanted to translate from Japanese to Korean, you had to find a lot of sentences that had been translated from Japanese to Korean before, and then you could train a machine learning model to copy that translation procedure. But now, if you already know how to translate from English to Korean, and you know how to translate from English to Japanese, in the middle you have Interlingua. So you translate from English to Interlingua and then to Japanese, English to Interlingua and then to Korean. You can also just translate Japanese to Interlingua and Korean to Interlingua and then Interlingua to Japanese or Korean, and you never actually have to get translated sentences from every pair of languages. This is especially powerful because it means that the whole system can be trained on documents in all of the different languages, so the total number of documents used to train it is a lot larger. That can actually improve performance across the board, so it gets even better at translating languages that it already knew how to translate.
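The pivot idea Ian describes can be sketched in a few lines of Python. This is a toy illustration only, not Google's system: the real model uses learned neural activations as its shared representation, whereas here the "interlingua" is a hand-written table of concept labels. The vocabulary, the concept names, and the `translate` helper are all invented for the example.

```python
# Toy "interlingua": every language maps its words onto shared concept labels.
# Vocabulary and concept names are hypothetical; a real system would learn a
# continuous shared representation instead of using lookup tables.
ENCODERS = {
    "en": {"water": "WATER", "book": "BOOK"},
    "ja": {"水": "WATER", "本": "BOOK"},
    "ko": {"물": "WATER", "책": "BOOK"},
}


def build_decoder(encoder):
    """Invert a word -> concept table into a concept -> word table."""
    return {concept: word for word, concept in encoder.items()}


DECODERS = {lang: build_decoder(enc) for lang, enc in ENCODERS.items()}


def translate(words, source, target):
    """Encode source words into shared concepts, then decode into the target."""
    concepts = [ENCODERS[source][word] for word in words]
    return [DECODERS[target][concept] for concept in concepts]


# No Japanese-Korean table was ever written down, yet the pair works,
# because both languages share the same intermediate representation.
print(translate(["水", "本"], "ja", "ko"))
```

With this design, each language needs one encoder and one decoder, rather than a separate model for every pair of languages.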
Ariel: Ok, so then my next question about that is, how can the techniques that are used for language apply elsewhere? How do you anticipate seeing this developed in 2017 and onward?
Richard: So I think what we’ve learned from the approach is that deep learning systems are able to create extremely rich models of the world that are sufficiently detailed that they can actually express what we can think, which is actually a pretty exciting milestone. Being able to combine that Interlingua with more structured information about the world is something that a variety of teams are working on. And there was some progress this past year, but it is certainly a big, open area for the coming years. Once that occurs – and this actually is a pretty logical next step for this type of system – it would be able to drive all kinds of different natural language tasks. So being able to parse documents much more deeply at a semantic level, being able to drive things like intelligent agents or – they used to be called chatbots, but are much more potentially intelligent now. And so I think we’ll be seeing a lot more of that in the coming year and many years, and that will be pretty exciting.
Ian: One of the really key ideas we can see from Google Neural Machine Translation is that the system became a lot better when it was applied to very many different languages, instead of just translating from one language to one other language. That same idea appears in many different aspects of machine learning and artificial intelligence. At OpenAI one of our largest projects is working on developing a reinforcement learning agent that can function in very many different environments. So far this has mostly been a massive engineering effort, where we built the system for evaluating an agent in many different environments. That evaluation framework is called Universe. We released that in open source in December. Essentially Universe allows a reinforcement learning agent to play very many different computer games, and it interacts with these games in the same way that a human does, by sending key presses or mouse strokes to the actual game engine, and that means that the same reinforcement learning agent is able to interact with basically anything that a human can interact with on a computer. Over the next year we’re planning to work very hard on developing agents that actually use this new framework we’ve built, and we’d like to see agents that are able to try out a few different racing games and then be able to play any racing game, agents that tried a few different platformer games and then can play any platformer game, and even agents that can learn how to use a web browser. By having one agent that can do all of these different things we will really exercise our ability to create general artificial intelligence instead of application-specific artificial intelligence. And projects like Google’s Interlingua have shown us that there’s a lot of reason to believe that this will work, and that using one system to do many different things will cause it to become better at all of them.
Ariel: That sounds awesome. My next question then is just, what else happened this year that you guys think is important to mention?
Richard: Yeah, so I think one important thing is being able to combine deep learning with what’s called one-shot learning – and here we’ve been talking about (sort of) zero-shot learning. Zero-shot is when an entirely new task is attempted after having seen some related information, as Ian was talking about. One-shot is when you see just a little bit of data, potentially just one data point, regarding some new task or some new category, and you’re then able to – based on other background information – deduce what that class should look like or what that function should look like in general. So being able to train systems on very little data from just general background knowledge will be pretty exciting.
DeepMind released a paper in June to this effect. There was also another team based in the U.K. with a paper titled One-Shot Learning in Discriminative Neural Networks. These sorts of developments will feed into this ability to combine what Ian described before as this back-propagatable or differentiable component with this capability of learning from very little data, which I think is quite exciting.
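One common way to picture one-shot learning is nearest-neighbour matching in an embedding space: background knowledge lives in the embedding, and each new class is introduced with a single labelled example. The sketch below assumes that framing and is not the method of either paper Richard mentions. The `embed` function is a stand-in for a pretrained network, and the class names and feature values are invented.

```python
import math


def embed(features):
    """Stand-in for a pretrained network: here it only L2-normalizes the input."""
    norm = math.sqrt(sum(x * x for x in features))
    return [x / norm for x in features]


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def one_shot_classify(query, support):
    """Return the label whose single support example is closest to the query."""
    q = embed(query)
    return max(support, key=lambda label: cosine(q, embed(support[label])))


# One labelled example per new class (hypothetical feature vectors).
support = {
    "zebra": [0.9, 0.1, 0.8],
    "okapi": [0.2, 0.9, 0.7],
}

print(one_shot_classify([0.8, 0.2, 0.9], support))
print(one_shot_classify([0.1, 0.8, 0.6], support))
```

No retraining happens when a class is added; a new entry in `support` is enough, which is what makes the approach workable with very little data.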
Ian: One thing that I’m very excited about is the emergence of the new field of machine learning security. In the past in computer security we’ve seen security topics like application security, where an attacker can fool a computer into running the wrong program instructions, or network security, where an attacker can intercept a message that they’re not meant to have access to, or they can send a message that fools the recipient into thinking they’re someone else, and can get into your bank account, and so on. There’s this new area called machine learning security where an attacker can trick a machine learning system into taking the wrong action. For example, we’ve seen that it’s very easy to fool an object-recognition system. We can show it an image that looks a lot like a panda and it gets recognized as being a school bus, or vice versa. This year we really saw a lot of increasing engagement from the traditional computer security community, and a lot of growth in this field. Some of the most exciting results this year revolved around the idea that it’s actually possible to fool machine learning systems with physical objects. Previously, in 2013-2015, we had worked on changing bits that were fed directly to a machine learning model, but now we can actually build objects that fool a system that sees them. There was a paper called Accessorize to a Crime, that showed that by wearing unusually-colored glasses it’s possible to thwart a face recognition system. And my own collaborators at Google Brain and I wrote a paper called Adversarial Examples in the Physical World, where we showed that we can make images that look kind of grainy and noisy, but when viewed through a camera we can control how an object-recognition system will respond to those images.
Overall I think it’s really great that a lot of people are getting interested in this field of how to fool machine learning models and how to prevent machine learning models from being fooled, because it gives us a concrete path to study some of the really hard AI control problems that we think will become so much of a larger issue in the future, and in particular, problems that I imagine a lot of the Future of Life Institute listeners are most interested in.
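The kind of attack Ian describes can be illustrated on a very small model. The sketch below uses a gradient-sign step against a linear classifier, which is one well-known way to build an adversarial input; it is a toy under stated assumptions, not the procedure from the papers named above. The weights, the four-value "image", and the two labels are all invented.

```python
def score(weights, bias, x):
    """Linear score: positive means 'panda', otherwise 'school bus'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return "panda" if score(weights, bias, x) > 0 else "school bus"


def sign(value):
    return (value > 0) - (value < 0)


def adversarial(weights, x, epsilon):
    """Move every input value by at most epsilon against the score gradient.

    For a linear model the gradient of the score with respect to the input
    is the weight vector itself, so the step is -epsilon * sign(weight).
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical classifier and input.
weights = [0.5, -0.3, 0.8, -0.6]
bias = 0.0
image = [0.6, 0.4, 0.5, 0.3]

fooled = adversarial(weights, image, epsilon=0.2)
print(predict(weights, bias, image))   # label for the original input
print(predict(weights, bias, fooled))  # label after the small perturbation
```

Each input value changes by no more than `epsilon`, yet the many small changes add up in the score, which is why such perturbations can be hard for a person to notice while still flipping the model's answer.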
Ariel: Yeah, definitely. So is there anything else that you guys wanted to mention that you thought was either important for 2016 or looking forward to 2017?
Richard: Yeah, so looking forward to 2017 I think there will be more focus on unsupervised learning. This is where there isn’t labelled data; there isn’t annotated data, necessarily, coming from humans or some other source. Instead, working purely from very raw and unannotated data, the machine learning algorithms will be able to understand structure in the world and will be able to leverage that structure to understand further unstructured data. So we’re looking forward to developments in that this year.
Ariel: And how does that apply to more real-world applications?
Richard: Yeah, so if we are able to get progress there, that can unlock quite a variety of different things. Most of the world is not annotated by humans. There aren’t little sticky notes on things around the house saying what they are, or sticky notes inside a document describing why it’s a table you’re referring to and not the bowling ball that you may have thrown at the table, when trying to resolve what the word ‘it’ is referring to. So being able to process all of this in a much more unsupervised way will pretty much unlock a plethora of new applications.
Ian: And it will also make AI more democratic. Right now, if you want to use really advanced AI you need to have not only a lot of computers but also a lot of data. Part of why it’s mostly very large companies that are competitive in the AI space is that these companies have been able to amass data sets that no one else has access to. If you want to get really good at a task you basically become good at that task by showing the computer a million different examples where a human has shown what you want the computer to do. In the future, we’ll have AI that can learn much more like a human learns, where just showing it a few examples is enough for it to get the basic idea. For example, when I was at Google I worked on a system that can read street address numbers in order to add buildings to Google Maps, and we trained that using over 10 million different images that had been labelled with the address number in the image. In the end it was able to read at human level accuracy. So the outcome is the same as what we can get with a human being, but the learning process was really inefficient. If you teach a child to read, you don’t need to take them on a road trip around the world and show them 10 million different houses that all have address numbers for them to look at. They get the idea from just a few numbers that you show them in kindergarten or at home before kindergarten when you’re teaching them to recognize the different numbers. Once we have machine learning systems that are able to get the general idea of what’s going on very quickly, in the way that humans do, it won’t be necessary to build these gigantic data sets anymore. You might be able to get some competitive advantage from doing so but it won’t be strictly necessary to get acceptable performance.
Ariel: Ok, well I like the idea of ending on a democratic note. Is there anything else either of you wanted to add?
Richard: One application area I think will be important this coming year is automatic detection of fake news, fake audio and fake images and fake video. Some of the applications this past year have actually focused on generating additional frames of video. Those are relatively primitive at the moment in terms of how much video they can generate relative to the amount of video that they’re shown. But as those get better, as the photo generation that we talked about earlier gets better, and also as audio templating gets better… I think it was Adobe that demoed what they colloquially called Photoshop for Voice, where you can type something in and select essentially a font, which is a person, and it will sound like that person saying whatever it is that you typed. So we’ll need ways of detecting that, since this whole concept of fake news is quite at the fore these days.
Ariel: Is that something that AI researchers are also working on? The idea of being able to track whether something has been modified?
Richard: Yeah, I think some are, but I think more need to be. And for that matter, the same goes for just trying to classify actual news articles, even just from the text, as to whether there’s fake news in there or not. It’s kind of an open question as to whether that’s possible with today’s technology.
Ian: It’s worth mentioning that, while it may be very difficult for an AI to read a story and understand that story and think about whether that story is plausible, there are other ways of addressing the spread of fake news. We know that fake news tends to spread through social networks, using the same kind of spammy posting behavior that we see for spreading email spam. Email spam filtering doesn’t work by having an AI read your email, understand what the email means, and reason about whether you really want to see an ad for Viagra or not. A spam filter uses a lot of different clues that it can statistically associate with whether people mark the email as spam or not. And those clues can include things like who the sender is. So it may be that we’ll start to see social networks offer warning labels on specific URLs, saying that certain websites are known for spreading fake news stories and things like that. It’s not only a technological problem, and by just leveraging better policies and better social practices, we can actually do a lot without needing to advance the underlying AI systems at all.
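The statistical-clue approach Ian describes can be sketched with a single clue, the source of a link, and a count of how often people flagged it. This is a minimal illustration under assumptions, not a description of any real social network's system: the domain names, the report data, the smoothing prior, and the warning threshold are all hypothetical.

```python
from collections import Counter


def flag_rates(reports, prior_flagged=1, prior_total=2):
    """Estimate, per source, how often its posts get flagged by users.

    `reports` is a list of (domain, was_flagged) pairs. A small prior is
    added so that a source with only one or two reports is not judged too
    confidently.
    """
    totals, flagged = Counter(), Counter()
    for domain, was_flagged in reports:
        totals[domain] += 1
        flagged[domain] += int(was_flagged)
    return {
        domain: (flagged[domain] + prior_flagged) / (totals[domain] + prior_total)
        for domain in totals
    }


def needs_warning(domain, rates, threshold=0.7):
    """Warn only for sources with a high estimated flag rate; unknown sources get 0.5."""
    return rates.get(domain, 0.5) >= threshold


# Hypothetical user reports: 8 of 9 flagged for one source, 1 of 10 for another.
reports = (
    [("example-hoax.test", True)] * 8 + [("example-hoax.test", False)]
    + [("example-news.test", True)] + [("example-news.test", False)] * 9
)

rates = flag_rates(reports)
print(needs_warning("example-hoax.test", rates))
print(needs_warning("example-news.test", rates))
```

Nothing here reads or understands the story itself; the decision rests entirely on who is posting and how readers have reacted, which is the point of the comparison with spam filtering.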
Ariel: So is there anything that you’re worried about, based on advances that you’ve seen in the last year? Or are you mostly just excited across the board?
Ian: I personally am most concerned about the employment issue. How will we make sure that everyone benefits from automation? The way that society is structured right now, increasing automation seems to lead to increasing concentration of wealth, and there are winners and losers to every advance. My concern is that automating jobs that are done by millions of people will create very many losers and a small number of winners who really win big. And I hope that we’re able to find a solution where the benefits of automation can be shared broadly across society.
Richard: I share that concern as well. I’m also slightly concerned with the speed at which we’re approaching additional generality. It’s extremely cool to see systems be able to do lots of different things, and be able to do them in different modalities and different contexts, and being able to do tasks that they’ve either seen very little of or none of before. But it raises questions as to where we consider the actual line to be, the point past which we get serious and say that systems should strictly implement all of the 200 different types of safety techniques that we’re able to enumerate. So I don’t think that we’re at that point yet, but it certainly raises the issue.
Ariel: Ok. Well I would like to end on a positive note. So we’ve just asked about concerns – looking back on what you saw last year, what has you most hopeful for our future?
Ian: I think it’s really great that AI is starting to be used for things like medicine. A lot of why I got into AI is that I thought that AI is something of a meta-solution, where there may be many problems where it’s very difficult for us to come up with the solution using our human ingenuity, but AI could be even more ingenious than us, and could come up with really great solutions to problems that have exceeded our own ability. So in the last year we’ve seen a lot of different machine learning algorithms that could exceed human abilities at some tasks, and we’ve also started to see the application of AI to life-saving application areas like designing new medicines. And this makes me very hopeful that we’re going to start seeing superhuman drug design, and other kinds of applications of AI to just really make life better for a lot of people in ways that we would not have been able to do without it.
Ariel: Ok, and Richard?
Richard: Yeah, I’m excited about the possibilities of what these advancements will bring in the coming years. Various kinds of tasks that people find to be drudgery within their jobs will be automatable. That will free them to work on more value-added things with more creativity, and potentially be able to work in more interesting areas of their field or across different fields. I think the future is wide open and it’s really what we make of it, which is exciting in itself.
Ariel: Alright, well thank you both so much.
Richard: Thank you, Ariel.
Ian: You’re very welcome, thank you for inviting us.
[end of recorded material]