Making Deep Learning More Robust

Imagine how much more efficient lawyers could be if they had the time to read every legal book ever written and review every case ever brought to court. Imagine doctors with the ability to study every advancement published across the world’s medical journals, or consult every medical case, ever. Unfortunately, the human brain cannot store that much information, and it would take decades to achieve these feats.

But a computer, one specifically designed to work like the human mind, could.

Deep learning neural networks are designed to mimic the human brain’s neural connections. They are capable of learning through continuous exposure to huge amounts of data. This allows them to recognize patterns, comprehend complex concepts, and translate high-level abstractions. These networks consist of many layers, each having a different set of weights. The deeper the network, the stronger it is.

Current applications for these networks include medical diagnosis, robotics and engineering, face recognition, and automotive navigation. However, deep learning is still in development – not surprisingly, it is a huge undertaking to get machines to think like humans. In fact, very little is understood about these networks, and months of manual tuning are often required for obtaining excellent performance.

Fuxin Li, assistant professor at the Oregon State University School of Electrical Engineering and Computer Science, and his team are taking on the accuracy of these neural networks under adversarial conditions. Their research focuses on the basic machine learning aspects of deep learning, and how to make general deep learning more robust.

To try to better understand when a deep convolutional neural network (CNN) is going to be right or wrong, Li’s team had to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates can be used as safeguards when utilizing the networks in real life.

“Basically,” explains Li, “trying to make deep learning increasingly self-aware – to be aware of what type of data it has seen, and what type of data it could work on.”

The team looked at recent advances in deep learning, which have greatly improved the capability to recognize images automatically. Those networks, albeit very resistant to overfitting, were discovered to completely fail if some of the pixels in such images were perturbed via an adversarial optimization algorithm.

To a human observer, the image in question may look fine, but the deep network sees otherwise. According to the researchers, those adversarial examples are dangerous if a deep network is utilized into any crucial real application, such as autonomous driving. If the result of the network can be hacked, wrong authentications and other devastating effects would be unavoidable.

In a departure from previous perspectives that focused on improving the classifiers to correctly organize the adversarial examples, the team focused on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. The accuracy for detecting adversarial examples exceeded 96%. Notably, 90% of the adversarials can be detected with a false positive rate of less than 10%.

The benefits of this research are numerous. It is vital for a neural network to be able to identify whether an example comes from a normal or an adversarial distribution. Such knowledge, if available, will help significantly to control behaviors of robots employing deep learning. A reliable procedure can prevent robots from behaving in an undesirable manner because of the false perceptions it made about the environment.

Li gives one example: “In robotics there’s this big issue about robots not doing something based on erroneous perception. It’s important for a robot to know that it’s not making a confident perception. For example, if [the robot] is saying there’s an object over there, but it’s actually a wall, he’ll go to fetch that object, and then he hits a wall.”

Hopefully, Li says, that won’t happen. However, current software and machine learning have been mostly based solely on prediction confidence within the original machine learning framework. Basically, the testing and training data are assumed to be pulled from the same distribution independently, and that can lead to incorrect assumptions.

Better confidence estimates could potentially help avoid incidents such as the Tesla crash scenario from May 2016, where an adversarial example (truck with too much light) was in the middle of the highway that cheated the system. A confidence estimate could potentially solve that issue. But first, the computer must be smarter. The computer has to learn to detect objects and differentiate, say, a tree from another vehicle.

“To make it really robust, you need to account for unknown objects. Something weird may hit you. A deer may jump out.” The network can’t be taught every unexpected situation, says Li, “so you need it to discover them without knowledge of what they are. That’s something that we do. We try to bridge the gap.”

Training procedures will make deep learning more automatic and lead to fewer failures, as well as confidence estimates when the deep network is utilized to predict new data. Most of this training, explains Li, comes from photo distribution using stock images. However, these are flat images much different than what a robot would normally see in day-to-day life. It’s difficult to get a 360-degree view just by looking at photos.

“There will be a big difference between the thing [the robot] trains on and the thing it really sees. So then, it is important for the robot to understand that it can predict some things confidently, and others it cannot,” says Li. “[The robot] needs to understand that it probably predicted wrong, so as not to act too aggressively toward its prediction.” This can only be achieved with a more self-aware framework, which is what Li is trying to develop with this grant.

Further, these estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.

Soon, Li and his team will start generalizing the approach to other domains, such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.

Li’s most recent update on this work can be found here.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Artificial Intelligence and the King Midas Problem

Value alignment. It’s a phrase that often pops up in discussions about the safety and ethics of artificial intelligence. How can scientists create AI with goals and values that align with those of the people it interacts with?

Very simple robots with very constrained tasks do not need goals or values at all. Although the Roomba’s designers know you want a clean floor, Roomba doesn’t: it simply executes a procedure that the Roomba’s designers predict will work—most of the time. If your kitten leaves a messy pile on the carpet, Roomba will dutifully smear it all over the living room. If we keep programming smarter and smarter robots, then by the late 2020s, you may be able to ask your wonderful domestic robot to cook a tasty, high-protein dinner. But if you forgot to buy any meat, you may come home to a hot meal but find the aforementioned cat has mysteriously vanished. The robot, designed for chores, doesn’t understand that the sentimental value of the cat exceeds its nutritional value.

AI and King Midas

Stuart Russell, a renowned AI researcher, compares the challenge of defining a robot’s objective to the King Midas myth. “The robot,” says Russell, “has some objective and pursues it brilliantly to the destruction of mankind. And it’s because it’s the wrong objective. It’s the old King Midas problem.”

This is one of the big problems in AI safety that Russell is trying to solve. “We’ve got to get the right objective,” he explains, “and since we don’t seem to know how to program it, the right answer seems to be that the robot should learn – from interacting with and watching humans – what it is humans care about.”

Russell works from the assumption that the robot will solve whatever formal problem we define. Rather than assuming that the robot should optimize a given objective, Russell defines the problem as a two-player game (“game” as used by economists, meaning a decision problem with multiple agents) called cooperative inverse reinforcement learning (CIRL).

A CIRL game includes a person and a robot: the robot’s only purpose is to make the person happy, but it doesn’t know what the person wants. Fortunately, it can learn more about what the person wants by observing her behavior. For example, if a robot observed the human’s morning routine, it should discover how important coffee is—not to itself, of course (we don’t want robots drinking coffee), but to the human. Then, it will make coffee for the person without being asked.

What about the example of King Midas, who thought he wanted as much gold as possible? How does the approach deal with someone stating their wishes incorrectly?

“We don’t have to know what we want. That’s the whole point. It’s hard to know what we want, and what we ask for is evidence for what we really want, but it’s not ground truth,” says Russell.

The robot should learn what people want by watching their actions. Then, if someone asks for something that’s technically incorrect, the robot will understand their underlying wish. Russell adds, “The robot—if it had been there—would have told Midas that he didn’t really want everything turned to gold, maybe just a few choice objects that he might point at from time to time and say the magic word.”

AI Off-Switch

Russell and his Berkeley colleagues also recently announced further progress toward ensuring safe AI, with a paper on ensuring an AI’s off-switch is always accessible. An off-switch is often suggested to guarantee a robot doesn’t harm humans – if the robot goes bad, just turn it off! But what if the AI anticipates that and disables its off switch?

Russell and his team propose solving this by providing a level of uncertainty in the robot’s objective. They created another CIRL game in which the robot chooses to announce its intentions, which gives the human a chance to switch it off, or to plunge ahead with its plan, bypassing human oversight.

They showed that a robot with a fixed objective will take the second choice and disable its off-switch. But if there is uncertainty about the objective, the robot will reason that a human decision to switch the robot off would imply the announced intention is undesirable. Because its goal is to make the human happy, the robot accepts being switched off. The robot has a positive incentive not to disable the off-switch, and that incentive is directly tied to the robot’s uncertainty about the human’s true objective.

Ensuring AI Safety

In addition to his research, Russell is also one of the most vocal and active AI safety researchers concerned with ensuring a stronger public understanding of the potential issues surrounding AI development.

He recently co-authored a rebuttal to an article in the MIT Technology Review, which claimed that real AI scientists weren’t worried about the existential threat of AI. Russell and his co-author summed up why it’s better to be cautious and careful than just assume all will turn out for the best:

“Our experience with Chernobyl suggests it may be unwise to claim that a powerful technology entails no risks. It may also be unwise to claim that a powerful technology will never come to fruition. On September 11, 1933, Lord Rutherford, perhaps the world’s most eminent nuclear physicist, described the prospect of extracting energy from atoms as nothing but “moonshine.” Less than 24 hours later, Leo Szilard invented the neutron-induced nuclear chain reaction; detailed designs for nuclear reactors and nuclear weapons followed a few years later. Surely it is better to anticipate human ingenuity than to underestimate it, better to acknowledge the risks than to deny them. … [T]he risk [of AI] arises from the unpredictability and potential irreversibility of deploying an optimization process more intelligent than the humans who specified its objectives.”

This summer, Russell received a grant of over $5.5 million from the Open Philanthropy Project for a new research center, the Center for Human-Compatible Artificial Intelligence, in Berkeley. Among the primary objectives of the Center will be to study this problem of value alignment, to continue his efforts toward provably beneficial AI, and to ensure we don’t make the same mistakes as King Midas.

“Look,” he says, “if you were King Midas, would you want your robot to say, ‘Everything turns to gold? OK, boss, you got it.’ No! You’d want it to say, ‘Are you sure? Including your food, drink, and relatives? I’m pretty sure you wouldn’t like that. How about this: you point to something and say ‘Abracadabra Aurificio’ or something, and then I’ll turn it to gold, OK?’”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Supervising AI Growth

When Apple released its software application, Siri, in 2011, iPhone users had high expectations for their intelligent personal assistants. Yet despite its impressive and growing capabilities, Siri often makes mistakes. The software’s imperfections highlight the clear limitations of current AI: today’s machine intelligence can’t understand the varied and changing needs and preferences of human life.

However, as artificial intelligence advances, experts believe that intelligent machines will eventually – and probably soon – understand the world better than humans. While it might be easy to understand how or why Siri makes a mistake, figuring out why a superintelligent AI made the decision it did will be much more challenging.

If humans cannot understand and evaluate these machines, how will they control them?

Paul Christiano, a Ph.D. student in computer science at UC Berkeley, has been working on addressing this problem. He believes that to ensure safe and beneficial AI, researchers and operators must learn to measure how well intelligent machines do what humans want, even as these machines surpass human intelligence.


Semi-supervised Learning

The most obvious way to supervise the development of an AI system also happens to be the hard way. As Christiano explains: “One way humans can communicate what they want, is by spending a lot of time digging down on some small decision that was made [by an AI], and try to evaluate how good that decision was.”

But while this is theoretically possible, the human researchers would never have the time or resources to evaluate every decision the AI made. “If you want to make a good evaluation, you could spend several hours analyzing a decision that the machine made in one second,” says Christiano.

For example, suppose an amateur chess player wants to understand a better chess player’s previous move. Merely spending a few minutes evaluating this move won’t be enough, but if she spends a few hours she could consider every alternative and develop a meaningful understanding of the better player’s moves.

Fortunately for researchers, they don’t need to evaluate every decision an AI makes in order to be confident in its behavior. Instead, researchers can choose “the machine’s most interesting and informative decisions, where getting feedback would most reduce our uncertainty,“ Christiano explains.

“Say your phone pinged you about a calendar event while you were on a phone call,” he elaborates, “That event is not analogous to anything else it has done before, so it’s not sure whether it is good or bad.” Due to this uncertainty, the phone would send the transcript of its decisions to an evaluator at Google, for example. The evaluator would study the transcript, ask the phone owner how he felt about the ping, and determine whether pinging users during phone calls is a desirable or undesirable action. By providing this feedback, Google teaches the phone when it should interrupt users in the future.

This active learning process is an efficient method for humans to train AIs, but what happens when humans need to evaluate AIs that exceed human intelligence?

Consider a computer that is mastering chess. How could a human give appropriate feedback to the computer if the human has not mastered chess? The human might criticize a move that the computer makes, only to realize later that the machine was correct.

With increasingly intelligent phones and computers, a similar problem is bound to occur. Eventually, Christiano explains, “we need to handle the case where AI systems surpass human performance at basically everything.”

If a phone knows much more about the world than its human evaluators, then the evaluators cannot trust their human judgment. They will need to “enlist the help of more AI systems,” Christiano explains.


Using AIs to Evaluate Smarter AIs

When a phone pings a user while he is on a call, the user’s reaction to this decision is crucial in determining whether the phone will interrupt users during future phone calls. But, as Christiano argues, “if a more advanced machine is much better than human users at understanding the consequences of interruptions, then it might be a bad idea to just ask the human ‘should the phone have interrupted you right then?’” The human might express annoyance at the interruption, but the machine might know better and understand that this annoyance was necessary to keep the user’s life running smoothly.

In these situations, Christiano proposes that human evaluators use other intelligent machines to do the grunt work of evaluating an AI’s decisions. In practice, a less capable System 1 would be in charge of evaluating the more capable System 2. Even though System 2 is smarter, System 1 can process a large amount of information quickly, and can understand how System 2 should revise its behavior. The human trainers would still provide input and oversee the process, but their role would be limited.

This training process would help Google understand how to create a safer and more intelligent AI – System 3 – which the human researchers could then train using System 2.

Christiano explains that these intelligent machines would be like little agents that carry out tasks for humans. Siri already has this limited ability to take human input and figure out what the human wants, but as AI technology advances, machines will learn to carry out complex tasks that humans cannot fully understand.


Can We Ensure that an AI Holds Human Values?

As Google and other tech companies continue to improve their intelligent machines with each evaluation, the human trainers will fulfill a smaller role. Eventually, Christiano explains, “it’s effectively just one machine evaluating another machine’s behavior.”

Ideally, “each time you build a more powerful machine, it effectively models human values and does what humans would like,” says Christiano. But he worries that these machines may stray from human values as they surpass human intelligence. To put this in human terms: a complex intelligent machine would resemble a large organization of humans. If the organization does tasks that are too complex for any individual human to understand, it may pursue goals that humans wouldn’t like.

In order to address these control issues, Christiano is working on an “end-to-end description of this machine learning process, fleshing out key technical problems that seem most relevant.” His research will help bolster the understanding of how humans can use AI systems to evaluate the behavior of more advanced AI systems. If his work succeeds, it will be a significant step in building trustworthy artificial intelligence.

You can learn more about Paul Christiano’s work here.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Grants Timeline

Grants F.A.Q.

Grants RFP Overview

Grants Program Press Release

New International Grants Program Jump-Starts Research to Ensure AI Remains Beneficial

Elon-Musk-backed program signals growing interest in new branch of artificial intelligence research

July 1, 2015
Amid rapid industry investment in developing smarter artificial intelligence, a new branch of research has begun to take off aimed at ensuring that society can reap the benefits of AI while avoiding potential pitfalls.

The Boston-based Future of Life Institute (FLI) announced the selection of 37 research teams around the world to which it plans to award about $7 million from Elon Musk and the Open Philanthropy Project as part of a first-of-its-kind grant program dedicated to “keeping AI robust and beneficial”. The program launches as an increasing number of high-profile figures including Bill Gates, Elon Musk and Stephen Hawking voice concerns about the possibility of powerful AI systems having unintended, or even potentially disastrous, consequences. The winning teams, chosen from nearly 300 applicants worldwide, will research a host of questions in computer science, law, policy, economics, and other fields relevant to coming advances in AI.

The 37 projects being funded include:

  • Three projects developing techniques for AI systems to learn what humans prefer from observing our behavior, including projects at UC Berkeley and Oxford University
  • A project by Benja Fallenstein at the Machine Intelligence Research Institute on how to keep the interests of superintelligent systems aligned with human values
  • A project led by Manuela Veloso from Carnegie Mellon University on making AI systems explain their decisions to humans
  • A study by Michael Webb of Stanford University on how to keep the economic impacts of AI beneficial
  • A project headed by Heather Roff studying how to keep AI-driven weapons under “meaningful human control”
  • A new Oxford-Cambridge research center for studying AI-relevant policy

As Skype founder Jaan Tallinn, one of FLI’s founders, has described this new research direction, “Building advanced AI is like launching a rocket. The first challenge is to maximize acceleration, but once it starts picking up speed, you also need to to focus on steering.”

When the Future of Life Institute issued an open letter in January calling for research on how to keep AI both robust and beneficial, it was signed by a long list of AI researchers from academia, nonprofits and industry, including AI research leaders from Facebook, IBM, and Microsoft and the founders of Google’s DeepMind Technologies. It was seeing that widespread agreement that moved Elon Musk to seed the research program that has now begun.

“Here are all these leading AI researchers saying that AI safety is important”, said Musk at the time. “I agree with them, so I’m today committing $10M to support research aimed at keeping AI beneficial for humanity.”

“I am glad to have an opportunity to carry this research focused on increasing the transparency of AI robotic systems,” said Manuela Veloso, past president of the Association for the Advancement of Artificial Intelligence (AAAI) and winner of one of the grants.

“This grant program was much needed: because of its emphasis on safe AI and multidisciplinarity, it fills a gap in the overall scenario of international funding programs,” added Prof. Francesca Rossi, president of the International Joint Conference on Artificial Intelligence (IJCAI), also a grant awardee.

Tom Dietterich, president of the AAAI, described how his grant — a project studying methods for AI learning systems to self-diagnose when failing to cope with a new situation — breaks the mold of traditional research:

“In its early days, AI research focused on the ‘known knowns’ by working on problems such as chess and blocks world planning, where everything about the world was known exactly. Starting in the 1980s, AI research began studying the ‘known unknowns’ by using probability distributions to represent and quantify the likelihood of alternative possible worlds. The FLI grant will launch work on the ‘unknown unknowns’: How can an AI system behave carefully and conservatively in a world populated by unknown unknowns — aspects that the designers of the AI system have not anticipated at all?”

As Terminator Genisys debuts this week, organizers stressed the importance of separating fact from fiction. “The danger with the Terminator scenario isn’t that it will happen, but that it distracts from the real issues posed by future AI”, said FLI president Max Tegmark. “We’re staying focused, and the 37 teams supported by today’s grants should help solve such real issues.”

The full list of research grant winners can be found here. The plan is to fund these teams for up to three years, with most of the research projects starting by September 2015, and to focus the remaining $4M of the Musk-backed program on the areas that emerge as most promising.

FLI has a mission to catalyze and support research and initiatives for safeguarding life and developing optimistic visions of the future, including positive ways for humanity to steer its own course considering new technologies and challenges.

Contacts at the Future of Life Institute:

  • Max Tegmark:
  • Meia Chita-Tegmark:
  • Jaan Tallinn:
  • Anthony Aguirre:
  • Viktoriya Krakovna:
  • Jesse Galef:


Elon Musk donates $10M to keep AI beneficial

Thursday January 15, 2015

We are delighted to report that technology inventor Elon Musk, creator of Tesla and SpaceX, has decided to donate $10M to the Future of Life Institute to run a global research program aimed at keeping AI beneficial to humanity.

There is now a broad consensus that AI research is progressing steadily, and that its impact on society is likely to increase. A long list of leading AI-researchers have signed an open letter calling for research aimed at ensuring that AI systems are robust and beneficial, doing what we want them to do. Musk’s donation aims to support precisely this type of research: “Here are all these leading AI researchers saying that AI safety is important”, says Elon Musk. “I agree with them, so I’m today committing $10M to support research aimed at keeping AI beneficial for humanity.”

Musk’s announcement was welcomed by AI leaders in both academia and industry:

“It’s wonderful, because this will provide the impetus to jump-start research on AI safety”, said AAAI president Tom Dietterich. “This addresses several fundamental questions in AI research that deserve much more funding than even this donation will provide.”

“Dramatic advances in artificial intelligence are opening up a range of exciting new applications”, said Demis Hassabis, Shane Legg and Mustafa Suleyman, co-founders of DeepMind Technologies, which was recently acquired by Google. “With these newfound powers comes increased responsibility. Elon’s generous donation will support researchers as they investigate the safe and ethical use of artificial intelligence, laying foundations that will have far reaching societal impacts as these technologies continue to progress”.

Elon Musk and AAAI President Thomas Dietterich comment on the announcement
The $10M program will be administered by the Future of Life Institute, a non-profit organization whose scientific advisory board includes AI-researchers Stuart Russell and Francesca Rossi. “I love technology, because it’s what’s made 2015 better than the stone age”, says MIT professor and FLI president Max Tegmark. “Our organization studies how we can maximize the benefits of future technologies while avoiding potential pitfalls.”

The research supported by the program will be carried out around the globe via an open grants competition, through an application portal at that will open by Thursday January 22. The plan is to award the majority of the grant funds to AI researchers, and the remainder to AI-related research involving other fields such as economics, law, ethics and policy (a detailed list of examples can be found here). “Anybody can send in a grant proposal, and the best ideas will win regardless of whether they come from academia, industry or elsewhere”, says FLI co-founder Viktoriya Krakovna.

“This donation will make a major impact”, said UCSC professor and FLI co-founder Anthony Aguirre: “While heavy industry and government investment has finally brought AI from niche academic research to early forms of a potentially world-transforming technology, to date relatively little funding has been available to help ensure that this change is actually a net positive one for humanity.”

“That AI systems should be beneficial in their effect on human society is a given”, said Stuart Russell, co-author of the standard AI textbook “Artificial Intelligence: a Modern Approach”. “The research that will be funded under this program will make sure that happens. It’s an intrinsic and essential part of doing AI research.”

Skype-founder Jaan Tallinn, one of FLI’s founders, agrees: “Building advanced AI is like launching a rocket. The first challenge is to maximize acceleration, but once it starts picking up speed, you also need to to focus on steering.”

Along with research grants, the program will also include meetings and outreach programs aimed at bringing together academic AI researchers, industry AI developers and other key constituents to continue exploring how to maximize the societal benefits of AI; one such meeting was held in Puerto Rico last week with many of the open-letter signatories.

“Hopefully this grant program will help shift our focus from building things just because we can, toward building things because they are good for us in the long term”, says FLI co-founder Meia Chita-Tegmark.

Contacts at Future of Life Institute:

  • Max Tegmark:
  • Meia Chita-Tegmark:
  • Jaan Tallinn:
  • Anthony Aguirre:
  • Viktoriya Krakovna:

Contacts among AI researchers:

  • Prof. Tom Dietterich, President of the Association for the Advancement of Artificial Intelligence (AAAI), Director of Intelligent Systems:
  • Prof. Stuart Russell, Berkeley, Director of the Center for Intelligent Systems, and co-author of the standard textbook Artificial Intelligence: a Modern Approach:
  • Prof. Bart Selman, co-chair of the AAAI presidential panel on long-term AI futures:
  • Prof. Francesca Rossi, Professor of Computer Science, University of Padova and Harvard University, president of the International Joint Conference on Artificial Intelligence (IJCAI):
  • Prof. Murray Shanahan, Imperial College:

Max Tegmark interviews Elon Musk about his life, his interest in the future of humanity and the background to his donation