How Self-Driving Cars Use Probability

Even though human drivers don’t consciously think in terms of probabilities, we observe our environment and make decisions based on the likelihood of certain things happening. A driver doesn’t calculate the probability that the sports car behind her will pass her, but through observing the car’s behavior and considering similar situations in the past, she makes her best guess.

We trust probabilities because it is the only way to take action in the midst of uncertainty.

Autonomous systems such as self-driving cars will make similar decisions based on probabilities, but through a different process. Unlike a human who trusts intuition and experience, these autonomous cars calculate the probability of certain scenarios using data collectors and reasoning algorithms.


How to Determine Probability

Stefano Ermon, a computer scientist at Stanford University, wants to make self-driving cars and autonomous systems safer and more reliable by improving the way they reason probabilistically about their environment. He explains, “The challenge is that you have to take actions and you don’t know what will happen next. Probabilistic reasoning is just the idea of thinking about the world in terms of probabilities, assuming that there is uncertainty.”

There are two main components to achieve safety. First, the computer model must collect accurate data, and second, the reasoning system must be able to draw the right conclusions from the model’s data.

Ermon explains, “You need both: to build a reliable model you need a lot of data, and then you need to be able to draw the right conclusions based on the model, and that requires the artificial intelligence to think about these models accurately. Even if the model is right, but you don’t have a good way to reason about it, you can do catastrophic things.”

For example, in the context of autonomous vehicles, models use various sensors to observe the environment and collect data about countless variables, such as the behavior of the drivers around you, potholes and other obstacles in front of you, weather conditions—every possible data point.

A reasoning system then interprets this data. It uses the model’s information to decide whether the driver behind you is dangerously aggressive, if the pothole ahead will puncture your tire, if the rain is obstructing visibility, and the system continuously changes the car’s behavior to respond to these variables.

Consider the aggressive driver behind you. As Ermon explains, “Somehow you need to be able to reason about these models. You need to come up with a probability. You don’t know what the car’s going to do but you can estimate, and based on previous behavior you can say this car is likely to cut the line because it has been driving aggressively.”


Improving Probabilistic Reasoning

Ermon is creating strong algorithms that can synthesize all of the data that a model produces and make reliable decisions.

As models improve, they collect more information and capture more variables relevant to making these decisions. But as Ermon notes, “the more complicated the model is, the more variables you have, the more complicated it becomes to make the optimal decisions based on the model.”

Thus as the data collection expands, the analysis must also improve. The artificial intelligence in these cars must be able to reason with this increasingly complex data.

And this reasoning can easily go wrong. “You need to be very precise when computing these probabilities,” Ermon explains. “If the probability that a car cuts into your lane is 0.1, but you completely underestimate it and say it’s 0.01, you might end up making a fatal decision.”

To avoid fatal decisions, the artificial intelligence must be robust, but the data must also be complete. If the model collects incomplete data, “you have no guarantee that the number that you get when you run this algorithm has anything to do with the actual probability of that event,” Ermon explains.

The model and the algorithm entirely depend on each other to produce the optimal decision. If the model is incomplete and fails to capture the black ice in front of you, no reasoning system will be able to make a safe decision. And even if the model captures the black ice and every other possible variable, if the reasoning system cannot handle the complexity of this data, again the car will fail.


How Safe Will Autonomous Systems Be?

The technology in self-driving cars has made huge leaps lately, and Ermon is hopeful. “Eventually, as computers get better and algorithms get better and the models get better, hopefully we’ll be able to prevent all accidents,” he suggests.

However, there are still fundamental limitations on probabilistic reasoning. “Most computer scientists believe that it is impossible to come up with the silver bullet for this problem, an optimal algorithm that is so powerful that it can reason about all sorts of models that you can think about,” Ermon explains. “That’s the key barrier.”

But despite this barrier, self-driving cars will soon be available for consumers. Ford, for one, has promised to put its self-driving cars on the road by 2021. And while most computer scientists expect these cars to be far safer than human drivers, their success depends on their ability to reason probabilistically about their environment.

As Ermon explains, “You need to be able to estimate these kinds of probabilities because they are the building blocks that you need to make decisions.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Making Deep Learning More Robust

Imagine how much more efficient lawyers could be if they had the time to read every legal book ever written and review every case ever brought to court. Imagine doctors with the ability to study every advancement published across the world’s medical journals, or consult every medical case, ever. Unfortunately, the human brain cannot store that much information, and it would take decades to achieve these feats.

But a computer, one specifically designed to work like the human mind, could.

Deep learning neural networks are designed to mimic the human brain’s neural connections. They are capable of learning through continuous exposure to huge amounts of data. This allows them to recognize patterns, comprehend complex concepts, and translate high-level abstractions. These networks consist of many layers, each having a different set of weights. The deeper the network, the stronger it is.

Current applications for these networks include medical diagnosis, robotics and engineering, face recognition, and automotive navigation. However, deep learning is still in development – not surprisingly, it is a huge undertaking to get machines to think like humans. In fact, very little is understood about these networks, and months of manual tuning are often required for obtaining excellent performance.

Fuxin Li, assistant professor at the Oregon State University School of Electrical Engineering and Computer Science, and his team are taking on the accuracy of these neural networks under adversarial conditions. Their research focuses on the basic machine learning aspects of deep learning, and how to make general deep learning more robust.

To try to better understand when a deep convolutional neural network (CNN) is going to be right or wrong, Li’s team had to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates can be used as safeguards when utilizing the networks in real life.

“Basically,” explains Li, “trying to make deep learning increasingly self-aware – to be aware of what type of data it has seen, and what type of data it could work on.”

The team looked at recent advances in deep learning, which have greatly improved the capability to recognize images automatically. Those networks, albeit very resistant to overfitting, were discovered to completely fail if some of the pixels in such images were perturbed via an adversarial optimization algorithm.

To a human observer, the image in question may look fine, but the deep network sees otherwise. According to the researchers, those adversarial examples are dangerous if a deep network is utilized into any crucial real application, such as autonomous driving. If the result of the network can be hacked, wrong authentications and other devastating effects would be unavoidable.

In a departure from previous perspectives that focused on improving the classifiers to correctly organize the adversarial examples, the team focused on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. The accuracy for detecting adversarial examples exceeded 96%. Notably, 90% of the adversarials can be detected with a false positive rate of less than 10%.

The benefits of this research are numerous. It is vital for a neural network to be able to identify whether an example comes from a normal or an adversarial distribution. Such knowledge, if available, will help significantly to control behaviors of robots employing deep learning. A reliable procedure can prevent robots from behaving in an undesirable manner because of the false perceptions it made about the environment.

Li gives one example: “In robotics there’s this big issue about robots not doing something based on erroneous perception. It’s important for a robot to know that it’s not making a confident perception. For example, if [the robot] is saying there’s an object over there, but it’s actually a wall, he’ll go to fetch that object, and then he hits a wall.”

Hopefully, Li says, that won’t happen. However, current software and machine learning have been mostly based solely on prediction confidence within the original machine learning framework. Basically, the testing and training data are assumed to be pulled from the same distribution independently, and that can lead to incorrect assumptions.

Better confidence estimates could potentially help avoid incidents such as the Tesla crash scenario from May 2016, where an adversarial example (truck with too much light) was in the middle of the highway that cheated the system. A confidence estimate could potentially solve that issue. But first, the computer must be smarter. The computer has to learn to detect objects and differentiate, say, a tree from another vehicle.

“To make it really robust, you need to account for unknown objects. Something weird may hit you. A deer may jump out.” The network can’t be taught every unexpected situation, says Li, “so you need it to discover them without knowledge of what they are. That’s something that we do. We try to bridge the gap.”

Training procedures will make deep learning more automatic and lead to fewer failures, as well as confidence estimates when the deep network is utilized to predict new data. Most of this training, explains Li, comes from photo distribution using stock images. However, these are flat images much different than what a robot would normally see in day-to-day life. It’s difficult to get a 360-degree view just by looking at photos.

“There will be a big difference between the thing [the robot] trains on and the thing it really sees. So then, it is important for the robot to understand that it can predict some things confidently, and others it cannot,” says Li. “[The robot] needs to understand that it probably predicted wrong, so as not to act too aggressively toward its prediction.” This can only be achieved with a more self-aware framework, which is what Li is trying to develop with this grant.

Further, these estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.

Soon, Li and his team will start generalizing the approach to other domains, such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.

Li’s most recent update on this work can be found here.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

The Financial World of AI

Automated algorithms currently manage over half of trading volume in US equities, and as AI improves, it will continue to assume control over important financial decisions. But these systems aren’t foolproof. A small glitch could send shares plunging, potentially costing investors billions of dollars.

For firms, the decision to accept this risk is simple. The algorithms in automated systems are faster and more accurate than any human, and deploying the most advanced AI technology can keep firms in business.

But for the rest of society, the consequences aren’t clear. Artificial intelligence gives firms a competitive edge, but will these rapidly advancing systems remain safe and robust? What happens when they make mistakes?


Automated Errors

Michael Wellman, a professor of computer science at the University of Michigan, studies AI’s threats to the financial system. He explains, “The financial system is one of the leading edges of where AI is automating things, and it’s also an especially vulnerable sector. It can be easily disrupted, and bad things can happen.”

Consider the story of Knight Capital. On August 1, 2012, Knight decided to try out new software to stay competitive in a new trading pool. The software passed its safety tests, but when Knight deployed it, the algorithm activated its testing software instead of the live trading program. The testing software sent millions of bad orders in the following minutes as Knight frantically tried to stop it. But the damage was done.

In just 45 minutes, Knight Capital lost $440 million – nearly four times their profit in 2011 – all because of one line of code.

In this case, the damage was constrained to Knight, but what happens when one line of code can impact the entire financial system?


Understanding Autonomous Trading Agents

Wellman argues that autonomous trading agents are difficult to control because they process and respond to information at unprecedented speeds, they can be easily replicated on a large scale, they act independently, and they adapt to their environment.

With increasingly general capabilities, systems may learn to make money in dangerous ways that their programmers never intended. As Lawrence Pingree, an analyst at Gartner, said after the Knight meltdown, “Computers do what they’re told. If they’re told to do the wrong thing, they’re going to do it and they’re going to do it really, really well.”

In order to prevent AI systems from undermining market transparency and stability, government agencies and academics must learn how these agents work.


Market Manipulation

Even benign uses of AI can hinder market transparency, but Wellman worries that AI systems will learn to manipulate markets.

Autonomous trading agents are especially effective at exploiting arbitrage opportunities – where they simultaneously purchase and sell an asset to profit from pricing differences. If, for example, a stock trades at $30 in one market and $32 in a second market, an agent can buy the $30 stock and immediately sell it for $32 in the second market, making a $2 profit.

Market inefficiency naturally creates arbitrage opportunities. However, an AI may learn – on its own – to create pricing discrepancies by taking misleading actions that move the market to generate profit.

One manipulative technique is ‘spoofing’ – the act of bidding for a stock item with the intent to cancel the bid before execution. This moves the market in a certain direction, and the spoofer profits from the false signal.

Wellman and his team recently reproduced spoofing in their laboratory models, as part of an effort to understand the situations where spoofing can be effective. He explains, “We’re doing this in the laboratory to see if we can characterize the signature of AIs doing this, so that we reliably detect it and design markets to reduce vulnerability.”

As agents improve, they may learn to exploit arbitrage more maliciously by creating artificial items on the market to mislead traders, or by hacking accounts to report false events that move markets. Wellman’s work aims to produce methods to help control such manipulative behavior.


Secrecy in the Financial World

But the secretive nature of finance prevents academics from fully understanding the role of AI.

Wellman explains, “We know they use AI and machine learning to a significant extent, and they are constantly trying to improve their algorithms. We don’t know to what extent things like market manipulation and spoofing are automated right now, but we know that they could be automated and that could lead to something of an arms race between market manipulators and the systems trying to detect and run surveillance for market bad behavior.”

Government agencies – such as the Securities and Exchange Commission – watch financial markets, but “they’re really outgunned as far as the technology goes,” Wellman notes. “They don’t have the expertise or the infrastructure to keep up with how fast things are changing in the industry.”

But academics can help. According to Wellman, “even without doing the trading for money ourselves, we can reverse engineer what must be going on in the financial world and figure out what can happen.”


Preparing for Advanced AI

Although Wellman studies current and near-term AI, he’s concerned about the threat of advanced, general AI.

“One thing we can do to try to understand the far-out AI is to get experience with dealing with the near-term AI,” he explains. “That’s why we want to look at regulation of autonomous agents that are very near on the horizon or current. The hope is that we’ll learn some lessons that we can then later apply when the superintelligence comes along.”

AI systems are improving rapidly, and there is intense competition between financial firms to use them. Understanding and tracking AI’s role in finance will help financial markets remain stable and transparent.

“We may not be able to manage this threat with 100% reliability,” Wellman admits, “but I’m hopeful that we can redesign markets to make them safer for the AIs and eliminate some forms of the arms race, and that we’ll be able to get a good handle on preventing some of the most egregious behaviors.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Silo Busting in AI Research

Artificial intelligence may seem like a computer science project, but if it’s going to successfully integrate with society, then social scientists must be more involved.

Developing an intelligent machine is not merely a problem of modifying algorithms in a lab. These machines must be aligned with human values, and this requires a deep understanding of ethics and the social consequences of deploying intelligent machines.

Getting people with a variety of backgrounds together seems logical enough in theory, but in practice, what happens when computer scientists, AI developers, economists, philosophers, and psychologists try to discuss AI issues? Do any of them even speak the same language?

Social scientists and computer scientists will come at AI problems from very different directions. And if they collaborate, everybody wins. Social scientists can learn about the complex tools and algorithms used in computer science labs, and computer scientists can become more attuned to the social and ethical implications of advanced AI.

Through transdisciplinary learning, both fields will be better equipped to handle the challenges of developing AI, and society as a whole will be safer.


Silo Busting

Too often, researchers focus on their narrow area of expertise, rarely reaching out to experts in other fields to solve common problems. AI is no different, with thick walls – sometimes literally – separating the social sciences from the computer sciences. This process of breaking down walls between research fields is often called silo-busting.

If AI researchers largely operate in silos, they may lose opportunities to learn from other perspectives and collaborate with potential colleagues. Scientists might miss gaps in their research or reproduce work already completed by others, because they were secluded away in their silo. This can significantly hamper the development of value-aligned AI.

To bust these silos, Wendell Wallach organized workshops to facilitate knowledge-sharing among leading computer and social scientists. Wallach, a consultant, ethicist, and scholar at Yale University’s Interdisciplinary Center for Bioethics, holds these workshops at The Hastings Center, where he is a senior advisor.

With co-chairs Gary Marchant, Stuart Russell, and Bart Selman, Wallach held the first workshop in April 2016. “The first workshop was very much about exposing people to what experts in all of these different fields were thinking about,” Wallach explains. “My intention was just to put all of these people in a room and hopefully they’d see that they weren’t all reinventing the wheel, and recognize that there were other people who were engaged in similar projects.”

The workshop intentionally brought together experts from a variety of viewpoints, including engineering ethics, philosophy, and resilience engineering, as well as participants from the Institute of Electrical and Electronics Engineers (IEEE), the Office of Naval Research, and the World Economic Forum (WEF). Wallach recounts, “some were very interested in how you implement sensitivity to moral considerations in AI computationally, and others were more interested in how AI changes the societal context.”

Other participants studied how the engineers of these systems may be susceptible to harmful cognitive biases and conflicts of interest, while still others focused on governance issues surrounding AI. Each of these viewpoints is necessary for developing beneficial AI, and The Hastings Center’s workshop gave participants the opportunity to learn from and teach each other.

But silo-busting is not easy. Wallach explains, “everybody has their own goals, their own projects, their own intentions, and it’s hard to hear someone say, ‘maybe you’re being a little naïve about this.’” When researchers operate exclusively in silos, “it’s almost impossible to understand how people outside of those silos did what they did,” he adds.

The intention of the first workshop was not to develop concrete strategies or proposals, but rather to open researchers’ minds to the broad challenges of developing AI with human values. “My suspicion is, the most valuable things that came out of this workshop would be hard to quantify,” Wallach clarifies. “It’s more like people’s minds were being stretched and opened. That was, for me, what this was primarily about.”

The workshop did yield some tangible results. For example, Marchant and Wallach introduced a pilot project for the international governance of AI, and nearly everyone at the workshop agreed to work on it. Since then, the IEEE, the International Committee of the Red Cross, the UN, the World Economic Forum, and other institutions have agreed to become active partners with The Hastings Center in building global infrastructure to ensure that AI and Robotics are beneficial.

This transdisciplinary cooperation is a promising sign that Wallach’s efforts are succeeding in strengthening the global response to AI challenges.


Value Alignment

Wallach and his co-chairs held a second workshop at the end of October. The participants were mostly scientists, but also included social theorists, a legal scholar, philosophers, and ethicists. The overall goal remained – to bust AI silos and facilitate transdisciplinary cooperation – but this workshop had a narrower focus.

“We made it more about value alignment and machine ethics,” he explains. “The tension in the room was between those who thought the problem [of value alignment] was imminently solvable and those who were deeply skeptical about solving the problem at all.”

In general, Wallach observed that “the social scientists and philosophers tend to overplay the difficulties [of creating AI with full value alignment] and computer scientists tend to underplay the difficulties.”

Wallach believes that while computer scientists will build the algorithms and utility functions for AI, they will need input from social scientists to ensure value alignment. “If a utility function represents 100,000 inputs, social theorists will help the AI researchers understand what those 100,000 inputs are,” he explains. “The AI researchers might be able to come up with 50,000-60,000 on their own, but they’re suddenly going to realize that people who have thought much more deeply about applied ethics are perhaps sensitive to things that they never considered.”

“I’m hoping that enough of [these researchers] learn each other’s language and how to communicate with each other, that they’ll recognize the value they can get from collaborating together,” he says. “I think I see evidence of that beginning to take place.”


Moving Forward

Developing value-aligned AI is a monumental task with existential risks. Experts from various perspectives must be willing to learn from each other and adapt their understanding of the issue.

In this spirit, The Hastings Center is leading the charge to bring the various AI silos together. After two successful events that resulted in promising partnerships, Wallach and his co-chairs will hold their third workshop in Spring 2018. And while these workshops are a small effort to facilitate transdisciplinary cooperation on AI, Wallach is hopeful.

“It’s a small group,” he admits, “but it’s people who are leaders in these various fields, so hopefully that permeates through the whole field, on both sides.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Artificial Intelligence and the King Midas Problem

Value alignment. It’s a phrase that often pops up in discussions about the safety and ethics of artificial intelligence. How can scientists create AI with goals and values that align with those of the people it interacts with?

Very simple robots with very constrained tasks do not need goals or values at all. Although the Roomba’s designers know you want a clean floor, Roomba doesn’t: it simply executes a procedure that the Roomba’s designers predict will work—most of the time. If your kitten leaves a messy pile on the carpet, Roomba will dutifully smear it all over the living room. If we keep programming smarter and smarter robots, then by the late 2020s, you may be able to ask your wonderful domestic robot to cook a tasty, high-protein dinner. But if you forgot to buy any meat, you may come home to a hot meal but find the aforementioned cat has mysteriously vanished. The robot, designed for chores, doesn’t understand that the sentimental value of the cat exceeds its nutritional value.

AI and King Midas

Stuart Russell, a renowned AI researcher, compares the challenge of defining a robot’s objective to the King Midas myth. “The robot,” says Russell, “has some objective and pursues it brilliantly to the destruction of mankind. And it’s because it’s the wrong objective. It’s the old King Midas problem.”

This is one of the big problems in AI safety that Russell is trying to solve. “We’ve got to get the right objective,” he explains, “and since we don’t seem to know how to program it, the right answer seems to be that the robot should learn – from interacting with and watching humans – what it is humans care about.”

Russell works from the assumption that the robot will solve whatever formal problem we define. Rather than assuming that the robot should optimize a given objective, Russell defines the problem as a two-player game (“game” as used by economists, meaning a decision problem with multiple agents) called cooperative inverse reinforcement learning (CIRL).

A CIRL game includes a person and a robot: the robot’s only purpose is to make the person happy, but it doesn’t know what the person wants. Fortunately, it can learn more about what the person wants by observing her behavior. For example, if a robot observed the human’s morning routine, it should discover how important coffee is—not to itself, of course (we don’t want robots drinking coffee), but to the human. Then, it will make coffee for the person without being asked.

What about the example of King Midas, who thought he wanted as much gold as possible? How does the approach deal with someone stating their wishes incorrectly?

“We don’t have to know what we want. That’s the whole point. It’s hard to know what we want, and what we ask for is evidence for what we really want, but it’s not ground truth,” says Russell.

The robot should learn what people want by watching their actions. Then, if someone asks for something that’s technically incorrect, the robot will understand their underlying wish. Russell adds, “The robot—if it had been there—would have told Midas that he didn’t really want everything turned to gold, maybe just a few choice objects that he might point at from time to time and say the magic word.”

AI Off-Switch

Russell and his Berkeley colleagues also recently announced further progress toward ensuring safe AI, with a paper on ensuring an AI’s off-switch is always accessible. An off-switch is often suggested to guarantee a robot doesn’t harm humans – if the robot goes bad, just turn it off! But what if the AI anticipates that and disables its off switch?

Russell and his team propose solving this by providing a level of uncertainty in the robot’s objective. They created another CIRL game in which the robot chooses to announce its intentions, which gives the human a chance to switch it off, or to plunge ahead with its plan, bypassing human oversight.

They showed that a robot with a fixed objective will take the second choice and disable its off-switch. But if there is uncertainty about the objective, the robot will reason that a human decision to switch the robot off would imply the announced intention is undesirable. Because its goal is to make the human happy, the robot accepts being switched off. The robot has a positive incentive not to disable the off-switch, and that incentive is directly tied to the robot’s uncertainty about the human’s true objective.

Ensuring AI Safety

In addition to his research, Russell is also one of the most vocal and active AI safety researchers concerned with ensuring a stronger public understanding of the potential issues surrounding AI development.

He recently co-authored a rebuttal to an article in the MIT Technology Review, which claimed that real AI scientists weren’t worried about the existential threat of AI. Russell and his co-author summed up why it’s better to be cautious and careful than just assume all will turn out for the best:

“Our experience with Chernobyl suggests it may be unwise to claim that a powerful technology entails no risks. It may also be unwise to claim that a powerful technology will never come to fruition. On September 11, 1933, Lord Rutherford, perhaps the world’s most eminent nuclear physicist, described the prospect of extracting energy from atoms as nothing but “moonshine.” Less than 24 hours later, Leo Szilard invented the neutron-induced nuclear chain reaction; detailed designs for nuclear reactors and nuclear weapons followed a few years later. Surely it is better to anticipate human ingenuity than to underestimate it, better to acknowledge the risks than to deny them. … [T]he risk [of AI] arises from the unpredictability and potential irreversibility of deploying an optimization process more intelligent than the humans who specified its objectives.”

This summer, Russell received a grant of over $5.5 million from the Open Philanthropy Project for a new research center, the Center for Human-Compatible Artificial Intelligence, in Berkeley. Among the primary objectives of the Center will be to study this problem of value alignment, to continue his efforts toward provably beneficial AI, and to ensure we don’t make the same mistakes as King Midas.

“Look,” he says, “if you were King Midas, would you want your robot to say, ‘Everything turns to gold? OK, boss, you got it.’ No! You’d want it to say, ‘Are you sure? Including your food, drink, and relatives? I’m pretty sure you wouldn’t like that. How about this: you point to something and say ‘Abracadabra Aurificio’ or something, and then I’ll turn it to gold, OK?’”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

The Problem of Defining Autonomous Weapons

What, exactly, is an autonomous weapon? For the general public, the phrase is often used synonymously with killer robots and triggers images of the Terminator. But for the military, the definition of an autonomous weapons system, or AWS, is deceivingly simple.

The United States Department of Defense defines an AWS as “a weapon system that, once activated, can select and engage targets without further intervention by a human operator.  This includes human-supervised autonomous weapon systems that are designed to allow human operators to override operation of the weapon system, but can select and engage targets without further human input after activation.”

Basically, it is a weapon that can be used in any domain — land, air, sea, space, cyber, or any combination thereof — and encompasses significantly more than just the platform that fires the munition. This means that there are various capabilities the system possesses, such as identifying targets, tracking, and firing, all of which may have varying levels of human interaction and input.

Heather Roff, a research scientist at The Global Security Initiative at Arizona State University and a senior research fellow at the University of Oxford, suggests that even the basic terminology of the DoD’s definition is unclear.

“This definition is problematic because we don’t really know what ‘select’ means here.  Is it ‘detect’ or ‘select’?” she asks. Roff also notes another definitional problem arises because, in many instances, the difference between an autonomous weapon (acting independently) and an automated weapon (pre-programmed to act automatically) is not clear.


A Database of Weapons Systems

State parties to the UN’s Convention on Conventional Weapons (CCW) also grapple with what constitutes an autonomous — and not a current automated — weapon. During the last three years of discussion at Informal Meetings of Experts at the CCW, participants typically only referred to two or three presently deployed weapons systems that appear to be AWS, such as the Israeli Harpy or the United States’ Counter Rocket and Mortar system.

To address this, the International Committee of the Red Cross requested more data on presently deployed systems. It wanted to know what the weapons systems are that states currently use and what projects are under development. Roff took up the call to action. She poured over publicly available data from a variety of sources and compiled a database of 284 weapons systems. She wanted to know what capacities already existed on presently deployed systems and whether these were or were not “autonomous.”

“The dataset looks at the top five weapons exporting countries, so that’s Russia, China, the United States, France and Germany,” says Roff. “I’m looking at major sales and major defense industry manufacturers from each country. And then I look at all the systems that are presently deployed by those countries that are manufactured by those top manufacturers, and I code them along a series of about 20 different variables.”

These variables include capabilities like navigation, homing, target identification, firing, etc., and for each variable, Roff coded a weapon as either having the capacity or not. Roff then created a series of three indices to bundle the various capabilities: self-mobility, self-direction, and self-determination. Self-mobility capabilities allow a system to move by itself, self-direction relates to target identification, and self-determination indexes the abilities that a system may possess in relation to goal setting, planning, and communication. Most “smart” weapons have high self-direction and self-mobility, but few, if any, have self-determination capabilities.

As Roff explains in a recent Foreign Policy post, the data shows that “the emerging trend in autonomy has less to do with the hardware and more on the areas of communications and target identification. What we see is a push for better target identification capabilities, identification friend or foe (IFF), as well as learning.  Systems need to be able to adapt, to learn, and to change or update plans while deployed. In short, the systems need to be tasked with more things and vaguer tasks.” Thus newer systems will need greater self-determination capabilities.


The Human in the Loop

But understanding what the weapons systems can do is only one part of the equation. In most systems, humans still maintain varying levels of control, and the military often claims that a human will always be “in the loop.” That is, a human will always have some element of meaningful control over the system. But this leads to another definitional problem: just what is meaningful human control?

Roff argues that this idea of keeping a human “in the loop” isn’t just “unhelpful,” but that it may be “hindering our ability to think about what’s wrong with autonomous systems.” She references what the UK Ministry of Defense calls, the Empty Hangar Problem: no one expects to walk into a military airplane hangar and discover that the autonomous plane spontaneously decided, on its own, to go to war.

“That’s just not going to happen,” Roff says, “These systems are always going to be used by humans, and humans are going to decide to use them.” But thinking about humans in some loop, she contends, means that any difficulties with autonomy get pushed aside.

Earlier this year, Roff worked with Article 36, which coined the phrase “meaningful human control,” to establish more a more clear-cut definition of the term. They published a concept paper, Meaningful Human Control, Artificial Intelligence and Autonomous Weapons, which offered guidelines for delegates at the 2016 CCW Meeting of Experts on Lethal Autonomous Weapons Systems.

In the paper, Roff and Richard Moyes outlined key elements – such as predictable, reliable and transparent technology, accurate user information, a capacity for timely human action and intervention, human control during attacks, etc. – for determining whether an AWS allows for meaningful human control.

“You can’t offload your moral obligation to a non-moral agent,” says Roff. “So that’s where I think our work on meaningful human control is: a human commander has a moral obligation to undertake precaution and proportionality in each attack.” The weapon system cannot do it for the human.

Researchers and the international community are only beginning to tackle the ethical issues that arise from AWSs. Clearly defining the weapons systems and the role humans will continue to play is one small part of a very big problem. Roff will continue to work with the international community to establish more well defined goals and guidelines.

“I’m hoping that the doctrine and the discussions that are developing internationally and through like-minded states will actually guide normative generation of how to use or not use such systems,” she says.

Heather Roff also spoke about this work on an FLI podcast.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Complex AI Systems Explain Their Actions


In the future, service robots equipped with artificial intelligence (AI) are bound to be a common sight. These bots will help people navigate crowded airports, serve meals, or even schedule meetings.

As these AI systems become more integrated into daily life, it is vital to find an efficient way to communicate with them. It is obviously more natural for a human to speak in plain language rather than a string of code. Further, as the relationship between humans and robots grows, it will be necessary to engage in conversations, rather than just give orders.

This human-robot interaction is what Manuela M. Veloso’s research is all about. Veloso, a professor at Carnegie Mellon University, has focused her research on CoBots, autonomous indoor mobile service robots which transport items, guide visitors to building locations, and traverse the halls and elevators. The CoBot robots have been successfully autonomously navigating for several years now, and have traveled more than 1,000km. These accomplishments have enabled the research team to pursue a new direction, focusing now on novel human-robot interaction.

“If you really want these autonomous robots to be in the presence of humans and interacting with humans, and being capable of benefiting humans, they need to be able to talk with humans” Veloso says.


Communicating With CoBots

Veloso’s CoBots are capable of autonomous localization and navigation in the Gates-Hillman Center using WiFi, LIDAR, and/or a Kinect sensor (yes, the same type used for video games).

The robots navigate by detecting walls as planes, which they match to the known maps of the building. Other objects, including people, are detected as obstacles, so navigation is safe and robust. Overall, the CoBots are good navigators and are quite consistent in their motion. In fact, the team noticed the robots could wear down the carpet as they traveled the same path numerous times.

Because the robots are autonomous, and therefore capable of making their own decisions, they are out of sight for large amounts of time while they navigate the multi-floor buildings.

The research team began to wonder about this unaccounted time. How were the robots perceiving the environment and reaching their goals? How was the trip? What did they plan to do next?

“In the future, I think that incrementally we may want to query these systems on why they made some choices or why they are making some recommendations,” explains Veloso.

The research team is currently working on the question of why the CoBots took the route they did while autonomous. The team wanted to give the robots the ability to record their experiences and then transform the data about their routes into natural language. In this way, the bots could communicate with humans and reveal their choices and hopefully the rationale behind their decisions.


Levels of Explanation

The “internals” underlying the functions of any autonomous robots are completely based on numerical computations, and not natural language. For example, the CoBot robots in particular compute the distance to walls, assigning velocities to their motors to enable the motion to specific map coordinates.

Asking an autonomous robot for a non-numerical explanation is complex, says Veloso. Furthermore, the answer can be provided in many potential levels of detail.

“We define what we call the ‘verbalization space’ in which this translation into language can happen with different levels of detail, with different levels of locality, with different levels of specificity.”

For example, if a developer is asking a robot to detail their journey, they might expect a lengthy retelling, with details that include battery levels. But a random visitor might just want to know how long it takes to get from one office to another.

Therefore, the research is not just about the translation from data to language, but also the acknowledgment that the robots need to explain things with more or less detail. If a human were to ask for more detail, the request triggers CoBot “to move” into a more detailed point in the verbalization space.

“We are trying to understand how to empower the robots to be more trustable through these explanations, as they attend to what the humans want to know,” says Veloso. The ability to generate explanations, in particular at multiple levels of detail, will be especially important in the future, as the AI systems will work with more complex decisions. Humans could have a more difficult time inferring the AI’s reasoning. Therefore, the bot will need to be more transparent.

For example, if you go to a doctor’s office and the AI there makes a recommendation about your health, you may want to know why it came to this decision, or why it recommended one medication over another.

Currently, Veloso’s research focuses on getting the robots to generate these explanations in plain language. The next step will be to have the robots incorporate natural language when humans provide them with feedback. “[The CoBot] could say, ‘I came from that way,’ and you could say, ‘well next time, please come through the other way,’” explains Veloso.

These sorts of corrections could be programmed into the code, but Veloso believes that “trustability” in AI systems will benefit from our ability to dialogue, query, and correct their autonomy. She and her team aim at contributing to a multi-robot, multi-human symbiotic relationship, in which robots and humans coordinate and cooperate as a function of their limitations and strengths.

“What we’re working on is to really empower people – a random person who meets a robot – to still be able to ask things about the robot in natural language,” she says.

In the future, when we will have more and more AI systems that are able to perceive the world, make decisions, and support human decision-making, the ability to engage in these types of conversations will be essential­­.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Who is Responsible for Autonomous Weapons?

Consider the following wartime scenario: Hoping to spare the lives of soldiers, a country deploys an autonomous weapon to wipe out an enemy force. This robot has demonstrated military capabilities that far exceed even the best soldiers, but when it hits the ground, it gets confused. It can’t distinguish the civilians from the enemy soldiers and begins taking innocent lives. The military generals desperately try to stop the robot, but by the time they succeed it has already killed dozens.

Who is responsible for this atrocity? Is it the commanders who deployed the robot, the designers and manufacturers of the robot, or the robot itself?


Liability: Autonomous Systems

As artificial intelligence improves, governments may turn to autonomous weapons — like military robots — in order to gain the upper hand in armed conflict. These weapons can navigate environments on their own and make their own decisions about who to kill and who to spare. While the example above may never occur, unintended harm is inevitable. Considering these scenarios helps formulate important questions that governments and researchers must jointly consider, namely:

How do we hold human beings accountable for the actions of autonomous systems? And how is justice served when the killer is essentially a computer?

As it turns out, there is no straightforward answer to this dilemma. When a human soldier commits an atrocity and kills innocent civilians, that soldier is held accountable. But when autonomous weapons do the killing, it’s difficult to blame them for their mistakes.

An autonomous weapon’s “decision” to murder innocent civilians is like a computer’s “decision” to freeze the screen and delete your unsaved project. Frustrating as a frozen computer may be, people rarely think the computer intended to complicate their lives.

Intention must be demonstrated to prosecute someone for a war crime, and while autonomous weapons may demonstrate outward signs of decision-making and intention, they still run on a code that’s just as impersonal as the code that glitches and freezes a computer screen. Like computers, these systems are not legal or moral agents, and it’s not clear how to hold them accountable — or if they can be held accountable — for their mistakes.

So who assumes the blame when autonomous weapons take innocent lives? Should they even be allowed to kill at all?


Liability: from Self-Driving Cars to Autonomous Weapons

Peter Asaro, a philosopher of science, technology, and media at The New School in New York City, has been working on addressing these fundamental questions of responsibility and liability with all autonomous systems, not just weapons. By exploring fundamental concepts of autonomy, agency, and liability, he intends to develop legal approaches for regulating the use of autonomous systems and the harm they cause.

At a recent conference on the Ethics of Artificial Intelligence, Asaro discussed the liability issues surrounding the application of AI to weapons systems. He explained, “AI poses threats to international law itself — to the norms and standards that we rely on to hold people accountable for [decisions, and to] hold states accountable for military interventions — as [people are] able to blame systems for malfunctioning instead of taking responsibility for their decisions.”

The legal system will need to reconsider who is held liable to ensure that justice is served when an accident happens. Asaro argues that the moral and legal issues surrounding autonomous weapons are much different than the issues surrounding other autonomous machines, such as self-driving cars.

Though researchers still expect the occasional fatal accident to occur with self-driving cars, these autonomous vehicles are designed with safety in mind. One of the goals of self-driving cars is to save lives. “The fundamental difference is that with any kind of weapon, you’re intending to do harm, so that carries a special legal and moral burden,” Asaro explains. “There is a moral responsibility to ensure that [the weapon is] only used in legitimate and appropriate circumstances.”

Furthermore, liability with autonomous weapons is much more ambiguous than it is with self-driving cars and other domestic robots.

With self-driving cars, for example, bigger companies like Volvo intend to embrace strict liability – where the manufacturers assume full responsibility for accidental harm. Although it is not clear how all manufacturers will be held accountable for autonomous systems, strict liability and threats of class-action lawsuits incentivize manufacturers to make their product as safe as possible.

Warfare, on the other hand, is a much messier situation.

“You don’t really have liability in war,” says Asaro. “The US military could sue a supplier for a bad product, but as a victim who was wrongly targeted by a system, you have no real legal recourse.”

Autonomous weapons only complicate this. “These systems become more unpredictable as they become more sophisticated, so psychologically commanders feel less responsible for what those systems do. They don’t internalize responsibility in the same way,” Asaro explained at the Ethics of AI conference.

To ensure that commanders internalize responsibility, Asaro suggests that “the system has to allow humans to actually exercise their moral agency.”

That is, commanders must demonstrate that they can fully control the system before they use it in warfare. Once they demonstrate control, it can become clearer who can be held accountable for the system’s actions.


Preparing for the Unknown

Behind these concerns about liability, lies the overarching concern that autonomous machines might act in ways that humans never intended. Asaro asks: “When these systems become more autonomous, can the owners really know what they’re going to do?”

Even the programmers and manufacturers may not know what their machines will do. The purpose of developing autonomous machines is so they can make decisions themselves – without human input. And as the programming inside an autonomous system becomes more complex, people will increasingly struggle to predict the machine’s action.

Companies and governments must be prepared to handle the legal complexities of a domestic or military robot or system causing unintended harm. Ensuring justice for those who are harmed may not be possible without a clear framework for liability.

Asaro explains, “We need to develop policies to ensure that useful technologies continue to be developed, while ensuring that we manage the harms in a just way. A good start would be to prohibit automating decisions over the use of violent and lethal force, and to focus on managing the safety risks in beneficial autonomous systems.”

Peter Asaro also spoke about this work on an FLI podcast. You can learn more about his work at

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Cybersecurity and Machine Learning

When it comes to cybersecurity, no nation can afford to slack off. If a nation’s defense systems cannot anticipate how an attacker will try to fool them, then an especially clever attack could expose military secrets or use disguised malware to cause major networks to crash.

A nation’s defense systems must keep up with the constant threat of attack, but this is a difficult and never-ending process. It seems that the defense is always playing catch-up.

Ben Rubinstein, a professor at the University of Melbourne in Australia, asks: “Wouldn’t it be good if we knew what the malware writers are going to do next, and to know what type of malware is likely to get through the filters?”

In other words, what if defense systems could learn to anticipate how attackers will try to fool them?


Adversarial Machine Learning

In order to address this question, Rubinstein studies how to prepare machine-learning systems to catch adversarial attacks. In the game of national cybersecurity, these adversaries are often individual hackers or governments who want to trick machine-learning systems for profit or political gain.

Nations have become increasingly dependent on machine-learning systems to protect against such adversaries. Unaided by humans, machine-learning systems in anti-malware and facial recognition software have the ability to learn and improve their function as they encounter new data. As they learn, they become better at catching adversarial attacks.

Machine-learning systems are generally good at catching adversaries, but they are not completely immune to threats, and adversaries are constantly looking for new ways to fool them. Rubinstein says, “Machine learning works well if you give it data like it’s seen before, but if you give it data that it’s never seen before, there’s no guarantee that it’s going to work.”

With adversarial machine learning, security agencies address this weakness by presenting the system with different types of malicious data to test the system’s filters. The system then digests this new information and learns how to identify and capture malware from clever attackers.


Security Evaluation of Machine-Learning Systems

Rubinstein’s project is called “Security Evaluation of Machine-Learning Systems”, and his ultimate goal is to develop a software tool that companies and government agencies can use to test their defenses. Any company or agency that uses machine-learning systems could run his software against their system. Rubinstein’s tool would attack and try to fool the system in order to expose the system’s vulnerabilities. In doing so, his tool anticipates how an attacker could slip by the system’s defenses.

The software would evaluate existing machine-learning systems and find weak spots that adversaries might try to exploit – similar to how one might defend a castle.

“We’re not giving you a new castle,” Rubinstein says, “we’re just going to walk around the perimeter and look for holes in the walls and weak parts of the castle, or see where the moat is too shallow.”

By analyzing many different machine-learning systems, his software program will pick up on trends and be able to advise security agencies to either use a different system or bolster the security of their existing system. In this sense, his program acts as a consultant for every machine-learning system.

Consider a program that does facial recognition. This program would use machine learning to identify faces and catch adversaries that pretend to look like someone else.

Rubinstein explains: “Our software aims to automate this security evaluation so that it takes an image of a person and a program that does facial recognition, and it will tell you how to change its appearance so that it will evade detection or change the outcome of machine learning in some way.”

This is called a mimicry attack – when an adversary makes one instance (one face) look like another, and thereby fools a system.

To make this example easier to visualize, Rubinstein’s group built a program that demonstrates how to change a face’s appearance to fool a machine-learning system into thinking that it is another face.

In the image below, the two faces don’t look alike, but the left image has been modified so that the machine-learning system thinks it is the same as the image on the right. This example provides insight into how adversaries can fool machine-learning systems by exploiting quirks.


When Rubinstein’s software fools a system with a mimicry attack, security personnel can then take that information and retrain their program to establish more effective security when the stakes are higher.


Minimizing the Attacker’s Advantage

While Rubinstein’s software will help to secure machine-learning systems against adversarial attacks, he has no illusions about the natural advantages that attackers enjoy. It will always be easier to attack a castle than to defend it, and the same holds true for a machine-learning system. This is called the ‘asymmetry of cyberwarfare.’

“The attacker can come in from any angle. It only needs to succeed at one point, but the defender needs to succeed at all points,” says Rubinstein.

In general, Rubinstein worries that the tools available to test machine-learning systems are theoretical in nature, and put too much responsibility on the security personnel to understand the complex math involved. A researcher might redo the mathematical analysis for every new learning system, but security personnel are unlikely to have the time or resources to keep up.

Rubinstein aims to “bring what’s out there in theory and make it more applied and more practical and easy for anyone who’s using machine learning in a system to evaluate the security of their system.”

With his software, Rubinstein intends to help level the playing field between attackers and defenders. By giving security agencies better tools to test and adapt their machine-learning systems, he hopes to improve the ability of security personnel to anticipate and guard against cyberattacks.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Supervising AI Growth

When Apple released its software application, Siri, in 2011, iPhone users had high expectations for their intelligent personal assistants. Yet despite its impressive and growing capabilities, Siri often makes mistakes. The software’s imperfections highlight the clear limitations of current AI: today’s machine intelligence can’t understand the varied and changing needs and preferences of human life.

However, as artificial intelligence advances, experts believe that intelligent machines will eventually – and probably soon – understand the world better than humans. While it might be easy to understand how or why Siri makes a mistake, figuring out why a superintelligent AI made the decision it did will be much more challenging.

If humans cannot understand and evaluate these machines, how will they control them?

Paul Christiano, a Ph.D. student in computer science at UC Berkeley, has been working on addressing this problem. He believes that to ensure safe and beneficial AI, researchers and operators must learn to measure how well intelligent machines do what humans want, even as these machines surpass human intelligence.


Semi-supervised Learning

The most obvious way to supervise the development of an AI system also happens to be the hard way. As Christiano explains: “One way humans can communicate what they want, is by spending a lot of time digging down on some small decision that was made [by an AI], and try to evaluate how good that decision was.”

But while this is theoretically possible, the human researchers would never have the time or resources to evaluate every decision the AI made. “If you want to make a good evaluation, you could spend several hours analyzing a decision that the machine made in one second,” says Christiano.

For example, suppose an amateur chess player wants to understand a better chess player’s previous move. Merely spending a few minutes evaluating this move won’t be enough, but if she spends a few hours she could consider every alternative and develop a meaningful understanding of the better player’s moves.

Fortunately for researchers, they don’t need to evaluate every decision an AI makes in order to be confident in its behavior. Instead, researchers can choose “the machine’s most interesting and informative decisions, where getting feedback would most reduce our uncertainty,“ Christiano explains.

“Say your phone pinged you about a calendar event while you were on a phone call,” he elaborates, “That event is not analogous to anything else it has done before, so it’s not sure whether it is good or bad.” Due to this uncertainty, the phone would send the transcript of its decisions to an evaluator at Google, for example. The evaluator would study the transcript, ask the phone owner how he felt about the ping, and determine whether pinging users during phone calls is a desirable or undesirable action. By providing this feedback, Google teaches the phone when it should interrupt users in the future.

This active learning process is an efficient method for humans to train AIs, but what happens when humans need to evaluate AIs that exceed human intelligence?

Consider a computer that is mastering chess. How could a human give appropriate feedback to the computer if the human has not mastered chess? The human might criticize a move that the computer makes, only to realize later that the machine was correct.

With increasingly intelligent phones and computers, a similar problem is bound to occur. Eventually, Christiano explains, “we need to handle the case where AI systems surpass human performance at basically everything.”

If a phone knows much more about the world than its human evaluators, then the evaluators cannot trust their human judgment. They will need to “enlist the help of more AI systems,” Christiano explains.


Using AIs to Evaluate Smarter AIs

When a phone pings a user while he is on a call, the user’s reaction to this decision is crucial in determining whether the phone will interrupt users during future phone calls. But, as Christiano argues, “if a more advanced machine is much better than human users at understanding the consequences of interruptions, then it might be a bad idea to just ask the human ‘should the phone have interrupted you right then?’” The human might express annoyance at the interruption, but the machine might know better and understand that this annoyance was necessary to keep the user’s life running smoothly.

In these situations, Christiano proposes that human evaluators use other intelligent machines to do the grunt work of evaluating an AI’s decisions. In practice, a less capable System 1 would be in charge of evaluating the more capable System 2. Even though System 2 is smarter, System 1 can process a large amount of information quickly, and can understand how System 2 should revise its behavior. The human trainers would still provide input and oversee the process, but their role would be limited.

This training process would help Google understand how to create a safer and more intelligent AI – System 3 – which the human researchers could then train using System 2.

Christiano explains that these intelligent machines would be like little agents that carry out tasks for humans. Siri already has this limited ability to take human input and figure out what the human wants, but as AI technology advances, machines will learn to carry out complex tasks that humans cannot fully understand.


Can We Ensure that an AI Holds Human Values?

As Google and other tech companies continue to improve their intelligent machines with each evaluation, the human trainers will fulfill a smaller role. Eventually, Christiano explains, “it’s effectively just one machine evaluating another machine’s behavior.”

Ideally, “each time you build a more powerful machine, it effectively models human values and does what humans would like,” says Christiano. But he worries that these machines may stray from human values as they surpass human intelligence. To put this in human terms: a complex intelligent machine would resemble a large organization of humans. If the organization does tasks that are too complex for any individual human to understand, it may pursue goals that humans wouldn’t like.

In order to address these control issues, Christiano is working on an “end-to-end description of this machine learning process, fleshing out key technical problems that seem most relevant.” His research will help bolster the understanding of how humans can use AI systems to evaluate the behavior of more advanced AI systems. If his work succeeds, it will be a significant step in building trustworthy artificial intelligence.

You can learn more about Paul Christiano’s work here.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

How Can AI Learn to Be Safe?

As artificial intelligence improves, machines will soon be equipped with intellectual and practical capabilities that surpass the smartest humans. But not only will machines be more capable than people, they will also be able to make themselves better. That is, these machines will understand their own design and how to improve it – or they could create entirely new machines that are even more capable.

The human creators of AIs must be able to trust these machines to remain safe and beneficial even as they self-improve and adapt to the real world.

Recursive Self-Improvement

This idea of an autonomous agent making increasingly better modifications to its own code is called recursive self-improvement. Through recursive self-improvement, a machine can adapt to new circumstances and learn how to deal with new situations.

To a certain extent, the human brain does this as well. As a person develops and repeats new habits, connections in their brains can change. The connections grow stronger and more effective over time, making the new, desired action easier to perform (e.g. changing one’s diet or learning a new language). In machines though, this ability to self-improve is much more drastic.

An AI agent can process information much faster than a human, and if it does not properly understand how its actions impact people, then its self-modifications could quickly fall out of line with human values.

For Bas Steunebrink, a researcher at the Swiss AI lab IDSIA, solving this problem is a crucial step toward achieving safe and beneficial AI.

Building AI in a Complex World

Because the world is so complex, many researchers begin AI projects by developing AI in carefully controlled environments. Then they create mathematical proofs that can assure them that the AI will achieve success in this specified space.

But Steunebrink worries that this approach puts too much responsibility on the designers and too much faith in the proof, especially when dealing with machines that can learn through recursive self-improvement. He explains, “We cannot accurately describe the environment in all its complexity; we cannot foresee what environments the agent will find itself in in the future; and an agent will not have enough resources (energy, time, inputs) to do the optimal thing.”

If the machine encounters an unforeseen circumstance, then that proof the designer relied on in the controlled environment may not apply. Says Steunebrink, “We have no assurance about the safe behavior of the [AI].”

Experience-based Artificial Intelligence

Instead, Steunebrink uses an approach called EXPAI (experience-based artificial intelligence). EXPAI are “self-improving systems that make tentative, additive, reversible, very fine-grained modifications, without prior self-reasoning; instead, self-modifications are tested over time against experiential evidences and slowly phased in when vindicated, or dismissed when falsified.”

Instead of trusting only a mathematical proof, researchers can ensure that the AI develops safe and benevolent behaviors by teaching and testing the machine in complex, unforeseen environments that challenge its function and goals.

With EXPAI, AI machines will learn from interactive experience, and therefore monitoring their growth period is crucial. As Steunebrink posits, the focus shifts from asking, “What is the behavior of an agent that is very intelligent and capable of self-modification, and how do we control it?” to asking, “How do we grow an agent from baby beginnings such that it gains both robust understanding and proper values?”

Consider how children grow and learn to navigate the world independently. If provided with a stable and healthy childhood, children learn to adopt values and understand their relation to the external world through trial and error, and by examples. Childhood is a time of growth and learning, of making mistakes, of building on success – all to help prepare the child to grow into a competent adult who can navigate unforeseen circumstances.

Steunebrink believes that researchers can ensure safe AI through a similar, gradual process of experience-based learning. In an architectural blueprint developed by Steunebrink and his colleagues, the AI is constructed “starting from only a small amount of designer-specific code – a seed.” Like a child, the beginnings of the machine will be less competent and less intelligent, but it will self-improve over time, as it learns from teachers and real-world experience.

As Steunebrink’s approach focuses on the growth period of an autonomous agent, the teachers, not the programmers, are most responsible for creating a robust and benevolent AI. Meanwhile, the developmental stage gives researchers time to observe and correct an AI’s behavior in a controlled setting where the stakes are still low.

The Future of EXPAI

Steunebrink and his colleagues are currently creating what he describes as a “pedagogy to determine what kind of things to teach to agents and in what order, how to test what the agents understand from being taught, and, depending on the results of such tests, decide whether we can proceed to the next steps of teaching or whether we should reteach the agent or go back to the drawing board.”

A major issue Steunebrink faces is that his method of experience-based learning diverges from the most popular methods for improving AI. Instead of doing the intellectual work of crafting a proof-backed optimal learning algorithm on a computer, EXPAI requires extensive in-person work with the machine to teach it like a child.

Creating safe artificial intelligence might prove to be more a process of teaching and growth rather than a function of creating the perfect mathematical proof. While such a shift in responsibility may be more time-consuming, it could also help establish a far more comprehensive understanding of an AI before it is released into the real world.

Steunebrink explains, “A lot of work remains to move beyond the agent implementation level, towards developing the teaching and testing methodologies that enable us to grow an agent’s understanding of ethical values, and to ensure that the agent is compelled to protect and adhere to them.”

The process is daunting, he admits, “but it is not as daunting as the consequences of getting AI safety wrong.”

If you would like to learn more about Bas Steunebrink’s research, you can read about his project here, or visit He is also the co-founder of NNAISENSE, which you can learn about at

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Training Artificial Intelligence to Compromise

Imagine you’re sitting in a self-driving car that’s about to make a left turn into on-coming traffic. One small system in the car will be responsible for making the vehicle turn, one system might speed it up or hit the brakes, other systems will have sensors that detect obstacles, and yet another system may be in communication with other vehicles on the road. Each system has its own goals — starting or stopping, turning or traveling straight, recognizing potential problems, etc. — but they also have to all work together toward one common goal: turning into traffic without causing an accident.

Harvard professor and FLI researcher, David Parkes, is trying to solve just this type of problem. Parkes told FLI, “The particular question I’m asking is: If we have a system of AIs, how can we construct rewards for individual AIs, such that the combined system is well behaved?”

Essentially, an AI within a system of AIs — like that in the car example above — needs to learn how to meet its own objective, as well as how to compromise so that it’s actions will help satisfy the group objective. On top of that, the system of AIs needs to consider the preferences of society. The safety of the passenger in the car or a pedestrian in the crosswalk is a higher priority than turning left.

Training a well-behaved AI

Because environments like a busy street are so complicated, an engineer can’t just program an AI to act in some way to always achieve its objectives. AIs need to learn proper behavior based on a rewards system. “Each AI has a reward for its action and the action of the other AI,” Parkes explained. With the world constantly changing, the rewards have to evolve, and the AIs need to keep up not only with how their own goals change, but also with the evolving objectives of the system as a whole.

The idea of a rewards-based learning system is something most people can likely relate to. Who doesn’t remember the excitement of a gold star or a smiley face on a test? And any dog owner has experienced how much more likely their pet is to perform a trick when it realizes it will get a treat. A reward for an AI is similar.

A technique often used in designing artificial intelligence is reinforcement learning. With reinforcement learning, when the AI takes some action, it receives either positive or negative feedback. And it then tries to optimize its actions to receive more positive rewards. However, the reward can’t just be programmed into the AI. The AI has to interact with its environment to learn which actions will be considered good, bad or neutral. Again, the idea is similar to a dog learning that tricks can earn it treats or praise, but misbehaving could result in punishment.

More than this, Parkes wants to understand how to distribute rewards to subcomponents – the individual AIs – in order to achieve good system-wide behavior. How often should there be positive (or negative) reinforcement, and in reaction to which types of actions?

For example, if you were to play a video game without any points or lives or levels or other indicators of success or failure, you might run around the world killing or fighting aliens and monsters, and you might eventually beat the game, but you wouldn’t know which specific actions led you to win. Instead, games are designed to provide regular feedback and reinforcement so that you know when you make progress and what steps you need to take next. To train an AI, Parkes has to determine which smaller actions will merit feedback so that the AI can move toward a larger, overarching goal.

Rather than programming a reward specifically into the AI, Parkes shapes the way rewards flow from the environment to the AI in order to promote desirable behaviors as the AI interacts with the world around it.

But this is all for just one AI. How do these techniques apply to two or more AIs?

Training a system of AIs

Much of Parkes’ work involves game theory. Game theory helps researchers understand what types of rewards will elicit collaboration among otherwise self-interested players, or in this case, rational AIs. Once an AI figures out how to maximize its own reward, what will entice it to act in accordance with another AI? To answer this question, Parkes turns to an economic theory called mechanism design.

Mechanism design theory is a Nobel-prize winning theory that allows researchers to determine how a system with multiple parts can achieve an overarching goal. It is a kind of “inverse game theory.” How can rules of interaction – ways to distribute rewards, for instance – be designed so individual AIs will act in favor of system-wide and societal preferences? Among other things, mechanism design theory has been applied to problems in auctions, e-commerce, regulations, environmental policy, and now, artificial intelligence.

The difference between Parkes’ work with AIs and mechanism design theory is that the latter requires some sort of mechanism or manager overseeing the entire system. In the case of an automated car or a drone, the AIs within have to work together to achieve group goals, without a mechanism making final decisions. As the environment changes, the external rewards will change. And as the AIs within the system realize they want to make some sort of change to maximize their rewards, they’ll have to communicate with each other, shifting the goals for the entire autonomous system.

Parkes summarized his work for FLI, saying, “The work that I’m doing as part of the FLI grant program is all about aligning incentives so that when autonomous AIs decide how to act, they act in a way that’s not only good for the AI system, but also good for society more broadly.”

Parkes is also involved with the One Hundred Year Study on Artificial Intelligence, and he explained his “research with FLI has informed a broader perspective on thinking about the role that AI can play in an urban context in the near future.” As he considers the future, he asks, “What can we see, for example, from the early trajectory of research and development on autonomous vehicles and robots in the home, about where the hard problems will be in regard to the engineering of value-aligned systems?”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

The Evolution of AI: Can Morality be Programmed?

The following article was originally posted on

Recent advances in artificial intelligence have made it clear that our computers need to have a moral code. Disagree? Consider this: A car is driving down the road when a child on a bicycle suddenly swerves in front of it. Does the car swerve into an oncoming lane, hitting another car that is already there? Does the car swerve off the road and hit a tree? Does it continue forward and hit the child?

Each solution comes with a problem: It could result in death.

It’s an unfortunate scenario, but humans face such scenarios every day, and if an autonomous car is the one in control, it needs to be able to make this choice. And that means that we need to figure out how to program morality into our computers.

Vincent Conitzer, a Professor of Computer Science at Duke University, recently received a grant from the Future of Life Institute in order to try and figure out just how we can make an advanced AI that is able to make moral judgments…and act on them.


At first glance, the goal seems simple enough—make an AI that behaves in a way that is ethically responsible; however, it’s far more complicated than it initially seems, as there are an amazing amount of factors that come into play. As Conitzer’s project outlines, “moral judgments are affected by rights (such as privacy), roles (such as in families), past actions (such as promises), motives and intentions, and other morally relevant features. These diverse factors have not yet been built into AI systems.”

That’s what we’re trying to do now.

In a recent interview with Futurism, Conitzer clarified that, while the public may be concerned about ensuring that rogue AI don’t decide to wipe-out humanity, such a thing really isn’t a viable threat at the present time (and it won’t be for a long, long time). As a result, his team isn’t concerned with preventing a global-robotic-apocalypse by making selfless AI that adore humanity. Rather, on a much more basic level, they are focused on ensuring that our artificial intelligence systems are able to make the hard, moral choices that humans make on a daily basis.

So, how do you make an AI that is able to make a difficult moral decision?

Conitzer explains that, to reach their goal, the team is following a two path process: Having people make ethical choices in order to find patterns and then figuring out how that can be translated into an artificial intelligence. He clarifies, “what we’re working on right now is actually having people make ethical decisions, or state what decision they would make in a given situation, and then we use machine learning to try to identify what the general pattern is and determine the extent that we could reproduce those kind of decisions.”

In short, the team is trying to find the patterns in our moral choices and translate this pattern into AI systems. Conitzer notes that, on a basic level, it’s all about making predictions regarding what a human would do in a given situation, “if we can become very good at predicting what kind of decisions people make in these kind of ethical circumstances, well then, we could make those decisions ourselves in the form of the computer program.”

However, one major problem with this is, of course, that morality is not objective — it’s neither timeless nor universal.

Conitzer articulates the problem by looking to previous decades, “if we did the same ethical tests a hundred years ago, the decisions that we would get from people would be much more racist, sexist, and all kinds of other things that we wouldn’t see as ‘good’ now. Similarly, right now, maybe our moral development hasn’t come to its apex, and a hundred years from now people might feel that some of the things we do right now, like how we treat animals, is completely immoral. So there’s kind of a risk of bias and with getting stuck at whatever our current level of moral development is.”

And of course, there is the aforementioned problem regarding how complex morality is. “Pure altruism, that’s very easy to address in game theory, but maybe you feel like you owe me something based on previous actions. That’s missing from the game theory literature, and so that’s something that we’re also thinking about a lot—how can you make this, what game theory calls ‘Solutions Concept’—sensible? How can you compute these things?”

To solve these problems, and to help figure out exactly how morality functions and can (hopefully) be programmed into an AI, the team is combining the methods from computer science, philosophy, and psychology “That’s, in a nutshell, what our project is about,” Conitzer asserts.

But what about those sentient AI? When will we need to start worrying about them and discussing how they should be regulated?


According to Conitzer, human-like artificial intelligence won’t be around for some time yet (so yay! No Terminator-styled apocalypse…at least for the next few years).

“Recently, there have been a number of steps towards such a system, and I think there have been a lot of surprising advances….but I think having something like a ‘true AI,’ one that’s really as flexible, able to abstract, and do all these things that humans do so easily, I think we’re still quite far away from that,” Conitzer asserts.

True, we can program systems to do a lot of things that humans do well, but there are some things that are exceedingly complex and hard to translate into a pattern that computers can recognize and learn from (which is ultimately the basis of all AI).

“What came out of early AI research, the first couple decades of AI research, was the fact that certain things that we had thought of as being real benchmarks for intelligence, like being able to play chess well, were actually quite accessible to computers. It was not easy to write and create a chess-playing program, but it was doable.”

Indeed, today, we have computers that are able to beat the best players in the world in a host of games—Chess and Alpha Go, for example.

But Conitzer clarifies that, as it turns out, playing games isn’t exactly a good measure of human-like intelligence. Or at least, there is a lot more to the human mind. “Meanwhile, we learned that other problems that were very simple for people were actually quite hard for computers, or to program computers to do. For example, recognizing your grandmother in a crowd. You could do that quite easily, but it’s actually very difficult to program a computer to recognize things that well.”

Since the early days of AI research, we have made computers that are able to recognize and identify specific images. However, to sum the main point, it is remarkably difficult to program a system that is able to do all of the things that humans can do, which is why it will be some time before we have a ‘true AI.’

Yet, Conitzer asserts that now is the time to start considering what the rules we will use to govern such intelligences. “It may be quite a bit further out, but to computer scientists, that means maybe just on the order of decades, and it definitely makes sense to try to think about these things a little bit ahead.” And he notes that, even though we don’t have any human-like robots just yet, our intelligence systems are already making moral choices and could, potentially, save or end lives.

“Very often, many of these decisions that they make do impact people and we may need to make decisions that we will typically be considered to be a morally loaded decision. And a standard example is a self-driving car that has to decide to either go straight and crash into the car ahead of it or veer off and maybe hurt some pedestrian. How do you make those trade-offs? And that I think is something we can really make some progress on. This doesn’t require superintelligent AI, this can just be programs that make these kind of trade-offs in various ways.”

But of course, knowing what decision to make will first require knowing exactly how our morality operates (or at least having a fairly good idea). From there, we can begin to program it, and that’s what Conitzer and his team are hoping to do.

So welcome to the dawn of moral robots.

This interview has been edited for brevity and clarity. 

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Grants Timeline

Grants F.A.Q.

Grants RFP Overview

Grants Program Press Release

New International Grants Program Jump-Starts Research to Ensure AI Remains Beneficial

Elon-Musk-backed program signals growing interest in new branch of artificial intelligence research

July 1, 2015
Amid rapid industry investment in developing smarter artificial intelligence, a new branch of research has begun to take off aimed at ensuring that society can reap the benefits of AI while avoiding potential pitfalls.

The Boston-based Future of Life Institute (FLI) announced the selection of 37 research teams around the world to which it plans to award about $7 million from Elon Musk and the Open Philanthropy Project as part of a first-of-its-kind grant program dedicated to “keeping AI robust and beneficial”. The program launches as an increasing number of high-profile figures including Bill Gates, Elon Musk and Stephen Hawking voice concerns about the possibility of powerful AI systems having unintended, or even potentially disastrous, consequences. The winning teams, chosen from nearly 300 applicants worldwide, will research a host of questions in computer science, law, policy, economics, and other fields relevant to coming advances in AI.

The 37 projects being funded include:

  • Three projects developing techniques for AI systems to learn what humans prefer from observing our behavior, including projects at UC Berkeley and Oxford University
  • A project by Benja Fallenstein at the Machine Intelligence Research Institute on how to keep the interests of superintelligent systems aligned with human values
  • A project led by Manuela Veloso from Carnegie Mellon University on making AI systems explain their decisions to humans
  • A study by Michael Webb of Stanford University on how to keep the economic impacts of AI beneficial
  • A project headed by Heather Roff studying how to keep AI-driven weapons under “meaningful human control”
  • A new Oxford-Cambridge research center for studying AI-relevant policy

As Skype founder Jaan Tallinn, one of FLI’s founders, has described this new research direction, “Building advanced AI is like launching a rocket. The first challenge is to maximize acceleration, but once it starts picking up speed, you also need to to focus on steering.”

When the Future of Life Institute issued an open letter in January calling for research on how to keep AI both robust and beneficial, it was signed by a long list of AI researchers from academia, nonprofits and industry, including AI research leaders from Facebook, IBM, and Microsoft and the founders of Google’s DeepMind Technologies. It was seeing that widespread agreement that moved Elon Musk to seed the research program that has now begun.

“Here are all these leading AI researchers saying that AI safety is important”, said Musk at the time. “I agree with them, so I’m today committing $10M to support research aimed at keeping AI beneficial for humanity.”

“I am glad to have an opportunity to carry this research focused on increasing the transparency of AI robotic systems,” said Manuela Veloso, past president of the Association for the Advancement of Artificial Intelligence (AAAI) and winner of one of the grants.

“This grant program was much needed: because of its emphasis on safe AI and multidisciplinarity, it fills a gap in the overall scenario of international funding programs,” added Prof. Francesca Rossi, president of the International Joint Conference on Artificial Intelligence (IJCAI), also a grant awardee.

Tom Dietterich, president of the AAAI, described how his grant — a project studying methods for AI learning systems to self-diagnose when failing to cope with a new situation — breaks the mold of traditional research:

“In its early days, AI research focused on the ‘known knowns’ by working on problems such as chess and blocks world planning, where everything about the world was known exactly. Starting in the 1980s, AI research began studying the ‘known unknowns’ by using probability distributions to represent and quantify the likelihood of alternative possible worlds. The FLI grant will launch work on the ‘unknown unknowns’: How can an AI system behave carefully and conservatively in a world populated by unknown unknowns — aspects that the designers of the AI system have not anticipated at all?”

As Terminator Genisys debuts this week, organizers stressed the importance of separating fact from fiction. “The danger with the Terminator scenario isn’t that it will happen, but that it distracts from the real issues posed by future AI”, said FLI president Max Tegmark. “We’re staying focused, and the 37 teams supported by today’s grants should help solve such real issues.”

The full list of research grant winners can be found here. The plan is to fund these teams for up to three years, with most of the research projects starting by September 2015, and to focus the remaining $4M of the Musk-backed program on the areas that emerge as most promising.

FLI has a mission to catalyze and support research and initiatives for safeguarding life and developing optimistic visions of the future, including positive ways for humanity to steer its own course considering new technologies and challenges.

Contacts at the Future of Life Institute:

  • Max Tegmark:
  • Meia Chita-Tegmark:
  • Jaan Tallinn:
  • Anthony Aguirre:
  • Viktoriya Krakovna:
  • Jesse Galef:


Elon Musk donates $10M to keep AI beneficial

Thursday January 15, 2015

We are delighted to report that technology inventor Elon Musk, creator of Tesla and SpaceX, has decided to donate $10M to the Future of Life Institute to run a global research program aimed at keeping AI beneficial to humanity.

There is now a broad consensus that AI research is progressing steadily, and that its impact on society is likely to increase. A long list of leading AI-researchers have signed an open letter calling for research aimed at ensuring that AI systems are robust and beneficial, doing what we want them to do. Musk’s donation aims to support precisely this type of research: “Here are all these leading AI researchers saying that AI safety is important”, says Elon Musk. “I agree with them, so I’m today committing $10M to support research aimed at keeping AI beneficial for humanity.”

Musk’s announcement was welcomed by AI leaders in both academia and industry:

“It’s wonderful, because this will provide the impetus to jump-start research on AI safety”, said AAAI president Tom Dietterich. “This addresses several fundamental questions in AI research that deserve much more funding than even this donation will provide.”

“Dramatic advances in artificial intelligence are opening up a range of exciting new applications”, said Demis Hassabis, Shane Legg and Mustafa Suleyman, co-founders of DeepMind Technologies, which was recently acquired by Google. “With these newfound powers comes increased responsibility. Elon’s generous donation will support researchers as they investigate the safe and ethical use of artificial intelligence, laying foundations that will have far reaching societal impacts as these technologies continue to progress”.

Elon Musk and AAAI President Thomas Dietterich comment on the announcement
The $10M program will be administered by the Future of Life Institute, a non-profit organization whose scientific advisory board includes AI-researchers Stuart Russell and Francesca Rossi. “I love technology, because it’s what’s made 2015 better than the stone age”, says MIT professor and FLI president Max Tegmark. “Our organization studies how we can maximize the benefits of future technologies while avoiding potential pitfalls.”

The research supported by the program will be carried out around the globe via an open grants competition, through an application portal at that will open by Thursday January 22. The plan is to award the majority of the grant funds to AI researchers, and the remainder to AI-related research involving other fields such as economics, law, ethics and policy (a detailed list of examples can be found here). “Anybody can send in a grant proposal, and the best ideas will win regardless of whether they come from academia, industry or elsewhere”, says FLI co-founder Viktoriya Krakovna.

“This donation will make a major impact”, said UCSC professor and FLI co-founder Anthony Aguirre: “While heavy industry and government investment has finally brought AI from niche academic research to early forms of a potentially world-transforming technology, to date relatively little funding has been available to help ensure that this change is actually a net positive one for humanity.”

“That AI systems should be beneficial in their effect on human society is a given”, said Stuart Russell, co-author of the standard AI textbook “Artificial Intelligence: a Modern Approach”. “The research that will be funded under this program will make sure that happens. It’s an intrinsic and essential part of doing AI research.”

Skype-founder Jaan Tallinn, one of FLI’s founders, agrees: “Building advanced AI is like launching a rocket. The first challenge is to maximize acceleration, but once it starts picking up speed, you also need to to focus on steering.”

Along with research grants, the program will also include meetings and outreach programs aimed at bringing together academic AI researchers, industry AI developers and other key constituents to continue exploring how to maximize the societal benefits of AI; one such meeting was held in Puerto Rico last week with many of the open-letter signatories.

“Hopefully this grant program will help shift our focus from building things just because we can, toward building things because they are good for us in the long term”, says FLI co-founder Meia Chita-Tegmark.

Contacts at Future of Life Institute:

  • Max Tegmark:
  • Meia Chita-Tegmark:
  • Jaan Tallinn:
  • Anthony Aguirre:
  • Viktoriya Krakovna:

Contacts among AI researchers:

  • Prof. Tom Dietterich, President of the Association for the Advancement of Artificial Intelligence (AAAI), Director of Intelligent Systems:
  • Prof. Stuart Russell, Berkeley, Director of the Center for Intelligent Systems, and co-author of the standard textbook Artificial Intelligence: a Modern Approach:
  • Prof. Bart Selman, co-chair of the AAAI presidential panel on long-term AI futures:
  • Prof. Francesca Rossi, Professor of Computer Science, University of Padova and Harvard University, president of the International Joint Conference on Artificial Intelligence (IJCAI):
  • Prof. Murray Shanahan, Imperial College:

Max Tegmark interviews Elon Musk about his life, his interest in the future of humanity and the background to his donation