Making Deep Learning More Robust

Imagine how much more efficient lawyers could be if they had the time to read every legal book ever written and review every case ever brought to court. Imagine doctors with the ability to study every advancement published across the world’s medical journals, or consult every medical case, ever. Unfortunately, the human brain cannot store that much information, and it would take decades to achieve these feats.

But a computer, one specifically designed to work like the human mind, could.

Deep learning neural networks are designed to mimic the human brain’s neural connections. They are capable of learning through continuous exposure to huge amounts of data. This allows them to recognize patterns, comprehend complex concepts, and translate high-level abstractions. These networks consist of many layers, each having a different set of weights. The deeper the network, the stronger it is.

Current applications for these networks include medical diagnosis, robotics and engineering, face recognition, and automotive navigation. However, deep learning is still in development – not surprisingly, it is a huge undertaking to get machines to think like humans. In fact, very little is understood about these networks, and months of manual tuning are often required for obtaining excellent performance.

Fuxin Li, assistant professor at the Oregon State University School of Electrical Engineering and Computer Science, and his team are taking on the accuracy of these neural networks under adversarial conditions. Their research focuses on the basic machine learning aspects of deep learning, and how to make general deep learning more robust.

To try to better understand when a deep convolutional neural network (CNN) is going to be right or wrong, Li’s team had to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates can be used as safeguards when utilizing the networks in real life.

“Basically,” explains Li, “trying to make deep learning increasingly self-aware – to be aware of what type of data it has seen, and what type of data it could work on.”

The team looked at recent advances in deep learning, which have greatly improved the capability to recognize images automatically. Those networks, albeit very resistant to overfitting, were discovered to completely fail if some of the pixels in such images were perturbed via an adversarial optimization algorithm.

To a human observer, the image in question may look fine, but the deep network sees otherwise. According to the researchers, those adversarial examples are dangerous if a deep network is utilized into any crucial real application, such as autonomous driving. If the result of the network can be hacked, wrong authentications and other devastating effects would be unavoidable.

In a departure from previous perspectives that focused on improving the classifiers to correctly organize the adversarial examples, the team focused on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. The accuracy for detecting adversarial examples exceeded 96%. Notably, 90% of the adversarials can be detected with a false positive rate of less than 10%.

The benefits of this research are numerous. It is vital for a neural network to be able to identify whether an example comes from a normal or an adversarial distribution. Such knowledge, if available, will help significantly to control behaviors of robots employing deep learning. A reliable procedure can prevent robots from behaving in an undesirable manner because of the false perceptions it made about the environment.

Li gives one example: “In robotics there’s this big issue about robots not doing something based on erroneous perception. It’s important for a robot to know that it’s not making a confident perception. For example, if [the robot] is saying there’s an object over there, but it’s actually a wall, he’ll go to fetch that object, and then he hits a wall.”

Hopefully, Li says, that won’t happen. However, current software and machine learning have been mostly based solely on prediction confidence within the original machine learning framework. Basically, the testing and training data are assumed to be pulled from the same distribution independently, and that can lead to incorrect assumptions.

Better confidence estimates could potentially help avoid incidents such as the Tesla crash scenario from May 2016, where an adversarial example (truck with too much light) was in the middle of the highway that cheated the system. A confidence estimate could potentially solve that issue. But first, the computer must be smarter. The computer has to learn to detect objects and differentiate, say, a tree from another vehicle.

“To make it really robust, you need to account for unknown objects. Something weird may hit you. A deer may jump out.” The network can’t be taught every unexpected situation, says Li, “so you need it to discover them without knowledge of what they are. That’s something that we do. We try to bridge the gap.”

Training procedures will make deep learning more automatic and lead to fewer failures, as well as confidence estimates when the deep network is utilized to predict new data. Most of this training, explains Li, comes from photo distribution using stock images. However, these are flat images much different than what a robot would normally see in day-to-day life. It’s difficult to get a 360-degree view just by looking at photos.

“There will be a big difference between the thing [the robot] trains on and the thing it really sees. So then, it is important for the robot to understand that it can predict some things confidently, and others it cannot,” says Li. “[The robot] needs to understand that it probably predicted wrong, so as not to act too aggressively toward its prediction.” This can only be achieved with a more self-aware framework, which is what Li is trying to develop with this grant.

Further, these estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.

Soon, Li and his team will start generalizing the approach to other domains, such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.

Li’s most recent update on this work can be found here.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

How Smart Can AI Get?

Capability Caution Principle: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.

A major change is coming, over unknown timescales but across every segment of society, and the people playing a part in that transition have a huge responsibility and opportunity to shape it for the best. What will trigger this change? Artificial intelligence.

The 23 Asilomar AI Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

 

Capability Caution

One of the greatest questions facing AI researchers is: just how smart and capable can artificial intelligence become?

In recent years, the development of AI has accelerated in leaps and bounds. DeepMind’s AlphaGo surpassed human performance in the challenging, intricate game of Go, and the company has created AI that can quickly learn to play Atari video games with much greater prowess than a person. We’ve also seen breakthroughs and progress in language translation, self-driving vehicles, and even the creation of new medicinal molecules.

But how much more advanced can AI become? Will it continue to excel only in narrow tasks, or will it develop broader learning skills that will allow a single AI to outperform a human in most tasks? How do we prepare for an AI more intelligent than we can imagine?

Some experts think human-level or even super-human AI could be developed in a matter of a couple decades, while some don’t think anyone will ever accomplish this feat. The Capability Caution Principle argues that, until we have concrete evidence to confirm what an AI can someday achieve, it’s safer to assume that there are no upper limits – that is, for now, anything is possible and we need to plan accordingly.

 

Expert Opinion

The Capability Caution Principle drew both consensus and disagreement from the experts. While everyone I interviewed generally agreed that we shouldn’t assume upper limits for AI, their reasoning varied and some raised concerns.

Stefano Ermon, an assistant professor at Stanford and Roman Yampolskiy, an associate professor at the University of Louisville, both took a better-safe-than-sorry approach.

Ermon turned to history as a reminder of how difficult future predictions are. He explained, “It’s always hard to predict the future. … Think about what people were imagining a hundred years ago, about what the future would look like. … I think it would’ve been very hard for them to imagine what we have today. I think we should take a similar, very cautious view, about making predictions about the future. If it’s extremely hard, then it’s better to play it safe.”

Yampolskiy considered current tech safety policies, saying, “In many areas of computer science such as complexity or cryptography the default assumption is that we deal with the worst case scenario. Similarly, in AI Safety we should assume that AI will become maximally capable and prepare accordingly. If we are wrong we will still be in great shape.”

Dan Weld, a professor at the University of Washington, said of the principle, “I agree! As a scientist, I’m against making strong or unjustified assumptions about anything, so of course I agree.”

But though he agreed with the basic idea behind the principle, Weld also had reservations. “This principle bothers me,” Weld explained, “… because it seems to be implicitly saying that there is an immediate danger that AI is going to become superhumanly, generally intelligent very soon, and we need to worry about this issue. This assertion … concerns me because I think it’s a distraction from what are likely to be much bigger, more important, more near-term, potentially devastating problems. I’m much more worried about job loss and the need for some kind of guaranteed health-care, education and basic income than I am about Skynet. And I’m much more worried about some terrorist taking an AI system and trying to program it to kill all Americans than I am about an AI system suddenly waking up and deciding that it should do that on its own.”

Looking at the problem from a different perspective, Guruduth Banavar, the Vice President of IBM Research, worries that placing upper bounds on AI capabilities could limit the beneficial possibilities. Banavar explained, “The general idea is that intelligence, as we understand it today, is ultimately the ability to process information from all possible sources and to use that to predict the future and to adapt to the future. It is entirely in the realm of possibility that machines can do that. … I do think we should avoid assumptions of upper limits on machine intelligence because I don’t want artificial limits on how advanced AI can be.”

IBM research scientist Francesca Rossi considered this principle from yet another perspective, suggesting that AI is necessary for humanity to reach our full capabilities, where we also don’t want to assume upper limits.

“I personally am for building AI systems that augment human intelligence instead of replacing human intelligence,” said Rossi, “And I think that in that space of augmenting human intelligence there really is a huge potential for AI in making the personal and professional lives of everybody much better. I don’t think that there are upper limits of the future AI capabilities in that respect. I think more and more AI systems together with humans will enhance our kind of intelligence, which is complementary to the kind of intelligence that machines have, and will help us make better decisions, and live better, and solve problems that we don’t know how to solve right now. I don’t see any upper limit to that.”

 

What do you think?

Is there an upper limit to artificial intelligence? Is there an upper limit to what we can achieve with AI? How long will it take to achieve increasing levels of advanced AI? How do we plan for the future with such uncertainties? How can society as a whole address these questions? What other questions should we be asking about AI capabilities?

MIRI February 2017 Newsletter

Following up on a post outlining some of the reasons MIRI researchers and OpenAI researcher Paul Christiano are pursuing different research directions, Jessica Taylor has written up the key motivations for MIRI’s highly reliable agent design research.

Research updates

General updates

  • We attended the Future of Life Institute’s Beneficial AI conference at Asilomar. See Scott Alexander’s recap. MIRI executive director Nate Soares was on a technical safety panel discussion with representatives from DeepMind, OpenAI, and academia (video), also featuring a back-and-forth with Yann LeCun, the head of Facebook’s AI research group (at 22:00).
  • MIRI staff and a number of top AI researchers are signatories on FLI’s new Asilomar AI Principles, which include cautions regarding arms races, value misalignment, recursive self-improvement, and superintelligent AI.
  • The Center for Applied Rationality recounts MIRI researcher origin stories and other cases where their workshops have been a big assist to our work, alongside examples of CFAR’s impact on other groups.
  • The Open Philanthropy Project has awarded a $32,000 grant to AI Impacts.
  • Andrew Critch spoke at Princeton’s ENVISION conference (video).
  • Matthew Graves has joined MIRI as a staff writer. See his first piece for our blog, a reply to “Superintelligence: The Idea That Eats Smart People.”
  • The audio version of Rationality: From AI to Zombies is temporarily unavailable due to the shutdown of Castify. However, fans are already putting together a new free recording of the full collection.

News and links

  • An Asilomar panel on superintelligence (video) gathers Elon Musk (OpenAI), Demis Hassabis (DeepMind), Ray Kurzweil (Google), Stuart Russell and Bart Selman (CHCAI), Nick Bostrom (FHI), Jaan Tallinn (CSER), Sam Harris, and David Chalmers.
  • Also from Asilomar: Russell on corrigibility (video), Bostrom on openness in AI (video), and LeCun on the path to general AI (video).
  • From MIT Technology Review‘s “AI Software Learns to Make AI Software”:
    Companies must currently pay a premium for machine-learning experts, who are in short supply. Jeff Dean, who leads the Google Brain research group, mused last week that some of the work of such workers could be supplanted by software. He described what he termed “automated machine learning” as one of the most promising research avenues his team was exploring.

The Financial World of AI

Automated algorithms currently manage over half of trading volume in US equities, and as AI improves, it will continue to assume control over important financial decisions. But these systems aren’t foolproof. A small glitch could send shares plunging, potentially costing investors billions of dollars.

For firms, the decision to accept this risk is simple. The algorithms in automated systems are faster and more accurate than any human, and deploying the most advanced AI technology can keep firms in business.

But for the rest of society, the consequences aren’t clear. Artificial intelligence gives firms a competitive edge, but will these rapidly advancing systems remain safe and robust? What happens when they make mistakes?

 

Automated Errors

Michael Wellman, a professor of computer science at the University of Michigan, studies AI’s threats to the financial system. He explains, “The financial system is one of the leading edges of where AI is automating things, and it’s also an especially vulnerable sector. It can be easily disrupted, and bad things can happen.”

Consider the story of Knight Capital. On August 1, 2012, Knight decided to try out new software to stay competitive in a new trading pool. The software passed its safety tests, but when Knight deployed it, the algorithm activated its testing software instead of the live trading program. The testing software sent millions of bad orders in the following minutes as Knight frantically tried to stop it. But the damage was done.

In just 45 minutes, Knight Capital lost $440 million – nearly four times their profit in 2011 – all because of one line of code.

In this case, the damage was constrained to Knight, but what happens when one line of code can impact the entire financial system?

 

Understanding Autonomous Trading Agents

Wellman argues that autonomous trading agents are difficult to control because they process and respond to information at unprecedented speeds, they can be easily replicated on a large scale, they act independently, and they adapt to their environment.

With increasingly general capabilities, systems may learn to make money in dangerous ways that their programmers never intended. As Lawrence Pingree, an analyst at Gartner, said after the Knight meltdown, “Computers do what they’re told. If they’re told to do the wrong thing, they’re going to do it and they’re going to do it really, really well.”

In order to prevent AI systems from undermining market transparency and stability, government agencies and academics must learn how these agents work.

 

Market Manipulation

Even benign uses of AI can hinder market transparency, but Wellman worries that AI systems will learn to manipulate markets.

Autonomous trading agents are especially effective at exploiting arbitrage opportunities – where they simultaneously purchase and sell an asset to profit from pricing differences. If, for example, a stock trades at $30 in one market and $32 in a second market, an agent can buy the $30 stock and immediately sell it for $32 in the second market, making a $2 profit.

Market inefficiency naturally creates arbitrage opportunities. However, an AI may learn – on its own – to create pricing discrepancies by taking misleading actions that move the market to generate profit.

One manipulative technique is ‘spoofing’ – the act of bidding for a stock item with the intent to cancel the bid before execution. This moves the market in a certain direction, and the spoofer profits from the false signal.

Wellman and his team recently reproduced spoofing in their laboratory models, as part of an effort to understand the situations where spoofing can be effective. He explains, “We’re doing this in the laboratory to see if we can characterize the signature of AIs doing this, so that we reliably detect it and design markets to reduce vulnerability.”

As agents improve, they may learn to exploit arbitrage more maliciously by creating artificial items on the market to mislead traders, or by hacking accounts to report false events that move markets. Wellman’s work aims to produce methods to help control such manipulative behavior.

 

Secrecy in the Financial World

But the secretive nature of finance prevents academics from fully understanding the role of AI.

Wellman explains, “We know they use AI and machine learning to a significant extent, and they are constantly trying to improve their algorithms. We don’t know to what extent things like market manipulation and spoofing are automated right now, but we know that they could be automated and that could lead to something of an arms race between market manipulators and the systems trying to detect and run surveillance for market bad behavior.”

Government agencies – such as the Securities and Exchange Commission – watch financial markets, but “they’re really outgunned as far as the technology goes,” Wellman notes. “They don’t have the expertise or the infrastructure to keep up with how fast things are changing in the industry.”

But academics can help. According to Wellman, “even without doing the trading for money ourselves, we can reverse engineer what must be going on in the financial world and figure out what can happen.”

 

Preparing for Advanced AI

Although Wellman studies current and near-term AI, he’s concerned about the threat of advanced, general AI.

“One thing we can do to try to understand the far-out AI is to get experience with dealing with the near-term AI,” he explains. “That’s why we want to look at regulation of autonomous agents that are very near on the horizon or current. The hope is that we’ll learn some lessons that we can then later apply when the superintelligence comes along.”

AI systems are improving rapidly, and there is intense competition between financial firms to use them. Understanding and tracking AI’s role in finance will help financial markets remain stable and transparent.

“We may not be able to manage this threat with 100% reliability,” Wellman admits, “but I’m hopeful that we can redesign markets to make them safer for the AIs and eliminate some forms of the arms race, and that we’ll be able to get a good handle on preventing some of the most egregious behaviors.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Can We Ensure Privacy in the Era of Big Data?

Personal Privacy Principle: People should have the right to access, manage and control the data they generate, given AI systems’ power to analyze and utilize that data.

A major change is coming, over unknown timescales but across every segment of society, and the people playing a part in that transition have a huge responsibility and opportunity to shape it for the best. What will trigger this change? Artificial intelligence.

The 23 Asilomar AI Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

Personal Privacy

In the age of social media and online profiles, maintaining privacy is already a tricky problem. As companies collect ever-increasing quantities of data about us, and as AI programs get faster and more sophisticated at analyzing that data, our information can become both a commodity for business and a liability for us.

We’ve already seen small examples of questionable data use, such as Target recognizing a teenager was pregnant before her family knew. But this is merely advanced marketing. What happens when governments or potential employers can gather what seems like innocent and useless information (like grocery shopping preferences) to uncover your most intimate secrets – like health issues even you didn’t know about yet?

It turns out, all of the researchers I spoke to strongly agree with the Personal Privacy Principle.

The Importance of Personal Privacy

“I think that’s a big immediate issue,” says Stefano Ermon, an assistant professor at Stanford. “I think when the general public thinks about AI safety, maybe they think about killer robots or these kind of apocalyptic scenarios, but there are big concrete issues like privacy, fairness, and accountability.”

“I support that principle very strongly!” agrees Dan Weld, a professor at the University of Washington. “I’m really quite worried about the loss of privacy. The number of sensors is increasing and combined with advanced machine learning, there are few limits to what companies and governments can learn about us. Now is the time to insist on the ability to control our own data.”

Toby Walsh, a guest professor at the Technical University of Berlin, also worries about privacy. “Yes, this is a great one, and actually I’m really surprised how little discussion we have around AI and privacy,” says Walsh. “I thought there was going to be much more fallout from Snowden and some of the revelations that happened, and AI, of course, is enabling technology. If you’re collecting all of this data, the only way to make sense of it is to use AI, so I’ve been surprised that there hasn’t been more discussion and more concern amongst the public around these sorts of issues.”

Kay Firth-Butterfield, an adjunct professor at the University of Texas in Austin, adds, “As AI becomes more powerful, we need to take steps to ensure that it cannot use our personal data against us if it falls into the wrong hands.”

Taking this concern a step further, Roman Yampolskiy, an associate professor at the University of Louisville, argues that “the world’s dictatorships are looking forward to opportunities to target their citizenry with extreme levels of precision.”

“The tech we will develop,” he continues, “will most certainly become available throughout the world and so we have a responsibility to make privacy a fundamental cornerstone of any data analysis.”

But some of the researchers also worry about the money to be made from personal data.

Ermon explains, “Privacy is definitely a big one, and one of the most valuable things that these large corporations have is the data they are collecting from us, so we should think about that carefully.”

“Data is worth money,” agrees Firth-Butterfield, “and as individuals we should be able to choose when and how to monetize our own data whilst being encouraged to share data for public health and other benefits.”

Francesca Rossi, a research scientist for IBM, believes this principle is “very important,” but she also emphasizes the benefits we can gain if we can share our data without fearing it will be misused. She says, “People should really have the right to own their privacy, and companies like IBM or any other that provide AI capabilities and systems should protect the data of their clients. The quality and amount of data is essential for many AI systems to work well, especially in machine learning. … It’s also very important that these companies don’t just assure that they are taking care of the data, but that they are transparent about the use of the data. Without this transparency and trust, people will resist giving their data, which would be detrimental to the AI capabilities and the help AI can offer in solving their health problems, or whatever the AI is designed to solve.”

Privacy as as Social Right

Both Yoshua Bengio and Guruduth Banavar argued that personal privacy isn’t just something that AI researchers should value, but that it should also be considered a social right.

Bengio, a professor at the University of Montreal, says, “We should be careful that the complexity of AI systems doesn’t become a tool for abusing minorities or individuals who don’t have access to understand how it works. I think this is a serious social rights issue.” But he also worries that preventing rights violations may not be an easy technical fix. “We have to be careful with that because we may end up barring machine learning from publicly used systems, if we’re not careful,” he explains, adding, “the solution may not be as simple as saying ‘it has to be explainable,’ because it won’t be.”

And as Ermon says, “The more we delegate decisions to AI systems, the more we’re going to run into these issues.”

Meanwhile, Banavar, the Vice President of IBM Research, considers the issue of personal privacy rights especially important. He argues, “It’s absolutely crucial that individuals should have the right to manage access to the data they generate. … AI does open new insight to individuals and institutions. It creates a persona for the individual or institution – personality traits, emotional make-up, lots of the things we learn when we meet each other. AI will do that too and it’s very personal. I want to control how [my] persona is created. A persona is a fundamental right.”

What Do You Think?

And now we turn the conversation over to you. What does personal privacy mean to you? How important is it to have control over your data? The experts above may have agreed about how serious the problem of personal privacy is, but solutions are harder come by. Do we need to enact new laws to protect the public? Do we need new corporate policies? How can we ensure that companies and governments aren’t using our data for nefarious purposes – or even for well-intentioned purposes that still aren’t what we want? What else should we, as a society, be asking?

How Do We Align Artificial Intelligence with Human Values?

A major change is coming, over unknown timescales but across every segment of society, and the people playing a part in that transition have a huge responsibility and opportunity to shape it for the best. What will trigger this change? Artificial intelligence.

Recently, some of the top minds in AI and related fields got together to discuss how we can ensure AI remains beneficial throughout this transition, and the result was the Asilomar AI Principles document. The intent of these 23 principles is to offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.”

The Principles represent the beginning of a conversation, and now that the conversation is underway, we need to follow up with broad discussion about each individual principle. The Principles will mean different things to different people, and in order to benefit as much of society as possible, we need to think about each principle individually.

As part of this effort, I interviewed many of the AI researchers who signed the Principles document to learn their take on why they signed and what issues still confront us.

Value Alignment

Today, we start with the Value Alignment principle.

Value Alignment: Highly autonomous AI systems should be designed so that their goals and behaviors can be assured to align with human values throughout their operation.

Stuart Russell, who helped pioneer the idea of value alignment, likes to compare this to the King Midas story. When King Midas asked for everything he touched to turn to gold, he really just wanted to be rich. He didn’t actually want his food and loved ones to turn to gold. We face a similar situation with artificial intelligence: how do we ensure that an AI will do what we really want, while not harming humans in a misguided attempt to do what its designer requested?

“Robots aren’t going to try to revolt against humanity,” explains Anca Dragan, an assistant professor and colleague of Russell’s at UC Berkeley, “they’ll just try to optimize whatever we tell them to do. So we need to make sure to tell them to optimize for the world we actually want.”

What Do We Want?

Understanding what “we” want is among the biggest challenges facing AI researchers.

“The issue, of course, is to define what exactly these values are, because people might have different cultures, [come from] different parts of the world, [have] different socioeconomic backgrounds — I think people will have very different opinions on what those values are. And so that’s really the challenge,” says Stefano Ermon, an assistant professor at Stanford.

Roman Yampolskiy, an associate professor at the University of Louisville agrees. He explains, “It is very difficult to encode human values in a programming language, but the problem is made more difficult by the fact that we as humanity do not agree on common values, and even parts we do agree on change with time.”

And while some values are hard to gain consensus around, there are also lots of values we all implicitly agree on. As Russell notes, any human understands emotional and sentimental values that they’ve been socialized with, but it’s difficult to guarantee that a robot will be programmed with that same understanding.

But IBM research scientist Francesca Rossi is hopeful. As Rossi points out, “there is scientific research that can be undertaken to actually understand how to go from these values that we all agree on to embedding them into the AI system that’s working with humans.”

Dragan’s research comes at the problem from a different direction. Instead of trying to understand people, she looks at trying to train a robot or AI to be flexible with its goals as it interacts with people. She explains, “At Berkeley, … we think it’s important for agents to have uncertainty about their objectives, rather than assuming they are perfectly specified, and treat human input as valuable observations about the true underlying desired objective.”

Rewrite the Principle?

While most researchers agree with the underlying idea of the Value Alignment Principle, not everyone agrees with how it’s phrased, let alone how to implement it.

Yoshua Bengio, an AI pioneer and professor at the University of Montreal, suggests “assured” may be too strong. He explains, “It may not be possible to be completely aligned. There are a lot of things that are innate, which we won’t be able to get by machine learning, and that may be difficult to get by philosophy or introspection, so it’s not totally clear we’ll be able to perfectly align. I think the wording should be something along the lines of ‘we’ll do our best.’ Otherwise, I totally agree.”

Walsh, who’s currently a guest professor at the Technical University of Berlin, questions the use of the word “highly.” “I think any autonomous system, even a lowly autonomous system, should be aligned with human values. I’d wordsmith away the ‘high,’” he says.

Walsh also points out that, while value alignment is often considered an issue that will arise in the future, he believes it’s something that needs to be addressed sooner rather than later. “I think that we have to worry about enforcing that principle today,” he explains. “I think that will be helpful in solving the more challenging value alignment problem as systems get more sophisticated.

Rossi, who supports the the Value Alignment Principle as, “the one closest to my heart,” agrees that the principle should apply to current AI systems. “I would be even more general than what you’ve written in this principle,” she says. “Because this principle has to do not only with autonomous AI systems, but … is very important and essential also for systems that work tightly with humans-in-the-loop and where the human is the final decision maker. When you have a human and machine tightly working together, you want this to be a real team.”

But as Dragan explains, “This is one step toward helping AI figure out what it should do, and continuously refining the goals should be an ongoing process between humans and AI.”

Let the Dialogue Begin

And now we turn the conversation over to you. What does it mean to you to have artificial intelligence aligned with your own life goals and aspirations? How can it be aligned with you and everyone else in the world at the same time? How do we ensure that one person’s version of an ideal AI doesn’t make your life more difficult? How do we go about agreeing on human values, and how can we ensure that AI understands these values? If you have a personal AI assistant, how should it be programmed to behave? If we have AI more involved in things like medicine or policing or education, what should that look like? What else should we, as a society, be asking?

Podcast: Top AI Breakthroughs, with Ian Goodfellow and Richard Mallah

2016 saw some significant AI developments. To talk about the AI progress of the last year, we turned to Richard Mallah and Ian Goodfellow. Richard is the director of AI projects at FLI, he’s the Senior Advisor to multiple AI companies, and he created the highest-rated enterprise text analytics platform. Ian is a research scientist at OpenAI, he’s the lead author of the Deep Learning textbook, and he’s a lead inventor of Generative Adversarial Networks.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety above or read the full transcript here.

Ariel: Two events stood out to me in 2016. The first was AlphaGo, which beat the world’s top Go champion, Lee Sedol last March. What is AlphaGo, and why was this such an incredible achievement?

Ian: AlphaGo was DeepMind’s system for playing the game of Go. It’s a game where you place stones on a board with two players, the object being to capture as much territory as possible. But there are hundreds of different positions where we can place a stone on each turn. It’s not even remotely possible to use a computer to simulate many different Go games and figure out how the game will progress in the future. The computer needs to rely on intuition the same way that human Go players can look at a board and get kind of a sixth sense that tells them whether the game is going well or poorly for them, and where they ought to put the next stone. It’s computationally infeasible to explicitly calculate what each player should do next.

Richard: The DeepMind team has one network for what’s called value learning and another deep network for policy learning. The policy is, basically, which places should I evaluate for the next piece. The value network is how good that state is, in terms of the probability that the agent will be winning. And then they do a Monte Carlo tree search, which means it has some randomness and many different paths — on the order of thousands of evaluations. So it’s much more like a human considering a handful of different moves and trying to determine how good those moves would be.

Ian: From 2012 to 2015 we saw a lot of breakthroughs where the exciting thing was that AI was able to copy a human ability. In 2016, we started to see breakthroughs that were all about exceeding human performance. Part of what was so exciting about AlphaGo was that AlphaGo did not only learn how to predict what a human expert Go player would do, AlphaGo also improved beyond that by practicing playing games against itself and learning how to be better than the best human player. So we’re starting to see AI move beyond what humans can tell the computer to do.

Ariel: So how will this be applied to applications that we’ll interact with on a regular basis? How will we start to see these technologies and techniques in action ourselves?

Richard: With these techniques, a lot of them are research systems. It’s not necessarily that they’re going to directly go down the pipeline towards productization, but they are helping the models that are implicitly learned inside of AI systems and machine learning systems to get much better.

Ian: There are other strategies for generating new experiences that resemble previously seen experiences. One of them is called WaveNet. It’s a model produced by DeepMind in 2016 for generating speech. If you provide a sentence, just written down, and you’d like to hear that sentence spoken aloud, WaveNet can create an audio waveform that sounds very realistically like a human pronouncing that sentence written down. The main drawback to WaveNet right now is that it’s fairly slow. It has to generate the audio waveform one piece at a time. I believe it takes WaveNet two minutes to produce one second of audio, so it’s not able to make the audio fast enough to hold an interactive conversation.

Richard: And similarly, we’ve seen applications to colorizing black and white photos, or turning sketches into somewhat photo-realistic images, being able to turn text into images.

Ian: Yeah one thing that really highlights how far we’ve come is that in 2014, one of the big breakthroughs was the ability to take a photo and produce a sentence summarizing what was in the photo. In 2016, we saw different methods for taking a sentence and producing a photo that contains the imagery described by the sentence. It’s much more complicated to go from a few words to a very realistic image containing thousands or millions of pixels than it is to go from the image to the words.

Another thing that was very exciting in 2016 was the use of generative models for drug discovery. Instead of imagining new images, the model could actually imagine new molecules that are intended to have specific medicinal effects.

Richard: And this is pretty exciting because this is being applied towards cancer research, developing potential new cancer treatments.

Ariel: And then there was Google’s language translation program, Google Neural Machine Translation. Can you talk about what that did and why it was a big deal?

Ian: It’s a big deal for two different reasons. First, Google Neural Machine Translation is a lot better than previous approaches to machine translation. Google Neural Machine Translation removes a lot of the human design elements, and just has a neural network figure out what to do.

The other thing that’s really exciting about Google Neural Machine Translation is that the machine translation models have developed what we call an “Interlingua.” It used to be that if you wanted to translate from Japanese to Korean, you had to find a lot of sentences that had been translated from Japanese to Korean before, and then you could train a machine learning model to copy that translation procedure. But now, if you already know how to translate from English to Korean, and you know how to translate from English to Japanese, in the middle, you have Interlingua. So you translate from English to Interlingua and then to Japanese, English to Interlingua and then to Korean. You can also just translate Japanese to Interlingua and Korean to Interlingua and then Interlingua to Japanese or Korean, and you never actually have to get translated sentences from every pair of languages.

Ariel: How can the techniques that are used for language apply elsewhere? How do you anticipate seeing this developed in 2017 and onward?

Richard: So I think what we’ve learned from the approach is that deep learning systems are able to create extremely rich models of the world that can actually express what we can think, which is a pretty exciting milestone. Being able to combine that Interlingua with more structured information about the world is something that a variety of teams are working on — it is a big, open area for the coming years.

Ian: At OpenAI one of our largest projects, Universe, allows a reinforcement learning agent to play many different computer games, and it interacts with these games in the same way that a human does, by sending key presses or mouse strokes to the actual game engine. The same reinforcement learning agent is able to interact with basically anything that a human can interact with on a computer. By having one agent that can do all of these different things we will really exercise our ability to create general artificial intelligence instead of application-specific artificial intelligence. And projects like Google’s Interlingua have shown us that there’s a lot of reason to believe that this will work.

Ariel: What else happened this year that you guys think is important to mention?

Richard: One-shot [learning] is when you see just a little bit of data, potentially just one data point, regarding some new task or some new category, and you’re then able to deduce what that class should look like or what that function should look like in general. So being able to train systems on very little data from just general background knowledge, will be pretty exciting.

Ian: One thing that I’m very excited about is this new area called machine learning security where an attacker can trick a machine learning system into taking the wrong action. For example, we’ve seen that it’s very easy to fool an object-recognition system. We can show it an image that looks a lot like a panda and it gets recognized as being a school bus, or vice versa. It’s actually possible to fool machine learning systems with physical objects. There was a paper called Accessorize to a Crime, that showed that by wearing unusually-colored glasses it’s possible to thwart a face recognition system. And my own collaborators at GoogleBrain and I wrote a paper called Adversarial Examples in the Physical World, where we showed that we can make images that look kind of grainy and noisy, but when viewed through a camera we can control how an object-recognition system will respond to those images.

Ariel: Is there anything else that you thought was either important for 2016 or looking forward to 2017?

Richard: Yeah, looking forward to 2017 I think there will be more focus on unsupervised learning. Most of the world is not annotated by humans. There aren’t little sticky notes on things around the house saying what they are. Being able to process [the world] in a more unsupervised way will unlock a plethora of new applications.

Ian: It will also make AI more democratic. Right now, if you want to use really advanced AI you need to have not only a lot of computers but also a lot of data. That’s part of why it’s mostly very large companies that are competitive in the AI space. If you want to get really good at a task you basically become good at that task by showing the computer a million different examples. In the future, we’ll have AI that can learn much more like a human learns, where just showing it a few examples is enough. Once we have machine learning systems that are able to get the general idea of what’s going on very quickly, in the way that humans do, it won’t be necessary to build these gigantic data sets anymore.

Richard: One application area I think will be important this coming year is automatic detection of fake news, fake audio and fake images and fake video. Some of the applications this past year have actually focused on generating additional frames of video. As those get better, as the photo generation that we talked about earlier gets better, and also as audio templating gets better… I think it was Adobe that demoed what they called PhotoShop for Voice, where you can type something in and select a person, and it will sound like that person saying whatever it is that you typed. So we’ll need ways of detecting that, since this whole concept of fake news is quite at the fore these days.

Ian: It’s worth mentioning that there are other ways of addressing the spread of fake news. Email spam uses a lot of different clues that it can statistically associate with whether people mark the email as spam or not. We can do a lot without needing to advance the underlying AI systems at all.

Ariel: Is there anything that you’re worried about, based on advances that you’ve seen in the last year?

Ian: The employment issue. As we’re able to automate our tasks in the future, how will we make sure that everyone benefits from that automation? And the way that society is structured, right now increasing automation seems to lead to increasing concentration of wealth, and there are winners and losers to every advance. My concern is that automating jobs that are done by millions of people will create very many losers and a small number of winners who really win big.

Richard: I’m also slightly concerned with the speed at which we’re approaching additional generality. It’s extremely cool to see systems be able to do lots of different things, and being able to do tasks that they’ve either seen very little of or none of before. But it raises questions as to when we implement different types of safety techniques. I don’t think that we’re at that point yet, but it raises the issue.

Ariel: To end on a positive note: looking back on what you saw last year, what has you most hopeful for our future?

Ian: I think it’s really great that AI is starting to be used for things like medicine. In the last year we’ve seen a lot of different machine learning algorithms that could exceed human abilities at some tasks, and we’ve also started to see the application of AI to life-saving application areas like designing new medicines. And this makes me very hopeful that we’re going to start seeing superhuman drug design, and other kinds of applications of AI to just really make life better for a lot of people in ways that we would not have been able to do without it.

Richard: Various kinds of tasks that people find to be drudgery within their jobs will be automatable. That will lead them to be open to working on more value-added things with more creativity, and potentially be able to work in more interesting areas of their field or across different fields. I think the future is wide open and it’s really what we make of it, which is exciting in itself.

 

 

Why 2016 Was Actually a Year of Hope

Just about everyone found something to dislike about 2016, from wars to politics and celebrity deaths. But hidden within this year’s news feeds were some really exciting news stories. And some of them can even give us hope for the future.

Artificial Intelligence

Though concerns about the future of AI still loom, 2016 was a great reminder that, when harnessed for good, AI can help humanity thrive.

AI and Health

Some of the most promising and hopefully more immediate breakthroughs and announcements were related to health. Google’s DeepMind announced a new division that would focus on helping doctors improve patient care. Harvard Business Review considered what an AI-enabled hospital might look like, which would improve the hospital experience for the patient, the doctor, and even the patient’s visitors and loved ones. A breakthrough from MIT researchers could see AI used to more quickly and effectively design new drug compounds that could be applied to a range of health needs.

More specifically, Microsoft wants to cure cancer, and the company has been working with research labs and doctors around the country to use AI to improve cancer research and treatment. But Microsoft isn’t the only company that hopes to cure cancer. DeepMind Health also partnered with University College London’s hospitals to apply machine learning to diagnose and treat head and neck cancers.

AI and Society

Other researchers are turning to AI to help solve social issues. While AI has what is known as the “white guy problem” and examples of bias cropped up in many news articles, Fei Fei Li has been working with STEM girls at Stanford to bridge the gender gap. Stanford researchers also published research that suggests  artificial intelligence could help us use satellite data to combat global poverty.

It was also a big year for research on how to keep artificial intelligence safe as it continues to develop. Google and the Future of Humanity Institute made big headlines with their work to design a “kill switch” for AI. Google Brain also published a research agenda on various problems AI researchers should be studying now to help ensure safe AI for the future.

Even the White House got involved in AI this year, hosting four symposia on AI and releasing reports in October and December about the potential impact of AI and the necessary areas of research. The White House reports are especially focused on the possible impact of automation on the economy, but they also look at how the government can contribute to AI safety, especially in the near future.

AI in Action

And of course there was AlphaGo. In January, Google’s DeepMind published a paper, which announced that the company had created a program, AlphaGo, that could beat one of Europe’s top Go players. Then, in March, in front of a live audience, AlphaGo beat the reigning world champion of Go in four out of five games. These results took the AI community by surprise and indicate that artificial intelligence may be progressing more rapidly than many in the field realized.

And AI went beyond research labs this year to be applied practically and beneficially in the real world. Perhaps most hopeful was some of the news that came out about the ways AI has been used to address issues connected with pollution and climate change. For example, IBM has had increasing success with a program that can forecast pollution in China, giving residents advanced warning about days of especially bad air. Meanwhile, Google was able to reduce its power usage by using DeepMind’s AI to manipulate things like its cooling systems.

And speaking of addressing climate change…

Climate Change

With recent news from climate scientists indicating that climate change may be coming on faster and stronger than previously anticipated and with limited political action on the issue, 2016 may not have made climate activists happy. But even here, there was some hopeful news.

Among the biggest news was the ratification of the Paris Climate Agreement. But more generally, countries, communities and businesses came together on various issues of global warming, and Voices of America offers five examples of how this was a year of incredible, global progress.

But there was also news of technological advancements that could soon help us address climate issues more effectively. Scientists at Oak Ridge National Laboratory have discovered a way to convert CO2 into ethanol. A researcher from UC Berkeley has developed a method for artificial photosynthesis, which could help us more effectively harness the energy of the sun. And a multi-disciplinary team has genetically engineered bacteria that could be used to help combat global warming.

Biotechnology

Biotechnology — with fears of designer babies and manmade pandemics – is easily one of most feared technologies. But rather than causing harm, the latest biotech advances could help to save millions of people.

CRISPR

In the course of about two years, CRISPR-cas9 went from a new development to what could become one of the world’s greatest advances in biology. Results of studies early in the year were promising, but as the year progressed, the news just got better. CRISPR was used to successfully remove HIV from human immune cells. A team in China used CRISPR on a patient for the first time in an attempt to treat lung cancer (treatments are still ongoing), and researchers in the US have also received approval to test CRISPR cancer treatment in patients. And CRISPR was also used to partially restore sight to blind animals.

Gene Drive

Where CRISPR could have the most dramatic, life-saving effect is in gene drives. By using CRISPR to modify the genes of an invasive species, we could potentially eliminate the unwelcome plant or animal, reviving the local ecology and saving native species that may be on the brink of extinction. But perhaps most impressive is the hope that gene drive technology could be used to end mosquito- and tick-borne diseases, such as malaria, dengue, Lyme, etc. Eliminating these diseases could easily save over a million lives every year.

Other Biotech News

The year saw other biotech advances as well. Researchers at MIT addressed a major problem in synthetic biology in which engineered genetic circuits interfere with each other. Another team at MIT engineered an antimicrobial peptide that can eliminate many types of bacteria, including some of the antibiotic-resistant “superbugs.” And various groups are also using CRISPR to create new ways to fight antibiotic-resistant bacteria.

Nuclear Weapons

If ever there was a topic that does little to inspire hope, it’s nuclear weapons. Yet even here we saw some positive signs this year. The Cambridge City Council voted to divest their $1 billion pension fund from any companies connected with nuclear weapons, which earned them an official commendation from the U.S. Conference of Mayors. In fact, divestment may prove a useful tool for the general public to express their displeasure with nuclear policy, which will be good, since one cause for hope is that the growing awareness of the nuclear weapons situation will help stigmatize the new nuclear arms race.

In February, Londoners held the largest anti-nuclear rally Britain had seen in decades, and the following month MinutePhysics posted a video about nuclear weapons that’s been seen by nearly 1.3 million people. In May, scientific and religious leaders came together to call for steps to reduce nuclear risks. And all of that pales in comparison to the attention the U.S. elections brought to the risks of nuclear weapons.

As awareness of nuclear risks grows, so do our chances of instigating the change necessary to reduce those risks.

The United Nations Takes on Weapons

But if awareness alone isn’t enough, then recent actions by the United Nations may instead be a source of hope. As October came to a close, the United Nations voted to begin negotiations on a treaty that would ban nuclear weapons. While this might not have an immediate impact on nuclear weapons arsenals, the stigmatization caused by such a ban could increase pressure on countries and companies driving the new nuclear arms race.

The U.N. also announced recently that it would officially begin looking into the possibility of a ban on lethal autonomous weapons, a cause that’s been championed by Elon Musk, Steve Wozniak, Stephen Hawking and thousands of AI researchers and roboticists in an open letter.

Looking Ahead

And why limit our hope and ambition to merely one planet? This year, a group of influential scientists led by Yuri Milner announced an Alpha-Centauri starshot, in which they would send a rocket of space probes to our nearest star system. Elon Musk later announced his plans to colonize Mars. And an MIT scientist wants to make all of these trips possible for humans by using CRISPR to reengineer our own genes to keep us safe in space.

Yet for all of these exciting events and breakthroughs, perhaps what’s most inspiring and hopeful is that this represents only a tiny sampling of all of the amazing stories that made the news this year. If trends like these keep up, there’s plenty to look forward to in 2017.

Podcast: FLI 2016 – A Year In Review

For FLI, 2016 was a great year, full of our own success, but also great achievements from so many of the organizations we work with. Max, Meia, Anthony, Victoria, Richard, Lucas, David, and Ariel discuss what they were most excited to see in 2016 and what they’re looking forward to in 2017.

AGUIRRE: I’m Anthony Aguirre. I am a professor of physics at UC Santa Cruz, and I’m one of the founders of the Future of Life Institute.

STANLEY: I’m David Stanley, and I’m currently working with FLI as a Project Coordinator/Volunteer Coordinator.

PERRY: My name is Lucas Perry, and I’m a Project Coordinator with the Future of Life Institute.

TEGMARK: I’m Max Tegmark, and I have the fortune to be the President of the Future of Life Institute.

CHITA-TEGMARK: I’m Meia Chita-Tegmark, and I am a co-founder of the Future of Life Institute.

MALLAH: Hi, I’m Richard Mallah. I’m the Director of AI Projects at the Future of Life Institute.

KRAKOVNA: Hi everyone, I am Victoria Krakovna, and I am one of the co-founders of FLI. I’ve recently taken up a position at Google DeepMind working on AI safety.

CONN: And I’m Ariel Conn, the Director of Media and Communications for FLI. 2016 has certainly had its ups and downs, and so at FLI, we count ourselves especially lucky to have had such a successful year. We’ve continued to progress with the field of AI safety research, we’ve made incredible headway with our nuclear weapons efforts, and we’ve worked closely with many amazing groups and individuals. On that last note, much of what we’ve been most excited about throughout 2016 is the great work these other groups in our fields have also accomplished.

Over the last couple of weeks, I’ve sat down with our founders and core team to rehash their highlights from 2016 and also to learn what they’re all most looking forward to as we move into 2017.

To start things off, Max gave a summary of the work that FLI does and why 2016 was such a success.

TEGMARK: What I was most excited by in 2016 was the overall sense that people are taking seriously this idea – that we really need to win this race between the growing power of our technology and the wisdom with which we manage it. Every single way in which 2016 is better than the Stone Age is because of technology, and I’m optimistic that we can create a fantastic future with tech as long as we win this race. But in the past, the way we’ve kept one step ahead is always by learning from mistakes. We invented fire, messed up a bunch of times, and then invented the fire extinguisher. We at the Future of Life Institute feel that that strategy of learning from mistakes is a terrible idea for more powerful tech, like nuclear weapons, artificial intelligence, and things that can really alter the climate of our globe.

Now, in 2016 we saw multiple examples of people trying to plan ahead and to avoid problems with technology instead of just stumbling into them. In April, we had world leaders getting together and signing the Paris Climate Accords. In November, the United Nations General Assembly voted to start negotiations about nuclear weapons next year. The question is whether they should actually ultimately be phased out; whether the nations that don’t have nukes should work towards stigmatizing building more of them – with the idea that 14,000 is way more than anyone needs for deterrence. And – just the other day – the United Nations also decided to start negotiations on the possibility of banning lethal autonomous weapons, which is another arms race that could be very, very destabilizing. And if we keep this positive momentum, I think there’s really good hope that all of these technologies will end up having mainly beneficial uses.

Today, we think of our biologist friends as mainly responsible for the fact that we live longer and healthier lives, and not as those guys who make the bioweapons. We think of chemists as providing us with better materials and new ways of making medicines, not as the people who built chemical weapons and are all responsible for global warming. We think of AI scientists as – I hope, when we look back on them in the future – as people who helped make the world better, rather than the ones who just brought on the AI arms race. And it’s very encouraging to me that as much as people in general – but also the scientists in all these fields – are really stepping up and saying, “Hey, we’re not just going to invent this technology, and then let it be misused. We’re going to take responsibility for making sure that the technology is used beneficially.”

CONN: And beneficial AI is what FLI is primarily known for. So what did the other members have to say about AI safety in 2016? We’ll hear from Anthony first.

AGUIRRE: I would say that what has been great to see over the last year or so is the AI safety and beneficiality research field really growing into an actual research field. When we ran our first conference a couple of years ago, they were these tiny communities who had been thinking about the impact of artificial intelligence in the future and in the long-term future. They weren’t really talking to each other; they weren’t really doing much actual research – there wasn’t funding for it. So, to see in the last few years that transform into something where it takes a massive effort to keep track of all the stuff that’s being done in this space now. All the papers that are coming out, the research groups – you sort of used to be able to just find them all, easily identified. Now, there’s this huge worldwide effort and long lists, and it’s difficult to keep track of. And that’s an awesome problem to have.

As someone who’s not in the field, but sort of watching the dynamics of the research community, that’s what’s been so great to see. A research community that wasn’t there before really has started, and I think in the past year we’re seeing the actual results of that research start to come in. You know, it’s still early days. But it’s starting to come in, and we’re starting to see papers that have been basically created using these research talents and the funding that’s come through the Future of Life Institute. It’s been super gratifying. And seeing that it’s a fairly large amount of money – but fairly small compared to the total amount of research funding in artificial intelligence or other fields – but because it was so funding-starved and talent-starved before, it’s just made an enormous impact. And that’s been nice to see.

CONN: Not surprisingly, Richard was equally excited to see AI safety becoming a field of ever-increasing interest for many AI groups.

MALLAH: I’m most excited by the continued mainstreaming of AI safety research. There are more and more publications coming out by places like DeepMind and Google Brain that have really lent additional credibility to the space, as well as a continued uptake of more and more professors, and postdocs, and grad students from a wide variety of universities entering this space. And, of course, OpenAI has come out with a number of useful papers and resources.

I’m also excited that governments have really realized that this is an important issue. So, while the White House reports have come out recently focusing more on near-term AI safety research, they did note that longer-term concerns like superintelligence are not necessarily unreasonable for later this century. And that they do support – right now – funding safety work that can scale toward the future, which is really exciting. We really need more funding coming into the community for that type of research. Likewise, other governments – like the U.K. and Japan, Germany – have all made very positive statements about AI safety in one form or another. And other governments around the world.

CONN: In addition to seeing so many other groups get involved in AI safety, Victoria was also pleased to see FLI taking part in so many large AI conferences.

KRAKOVNA: I think I’ve been pretty excited to see us involved in these AI safety workshops at major conferences. So on the one hand, our conference in Puerto Rico that we organized ourselves was very influential and helped to kick-start making AI safety more mainstream in the AI community. On the other hand, it felt really good in 2016 to complement that with having events that are actually part of major conferences that were co-organized by a lot of mainstream AI researchers. I think that really was an integral part of the mainstreaming of the field. For example, I was really excited about the Reliable Machine Learning workshop at ICML that we helped to make happen. I think that was something that was quite positively received at the conference, and there was a lot of good AI safety material there.

CONN: And of course, Victoria was also pretty excited about some of the papers that were published this year connected to AI safety, many of which received at least partial funding from FLI.

KRAKOVNA: There were several excellent papers in AI safety this year, addressing core problems in safety for machine learning systems. For example, there was a paper from Stuart Russell’s lab published at NIPS, on cooperative IRL. This is about teaching AI what humans want – how to train an RL algorithm to learn the right reward function that reflects what humans want it to do. DeepMind and FHI published a paper at UAI on safely interruptible agents, that formalizes what it means for an RL agent not to have incentives to avoid shutdown. MIRI made an impressive breakthrough with their paper on logical inductors. I’m super excited about all these great papers coming out, and that our grant program contributed to these results.

CONN: For Meia, the excitement about AI safety went beyond just the technical aspects of artificial intelligence.

CHITA-TEGMARK: I am very excited about the dialogue that FLI has catalyzed – and also engaged in – throughout 2016, and especially regarding the impact of technology on society. My training is in psychology; I’m a psychologist. So I’m very interested in the human aspect of technology development. I’m very excited about questions like, how are new technologies changing us? How ready are we to embrace new technologies? Or how our psychological biases may be clouding our judgement about what we’re creating and the technologies that we’re putting out there. Are these technologies beneficial for our psychological well-being, or are they not?

So it has been extremely interesting for me to see that these questions are being asked more and more, especially by artificial intelligence developers and also researchers. I think it’s so exciting to be creating technologies that really force us to grapple with some of the most fundamental aspects, I would say, of our own psychological makeup. For example, our ethical values, our sense of purpose, our well-being, maybe our biases and shortsightedness and shortcomings as biological human beings. So I’m definitely very excited about how the conversation regarding technology – and especially artificial intelligence – has evolved over the last year. I like the way it has expanded to capture this human element, which I find so important. But I’m also so happy to feel that FLI has been an important contributor to this conversation.

CONN: Meanwhile, as Max described earlier, FLI has also gotten much more involved in decreasing the risk of nuclear weapons, and Lucas helped spearhead one of our greatest accomplishments of the year.

PERRY: One of the things that I was most excited about was our success with our divestment campaign. After a few months, we had great success in our own local Boston area with helping the City of Cambridge to divest its $1 billion portfolio from nuclear weapon producing companies. And we see this as a really big and important victory within our campaign to help institutions, persons, and universities to divest from nuclear weapons producing companies.

CONN: And in order to truly be effective we need to reach an international audience, which is something Dave has been happy to see grow this year.

STANLEY: I’m mainly excited about – at least, in my work – the increasing involvement and response we’ve had from the international community in terms of reaching out about these issues. I think it’s pretty important that we engage the international community more, and not just academics. Because these issues – things like nuclear weapons and the increasing capabilities of artificial intelligence – really will affect everybody. And they seem to be really underrepresented in mainstream media coverage as well.

So far, we’ve had pretty good responses just in terms of volunteers from many different countries around the world being interested in getting involved to help raise awareness in their respective communities, either through helping develop apps for us, or translation, or promoting just through social media these ideas in their little communities.

CONN: Many FLI members also participated in both local and global events and projects, like the following we’re about  to hear from Victoria, Richard, Lucas and Meia.

KRAKOVNA: The EAGX Oxford Conference was a fairly large conference. It was very well organized, and we had a panel there with Demis Hassabis, Nate Soares from MIRI, Murray Shanahan from Imperial, Toby Ord from FHI, and myself. I feel like overall, that conference did a good job of, for example, connecting the local EA community with the people at DeepMind, who are really thinking about AI safety concerns like Demis and also Sean Legassick, who also gave a talk about the ethics and impacts side of things. So I feel like that conference overall did a good job of connecting people who are thinking about these sorts of issues, which I think is always a great thing.  

MALLAH: I was involved in this endeavor with IEEE regarding autonomy and ethics in autonomous systems, sort of representing FLI’s positions on things like autonomous weapons and long-term AI safety. One thing that came out this year – just a few days ago, actually, due to this work from IEEE – is that the UN actually took the report pretty seriously, and it may have influenced their decision to take up the issue of autonomous weapons formally next year. That’s kind of heartening.

PERRY: A few different things that I really enjoyed doing were giving a few different talks at Duke and Boston College, and a local effective altruism conference. I’m also really excited about all the progress we’re making on our nuclear divestment application. So this is an application that will allow anyone to search their mutual fund and see whether or not their mutual funds have direct or indirect holdings in nuclear weapons-producing companies.

CHITA-TEGMARK:  So, a wonderful moment for me was at the conference organized by Yann LeCun in New York at NYU, when Daniel Kahneman, one of my thinker-heroes, asked a very important question that really left the whole audience in silence. He asked, “Does this make you happy? Would AI make you happy? Would the development of a human-level artificial intelligence make you happy?” I think that was one of the defining moments, and I was very happy to participate in this conference.

Later on, David Chalmers, another one of my thinker-heroes – this time, not the psychologist but the philosopher – organized another conference, again at NYU, trying to bring philosophers into this very important conversation about the development of artificial intelligence. And again, I felt there too, that FLI was able to contribute and bring in this perspective of the social sciences on this issue.

CONN: Now, with 2016 coming to an end, it’s time to turn our sites to 2017, and FLI is excited for this new year to be even more productive and beneficial.

TEGMARK: We at the Future of Life Institute are planning to focus primarily on artificial intelligence, and on reducing the risk of accidental nuclear war in various ways. We’re kicking off by having an international conference on artificial intelligence, and then we want to continue throughout the year providing really high-quality and easily accessible information on all these key topics, to help inform on what happens with climate change, with nuclear weapons, with lethal autonomous weapons, and so on.

And looking ahead here, I think it’s important right now – especially since a lot of people are very stressed out about the political situation in the world, about terrorism, and so on – to not ignore the positive trends and the glimmers of hope we can see as well.

CONN: As optimistic as FLI members are about 2017, we’re all also especially hopeful and curious to see what will happen with continued AI safety research.

AGUIRRE: I would say I’m looking forward to seeing in the next year more of the research that comes out, and really sort of delving into it myself, and understanding how the field of artificial intelligence and artificial intelligence safety is developing. And I’m very interested in this from the forecast and prediction standpoint.

I’m interested in trying to draw some of the AI community into really understanding how artificial intelligence is unfolding – in the short term and the medium term – as a way to understand, how long do we have? Is it, you know, if it’s really infinity, then let’s not worry about that so much, and spend a little bit more on nuclear weapons and global warming and biotech, because those are definitely happening. If human-level AI were 8 years away… honestly, I think we should be freaking out right now. And most people don’t believe that, I think most people are in the middle it seems, of thirty years or fifty years or something, which feels kind of comfortable. Although it’s not that long, really, on the big scheme of things. But I think it’s quite important to know now, which is it? How fast are these things, how long do we really have to think about all of the issues that FLI has been thinking about in AI? How long do we have before most jobs in industry and manufacturing are replaceable by a robot being slotted in for a human? That may be 5 years, it may be fifteen… It’s probably not fifty years at all. And having a good forecast on those good short-term questions I think also tells us what sort of things we have to be thinking about now.

And I’m interested in seeing how this massive AI safety community that’s started develops. It’s amazing to see centers kind of popping up like mushrooms after a rain all over and thinking about artificial intelligence safety. This partnership on AI between Google and Facebook and a number of other large companies getting started. So to see how those different individual centers will develop and how they interact with each other. Is there an overall consensus on where things should go? Or is it a bunch of different organizations doing their own thing? Where will governments come in on all of this? I think it will be interesting times. So I look forward to seeing what happens, and I will reserve judgement in terms of my optimism.

KRAKOVNA: I’m really looking forward to AI safety becoming even more mainstream, and even more of the really good researchers in AI giving it serious thought. Something that happened in the past year that I was really excited about, that I think is also pointing in this direction, is the research agenda that came out of Google Brain called “Concrete Problems in AI Safety.” And I think I’m looking forward to more things like that happening, where AI safety becomes sufficiently mainstream that people who are working in AI just feel inspired to do things like that and just think from their own perspectives: what are the important problems to solve in AI safety? And work on them.

I’m a believer in the portfolio approach with regards to AI safety research, where I think we need a lot of different research teams approaching the problems from different angles and making different assumptions, and hopefully some of them will make the right assumption. I think we are really moving in the direction in terms of more people working on these problems, and coming up with different ideas. And I look forward to seeing more of that in 2017. I think FLI can also help continue to make this happen.

MALLAH: So, we’re in the process of fostering additional collaboration among people in the AI safety space. And we will have more announcements about this early next year. We’re also working on resources to help people better visualize and better understand the space of AI safety work, and the opportunities there and the work that has been done. Because it’s actually quite a lot.

I’m also pretty excited about fostering continued theoretical work and practical work in making AI more robust and beneficial. The work in value alignment, for instance, is not something we see supported in mainstream AI research. And this is something that is pretty crucial to the way that advanced AIs will need to function. It won’t be very explicit instructions to them; they’ll have to be making decision based on what they think is right. And what is right? It’s something that… or even structuring the way to think about what is right requires some more research.

STANLEY: We’ve had pretty good success at FLI in the past few years helping to legitimize the field of AI safety. And I think it’s going to be important because AI is playing a large role in industry and there’s a lot of companies working on this, and not just in the US. So I think increasing international awareness about AI safety is going to be really important.

CHITA-TEGMARK: I believe that the AI community has raised some very important questions in 2016 regarding the impact of AI on society. I feel like 2017 should be the year to make progress on these questions, and actually research them and have some answers to them. For this, I think we need more social scientists – among people from other disciplines – to join this effort of really systematically investigating what would be the optimal impact of AI on people. I hope that in 2017 we will have more research initiatives, that we will attempt to systematically study other burning questions regarding the impact of AI on society. Some examples are: how can we ensure the psychological well-being for people while AI creates lots of displacement on the job market as many people predict. How do we optimize engagement with technology, and withdrawal from it also? Will some people be left behind, like the elderly or the economically disadvantaged? How will this affect them, and how will this affect society at large?

What about withdrawal from technology? What about satisfying our need for privacy? Will we be able to do that, or is the price of having more and more customized technologies and more and more personalization of the technologies we engage with… will that mean that we will have no privacy anymore, or that our expectations of privacy will be very seriously violated? I think these are some very important questions that I would love to get some answers to. And my wish, and also my resolution, for 2017 is to see more progress on these questions, and to hopefully also be part of this work and answering them.

PERRY: In 2017 I’m very interested in pursuing the landscape of different policy and principle recommendations from different groups regarding artificial intelligence. I’m also looking forward to expanding out nuclear divestment campaign by trying to introduce divestment to new universities, institutions, communities, and cities.

CONN: In fact, some experts believe nuclear weapons pose a greater threat now than at any time during our history.

TEGMARK: I personally feel that the greatest threat to the world in 2017 is one that the newspapers almost never write about. It’s not terrorist attacks, for example. It’s the small but horrible risk that the U.S. and Russia for some stupid reason get into an accidental nuclear war against each other. We have 14,000 nuclear weapons, and this war has almost happened many, many times. So, actually what’s quite remarkable and really gives a glimmer of hope is that – however people may feel about Putin and Trump – the fact is they are both signaling strongly that they are eager to get along better. And if that actually pans out and they manage to make some serious progress in nuclear arms reduction, that would make 2017 the best year for nuclear weapons we’ve had in a long, long time, reversing this trend of ever greater risks with ever more lethal weapons.

CONN: Some FLI members are also looking beyond nuclear weapons and artificial intelligence, as I learned when I asked Dave about other goals he hopes to accomplish with FLI this year.

STANLEY: Definitely having the volunteer team – particularly the international volunteers – continue to grow, and then scale things up. Right now, we have a fairly committed core of people who are helping out, and we think that they can start recruiting more people to help out in their little communities, and really making this stuff accessible. Not just to academics, but to everybody. And that’s also reflected in the types of people we have working for us as volunteers. They’re not just academics. We have programmers, linguists, people having just high school degrees all the way up to Ph.D.’s, so I think it’s pretty good that this varied group of people can get involved and contribute, and also reach out to other people they can relate to.

CONN: In addition to getting more people involved, Meia also pointed out that one of the best ways we can help ensure a positive future is to continue to offer people more informative content.

CHITA-TEGMARK: Another thing that I’m very excited about regarding our work here at the Future of Life Institute is this mission of empowering people to information. I think information is very powerful and can change the way people approach things: they can change their beliefs, their attitudes, and their behaviors as well. And by creating ways in which information can be readily distributed to the people, and with which they can engage very easily, I hope that we can create changes. For example, we’ve had a series of different apps regarding nuclear weapons that I think have contributed a lot to peoples knowledge and has brought this issue to the forefront of their thinking.

CONN: Yet as important as it is to highlight the existential risks we must address to keep humanity safe, perhaps it’s equally important to draw attention to the incredible hope we have for the future if we can solve these problems. Which is something both Richard and Lucas brought up for 2017.

MALLAH: I’m excited about trying to foster more positive visions of the future, so focusing on existential hope aspects of the future. Which are kind of the flip side of existential risks. So we’re looking at various ways of getting people to be creative about understanding some of the possibilities, and how to differentiate the paths between the risks and the benefits.

PERRY: Yeah, I’m also interested in creating and generating a lot more content that has to do with existential hope. Given the current global political climate, it’s all the more important to focus on how we can make the world better.

CONN: And on that note, I want to mention one of the most amazing things I discovered this past year. It had nothing to do with technology, and everything to do with people. Since starting at FLI, I’ve met countless individuals who are dedicating their lives to trying to make the world a better place. We may have a lot of problems to solve, but with so many groups focusing solely on solving them, I’m far more hopeful for the future. There are truly too many individuals that I’ve met this year to name them all, so instead, I’d like to provide a rather long list of groups and organizations I’ve had the pleasure to work with this year. A link to each group can be found at futureoflife.org/2016, and I encourage you to visit them all to learn more about the wonderful work they’re doing. In no particular order, they are:

Machine Intelligence Research Institute

Future of Humanity Institute

Global Catastrophic Risk Institute

Center for the Study of Existential Risk

Ploughshares Fund

Bulletin of Atomic Scientists

Open Philanthropy Project

Union of Concerned Scientists

The William Perry Project

ReThink Media

Don’t Bank on the Bomb

Federation of American Scientists

Massachusetts Peace Action

IEEE (Institute for Electrical and Electronics Engineers)

Center for Human-Compatible Artificial Intelligence

Center for Effective Altruism

Center for Applied Rationality

Foresight Institute

Leverhulme Center for the Future of Intelligence

Global Priorities Project

Association for the Advancement of Artificial Intelligence

International Joint Conference on Artificial Intelligence

Partnership on AI

The White House Office of Science and Technology Policy

The Future Society at Harvard Kennedy School

 

I couldn’t be more excited to see what 2017 holds in store for us, and all of us at FLI look forward to doing all we can to help create a safe and beneficial future for everyone. But to end on an even more optimistic note, I turn back to Max.

TEGMARK: Finally, I’d like – because I spend a lot of my time thinking about our universe – to remind everybody that we shouldn’t just be focused on the next election cycle. We have not decades, but billions of years of potentially awesome future for life, on Earth and far beyond. And it’s so important to not let ourselves get so distracted by our everyday little frustrations that we lose sight of these incredible opportunities that we all stand to gain from if we can get along, and focus, and collaborate, and use technology for good.

AI Safety Highlights from NIPS 2016

This year’s Neural Information Processing Systems (NIPS) conference was larger than ever, with almost 6000 people attending, hosted in a huge convention center in Barcelona, Spain. The conference started off with two exciting announcements on open-sourcing collections of environments for training and testing general AI capabilities – the DeepMind Lab and the OpenAI Universe. Among other things, this is promising for testing safety properties of ML algorithms. OpenAI has already used their Universe environment to give an entertaining and instructive demonstration of reward hacking that illustrates the challenge of designing robust reward functions.

I was happy to see a lot of AI-safety-related content at NIPS this year. The ML and the Law symposium and Interpretable ML for Complex Systems workshop focused on near-term AI safety issues, while the Reliable ML in the Wild workshop also covered long-term problems. Here are some papers relevant to long-term AI safety:

Inverse Reinforcement Learning

Cooperative Inverse Reinforcement Learning (CIRL) by Hadfield-Menell, Russell, Abbeel, and Dragan (main conference). This paper addresses the value alignment problem by teaching the artificial agent about the human’s reward function, using instructive demonstrations rather than optimal demonstrations like in classical IRL (e.g. showing the robot how to make coffee vs having it observe coffee being made). (3-minute video)

cirl

Generalizing Skills with Semi-Supervised Reinforcement Learning by Finn, Yu, Fu, Abbeel, and Levine (Deep RL workshop). This work addresses the scalable oversight problem by proposing the first tractable algorithm for semi-supervised RL. This allows artificial agents to robustly learn reward functions from limited human feedback. The algorithm uses an IRL-like approach to infer the reward function, using the agent’s own prior experiences in the supervised setting as an expert demonstration.

ssrl

Towards Interactive Inverse Reinforcement Learning by Armstrong and Leike (Reliable ML workshop). This paper studies the incentives of an agent that is trying to learn about the reward function while simultaneously maximizing the reward. The authors discuss some ways to reduce the agent’s incentive to manipulate the reward learning process.

interactive-irl

Should Robots Have Off Switches? by Milli, Hadfield-Menell, and Russell (Reliable ML workshop). This poster examines some adverse effects of incentivizing artificial agents to be compliant in the off-switch game (a variant of CIRL).

off-switch

 

Safe Exploration

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes by Turchetta, Berkenkamp, and Krause (main conference). This paper develops a reinforcement learning algorithm called Safe MDP that can explore an unknown environment without getting into irreversible situations, unlike classical RL approaches.

safemdp

Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear by Lipton, Gao, Li, Chen, and Deng (Reliable ML workshop). This work addresses the ‘Sisyphean curse’ of DQN algorithms forgetting past experiences, as they become increasingly unlikely under a new policy, and therefore eventually repeating catastrophic mistakes. The paper introduces an approach called ‘intrinsic fear’, which maintains a model for how likely different states are to lead to a catastrophe within some number of steps.

intrinsic_fear

Most of these papers were related to inverse reinforcement learning – while IRL is a promising approach, it would be great to see more varied safety material at the next NIPS. There were some more safety papers on other topics at UAI this summer: Safely Interruptible Agents (formalizing what it means to incentivize an agent to obey shutdown signals) and A Formal Solution to the Grain of Truth Problem (providing a broad theoretical framework for multiple agents learning to predict each other in arbitrary computable games).

These highlights were originally posted here and cross-posted to Approximately Correct. Thanks to Jan Leike, Zachary Lipton, and Janos Kramar for providing feedback on this post.

 

MIRI December 2016 Newsletter

We’re in the final weeks of our push to cover our funding shortfall, and we’re now halfway to our $160,000 goal. For potential donors who are interested in an outside perspective, Future of Humanity Institute (FHI) researcher Owen Cotton-Barratt has written up why he’s donating to MIRI this year. (Donation page.)Research updates

General updates

  • We teamed up with a number of AI safety researchers to help compile a list of recommended AI safety readings for the Center for Human-Compatible AI. See this page if you would like to get involved with CHCAI’s research.
  • Investment analyst Ben Hoskin reviews MIRI and other organizations involved in AI safety.

News and links

  • The Off-Switch Game“: Dylan Hadfield-Manell, Anca Dragan, Pieter Abbeel, and Stuart Russell show that an AI agent’s corrigibility is closely tied to the uncertainty it has about its utility function.
  • Russell and Allan Dafoe critique an inaccurate summary by Oren Etzioni of a new survey of AI experts on superintelligence.
  • Sam Harris interviews Russell on the basics of AI risk (video). See also Russell’s new Q&A on the future of AI.
  • Future of Life Institute co-founder Viktoriya Krakovna and FHI researcher Jan Leike join Google DeepMind’s safety team.
  • GoodAI sponsors a challenge to “accelerate the search for general artificial intelligence”.
  • OpenAI releases Universe, “a software platform for measuring and training an AI’s general intelligence across the world’s supply of games”. Meanwhile, DeepMind has open-sourced their own platform for general AI research, DeepMind Lab.
  • Staff at GiveWell and the Centre for Effective Altruism, along with others in the effective altruism community, explain where they’re donating this year.
  • FHI is seeking AI safety interns, researchers, and admins: jobs page.

This newsletter was originally posted here.

Silo Busting in AI Research

Artificial intelligence may seem like a computer science project, but if it’s going to successfully integrate with society, then social scientists must be more involved.

Developing an intelligent machine is not merely a problem of modifying algorithms in a lab. These machines must be aligned with human values, and this requires a deep understanding of ethics and the social consequences of deploying intelligent machines.

Getting people with a variety of backgrounds together seems logical enough in theory, but in practice, what happens when computer scientists, AI developers, economists, philosophers, and psychologists try to discuss AI issues? Do any of them even speak the same language?

Social scientists and computer scientists will come at AI problems from very different directions. And if they collaborate, everybody wins. Social scientists can learn about the complex tools and algorithms used in computer science labs, and computer scientists can become more attuned to the social and ethical implications of advanced AI.

Through transdisciplinary learning, both fields will be better equipped to handle the challenges of developing AI, and society as a whole will be safer.

 

Silo Busting

Too often, researchers focus on their narrow area of expertise, rarely reaching out to experts in other fields to solve common problems. AI is no different, with thick walls – sometimes literally – separating the social sciences from the computer sciences. This process of breaking down walls between research fields is often called silo-busting.

If AI researchers largely operate in silos, they may lose opportunities to learn from other perspectives and collaborate with potential colleagues. Scientists might miss gaps in their research or reproduce work already completed by others, because they were secluded away in their silo. This can significantly hamper the development of value-aligned AI.

To bust these silos, Wendell Wallach organized workshops to facilitate knowledge-sharing among leading computer and social scientists. Wallach, a consultant, ethicist, and scholar at Yale University’s Interdisciplinary Center for Bioethics, holds these workshops at The Hastings Center, where he is a senior advisor.

With co-chairs Gary Marchant, Stuart Russell, and Bart Selman, Wallach held the first workshop in April 2016. “The first workshop was very much about exposing people to what experts in all of these different fields were thinking about,” Wallach explains. “My intention was just to put all of these people in a room and hopefully they’d see that they weren’t all reinventing the wheel, and recognize that there were other people who were engaged in similar projects.”

The workshop intentionally brought together experts from a variety of viewpoints, including engineering ethics, philosophy, and resilience engineering, as well as participants from the Institute of Electrical and Electronics Engineers (IEEE), the Office of Naval Research, and the World Economic Forum (WEF). Wallach recounts, “some were very interested in how you implement sensitivity to moral considerations in AI computationally, and others were more interested in how AI changes the societal context.”

Other participants studied how the engineers of these systems may be susceptible to harmful cognitive biases and conflicts of interest, while still others focused on governance issues surrounding AI. Each of these viewpoints is necessary for developing beneficial AI, and The Hastings Center’s workshop gave participants the opportunity to learn from and teach each other.

But silo-busting is not easy. Wallach explains, “everybody has their own goals, their own projects, their own intentions, and it’s hard to hear someone say, ‘maybe you’re being a little naïve about this.’” When researchers operate exclusively in silos, “it’s almost impossible to understand how people outside of those silos did what they did,” he adds.

The intention of the first workshop was not to develop concrete strategies or proposals, but rather to open researchers’ minds to the broad challenges of developing AI with human values. “My suspicion is, the most valuable things that came out of this workshop would be hard to quantify,” Wallach clarifies. “It’s more like people’s minds were being stretched and opened. That was, for me, what this was primarily about.”

The workshop did yield some tangible results. For example, Marchant and Wallach introduced a pilot project for the international governance of AI, and nearly everyone at the workshop agreed to work on it. Since then, the IEEE, the International Committee of the Red Cross, the UN, the World Economic Forum, and other institutions have agreed to become active partners with The Hastings Center in building global infrastructure to ensure that AI and Robotics are beneficial.

This transdisciplinary cooperation is a promising sign that Wallach’s efforts are succeeding in strengthening the global response to AI challenges.

 

Value Alignment

Wallach and his co-chairs held a second workshop at the end of October. The participants were mostly scientists, but also included social theorists, a legal scholar, philosophers, and ethicists. The overall goal remained – to bust AI silos and facilitate transdisciplinary cooperation – but this workshop had a narrower focus.

“We made it more about value alignment and machine ethics,” he explains. “The tension in the room was between those who thought the problem [of value alignment] was imminently solvable and those who were deeply skeptical about solving the problem at all.”

In general, Wallach observed that “the social scientists and philosophers tend to overplay the difficulties [of creating AI with full value alignment] and computer scientists tend to underplay the difficulties.”

Wallach believes that while computer scientists will build the algorithms and utility functions for AI, they will need input from social scientists to ensure value alignment. “If a utility function represents 100,000 inputs, social theorists will help the AI researchers understand what those 100,000 inputs are,” he explains. “The AI researchers might be able to come up with 50,000-60,000 on their own, but they’re suddenly going to realize that people who have thought much more deeply about applied ethics are perhaps sensitive to things that they never considered.”

“I’m hoping that enough of [these researchers] learn each other’s language and how to communicate with each other, that they’ll recognize the value they can get from collaborating together,” he says. “I think I see evidence of that beginning to take place.”

 

Moving Forward

Developing value-aligned AI is a monumental task with existential risks. Experts from various perspectives must be willing to learn from each other and adapt their understanding of the issue.

In this spirit, The Hastings Center is leading the charge to bring the various AI silos together. After two successful events that resulted in promising partnerships, Wallach and his co-chairs will hold their third workshop in Spring 2018. And while these workshops are a small effort to facilitate transdisciplinary cooperation on AI, Wallach is hopeful.

“It’s a small group,” he admits, “but it’s people who are leaders in these various fields, so hopefully that permeates through the whole field, on both sides.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Artificial Intelligence and the King Midas Problem

Value alignment. It’s a phrase that often pops up in discussions about the safety and ethics of artificial intelligence. How can scientists create AI with goals and values that align with those of the people it interacts with?

Very simple robots with very constrained tasks do not need goals or values at all. Although the Roomba’s designers know you want a clean floor, Roomba doesn’t: it simply executes a procedure that the Roomba’s designers predict will work—most of the time. If your kitten leaves a messy pile on the carpet, Roomba will dutifully smear it all over the living room. If we keep programming smarter and smarter robots, then by the late 2020s, you may be able to ask your wonderful domestic robot to cook a tasty, high-protein dinner. But if you forgot to buy any meat, you may come home to a hot meal but find the aforementioned cat has mysteriously vanished. The robot, designed for chores, doesn’t understand that the sentimental value of the cat exceeds its nutritional value.

AI and King Midas

Stuart Russell, a renowned AI researcher, compares the challenge of defining a robot’s objective to the King Midas myth. “The robot,” says Russell, “has some objective and pursues it brilliantly to the destruction of mankind. And it’s because it’s the wrong objective. It’s the old King Midas problem.”

This is one of the big problems in AI safety that Russell is trying to solve. “We’ve got to get the right objective,” he explains, “and since we don’t seem to know how to program it, the right answer seems to be that the robot should learn – from interacting with and watching humans – what it is humans care about.”

Russell works from the assumption that the robot will solve whatever formal problem we define. Rather than assuming that the robot should optimize a given objective, Russell defines the problem as a two-player game (“game” as used by economists, meaning a decision problem with multiple agents) called cooperative inverse reinforcement learning (CIRL).

A CIRL game includes a person and a robot: the robot’s only purpose is to make the person happy, but it doesn’t know what the person wants. Fortunately, it can learn more about what the person wants by observing her behavior. For example, if a robot observed the human’s morning routine, it should discover how important coffee is—not to itself, of course (we don’t want robots drinking coffee), but to the human. Then, it will make coffee for the person without being asked.

What about the example of King Midas, who thought he wanted as much gold as possible? How does the approach deal with someone stating their wishes incorrectly?

“We don’t have to know what we want. That’s the whole point. It’s hard to know what we want, and what we ask for is evidence for what we really want, but it’s not ground truth,” says Russell.

The robot should learn what people want by watching their actions. Then, if someone asks for something that’s technically incorrect, the robot will understand their underlying wish. Russell adds, “The robot—if it had been there—would have told Midas that he didn’t really want everything turned to gold, maybe just a few choice objects that he might point at from time to time and say the magic word.”

AI Off-Switch

Russell and his Berkeley colleagues also recently announced further progress toward ensuring safe AI, with a paper on ensuring an AI’s off-switch is always accessible. An off-switch is often suggested to guarantee a robot doesn’t harm humans – if the robot goes bad, just turn it off! But what if the AI anticipates that and disables its off switch?

Russell and his team propose solving this by providing a level of uncertainty in the robot’s objective. They created another CIRL game in which the robot chooses to announce its intentions, which gives the human a chance to switch it off, or to plunge ahead with its plan, bypassing human oversight.

They showed that a robot with a fixed objective will take the second choice and disable its off-switch. But if there is uncertainty about the objective, the robot will reason that a human decision to switch the robot off would imply the announced intention is undesirable. Because its goal is to make the human happy, the robot accepts being switched off. The robot has a positive incentive not to disable the off-switch, and that incentive is directly tied to the robot’s uncertainty about the human’s true objective.

Ensuring AI Safety

In addition to his research, Russell is also one of the most vocal and active AI safety researchers concerned with ensuring a stronger public understanding of the potential issues surrounding AI development.

He recently co-authored a rebuttal to an article in the MIT Technology Review, which claimed that real AI scientists weren’t worried about the existential threat of AI. Russell and his co-author summed up why it’s better to be cautious and careful than just assume all will turn out for the best:

“Our experience with Chernobyl suggests it may be unwise to claim that a powerful technology entails no risks. It may also be unwise to claim that a powerful technology will never come to fruition. On September 11, 1933, Lord Rutherford, perhaps the world’s most eminent nuclear physicist, described the prospect of extracting energy from atoms as nothing but “moonshine.” Less than 24 hours later, Leo Szilard invented the neutron-induced nuclear chain reaction; detailed designs for nuclear reactors and nuclear weapons followed a few years later. Surely it is better to anticipate human ingenuity than to underestimate it, better to acknowledge the risks than to deny them. … [T]he risk [of AI] arises from the unpredictability and potential irreversibility of deploying an optimization process more intelligent than the humans who specified its objectives.”

This summer, Russell received a grant of over $5.5 million from the Open Philanthropy Project for a new research center, the Center for Human-Compatible Artificial Intelligence, in Berkeley. Among the primary objectives of the Center will be to study this problem of value alignment, to continue his efforts toward provably beneficial AI, and to ensure we don’t make the same mistakes as King Midas.

“Look,” he says, “if you were King Midas, would you want your robot to say, ‘Everything turns to gold? OK, boss, you got it.’ No! You’d want it to say, ‘Are you sure? Including your food, drink, and relatives? I’m pretty sure you wouldn’t like that. How about this: you point to something and say ‘Abracadabra Aurificio’ or something, and then I’ll turn it to gold, OK?’”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Westworld Op-Ed: Are Conscious AI Dangerous?

“These violent delights have violent ends.”

With the help of Shakespeare and Michael Crichton, HBO’s Westworld has brought to light some of the concerns about creating advanced artificial intelligence.

If you haven’t seen it already, Westworld is a show in which human-like AI populate a park designed to look like America’s Wild West. Visitors spend huge amounts of money to visit the park and live out old west adventures, in which they can fight, rape, and kill the AI. Each time one of the robots “dies,” its body is cleaned up, its memory is wiped, and it starts a new iteration of its script.

The show’s season finale aired Sunday evening, and it certainly went out with a bang – but not to worry, there are no spoilers in this article.

AI Safety Issues in Westworld

Westworld was inspired by an old Crichton movie of the same name, and leave it to him – the writer of Jurassic Park — to create a storyline that would have us questioning the level of control we’ll be able to maintain over advanced scientific endeavors. But unlike the original movie, in which the robot is the bad guy, in the TV show, the robots are depicted as the most sympathetic and even the most human characters.

Not surprisingly, concerns about the safety of the park show up almost immediately. The park is overseen by one man who can make whatever program updates he wants without running it by anyone for a safety check. The robots show signs of remembering their mistreatment. One of the characters mentions that only one line of code keeps the robots from being able to harm humans.

These issues are just some of the problems the show touches on that present real AI safety concerns: A single “bad agent” who uses advanced AI to intentionally cause harm to people; small glitches in the software that turn deadly; and a lack of redundancy and robustness in the code to keep people safe.

But to really get your brain working, many of the safety and ethics issues that crop up during the show hinge on whether or not the robots are conscious. In fact, the show whole-heartedly delves into one of the hardest questions of all: what is consciousness? On top of that, can humans create a conscious being? If so, can we control it? Do we want to find out?

To consider these questions, I turned to Georgia Tech AI researcher Mark Riedl, whose research focuses on creating creative AI, and NYU philosopher David Chalmers, who’s most famous for his formulation of the “hard problem of consciousness.”

Can AI Feel Pain?

I spoke with Riedl first, asking him about the extent to which a robot would feel pain if it was so programmed. “First,” he said, “I do not condone violence against humans, animals, or anthropomorphized robots or AI.” He then explained that humans and animals feel pain as a warning signal to “avoid a particular stimulus.”

For robots, however, “the closest analogy might be what happens in reinforcement learning agents, which engage in trial-and-error learning.” The AI would receive a positive or negative reward for some action and it would adjust its future behavior accordingly. Rather than feeling like pain, Riedl suggests that the negative reward would be more “akin to losing points in a computer game.”

“Robots and AI can be programmed to ‘express’ pain in a human-like fashion,” says Riedl, “but it would be an illusion. There is one reason for creating this illusion: for the robot to communicate its internal state to humans in a way that is instantly understandable and invokes empathy.”

Riedl isn’t worried that the AI would feel real pain, and if the robot’s memory is completely erased each night, then he suggests it would be as though nothing happened. However, he does see one possible safety issue here. For reinforcement learning to work properly, the AI needs to take actions that optimize for the positive reward. If the robot’s memory isn’t completely erased — if the robot starts to remember the bad things that happened to it – then it could try to avoid those actions or people that trigger the negative reward.

“In theory,” says Riedl, “these agents can learn to plan ahead to reduce the possibility of receiving negative reward in the most cost-effective way possible. … If robots don’t understand the implications of their actions in terms other than reward gain or loss, this can also mean acting in advance to stop humans from harming them.”

Riedl points out, though, that for the foreseeable future, we do not have robots with sufficient capabilities to pose an immediate concern. But assuming these robots do arrive, problems with negative rewards could be potentially dangerous for the humans. (Possibly even more dangerous, as the show depicts, is if the robots do understand the implications of their actions against humans who have been mistreating them for decades.)

Can AI Be Conscious?

Chalmers sees things a bit differently. “The way I think about consciousness,” says Chalmers, “the way most people think about consciousness – there just doesn’t seem to be any question that these beings are conscious. … They’re presented as having fairly rich emotional lives – that’s presented as feeling pain and thinking thoughts. … They’re not just exhibiting reflexive behavior. They’re thinking about their situations. They’re reasoning.”

“Obviously, they’re sentient,” he adds.

Chalmers suggests that instead of trying to define what about the robots makes them conscious, we should instead consider what it is they’re lacking. Most notably, says Chalmers, they lack free will and memory. However, many of us live in routines that we’re unable to break out from. And there have been numerous cases of people with extreme memory problems, but no one thinks that makes it okay to rape or kill them.

“If it is regarded as okay to mistreat the AIs on this show, is it because of some deficit they have or because of something else?” Chalmers asks.

The specific scenarios portrayed in Westworld may not be realistic because Chalmers doesn’t believe the Bicameral-mind theory is unlikely to lead to consciousness, even for robots. ” I think it’s hopeless as a theory,” he says, “even of robot consciousness — or of robot self-consciousness, which seems more what’s intended.  It would be so much easier just to program the robots to monitor their own thoughts directly.”

But this still presents risks. “If you had a situation that was as complex and as brain-like as these, would it also be so easily controllable?” asks Chalmers.

In any case, treating robots badly could easily pose a risk to human safety. We risk creating unconscious robots that learn the wrong lessons from negative feedback, or we risk inadvertently (or intentionally, as in the case of Westworld) creating conscious entities who will eventually fight back against their abuse and oppression.

When a host in episode two is asked if she’s “real,” she responds, “If you can’t tell, does it matter?”

These seem like the safest words to live by.

Autonomous Weapons: an Interview With the Experts

FLI’s Ariel Conn recently spoke with Heather Roff and Peter Asaro about autonomous weapons. Roff, a research scientist at The Global Security Initiative at Arizona State University and a senior research fellow at the University of Oxford, recently compiled an international database of weapons systems that exhibit some level of autonomous capabilities. Asaro is a philosopher of science, technology, and media at The New School in New York City. He looks at fundamental questions of responsibility and liability with all autonomous systems, but he’s also the Co-Founder and Vice-Chair of the International Committee for Robot Arms Control and a he’s a Spokesperson for the Campaign to Stop Killer Robots.

The following interview has been edited for brevity, but you can read it in its entirety here or listen to it above.

ARIEL: Dr. Roff, I’d like to start with you. With regard to the database, what prompted you to create it, what information does it provide, how can we use it?

ROFF: The main impetus behind the creation of the database [was] a feeling that the same autonomous or automated weapons systems were brought out in discussions over and over and over again. It made it seem like there wasn’t anything else to worry about. So I created a database of about 250 autonomous systems that are currently deployed [from] Russia, China, the United States, France, and Germany. I code them along a series of about 20 different variables: from automatic target recognition [to] the ability to navigate [to] acquisition capabilities [etc.].

It’s allowing everyone to understand that autonomy isn’t just binary. It’s not a yes or a no. Not many people in the world have a good understanding of what modern militaries fight with, and how they fight.

ARIEL: And Dr. Asaro, your research is about liability. How is it different for autonomous weapons versus a human overseeing a drone that just accidentally fires on the wrong target.

ASARO: My work looks at autonomous weapons and other kinds of autonomous systems and the interface of the ethical and legal aspects. Specifically, questions about the ethics of killing, and the legal requirements under international law for killing in armed conflict. These kind of autonomous systems are not really legal and moral agents in the way that humans are, and so delegating the authority to kill to them is unjustifiable.

One aspect of accountability is, if a mistake is made, holding people to account for that mistake. There’s a feedback mechanism to prevent that error occurring in the future. There’s also a justice element, which could be attributive justice, in which you try to make up for loss. Other forms of accountability look at punishment itself. When you have autonomous systems, you can’t really punish the system. More importantly, if nobody really intended the effect that the system brought about then it becomes very difficult to hold anybody accountable for the actions of the system. The debate — it’s really kind of framed around this question of the accountability gap.

ARIEL: One of the things we hear a lot in the news is about always keeping a human in the loop. How does that play into the idea of liability? And realistically, what does it mean?

ROFF: I actually think this is just a really unhelpful heuristic. It’s hindering our ability to think about what’s potentially risky or dangerous or might produce unintended consequences. So here’s an example: the UK’s Ministry of Defense calls this the Empty Hangar Problem. It’s very unlikely that they’re going to walk down to an airplane hangar, look in, and be like, “Hey! Where’s the airplane? Oh, it’s decided to go to war today.” That’s just not going to happen.

These systems are always going to be used by humans, and humans are going to decide to use them. A better way to think about this is in terms of task allocation. What is the scope of the task, and how much information and control does the human have before deploying that system to execute? If there is a lot of time, space, and distance between the time the decision is made to field it and then the application of force, there’s more time for things to change on the ground, and there’s more time for the human to basically [say] they didn’t intend for this to happen.

ASARO: If self-driving cars start running people over, people will sue the manufacturer. But there’s no mechanisms in international law for the victims of bombs and missiles and potentially autonomous weapons to sue the manufacturers of those systems. That just doesn’t happen. So there’s no incentives for companies that manufacture those [weapons] to improve safety and performance.

ARIEL: Dr. Asaro, we’ve briefly mentioned definitional problems of autonomous weapons — how does the liability play in there?

ASARO: The law of international armed conflict is pretty clear that humans are the ones that make the decisions, especially about a targeting decision or the taking of a human life in armed conflict. This question of having a system that could range over many miles and many days and select targets on its own is where things are problematic. Part of the definition is: how do you figure out exactly what constitutes a targeting decision, and how do you ensure that a human is making that decision? That’s the direction the discussion at the UN is going as well. Instead of trying to define what’s an autonomous system, what we focus on is the targeting decision and firing decisions of weapons for individual attacks. What we want to acquire is meaningful human control over those decisions.

ARIEL: Dr. Roff, you were working on the idea of meaningful human control, as well. Can you talk about that?

ROFF: If [a commander] fields a weapon that can go from attack to attack without checking back with her, then the weapon is making the proportionality calculation, and she [has] delegated her authority and her obligation to a machine. That is prohibited under IHL, and I would say is also morally prohibited. You can’t offload your moral obligation to a nonmoral agent. So that’s where our work on meaningful human control is: a human commander has a moral obligation to undertake precaution and proportionality in each attack.

ARIEL: Is there anything else you think is important to add?

ROFF: We still have limitations AI. We have really great applications of AI, and we have blind. It would be really incumbent on the AI community to be vocal about where they think there are capacities and capabilities that could be reliably and predictably deployed on such systems. If they don’t think that those technologies or applications could be reliably and predictably deployed, then they need to stand up and say as much.

ASARO: We’re not trying to prohibit autonomous operations of different kind of systems or the development and application of artificial intelligence for a wide range of civilian and military applications. But there are certain applications, specifically the lethal ones, that have higher standards of moral and legal requirements that need to be met.

 

The Problem of Defining Autonomous Weapons

What, exactly, is an autonomous weapon? For the general public, the phrase is often used synonymously with killer robots and triggers images of the Terminator. But for the military, the definition of an autonomous weapons system, or AWS, is deceivingly simple.

The United States Department of Defense defines an AWS as “a weapon system that, once activated, can select and engage targets without further intervention by a human operator.  This includes human-supervised autonomous weapon systems that are designed to allow human operators to override operation of the weapon system, but can select and engage targets without further human input after activation.”

Basically, it is a weapon that can be used in any domain — land, air, sea, space, cyber, or any combination thereof — and encompasses significantly more than just the platform that fires the munition. This means that there are various capabilities the system possesses, such as identifying targets, tracking, and firing, all of which may have varying levels of human interaction and input.

Heather Roff, a research scientist at The Global Security Initiative at Arizona State University and a senior research fellow at the University of Oxford, suggests that even the basic terminology of the DoD’s definition is unclear.

“This definition is problematic because we don’t really know what ‘select’ means here.  Is it ‘detect’ or ‘select’?” she asks. Roff also notes another definitional problem arises because, in many instances, the difference between an autonomous weapon (acting independently) and an automated weapon (pre-programmed to act automatically) is not clear.

 

A Database of Weapons Systems

State parties to the UN’s Convention on Conventional Weapons (CCW) also grapple with what constitutes an autonomous — and not a current automated — weapon. During the last three years of discussion at Informal Meetings of Experts at the CCW, participants typically only referred to two or three presently deployed weapons systems that appear to be AWS, such as the Israeli Harpy or the United States’ Counter Rocket and Mortar system.

To address this, the International Committee of the Red Cross requested more data on presently deployed systems. It wanted to know what the weapons systems are that states currently use and what projects are under development. Roff took up the call to action. She poured over publicly available data from a variety of sources and compiled a database of 284 weapons systems. She wanted to know what capacities already existed on presently deployed systems and whether these were or were not “autonomous.”

“The dataset looks at the top five weapons exporting countries, so that’s Russia, China, the United States, France and Germany,” says Roff. “I’m looking at major sales and major defense industry manufacturers from each country. And then I look at all the systems that are presently deployed by those countries that are manufactured by those top manufacturers, and I code them along a series of about 20 different variables.”

These variables include capabilities like navigation, homing, target identification, firing, etc., and for each variable, Roff coded a weapon as either having the capacity or not. Roff then created a series of three indices to bundle the various capabilities: self-mobility, self-direction, and self-determination. Self-mobility capabilities allow a system to move by itself, self-direction relates to target identification, and self-determination indexes the abilities that a system may possess in relation to goal setting, planning, and communication. Most “smart” weapons have high self-direction and self-mobility, but few, if any, have self-determination capabilities.

As Roff explains in a recent Foreign Policy post, the data shows that “the emerging trend in autonomy has less to do with the hardware and more on the areas of communications and target identification. What we see is a push for better target identification capabilities, identification friend or foe (IFF), as well as learning.  Systems need to be able to adapt, to learn, and to change or update plans while deployed. In short, the systems need to be tasked with more things and vaguer tasks.” Thus newer systems will need greater self-determination capabilities.

 

The Human in the Loop

But understanding what the weapons systems can do is only one part of the equation. In most systems, humans still maintain varying levels of control, and the military often claims that a human will always be “in the loop.” That is, a human will always have some element of meaningful control over the system. But this leads to another definitional problem: just what is meaningful human control?

Roff argues that this idea of keeping a human “in the loop” isn’t just “unhelpful,” but that it may be “hindering our ability to think about what’s wrong with autonomous systems.” She references what the UK Ministry of Defense calls, the Empty Hangar Problem: no one expects to walk into a military airplane hangar and discover that the autonomous plane spontaneously decided, on its own, to go to war.

“That’s just not going to happen,” Roff says, “These systems are always going to be used by humans, and humans are going to decide to use them.” But thinking about humans in some loop, she contends, means that any difficulties with autonomy get pushed aside.

Earlier this year, Roff worked with Article 36, which coined the phrase “meaningful human control,” to establish more a more clear-cut definition of the term. They published a concept paper, Meaningful Human Control, Artificial Intelligence and Autonomous Weapons, which offered guidelines for delegates at the 2016 CCW Meeting of Experts on Lethal Autonomous Weapons Systems.

In the paper, Roff and Richard Moyes outlined key elements – such as predictable, reliable and transparent technology, accurate user information, a capacity for timely human action and intervention, human control during attacks, etc. – for determining whether an AWS allows for meaningful human control.

“You can’t offload your moral obligation to a non-moral agent,” says Roff. “So that’s where I think our work on meaningful human control is: a human commander has a moral obligation to undertake precaution and proportionality in each attack.” The weapon system cannot do it for the human.

Researchers and the international community are only beginning to tackle the ethical issues that arise from AWSs. Clearly defining the weapons systems and the role humans will continue to play is one small part of a very big problem. Roff will continue to work with the international community to establish more well defined goals and guidelines.

“I’m hoping that the doctrine and the discussions that are developing internationally and through like-minded states will actually guide normative generation of how to use or not use such systems,” she says.

Heather Roff also spoke about this work on an FLI podcast.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Complex AI Systems Explain Their Actions

cobots_mauela_veloso

In the future, service robots equipped with artificial intelligence (AI) are bound to be a common sight. These bots will help people navigate crowded airports, serve meals, or even schedule meetings.

As these AI systems become more integrated into daily life, it is vital to find an efficient way to communicate with them. It is obviously more natural for a human to speak in plain language rather than a string of code. Further, as the relationship between humans and robots grows, it will be necessary to engage in conversations, rather than just give orders.

This human-robot interaction is what Manuela M. Veloso’s research is all about. Veloso, a professor at Carnegie Mellon University, has focused her research on CoBots, autonomous indoor mobile service robots which transport items, guide visitors to building locations, and traverse the halls and elevators. The CoBot robots have been successfully autonomously navigating for several years now, and have traveled more than 1,000km. These accomplishments have enabled the research team to pursue a new direction, focusing now on novel human-robot interaction.

“If you really want these autonomous robots to be in the presence of humans and interacting with humans, and being capable of benefiting humans, they need to be able to talk with humans” Veloso says.

 

Communicating With CoBots

Veloso’s CoBots are capable of autonomous localization and navigation in the Gates-Hillman Center using WiFi, LIDAR, and/or a Kinect sensor (yes, the same type used for video games).

The robots navigate by detecting walls as planes, which they match to the known maps of the building. Other objects, including people, are detected as obstacles, so navigation is safe and robust. Overall, the CoBots are good navigators and are quite consistent in their motion. In fact, the team noticed the robots could wear down the carpet as they traveled the same path numerous times.

Because the robots are autonomous, and therefore capable of making their own decisions, they are out of sight for large amounts of time while they navigate the multi-floor buildings.

The research team began to wonder about this unaccounted time. How were the robots perceiving the environment and reaching their goals? How was the trip? What did they plan to do next?

“In the future, I think that incrementally we may want to query these systems on why they made some choices or why they are making some recommendations,” explains Veloso.

The research team is currently working on the question of why the CoBots took the route they did while autonomous. The team wanted to give the robots the ability to record their experiences and then transform the data about their routes into natural language. In this way, the bots could communicate with humans and reveal their choices and hopefully the rationale behind their decisions.

 

Levels of Explanation

The “internals” underlying the functions of any autonomous robots are completely based on numerical computations, and not natural language. For example, the CoBot robots in particular compute the distance to walls, assigning velocities to their motors to enable the motion to specific map coordinates.

Asking an autonomous robot for a non-numerical explanation is complex, says Veloso. Furthermore, the answer can be provided in many potential levels of detail.

“We define what we call the ‘verbalization space’ in which this translation into language can happen with different levels of detail, with different levels of locality, with different levels of specificity.”

For example, if a developer is asking a robot to detail their journey, they might expect a lengthy retelling, with details that include battery levels. But a random visitor might just want to know how long it takes to get from one office to another.

Therefore, the research is not just about the translation from data to language, but also the acknowledgment that the robots need to explain things with more or less detail. If a human were to ask for more detail, the request triggers CoBot “to move” into a more detailed point in the verbalization space.

“We are trying to understand how to empower the robots to be more trustable through these explanations, as they attend to what the humans want to know,” says Veloso. The ability to generate explanations, in particular at multiple levels of detail, will be especially important in the future, as the AI systems will work with more complex decisions. Humans could have a more difficult time inferring the AI’s reasoning. Therefore, the bot will need to be more transparent.

For example, if you go to a doctor’s office and the AI there makes a recommendation about your health, you may want to know why it came to this decision, or why it recommended one medication over another.

Currently, Veloso’s research focuses on getting the robots to generate these explanations in plain language. The next step will be to have the robots incorporate natural language when humans provide them with feedback. “[The CoBot] could say, ‘I came from that way,’ and you could say, ‘well next time, please come through the other way,’” explains Veloso.

These sorts of corrections could be programmed into the code, but Veloso believes that “trustability” in AI systems will benefit from our ability to dialogue, query, and correct their autonomy. She and her team aim at contributing to a multi-robot, multi-human symbiotic relationship, in which robots and humans coordinate and cooperate as a function of their limitations and strengths.

“What we’re working on is to really empower people – a random person who meets a robot – to still be able to ask things about the robot in natural language,” she says.

In the future, when we will have more and more AI systems that are able to perceive the world, make decisions, and support human decision-making, the ability to engage in these types of conversations will be essential­­.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Who is Responsible for Autonomous Weapons?

Consider the following wartime scenario: Hoping to spare the lives of soldiers, a country deploys an autonomous weapon to wipe out an enemy force. This robot has demonstrated military capabilities that far exceed even the best soldiers, but when it hits the ground, it gets confused. It can’t distinguish the civilians from the enemy soldiers and begins taking innocent lives. The military generals desperately try to stop the robot, but by the time they succeed it has already killed dozens.

Who is responsible for this atrocity? Is it the commanders who deployed the robot, the designers and manufacturers of the robot, or the robot itself?

 

Liability: Autonomous Systems

As artificial intelligence improves, governments may turn to autonomous weapons — like military robots — in order to gain the upper hand in armed conflict. These weapons can navigate environments on their own and make their own decisions about who to kill and who to spare. While the example above may never occur, unintended harm is inevitable. Considering these scenarios helps formulate important questions that governments and researchers must jointly consider, namely:

How do we hold human beings accountable for the actions of autonomous systems? And how is justice served when the killer is essentially a computer?

As it turns out, there is no straightforward answer to this dilemma. When a human soldier commits an atrocity and kills innocent civilians, that soldier is held accountable. But when autonomous weapons do the killing, it’s difficult to blame them for their mistakes.

An autonomous weapon’s “decision” to murder innocent civilians is like a computer’s “decision” to freeze the screen and delete your unsaved project. Frustrating as a frozen computer may be, people rarely think the computer intended to complicate their lives.

Intention must be demonstrated to prosecute someone for a war crime, and while autonomous weapons may demonstrate outward signs of decision-making and intention, they still run on a code that’s just as impersonal as the code that glitches and freezes a computer screen. Like computers, these systems are not legal or moral agents, and it’s not clear how to hold them accountable — or if they can be held accountable — for their mistakes.

So who assumes the blame when autonomous weapons take innocent lives? Should they even be allowed to kill at all?

 

Liability: from Self-Driving Cars to Autonomous Weapons

Peter Asaro, a philosopher of science, technology, and media at The New School in New York City, has been working on addressing these fundamental questions of responsibility and liability with all autonomous systems, not just weapons. By exploring fundamental concepts of autonomy, agency, and liability, he intends to develop legal approaches for regulating the use of autonomous systems and the harm they cause.

At a recent conference on the Ethics of Artificial Intelligence, Asaro discussed the liability issues surrounding the application of AI to weapons systems. He explained, “AI poses threats to international law itself — to the norms and standards that we rely on to hold people accountable for [decisions, and to] hold states accountable for military interventions — as [people are] able to blame systems for malfunctioning instead of taking responsibility for their decisions.”

The legal system will need to reconsider who is held liable to ensure that justice is served when an accident happens. Asaro argues that the moral and legal issues surrounding autonomous weapons are much different than the issues surrounding other autonomous machines, such as self-driving cars.

Though researchers still expect the occasional fatal accident to occur with self-driving cars, these autonomous vehicles are designed with safety in mind. One of the goals of self-driving cars is to save lives. “The fundamental difference is that with any kind of weapon, you’re intending to do harm, so that carries a special legal and moral burden,” Asaro explains. “There is a moral responsibility to ensure that [the weapon is] only used in legitimate and appropriate circumstances.”

Furthermore, liability with autonomous weapons is much more ambiguous than it is with self-driving cars and other domestic robots.

With self-driving cars, for example, bigger companies like Volvo intend to embrace strict liability – where the manufacturers assume full responsibility for accidental harm. Although it is not clear how all manufacturers will be held accountable for autonomous systems, strict liability and threats of class-action lawsuits incentivize manufacturers to make their product as safe as possible.

Warfare, on the other hand, is a much messier situation.

“You don’t really have liability in war,” says Asaro. “The US military could sue a supplier for a bad product, but as a victim who was wrongly targeted by a system, you have no real legal recourse.”

Autonomous weapons only complicate this. “These systems become more unpredictable as they become more sophisticated, so psychologically commanders feel less responsible for what those systems do. They don’t internalize responsibility in the same way,” Asaro explained at the Ethics of AI conference.

To ensure that commanders internalize responsibility, Asaro suggests that “the system has to allow humans to actually exercise their moral agency.”

That is, commanders must demonstrate that they can fully control the system before they use it in warfare. Once they demonstrate control, it can become clearer who can be held accountable for the system’s actions.

 

Preparing for the Unknown

Behind these concerns about liability, lies the overarching concern that autonomous machines might act in ways that humans never intended. Asaro asks: “When these systems become more autonomous, can the owners really know what they’re going to do?”

Even the programmers and manufacturers may not know what their machines will do. The purpose of developing autonomous machines is so they can make decisions themselves – without human input. And as the programming inside an autonomous system becomes more complex, people will increasingly struggle to predict the machine’s action.

Companies and governments must be prepared to handle the legal complexities of a domestic or military robot or system causing unintended harm. Ensuring justice for those who are harmed may not be possible without a clear framework for liability.

Asaro explains, “We need to develop policies to ensure that useful technologies continue to be developed, while ensuring that we manage the harms in a just way. A good start would be to prohibit automating decisions over the use of violent and lethal force, and to focus on managing the safety risks in beneficial autonomous systems.”

Peter Asaro also spoke about this work on an FLI podcast. You can learn more about his work at http://www.peterasaro.org.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

MIRI’S November 2016 Newsletter

Post-fundraiser update: Donors rallied late last month to get us most of the way to our first fundraiser goal, but we ultimately fell short. This means that we’ll need to make up the remaining $160k gap over the next month if we’re going to move forward on our 2017 plans. We’re in a good position to expand our research staff and trial a number of potential hires, but only if we feel confident about our funding prospects over the next few years.Since we don’t have an official end-of-the-year fundraiser planned this time around, we’ll be relying more on word-of-mouth to reach new donors. To help us with our expansion plans, donate at https://intelligence.org/donate/ — and spread the word!

Research updates

General updates

News and links

Cybersecurity and Machine Learning

When it comes to cybersecurity, no nation can afford to slack off. If a nation’s defense systems cannot anticipate how an attacker will try to fool them, then an especially clever attack could expose military secrets or use disguised malware to cause major networks to crash.

A nation’s defense systems must keep up with the constant threat of attack, but this is a difficult and never-ending process. It seems that the defense is always playing catch-up.

Ben Rubinstein, a professor at the University of Melbourne in Australia, asks: “Wouldn’t it be good if we knew what the malware writers are going to do next, and to know what type of malware is likely to get through the filters?”

In other words, what if defense systems could learn to anticipate how attackers will try to fool them?

 

Adversarial Machine Learning

In order to address this question, Rubinstein studies how to prepare machine-learning systems to catch adversarial attacks. In the game of national cybersecurity, these adversaries are often individual hackers or governments who want to trick machine-learning systems for profit or political gain.

Nations have become increasingly dependent on machine-learning systems to protect against such adversaries. Unaided by humans, machine-learning systems in anti-malware and facial recognition software have the ability to learn and improve their function as they encounter new data. As they learn, they become better at catching adversarial attacks.

Machine-learning systems are generally good at catching adversaries, but they are not completely immune to threats, and adversaries are constantly looking for new ways to fool them. Rubinstein says, “Machine learning works well if you give it data like it’s seen before, but if you give it data that it’s never seen before, there’s no guarantee that it’s going to work.”

With adversarial machine learning, security agencies address this weakness by presenting the system with different types of malicious data to test the system’s filters. The system then digests this new information and learns how to identify and capture malware from clever attackers.

 

Security Evaluation of Machine-Learning Systems

Rubinstein’s project is called “Security Evaluation of Machine-Learning Systems”, and his ultimate goal is to develop a software tool that companies and government agencies can use to test their defenses. Any company or agency that uses machine-learning systems could run his software against their system. Rubinstein’s tool would attack and try to fool the system in order to expose the system’s vulnerabilities. In doing so, his tool anticipates how an attacker could slip by the system’s defenses.

The software would evaluate existing machine-learning systems and find weak spots that adversaries might try to exploit – similar to how one might defend a castle.

“We’re not giving you a new castle,” Rubinstein says, “we’re just going to walk around the perimeter and look for holes in the walls and weak parts of the castle, or see where the moat is too shallow.”

By analyzing many different machine-learning systems, his software program will pick up on trends and be able to advise security agencies to either use a different system or bolster the security of their existing system. In this sense, his program acts as a consultant for every machine-learning system.

Consider a program that does facial recognition. This program would use machine learning to identify faces and catch adversaries that pretend to look like someone else.

Rubinstein explains: “Our software aims to automate this security evaluation so that it takes an image of a person and a program that does facial recognition, and it will tell you how to change its appearance so that it will evade detection or change the outcome of machine learning in some way.”

This is called a mimicry attack – when an adversary makes one instance (one face) look like another, and thereby fools a system.

To make this example easier to visualize, Rubinstein’s group built a program that demonstrates how to change a face’s appearance to fool a machine-learning system into thinking that it is another face.

In the image below, the two faces don’t look alike, but the left image has been modified so that the machine-learning system thinks it is the same as the image on the right. This example provides insight into how adversaries can fool machine-learning systems by exploiting quirks.

ben-rubinstein-facial-recognition

When Rubinstein’s software fools a system with a mimicry attack, security personnel can then take that information and retrain their program to establish more effective security when the stakes are higher.

 

Minimizing the Attacker’s Advantage

While Rubinstein’s software will help to secure machine-learning systems against adversarial attacks, he has no illusions about the natural advantages that attackers enjoy. It will always be easier to attack a castle than to defend it, and the same holds true for a machine-learning system. This is called the ‘asymmetry of cyberwarfare.’

“The attacker can come in from any angle. It only needs to succeed at one point, but the defender needs to succeed at all points,” says Rubinstein.

In general, Rubinstein worries that the tools available to test machine-learning systems are theoretical in nature, and put too much responsibility on the security personnel to understand the complex math involved. A researcher might redo the mathematical analysis for every new learning system, but security personnel are unlikely to have the time or resources to keep up.

Rubinstein aims to “bring what’s out there in theory and make it more applied and more practical and easy for anyone who’s using machine learning in a system to evaluate the security of their system.”

With his software, Rubinstein intends to help level the playing field between attackers and defenders. By giving security agencies better tools to test and adapt their machine-learning systems, he hopes to improve the ability of security personnel to anticipate and guard against cyberattacks.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.