Can We Properly Prepare for the Risks of Superintelligent AI?

Risks Principle: Risks posed by AI systems, especially catastrophic or existential risks, must be subject to planning and mitigation efforts commensurate with their expected impact.

We don’t know what the future of artificial intelligence will look like. Some may make educated guesses, but the future remains unclear.

AI could keep developing like all other technologies, helping us transition from one era into a new one. Many, if not all, AI researchers hope it could help us transform into a healthier, more intelligent, peaceful society. But it’s important to remember that AI is a tool and, as such, not inherently good or bad. As with any other technology or tool, there could be unintended consequences. Rarely do people actively attempt to crash their cars or smash their thumbs with hammers, yet both happen all the time.

A concern is that as technology becomes more advanced, it can affect more people. A poorly swung hammer is likely to only hurt the person holding the nail. A car accident can harm passengers and drivers in both cars, as well as pedestrians. A plane crash can kill hundreds of people. Now, automation threatens millions of jobs — and while presumably no lives will be lost as a direct result, mass unemployment can have devastating consequences.

And job automation is only the beginning. When AI becomes very general and very powerful, aligning it with human interests will be challenging. If we fail, AI could plausibly become an existential risk for humanity.

Given the expectation that advanced AI will far surpass any technology seen to date — and possibly surpass even human intelligence — how can we predict and prepare for the risks to humanity?

To consider the Risks Principle, I turned to six AI researchers and philosophers.


Non-zero Probability

An important aspect of considering the risk of advanced AI is recognizing that the risk exists, and it should be taken into account.

As Roman Yampolskiy, an associate professor at the University of Louisville, explained, “Even a small probability of existential risk becomes very impactful once multiplied by all the people it will affect. Nothing could be more important than avoiding the extermination of humanity.”

This is “a very reasonable principle,” said Bart Selman, a professor at Cornell University. He explained, “I sort of refer to some of the discussions between AI scientists who might differ in how big they think that risk is. I’m quite certain it’s not zero, and the impact could be very high. So … even if these things are still far off and we’re not clear if we’ll ever reach them, even with a small probability of a very high consequence we should be serious about these issues. And again, not everybody, but the subcommunity should.”

Anca Dragan, an assistant professor at UC Berkeley was more specific about her concerns. “An immediate risk is agents producing unwanted, surprising behavior,” she explained. “Even if we plan to use AI for good, things can go wrong, precisely because we are bad at specifying objectives and constraints for AI agents. Their solutions are often not what we had in mind.”


Considering Other Risks

While most people I spoke with interpreted this Principle to address longer-term risks of AI, Dan Weld, a professor at the University of Washington, took a more nuanced approach.

“How could I disagree?” he asked. “Should we ignore the risks of any technology and not take precautions? Of course not. So I’m happy to endorse this one. But it did make me uneasy, because there is again an implicit premise that AI systems have a significant probability of posing an existential risk.”

But then he added, “I think what’s going to happen is – long before we get superhuman AGI – we’re going to get superhuman artificial *specific* intelligence. … These narrower kinds of intelligence are going to be at the superhuman level long before a *general* intelligence is developed, and there are many challenges that accompany these more narrowly described intelligences.”

“One technology,” he continued, “that I wish [was] discussed more is explainable machine learning. Since machine learning is at the core of pretty much every AI success story, it’s really important for us to be able to understand *what* it is that the machine learned. And, of course, with deep neural networks it is notoriously difficult to understand what they learned. I think it’s really important for us to develop techniques so machines can explain what they learned so humans can validate that understanding. … Of course, we’ll need explanations before we can trust an AGI, but we’ll need it long before we achieve general intelligence, as we deploy much more limited intelligent systems. For example, if a medical expert system recommends a treatment, we want to be able to ask, ‘Why?’

“Narrow AI systems, foolishly deployed, could be catastrophic. I think the immediate risk is less a function of the intelligence of the system than it is about the system’s autonomy, specifically the power of its effectors and the type of constraints on its behavior. Knight Capital’s automated trading system is much less intelligent than Google Deepmind’s AlphaGo, but the former lost $440 million in just forty-five minutes. AlphaGo hasn’t and can’t hurt anyone. … And don’t get me wrong – I think it’s important to have some people thinking about problems surrounding AGI; I applaud supporting that research. But I do worry that it distracts us from some other situations which seem like they’re going to hit us much sooner and potentially cause calamitous harm.”


Open to Interpretation

Still others I interviewed worried about how the Principle might be interpreted, and suggested reconsidering word choices, or rewriting the principle altogether.

Patrick Lin, an associate professor at California Polytechnic State University, believed that the Principle is too ambiguous.

He explained, “This sounds great in ‘principle,’ but you need to work it out. For instance, it could be that there’s this catastrophic risk that’s going to affect everyone in the world. It could be AI or an asteroid or something, but it’s a risk that will affect everyone. But the probabilities are tiny — 0.000001 percent, let’s say. Now if you do an expected utility calculation, these large numbers are going to break the formula every time. There could be some AI risk that’s truly catastrophic, but so remote that if you do an expected utility calculation, you might be misled by the numbers.”

“I agree with it in general,” Lin continued, “but part of my issue with this particular phrasing is the word ‘commensurate.’ Commensurate meaning an appropriate level that correlates to its severity. So I think how we define commensurate is going to be important. Are we looking at the probabilities? Are we looking at the level of damage? Or are we looking at expected utility? The different ways you look at risk might point you to different conclusions. I’d be worried about that. We can imagine all sorts of catastrophic risks from AI or robotics or genetic engineering, but if the odds are really tiny, and you still want to stick with this expected utility framework, these large numbers might break the math. It’s not always clear what the right way is to think about risk and a proper response to it.”
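Lin’s worry that “large numbers break the math” can be made concrete with a toy calculation. The probabilities and stakes below are invented purely for illustration; they are not figures from the interviews:

```python
# Toy expected-utility comparison illustrating Lin's concern:
# an astronomically large stake can dominate the calculation
# even when its probability is vanishingly small.

def expected_loss(probability, lives_at_stake):
    """Expected number of lives lost: probability times impact."""
    return probability * lives_at_stake

# A remote catastrophe affecting everyone on Earth...
remote_catastrophe = expected_loss(1e-8, 8_000_000_000)

# ...versus a likely but localized harm.
likely_local_harm = expected_loss(0.5, 100)

# Naive expected utility ranks the remote catastrophe as the bigger
# problem (80 expected lives vs. 50), which is exactly the effect
# Lin warns might mislead us.
print(remote_catastrophe > likely_local_harm)  # True
```

Whether that ranking is a feature or a bug of expected-utility reasoning is precisely the question Lin raises.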

Meanwhile Nate Soares, the Executive Director of the Machine Intelligence Research Institute, suggested that the Principle should be more specific.

Soares said, “The principle seems too vague. … Maybe my biggest concern with it is that it leaves out questions of tractability: the attention we devote to risks shouldn’t actually be proportional to the risks’ expected impact; it should be proportional to the expected usefulness of the attention. There are cases where we should devote more attention to smaller risks than to larger ones, because the larger risk isn’t really something we can make much progress on. (There are also two separate and additional claims, namely ‘also we should avoid taking actions with appreciable existential risks whenever possible’ and ‘many methods (including the default methods) for designing AI systems that are superhumanly capable in the domains of cross-domain learning, reasoning, and planning pose appreciable existential risks.’ Neither of these is explicitly stated in the principle.)

“If I were to propose a version of the principle that has more teeth, as opposed to something that quickly mentions ‘existential risk’ but doesn’t give that notion content or provide a context for interpreting it, I might say something like: ‘The development of machines with par-human or greater abilities to learn and plan across many varied real-world domains, if mishandled, poses enormous global accident risks. The task of developing this technology therefore calls for extraordinary care. We should do what we can to ensure that relations between segments of the AI research community are strong, collaborative, and high-trust, so that researchers do not feel pressured to rush or cut corners on safety and security efforts.’”


What Do You Think?

How can we prepare for the potential risks that AI might pose? How can we address longer-term risks without sacrificing research on shorter-term risks? Human history is rife with learning from mistakes, but in the case of the catastrophic and existential risks that AI could present, we can’t allow for error – but how can we plan for problems we don’t know how to anticipate? AI safety research is critical to identifying unknown unknowns, but is there more the AI community or the rest of society can do to help mitigate potential risks?

This article is part of a weekly series on the 23 Asilomar AI Principles.

The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

Why Freezing North Korea’s Weapons Programs Would Make Us Safer

Last week, China proposed a way to reduce tensions on the Korean peninsula: Pyongyang would freeze its missile and nuclear programs in exchange for Washington and Seoul halting their current round of military exercises. China also sees this as a way of starting talks between the United States and North Korea, which it believes is necessary to resolve hostilities on the peninsula.

In comments in South Korea on Friday, Secretary of State Rex Tillerson said “We do not believe the conditions are right to engage in any talks at this time. … [C]onditions must change before there is any scope for talks to resume, whether they be five-party or six-party.” Whether the last phrase means the United States is consciously rejecting the idea of one-on-one talks with the North is not clear.

Tillerson also said he believes it is “premature” to talk about a freeze since a freeze would leave North Korea with significant military capabilities. His choice of words appears to leave the possibility of a freeze on the table.

A North Korean freeze on nuclear tests and missile flight tests would be highly beneficial—and readily verifiable. It would prevent Pyongyang from developing a long-range missile capable of hitting the United States.

It is important, of course, to work to eliminate the capabilities North Korea currently has, but a sensible first step is to keep it from increasing its capabilities.


Freezing North Korean Development Would Be Valuable

Over the last two decades, Pyongyang has developed technologies beyond what many thought it was capable of doing. It has developed nuclear weapons, an array of short- and medium-range missiles that could pose a significant threat to its neighbors, and a large satellite launcher that has placed two objects in orbit (although neither appears to have been operational once in orbit).

These weapon systems are not particularly sophisticated and in some cases are highly unreliable but, as Secretary Tillerson noted, they represent significant military capabilities.

And if these development programs continue, things will get considerably worse. The past shows North Korea’s ability to make slow but steady progress, and that will continue. After all, it is developing weapons systems that the United States, Soviet Union, and others developed 50 years ago.

What could a freeze do?


A Freeze on Nuclear Tests

North Korea has now conducted five nuclear tests, the last of which indicated it has developed a nuclear device that can produce a yield similar to that of the Hiroshima and Nagasaki bombs.

Getting a nuclear device to explode under test conditions, however, is not the same as having a usable nuclear warhead. A deliverable warhead needs to be small enough, both in terms of physical size and mass, to be carried by North Korea’s ballistic missiles. It needs to work in a realistic environment, rather than just in a static test environment. In particular, it must withstand the considerable forces of missile delivery, including acceleration, vibrations, and buffeting during launch and reentry.

If the North does not yet have a weapon that is small and rugged enough for delivery by ballistic missile, stopping additional nuclear tests could keep it from developing one. Even if it does have such a weapon, stopping testing would limit its ability to improve its design.

A testing freeze could be verified by the global network of seismic and other types of sensors that make up the International Monitoring System. This network is sensitive enough to have detected North Korea’s previous tests—including the 2006 test with an estimated yield of only about one kiloton.


A Freeze on Missile Tests

A freeze on missile flight testing would limit North Korea’s ability to build more capable and longer range missiles, and to determine the reliability of existing missiles or gain operational practice in firing them.

Today, North Korea’s longest range operational missile is the Nodong, with a range of about 1,300 km (800 miles)—much shorter than the roughly 9,000 km (5,600 miles) to the US West Coast. It does not currently have a ballistic missile that could carry a nuclear warhead to long distances.

However, it is developing a number of the components it needs to produce such a missile.

For example:

  • North Korea appears to have had one successful flight test, with five or more failures, of its Musudan (Hwasong-10) missile, which uses a more advanced fuel than the North’s previous missiles. Developing a working missile would require additional tests. Besides giving Pyongyang a considerably longer range missile than it currently has, successfully developing this missile would open the way to modifying it for use as an upper stage of longer range missiles.
  • In the last year, Pyongyang has conducted ground tests of several new engines (April 2016, September 2016, and March 2017). Some of these appear to use advanced fuels and have higher thrust than its current missile engines.
  • The North has paraded a prototype of a two-stage missile on a mobile launcher that could have a long range if it used the technology being tested in the Musudan missile. One of the engine ground tests mentioned above may be of an engine intended for the first stage of this missile.
  • Pyongyang is in early stages of testing a solid-fueled missile that could be launched from a mobile launcher on the ground or from a submarine. While the range appears to be similar to the Nodong, it would have a much shorter launch-preparation time if fired from mobile launchers on the ground, and could reach more distant targets if fired from a submarine at sea.
  • North Korea has not yet flight tested a reentry heat shield for a long-range missile, which is critical for successfully delivering a nuclear warhead. Over-designing the heat shield adds weight to the reentry vehicle, which reduces the range of the missile; under-designing it can cause overheating that can damage the warhead. Moreover, North Korea is likely to design a blunt reentry vehicle that reenters slowly to reduce the intensity of heating, and this can lead to very large inaccuracies—tens of kilometers or more.

Transforming these pieces into working missiles would require a series of flight tests. A freeze on missile testing would keep North Korea from developing that capability.

A flight test ban would be completely verifiable. The United States has a satellite network of infrared sensors that can detect launches essentially anywhere in the world. This system, for example, even detected the short-range Scud missiles launched at Israel during the first Gulf War. And the United States is currently deploying a new generation of even more capable satellites for this job.


Would North Korea Be Willing to Freeze?

No one knows.

It is worth remembering that North Korea observed a flight test moratorium from September 1999 through July 2006, which it began when its talks with the Clinton administration about missile and nuclear issues seemed to be moving ahead. It announced it was no longer bound by the moratorium in March 2005, in response to the Bush administration’s lack of diplomatic engagement on missile issues, and resumed testing the next year.

Things are different now, of course, and it’s not at all clear that Pyongyang would agree to a freeze. Kim Jong Un may be unwilling to stop until he has developed a credible long-range threat.

However, Kim clearly sees the US-South Korean military exercises as threatening, and offering to scale back those exercises may give the United States significant leverage. Agreeing to talks once a freeze is in place could add leverage.

But whether or not North Korea would ultimately agree to a freeze, the United States should not be the one to take this option off the table. A freeze would be an effective, meaningful step in limiting further development of North Korea’s nuclear and missile programs—and the administration should be doing what it can to put a freeze in place.

The AI Debate Must Stay Grounded in Reality

The following article was written by Vincent Conitzer and originally posted in Prospect Magazine.

Progress in artificial intelligence has been rapid in recent years. Computer programs are dethroning humans in games ranging from Jeopardy to Go to poker. Self-driving cars are appearing on roads. AI is starting to outperform humans in image and speech recognition.

With all this progress, a host of concerns about AI’s impact on human societies have come to the forefront. How should we design and regulate self-driving cars and similar technologies? Will AI leave large segments of the population unemployed? Will AI have unintended sociological consequences? (Think about algorithms that accurately predict which news articles a person will like resulting in highly polarised societies, or algorithms that predict whether someone will default on a loan or commit another crime becoming racially biased due to the input data they are given.)

Will AI be abused by oppressive governments to sniff out and stifle any budding dissent? Should we develop weapons that can act autonomously? And should we perhaps even be concerned that AI will eventually become “superintelligent”—intellectually more capable than human beings in every important way—making us obsolete or even extinct? While this last concern was once purely in the realm of science fiction, notable figures including Elon Musk, Bill Gates, and Stephen Hawking, inspired by Oxford philosopher Nick Bostrom’s Superintelligence book, have recently argued it needs to be taken seriously.

These concerns are mostly quite distinct from each other, but they all rely on the premise of technical advances in AI. Actually, in all cases but the last one, even just currently demonstrated AI capabilities justify the concern to some extent, but further progress will rapidly exacerbate it. And further progress seems inevitable, both because there do not seem to be any fundamental obstacles to it and because large amounts of resources are being poured into AI research and development. The concerns feed off each other and a community of people studying the risks of AI is starting to take shape. This includes traditional AI researchers—primarily computer scientists—as well as people from other disciplines: economists studying AI-driven unemployment, legal scholars debating how best to regulate self-driving cars, and so on.

A conference on “Beneficial AI” held in California in January brought a sizeable part of this community together. The topics covered reflected the diversity of concerns and interests. One moment, the discussion centered on which communities are disproportionately affected by their jobs being automated; the next moment, the topic was whether we should make sure that super-intelligent AI has conscious experiences. The mixing together of such short- and long-term concerns does not sit well with everyone. Most traditional AI researchers are reluctant to speculate about whether and when we will attain truly human-level AI: current techniques still seem a long way off this and it is not clear what new insights would be able to close the gap. Most of them would also rather focus on making concrete technical progress than get mired down in philosophical debates about the nature of consciousness. At the same time, most of these researchers are willing to take seriously the other concerns, which have a concrete basis in current capabilities.

Is there a risk that speculation about super-intelligence, often sounding like science fiction more than science, will discredit the larger project of focusing on the societally responsible development of real AI? And if so, is it perhaps better to put aside any discussion of super-intelligence for now? While I am quite sceptical of the idea that truly human-level AI will be developed anytime soon, overall I think that the people worried about this deserve a place at the table in these discussions. For one, some of the most surprisingly impressive recent technical accomplishments have come from people who are very bullish on what AI can achieve. Even if it turns out that we are still nowhere close to human-level AI, those who imagine that we are could contribute useful insights into what might happen in the medium-term.

I think there is value even in thinking about some of the very hard philosophical questions, such as whether AI could ever have subjective experiences, whether there is something it would be like to be a highly advanced AI system. (See also my earlier Prospect article.) Besides casting an interesting new light on some ancient questions, the exercise is likely to inform future societal debates. For example, we may imagine that in the future people will become attached to the highly personalised and anthropomorphised robots that care for them in old age, and demand certain rights for these robots after they pass away. Should such rights be granted? Should such sentiments be avoided?

At the same time, the debate should obviously not exclude or turn off people who genuinely care about the short-term concerns while being averse to speculation about the long-term, especially because most real AI researchers fall in this last category. Besides contributing solutions to the short-term concerns, their participation is essential to ensure that the longer-term debate stays grounded in reality. Research communities work best when they include people with different views and different sub-interests. And it is hard to imagine a topic for which this is truer than the impact of AI on human societies.

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we post op-eds that we believe will help spur discussion within our community. Op-eds do not necessarily represent FLI’s opinions or views.

Artificial Intelligence and Income Inequality

Shared Prosperity Principle: The economic prosperity created by AI should be shared broadly, to benefit all of humanity.

Income inequality is a well-recognized problem. The gap between the rich and poor has grown over the last few decades, and it became increasingly pronounced after the 2008 financial crisis. While economists debate the extent to which technology plays a role in global inequality, most agree that tech advances have exacerbated the problem.

In an interview with the MIT Tech Review, economist Erik Brynjolfsson said, “My reading of the data is that technology is the main driver of the recent increases in inequality. It’s the biggest factor.”

Which raises the question: what happens as automation and AI technologies become more advanced and capable?

Artificial intelligence can generate great value by providing services and creating products more efficiently than ever before. But many fear this will lead to an even greater disparity between the wealthy and the rest of the world.

AI expert Yoshua Bengio suggests that equality and ensuring a shared benefit from AI could be pivotal in the development of safe artificial intelligence. Bengio, a professor at the University of Montreal, explains, “In a society where there’s a lot of violence, a lot of inequality, [then] the risk of misusing AI or having people use it irresponsibly in general is much greater. Making AI beneficial for all is very central to the safety question.”

In fact, when speaking with AI experts across academia and industry, I found the response unanimous: the development of AI cannot benefit only the few.

Broad Agreement

“It’s almost a moral principle that we should share benefits among more people in society,” argued Bart Selman, a professor at Cornell University. “I think it’s now down to eight people who have as much as half of humanity. These are incredible numbers, and of course if you look at that list it’s often technology pioneers that own that half. So we have to go into a mode where we are first educating the people about what’s causing this inequality and acknowledging that technology is part of that cost, and then society has to decide how to proceed.”

Guruduth Banavar, Vice President of IBM Research, agreed with the Shared Prosperity Principle, but said, “It needs rephrasing. This is broader than AI work. Any AI prosperity should be available for the broad population. Everyone should benefit and everyone should find their lives changed for the better. This should apply to all technology – nanotechnology, biotech – it should all help to make life better. But I’d write it as ‘prosperity created by AI should be available as an opportunity to the broadest population.’”

Francesca Rossi, a research scientist at IBM, added, “I think [this principle is] very important. And it also ties in with the general effort and commitment by IBM to work a lot on education and re-skilling people to be able to engage with the new technologies in the best way. In that way people will be more able to take advantage of all the potential benefits of AI technology. That also ties in with the impact of AI on the job market and all the other things that are being discussed. And they are very dear to IBM as well, in really helping people to benefit the most out of the AI technology and all the applications.”

Meanwhile, Stanford’s Stefano Ermon believes that research could help ensure greater equality. “It’s very important that we make sure that AI is really for everybody’s benefit,” he explained, “that it’s not just going to be benefitting a small fraction of the world’s population, or just a few large corporations. And I think there is a lot that can be done by AI researchers just by working on very concrete research problems where AI can have a huge impact. I’d really like to see more of that research work done.”

A Big Challenge

“AI is having incredible successes and becoming widely deployed. But this success also leads to a big challenge,” said Dan Weld, a professor at the University of Washington. “[That is] its impending potential to increase productivity to the point where many people may lose their jobs. As a result, AI is likely to dramatically increase income disparity, perhaps more so than other technologies that have come about recently. If a significant percentage of the populace loses employment, that’s going to create severe problems, right? We need to be thinking about ways to cope with these issues, very seriously and soon.”

Berkeley professor Anca Dragan summed up the problem when she asked, “If all the resources are automated, then who actually controls the automation? Is it everyone or is it a few select people?”

“I’m really concerned about AI worsening the effects and concentration of power and wealth that we’ve seen in the last 30 years,” Bengio added.

“It’s a real fundamental problem facing our society today, which is the increasing inequality and the fact that prosperity is not being shared around,” explained Toby Walsh, a professor at UNSW Australia.

“This is fracturing our societies and we see this in many places, in Brexit, in Trump,” Walsh continued. “A lot of dissatisfaction within our societies. So it’s something that we really have to fundamentally address. But again, this doesn’t seem to me something that’s really particular to AI. I think really you could say this about most technologies. … although AI is going to amplify some of these increasing inequalities. If it takes away people’s jobs and only leaves wealth in the hands of those people owning the robots, then that’s going to exacerbate some trends that are already happening.”

Kay Firth-Butterfield also worries that AI could exacerbate an already tricky situation. “AI is a technology with such great capacity to benefit all of humanity,” she said, “but also the chance of simply exacerbating the divides between the developed and developing world, and the haves and have nots in our society. To my mind that is unacceptable and so we need to ensure, as Elon Musk said, that AI is truly democratic and its benefits are available to all.”

“Given that all the jobs (physical and mental) will be gone, [shared prosperity] is the only chance we have to be provided for,” added University of Louisville professor Roman Yampolskiy.

What Do You Think?

Given current tech trends, is it reasonable to assume that AI will exacerbate today’s inequality issues? Will this lead to increased AI safety risks? How can we change the societal mindset that currently discourages a greater sharing of wealth? Or is that even a change we should consider?


How Self-Driving Cars Use Probability

Even though human drivers don’t consciously think in terms of probabilities, we observe our environment and make decisions based on the likelihood of certain things happening. A driver doesn’t calculate the probability that the sports car behind her will pass her, but through observing the car’s behavior and considering similar situations in the past, she makes her best guess.

We trust probabilities because they are the only way to take action in the midst of uncertainty.

Autonomous systems such as self-driving cars will make similar decisions based on probabilities, but through a different process. Unlike a human who trusts intuition and experience, these autonomous cars calculate the probability of certain scenarios using data collectors and reasoning algorithms.


How to Determine Probability

Stefano Ermon, a computer scientist at Stanford University, wants to make self-driving cars and autonomous systems safer and more reliable by improving the way they reason probabilistically about their environment. He explains, “The challenge is that you have to take actions and you don’t know what will happen next. Probabilistic reasoning is just the idea of thinking about the world in terms of probabilities, assuming that there is uncertainty.”

There are two main components to achieve safety. First, the computer model must collect accurate data, and second, the reasoning system must be able to draw the right conclusions from the model’s data.

Ermon explains, “You need both: to build a reliable model you need a lot of data, and then you need to be able to draw the right conclusions based on the model, and that requires the artificial intelligence to think about these models accurately. Even if the model is right, but you don’t have a good way to reason about it, you can do catastrophic things.”

For example, in the context of autonomous vehicles, models use various sensors to observe the environment and collect data about countless variables, such as the behavior of the drivers around you, potholes and other obstacles in front of you, weather conditions—every possible data point.

A reasoning system then interprets this data. It uses the model’s information to decide whether the driver behind you is dangerously aggressive, if the pothole ahead will puncture your tire, if the rain is obstructing visibility, and the system continuously changes the car’s behavior to respond to these variables.

Consider the aggressive driver behind you. As Ermon explains, “Somehow you need to be able to reason about these models. You need to come up with a probability. You don’t know what the car’s going to do but you can estimate, and based on previous behavior you can say this car is likely to cut the line because it has been driving aggressively.”
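
As an illustrative sketch of that kind of estimate (not Ermon's actual system, and with made-up numbers), a car could model the probability of a cut-in as a Beta-Bernoulli posterior over previously observed outcomes in similar situations:

```python
# Hypothetical sketch: estimate the probability that an aggressively behaving
# car will cut into our lane, from outcomes observed in similar past situations.

def lane_cut_probability(cut_ins, no_cut_ins, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a Beta-Bernoulli model of cut-in behavior."""
    return (prior_a + cut_ins) / (prior_a + prior_b + cut_ins + no_cut_ins)

# A driver seen cutting in 8 times out of 10 comparable situations:
p = lane_cut_probability(cut_ins=8, no_cut_ins=2)
print(round(p, 3))  # posterior mean: (1 + 8) / (2 + 10) = 0.75
```

The uniform prior (`prior_a = prior_b = 1`) keeps the estimate away from 0 or 1 when only a few observations are available, which matters for exactly the precision issues Ermon describes.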


Improving Probabilistic Reasoning

Ermon is developing robust algorithms that can synthesize all of the data a model produces and make reliable decisions.

As models improve, they collect more information and capture more variables relevant to making these decisions. But as Ermon notes, “the more complicated the model is, the more variables you have, the more complicated it becomes to make the optimal decisions based on the model.”

Thus as the data collection expands, the analysis must also improve. The artificial intelligence in these cars must be able to reason with this increasingly complex data.

And this reasoning can easily go wrong. “You need to be very precise when computing these probabilities,” Ermon explains. “If the probability that a car cuts into your lane is 0.1, but you completely underestimate it and say it’s 0.01, you might end up making a fatal decision.”
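
A toy expected-cost calculation shows how an underestimate like that can flip a decision. All costs here are invented for illustration; the point is only that the same decision rule gives opposite answers for 0.1 and 0.01:

```python
# Hedged illustration of Ermon's point: the same expected-cost rule makes
# opposite decisions depending on whether the cut-in probability is estimated
# correctly (0.1) or underestimated (0.01). The cost numbers are made up.

def should_brake(p_cut_in, collision_cost=1000.0, braking_cost=50.0):
    """Brake when the expected cost of not braking exceeds the cost of braking."""
    return p_cut_in * collision_cost > braking_cost

print(should_brake(0.10))  # True:  expected collision cost 100 > 50, so slow down
print(should_brake(0.01))  # False: expected collision cost 10 < 50, so keep going
```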

To avoid fatal decisions, the artificial intelligence must be robust, but the data must also be complete. If the model collects incomplete data, “you have no guarantee that the number that you get when you run this algorithm has anything to do with the actual probability of that event,” Ermon explains.

The model and the algorithm entirely depend on each other to produce the optimal decision. If the model is incomplete and fails to capture the black ice in front of you, no reasoning system will be able to make a safe decision. And even if the model captures the black ice and every other possible variable, if the reasoning system cannot handle the complexity of this data, again the car will fail.


How Safe Will Autonomous Systems Be?

The technology in self-driving cars has made huge leaps lately, and Ermon is hopeful. “Eventually, as computers get better and algorithms get better and the models get better, hopefully we’ll be able to prevent all accidents,” he suggests.

However, there are still fundamental limitations on probabilistic reasoning. “Most computer scientists believe that it is impossible to come up with the silver bullet for this problem, an optimal algorithm that is so powerful that it can reason about all sorts of models that you can think about,” Ermon explains. “That’s the key barrier.”

But despite this barrier, self-driving cars will soon be available for consumers. Ford, for one, has promised to put its self-driving cars on the road by 2021. And while most computer scientists expect these cars to be far safer than human drivers, their success depends on their ability to reason probabilistically about their environment.

As Ermon explains, “You need to be able to estimate these kinds of probabilities because they are the building blocks that you need to make decisions.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Is an AI Arms Race Inevitable?

AI Arms Race Principle: An arms race in lethal autonomous weapons should be avoided.*

Perhaps the scariest aspect of the Cold War was the nuclear arms race. At its peak, the US and Russia held over 70,000 nuclear weapons, a fraction of which would have been enough to kill every person on earth. As the race to create increasingly powerful artificial intelligence accelerates, and as governments increasingly test AI capabilities in weapons, many AI experts worry that an equally terrifying AI arms race may already be under way.

In fact, at the end of 2015, the Pentagon requested $12-$15 billion for AI and autonomous weaponry for the 2017 budget, and the Deputy Defense Secretary at the time, Robert Work, admitted that he wanted “our competitors to wonder what’s behind the black curtain.” Work also said that the new technologies were “aimed at ensuring a continued military edge over China and Russia.”

But the US does not have a monopoly on this technology, and many fear that countries with lower safety standards could quickly pull ahead. Without adequate safety in place, autonomous weapons could be more difficult to control, create even greater risk of harm to innocent civilians, and more easily fall into the hands of terrorists, dictators, reckless states, or others with nefarious intentions.

Anca Dragan, an assistant professor at UC Berkeley, described the possibility of such an AI arms race as “the equivalent of very cheap and easily accessible nuclear weapons.”

“And that would not fare well for us,” Dragan added.

Unlike nuclear weapons, this new class of WMD can potentially target by traits like race or even by what people have liked on social media.

Lethal Autonomous Weapons

Toby Walsh, a professor at UNSW Australia, took the lead on the 2015 autonomous weapons open letter, which calls for a ban on lethal autonomous weapons and has been signed by over 20,000 people. With regard to that letter and the AI Arms Race Principle, Walsh explained:

“One reason that I got involved in these discussions is that there are some topics I think are very relevant today, and one of them is the arms race that’s happening amongst militaries around the world already, today. This is going to be very destabilizing. It’s going to upset the current world order when people get their hands on these sorts of technologies. It’s actually stupid AI that they’re going to be fielding in this arms race to begin with and that’s actually quite worrying – that it’s technologies that aren’t going to be able to distinguish between combatants and civilians, and aren’t able to act in accordance with international humanitarian law, and will be used by despots and terrorists and hacked to behave in ways that are completely undesirable. And that’s something that’s happening today.”

When asked about his take on this Principle, University of Montreal professor Yoshua Bengio pointed out that he had signed the autonomous weapons open letter, which basically “says it all” about his concerns of a potential AI arms race.

Details and Definitions

In addition to worrying about the risks of a race, Dragan also expressed a concern over “what to do about it and how to avoid it.”

“I assume international treaties would have to occur here,” she said.

Dragan’s not the only one expecting international treaties. The UN recently agreed to begin formal discussions that will likely lead to negotiations on an autonomous weapons ban or restrictions. However, as with so many things, the devil will be in the details.

In reference to an AI arms race, Cornell professor Bart Selman stated, “It should be avoided.” But he also added, “There’s a difference between it ‘should’ be avoided and ‘can’ it be avoided – that may be a much harder question.”

Selman would like to see “the same kinds of discussions as there were around atomic weapons or biological weapons, where people actually start to look at the tradeoffs and the risks of an arms race.”

“That discussion has to be had,” he said, “and it may actually bring people together in a positive way. Countries could get together and say this is not a good development and we should limit it and avoid it. So to bring it out as a principle, I think the main value there is that we need to have the discussion as a society and with other countries.”

Dan Weld, a professor at the University of Washington, also worries that simply saying an arms race should be avoided is insufficient.

“I fervently hope we don’t see an arms race in lethal autonomous weapons,” Weld explained. “That said, this principle bothered me, because it doesn’t seem to have any operational form. Specifically, an arms race is a dynamic phenomenon that happens when you’ve got multiple agents interacting. It takes two people to race. So whose fault is it if there is a race? I’m worried that both participants will point a finger at the other and say, ‘Hey, I’m not racing! Let’s not have a race, but I’m going to make my weapons more accurate and we can avoid a race if you just relax.’ So what force does the principle have?”

General Consensus

Though preventing an AI arms race may be tricky, there seems to be general consensus that a race would be bad and should be avoided.

“Weaponized AI is a weapon of mass destruction and an AI arms race is likely to lead to an existential catastrophe for humanity,” said Roman Yampolskiy, a professor at the University of Louisville.

Kay Firth-Butterfield explained, “Any arms race should be avoided but particularly this one where the stakes are so high and the possibility of such weaponry, if developed, being used within domestic policing is so terrifying.”

But Stanford professor Stefano Ermon may have summed it up best when he said, “Even just with the capabilities we have today it’s not hard to imagine how [AI] could be used in very harmful ways. I don’t want my contributions to the field and any kind of techniques that we’re all developing to do harm to other humans or to develop weapons or to start wars or to be even more deadly than what we already have.”

What do you think?

Is an AI arms race inevitable? How can it be prevented? Can we keep autonomous weapons out of the hands of dictators and terrorists? How can companies and governments work together to build beneficial AI without allowing the technology to be used to create what could be the deadliest weapons the world has ever seen?


*The AI Arms Race Principle specifically addresses lethal autonomous weapons. Later in the series, we’ll discuss the Race Avoidance Principle, which will look at the risks of companies racing to create AI technology.

Using Machine Learning to Address AI Risk

The following article and talk are by Jessica Taylor and they were originally posted on MIRI.

At the EA Global 2016 conference, I gave a talk on “Using Machine Learning to Address AI Risk”:

It is plausible that future artificial general intelligence systems will share many qualities in common with present-day machine learning systems. If so, how could we ensure that these systems robustly act as intended? We discuss the technical agenda for a new project at MIRI focused on this question.

A recording of my talk is now up online:



The talk serves as a quick survey (for a general audience) of the kinds of technical problems we’re working on under the “Alignment for Advanced ML Systems” research agenda. Included below is a version of the talk in blog post form.1

Talk outline:

1. Goal of this research agenda
2. Six potential problems with highly capable AI systems

2.1. Actions are hard to evaluate

2.2. Ambiguous test examples

2.3. Difficulty imitating human behavior

2.4. Difficulty specifying goals about the real world

2.5. Negative side-effects

2.6. Edge cases that still satisfy the goal

3. Technical details on one problem: inductive ambiguity identification

3.1. KWIK learning

3.2. A Bayesian view of the problem

4. Other agendas

Goal of this research agenda

This talk is about a new research agenda aimed at using machine learning to make AI systems safe even at very high capability levels. I’ll begin by summarizing the goal of the research agenda, and then go into more depth on six problem classes we’re focusing on.

The goal statement for this technical agenda is that we want to know how to train a smarter-than-human AI system to perform one or more large-scale, useful tasks in the world.

Some assumptions this research agenda makes:

  1. Future AI systems are likely to look like more powerful versions of present-day ML systems in many ways. We may get better deep learning algorithms, for example, but we’re likely to still be relying heavily on something like deep learning.2
  2. Artificial general intelligence (AGI) is likely to be developed relatively soon (say, in the next couple of decades).3
  3. Building task-directed AGI is a good idea, and we can make progress today studying how to do so.

I’m not confident that all three of these assumptions are true, but I think they’re plausible enough to deserve about as much attention from the AI community as the likeliest alternative scenarios.

A task-directed AI system is a system that pursues a semi-concrete objective in the world, like “build a million houses” or “cure cancer.” For those who have read Superintelligence, task-directed AI is similar to the idea of genie AI. Although these tasks are kind of fuzzy — there’s probably a lot of work you’d need to do to clarify what it really means to build a million houses, or what counts as a good house — they’re at least somewhat concrete.

An example of an AGI system that isn’t task-directed would be one with a goal like “learn human values and do things humans would consider good upon sufficient reflection.” This is too abstract to count as a “task” in the sense we mean; it doesn’t directly cash out in things in the world.

The hope is that even though task-directed AI pursues a less ambitious objective than “learn human values and do what we’d want it to do,” it’s still sufficient to prevent global catastrophic risks. Once the immediate risks are averted, we can then work on building more ambitious AI systems under reduced time pressure.

Task-directed AI uses some (moderate) amount of human assistance to clarify the goal and to evaluate and implement its plans. A goal like “cure cancer” is vague enough that humans will have to do some work to clarify what they mean by it, though most of the intellectual labor should be coming from the AI system rather than from humans.

Ideally, task-directed AI also shouldn’t require significantly more computational resources than competing systems. You shouldn’t get an exponential slowdown from building a safe system vs. a generic system.

In order to think about this overall goal, we need some kind of model for these future systems. The general approach that I’ll take is to look at current systems and imagine that they’re more powerful. A lot of the time you can look at tasks that people do in ML and you can see that the performance improves over time. We’ll model more advanced AI systems by just supposing that systems will continue to achieve higher scores in ML tasks. We can then ask what kinds of failure modes are likely to arise as systems improve, and what we can work on today to make those failures less likely or less costly.

Six potential problems with highly capable AI systems

Problem 1: Actions are hard to evaluate

Suppose an AI system composes a story, and a human gives the system a reward based on how good the story is.4

This is similar to some RL tasks: the agent wants to do something that will cause it to receive a high reward in the future. The formalism of RL would say that the objective of this RL agent is to write a story that the human is expected to give a high score to.

For this objective to actually help us receive very high-quality stories, however, we also need to know that the human understands the RL agent’s actions well enough to correctly administer rewards. This assumption seems less likely to hold for systems that are optimizing the objective much more powerfully than any present-day system. For example:

  • A system much smarter than a human may be able to manipulate or coerce the human into giving a bad story a high score.
  • Even if the system is less intelligent than that, it might resort to plagiarism. Plagiarism can be easier to generate than to detect, since detection often requires scouring a larger pool of source texts.
  • A subhuman system might also have an advantage in inserting steganography into the story; it might take polynomial time to embed a secret message, and exponential time to detect such a message. Finding a way to discourage agents from taking covert actions like these would make it easier to monitor those actions’ effects and keep operators in the loop.

Do we have a general way of preventing this? Can we train an RL system to not only output an action (e.g., a story), but also a report that might help an overseer better evaluate the system’s performance? Following OpenAI researcher Paul Christiano, we call this the problem of informed oversight.5

Problem 2: Ambiguous test examples

Another problem: Consider a classifier trained to distinguish images of cats from images not containing cats, or trained to detect cancer. You may have lots of life experience that tells you “wild cats are cats.” If the training set only contains images of house cats and dogs, however, then it may not be possible to infer this fact during training.

An AI system that was superhumanly good at classifying images from a particular data set might not construct the same generalizations as a human, making it unreliable in new environments.

In safety-critical settings, ideally we would like the classifier to say, “This is ambiguous,” to alert us that the image’s label is underdetermined by the labels of training set images. We could then leverage the classifier’s proficiency at classification to intervene in contexts where the system is relatively likely to misclassify things, and could also supply training data that’s tailored to the dimensions along which the original data was uninformative. Formalizing this goal is the problem of inductive ambiguity detection.

Problem 3: Difficulty imitating human behavior

One way we might want to deal with the problems of smarter-than-human AI is to just have it imitate a human. By having the system observe a human attempting to perform some task and having it learn to behave similarly, we might be able to leverage the AI system’s cognitive advantages while having it mostly stick to fairly normal strategies.

As an example, we can consider an AI system tasked with producing the kind of picture a human would draw. How could you actually train a system on this objective?

In one approach, generative adversarial networks, you have a human and two AI agents: an imitator and a distinguisher.6 The distinguisher tries to guess whether a given picture came from the human or from the imitator, and the imitator tries to produce the kinds of image that the distinguisher classifies as coming from the human.

This is a pretty cool setup, but it raises some theoretical issues. We could ask the question, “Does the distinguisher have to be smarter than the imitator? And if so, how much smarter?” If the distinguisher is less smart than the imitator, then the imitator could covertly do very non-human-like things to the image, as long as they’re too subtle for the distinguisher to notice. An open avenue of research here is to analyze schemes like this and ask under what conditions we can get nice robustness guarantees.
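
A toy version of this setup makes the dynamic concrete. In the sketch below (an illustration, not a real GAN), every network is replaced by a one-parameter model: “human” pictures are samples from N(0, 1), the imitator proposes samples from N(mu, 1), and the distinguisher is the optimal threshold classifier between the two Gaussians:

```python
import numpy as np

# Toy stand-in for the imitator/distinguisher game described above.
# "Human" drawings are samples from N(0, 1); the imitator's family is
# N(mu, 1); the distinguisher is the Bayes-optimal threshold classifier.

rng = np.random.default_rng(0)
human = rng.normal(0.0, 1.0, size=5000)

def distinguisher_accuracy(mu, human_samples, n=5000):
    """Accuracy of the optimal threshold (halfway between the two means)."""
    fake = rng.normal(mu, 1.0, size=n)
    thresh = mu / 2.0
    if mu >= 0:
        correct = np.sum(fake > thresh) + np.sum(human_samples <= thresh)
    else:
        correct = np.sum(fake < thresh) + np.sum(human_samples >= thresh)
    return correct / (n + len(human_samples))

# The imitator searches for the parameter the distinguisher finds hardest
# to tell apart from the human data.
candidates = np.linspace(-2.0, 2.0, 41)
best_mu = min(candidates, key=lambda m: distinguisher_accuracy(m, human))
print(best_mu)  # close to 0: the imitator matches the human distribution
```

Here the distinguisher is strictly optimal for its model class, so the imitator can only win by actually matching the human distribution. The open question in the text is what happens when the distinguisher is weaker than the imitator and that guarantee breaks down.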

For more on this problem, see Paul Christiano’s “Act-Based Agents” and “Mimicry and Meeting Halfway”.

Problem 4: Difficulty specifying goals about the real world

What if we want to train our AI system to perform a task like making a sandwich? One option might be to hook the system up to a robot in a room full of sandwich ingredients, have it perform an action sequence, and then have a human observer rate the robot’s performance based on how close it came to making a sandwich. That rating determines the robot’s reward.

We previously noted that sufficiently capable RL agents might pick actions that are hard to evaluate. Here we face the additional problem that useful tasks will often require taking physical action in the world. If the system is capable enough, then this setup gives it an incentive to take away the reward button and press it itself. This is what the formalism of RL would tell you is the best action, if we imagine AI systems that continue to be trained in the RL framework far past current capability levels.

A natural question, then, is whether we can train AI systems that just keep getting better at producing a sandwich as they improve in capabilities, without ever reaching a tipping point where they have an incentive to do something else. Can we avoid relying on proxies for the task we care about, and just train the system to value completing the task in its own right? This is the generalizable environmental goals problem.

Problem 5: Negative side-effects

Suppose we succeeded in making a system that wants to put a sandwich in the room. In choosing between plans, it will favor whichever plan has the higher probability of resulting in a sandwich. Perhaps the policy of just walking over and making a sandwich has a 99.9% chance of success; but there’s always a chance that a human could step in and shut off the robot. A policy that drives down the probability of interventions like that might push up the probability of the room ending up containing a sandwich to 99.9999%. In this way, sufficiently advanced ML systems can end up with incentives to interfere with their developers and operators even when there’s no risk of reward hacking.

This is the problem of designing task-directed systems that can become superhumanly good at achieving their task, without causing negative side-effects in the process.

One response to this problem is to try to quantify how much total impact different policies have on the world. We can then add a penalty term for actions that have a high impact, causing the system to favor low-impact strategies.
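
A minimal sketch of that penalty term (with invented numbers and a hypothetical scoring rule): rank each plan by success probability minus a penalty proportional to its estimated impact.

```python
# Illustrative impact-penalty sketch: score = P(success) - weight * impact.
# The plan names, probabilities, and impact values are all made up.

def best_plan(plans, impact_weight):
    """plans: list of (name, p_success, impact). Return the top-scoring name."""
    return max(plans, key=lambda p: p[1] - impact_weight * p[2])[0]

plans = [
    ("just make the sandwich",       0.999,  0.1),
    ("disable the off switch first", 0.9999, 5.0),
]
print(best_plan(plans, impact_weight=0.0))   # no penalty: the intrusive plan wins
print(best_plan(plans, impact_weight=0.01))  # with a penalty: the low-impact plan wins
```

The hard research problem, of course, is defining the `impact` number itself, which this sketch simply assumes is given.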

Another approach is to ask how we might design an AI system to be satisfied with a merely 99.9% chance of success — just have the system stop trying to think up superior policies once it finds one meeting that threshold. This is the problem of formalizing mild optimization.
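
In code, the satisficing rule is almost trivial, which is part of its appeal; the sketch below (hypothetical names and numbers) stops at the first plan meeting the threshold rather than searching for the maximum:

```python
# Mild-optimization sketch: accept the first plan whose success probability
# meets a 99.9% threshold, instead of maximizing over all plans.

def satisfice(plans, threshold=0.999):
    """plans: list of (name, p_success). Return the first acceptable plan."""
    for name, p_success in plans:
        if p_success >= threshold:
            return name
    return None  # no plan is good enough

plans = [
    ("walk over and make a sandwich", 0.999),
    ("seize control of the kitchen",  0.9999),
]
print(satisfice(plans))  # the first plan is accepted; the riskier one is never considered
```

The research question is how to get this stopping behavior out of a learned optimizer, not just out of a hand-written loop.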

Or one can consider advanced AI systems from the perspective of convergent instrumental strategies. No matter what the system is trying to do, it can probably benefit by having more computational resources, by having the programmers like it more, by having more money. A sandwich-making system might want money so it can buy more ingredients, whereas a story-writing system might want money so it can buy books to learn from. Many different goals imply similar instrumental strategies, a number of which are likely to introduce conflicts due to resource limitations.

One approach, then, would be to study these instrumental strategies directly and try to find a way to design a system that doesn’t exhibit them. If we can identify common features of these strategies, and especially of the adversarial strategies, then we could try to proactively avert the incentives to pursue those strategies. This seems difficult, and is very underspecified, but there’s some initial research pointed in this direction.

Problem 6: Edge cases that still satisfy the goal

Another problem that’s likely to become more serious as ML systems become more advanced is edge cases.

Consider our ordinary concept of a sandwich. There are lots of things that technically count as sandwiches, but are unlikely to have the same practical uses a sandwich normally has for us. You could have an extremely small or extremely large sandwich, or a toxic sandwich.

For an example of this behavior in present-day systems, we can consider an image that an image classifier correctly classified as a panda (with 57% confidence). Goodfellow, Shlens, and Szegedy found that they could add a tiny perturbation vector to the image that causes the classifier to misclassify it as a gibbon with 99% confidence.7

Such edge cases are likely to become more common and more hazardous as ML systems begin to search wider solution spaces than humans are likely (or even able) to consider. This is then another case where systems might become increasingly good at maximizing their score on a conventional metric, while becoming less reliable for achieving realistic goals we care about.
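
The linear-models intuition Goodfellow et al. give for this effect can be sketched numerically (this is an illustration of the mechanism, not their actual panda experiment): a per-pixel perturbation of size eps aligned with sign(w) shifts a linear score by eps times the L1 norm of w, which grows with input dimension.

```python
import numpy as np

# Fast-gradient-sign-style perturbation of a linear score: each pixel moves
# by at most eps, but the score moves by eps * ||w||_1.

rng = np.random.default_rng(1)
dim = 10_000                      # an "image" with 10,000 pixels
w = rng.normal(0, 1, size=dim)    # linear classifier weights
x = rng.normal(0, 1, size=dim)    # an input scored by w @ x

eps = 0.01                        # imperceptibly small per-pixel change
x_adv = x + eps * np.sign(w)      # perturbation aligned with the weights

print(np.max(np.abs(x_adv - x)))  # each pixel changed by at most 0.01
print((w @ x_adv) - (w @ x))      # yet the score shifts by eps * ||w||_1, roughly 80 here
```

With 10,000 dimensions, thousands of tiny coordinated changes add up to a large shift in the score, which is why high-dimensional inputs like images are so vulnerable.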

Conservative concepts are an initial idea for trying to address this problem, by biasing systems to avoid assigning positive classifications to examples that are near the edges of the search space. The system might then make the mistake of thinking that some perfectly good sandwiches are inadmissible, but it would not make the more risky mistake of classifying toxic or otherwise bizarre sandwiches as admissible.
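
One very simple way such a bias might be realized (an assumption about the formalization, not an established algorithm) is to classify an example as positive only when it lies close to known positive training examples:

```python
import numpy as np

# Conservative-concept sketch: positives must be near known positive examples.
# The 1-D "sandwich features" and the radius are hypothetical.

def conservative_positive(x, positives, radius=1.0):
    """Positive only if within `radius` of some known positive example."""
    positives = np.asarray(positives, dtype=float)
    return bool(np.min(np.abs(positives - x)) <= radius)

known_good_sandwiches = [4.8, 5.0, 5.2]   # stand-in for sandwich feature values
print(conservative_positive(5.1, known_good_sandwiches))    # True: near the training data
print(conservative_positive(100.0, known_good_sandwiches))  # False: a "giant sandwich" edge case is rejected
```

As the text notes, this errs on the side of rejecting some perfectly good sandwiches, trading false negatives for safety against bizarre edge cases.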

Technical details on one problem: inductive ambiguity identification

I’ve outlined eight research directions for addressing six problems that seem likely to start arising (or to become more serious) as ML systems become better at optimizing their objectives — objectives that may not exactly match programmers’ intentions. The research directions were:

  • Informed oversight
  • Inductive ambiguity identification
  • Robust human imitation
  • Generalizable environmental goals
  • Impact measures
  • Mild optimization
  • Averting instrumental incentives
  • Conservative concepts

These problems are discussed in more detail in “Alignment for Advanced ML Systems.” I’ll go into more technical depth on an example problem to give a better sense of what working on these problems looks like in practice.

KWIK learning

Let’s consider the inductive ambiguity identification problem, applied to a classifier for 2D points. In this case, we have 4 positive examples and 4 negative examples.

When a new point comes in, the classifier could try to label it by drawing a whole bunch of models that are consistent with the previous data. Here, I draw just 4 of them. The question mark falls on opposite sides of these different models, suggesting that all of these models are plausible given the data.

We can suppose that the system infers from this that the training data is ambiguous with respect to the new point’s classification, and asks the human to label it. The human might then label it with a plus, and the system draws new conclusions about which models are plausible.

This approach is called “Knows What It Knows” learning, or KWIK learning. We start with some input space X ≔ ℝⁿ and assume that there exists some true mapping from inputs to probabilities. E.g., for each image the cat classifier encounters we assume that there is a true answer in the set Y ≔ [0,1] to the question, “What is the probability that this image is a cat?” This probability corresponds to the probability that a human will label that image “1” as opposed to “0,” which we can represent as a weighted coin flip. The model maps the inputs to answers, which in this case are probabilities.8

The KWIK learner is going to play a game. At the beginning of the game, some true model h* gets picked out. The true model is assumed to be in the hypothesis set H. On each iteration i some new example xᵢ ∈ ℝⁿ comes in. It has some true answer yᵢ = h*(xᵢ), but the learner is unsure about the true answer. The learner has two choices:

  • Output an answer ŷᵢ ∈ [0,1].
    • If |ŷᵢ − yᵢ| > ε, the learner then loses the game.
  • Output ⊥ to indicate that the example is ambiguous.
    • The learner then gets to observe the true label zᵢ = FlipCoin(yᵢ) from the observation set Z ≔ {0,1}.

The goal is to not lose, and to not output ⊥ too many times. The upshot is that it’s actually possible to win this game with a high probability if the hypothesis class H is a small finite set or a low-dimensional linear class. This is pretty cool. It turns out that there are certain forms of uncertainty where we can just resolve the ambiguity.

The way this works is that on each new input, we consider multiple models h that have done well in the past, and we consider something “ambiguous” if the models disagree on h(xᵢ) by more than ε. Then we just refine the set of models over time.
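
A minimal version of this scheme for a small finite hypothesis class might look like the following sketch (the hypotheses and tolerance are invented for illustration): predict when the surviving models agree to within ε, and output ⊥ (here, `None`) otherwise.

```python
# Minimal KWIK-style learner over a finite hypothesis class: answer when all
# surviving hypotheses agree to within eps, otherwise output None ("⊥").

class KWIKLearner:
    def __init__(self, hypotheses, eps=0.1):
        self.hypotheses = list(hypotheses)  # candidate maps x -> probability
        self.eps = eps

    def predict(self, x):
        preds = [h(x) for h in self.hypotheses]
        if max(preds) - min(preds) <= self.eps:
            return sum(preds) / len(preds)   # unambiguous: give an answer
        return None                          # ambiguous: ask for a label

    def observe(self, x, label, tol=0.5):
        # Keep only hypotheses roughly consistent with the observed 0/1 label.
        self.hypotheses = [h for h in self.hypotheses
                           if abs(h(x) - label) <= tol]

# Three candidate models of P(label = 1 | x) for 1-D inputs:
H = [lambda x: 0.9 if x > 0 else 0.1,   # "points above 0 are cats"
     lambda x: 0.9 if x > 1 else 0.1,   # "only points above 1 are cats"
     lambda x: 0.1]                      # "almost nothing is a cat"

learner = KWIKLearner(H)
print(learner.predict(0.5))  # None: the hypotheses disagree at x = 0.5
learner.observe(0.5, 1)      # the human supplies the label
print(learner.predict(0.5))  # 0.9: only the first hypothesis survives
```

Real KWIK algorithms give formal bounds on how often ⊥ is emitted; this sketch only shows the ask-when-ambiguous control flow.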

The way that a KWIK learner represents this notion of inductive ambiguity is: ambiguity is about not knowing which model is correct. There’s some set of models, many are plausible, and you’re not sure which one is the right model.

There are some problems with this. One of the main problems is KWIK learning’s realizability assumption — the assumption that the true model h* is actually in the hypothesis set H. Realistically, the actual universe won’t be in your hypothesis class, since your hypotheses need to fit in your head. Another problem is that this method only works for these very simple model classes.

A Bayesian view of the problem

That’s some existing work on inductive ambiguity identification. What’s some work we’ve been doing at MIRI related to this?

Lately, I’ve been trying to approach this problem from a Bayesian perspective. On this view, we have some kind of prior Q over mappings X → {0,1} from the input space to the label. The assumption we’ll make is that our prior is wrong in some way and there’s some unknown “true” prior P over these mappings. The goal is that even though the system only has access to Q, it should perform the classification task almost as well (in expectation over P) as if it already knew P.

It seems like this task is hard. If the real world is sampled from P, and P is different from your prior Q, there aren’t that many guarantees. To make this tractable, we can add a grain of truth assumption: for some fixed α > 0,

Q(h) ≥ α · P(h) for every hypothesis h.

This says that if P assigns a high probability to something, then so does Q. Can we get good performance in various classification tasks under this kind of assumption?

We haven’t completed this research avenue, but initial results suggest that it’s possible to do pretty well on this task while avoiding catastrophic behaviors, at least in some cases (e.g., online supervised learning). That’s somewhat promising, and this is definitely an area for future research.

How this ties in to inductive ambiguity identification: If you’re uncertain about what’s true, then there are various ways of describing what that uncertainty is about. You can try taking your beliefs and partitioning them into various possibilities. That’s in some sense an ambiguity, because you don’t know which possibility is correct. We can think of the grain of truth assumption as saying that there’s some way of splitting up your probability distribution into components such that one of the components is right. The system should do well even though it doesn’t initially know which component is right.
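
A toy numerical illustration of this setting (with invented numbers): the system's prior Q is a mixture over candidate coin biases, and the grain of truth is that the true bias is one of the components, even though Q gives it little weight. Bayesian updating on observed flips then concentrates the posterior on the true component.

```python
import numpy as np

# Grain-of-truth toy example: Q mixes three candidate models of P(label = 1),
# one of which is the truth, and updating recovers the true component.

rng = np.random.default_rng(2)
biases = np.array([0.2, 0.5, 0.8])    # candidate models
q_prior = np.array([0.6, 0.3, 0.1])   # Q gives the truth only 10% weight
true_bias = 0.8                       # but the truth is in the support

posterior = q_prior.copy()
for _ in range(100):
    flip = rng.random() < true_bias              # observe one labeled example
    likelihood = biases if flip else 1.0 - biases
    posterior = posterior * likelihood           # Bayes update
    posterior /= posterior.sum()

print(posterior.round(3))  # most of the mass ends up on the 0.8 component
```

This is far simpler than the real problem, where the components are rich hypotheses about the world rather than coin biases, but it shows why a small grain of truth can be enough for the system to eventually perform almost as if it knew P.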

(For more recent work on this problem, see Paul Christiano’s “Red Teams” and “Learning with Catastrophes” and research forum results from me and Ryan Carey: “Bias-Detecting Online Learners” and “Adversarial Bandit Learning with Catastrophes.”)

Other research agendas

Let’s return to a broad view and consider other research agendas focused on long-run AI safety. The first such agenda was outlined in MIRI’s 2014 agent foundations report.9

The agent foundations agenda is about developing a better theoretical understanding of reasoning and decision-making. An example of a relevant gap in our current theories is ideal reasoning about mathematical statements (including statements about computer programs), in contexts where you don’t have the time or compute to do a full proof. This is the basic problem we’re responding to in “Logical Induction.” In this talk I’ve focused on problems for advanced AI systems that broadly resemble present-day ML; in contrast, the agent foundations problems are agnostic about the details of the system. They apply to ML systems, but also to other possible frameworks for good general-purpose reasoning.

Then there’s the “Concrete Problems in AI Safety” agenda.10 Here the idea is to study AI safety problems with a more empirical focus, specifically looking for problems that we can study using current ML methods, and perhaps can even demonstrate in current systems or in systems that might be developed in the near future.

As an example, consider the question, “How do you make an RL agent that behaves safely while it’s still exploring its environment and learning about how the environment works?” It’s a question that comes up in current systems all the time, and is relatively easy to study today, but is likely to apply to more capable systems as well.
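To make the flavor of this concrete, here is a minimal sketch (my own toy example, not drawn from the agenda itself) of one naive approach: a Q-learning agent in a one-dimensional corridor whose exploration is masked by a known safety predicate, so that it never tries the catastrophic action even before it has learned anything:

```python
import random

random.seed(1)

# 1-D corridor: state 0 is a cliff (a catastrophe), state 5 is the goal.
N_STATES, CLIFF, GOAL = 6, 0, 5
ACTIONS = (1, -1)

# Hypothetical safety predicate supplied to the agent: never step onto
# the cliff, even while exploring.
def is_safe(state, action):
    return state + action != CLIFF

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
falls = 0
for episode in range(200):
    s = 2
    for step in range(20):
        allowed = [a for a in ACTIONS if is_safe(s, a)]
        if random.random() < 0.3:                     # explore...
            a = random.choice(allowed)                # ...but only safely
        else:                                         # exploit
            a = max(allowed, key=lambda b: Q[(s, b)])
        s2 = max(0, min(N_STATES - 1, s + a))
        if s2 == CLIFF:
            falls += 1
        r = 1.0 if s2 == GOAL else 0.0
        Q[(s, a)] += 0.1 * (r + 0.9 * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2
        if s == GOAL:
            break

print("cliff falls during training:", falls)
```

The hard version of the problem is, of course, that real agents don’t come with a hand-written is_safe oracle; the question is how to behave safely while that knowledge is still being learned.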

These different agendas represent different points of view on how one might make AI systems more reliable in a way that scales with capabilities progress, and our hope is that by encouraging work on a variety of different problems from a variety of different perspectives, we’re less likely to completely miss a key consideration. At the same time, we can achieve more confidence that we’re on the right track when relatively independent approaches all arrive at similar conclusions.

I’m leading the team at MIRI that will be focusing on the “Alignment for Advanced ML Systems” agenda going forward. It seems like there’s a lot of room for more eyes on these problems, and we’re hoping to hire a number of new researchers and kick off a number of collaborations to tackle these problems. If you’re interested in these problems and have a solid background in mathematics or computer science, I definitely recommend getting in touch or reading more about these problems.

  1. I also gave a version of this talk at the MIRI/FHI Colloquium on Robust and Beneficial AI. 
  2. Alternatively, you may think that AGI won’t look like modern ML in most respects, but that the ML aspects are easier to productively study today and are unlikely to be made completely irrelevant by future developments. 
  3. Alternatively, you may think timelines are long, but that we should focus on scenarios with shorter timelines because they’re more urgent. 
  4. Although I’ll use the example of stories here, in real life it could be a system generating plans for curing cancers, and humans evaluating how good the plans are. 
  5. See the Q&A section of the talk for questions like “Won’t the report be subject to the same concerns as the original story?” 
  6. Ian J. Goodfellow et al. “Generative Adversarial Nets”. In: Advances in Neural Information Processing Systems 27. Ed. by Z. Ghahramani et al. Curran Associates, Inc., 2014, pp. 2672–2680. 
  7. Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and Harnessing Adversarial Examples”. In: (2014). arXiv: 1412.6572 [stat.ML]
  8. The KWIK learning framework is much more general than this; I’m just giving one example. 
  9. Nate Soares and Benja Fallenstein. Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda. Tech. rep. 2014-8. Forthcoming 2017 in “The Technological Singularity: Managing the Journey” Jim Miller, Roman Yampolskiy, Stuart J. Armstrong, and Vic Callaghan, Eds. Berkeley, CA. Machine Intelligence Research Institute. 2014. 
  10. Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. “Concrete Problems in AI Safety”. In: (2016). arXiv: 1606.06565 [cs.AI]

Preparing for the Biggest Change in Human History

Importance Principle: Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources.

In the history of human progress, a few events have stood out as especially revolutionary: the intentional use of fire, the invention of agriculture, the industrial revolution, possibly the invention of computers and the Internet. But many anticipate that the creation of advanced artificial intelligence will tower over these achievements.

In a popular post, Tim Urban of Wait But Why wrote that artificial intelligence is “by far THE most important topic for our future.”

Or, as AI professor Roman Yampolskiy told me, “Design of human-level AI will be the most impactful event in the history of humankind. It is impossible to over-prepare for it.”

The Importance Principle encourages us to plan for what could be the greatest “change in the history of life.” But just what are we preparing for? What will more advanced AI mean for society? I turned to some of the top experts in the field of AI to consider these questions.

Societal Benefits?

Guruduth Banavar, the Vice President of IBM Research, is hopeful that as AI advances, it will help humanity advance as well. In favor of the principle, he said, “I strongly believe this. I think this goes back to evolution. From the evolutionary point of view, humans have reached their current level of power and control over the world because of intelligence. … AI is augmented intelligence – it’s a combination of humans and AI working together. And this will produce a more productive and realistic future than autonomous AI, which is too far out. In the foreseeable future, augmented AI – AI working with people – will transform life on the planet. It will help us solve the big problems like those related to the environment, health, and education.”

“I think I also agreed with that one,” said Bart Selman, a professor at Cornell University. “Maybe not every person on earth should be concerned about it, but there should be, among scientists, a discussion about these issues and a plan – can you build safety guidelines to work with value alignment work? What can you actually do to make sure that the developments are beneficial in the end?”

Anca Dragan, an assistant professor at UC Berkeley, explained, “Ultimately, we work on AI because we believe it can have a strong positive impact on the world. But the more capable the technology becomes, the easier it becomes to misuse it – or perhaps, the effects of misusing it become more drastic. That is why it is so important, as we make progress, to start thinking more strongly about what role AI will play.”

Short-term Concerns

Though the Importance Principle specifically mentions advanced AI, some of the researchers I interviewed pointed out that nearer-term artificial intelligence could also drastically impact humanity.

“I believe that AI will create profound change even before it is ‘advanced’ and thus we need to plan and manage growth of the technology,” explained Kay Firth-Butterfield. “As humans, we are not good at long-term planning because our civil systems don’t encourage it; however, this is an area in which we must develop our abilities to ensure a responsible and beneficial partnership between man and machine.”

Stefano Ermon, an assistant professor at Stanford University, also considered the impacts of less advanced AI, saying, “It’s an incredibly powerful technology. I think it’s even hard to imagine what one could do if we are able to develop a strong AI, but even before that, well before that, the capabilities are really huge. We’ve seen the kind of computers and information technologies we have today, the way they’ve revolutionized our society, our economy, our everyday lives. And my guess is that AI technologies would have the potential to be even more impactful and even more revolutionary on our lives. And so I think it’s going to be a big change and it’s worth thinking very carefully about, although it’s hard to plan for it.”

In a follow-up question about planning for AI over the shorter term, Selman added, “I think the effect will be quite dramatic. This is another interesting point – sometimes AI scientists say, well, it might not be advanced AI that will do us in, but dumb AI. … The example is always that the self-driving car has no idea it’s driving you anywhere. It doesn’t even know what driving is. … If you look at videos of an accident that’s about to happen, people are so surprised that the car doesn’t hit the brakes at all, and that’s because the car works quite differently than humans. So I think there is some short-term [AI] risk in that … we actually think they’re smarter than they are. And I think that will actually go away when the machines become smarter, but for now…”

Learning From Experience

As revolutionary as advanced AI might be, we can still learn from previous technological revolutions and draw on their lessons to prepare for the changes ahead.

Toby Walsh, a guest professor at Technical University of Berlin, expressed a common criticism of the principles, arguing that the Importance Principle could – and probably should – apply to many “groundbreaking technologies.”

He explained, “This is one of those principles where I think you could put any society-changing technology in place of advanced AI. … It would be true of the steam engine, in some sense it’s true of social media and we’ve failed at that one, it could be true of the Internet but we failed at planning that well. It could be true of fire too, but we failed on that one as well and used it for war. But to get back to the observation that some of them are things that are not particular to AI – once you realize that AI is going to be groundbreaking, then all of the things that should apply to any groundbreaking technology should apply.”

By looking back at these previous revolutionary technologies and understanding their impacts, perhaps we can gain insight into how we can plan ahead for advanced AI.

Dragan was also interested in more explicit solutions to the problem of planning ahead.

“As the AI capabilities advance,” she told me, “we have to take a step back and ask ourselves: are we solving the right problem? Is there a better problem definition that will more likely result in benefits to humanity?

“For instance, we have always defined AI agents as rational. That means they maximize expected utility. Thus far, utility is assumed to be known. But if you think about it, there is no gospel specifying utility. We are assuming that some *person* somewhere will know exactly what utility to specify for their agent. Well, it turns out, we don’t work like that: it is really hard for people, including AI experts, to specify utility functions. We try our best, but when the system goes ahead and optimizes for what we inputted, the result is sometimes surprising, and not in a good way. This suggests that our definition of an AI agent is predicated on a wrong assumption. We’ve already started seeing that in robotics – the definition of how a robot should move didn’t account for people, the definition of how a robot should learn from demonstration assumed that people can provide perfect demonstrations to a robot, etc. – I assume we are going to see this more and more in AI as a whole. We have to stop making implicit assumptions about people and end-users of AI, and rigorously tackle that head-on, putting people into the equation.”

What Do You Think?

What kind of impact will advanced AI have on the development of human progress? How can we prepare for such potentially tremendous changes? Can we prepare? What other questions do we, as a society, need to ask?

This article is part of a weekly series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

Making Deep Learning More Robust

Imagine how much more efficient lawyers could be if they had the time to read every legal book ever written and review every case ever brought to court. Imagine doctors with the ability to study every advancement published across the world’s medical journals, or consult every medical case, ever. Unfortunately, the human brain cannot store that much information, and it would take decades to achieve these feats.

But a computer, one specifically designed to work like the human mind, could.

Deep learning neural networks are designed to mimic the human brain’s neural connections. They are capable of learning through continuous exposure to huge amounts of data, which allows them to recognize patterns, comprehend complex concepts, and form high-level abstractions. These networks consist of many layers, each with its own set of weights; roughly speaking, the deeper the network, the more complex the functions it can represent.
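As a minimal illustration of that layered structure (a hand-rolled sketch, not production code or any particular library’s API), here is a forward pass through a tiny fully connected network in which each layer applies its own weights and a nonlinearity:

```python
import random

random.seed(0)

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sums followed by a ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def init(n_in, n_out):
    """Random weights for a layer mapping n_in values to n_out values."""
    return ([[random.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

# A small three-layer network; each layer has its own set of weights.
sizes = [4, 8, 8, 2]
params = [init(n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]

x = [0.5, -1.0, 2.0, 0.0]        # a toy 4-feature input
for weights, biases in params:
    x = layer(x, weights, biases)
print(x)                          # 2 output activations
```

Training adjusts those weights from data; the depth is what lets the later layers build abstractions out of the patterns detected by the earlier ones.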

Current applications for these networks include medical diagnosis, robotics and engineering, face recognition, and automotive navigation. However, deep learning is still in development – not surprisingly, it is a huge undertaking to get machines to think like humans. In fact, very little is understood about these networks, and months of manual tuning are often required to obtain excellent performance.

Fuxin Li, assistant professor at the Oregon State University School of Electrical Engineering and Computer Science, and his team are tackling the accuracy of these neural networks under adversarial conditions. Their research focuses on the basic machine learning aspects of deep learning, and on how to make general deep learning more robust.

To try to better understand when a deep convolutional neural network (CNN) is going to be right or wrong, Li’s team had to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates can be used as safeguards when utilizing the networks in real life.

“Basically,” explains Li, “trying to make deep learning increasingly self-aware – to be aware of what type of data it has seen, and what type of data it could work on.”

The team looked at recent advances in deep learning, which have greatly improved the capability to recognize images automatically. Those networks, although quite resistant to overfitting, turn out to fail completely if some of the pixels in an image are perturbed by an adversarial optimization algorithm.

To a human observer, the image in question may look fine, but the deep network sees otherwise. According to the researchers, such adversarial examples are dangerous if a deep network is used in any critical real-world application, such as autonomous driving. If the output of the network can be hacked, wrong authentications and other devastating effects would be hard to avoid.
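The “fast gradient sign” construction of Goodfellow, Shlens, and Szegedy gives a feel for how cheap such perturbations are. In the toy sketch below (hypothetical weights and input; a linear classifier standing in for a deep network), nudging every feature by a small step against the weight vector flips the prediction:

```python
import math

# A toy linear "image" classifier: label 1 if sigmoid(w . x) > 0.5.
w = [0.5, -0.3, 0.8, 0.2, -0.6]      # hypothetical learned weights
x = [1.0, 0.2, 0.4, -0.5, 0.1]       # hypothetical input, predicted 1

def predict(v):
    s = sum(wi * vi for wi, vi in zip(w, v))
    return 1 / (1 + math.exp(-s))

# Fast gradient sign step: move each component by eps in the direction
# that increases the loss for the true label (here, against sign(w)).
eps = 0.3
x_adv = [xi - eps * math.copysign(1.0, wi) for xi, wi in zip(x, w)]

print(round(predict(x), 3), round(predict(x_adv), 3))
```

The perturbation is bounded by eps in every coordinate, which in a high-dimensional image is visually imperceptible, yet the classifier’s decision changes.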

In a departure from previous work, which focused on improving classifiers so that they correctly handle adversarial examples, the team focused on detecting adversarial examples by analyzing whether they come from the same distribution as the normal examples. Their accuracy for detecting adversarial examples exceeded 96%; notably, 90% of adversarial examples can be detected with a false positive rate of less than 10%.
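A stripped-down stand-in for that idea (an illustrative sketch only, not the team’s actual detector) fits simple statistics to the features of normal examples and flags inputs whose features fall far outside that distribution:

```python
import random
import statistics

random.seed(0)

# "Normal" examples: 20 features drawn from a clean distribution.
normal = [[random.gauss(0, 1) for _ in range(20)] for _ in range(500)]
means = [statistics.mean(col) for col in zip(*normal)]
stds = [statistics.stdev(col) for col in zip(*normal)]

def anomaly_score(x):
    """Mean squared z-score: large when x is unlike the normal data."""
    return sum(((xi - m) / s) ** 2
               for xi, m, s in zip(x, means, stds)) / len(x)

threshold = 2.0
clean = [random.gauss(0, 1) for _ in range(20)]
# Crude stand-in for an adversarial input: every feature shifted.
adversarial = [xi + 3.0 for xi in clean]

print(round(anomaly_score(clean), 2), round(anomaly_score(adversarial), 2))
```

A real detector would model the network’s internal features rather than raw inputs, but the principle is the same: score how plausible an input is under the training distribution before trusting the classifier’s output.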

The benefits of this research are numerous. It is vital for a neural network to be able to identify whether an example comes from a normal or an adversarial distribution. Such knowledge, if available, will help significantly in controlling the behavior of robots employing deep learning. A reliable procedure can prevent robots from behaving in undesirable ways because of false perceptions they make about the environment.

Li gives one example: “In robotics there’s this big issue about robots not doing something based on erroneous perception. It’s important for a robot to know that it’s not making a confident perception. For example, if [the robot] is saying there’s an object over there, but it’s actually a wall, [it] will go to fetch that object, and then it hits the wall.”

Hopefully, Li says, that won’t happen. However, current software and machine learning have been based largely on prediction confidence within the original machine learning framework. Basically, the testing and training data are assumed to be drawn independently from the same distribution, and that assumption can lead to incorrect conclusions.

Better confidence estimates could potentially help avoid incidents such as the Tesla crash of May 2016, in which the system was cheated by a white truck against a bright sky in the middle of the highway. A confidence estimate could potentially flag that situation. But first, the computer must be smarter: it has to learn to detect objects and differentiate, say, a tree from another vehicle.

“To make it really robust, you need to account for unknown objects. Something weird may hit you. A deer may jump out.” The network can’t be taught every unexpected situation, says Li, “so you need it to discover them without knowledge of what they are. That’s something that we do. We try to bridge the gap.”

Better training procedures will make deep learning more automatic and lead to fewer failures, and will provide confidence estimates when the deep network is used to predict on new data. Most of this training, explains Li, comes from distributions of stock photos. However, these are flat images very different from what a robot would normally see in day-to-day life; it’s difficult to get a 360-degree view just by looking at photos.

“There will be a big difference between the thing [the robot] trains on and the thing it really sees. So then, it is important for the robot to understand that it can predict some things confidently, and others it cannot,” says Li. “[The robot] needs to understand that it probably predicted wrong, so as not to act too aggressively toward its prediction.” This can only be achieved with a more self-aware framework, which is what Li is trying to develop with this grant.

Further, these estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.

Soon, Li and his team will start generalizing the approach to other domains, such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.

Li’s most recent update on this work can be found here.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

How Smart Can AI Get?

Capability Caution Principle: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.

A major change is coming, over unknown timescales but across every segment of society, and the people playing a part in that transition have a huge responsibility and opportunity to shape it for the best. What will trigger this change? Artificial intelligence.

The 23 Asilomar AI Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.


Capability Caution

One of the greatest questions facing AI researchers is: just how smart and capable can artificial intelligence become?

In recent years, the development of AI has accelerated in leaps and bounds. DeepMind’s AlphaGo surpassed human performance in the challenging, intricate game of Go, and the company has created AI that can quickly learn to play Atari video games with much greater prowess than a person. We’ve also seen breakthroughs and progress in language translation, self-driving vehicles, and even the creation of new medicinal molecules.

But how much more advanced can AI become? Will it continue to excel only in narrow tasks, or will it develop broader learning skills that will allow a single AI to outperform a human in most tasks? How do we prepare for an AI more intelligent than we can imagine?

Some experts think human-level or even super-human AI could be developed within a couple of decades, while others don’t think anyone will ever accomplish this feat. The Capability Caution Principle argues that, until we have concrete evidence to confirm what an AI can someday achieve, it’s safer to assume that there are no upper limits – that is, for now, anything is possible, and we need to plan accordingly.


Expert Opinion

The Capability Caution Principle drew both consensus and disagreement from the experts. While everyone I interviewed generally agreed that we shouldn’t assume upper limits for AI, their reasoning varied and some raised concerns.

Stefano Ermon, an assistant professor at Stanford University, and Roman Yampolskiy, an associate professor at the University of Louisville, both took a better-safe-than-sorry approach.

Ermon turned to history as a reminder of how difficult future predictions are. He explained, “It’s always hard to predict the future. … Think about what people were imagining a hundred years ago, about what the future would look like. … I think it would’ve been very hard for them to imagine what we have today. I think we should take a similar, very cautious view, about making predictions about the future. If it’s extremely hard, then it’s better to play it safe.”

Yampolskiy considered current tech safety policies, saying, “In many areas of computer science, such as complexity or cryptography, the default assumption is that we deal with the worst-case scenario. Similarly, in AI Safety we should assume that AI will become maximally capable and prepare accordingly. If we are wrong, we will still be in great shape.”

Dan Weld, a professor at the University of Washington, said of the principle, “I agree! As a scientist, I’m against making strong or unjustified assumptions about anything, so of course I agree.”

But though he agreed with the basic idea behind the principle, Weld also had reservations. “This principle bothers me,” Weld explained, “… because it seems to be implicitly saying that there is an immediate danger that AI is going to become superhumanly, generally intelligent very soon, and we need to worry about this issue. This assertion … concerns me because I think it’s a distraction from what are likely to be much bigger, more important, more near-term, potentially devastating problems. I’m much more worried about job loss and the need for some kind of guaranteed health-care, education and basic income than I am about Skynet. And I’m much more worried about some terrorist taking an AI system and trying to program it to kill all Americans than I am about an AI system suddenly waking up and deciding that it should do that on its own.”

Looking at the problem from a different perspective, Guruduth Banavar, the Vice President of IBM Research, worries that placing upper bounds on AI capabilities could limit the beneficial possibilities. Banavar explained, “The general idea is that intelligence, as we understand it today, is ultimately the ability to process information from all possible sources and to use that to predict the future and to adapt to the future. It is entirely in the realm of possibility that machines can do that. … I do think we should avoid assumptions of upper limits on machine intelligence because I don’t want artificial limits on how advanced AI can be.”

IBM research scientist Francesca Rossi considered this principle from yet another perspective, suggesting that AI is necessary for humanity to reach its full potential, and that here, too, we shouldn’t assume upper limits.

“I personally am for building AI systems that augment human intelligence instead of replacing human intelligence,” said Rossi, “And I think that in that space of augmenting human intelligence there really is a huge potential for AI in making the personal and professional lives of everybody much better. I don’t think that there are upper limits of the future AI capabilities in that respect. I think more and more AI systems together with humans will enhance our kind of intelligence, which is complementary to the kind of intelligence that machines have, and will help us make better decisions, and live better, and solve problems that we don’t know how to solve right now. I don’t see any upper limit to that.”


What do you think?

Is there an upper limit to artificial intelligence? Is there an upper limit to what we can achieve with AI? How long will it take to achieve increasing levels of advanced AI? How do we plan for the future with such uncertainties? How can society as a whole address these questions? What other questions should we be asking about AI capabilities?

MIRI February 2017 Newsletter

Following up on a post outlining some of the reasons MIRI researchers and OpenAI researcher Paul Christiano are pursuing different research directions, Jessica Taylor has written up the key motivations for MIRI’s highly reliable agent design research.

Research updates

General updates

  • We attended the Future of Life Institute’s Beneficial AI conference at Asilomar. See Scott Alexander’s recap. MIRI executive director Nate Soares was on a technical safety panel discussion with representatives from DeepMind, OpenAI, and academia (video), also featuring a back-and-forth with Yann LeCun, the head of Facebook’s AI research group (at 22:00).
  • MIRI staff and a number of top AI researchers are signatories on FLI’s new Asilomar AI Principles, which include cautions regarding arms races, value misalignment, recursive self-improvement, and superintelligent AI.
  • The Center for Applied Rationality recounts MIRI researcher origin stories and other cases where their workshops have been a big assist to our work, alongside examples of CFAR’s impact on other groups.
  • The Open Philanthropy Project has awarded a $32,000 grant to AI Impacts.
  • Andrew Critch spoke at Princeton’s ENVISION conference (video).
  • Matthew Graves has joined MIRI as a staff writer. See his first piece for our blog, a reply to “Superintelligence: The Idea That Eats Smart People.”
  • The audio version of Rationality: From AI to Zombies is temporarily unavailable due to the shutdown of Castify. However, fans are already putting together a new free recording of the full collection.

News and links

  • An Asilomar panel on superintelligence (video) gathers Elon Musk (OpenAI), Demis Hassabis (DeepMind), Ray Kurzweil (Google), Stuart Russell and Bart Selman (CHCAI), Nick Bostrom (FHI), Jaan Tallinn (CSER), Sam Harris, and David Chalmers.
  • Also from Asilomar: Russell on corrigibility (video), Bostrom on openness in AI (video), and LeCun on the path to general AI (video).
  • From MIT Technology Review’s “AI Software Learns to Make AI Software”:
    Companies must currently pay a premium for machine-learning experts, who are in short supply. Jeff Dean, who leads the Google Brain research group, mused last week that some of the work of such workers could be supplanted by software. He described what he termed “automated machine learning” as one of the most promising research avenues his team was exploring.

The Financial World of AI

Automated algorithms currently manage over half of trading volume in US equities, and as AI improves, it will continue to assume control over important financial decisions. But these systems aren’t foolproof. A small glitch could send shares plunging, potentially costing investors billions of dollars.

For firms, the decision to accept this risk is simple. The algorithms in automated systems are faster and more accurate than any human, and deploying the most advanced AI technology can keep firms in business.

But for the rest of society, the consequences aren’t clear. Artificial intelligence gives firms a competitive edge, but will these rapidly advancing systems remain safe and robust? What happens when they make mistakes?


Automated Errors

Michael Wellman, a professor of computer science at the University of Michigan, studies AI’s threats to the financial system. He explains, “The financial system is one of the leading edges of where AI is automating things, and it’s also an especially vulnerable sector. It can be easily disrupted, and bad things can happen.”

Consider the story of Knight Capital. On August 1, 2012, Knight decided to try out new software to stay competitive in a new trading pool. The software passed its safety tests, but when Knight deployed it, the algorithm activated its testing software instead of the live trading program. The testing software sent millions of bad orders in the following minutes as Knight frantically tried to stop it. But the damage was done.

In just 45 minutes, Knight Capital lost $440 million – nearly four times their profit in 2011 – all because of one line of code.

In this case, the damage was constrained to Knight, but what happens when one line of code can impact the entire financial system?


Understanding Autonomous Trading Agents

Wellman argues that autonomous trading agents are difficult to control because they process and respond to information at unprecedented speeds, they can be easily replicated on a large scale, they act independently, and they adapt to their environment.

With increasingly general capabilities, systems may learn to make money in dangerous ways that their programmers never intended. As Lawrence Pingree, an analyst at Gartner, said after the Knight meltdown, “Computers do what they’re told. If they’re told to do the wrong thing, they’re going to do it and they’re going to do it really, really well.”

In order to prevent AI systems from undermining market transparency and stability, government agencies and academics must learn how these agents work.


Market Manipulation

Even benign uses of AI can hinder market transparency, but Wellman worries that AI systems will learn to manipulate markets.

Autonomous trading agents are especially effective at exploiting arbitrage opportunities – where they simultaneously purchase and sell an asset to profit from pricing differences. If, for example, a stock trades at $30 in one market and $32 in a second market, an agent can buy the $30 stock and immediately sell it for $32 in the second market, making a $2 profit.
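The arithmetic of that example can be sketched in a few lines (venue names are made up):

```python
def arbitrage_profit(prices, qty=1):
    """Buy at the cheapest venue, sell at the dearest; zero if no gap."""
    buy, sell = min(prices.values()), max(prices.values())
    return max(0.0, (sell - buy) * qty)

quotes = {"market_A": 30.0, "market_B": 32.0}
print(arbitrage_profit(quotes))        # 2.0 per share
```

In practice both legs must execute before the prices converge, which is one reason speed matters so much to automated traders.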

Market inefficiency naturally creates arbitrage opportunities. However, an AI may learn – on its own – to create pricing discrepancies by taking misleading actions that move the market to generate profit.

One manipulative technique is ‘spoofing’ – the act of bidding for a stock with the intent to cancel the bid before execution. The bid moves the market in a certain direction, and the spoofer profits from the false signal.

Wellman and his team recently reproduced spoofing in their laboratory models, as part of an effort to understand the situations where spoofing can be effective. He explains, “We’re doing this in the laboratory to see if we can characterize the signature of AIs doing this, so that we reliably detect it and design markets to reduce vulnerability.”
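Wellman’s detection models aren’t public, but one widely discussed statistical signature of spoofing is an abnormally high cancel-to-order ratio. A toy sketch of that idea follows; the thresholds are invented for illustration and are not taken from Wellman’s work:

```python
# Hypothetical spoofing-signature check: flag traders who place many orders
# but cancel almost all of them before execution. The thresholds below are
# illustrative only, not calibrated values.

def looks_like_spoofing(orders_placed: int, orders_cancelled: int,
                        min_orders: int = 100,
                        cancel_threshold: float = 0.95) -> bool:
    if orders_placed < min_orders:
        return False  # too little activity to judge
    return orders_cancelled / orders_placed >= cancel_threshold

print(looks_like_spoofing(1000, 990))  # True: 99% of bids cancelled
print(looks_like_spoofing(1000, 400))  # False: most bids were genuine
```

Real surveillance systems would also weigh order size, timing, and price impact, which is why characterizing a reliable signature is a research problem rather than a one-line rule.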

As agents improve, they may learn to exploit arbitrage more maliciously by creating artificial items on the market to mislead traders, or by hacking accounts to report false events that move markets. Wellman’s work aims to produce methods to help control such manipulative behavior.


Secrecy in the Financial World

But the secretive nature of finance prevents academics from fully understanding the role of AI.

Wellman explains, “We know they use AI and machine learning to a significant extent, and they are constantly trying to improve their algorithms. We don’t know to what extent things like market manipulation and spoofing are automated right now, but we know that they could be automated and that could lead to something of an arms race between market manipulators and the systems trying to detect and run surveillance for market bad behavior.”

Government agencies – such as the Securities and Exchange Commission – watch financial markets, but “they’re really outgunned as far as the technology goes,” Wellman notes. “They don’t have the expertise or the infrastructure to keep up with how fast things are changing in the industry.”

But academics can help. According to Wellman, “even without doing the trading for money ourselves, we can reverse engineer what must be going on in the financial world and figure out what can happen.”


Preparing for Advanced AI

Although Wellman studies current and near-term AI, he’s concerned about the threat of advanced, general AI.

“One thing we can do to try to understand the far-out AI is to get experience with dealing with the near-term AI,” he explains. “That’s why we want to look at regulation of autonomous agents that are very near on the horizon or current. The hope is that we’ll learn some lessons that we can then later apply when the superintelligence comes along.”

AI systems are improving rapidly, and there is intense competition between financial firms to use them. Understanding and tracking AI’s role in finance will help financial markets remain stable and transparent.

“We may not be able to manage this threat with 100% reliability,” Wellman admits, “but I’m hopeful that we can redesign markets to make them safer for the AIs and eliminate some forms of the arms race, and that we’ll be able to get a good handle on preventing some of the most egregious behaviors.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Can We Ensure Privacy in the Era of Big Data?

Personal Privacy Principle: People should have the right to access, manage and control the data they generate, given AI systems’ power to analyze and utilize that data.

A major change is coming, over unknown timescales but across every segment of society, and the people playing a part in that transition have a huge responsibility and opportunity to shape it for the best. What will trigger this change? Artificial intelligence.

The 23 Asilomar AI Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

Personal Privacy

In the age of social media and online profiles, maintaining privacy is already a tricky problem. As companies collect ever-increasing quantities of data about us, and as AI programs get faster and more sophisticated at analyzing that data, our information can become both a commodity for business and a liability for us.

We’ve already seen small examples of questionable data use, such as Target recognizing a teenager was pregnant before her family knew. But this is merely advanced marketing. What happens when governments or potential employers can gather what seems like innocent and useless information (like grocery shopping preferences) to uncover your most intimate secrets – like health issues even you didn’t know about yet?

It turns out, all of the researchers I spoke to strongly agree with the Personal Privacy Principle.

The Importance of Personal Privacy

“I think that’s a big immediate issue,” says Stefano Ermon, an assistant professor at Stanford. “I think when the general public thinks about AI safety, maybe they think about killer robots or these kind of apocalyptic scenarios, but there are big concrete issues like privacy, fairness, and accountability.”

“I support that principle very strongly!” agrees Dan Weld, a professor at the University of Washington. “I’m really quite worried about the loss of privacy. The number of sensors is increasing and combined with advanced machine learning, there are few limits to what companies and governments can learn about us. Now is the time to insist on the ability to control our own data.”

Toby Walsh, a guest professor at the Technical University of Berlin, also worries about privacy. “Yes, this is a great one, and actually I’m really surprised how little discussion we have around AI and privacy,” says Walsh. “I thought there was going to be much more fallout from Snowden and some of the revelations that happened, and AI, of course, is enabling technology. If you’re collecting all of this data, the only way to make sense of it is to use AI, so I’ve been surprised that there hasn’t been more discussion and more concern amongst the public around these sorts of issues.”

Kay Firth-Butterfield, an adjunct professor at the University of Texas in Austin, adds, “As AI becomes more powerful, we need to take steps to ensure that it cannot use our personal data against us if it falls into the wrong hands.”

Taking this concern a step further, Roman Yampolskiy, an associate professor at the University of Louisville, argues that “the world’s dictatorships are looking forward to opportunities to target their citizenry with extreme levels of precision.”

“The tech we will develop,” he continues, “will most certainly become available throughout the world and so we have a responsibility to make privacy a fundamental cornerstone of any data analysis.”

But some of the researchers also worry about the money to be made from personal data.

Ermon explains, “Privacy is definitely a big one, and one of the most valuable things that these large corporations have is the data they are collecting from us, so we should think about that carefully.”

“Data is worth money,” agrees Firth-Butterfield, “and as individuals we should be able to choose when and how to monetize our own data whilst being encouraged to share data for public health and other benefits.”

Francesca Rossi, a research scientist for IBM, believes this principle is “very important,” but she also emphasizes the benefits we can gain if we can share our data without fearing it will be misused. She says, “People should really have the right to own their privacy, and companies like IBM or any other that provide AI capabilities and systems should protect the data of their clients. The quality and amount of data is essential for many AI systems to work well, especially in machine learning. … It’s also very important that these companies don’t just assure that they are taking care of the data, but that they are transparent about the use of the data. Without this transparency and trust, people will resist giving their data, which would be detrimental to the AI capabilities and the help AI can offer in solving their health problems, or whatever the AI is designed to solve.”

Privacy as a Social Right

Both Yoshua Bengio and Guruduth Banavar argued that personal privacy isn’t just something that AI researchers should value, but that it should also be considered a social right.

Bengio, a professor at the University of Montreal, says, “We should be careful that the complexity of AI systems doesn’t become a tool for abusing minorities or individuals who don’t have access to understand how it works. I think this is a serious social rights issue.” But he also worries that preventing rights violations may not be an easy technical fix. “We have to be careful with that because we may end up barring machine learning from publicly used systems, if we’re not careful,” he explains, adding, “the solution may not be as simple as saying ‘it has to be explainable,’ because it won’t be.”

And as Ermon says, “The more we delegate decisions to AI systems, the more we’re going to run into these issues.”

Meanwhile, Banavar, the Vice President of IBM Research, considers the issue of personal privacy rights especially important. He argues, “It’s absolutely crucial that individuals should have the right to manage access to the data they generate. … AI does open new insight to individuals and institutions. It creates a persona for the individual or institution – personality traits, emotional make-up, lots of the things we learn when we meet each other. AI will do that too and it’s very personal. I want to control how [my] persona is created. A persona is a fundamental right.”

What Do You Think?

And now we turn the conversation over to you. What does personal privacy mean to you? How important is it to have control over your data? The experts above may have agreed about how serious the problem of personal privacy is, but solutions are harder to come by. Do we need to enact new laws to protect the public? Do we need new corporate policies? How can we ensure that companies and governments aren’t using our data for nefarious purposes – or even for well-intentioned purposes that still aren’t what we want? What else should we, as a society, be asking?

How Do We Align Artificial Intelligence with Human Values?

A major change is coming, over unknown timescales but across every segment of society, and the people playing a part in that transition have a huge responsibility and opportunity to shape it for the best. What will trigger this change? Artificial intelligence.

Recently, some of the top minds in AI and related fields got together to discuss how we can ensure AI remains beneficial throughout this transition, and the result was the Asilomar AI Principles document. The intent of these 23 principles is to offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.”

The Principles represent the beginning of a conversation, and now that the conversation is underway, we need to follow up with broad discussion about each individual principle. The Principles will mean different things to different people, and in order to benefit as much of society as possible, we need to think about each principle individually.

As part of this effort, I interviewed many of the AI researchers who signed the Principles document to learn their take on why they signed and what issues still confront us.

Value Alignment

Today, we start with the Value Alignment principle.

Value Alignment: Highly autonomous AI systems should be designed so that their goals and behaviors can be assured to align with human values throughout their operation.

Stuart Russell, who helped pioneer the idea of value alignment, likes to compare this to the King Midas story. When King Midas asked for everything he touched to turn to gold, he really just wanted to be rich. He didn’t actually want his food and loved ones to turn to gold. We face a similar situation with artificial intelligence: how do we ensure that an AI will do what we really want, while not harming humans in a misguided attempt to do what its designer requested?

“Robots aren’t going to try to revolt against humanity,” explains Anca Dragan, an assistant professor and colleague of Russell’s at UC Berkeley, “they’ll just try to optimize whatever we tell them to do. So we need to make sure to tell them to optimize for the world we actually want.”

What Do We Want?

Understanding what “we” want is among the biggest challenges facing AI researchers.

“The issue, of course, is to define what exactly these values are, because people might have different cultures, [come from] different parts of the world, [have] different socioeconomic backgrounds — I think people will have very different opinions on what those values are. And so that’s really the challenge,” says Stefano Ermon, an assistant professor at Stanford.

Roman Yampolskiy, an associate professor at the University of Louisville agrees. He explains, “It is very difficult to encode human values in a programming language, but the problem is made more difficult by the fact that we as humanity do not agree on common values, and even parts we do agree on change with time.”

And while some values are hard to gain consensus around, there are also lots of values we all implicitly agree on. As Russell notes, any human understands emotional and sentimental values that they’ve been socialized with, but it’s difficult to guarantee that a robot will be programmed with that same understanding.

But IBM research scientist Francesca Rossi is hopeful. As Rossi points out, “there is scientific research that can be undertaken to actually understand how to go from these values that we all agree on to embedding them into the AI system that’s working with humans.”

Dragan’s research comes at the problem from a different direction. Instead of trying to understand people, she looks at trying to train a robot or AI to be flexible with its goals as it interacts with people. She explains, “At Berkeley, … we think it’s important for agents to have uncertainty about their objectives, rather than assuming they are perfectly specified, and treat human input as valuable observations about the true underlying desired objective.”

Rewrite the Principle?

While most researchers agree with the underlying idea of the Value Alignment Principle, not everyone agrees with how it’s phrased, let alone how to implement it.

Yoshua Bengio, an AI pioneer and professor at the University of Montreal, suggests “assured” may be too strong. He explains, “It may not be possible to be completely aligned. There are a lot of things that are innate, which we won’t be able to get by machine learning, and that may be difficult to get by philosophy or introspection, so it’s not totally clear we’ll be able to perfectly align. I think the wording should be something along the lines of ‘we’ll do our best.’ Otherwise, I totally agree.”

Walsh, who’s currently a guest professor at the Technical University of Berlin, questions the use of the word “highly.” “I think any autonomous system, even a lowly autonomous system, should be aligned with human values. I’d wordsmith away the ‘high,’” he says.

Walsh also points out that, while value alignment is often considered an issue that will arise in the future, he believes it’s something that needs to be addressed sooner rather than later. “I think that we have to worry about enforcing that principle today,” he explains. “I think that will be helpful in solving the more challenging value alignment problem as systems get more sophisticated.”

Rossi, who supports the Value Alignment Principle as “the one closest to my heart,” agrees that the principle should apply to current AI systems. “I would be even more general than what you’ve written in this principle,” she says. “Because this principle has to do not only with autonomous AI systems, but … is very important and essential also for systems that work tightly with humans-in-the-loop and where the human is the final decision maker. When you have a human and machine tightly working together, you want this to be a real team.”

But as Dragan explains, “This is one step toward helping AI figure out what it should do, and continuously refining the goals should be an ongoing process between humans and AI.”

Let the Dialogue Begin

And now we turn the conversation over to you. What does it mean to you to have artificial intelligence aligned with your own life goals and aspirations? How can it be aligned with you and everyone else in the world at the same time? How do we ensure that one person’s version of an ideal AI doesn’t make your life more difficult? How do we go about agreeing on human values, and how can we ensure that AI understands these values? If you have a personal AI assistant, how should it be programmed to behave? If we have AI more involved in things like medicine or policing or education, what should that look like? What else should we, as a society, be asking?

Doomsday Clock: Two and a Half Minutes to Midnight

Is the world more dangerous than ever?

Today in Washington, D.C., the Bulletin of Atomic Scientists announced its decision to move the infamous Doomsday Clock thirty seconds closer to doom: It is now two and a half minutes to midnight.

Each year since 1947, the Bulletin of Atomic Scientists has publicized the symbol of the Doomsday Clock to convey how close we are to destroying our civilization with dangerous technologies of our own making. As the Bulletin perceives our existential threats to grow, the minute hand inches closer to midnight.

For the past two years the Doomsday Clock has been set at three minutes to midnight.

But now, in the face of an increasingly unstable political climate, the Doomsday Clock is the closest to midnight it has been since 1953.

The clock struck two minutes to midnight in 1953 at the start of the nuclear arms race, but what makes 2017 uniquely dangerous for humanity is the variety of threats we face. Not only is there growing uncertainty with nuclear weapons and the leaders that control them, but the existential threats of climate change, artificial intelligence, cybersecurity, and biotechnology continue to grow.

As the Bulletin notes, “The challenge remains whether societies can develop and apply powerful technologies for our welfare without also bringing about our own destruction through misapplication, madness, or accident.”

Rachel Bronson, the Executive Director and publisher of the Bulletin of the Atomic Scientists, said: “This year’s Clock deliberations felt more urgent than usual. In addition to the existential threats posed by nuclear weapons and climate change, new global realities emerged, as trusted sources of information came under attack, fake news was on the rise, and words were used by a President-elect of the United States in cavalier and often reckless ways to address the twin threats of nuclear weapons and climate change.”

Lawrence Krauss, a Chair on the Board of Sponsors, warned viewers that “technological innovation is occurring at a speed that challenges society’s ability to keep pace.” While these technologies offer unprecedented opportunities for humanity to thrive, they have proven difficult to control and thus demand responsible leadership.

Given the difficulty of controlling these increasingly capable technologies, Krauss discussed the importance of science for informing policy. Scientists and groups like the Bulletin don’t seek to make policy, but their research and evidence must support and inform policy. “Facts are stubborn things,” Krauss explained, “and they must be taken into account if the future of humanity is to be preserved. Nuclear weapons and climate change are precisely the sort of complex existential threats that cannot be properly managed without access to and reliance on expert knowledge.”

The Bulletin ended their public statement today with a strong message: “It is two and a half minutes to midnight, the Clock is ticking, global danger looms. Wise public officials should act immediately, guiding humanity away from the brink. If they do not, wise citizens must step forward and lead the way.”

You can read the Bulletin of Atomic Scientists’ full report here.

Why 2016 Was Actually a Year of Hope

Just about everyone found something to dislike about 2016, from wars to politics and celebrity deaths. But hidden within this year’s news feeds were some really exciting news stories. And some of them can even give us hope for the future.

Artificial Intelligence

Though concerns about the future of AI still loom, 2016 was a great reminder that, when harnessed for good, AI can help humanity thrive.

AI and Health

Some of the most promising and hopefully more immediate breakthroughs and announcements were related to health. Google’s DeepMind announced a new division that would focus on helping doctors improve patient care. Harvard Business Review considered what an AI-enabled hospital might look like, which would improve the hospital experience for the patient, the doctor, and even the patient’s visitors and loved ones. A breakthrough from MIT researchers could see AI used to more quickly and effectively design new drug compounds that could be applied to a range of health needs.

More specifically, Microsoft wants to cure cancer, and the company has been working with research labs and doctors around the country to use AI to improve cancer research and treatment. But Microsoft isn’t the only company that hopes to cure cancer. DeepMind Health also partnered with University College London’s hospitals to apply machine learning to diagnose and treat head and neck cancers.

AI and Society

Other researchers are turning to AI to help solve social issues. While AI has what is known as the “white guy problem” and examples of bias cropped up in many news articles, Fei Fei Li has been working with STEM girls at Stanford to bridge the gender gap. Stanford researchers also published research that suggests artificial intelligence could help us use satellite data to combat global poverty.

It was also a big year for research on how to keep artificial intelligence safe as it continues to develop. Google and the Future of Humanity Institute made big headlines with their work to design a “kill switch” for AI. Google Brain also published a research agenda on various problems AI researchers should be studying now to help ensure safe AI for the future.

Even the White House got involved in AI this year, hosting four symposia on AI and releasing reports in October and December about the potential impact of AI and the necessary areas of research. The White House reports are especially focused on the possible impact of automation on the economy, but they also look at how the government can contribute to AI safety, especially in the near future.

AI in Action

And of course there was AlphaGo. In January, Google’s DeepMind published a paper, which announced that the company had created a program, AlphaGo, that could beat one of Europe’s top Go players. Then, in March, in front of a live audience, AlphaGo beat the reigning world champion of Go in four out of five games. These results took the AI community by surprise and indicate that artificial intelligence may be progressing more rapidly than many in the field realized.

And AI went beyond research labs this year to be applied practically and beneficially in the real world. Perhaps most hopeful was some of the news that came out about the ways AI has been used to address issues connected with pollution and climate change. For example, IBM has had increasing success with a program that can forecast pollution in China, giving residents advanced warning about days of especially bad air. Meanwhile, Google was able to reduce its power usage by using DeepMind’s AI to manipulate things like its cooling systems.

And speaking of addressing climate change…

Climate Change

With recent news from climate scientists indicating that climate change may be coming on faster and stronger than previously anticipated and with limited political action on the issue, 2016 may not have made climate activists happy. But even here, there was some hopeful news.

Among the biggest news was the ratification of the Paris Climate Agreement. But more generally, countries, communities and businesses came together on various issues of global warming, and Voice of America offers five examples of how this was a year of incredible, global progress.

But there was also news of technological advancements that could soon help us address climate issues more effectively. Scientists at Oak Ridge National Laboratory have discovered a way to convert CO2 into ethanol. A researcher from UC Berkeley has developed a method for artificial photosynthesis, which could help us more effectively harness the energy of the sun. And a multi-disciplinary team has genetically engineered bacteria that could be used to help combat global warming.


Biotechnology

Biotechnology – with fears of designer babies and manmade pandemics – is easily one of the most feared technologies. But rather than causing harm, the latest biotech advances could help to save millions of people.


CRISPR

In the course of about two years, CRISPR-Cas9 went from a new development to what could become one of the world’s greatest advances in biology. Results of studies early in the year were promising, but as the year progressed, the news just got better. CRISPR was used to successfully remove HIV from human immune cells. A team in China used CRISPR on a patient for the first time in an attempt to treat lung cancer (treatments are still ongoing), and researchers in the US have also received approval to test CRISPR cancer treatments in patients. And CRISPR was also used to partially restore sight to blind animals.

Gene Drive

Where CRISPR could have the most dramatic, life-saving effect is in gene drives. By using CRISPR to modify the genes of an invasive species, we could potentially eliminate the unwelcome plant or animal, reviving the local ecology and saving native species that may be on the brink of extinction. But perhaps most impressive is the hope that gene drive technology could be used to end mosquito- and tick-borne diseases, such as malaria, dengue, Lyme, etc. Eliminating these diseases could easily save over a million lives every year.

Other Biotech News

The year saw other biotech advances as well. Researchers at MIT addressed a major problem in synthetic biology in which engineered genetic circuits interfere with each other. Another team at MIT engineered an antimicrobial peptide that can eliminate many types of bacteria, including some of the antibiotic-resistant “superbugs.” And various groups are also using CRISPR to create new ways to fight antibiotic-resistant bacteria.

Nuclear Weapons

If ever there was a topic that does little to inspire hope, it’s nuclear weapons. Yet even here we saw some positive signs this year. The Cambridge City Council voted to divest their $1 billion pension fund from any companies connected with nuclear weapons, which earned them an official commendation from the U.S. Conference of Mayors. In fact, divestment may prove a useful tool for the general public to express their displeasure with nuclear policy, which will be good, since one cause for hope is that the growing awareness of the nuclear weapons situation will help stigmatize the new nuclear arms race.

In February, Londoners held the largest anti-nuclear rally Britain had seen in decades, and the following month MinutePhysics posted a video about nuclear weapons that’s been seen by nearly 1.3 million people. In May, scientific and religious leaders came together to call for steps to reduce nuclear risks. And all of that pales in comparison to the attention the U.S. elections brought to the risks of nuclear weapons.

As awareness of nuclear risks grows, so do our chances of instigating the change necessary to reduce those risks.

The United Nations Takes on Weapons

But if awareness alone isn’t enough, then recent actions by the United Nations may instead be a source of hope. As October came to a close, the United Nations voted to begin negotiations on a treaty that would ban nuclear weapons. While this might not have an immediate impact on nuclear weapons arsenals, the stigmatization caused by such a ban could increase pressure on countries and companies driving the new nuclear arms race.

The U.N. also announced recently that it would officially begin looking into the possibility of a ban on lethal autonomous weapons, a cause that’s been championed by Elon Musk, Steve Wozniak, Stephen Hawking and thousands of AI researchers and roboticists in an open letter.

Looking Ahead

And why limit our hope and ambition to merely one planet? This year, a group of influential scientists led by Yuri Milner announced Breakthrough Starshot, a plan to send tiny, light-propelled probes to Alpha Centauri, our nearest star system. Elon Musk later announced his plans to colonize Mars. And an MIT scientist wants to make all of these trips possible for humans by using CRISPR to reengineer our own genes to keep us safe in space.

Yet for all of these exciting events and breakthroughs, perhaps what’s most inspiring and hopeful is that this represents only a tiny sampling of all of the amazing stories that made the news this year. If trends like these keep up, there’s plenty to look forward to in 2017.

AI Safety Highlights from NIPS 2016

This year’s Neural Information Processing Systems (NIPS) conference was larger than ever, with almost 6000 people attending, hosted in a huge convention center in Barcelona, Spain. The conference started off with two exciting announcements on open-sourcing collections of environments for training and testing general AI capabilities – the DeepMind Lab and the OpenAI Universe. Among other things, this is promising for testing safety properties of ML algorithms. OpenAI has already used their Universe environment to give an entertaining and instructive demonstration of reward hacking that illustrates the challenge of designing robust reward functions.

I was happy to see a lot of AI-safety-related content at NIPS this year. The ML and the Law symposium and Interpretable ML for Complex Systems workshop focused on near-term AI safety issues, while the Reliable ML in the Wild workshop also covered long-term problems. Here are some papers relevant to long-term AI safety:

Inverse Reinforcement Learning

Cooperative Inverse Reinforcement Learning (CIRL) by Hadfield-Menell, Russell, Abbeel, and Dragan (main conference). This paper addresses the value alignment problem by teaching the artificial agent about the human’s reward function, using instructive demonstrations rather than optimal demonstrations like in classical IRL (e.g. showing the robot how to make coffee vs having it observe coffee being made). (3-minute video)


Generalizing Skills with Semi-Supervised Reinforcement Learning by Finn, Yu, Fu, Abbeel, and Levine (Deep RL workshop). This work addresses the scalable oversight problem by proposing the first tractable algorithm for semi-supervised RL. This allows artificial agents to robustly learn reward functions from limited human feedback. The algorithm uses an IRL-like approach to infer the reward function, using the agent’s own prior experiences in the supervised setting as an expert demonstration.


Towards Interactive Inverse Reinforcement Learning by Armstrong and Leike (Reliable ML workshop). This paper studies the incentives of an agent that is trying to learn about the reward function while simultaneously maximizing the reward. The authors discuss some ways to reduce the agent’s incentive to manipulate the reward learning process.


Should Robots Have Off Switches? by Milli, Hadfield-Menell, and Russell (Reliable ML workshop). This poster examines some adverse effects of incentivizing artificial agents to be compliant in the off-switch game (a variant of CIRL).



Safe Exploration

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes by Turchetta, Berkenkamp, and Krause (main conference). This paper develops a reinforcement learning algorithm called Safe MDP that can explore an unknown environment without getting into irreversible situations, unlike classical RL approaches.
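The key constraint in this line of work (never enter a state you cannot get back from) can be illustrated with a toy reversibility check. This is a deterministic simplification with made-up state names, not the paper's algorithm, which uses Gaussian process models of an uncertain environment:

```python
from collections import deque

# Hypothetical deterministic transition model: state -> reachable states.
model = {
    'base':  ['ledge'],
    'ledge': ['base', 'cliff'],
    'cliff': [],            # no way back: falling is irreversible
}

def can_return(state, safe_set, model):
    """BFS check: is some known-safe state reachable from `state`?"""
    seen, frontier = {state}, deque([state])
    while frontier:
        s = frontier.popleft()
        if s in safe_set:
            return True
        for nxt in model.get(s, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

safe = {'base'}
print(can_return('ledge', safe, model))  # True: the agent may explore the ledge
print(can_return('cliff', safe, model))  # False: visiting the cliff is forbidden
```

A safe explorer only expands its knowledge through states that pass this kind of check, so it can map out an unknown environment without ever ending up somewhere it cannot recover from.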


Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear by Lipton, Gao, Li, Chen, and Deng (Reliable ML workshop). This work addresses the ‘Sisyphean curse’ of DQN algorithms: they forget past catastrophic experiences as those experiences become increasingly unlikely under the updated policy, and so eventually repeat the same catastrophic mistakes. The paper introduces an approach called ‘intrinsic fear’, which maintains a model of how likely different states are to lead to a catastrophe within some number of steps.
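The mechanism can be sketched in a few lines of tabular Q-learning (a simplification of the paper's deep-RL setting; the state names and constants here are our own illustration): the danger model's estimate for the successor state is scaled and subtracted from the bootstrap target, so states that preceded past disasters stay repellent even after the raw experiences fade from the replay buffer.

```python
from collections import defaultdict

GAMMA, ALPHA, FEAR = 0.9, 0.5, 2.0

Q = defaultdict(lambda: defaultdict(float))   # Q[state][action]
danger = defaultdict(float)  # est. P(catastrophe within k steps | state)

def update(s, a, r, s_next):
    """One Q-learning step with an intrinsic-fear penalty on s_next."""
    best_next = max(Q[s_next].values(), default=0.0)
    target = r + GAMMA * best_next - FEAR * danger[s_next]
    Q[s][a] += ALPHA * (target - Q[s][a])

# Toy example: 'cliff_edge' preceded past catastrophes, so the danger
# model rates it highly; 'meadow' never did.
danger['cliff_edge'] = 0.9

update('start', 'right', 1.0, 'cliff_edge')   # same immediate reward...
update('start', 'left',  1.0, 'meadow')

print(Q['start']['right'] < Q['start']['left'])  # True: fear deters the risky move
```

Even though both actions earn the same reward, the fear penalty makes the action leading toward the dangerous state look worse, which is exactly the behavior the paper is after.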


Most of these papers were related to inverse reinforcement learning – while IRL is a promising approach, it would be great to see more varied safety material at the next NIPS. There were some more safety papers on other topics at UAI this summer: Safely Interruptible Agents (formalizing what it means to incentivize an agent to obey shutdown signals) and A Formal Solution to the Grain of Truth Problem (providing a broad theoretical framework for multiple agents learning to predict each other in arbitrary computable games).

These highlights were originally posted here and cross-posted to Approximately Correct. Thanks to Jan Leike, Zachary Lipton, and Janos Kramar for providing feedback on this post.


MIRI December 2016 Newsletter

We’re in the final weeks of our push to cover our funding shortfall, and we’re now halfway to our $160,000 goal. For potential donors who are interested in an outside perspective, Future of Humanity Institute (FHI) researcher Owen Cotton-Barratt has written up why he’s donating to MIRI this year. (Donation page.)

Research updates

General updates

  • We teamed up with a number of AI safety researchers to help compile a list of recommended AI safety readings for the Center for Human-Compatible AI. See this page if you would like to get involved with CHCAI’s research.
  • Investment analyst Ben Hoskin reviews MIRI and other organizations involved in AI safety.

News and links

  • “The Off-Switch Game”: Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell show that an AI agent’s corrigibility is closely tied to the uncertainty it has about its utility function.
  • Russell and Allan Dafoe critique an inaccurate summary by Oren Etzioni of a new survey of AI experts on superintelligence.
  • Sam Harris interviews Russell on the basics of AI risk (video). See also Russell’s new Q&A on the future of AI.
  • Future of Life Institute co-founder Viktoriya Krakovna and FHI researcher Jan Leike join Google DeepMind’s safety team.
  • GoodAI sponsors a challenge to “accelerate the search for general artificial intelligence”.
  • OpenAI releases Universe, “a software platform for measuring and training an AI’s general intelligence across the world’s supply of games”. Meanwhile, DeepMind has open-sourced their own platform for general AI research, DeepMind Lab.
  • Staff at GiveWell and the Centre for Effective Altruism, along with others in the effective altruism community, explain where they’re donating this year.
  • FHI is seeking AI safety interns, researchers, and admins: jobs page.

This newsletter was originally posted here.

Silo Busting in AI Research

Artificial intelligence may seem like a computer science project, but if it’s going to successfully integrate with society, then social scientists must be more involved.

Developing an intelligent machine is not merely a problem of modifying algorithms in a lab. These machines must be aligned with human values, and this requires a deep understanding of ethics and the social consequences of deploying intelligent machines.

Getting people with a variety of backgrounds together seems logical enough in theory, but in practice, what happens when computer scientists, AI developers, economists, philosophers, and psychologists try to discuss AI issues? Do any of them even speak the same language?

Social scientists and computer scientists will come at AI problems from very different directions. And if they collaborate, everybody wins. Social scientists can learn about the complex tools and algorithms used in computer science labs, and computer scientists can become more attuned to the social and ethical implications of advanced AI.

Through transdisciplinary learning, both fields will be better equipped to handle the challenges of developing AI, and society as a whole will be safer.


Silo Busting

Too often, researchers focus on their narrow area of expertise, rarely reaching out to experts in other fields to solve common problems. AI is no different, with thick walls – sometimes literally – separating the social sciences from the computer sciences. Breaking down these walls between research fields is often called silo-busting.

If AI researchers largely operate in silos, they may lose opportunities to learn from other perspectives and collaborate with potential colleagues. Scientists might miss gaps in their research or reproduce work already completed by others, because they were secluded away in their silo. This can significantly hamper the development of value-aligned AI.

To bust these silos, Wendell Wallach organized workshops to facilitate knowledge-sharing among leading computer and social scientists. Wallach, a consultant, ethicist, and scholar at Yale University’s Interdisciplinary Center for Bioethics, holds these workshops at The Hastings Center, where he is a senior advisor.

With co-chairs Gary Marchant, Stuart Russell, and Bart Selman, Wallach held the first workshop in April 2016. “The first workshop was very much about exposing people to what experts in all of these different fields were thinking about,” Wallach explains. “My intention was just to put all of these people in a room and hopefully they’d see that they weren’t all reinventing the wheel, and recognize that there were other people who were engaged in similar projects.”

The workshop intentionally brought together experts from a variety of viewpoints, including engineering ethics, philosophy, and resilience engineering, as well as participants from the Institute of Electrical and Electronics Engineers (IEEE), the Office of Naval Research, and the World Economic Forum (WEF). Wallach recounts, “some were very interested in how you implement sensitivity to moral considerations in AI computationally, and others were more interested in how AI changes the societal context.”

Other participants studied how the engineers of these systems may be susceptible to harmful cognitive biases and conflicts of interest, while still others focused on governance issues surrounding AI. Each of these viewpoints is necessary for developing beneficial AI, and The Hastings Center’s workshop gave participants the opportunity to learn from and teach each other.

But silo-busting is not easy. Wallach explains, “everybody has their own goals, their own projects, their own intentions, and it’s hard to hear someone say, ‘maybe you’re being a little naïve about this.’” When researchers operate exclusively in silos, “it’s almost impossible to understand how people outside of those silos did what they did,” he adds.

The intention of the first workshop was not to develop concrete strategies or proposals, but rather to open researchers’ minds to the broad challenges of developing AI with human values. “My suspicion is, the most valuable things that came out of this workshop would be hard to quantify,” Wallach clarifies. “It’s more like people’s minds were being stretched and opened. That was, for me, what this was primarily about.”

The workshop did yield some tangible results. For example, Marchant and Wallach introduced a pilot project for the international governance of AI, and nearly everyone at the workshop agreed to work on it. Since then, the IEEE, the International Committee of the Red Cross, the UN, the World Economic Forum, and other institutions have agreed to become active partners with The Hastings Center in building global infrastructure to ensure that AI and Robotics are beneficial.

This transdisciplinary cooperation is a promising sign that Wallach’s efforts are succeeding in strengthening the global response to AI challenges.


Value Alignment

Wallach and his co-chairs held a second workshop at the end of October. The participants were mostly scientists, but also included social theorists, a legal scholar, philosophers, and ethicists. The overall goal remained – to bust AI silos and facilitate transdisciplinary cooperation – but this workshop had a narrower focus.

“We made it more about value alignment and machine ethics,” he explains. “The tension in the room was between those who thought the problem [of value alignment] was imminently solvable and those who were deeply skeptical about solving the problem at all.”

In general, Wallach observed that “the social scientists and philosophers tend to overplay the difficulties [of creating AI with full value alignment] and computer scientists tend to underplay the difficulties.”

Wallach believes that while computer scientists will build the algorithms and utility functions for AI, they will need input from social scientists to ensure value alignment. “If a utility function represents 100,000 inputs, social theorists will help the AI researchers understand what those 100,000 inputs are,” he explains. “The AI researchers might be able to come up with 50,000-60,000 on their own, but they’re suddenly going to realize that people who have thought much more deeply about applied ethics are perhaps sensitive to things that they never considered.”

“I’m hoping that enough of [these researchers] learn each other’s language and how to communicate with each other, that they’ll recognize the value they can get from collaborating together,” he says. “I think I see evidence of that beginning to take place.”


Moving Forward

Developing value-aligned AI is a monumental task with existential risks. Experts from various perspectives must be willing to learn from each other and adapt their understanding of the issue.

In this spirit, The Hastings Center is leading the charge to bring the various AI silos together. After two successful events that resulted in promising partnerships, Wallach and his co-chairs will hold their third workshop in Spring 2018. And while these workshops are a small effort to facilitate transdisciplinary cooperation on AI, Wallach is hopeful.

“It’s a small group,” he admits, “but it’s people who are leaders in these various fields, so hopefully that permeates through the whole field, on both sides.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Artificial Intelligence and the King Midas Problem

Value alignment. It’s a phrase that often pops up in discussions about the safety and ethics of artificial intelligence. How can scientists create AI with goals and values that align with those of the people it interacts with?

Very simple robots with very constrained tasks do not need goals or values at all. Although the Roomba’s designers know you want a clean floor, Roomba doesn’t: it simply executes a procedure that the Roomba’s designers predict will work—most of the time. If your kitten leaves a messy pile on the carpet, Roomba will dutifully smear it all over the living room. If we keep programming smarter and smarter robots, then by the late 2020s, you may be able to ask your wonderful domestic robot to cook a tasty, high-protein dinner. But if you forgot to buy any meat, you may come home to a hot meal but find the aforementioned cat has mysteriously vanished. The robot, designed for chores, doesn’t understand that the sentimental value of the cat exceeds its nutritional value.

AI and King Midas

Stuart Russell, a renowned AI researcher, compares the challenge of defining a robot’s objective to the King Midas myth. “The robot,” says Russell, “has some objective and pursues it brilliantly to the destruction of mankind. And it’s because it’s the wrong objective. It’s the old King Midas problem.”

This is one of the big problems in AI safety that Russell is trying to solve. “We’ve got to get the right objective,” he explains, “and since we don’t seem to know how to program it, the right answer seems to be that the robot should learn – from interacting with and watching humans – what it is humans care about.”

Russell works from the assumption that the robot will solve whatever formal problem we define. Rather than assuming that the robot should optimize a given objective, Russell defines the problem as a two-player game (“game” as used by economists, meaning a decision problem with multiple agents) called cooperative inverse reinforcement learning (CIRL).

A CIRL game includes a person and a robot: the robot’s only purpose is to make the person happy, but it doesn’t know what the person wants. Fortunately, it can learn more about what the person wants by observing her behavior. For example, if a robot observed the human’s morning routine, it should discover how important coffee is—not to itself, of course (we don’t want robots drinking coffee), but to the human. Then, it will make coffee for the person without being asked.
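The coffee example can be sketched as a simple Bayesian update. This is a toy illustration of the learning side of a CIRL-style setup, not the paper's algorithm; the drink hypotheses and the Boltzmann-rational model of the human are our own assumptions:

```python
import math

options = ['coffee', 'tea', 'juice']

# Hypotheses: candidate reward functions, one per possible favorite drink.
hypotheses = {fav: {o: (1.0 if o == fav else 0.0) for o in options}
              for fav in options}
belief = {h: 1.0 / len(hypotheses) for h in hypotheses}  # uniform prior

def likelihood(choice, reward, beta=3.0):
    """Boltzmann-rational human: P(choice) is proportional to exp(beta * reward)."""
    z = sum(math.exp(beta * reward[o]) for o in options)
    return math.exp(beta * reward[choice]) / z

def observe(choice):
    """Bayesian update of the robot's belief after watching one choice."""
    for h in belief:
        belief[h] *= likelihood(choice, hypotheses[h])
    total = sum(belief.values())
    for h in belief:
        belief[h] /= total

# The robot watches three mornings of coffee-drinking.
for _ in range(3):
    observe('coffee')

print(max(belief, key=belief.get))  # → coffee
```

After a few observations the belief concentrates on the coffee hypothesis, so the robot can act on the inferred preference (making coffee unprompted) without the preference ever being programmed in.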

What about the example of King Midas, who thought he wanted as much gold as possible? How does the approach deal with someone stating their wishes incorrectly?

“We don’t have to know what we want. That’s the whole point. It’s hard to know what we want, and what we ask for is evidence for what we really want, but it’s not ground truth,” says Russell.

The robot should learn what people want by watching their actions. Then, if someone asks for something that’s technically incorrect, the robot will understand their underlying wish. Russell adds, “The robot—if it had been there—would have told Midas that he didn’t really want everything turned to gold, maybe just a few choice objects that he might point at from time to time and say the magic word.”

AI Off-Switch

Russell and his Berkeley colleagues also recently announced further progress toward ensuring safe AI, with a paper on ensuring an AI’s off-switch is always accessible. An off-switch is often suggested to guarantee a robot doesn’t harm humans – if the robot goes bad, just turn it off! But what if the AI anticipates that and disables its off switch?

Russell and his team propose solving this by providing a level of uncertainty in the robot’s objective. They created another CIRL game in which the robot chooses either to announce its intentions, giving the human a chance to switch it off, or to plunge ahead with its plan, bypassing human oversight.

They showed that a robot with a fixed objective will take the second choice and disable its off-switch. But if there is uncertainty about the objective, the robot will reason that a human decision to switch the robot off would imply the announced intention is undesirable. Because its goal is to make the human happy, the robot accepts being switched off. The robot has a positive incentive not to disable the off-switch, and that incentive is directly tied to the robot’s uncertainty about the human’s true objective.
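The argument can be checked with a back-of-the-envelope calculation. Below is a toy sketch (the numbers and the discrete belief are our own, not the paper's model) comparing the robot's three options when it is uncertain about the human's utility u for its proposed plan: act immediately, switch itself off, or defer to a human who vetoes the plan exactly when u < 0.

```python
import numpy as np

# Hypothetical discrete belief over the human's utility u for the plan.
u_values = np.array([-2.0, -0.5, 1.0, 3.0])
probs    = np.array([ 0.2,  0.3, 0.3, 0.2])

act   = float(np.dot(probs, u_values))                   # plunge ahead: E[u]
off   = 0.0                                              # switch itself off
defer = float(np.dot(probs, np.maximum(u_values, 0.0)))  # human vetoes u < 0

print(act, off, defer)  # defer >= max(act, off)
```

Deferring dominates both alternatives because the human filters out exactly the bad outcomes. With a point-mass belief (no uncertainty), defer equals act and the incentive to leave the off-switch enabled disappears, matching the claim that the incentive comes from the robot's uncertainty about the objective.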

Ensuring AI Safety

In addition to his research, Russell is also one of the most vocal and active AI safety researchers concerned with ensuring a stronger public understanding of the potential issues surrounding AI development.

He recently co-authored a rebuttal to an article in the MIT Technology Review, which claimed that real AI scientists weren’t worried about the existential threat of AI. Russell and his co-author summed up why it’s better to be cautious and careful than just assume all will turn out for the best:

“Our experience with Chernobyl suggests it may be unwise to claim that a powerful technology entails no risks. It may also be unwise to claim that a powerful technology will never come to fruition. On September 11, 1933, Lord Rutherford, perhaps the world’s most eminent nuclear physicist, described the prospect of extracting energy from atoms as nothing but “moonshine.” Less than 24 hours later, Leo Szilard invented the neutron-induced nuclear chain reaction; detailed designs for nuclear reactors and nuclear weapons followed a few years later. Surely it is better to anticipate human ingenuity than to underestimate it, better to acknowledge the risks than to deny them. … [T]he risk [of AI] arises from the unpredictability and potential irreversibility of deploying an optimization process more intelligent than the humans who specified its objectives.”

This summer, Russell received a grant of over $5.5 million from the Open Philanthropy Project for a new research center, the Center for Human-Compatible Artificial Intelligence, in Berkeley. Among the primary objectives of the Center will be to study this problem of value alignment, to continue his efforts toward provably beneficial AI, and to ensure we don’t make the same mistakes as King Midas.

“Look,” he says, “if you were King Midas, would you want your robot to say, ‘Everything turns to gold? OK, boss, you got it.’ No! You’d want it to say, ‘Are you sure? Including your food, drink, and relatives? I’m pretty sure you wouldn’t like that. How about this: you point to something and say ‘Abracadabra Aurificio’ or something, and then I’ll turn it to gold, OK?’”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.