More in-depth background reading about risks and benefits of artificial intelligence

Superintelligence survey

The Future of AI – What Do You Think?

Max Tegmark’s new book on artificial intelligence, Life 3.0: Being Human in the Age of Artificial Intelligence, explores how AI will impact life as it grows increasingly advanced, perhaps even achieving superintelligence far beyond human level in all areas. For the book, Max surveys experts’ forecasts, and explores a broad spectrum of views on what will/should happen. But it’s time to expand the conversation. If we’re going to create a future that benefits as many people as possible, we need to include as many voices as possible. And that includes yours! Below are the answers from the first 14,866 people who have taken the survey that goes along with Max’s book. To join the conversation yourself, please take the survey here.

How soon, and should we welcome or fear it?

The first big controversy, dividing even leading AI researchers, involves forecasting what will happen. When, if ever, will AI outperform humans at all intellectual tasks, and will it be a good thing?

Do you want superintelligence?

Everything we love about civilization is arguably the product of intelligence, so we can potentially do even better by amplifying human intelligence with machine intelligence. But some worry that superintelligent machines would end up controlling us and wonder whether their goals would be aligned with ours. Do you want there to be superintelligent AI, i.e., general intelligence far beyond human level?

What Should the Future Look Like?

In his book, Tegmark argues that we shouldn’t passively ask “what will happen?” as if the future is predetermined, but instead ask what we want to happen and then try to create that future.  What sort of future do you want?

If superintelligence arrives, who should be in control?
If you one day get an AI helper, do you want it to be conscious, i.e., to have subjective experience (as opposed to being like a zombie which can at best pretend to be conscious)?
What should a future civilization strive for?
Do you want life spreading into the cosmos?

The Ideal Society?

In Life 3.0, Max explores 12 possible future scenarios, describing what might happen in the coming millennia if superintelligence is/isn’t developed. You can find a cheatsheet that quickly describes each here, but for a more detailed look at the positives and negatives of each possibility, check out chapter 5 of the book. Here’s a breakdown so far of the options people prefer:

You can learn a lot more about these possible future scenarios — along with fun explanations about what AI is, how it works, how it’s impacting us today, and what else the future might bring — when you order Max’s new book.

The results above will be updated regularly. Please add your voice by taking the survey here, and share your comments below!

When AI Journalism Goes Bad

Slate is currently running a feature called “Future Tense,” which claims to be the “citizen’s guide to the future.” Two of their recent articles, however, are full of inaccuracies about AI safety and the researchers studying it. While this is disappointing, it also represents a good opportunity to clear up some misconceptions about why AI safety research is necessary.

The first contested article was Let Artificial Intelligence Evolve, by Michael Chorost, which displays a poor understanding of the issues surrounding the evolution of artificial intelligence. The second, How to be Good, by Adam Elkus, got some of the concerns about developing safe AI correct, but, in the process, did a great disservice to one of today’s most prominent AI safety researchers, as well as to scientific research in general.

We do not know if AI will evolve safely

In his article, Chorost defends the idea of simply letting artificial intelligence evolve, without interference from researchers worried about AI safety. Chorost first considers an example from Nick Bostrom’s book, Superintelligence, in which a superintelligent system might tile the Earth with some undesirable product, thus eliminating all biological life. Chorost argues this is impossible because “a superintelligent mind would need time and resources to invent humanity-destroying technologies.” Of course it would. The concern is that a superintelligent system, being smarter than us, would be able to achieve such goals without us realizing what it was up to. How? We don’t know. This is one of the reasons it’s so important to study AI safety now.

It’s quite probable that a superintelligent system would not attempt such a feat, but at the moment, no one can guarantee that. We don’t know yet how a superintelligent AI will behave. There’s no reason to expect a superintelligent system to “think” like humans do, yet somehow we need to try to anticipate what an advanced AI will do. We can’t just hope that advanced AI systems will evolve compatibly with human life: we need to do research now to try to ensure compatibility.

Chorost then goes on to claim that a superintelligent AI won’t tile the Earth with some undesirable object because it won’t want to. He says, “Until an A.I. has feelings, it’s going to be unable to want to do anything at all, let alone act counter to humanity’s interests and fight off human resistance. Wanting is essential to any kind of independent action.” This reflects misplaced anthropomorphization and a misunderstanding of how goals are programmed. What an AI wants to do depends on what it is programmed to do. Microsoft Office doesn’t want me to spell properly, yet it marks every misspelled word because that’s what it was programmed to do. And that’s just ordinary software, not an advanced, superintelligent system, which would be vastly more complex.

If a robot is given the task of following a path to reach some destination, but is programmed to recognize that reaching the destination is more important than sticking to the path, then if it encounters an obstacle, it will find another route in order to achieve its primary objective. This isn’t because it has an emotional attachment to reaching its destination, but rather, that’s what it was programmed to do. AlphaGo doesn’t want to beat the world’s top Go player: it’s just been programmed to win at Go. The list of examples of a system wanting to achieve some goal can go on and on, and it has nothing to do with how (or whether) the system feels.
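The path-following behavior described above can be made concrete with a minimal, purely illustrative sketch (the grid, the `plan_route` function, and all parameter names are assumptions, not anything from a real robotics system). The agent replans around an obstacle not because it "cares" about arriving, but because reaching the destination is the objective it was programmed to optimize:

```python
from collections import deque

def plan_route(start, goal, obstacles, width, height):
    """Breadth-first search on a grid. The agent's only 'motivation'
    is the programmed objective: return a shortest route to `goal`."""
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == goal:
            return path
        # Explore the four neighboring cells that are in bounds,
        # unvisited, and not blocked by an obstacle.
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in obstacles and (nx, ny) not in seen):
                seen.add((nx, ny))
                frontier.append(((nx, ny), path + [(nx, ny)]))
    return None  # no route exists

# The nominal path runs straight along y == 0, but an obstacle at (2, 0)
# blocks it. The planner detours through y == 1 because reaching the
# destination outranks sticking to the original path.
route = plan_route(start=(0, 0), goal=(4, 0),
                   obstacles={(2, 0)}, width=5, height=2)
```

Nothing in this sketch involves feelings or an emotional attachment to the goal; the detour emerges mechanically from the search procedure, which is the point of the example.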

Chorost continues this argument by claiming: “And the minute an A.I. wants anything, it will live in a universe with rewards and punishments—including punishments from us for behaving badly. In order to survive in a world dominated by humans, a nascent A.I. will have to develop a human-like moral sense that certain things are right and others are wrong.” Unless it’s smart enough to trick us into thinking it’s doing what we want while doing something completely different without us realizing it. Any child knows that one of the best ways to not get in trouble is to not get caught. Why would we think a superintelligent system couldn’t learn the same lesson? A punishment might just antagonize it or teach it to deceive us. There’s also the chance that a superintelligent agent will take some action whose ramifications are too complex for us to understand; we can’t punish an agent if we don’t realize that what it’s doing is harmful.

The article then considers that for a superintelligent system to want something in the way that biological entities want something, it can’t be made purely with electronics. The reasoning is that since humans are biochemical in nature, if we want to create a superintelligent system with human wants and needs, that system must be made of similar stuff. Specifically, Chorost says, “To get a system that has sensations, you would have to let it recapitulate the evolutionary process in which sensations became valuable.”

First, it’s not clear why we need a superintelligent system that exhibits sensations, nor is there any reason that should be a goal of advanced AI. Chorost argues that we need this because it’s the only way a system can evolve to be moral, but his arguments seem limited to the idea that for a system to be superintelligent, it must be human-like.

Yet, consider the analogy of planes to birds. Planes are essentially electronics and metal – none of the biochemistry of a bird – yet they can fly higher, faster, longer, and farther than any bird. And while collisions between birds and planes can damage a plane, they’re a lot more damaging to the bird. Though planes are on the “dumber” end of the AI superintelligence spectrum, compared to birds, they could be considered “superflying” systems. There’s no reason to expect a superintelligent system to be any more similar to humans than planes are to birds.

Finally, Chorost concludes the article by arguing that history has shown that as humanity has evolved, it has become less and less violent. He argues, “A.I.s will have to step on the escalator of reason just like humans have, because they will need to bargain for goods in a human-dominated economy and they will face human resistance to bad behavior.” However, even if this is a completely accurate prediction, he doesn’t explain how we survive a superintelligent system as it transitions from its early violent stages to the more advanced social understanding we have today.

Again, it’s important to keep in mind that perhaps as AI evolves, everything truly will go smoothly, but we don’t know for certain that’s the case. As long as there are unknowns about the future of AI, we need beneficial AI research.

This leads to the problematic second article by Elkus. The premise of his article is reasonable: he believes it will be difficult to teach human values to an AI, given that human values aren’t consistent across all societies. However, his shoddy research and poor understanding of AI research turn this article into an example of a dangerous and damaging type of scientific journalism, both for AI and science in general.

Bad AI journalism can ruin the science

Elkus looks at a single interview that AI researcher Stuart Russell gave to Quanta Magazine. He then uses snippets of that interview, taken out of context, as his basis for arguing that AI researchers are not properly addressing concerns about developing AI with human-aligned values. He criticizes Russell for only focusing on the technical side of robotics values, saying, “The question is not whether machines can be made to obey human values but which humans ought to decide those values.” On the contrary, both are important questions that must be asked, and Russell asks both questions in all of his published talks. The values a robot takes on will have to be decided by societies, government officials, policy makers, the robot’s owners, etc. Russell argues that the learning process should involve the entire human race, to the extent possible, both now and throughout history. In this talk he gave at CERN in January of this year, Russell clearly enunciates that the “obvious difficulties” of value alignment include the fact that “values differ across individuals and cultures.” Elkus essentially fabricates a position that Russell does not take in order to provide a line of attack.

Elkus also argues that Russell needs to “brush up on his A.I. History” and learn from failed research in the past, without realizing that those lessons are already incorporated into Russell’s research (and apparently without realizing that Russell is the co-author of the seminal textbook on Artificial Intelligence, which, over 20 years later, is still the most influential and fundamental text on AI — the book is viewed by other AI history experts, such as Nils Nilsson, as perhaps the authoritative source on much of AI’s history). He also misunderstands the objectives of having a robot learn about human values from something like movies or books. Elkus inaccurately suggests that the AI would learn only from one movie, which is obviously problematic if the AI only “watches” the silent, racist movie, Birth of a Nation. Instead, the AI could look at all movies. Then it could look at all criticisms and reviews of each movie, as well as how public reactions to the movies change over the years. This is just one example of how an AI could learn values, but certainly not the only one.

Finally, Elkus suggests that Russell, as a “Western, well-off, white male cisgender scientist,” has no right to be working on the problem of ensuring that machines respect human values. For the sake of civil discourse, we will ignore the ad hominem nature of this argument and assume that it is merely a recommendation to draw on the expertise of multiple disciplines and viewpoints. Yet a simple Google search would reveal that not only is Russell one of the fiercest advocates for ensuring we keep AI safe and beneficial, but he is an equally strong advocate for bringing together a broad coalition of researchers and the broadest possible range of people to tackle the question of human values. In this talk at the World Economic Forum in 2015, Russell predicted that “in the future, moral philosophy will be a key industry sector,” and he suggests that machines will need to “engage in an extended conversation with the human race” to learn about human values.

Two days after Elkus’s article went live, Slate published an interview with Russell, written by another author, that does do a reasonable job of explaining Russell’s research and his concerns about AI safety. However, this is uncommon. Rarely do scientists have a chance to defend themselves. Plus, even when they are able to rebut an article, seeds of doubt have already been planted in the public’s mind.

From the perspective of beneficial AI research, articles like Elkus’s do more harm than good. Elkus describes an important problem that must be solved to achieve safe AI, but portrays one of the top AI safety researchers as someone who doesn’t know what he’s doing. This unnecessarily increases fears about the development of artificial intelligence, making researchers’ jobs that much more difficult. More generally, this type of journalism can be damaging not only to the researcher in question, but also to the overall field. If the general public develops a distaste for some scientific pursuit, then raising the money necessary to perform the research becomes that much more difficult.

For the sake of good science, journalists must maintain a higher standard and do their own due diligence when researching a particular topic or scientist: when it comes to science, there is most definitely such a thing as bad press.

Introductory Resources on AI Safety Research

A reading list to get up to speed on the main ideas in the field. The resources are selected for relevance and/or brevity, and the list is not meant to be comprehensive. [Updated on 15 August 2017.]


For a popular audience:

Cade Metz, 2017. New York Times: Teaching A.I. Systems to Behave Themselves

FLI. AI risk background and FAQ. At the bottom of the background page, there is a more extensive list of resources on AI safety.

Tim Urban, 2015. Wait But Why: The AI Revolution. An accessible introduction to AI risk forecasts and arguments (with cute hand-drawn diagrams, and a few corrections from Luke Muehlhauser).

OpenPhil, 2015. Potential risks from advanced artificial intelligence. An overview of AI risks and timelines, possible interventions, and current actors in this space.

For a more technical audience:

Stuart Russell:

  • The long-term future of AI (longer version), 2015. A video of Russell’s classic talk, discussing why it makes sense for AI researchers to think about AI safety, and going over various misconceptions about the issues.
  • Concerns of an AI pioneer, 2015. An interview with Russell on the importance of provably aligning AI with human values, and the challenges of value alignment research.
  • On Myths and Moonshine, 2014. Russell’s response to the “Myth of AI” question, which draws an analogy between AI research and nuclear research, and points out some dangers of optimizing a misspecified utility function.

Scott Alexander, 2015. No time like the present for AI safety work. An overview of long-term AI safety challenges, e.g. preventing wireheading and formalizing ethics.

Victoria Krakovna, 2015. AI risk without an intelligence explosion. An overview of long-term AI risks besides the (overemphasized) intelligence explosion / hard takeoff scenario, arguing why intelligence explosion skeptics should still think about AI safety.

Stuart Armstrong, 2014. Smarter Than Us: The Rise Of Machine Intelligence. A short ebook discussing potential promises and challenges presented by advanced AI, and the interdisciplinary problems that need to be solved on the way there.

Technical overviews

Soares and Fallenstein, 2017. Aligning Superintelligence with Human Interests: A Technical Research Agenda

Amodei, Olah, et al, 2016. Concrete Problems in AI Safety. Research agenda focusing on accident risks that apply to current ML systems as well as more advanced future AI systems.

Jessica Taylor et al, 2016. Alignment for Advanced Machine Learning Systems

FLI, 2015. A survey of research priorities for robust and beneficial AI

Jacob Steinhardt, 2015. Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems. A taxonomy of AI safety issues that require ordinary vs extraordinary engineering to address.

Nate Soares, 2015. Safety engineering, target selection, and alignment theory. Identifies and motivates three major areas of AI safety research.

Nick Bostrom, 2014. Superintelligence: Paths, Dangers, Strategies. A seminal book outlining long-term AI risk considerations.

Steve Omohundro, 2007. The basic AI drives. A classic paper arguing that sufficiently advanced AI systems are likely to develop drives such as self-preservation and resource acquisition independently of their assigned objectives.

Technical work

Value learning:

Smitha Milli et al. Should robots be obedient? Obedience to humans may sound like a great thing, but blind obedience can get in the way of learning human preferences.

William Saunders et al, 2017. Trial without Error: Towards Safe Reinforcement Learning via Human Intervention. (blog post)

Amin, Jiang, and Singh, 2017. Repeated Inverse Reinforcement Learning. Separates the reward function into a task-specific component and an intrinsic component. In a sequence of tasks, the agent learns the intrinsic component while trying to avoid surprising the human.

Dylan Hadfield-Menell et al, 2016. Cooperative inverse reinforcement learning. Defines value learning as a cooperative game where the human tries to teach the agent about their reward function, rather than giving optimal demonstrations like in standard IRL.

Owain Evans et al, 2016. Learning the Preferences of Ignorant, Inconsistent Agents.

Reward gaming / wireheading:

Tom Everitt et al, 2017. Reinforcement learning with a corrupted reward channel. A formalization of the reward misspecification problem in terms of true and corrupt reward, a proof that RL agents cannot overcome reward corruption, and a framework for giving the agent extra information to overcome reward corruption. (blog post)

Amodei and Clark, 2016. Faulty Reward Functions in the Wild. An example of reward function gaming in a boat racing game, where the agent gets a higher score by going in circles and hitting the same targets than by actually playing the game.

Everitt and Hutter, 2016. Avoiding Wireheading with Value Reinforcement Learning. An alternative to RL that reduces the incentive to wirehead.

Laurent Orseau, 2015. Wireheading. An investigation into how different types of artificial agents respond to opportunities to wirehead (unintended shortcuts to maximize their objective function).

Interruptibility / corrigibility:

Dylan Hadfield-Menell et al. The Off-Switch Game. This paper studies the interruptibility problem as a game between human and robot, and investigates which incentives the robot could have to allow itself to be switched off.

El Mahdi El Mhamdi et al, 2017. Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning.

Orseau and Armstrong, 2016. Safely interruptible agents. Provides a formal definition of safe interruptibility and shows that off-policy RL agents are more interruptible than on-policy agents. (blog post)

Nate Soares et al, 2015. Corrigibility. Designing AI systems without incentives to resist corrective modifications by their creators.

Scalable oversight:

Christiano, Leike et al, 2017. Deep reinforcement learning from human preferences. Communicating complex goals to AI systems using human feedback (comparing pairs of agent trajectory segments).

David Abel et al. Agent-Agnostic Human-in-the-Loop Reinforcement Learning.
Armstrong and Levinstein, 2017. Low Impact Artificial Intelligences. An intractable but enlightening definition of low impact for AI systems.

Babcock, Kramar and Yampolskiy, 2017. Guidelines for Artificial Intelligence Containment.

Scott Garrabrant et al, 2016. Logical Induction. A computable algorithm for the logical induction problem.

Note: I did not include literature on less neglected areas of the field like safe exploration, distributional shift, adversarial examples, or interpretability (see e.g. Concrete Problems or the CHAI bibliography for extensive references on these topics).

Collections of technical works

CHAI bibliography

MIRI publications

FHI publications

FLI grantee publications (scroll down)

Paul Christiano. AI control. A blog on designing safe, efficient AI systems (approval-directed agents, aligned reinforcement learning agents, etc).

If there are any resources missing from this list that you think are a must-read, please let me know! If you want to go into AI safety research, check out these guidelines and the AI Safety Syllabus.

(Thanks to Ben Sancetta, Taymon Beal and Janos Kramar for their feedback on this post.)

This article was originally posted on Victoria Krakovna’s blog.


Hawking Reddit AMA on AI

Our Scientific Advisory Board member Stephen Hawking’s long-awaited Reddit AMA answers on Artificial Intelligence just came out, and was all over today’s world news, including MSNBC, Huffington Post, The Independent, and Time.

Read the Q&A below and visit the official Reddit page for the full discussion:

Question 1:

Professor Hawking- Whenever I teach AI, Machine Learning, or Intelligent Robotics, my class and I end up having what I call “The Terminator Conversation.” My point in this conversation is that the dangers from AI are overblown by media and non-understanding news, and the real danger is the same danger in any complex, less-than-fully-understood code: edge case unpredictability. In my opinion, this is different from “dangerous AI” as most people perceive it, in that the software has no motives, no sentience, and no evil morality, and is merely (ruthlessly) trying to optimize a function that we ourselves wrote and designed. Your viewpoints (and Elon Musk’s) are often presented by the media as a belief in “evil AI,” though of course that’s not what your signed letter says. Students that are aware of these reports challenge my view, and we always end up having a pretty enjoyable conversation. How would you represent your own beliefs to my class? Are our viewpoints reconcilable? Do you think my habit of discounting the layperson Terminator-style “evil AI” is naive? And finally, what morals do you think I should be reinforcing to my students interested in AI?

Answer 1:

You’re right: media often misrepresent what is actually said. The real risk with AI isn’t malice but competence. A superintelligent AI will be extremely good at accomplishing its goals, and if those goals aren’t aligned with ours, we’re in trouble. You’re probably not an evil ant-hater who steps on ants out of malice, but if you’re in charge of a hydroelectric green energy project and there’s an anthill in the region to be flooded, too bad for the ants. Let’s not place humanity in the position of those ants. Please encourage your students to think not only about how to create AI, but also about how to ensure its beneficial use.

Question 2:

Hello Doctor Hawking, thank you for doing this AMA. I am a student who has recently graduated with a degree in Artificial Intelligence and Cognitive Science. Having studied A.I., I have seen first hand the ethical issues we are having to deal with today concerning how quickly machines can learn the personal features and behaviours of people, as well as being able to identify them at frightening speeds. However, the idea of a “conscious” or actual intelligent system which could pose an existential threat to humans still seems very foreign to me, and does not seem to be something we are even close to cracking from a neurological and computational standpoint. What I wanted to ask was, in your message aimed at warning us about the threat of intelligent machines, are you talking about current developments and breakthroughs (in areas such as machine learning), or are you trying to say we should be preparing early for what will inevitably come in the distant future?

Answer 2:

The latter. There’s no consensus among AI researchers about how long it will take to build human-level AI and beyond, so please don’t trust anyone who claims to know for sure that it will happen in your lifetime or that it won’t happen in your lifetime. When it eventually does occur, it’s likely to be either the best or worst thing ever to happen to humanity, so there’s huge value in getting it right. We should shift the goal of AI from creating pure undirected artificial intelligence to creating beneficial intelligence. It might take decades to figure out how to do this, so let’s start researching this today rather than the night before the first strong AI is switched on.

Question 3:

Hello, Prof. Hawking. Thanks for doing this AMA! Earlier this year you, Elon Musk, and many other prominent science figures signed an open letter warning the society about the potential pitfalls of Artificial Intelligence. The letter stated: “We recommend expanded research aimed at ensuring that increasingly capable AI systems are robust and beneficial: our AI systems must do what we want them to do.” While being a seemingly reasonable expectation, this statement serves as a start point for the debate around the possibility of Artificial Intelligence ever surpassing the human race in intelligence.
My questions: 1. One might think it impossible for a creature to ever acquire a higher intelligence than its creator. Do you agree? If yes, then how do you think artificial intelligence can ever pose a threat to the human race (their creators)? 2. If it was possible for artificial intelligence to surpass humans in intelligence, where would you define the line of “It’s enough”? In other words, how smart do you think the human race can make AI, while ensuring that it doesn’t surpass them in intelligence?

Answer 3:

It’s clearly possible for something to acquire higher intelligence than its ancestors: we evolved to be smarter than our ape-like ancestors, and Einstein was smarter than his parents. The line you ask about is where an AI becomes better than humans at AI design, so that it can recursively improve itself without human help. If this happens, we may face an intelligence explosion that ultimately results in machines whose intelligence exceeds ours by more than ours exceeds that of snails.

Question 4:

I’m rather late to the question-asking party, but I’ll ask anyway and hope. Have you thought about the possibility of technological unemployment, where we develop automated processes that ultimately cause large unemployment by performing jobs faster and/or cheaper than people can perform them? Some compare this thought to the thoughts of the Luddites, whose revolt was caused in part by perceived technological unemployment over 100 years ago. In particular, do you foresee a world where people work less because so much work is automated? Do you think people will always either find work or manufacture more work to be done? Thank you for your time and your contributions. I’ve found research to be a largely social endeavor, and you’ve been an inspiration to so many.

Answer 4:

If machines produce everything we need, the outcome will depend on how things are distributed. Everyone can enjoy a life of luxurious leisure if the machine-produced wealth is shared, or most people can end up miserably poor if the machine-owners successfully lobby against wealth redistribution. So far, the trend seems to be toward the second option, with technology driving ever-increasing inequality.

Question 5:

Hello Professor Hawking, thank you for doing this AMA! I’ve thought lately about biological organisms’ will to survive and reproduce, and how that drive evolved over millions of generations. Would an AI have these basic drives, and if not, would it be a threat to humankind? Also, what are two books you think every person should read?

Answer 5:

An AI that has been designed rather than evolved can in principle have any drives or goals. However, as emphasized by Steve Omohundro, an extremely intelligent future AI will probably develop a drive to survive and acquire more resources as a step toward accomplishing whatever goal it has, because surviving and having more resources will increase its chances of accomplishing that other goal. This can cause problems for humans whose resources get taken away.

Question 6:

Thanks for doing this AMA. I am a biologist. Your fear of AI appears to stem from the assumption that AI will act like a new biological species competing for the same resources or otherwise transforming the planet in ways incompatible with human (or other) life. But the reason that biological species compete like this is because they have undergone billions of years of selection for high reproduction. Essentially, biological organisms are optimized to ‘take over’ as much as they can. It’s basically their ‘purpose’. But I don’t think this is necessarily true of an AI. There is no reason to surmise that AI creatures would be ‘interested’ in reproducing at all. I don’t know what they’d be ‘interested’ in doing. I am interested in what you think an AI would be ‘interested’ in doing, and why that is necessarily a threat to humankind that outweighs the benefits of creating a sort of benevolent God.

Answer 6:

You’re right that we need to avoid the temptation to anthropomorphize and assume that AIs will have the sort of goals that evolved creatures do. An AI that has been designed rather than evolved can in principle have any drives or goals. However, as emphasized by Steve Omohundro, an extremely intelligent future AI will probably develop a drive to survive and acquire more resources as a step toward accomplishing whatever goal it has, because surviving and having more resources will increase its chances of accomplishing that other goal. This can cause problems for humans whose resources get taken away.

Wait But Why: ‘The AI Revolution’

Tim Urban of Wait But Why has an engaging two-part series on the development of superintelligent AI and the dramatic consequences it would have on humanity. Equal parts exciting and sobering, this is a perfect primer for the layperson and thorough enough to be read-worthy to acquaintances of the topic as well.

Part 1: The Road to Superintelligence

Part 2: Our Immortality or Extinction