Introductory Resources on AI Safety Research

The resources are selected for relevance and/or brevity, and the list is not meant to be comprehensive.


For a popular audience:

FLI: AI risk background and FAQ. At the bottom of the background page, there is a more extensive list of resources on AI safety.

Tim Urban, Wait But Why: The AI Revolution. An accessible introduction to AI risk forecasts and arguments (with cute hand-drawn diagrams, and a few corrections from Luke Muehlhauser).

GiveWell: Potential risks from advanced artificial intelligence. An overview of AI risks and timelines, possible interventions, and current actors in this space.

Stuart Armstrong: Smarter Than Us: The Rise Of Machine Intelligence. A short ebook discussing potential promises and challenges presented by advanced AI, and the interdisciplinary problems that need to be solved on the way there.

For a more technical audience:

Stuart Russell:

  • The long-term future of AI (longer version). A video of Russell’s classic talk, discussing why it makes sense for AI researchers to think about AI safety, and going over various misconceptions about the issues.
  • Concerns of an AI pioneer. An interview with Russell on the importance of provably aligning AI with human values, and the challenges of value alignment research.
  • On Myths and Moonshine. Russell’s response to the “Myth of AI” question on Edge.org, which draws an analogy between AI research and nuclear research, and points out some dangers of optimizing a misspecified utility function.

Scott Alexander: No time like the present for AI safety work. An overview of long-term AI safety challenges, e.g. preventing wireheading and formalizing ethics.

Victoria Krakovna: AI risk without an intelligence explosion. An overview of long-term AI risks besides the (overemphasized) intelligence explosion / hard takeoff scenario, arguing why intelligence explosion skeptics should still think about AI safety.

Technical overviews

Amodei, Olah et al: Concrete Problems in AI Safety

Taylor et al (MIRI): Alignment for Advanced Machine Learning Systems

FLI: A survey of research priorities for robust and beneficial AI

MIRI: Aligning Superintelligence with Human Interests: A Technical Research Agenda

Jacob Steinhardt: Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems. A taxonomy of AI safety issues that require ordinary vs extraordinary engineering to address.

Nate Soares: Safety engineering, target selection, and alignment theory. Identifies and motivates three major areas of AI safety research.

Nick Bostrom: Superintelligence: Paths, Dangers, Strategies. A seminal book outlining long-term AI risk considerations.

Technical work

Steve Omohundro: The basic AI drives. Argues that sufficiently advanced AI systems are likely to develop drives such as self-preservation and resource acquisition independently of their assigned objectives.

Paul Christiano: AI control. A blog on designing safe, efficient AI systems (approval-directed agents, aligned reinforcement learning agents, etc).

MIRI: Corrigibility. Designing AI systems without incentives to resist corrective modifications by their creators.

Laurent Orseau: Wireheading. An investigation into how different types of artificial agents respond to wireheading opportunities (unintended shortcuts to maximize their objective function).

Collections of papers

MIRI publications

FHI publications

If you want to go into AI safety research, check out these guidelines and the AI Safety Syllabus.

Thanks to Ben Sancetta, Taymon Beal and Janos Kramar for their feedback on this post.

This article was originally posted on Victoria Krakovna’s blog.