AI Safety Research

Owen Cotton-Barratt

Director of Research: Centre for Effective Altruism, Oxford

Future of Humanity Institute, Oxford

Project: Decision-relevant uncertainty in AI safety

Amount Recommended: $119,670

Project Summary

What are the most important projects for reducing the risk of harm from superintelligent artificial intelligence? We will probably not have to deal with such systems for many years, and we do not expect them to be developed with the architectures we use today. That may make us want to focus on developing long-term capabilities in AI safety research. On the other hand, there are forces pushing us towards working on near-term problems. We suffer from ‘near-sightedness’ and are better at finding answers to questions that are close at hand. Just as importantly, work on long-term problems can also be done in the future, when more people may be attending to it, while work on near-term problems has to happen now if it is to happen at all.
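The two forces pushing towards near-term work can be made concrete with a toy calculation. This is a minimal sketch under assumptions of our own, not the project's model. The function name `marginal_value_now` is hypothetical. So are the per-year "sight" factor standing in for near-sightedness, the growth rate of the field, and the logarithmic returns that make a marginal unit worth 1 / (total effort).

```python
import math


def marginal_value_now(horizon_years: float,
                       sight_per_year: float,
                       effort_now: float,
                       growth_rate: float) -> float:
    """Toy value of one extra unit of work done today on a problem that
    becomes pressing `horizon_years` from now.

    Hypothetical assumptions, for illustration only:
    - near-sightedness: work aimed h years ahead is on target with
      probability sight_per_year ** h;
    - annual effort on the field starts at `effort_now` and grows by
      `growth_rate` per year, and all effort spent before the problem
      arrives counts towards solving it;
    - logarithmic returns, so a marginal unit is worth 1 / total effort.
    """
    on_target = sight_per_year ** horizon_years
    years_available = max(1, math.ceil(horizon_years))
    # Total effort the problem receives before it arrives (one term per year).
    total_effort = sum(effort_now * (1 + growth_rate) ** t
                       for t in range(years_available))
    return on_target / total_effort


# Hypothetical parameters: 95% "sight" per year, field growing 10% a year.
for horizon in (2, 10, 30):
    print(horizon, round(marginal_value_now(horizon, 0.95, 1.0, 0.10), 4))
```

With these made-up numbers, the value of a marginal unit of work done today falls steeply with the horizon. The sketch deliberately omits the opposing consideration in the text: long-term capability building may pay off even under architectures we do not yet use.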

This project models the trade-offs we face when choosing between AI safety projects that aim at different time horizons or focus on specific architectures. It estimates crucial parameters, such as the probability distribution over time horizons and how near-sighted we tend to be. It then uses the resulting model to work out what the AI safety community should be funding, and what it should call on policymakers to do.

Technical Abstract

The advent of human-level artificial intelligence (HLAI) would pose a challenge for society. The most cost-effective work on this challenge depends on the time at which we achieve HLAI, on the architecture which produces HLAI, and on whether the first HLAI is likely to be rapidly superseded. For example, direct work on safety issues is preferable if we will achieve HLAI soon, while theoretical work and capability building is important for more distant scenarios.

This project develops a model for the marginal cost-effectiveness of extra resources in AI safety. The model accounts for uncertainty over scenarios and over work aimed at those scenarios, and for diminishing marginal returns to work. A major part of the project is parameter estimation. We will estimate key parameters based on existing work where possible (timeline probability distributions), new work (‘near-sightedness’, using historical predictions of mitigation strategies for coming challenges), and expert elicitation. We will then combine these into a joint probability distribution representing our current best understanding of the likelihood of different scenarios. Finally, the project will make recommendations for the AI safety community, and for policymakers, on prioritising between types of AI safety work.
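The structure of such a model can be sketched in a few lines. This is a minimal illustration under assumptions of our own, not the project's specification. Scenarios are treated as discrete, each carrying a probability from the joint distribution. Each type of work has a hypothetical "relevance" to each scenario. Returns are assumed logarithmic, so the marginal value of a unit of resources falls as 1 / resources. The class and function names, scenario labels, and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Scenario:
    name: str
    probability: float  # weight in the joint distribution over scenarios


@dataclass
class WorkType:
    name: str
    resources: float  # resources already committed to this kind of work
    # Fraction of full value this work delivers if a given scenario occurs.
    relevance: dict[str, float] = field(default_factory=dict)


def marginal_cost_effectiveness(work: WorkType,
                                scenarios: list[Scenario]) -> float:
    """Expected value of one extra unit of resources spent on `work`,
    assuming logarithmic returns (marginal value 1 / resources)."""
    total_p = sum(s.probability for s in scenarios)
    if not math.isclose(total_p, 1.0):
        raise ValueError(f"scenario probabilities sum to {total_p}, not 1")
    expected_relevance = sum(s.probability * work.relevance.get(s.name, 0.0)
                             for s in scenarios)
    return expected_relevance / work.resources


def best_marginal_use(works: list[WorkType],
                      scenarios: list[Scenario]) -> WorkType:
    """Work type with the highest marginal cost-effectiveness."""
    return max(works, key=lambda w: marginal_cost_effectiveness(w, scenarios))


SOON = "soon, current architectures"
LATE = "late, new architectures"
direct = WorkType("direct safety work", 10.0, {SOON: 1.0, LATE: 0.1})
theory = WorkType("theory and capability building", 5.0, {SOON: 0.2, LATE: 0.6})

for p_soon in (0.2, 0.9):
    scenarios = [Scenario(SOON, p_soon), Scenario(LATE, 1 - p_soon)]
    print(p_soon, best_marginal_use([direct, theory], scenarios).name)
```

With these made-up inputs, the recommendation switches from theoretical work to direct work as the probability of early HLAI rises, in line with the example given earlier in the abstract. Near-sightedness could be added by discounting relevance for scenarios further in the future. A split between scenarios with and without a decisive strategic advantage would enter simply as additional scenarios.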


Workshops

  1. Ethical AI: June 8, 2016, Oxford.
    • Owen presented new ideas at a one-day workshop in Oxford. He has since further developed informal models of the crucial parameters likely to belong in the main model. He now believes the model should also distinguish between scenarios in which a single AI-enabled actor gains a decisive strategic advantage and those in which this does not occur.

Ongoing Projects

  1. Strategic AI Research Centre (SAIRC) – Future of Humanity Institute
    • Together with Toby Ord and Andrew Snyder-Beattie, Owen drafted short but comprehensive topic outlines to structure SAIRC's research agenda. These papers condense past research and tacit knowledge into several concrete outlines and a clear research agenda. The three pieces cover multipolar versus singleton scenarios, the risks from different speeds of intelligence takeoff, and the comparative risks of different AI research trajectories.