AI Safety Research

Owen Cotton-Barratt

Director of Research: Centre for Effective Altruism, Oxford

Future of Humanity Institute, Oxford

Project: Decision-relevant uncertainty in AI safety

Amount Recommended:    $119,670

Project Summary

What are the most important projects for reducing the risk of harm from superintelligent artificial intelligence? We will probably not have to deal with such systems for many years – and we do not expect they will be developed with the same architectures we use today. That may make us want to focus on developing long-term capabilities in AI safety research. On the other hand, there are forces pushing us towards working on near-term problems. We suffer from ‘near-sightedness’ and are better at finding the answer to questions that are close at hand. Just as important, work on long-term problems can happen in the future and get extra people attending to it, while work on near-term problems has to happen now if it is to happen at all.

This project models the trade-offs we make when carrying out AI safety projects that aim at various horizons, and focused on specific architectures. It estimates crucial parameters – like the time-horizon probability distribution and how near-sighted we tend to be. It uses that model to work out what the AI safety community should be funding, and what it should call on policymakers to do.

Technical Abstract

The advent of human-level artificial intelligence (HLAI) would pose a challenge for society. The most cost-effective work on this challenge depends on the time at which we achieve HLAI, on the architecture which produces HLAI, and on whether the first HLAI is likely to be rapidly superseded. For example, direct work on safety issues is preferable if we will achieve HLAI soon, while theoretical work and capability building is important for more distant scenarios.

This project develops a model for the marginal cost-effectiveness of extra resources in AI safety.  The model accounts for uncertainty over scenarios and over work aimed at those scenarios, and for diminishing marginal returns for work. A major part of the project is parameter estimation.  We will estimate key parameters based on existing work where possible (timeline probability distributions), new work (‘near-sightedness’, using historical predictions of mitigation strategies for coming challenges), and expert elicitation, and combine these into a joint probability distribution representing our current best understanding of the likelihood of different scenarios.  The project will then make recommendations for the AI safety community, and for policy-makers, on prioritising between types of AI safety work.

Optimizing AI Safety Research: An Interview With Owen Cotton-Barratt

Artificial intelligence poses a myriad of risks to humanity. From privacy concerns, to algorithmic bias and “black box” decision making, to broader questions of value alignment, recursive self-improvement, and existential risk from superintelligence — there’s no shortage of AI safety issues.

AI safety research aims to address all of these concerns. But with limited funding and too few researchers, trade-offs in research are inevitable. In order to ensure that the AI safety community tackles the most important questions, researchers must prioritize their causes.

Owen Cotton-Barratt, along with his colleagues at the Future of Humanity Institute (FHI) and the Centre for Effective Altruism (CEA), looks at this ‘cause prioritization’ for the AI safety community. They analyze which projects are more likely to help mitigate catastrophic or existential risks from highly-advanced AI systems, especially artificial general intelligence (AGI). By modeling trade-offs between different types of research, Cotton-Barratt hopes to guide scientists toward more effective AI safety research projects.

Technical and Strategic Work

The first step of cause prioritization is understanding the work already being done. Broadly speaking, AI safety research happens in two domains: technical work and strategic work.

AI’s technical safety challenge is to keep machines safe and secure as they become more capable and creative. By making AI systems more predictable, more transparent, and more robustly aligned with our goals and values, we can significantly reduce the risk of harm. Technical safety work includes Stuart Russell’s research on reinforcement learning and Dan Weld’s work on explainable machine learning, since they’re improving the actual programming in AI systems.

In addition, the Machine Intelligence Research Institute (MIRI) recently released a technical safety agenda aimed at aligning machine intelligence with human interests in the long term, while OpenAI, another non-profit AI research company, is investigating the “many research problems around ensuring that modern machine learning systems operate as intended,” following suggestions from the seminal paper Concrete Problems in AI Safety.

Strategic safety work is broader, and asks how society can best prepare for and mitigate the risks of powerful AI. This research includes analyzing the political environment surrounding AI development, facilitating open dialogue between research areas, disincentivizing arms races, and learning from game theory and neuroscience about probable outcomes for AI. Yale professor Allan Dafoe has recently focused on strategic work, researching the international politics of artificial intelligence and consulting for governments, AI labs and nonprofits about AI risks. And Yale bioethicist Wendell Wallach, apart from his work on “silo busting,” is researching forms of global governance for AI.

Cause prioritization is strategy work, as well. Cotton-Barratt explains, “Strategy work includes analyzing the safety landscape itself and considering what kind of work do we think we’re going to have lots of, what are we going to have less of, and therefore helping us steer resources and be more targeted in our work.”

Who Needs More Funding?

As the graph above illustrates, AI safety spending has grown significantly since 2015. And while more money doesn’t always translate into improved results, funding patterns are easy to assess and can say a lot about research priorities. Seb Farquhar, Cotton-Barratt’s colleague at CEA, wrote a post earlier this year analyzing AI safety funding and suggesting ways to better allocate future investments.

To start, he suggests that the technical research community acquire more personal investigators to take the research agenda, detailed in Concrete Problems in AI Safety, forward. OpenAI is already taking a lead on this. Additionally, the community should go out of its way to ensure that emerging AI safety centers hire the best candidates, since these researchers will shape each center’s success for years to come.

In general, Farquhar notes that strategy, outreach and policy work haven’t kept up with the overall growth of AI safety research. He suggests that more people focus on improving communication about long-run strategies between AI safety research teams, between the AI safety community and the broader AI community, and between policymakers and researchers. Building more PhD and Masters courses on AI strategy and policy could establish a pipeline to fill this void, he adds.

To complement Farquhar’s data, Cotton-Barratt’s colleague Max Dalton created a mathematical model to track how more funding and more people working on a safety problem translate into useful progress or solutions. The model tries to answer such questions as: if we want to reduce AI’s existential risks, how much of an effect do we get by investing money in strategy research versus technical research?

In general, technical research is easier to track than strategic work in mathematical models. For example, spending more on strategic ethics research may be vital for AI safety, but it’s difficult to quantify that impact. Improving models of reinforcement learning, however, can produce safer and more robustly-aligned machines. With clearer feedback loops, these technical projects fit best with Dalton’s models.

Near-sightedness and AGI

But these models also confront major uncertainty. No one really knows when AGI will be developed, and this makes it difficult to determine the most important research. If AGI will be developed in five years, perhaps researchers should focus only on the most essential safety work, such as improving transparency in AI systems. But if we have thirty years, researchers can probably afford to dive into more theoretical work.

Moreover, no one really knows how AGI will function. Machine learning and deep neural networks have ushered in a new AI revolution, but AGI will likely be developed on architectures far different from AlphaGo and Watson.

This makes some long-term safety research a risky investment, even if, as many argue, it is the most important research we can do. For example, researchers could spend years making deep neural nets safe and transparent, only to find their work wasted when AGI develops on an entirely different programming architecture.

Cotton-Barratt attributes this issue to ‘nearsightedness,’ and discussed it in a recent talk at Effective Altruism Global this summer. Humans often can’t anticipate disruptive change, and AI researchers are no exception.

“Work that we might do for long-term scenarios might turn out to be completely confused because we weren’t thinking of the right type of things,” he explains. “We have more leverage over the near-term scenarios because we’re more able to assess what they’re going to look like.”

Any additional AI safety research is better than none, but given the unknown timelines and the potential gravity of AI’s threats to humanity, we’re better off pursuing — to the extent possible — the most effective AI safety research.

By helping the AI research portfolio advance in a more efficient and comprehensive direction, Cotton-Barratt and his colleagues hope to ensure that when machines eventually outsmart us, we will have asked — and hopefully answered — the right questions.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.


  1. Ethical AI: June 8, 2016. Oxford.
    • Owen presented new ideas at a one-day workshop at Oxford. He has further developed informal models of likely crucial parameters to include in the models, and he now believes that the model should additionally include a division between scenarios where a single AI-enabled actor gains a decisive strategic advantage, and ones where this does not occur.

Ongoing Projects

  1. Strategic AI Research Centre (SAIRC) – Future of Humanity Institute
    • Together with Toby Ord and Andrew Snyder-Beattie, Owen drafted short, but comprehensive topic outlines with the purpose of structuring the research agenda of SAIRC. These papers have condensed the past research and tacit knowledge into several concrete outlines and a clear research agenda. The three pieces cover multipolar versus singleton scenarios, the risks from different speeds of intelligence takeoffs, and the comparative risks of different AI research trajectories.