Decision-relevant uncertainty in AI safety
What are the most important projects for reducing the risk of harm from superintelligent artificial intelligence? We will probably not have to deal with such systems for many years – and we do not expect them to be built with the architectures we use today. That may push us towards building long-term capabilities in AI safety research. On the other hand, there are forces pushing us towards working on near-term problems. We suffer from ‘near-sightedness’: we are better at answering questions that are close at hand. Just as importantly, work on long-term problems can be deferred to the future, when more people will be attending to it, while work on near-term problems has to happen now if it is to happen at all.
This project models the trade-offs involved in carrying out AI safety projects that aim at different time horizons and focus on specific architectures. It estimates crucial parameters – such as the probability distribution over time horizons and the degree of our near-sightedness – and uses the model to work out what the AI safety community should be funding, and what it should call on policymakers to do.
The advent of human-level artificial intelligence (HLAI) would pose a challenge for society. The most cost-effective work on this challenge depends on when we achieve HLAI, on the architecture that produces it, and on whether the first HLAI is likely to be rapidly superseded. For example, direct work on safety issues is preferable if we will achieve HLAI soon, while theoretical work and capability building are more important in more distant scenarios.
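One way to make this dependence explicit – a sketch only, using notation that is not part of the project description itself – is to write the expected usefulness of a unit of work of type $w$ as an average over scenarios:

\[
\mathbb{E}[U(w)] \;=\; \sum_{s} P(s)\, u(w, s),
\]

where each scenario $s$ specifies an arrival time, an architecture, and whether the first HLAI is quickly superseded; $P(s)$ is its probability; and $u(w, s)$ is how useful work of type $w$ turns out to be if $s$ obtains. On this reading, the example above says that $u(\text{direct work}, s)$ is high in early-arrival scenarios while $u(\text{theory and capability building}, s)$ is high in late-arrival ones.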
This project develops a model of the marginal cost-effectiveness of extra resources in AI safety. The model accounts for uncertainty over scenarios, for uncertainty over the value of work aimed at those scenarios, and for diminishing marginal returns to work. A major part of the project is parameter estimation. We will estimate key parameters from existing work where possible (timeline probability distributions), from new work (‘near-sightedness’, studied via historical predictions about how to mitigate then-upcoming challenges), and from expert elicitation, then combine these into a joint probability distribution representing our current best understanding of how likely the different scenarios are. The project will then make recommendations for the AI safety community, and for policymakers, on prioritising between types of AI safety work.
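To illustrate the kind of model this describes, the sketch below computes the expected marginal value of an extra unit of resources for work aimed at different horizons. It is a minimal toy, not the project's actual model: the scenario probabilities, the near-sightedness discount, the diminishing-returns exponent, and the functions usefulness and marginal_value are all placeholder assumptions introduced here for illustration.

```python
import numpy as np

# Placeholder scenarios: HLAI arrival year and its assumed probability.
# These numbers are illustrative only, not estimates.
arrival_years = np.array([2030, 2040, 2050, 2070, 2100])
arrival_probs = np.array([0.10, 0.20, 0.25, 0.25, 0.20])

# Work types, each aimed at a horizon (years from now).
work_horizons = {"near-term direct work": 10, "long-term capability building": 50}

CURRENT_YEAR = 2025
NEARSIGHT = 0.03    # assumed per-year discount: work aimed far from the true
                    # arrival date is less useful
RETURNS_EXP = 0.5   # assumed diminishing returns: value of spending x scales as x**0.5

def usefulness(target_horizon, arrival_year):
    """How useful work aimed at `target_horizon` is if HLAI arrives in `arrival_year`."""
    miss = np.abs((CURRENT_YEAR + target_horizon) - arrival_year)
    return np.exp(-NEARSIGHT * miss)

def marginal_value(target_horizon, current_spend, extra=1.0):
    """Expected marginal value of `extra` resources, given diminishing returns."""
    gain = (current_spend + extra) ** RETURNS_EXP - current_spend ** RETURNS_EXP
    expected_usefulness = np.sum(arrival_probs * usefulness(target_horizon, arrival_years))
    return gain * expected_usefulness

for name, horizon in work_horizons.items():
    print(f"{name}: marginal value = {marginal_value(horizon, current_spend=10.0):.3f}")
```

Under these toy assumptions, the comparison between work types turns on the timeline distribution, the near-sightedness discount, and how much is already being spent on each type of work – which is exactly the set of parameters the project proposes to estimate.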