MIRI’s New Technical Research Agenda

Published:

March 18, 2015

Author:

Nate Soares

Luke Muehlhauser is the Executive Director of the Machine Intelligence Research Institute (MIRI), a research institute devoted to studying the technical challenges of ensuring desirable behavior from highly advanced AI agents, including those capable of recursive self-improvement. In this guest blog post, he delves into MIRI’s new research agenda.

MIRI’s current research agenda — summarized in “Aligning Superintelligence with Human Interests” — is focused on technical research problems that must be solved in order to eventually build smarter-than-human AI systems that are reliably aligned with human interests: How can we create an AI agent that will reliably pursue the goals it is given? How can we formally specify beneficial goals? How can we ensure that this agent will assist and cooperate with its programmers as they improve its design, given that mistakes in the initial version are inevitable?

Within this broad area of research, MIRI specializes in problems which have three properties:

(a) We focus on research questions that cannot be delegated to future human-level AI systems (HLAIs). HLAIs will have the incentives and capability to improve e.g. their own machine vision algorithms, but if an HLAI’s preferences themselves are mis-specified, in may never have an incentive to “fix” the mis-specification itself.

(b) We focus on research questions that are tractable today. In the absence of concrete HLAI designs to test and verify, research on these problems must be theoretical and exploratory, but such research should be technical whenever possible so that clear progress can be shown, e.g. by discovering unbounded formal solutions for problems we currently don’t know how to solve even given unlimited computational resources. Such exploratory work is somewhat akin to the toy models Butler Lampson used to study covert channel communication two decades before covert channels were observed in the wild, and is also somewhat akin to quantum algorithms research long before any large-scale quantum computer is built.

(c) We focus on research questions that are uncrowded. Research on e.g. formal verification for near-future AI systems already receives significant funding, whereas MIRI’s chosen research problems otherwise receive limited attention.

Example research problems we will study include:

(1) Corrigibility. How can we build an advanced agent that cooperates with what its creators regard as a corrective intervention in its design, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences?

(2) Value learning. Direct specification of broad human preferences in an advanced agents’ reward/value function is impractical. How can we build an advanced AI that will safely learn to act as intended?

This content was first published at futureoflife.org on March 18, 2015.

About the Future of Life Institute

The Future of Life Institute (FLI) is a global think tank with a team of 20+ full-time staff operating across the US and Europe. FLI has been working to steer the development of transformative technologies towards benefitting life and away from extreme large-scale risks since its founding in 2014. Find out more about our mission or explore our work.

Our content

MIRI’s New Technical Research Agenda

Contents

About the Future of Life Institute

Related content

Other posts about AI, Partner Orgs

Preparing for an AI Economy (with Daniel Susskind)

Will AI Companies Respect Creators’ Rights? (with Ed Newton-Rex)

AI Timelines and Human Psychology (with Sarah Hastings-Woodhouse)

Could Powerful AI Break Our Fragile World? (with Michael Nielsen)

Sign up for the Future of Life Institute newsletter