Luke Muehlhauser is the Executive Director of the Machine Intelligence Research Institute (MIRI), a research institute devoted to studying the technical challenges of ensuring desirable behavior from highly advanced AI agents, including those capable of recursive self-improvement. In this guest blog post, he delves into MIRI’s new research agenda.
MIRI’s current research agenda, summarized in “Aligning Superintelligence with Human Interests,” focuses on the technical research problems that must be solved in order to eventually build smarter-than-human AI systems that are reliably aligned with human interests. How can we create an AI agent that will reliably pursue the goals it is given? How can we formally specify beneficial goals? And how can we ensure that such an agent will assist and cooperate with its programmers as they improve its design, given that mistakes in the initial version are inevitable?
Within this broad area of research, MIRI specializes in problems which have three properties:
(a) We focus on research questions that cannot be delegated to future human-level AI systems (HLAIs). HLAIs will have the incentives and capability to improve, e.g., their own machine vision algorithms, but if an HLAI’s preferences themselves are mis-specified, it may never have an incentive to “fix” that mis-specification.
(b) We focus on research questions that are tractable today. In the absence of concrete HLAI designs to test and verify, research on these problems must be theoretical and exploratory. Such research should nevertheless be technical whenever possible so that clear progress can be shown, e.g. by discovering unbounded formal solutions to problems we currently don’t know how to solve even given unlimited computational resources. This exploratory work is somewhat akin to the toy models Butler Lampson used to study covert channel communication two decades before covert channels were observed in the wild, and also somewhat akin to quantum algorithms research conducted long before any large-scale quantum computer has been built.
(c) We focus on research questions that are uncrowded. Research on, e.g., formal verification for near-future AI systems already receives significant funding, whereas the problems MIRI has chosen receive little attention elsewhere.
Example research problems we will study include:
(1) Corrigibility. How can we build an advanced agent that cooperates with what its creators regard as a corrective intervention in its design, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences?
(2) Value learning. Directly specifying broad human preferences in an advanced agent’s reward/value function is impractical. How can we build an advanced AI that will safely learn to act as intended?