New Center for Human-Compatible AI
Congratulations to Stuart Russell for his recently announced launch of the Center for Human-Compatible AI!
The new center will be funded primarily by a generous $5,555,550 grant from the Open Philanthropy Project. The center will focus on research into value alignment: developing novel methods by which AI systems and robots learn what a human actually wants, rather than relying solely on their initial programming.
Russell is best known as the co-author of Artificial Intelligence: A Modern Approach, which has become the standard textbook for AI students. In recent years, however, he has also become an increasingly prominent advocate for AI safety research and for ensuring that the goals of artificial intelligence systems align with those of humans.
In a statement to FLI, Russell (who also sits on the FLI Science Advisory Board) said:
“I’m thrilled to have the opportunity to launch a serious attack on what is — as Nick Bostrom has called it — ‘the essential task of our age.’ It’s obviously in the very early stages but our work (funded previously by FLI) is already leading to some surprising new ideas for what safe AI systems might look like. We hope to find some excellent PhD students and postdocs and to start training the researchers who will take this forward.”
An example of this type of research appears in a paper published this month by Russell and colleagues on Cooperative Inverse Reinforcement Learning (CIRL). In inverse reinforcement learning, an AI system or robot infers a human's goals by observing the human's behavior in a real-world or simulated environment (a toy sketch of this setup follows the list below); CIRL frames this learning as a cooperative game between the human and the machine, making it a potentially more effective way for the AI to learn what the human wants. In a press release about the new center, the Open Philanthropy Project listed other possible research avenues, such as:
- “Value alignment through, e.g., inverse reinforcement learning from multiple sources (such as text and video).
- “Value functions defined by partially observable and partially defined terms (e.g. ‘health,’ ‘death’).
- “The structure of human value systems, and the implications of computational limitations and human inconsistency.
- “Conceptual questions including the properties of ideal value systems, tradeoffs among humans and long-term stability of values.”
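To make the idea concrete, here is a minimal sketch of plain inverse reinforcement learning: a learner watches a "human" act in a tiny environment and recovers a reward function that explains the behavior. This is not the CIRL algorithm from the paper, just a simplified maximum-entropy-style IRL loop on a five-state chain; the environment, features, and hyperparameters are all illustrative assumptions.

```python
# Toy inverse reinforcement learning: infer a reward function from observed
# behavior. A simplified maximum-entropy-style IRL loop; illustrative only.
import numpy as np

N_STATES = 5          # states 0..4 in a chain; the hidden goal is state 4
ACTIONS = (-1, +1)    # move left / move right
GAMMA = 0.9
HORIZON = 20

def step(s, a):
    """Deterministic chain dynamics: clamp to the ends of the chain."""
    return min(max(s + a, 0), N_STATES - 1)

# --- 1. Demonstrations: the "human" always moves right toward the goal. ---
demos = []
for _ in range(50):
    s, traj = 0, []
    for _ in range(HORIZON):
        traj.append(s)
        s = step(s, +1)
    demos.append(traj)

# Empirical state-visitation counts of the demonstrator (the features here
# are simply one-hot state indicators, so visitation counts are the
# expert's feature expectations).
expert_svf = np.zeros(N_STATES)
for traj in demos:
    for s in traj:
        expert_svf[s] += 1
expert_svf /= len(demos)

# --- 2. Fit reward weights by matching the expert's visitation counts. ---
def soft_policy(reward):
    """Soft value iteration -> stochastic policy pi[state, action_index].
    A stationary approximation of the max-ent trajectory distribution."""
    V = np.zeros(N_STATES)
    for _ in range(100):
        Q = np.array([[reward[s] + GAMMA * V[step(s, a)]
                       for a in ACTIONS] for s in range(N_STATES)])
        Qmax = Q.max(axis=1, keepdims=True)          # stable log-sum-exp
        V = (Qmax + np.log(np.exp(Q - Qmax).sum(axis=1, keepdims=True))).ravel()
    return np.exp(Q - V[:, None])                     # pi = exp(Q - V)

def expected_svf(policy):
    """Expected state-visitation counts over HORIZON steps, starting at 0."""
    d = np.zeros(N_STATES); d[0] = 1.0
    svf = np.zeros(N_STATES)
    for _ in range(HORIZON):
        svf += d
        d_next = np.zeros(N_STATES)
        for s in range(N_STATES):
            for ai, a in enumerate(ACTIONS):
                d_next[step(s, a)] += d[s] * policy[s, ai]
        d = d_next
    return svf

reward = np.zeros(N_STATES)
for _ in range(200):
    # Gradient of the max-ent log-likelihood with one-hot features:
    # expert visitation counts minus the model's expected counts.
    grad = expert_svf - expected_svf(soft_policy(reward))
    reward += 0.05 * grad

print("learned reward (shifted to min 0):", np.round(reward - reward.min(), 2))
# The learned reward comes out highest at state 4, recovering the hidden goal
# purely from observed behavior.
```

The key design point, and the one CIRL builds on, is that the reward function is never given to the learner; it is inferred so that the observed human behavior looks (approximately) optimal under it.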
Other funders include the Future of Life Institute and the Defense Advanced Research Projects Agency. Co-PIs and collaborators on the center include:
- Pieter Abbeel, Associate Professor of Computer Science, UC Berkeley
- Anca Dragan, Assistant Professor of Computer Science, UC Berkeley
- Tom Griffiths, Professor of Psychology and Cognitive Science, UC Berkeley
- Bart Selman, Professor of Computer Science, Cornell University
- Joseph Halpern, Professor of Computer Science, Cornell University
- Michael Wellman, Professor of Computer Science, University of Michigan
- Satinder Singh Baveja, Professor of Computer Science, University of Michigan
In their press release, the Open Philanthropy Project added:
“We also believe that supporting Professor Russell’s work in general is likely to be beneficial. He appears to us to be more focused on reducing potential risks of advanced artificial intelligence (particularly the specific risks we are most focused on) than any comparably senior, mainstream academic of whom we are aware. We also see him as an effective communicator with a good reputation throughout the field.”
About the Future of Life Institute
The Future of Life Institute (FLI) is a global non-profit with a team of 20+ full-time staff operating across the US and Europe. Since its founding in 2014, FLI has worked to steer the development of transformative technologies towards benefitting life and away from extreme large-scale risks. Find out more about our mission or explore our work.