Contents
Research updates
- New at IAFF: Some Problems with Making Induction Benign; Entangled Equilibria and the Twin Prisoners’ Dilemma; Generalizing Foundations of Decision Theory
- New at AI Impacts: Changes in Funding in the AI Safety Field; Funding of AI Research
- MIRI Research Fellow Andrew Critch has started a two-year stint at UC Berkeley’s Center for Human-Compatible AI, helping launch the research program there.
- “Using Machine Learning to Address AI Risk”: Jessica Taylor explains our AAMLS agenda (in video and blog versions) by walking through six potential problems with high-performing ML systems.
General updates
- Why AI Safety?: A quick summary (originally posted during our fundraiser) of the case for working on AI risk, including notes on distinctive features of our approach and our goals for the field.
- Nate Soares attended “Envisioning and Addressing Adverse AI Outcomes,” an event pitting red-team attackers against defenders in a variety of AI risk scenarios.
- We also attended an AI safety strategy retreat run by the Center for Applied Rationality.
News and links
- Ray Arnold provides a useful list of ways the average person can help with AI safety.
- New from OpenAI: attacking machine learning with adversarial examples.
- OpenAI researcher Paul Christiano explains his view of human intelligence:
- I think of my brain as a machine driven by a powerful reinforcement learning agent. The RL agent chooses what thoughts to think, which memories to store and retrieve, where to direct my attention, and how to move my muscles. The “I” who speaks and deliberates is implemented by the RL agent, but is distinct and has different beliefs and desires. My thoughts are outputs and inputs to the RL agent; they are not what the RL agent “feels like from the inside.”
- Christiano describes three directions and desiderata for AI control: reliability and robustness, reward learning, and deliberation and amplification.
- Sarah Constantin argues that existing techniques won’t scale up to artificial general intelligence absent major conceptual breakthroughs.
- The Future of Humanity Institute and the Centre for the Study of Existential Risk ran a “Bad Actors and AI” workshop.
- FHI is seeking interns in reinforcement learning and AI safety.
- Michael Milford argues against brain-computer interfaces as an AI risk strategy.
- Open Philanthropy Project head Holden Karnofsky explains why he sees fewer benefits from public discourse than he used to.