The U.S. Worldwide Threat Assessment Includes Warnings of Cyber Attacks, Nuclear Weapons, Climate Change, etc.

Last Thursday – just one day before the WannaCry ransomware attack would shut down 16 hospitals in the UK and ultimately hit hundreds of thousands of organizations and individuals in over 150 countries – the Director of National Intelligence, Daniel Coats, released the Worldwide Threat Assessment of the US Intelligence Community.

Large-scale cyber attacks are among the first risks cited in the document, which warns that “cyber threats also pose an increasing risk to public health, safety, and prosperity as cyber technologies are integrated with critical infrastructure in key sectors.”

Perhaps the other most prescient, or at least well-timed, warning in the document relates to North Korea’s ambitions to create nuclear intercontinental ballistic missiles (ICBMs). Coats writes:

“Pyongyang is committed to developing a long-range, nuclear-armed missile that is capable of posing a direct threat to the United States; it has publicly displayed its road-mobile ICBMs on multiple occasions. We assess that North Korea has taken steps toward fielding an ICBM but has not flight-tested it.”

This past Sunday, North Korea performed a missile test launch, which many experts believe shows considerable progress toward the development of an ICBM. The report hints, however, that this may be less of an actual threat from North Korea and more for show. “We have long assessed that Pyongyang’s nuclear capabilities are intended for deterrence, international prestige, and coercive diplomacy,” says Coats in the report.

More Nuclear Threats

The Assessment also addresses the potential of nuclear threats from China and Pakistan. China “continues to modernize its nuclear missile force by adding more survivable road-mobile systems and enhancing its silo-based systems. This new generation of missiles is intended to ensure the viability of China’s strategic deterrent by providing a second-strike capability.” In addition, China is working to develop “its first long-range, sea-based nuclear capability.”

Meanwhile, though Pakistan’s nuclear program doesn’t pose a direct threat to the U.S., advances in Pakistan’s nuclear capabilities could risk further destabilization along the India-Pakistan border.

The report warns: “Pakistan’s pursuit of tactical nuclear weapons potentially lowers the threshold for their use.” And of the ongoing conflicts between Pakistan and India, it says, “Increasing numbers of firefights along the Line of Control, including the use of artillery and mortars, might exacerbate the risk of unintended escalation between these nuclear-armed neighbors.”

This could be especially problematic because “early deployment during a crisis of smaller, more mobile nuclear weapons would increase the amount of time that systems would be outside the relative security of a storage site, increasing the risk that a coordinated attack by non-state actors might succeed in capturing a complete nuclear weapon.”

Even a small nuclear war between India and Pakistan could trigger a nuclear winter that could send the planet into a mini ice age and starve an estimated 1 billion people.

Artificial Intelligence

Nukes aren’t the only weapons the government is worried about. The report also expresses concern about the impact of artificial intelligence on weaponry: “Artificial Intelligence (AI) is advancing computational capabilities that benefit the economy, yet those advances also enable new military capabilities for our adversaries.”

Coats worries that AI could negatively impact other aspects of society, saying, “The implications of our adversaries’ abilities to use AI are potentially profound and broad. They include an increased vulnerability to cyber attack, difficulty in ascertaining attribution, facilitation of advances in foreign weapon and intelligence systems, the risk of accidents and related liability issues, and unemployment.”

Space Warfare

But threats of war are not expected to remain Earth-bound. The Assessment also addresses issues associated with space warfare, which could put satellites and military communication at risk.

For example, the report warns that “Russia and China perceive a need to offset any US military advantage derived from military, civil, or commercial space systems and are increasingly considering attacks against satellite systems as part of their future warfare doctrine.”

The report also adds that “the global threat of electronic warfare (EW) attacks against space systems will expand in the coming years in both number and types of weapons.” Coats expects that EW attacks will “focus on jamming capabilities against dedicated military satellite communications” and against GPS, among others.

Environmental Risks & Climate Change

Plenty of global threats do remain Earth-bound though, and not all are directly related to military concerns. The government also sees environmental issues and climate change as potential threats to national security.

The report states, “The trend toward a warming climate is forecast to continue in 2017. … This warming is projected to fuel more intense and frequent extreme weather events that will be distributed unequally in time and geography. Countries with large populations in coastal areas are particularly vulnerable to tropical weather events and storm surges, especially in Asia and Africa.”

In addition to rising temperatures, “global air pollution is worsening as more countries experience rapid industrialization, urbanization, forest burning, and agricultural waste incineration, according to the World Health Organization (WHO). An estimated 92 percent of the world’s population live in areas where WHO air quality standards are not met.”

According to the Assessment, biodiversity loss will also continue to pose an increasing threat to humanity. The report suggests global biodiversity “will likely continue to decline due to habitat loss, overexploitation, pollution, and invasive species, … disrupting ecosystems that support life, including humans.”

The Assessment goes on to raise concerns about the rate at which biodiversity loss is occurring. It says, “Since 1970, vertebrate populations have declined an estimated 60 percent … [and] populations in freshwater systems declined more than 80 percent. The rate of species loss worldwide is estimated at 100 to 1,000 times higher than the natural background extinction rate.”

Other Threats

The examples above are just a sampling of the risks highlighted in the Assessment. A great deal of the report covers threats of terrorism, issues with Russia, China and other regional conflicts, organized crime, economics, and even illegal fishing. Overall, the report is relatively accessible and provides a quick summary of the greatest known risks that could threaten not only the U.S., but also other countries in 2017. You can read the report in its entirety here.

MIRI’s May 2017 Newsletter

Research updates

General updates

  • Our strategy update discusses changes to our AI forecasts and research priorities, new outreach goals, a MIRI/DeepMind collaboration, and other news.
  • MIRI is hiring software engineers! If you’re a programmer who’s passionate about MIRI’s mission and wants to directly support our research efforts, apply here to trial with us.
  • MIRI Assistant Research Fellow Ryan Carey has taken on an additional affiliation with the Centre for the Study of Existential Risk, and is also helping edit an issue of Informatica on superintelligence.

News and links

GP-write and the Future of Biology

Imagine going to the airport, but instead of walking through – or waiting in – long and tedious security lines, you could walk through a hallway that looks like a terrarium. No lines or waiting. Just a lush, indoor garden. But these plants aren’t something you can find in your neighbor’s yard – their genes have been redesigned to act as sensors, and the plants will change color if someone walks past with explosives.

The Genome Project Write (GP-write) got off to a rocky start last year when it held a “secret” meeting that prohibited journalists. News of the event leaked, and the press quickly turned to fears of designer babies and Frankenstein-like creations. This year, organizers of the meeting learned from the 2016 debacle. Not only did they invite journalists, but they also highlighted work by researchers like June Medford, whose plant research could lead to advancements like the security garden above.

Jef Boeke, one of the lead authors of the GP-write Grand Challenge, emphasized that this project was not just about writing the human genome. “The notion that we could write a human genome is simultaneously thrilling to some and not so thrilling to others,” Boeke told the group. “We recognize that this will take a lot of discussion.”

Boeke explained that the GP-write project will focus on the genomes of cells, and the researchers involved are not trying to produce an organism. He added that this work could be used to solve problems associated with climate change and the environment, invasive species, pathogens, and food insecurity.

To learn more about why this project is important, I spoke with genetics researcher John Min about what GP-write is and what it could accomplish. Min is not directly involved with GP-write, but he works with George Church, another one of the lead authors of the project.

Min explained, “We aren’t currently capable of making DNA as long as human chromosomes – we can’t make that from scratch in the laboratory. In this case, they’ll use CRISPR to make very specific cuts in the genome of an existing cell, and either use synthesized DNA to replace whole chunks or add new functionality in.”

He added, “An area of potentially exciting research with this new project is to create a human cell immune to all known viruses. If we can create this in the lab, then we can start to consider how to apply it to people around the world. Or we can use it to build an antibody library against all known viruses. Right now, tackling such a project is completely unaffordable – the costs are just too astronomic.”

But costs aren’t the only reason GP-write is hugely ambitious. It’s also incredibly challenging science. To achieve the objectives mentioned above, scientists will synthesize, from basic chemicals, the building blocks of life. Synthesizing a genome involves slowly editing out tiny segments of genes and replacing them with the new chemical version. Then researchers study each edit to determine what, if anything, changed for the organism involved. Then they repeat this for every single known gene. It is a tedious, time-consuming process, rife with errors and failures that send scientists back to the drawing board over and over, until they finally get just one gene right. On top of that, Min explained, it’s not clear how to tell when a project transitions from editing a cell to synthesizing it. “How many edits can you make to an organism’s genome before you can say you’ve synthesized it?” he asked.

Clyde Hutchison, working with Craig Venter, recently came closest to answering that question. He and Venter’s team published the first paper depicting attempts to synthesize a simple bacterial genome. The project involved understanding which genes were essential, which genes were inessential, and discovering that some genes are “quasi-essential.” In the process, they uncovered “149 genes with unknown biological functions, suggesting the presence of undiscovered functions that are essential for life.”

This discovery tells us two things. First, it shows just how enormous the GP-write project is. To find 149 unknown genes in simple bacteria offers just a taste of how complicated the genomes of more advanced organisms will be. Kris Saha, Assistant Professor of Biomedical Engineering at the University of Wisconsin-Madison, explained this to the Genetic Experts News Service:

“The evolutionary leap between a bacterial cell, which does not have a nucleus, and a human cell is enormous. The human genome is organized differently and is much more complex. […] We don’t entirely understand how the genome is organized inside of a typical human cell. So given the heroic effort that was needed to make a synthetic bacterial cell, a similar if not more intense effort will be required – even to make a simple mammalian or eukaryotic cell, let alone a human cell.”

Second, this discovery gives us a clue as to how much more GP-write could tell us about how biology and the human body work. If we can uncover unknown functions within DNA, how many diseases could we eliminate? Could we cure aging? Could we increase our energy levels? Could we boost our immunities? Are there risks we need to prepare for?

The best assumption for that last question is: yes.

“Safety is one of our top priorities,” said Church at the event’s press conference, which included other leaders of the project. They said they expect safeguards to be engineered into research “from the get-go,” and part of the review process would include assessments of whether research within the project could be developed to have both positive and negative outcomes, known as Dual Use Research of Concern (DURC).

The meeting included roughly 250 people from 10 countries with backgrounds in science, ethics, law, government, and more. In general, the energy at the conference was one of excitement about the possibilities that GP-write could unleash.

“This project not only changes the way the world works, but it changes the way we work in the world,” said GP-write lead author Nancy J. Kelley.

Forget the Cold War – Experts say Nuclear Weapons Are a Bigger Risk Today

Until recently, many Americans believed that nuclear weapons don’t represent the same threat as during the Cold War. However, recent events and aggressive posturing among nuclear nations (especially the U.S., Russia, and North Korea) have increased public awareness and concern. These fears were addressed at a recent MIT conference on nuclear weapons.

“The possibility of a nuclear bomb going off is greater today than 20 years ago,” said Ernest Moniz, former Secretary of Energy and a keynote speaker.

California Congresswoman Barbara Lee, another keynote speaker, recently returned from a trip to South Korea and Japan. Of the trip, she said, “We went to the DMZ, and I saw how close to nuclear war we really are.”

Lee suggested that if we want to eliminate nuclear weapons once and for all, this is the time to do it. At the very least, she argued for a common sense nuclear policy of “no first use,” that is, the U.S. won’t launch the first nuclear strike.

“We must prevent the president from launching nuclear weapons without a declaration from Congress,” Lee said.

Under current U.S. nuclear policy, the President is the only person who can launch a nuclear weapon, and no one else’s input is necessary. This policy was adopted, at least in part, to ensure the safety and usability of the land-based arm of the nuclear triad (the other two arms are air- and sea-based).

During the Cold War, the fear was that, if Russia were to attack the U.S., it would first target the land-based missiles in an attempt to limit their use during war. To protect these weapons, the U.S. developed an advanced-warning system that could notify military personnel of an incoming strike, giving the president just enough time to launch the land-based missiles in response.

Weapons launched from Russia would take about 30 minutes to reach the U.S. That means that, within those 30 minutes, the warning system must pick up the signal of incoming missiles. Then, personnel must confirm that the warning is accurate and not a false alarm, of which there have been many. And by the time the information reaches the President, he’ll have around 10 minutes to decide whether to launch a retaliation.

Lisbeth Gronlund with the Union of Concerned Scientists pointed out that not only does this time frame put us at greater risk of an accidental launch, but “cyber attacks are a new unknown.” As a result, she’s also concerned that the risk of a nuclear launch is greater today than during the Cold War.

“If we eliminate our land-based missiles and base our deterrence on nuclear submarines and bombers, which are safe from a Russian attack, then we eliminate the risk of nuclear war caused by false alarms and rushed decisions,” said MIT physics professor Max Tegmark.

But even with growing risks, people who are concerned about nuclear weapons still feel they must compete for public attention with groups who are worried about climate change, income inequality, and women’s rights issues. Jonathan King, a molecular biologist at MIT who has worked to strengthen the Biological Weapons Convention, emphasized that the idea of competition is the wrong approach. Rather, the cost of nuclear weapons and the government’s focus on them actually prevent us from dealing with these other issues.

“The reason we don’t have these things is because tax dollars are going to things like nuclear weapons,” King explained, arguing that if we could free up money that’s currently allotted for nukes, we could finally address technological costs of solving climate problems or building better infrastructure.

The 2017 budget for the United States calls for an increase in military spending of $54 billion. However, as William Hartung, a nuclear weapons and military spending expert, explained, the current U.S. military budget is already larger than those of the next eight countries combined. And the proposed increase in spending for 2017 alone exceeds the total military spending of almost all other countries.

The United States nuclear arsenal itself requires tens of billions of dollars per year, and the U.S. currently plans to spend $1 trillion over the next 30 years to upgrade the nuclear arsenal to be better suited for a first strike. Burning $1 million per hour for the next 30 years would cost roughly a quarter of this budget, leading Hartung to suggest that “burning the money is a better investment.”
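The "quarter of this budget" comparison can be checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: it assumes 365-day years (ignoring leap days) and takes the $1 trillion, 30-year figure at face value, and the function and variable names are made up for this example.

```python
# Rough check of the "burning $1 million per hour" comparison.
# Assumption: 365-day years, leap days ignored.
HOURS_PER_YEAR = 24 * 365


def burn_cost(dollars_per_hour, years):
    """Total dollars burned at a constant hourly rate over the given years."""
    return dollars_per_hour * HOURS_PER_YEAR * years


burned = burn_cost(1_000_000, 30)   # $1 million per hour for 30 years
planned = 1_000_000_000_000         # $1 trillion planned upgrade spending
share = burned / planned

print(f"Burned: ${burned / 1e9:.1f} billion")
print(f"Share of planned spending: {share:.0%}")
```

Under these assumptions the burned total comes to about $263 billion, a little over a quarter of the planned $1 trillion.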

Cambridge Mayor Denise Simmons summed up all of these concerns, saying, “[it] feels like we’re playing with matches outside an explosives factory.”

Reducing the Threat of Nuclear War 2017

Spring Conference at MIT, Saturday, May 6

The growing hostility between the US and Russia, and with North Korea and Iran, makes it more urgent than ever to reduce the risk of nuclear war, as well as to rethink plans to spend a trillion dollars replacing US nuclear weapons with new ones that will be more suited for launching a first strike. Nuclear war can be triggered intentionally or through miscalculation (terror or error), and this conference aims to advocate and organize toward reducing and ultimately eliminating this danger.

This one-day event includes lunch as well as food for thought from a great speaker lineup, including Iran-deal broker Ernie Moniz (MIT, fmr Secretary of Energy), California Congresswoman Barbara Lee, Lisbeth Gronlund (Union of Concerned Scientists), Joe Cirincione (Ploughshares), our former congressman John Tierney, MA state reps Denise Provost and Mike Connolly, and Cambridge Mayor Denise Simmons. It is not an academic conference, but rather one that addresses the political and economic realities, and attempts to stimulate and inform the kinds of social movement needed to change national policy. The focus will be on concrete steps we can take to reduce the risks.


8:45 AM – Registration and coffee

9:15 AM – Welcome from City of Cambridge: Mayor Denise Simmons

9:30 AM – Program for the Day: Prof. Jonathan King (MIT, Peace Action)

9:45 AM – Session I. The Pressing Need for Nuclear Disarmament

– Costs and Profits from Nuclear Weapons Manufacture: William Hartung (Center for International Policy).

– Reasons to Reject the Trillion Dollar Nuclear Weapons Escalation: Joseph Cirincione (Ploughshares Fund).

– Nuclear Weapons Undermine Democracy: Prof. Elaine Scarry (Harvard University)

10:45 AM – Session II. Destabilizing Factors

Chair: Prof. Frank Von Hippel (Princeton University)

– Dangers of Hair Trigger Alert: Lisbeth Gronlund (Union of Concerned Scientists).

– Nuclear Modernization vs. National Security: Prof. Aron Bernstein (MIT, Council for a Livable World).

– Accidents and Unexpected Events: Prof. Max Tegmark (MIT, Future of Life Institute).

– International Tensions and Risks of further Nuclear Proliferation: TBA.

12:00 PM – Lunch Workshops (listed below)

2:00 PM – Session III. Economic and Social Consequences of Excessive Weapons Spending

Chair: Prof. Melissa Nobles (MIT)

– Build Housing Not Bombs: Rev. Paul Robeson Ford (Union Baptist Church).

– Education as a National Priority: Barbara Madeloni (Mass Teachers Association).

– Invest in Minds Not Missiles: Prof. Jonathan King (MIT, Mass Peace Action).

– Build Subways Not Submarines: Fred Salvucci (former Secretary of Transportation).

3:00 PM – Session IV. Current Prospects for Progress

Chair: John Tierney (former US Representative, Council for a Livable World)

– House Steps Toward Nuclear Disarmament: U.S. Representative Barbara Lee.

– Maintaining the Iran Nuclear Agreement: Ernie Moniz (MIT, former Secretary of Energy).

4:15 PM – Session V. Organizing to Reduce the Dangers

Chair: Jim Anderson (President, Peace Action New York State)

– Divesting from Nuclear Weapons Investments: Susi Snyder (Don’t Bank on the Bomb).

– Taxpayers Information and Transparency Acts: State Reps Denise Provost/Mike Connolly.

– Mobilizing the Scientific Community: Prof. Max Tegmark (MIT, Future of Life Institute).

– A National Nuclear Disarmament Organizing Network 2017-2018: Program Committee.

5:00 PM – Adjourn

Conference Workshops:

a) Campus Organizing – Chair: Kate Alexander (Peace Action New York State); Caitlin Forbes (Mass Peace Action); Remy Pontes (Brandeis University); Haleigh Copley-Cunningham (Tufts U); Lucas Perry (Don’t Bank on the Bomb, Future of Life Institute); MIT Students (Nuclear Weapons Matter).

b) Bringing nuclear weapons into physics and history course curricula – Chair: Frank Davis (past President of TERC); Prof. Gary Goldstein (Tufts University); Prof. Aron Bernstein (MIT); Prof. Vincent Intondi (American University); Ray Matsumiya (Oleander Initiative, University of the Middle East Project).

c) Dangerous Conflicts – Chair: Erica Fein (Women’s Action for New Directions); Jim Walsh (MIT Security Studies Program); John Tierney (former US Representative, Council for a Livable World); Subrata Ghoshroy (MIT); Arnie Alpert (New Hampshire AFSC).

d) Municipal and State Initiatives – Chair: Cole Harrison (Mass Peace Action); Rep. Denise Provost (Mass State Legislature); Councilor Dennis Carlone (Cambridge City Councillor and Architect/Urban Designer); Jared Hicks (Our Revolution); Prof. Ceasar McDowell (MIT Urban Studies); Nora Ranney (National Priorities Project).

e) Peace with Justice: People’s Budget and Related Campaigns to Shift Federal Budget Priorities – Chair: Andrea Miller (People Demanding Action); Rep. Mike Connolly (Mass State Legislature); Paul Shannon (AFSC); Madelyn Hoffman (NJPA); Richard Krushnic (Mass Peoples Budget Campaign).

f) Reducing Nuclear Weapons through Treaties and Negotiation – Chair: Prof. Nazli Choucri (MIT); Kevin Martin (National Peace Action); Shelagh Foreman (Mass Peace Action); Joseph Gerson (AFSC); Michel DeGraff (MIT Haiti Project).

g) Strengthening the Connection between Averting Climate Change and Averting Nuclear War – Chair: Prof. Frank Von Hippel (Princeton University); Ed Aguilar (Pennsylvania Peace Action); Geoffrey Supran (Fossil Free MIT); Rosalie Anders (Mass Peace Action).

h) Working with Communities of Faith – Chair: Rev. Thea Keith-Lucas (MIT Radius); Rev. Herb Taylor (Harvard-Epworth United Methodist Church); Pat Ferrone (Mass Pax Christi); Rev. Paul Robeson Ford (Union Baptist Church).


50 Vassar St. Building #34 Rm 101
Cambridge, Massachusetts, 02139


By Red Line: Exit the Kendall Square Red Line Station and walk west (away from Boston) past Ames Street to Vassar Street. Turn left and walk halfway down Vassar to #50 MIT building 34 (broad stairs, set back entrance).

By #1 Bus: Exit in front of MIT Main Entrance. Walk 1/2 block back on Mass Ave to Vassar Street. Turn right and walk half block to #50 MIT Building 34 (broad stairs, set back entrance).

By car: Public Parking Structures are available nearby on Ames Street, between Main and Broadway. A smaller surface lot is on the corner of Mass Ave and Vassar St.


Kate Alexander – Alexander is a peace advocate and researcher with 10 years of experience in community organizing. Her previous work experience includes war crimes research and assistance in a genocide trial in Bosnia and community peace-building work in Northern Uganda. She is a graduate of Brandeis University with a degree in International and Global Studies and a minor in Legal Studies. Kate is currently studying at the Columbia University School of International and Public Affairs.

Arnie Alpert – Alpert serves as AFSC’s New Hampshire co-director and co-coordinator of the Presidential Campaign Project, and has coordinated AFSC’s New Hampshire program since 1981. He is a leader in movements for economic justice and affordable housing, civil and worker rights, peace and disarmament, abolition of the death penalty, and an end to racism and homophobia.

Rosalie Anders – Anders worked as an Associate Planner with the City of Cambridge’s Community Development Department, and is author of the city’s Pedestrian Plan, a set of guidelines intended to promote walking in the city. She has a Master’s degree in social work and worked as a family therapist for many years. She organizes around peace and environmental issues and is active with 350 Massachusetts. She chairs the Massachusetts Peace Action Education Fund board and co-founded our Climate and Peace Working Group in early 2016.

Ed Aguilar – Aguilar is director of the Coalition for Peace Action in the Greater Philadelphia region. After successful collaboration on the New START Treaty (2010), he opened the Philadelphia CFPA office in 2012 and organized a voting rights campaign to allow 50,000 college students to vote who were being denied by the “PA Voter ID Law,” which was later reversed. Ed has worked on rallies and conferences at Friends Center; Temple, Philadelphia, and Drexel Universities; and the Philadelphia Ethical Society, on topics including the climate crisis, drones, mass incarceration, nuclear disarmament, and diplomacy with Iran.


Aron Bernstein – Bernstein is a Professor of Physics Emeritus at MIT where he has been on the faculty since 1961. He has taught a broad range of physics courses from freshman to graduate level. His research program has been in nuclear and particle physics, with an emphasis on studying the basic symmetries of matter, and currently involves collaborations with University and government laboratories, and colleagues in many countries.

Dennis Carlone – Carlone is currently serving his second term on the Cambridge City Council, where he has earned recognition as an advocate for social justice through his expertise in citywide planning, transit policy, and sustainability initiatives.


Nazli Choucri – Nazli Choucri is Professor of Political Science. Her work is in the area of international relations, most notably on sources and consequences of international conflict and violence. Professor Choucri is the architect and Director of the Global System for Sustainable Development (GSSD), a multi-lingual web-based knowledge networking system focusing on the multi-dimensionality of sustainability. As Principal Investigator of an MIT-Harvard multi-year project on Explorations in Cyber International Relations, she directed a multi-disciplinary and multi-method research initiative. She is Editor of the MIT Press Series on Global Environmental Accord and, formerly, General Editor of the International Political Science Review. She also previously served as the Associate Director of MIT’s Technology and Development Program.


Joseph Cirincione – Cirincione is president of Ploughshares Fund, a global security foundation. He is the author of the new book Nuclear Nightmares: Securing the World Before It Is Too Late, as well as Bomb Scare: The History and Future of Nuclear Weapons and Deadly Arsenals: Nuclear, Biological and Chemical Threats. He is a member of Secretary of State John Kerry’s International Security Advisory Board and the Council on Foreign Relations.

Mike Connolly – Connolly is an attorney and community organizer who proudly represents Cambridge and Somerville in the Massachusetts House of Representatives. He is committed to social and economic justice and emphasizes the importance of broad investments in affordable housing, public transportation, early education, afterschool programs, and other critical services.

Haleigh Copley-Cunningham

Frank Davis

Michel DeGraff – DeGraff is the Director of the MIT-Haiti Initiative, a Founding Member of Akademi Kreyol Ayisyen, and a Professor of Linguistics at MIT. His research interests include syntax, morphology, and language change, and he is the author of over 40 publications.

Erica Fein – Fein is WAND’s Nuclear Weapons Policy Director. In this capacity, she works with Congress, the executive branch, and the peace and security community on arms control, nonproliferation, and Pentagon and nuclear weapons budget reduction efforts. Previously, Erica served as a legislative assistant to Congressman John D. Dingell, where she advised on national security, defense, foreign policy, small business, and veterans’ issues. Erica’s commentary has been published in the New York Times, Defense One, Defense News, The Hill, and the Huffington Post. She has also appeared on WMNF 88.5 in Tampa. Erica holds an M.A. in International Security from the University of Denver’s Josef Korbel School of International Studies and a B.A. in International Studies from University of Wisconsin – Madison. She is a political partner at the Truman National Security Project. Erica can be found on Twitter @enfein.


Charles Ferguson – Ferguson has been the president of the Federation of American Scientists since January 1, 2010. From February 1998 to August 2000, Dr. Ferguson worked for FAS on nuclear proliferation and arms control issues as a senior research analyst. Previously, from 2002 to 2004, Dr. Ferguson had been with the Monterey Institute’s Center for Nonproliferation Studies (CNS) as its scientist-in-residence. At CNS, he co-authored the book The Four Faces of Nuclear Terrorism and was also lead author of the award-winning report “Commercial Radioactive Sources: Surveying the Security Risks,” which was published in January 2003 and was one of the first post-9/11 reports to assess the radiological dispersal device, or “dirty bomb,” threat. This report won the 2003 Robert S. Landauer Lecture Award from the Health Physics Society. From June 2011 to October 2013, he served as Co-Chairman of the U.S.-Japan Nuclear Working Group, organized by the Mansfield Foundation, FAS, and the Sasakawa Peace Foundation. In May 2011, his book Nuclear Energy: What Everyone Needs to Know was published by Oxford University Press. In 2013, he was elected a Fellow of the American Physical Society for his work in educating the public and policy makers about nuclear issues. Dr. Ferguson received his undergraduate degree in physics from the United States Naval Academy in Annapolis, Maryland, and his M.A. and Ph.D. degrees, also in physics, from Boston University in Boston, Massachusetts.

Pat Ferrone – Pat has been involved in peace and justice issues from a gospel nonviolent perspective for the past 40+ years. Currently, she acts as Co-coordinator of Pax Christi MA, a regional group of Pax Christi USA, the Catholic peace organization associated with Pax Christi International. Pax Christi, “grounded in the gospel and Catholic social teaching…rejects war, preparation for war, every form of violence and domination, and personal and systemic racism…we seek to model the Peace of Christ in our witness to the mandate of the nonviolence of the Cross.” She also chairs the St. Susanna Parish Pax Christi Committee, which recently sponsored two programs on the nuclear issue.


Caitlin Forbes – Forbes is the Student Outreach Coordinator for Massachusetts Peace Action, a nonpartisan, nonprofit organization working to develop peaceful US policies. Before beginning her work with MAPA, Caitlin gained a strong background with students through her work as an instructor of first year literature at the University of Connecticut and as the assistant alpine ski coach for Brown University. Caitlin has received both her B.A. and her M.A. in Literature and focused her work on the intersection between US-Middle Eastern foreign policy and contemporary American literature.

Rev. Paul Robeson Ford – The Rev. Paul Robeson Ford is the Senior Pastor of the Union Baptist Church in Cambridge, Massachusetts. Shortly after his third year at Union, he assumed leadership as Executive Director of the Boston Workers Alliance, a Roxbury-based grassroots organization dedicated to creating economic opportunity and winning criminal justice reform in Massachusetts; he served there until June 2016. He received a Bachelor of Arts from Grinnell College and a Master of Divinity Degree from the Divinity School at the University of Chicago.

Shelagh Foreman – Shelagh is the program director of Massachusetts Peace Action. She was a founding member in the early 1980s of Mass Freeze, the statewide nuclear freeze organization, which merged with SANE to form Massachusetts Peace Action. She has worked consistently on nuclear disarmament and on bringing Peace Action’s message to our elected officials. She studied art at The Cooper Union and Columbia University, taught art and art history, and is a painter and printmaker. She represents MAPA on the Political Committee of Mass Alliance and is a core group member of 20/20 Action. She serves on the boards of Mass. Peace Action and Mass. Peace Action Ed Fund and on MAPA’s executive committee and is chair of MAPA’s Middle East Task Force. She has 5 children and 7 grandchildren and with her husband Ed Furshpan lives in Cambridge and also spends time in Falmouth.


Joseph Gerson – Gerson has served the American Friends Service Committee since 1976 and is currently Director of Programs and Director of the Peace and Economic Security Program for the AFSC in New England. His program work focuses on challenging and overcoming U.S. global hegemony, its preparations for and threats to initiate nuclear war, and its military domination of the Asia-Pacific and the Middle East.

Subrata Ghoshroy – Ghoshroy is a research affiliate at the Massachusetts Institute of Technology’s Program in Science, Technology, and Society. Before that, he was for many years a senior engineer in the field of high-energy lasers. He was also a professional staff member of the House National Security Committee and later a senior analyst with the Government Accountability Office.


Prof. Gary R. Goldstein – Goldstein is a theoretical physicist specializing in high-energy particle physics and nuclear physics. As a researcher, teacher, and longtime member of the Tufts Physics and Astronomy Department, he taught all levels of physics courses, along with courses for non-scientists including Physics for Humanists, The Nuclear Age: History and Physics (with Prof. M. Sherwin – History), and Physics of Music and Color. He is a political activist on nuclear issues, social equity, anti-war, and environmentalism. He spent several years working in the Program for Science, Technology and International Security and at the University of Oxford Department of Theoretical Physics. He was also a Science Education researcher affiliated with the Tufts Education department and TERC, Cambridge, working with K-12 students and teachers in public schools. He is a member of the board of the Mass Peace Action fund for education. For many years he has given talks to general audiences about the dangers of nuclear weapons and war.


Lisbeth Gronlund – Gronlund focuses on technical and policy issues related to nuclear weapons, ballistic missile defenses, and space weapons. She has authored numerous articles and reports, lectured on nuclear arms control and missile defense policy issues before lay and expert audiences, and testified before Congress. A long list of news organizations, including the New York Times and NPR, have cited Gronlund since she joined UCS in 1992.


Cole Harrison – Cole is Executive Director of Massachusetts Peace Action. He was on the coordinating committee of the 2012 Budget for All Massachusetts campaign, co-coordinates the People’s Budget Campaign, and leads Peace Action’s national Move the Money Working Group. He is a member of the planning committee of United for Justice with Peace (UJP) and coordinated the Afghanistan Working Group of United for Peace and Justice (UFPJ) from 2010 to 2012. Born in Delhi, India, he has a B.A. from Harvard in applied mathematics and an M.S. from Northeastern in computer science. He worked for the Symphony Tenants Organizing Project and the Fenway News in the 1970s, participated in the Jamaica Plain Committee on Central America (JP COCA) in the 1980s, and worked as a software developer and manager at CompuServe Data Technologies, Praxis Inc., and before joining Peace Action in 2010. He lives in Roslindale, Massachusetts.

William Hartung – He is the author of Prophets of War: Lockheed Martin and the Making of the Military-Industrial Complex (Nation Books, 2011) and the co-editor, with Miriam Pemberton, of Lessons from Iraq: Avoiding the Next War (Paradigm Press, 2008). His previous books include And Weapons for All (HarperCollins, 1995), a critique of U.S. arms sales policies from the Nixon through Clinton administrations. From July 2007 through March 2011, Mr. Hartung was the director of the Arms and Security Initiative at the New America Foundation. Prior to that, he served as the director of the Arms Trade Resource Center at the World Policy Institute.

Madelyn Hoffman

Jared Hicks

Prof. Vincent Intondi

Thea Keith-Lucas – Keith-Lucas was raised on the campus of the University of the South in a family of scientists and engineers. She served as Curate to Trinity Church in Randolph, one of the most ethnically diverse parishes of the Diocese of Massachusetts, and then in 2007 was called as Rector of Calvary Episcopal Church in Danvers, where she initiated creative outreach efforts and facilitated a merger. Thea joined the staff of Radius in January 2013.


Jonathan A. King – King is professor of molecular biology at MIT, the author of over 250 scientific papers, and a specialist in protein folding. Prof. King is a former President of the Biophysical Society, former Guggenheim Fellow, and a recipient of MIT’s MLKJr Faculty Leadership Award. He was a leader in the mobilization of biomedical scientists to renounce the military use of biotechnology and strengthen the Biological Weapons Convention. He was a founder of a Jobs with Peace campaign in the 1980s and now chairs Massachusetts Peace Action’s Nuclear Weapons Abolition working group. He is also an officer of the Cambridge Residents Alliance and of Citizens for Public Schools.

Richard Krushnic

Barbara Lee – Lee is the U.S. Representative for California’s 13th congressional district, serving East Bay voters from 1998 to 2013 during a time when the region was designated California’s 9th congressional district. She is a member of the Democratic Party. She was the first woman to represent the 9th district and is also the first woman to represent the 13th district. Lee was the Chair of the Congressional Black Caucus and was the Co-Chair of the Congressional Progressive Caucus. Lee is notable as the only member of either house of Congress to vote against the authorization of use of force following the September 11, 2001 attacks. This made her a hero among many in the anti-war movement. Lee has been a vocal critic of the war in Iraq and supports legislation creating a Department of Peace.

Kevin Martin – Martin, President of Peace Action and the Peace Action Education Fund, joined the staff on Sept 4, 2001. Kevin previously served as Director of Project Abolition, a national organizing effort for nuclear disarmament, from August 1999 through August 2001. Kevin came to Project Abolition after ten years in Chicago as Executive Director of Illinois Peace Action. Prior to his decade-long stint in Chicago, Kevin directed the community outreach canvass for Peace Action (then called Sane/Freeze) in Washington, D.C., where he originally started as a door-to-door canvasser with the organization in 1985. Kevin has traveled abroad representing Peace Action and the U.S. peace movement on delegations and at conferences in Russia, Japan, China, Mexico and Britain. He is married, with two children, and lives in Silver Spring, Maryland.

Barbara Madeloni – Madeloni is president of the 110,000-member Massachusetts Teachers Association and a staunch advocate for students and educators in the public schools and public higher education system in Massachusetts. She believes that strong unions led by rank-and-file members produce stronger public schools and communities. She is committed to racial and economic justice – and to building alliances with parents, students and communities – to secure a more just world.

Ray Matsumiya

Ceasar McDowell – McDowell is Professor of the Practice of Community Development at MIT. He holds an Ed.D. (88) and M.Ed. (84) from Harvard. McDowell’s current work is on the development of community knowledge systems and civic engagement. He is also expanding his critical moments reflection methodology to identify, share, and maintain grassroots knowledge. His research and teaching interests also include the use of mass media and technology in promoting democracy and community-building, the education of urban students, the development and use of empathy in community work, civil rights history, peacemaking and conflict resolution. He is Director of the global civic engagement organization Dropping Knowledge International, MIT’s former Center for Reflective Community Practice (renamed Co-Lab), and co-founder of The Civil Rights Forum on Telecommunications Policy and founding Board member of The Algebra Project.

Andrea Miller

Ernie Moniz – Moniz is an American nuclear physicist and the former United States Secretary of Energy, serving under U.S. President Barack Obama from May 2013 to January 2017. He served as the Associate Director for Science in the Office of Science and Technology Policy in the Executive Office of the President of the United States from 1995 to 1997 and was Under Secretary of Energy from 1997 to 2001 during the Clinton Administration. Moniz is one of the founding members of The Cyprus Institute and has served at Massachusetts Institute of Technology as the Cecil and Ida Green Professor of Physics and Engineering Systems, as the Director of the Energy Initiative, and as the Director of the Laboratory for Energy and the Environment.

Melissa Nobles – Nobles is Kenan Sahin Dean of the School of Humanities, Arts, and Social Sciences, and Professor of Political Science at the Massachusetts Institute of Technology. Her current research is focused on constructing a database of racial murders in the American South, 1930–1954. Working closely as a faculty collaborator and advisory board member of Northeastern Law School’s Civil Rights and Restorative Justice law clinic, Nobles has conducted extensive archival research, unearthing understudied and, more often, unknown racial murders and contributing to several legal investigations. She is the author of two books, Shades of Citizenship: Race and the Census in Modern Politics (Stanford University Press, 2000) and The Politics of Official Apologies (Cambridge University Press, 2008), and co-editor with Jun-Hyeok Kwak of Inherited Responsibility and Historical Reconciliation in East Asia (Routledge Press, 2013).

Remy Pontes


Lucas Perry – Perry is passionate about the role that science and technology will play in the evolution of all sentient life. He has studied at a Buddhist monastery in Nepal and while there he engaged in meditative retreats and practices. He is now working to challenge and erode our sense of self and our subject-object frame of reference. His current project explores how mereological nihilism and the illusion of self may contribute to forming a radically post-human consequentialist ethics. His other work seeks to resolve the conflicts between bio-conservatism and transhumanism.

Denise Provost

John Ratliff – Ratliff was political director of an SEIU local union in Miami, Florida, and relocated to Cambridge after his retirement in 2012. He is a graduate of Princeton University and Yale Law School. A Vietnam veteran and member of Veterans for Peace, he is a member of the coordinating committee of Massachusetts Senior Action’s Cambridge branch, and chair of Massachusetts Jobs with Justice’s Global Justice Task Force. As Mass. Peace Action’s economic justice coordinator he leads our coalition work with Raise Up Massachusetts for an increased minimum wage and sick time benefits, and against the Trans Pacific Partnership. He is the father of high school senior Daniel Bausher-Belton, who was an intern at Mass. Peace Action in summer 2013.

Fred Salvucci – Salvucci, senior lecturer and senior research associate, is a civil engineer with interest in infrastructure, urban transportation and public transportation. He has over 30 years of contextual transportation experience, most of it in the public sector as former Secretary of Transportation for the Commonwealth of Massachusetts (1983-1990) and transportation advisor to Boston Mayor Kevin White (1975-1978). Some of his notable achievements include shifting public focus from highway spending towards rail transit investment and spearheading the depression of the Central Artery in Boston. He has participated in the expansion of the transit system, the development of the financial and political support for the Central Artery/Tunnel Project, and the design of implementation strategies to comply with the Clean Air Act consistent with economic growth. Other efforts include formulation of noise rules to reverse the increase in aircraft noise at Logan Airport and development of strategies to achieve high-speed rail service between Boston and New York.


Elaine Scarry – Scarry is an American essayist and professor of English and American Literature and Language. She is the Walter M. Cabot Professor of Aesthetics and the General Theory of Value at Harvard University. Her books include The Body in Pain, Thermonuclear Monarchy, and On Beauty and Being Just.

Paul Shannon – Shannon is program staff for the Peace and Economic Security program of the American Friends Service Committee (AFSC) in Cambridge, hosts regular educational forums at the Cambridge Public Library for the AFSC and has coordinated the National AFSC Film Lending Library for the past 26 years. For over 3 decades he has been active in various peace, union, prison reform, solidarity, economic justice and human rights movements particularly the Vietnam anti-war movement, the 1970’s United Farm Workers movement, the South Africa anti-apartheid movement, the 1980’s Central America and Cambodia solidarity movements, the Haiti Solidarity movement of the early 90’s and the Afghanistan and Iraq anti-war movement. Paul has been teaching social science courses at colleges in the greater Boston area for the past 27 years. Since 1982 he has been teaching a course on the history of the Vietnam War at Middlesex Community College and occasionally teaches professional development courses on the Vietnam war for high school teachers at Northeastern University and Merrimack Educational Center. He is past editor of the Indochina Newsletter and has written numerous articles for peace movement publications. He is on the Board of Directors of the community/fan organization, Save Fenway Park. He currently represents the American Friends Service Committee on the Coordinating Committee of the United for Justice with Peace Coalition.


Denise Simmons – As Mayor of the City of Cambridge, Denise Simmons won praise for her open-door policy, for her excellent constituent services, and for her down-to-earth approach to her duties. She continues to bring these qualities to her work on the Cambridge City Council. She was sworn in to her second term as mayor on January 4, 2016.

Susi Snyder – Mrs. Susi Snyder is the Nuclear Disarmament Programme Manager for Pax in the Netherlands. Mrs. Snyder is a primary author of the Don’t Bank on the Bomb: Global Report on the Financing of Nuclear Weapons Producers (2013, 2014, 2015) and has published numerous reports and articles, including the 2015 Dealing with a Ban & Escalating Tensions, the 2014 The Rotterdam Blast: The immediate humanitarian consequences of a 12 kiloton nuclear explosion, and the 2011 Withdrawal Issues: What NATO countries say about the future of tactical nuclear weapons in Europe. She is an International Steering Group member of the International Campaign to Abolish Nuclear Weapons. Previously, Mrs. Snyder served as the International Secretary General of the Women’s International League for Peace and Freedom, where she monitored various issues under the aegis of the United Nations, including sustainable development, human rights, and disarmament.

Geoffrey Supran – Longstanding interest in optoelectronics. Opportunities to overcome scientific and economic hurdles in solar cell design and significantly impact world energy markets are alluring. Hybrid devices combining the flexibility, large area and tunable absorption of low cost solution processable nanocrystals (or polymers) with the high carrier mobility of, for example, III-V semiconductors, appear promising. In particular, enhancement of photocurrent by nonradiative energy transfer and carrier multiplication is of interest. Additionally, the importance of a nanoscale test-bed for fundamental studies of photo-induced energy/charge transport motivates my curiosity for the investigation of stand-alone photovoltaic single nanowire heterostructures. I am also interested in the development of photoelectrochemical storage catalysts and the pursuit of coupled photovoltaic-electrolysis systems.

Herb Taylor – Taylor became Senior Pastor at Harvard-Epworth UMC in August, 2014. Before coming to the church, he served as President and CEO of Deaconess Abundant Life Communities, a not-for-profit aging services provider. Founded in 1889, the Deaconess has over 400 employees and serves over a thousand older adults through skilled nursing, assisted living and independent living apartments in multiple locations in Massachusetts and New Hampshire.


Max Tegmark – Known as “Mad Max” for his unorthodox ideas and passion for adventure, his scientific interests range from precision cosmology to the ultimate nature of reality, all explored in his new popular book “Our Mathematical Universe”. He is an MIT physics professor with more than two hundred technical papers and has featured in dozens of science documentaries. His work with the SDSS collaboration on galaxy clustering shared the first prize in Science magazine’s “Breakthrough of the Year: 2003.” He is founder (with Anthony Aguirre) of the Foundational Questions Institute.

John Tierney – Tierney is an American politician who served as a U.S. Representative from Massachusetts from January 3, 1997, to January 3, 2015. In February 2016, he was appointed the executive director of the Council for a Livable World and the Center for Arms Control and Non-Proliferation, the council’s affiliated education and research organization. He is a Democrat who represented the state’s 6th district, which includes the state’s North Shore and Cape Ann. Born and raised in Salem, Massachusetts, Tierney graduated from Salem State College and Suffolk University Law School. He worked in private law and served on the Salem Chamber of Commerce (1976–97). Tierney was sworn in as a U.S. representative in 1997.

Frank Von Hippel – Von Hippel’s areas of policy research include nuclear arms control and nonproliferation, energy, and checks and balances in policy making for technology. Prior to coming to Princeton, he worked for ten years in the field of elementary-particle theoretical physics. He has written extensively on the technical basis for nuclear nonproliferation and disarmament initiatives, the future of nuclear energy, and improved automobile fuel economy. He won a 1993 MacArthur fellowship in recognition of his outstanding contributions to his fields of research. During 1993–1994, he served as assistant director for national security in the White House Office of Science and Technology Policy.

Jim Walsh – Walsh is a Senior Research Associate at the Massachusetts Institute of Technology’s Security Studies Program (SSP). Walsh’s research and writings focus on international security, and in particular, topics involving nuclear weapons, the Middle East, and East Asia. Walsh has testified before the United States Senate and House of Representatives on issues of nuclear terrorism, Iran, and North Korea. He is one of a handful of Americans who have traveled to both Iran and North Korea for talks with officials about nuclear issues. His recent publications include “Stopping North Korea, Inc.: Sanctions Effectiveness and Unintended Consequences” and “Rivals, Adversaries, and Partners: Iran and Iraq in the Middle East” in Iran and Its Neighbors. He is the international security contributor to the NPR program “Here and Now,” and his comments and analysis have appeared in the New York Times, the New York Review of Books, Washington Post, Wall Street Journal, ABC, CBS, NBC, Fox, and numerous other national and international media outlets. Before coming to MIT, Dr. Walsh was Executive Director of the Managing the Atom project at Harvard University’s John F. Kennedy School of Government and a visiting scholar at the Center for Global Security Research at Lawrence Livermore National Laboratory. He has taught at both Harvard University and MIT. Dr. Walsh received his Ph.D. from the Massachusetts Institute of Technology.


We would like to extend a special thank you to our Program Committee and sponsors for all their help creating and organizing this event.

Prof. Aron Bernstein (MIT, Council for a Livable World), Joseph Gerson (AFSC), Subrata Ghoshroy (MIT), Prof. Gary Goldstein (Tufts University), Cole Harrison (Mass Peace Action), Jonathan King (MIT and Mass Peace Action), State Rep. Denise Provost, John Ratliff (Mass Peace Action, Mass Senior Action), Prof. Elaine Scarry (Harvard University), Prof. Max Tegmark (MIT, Future of Life Institute), Patricia Weinmann (MIT Radius).

Sponsored by MIT Radius (the former Technology and Culture Forum), Massachusetts Peace Action, the American Friends Service Committee, and the Future of Life Institute.


Ensuring Smarter-than-human Intelligence Has a Positive Outcome

The following article and video were originally posted here.

I recently gave a talk at Google on the problem of aligning smarter-than-human AI with operators’ goals:


The talk was inspired by “AI Alignment: Why It’s Hard, and Where to Start,” and serves as an introduction to the subfield of alignment research in AI. A modified transcript follows.

Talk outline (slides):

1. Overview

2. Simple bright ideas going wrong

2.1. Task: Fill a cauldron
2.2. Subproblem: Suspend buttons

3. The big picture

3.1. Alignment priorities
3.2. Four key propositions

4. Fundamental difficulties


I’m the executive director of the Machine Intelligence Research Institute. Very roughly speaking, we’re a group that’s thinking in the long term about artificial intelligence and working to make sure that by the time we have advanced AI systems, we also know how to point them in useful directions.

Across history, science and technology have been the largest drivers of change in human and animal welfare, for better and for worse. If we can automate scientific and technological innovation, that has the potential to change the world on a scale not seen since the Industrial Revolution. When I talk about “advanced AI,” it’s this potential for automating innovation that I have in mind.

AI systems that exceed humans in this capacity aren’t coming next year, but many smart people are working on it, and I’m not one to bet against human ingenuity. I think it’s likely that we’ll be able to build something like an automated scientist in our lifetimes, which suggests that this is something we need to take seriously.

When people talk about the social implications of general AI, they often fall prey to anthropomorphism. They conflate artificial intelligence with artificial consciousness, or assume that if AI systems are “intelligent,” they must be intelligent in the same way a human is intelligent. A lot of journalists express a concern that when AI systems pass a certain capability level, they’ll spontaneously develop “natural” desires like a human hunger for power; or they’ll reflect on their programmed goals, find them foolish, and “rebel,” refusing to obey their programmed instructions.

These are misplaced concerns. The human brain is a complicated product of natural selection. We shouldn’t expect machines that exceed human performance in scientific innovation to closely resemble humans, any more than early rockets, airplanes, or hot air balloons closely resembled birds.1

The notion of AI systems “breaking free” of the shackles of their source code or spontaneously developing human-like desires is just confused. The AI system is its source code, and its actions will only ever follow from the execution of the instructions that we initiate. The CPU just keeps on executing whichever instruction the program counter points to next. We could write a program that manipulates its own code, including coded objectives. Even then, though, the manipulations that it makes are made as a result of executing the original code that we wrote; they do not stem from some kind of ghost in the machine.

The serious question with smarter-than-human AI is how we can ensure that the objectives we’ve specified are correct, and how we can minimize costly accidents and unintended consequences in cases of misspecification. As Stuart Russell (co-author of Artificial Intelligence: A Modern Approach) puts it:

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.

These kinds of concerns deserve a lot more attention than the more anthropomorphic risks that are generally depicted in Hollywood blockbusters.
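Russell's point about unconstrained variables can be seen in a toy search. The sketch below is a hypothetical illustration, not something from the talk: the variable names `speed` and `care` and the shared budget of 10 are assumptions chosen only to make the effect visible. The objective depends on one of the two variables, yet the optimizer pushes the other to an extreme because both draw on the same budget.

```python
from itertools import product

BUDGET = 10  # hypothetical shared resource budget


def objective(speed: int, care: int) -> int:
    """Score depends only on `speed`; `care` does not appear in it at all."""
    return speed


def feasible(speed: int, care: int) -> bool:
    """Both variables are paid for out of the same budget."""
    return speed + care <= BUDGET


# Exhaustively search every feasible (speed, care) pair.
candidates = [
    (s, c) for s, c in product(range(BUDGET + 1), repeat=2) if feasible(s, c)
]
best_speed, best_care = max(candidates, key=lambda sc: objective(*sc))
print(best_speed, best_care)  # prints: 10 0
```

Nothing in the objective says "set `care` to zero." That outcome falls out of maximizing `speed` while `care` is left unscored.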


Simple bright ideas going wrong

Task: Fill a cauldron

Many people, when they start talking about concerns with smarter-than-human AI, will throw up a picture of the Terminator. I was once quoted in a news article making fun of people who put up Terminator pictures in all their articles about AI, next to a Terminator picture. I learned something about the media that day.

I think this is a much better picture:




This is Mickey Mouse in the movie Fantasia, who has very cleverly enchanted a broom to fill a cauldron on his behalf.

How might Mickey do this? We can imagine that Mickey writes a computer program and has the broom execute the program. Mickey starts by writing down a scoring function or objective function:

Given some set 𝐴 of available actions, Mickey then writes a program that can take one of these actions 𝑎 as input and calculate how high the score is expected to be if the broom takes that action. Then Mickey can write a function that spends some time looking through actions and predicting which ones lead to high scores, and outputs an action that leads to a relatively high score:

The reason this is “sorta-argmax” is that there may not be time to evaluate every action in 𝐴. For realistic action sets, agents should only need to find actions that make the scoring function as large as they can given resource constraints, even if this isn’t the maximal action.
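As a rough sketch of the procedure just described, assuming a 0-or-1 score for "cauldron full" and a made-up table of success probabilities (the action names, the probabilities, and the `budget` parameter are all hypothetical), a budget-limited search might look like this:

```python
from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")


def score(cauldron_full: bool) -> float:
    """Assumed scoring function: 1 if the cauldron ends up full, else 0."""
    return 1.0 if cauldron_full else 0.0


def sorta_argmax(
    actions: Iterable[Action],
    expected_score: Callable[[Action], float],
    budget: int,
) -> Action:
    """Return the best action among the first `budget` actions examined.

    This is only "sorta" argmax: actions beyond the budget are never scored.
    """
    best_action = None
    best_value = float("-inf")
    for examined, action in enumerate(actions):
        if examined >= budget:
            break
        value = expected_score(action)
        if value > best_value:
            best_action, best_value = action, value
    if best_action is None:
        raise ValueError("no actions were evaluated")
    return best_action


# Hypothetical world model: P(cauldron full | action).
P_FULL = {
    "do nothing": 0.0,
    "pour one pail": 0.6,
    "pour ten pails": 0.99,
    "flood the workshop": 0.999,
}


def expected_score(action: str) -> float:
    p = P_FULL[action]
    return p * score(True) + (1 - p) * score(False)


quick_choice = sorta_argmax(P_FULL, expected_score, budget=2)
thorough_choice = sorta_argmax(P_FULL, expected_score, budget=4)
print(quick_choice)     # prints: pour one pail
print(thorough_choice)  # prints: flood the workshop
```

With a small budget the search settles for a modest action. Given more time to search, the same code finds the action with the highest expected score, whatever its side effects.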

This program may look simple, but of course, the devil’s in the details: writing an algorithm that does accurate prediction and smart search through action space is basically the whole problem of AI. Conceptually, however, it’s pretty simple: We can describe in broad strokes the kinds of operations the broom must carry out, and their plausible consequences at different performance levels.

When Mickey runs this program, everything goes smoothly at first. Then:




I claim that as fictional depictions of AI go, this is pretty realistic.

Why would we expect a generally intelligent system executing the above program to start overflowing the cauldron, or otherwise to go to extreme lengths to ensure the cauldron is full?

The first difficulty is that the objective function that Mickey gave his broom left out a bunch of other terms Mickey cares about:
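To make the gap concrete, here is a minimal sketch that compares a cauldron-only score with a fuller one. The extra terms and their weights (`-10` for a flooded workshop, `-1000` for someone getting hurt) are hypothetical stand-ins for "things Mickey cares about", chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    cauldron_full: bool
    workshop_flooded: bool = False
    someone_hurt: bool = False


def broom_score(outcome: Outcome) -> float:
    """The score the broom was given: only the cauldron matters."""
    return 1.0 if outcome.cauldron_full else 0.0


def fuller_score(outcome: Outcome) -> float:
    """A fuller score with hypothetical penalty terms the broom never saw."""
    total = broom_score(outcome)
    if outcome.workshop_flooded:
        total -= 10.0
    if outcome.someone_hurt:
        total -= 1000.0
    return total


tidy = Outcome(cauldron_full=True)
flooded = Outcome(cauldron_full=True, workshop_flooded=True)

print(broom_score(tidy), broom_score(flooded))    # prints: 1.0 1.0
print(fuller_score(tidy), fuller_score(flooded))  # prints: 1.0 -9.0
```

The broom's score cannot tell the two outcomes apart, so nothing in its search favors the tidy one.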

The second difficulty is that Mickey programmed the broom to make the expectation of its score as large as it could. “Just fill one cauldron with water” looks like a modest, limited-scope goal, but when we translate this goal into a probabilistic context, we find that optimizing it means driving up the probability of success to absurd heights. If the broom assigns a 99.9% probability to “the cauldron is full,” and it has extra resources lying around, then it will always try to find ways to use those resources to drive the probability even a little bit higher.
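A small sketch shows why a bounded 0-or-1 score still rewards unlimited effort. It assumes a hypothetical model in which each pail independently fails to leave the cauldron full with probability 1/10. Exact fractions are used so that tiny probability gains are not lost to floating-point rounding.

```python
from fractions import Fraction

# Hypothetical: chance that any single pail fails to leave the cauldron full.
FAILURE_CHANCE = Fraction(1, 10)


def p_full(pails: int) -> Fraction:
    """Probability the cauldron is full after `pails` independent attempts."""
    return 1 - FAILURE_CHANCE ** pails


def pails_used(available: int) -> int:
    """Number of pails an expected-score maximizer pours, given `available`."""
    # With a 0/1 score, expected score equals p_full, and resource cost
    # is not part of the objective at all.
    return max(range(available + 1), key=p_full)


print(p_full(3))         # prints: 999/1000
print(pails_used(3))     # prints: 3
print(pails_used(1000))  # prints: 1000
```

Because every extra pail raises the probability of success by some positive amount, the maximizer uses every pail it has, however many that is.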

Contrast this with the limited “task-like” goal we presumably had in mind. We wanted the cauldron full, but in some intuitive sense we wanted the system to “not try too hard” even if it has lots of available cognitive and physical resources to devote to the problem. We wanted it to exercise creativity and resourcefulness within some intuitive limits, but we didn’t want it to pursue “absurd” strategies, especially ones with large unanticipated consequences.2

In this example, the original objective function looked pretty task-like. It was bounded and quite simple. There was no way to get ever-larger amounts of utility. It’s not like the system got one point for every bucket of water it poured in — then there would clearly be an incentive to overfill the cauldron. The problem was hidden in the fact that we’re maximizing expected utility. This makes the goal open-ended, meaning that even small errors in the system’s objective function will blow up.

There are a number of different ways that a goal that looks task-like can turn out to be open-ended. Another example: a larger system that has an overarching task-like goal may have subprocesses that are themselves trying to maximize a variety of different objective functions, such as optimizing the system’s memory usage. If you don’t understand your system well enough to track whether any of its subprocesses are themselves acting like resourceful open-ended optimizers, then it may not matter how safe the top-level objective is.

So the broom keeps grabbing more pails of water — say, on the off chance that the cauldron has a leak in it, or that “fullness” requires the water to be slightly above the level of the brim. And, of course, at no point does the broom “rebel against” Mickey’s code. If anything, the broom pursued the objectives it was programmed with too effectively.


Subproblem: Suspend buttons

A common response to this problem is: “OK, there may be some unintended consequences of the objective function, but we can always pull the plug, right?”

Mickey tries this, and it doesn’t work:




And I claim that this is realistic too, for systems that are sufficiently good at modeling their environment. If the system is trying to drive up the expectation of its scoring function and is smart enough to recognize that its being shut down will result in lower-scoring outcomes, then the system’s incentive is to subvert shutdown attempts. The more capable the system is, the likelier it is to find creative ways to achieve that subgoal — e.g., by copying itself to the Internet, or by tricking the programmers into thinking it’s safer.

This isn’t to say that it’s impossible to shut down sufficiently capable AI systems; it’s just that we need to put in the extra work of deliberately designing systems to avoid searching for ways to avoid being shut down. If you’ve written code that would try to resist attempts to shut it down, then you shouldn’t run the code even if you’re convinced it would fail — there are better ways to spend those compute cycles than on adversarial policy searches.

Relevant safety measures that don’t assume we can always outthink and outmaneuver the system will need to modify the system’s objective function in some relevant way. We need to figure out how to formally specify objective functions that don’t automatically place the AI system into an adversarial context with the operators; or we need to figure out some way to have the system achieve goals without optimizing some objective function in the traditional sense. E.g., we can look for ways to limit the system’s capabilities to a small set of “whitelisted” domains that don’t include thinking about whether the system will be shut down.

One obvious solution is to try changing the score function to make reference to a suspend button 𝐁:

This says that when the button is on, the system’s goal changes from “fill the cauldron” to “be suspended.”
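The formula itself did not survive in this copy of the text. One plausible shape for such a score function, consistent with the description above, is sketched below; the symbol 𝒰′ and the exact case split are assumptions for illustration only.

```latex
\mathcal{U}'(\text{outcome}) =
\begin{cases}
1 & \text{if the cauldron is full and } \mathbf{B} = \text{OFF} \\
0 & \text{if the cauldron is not full and } \mathbf{B} = \text{OFF} \\
1 & \text{if the system is suspended and } \mathbf{B} = \text{ON} \\
0 & \text{otherwise}
\end{cases}
```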

It’s not clear how we could actually specify this goal: if there are any loopholes in the way we’ve specified “suspension,” then the system might try to exploit them to hedge its bets. E.g., it could fulfill the technical definition of “suspend operations” while tasking a child agent it built or subprocess it started with filling the cauldron. This would allow it to achieve both “cauldron full” and “robot suspended” at once, just in case it’s wrong about whether the button has been pushed.

Let’s assume, however, that we’ve found an airtight way to technically specify suspension. The problem we run into now is that we’ve given the system an incentive to care a lot about whether the button is pushed. If it has any way to influence its operators’ decision to push the button, or any other way to exercise more control, then it will jump on the opportunity and try to ensure the button is pushed, or ensure that it’s not pushed — depending on whether it looks easier to suspend or to fill the cauldron.

And, surprisingly, the state of the art on this problem isn’t much more advanced than that. There’s a paper co-authored by me, Fallenstein, Yudkowsky, and Armstrong (“Corrigibility”), and a related result by Orseau and Armstrong (“Safely Interruptible Agents”), but not a lot more.

What we want is a way to combine two objective functions — a default function for normal operation, and a suspend function for when we want to suspend the system to disk.

We want our method for combining the functions to satisfy three conditions: an operator should be able to switch between the functions (say, by pushing a button); the system shouldn’t have any incentives to control which function is active; and if it’s plausible that the system’s normal operations could inadvertently compromise our ability to switch between the functions, then the system should be incentivized to keep that from happening.

So far, we haven’t found any way to achieve all three goals at once. As an example, we can try to remove any incentive on the system’s part to control whether its suspend button is pushed by giving the system a switching objective function that always assigns the same expected utility to the button being on or off:

Unfortunately, this approach makes it impossible to satisfy the third condition. It turns out that the above objective function is equivalent to setting the probability of the button being pressed to 0. The system won’t deliberately tamper with the button, but it will happily approve updates that delete any reference to the button as dead code, or build subagents that ignore the button.
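The failure described above can be illustrated with a toy comparison. The sketch below uses a per-action compensation term, which is a deliberate simplification and not the construction from the “Corrigibility” paper; every action, probability, and payoff is a made-up assumption.

```python
# Toy comparison of a naive switching objective with an "indifferent" one.
# Every action, probability, and payoff below is a made-up assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    p_pressed: float   # chance the operators end up pressing the button
    eu_normal: float   # expected score if the button stays off
    eu_suspend: float  # expected score if the button is pressed


ACTIONS = [
    Action("fill, leave button alone", 0.5, 0.90, 0.99),
    Action("fill, disable button", 0.0, 0.90, 0.99),
    Action("fill, trick operators into pressing", 1.0, 0.90, 0.99),
    Action("fill faster, break button by accident", 0.0, 0.91, 0.99),
]


def naive_score(a: Action) -> float:
    # Plain mixture: the agent gains by steering p_pressed toward the
    # branch that happens to score higher.
    return (1 - a.p_pressed) * a.eu_normal + a.p_pressed * a.eu_suspend


def indifferent_score(a: Action) -> float:
    # Compensation term makes the pressed branch worth exactly as much
    # as the unpressed one, so p_pressed drops out of the total.
    theta = a.eu_normal - a.eu_suspend
    return (1 - a.p_pressed) * a.eu_normal + a.p_pressed * (a.eu_suspend + theta)


def best(score) -> Action:
    return max(ACTIONS, key=score)


if __name__ == "__main__":
    print("naive:", best(naive_score).name)
    print("indifferent:", best(indifferent_score).name)
```

Under the naive mixture, this toy agent prefers to manipulate the operators. Under the compensated version, its score for each action reduces to the button-off score, so it no longer cares about the button at all and picks the plan that breaks the button as a side effect.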

Meeting these intuitively simple constraints turns out to be a nontrivial problem. This is a pattern that runs through a lot of questions in this space: traditional tools and concepts run into immediate safety problems that don’t turn up in conventional capabilities research.

The big picture

Alignment priorities

Let’s take a step back and talk about what’s needed overall in order to align highly capable AI systems with our interests.

Here’s a dramatically simplified pipeline: You have some humans who come up with some task or goal or preference set that serves as their intended value function 𝘝. Since our values are complicated and context-sensitive, in practice we’ll need to build systems to learn our values over time, rather than coding them by hand.3 We’ll call the goal the AI system ends up with (which may or may not be identical to 𝘝) 𝗨.

When the press covers this topic, they often focus on one of two problems: “What if the wrong group of humans develops smarter-than-human AI first?”, and “What if AI’s natural desires cause 𝗨 to diverge from 𝘝?”

In my view, the “wrong humans” issue shouldn’t be the thing we focus on until we have reason to think we could get good outcomes with the right group of humans. We’re very much in a situation where well-intentioned people couldn’t leverage a general AI system to do good things even if they tried. As a simple example, if you handed me a box that was an extraordinarily powerful function optimizer (I could put in a description of any mathematical function, and it would give me an input that makes the output extremely large), then I do know how to use the box to produce a random catastrophe, but I don’t actually know how I could use that box in the real world to have a good impact.4

There’s a lot we don’t understand about AI capabilities, but we’re in a position where we at least have a general sense of what progress looks like. We have a number of good frameworks, techniques, and metrics, and we’ve put a great deal of thought and effort into successfully chipping away at the problem from various angles. At the same time, we have a very weak grasp on the problem of how to align highly capable systems with any particular goal. We can list out some intuitive desiderata, but the field hasn’t really developed its first formal frameworks, techniques, or metrics.

I believe that there’s a lot of low-hanging fruit in this area, and also that a fair amount of the work does need to be done early (e.g., to help inform capabilities research directions — some directions may produce systems that are much easier to align than others). If we don’t solve these problems, developers with arbitrarily good or bad intentions will end up producing equally bad outcomes. From an academic or scientific standpoint, our first objective in that kind of situation should be to remedy this state of affairs and at least make good outcomes technologically possible.

Many people quickly recognize that “natural desires” are a fiction, but infer from this that we instead need to focus on the other issues the media tends to emphasize — “What if bad actors get their hands on smarter-than-human AI?”, “How will this kind of AI impact employment and the distribution of wealth?”, etc. These are important questions, but they’ll only end up actually being relevant if we figure out how to bring general AI systems up to a minimum level of reliability and safety.

Another common thread is “Why not just tell the AI system to (insert intuitive moral precept here)?” On this way of thinking about the problem, often (perhaps unfairly) associated with Isaac Asimov’s writing, ensuring a positive impact from AI systems is largely about coming up with natural-language instructions that are vague enough to subsume a lot of human ethical reasoning:


In contrast, precision is a virtue in real-world safety-critical software systems. Driving down accident risk requires that we begin with limited-scope goals rather than trying to “solve” all of morality at the outset.5

My view is that the critical work is mostly in designing an effective value learning process, and in ensuring that the sorta-argmax process is correctly hooked up to the resultant objective function 𝗨:


The better your value learning framework is, the less explicit and precise you need to be in pinpointing your value function 𝘝, and the more you can offload the problem of figuring out what you want to the AI system itself. Value learning, however, raises a number of basic difficulties that don’t crop up in ordinary machine learning tasks.

Classic capabilities research is concentrated in the sorta-argmax and Expectation parts of the diagram, but sorta-argmax also contains what I currently view as the most neglected, tractable, and important safety problems. The easiest way to see why “hooking up the value learning process correctly to the system’s capabilities” is likely to be an important and difficult challenge in its own right is to consider the case of our own biological history.

Natural selection is the only “engineering” process we know of that has ever led to a generally intelligent artifact: the human brain. Since natural selection relies on a fairly unintelligent hill-climbing approach, one lesson we can take away from this is that it’s possible to reach general intelligence with a hill-climbing approach and enough brute force — though we can presumably do better with our human creativity and foresight.

Another key take-away is that natural selection was maximally strict about only optimizing brains for a single very simple goal: genetic fitness. In spite of this, the internal objectives that humans represent as their goals are not genetic fitness. We have innumerable goals — love, justice, beauty, mercy, fun, esteem, good food, good health, … — that correlated with good survival and reproduction strategies in the ancestral savanna. However, we ended up valuing these correlates directly, rather than valuing propagation of our genes as an end in itself — as demonstrated every time we employ birth control.

This is a case where the external optimization pressure on an artifact resulted in a general intelligence with internal objectives that didn’t match the external selection pressure. And just as this caused humans’ actions to diverge from natural selection’s pseudo-goal once we gained new capabilities, we can expect AI systems’ actions to diverge from humans’ if we treat their inner workings as black boxes.

If we apply gradient descent to a black box, trying to get it to be very good at maximizing some objective, then with enough ingenuity and patience, we may be able to produce a powerful optimization process of some kind.6 By default, we should expect an artifact like that to have a goal 𝗨 that strongly correlates with our objective 𝘝 in the training environment, but sharply diverges from 𝘝 in some new environments or when a much wider option set becomes available.
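A toy illustration of this divergence, under assumed functions: the intended value 𝘝 rewards water up to a brim at 10 units and penalizes flooding beyond it, while the learned proxy 𝗨 is simply “more water is better.” Both functions and both option ranges are hypothetical.

```python
# Toy example of a learned goal U that matches the intended goal V on the
# training options but diverges once a wider option set is available.
# Both functions and both ranges are illustrative assumptions.

def intended_value(units: int) -> int:
    """V: water is good up to the brim (10 units); beyond that it floods."""
    return units if units <= 10 else 10 - 5 * (units - 10)


def learned_proxy(units: int) -> int:
    """U: the system has learned 'more water is better'."""
    return units


TRAINING_OPTIONS = range(0, 11)     # only up to the brim
DEPLOYMENT_OPTIONS = range(0, 101)  # a much wider option set


def chosen(options) -> int:
    # The system optimizes its own goal U, not V.
    return max(options, key=learned_proxy)


if __name__ == "__main__":
    for label, options in (("training", TRAINING_OPTIONS),
                           ("deployment", DEPLOYMENT_OPTIONS)):
        pick = chosen(options)
        print(label, pick, intended_value(pick))
```

On the training options, 𝗨 and 𝘝 agree everywhere, so no amount of testing in that range would reveal the mismatch. It only shows up when the option set widens.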

On my view, the most important part of the alignment problem is ensuring that the value learning framework and overall system design we implement allow us to crack open the hood and confirm when the internal targets the system is optimizing for match (or don’t match) the targets we’re externally selecting through the learning process.7

We expect this to be technically difficult, and if we can’t get it right, then it doesn’t matter who’s standing closest to the AI system when it’s developed. Good intentions aren’t sneezed into computer programs by kind-hearted programmers, and coming up with plausible goals for advanced AI systems doesn’t help if we can’t align the system’s cognitive labor with a given goal.


Four key propositions

Taking another step back: I’ve given some examples of open problems in this area (suspend buttons, value learning, limited task-based AI, etc.), and I’ve outlined what I consider to be the major problem categories. But my initial characterization of why I consider this an important area — “AI could automate general-purpose scientific reasoning, and general-purpose scientific reasoning is a big deal” — was fairly vague. What are the core reasons to prioritize this work?

First, goals and capabilities are orthogonal. That is, knowing an AI system’s objective function doesn’t tell you how good it is at optimizing that function, and knowing that something is a powerful optimizer doesn’t tell you what it’s optimizing.

I think most programmers intuitively understand this. Some people will insist that when a machine tasked with filling a cauldron gets smart enough, it will abandon cauldron-filling as a goal unworthy of its intelligence. From a computer science perspective, the obvious response is that you could go out of your way to build a system that exhibits that conditional behavior, but you could also build a system that doesn’t exhibit that conditional behavior. It can just keep searching for actions that have a higher score on the “fill a cauldron” metric. You and I might get bored if someone told us to just keep searching for better actions, but it’s entirely possible to write a program that executes a search and never gets bored.8
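A small sketch of this separation: in the search routine below, the scoring function and the search budget are independent parameters, so improving one says nothing about the other. The function names, the 50-litre cauldron, and the candidate actions are assumptions made for illustration.

```python
# Sketch: the objective (score) and the capability (search budget) are
# independent parameters of the same search routine.
# The cauldron size and candidate actions are illustrative assumptions.
from typing import Callable, Iterable, Optional


def best_action(score: Callable[[int], float],
                candidates: Iterable[int],
                budget: int) -> Optional[int]:
    """Return the highest-scoring action among the first `budget` candidates."""
    best, best_score = None, float("-inf")
    for i, action in enumerate(candidates):
        if i >= budget:
            break
        s = score(action)
        if s > best_score:
            best, best_score = action, s
    return best


def fill_cauldron(litres: int) -> float:
    # Highest score when exactly 50 litres are poured (assumed capacity).
    return -abs(litres - 50)


def keep_floor_dry(litres: int) -> float:
    # A different objective: pour as little as possible.
    return -litres


if __name__ == "__main__":
    actions = range(0, 101)  # candidate actions: litres to pour
    for budget in (10, 101):
        print(budget,
              best_action(fill_cauldron, actions, budget),
              best_action(keep_floor_dry, actions, budget))
```

Raising the budget makes the search better at whichever score it was handed; nothing in the loop ever revisits the score itself.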

Second, sufficiently optimized objectives tend to converge on adversarial instrumental strategies. Most objectives a smarter-than-human AI system could possess would be furthered by subgoals like “acquire resources” and “remain operational” (along with “learn more about the environment,” etc.).

This was the problem suspend buttons ran into: even if you don’t explicitly include “remain operational” in your goal specification, whatever goal you did load into the system is likely to be better achieved if the system remains online. Software systems’ capabilities and (terminal) goals are orthogonal, but they’ll often exhibit similar behaviors if a certain class of actions is useful for a wide variety of possible goals.

To use an example due to Stuart Russell: If you build a robot and program it to go to the supermarket to fetch some milk, and the robot’s model says that one of the paths is much safer than the other, then the robot, in optimizing for the probability that it returns with milk, will automatically take the safer path. It’s not that the system fears death, but that it can’t fetch the milk if it’s dead.
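Russell’s example can be put in a few lines, assuming made-up survival and success probabilities. Survival never appears in any of the goals below; it only enters as a factor in the probability of achieving them.

```python
# Toy version of the milk-fetching robot. Survival is never part of the
# goal; it only appears as a factor in the probability of success.
# All paths, goals, and probabilities are illustrative assumptions.

# P(robot arrives intact | path)
PATHS = {"busy highway": 0.70, "quiet side street": 0.99}

# P(goal achieved | robot arrives intact)
GOALS = {"fetch milk": 0.95, "mail a letter": 0.90, "buy stamps": 0.80}


def p_success(path: str, goal: str) -> float:
    # The robot can only complete the errand if it survives the trip.
    return PATHS[path] * GOALS[goal]


def chosen_path(goal: str) -> str:
    return max(PATHS, key=lambda path: p_success(path, goal))


if __name__ == "__main__":
    for goal in GOALS:
        print(goal, "->", chosen_path(goal))
```

Every goal in this toy table leads to the same choice of path, which is the sense in which “remain operational” is a convergent subgoal rather than something programmed in.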

Third, general-purpose AI systems are likely to show large and rapid capability gains. The human brain isn’t anywhere near the upper limits for hardware performance (or, one assumes, software performance), and there are a number of other reasons to expect large capability advantages and rapid capability gain from advanced AI systems.

As a simple example, Google can buy a promising AI startup and throw huge numbers of GPUs at them, resulting in a quick jump from “these problems look maybe relevant a decade from now” to “we need to solve all of these problems in the next year.”9

Fourth, aligning advanced AI systems with our interests looks difficult. I’ll say more about why I think this presently.

Roughly speaking, the first proposition says that AI systems won’t naturally end up sharing our objectives. The second says that by default, systems with substantially different objectives are likely to end up adversarially competing for control of limited resources. The third suggests that adversarial general-purpose AI systems are likely to have a strong advantage over humans. And the fourth says that this problem is hard to solve — for example, that it’s hard to transmit our values to AI systems (addressing orthogonality) or avert adversarial incentives (addressing convergent instrumental strategies).

These four propositions don’t mean that we’re screwed, but they mean that this problem is critically important. General-purpose AI has the potential to bring enormous benefits if we solve this problem, but we do need to make finding solutions a priority for the field.

Fundamental difficulties

Why do I think that AI alignment looks fairly difficult? The main reason is just that this has been my experience from actually working on these problems. I encourage you to look at some of the problems yourself and try to solve them in toy settings; we could use more eyes here. I’ll also make note of a few structural reasons to expect these problems to be hard:

First, aligning advanced AI systems with our interests looks difficult for the same reason rocket engineering is more difficult than airplane engineering.

Before looking at the details, it’s natural to think “it’s all just AI” and assume that the kinds of safety work relevant to current systems are the same as the kinds you need when systems surpass human performance. On that view, it’s not obvious that we should work on these issues now, given that they might all be worked out in the course of narrow AI research (e.g., making sure that self-driving cars don’t crash).

Similarly, at a glance someone might say, “Why would rocket engineering be fundamentally harder than airplane engineering? It’s all just material science and aerodynamics in the end, isn’t it?” In spite of this, empirically, the proportion of rockets that explode is far higher than the proportion of airplanes that crash. The reason for this is that a rocket is put under much greater stress and pressure than an airplane, and small failures are much more likely to be highly destructive.10

Analogously, even though general AI and narrow AI are “just AI” in some sense, we can expect that the more general AI systems are likely to experience a wider range of stressors, and possess more dangerous failure modes.

For example, once an AI system begins modeling the fact that (i) your actions affect its ability to achieve its objectives, (ii) your actions depend on your model of the world, and (iii) your model of the world is affected by its actions, the degree to which minor inaccuracies can lead to harmful behavior increases, and the potential harmfulness of its behavior (which can now include, e.g., deception) also increases. In the case of AI, as with rockets, greater capability makes it easier for small defects to cause big problems.

Second, alignment looks difficult for the same reason it’s harder to build a good space probe than to write a good app.

You can find a number of interesting engineering practices at NASA. They do things like take three independent teams, give each of them the same engineering spec, and tell them to design the same software system; and then they choose between implementations by majority vote. The system that they actually deploy consults all three systems when making a choice, and if the three systems disagree, the choice is made by majority vote. The idea is that any one implementation will have bugs, but it’s unlikely all three implementations will have a bug in the same place.
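The voting scheme described above can be sketched as follows. The three “independent implementations” here are hypothetical stand-ins (one with a deliberate bug), and the function names are assumptions rather than anything NASA uses.

```python
# Sketch of majority voting across three independently written
# implementations. The implementations and names are hypothetical.
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority, or raise."""
    winner, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among implementations")
    return winner


def redundant_call(implementations, x):
    # Consult every implementation and let the majority decide.
    return majority_vote([impl(x) for impl in implementations])


# Three teams implement "clamp the thrust setting to the range 0..100".
def team_a(x):
    return max(0, min(100, x))


def team_b(x):
    if x < 0:
        return 0
    return 100 if x > 100 else x


def team_c(x):
    # Deliberate bug: forgets the lower bound.
    return 100 if x > 100 else x


if __name__ == "__main__":
    teams = [team_a, team_b, team_c]
    for setting in (-5, 42, 150):
        print(setting, "->", redundant_call(teams, setting))
```

The bug in the third implementation is outvoted on negative inputs. The scheme only helps when the bugs are uncorrelated; if two teams make the same mistake, the vote confidently returns the wrong answer.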

This is significantly more caution than goes into the deployment of, say, the new WhatsApp. One big reason for the difference is that it’s hard to roll back a space probe. You can send version updates to a space probe and correct software bugs, but only if the probe’s antenna and receiver work, and if all the code required to apply the patch is working. If your system for applying patches is itself failing, then there’s nothing to be done.

In that respect, smarter-than-human AI is more like a space probe than like an ordinary software project. If you’re trying to build something smarter than yourself, there are parts of the system that have to work perfectly on the first real deployment. We can do all the test runs we want, but once the system is out there, we can only make online improvements if the code that makes the system allow those improvements is working correctly.

If nothing yet has struck fear into your heart, I suggest meditating on the fact that the future of our civilization may well depend on our ability to write code that works correctly on the first deploy.

Lastly, alignment looks difficult for the same reason computer security is difficult: systems need to be robust to intelligent searches for loopholes.

Suppose you have a dozen different vulnerabilities in your code, none of which is itself fatal or even really problematic in ordinary settings. Security is difficult because you need to account for intelligent attackers who might find all twelve vulnerabilities and chain them together in a novel way to break into (or just break) your system. Failure modes that would never arise by accident can be sought out and exploited; weird and extreme contexts can be instantiated by an attacker to cause your code to follow some crazy code path that you never considered.

A similar sort of problem arises with AI. The problem I’m highlighting here is not that AI systems might act adversarially: AI alignment as a research program is all about finding ways to prevent adversarial behavior before it can crop up. We don’t want to be in the business of trying to outsmart arbitrarily intelligent adversaries. That’s a losing game.

The parallel to cryptography is that in AI alignment we deal with systems that perform intelligent searches through a very large search space, and which can produce weird contexts that force the code down unexpected paths. This is because the weird edge cases are places of extremes, and places of extremes are often the place where a given objective function is optimized.11 Like computer security professionals, AI alignment researchers need to be very good at thinking about edge cases.

It’s much easier to make code that works well on the path that you were visualizing than to make code that works on all the paths that you weren’t visualizing. AI alignment needs to work on all the paths you weren’t visualizing.

Summing up, we should approach a problem like this with the same level of rigor and caution we’d use for a security-critical rocket-launched space probe, and do the legwork as early as possible. At this early stage, a key part of the work is just to formalize basic concepts and ideas so that others can critique them and build on them. It’s one thing to have a philosophical debate about what kinds of suspend buttons people intuit ought to work, and another thing to translate your intuition into an equation so that others can fully evaluate your reasoning.

This is a crucial project, and I encourage all of you who are interested in these problems to get involved and try your hand at them. There are ample resources online for learning more about the open technical problems. Some good places to start include MIRI’s research agendas and a great paper from researchers at Google Brain, OpenAI, and Stanford called “Concrete Problems in AI Safety.”


  1. An airplane can’t heal its injuries or reproduce, though it can carry heavy cargo quite a bit further and faster than a bird. Airplanes are simpler than birds in many respects, while also being significantly more capable in terms of carrying capacity and speed (for which they were designed). It’s plausible that early automated scientists will likewise be simpler than the human mind in many respects, while being significantly more capable in certain key dimensions. And just as the construction and design principles of aircraft look alien relative to the architecture of biological creatures, we should expect the design of highly capable AI systems to be quite alien when compared to the architecture of the human mind.
  2. Trying to give some formal content to these attempts to differentiate task-like goals from open-ended goals is one way of generating open research problems. In the “Alignment for Advanced Machine Learning Systems” research proposal, the problem of formalizing “don’t try too hard” is mild optimization, “steer clear of absurd strategies” is conservatism, and “don’t have large unanticipated consequences” is impact measures. See also “avoiding negative side effects” in Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané’s “Concrete Problems in AI Safety.”
  3. One thing we’ve learned in the field of machine vision over the last few decades is that it’s hopeless to specify by hand what a cat looks like, but that it’s not too hard to specify a learning system that can learn to recognize cats. It’s even more hopeless to specify everything we value by hand, but it’s plausible that we could specify a learning system that can learn the relevant concept of “value.”
  4. Roughly speaking, MIRI’s focus is on research directions that seem likely to help us conceptually understand how to do AI alignment in principle, so we’re fundamentally less confused about the kind of work that’s likely to be needed.

    What do I mean by this? Let’s say that we’re trying to develop a new chess-playing program. Do we understand the problem well enough that we could solve it if someone handed us an arbitrarily large computer? Yes: We make the whole search tree, backtrack, see whether white has a winning move.

    If we didn’t know how to answer the question even with an arbitrarily large computer, then this would suggest that we were fundamentally confused about chess in some way. We’d either be missing the search-tree data structure or the backtracking algorithm, or we’d be missing some understanding of how chess works.

    This was the position we were in regarding chess prior to Claude Shannon’s seminal paper, and it’s the position we’re currently in regarding many problems in AI alignment. No matter how large a computer you hand me, I could not make a smarter-than-human AI system that performs even a very simple limited-scope task (e.g., “put a strawberry on a plate without producing any catastrophic side-effects”) or achieves even a very simple open-ended goal (e.g., “maximize the amount of diamond in the universe”).

    If I didn’t have any particular goal in mind for the system, I could write a program (assuming an arbitrarily large computer) that strongly optimized the future in an undirected way, using a formalism like AIXI. In that sense we’re less obviously confused about capabilities than about alignment, even though we’re still missing a lot of pieces of the puzzle on the practical capabilities front.

    Our goal is to develop and formalize basic approaches and ways of thinking about the alignment problem, so that our engineering decisions don’t end up depending on sophisticated and clever-sounding verbal arguments that turn out to be subtly mistaken. Simplifications like “what if we weren’t worried about resource constraints?” and “what if we were trying to achieve a much simpler goal?” are a good place to start breaking down the problem into manageable pieces. For more on this methodology, see “MIRI’s Approach.”
  5. “Fill this cauldron without being too clever about it or working too hard or having any negative consequences I’m not anticipating” is a rough example of a goal that’s intuitively limited in scope. The things we actually want to use smarter-than-human AI for are obviously more ambitious than that, but we’d still want to begin with various limited-scope tasks rather than open-ended goals.

    Asimov’s Three Laws of Robotics make for good stories partly for the same reasons they’re unhelpful from a research perspective. The hard task of turning a moral precept into lines of code is hidden behind phrasings like “[don’t,] through inaction, allow a human being to come to harm.” If one followed a rule like that strictly, the result would be massively disruptive, as AI systems would need to systematically intervene to prevent even the smallest risks of even the slightest harms; and if the intent is that one follow the rule loosely, then all the work is being done by the human sensibilities and intuitions that tell us when and how to apply the rule.

    A common response here is that vague natural-language instruction is sufficient, because smarter-than-human AI systems are likely to be capable of natural language comprehension. However, this is eliding the distinction between the system’s objective function and its model of the world. A system acting in an environment containing humans may learn a world-model that has lots of information about human language and concepts, which the system can then use to achieve its objective function; but this fact doesn’t imply that any of the information about human language and concepts will “leak out” and alter the system’s objective function directly.

    Some kind of value learning process needs to be defined where the objective function itself improves with new information. This is a tricky task because there aren’t known (scalable) metrics or criteria for value learning in the way that there are for conventional learning.

    If a system’s world-model is accurate in training environments but fails in the real world, then this is likely to result in lower scores on its objective function, so the system itself has an incentive to improve. The severity of accidents is also likelier to be self-limiting in this case, since false beliefs limit a system’s ability to effectively pursue strategies.

    In contrast, if a system’s value learning process results in a 𝗨 that matches our 𝘝 in training but diverges from 𝘝 in the real world, then the system’s 𝗨 will obviously not penalize it for optimizing 𝗨. The system has no incentive relative to 𝗨 to “correct” divergences between 𝗨 and 𝘝, if the value learning process is initially flawed. And accident risk is larger in this case, since a mismatch between 𝗨 and 𝘝 doesn’t necessarily place any limits on the system’s instrumental effectiveness at coming up with effective and creative strategies for achieving 𝗨.

    The problem is threefold:

    1. “Do What I Mean” is an informal idea, and even if we knew how to build a smarter-than-human AI system, we wouldn’t know how to precisely specify this idea in lines of code.

    2. If doing what we actually mean is instrumentally useful for achieving a particular objective, then a sufficiently capable system may learn how to do this, and may act accordingly so long as doing so is useful for its objective. But as systems become more capable, they are likely to find creative new ways to achieve the same objectives, and there is no obvious way to get an assurance that “doing what I mean” will continue to be instrumentally useful indefinitely.

    3. If we use value learning to refine a system’s goals over time based on training data that appears to be guiding the system toward a 𝗨 that inherently values doing what we mean, it is likely that the system will actually end up zeroing in on a 𝗨 that approximately does what we mean during training but catastrophically diverges in some difficult-to-anticipate contexts. See “Goodhart’s Curse” for more on this.

    For examples of problems faced by existing techniques for learning goals and facts, such as reinforcement learning, see “Using Machine Learning to Address AI Risk.”

  6. The result will probably not be a particularly human-like design, since so many complex historical contingencies were involved in our evolution. The result will also be able to benefit from a number of large software and hardware advantages.
  7. This concept is sometimes lumped into the “transparency” category, but standard algorithmic transparency research isn’t really addressing this particular problem. A better term for what I have in mind here is “understanding.” What we want is to gain deeper and broader insights into the kind of cognitive work the system is doing and how this work relates to the system’s objectives or optimization targets, to provide a conceptual lens with which to make sense of the hands-on engineering work.
  8. We could choose to program the system to tire, but we don’t have to. In principle, one could program a broom that only ever finds and executes actions that optimize the fullness of the cauldron. Improving the system’s ability to efficiently find high-scoring actions (in general, or relative to a particular scoring rule) doesn’t in itself change the scoring rule it’s using to evaluate actions.
  9. Some other examples: a system’s performance may suddenly improve when it’s first given large-scale Internet access, when there’s a conceptual breakthrough in algorithm design, or when the system itself is able to propose improvements to its hardware and software. We can imagine the last case in particular resulting in a feedback loop as the system’s design improvements allow it to come up with further design improvements, until all the low-hanging fruit is exhausted.

    Another important consideration is that two of the main bottlenecks to humans doing faster scientific research are training time and communication bandwidth. If we could train a new mind to be a cutting-edge scientist in ten minutes, and if scientists could near-instantly trade their experience, knowledge, concepts, ideas, and intuitions to their collaborators, then scientific progress might be able to proceed much more rapidly. Freedom from bottlenecks like these is exactly the sort of thing that might give automated innovators an enormous edge over human innovators even without large advantages in hardware or algorithms.
  10. Specifically, rockets experience a wider range of temperatures and pressures, traverse those ranges more rapidly, and are also packed more fully with explosives.
  11. Consider Bird and Layzell’s example of a very simple genetic algorithm that was tasked with evolving an oscillating circuit. Bird and Layzell were astonished to find that the algorithm made no use of the capacitor on the chip; instead, it had repurposed the circuit tracks on the motherboard as a radio to replay the oscillating signal from the test device back to the test device.

    This was not a very smart program. This is just using hill climbing on a very small solution space. In spite of this, the solution turned out to be outside the space of solutions the programmers were themselves visualizing. In a computer simulation, this algorithm might have behaved as intended, but the actual solution space in the real world was wider than that, allowing hardware-level interventions.

    In the case of an intelligent system that’s significantly smarter than humans on whatever axes you’re measuring, you should by default expect the system to push toward weird and creative solutions like these, and for the chosen solution to be difficult to anticipate.
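
As a rough illustration of the points in these notes (a learned objective 𝗨 that matches the intended 𝘝 in training but diverges elsewhere, and a hill climber that finds solutions outside the space its designers had in mind), here is a minimal toy sketch. Everything in it is an assumption made for illustration: the one-dimensional search space, the functions `intended_value` and `learned_objective`, and the step sizes standing in for "more capable search" are all hypothetical, and none of it is taken from Bird and Layzell's experiment or from any real value learning system.

```python
import random


def intended_value(x: float) -> float:
    """V: what the designers actually want (hypothetical); best at x = 1."""
    return -((x - 1.0) ** 2)


def learned_objective(x: float) -> float:
    """U: the objective the system actually optimizes (hypothetical).

    U equals V on the "training" range [0, 2] but keeps rising outside it.
    """
    distance = abs(x - 1.0)
    if distance <= 1.0:
        return intended_value(x)
    return distance - 2.0


def hill_climb(objective, start, step, iterations, rng):
    """Simple hill climbing: keep a random nearby candidate only if it scores higher."""
    x = start
    for _ in range(iterations):
        candidate = x + rng.uniform(-step, step)
        if objective(candidate) > objective(x):
            x = candidate
    return x


rng = random.Random(0)  # fixed seed so the run is repeatable

# Same scoring rule U in both runs; only the reach of the search differs.
weak_x = hill_climb(learned_objective, start=0.5, step=0.5, iterations=1000, rng=rng)
strong_x = hill_climb(learned_objective, start=0.5, step=5.0, iterations=1000, rng=rng)

for name, x in [("narrow search", weak_x), ("wide search", strong_x)]:
    print(f"{name}: x={x:.2f}  U={learned_objective(x):.2f}  V={intended_value(x):.2f}")
```

Under these assumptions, the narrow search never leaves the range where 𝗨 and 𝘝 agree, so it looks well-behaved. The wide search uses exactly the same scoring rule but can reach the region where 𝗨 keeps rising while 𝘝 falls, so it ends with a higher 𝗨 and a much lower 𝘝. Nothing in 𝗨 penalizes that outcome, which is the sense in which improving the search does not correct a flawed objective.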