What Can We Learn From Cape Town’s Water Crisis?

The following article was contributed by Billy Babis.

Earlier this year, Cape Town, a port city in South Africa, prepared for the full depletion of its water supply amidst the driest 3-year span on record. The threat of “Day Zero,” the day when the city would officially cut off running water, has subsided for now, but the long-term threat remains. And though Cape Town’s crisis is local, it exemplifies a problem several regions across the globe may soon have to address.


Current situation in Cape Town

In addition to falling within Cape Town’s driest 3-year span on record, 2017 was the city’s driest single year since 1933. Meanwhile, the city’s population has nearly doubled over the past 25 years to 3.74 million, so water consumption has increased even as the supply dwindles.

In January of this year, the city of Cape Town made an emergency announcement that Day Zero would land in mid-April and began enforcing restrictions and regulations. Starting on February 1st, the Cape Town government put emergency water regulations into effect, increasing the cost of water to 5-8 times its previous rate and setting a suggested cap of 50 liters per person per day. For context, the average American uses approximately 300-380 liters of water per day. And while Cape Town residents continue to use 80 million liters more than the city’s goal of 450 million liters per day, these regulations and increased costs have made progress. Water consumption decreased enough that Day Zero has now been postponed until 2019, as the rainy season (July-August) is expected to partially replenish the reservoirs.

Cape Town’s water consumption decreased largely due to its increased cost. The municipality also restricted agricultural use of water, which usually makes up just under half of total consumption. These restrictions are worsening Cape Town’s already struggling agricultural sector, which over the 3-year drought has shed 37,000 jobs and lost R14 billion (US$1.17 billion), contributing to inflated food prices that pushed 50,000 people below the poverty line.

On the innovation side, the city has made major investments in infrastructure to increase water availability: 3 desalination plants, 3 aquifer abstraction facilities, and 1 waste-water recycling project are currently underway to ultimately increase Cape Town’s water availability by almost 300 million liters per day.

Did climate change cause the water crisis?

Severe droughts have plagued subtropical regions like Cape Town since long before human-caused climate change. Thus, it’s difficult to conclude that climate change directly caused the Cape Town water crisis. However, the Intergovernmental Panel on Climate Change (IPCC) continues to find evidence suggesting that climate change has caused drought in certain regions and will cause longer, more frequent droughts over the next century.

Drought can be meteorological (abnormally limited rainfall), agricultural (abnormally dry soil and excess evaporation), or hydrological (abnormally limited streamflow). While these problems are interrelated, they contribute differently to drought in different regions. Meteorological drought is often the most important, and this is certainly the case in Cape Town.

Meteorological drought occurs naturally in Cape Town and the few other regions of the world with a “Mediterranean” climate: central California, central Chile, northern Africa and southern Europe, southwestern Australia, and the greater Cape Town area all have dry summers and variably rainy winters. Due to global weather oscillations like El Niño, total winter rainfall varies dramatically; a given winter is usually either very rainy or very dry. But as long as repeated and prolonged periods of drought don’t strike, these regions can prepare for dry seasons by storing water from previous wet seasons.

But climate change threatens this balance. With “robust evidence and high agreement,” the IPCC concluded that while tropical regions will receive more precipitation this century, subtropical dry regions (like these Mediterranean climates) will receive less. Warm, moist air rising near the equator ultimately descends over these subtropical regions, drying as it sinks and creating deserts and drought-prone zones. Therefore, increasing equatorial heat and rain (as global warming promises) will likely lead to drier subtropical conditions and more frequent meteorological drought.

The IPCC also expects these Mediterranean climates to experience more frequent agricultural drought (IPCC AR5), largely due to growing human populations. In addition, renewable surface water and groundwater will decrease, and hydrological drought will likely occur more frequently, driven in large part by increasing population and the resulting consumption.


What we can learn from this crisis

Earlier this year, many Capetonians feared a total catastrophe: running out of water. Wealthier citizens might have been able to pay for imported water or outbound flights, but poorer communities would have been left in a much more dire situation. International aid likely would have been necessary to avoid any fatal consequences.

The city seems to have averted that for now, but not without cost. The drought has placed immense strain on the agricultural economy that “will take years to work out of the system,” explains Beatrice Conradie, Professor of Economics and Social Sciences at the University of Cape Town. “Primary producers are likely to act more conservatively as a result and this will make them less inclined to invest and create jobs. The unemployed will migrate to cities where they will put additional pressure on already strained infrastructure.”

And while Cape Town’s water infrastructure projects — desalination plants, aquifer abstraction facilities, and waste-water recycling projects — have provided some immediate and prospective relief, they will not always be an option for every region. Desalination plants are very expensive and energy intensive (and thus contribute to climate change), and they pollute the local ocean ecosystem by releasing the brine left over from desalination back into the water. Conradie raises further concerns about unregulated well-drilling in response to surface water restrictions. Over-abstraction from aquifers, regulated or not, commonly leads to saltwater intrusion, permanently contaminating the fresh water and killing wetland wildlife. These are the best solutions available, and none of them is sustainable.

“Cape Town is really a wake-up call for other cities around the world,” shares NASA’s senior water scientist, Jay Famiglietti. “We have huge challenges ahead of us if we want to avert future day zeros in other cities around the world.”

Just as Capetonians failed to heed the cries of their government before reaching crisis mode, global citizens are adjusting very slowly to the climate change warnings of scientists and governments around the globe. But Cape Town’s response offers some valuable sociological lessons on sustainability. One is that behavioral changes can swing abruptly on a mass scale. Once a sufficient sense of urgency struck the people of Cape Town in early February (see figure below), the conservation movement gained critical mass, and community members fed off each other’s hope.

But the role of governance proved indispensable. While Cape Town had long tried to inform the public of the water shortages, residents didn’t adjust their consumption until the government made the emergency announcement on January 17th and began enforcing drastic regulations and fees.

As Cape Town’s sustainability efforts demonstrate, addressing climate change is a social problem as much as a technical problem. Regardless of technological innovations, understanding human behavioral habits will be crucial in propelling necessary changes. As such, sociologists will grow just as important as climate scientists or chemical engineers in leading change.

Cape Town’s main focus over the past few months has been discovering the best ways to nudge its citizens’ behavior on a mass scale to reduce water consumption. This entailed research partnerships with the University of Cape Town’s sociology departments and the Environmental Policy Research Unit (EPRU). The principle of reciprocity held true – people are more likely to contribute to the public good if they see others doing it – and enhancing this effect on a global scale will grow increasingly important as we attempt to mitigate and adapt to environmental threats this century.

With or without a changing climate, though, water scarcity will become an increasingly urgent issue for humanity’s growing population. Population growth continues to expand our ecological footprint and increasingly threatens the ability of future, presumably larger, generations to flourish. Amidst their environmental challenge, the people of Cape Town demonstrated the importance of effective governance and collaboration. As more subtropical regions begin to suffer from drought and water shortages, learning from the failures and successes of Cape Town’s 2018 crisis will help avoid disaster.

Teaching Today’s AI Students To Be Tomorrow’s Ethical Leaders: An Interview With Yan Zhang

Some of the greatest scientists and inventors of the future are sitting in high school classrooms right now, breezing through calculus and eagerly awaiting freshman year at the world’s top universities. They may have already won Math Olympiads or invented clever, new internet applications. We know these students are smart, but are they prepared to responsibly guide the future of technology?

Developing safe and beneficial technology requires more than technical expertise — it requires a well-rounded education and the ability to understand other perspectives. But since math and science students must spend so much time doing technical work, they often lack the skills and experience necessary to understand how their inventions will impact society.

These educational gaps could prove problematic as artificial intelligence assumes a greater role in our lives. AI research is booming among young computer scientists, and these students need to understand the complex ethical, governance, and safety challenges posed by their innovations.



In 2012, a group of AI researchers and safety advocates – Paul Christiano, Jacob Steinhardt, Andrew Critch, Anna Salamon, and Yan Zhang – created the Summer Program in Applied Rationality and Cognition (SPARC) to address the many issues that face quantitatively strong teenagers, including the issue of educational gaps in AI. As with all technologies, they explain, the more the AI community consists of thoughtful, intelligent, broad-minded reasoners, the more likely AI is to be developed in a safe and beneficial manner.

Each summer, the SPARC founders invite 30-35 mathematically gifted high school students to participate in their two-week program. Zhang, SPARC’s director, explains: “Our goals are to generate a strong community, expose these students to ideas that they’re not going to get in class – blind spots of being a quantitatively strong teenager in today’s world, like empathy and social dynamics. Overall we want to make them more powerful individuals who can bring positive change to the world.”

To help students make a positive impact, SPARC instructors teach core ideas in effective altruism (EA). “We have a lot of conversations about EA, but we don’t push the students to become EA,” Zhang says. “We expose them to good ideas, and I think that’s a healthier way to do mentorship.”

SPARC also exposes students to machine learning, AI safety, and existential risks. In 2016 and 2017, they held over 10 classes on these topics, including: “Machine Learning” and “TensorFlow” taught by Jacob Steinhardt, “Irresponsible Futurism” and “Effective Do-Gooding” taught by Paul Christiano, “Optimization” taught by John Schulman, and “Long-Term Thinking on AI and Automization” taught by Michael Webb.

But SPARC instructors don’t push students down the AI path either. Instead, they encourage students to apply SPARC’s holistic training to make a more positive impact in any field.


Thinking on the Margin: The Role of Social Skills

Making the most positive impact requires thinking on the margin, and asking: What one additional unit of knowledge will be most helpful for creating positive impact? For these students, most of whom have won Math and Computing Olympiads, it’s usually not more math.

“A weakness of a lot of mathematically-minded students are things like social skills or having productive arguments with people,” Zhang says. “Because to be impactful you need your quantitative skills, but you need to also be able to relate with people.”

To counter this weakness, he teaches classes on social skills and signaling, and occasionally leads improvisational games. SPARC still teaches a lot of math, but Zhang is more interested in addressing these students’ educational blind spots – the same blind spots that the instructors themselves had as students. “What would have made us more impactful individuals, and also more complete and more human in many ways?” he asks.

Working with non-math students can help, so Zhang and his colleagues have experimented with bringing excellent writers and original thinkers into the program. “We’ve consistently had really good successes with those students, because they bring something that the Math Olympiad kids don’t have,” Zhang says.

SPARC also broadens students’ horizons with guest speakers from academia and organizations such as the Open Philanthropy Project, OpenAI, Dropbox and Quora. In one talk, Dropbox engineer Albert Ni spoke to SPARC students about “common mistakes that math people make when they try to do things later in life.”

In another successful experiment suggested by Ofer Grossman, a SPARC alum who is now a staff member, SPARC made half of all classes optional in 2017. The classes were still packed because students appreciated the culture. The founders also agreed that conversations after class are often more impactful than classes, and therefore engineered one-on-one time and group discussions into the curriculum. Thinking on the margin, they ask: “What are the things that were memorable about school? What are the good parts? Can we do more of those and less of the others?”

Above all, SPARC fosters a culture of openness, curiosity and accountability. Inherent in this project is “cognitive debiasing” – learning about common biases like selection bias and confirmation bias, and correcting for them. “We do a lot of de-biasing in our interactions with each other, very explicitly,” Zhang says. “We also have classes on cognitive biases, but the culture is the more important part.”


AI Research and Future Leaders

Designing safe and beneficial technology requires technical expertise, but in SPARC’s view, cultivating a holistic research culture is equally important. Today’s top students may make some of the most consequential AI breakthroughs in the future, and their values, education and temperament will play a critical role in ensuring that advanced AI is deployed safely and for the common good.

“This is also important outside of AI,” Zhang explains. “The official SPARC stance is to make these students future leaders in their communities, whether it’s AI, academia, medicine, or law. These leaders could then talk to each other and become allies instead of having a bunch of splintered, narrow disciplines.”

As SPARC approaches its 7th year, some alumni have already begun to make an impact. A few AI-oriented alumni recently founded AlphaSheets – a collaborative, programmable spreadsheet for finance that is less prone to error – while other students are leading a “hacker house” with people in Silicon Valley. Additionally, SPARC inspired the creation of ESPR, a similar European program explicitly focused on AI risk.

But most impacts will be less tangible. “Different pockets of people interested in different things have been working with SPARC’s resources, and they’re forming a lot of social groups,” Zhang explains. “It’s like a bunch of little sparks and we don’t quite know what they’ll become, but I’m pretty excited about next five years.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

ICRAC Open Letter Opposes Google’s Involvement With Military

From improving medicine to better search engines to assistants that help ease busy schedules, artificial intelligence is already proving a boon to society. But just as it can be designed to help, it can be designed to harm and even to kill.

Military uses of AI can also run the gamut, from programs that could help improve food distribution logistics to weapons that can identify and assassinate targets without input from humans. Because AI programs can have these dual uses, it’s difficult for companies that do not want their technology to cause harm to work with militaries: a company currently has no way to ensure that an AI program it builds to help the military solve a benign problem won’t later be repurposed to take human lives.

So when employees at Google learned earlier this year about the company’s involvement in the Pentagon’s Project Maven, they were upset. Though Google argues that its work on Project Maven only provided the U.S. military with image recognition tools for analyzing drone footage, many suggest that this technology could later be used for harm. In response, over 3,000 employees signed an open letter saying they did not want their work to be used to kill.

And it isn’t just Google’s employees who are concerned.

Earlier this week, the International Committee for Robot Arms Control released an open letter signed by hundreds of academics calling on Google’s leadership to withdraw from the “business of war.” The letter, which is addressed to Google’s leadership, responds to the growing criticism of Google’s participation in the Pentagon’s program, Project Maven.

The letter states, “we write in solidarity with the 3100+ Google employees, joined by other technology workers, who oppose Google’s participation in Project Maven.” It goes on to remind Google leadership to be cognizant of the incredible responsibility the company has for safeguarding the data it’s collected from its users, as well as its famous motto, “Don’t Be Evil.”

Specifically, the letter calls on Google to:

  • “Terminate its Project Maven contract with the DoD.
  • “Commit not to develop military technologies, nor to allow the personal data it has collected to be used for military operations.
  • “Pledge to neither participate in nor support the development, manufacture, trade or use of autonomous weapons; and to support efforts to ban autonomous weapons.”

Lucy Suchman, one of the letter’s authors, explained part of her motivation for her involvement:

“For me the greatest concern is that this effort will lead to further reliance on profiling and guilt by association in the US drone surveillance program, as the only way to generate signal out of the noise of massive data collection. There are already serious questions about the legality of targeted killing, and automating it further will only make it less accountable.”

The letter was released the same week that a small group of Google employees made news for resigning in protest against Project Maven. It also comes barely a month after a successful boycott by academic researchers against KAIST’s autonomous weapons effort.

In addition, last month the United Nations held their most recent meeting to consider a ban on lethal autonomous weapons. 26 countries, including China, have now said they would support some sort of official ban on these weapons.

In response to the number of signatories the open letter has received, Suchman added, “This is clearly an issue that strikes a chord for many researchers who’ve been tracking the incorporation of AI and robotics into military systems.”

If you want to add your name to the letter, you can do so here.

Lethal Autonomous Weapons: An Update from the United Nations

Earlier this month, the United Nations Convention on Conventional Weapons (UN CCW) Group of Governmental Experts met in Geneva to discuss the future of lethal autonomous weapons systems. But before we get to that, here’s a quick recap of everything that’s happened in the last six months.


Slaughterbots and Boycotts

Since its release in November 2017, the video Slaughterbots has been seen approximately 60 million times and has been featured in hundreds of news articles around the world. The video coincided with the UN CCW Group of Governmental Experts’ first meeting in Geneva to discuss a ban on lethal autonomous weapons, as well as the release of open letters from AI researchers in Australia, Canada, Belgium, and other countries urging their heads of state to support an international ban on lethal autonomous weapons.

Over the last two months, autonomous weapons regained the international spotlight. In March, after learning that the Korea Advanced Institute of Science and Technology (KAIST) planned to open an AI weapons lab in collaboration with a major arms company, AI researcher Toby Walsh led an academic boycott of the university. Over 50 of the world’s leading AI and robotics researchers from 30 countries joined the boycott, and in less than a week, KAIST agreed to “not conduct any research activities counter to human dignity including autonomous weapons lacking meaningful human control.” The boycott was covered by CNN and The Guardian.

Additionally, over 3,100 Google employees, including dozens of senior engineers, signed a letter in early April protesting the company’s involvement in a Pentagon program called “Project Maven,” which uses AI to analyze drone imagery. Employees worried that this technology could be repurposed to operate drones or launch weapons. Citing the company’s “Don’t Be Evil” motto, the employees asked Google to cancel the project and not to become involved in the “business of war.”


The UN CCW meets again…

In the wake of this growing pressure, 82 countries in the UN CCW met again from April 9-13 to consider a ban on lethal autonomous weapons. Throughout the week, states and civil society representatives discussed “meaningful human control” and whether they should be concerned only about “lethal” autonomous weapons or about autonomous weapons generally. Here is a brief recap of the meeting’s progress:

  • The group of nations that explicitly endorse the call to ban LAWS expanded to 26 (with China, Austria, Colombia, and Djibouti joining during the CCW meeting).
  • However, five states explicitly rejected moving to negotiate new international law on fully autonomous weapons: France, Israel, Russia, United Kingdom, and United States.
  • Nearly every nation agreed that it is important to retain human control over autonomous weapons, despite disagreements surrounding the definition of “meaningful human control.”
  • Throughout the discussion, states focused on complying with International Humanitarian Law (IHL). Human Rights Watch argued that there already is precedent in international law and disarmament law for banning weapons without human control.
  • Many countries submitted working papers to inform the discussions, including China and the United States.
  • Although states couldn’t reach an agreement during the meeting, momentum is growing towards solidifying a framework for defining lethal autonomous weapons.

You can find written and video recaps from each day of the UN CCW meeting here, written by Reaching Critical Will.

The UN CCW is slated to resume discussions in August 2018; however, given the speed with which autonomous weaponry is advancing, many advocates worry that the talks are moving too slowly.


What can you do?

If you work in the tech industry, consider signing the Tech Workers Coalition open letter, which calls on Google, Amazon and Microsoft to stay out of the business of war. And if you’d like to support the fight against LAWS, we recommend donating to the Campaign to Stop Killer Robots. This organization, which is not affiliated with FLI, has done amazing work over the past few years to lead efforts around the world to prevent the development of lethal autonomous weapons. Please consider donating here.


Learn more…

If you want to learn more about the technological, political, and social developments of autonomous weapons, check out the Research & Reports page of our Autonomous Weapons website. You can find relevant news stories and updates at @AIweapons on Twitter and autonomousweapons on Facebook.

AI and Robotics Researchers Boycott South Korea Tech Institute Over Development of AI Weapons Technology

UPDATE 4-9-18: The boycott against KAIST has ended. The press release for the ending of the boycott explained:

“More than 50 of the world’s leading artificial intelligence (AI) and robotics researchers from 30 different countries have declared they would end a boycott of the Korea Advanced Institute of Science and Technology (KAIST), South Korea’s top university, over the opening of an AI weapons lab in collaboration with Hanwha Systems, a major arms company.

“At the opening of the new laboratory, the Research Centre for the Convergence of National Defence and Artificial Intelligence, it was reported that KAIST was “joining the global competition to develop autonomous arms” by developing weapons “which would search for and eliminate targets without human control”. Further cause for concern was that KAIST’s industry partner, Hanwha Systems, builds cluster munitions, despite a UN ban, as well as a fully autonomous weapon, the SGR-A1 Sentry Robot. In 2008, Norway excluded Hanwha from its $380 billion future fund on ethical grounds.

“KAIST’s President, Professor Sung-Chul Shin, responded to the boycott by affirming in a statement that ‘KAIST does not have any intention to engage in development of lethal autonomous weapons systems and killer robots.’ He went further by committing that ‘KAIST will not conduct any research activities counter to human dignity including autonomous weapons lacking meaningful human control.’

“Given this swift and clear commitment to the responsible use of artificial intelligence in the development of weapons, the 56 AI and robotics researchers who were signatories to the boycott have rescinded the action. They will once again visit and host researchers from KAIST, and collaborate on scientific projects.”

UPDATE 4-5-18: In response to the boycott, KAIST President Sung-Chul Shin released an official statement to the press. In it, he says:

“I would like to reaffirm that KAIST does not have any intention to engage in development of lethal autonomous weapons systems and killer robots. KAIST is significantly aware of ethical concerns in the application of all technologies including artificial intelligence.

“I would like to stress once again that this research center at KAIST, which was opened in collaboration with Hanwha Systems, does not intend to develop any lethal autonomous weapon systems and the research activities do not target individual attacks.”


Leading artificial intelligence researchers from around the world are boycotting South Korea’s KAIST (Korea Advanced Institute of Science and Technology) after the institute announced a partnership with Hanwha Systems to create a center that will help develop technology for AI weapons systems.

The boycott, organized by AI researcher Toby Walsh, was announced just days before the start of the next United Nations Convention on Conventional Weapons (CCW) meeting in which countries will discuss how to address challenges posed by autonomous weapons. 

“At a time when the United Nations is discussing how to contain the threat posed to international security by autonomous weapons, it is regrettable that a prestigious institution like KAIST looks to accelerate the arms race to develop such weapons,” the boycott letter states. 

The letter also explains the concerns AI researchers have regarding autonomous weapons:

“If developed, autonomous weapons will be the third revolution in warfare. They will permit war to be fought faster and at a scale greater than ever before. They have the potential to be weapons of terror. Despots and terrorists could use them against innocent populations, removing any ethical restraints. This Pandora’s box will be hard to close if it is opened.”

The letter has been signed by over 50 of the world’s leading AI and robotics researchers from 30 countries, including professors Yoshua Bengio, Geoffrey Hinton, Stuart Russell, and Wolfram Burgard.

Explaining the boycott, the letter states:

“We therefore publicly declare that we will boycott all collaborations with any part of KAIST until such time as the President of KAIST provides assurances, which we have sought but not received, that the Center will not develop autonomous weapons lacking meaningful human control. We will, for example, not visit KAIST, host visitors from KAIST, or contribute to any research project involving KAIST.”

In February, the Korea Times reported on the opening of the Research Center for the Convergence of National Defense and Artificial Intelligence, which was formed as a partnership between KAIST and Hanwha to “[join] the global competition to develop autonomous arms.” The Korea Times article added that “researchers from the university and Hanwha will carry out various studies into how technologies of the Fourth Industrial Revolution can be utilized on future battlefields.”

In the press release for the boycott, Walsh referenced concerns that he and other AI researchers have had since 2015, when he and FLI released an open letter signed by thousands of researchers calling for a ban on autonomous weapons.

“Back in 2015, we warned of an arms race in autonomous weapons,” said Walsh. “That arms race has begun. We can see prototypes of autonomous weapons under development today by many nations including the US, China, Russia and the UK. We are locked into an arms race that no one wants to happen. KAIST’s actions will only accelerate this arms race.”

Many organizations and people have come together through the Campaign to Stop Killer Robots to advocate for a UN ban on lethal autonomous weapons. In her summary of the last United Nations CCW meeting in November 2017, Ray Acheson of Reaching Critical Will wrote:

“It’s been four years since we first began to discuss the challenges associated with the development of autonomous weapon systems (AWS) at the United Nations. … But the consensus-based nature of the Convention on Certain Conventional Weapons (CCW) in which these talks have been held means that even though the vast majority of states are ready and willing to take some kind of action now, they cannot because a minority opposes it.”

Walsh adds, “I am hopeful that this boycott will add urgency to the discussions at the UN that start on Monday. It sends a clear message that the AI & Robotics community do not support the development of autonomous weapons.”

To learn more about autonomous weapons and efforts to ban them, visit the Campaign to Stop Killer Robots and autonomousweapons.org. The full open letter and signatories are below.

Open Letter:

As researchers and engineers working on artificial intelligence and robotics, we are greatly concerned by the opening of a “Research Center for the Convergence of National Defense and Artificial Intelligence” at KAIST in collaboration with Hanwha Systems, South Korea’s leading arms company. It has been reported that the goals of this Center are to “develop artificial intelligence (AI) technologies to be applied to military weapons, joining the global competition to develop autonomous arms.”

At a time when the United Nations is discussing how to contain the threat posed to international security by autonomous weapons, it is regrettable that a prestigious institution like KAIST looks to accelerate the arms race to develop such weapons. We therefore publicly declare that we will boycott all collaborations with any part of KAIST until such time as the President of KAIST provides assurances, which we have sought but not received, that the Center will not develop autonomous weapons lacking meaningful human control. We will, for example, not visit KAIST, host visitors from KAIST, or contribute to any research project involving KAIST.

If developed, autonomous weapons will be the third revolution in warfare. They will permit war to be fought faster and at a scale greater than ever before. They have the potential to be weapons of terror. Despots and terrorists could use them against innocent populations, removing any ethical restraints. This Pandora’s box will be hard to close if it is opened. As with other technologies banned in the past like blinding lasers, we can simply decide not to develop them. We urge KAIST to follow this path, and work instead on uses of AI to improve and not harm human lives.



Signatories, alphabetically by country, then by family name:

  • Prof. Toby Walsh, UNSW Sydney, Australia.
  • Prof. Mary-Anne Williams, University of Technology Sydney, Australia.
  • Prof. Thomas Eiter, TU Wien, Austria.
  • Prof. Paolo Petta, Austrian Research Institute for Artificial Intelligence, Austria.
  • Prof. Maurice Bruynooghe, Katholieke Universiteit Leuven, Belgium.
  • Prof. Marco Dorigo, Université Libre de Bruxelles, Belgium.
  • Prof. Luc De Raedt, Katholieke Universiteit Leuven, Belgium.
  • Prof. Andre C. P. L. F. de Carvalho, University of São Paulo, Brazil.
  • Prof. Yoshua Bengio, University of Montreal, & scientific director of MILA, co-founder of Element AI, Canada.
  • Prof. Geoffrey Hinton, University of Toronto, Canada.
  • Prof. Kevin Leyton-Brown, University of British Columbia, Canada.
  • Prof. Csaba Szepesvari, University of Alberta, Canada.
  • Prof. Zhi-Hua Zhou, Nanjing University, China.
  • Prof. Thomas Bolander, Danmarks Tekniske Universitet, Denmark.
  • Prof. Malik Ghallab, LAAS-CNRS, France.
  • Prof. Marie-Christine Rousset, University of Grenoble Alpes, France.
  • Prof. Wolfram Burgard, University of Freiburg, Germany.
  • Prof. Bernd Neumann, University of Hamburg, Germany.
  • Prof. Bernhard Schölkopf, Director, Max Planck Institute for Intelligent Systems, Germany.
  • Prof. Manolis Koubarakis, National and Kapodistrian University of Athens, Greece.
  • Prof. Grigorios Tsoumakas, Aristotle University of Thessaloniki, Greece.
  • Prof. Benjamin W. Wah, Provost, The Chinese University of Hong Kong, Hong Kong.
  • Prof. Dit-Yan Yeung, Hong Kong University of Science and Technology, Hong Kong.
  • Prof. Kristinn R. Thórisson, Managing Director, Icelandic Institute for Intelligent Machines, Iceland.
  • Prof. Barry Smyth, University College Dublin, Ireland.
  • Prof. Diego Calvanese, Free University of Bozen-Bolzano, Italy.
  • Prof. Nicola Guarino, Italian National Research Council (CNR), Trento, Italy.
  • Prof. Bruno Siciliano, University of Naples, Italy.
  • Prof. Paolo Traverso, Director of FBK, IRST, Italy.
  • Prof. Yoshihiko Nakamura, University of Tokyo, Japan.
  • Prof. Imad H. Elhajj, American University of Beirut, Lebanon.
  • Prof. Christoph Benzmüller, Université du Luxembourg, Luxembourg.
  • Prof. Miguel Gonzalez-Mendoza, Tecnológico de Monterrey, Mexico.
  • Prof. Raúl Monroy, Tecnológico de Monterrey, Mexico.
  • Prof. Krzysztof R. Apt, Centrum Wiskunde & Informatica (CWI), Amsterdam, the Netherlands.
  • Prof. Antal van den Bosch, Radboud University, the Netherlands.
  • Prof. Bernhard Pfahringer, University of Waikato, New Zealand.
  • Prof. Helge Langseth, Norwegian University of Science and Technology, Norway.
  • Prof. Zygmunt Vetulani, Adam Mickiewicz University in Poznań, Poland.
  • Prof. José Alferes, Universidade Nova de Lisboa, Portugal.
  • Prof. Luis Moniz Pereira, Universidade Nova de Lisboa, Portugal.
  • Prof. Ivan Bratko, University of Ljubljana, Slovenia.
  • Prof. Matjaz Gams, Jozef Stefan Institute and National Council for Science, Slovenia.
  • Prof. Hector Geffner, Universitat Pompeu Fabra, Spain.
  • Prof. Ramon Lopez de Mantaras, Director, Artificial Intelligence Research Institute, Spain.
  • Prof. Alessandro Saffiotti, Örebro University, Sweden.
  • Prof. Boi Faltings, EPFL, Switzerland.
  • Prof. Jürgen Schmidhuber, Scientific Director, Swiss AI Lab, Università della Svizzera italiana, Switzerland.
  • Prof. Chao-Lin Liu, National Chengchi University, Taiwan.
  • Prof. J. Mark Bishop, Goldsmiths, University of London, UK.
  • Prof. Zoubin Ghahramani, University of Cambridge, UK.
  • Prof. Noel Sharkey, University of Sheffield, UK.
  • Prof. Lucy Suchman, Lancaster University, UK.
  • Prof. Marie desJardins, University of Maryland, Baltimore County, USA.
  • Prof. Benjamin Kuipers, University of Michigan, USA.
  • Prof. Stuart Russell, University of California, Berkeley, USA.
  • Prof. Bart Selman, Cornell University, USA.


2018 Spring Conference: Invest in Minds Not Missiles

On Saturday April 7th and Sunday morning April 8th, MIT and Massachusetts Peace Action will co-host a conference and workshop at MIT on understanding and reducing the risk of nuclear war. Tickets are free for students. To attend, please register here.


Saturday sessions


Sunday Morning Planning Breakfast

A student-led session to design and implement programs that strengthen existing campus groups and organize new ones, and to extend the network to campuses in Rhode Island, Connecticut, New Jersey, New Hampshire, Vermont, and Maine.

For more information, contact Jonathan King at <jaking@mit.edu> or call 617-354-2169.

How AI Handles Uncertainty: An Interview With Brian Ziebart

When training image detectors, AI researchers can’t replicate the real world. They teach systems what to expect by feeding them training data, such as photographs, computer-generated images, real video and simulated video, but these practice environments can never capture the messiness of the physical world.

In machine learning (ML), image detectors learn to spot objects by drawing bounding boxes around them and giving them labels. And while this training process succeeds in simple environments, it gets complicated quickly.








It’s easy to define the person on the left, but how would you draw a bounding box around the person on the right? Would you only include the visible parts of his body, or also his hidden torso and legs? These differences may seem trivial, but they point to a fundamental problem in object recognition: there rarely is a single best way to define an object.

As this second image demonstrates, the real world is rarely clear-cut, and the “right” answer is usually ambiguous. Yet when ML systems use training data to develop their understanding of the world, they often fail to reflect this. Rather than recognizing uncertainty and ambiguity, these systems often confidently approach new situations no differently than their training data, which can put the systems and humans at risk.

Brian Ziebart, a Professor of Computer Science at the University of Illinois at Chicago, is conducting research to improve AI systems’ ability to operate amidst the inherent uncertainty around them. The physical world is messy and unpredictable, and if we are to trust our AI systems, they must be able to safely handle it.


Overconfidence in ML Systems

ML systems will inevitably confront real-world scenarios that their training data never prepared them for. But, as Ziebart explains, current statistical models “tend to assume that the data that they’ll see in the future will look a lot like the data they’ve seen in the past.”

As a result, these systems are overly confident that they know what to do when they encounter new data points, even when those data points look nothing like what they’ve seen. ML systems falsely assume that their training prepared them for everything, and the resulting overconfidence can lead to dangerous consequences.

Consider image detection for a self-driving car. A car might train its image detection on data from the dashboard of another car, tracking the visual field and drawing bounding boxes around certain objects, as in the image below:

Bounding boxes on a highway – CloudFactory Blog













For clear views like this, image detectors excel. But the real world isn’t always this simple. If researchers train an image detector on clean, well-lit images in the lab, it might accurately recognize objects 80% of the time during the day. But when forced to navigate roads on a rainy night, it might drop to 40%.

“If you collect all of your data during the day and then try to deploy the system at night, then however it was trained to do image detection during the day just isn’t going to work well when you generalize into those new settings,” Ziebart explains.

Moreover, the ML system might not recognize the problem: since the system assumes that its training covered everything, it will remain confident about its decisions and continue “to make strong predictions that are just inaccurate,” Ziebart adds.

In contrast, humans tend to recognize when previous experience doesn’t generalize into new settings. If a driver spots an unknown object ahead in the road, she wouldn’t just plow through the object. Instead, she might slow down, pay attention to how other cars respond to the object, and consider swerving if she can do so safely. When humans feel uncertain about our environment, we exercise caution to avoid making dangerous mistakes.

Ziebart would like AI systems to incorporate similar levels of caution in uncertain situations. Instead of confidently making mistakes, a system should recognize its uncertainty and ask questions to glean more information, much like an uncertain human would.


An Adversarial Approach

Training and practice may never prepare AI systems for every possible situation, but researchers can make their training methods more foolproof. Ziebart posits that feeding systems messier data in the lab can train them to better recognize and address uncertainty.

Conveniently, humans can provide this messy, real-world data. By hiring a group of human annotators to look at images and draw bounding boxes around certain objects – cars, people, dogs, trees, etc. – researchers can “build into the classifier some idea of what ‘normal’ data looks like,” Ziebart explains.

“If you ask ten different people to provide these bounding boxes, you’re likely to get back ten different bounding boxes,” he says. “There’s just a lot of inherent ambiguity in how people think about the ground truth for these things.”

Returning to the image above of the man in the car, human annotators might give ten different bounding boxes that capture different portions of the visible and hidden person. By feeding ML systems this confusing and contradictory data, Ziebart prepares them to expect ambiguity.

“We’re synthesizing more noise into the data set in our training procedure,” Ziebart explains. This noise reflects the messiness of the real world, and trains systems to be cautious when making predictions in new environments. Cautious and uncertain, AI systems will seek additional information and learn to navigate the confusing situations they encounter.
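A toy way to picture this kind of synthesized annotator disagreement is to jitter a single reference bounding box, mimicking how ten annotators might each draw the “same” box slightly differently. This is an illustration of the idea only, not Ziebart’s actual training procedure; the function name, box format, and jitter scale are invented:

```python
import random

def jitter_box(box, scale=0.1, rng=random):
    """Perturb each edge of an (x1, y1, x2, y2) box by up to
    `scale` of the box's width/height, simulating one annotator's
    slightly different notion of where the object's boundary lies."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 + rng.uniform(-scale, scale) * w,
            y1 + rng.uniform(-scale, scale) * h,
            x2 + rng.uniform(-scale, scale) * w,
            y2 + rng.uniform(-scale, scale) * h)

# Ten simulated "annotators" labeling the same object:
random.seed(0)
labels = [jitter_box((50, 30, 200, 180)) for _ in range(10)]
```

Training against a spread of labels like this, rather than a single “ground truth,” is one way a classifier can come to treat boundary placement as inherently ambiguous.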

Of course, self-driving cars shouldn’t have to ask questions. If a car’s image detection spots a foreign object up ahead, for instance, it won’t have time to ask humans for help. But if it’s trained to recognize uncertainty and act cautiously, it might slow down, detect what other cars are doing, and safely navigate around the object.


Building Blocks for Future Machines

Ziebart’s research remains in training settings thus far. He feeds systems messy, varied data and trains them to provide bounding boxes that have at least 70% overlap with people’s bounding boxes. And his process has already produced impressive results. On an ImageNet object detection task investigated in collaboration with Sima Behpour (University of Illinois at Chicago) and Kris Kitani (Carnegie Mellon University), for example, Ziebart’s adversarial approach “improves performance by over 16% compared to the best performing data augmentation method.” Trained to operate amidst uncertain environments, these systems more effectively manage new data points that training didn’t explicitly prepare them for.
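The overlap criterion mentioned above is conventionally measured as intersection-over-union (IoU): the area two boxes share, divided by the area they jointly cover. A minimal sketch of the standard computation (the function and box format here are illustrative, not taken from Ziebart’s code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle (empty if boxes don't overlap).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes shifted by 2 pixels share 80 of their 120 covered units:
print(iou((0, 0, 10, 10), (2, 0, 12, 10)))  # 80/120 ≈ 0.667
```

A predicted box “counts” under a 70% criterion when its IoU with a human-drawn box is at least 0.7.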

But while Ziebart trains relatively narrow AI systems, he believes that this research can scale up to more advanced systems like autonomous cars and public transit systems.

“I view this as kind of a fundamental issue in how we design these predictors,” he says. “We’ve been trying to construct better building blocks on which to make machine learning – better first principles for machine learning that’ll be more robust.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Stephen Hawking in Memoriam

As we mourn the loss of Stephen Hawking, we should remember that his legacy goes far beyond science. Yes, of course he was one of the greatest scientists of the past century, discovering that black holes evaporate and helping found the modern quest for quantum gravity. But he also had a remarkable legacy as a social activist, who looked far beyond the next election cycle and used his powerful voice to bring out the best in us all. As a founding member of FLI’s Scientific Advisory board, he tirelessly helped us highlight the importance of long-term thinking and ensuring that we use technology to help humanity flourish rather than flounder. I marveled at how he could sometimes answer my emails faster than my grad students. His activism revealed the same visionary fearlessness as his scientific and personal life: he saw further ahead than most of those around him and wasn’t afraid of controversially sounding the alarm about humanity’s sloppy handling of powerful technology, from nuclear weapons to AI.

On a personal note, I’m saddened to have lost not only a long-time collaborator but, above all, a great inspiration, always reminding me of how seemingly insurmountable challenges can be overcome with creativity, willpower and positive attitude. Thanks Stephen for inspiring us all!

Can Global Warming Stay Below 1.5 Degrees? Views Differ Among Climate Scientists

The Paris Climate Agreement seeks to keep global warming well below 2 degrees Celsius relative to pre-industrial temperatures. In the best case scenario, warming would go no further than 1.5 degrees.

Many scientists see this as an impossible goal. A recent study by Peter Cox et al. postulates that, given a twofold increase in atmospheric carbon dioxide, there is only a 3% chance of keeping warming below 1.5 degrees.

But a study by Richard Millar et al. provides more reason for hope. The Millar report concludes that the 1.5 degree limit is still physically feasible, if only narrowly. It also provides an updated “carbon budget”—a projection of how much more carbon dioxide we can emit without breaking the 1.5 degree limit.

Dr. Joeri Rogelj, a climate scientist and research scholar with the Energy Program of the International Institute for Applied Systems Analysis, co-authored the Millar report. For Rogelj, the updated carbon budget is not the paper’s most important point. “Our paper shows to decision makers the importance of anticipating new and updated scientific knowledge,” he says.

Projected “carbon budgets” are rough estimates based on limited observations. These projections need to be continually updated as more data becomes available. Fortunately, the Paris Agreement calls for countries to periodically update their emission reduction pledges based on new estimates. Rogelj is hopeful “that this paper has put the necessity for a strong [updating] process on the radar of delegates.”

For scientists who have dismissed the 1.5 degree limit as impossible, the updating process might seem pointless. But Rogelj stresses that his team looked only at geophysical limitations, not political ones. Their report assumes that countries will agree to a zero emissions commitment—a much more ambitious scenario than other researchers have considered.

There is a misconception, Rogelj says, that the report claims to have found an inaccuracy in the Earth system models (ESMs) that are used to estimate human-driven warming. “We are using precisely those models to estimate the carbon budget from today onward,” Rogelj explains.

The problem is not the models, but rather the data fed into them. These simulations are often run using inexact projections of CO2 emissions. Over time, small discrepancies accumulate and are reflected in the warming predictions that the models make.

Given information about current CO2 emissions, however, ESMs make temperature predictions that are “quite accurate.” And when they are provided with an ambitious future scenario for emissions reduction, the models indicate that it is possible for global temperature increases to remain below 1.5 degrees.

So what would such a scenario look like? First off, emissions have to fall to zero. At the same time, the carbon budget needs to be continually reevaluated, and strategy changes must be based on the updated budget. For example, if emissions fall to zero but we’ve surpassed our carbon budget, then we’ll need to focus on making our emissions negative—in other words, on carbon dioxide removal.

Rogelj names two major processes for carbon dioxide removal: reforestation and bio-energy with carbon capture and storage. Some negative emissions processes, such as reforestation, provide benefits beyond carbon capture, while others may have undesired side effects.

But Rogelj is quick to add that these negative emissions technologies are not “silver bullets.” It’s too soon to know if carbon dioxide removal at a global scale will actually be necessary—we’ll have to get to zero emissions before we can tell. But such technologies could also help us reach zero in the first place.

What else will get us to zero emissions? According to Rogelj, we need “a strong emphasis on energy efficiency, combined with an electrification of end-use sectors like transport and building and a shift away from fossil fuels.” This will require a major shift in investment patterns. We want to avoid “locking into carbon dioxide-intensive infrastructure” that would saddle future generations with a dependency on non-renewable energy, he explains.

Rogelj stresses that his team’s findings are based only on geophysical data. Societal factors are a different matter: It is up to individual countries to decide where reducing emissions falls on their list of priorities.

However, the stipulation in the Paris Climate Agreement that countries periodically update their pledges is a source of optimism. Rogelj, for his part, is cautiously hopeful: “Looking at real world dynamics in terms of costs of renewables and energy storage, I personally think there is room for pledges to be strengthened over the coming five to ten years as countries better understand what is possible and how these pledges can align with other priorities.”

But not everyone in the scientific community shares the hopeful tone struck by Rogelj and his team. An article by the MIT Technology Review outlines “the five most worrisome climate developments” from 2017.

To start, global emissions are on the rise, up 2% from 2016. While the prior few years had seen a relative flattening in emissions, this more recent data shattered hopes that the trend would continue. On top of that, scientists are finding that observable climate trends line up best with “worst-case scenario” models of global warming—that is, global temperatures could rise five degrees in the next century.

And the Arctic is melting much faster than scientists predicted. A recent report by the U.S. National Oceanic and Atmospheric Administration (NOAA) declared “that the North Pole had reached a ‘new normal,’ with no sign of returning to a ‘reliably frozen region.’”

Melting glaciers and sea ice trigger a whole new set of problems. The disappearing ice will cause sea levels to rise, and the “reflective white snow and ice [will] turn into heat-absorbing dark-blue water…[meaning] the Arctic will send less heat back into space, which leads to more warming, more melting, and more sea-level rise still.”

And finally, natural disasters are becoming increasingly ferocious as weather patterns mutate. The United States saw this first-hand, with massive wildfires on the west coast—including the largest ever in California’s history—and a string of hurricanes that ravaged the Virgin Islands, Puerto Rico, and many southern states.

These consequences of global warming are beginning to affect areas of social interest beyond the environment. The 2017 Atlantic hurricane season, for example, has been a massive economic burden, racking up more than $200 billion in damages.

In Rogelj’s words, “Right now we really need to find ways to achieve multiple societal objectives, to find policies and measures and options that allow us to achieve those together.” As governments come to see how climate protection “can align with other priorities like reducing air pollution, and providing clean water and reliable energy,” we have reason to hope that it may become a higher and higher priority.

How to Prepare for the Malicious Use of AI

How can we forecast, prevent, and (when necessary) mitigate the harmful effects of malicious uses of AI?

This is the question posed by a 100-page report released last week, written by 26 authors from 14 institutions. The report, which is the result of a two-day workshop in Oxford, UK followed by months of research, provides a sweeping landscape of the security implications of artificial intelligence.

The authors, who include representatives from the Future of Humanity Institute, the Center for the Study of Existential Risk, OpenAI, and the Center for a New American Security, argue that AI is not only changing the nature and scope of existing threats, but also expanding the range of threats we will face. They are excited about many beneficial applications of AI, including the ways in which it will assist defensive capabilities. But the purpose of the report is to survey the landscape of security threats from intentionally malicious uses of AI.

“Our report focuses on ways in which people could do deliberate harm with AI,” said Seán Ó hÉigeartaigh, Executive Director of the Cambridge Centre for the Study of Existential Risk. “AI may pose new threats, or change the nature of existing threats, across cyber, physical, and political security.”

Importantly, this is not a report about a far-off future. The only technologies considered are those that are already available or that are likely to be within the next five years. The message therefore is one of urgency. We need to acknowledge the risks and take steps to manage them because the technology is advancing exponentially. As reporter Dave Gershgorn put it, “Every AI advance by the good guys is an advance for the bad guys, too.”

AI systems tend to be more efficient and more scalable than traditional tools. Additionally, the use of AI can increase the anonymity and psychological distance a person feels from the actions carried out, potentially lowering the barrier to committing crimes and acts of violence. Moreover, AI systems have their own unique vulnerabilities, including risks from data poisoning, adversarial examples, and the exploitation of flaws in their design. AI-enabled attacks will outpace traditional cyberattacks because they will generally be more effective, more finely targeted, and more difficult to attribute.

The kinds of attacks we need to prepare for are not limited to sophisticated computer hacks. The authors suggest there are three primary security domains: digital security, which largely concerns cyberattacks; physical security, which refers to carrying out attacks with drones and other physical systems; and political security, which includes examples such as surveillance, persuasion via targeted propaganda, and deception via manipulated videos. These domains have significant overlap, but the framework can be useful for identifying different types of attacks, the rationale behind them, and the range of options available to protect ourselves.

What can be done to prepare for malicious uses of AI across these domains? The authors provide many good examples. The scenarios described in the report can be a good way for researchers and policymakers to explore possible futures and brainstorm ways to manage the most critical threats. For example, imagining a commercial cleaning robot being repurposed as a non-traceable explosion device may scare us, but it also suggests why policies like robot registration requirements may be a useful option.

Each domain also has its own possible points of control and countermeasures. For example, to improve digital security, companies can promote consumer awareness and incentivize white hat hackers to find vulnerabilities in code. We may also be able to learn from the cybersecurity community and employ measures such as red teaming for AI development, formal verification in AI systems, and responsible disclosure of AI vulnerabilities. To improve physical security, policymakers may want to regulate hardware development and prohibit sales of lethal autonomous weapons. Meanwhile, media platforms may be able to minimize threats to political security by offering image and video authenticity certification, fake news detection, and encryption.

The report additionally provides four high level recommendations, which are not intended to provide specific technical or policy proposals, but rather to draw attention to areas that deserve further investigation. The recommendations are the following:

Recommendation #1: Policymakers should collaborate closely with technical researchers to investigate, prevent, and mitigate potential malicious uses of AI.

Recommendation #2: Researchers and engineers in artificial intelligence should take the dual-use nature of their work seriously, allowing misuse-related considerations to influence research priorities and norms, and proactively reaching out to relevant actors when harmful applications are foreseeable.

Recommendation #3: Best practices should be identified in research areas with more mature methods for addressing dual-use concerns, such as computer security, and imported where applicable to the case of AI.

Recommendation #4: Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges.

Finally, the report identifies several areas for further research. The first of these is to learn from and with the cybersecurity community because the impacts of cybersecurity incidents will grow as AI-based systems become more widespread and capable. Other areas of research include exploring different openness models, promoting a culture of responsibility among AI researchers, and developing technological and policy solutions.

As the authors state, “The malicious use of AI will impact how we construct and manage our digital infrastructure as well as how we design and distribute AI systems, and will likely require policy and other institutional responses.”

Although this is only the beginning of the understanding needed on how AI will impact global security, this report moves the discussion forward. It not only describes numerous emergent security concerns related to AI, but also suggests ways we can begin to prepare for those threats today.

MIRI’s February 2018 Newsletter


News and links

  • In “Adversarial Spheres,” Gilmer et al. investigate the tradeoff between test error and vulnerability to adversarial perturbations in many-dimensional spaces.
  • Recent posts on Less Wrong: Critch on “Taking AI Risk Seriously” and Ben Pace’s background model for assessing AI x-risk plans.
  • “Solving the AI Race”: GoodAI is offering prizes for proposed responses to the problem that “key stakeholders, including [AI] developers, may ignore or underestimate safety procedures, or agreements, in favor of faster utilization”.
  • The Open Philanthropy Project is hiring research analysts in AI alignment, forecasting, and strategy, along with generalist researchers and operations staff.

This newsletter was originally posted on MIRI’s website.

Optimizing AI Safety Research: An Interview With Owen Cotton-Barratt

Artificial intelligence poses a myriad of risks to humanity. From privacy concerns, to algorithmic bias and “black box” decision making, to broader questions of value alignment, recursive self-improvement, and existential risk from superintelligence — there’s no shortage of AI safety issues.  

AI safety research aims to address all of these concerns. But with limited funding and too few researchers, trade-offs in research are inevitable. In order to ensure that the AI safety community tackles the most important questions, researchers must prioritize their causes.

Owen Cotton-Barratt, along with his colleagues at the Future of Humanity Institute (FHI) and the Centre for Effective Altruism (CEA), looks at this ‘cause prioritization’ for the AI safety community. They analyze which projects are more likely to help mitigate catastrophic or existential risks from highly-advanced AI systems, especially artificial general intelligence (AGI). By modeling trade-offs between different types of research, Cotton-Barratt hopes to guide scientists toward more effective AI safety research projects.


Technical and Strategic Work

The first step of cause prioritization is understanding the work already being done. Broadly speaking, AI safety research happens in two domains: technical work and strategic work.

AI’s technical safety challenge is to keep machines safe and secure as they become more capable and creative. By making AI systems more predictable, more transparent, and more robustly aligned with our goals and values, we can significantly reduce the risk of harm. Technical safety work includes Stuart Russell’s research on reinforcement learning and Dan Weld’s work on explainable machine learning, since they’re improving the actual programming in AI systems.

In addition, the Machine Intelligence Research Institute (MIRI) recently released a technical safety agenda aimed at aligning machine intelligence with human interests in the long term, while OpenAI, another non-profit AI research company, is investigating the “many research problems around ensuring that modern machine learning systems operate as intended,” following suggestions from the seminal paper Concrete Problems in AI Safety.

Strategic safety work is broader, and asks how society can best prepare for and mitigate the risks of powerful AI. This research includes analyzing the political environment surrounding AI development, facilitating open dialogue between research areas, disincentivizing arms races, and learning from game theory and neuroscience about probable outcomes for AI. Yale professor Allan Dafoe has recently focused on strategic work, researching the international politics of artificial intelligence and consulting for governments, AI labs and nonprofits about AI risks. And Yale bioethicist Wendell Wallach, apart from his work on “silo busting,” is researching forms of global governance for AI.

Cause prioritization is strategy work, as well. Cotton-Barratt explains, “Strategy work includes analyzing the safety landscape itself and considering what kind of work do we think we’re going to have lots of, what are we going to have less of, and therefore helping us steer resources and be more targeted in our work.”












Who Needs More Funding?

As the graph above illustrates, AI safety spending has grown significantly since 2015. And while more money doesn’t always translate into improved results, funding patterns are easy to assess and can say a lot about research priorities. Seb Farquhar, Cotton-Barratt’s colleague at CEA, wrote a post earlier this year analyzing AI safety funding and suggesting ways to better allocate future investments.

To start, he suggests that the technical research community recruit more principal investigators to carry forward the research agenda detailed in Concrete Problems in AI Safety. OpenAI is already taking a lead on this. Additionally, the community should go out of its way to ensure that emerging AI safety centers hire the best candidates, since these researchers will shape each center’s success for years to come.

In general, Farquhar notes that strategy, outreach and policy work haven’t kept up with the overall growth of AI safety research. He suggests that more people focus on improving communication about long-run strategies between AI safety research teams, between the AI safety community and the broader AI community, and between policymakers and researchers. Building more PhD and master’s courses on AI strategy and policy could establish a pipeline to fill this void, he adds.

To complement Farquhar’s data, Cotton-Barratt’s colleague Max Dalton created a mathematical model to track how more funding and more people working on a safety problem translate into useful progress or solutions. The model tries to answer such questions as: if we want to reduce AI’s existential risks, how much of an effect do we get by investing money in strategy research versus technical research?

In general, technical research is easier to track in mathematical models than strategic work. For example, spending more on strategic ethics research may be vital for AI safety, but its impact is difficult to quantify. Improving models of reinforcement learning, however, can produce safer and more robustly aligned machines. With their clearer feedback loops, these technical projects fit best with Dalton’s models.


Near-sightedness and AGI

But these models also confront major uncertainty. No one really knows when AGI will be developed, and this makes it difficult to determine the most important research. If AGI will be developed in five years, perhaps researchers should focus only on the most essential safety work, such as improving transparency in AI systems. But if we have thirty years, researchers can probably afford to dive into more theoretical work.

Moreover, no one really knows how AGI will function. Machine learning and deep neural networks have ushered in a new AI revolution, but AGI will likely be developed on architectures far different from AlphaGo and Watson.

This makes some long-term safety research a risky investment, even if, as many argue, it is the most important research we can do. For example, researchers could spend years making deep neural nets safe and transparent, only to find their work wasted when AGI develops on an entirely different programming architecture.

Cotton-Barratt calls this problem ‘nearsightedness,’ and discussed it in a talk at Effective Altruism Global this summer. Humans often can’t anticipate disruptive change, and AI researchers are no exception.

“Work that we might do for long-term scenarios might turn out to be completely confused because we weren’t thinking of the right type of things,” he explains. “We have more leverage over the near-term scenarios because we’re more able to assess what they’re going to look like.”

Any additional AI safety research is better than none, but given the unknown timelines and the potential gravity of AI’s threats to humanity, we’re better off pursuing — to the extent possible — the most effective AI safety research.

By helping the AI research portfolio advance in a more efficient and comprehensive direction, Cotton-Barratt and his colleagues hope to ensure that when machines eventually outsmart us, we will have asked — and hopefully answered — the right questions.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project. If you’re interested in applying for our 2018 grants competition, please see this link.

As Acidification Increases, Ocean Biodiversity May Decline

Dubbed “the evil twin of global warming,” ocean acidification is a growing crisis that poses a threat to both water-dwelling species and human communities that rely on the ocean for food and livelihood.

Since pre-industrial times, the ocean’s pH has dropped from 8.2 to 8.1—a change that may seem insignificant, but actually represents a roughly 30 percent increase in acidity. As the threat continues to mount, the German research project BIOACID (Biological Impacts of Ocean Acidification) seeks to provide a better understanding of the phenomenon by studying its effects around the world.
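That conversion from pH to acidity is easy to check: pH is the negative base-10 logarithm of hydrogen ion concentration, so a drop of 0.1 units multiplies acidity by about 1.26 (the widely cited 30 percent figure corresponds to a slightly larger pH drop than the rounded 8.2 and 8.1 values suggest). A quick check:

```python
# pH = -log10([H+]), so a pH drop of d multiplies [H+] by 10**d.
pH_preindustrial = 8.2
pH_today = 8.1

ratio = 10 ** (pH_preindustrial - pH_today)  # [H+] today / [H+] pre-industrial
percent_increase = (ratio - 1) * 100         # ~26% for a 0.1-unit drop
```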

BIOACID began in 2009, and since that time, over 250 German researchers  have contributed more than 580 publications to the scientific discourse on the effects of acidification and how the  oceans are changing.

The organization recently released a report that synthesizes their most notable findings for climate negotiators and decision makers. Their work explores “how different marine species respond to ocean acidification, how these reactions impact the food web as well as material cycles and energy turnover in the ocean, and what consequences these changes have for economy and society.”

Field research for the project has spanned multiple oceans, where key species and communities have been studied under natural conditions. In the laboratory, researchers have also been able to test for coming changes by exposing organisms to simulated future conditions.

Their results indicate that acidification is only one part of a larger problem. While organisms might be capable of adapting to the shift in pH, acidification is typically accompanied by other environmental stressors that make adaptation all the more difficult.

In some cases, marine life that had been able to withstand acidification by itself could not tolerate the additional stress of increased water temperatures, researchers found. Other factors like pollution and eutrophication—an excess of nutrients—compounded the harm.

Further, rising water temperatures are forcing many species to abandon part or all of their original habitats, wreaking additional havoc on ecosystems. And a 1.2 degree increase in global temperature—which is significantly under the 2 degree limit set in the Paris Climate Agreement—is expected to kill at least half of the world’s tropical coral reefs.

Acidification itself is a multipronged threat. When carbon dioxide is absorbed by the ocean, a series of chemical reactions take place. These reactions have two important outcomes: acid levels increase and carbonate ions are transformed into bicarbonate. Both of these results have widespread effects on the organisms that make their homes in our oceans.
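In brief, the underlying carbonate chemistry consists of two standard reactions: dissolved carbon dioxide forms carbonic acid, which releases hydrogen ions, and those extra hydrogen ions then convert carbonate ions into bicarbonate:

```latex
\mathrm{CO_2 + H_2O \;\rightleftharpoons\; H_2CO_3 \;\rightleftharpoons\; H^+ + HCO_3^-}
```

```latex
\mathrm{H^+ + CO_3^{2-} \;\rightleftharpoons\; HCO_3^-}
```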

Increased acidity has a particularly harmful effect on organisms in their early life stages, such as fish larvae. This means, among other things, the depletion of fish stocks—a cornerstone of the economy as well as diet in many human communities. Researchers “have found that both [acidification and warming] work synergistically, especially on the most sensitive early life stages of [fish] as well as embryo and larval survival.”

Many species are harmed as well by the falling levels of carbonate, which is an essential building block for organisms like coral, mussels, and some plankton. Like all calcifying corals, the cold-water coral species Lophelia pertusa builds its skeleton from calcium carbonate. Some research suggests that acidification threatens both to slow its growth and to corrode the dead branches that are no longer protected by organic matter.

As a “reef engineer,” Lophelia is home to countless species; as it suffers, so will they. The BIOACID report warns: “[T]o definitely preserve the magnificent oases of biodiversity founded by Lophelia pertusa, effects of climate change need to be minimised even now–while science continues to investigate this complex marine ecosystem.”

Even those organisms not directly affected by acidification may find themselves in trouble as their ecosystems are thrown out of balance. Small changes at the bottom of the food web, for example, may have big effects at higher trophic levels. In the Arctic, Limacina helicina—a tiny swimming snail or “sea butterfly”—is a major source of food for many marine animals. The polar cod species Boreogadus saida, which feeds on Limacina, is a key food source for larger fish, birds, and mammals such as whales and seals.

As acidification increases, research suggests that Limacina’s nutritional value will decrease as its metabolism and shell growth are affected; its numbers, too, will likely drop. With the disappearance of this prey, the polar cod will likely suffer. Diminishing cod populations will in turn affect the many predators who feed on them.

Even where acidification stands to benefit a particular species, the overall impact on the ecosystem can be negative. In the Baltic Sea, BIOACID scientists have found that Nodularia spumigena, a species of cyanobacteria, “manages perfectly with water temperatures above 16 degrees Celsius and elevated carbon dioxide concentrations–whereas other organisms already reach their limits at less warming.”

Nodularia becomes more productive under acidified conditions, producing bacterial “blooms” that can extend upwards of 60,000 square kilometers in the Baltic Sea. These blooms block light from other organisms, and as dead bacteria degrade near the ocean floor they take up precious oxygen. The cells also release toxins that are harmful to marine animals and humans alike.

Ultimately, biodiversity, “a basic requirement for ecosystem functioning and ultimately even human wellbeing,” will be lost. Damage to tropical coral reefs, which are home to one quarter of all marine species, could drastically reduce the ocean’s biodiversity. And as biodiversity decreases, an ecosystem becomes more fragile: ecological functions that were once performed by several different species become entirely dependent on only one.

And the diversity of marine ecosystems is not the only thing at stake. Currently, the ocean plays a major mitigating role in global warming, absorbing around 30 percent of the carbon dioxide emitted by humans. It also absorbs over 90 percent of the heat produced by the greenhouse effect. But as acidification continues, the ocean will take up less and less carbon dioxide—meaning we may see an increase in the rate of global warming.

The ocean controls carbon dioxide uptake in part through a biological mechanism known as the carbon pump. Normally, phytoplankton near the ocean’s surface take up carbon dioxide and then sink towards the ocean floor. This process lowers surface carbon dioxide concentrations, facilitating its uptake from the atmosphere.

But acidification weakens this biological carbon pump. Researchers have found that acidified conditions favor smaller types of phytoplankton, which sink more slowly. In addition, heavier calcifying plankton—which typically propel the pump by sinking more quickly—will have increasing difficulty forming their weighty calcium carbonate shells. As the pump’s efficiency decreases, so will the uptake of carbon dioxide from the air.

The BIOACID report stresses that the risks of acidification remain largely uncertain. However, despite — or perhaps because of — this uncertainty, society must treat the oceans with caution and care. The report explains, “Following the precautionary principle is the best way to act when considering potential risks to the environment and humankind, including future generations.”

Transparent and Interpretable AI: an interview with Percy Liang

At the end of 2017, the United States House of Representatives passed a bill called the SELF DRIVE Act, laying out an initial federal framework for autonomous vehicle regulation. Autonomous cars have been undergoing testing on public roads for almost two decades. With the passing of this bill, along with the increasing safety benefits of autonomous vehicles, it is likely that they will become even more prevalent in our daily lives. This is true for numerous autonomous technologies including those in the medical, legal, and safety fields – just to name a few.

To that end, researchers, developers, and users alike must be able to have confidence in these types of technologies that rely heavily on artificial intelligence (AI). This extends beyond autonomous vehicles, applying to everything from security devices in your smart home to the personal assistant in your phone.


Predictability in Machine Learning

Percy Liang, Assistant Professor of Computer Science at Stanford University, explains that humans rely on some degree of predictability in their day-to-day interactions — both with other humans and automated systems (including, but not limited to, their cars). One way to create this predictability is by taking advantage of machine learning.

Machine learning deals with algorithms that allow an AI to “learn” based on data gathered from previous experiences. Developers do not need to write code that dictates each and every action or intention for the AI. Instead, the system recognizes patterns from its experiences and infers the appropriate action based on that data. It is akin to the process of trial and error.
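As a toy illustration of this learn-from-data loop (a hypothetical sketch, not any production system): a perceptron-style learner is never told the rule “output 1 when the input exceeds roughly 2.5.” It infers a decision boundary from labeled examples through repeated trial, error, and correction:

```python
# Labeled experiences: (feature, correct action)
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

w, b, lr = 0.0, 0.0, 0.1           # the model starts knowing nothing
for _ in range(100):               # repeated trial and error
    for x, label in data:
        prediction = 1 if w * x + b > 0 else 0
        error = label - prediction # 0 when right; +/-1 when wrong
        w += lr * error * x        # nudge the learned rule toward the data
        b += lr * error

# After training, the learned rule reproduces the pattern in the data.
learned = [1 if w * x + b > 0 else 0 for x, _ in data]
```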

A key question often asked of machine learning systems in the research and testing environment is, “Why did the system make this prediction?” About this search for intention, Liang explains:

“If you’re crossing the road and a car comes toward you, you have a model of what the other human driver is going to do. But if the car is controlled by an AI, how should humans know how to behave?”

It is important to see that a system is performing well, but perhaps even more important is its ability to explain in easily understandable terms why it acted the way it did. Even if the system is not accurate, it must be explainable and predictable. For AI to be safely deployed, systems must rely on well-understood, realistic, and testable assumptions.

Current theories that explore the idea of reliable AI focus on fitting the observable outputs in the training data. However, as Liang explains, this could lead “to an autonomous driving system that performs well on validation tests but does not understand the human values underlying the desired outputs.”

Running multiple tests is important, of course. These types of simulations, explains Liang, “are good for debugging techniques — they allow us to more easily perform controlled experiments, and they allow for faster iteration.”

However, to really know whether a technique is effective, “there is no substitute for applying it to real life,” says Liang. “This goes for language, vision, and robotics.” An autonomous vehicle may perform well in all testing conditions, but there is no way to accurately predict how it will perform in an unpredictable natural disaster.


Interpretable ML Systems

The best-performing models in many domains — e.g., deep neural networks for image and speech recognition — are quite complex. These are considered “black-box” models, and their predictions can be difficult, if not impossible, to explain.

Liang and his team are working to interpret these models by researching how a particular training situation leads to a prediction. As Liang explains, “Machine learning algorithms take training data and produce a model, which is used to predict on new inputs.”

This type of observation becomes increasingly important as AIs take on more complex tasks – think life or death situations, such as interpreting medical diagnoses. “If the training data has outliers or adversarially generated data,” says Liang, “this will affect (corrupt) the model, which will in turn cause predictions on new inputs to be possibly wrong.  Influence functions allow you to track precisely the way that a single training point would affect the prediction on a particular new input.”
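The influence-function idea Liang describes can be sketched concretely for a model with a differentiable loss. This toy ridge-regression version, on made-up data, is a simplification of the approach in Koh and Liang’s influence-functions work, not their actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # hypothetical training inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam = 1e-2                                         # ridge regularization

# Fit ridge regression; H is the exact Hessian of the training loss.
H = X.T @ X + lam * np.eye(3)
w = np.linalg.solve(H, X.T @ y)

x_test = np.array([1.0, 1.0, 1.0])

def influence(i):
    """Approximate effect of upweighting training point i on the test
    prediction: -grad_test^T H^{-1} grad_i (grad_test is just x_test here)."""
    grad_i = (X[i] @ w - y[i]) * X[i]              # gradient of point i's loss
    return -x_test @ np.linalg.solve(H, grad_i)

scores = np.array([influence(i) for i in range(len(X))])
most_influential = int(np.argmax(np.abs(scores)))  # point that moves the prediction most
```

Tracing a prediction back through the learning algorithm is considerably harder for deep networks, where the Hessian must be approximated, but the underlying quantity being estimated is the same.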

Essentially, by understanding why a model makes the decisions it makes, Liang’s team hopes to improve how models function, discover new science, and provide end users with explanations of actions that impact them.

Another aspect of Liang’s research is ensuring that an AI understands, and is able to communicate, its limits to humans. The conventional metric for success, he explains, is average accuracy, “which is not a good interface for AI safety.” He posits, “What is one to do with an 80 percent reliable system?”

Liang is not looking for the system to have an accurate answer 100 percent of the time. Instead, he wants the system to be able to admit when it does not know an answer. If a user asks a system “How many painkillers should I take?” it is better for the system to say, “I don’t know” rather than making a costly or dangerous incorrect prediction.

Liang’s team is working on this challenge by tracking a model’s predictions through its learning algorithm — all the way back to the training data where the model parameters originated.

Liang’s team hopes that this approach — of looking at the model through the lens of the training data — will become a standard part of the toolkit of developing, understanding, and diagnosing machine learning. He explains that researchers could relate this to many applications: medical, computer, natural language understanding systems, and various business analytics applications.

“I think,” Liang concludes, “there is some confusion about the role of simulations: some eschew it entirely and some are happy doing everything in simulation. Perhaps we need to change culturally to have a place for both.”

In this way, Liang and his team plan to lay a framework for a new generation of machine learning algorithms that work reliably, fail gracefully, and reduce risks.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project. If you’re interested in applying for our 2018 grants competition, please see this link.

As CO2 Levels Rise, Scientists Question Best- and Worst-Case Scenarios of Climate Change

Scientists know that the planet is warming, that humans are causing it, and that we’re running out of time to avoid catastrophic climate change. But at the same time, their estimates for future global warming can seem frustratingly vague — best-case scenarios allow decades to solve the energy crisis, while worst-case scenarios seem utterly hopeless, predicting an uninhabitable planet no matter what we do.

At the University of Exeter, some researchers disagree with these vague boundaries. Professors Peter Cox, Chris Huntingford, and Mark Williamson co-authored a recent report in Nature that argues for a more constrained understanding of the climate’s sensitivity to carbon dioxide. In general, they found that both the worst-case and best-case scenarios for global warming are far more unlikely than previously thought.

Their research focuses on a measure known as equilibrium climate sensitivity (ECS) — defined as “the global mean warming that would occur if the atmospheric carbon dioxide (CO2) concentration were instantly doubled and the climate were then brought to equilibrium with that new level of CO2.”

This concept simplifies Earth’s actual climate — CO2 won’t double instantly and it often takes decades or centuries for the climate to return to equilibrium — but ECS is critical for gauging the planet’s response to fossil fuel emissions. It can help predict how much warming will come from increases in atmospheric CO2, even before the climate settles into equilibrium.


How hot will it get if atmospheric CO2 doubles?

In other words, what is Earth’s ECS? The Intergovernmental Panel on Climate Change (IPCC) estimates that ECS lies between 1.5 and 4.5 °C, with a 25% chance that it exceeds 4 °C and a 16% chance that it’s lower than 1.5 °C.

Cox and his colleagues argue that this range is too generous. Using tighter constraints based on historical observations of warming, they conclude that doubling atmospheric CO2 would push temperatures between 2.2–3.4 °C higher, with a 2% chance that ECS exceeds 4 °C and a 3% chance that ECS is lower than 1.5 °C. The extremes (both good and bad) of global warming thus appear less likely.

Although some scientists applauded these findings, others are more skeptical. Kevin Trenberth, a Senior Scientist in the Climate Analysis Section at the National Center for Atmospheric Research (NCAR), says the study’s climate models don’t adequately account for natural variability, making it difficult to give the findings much weight.

“I do think some previous estimates are overblown and they do not adequately use the observations we have as constraints,” he explains. “This study picks up on that a bit, and in that sense the new results seem reasonable and could be important for ruling out really major extreme changes. But it is much more important to improve the models and make better projections into the future.”


But When Will Atmospheric CO2 Double?

CO2 levels may not have doubled from pre-industrial levels yet, but they’re increasing at an alarming rate.

In 1958, NOAA’s Mauna Loa Observatory in Hawaii began its continuous record of atmospheric CO2. Pre-industrial concentrations stood near 280 parts per million (ppm); the observatory’s earliest readings already measured around 315 ppm. In 2013, CO2 levels surpassed 400 ppm for the first time, and just four years later, Mauna Loa recorded its first-ever carbon dioxide reading above 410 ppm.

The last time CO2 levels were this high, global surface temperatures were 6 °C higher, oceans were 100 feet higher, and modern humans didn’t exist. Unless the international community makes massive strides towards the Paris Agreement goals, atmospheric CO2 could rise to 560 ppm by 2050 — double the pre-industrial concentration, and a sign of much more global warming to come.
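Under the standard logarithmic approximation for CO2 forcing, ECS translates concentrations into eventual warming via ΔT = ECS × log2(C/C0). A quick sketch using Cox’s constrained range (illustrative arithmetic only, ignoring non-CO2 forcings):

```python
import math

def equilibrium_warming(co2_ppm, ecs_celsius, co2_preindustrial=280.0):
    """Eventual warming implied by a CO2 level, via dT = ECS * log2(C / C0)."""
    return ecs_celsius * math.log2(co2_ppm / co2_preindustrial)

# Today's ~410 ppm, with Cox et al.'s 2.2-3.4 deg C ECS range:
low = equilibrium_warming(410, 2.2)      # ~1.2 deg C of eventual warming
high = equilibrium_warming(410, 3.4)     # ~1.9 deg C

# At 560 ppm (a full doubling), eventual warming equals ECS by definition.
doubled = equilibrium_warming(560, 3.4)  # = 3.4 deg C
```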

Annual CO2 Emissions from Fossil Fuels by Country, 1959-2017 / Source: Carbon Brief

Avoiding the worst, while ensuring the bad

On the one hand, Cox’s findings come as a relief, reducing uncertainty about ECS and renewing hope of avoiding catastrophic global warming.

But these results also imply that there’s very little hope of achieving the best-case scenarios laid out in the Paris Agreement, which seeks to keep warming at or below a 1.5 °C increase. Since atmospheric CO2 levels could plausibly double by midcentury, Cox’s results indicate that not only will temperatures soar past 1.5 °C, but they’ll quickly rise above the agreement’s upper limit of 2 °C.

Even 2 °C of warming would be devastating for the planet, leading to an ice-free Arctic and over a meter of sea level rise — enough to submerge the Marshall Islands — while leaving tropical regions deathly hot for outdoor workers and metropolises like Karachi and Kolkata nearly uninhabitable. Deadly heat waves would plague North Africa, Central America, Southeast Asia, and the Southeast US, while decreasing the yields of wheat, rice and corn by over 20%. Food shortages and extreme weather could trigger the migration of tens of millions of people and leave regions of the world ungovernable.

This two-degree world might not be far off. Global temperatures have already risen 0.8 °C above pre-industrial levels, and the past few years have provided grave indications that things are heating up.

In January, NASA announced that 2017 was the second-hottest year on record (behind 2016 and ahead of 2015) while NOAA recorded it as their third-hottest year on record. Despite this minor discrepancy, both agencies agree that the 2017 data make the past four years the hottest period in their 138-year archives.

Global warming continues, and since the climate responds to rising CO2 levels on a delay of decades, there is more warming “in the pipeline,” no matter how quickly we cut fossil fuel emissions. But understanding ECS and continuing to improve climate models, as Dr. Trenberth suggests, can provide a clearer picture of what’s ahead and give us a better idea of the actions we need to take.

Is There a Trade-off Between Immediate and Longer-term AI Safety Efforts?

Something I often hear in the machine learning community and media articles is “Worries about superintelligence are a distraction from the *real* problem X that we are facing today with AI” (where X = algorithmic bias, technological unemployment, interpretability, data privacy, etc). This competitive attitude gives the impression that immediate and longer-term safety concerns are in conflict. But is there actually a tradeoff between them?


We can make this question more specific: what resources might these two types of efforts be competing for?

Media attention. Given the abundance of media interest in AI, there have been a lot of articles about all these issues. Articles about advanced AI safety have mostly been alarmist Terminator-ridden pieces that ignore the complexities of the problem. This has understandably annoyed many AI researchers, and led some of them to dismiss these risks based on the caricature presented in the media instead of the real arguments. The overall effect of media attention towards advanced AI risk has been highly negative. I would be very happy if the media stopped writing about superintelligence altogether and focused on safety and ethics questions about today’s AI systems.

Funding. Much of the funding for advanced AI safety work currently comes from donors and organizations who are particularly interested in these problems, such as the Open Philanthropy Project and Elon Musk. They would be unlikely to fund safety work that doesn’t generalize to advanced AI systems, so their donations to advanced AI safety research are not taking funding away from immediate problems. In fact, FLI’s first grant program awarded some funding towards current issues with AI (such as economic and legal impacts). There isn’t a fixed pie of funding that immediate and longer-term safety are competing for – it’s more like two growing pies that don’t overlap very much. There has been an increasing amount of funding going into both fields, and hopefully this trend will continue.

Talent. The field of advanced AI safety has grown in recent years but is still very small, and the “brain drain” resulting from researchers going to work on it has so far been negligible. The motivations for working on current and longer-term problems tend to be different as well, and these problems often attract different kinds of people. For example, someone who primarily cares about social justice is more likely to work on algorithmic bias, while someone who primarily cares about the long-term future is more likely to work on superintelligence risks.

Overall, there does not seem to be much tradeoff in terms of funding or talent, and the media attention tradeoff could (in theory) be resolved by devoting essentially all the airtime to current concerns. Not only are these issues not in conflict – there are synergies between addressing them. Both benefit from fostering a culture in the AI research community of caring about social impact and being proactive about risks. Some safety problems are highly relevant both in the immediate and longer term, such as interpretability and adversarial examples. I think we need more people working on these problems for current systems while keeping scalability to more advanced future systems in mind.

AI safety problems are too important for the discussion to be derailed by status contests like “my issue is better than yours”. This kind of false dichotomy is itself a distraction from the shared goal of ensuring AI has a positive impact on the world, both now and in the future. People who care about the safety of current and future AI systems are natural allies – let’s support each other on the path towards this common goal.

This article originally appeared on the Deep Safety blog.

MIRI’s January 2018 Newsletter

Our 2017 fundraiser was a huge success, with 341 donors contributing a total of $2.5 million!

Some of the largest donations came from Ethereum inventor Vitalik Buterin, bitcoin investors Christian Calderon and Marius van Voorden, poker players Dan Smith and Tom and Martin Crowley (as part of a matching challenge), and the Berkeley Existential Risk Initiative. Thank you to everyone who contributed!

Research updates

General updates

News and links

Rewinding the Doomsday Clock

On Thursday, the Bulletin of the Atomic Scientists inched their iconic Doomsday Clock forward another thirty seconds. It is now two minutes to midnight.

Citing the growing threats of climate change, increasing tensions between nuclear-armed countries, and a general loss of trust in government institutions, the Bulletin warned that we are “making the world security situation more dangerous than it was a year ago—and as dangerous as it has been since World War II.”

The Doomsday Clock hasn’t been this close to midnight since 1953, when the United States and the Soviet Union first tested hydrogen bombs, weapons up to 1,000 times more powerful than those dropped on Hiroshima and Nagasaki. And like 1953, this year’s announcement highlighted the increased global tensions around nuclear weapons.

As the Bulletin wrote in their statement, “To call the world nuclear situation dire is to understate the danger—and its immediacy.”

Between the US, Russia, North Korea, and Iran, the threats of aggravated nuclear war and accidental nuclear war both grew in 2017. As former Secretary of Defense William Perry said in a statement, “The events of the past year have only increased my concern that the danger of a nuclear catastrophe is increasingly real. We are failing to learn from the lessons of history as we find ourselves blundering headfirst towards a second cold war.”

The threat of nuclear war has hovered in the background since the weapons were invented, but with the end of the Cold War, many were pulled into what now appears to have been a false sense of security. In the last year, aggressive language and plans for new and upgraded nuclear weapons have reignited fears of nuclear armageddon. The recent false missile alerts in Hawaii and Japan were perhaps the starkest reminders of how close nuclear war feels, and how destructive it would be. 


But the nuclear threat isn’t all the Bulletin looks at. 2017 also saw the growing risk of climate change, a breakdown of trust in government institutions, and the emergence of new technological threats.

Climate change won’t hit humanity as immediately as nuclear war, but with each year that the international community fails to drastically reduce fossil fuel emissions, the threat of catastrophic climate change grows. In 2017, the US pulled out of the Paris Climate Agreement and global carbon emissions grew 2% after a two-year plateau. Meanwhile, NASA and NOAA confirmed that the past four years are the hottest four years they’ve ever recorded.

For emerging technological risks, such as widespread cyber attacks, the development of autonomous weaponry, and potential misuse of synthetic biology, the Bulletin calls for the international community to work together. They write, “world leaders also need to seek better collective methods of managing those advances, so the positive aspects of new technologies are encouraged and malign uses discovered and countered.”

Pointing to disinformation campaigns and “fake news”, the Bulletin’s Science and Security Board writes that they are “deeply concerned about the loss of public trust in political institutions, in the media, in science, and in facts themselves—a loss that the abuse of information technology has fostered.”


Turning Back the Clock

The Doomsday Clock is a poignant symbol of the threats facing human civilization, and it received broad media attention this week through British outlets like The Guardian and The Independent, Australian outlets such as ABC Online, and American outlets from Fox News to The New York Times.

“[The clock] is a tool,” explains Lawrence Krauss, a theoretical physicist at Arizona State University and member of the Bulletin’s Science and Security Board. “For one day a year, there are thousands of newspaper stories about the deep, existential threats that humanity faces.”

The Bulletin ends its report with a list of priorities to help turn back the Clock, chock-full of suggestions for government and industry leaders. But the authors also insist that individual citizens have a crucial role in tackling humanity’s greatest risks.

“Leaders react when citizens insist they do so,” the authors explain. “Citizens around the world can use the power of the internet to improve the long-term prospects of their children and grandchildren. They can insist on facts, and discount nonsense. They can demand action to reduce the existential threat of nuclear war and unchecked climate change. They can seize the opportunity to make a safer and saner world.”

You can read the Bulletin’s full report here.

AI Should Provide a Shared Benefit for as Many People as Possible

Shared Benefit Principle: AI technologies should benefit and empower as many people as possible.

Today, the combined wealth of the eight richest people in the world is greater than that of the poorest half of the global population. That is, 8 people have more than the combined wealth of 3,600,000,000 others.

This is already an extreme example of income inequality, but if we don’t prepare properly for artificial intelligence, the situation could get worse. In addition to the obvious economic benefits that would befall whoever designs advanced AI first, those who profit from AI will also likely have: access to better health care, happier and longer lives, more opportunities for their children, various forms of intelligence enhancement, and so on.

A Cultural Shift

Our approach to technology so far has been that whoever designs it first, wins — and they win big. In addition to the fabulous wealth an inventor can accrue, the creator of a new technology also assumes complete control over the product and its distribution. This means that an invention or algorithm will only benefit those whom the creator wants it to benefit. While this approach may have worked with previous inventions, many are concerned that advanced AI will be so powerful that we can’t treat it as business-as-usual.

What if we could ensure that as AI is developed we all benefit? Can we make a collective — and pre-emptive — decision to use AI to help raise up all people, rather than just a few?

Joshua Greene, a professor of psychology at Harvard, explains his take on this Principle: “We’re saying in advance, before we know who really has it, that this is not a private good. It will land in the hands of some private person, it will land in the hands of some private company, it will land in the hands of some nation first. But this principle is saying, ‘It’s not yours.’ That’s an important thing to say because the alternative is to say that potentially, the greatest power that humans ever develop belongs to whoever gets it first.”

AI researcher Susan Craw also agreed with the Principle, and she further clarified it.

“That’s definitely a yes,” Craw said, “But it is AI technologies plural, when it’s taken as a whole. Rather than saying that a particular technology should benefit lots of people, it’s that the different technologies should benefit and empower people.”

The Challenge of Implementation

However, as is the case with all of the Principles, agreeing with them is one thing; implementing them is another. John Havens, the Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, considered how the Shared Benefit Principle would ultimately need to be modified so that the new technologies will benefit both developed and developing countries alike.

“Yes, it’s great,” Havens said of the Principle, before adding, “if you can put a comma after it, and say … something like, ‘issues of wealth, GDP, notwithstanding.’ The point being, what this infers is whatever someone can afford, it should still benefit them.”

Patrick Lin, a philosophy professor at California Polytechnic State University, was even more concerned about how the Principle might be implemented, mentioning the potential for unintended consequences.

Lin explained: “Shared benefit is interesting, because again, this is a principle that implies consequentialism, that we should think about ethics as satisfying the preferences or benefiting as many people as possible. That approach to ethics isn’t always right. … Consequentialism often makes sense, so weighing these pros and cons makes sense, but that’s not the only way of thinking about ethics. Consequentialism could fail you in many cases. For instance, consequentialism might green-light torturing or severely harming a small group of people if it gives rise to a net increase in overall happiness to the greater community.”

“That’s why I worry about the … Shared Benefit Principle,” Lin continued. “[It] makes sense, but [it] implicitly adopts a consequentialist framework, which by the way is very natural for engineers and technologists to use, so they’re very numbers-oriented and tend to think of things in black and white and pros and cons, but ethics is often squishy. You deal with these squishy, abstract concepts like rights and duties and obligations, and it’s hard to reduce those into algorithms or numbers that could be weighed and traded off.”

As we move from discussing these Principles as ideals to implementing them as policy, concerns such as those that Lin just expressed will have to be addressed, keeping possible downsides of consequentialism and utilitarianism in mind.

The Big Picture

The devil will always be in the details. As we consider how we might shift cultural norms to prevent all benefits going only to the creators of new technologies — as well as considering the possible problems that could arise if we do so — it’s important to remember why the Shared Benefit Principle is so critical. Roman Yampolskiy, an AI researcher at the University of Louisville, sums this up:

“Early access to superior decision-making tools is likely to amplify existing economic and power inequalities turning the rich into super-rich, permitting dictators to hold on to power and making oppositions’ efforts to change the system unlikely to succeed. Advanced artificial intelligence is likely to be helpful in medical research and genetic engineering in particular making significant life extension possible, which would remove one [of] the most powerful drivers of change and redistribution of power – death. For this and many other reasons, it is important that AI tech should be beneficial and empowering to all of humanity, making all of us wealthier and healthier.”

What Do You Think?

How important is the Shared Benefit Principle to you? How can we ensure that the benefits of new AI technologies are spread globally, rather than remaining with only a handful of people who developed them? How can we ensure that we don’t inadvertently create more problems in an effort to share the benefits of AI?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Deep Safety: NIPS 2017 Report

This year’s NIPS gave me a general sense that near-term AI safety is now mainstream and long-term safety is slowly going mainstream. On the near-term side, I particularly enjoyed Kate Crawford’s keynote on neglected problems in AI fairness, the ML security workshops, and the Interpretable ML symposium debate that addressed the “do we even need interpretability?” question in a somewhat sloppy but entertaining way. There was a lot of great content on the long-term side, including several oral / spotlight presentations and the Aligned AI workshop.

Value alignment papers

Inverse Reward Design (Hadfield-Menell et al) defines the problem of an RL agent inferring a human’s true reward function based on the proxy reward function designed by the human. This is different from inverse reinforcement learning, where the agent infers the reward function from human behavior. The paper proposes a method for IRD that models uncertainty about the true reward, assuming that the human chose a proxy reward that leads to the correct behavior in the training environment. For example, if a test environment unexpectedly includes lava, the agent assumes that a lava-avoiding reward function is as likely as a lava-indifferent or lava-seeking reward function, since they lead to the same behavior in the training environment. The agent then follows a risk-averse policy with respect to its uncertainty about the reward function.


The paper shows some encouraging results on toy environments for avoiding some types of side effects and reward hacking behavior, though it’s unclear how well they will generalize to more complex settings. For example, the approach to reward hacking relies on noticing disagreements between different sensors / features that agreed in the training environment, which might be much harder to pick up on in a complex environment. The method is also at risk of being overly risk-averse and avoiding anything new, whether it be lava or gold, so it would be great to see some approaches for safe exploration in this setting.
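The core mechanism can be sketched in a few lines of toy Python (my own illustration with made-up rewards, not the authors' code): candidate true-reward functions that agree on the training cells get equal posterior weight, and the agent scores trajectories by their worst-case value under that posterior.

```python
# Toy sketch of the Inverse Reward Design idea. The proxy reward was
# designed in a training environment containing only 'grass' and 'dirt',
# so candidate true rewards that disagree only about the novel 'lava'
# cell all explain the proxy equally well.

candidates = [
    {'grass': 1.0, 'dirt': 0.5, 'lava': -10.0},  # lava-avoiding
    {'grass': 1.0, 'dirt': 0.5, 'lava': 0.0},    # lava-indifferent
    {'grass': 1.0, 'dirt': 0.5, 'lava': 10.0},   # lava-seeking
]

# Equal posterior weight: all candidates match the proxy on training cells.
posterior = [1 / 3] * len(candidates)

def risk_averse_value(trajectory):
    """Score a trajectory by its worst-case return over the posterior support."""
    values = [sum(r[cell] for cell in trajectory) for r in candidates]
    return min(values)

safe = ['grass', 'dirt', 'grass']
risky = ['grass', 'lava', 'grass']
# The risk-averse agent avoids the cell the designer never considered.
print(risk_averse_value(safe) > risk_averse_value(risky))  # True
```

This also makes the over-conservatism concern above concrete: the same worst-case rule that steers the agent away from lava would steer it away from unexpected gold.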

Repeated Inverse RL (Amin et al) defines the problem of inferring intrinsic human preferences that incorporate safety criteria and are invariant across many tasks. The reward function for each task is a combination of the task-invariant intrinsic reward (unobserved by the agent) and a task-specific reward (observed by the agent). This multi-task setup helps address the identifiability problem in IRL, where different reward functions could produce the same behavior.


The authors propose an algorithm for inferring the intrinsic reward while minimizing the number of mistakes made by the agent. They prove an upper bound on the number of mistakes for the “active learning” case where the agent gets to choose the tasks, and show that a certain number of mistakes is inevitable when the agent cannot choose the tasks (there is no upper bound in that case). Thus, letting the agent choose the tasks that it’s trained on seems like a good idea, though it might also result in a selection of tasks that is less interpretable to humans.
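A toy sketch of the multi-task setup (illustrative states and values of my own choosing, not the paper's algorithm) shows why repetition helps with identifiability: each observed task eliminates intrinsic-reward hypotheses that would not explain the demonstrator's choice.

```python
# Toy repeated-IRL setup: the total reward for a state in a task is
# intrinsic[state] + task_reward[state], where the intrinsic reward is
# hidden from the agent and shared across tasks.
import itertools

states = ['a', 'b']
# Enumerate hypotheses for the hidden intrinsic reward over states.
hypotheses = [dict(zip(states, vals))
              for vals in itertools.product([-1.0, 0.0, 1.0], repeat=2)]

# Each task: (observed task-specific reward, demonstrator's chosen state).
tasks = [
    ({'a': 0.0, 'b': 0.5}, 'a'),
    ({'a': 0.5, 'b': 0.0}, 'a'),
]

def consistent(intrinsic, task_reward, choice):
    """Is the demonstrator's choice optimal under this intrinsic-reward hypothesis?"""
    totals = {s: intrinsic[s] + task_reward[s] for s in states}
    return max(totals, key=totals.get) == choice

surviving = [h for h in hypotheses
             if all(consistent(h, tr, c) for tr, c in tasks)]
# Each task prunes hypotheses; more (well-chosen) tasks identify the reward faster.
print(len(surviving), len(hypotheses))
```

With a single task many hypotheses survive; choosing tasks that discriminate between the remaining hypotheses is exactly what the "active learning" case exploits.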

Deep RL from Human Preferences (Christiano et al) uses human feedback to teach deep RL agents about complex objectives that humans can evaluate but might not be able to demonstrate (e.g. a backflip). The human is shown two trajectory snippets of the agent’s behavior and selects which one more closely matches the objective. This method makes very efficient use of limited human feedback, scaling much better than previous methods and enabling the agent to learn much more complex objectives (as shown in MuJoCo and Atari).
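The underlying preference-learning step can be sketched with a scalar Bradley-Terry-style model (the paper fits a deep network over trajectory segments; this toy version with a single linear weight is my own simplification):

```python
# Learn a reward from pairwise human preferences: the probability that a
# human prefers segment 1 is modeled as a softmax over the segments'
# summed rewards, and we ascend the log-likelihood of observed choices.
import math

def pref_prob(r_hat, seg1, seg2):
    """P(human prefers seg1) under the current reward estimate r_hat."""
    s1 = sum(r_hat(s) for s in seg1)
    s2 = sum(r_hat(s) for s in seg2)
    return math.exp(s1) / (math.exp(s1) + math.exp(s2))

# Reward is linear in a 1-d state feature: r(s) = w * s.
w = 0.0
# One comparison: the human preferred the higher-feature segment (label 1).
comparisons = [([1.0, 2.0], [0.0, 0.5], 1)]

for _ in range(200):  # gradient ascent on the preference log-likelihood
    for seg1, seg2, label in comparisons:
        p = pref_prob(lambda s: w * s, seg1, seg2)
        grad = (label - p) * (sum(seg1) - sum(seg2))
        w += 0.1 * grad

# The learned weight makes the preferred segment score higher.
print(w > 0)  # True
```

The sample efficiency in the paper comes from asking the human only about maximally informative pairs of short clips, rather than demanding full demonstrations.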


Dynamic Safe Interruptibility for Decentralized Multi-Agent RL (El Mhamdi et al) generalizes the safe interruptibility problem to the multi-agent setting. Non-interruptible dynamics can arise in a group of agents even if each agent individually is indifferent to interruptions. This can happen if Agent B is affected by interruptions of Agent A and is thus incentivized to prevent A from being interrupted (e.g. if the agents are self-driving cars and A is in front of B on the road). The multi-agent definition focuses on preserving the system dynamics in the presence of interruptions, rather than on converging to an optimal policy, which is difficult to guarantee in a multi-agent setting.

Aligned AI workshop

This was a more long-term-focused version of the Reliable ML in the Wild workshop held in previous years. There were many great talks and posters there – my favorite talks were Ian Goodfellow’s “Adversarial Robustness for Aligned AI” and Gillian Hadfield’s “Incomplete Contracting and AI Alignment”.

Ian made the case that ML security is important for long-term AI safety. The effectiveness of adversarial examples is problematic not only from the near-term perspective of current ML systems (such as self-driving cars) being fooled by bad actors. It’s also bad news from the long-term perspective of aligning the values of an advanced agent, which could inadvertently seek out adversarial examples for its reward function due to Goodhart’s law. Relying on the agent’s uncertainty about the environment or human preferences is not sufficient to ensure safety, since adversarial examples can cause the agent to have arbitrarily high confidence in the wrong answer.
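The Goodhart's-law point can be caricatured with a tiny constructed example (entirely my own, not from the talk): a learned reward proxy that matches the true reward perfectly on its training distribution can still be maximized into a region where it is confidently wrong.

```python
# A reward proxy fit only to training inputs in [0, 1], where it agrees
# exactly with the true reward. An optimizer searching a wider range
# exploits the proxy's extrapolation error -- the RL analogue of an
# adversarial example against the reward function.

def true_reward(x):
    return -abs(x)   # true objective: stay near x = 0

def proxy_reward(x):
    return -x        # perfect fit on x in [0, 1], wrong elsewhere

candidates = [i / 10 for i in range(-50, 51)]  # agent searches [-5, 5]
best = max(candidates, key=proxy_reward)

# The optimum of the proxy is the worst point by the true reward.
print(best, proxy_reward(best), true_reward(best))  # prints: -5.0 5.0 -5.0
```

Note that the proxy is not merely uncertain off-distribution; it assigns its *highest* score there, which is why uncertainty estimates alone don't solve the problem.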


Gillian approached AI safety from an economics perspective, drawing parallels between specifying objectives for artificial agents and designing contracts for humans. The same issues that make contracts incomplete (the designer’s inability to consider all relevant contingencies or precisely specify the variables involved, and incentives for the parties to game the system) lead to side effects and reward hacking for artificial agents.


The central question of the talk was how we can use insights from incomplete contracting theory to better understand and systematically solve specification problems in AI safety, which is a really interesting research direction. The objective specification problem seems even harder to me than the incomplete contract problem, since the contract design process relies on some level of shared common sense between the humans involved, which artificial agents do not currently possess.

Interpretability for AI safety

I gave a talk at the Interpretable ML symposium on connections between interpretability and long-term safety, which explored what forms of interpretability could help make progress on safety problems (slides, video). Understanding our systems better can help ensure that safe behavior generalizes to new situations, and it can help identify causes of unsafe behavior when it does occur.

For example, if we want to build an agent that’s indifferent to being switched off, it would be helpful to see whether the agent has representations that correspond to an off-switch, and whether they are used in its decisions. Side effects and safe exploration problems would benefit from identifying representations that correspond to irreversible states (like “broken” or “stuck”). While existing work on examining the representations of neural networks focuses on visualizations, safety-relevant concepts are often difficult to visualize.

Local interpretability techniques that explain specific predictions or decisions are also useful for safety. We could examine whether features that are idiosyncratic to the training environment or indicate proximity to dangerous states influence the agent’s decisions. If the agent can produce a natural language explanation of its actions, how does it explain problematic behavior like reward hacking or going out of its way to disable the off-switch?

There are many ways in which interpretability can be useful for safety. Somewhat less obvious is what safety can do for interpretability: serving as grounding for interpretability questions. As exemplified by the final debate of the symposium, there is an ongoing conversation in the ML community trying to pin down the fuzzy idea of interpretability – what is it, do we even need it, what kind of understanding is useful, etc. I think it’s important to keep in mind that our desire for interpretability is to some extent motivated by our systems being fallible – understanding our AI systems would be less important if they were 100% robust and made no mistakes. From the safety perspective, we can define interpretability as the kind of understanding that helps us ensure the safety of our systems.

For those interested in applying the interpretability hammer to the safety nail, or working on other long-term safety questions, FLI has recently announced a new grant program. Now is a great time for the AI field to think deeply about value alignment. As Pieter Abbeel said at the end of his keynote, “Once you build really good AI contraptions, how do you make sure they align their value system with our value system? Because at some point, they might be smarter than us, and it might be important that they actually care about what we care about.”

(Thanks to Janos Kramar for his feedback on this post, and to everyone at DeepMind who gave feedback on the interpretability talk.)

This article was originally posted here.