2018 Spring Conference: Invest in Minds Not Missiles

On Saturday April 7th and Sunday morning April 8th, MIT and Massachusetts Peace Action will co-host a conference and workshop at MIT on understanding and reducing the risk of nuclear war. Tickets are free for students. To attend, please register here.

 

Saturday sessions

Workshops

Sunday Morning Planning Breakfast

Student-led session to design and implement programs that enhance existing campus groups and organize new ones, extending the network to campuses in Rhode Island, Connecticut, New Jersey, New Hampshire, Vermont, and Maine.

For more information, contact Jonathan King at <jaking@mit.edu> or call 617-354-2169.

How AI Handles Uncertainty: An Interview With Brian Ziebart


When training image detectors, AI researchers can’t replicate the real world. They teach systems what to expect by feeding them training data, such as photographs, computer-generated images, real video and simulated video, but these practice environments can never capture the messiness of the physical world.

In machine learning (ML), image detectors learn to spot objects by drawing bounding boxes around them and giving them labels. And while this training process succeeds in simple environments, it gets complicated quickly.
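To make that concrete, a training example in this setting is just an image paired with labeled boxes. Below is a minimal Python sketch of what such an annotation might look like; the file path, coordinates, and labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BoundingBox:
    x_min: float  # left edge, in pixels
    y_min: float  # top edge, in pixels
    x_max: float  # right edge, in pixels
    y_max: float  # bottom edge, in pixels
    label: str    # e.g. "person", "car", "dog"

@dataclass
class TrainingExample:
    image_path: str           # the photograph or simulated frame
    boxes: List[BoundingBox]  # one box per annotated object

# One hypothetical training example: a street scene with two labeled objects.
example = TrainingExample(
    image_path="data/street_scene_0001.jpg",
    boxes=[
        BoundingBox(34, 50, 210, 390, "person"),
        BoundingBox(250, 120, 610, 360, "car"),
    ],
)
```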

[Image: two people, one standing in full view on the left and one partially hidden inside a car on the right]

It’s easy to define the person on the left, but how would you draw a bounding box around the person on the right? Would you only include the visible parts of his body, or also his hidden torso and legs? These differences may seem trivial, but they point to a fundamental problem in object recognition: there rarely is a single best way to define an object.

As this second image demonstrates, the real world is rarely clear-cut, and the “right” answer is usually ambiguous. Yet when ML systems use training data to develop their understanding of the world, they often fail to reflect this. Rather than recognizing uncertainty and ambiguity, these systems approach new situations with the same confidence they bring to their training data, which can put both the systems and the humans around them at risk.

Brian Ziebart, a Professor of Computer Science at the University of Illinois at Chicago, is conducting research to improve AI systems’ ability to operate amidst the inherent uncertainty around them. The physical world is messy and unpredictable, and if we are to trust our AI systems, they must be able to safely handle it.

 

Overconfidence in ML Systems

ML systems will inevitably confront real-world scenarios that their training data never prepared them for. But, as Ziebart explains, current statistical models “tend to assume that the data that they’ll see in the future will look a lot like the data they’ve seen in the past.”

As a result, these systems are overly confident that they know what to do when they encounter new data points, even when those data points look nothing like what they’ve seen. ML systems falsely assume that their training prepared them for everything, and the resulting overconfidence can lead to dangerous consequences.
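This failure mode is easy to reproduce with a toy model. The sketch below (scikit-learn, with made-up two-dimensional data standing in for image features) trains an ordinary classifier on two tight clusters and then queries a point far outside anything it has seen; the model still reports near-total confidence rather than uncertainty.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training data: two tight, well-separated clusters ("everything seen in the lab").
class_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
class_b = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(200, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 200 + [1] * 200)

model = LogisticRegression().fit(X, y)

# A query point nothing like the training data -- the "rainy night" case.
out_of_distribution = np.array([[40.0, 30.0]])
probs = model.predict_proba(out_of_distribution)[0]

# The model has no evidence about this region of input space, yet it reports
# essentially 100% confidence instead of anything resembling "I don't know".
print(f"P(class 0) = {probs[0]:.4f}, P(class 1) = {probs[1]:.4f}")
```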

Consider image detection for a self-driving car. A car might train its image detection on data from the dashboard of another car, tracking the visual field and drawing bounding boxes around certain objects, as in the image below:

Bounding boxes on a highway – CloudFactory Blog

For clear views like this, image detectors excel. But the real world isn’t always this simple. If researchers train an image detector on clean, well-lit images in the lab, it might accurately recognize objects 80% of the time during the day. But when forced to navigate roads on a rainy night, it might drop to 40%.

“If you collect all of your data during the day and then try to deploy the system at night, then however it was trained to do image detection during the day just isn’t going to work well when you generalize into those new settings,” Ziebart explains.

Moreover, the ML system might not recognize the problem: since the system assumes that its training covered everything, it will remain confident about its decisions and continue “to make strong predictions that are just inaccurate,” Ziebart adds.

In contrast, humans tend to recognize when previous experience doesn’t generalize into new settings. If a driver spots an unknown object ahead in the road, she wouldn’t just plow through the object. Instead, she might slow down, pay attention to how other cars respond to the object, and consider swerving if she can do so safely. When humans feel uncertain about our environment, we exercise caution to avoid making dangerous mistakes.

Ziebart would like AI systems to incorporate similar levels of caution in uncertain situations. Instead of confidently making mistakes, a system should recognize its uncertainty and ask questions to glean more information, much like an uncertain human would.

 

An Adversarial Approach

Training and practice may never prepare AI systems for every possible situation, but researchers can make training methods more robust. Ziebart posits that feeding systems messier data in the lab can train them to better recognize and address uncertainty.

Conveniently, humans can provide this messy, real-world data. By hiring a group of human annotators to look at images and draw bounding boxes around certain objects – cars, people, dogs, trees, etc. – researchers can “build into the classifier some idea of what ‘normal’ data looks like,” Ziebart explains.

“If you ask ten different people to provide these bounding boxes, you’re likely to get back ten different bounding boxes,” he says. “There’s just a lot of inherent ambiguity in how people think about the ground truth for these things.”

Returning to the image above of the man in the car, human annotators might give ten different bounding boxes that capture different portions of the visible and hidden person. By feeding ML systems this confusing and contradictory data, Ziebart prepares them to expect ambiguity.

“We’re synthesizing more noise into the data set in our training procedure,” Ziebart explains. This noise reflects the messiness of the real world, and trains systems to be cautious when making predictions in new environments. Cautious and uncertain, AI systems will seek additional information and learn to navigate the confusing situations they encounter.
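Ziebart's actual method is an adversarial training formulation, but the underlying data-augmentation intuition can be sketched very simply: take each clean annotation and perturb it the way disagreeing annotators would. The toy snippet below illustrates only that idea, not the adversarial procedure itself.

```python
import random

def jitter_box(box, rng, max_shift=0.1, max_scale=0.2):
    """Simulate annotator disagreement by randomly shifting and resizing a box.

    `box` is (x_min, y_min, x_max, y_max); shifts and size changes are drawn
    as fractions of the box's width and height.
    """
    x_min, y_min, x_max, y_max = box
    w, h = x_max - x_min, y_max - y_min

    dx = rng.uniform(-max_shift, max_shift) * w
    dy = rng.uniform(-max_shift, max_shift) * h
    dw = rng.uniform(-max_scale, max_scale) * w
    dh = rng.uniform(-max_scale, max_scale) * h

    return (x_min + dx, y_min + dy, x_max + dx + dw, y_max + dy + dh)

# Expand one "clean" annotation into ten plausible, conflicting ones.
rng = random.Random(42)
clean_box = (34, 50, 210, 390)
noisy_boxes = [jitter_box(clean_box, rng) for _ in range(10)]
```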

Of course, self-driving cars shouldn’t have to ask questions. If a car’s image detection spots a foreign object up ahead, for instance, it won’t have time to ask humans for help. But if it’s trained to recognize uncertainty and act cautiously, it might slow down, detect what other cars are doing, and safely navigate around the object.

 

Building Blocks for Future Machines

So far, Ziebart’s research has remained in training settings. He feeds systems messy, varied data and trains them to produce bounding boxes that overlap people’s bounding boxes by at least 70%. And his process has already produced impressive results. On an ImageNet object detection task investigated in collaboration with Sima Behpour (University of Illinois at Chicago) and Kris Kitani (Carnegie Mellon University), for example, Ziebart’s adversarial approach “improves performance by over 16% compared to the best performing data augmentation method.” Trained to operate amidst uncertain environments, these systems more effectively manage new data points that training didn’t explicitly prepare them for.
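The overlap criterion mentioned above is usually measured with intersection-over-union (IoU), the standard score for comparing two boxes. Here is a short, self-contained version of that check; this is the evaluation metric, not the adversarial training method, and the box coordinates are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

predicted = (30, 45, 205, 400)   # the detector's box (hypothetical numbers)
annotated = (34, 50, 210, 390)   # a human annotator's box
score = iou(predicted, annotated)
print(f"IoU = {score:.2f}; counts as a match at the 70% threshold: {score >= 0.7}")
```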

But while Ziebart trains relatively narrow AI systems, he believes that this research can scale up to more advanced systems like autonomous cars and public transit systems.

“I view this as kind of a fundamental issue in how we design these predictors,” he says. “We’ve been trying to construct better building blocks on which to make machine learning – better first principles for machine learning that’ll be more robust.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Stephen Hawking in Memoriam

As we mourn the loss of Stephen Hawking, we should remember that his legacy goes far beyond science. Yes, of course he was one of the greatest scientists of the past century, discovering that black holes evaporate and helping found the modern quest for quantum gravity. But he also had a remarkable legacy as a social activist, who looked far beyond the next election cycle and used his powerful voice to bring out the best in us all. As a founding member of FLI’s Scientific Advisory board, he tirelessly helped us highlight the importance of long-term thinking and ensuring that we use technology to help humanity flourish rather than flounder. I marveled at how he could sometimes answer my emails faster than my grad students. His activism revealed the same visionary fearlessness as his scientific and personal life: he saw further ahead than most of those around him and wasn’t afraid of controversially sounding the alarm about humanity’s sloppy handling of powerful technology, from nuclear weapons to AI.

On a personal note, I’m saddened to have lost not only a long-time collaborator but, above all, a great inspiration, always reminding me of how seemingly insurmountable challenges can be overcome with creativity, willpower and positive attitude. Thanks Stephen for inspiring us all!

Can Global Warming Stay Below 1.5 Degrees? Views Differ Among Climate Scientists

The Paris Climate Agreement seeks to keep global warming well below 2 degrees Celsius relative to pre-industrial temperatures. In the best case scenario, warming would go no further than 1.5 degrees.

Many scientists see this as an impossible goal. A recent study by Peter Cox et al. postulates that, given a twofold increase in atmospheric carbon dioxide, there is only a 3% chance of keeping warming below 1.5 degrees.

But a study by Richard Millar et al. provides more reason for hope. The Millar report concludes that the 1.5 degree limit is still physically feasible, if only narrowly. It also provides an updated “carbon budget”—a projection of how much more carbon dioxide we can emit without breaking the 1.5 degree limit.

Dr. Joeri Rogelj, a climate scientist and research scholar with the Energy Program of the International Institute for Applied Systems Analysis, co-authored the Millar report. For Rogelj, the updated carbon budget is not the paper’s most important point. “Our paper shows to decision makers the importance of anticipating new and updated scientific knowledge,” he says.

Projected “carbon budgets” are rough estimates based on limited observations. These projections need to be continually updated as more data becomes available. Fortunately, the Paris Agreement calls for countries to periodically update their emission reduction pledges based on new estimates. Rogelj is hopeful “that this paper has put the necessity for a strong [updating] process on the radar of delegates.”

For scientists who have dismissed the 1.5 degree limit as impossible, the updating process might seem pointless. But Rogelj stresses that his team looked only at geophysical limitations, not political ones. Their report assumes that countries will agree to a zero emissions commitment—a much more ambitious scenario than other researchers have considered.

There is a misconception, Rogelj says, that the report claims to have found an inaccuracy in the Earth system models (ESMs) that are used to estimate human-driven warming. “We are using precisely those models to estimate the carbon budget from today onward,” Rogelj explains.

The problem is not the models, but rather the data fed into them. These simulations are often run using inexact projections of CO2 emissions. Over time, small discrepancies accumulate and are reflected in the warming predictions that the models make.

Given information about current CO2 emissions, however, ESMs make temperature predictions that are “quite accurate.” And when they are provided with an ambitious future scenario for emissions reduction, the models indicate that it is possible for global temperature increases to remain below 1.5 degrees.

So what would such a scenario look like? First off, emissions have to fall to zero. At the same time, the carbon budget needs to be continually reevaluated, and strategy changes must be based on the updated budget. For example, if emissions fall to zero but we’ve surpassed our carbon budget, then we’ll need to focus on making our emissions negative—in other words, on carbon dioxide removal.

Rogelj names two major processes for carbon dioxide removal: reforestation and bio-energy with carbon capture and storage. Some negative emissions processes, such as reforestation, provide benefits beyond carbon capture, while others may have undesired side effects.

But Rogelj is quick to add that these negative emissions technologies are not “silver bullets.” It’s too soon to know if carbon dioxide removal at a global scale will actually be necessary—we’ll have to get to zero emissions before we can tell. But such technologies could also help us reach zero in the first place.

What else will get us to zero emissions? According to Rogelj, we need “a strong emphasis on energy efficiency, combined with an electrification of end-use sectors like transport and building and a shift away from fossil fuels.” This will require a major shift in investment patterns. We want to avoid “locking into carbon dioxide-intensive infrastructure” that would saddle future generations with a dependency on non-renewable energy, he explains.

Rogelj stresses that his team’s findings are based only on geophysical data. Societal factors are a different matter: It is up to individual countries to decide where reducing emissions falls on their list of priorities.

However, the stipulation in the Paris Climate Agreement that countries periodically update their pledges is a source of optimism. Rogelj, for his part, is cautiously hopeful: “Looking at real world dynamics in terms of costs of renewables and energy storage, I personally think there is room for pledges to be strengthened over the coming five to ten years as countries better understand what is possible and how these pledges can align with other priorities.”

But not everyone in the scientific community shares the hopeful tone struck by Rogelj and his team. An article by the MIT Technology Review outlines “the five most worrisome climate developments” from 2017.

To start, global emissions are on the rise, up 2% from 2016. While the prior few years had seen a relative flattening in emissions, this more recent data shattered hopes that the trend would continue. On top of that, scientists are finding that observable climate trends line up best with “worst-case scenario” models of global warming—that is, global temperatures could rise five degrees in the next century.

And the Arctic is melting much faster than scientists predicted. A recent report by the U.S. National Oceanic and Atmospheric Administration (NOAA) declared “that the North Pole had reached a ‘new normal,’ with no sign of returning to a ‘reliably frozen region.’”

Melting glaciers and sea ice trigger a whole new set of problems. The disappearing ice will cause sea levels to rise, and the “reflective white snow and ice [will] turn into heat-absorbing dark-blue water…[meaning] the Arctic will send less heat back into space, which leads to more warming, more melting, and more sea-level rise still.”

And finally, natural disasters are becoming increasingly ferocious as weather patterns mutate. The United States saw this first-hand, with massive wildfires on the west coast—including the largest in California’s history—and a string of hurricanes that ravaged the Virgin Islands, Puerto Rico, and many southern states.

These consequences of global warming are beginning to affect areas of social interest beyond the environment. The 2017 Atlantic hurricane season, for example, has been a massive economic burden, racking up more than $200 billion in damages.

In Rogelj’s words, “Right now we really need to find ways to achieve multiple societal objectives, to find policies and measures and options that allow us to achieve those together.” As governments come to see how climate protection “can align with other priorities like reducing air pollution, and providing clean water and reliable energy,” we have reason to hope that it may become a higher and higher priority.

How to Prepare for the Malicious Use of AI

How can we forecast, prevent, and (when necessary) mitigate the harmful effects of malicious uses of AI?

This is the question posed by a 100-page report released last week, written by 26 authors from 14 institutions. The report, which is the result of a two-day workshop in Oxford, UK followed by months of research, provides a sweeping landscape of the security implications of artificial intelligence.

The authors, who include representatives from the Future of Humanity Institute, the Center for the Study of Existential Risk, OpenAI, and the Center for a New American Security, argue that AI is not only changing the nature and scope of existing threats, but also expanding the range of threats we will face. They are excited about many beneficial applications of AI, including the ways in which it will assist defensive capabilities. But the purpose of the report is to survey the landscape of security threats from intentionally malicious uses of AI.

“Our report focuses on ways in which people could do deliberate harm with AI,” said Seán Ó hÉigeartaigh, Executive Director of the Cambridge Centre for the Study of Existential Risk. “AI may pose new threats, or change the nature of existing threats, across cyber, physical, and political security.”

Importantly, this is not a report about a far-off future. The only technologies considered are those that are already available or that are likely to be within the next five years. The message therefore is one of urgency. We need to acknowledge the risks and take steps to manage them because the technology is advancing exponentially. As reporter Dave Gershgorn put it, “Every AI advance by the good guys is an advance for the bad guys, too.”

AI systems tend to be more efficient and more scalable than traditional tools. Additionally, the use of AI can increase the anonymity and psychological distance a person feels to the actions carried out, potentially lowering the barrier to committing crimes and acts of violence. Moreover, AI systems have their own unique vulnerabilities including risks from data poisoning, adversarial examples, and the exploitation of flaws in their design. AI-enabled attacks will outpace traditional cyberattacks because they will generally be more effective, more finely targeted, and more difficult to attribute.

The kinds of attacks we need to prepare for are not limited to sophisticated computer hacks. The authors suggest there are three primary security domains: digital security, which largely concerns cyberattacks; physical security, which refers to carrying out attacks with drones and other physical systems; and political security, which includes examples such as surveillance, persuasion via targeted propaganda, and deception via manipulated videos. These domains have significant overlap, but the framework can be useful for identifying different types of attacks, the rationale behind them, and the range of options available to protect ourselves.

What can be done to prepare for malicious uses of AI across these domains? The authors provide many good examples. The scenarios described in the report can be a good way for researchers and policymakers to explore possible futures and brainstorm ways to manage the most critical threats. For example, imagining a commercial cleaning robot being repurposed as an untraceable explosive device may scare us, but it also suggests why policies like robot registration requirements may be a useful option.

Each domain also has its own possible points of control and countermeasures. For example, to improve digital security, companies can promote consumer awareness and incentivize white hat hackers to find vulnerabilities in code. We may also be able to learn from the cybersecurity community and employ measures such as red teaming for AI development, formal verification in AI systems, and responsible disclosure of AI vulnerabilities. To improve physical security, policymakers may want to regulate hardware development and prohibit sales of lethal autonomous weapons. Meanwhile, media platforms may be able to minimize threats to political security by offering image and video authenticity certification, fake news detection, and encryption.

The report additionally provides four high level recommendations, which are not intended to provide specific technical or policy proposals, but rather to draw attention to areas that deserve further investigation. The recommendations are the following:

Recommendation #1: Policymakers should collaborate closely with technical researchers to investigate, prevent, and mitigate potential malicious uses of AI.

Recommendation #2: Researchers and engineers in artificial intelligence should take the dual-use nature of their work seriously, allowing misuse-related considerations to influence research priorities and norms, and proactively reaching out to relevant actors when harmful applications are foreseeable.

Recommendation #3: Best practices should be identified in research areas with more mature methods for addressing dual-use concerns, such as computer security, and imported where applicable to the case of AI.

Recommendation #4: Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges.

Finally, the report identifies several areas for further research. The first of these is to learn from and with the cybersecurity community because the impacts of cybersecurity incidents will grow as AI-based systems become more widespread and capable. Other areas of research include exploring different openness models, promoting a culture of responsibility among AI researchers, and developing technological and policy solutions.

As the authors state, “The malicious use of AI will impact how we construct and manage our digital infrastructure as well as how we design and distribute AI systems, and will likely require policy and other institutional responses.”

Although this is only the beginning of the understanding needed on how AI will impact global security, this report moves the discussion forward. It not only describes numerous emergent security concerns related to AI, but also suggests ways we can begin to prepare for those threats today.

MIRI’s February 2018 Newsletter

Updates

News and links

  • In “Adversarial Spheres,” Gilmer et al. investigate the tradeoff between test error and vulnerability to adversarial perturbations in many-dimensional spaces.
  • Recent posts on Less Wrong: Critch on “Taking AI Risk Seriously” and Ben Pace’s background model for assessing AI x-risk plans.
  • “Solving the AI Race”: GoodAI is offering prizes for proposed responses to the problem that “key stakeholders, including [AI] developers, may ignore or underestimate safety procedures, or agreements, in favor of faster utilization”.
  • The Open Philanthropy Project is hiring research analysts in AI alignment, forecasting, and strategy, along with generalist researchers and operations staff.

This newsletter was originally posted on MIRI’s website.

Optimizing AI Safety Research: An Interview With Owen Cotton-Barratt

Artificial intelligence poses a myriad of risks to humanity. From privacy concerns, to algorithmic bias and “black box” decision making, to broader questions of value alignment, recursive self-improvement, and existential risk from superintelligence — there’s no shortage of AI safety issues.  

AI safety research aims to address all of these concerns. But with limited funding and too few researchers, trade-offs in research are inevitable. In order to ensure that the AI safety community tackles the most important questions, researchers must prioritize their causes.

Owen Cotton-Barratt, along with his colleagues at the Future of Humanity Institute (FHI) and the Centre for Effective Altruism (CEA), looks at this ‘cause prioritization’ for the AI safety community. They analyze which projects are more likely to help mitigate catastrophic or existential risks from highly-advanced AI systems, especially artificial general intelligence (AGI). By modeling trade-offs between different types of research, Cotton-Barratt hopes to guide scientists toward more effective AI safety research projects.

 

Technical and Strategic Work

The first step of cause prioritization is understanding the work already being done. Broadly speaking, AI safety research happens in two domains: technical work and strategic work.

AI’s technical safety challenge is to keep machines safe and secure as they become more capable and creative. By making AI systems more predictable, more transparent, and more robustly aligned with our goals and values, we can significantly reduce the risk of harm. Technical safety work includes Stuart Russell’s research on reinforcement learning and Dan Weld’s work on explainable machine learning, since they’re improving the actual programming in AI systems.

In addition, the Machine Intelligence Research Institute (MIRI) recently released a technical safety agenda aimed at aligning machine intelligence with human interests in the long term, while OpenAI, another non-profit AI research company, is investigating the “many research problems around ensuring that modern machine learning systems operate as intended,” following suggestions from the seminal paper Concrete Problems in AI Safety.

Strategic safety work is broader, and asks how society can best prepare for and mitigate the risks of powerful AI. This research includes analyzing the political environment surrounding AI development, facilitating open dialogue between research areas, disincentivizing arms races, and learning from game theory and neuroscience about probable outcomes for AI. Yale professor Allan Dafoe has recently focused on strategic work, researching the international politics of artificial intelligence and consulting for governments, AI labs and nonprofits about AI risks. And Yale bioethicist Wendell Wallach, apart from his work on “silo busting,” is researching forms of global governance for AI.

Cause prioritization is strategy work, as well. Cotton-Barratt explains, “Strategy work includes analyzing the safety landscape itself and considering what kind of work do we think we’re going to have lots of, what are we going to have less of, and therefore helping us steer resources and be more targeted in our work.”

[Graph: growth in annual AI safety funding since 2015]

Who Needs More Funding?

As the graph above illustrates, AI safety spending has grown significantly since 2015. And while more money doesn’t always translate into improved results, funding patterns are easy to assess and can say a lot about research priorities. Seb Farquhar, Cotton-Barratt’s colleague at CEA, wrote a post earlier this year analyzing AI safety funding and suggesting ways to better allocate future investments.

To start, he suggests that the technical research community recruit more principal investigators to carry forward the research agenda detailed in Concrete Problems in AI Safety. OpenAI is already taking a lead on this. Additionally, the community should go out of its way to ensure that emerging AI safety centers hire the best candidates, since these researchers will shape each center’s success for years to come.

In general, Farquhar notes that strategy, outreach and policy work haven’t kept up with the overall growth of AI safety research. He suggests that more people focus on improving communication about long-run strategies between AI safety research teams, between the AI safety community and the broader AI community, and between policymakers and researchers. Building more PhD and Masters courses on AI strategy and policy could establish a pipeline to fill this void, he adds.

To complement Farquhar’s data, Cotton-Barratt’s colleague Max Dalton created a mathematical model to track how more funding and more people working on a safety problem translate into useful progress or solutions. The model tries to answer such questions as: if we want to reduce AI’s existential risks, how much of an effect do we get by investing money in strategy research versus technical research?

In general, technical research is easier to track than strategic work in mathematical models. For example, spending more on strategic ethics research may be vital for AI safety, but it’s difficult to quantify that impact. Improving models of reinforcement learning, however, can produce safer and more robustly-aligned machines. With clearer feedback loops, these technical projects fit best with Dalton’s models.
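Dalton's model itself isn't spelled out in this article, so the snippet below is only a toy illustration of the kind of comparison such a model makes: assume each research area converts funding into progress with diminishing returns, and ask where the next dollar buys the most. Every number in it is invented.

```python
import math

def progress(funding_millions, scale, tractability):
    """Toy diminishing-returns curve: more funding always helps, but less and less."""
    return scale * math.log1p(tractability * funding_millions)

def marginal_value(funding_millions, scale, tractability, step=1.0):
    """Estimated benefit of one additional million dollars at the current funding level."""
    return (progress(funding_millions + step, scale, tractability)
            - progress(funding_millions, scale, tractability))

# Invented parameters: technical work assumed more tractable but better funded;
# strategy work assumed less tractable but comparatively neglected.
technical = {"funding_millions": 20.0, "scale": 1.0, "tractability": 0.9}
strategy = {"funding_millions": 4.0, "scale": 0.8, "tractability": 0.5}

print("Marginal value of $1M, technical:", round(marginal_value(**technical), 3))
print("Marginal value of $1M, strategy: ", round(marginal_value(**strategy), 3))
```

Under these made-up assumptions the neglected area wins at the margin; with different assumptions the answer flips, which is exactly why the modeling exercise is useful.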

 

Near-sightedness and AGI

But these models also confront major uncertainty. No one really knows when AGI will be developed, and this makes it difficult to determine the most important research. If AGI will be developed in five years, perhaps researchers should focus only on the most essential safety work, such as improving transparency in AI systems. But if we have thirty years, researchers can probably afford to dive into more theoretical work.

Moreover, no one really knows how AGI will function. Machine learning and deep neural networks have ushered in a new AI revolution, but AGI will likely be developed on architectures far different from AlphaGo and Watson.

This makes some long-term safety research a risky investment, even if, as many argue, it is the most important research we can do. For example, researchers could spend years making deep neural nets safe and transparent, only to find their work wasted when AGI develops on an entirely different programming architecture.

Cotton-Barratt attributes this issue to ‘nearsightedness,’ and discussed it in a recent talk at Effective Altruism Global this summer. Humans often can’t anticipate disruptive change, and AI researchers are no exception.

“Work that we might do for long-term scenarios might turn out to be completely confused because we weren’t thinking of the right type of things,” he explains. “We have more leverage over the near-term scenarios because we’re more able to assess what they’re going to look like.”

Any additional AI safety research is better than none, but given the unknown timelines and the potential gravity of AI’s threats to humanity, we’re better off pursuing — to the extent possible — the most effective AI safety research.

By helping the AI research portfolio advance in a more efficient and comprehensive direction, Cotton-Barratt and his colleagues hope to ensure that when machines eventually outsmart us, we will have asked — and hopefully answered — the right questions.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project. If you’re interested in applying for our 2018 grants competition, please see this link.

As Acidification Increases, Ocean Biodiversity May Decline

Dubbed “the evil twin of global warming,” ocean acidification is a growing crisis that poses a threat to both water-dwelling species and human communities that rely on the ocean for food and livelihood.

Since pre-industrial times, the ocean’s pH has dropped from 8.2 to 8.1—a change that may seem insignificant, but actually represents a 30 percent increase in acidity. As the threat continues to mount, the German research project BIOACID (Biological Impacts of Ocean Acidification) seeks to provide a better understanding of the phenomenon by studying its effects around the world.
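The seemingly small pH change and the 30 percent figure are consistent because pH is a logarithmic scale: each full unit corresponds to a tenfold change in hydrogen-ion concentration. For reference:

$$
\frac{[\mathrm{H^+}]_{\mathrm{now}}}{[\mathrm{H^+}]_{\mathrm{then}}} \;=\; 10^{\,\mathrm{pH}_{\mathrm{then}} - \mathrm{pH}_{\mathrm{now}}} \;\approx\; 10^{0.1} \;\approx\; 1.26
$$

That works out to roughly a 26 to 30 percent increase in acidity, depending on exactly how the underlying pH values are rounded.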

BIOACID began in 2009, and since that time, over 250 German researchers have contributed more than 580 publications to the scientific discourse on the effects of acidification and how the oceans are changing.

The organization recently released a report that synthesizes their most notable findings for climate negotiators and decision makers. Their work explores “how different marine species respond to ocean acidification, how these reactions impact the food web as well as material cycles and energy turnover in the ocean, and what consequences these changes have for economy and society.”

Field research for the project has spanned multiple oceans, where key species and communities have been studied under natural conditions. In the laboratory, researchers have also been able to test for coming changes by exposing organisms to simulated future conditions.

Their results indicate that acidification is only one part of a larger problem. While organisms might be capable of adapting to the shift in pH, acidification is typically accompanied by other environmental stressors that make adaptation all the more difficult.

In some cases, marine life that had been able to withstand acidification by itself could not tolerate the additional stress of increased water temperatures, researchers found. Other factors like pollution and eutrophication—an excess of nutrients—compounded the harm.

Further, rising water temperatures are forcing many species to abandon part or all of their original habitats, wreaking additional havoc on ecosystems. And a 1.2 degree increase in global temperature—which is significantly under the 2 degree limit set in the Paris Climate Agreement—is expected to kill at least half of the world’s tropical coral reefs.

Acidification itself is a multipronged threat. When carbon dioxide is absorbed by the ocean, a series of chemical reactions take place. These reactions have two important outcomes: acid levels increase and the compound carbonate is transformed into bicarbonate. Both of these results have widespread effects on the organisms who make their homes in our oceans.
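For reference, the chain of reactions behind that summary is the standard seawater carbonate chemistry:

$$
\mathrm{CO_2 + H_2O \;\rightleftharpoons\; H_2CO_3 \;\rightleftharpoons\; H^+ + HCO_3^-}
\qquad\text{and}\qquad
\mathrm{H^+ + CO_3^{2-} \;\rightleftharpoons\; HCO_3^-}
$$

Dissolved carbon dioxide forms carbonic acid, which releases hydrogen ions; those extra hydrogen ions then combine with carbonate ions to form bicarbonate. This is why acidity rises and carbonate becomes scarcer at the same time.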

Increased acidity has a particularly harmful effect on organisms in their early life stages, such as fish larvae. This means, among other things, the depletion of fish stocks—a cornerstone of the economy as well as diet in many human communities. Researchers “have found that both [acidification and warming] work synergistically, especially on the most sensitive early life stages of [fish] as well as embryo and larval survival.”

Many species are harmed as well by the falling levels of carbonate, which is an essential building block for organisms like coral, mussels, and some plankton. Like all calcifying corals, the cold-water coral species Lophelia pertusa builds its skeleton from calcium carbonate. Some research suggests that acidification threatens both to slow its growth and to corrode the dead branches that are no longer protected by organic matter.

As a “reef engineer,” Lophelia is home to countless species; as it suffers, so will they. The BIOACID report warns: “[T]o definitely preserve the magnificent oases of biodiversity founded by Lophelia pertusa, effects of climate change need to be minimised even now–while science continues to investigate this complex marine ecosystem.”

Even those organisms not directly affected by acidification may find themselves in trouble as their ecosystems are thrown out of balance. Small changes at the bottom of the food web, for example, may have big effects at higher trophic levels. In the Arctic, Limacina helicina—a tiny swimming snail, or “sea butterfly”—is a major source of food for many marine animals. The polar cod species Boreogadus saida, which feeds on Limacina, is a key food source for larger fish, birds, and mammals such as whales and seals.

As acidification increases, research suggests that Limacina’s nutritional value will decrease as its metabolism and shell growth are affected; its numbers, too, will likely drop. With the disappearance of this prey, the polar cod will likely suffer. Diminishing cod populations will in turn affect the many predators who feed on them.

Even where acidification stands to benefit a particular species, the overall impact on the ecosystem can be negative. In the Baltic Sea, BIOACID scientists have found that Nodularia spumigena, a species of cyanobacteria, “manages perfectly with water temperatures above 16 degrees Celsius and elevated carbon dioxide concentrations–whereas other organisms already reach their limits at less warming.”

Nodularia becomes more productive under acidified conditions, producing bacterial “blooms” that can extend upwards of 60,000 square kilometers in the Baltic Sea. These blooms block light from other organisms, and as dead bacteria degrade near the ocean floor they take up precious oxygen. The cells also release toxins that are harmful to marine animals and humans alike.

Ultimately biodiversity, “a basic requirement for ecosystem functioning and ultimately even human wellbeing,” will be lost. Damage to tropical coral reefs, which are home to one quarter of all marine species, could drastically reduce the ocean’s biodiversity. And as biodiversity decreases, an ecosystem becomes more fragile: ecological functions that were once performed by several different species become entirely dependent on only one.

And the diversity of marine ecosystems is not the only thing at stake. Currently, the ocean plays a major mitigating role in global warming, absorbing around 30 percent of the carbon dioxide emitted by humans. It also absorbs over 90 percent of the heat produced by the greenhouse effect. But as acidification continues, the ocean will take up less and less carbon dioxide—meaning we may see an increase in the rate of global warming.

The ocean controls carbon dioxide uptake in part through a biological mechanism known as the carbon pump. Normally, phytoplankton near the ocean’s surface take up carbon dioxide and then sink towards the ocean floor. This process lowers surface carbon dioxide concentrations, facilitating its uptake from the atmosphere.

But acidification weakens this biological carbon pump. Researchers have found that acidified conditions favor smaller types of phytoplankton, which sink more slowly. In addition, heavier calcifying plankton—which typically propel the pump by sinking more quickly—will have increasing difficulty forming their weighty calcium carbonate shells. As the pump’s efficiency decreases, so will the uptake of carbon dioxide from the air.

The BIOACID report stresses that the risks of acidification remain largely uncertain. However, despite this uncertainty, or perhaps because of it, society must treat the oceans with care. The report explains, “Following the precautionary principle is the best way to act when considering potential risks to the environment and humankind, including future generations.”

Transparent and Interpretable AI: an interview with Percy Liang

At the end of 2017, the United States House of Representatives passed a bill called the SELF DRIVE Act, laying out an initial federal framework for autonomous vehicle regulation. Autonomous cars have been undergoing testing on public roads for almost two decades. With the passing of this bill, along with the increasing safety benefits of autonomous vehicles, it is likely that they will become even more prevalent in our daily lives. This is true for numerous autonomous technologies including those in the medical, legal, and safety fields – just to name a few.

To that end, researchers, developers, and users alike must be able to have confidence in these types of technologies that rely heavily on artificial intelligence (AI). This extends beyond autonomous vehicles, applying to everything from security devices in your smart home to the personal assistant in your phone.

 

Predictability in Machine Learning

Percy Liang, Assistant Professor of Computer Science at Stanford University, explains that humans rely on some degree of predictability in their day-to-day interactions — both with other humans and automated systems (including, but not limited to, their cars). One way to create this predictability is by taking advantage of machine learning.

Machine learning deals with algorithms that allow an AI to “learn” based on data gathered from previous experiences. Developers do not need to write code that dictates each and every action or intention for the AI. Instead, the system recognizes patterns from its experiences and assumes the appropriate action based on that data. It is akin to the process of trial and error.

A key question often asked of machine learning systems in the research and testing environment is, “Why did the system make this prediction?” About this search for intention, Liang explains:

“If you’re crossing the road and a car comes toward you, you have a model of what the other human driver is going to do. But if the car is controlled by an AI, how should humans know how to behave?”

It is important to see that a system is performing well, but perhaps even more important is its ability to explain in easily understandable terms why it acted the way it did. Even if the system is not accurate, it must be explainable and predictable. For AI to be safely deployed, systems must rely on well-understood, realistic, and testable assumptions.

Current theories that explore the idea of reliable AI focus on fitting the observable outputs in the training data. However, as Liang explains, this could lead “to an autonomous driving system that performs well on validation tests but does not understand the human values underlying the desired outputs.”

Running multiple tests is important, of course. These types of simulations, explains Liang, “are good for debugging techniques — they allow us to more easily perform controlled experiments, and they allow for faster iteration.”

However, to really know whether a technique is effective, “there is no substitute for applying it to real life,” says Liang. “This goes for language, vision, and robotics.” An autonomous vehicle may perform well in all testing conditions, but there is no way to accurately predict how it could perform in an unpredictable natural disaster.

 

Interpretable ML Systems

The best-performing models in many domains — e.g., deep neural networks for image and speech recognition — are quite complex. These are considered “black box” models, and their predictions can be difficult, if not impossible, to explain.

Liang and his team are working to interpret these models by researching how a particular training situation leads to a prediction. As Liang explains, “Machine learning algorithms take training data and produce a model, which is used to predict on new inputs.”

This type of observation becomes increasingly important as AIs take on more complex tasks – think life or death situations, such as interpreting medical diagnoses. “If the training data has outliers or adversarially generated data,” says Liang, “this will affect (corrupt) the model, which will in turn cause predictions on new inputs to be possibly wrong.  Influence functions allow you to track precisely the way that a single training point would affect the prediction on a particular new input.”
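Influence functions approximate this without retraining the model; the brute-force intuition, though, is just leave-one-out analysis: drop a single training point, refit, and see how the prediction on a specific input moves. The sketch below (on made-up data, with scikit-learn) illustrates that intuition rather than the efficient influence-function computation from Liang's work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Made-up training set: 100 points, 2 features, binary labels.
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
x_test = np.array([[0.3, -0.1]])  # one new input we care about

full_model = LogisticRegression().fit(X_train, y_train)
base_pred = full_model.predict_proba(x_test)[0, 1]

# Leave-one-out "influence": how much does removing training point i
# change the predicted probability on x_test?
influences = []
for i in range(len(X_train)):
    mask = np.arange(len(X_train)) != i
    model_i = LogisticRegression().fit(X_train[mask], y_train[mask])
    influences.append(model_i.predict_proba(x_test)[0, 1] - base_pred)

most_influential = int(np.argmax(np.abs(influences)))
print(f"Most influential training point: {most_influential}, "
      f"shifts P(y=1 | x_test) by {influences[most_influential]:+.4f}")
```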

Essentially, by understanding why a model makes the decisions it makes, Liang’s team hopes to improve how models function, discover new science, and provide end users with explanations of actions that impact them.

Another aspect of Liang’s research is ensuring that an AI understands, and is able to communicate, its limits to humans. The conventional metric for success, he explains, is average accuracy, “which is not a good interface for AI safety.” He posits, “what is one to do with an 80 percent reliable system?”

Liang is not looking for the system to have an accurate answer 100 percent of the time. Instead, he wants the system to be able to admit when it does not know an answer. If a user asks a system “How many painkillers should I take?” it is better for the system to say, “I don’t know” rather than making a costly or dangerous incorrect prediction.
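A minimal way to give a system an “I don’t know” option is to threshold its confidence and abstain below it. The sketch below shows only that generic idea of abstention, not Liang's proposed interface; the threshold and class probabilities are made up.

```python
import numpy as np

def predict_or_abstain(probabilities, threshold=0.9):
    """Return the predicted class, or None ("I don't know") if the model
    isn't confident enough in any single class."""
    best = int(np.argmax(probabilities))
    return best if probabilities[best] >= threshold else None

# Hypothetical class probabilities from some trained model.
confident_case = np.array([0.02, 0.95, 0.03])
uncertain_case = np.array([0.40, 0.35, 0.25])

print(predict_or_abstain(confident_case))  # -> 1
print(predict_or_abstain(uncertain_case))  # -> None: defer to a human instead
```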

Liang’s team is working on this challenge by tracking a model’s predictions through its learning algorithm — all the way back to the training data where the model parameters originated.

Liang’s team hopes that this approach — of looking at the model through the lens of the training data — will become a standard part of the toolkit of developing, understanding, and diagnosing machine learning. He explains that researchers could relate this to many applications: medical, computer, natural language understanding systems, and various business analytics applications.

“I think,” Liang concludes, “there is some confusion about the role of simulations; some eschew it entirely and some are happy doing everything in simulation. Perhaps we need to change culturally to have a place for both.”

In this way, Liang and his team plan to lay a framework for a new generation of machine learning algorithms that work reliably, fail gracefully, and reduce risks.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project. If you’re interested in applying for our 2018 grants competition, please see this link.

As CO2 Levels Rise, Scientists Question Best- and Worst-Case Scenarios of Climate Change

Scientists know that the planet is warming, that humans are causing it, and that we’re running out of time to avoid catastrophic climate change. But at the same time, their estimates for future global warming can seem frustratingly vague — best-case scenarios allow decades to solve the energy crisis, while worst-case scenarios seem utterly hopeless, predicting an uninhabitable planet no matter what we do.

At the University of Exeter, some researchers disagree with these vague boundaries. Professors Peter Cox, Chris Huntingford, and Mark Williamson co-authored a recent report in Nature that argues for a more constrained understanding of the climate’s sensitivity to carbon dioxide. In general, they found that both the worst-case and best-case scenarios for global warming are far more unlikely than previously thought.

Their research focuses on a measure known as equilibrium climate sensitivity (ECS) — defined as “the global mean warming that would occur if the atmospheric carbon dioxide (CO2) concentration were instantly doubled and the climate were then brought to equilibrium with that new level of CO2.”

This concept simplifies Earth’s actual climate — CO2 won’t double instantly and it often takes decades or centuries for the climate to return to equilibrium — but ECS is critical for gauging the planet’s response to fossil fuel emissions. It can help predict how much warming will come from increases in atmospheric CO2, even before the climate settles into equilibrium.
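Because warming grows roughly with the logarithm of CO2 concentration, ECS also gives a back-of-the-envelope estimate of equilibrium warming for concentrations other than a doubling. This is a standard approximation, not a result from the Cox paper:

$$
\Delta T \;\approx\; \mathrm{ECS} \times \log_2\!\left(\frac{C}{C_0}\right)
$$

With a pre-industrial baseline of about 280 ppm and a present-day concentration of about 410 ppm, $\log_2(410/280) \approx 0.55$, so an ECS of 3 °C would imply roughly 1.6 °C of eventual warming from the CO2 already in the air, before any further emissions.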

 

How hot will it get if atmospheric CO2 doubles?

In other words, what is Earth’s ECS? The Intergovernmental Panel on Climate Change (IPCC) estimates that ECS lies between 1.5 °C and 4.5 °C, with a 25% chance that it exceeds 4 °C and a 16% chance that it’s lower than 1.5 °C.

Cox and his colleagues argue that this range is too generous. Using tighter constraints based on historical observations of warming, they conclude that doubling atmospheric CO2 would push temperatures between 2.2 °C and 3.4 °C higher, with a 2% chance that ECS exceeds 4 °C and a 3% chance that ECS is lower than 1.5 °C. The extremes (both good and bad) of global warming thus appear less likely.

Although some scientists applauded these findings, others are more skeptical. Kevin Trenberth, a Senior Scientist in the Climate Analysis Section at the National Center for Atmospheric Research (NCAR), says the study’s climate models don’t adequately account for natural variability, making it difficult to give the findings much weight.

“I do think some previous estimates are overblown and they do not adequately use the observations we have as constraints,” he explains. “This study picks up on that a bit, and in that sense the new results seem reasonable and could be important for ruling out really major extreme changes. But it is much more important to improve the models and make better projections into the future.”

 

But When Will Atmospheric CO2 Double?

CO2 levels may not have doubled from pre-industrial levels yet, but they’re increasing at an alarming rate.

In 1958, the Mauna Loa Observatory in Hawaii (now run by NOAA) began monitoring atmospheric CO2. Concentrations had already climbed from a pre-industrial level of roughly 280 parts per million (ppm) to about 315 ppm by the time measurements began. In 2013, CO2 levels surpassed 400 ppm for the first time, and just four years later, the Mauna Loa Observatory recorded its first-ever carbon dioxide reading above 410 ppm.

The last time CO2 levels were this high, global surface temperatures were 6 °C higher, oceans were 100 feet higher, and modern humans didn’t exist. Unless the international community makes massive strides towards the Paris Agreement goals, atmospheric CO2 could rise to 560 ppm by 2050 — double the pre-industrial concentration, and a sign of much more global warming to come.

Annual CO2 Emissions from Fossil Fuels by Country, 1959-2017 / Source: Carbon Brief

Avoiding the worst, while ensuring the bad

On the one hand, Cox’s findings come as a relief: they reduce uncertainty about ECS and renew hope of avoiding catastrophic global warming.

But these results also imply that there’s very little hope of achieving the best-case scenarios predicted by the Paris Agreement, which seeks to keep temperatures at or below a 1.5 °C increase. Since atmospheric CO2 levels could plausibly double by midcentury, Cox’s results indicate that not only will temperatures soar past 1.5 °C, but that they’ll quickly rise higher than Paris’ upper limit of 2 degrees.

Even 2 °C of warming would be devastating for the planet, leading to an ice-free Arctic and over a meter of sea level rise — enough to submerge the Marshall Islands — while leaving tropical regions deathly hot for outdoor workers and metropolises Karachi and Kolkata nearly uninhabitable. Deadly heat waves would plague North Africa, Central America, Southeast Asia, and the Southeast US, while decreasing the yields of wheat, rice and corn by over 20%. Food shortages and extreme weather could trigger the migration of tens of millions of people and leave regions of the world ungovernable.

This two-degree world might not be far off. Global temperatures have already risen 0.8 degrees Celsius above pre-industrial levels, and the past few years have provided grave indications that things are heating up.

In January, NASA announced that 2017 was the second-hottest year on record (behind 2016 and ahead of 2015) while NOAA recorded it as their third-hottest year on record. Despite this minor discrepancy, both agencies agree that the 2017 data make the past four years the hottest period in their 138-year archives.

Global warming continues, and since the climate responds to rising CO2 levels on a delay of decades, there is more warming “in the pipeline,” no matter how quickly we cut fossil fuel emissions. But understanding ECS and continuing to improve climate models, as Dr. Trenberth suggests, can provide a clearer picture of what’s ahead and give us a better idea of the actions we need to take.

Is There a Trade-off Between Immediate and Longer-term AI Safety Efforts?

Something I often hear in the machine learning community and media articles is “Worries about superintelligence are a distraction from the *real* problem X that we are facing today with AI” (where X = algorithmic bias, technological unemployment, interpretability, data privacy, etc). This competitive attitude gives the impression that immediate and longer-term safety concerns are in conflict. But is there actually a tradeoff between them?


We can make this question more specific: what resources might these two types of efforts be competing for?

Media attention. Given the abundance of media interest in AI, there have been a lot of articles about all these issues. Articles about advanced AI safety have mostly been alarmist Terminator-ridden pieces that ignore the complexities of the problem. This has understandably annoyed many AI researchers, and led some of them to dismiss these risks based on the caricature presented in the media instead of the real arguments. The overall effect of media attention towards advanced AI risk has been highly negative. I would be very happy if the media stopped writing about superintelligence altogether and focused on safety and ethics questions about today’s AI systems.

Funding. Much of the funding for advanced AI safety work currently comes from donors and organizations who are particularly interested in these problems, such as the Open Philanthropy Project and Elon Musk. They would be unlikely to fund safety work that doesn’t generalize to advanced AI systems, so their donations to advanced AI safety research are not taking funding away from immediate problems. On the contrary, FLI’s first grant program awarded some funding towards current issues with AI (such as economic and legal impacts). There isn’t a fixed pie of funding that immediate and longer-term safety are competing for – it’s more like two growing pies that don’t overlap very much. There has been an increasing amount of funding going into both fields, and hopefully this trend will continue.

Talent. The field of advanced AI safety has grown in recent years but is still very small, and the “brain drain” resulting from researchers going to work on it has so far been negligible. The motivations for working on current and longer-term problems tend to be different as well, and these problems often attract different kinds of people. For example, someone who primarily cares about social justice is more likely to work on algorithmic bias, while someone who primarily cares about the long-term future is more likely to work on superintelligence risks.

Overall, there does not seem to be much tradeoff in terms of funding or talent, and the media attention tradeoff could (in theory) be resolved by devoting essentially all the airtime to current concerns. Not only are these issues not in conflict – there are synergies between addressing them. Both benefit from fostering a culture in the AI research community of caring about social impact and being proactive about risks. Some safety problems are highly relevant both in the immediate and longer term, such as interpretability and adversarial examples. I think we need more people working on these problems for current systems while keeping scalability to more advanced future systems in mind.

AI safety problems are too important for the discussion to be derailed by status contests like “my issue is better than yours”. This kind of false dichotomy is itself a distraction from the shared goal of ensuring AI has a positive impact on the world, both now and in the future. People who care about the safety of current and future AI systems are natural allies – let’s support each other on the path towards this common goal.

This article originally appeared on the Deep Safety blog.

MIRI’s January 2018 Newsletter

Our 2017 fundraiser was a huge success, with 341 donors contributing a total of $2.5 million!

Some of the largest donations came from Ethereum inventor Vitalik Buterin, bitcoin investors Christian Calderon and Marius van Voorden, poker players Dan Smith and Tom and Martin Crowley (as part of a matching challenge), and the Berkeley Existential Risk Initiative. Thank you to everyone who contributed!

Research updates

General updates

News and links

Rewinding the Doomsday Clock

On Thursday, the Bulletin of the Atomic Scientists inched its iconic Doomsday Clock forward another thirty seconds. It is now two minutes to midnight.

Citing the growing threats of climate change, increasing tensions between nuclear-armed countries, and a general loss of trust in government institutions, the Bulletin warned that we are “making the world security situation more dangerous than it was a year ago—and as dangerous as it has been since World War II.”

The Doomsday Clock hasn’t been this close to midnight since 1953, when the US and the Soviet Union had just tested their first hydrogen bombs, weapons up to 1,000 times more powerful than the bombs dropped on Hiroshima and Nagasaki. And like 1953, this year’s announcement highlighted the increased global tensions around nuclear weapons.

As the Bulletin wrote in their statement, “To call the world nuclear situation dire is to understate the danger—and its immediacy.”

Between the US, Russia, North Korea, and Iran, the threats of aggravated nuclear war and accidental nuclear war both grew in 2017. As former Secretary of Defense William Perry said in a statement, “The events of the past year have only increased my concern that the danger of a nuclear catastrophe is increasingly real. We are failing to learn from the lessons of history as we find ourselves blundering headfirst towards a second cold war.”

The threat of nuclear war has hovered in the background since the weapons were invented, but with the end of the Cold War, many were lulled into what now appears to have been a false sense of security. In the last year, aggressive language and plans for new and upgraded nuclear weapons have reignited fears of nuclear armageddon. The recent false missile alerts in Hawaii and Japan were perhaps the starkest reminders of how close nuclear war feels, and how destructive it would be.


But the nuclear threat isn’t all the Bulletin looks at. 2017 also saw the growing risk of climate change, a breakdown of trust in government institutions, and the emergence of new technological threats.

Climate change won’t hit humanity as immediately as nuclear war, but with each year that the international community fails to drastically reduce fossil fuel emissions, the threat of catastrophic climate change grows. In 2017, the US pulled out of the Paris Climate Agreement and global carbon emissions grew 2% after a two-year plateau. Meanwhile, NASA and NOAA confirmed that the past four years are the hottest four years they’ve ever recorded.

For emerging technological risks, such as widespread cyber attacks, the development of autonomous weaponry, and potential misuse of synthetic biology, the Bulletin calls for the international community to work together. They write, “world leaders also need to seek better collective methods of managing those advances, so the positive aspects of new technologies are encouraged and malign uses discovered and countered.”

Pointing to disinformation campaigns and “fake news”, the Bulletin’s Science and Security Board writes that they are “deeply concerned about the loss of public trust in political institutions, in the media, in science, and in facts themselves—a loss that the abuse of information technology has fostered.”


Turning Back the Clock

The Doomsday Clock is a poignant symbol of the threats facing human civilization, and it received broad media attention this week through British outlets like The Guardian and The Independent, Australian outlets such as ABC Online, and American outlets from Fox News to The New York Times.

“[The clock] is a tool,” explains Lawrence Krauss, a theoretical physicist at Arizona State University and member of the Bulletin’s Science and Security Board. “For one day a year, there are thousands of newspaper stories about the deep, existential threats that humanity faces.”

The Bulletin ends its report with a list of priorities to help turn back the Clock, chock-full of suggestions for government and industry leaders. But the authors also insist that individual citizens have a crucial role in tackling humanity’s greatest risks.

“Leaders react when citizens insist they do so,” the authors explain. “Citizens around the world can use the power of the internet to improve the long-term prospects of their children and grandchildren. They can insist on facts, and discount nonsense. They can demand action to reduce the existential threat of nuclear war and unchecked climate change. They can seize the opportunity to make a safer and saner world.”

You can read the Bulletin’s full report here.

AI Should Provide a Shared Benefit for as Many People as Possible

Shared Benefit Principle: AI technologies should benefit and empower as many people as possible.

Today, the combined wealth of the eight richest people in the world is greater than that of the poorest half of the global population. That is, 8 people hold more wealth than roughly 3,600,000,000 others combined.

This is already an extreme example of wealth inequality, but if we don’t prepare properly for artificial intelligence, the situation could get worse. In addition to the obvious economic benefits that would accrue to whoever designs advanced AI first, those who profit from AI will also likely have: access to better health care, happier and longer lives, more opportunities for their children, various forms of intelligence enhancement, and so on.

A Cultural Shift

Our approach to technology so far has been that whoever designs it first, wins — and they win big. In addition to the fabulous wealth an inventor can accrue, the creator of a new technology also assumes complete control over the product and its distribution. This means that an invention or algorithm will only benefit those whom the creator wants it to benefit. While this approach may have worked with previous inventions, many are concerned that advanced AI will be so powerful that we can’t treat it as business-as-usual.

What if we could ensure that as AI is developed we all benefit? Can we make a collective — and pre-emptive — decision to use AI to help raise up all people, rather than just a few?

Joshua Greene, a professor of psychology at Harvard, explains his take on this Principle: “We’re saying in advance, before we know who really has it, that this is not a private good. It will land in the hands of some private person, it will land in the hands of some private company, it will land in the hands of some nation first. But this principle is saying, ‘It’s not yours.’ That’s an important thing to say because the alternative is to say that potentially, the greatest power that humans ever develop belongs to whoever gets it first.”

AI researcher Susan Craw also agreed with the Principle, and she further clarified it.

“That’s definitely a yes,” Craw said, “But it is AI technologies plural, when it’s taken as a whole. Rather than saying that a particular technology should benefit lots of people, it’s that the different technologies should benefit and empower people.”

The Challenge of Implementation

However, as is the case with all of the Principles, agreeing with them is one thing; implementing them is another. John Havens, the Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, considered how the Shared Benefit Principle would ultimately need to be modified so that new technologies benefit developed and developing countries alike.

“Yes, it’s great,” Havens said of the Principle, before adding, “if you can put a comma after it, and say … something like, ‘issues of wealth, GDP, notwithstanding.’ The point being, what this infers is whatever someone can afford, it should still benefit them.”

Patrick Lin, a philosophy professor at California Polytechnic State University, was even more concerned about how the Principle might be implemented, mentioning the potential for unintended consequences.

Lin explained: “Shared benefit is interesting, because again, this is a principle that implies consequentialism, that we should think about ethics as satisfying the preferences or benefiting as many people as possible. That approach to ethics isn’t always right. … Consequentialism often makes sense, so weighing these pros and cons makes sense, but that’s not the only way of thinking about ethics. Consequentialism could fail you in many cases. For instance, consequentialism might green-light torturing or severely harming a small group of people if it gives rise to a net increase in overall happiness to the greater community.”

“That’s why I worry about the … Shared Benefit Principle,” Lin continued. “[It] makes sense, but [it] implicitly adopts a consequentialist framework, which by the way is very natural for engineers and technologists to use, so they’re very numbers-oriented and tend to think of things in black and white and pros and cons, but ethics is often squishy. You deal with these squishy, abstract concepts like rights and duties and obligations, and it’s hard to reduce those into algorithms or numbers that could be weighed and traded off.”

As we move from discussing these Principles as ideals to implementing them as policy, concerns such as those that Lin just expressed will have to be addressed, keeping possible downsides of consequentialism and utilitarianism in mind.

The Big Picture

The devil will always be in the details. As we consider how we might shift cultural norms to prevent all benefits going only to the creators of new technologies — as well as considering the possible problems that could arise if we do so — it’s important to remember why the Shared Benefit Principle is so critical. Roman Yampolskiy, an AI researcher at the University of Louisville, sums this up:

“Early access to superior decision-making tools is likely to amplify existing economic and power inequalities turning the rich into super-rich, permitting dictators to hold on to power and making oppositions’ efforts to change the system unlikely to succeed. Advanced artificial intelligence is likely to be helpful in medical research and genetic engineering in particular making significant life extension possible, which would remove one of the most powerful drivers of change and redistribution of power – death. For this and many other reasons, it is important that AI tech should be beneficial and empowering to all of humanity, making all of us wealthier and healthier.”

What Do You Think?

How important is the Shared Benefit Principle to you? How can we ensure that the benefits of new AI technologies are spread globally, rather than remaining with only a handful of people who developed them? How can we ensure that we don’t inadvertently create more problems in an effort to share the benefits of AI?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Deep Safety: NIPS 2017 Report

This year’s NIPS gave me a general sense that near-term AI safety is now mainstream and long-term safety is slowly going mainstream. On the near-term side, I particularly enjoyed Kate Crawford’s keynote on neglected problems in AI fairness, the ML security workshops, and the Interpretable ML symposium debate that addressed the “do we even need interpretability?” question in a somewhat sloppy but entertaining way. There was a lot of great content on the long-term side, including several oral / spotlight presentations and the Aligned AI workshop.

Value alignment papers

Inverse Reward Design (Hadfield-Menell et al) defines the problem of an RL agent inferring a human’s true reward function based on the proxy reward function designed by the human. This is different from inverse reinforcement learning, where the agent infers the reward function from human behavior. The paper proposes a method for IRD that models uncertainty about the true reward, assuming that the human chose a proxy reward that leads to the correct behavior in the training environment. For example, if a test environment unexpectedly includes lava, the agent assumes that a lava-avoiding reward function is as likely as a lava-indifferent or lava-seeking reward function, since they lead to the same behavior in the training environment. The agent then follows a risk-averse policy with respect to its uncertainty about the reward function.


The paper shows some encouraging results on toy environments for avoiding some types of side effects and reward hacking behavior, though it’s unclear how well they will generalize to more complex settings. For example, the approach to reward hacking relies on noticing disagreements between different sensors / features that agreed in the training environment, which might be much harder to pick up on in a complex environment. The method is also at risk of being overly risk-averse and avoiding anything new, whether it be lava or gold, so it would be great to see some approaches for safe exploration in this setting.
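To make the risk-averse planning idea concrete, here is a minimal Python sketch. It is not the paper’s algorithm: the hand-coded candidate reward functions, state names, and maximin rule below are illustrative stand-ins for IRD’s posterior over rewards and its risk-averse objective.

```python
# Minimal sketch of risk-averse planning over uncertain rewards, in the spirit of
# Inverse Reward Design. All names and reward values here are illustrative assumptions.

def trajectory_return(trajectory, reward_fn):
    """Sum the reward of every state visited along a trajectory."""
    return sum(reward_fn(state) for state in trajectory)

def risk_averse_choice(trajectories, candidate_rewards):
    """Pick the trajectory whose worst-case return over the candidate reward
    functions is highest (a simple maximin stand-in for the paper's objective)."""
    return max(
        trajectories,
        key=lambda traj: min(trajectory_return(traj, r) for r in candidate_rewards),
    )

# Two reward hypotheses that agree on the training environment (which had no lava)
# but disagree about the new 'lava' feature.
lava_indifferent = lambda s: 1.0 if s == "goal" else 0.0
lava_averse = lambda s: -10.0 if s == "lava" else (1.0 if s == "goal" else 0.0)

safe_path = ["start", "grass", "goal"]
short_path = ["start", "lava", "goal"]

print(risk_averse_choice([safe_path, short_path], [lava_indifferent, lava_averse]))
# -> the safe path, since the short path is disastrous under one surviving hypothesis
```

The same maximin rule is also what makes the method conservative: any trajectory that some surviving hypothesis rates poorly gets passed over, whether the unfamiliar feature is lava or gold.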

Repeated Inverse RL (Amin et al) defines the problem of inferring intrinsic human preferences that incorporate safety criteria and are invariant across many tasks. The reward function for each task is a combination of the task-invariant intrinsic reward (unobserved by the agent) and a task-specific reward (observed by the agent). This multi-task setup helps address the identifiability problem in IRL, where different reward functions could produce the same behavior.


The authors propose an algorithm for inferring the intrinsic reward while minimizing the number of mistakes made by the agent. They prove an upper bound on the number of mistakes for the “active learning” case where the agent gets to choose the tasks, and show that a certain number of mistakes is inevitable when the agent cannot choose the tasks (there is no upper bound in that case). Thus, letting the agent choose the tasks that it’s trained on seems like a good idea, though it might also result in a selection of tasks that is less interpretable to humans.
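Below is a toy sketch of that setup, assuming intrinsic-reward hypotheses can be written as small dictionaries and pruned whenever the human flags a mistake; the paper’s actual algorithm and its mistake bounds are considerably more involved.

```python
# Toy illustration of the Repeated Inverse RL setup: the reward for a task is the
# (observed) task-specific reward plus a hidden, task-invariant intrinsic reward.
# The hypotheses and states below are illustrative assumptions.

STATES = ["safe_slow", "fast_risky"]

# Hypotheses about the hidden intrinsic reward (e.g. how much the human values safety).
intrinsic_hypotheses = [
    {"safe_slow": 0.0, "fast_risky": 0.0},   # indifferent to risk
    {"safe_slow": 0.0, "fast_risky": -5.0},  # strongly risk-averse
]

def best_state(task_reward, intrinsic):
    """Combined reward = observed task-specific reward + hypothesised intrinsic reward."""
    return max(STATES, key=lambda s: task_reward[s] + intrinsic[s])

def prune_on_mistake(hypotheses, task_reward, corrected):
    """Keep only hypotheses under which the human's correction would have been optimal."""
    return [h for h in hypotheses if best_state(task_reward, h) == corrected]

# A task where speed pays a little; the risk-indifferent hypothesis picks 'fast_risky'.
task = {"safe_slow": 1.0, "fast_risky": 2.0}
print(best_state(task, intrinsic_hypotheses[0]))  # 'fast_risky'

# The human flags this as a mistake; only the risk-averse hypothesis survives.
intrinsic_hypotheses = prune_on_mistake(intrinsic_hypotheses, task, "safe_slow")
print(intrinsic_hypotheses)  # [{'safe_slow': 0.0, 'fast_risky': -5.0}]
```

Intuitively, the active-learning case lets the agent pick tasks whose task-specific rewards split the remaining hypotheses, which is why mistakes can be bounded there.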

Deep RL from Human Preferences (Christiano et al) uses human feedback to teach deep RL agents about complex objectives that humans can evaluate but might not be able to demonstrate (e.g. a backflip). The human is shown two trajectory snippets of the agent’s behavior and selects which one more closely matches the objective. This method makes very efficient use of limited human feedback, scaling much better than previous methods and enabling the agent to learn much more complex objectives (as shown in MuJoCo and Atari).
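Here is a rough sketch of the core reward-learning step, assuming a tiny PyTorch reward model and a Bradley-Terry style loss over the two predicted segment returns; the architecture, names, and shapes are placeholders rather than the paper’s implementation.

```python
# Sketch of learning a reward model from pairwise human preferences over
# trajectory segments. The network and training data here are illustrative.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, segment):
        # segment: (timesteps, obs_dim) -> predicted return (sum of per-step rewards)
        return self.net(segment).sum()

def preference_loss(model, seg_a, seg_b, human_prefers_a):
    """Bradley-Terry style loss: the probability that the human prefers segment A
    is the softmax of the two predicted segment returns."""
    returns = torch.stack([model(seg_a), model(seg_b)])
    log_probs = torch.log_softmax(returns, dim=0)
    return -log_probs[0 if human_prefers_a else 1]

# One toy update on random observations standing in for two trajectory snippets.
obs_dim = 8
model = RewardModel(obs_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

seg_a, seg_b = torch.randn(25, obs_dim), torch.randn(25, obs_dim)
loss = preference_loss(model, seg_a, seg_b, human_prefers_a=True)
opt.zero_grad()
loss.backward()
opt.step()
```

The RL agent is then trained against the learned reward model, with fresh comparisons requested on the segments where the reward model is most uncertain.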


Dynamic Safe Interruptibility for Decentralized Multi-Agent RL (El Mhamdi et al) generalizes the safe interruptibility problem to the multi-agent setting. Non-interruptible dynamics can arise in a group of agents even if each agent individually is indifferent to interruptions. This can happen if Agent B is affected by interruptions of Agent A and is thus incentivized to prevent A from being interrupted (e.g. if the agents are self-driving cars and A is in front of B on the road). The multi-agent definition focuses on preserving the system dynamics in the presence of interruptions, rather than on converging to an optimal policy, which is difficult to guarantee in a multi-agent setting.

Aligned AI workshop

This was a more long-term-focused version of the Reliable ML in the Wild workshop held in previous years. There were many great talks and posters there – my favorite talks were Ian Goodfellow’s “Adversarial Robustness for Aligned AI” and Gillian Hadfield’s “Incomplete Contracting and AI Alignment”.

Ian made the case that ML security is important for long-term AI safety. The effectiveness of adversarial examples is problematic not only from the near-term perspective of current ML systems (such as self-driving cars) being fooled by bad actors. It’s also bad news from the long-term perspective of aligning the values of an advanced agent, which could inadvertently seek out adversarial examples for its reward function due to Goodhart’s law. Relying on the agent’s uncertainty about the environment or human preferences is not sufficient to ensure safety, since adversarial examples can cause the agent to have arbitrarily high confidence in the wrong answer.
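As a reminder of how cheap such failures are to induce, here is a minimal fast-gradient-sign (FGSM) sketch; the untrained classifier, epsilon, and input shapes are placeholders, and this is the standard construction rather than anything specific to the talk.

```python
# Minimal FGSM-style adversarial perturbation: nudge the input in the direction
# that most increases the loss on the true label. Model and data are toy stand-ins.
import torch
import torch.nn as nn

def fgsm_perturb(model, x, true_label, epsilon=0.03):
    """Return a perturbed copy of x within an epsilon-ball that raises the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), true_label)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy 'image' classifier
x = torch.rand(1, 1, 28, 28)
y = torch.tensor([3])
x_adv = fgsm_perturb(model, x, y)
print((x_adv - x).abs().max())  # perturbation stays within epsilon
```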


Gillian approached AI safety from an economics perspective, drawing parallels between specifying objectives for artificial agents and designing contracts for humans. The same issues that make contracts incomplete (the designer’s inability to consider all relevant contingencies or precisely specify the variables involved, and incentives for the parties to game the system) lead to side effects and reward hacking for artificial agents.


The central question of the talk was how we can use insights from incomplete contracting theory to better understand and systematically solve specification problems in AI safety, which is a really interesting research direction. The objective specification problem seems even harder to me than the incomplete contract problem, since the contract design process relies on some level of shared common sense between the humans involved, which artificial agents do not currently possess.

Interpretability for AI safety

I gave a talk at the Interpretable ML symposium on connections between interpretability and long-term safety, which explored what forms of interpretability could help make progress on safety problems (slides, video). Understanding our systems better can help ensure that safe behavior generalizes to new situations, and it can help identify causes of unsafe behavior when it does occur.

For example, if we want to build an agent that’s indifferent to being switched off, it would be helpful to see whether the agent has representations that correspond to an off-switch, and whether they are used in its decisions. Side effects and safe exploration problems would benefit from identifying representations that correspond to irreversible states (like “broken” or “stuck”). While existing work on examining the representations of neural networks focuses on visualizations, safety-relevant concepts are often difficult to visualize.

Local interpretability techniques that explain specific predictions or decisions are also useful for safety. We could examine whether features that are idiosyncratic to the training environment or indicate proximity to dangerous states influence the agent’s decisions. If the agent can produce a natural language explanation of its actions, how does it explain problematic behavior like reward hacking or going out of its way to disable the off-switch?
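As one concrete example of such a local probe, a gradient-saliency check can surface which input features drove a particular decision; the sketch below assumes a toy policy network and is purely illustrative, not a method from the talk.

```python
# Gradient saliency for a single decision: how sensitive is the chosen action's
# score to each input feature? The policy network here is an untrained stand-in.
import torch
import torch.nn as nn

def input_saliency(policy, obs, action_index):
    obs = obs.clone().detach().requires_grad_(True)
    score = policy(obs)[0, action_index]  # score of the action we want to explain
    score.backward()
    return obs.grad.abs()

policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
obs = torch.rand(1, 16)
saliency = input_saliency(policy, obs, action_index=2)
print(saliency.topk(3).indices)  # the features that most influenced this decision
```

If the highest-saliency features turn out to be idiosyncrasies of the training environment, or proxies for proximity to dangerous states, that is exactly the kind of red flag described above.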

There are many ways in which interpretability can be useful for safety. Somewhat less obvious is what safety can do for interpretability: serving as grounding for interpretability questions. As exemplified by the final debate of the symposium, there is an ongoing conversation in the ML community trying to pin down the fuzzy idea of interpretability – what is it, do we even need it, what kind of understanding is useful, etc. I think it’s important to keep in mind that our desire for interpretability is to some extent motivated by our systems being fallible – understanding our AI systems would be less important if they were 100% robust and made no mistakes. From the safety perspective, we can define interpretability as the kind of understanding that helps us ensure the safety of our systems.

For those interested in applying the interpretability hammer to the safety nail, or working on other long-term safety questions, FLI has recently announced a new grant program. Now is a great time for the AI field to think deeply about value alignment. As Pieter Abbeel said at the end of his keynote, “Once you build really good AI contraptions, how do you make sure they align their value system with our value system? Because at some point, they might be smarter than us, and it might be important that they actually care about what we care about.”

(Thanks to Janos Kramar for his feedback on this post, and to everyone at DeepMind who gave feedback on the interpretability talk.)

This article was originally posted here.

Research for Beneficial Artificial Intelligence

Click here to see this page in other languages: Chinese 

Research Goal: The goal of AI research should be to create not undirected intelligence, but beneficial intelligence.

It’s no coincidence that the first Asilomar Principle is about research. On the face of it, the Research Goal Principle may not seem as glamorous or exciting as some of the other Principles that more directly address how we’ll interact with AI and the impact of superintelligence. But it’s from this first Principle that all of the others are derived.

Simply put, without AI research and without specific goals by researchers, AI cannot be developed. However, participating in research and working toward broad AI goals without considering the possible long-term effects of the research could be detrimental to society.

There’s a scene in Jurassic Park, in which Jeff Goldblum’s character laments that the scientists who created the dinosaurs “were so preoccupied with whether or not they could that they didn’t stop to think if they should.” Until recently, AI researchers have also focused primarily on figuring out what they could accomplish, without longer-term considerations, and for good reason: scientists were just trying to get their AI programs to work at all, and the results were far too limited to pose any kind of threat.

But in the last few years, scientists have made great headway with artificial intelligence. The impacts of AI on society are already being felt, and as we’re seeing with some of the issues of bias and discrimination that are already popping up, this isn’t always good.

Attitude Shift

Unfortunately, there’s still a culture within AI research that’s too accepting of the idea that the developers aren’t responsible for how their products are used. Stuart Russell compares this attitude to that of civil engineers, who would never be allowed to say something like, “I just design the bridge; someone else can worry about whether it stays up.”

Joshua Greene, a psychologist from Harvard, agrees. He explains:

“I think that is a bookend to the Common Good Principle [#23] – the idea that it’s not okay to be neutral. It’s not okay to say, ‘I just make tools and someone else decides whether they’re used for good or ill.’ If you’re participating in the process of making these enormously powerful tools, you have a responsibility to do what you can to make sure that this is being pushed in a generally beneficial direction. With AI, everyone who’s involved has a responsibility to be pushing it in a positive direction, because if it’s always somebody else’s problem, that’s a recipe for letting things take the path of least resistance, which is to put the power in the hands of the already powerful so that they can become even more powerful and benefit themselves.”

What’s Beneficial?

Other AI experts I spoke with agreed with the general idea of the Principle, but didn’t quite see eye-to-eye on how it was worded. Patrick Lin, for example, was concerned about the use of the word “beneficial” and what it meant, while John Havens appreciated the word precisely because it forces us to consider what “beneficial” means in this context.

“I generally agree with this research goal,” explained Lin, a philosopher at Cal Poly. “Given the potential of AI to be misused or abused, it’s important to have a specific positive goal in mind. I think where it might get hung up is what this word ‘beneficial’ means. If we’re directing it towards beneficial intelligence, we’ve got to define our terms; we’ve got to define what beneficial means, and that to me isn’t clear. It means different things to different people, and it’s rare that you could benefit everybody.”

Meanwhile, Havens, the Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, was pleased the word forced the conversation.

“I love the word beneficial,” Havens said. “I think sometimes inherently people think that intelligence, in one sense, is always positive. Meaning, because something can be intelligent, or autonomous, and that can advance technology, that that is a ‘good thing’. Whereas the modifier ‘beneficial’ is excellent, because you have to define: What do you mean by beneficial? And then, hopefully, it gets more specific, and it’s: Who is it beneficial for? And, ultimately, what are you prioritizing? So I love the word beneficial.”

AI researcher Susan Craw, a professor at Robert Gordon University, also agreed with the Principle but questioned the order of the phrasing.

“Yes, I agree with that,” Craw said, but adds, “I think it’s a little strange the way it’s worded, because of ‘undirected.’ It might even be better the other way around, which is, it would be better to create beneficial research, because that’s a more well-defined thing.”

Long-term Research

Roman Yampolskiy, an AI researcher at the University of Louisville, brings the discussion back to the issues of most concern for FLI:

“The universe of possible intelligent agents is infinite with respect to both architectures and goals. It is not enough to simply attempt to design a capable intelligence, it is important to explicitly aim for an intelligence that is in alignment with goals of humanity. This is a very narrow target in a vast sea of possible goals and so most intelligent agents would not make a good optimizer for our values resulting in a malevolent or at least indifferent AI (which is likewise very dangerous). It is only by aligning future superintelligence with our true goals, that we can get significant benefit out of our intellectual heirs and avoid existential catastrophe.”

And with that in mind, we’re excited to announce we’ve launched a new round of grants! If you haven’t seen the Request for Proposals (RFP) yet, you can find it here. The focus of this RFP is on technical research or other projects enabling development of AI that is beneficial to society, and robust in the sense that the benefits are somewhat guaranteed: our AI systems must do what we want them to do.

If you’re a researcher interested in the field of AI, we encourage you to review the RFP and consider applying.

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

MIRI’s December 2017 Newsletter and Annual Fundraiser

Our annual fundraiser is live. Discussed in the fundraiser post:

  • News  — What MIRI’s researchers have been working on lately, and more.
  • Goals — We plan to grow our research team 2x in 2018–2019. If we raise $850k this month, we think we can do that without dipping below a 1.5-year runway.
  • Actual goals — A bigger-picture outline of what we think is the likeliest sequence of events that could lead to good global outcomes.

Our funding drive will be running until December 31st.

Research updates

General updates

When Should Machines Make Decisions?

Click here to see this page in other languages: Chinese   Russian 

Human Control: Humans should choose how and whether to delegate decisions to AI systems, to accomplish human-chosen objectives.

When is it okay to let a machine make a decision instead of a person? Most of us allow Google Maps to choose the best route to a new location. Many of us are excited to let self-driving cars take us to our destinations while we work or daydream. But are you ready to let your car choose your destination for you? The car might recognize that your ultimate objective is to eat or to shop or to run some errand, but most of the time, we have specific stores or restaurants that we want to go to, and we may not want the vehicle making those decisions for us.

What about more challenging decisions? Should weapons be allowed to choose who to kill? If so, how do they make that choice? And how do we address the question of control when artificial intelligence becomes much smarter than people? If an AI knows more about the world and our preferences than we do, would it be better if the AI made all of our decisions for us?

Questions like these are not easy to address. In fact, two of the AI experts I interviewed responded to this Principle with comments like, “Yeah, this is tough,” and “Right, that’s very, very tricky.”

And everyone I talked to agreed that this question of human control taps into some of the most challenging problems facing the design of AI.

“I think this is hugely important,” said Susan Craw, a Research Professor at Robert Gordon University Aberdeen. “Otherwise you’ll have systems wanting to do things for you that you don’t necessarily want them to do, or situations where you don’t agree with the way that systems are doing something.”

What does human control mean?

Joshua Greene, a psychologist at Harvard, cut right to the most important questions surrounding this Principle.

“This is an interesting one because it’s not clear what it would mean to violate that rule,” Greene explained. “What kind of decision could an AI system make that was not in some sense delegated to the system by a human? AI is a human creation. This principle, in practice, is more about what specific decisions we consciously choose to let the machines make. One way of putting it is that we don’t mind letting the machines make decisions, but whatever decisions they make, we want to have decided that they are the ones making those decisions.

“In, say, a navigating robot that walks on legs like a human, the person controlling it is not going to decide every angle of every movement. The humans won’t be making decisions about where exactly each foot will land, but the humans will have said, ‘I’m comfortable with the machine making those decisions as long as it doesn’t conflict with some other higher level command.’”

Roman Yampolskiy, an AI researcher at the University of Louisville, suggested that we might be even closer to giving AI decision-making power than many realize.

“In many ways we have already surrendered control to machines,” Yampolskiy said. “AIs make over 85% of all stock trades, control operation of power plants, nuclear reactors, electric grid, traffic light coordination and in some cases military nuclear response aka “dead hand.” Complexity and speed required to meaningfully control those sophisticated processes prevent meaningful human control. We are simply not quick enough to respond to ultrafast events, such as those in algorithmic trading and more and more seen in military drones. We are also not capable enough to keep thousands of variables in mind or to understand complicated mathematical models. Our reliance on machines will only increase but as long as they make good decisions (decisions we would make if we were smart enough, had enough data and enough time) we are OK with them making such decisions. It is only in cases where machine decisions diverge from ours that we would like to be able to intervene. Of course figuring out cases in which we diverge is exactly the unsolved Value Alignment Problem.”

Greene also elaborated on this idea: “The worry is when you have machines that are making more complicated and consequential decisions than ‘where to put the next footstep.’ When you have a machine that can behave in an open-ended flexible way, how do you delegate anything without delegating everything? When you have someone who works for you and you have some problem that needs to be solved and you say, ‘Go figure it out,’ you don’t specify, ‘But don’t murder anybody in the process. Don’t break any laws and don’t spend all the company’s money trying to solve this one small-sized problem.’ There are assumptions in the background that are unspecified and fairly loose, but nevertheless very important.

“I like the spirit of this principle. It’s a specification of what follows from the more general idea of responsibility, that every decision is either made by a person or specifically delegated to the machine. But this one will be especially hard to implement once AI systems start behaving in more flexible, open-ended ways.”

Trust and Responsibility

AI is often compared to a child, both in terms of what level of learning a system has achieved and also how the system is learning. And just as we would be with a child, we’re hesitant to give a machine too much control until it’s proved it can be trusted to be safe and accountable. Artificial intelligence systems may have earned our trust when it comes to maps, financial trading, and the operation of power grids, but some question whether this trend can continue as AI systems become even more complex or when safety and well-being are at greater risk.

John Havens, the Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, explained, “Until universally systems can show that humans can be completely out of the loop and more often than not it will be beneficial, then I think humans need to be in the loop.”

“However, the research I’ve seen also shows that right now is the most dangerous time, where humans are told, ‘Just sit there, the system works 99% of the time, and we’re good.’ That’s the most dangerous situation,” he added, in reference to recent research that has found people stop paying attention if a system, like a self-driving car, rarely has problems. The research indicates that when problems do arise, people struggle to refocus and address the problem.

“I think it still has to be humans delegating first,” Havens concluded.

In addition to the issues already mentioned with decision-making machines, Patrick Lin, a philosopher at California Polytechnic State University, doesn’t believe it’s clear who would be held responsible if something does go wrong.

“I wouldn’t say that you must always have meaningful human control in everything you do,” Lin said. “I mean, it depends on the decision, but also I think this gives rise to new challenges. … This is related to the idea of human control and responsibility. If you don’t have human control, it could be unclear who’s responsible … the context matters. It really does depend on what kind of decisions we’re talking about, that will help determine how much human control there needs to be.”

Susan Schneider, a philosopher at the University of Connecticut, also worried about how these problems could be exacerbated if we achieve superintelligence.

“Even now it’s sometimes difficult to understand why a deep learning system made the decisions that it did,” she said, adding later, “If we delegate decisions to a system that’s vastly smarter than us, I don’t know how we’ll be able to trust it, since traditional methods of verification seem to break down.”

What do you think?

Should humans be in control of a machine’s decisions at all times? Is that even possible? When is it appropriate for a machine to take over, and when do we need to make sure a person is “awake at the wheel,” so to speak? There are clearly times when machines are more equipped to safely address a situation than humans, but is that all that matters? When are you comfortable with a machine making decisions for you, and when would you rather remain in control?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Help Support FLI This Giving Tuesday

We’ve accomplished a lot. FLI has only been around for a few years, but during that time, we’ve:

  • Helped mainstream AI safety research,
  • Funded 37 AI safety research grants,
  • Launched multiple open letters that have brought scientists and the public together for the common cause of a beneficial future,
  • Drafted the 23 Asilomar Principles which offer guidelines for ensuring that AI is developed beneficially for all,
  • Supported the successful efforts by the International Campaign to Abolish Nuclear Weapons (ICAN) to get a UN treaty passed that bans and stigmatizes nuclear weapons (ICAN won this year’s Nobel Peace Prize for their work),
  • Supported efforts to advance negotiations toward a ban on lethal autonomous weapons with a video that’s been viewed over 30 million times,
  • Launched a website that’s received nearly 3 million page views,
  • Broadened the conversation about how humanity can flourish rather than flounder with powerful technologies.

But that’s just the beginning. There’s so much more we’d like to do, but we need your help. On Giving Tuesday this year, please consider a donation to FLI.

Where would your money go?

  • More AI safety research,
  • More high-quality information and communication about AI safety,
  • More efforts to keep the future safe from lethal autonomous weapons,
  • More efforts to trim excess nuclear stockpiles & reduce nuclear war risk,
  • More efforts to guarantee a future we can all look forward to.

Please Consider a Donation to Support FLI