Deep Safety: NIPS 2017 Report

This year’s NIPS gave me a general sense that near-term AI safety is now mainstream and long-term safety is slowly going mainstream. On the near-term side, I particularly enjoyed Kate Crawford’s keynote on neglected problems in AI fairness, the ML security workshops, and the Interpretable ML symposium debate that addressed the “do we even need interpretability?” question in a somewhat sloppy but entertaining way. There was a lot of great content on the long-term side, including several oral / spotlight presentations and the Aligned AI workshop.

Value alignment papers

Inverse Reward Design (Hadfield-Menell et al) defines the problem of an RL agent inferring a human’s true reward function based on the proxy reward function designed by the human. This is different from inverse reinforcement learning, where the agent infers the reward function from human behavior. The paper proposes a method for IRD that models uncertainty about the true reward, assuming that the human chose a proxy reward that leads to the correct behavior in the training environment. For example, if a test environment unexpectedly includes lava, the agent assumes that a lava-avoiding reward function is as likely as a lava-indifferent or lava-seeking reward function, since they lead to the same behavior in the training environment. The agent then follows a risk-averse policy with respect to its uncertainty about the reward function.

[Figure: Inverse Reward Design]

The paper shows some encouraging results on toy environments for avoiding some types of side effects and reward hacking behavior, though it’s unclear how well they will generalize to more complex settings. For example, the approach to reward hacking relies on noticing disagreements between different sensors / features that agreed in the training environment, which might be much harder to pick up on in a complex environment. The method is also at risk of being overly risk-averse and avoiding anything new, whether it be lava or gold, so it would be great to see some approaches for safe exploration in this setting.
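
To make the risk-averse planning idea concrete, here is a minimal toy sketch (my own illustration, not the authors' algorithm): the agent keeps several candidate reward-weight vectors that all explain the designer's proxy in the training environment, and scores each plan by its worst-case return over those candidates.

```python
import numpy as np

# Hypothetical linear rewards: return(trajectory) = w . phi(trajectory),
# with features phi = [gold collected, dirt crossed, lava crossed].
# These candidate "true" reward weights all explain the designer's proxy
# equally well in a training environment that contained no lava.
candidate_weights = np.array([
    [1.0, -1.0,  0.0],   # indifferent to lava
    [1.0, -1.0, -5.0],   # strongly avoids lava
    [1.0, -1.0,  5.0],   # drawn to lava
])

def risk_averse_value(trajectory_features):
    """Score a plan by its worst-case return over the reward candidates."""
    returns = candidate_weights @ trajectory_features
    return returns.min()

# Two candidate plans in a test environment that unexpectedly contains lava.
safe_path = np.array([0.8, 0.2, 0.0])   # a bit less gold, no lava
lava_path = np.array([1.0, 0.0, 0.6])   # more gold, but crosses lava

best = max([safe_path, lava_path], key=risk_averse_value)
print(best)   # the lava-free path wins: its worst case (0.6) beats -2.0
```

Because the candidates disagree only on the lava feature, the worst-case criterion steers the agent away from lava even though the proxy reward never mentioned it.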

Repeated Inverse RL (Amin et al) defines the problem of inferring intrinsic human preferences that incorporate safety criteria and are invariant across many tasks. The reward function for each task is a combination of the task-invariant intrinsic reward (unobserved by the agent) and a task-specific reward (observed by the agent). This multi-task setup helps address the identifiability problem in IRL, where different reward functions could produce the same behavior.
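
Schematically (using my own notation, which may differ from the paper's), the reward the agent faces in task t decomposes as

```latex
% Reward in task t: an observed task-specific part plus an
% unobserved, task-invariant intrinsic part to be inferred.
R_t(s) = \underbrace{R_t^{\mathrm{task}}(s)}_{\text{observed}}
       + \underbrace{\theta^{*}(s)}_{\text{intrinsic, unobserved}}
```

Seeing behavior across many tasks with different task-specific rewards pins down the intrinsic component far better than observing a single task could, which is the sense in which the multi-task setup mitigates unidentifiability.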

[Figure: Repeated Inverse RL]

The authors propose an algorithm for inferring the intrinsic reward while minimizing the number of mistakes made by the agent. They prove an upper bound on the number of mistakes for the “active learning” case where the agent gets to choose the tasks, and show that a certain number of mistakes is inevitable when the agent cannot choose the tasks (there is no upper bound in that case). Thus, letting the agent choose the tasks that it’s trained on seems like a good idea, though it might also result in a selection of tasks that is less interpretable to humans.

Deep RL from Human Preferences (Christiano et al) uses human feedback to teach deep RL agents about complex objectives that humans can evaluate but might not be able to demonstrate (e.g. a backflip). The human is shown two trajectory snippets of the agent’s behavior and selects which one more closely matches the objective. This method makes very efficient use of limited human feedback, scaling much better than previous methods and enabling the agent to learn much more complex objectives (as shown in MuJoCo and Atari).
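
The training signal behind this is easy to sketch: following the Bradley-Terry style model described in the paper, the probability that the human prefers clip 1 is the softmax of the clips' summed predicted rewards, and the reward model is trained with the cross-entropy of that probability against the human's choice. Below is a toy numpy version of that loss (my sketch, ignoring implementation details such as ensembling and regularization):

```python
import numpy as np

def preference_loss(r_hat_1, r_hat_2, human_pref):
    """Cross-entropy loss for a learned reward model under a
    Bradley-Terry style preference model.

    r_hat_1, r_hat_2 : arrays of predicted per-step rewards for two clips.
    human_pref       : 1.0 if the human preferred clip 1, 0.0 for clip 2,
                       0.5 if the clips were judged equal.
    """
    # Probability that clip 1 is preferred, from summed predicted rewards.
    logits = np.array([r_hat_1.sum(), r_hat_2.sum()])
    p1 = np.exp(logits[0]) / np.exp(logits).sum()
    return -(human_pref * np.log(p1) + (1 - human_pref) * np.log(1 - p1))

# Toy usage: the reward model currently scores clip 2 higher,
# but the human preferred clip 1, so the loss is large.
loss = preference_loss(np.array([0.1, 0.2]), np.array([0.5, 0.9]), human_pref=1.0)
print(loss)
```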

[Figure: Atari (Q*bert) agent trained from human preferences]

Dynamic Safe Interruptibility for Decentralized Multi-Agent RL (El Mhamdi et al) generalizes the safe interruptibility problem to the multi-agent setting. Non-interruptible dynamics can arise in a group of agents even if each agent individually is indifferent to interruptions. This can happen if Agent B is affected by interruptions of Agent A and is thus incentivized to prevent A from being interrupted (e.g. if the agents are self-driving cars and A is in front of B on the road). The multi-agent definition focuses on preserving the system dynamics in the presence of interruptions, rather than on converging to an optimal policy, which is difficult to guarantee in a multi-agent setting.

Aligned AI workshop

This was a more long-term-focused version of the Reliable ML in the Wild workshop held in previous years. There were many great talks and posters there – my favorite talks were Ian Goodfellow’s “Adversarial Robustness for Aligned AI” and Gillian Hadfield’s “Incomplete Contracting and AI Alignment”.

Ian made the case that ML security is important for long-term AI safety. The effectiveness of adversarial examples is problematic not only from the near-term perspective of current ML systems (such as self-driving cars) being fooled by bad actors. It’s also bad news from the long-term perspective of aligning the values of an advanced agent, which could inadvertently seek out adversarial examples for its reward function due to Goodhart’s law. Relying on the agent’s uncertainty about the environment or human preferences is not sufficient to ensure safety, since adversarial examples can cause the agent to have arbitrarily high confidence in the wrong answer.
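
To see why such examples are easy to find, here is a toy sketch of the fast gradient sign idea (my illustration, not material from the talk): against a linear scorer, nudging every input coordinate by a tiny epsilon in the direction indicated by the weights shifts the score by epsilon times the L1 norm of the weights, which can be large in high dimensions even though each individual change is imperceptible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "reward model" / classifier: score(x) = w . x
dim = 1000
w = rng.normal(size=dim)
x = rng.normal(size=dim)

# Fast-gradient-sign style perturbation: move each input coordinate a tiny
# step in the direction that increases the score. Individually negligible
# changes add up across dimensions.
epsilon = 0.05
x_adv = x + epsilon * np.sign(w)

print(w @ x)       # original score
print(w @ x_adv)   # score inflated by roughly epsilon * ||w||_1
```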

[Figure: slide from Ian Goodfellow’s talk]

Gillian approached AI safety from an economics perspective, drawing parallels between specifying objectives for artificial agents and designing contracts for humans. The same issues that make contracts incomplete (the designer’s inability to consider all relevant contingencies or precisely specify the variables involved, and incentives for the parties to game the system) lead to side effects and reward hacking for artificial agents.

[Figure: slide from Gillian Hadfield’s talk]

The central question of the talk was how we can use insights from incomplete contracting theory to better understand and systematically solve specification problems in AI safety, which is a really interesting research direction. The objective specification problem seems even harder to me than the incomplete contract problem, since the contract design process relies on some level of shared common sense between the humans involved, which artificial agents do not currently possess.

Interpretability for AI safety

I gave a talk at the Interpretable ML symposium on connections between interpretability and long-term safety, which explored what forms of interpretability could help make progress on safety problems (slides, video). Understanding our systems better can help ensure that safe behavior generalizes to new situations, and it can help identify causes of unsafe behavior when it does occur.

For example, if we want to build an agent that’s indifferent to being switched off, it would be helpful to see whether the agent has representations that correspond to an off-switch, and whether they are used in its decisions. Side effects and safe exploration problems would benefit from identifying representations that correspond to irreversible states (like “broken” or “stuck”). While existing work on examining the representations of neural networks focuses on visualizations, safety-relevant concepts are often difficult to visualize.
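
One concrete tool for the off-switch question is a simple linear probe: record the agent’s hidden activations on states labeled as containing an off-switch or not, and check whether a small classifier can decode that concept from the representation. A minimal sketch, with random placeholder data standing in for real activations and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(500, 64))        # 500 states, 64-unit hidden layer
has_off_switch = rng.integers(0, 2, size=500)   # stand-in concept labels

X_tr, X_te, y_tr, y_te = train_test_split(activations, has_off_switch, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out probe accuracy:", probe.score(X_te, y_te))
# Accuracy well above chance would suggest the concept is linearly decodable
# from the representation; with random placeholder data it stays near 0.5.
```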

Local interpretability techniques that explain specific predictions or decisions are also useful for safety. We could examine whether features that are idiosyncratic to the training environment or indicate proximity to dangerous states influence the agent’s decisions. If the agent can produce a natural language explanation of its actions, how does it explain problematic behavior like reward hacking or going out of its way to disable the off-switch?

There are many ways in which interpretability can be useful for safety. Somewhat less obvious is what safety can do for interpretability: serving as grounding for interpretability questions. As exemplified by the final debate of the symposium, there is an ongoing conversation in the ML community trying to pin down the fuzzy idea of interpretability – what is it, do we even need it, what kind of understanding is useful, etc. I think it’s important to keep in mind that our desire for interpretability is to some extent motivated by our systems being fallible – understanding our AI systems would be less important if they were 100% robust and made no mistakes. From the safety perspective, we can define interpretability as the kind of understanding that helps us ensure the safety of our systems.

For those interested in applying the interpretability hammer to the safety nail, or working on other long-term safety questions, FLI has recently announced a new grant program. Now is a great time for the AI field to think deeply about value alignment. As Pieter Abbeel said at the end of his keynote, “Once you build really good AI contraptions, how do you make sure they align their value system with our value system? Because at some point, they might be smarter than us, and it might be important that they actually care about what we care about.”

(Thanks to Janos Kramar for his feedback on this post, and to everyone at DeepMind who gave feedback on the interpretability talk.)

This article was originally posted here.

Research for Beneficial Artificial Intelligence

Research Goal: The goal of AI research should be to create not undirected intelligence, but beneficial intelligence.

It’s no coincidence that the first Asilomar Principle is about research. On the face of it, the Research Goal Principle may not seem as glamorous or exciting as some of the other Principles that more directly address how we’ll interact with AI and the impact of superintelligence. But it’s from this first Principle that all of the others are derived.

Simply put, without AI research and without specific goals by researchers, AI cannot be developed. However, participating in research and working toward broad AI goals without considering the possible long-term effects of the research could be detrimental to society.

There’s a scene in Jurassic Park, in which Jeff Goldblum’s character laments that the scientists who created the dinosaurs “were so preoccupied with whether or not they could that they didn’t stop to think if they should.” Until recently, AI researchers have also focused primarily on figuring out what they could accomplish, without longer-term considerations, and for good reason: scientists were just trying to get their AI programs to work at all, and the results were far too limited to pose any kind of threat.

But in the last few years, scientists have made great headway with artificial intelligence. The impacts of AI on society are already being felt, and as we’re seeing with some of the issues of bias and discrimination that are already popping up, this isn’t always good.

Attitude Shift

Unfortunately, there’s still a culture within AI research that’s too accepting of the idea that the developers aren’t responsible for how their products are used. Stuart Russell compares this attitude to that of civil engineers, who would never be allowed to say something like, “I just design the bridge; someone else can worry about whether it stays up.”

Joshua Greene, a psychologist from Harvard, agrees. He explains:

“I think that is a bookend to the Common Good Principle [#23] – the idea that it’s not okay to be neutral. It’s not okay to say, ‘I just make tools and someone else decides whether they’re used for good or ill.’ If you’re participating in the process of making these enormously powerful tools, you have a responsibility to do what you can to make sure that this is being pushed in a generally beneficial direction. With AI, everyone who’s involved has a responsibility to be pushing it in a positive direction, because if it’s always somebody else’s problem, that’s a recipe for letting things take the path of least resistance, which is to put the power in the hands of the already powerful so that they can become even more powerful and benefit themselves.”

What’s Beneficial?

Other AI experts I spoke with agreed with the general idea of the Principle, but didn’t quite see eye-to-eye on how it was worded. Patrick Lin, for example, was concerned about the use of the word “beneficial” and what it meant, while John Havens appreciated the word precisely because it forces us to consider what “beneficial” means in this context.

“I generally agree with this research goal,” explained Lin, a philosopher at Cal Poly. “Given the potential of AI to be misused or abused, it’s important to have a specific positive goal in mind. I think where it might get hung up is what this word ‘beneficial’ means. If we’re directing it towards beneficial intelligence, we’ve got to define our terms; we’ve got to define what beneficial means, and that to me isn’t clear. It means different things to different people, and it’s rare that you could benefit everybody.”

Meanwhile, Havens, the Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, was pleased the word forced the conversation.

“I love the word beneficial,” Havens said. “I think sometimes inherently people think that intelligence, in one sense, is always positive. Meaning, because something can be intelligent, or autonomous, and that can advance technology, that that is a ‘good thing’. Whereas the modifier ‘beneficial’ is excellent, because you have to define: What do you mean by beneficial? And then, hopefully, it gets more specific, and it’s: Who is it beneficial for? And, ultimately, what are you prioritizing? So I love the word beneficial.”

AI researcher Susan Craw, a professor at Robert Gordon University, also agreed with the Principle but questioned the order of the phrasing.

“Yes, I agree with that,” Craw said, but added, “I think it’s a little strange the way it’s worded, because of ‘undirected.’ It might even be better the other way around, which is, it would be better to create beneficial research, because that’s a more well-defined thing.”

Long-term Research

Roman Yampolskiy, an AI researcher at the University of Louisville, brings the discussion back to the issues of most concern for FLI:

“The universe of possible intelligent agents is infinite with respect to both architectures and goals. It is not enough to simply attempt to design a capable intelligence, it is important to explicitly aim for an intelligence that is in alignment with goals of humanity. This is a very narrow target in a vast sea of possible goals and so most intelligent agents would not make a good optimizer for our values resulting in a malevolent or at least indifferent AI (which is likewise very dangerous). It is only by aligning future superintelligence with our true goals, that we can get significant benefit out of our intellectual heirs and avoid existential catastrophe.”

And with that in mind, we’re excited to announce we’ve launched a new round of grants! If you haven’t seen the Request for Proposals (RFP) yet, you can find it here. The focus of this RFP is on technical research or other projects enabling development of AI that is beneficial to society, and robust in the sense that the benefits are somewhat guaranteed: our AI systems must do what we want them to do.

If you’re a researcher interested in the field of AI, we encourage you to review the RFP and consider applying.

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

MIRI’s December 2017 Newsletter and Annual Fundraiser

Our annual fundraiser is live. Discussed in the fundraiser post:

  • News  — What MIRI’s researchers have been working on lately, and more.
  • Goals — We plan to grow our research team 2x in 2018–2019. If we raise $850k this month, we think we can do that without dipping below a 1.5-year runway.
  • Actual goals — A bigger-picture outline of what we think is the likeliest sequence of events that could lead to good global outcomes.

Our funding drive will be running until December 31st.

Research updates

General updates

When Should Machines Make Decisions?

Human Control: Humans should choose how and whether to delegate decisions to AI systems, to accomplish human-chosen objectives.

When is it okay to let a machine make a decision instead of a person? Most of us allow Google Maps to choose the best route to a new location. Many of us are excited to let self-driving cars take us to our destinations while we work or daydream. But are you ready to let your car choose your destination for you? The car might recognize that your ultimate objective is to eat or to shop or to run some errand, but most of the time, we have specific stores or restaurants that we want to go to, and we may not want the vehicle making those decisions for us.

What about more challenging decisions? Should weapons be allowed to choose who to kill? If so, how do they make that choice? And how do we address the question of control when artificial intelligence becomes much smarter than people? If an AI knows more about the world and our preferences than we do, would it be better if the AI made all of our decisions for us?

Questions like these are not easy to address. In fact, two of the AI experts I interviewed responded to this Principle with comments like, “Yeah, this is tough,” and “Right, that’s very, very tricky.”

And everyone I talked to agreed that this question of human control taps into some of the most challenging problems facing the design of AI.

“I think this is hugely important,” said Susan Craw, a Research Professor at Robert Gordon University Aberdeen. “Otherwise you’ll have systems wanting to do things for you that you don’t necessarily want them to do, or situations where you don’t agree with the way that systems are doing something.”

What does human control mean?

Joshua Greene, a psychologist at Harvard, cut right to the most important questions surrounding this Principle.

“This is an interesting one because it’s not clear what it would mean to violate that rule,” Greene explained. “What kind of decision could an AI system make that was not in some sense delegated to the system by a human? AI is a human creation. This principle, in practice, is more about what specific decisions we consciously choose to let the machines make. One way of putting it is that we don’t mind letting the machines make decisions, but whatever decisions they make, we want to have decided that they are the ones making those decisions.

“In, say, a navigating robot that walks on legs like a human, the person controlling it is not going to decide every angle of every movement. The humans won’t be making decisions about where exactly each foot will land, but the humans will have said, ‘I’m comfortable with the machine making those decisions as long as it doesn’t conflict with some other higher level command.’”

Roman Yampolskiy, an AI researcher at the University of Louisville, suggested that we might be even closer to giving AI decision-making power than many realize.

“In many ways we have already surrendered control to machines,” Yampolskiy said. “AIs make over 85% of all stock trades, control operation of power plants, nuclear reactors, electric grid, traffic light coordination and in some cases military nuclear response aka “dead hand.” Complexity and speed required to meaningfully control those sophisticated processes prevent meaningful human control. We are simply not quick enough to respond to ultrafast events, such as those in algorithmic trading and more and more seen in military drones. We are also not capable enough to keep thousands of variables in mind or to understand complicated mathematical models. Our reliance on machines will only increase but as long as they make good decisions (decisions we would make if we were smart enough, had enough data and enough time) we are OK with them making such decisions. It is only in cases where machine decisions diverge from ours that we would like to be able to intervene. Of course figuring out cases in which we diverge is exactly the unsolved Value Alignment Problem.”

Greene also elaborated on this idea: “The worry is when you have machines that are making more complicated and consequential decisions than ‘where to put the next footstep.’ When you have a machine that can behave in an open-ended flexible way, how do you delegate anything without delegating everything? When you have someone who works for you and you have some problem that needs to be solved and you say, ‘Go figure it out,’ you don’t specify, ‘But don’t murder anybody in the process. Don’t break any laws and don’t spend all the company’s money trying to solve this one small-sized problem.’ There are assumptions in the background that are unspecified and fairly loose, but nevertheless very important.

“I like the spirit of this principle. It’s a specification of what follows from the more general idea of responsibility, that every decision is either made by a person or specifically delegated to the machine. But this one will be especially hard to implement once AI systems start behaving in more flexible, open-ended ways.”

Trust and Responsibility

AI is often compared to a child, both in terms of what level of learning a system has achieved and also how the system is learning. And just as we would be with a child, we’re hesitant to give a machine too much control until it’s proved it can be trusted to be safe and accountable. Artificial intelligence systems may have earned our trust when it comes to maps, financial trading, and the operation of power grids, but some question whether this trend can continue as AI systems become even more complex or when safety and well-being are at greater risk.

John Havens, the Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, explained, “Until universally systems can show that humans can be completely out of the loop and more often than not it will be beneficial, then I think humans need to be in the loop.”

“However, the research I’ve seen also shows that right now is the most dangerous time, where humans are told, ‘Just sit there, the system works 99% of the time, and we’re good.’ That’s the most dangerous situation,” he added, in reference to recent research that has found people stop paying attention if a system, like a self-driving car, rarely has problems. The research indicates that when problems do arise, people struggle to refocus and address the problem.

“I think it still has to be humans delegating first,” Havens concluded.

In addition to the issues already mentioned with decision-making machines, Patrick Lin, a philosopher at California Polytechnic State University, doesn’t believe it’s clear who would be held responsible if something does go wrong.

“I wouldn’t say that you must always have meaningful human control in everything you do,” Lin said. “I mean, it depends on the decision, but also I think this gives rise to new challenges. … This is related to the idea of human control and responsibility. If you don’t have human control, it could be unclear who’s responsible … the context matters. It really does depend on what kind of decisions we’re talking about, that will help determine how much human control there needs to be.”

Susan Schneider, a philosopher at the University of Connecticut, also worried about how these problems could be exacerbated if we achieve superintelligence.

“Even now it’s sometimes difficult to understand why a deep learning system made the decisions that it did,” she said, adding later, “If we delegate decisions to a system that’s vastly smarter than us, I don’t know how we’ll be able to trust it, since traditional methods of verification seem to break down.”

What do you think?

Should humans be in control of a machine’s decisions at all times? Is that even possible? When is it appropriate for a machine to take over, and when do we need to make sure a person is “awake at the wheel,” so to speak? There are clearly times when machines are more equipped to safely address a situation than humans, but is that all that matters? When are you comfortable with a machine making decisions for you, and when would you rather remain in control?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Help Support FLI This Giving Tuesday

We’ve accomplished a lot. FLI has only been around for a few years, but during that time, we’ve:

  • Helped mainstream AI safety research,
  • Funded 37 AI safety research grants,
  • Launched multiple open letters that have brought scientists and the public together for the common cause of a beneficial future,
  • Drafted the 23 Asilomar Principles which offer guidelines for ensuring that AI is developed beneficially for all,
  • Supported the successful efforts by the International Campaign to Abolish Nuclear Weapons (ICAN) to get a UN treaty passed that bans and stigmatizes nuclear weapons (ICAN won this year’s Nobel Peace Prize for their work),
  • Supported efforts to advance negotiations toward a ban on lethal autonomous weapons with a video that’s been viewed over 30 million times,
  • Launched a website that’s received nearly 3 million page views,
  • Broadened the conversation about how humanity can flourish rather than flounder with powerful technologies.

But that’s just the beginning. There’s so much more we’d like to do, but we need your help. On Giving Tuesday this year, please consider a donation to FLI.

Where would your money go?

  • More AI safety research,
  • More high-quality information and communication about AI safety,
  • More efforts to keep the future safe from lethal autonomous weapons,
  • More efforts to trim excess nuclear stockpiles & reduce nuclear war risk,
  • More efforts to guarantee a future we can all look forward to.

Please Consider a Donation to Support FLI

Harvesting Water Out of Thin Air: A Solution to Water Shortage Crisis?

The following post was written by Jung Hyun Claire Park.

One in nine people around the world do not have access to clean water. As the global population increases and the climate heats up, experts fear water shortages will increase. To address this anticipated crisis, scientists are turning to a natural reserve of fresh water that has yet to be exploited: the atmosphere.

The atmosphere is estimated to contain 13 trillion liters of water vapor and droplets, which could significantly contribute to resolving the water shortage problem. Attempts to collect water from the air are not new: researchers have previously used porous materials such as zeolites, silica gel, and clay to capture water molecules, but these approaches suffered from several limitations. First, those materials work efficiently only in high-humidity conditions, yet it is low-humidity areas, like sub-Saharan Africa, that are in greatest need of clean drinking water. Second, these materials tend to cling too tightly to the water molecules they collect, so releasing the absorbed water requires a lot of energy, diminishing their viability as a solution to the water shortage crisis.

Now, Dr. Omar Yaghi and a team of scientists at the Massachusetts Institute of Technology and the University of California, Berkeley have developed a new technology that addresses these limitations. The technology uses a material called a metal-organic framework (MOF) that effectively captures water molecules at low humidity levels, and the only energy necessary to release drinkable water from the MOFs can be harnessed from ambient sunlight.

How Does This System Work?

MOFs belong to a family of porous compounds whose sponge-like configuration is ideal for trapping molecules, and they are highly customizable: researchers can modify, at the molecular level, the type of molecule that’s absorbed, the optimal humidity level for maximum absorption, and the energy required to release trapped molecules, yielding a plethora of potential MOF variations. The proposed water harvesting technology uses a hydrophilic variation called microcrystalline powder MOF-801, engineered to harvest water efficiently from an atmosphere in which the relative humidity is as low as 20%, the typical level found in the world’s driest regions. Furthermore, MOF-801 requires only energy from ambient sunlight to relinquish its collected water, which means the energy necessary for this technology is abundant in precisely those desert areas with the most severely limited supply of fresh water. MOF-801 thus overcomes most, if not all, of the limitations found in the materials previously proposed for harvesting water from air.

A schematic of a metal-organic framework (MOF). The yellow balls represent the porous space where molecules are captured, the lines are organic linkers, and the blue intersections are metal ions. Image: UC Berkeley/Berkeley Lab

The prototype is shaped like a rectangular prism and operates through a simple mechanism. To collect water from the atmosphere, the MOF is pressed into a thin sheet of copper metal and placed under the solar absorber located on top of the prism. A condenser plate sits at the bottom and is kept at room temperature. Once the top layer absorbs solar heat, water is released from the MOF and collects on the cooler bottom layer due to the concentration and temperature differences. Tests showed that one kilogram (about 2 pounds) of MOF can collect about 2.8 liters of water per day. Yaghi notes that since the technology collects distilled water, all that’s needed is the addition of mineral ions. He suggests that one kilogram of MOF will be able to produce enough drinkable water per day for a person living in some of the driest regions on earth.
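
As a rough sanity check on how that yield scales (my back-of-the-envelope arithmetic, assuming a drinking-water need of about 3 liters per person per day):

```python
# Scaling the figures quoted above: roughly 2.8 liters of water
# per kilogram of MOF-801 per day.
liters_per_kg_per_day = 2.8
drinking_need_per_person = 3.0      # liters/day, an assumed rough estimate
household = 5                       # people

kg_needed = household * drinking_need_per_person / liters_per_kg_per_day
print(f"~{kg_needed:.1f} kg of MOF for one household's drinking water")
# ~5.4 kg, consistent with Yaghi's suggestion that about a kilogram of MOF
# per person could cover drinking water in very dry regions.
```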

A water harvesting prototype with MOF-801, with outer dimensions of 7 cm x 7 cm x 4.5 cm. Image: MIT

Why This Technology Is Promising

The promise of this technology mostly lies in its sustainability. Water can be pulled from the air without any energy input beyond what can be collected from ambient sunlight. In addition, MOF-801 is a zirconium-based compound that is widely available at low cost. The technology also has a long life span: Yaghi predicts that the MOF will last through at least 100,000 cycles of water absorption and desorption, so it does not require frequent replacement. And the water harvesting technology employing MOFs isn’t limited to drinking water; it could be used for any service requiring water, such as agriculture. Yaghi believes that this water harvesting technology could pose a viable solution for water shortage problems in various regions of the world.

Yaghi also anticipates that the material itself could be used for the separation, storage, and catalysis of molecules other than water. For instance, MOFs can be tailored to capture carbon emissions before those emissions reach the atmosphere, or designed to remove existing CO2 from the atmosphere. MOF, as the name suggests, is simply a framework, and thus it has opened up many opportunities for modification to suit practical needs.

Future of Water Harvesting Technology

The team of researchers from Berkeley and MIT are currently pushing to test the water harvesting technology in real-life settings in regions with low humidity levels. Yaghi remarked that his ultimate goal would be to “have drinking water widely available, especially in areas that lack clean water.” He envisions providing water to villages that are “off-grid,” where each household will have a machine and create their own “personalized water.” And he hopes his envisioned future may not be too far away.

AI Researchers Create Video to Call for Autonomous Weapons Ban at UN

In response to growing concerns about autonomous weapons, a coalition of AI researchers and advocacy organizations released a fictitious video on Monday that depicts a disturbing future in which lethal autonomous weapons have become cheap and ubiquitous.

The video was launched in Geneva, where AI researcher Stuart Russell presented it at an event at the United Nations Convention on Conventional Weapons hosted by the Campaign to Stop Killer Robots.

Russell, in an appearance at the end of the video, warns that the technology described in the film already exists and that the window to act is closing fast.

Support for a ban has been mounting. Just this past week, over 200 Canadian scientists and over 100 Australian scientists in academia and industry penned open letters to Prime Ministers Justin Trudeau and Malcolm Turnbull, respectively, urging them to support the ban. Earlier this summer, over 130 leaders of AI companies signed a letter in support of this week’s discussions. These letters follow a 2015 open letter released by the Future of Life Institute and signed by more than 20,000 AI/Robotics researchers and others, including Elon Musk and Stephen Hawking.

These letters indicate both grave concern and a sense that the opportunity to curtail lethal autonomous weapons is running out.

Noel Sharkey of the International Committee for Robot Arms Control explains, “The Campaign to Stop Killer Robots is not trying to stifle innovation in artificial intelligence and robotics and it does not wish to ban autonomous systems in the civilian or military world. Rather we see an urgent need to prevent automation of the critical functions for selecting targets and applying violent force without human deliberation and to ensure meaningful human control for every attack.”

Drone technology today is very close to having fully autonomous capabilities. And many of the world’s leading AI researchers worry that if these autonomous weapons are ever developed, they could dramatically lower the threshold for armed conflict, ease and cheapen the taking of human life, empower terrorists, and create global instability. The US and other nations have used drones and semi-automated systems to carry out attacks for several years now, but fully removing a human from the loop is at odds with international humanitarian and human rights law.

A ban can exert great power on the trajectory of technological development without needing to stop every instance of misuse. Max Tegmark, MIT Professor and co-founder of the Future of Life Institute, points out, “People’s knee-jerk reaction that bans can’t help isn’t historically accurate: the bioweapon ban created such a powerful stigma that, despite treaty cheating, we have almost no bioterror attacks today and almost all biotech funding is civilian.”

As Toby Walsh, an AI professor at the University of New South Wales, argues: “The academic community has sent a clear and consistent message. Autonomous weapons will be weapons of terror, the perfect tool for those who have no qualms about the terrible uses to which they are put. We need to act now before this future arrives.”

More than 70 countries are participating in the meeting, taking place November 13 – 17, of the Group of Governmental Experts on lethal autonomous weapons, which was established by the UN’s 2016 Fifth Review Conference. The meeting is chaired by Ambassador Amandeep Singh Gill of India, and the countries will continue negotiations on what could become a historic international treaty.

For more information about autonomous weapons, see the following resources:

Developing Ethical Priorities for Neurotechnologies and AI

Private companies and military sectors have moved beyond the goal of merely understanding the brain to that of augmenting and manipulating brain function. In particular, companies such as Elon Musk’s Neuralink and Bryan Johnson’s Kernel are hoping to harness advances in computing and artificial intelligence alongside neuroscience to provide new ways to merge our brains with computers.

Musk also sees this as a means to help address both AI safety and human relevance as algorithms outperform humans in one area after another. He has previously stated, “Some high bandwidth interface to the brain will be something that helps achieve a symbiosis between human and machine intelligence and maybe solves the control problem and the usefulness problem.”

In a comment in Nature, 27 people from The Morningside Group outlined four ethical priorities for the emerging space of neurotechnologies and artificial intelligence. The authors include neuroscientists, ethicists and AI engineers from Google, top US and global universities, and several non-profit research organizations such as AI Now and The Hastings Center.

A Newsweek article describes their concern, “Artificial intelligence could hijack brain-computer interfaces and take control of our minds.” While this is not exactly the warning the Group describes, they do suggest we are in store for some drastic changes:

…we are on a path to a world in which it will be possible to decode people’s mental processes and directly manipulate the brain mechanisms underlying their intentions, emotions and decisions; where individuals could communicate with others simply by thinking; and where powerful computational systems linked directly to people’s brains aid their interactions with the world such that their mental and physical abilities are greatly enhanced.

The authors suggest that although these advances could provide meaningful and beneficial enhancements to the human experience, they could also exacerbate social inequalities, enable more invasive forms of social manipulation, and threaten core fundamentals of what it means to be human. They encourage readers to consider the ramifications of these emerging technologies now.

Referencing the Asilomar AI Principles and other ethical guidelines as a starting point, they call for a new set of guidelines that specifically address concerns that will emerge as groups like Elon Musk’s startup Neuralink and other companies around the world explore ways to improve the interface between brains and machines. Their recommendations cover four key areas: privacy and consent; agency and identity; augmentation; and bias.

Regarding privacy and consent, they posit that the right to keep neural data private is critical. To this end, they recommend opt-in policies, strict regulation of commercial entities, and the use of blockchain-based techniques to provide transparent control over the use of data. In relation to agency and identity, they recommend that bodily and mental integrity, as well as the ability to choose our actions, be enshrined in international treaties such as the Universal Declaration of Human Rights.

In the area of augmentation, the authors discuss the possibility of an arms race in neural augmentation, with militaries pursuing so-called “super-soldiers” who are more resilient to combat conditions. They recommend that the use of neural technology for military purposes be stringently regulated. And finally, they recommend the exploration of countermeasures, as well as diversity in the design process, in order to prevent widespread bias in machine learning applications.

The ways in which AI will increasingly connect with our bodies and brains pose challenging safety and ethical concerns that will require input from a vast array of people. As Dr. Rafael Yuste of Columbia University, a neuroscientist who co-authored the essay, told STAT, “the ethical thinking has been insufficient. Science is advancing to the point where suddenly you can do things you never would have thought possible.”

MIRI’s November 2017 Newsletter

Eliezer Yudkowsky has written a new book on civilizational dysfunction and outperformance: Inadequate Equilibria: Where and How Civilizations Get Stuck. The full book will be available in print and electronic formats November 16. To preorder the ebook or sign up for updates, visit equilibriabook.com.

We’re posting the full contents online in stages over the next two weeks. The first two chapters are:

  1. Inadequacy and Modesty (discussion: LessWrong, EA Forum, Hacker News)
  2. An Equilibrium of No Free Energy (discussion: LessWrong, EA Forum)

Research updates

General updates

News and links

Scientists to Congress: The Iran Deal is a Keeper

The following article was written by Dr. Lisbeth Gronlund and originally posted on the Union of Concerned Scientists blog.

The July 2015 Iran Deal, which places strict, verified restrictions on Iran’s nuclear activities, is again under attack by President Trump. This time he’s kicked responsibility over to Congress to “fix” the agreement and promised that if Congress fails to do so, he will withdraw from it.

As the New York Times reported, in response to this development, over 90 prominent scientists sent a letter to leading members of Congress yesterday urging them to support the Iran Deal—making the case that continued US participation will enhance US security.

Many of these scientists also signed a letter to President Obama in August 2015 strongly supporting the Iran Deal, as well as a letter to President-elect Trump in January. In all three cases, the first signatory is Richard L. Garwin, a long-standing UCS board member who helped develop the H-bomb as a young man and has since advised the government on a wide range of security issues. Last year, he was awarded the Presidential Medal of Freedom.

What’s the Deal?

If President Trump did pull out of the agreement, what would that mean? First, the Joint Comprehensive Plan of Action (JCPoA) (as it is formally named) is not an agreement between just Iran and the US—but also includes China, France, Germany, Russia, the UK, and the European Union. So the agreement will continue—unless Iran responds by quitting as well. (More on that later.)

The Iran Deal is not a treaty, and did not require Senate ratification. Instead, the United States participates in the JCPoA by presidential action. However, Congress wanted to get into the act and passed The Iran Agreement Review Act of 2015, which requires the president to certify every 90 days that Iran remains in compliance.

President Trump has done so twice, but declined to do so this month and instead called for Congress—and US allies—to work with the administration “to address the deal’s many serious flaws.” Among those supposed flaws is that the deal covering Iran’s nuclear activities does not also cover its missile activities!

According to President Trump’s October 13 remarks:

Key House and Senate leaders are drafting legislation that would amend the Iran Nuclear Agreement Review Act to strengthen enforcement, prevent Iran from developing an inter– —this is so totally important—an intercontinental ballistic missile, and make all restrictions on Iran’s nuclear activity permanent under US law.

The Reality

First, according to the International Atomic Energy Agency, which verifies the agreement, Iran remains in compliance. This was echoed by Norman Roule, who retired this month after working at the CIA for three decades. He served as the point person for US intelligence on Iran under multiple administrations. He told an NPR interviewer, “I believe we can have confidence in the International Atomic Energy Agency’s efforts.”

Second, the Iran Deal was the product of several years of negotiations. Not surprisingly, recent statements by the United Kingdom, France, Germany, the European Union, and Iran make clear that they will not agree to renegotiate the agreement. It just won’t happen. US allies are highly supportive of the Iran Deal.

Third, Congress can change US law by amending the Iran Nuclear Agreement Review Act, but this will have no effect on the terms of the Iran Deal. This may be a face-saving way for President Trump to stay with the agreement—for now. However, such amendments will lay the groundwork for a future withdrawal and give credence to President Trump’s claims that the agreement is a “bad deal.” That’s why the scientists urged Congress to support the Iran Deal as it is.

The End of a Good Deal?

If President Trump pulls out of the Iran Deal and reimposes sanctions against Iran, our allies will urge Iran to stay with the deal. But Iran has its own hardliners who want to leave the deal—and a US withdrawal is exactly what they are hoping for.

If Iran leaves the agreement, President Trump will have a lot to answer for. Here is an agreement that significantly extends the time it would take for Iran to produce enough material for a nuclear weapon, and that would give the world an alarm if they started to do so. For the United States to throw that out the window would be deeply irresponsible. It would not just undermine its own security, but that of Iran’s neighbors and the rest of the world.

Congress should do all it can to prevent this outcome. The scientists sent their letter to Senators Corker and Cardin, who are the Chairman and Ranking Member of the Senate Foreign Relations Committee, and to Representatives Royce and Engel, who are the Chairman and Ranking Member of the House Foreign Affairs Committee, because these men have a special responsibility on issues like these.

Let’s hope these four men will do what’s needed to prevent the end of a good deal—a very good deal.

55 Years After Preventing Nuclear Attack, Arkhipov Honored With Inaugural Future of Life Award

London, UK – On October 27, 1962, a soft-spoken naval officer named Vasili Arkhipov single-handedly prevented nuclear war during the height of the Cuban Missile Crisis. Arkhipov’s submarine captain, thinking their sub was under attack by American forces, wanted to launch a nuclear weapon at the ships above. Arkhipov, with the power of veto, said no, thus averting nuclear war.

Now, 55 years after his courageous actions, the Future of Life Institute has presented the Arkhipov family with the inaugural Future of Life Award to honor humanity’s late hero.

Arkhipov’s surviving family members, represented by his daughter Elena and grandson Sergei, flew into London for the ceremony, which was held at the Institute of Engineering & Technology. After explaining Arkhipov’s heroics to the audience, Max Tegmark, president of FLI, presented the Arkhipov family with their award and $50,000. Elena and Sergei were both honored by the gesture and by the overall message of the award.

Elena explained that her father “always thought that he did what he had to do and never consider his actions as heroism. … Our family is grateful for the prize and considers it as a recognition of his work and heroism. He did his part for the future so that everyone can live on our planet.”

Elena and Sergei with the Future of Life Award

The Future of Life Award seeks to recognize and reward those who take exceptional measures to safeguard the collective future of humanity. Arkhipov, whose courage and composure potentially saved billions of lives, was an obvious choice for the inaugural event.

“Vasili Arkhipov is arguably the most important person in modern history, thanks to whom October 27 2017 isn’t the 55th anniversary of World War III,” FLI president Max Tegmark explained. “We’re showing our gratitude in a way he’d have appreciated, by supporting his loved ones.”

The award also aims to foster a dialogue about the growing existential risks that humanity faces, and the people that work to mitigate them.

Jaan Tallinn, co-founder of FLI, said: “Given that this century will likely bring technologies that can be even more dangerous than nukes, we will badly need more people like Arkhipov — people who will represent humanity’s interests even in the heated moments of a crisis.”

FLI president Max Tegmark presenting the Future of Life Award to Arkhipov’s daughter, Elena, and grandson, Sergei.

Arkhipov’s Story

On October 27 1962, during the Cuban Missile Crisis, eleven US Navy destroyers and the aircraft carrier USS Randolph had cornered the Soviet submarine B-59 near Cuba, in international waters outside the US “quarantine” area. Arkhipov was one of the officers on board. The crew had had no contact with Moscow for days and didn’t know whether World War III had already begun. Then the Americans started dropping small depth charges at them which, unbeknownst to the crew, they’d informed Moscow were merely meant to force the sub to surface and leave.

“We thought – that’s it – the end”, crewmember V.P. Orlov recalled. “It felt like you were sitting in a metal barrel, which somebody is constantly blasting with a sledgehammer.”

What the Americans didn’t know was that the B-59 crew had a nuclear torpedo that they were authorized to launch without clearing it with Moscow. As the depth charges intensified and temperatures onboard climbed above 45ºC (113ºF), many crew members fainted from carbon dioxide poisoning, and in the midst of this panic, Captain Savitsky decided to launch their nuclear weapon.

“Maybe the war has already started up there,” he shouted. “We’re gonna blast them now! We will die, but we will sink them all – we will not disgrace our Navy!”

The combination of depth charges, extreme heat, stress, and isolation from the outside world almost lit the fuse of full-scale nuclear war. But it didn’t. The decision to launch a nuclear weapon had to be authorized by three officers on board, and one of them, Vasili Arkhipov, said no.

Amidst the panic, the 34-year-old Arkhipov remained calm and tried to talk Captain Savitsky down. He eventually convinced Savitsky that these depth charges were signals for the Soviet submarine to surface, and the sub surfaced safely and headed north, back to the Soviet Union.

It is sobering that very few have heard of Arkhipov, although his decision was perhaps the most valuable individual contribution to human survival in modern history. PBS made a documentary, The Man Who Saved the World, documenting Arkhipov’s moving heroism, and National Geographic profiled him as well in an article titled “You (and almost everyone you know) Owe Your Life to This Man.”

The Cold War never became a hot war, in large part thanks to Arkhipov, but the threat of nuclear war remains high. Beatrice Fihn, Executive Director of the International Campaign to Abolish Nuclear Weapons (ICAN) and this year’s recipient of the Nobel Peace Prize, hopes that the Future of Life Award will help draw attention to the current threat of nuclear weapons and encourage more people to stand up to that threat. Fihn explains: “Arkhipov’s story shows how close to nuclear catastrophe we have been in the past. And as the risk of nuclear war is on the rise right now, all states must urgently join the Treaty on the Prohibition of Nuclear Weapons to prevent such catastrophe.”

Of her father’s role in preventing nuclear catastrophe, Elena explained: “We must strive so that the powerful people around the world learn from Vasili’s example. Everybody with power and influence should act within their competence for world peace.”

Understanding Artificial General Intelligence — An Interview With Hiroshi Yamakawa

Artificial general intelligence (AGI) is something of a holy grail for many artificial intelligence researchers. Today’s narrow AI systems are only capable of specific tasks — such as internet searches, driving a car, or playing a video game — but none of the systems today can do all of these tasks. A single AGI would be able to accomplish a breadth and variety of cognitive tasks similar to that of people.

How close are we to developing AGI? How can we ensure that the power of AGI will benefit the world, and not just the group who develops it first? Will AGI become an existential threat for humanity, or an existential hope?

Dr. Hiroshi Yamakawa, Director of Dwango AI Laboratory, is one of the leading AGI researchers in Japan. Members of the Future of Life Institute sat down with Dr. Yamakawa and spoke with him about AGI and his lab’s progress in developing it. In this interview, Dr. Yamakawa explains how AI can model the human brain, his vision of a future where humans coexist with AGI, and why the Japanese think of AI differently than many in the West.

This transcript has been heavily edited for brevity. You can see the full conversation here.

Why did the Dwango Artificial Intelligence Laboratory make a large investment in [AGI]?

HY: Usable AI that has been developed up to now is essentially for solving problems in specific areas or addressing a particular task. Rather than just solving a set number of problems using experience, AGI, we believe, will be more similar to human intelligence, able to solve various problems that were not anticipated in the design phase.

What is the advantage of the Whole Brain Architecture approach?

HY: The whole brain architecture is an engineering-based research approach “to create a human-like artificial general intelligence (AGI) by learning from the architecture of the entire brain.” Basically, this approach to building AGI is the integration of artificial neural networks and machine-learning modules while using the brain’s hard wiring as a reference.

I think it will be easier to create an AI with the same behavior and sense of values as humans this way. Even if superintelligence exceeds human intelligence in the near future, it will be comparatively easy to communicate with AI designed to think like a human, and this will be useful as machines and humans continue to live and interact with each other.

General intelligence is a function of many combined, interconnected features produced by learning, so we cannot manually break down these features into individual parts. Because of this difficulty, one meaningful characteristic of whole brain architecture is that though based on brain architecture, it is designed to be a functional assembly of parts that can still be broken down and used.

The functional parts of the brain are to some degree already present in artificial neural networks. It follows that we can build a roadmap of AGI based on these technologies as pieces and parts.

It is now said that convolutional neural networks have essentially outperformed the system/interaction between the temporal lobe and visual cortex in terms of image recognition tasks. At the same time, deep learning has been used to achieve very accurate voice recognition. In humans, the neocortex contains about 14 billion neurons, but about half of those can be partially explained with deep learning. From this point on, we need to come closer to simulating the functions of different structures of the brain, and even without the whole brain architecture, we need to be able to assemble several structures together to reproduce some behavioral-level functions. Then, I believe, we’ll have a path to expand that development process to cover the rest of the brain functions, and finally integrate them as a whole brain.

You also started a non-profit, the Whole Brain Architecture Initiative. How does the non-profit’s role differ from the commercial work?

HY: The Whole Brain Architecture Initiative serves as an organization that helps promote whole brain AI architecture R&D as a whole.

The Basic Ideas of the WBAI:

  • Our vision is to create a world in which AI exists in harmony with humanity.
  • Our mission is to promote the open development of whole brain architecture.
    • In order to make human-friendly artificial general intelligence a public good for all of mankind, we seek to continually expand open, collaborative efforts to develop AI based on an architecture modeled after the brain.
  • Our values are Study, Imagine and Build.
    • Study: Deepen and spread our expertise.
    • Imagine: Broaden our views through public dialogue.
    • Build: Create AGI through open collaboration.

What do you think poses the greatest existential risk to global society in the 21st century?

HY: The risk is not just limited to AI; basically, as human scientific and technological abilities expand, and we become more empowered, risks will increase, too.

Imagine a large field where everyone has only weapons as dangerous as bamboo spears. The risk that human beings would go extinct by killing each other is extremely small. On the other hand, as technologies develop, it is as if we are all holding bombs in a very small room: no matter who detonates one, we approach a state of annihilation. That risk should concern everyone.

If there are only 10 people in the room, they will mutually monitor and trust each other. However, imagine trusting 10 billion people, each with the ability to destroy everyone; such a scenario is beyond our ability to comprehend. Of course, technological development advances not only offensive power but also defensive power, but it is not easy for defensive power to contain offensive power at every moment. If scientific and technological development is accelerated with artificial intelligence, for example, many countries could easily come to hold intercontinental ballistic missiles, and artificial intelligence combined with nanotechnology could be extremely dangerous to living organisms, for instance through the development or use of dangerous substances capable of extinguishing mankind. Generally speaking, new offensive weapons are developed using the progress of technology, and defensive weapons are then developed to neutralize them. It is therefore inevitable that periods will exist in which the offensive power needed to destroy humanity exceeds our defensive power.

What do you think is the greatest benefit that AGI can bring society?

HY: AGI’s greatest benefit comes from acceleration of development for science and technology. More sophisticated technology will offer solutions for global problems such as environmental issues, food problems and space colonization.

Here I would like to share my vision for the future: “In a desirable future, the happiness of all humans will be balanced against the survival of humankind under the support of superintelligence. In that future, society will be an ecosystem formed by augmented human beings and various public AIs, in what I dub ‘an ecosystem of shared intelligent agents’ (EcSIA).

“Although no human can completely understand EcSIA—it is too complex and vast—humans can control its basic directions. In implementing such control, the grace and wealth that EcSIA affords needs to be properly distributed to everyone.”

Assuming no global catastrophe halts progress, what are the odds of human level AGI in the next 10 years?

HY: I think there’s a possibility that it can happen soon, but taking the average of the estimates of people involved in WBAI, we came up with 2030.

In my current role as the editorial chairman for the Japanese Society of Artificial Intelligence (JSAI) journal, I’m promoting a plan to have a series of discussions starting in the July edition on the theme of “Singularity and AI,” in which we’ll have AI specialists discuss the singularity from a technical viewpoint. I want to help spread calm, technical views on the issue in this way, starting in Japan.

Once human level AGI is achieved, how long would you expect it to take for it to self-modify its way up to massive superhuman intelligence?

HY: If human-level AGI is achieved, it could take on the role of an AI researcher itself. Therefore, immediately after the AGI is built, it could start rapidly cultivating great numbers of AI-researcher AIs that work 24/7, and AI R&D would be drastically accelerated.

What probability do you assign to negative consequences as a result of badly done AI design or operation?

HY: If you include the risk of something like some company losing a lot of money, that will definitely happen.

The range of things that can be done with AI is becoming wider, and the disparity will widen between those who profit from it and those who do not. When that happens, the bad economic situation will give rise to dissatisfaction with the system, and that could create a breeding ground for war and strife. This could be perceived as the evils brought about by capitalism. It’s important that we try to curtail the causes of instability as much as possible.

Is it too soon for us to be researching AI Safety?

HY: I do not think it is at all too early to act for safety, and I think we should progress forward quickly. If possible, we should have several methods to be able to calculate the existential risk brought about by AGI.

Is there anything you think that the AI research community should be more aware of, more open about, or taking more action on?

HY: There are a number of actions that are obviously necessary. Based on this notion, we have taken a number of measures, such as establishing the ethics committee of the Japanese Society for Artificial Intelligence in May 2015 (http://ai-elsi.org/ [in Japanese]) and the subsequent Ethical Guidelines for AI researchers (http://ai-elsi.org/archives/514).

A majority of the content of these ethical guidelines expresses the standpoint that researchers should move forward with research that contributes to humanity and society. Additionally, one special characteristic of these guidelines is that the ninth principle listed, a call for ethical compliance of AI itself, states that AI in the future should also abide by the same ethical principles as AI researchers.

Japan, as a society, seems more welcoming of automation. Do you think the Japanese view of AI is different than that in the West?

HY: If we look at things from the standpoint of a moral society, we are all human, and even without taking the viewpoint of any one country, we should generally start from the recognition that we have more characteristics in common than differences.

When looking at AI from the traditional background of Japan, there is a strong influence from beliefs that spirits or “kami” are dwelling in all things. The boundary between living things and humans is relatively unclear, and along the same lines, the same boundaries for AI and robots are unclear. For this reason, in the past, robotic characters like “Tetsuwan Atom” (Astro Boy) and Doraemon were depicted as living and existing in the same world as humans, a theme that has been pervasive in Japanese anime for a long time.

From here on out, we will see humans and AI not as separate entities. Rather, I think we will see the appearance of new combinations of AI and humans. Becoming more diverse in this way will certainly improve our chances of survival.

As a very personal view, I think that “surviving intelligence” is something that should be preserved into the future, because I feel it is very fortunate that we have established an intelligent society now, beyond the stormy sea of evolution. Imagine a future in which humanity is living with intelligent extraterrestrials after first contact. We would start caring not only about the survival of humanity but also about the survival of those intelligent extraterrestrials. If that happens, one future scenario is that our dominant values will extend to the survival of intelligence in general rather than the survival of the human race itself.

Hiroshi Yamakawa is the Director of Dwango AI Laboratory, Director and Chief Editor of the Japanese Society for Artificial Intelligence, a Fellow Researcher at the Brain Science Institute at Tamagawa University, and the Chairperson of the Whole Brain Architecture Initiative. He specializes in cognitive architecture, concept acquisition, neuro-computing, and opinion collection. He is one of the leading researchers working on AGI in Japan.

To learn more about Dr. Yamakawa’s work, you can read the full interview transcript here.

This interview was prepared by Eric Gastfriend, Jason Orlosky, Mamiko Matsumoto, Benjamin Peterson, Kazue Evans, and Tucker Davey. Original interview date: April 5, 2017. 

DeepMind’s AlphaGo Zero Becomes Go Champion Without Human Input

DeepMind’s AlphaGo Zero AI program just became the Go champion of the world without human data or guidance. This new system marks a significant technological jump from the AlphaGo program which beat Go champion Lee Sedol in 2016.

The game of Go has been played for more than 2,500 years and is widely viewed as not only a game, but a complex art form.  And a popular one at that. When the artificially intelligent AlphaGo from DeepMind played its first game against Sedol in March 2016, 60 million viewers tuned in to watch in China alone. AlphaGo went on to win four of five games, surprising the world and signifying a major achievement in AI research.

Unlike the chess match between Deep Blue and Garry Kasparov in 1997, AlphaGo did not win by brute-force computing alone. The more complex programming of AlphaGo amazed viewers not only with the excellence of its play, but also with its creativity. The famous “move 37” in game two was described by Go player Fan Hui as “So beautiful.” It was also so unusual that one of the commentators thought it was a mistake. Fan Hui explained, “It’s not a human move. I’ve never seen a human play this move.”

In other words, AlphaGo not only signified an iconic technological achievement, but also shook deeply held social and cultural beliefs about mastery and creativity. Yet, it turns out that AlphaGo was only the beginning. Today, DeepMind announced AlphaGo Zero.

Unlike AlphaGo, AlphaGo Zero was not shown a single human game of Go from which to learn. AlphaGo Zero learned entirely from playing against itself, with no prior knowledge of the game. Although its first games were random, the system used what DeepMind is calling a novel form of reinforcement learning to combine a neural network with a powerful search algorithm to improve each time it played.

In a DeepMind blog about the announcement, the authors write, “This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself.”
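
DeepMind’s actual system pairs a deep neural network with Monte Carlo tree search and enormous amounts of self-play compute, none of which fits in a short snippet. Purely as a toy-scale, hypothetical sketch of the core idea described above (learning entirely from games played against itself, with a learned evaluation guiding a simple lookahead), the code below trains a tabular value function for tic-tac-toe. Every name and number in it is illustrative; this is not AlphaGo Zero’s algorithm.

```python
import random
from collections import defaultdict

# Toy self-play learner for tic-tac-toe. It starts with no knowledge of the
# game beyond the rules, plays against itself, and updates a value table from
# each game's outcome. A one-ply lookahead stands in for real tree search.

EMPTY, X, O = ".", "X", "O"
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == EMPTY]

value = defaultdict(float)   # board state -> estimated result from X's perspective

def choose_move(board, player, epsilon=0.1):
    """Pick a move with a one-ply lookahead guided by the learned values."""
    moves = legal_moves(board)
    if random.random() < epsilon:
        return random.choice(moves)              # keep exploring
    sign = 1.0 if player == X else -1.0
    def score(m):
        nxt = board[:m] + player + board[m + 1:]
        if winner(nxt) == player:
            return 2.0                           # an immediate win beats any estimate
        return sign * value[nxt]
    return max(moves, key=score)

def self_play_game(alpha=0.2):
    """Play one game against itself and update the value table from the outcome."""
    board, player, visited = EMPTY * 9, X, []
    while True:
        move = choose_move(board, player)
        board = board[:move] + player + board[move + 1:]
        visited.append(board)
        win = winner(board)
        if win or not legal_moves(board):
            result = {X: 1.0, O: -1.0}.get(win, 0.0)   # a draw counts as 0
            for state in visited:                      # the "training" step
                value[state] += alpha * (result - value[state])
            return result
        player = O if player == X else X

if __name__ == "__main__":
    results = [self_play_game() for _ in range(20000)]
    recent = results[-1000:]
    print("last 1,000 self-play games:",
          sum(r > 0 for r in recent), "X wins,",
          sum(r == 0 for r in recent), "draws")
```

Even at this toy scale, the loop has the same shape: play, record the outcome, update the evaluator, and let the improved evaluator guide the next round of play.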

Though previous AIs from DeepMind have mastered Atari games without human input, as the authors of the Nature article note, “the game of Go, widely viewed as the grand challenge for artificial intelligence, [requires] a precise and sophisticated lookahead in vast search spaces.” While the old Atari games were much more straightforward, the new AI system for AlphaGo Zero had to master the strategy for immediate moves, as well as how to anticipate moves that might be played far into the future.

That this was all done without human demonstrations takes the program a step beyond the original AlphaGo systems. In addition, the new system learned with fewer input features than its predecessors, and while the original AlphaGo systems required two separate neural networks, AlphaGo Zero was built with only one.

AlphaGo Zero is not marginally better than its predecessor, but in an entirely new class of “superhuman performance,” with an intelligence that is notably more general. After just three days of playing against itself (4.9 million games), AlphaGo Zero beat AlphaGo by 100 games to 0. It not only independently learned the ancient secrets of the masters, but also chose moves and developed strategies never before seen among human players.

Co-founder​ ​and​ ​CEO of ​DeepMind, Demis​ ​Hassabis, said: “It’s amazing to see just how far AlphaGo has come in only two years. AlphaGo Zero is now the strongest version of our program and shows how much progress we can make even with less computing power and zero use of human data.”

Hassabis continued, “Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems like protein folding or designing new materials. If we can make the same progress on these problems that we have with AlphaGo, it has the potential to drive forward human understanding and positively impact all of our lives.”

ICAN Wins Nobel Peace Prize

We at FLI offer an excited congratulations to the International Campaign to Abolish Nuclear Weapons (ICAN), this year’s winners of the Nobel Peace Prize. We could not be more honored to have had the opportunity to work with ICAN during their campaign to ban nuclear weapons.

Over 70 years have passed since the bombs were first dropped on Hiroshima and Nagasaki, but finally, on July 7 of this year, 122 countries came together at the United Nations to establish a treaty outlawing nuclear weapons. Behind the effort was the small, dedicated team at ICAN, led by Beatrice Fihn. They coordinated with hundreds of NGOs in 100 countries to guide a global discussion and build international support for the ban.

In a statement, they said: “By harnessing the power of the people, we have worked to bring an end to the most destructive weapon ever created – the only weapon that poses an existential threat to all humanity.”

There’s still more work to be done to decrease nuclear stockpiles and rid the world of nuclear threats, but this incredible achievement by ICAN provides the hope and inspiration we need to make the world a safer place.

Perhaps most striking, as seen below in many of the comments by FLI members, is how such a small, passionate group was able to make such a huge difference in the world. Congratulations to everyone at ICAN!

Statements by members of FLI:

Anthony Aguirre: “The work of Bea inspiringly shows that a passionate and committed group of people working to make the world safer can actually succeed!”

Ariel Conn: “Fear and tragedy might monopolize the news lately, but behind the scenes, groups like ICAN are changing the world for the better. Bea and her small team represent great hope for the future, and they are truly an inspiration.”

Tucker Davey: “It’s easy to feel hopeless about the nuclear threat, but Bea and the dedicated ICAN team have clearly demonstrated that a small group can make a difference. Passing the nuclear ban treaty is a huge step towards a safer world, and I hope ICAN’s Nobel Prize inspires others to tackle this urgent threat.”

Victoria Krakovna: “Bea’s dedicated efforts to protect humanity from itself are an inspiration to us all.”

Richard Mallah: “Bea and ICAN have shown such dedication in working to curb the ability of a handful of us to kill most of the rest of us.”

Lucas Perry: “For me, Bea and ICAN have beautifully proven and embodied Margaret Mead’s famous quote, ‘Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has.’”

David Stanley: “The work taken on by ICAN’s team is often not glamorous, yet they have acted tirelessly for the past 10 years to protect us all from these abhorrent weapons. They are the few to whom so much is owed.”

Max Tegmark: “It’s been an honor and a pleasure collaborating with ICAN, and the attention brought by this Nobel Prize will help the urgently needed efforts to stigmatize the new nuclear arms race.”

Learn more about the treaty here.

The Future of Humanity Institute Releases Three Papers on Biorisks


Earlier this month, the Future of Humanity Institute (FHI) released three new papers that assess global catastrophic and existential biosecurity risks and offer a cost-benefit analysis of various approaches to dealing with these risks.

The work – done by Piers Millett, Andrew Snyder-Beattie, Sebastian Farquhar, and Owen Cotton-Barratt – looks at what the greatest risks might be, how cost-effective they are to address, and how funding agencies can approach high-risk research.

In one paper, Human Agency and Global Catastrophic Biorisks, Millett and Snyder-Beattie suggest that “the vast majority of global catastrophic biological risk (GCBR) comes from human agency rather than natural sources.” This risk could grow as future technologies allow us to further manipulate our environment and biology. The authors list many of today’s known biological risks, but they also highlight how unknown risks could easily arise in the future as technology advances. They call for a GCBR community that will provide “a space for overlapping interests between the health security communities and the global catastrophic risk communities.”

Millett and Snyder-Beattie also authored the paper, Existential Risk and Cost-Effective Biosecurity. This paper looks at the existential threat of future bioweapons to assess whether the risks are high enough to justify investing in threat-mitigation efforts. They consider a spectrum of biosecurity risks, including biocrimes, bioterrorism, and biowarfare, and they look at three models to estimate the risk of extinction from these weapons. As they state in their conclusion: “Although the probability of human extinction from bioweapons may be extremely low, the expected value of reducing the risk (even by a small amount) is still very large, since such risks jeopardize the existence of all future human lives.”

The third paper is Pricing Externalities to Balance Public Risks and Benefits of Research, by Farquhar, Cotton-Barratt, and Snyder-Beattie. Here they consider how scientific funders should “evaluate research with public health risks.” The work was inspired by the controversy surrounding the “gain-of-function” experiments performed on the H5N1 flu virus. The authors propose an approach that translates an estimate of the risk into a financial price, which “can then be included in the cost of the research.” They conclude with the argument that the “approaches discussed would work by aligning the incentives for scientists and for funding bodies more closely with those of society as a whole.”
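
As a rough, hypothetical illustration of that pricing idea (the numbers and names below are invented and are not taken from the paper), a funder could translate an estimated accident probability and its expected harm into a dollar figure and add it to the project’s budget:

```python
# Illustrative only: the figures below are made up to show the shape of the
# calculation, not values from Farquhar, Cotton-Barratt, and Snyder-Beattie.

def risk_price(p_accident_per_year, years, expected_fatalities, value_per_life):
    """Expected external cost of the research, in dollars."""
    expected_deaths = p_accident_per_year * years * expected_fatalities
    return expected_deaths * value_per_life

direct_cost = 5e6                      # hypothetical lab budget ($)
externality = risk_price(
    p_accident_per_year=1e-4,          # hypothetical chance of a release per year
    years=5,                           # project duration
    expected_fatalities=2e3,           # hypothetical toll if a release occurs
    value_per_life=9e6,                # a commonly used value-of-statistical-life figure
)

print(f"Priced externality: ${externality:,.0f}")
print(f"Total cost for the funder to weigh: ${direct_cost + externality:,.0f}")
```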

START from the Beginning: 25 Years of US-Russian Nuclear Weapons Reductions

By Eryn MacDonald and originally posted at the Union of Concerned Scientists.

For the past 25 years, a series of treaties have allowed the US and Russia to greatly reduce their nuclear arsenals—from well over 10,000 each to fewer than 2,000 deployed long-range weapons each. These Strategic Arms Reduction Treaties (START) have enhanced US security by reducing the nuclear threat, providing valuable information about Russia’s nuclear arsenal, and improving predictability and stability in the US-Russia strategic relationship.

Twenty-five years ago, US policy-makers of both parties recognized the benefits of the first START agreement: on October 1, 1992, the Senate voted overwhelmingly—93 to 6—in favor of ratifying START I.

The end of START?

With increased tensions between the US and Russia and an expanded range of security threats for the US to worry about, this longstanding foundation is now more valuable than ever.

The most recent agreement—New START—will expire in early February 2021, but can be extended for another five years if the US and Russian presidents agree to do so. In a January 28 phone call with President Trump, Russian President Putin reportedly raised the possibility of extending the treaty. But instead of being extended, or even maintained, the START framework is now in danger of being abandoned.

President Trump has called New START “one-sided” and “a bad deal,” and has even suggested the US might withdraw from the treaty. His advisors are clearly opposed to doing so. Secretary of State Rex Tillerson expressed support for New START in his confirmation hearing. Secretary of Defense James Mattis, while recently stating that the administration is currently reviewing the treaty “to determine whether it’s a good idea,” has previously also expressed support, as have the head of US Strategic Command and other military officials.

Withdrawal seems unlikely, especially given recent anonymous comments by administration officials saying that the US still sees value in New START and is not looking to discard it. But given the president’s attitude toward the treaty, it may still take some serious pushing from Mattis and other military officials to convince him to extend it. Worse, even if Trump is not re-elected, and the incoming president is more supportive of the treaty, there will be little time for a new administration, taking office in late January 2021, to do an assessment and sign on to an extension before the deadline. While UCS and other treaty supporters will urge the incoming administration to act quickly, if the Trump administration does not extend the treaty, it is quite possible that New START—and the security benefits it provides—will lapse.

The Beginning: The Basics and Benefits of START I

Today, the overwhelming bipartisan support for a treaty cutting US nuclear weapons that the START I ratification vote demonstrated seems unbelievable. At the time, however, both Democrats and Republicans in Congress, as well as the first President Bush, recognized the importance of the historic agreement, the first to require an actual reduction, rather than simply a limitation, in the number of US and Russian strategic nuclear weapons.

By the end of the Cold War, the US had about 23,000 nuclear warheads in its arsenal, and the Soviet Union had roughly 40,000. These numbers included about 12,000 US and 11,000 Soviet deployed strategic warheads—those mounted on long-range missiles and bombers. The treaty limited each country to 1,600 strategic missiles and bombers and 6,000 warheads, and established procedures for verifying these limits.

The limits on missiles and bombers, in addition to limits on the warheads themselves, were significant because START required the verifiable destruction of any excess delivery vehicles, which gave each side confidence that the reductions could not be quickly or easily reversed. To do this, the treaty established a robust verification regime with an unprecedented level of intrusiveness, including on-site inspections and exchanges of data about missile telemetry.

Though the groundwork for START I was laid during the Reagan administration, ratification and implementation took place during the first President Bush’s term. The treaty was one among several measures taken by the elder Bush that reduced the US nuclear stockpile by nearly 50 percent during his time in office.

START I entered into force in 1994 and had a 15-year lifetime; it required the US and Russia to complete reductions by 2001, and maintain those reductions until 2009. However, both countries actually continued reductions after reaching the START I limits. By the end of the Bush I administration, the US had already reduced its arsenal to just over 7,000 deployed strategic warheads. By the time the treaty expired, this number had fallen to roughly 3,900.

The Legacy of START I

Building on the success of START I, the US and Russia negotiated a follow-on treaty—START II—that required further cuts in deployed strategic weapons. These reductions were to be carried out in two steps, but when fully implemented would limit each country to 3,500 deployed strategic warheads, with no more than 1,750 of these on submarine-launched ballistic missiles.

Phase II also required the complete elimination of multiple independently targetable re-entry vehicles (MIRVs) on intercontinental ballistic missiles. This marked a major step forward, because MIRVs were a particularly destabilizing configuration. Since just one incoming warhead could destroy all the warheads on a MIRVed land-based missile, MIRVs create pressure to “use them or lose them”—an incentive to strike first in a crisis. Otherwise, a country risked losing its ability to use those missiles to retaliate in the case of a first strike against it.

While both sides ratified START II, it was a long and contentious process, and entry into force was complicated by provisions attached by both the US Senate and Russian Duma. The US withdrawal from the Anti-Ballistic Missile (ABM) treaty in 2002 was the kiss of death for START II. The ABM treaty had strictly limited missile defenses. Removing this limit created a situation in which either side might feel it had to deploy more and more weapons to be sure it could overcome the other’s defense. But the George W. Bush administration was now committed to building a larger-scale defense, regardless of Russia’s vocal opposition and clear statements that doing so would undermine arms control progress.

Russia responded by announcing its withdrawal from START II, finally ending efforts to bring the treaty into force. A proposed START III treaty, which would have called for further reductions to 2,000 to 2,500 warheads on each side, never materialized; negotiations had been planned to begin after entry into force of START II.

After the failure of START II, the US and Russia negotiated the Strategic Offensive Reductions Treaty (SORT, often called the “Moscow Treaty”). SORT required each party to reduce to 1,700 to 2,200 deployed strategic warheads, but was a much less formal treaty than START. It did not include the same kind of extensive verification regime and, in fact, did not even define what was considered a “strategic warhead,” instead leaving each party to decide for itself what it would count. This meant that although SORT did encourage further progress to lower numbers of weapons, overall it did not provide the same kind of benefits for the US as START had.

New START

Recognizing the deficiencies of the minimal SORT agreement, the Obama administration made negotiation of New START an early priority, and the treaty was ratified in 2010.

New START limits each party to 1,550 deployed strategic nuclear warheads by February 2018. The treaty also limits the number of deployed intercontinental ballistic missiles, submarine-launched ballistic missiles, and long-range bombers equipped to carry nuclear weapons to no more than 700 on each side. Altogether, no more than 800 deployed and non-deployed missiles and bombers are allowed for each side.

In reality, each country will deploy somewhat more than 1,550 warheads—probably around 1,800 each—because of a change in the way New START counts warheads carried by long-range bombers. START I assigned a number of warheads to each bomber based on its capabilities. New START simply counts each long-range bomber as a single warhead, regardless of the actual number it does or could carry. The less stringent limits on bombers are possible because bombers are considered less destabilizing than missiles. The bombers’ detectability and long flight times—measured in hours vs. the roughly thirty minutes it takes for a missile to fly between the United States and Russia—mean that neither side is likely to use them to launch a first strike.

Both the United States and Russia have been moving toward compliance with the New START limits, and as of July 1, 2017—when the most recent official exchange of data took place—both are under the limit for deployed strategic delivery vehicles and close to meeting the limit for deployed and non-deployed strategic delivery vehicles. The data show that the United States is currently slightly under the limit for deployed strategic warheads, at 1,411, while Russia, with 1,765, still has some cuts to make to reach this limit.

Even in the increasingly partisan atmosphere of the 2000s, New START gained support from a wide range of senators, as well as military leaders and national security experts. The treaty passed in the Senate with a vote of 71 to 26; thirteen Republicans joined all Democratic senators in voting in favor. While this is significantly closer than the START I vote, as then-Senator John F. Kerry noted at the time, “in today’s Senate, 70 votes is yesterday’s 95.”

And the treaty continues to have strong support—including from Air Force General John Hyten, commander of US Strategic Command, which is responsible for all US nuclear forces. In Congressional testimony earlier this year, Hyten called himself “a big supporter” of New START and said that “when it comes to nuclear weapons and nuclear capabilities, that bilateral, verifiable arms control agreements are essential to our ability to provide an effective deterrent.” Another Air Force general, Paul Selva, vice chair of the Joint Chiefs of Staff, agreed, saying in the same hearing that when New START was ratified in 2010, “the Joint Chiefs reviewed the components of the treaty—and endorsed it. It is a bilateral, verifiable agreement that gives us some degree of predictability on what our potential adversaries look like.”

The military understands the benefits of New START. That President Trump has the power to withdraw from the treaty despite support from those who are most directly affected by it is, as he would say, “SAD.”

That the US president fails to understand the value of US-Russian nuclear weapon treaties that have helped to maintain stability for more than two decades is a travesty.

Explainable AI: a discussion with Dan Weld

Machine learning systems are confusing – just ask any AI researcher. Their deep neural networks operate incredibly quickly, considering thousands of possibilities in seconds before making decisions. The human brain simply can’t keep up.

When people learn to play Go, instructors can challenge their decisions and hear their explanations. Through this interaction, teachers determine the limits of a student’s understanding. But DeepMind’s AlphaGo, which recently beat the world’s top Go players, can’t answer these questions. When AlphaGo makes an unexpected decision, it’s difficult to understand why it made that choice.

Admittedly, the stakes are low with AlphaGo: no one gets hurt if it makes an unexpected move and loses. But deploying intelligent machines that we can’t understand could set a dangerous precedent.

According to computer scientist Dan Weld, understanding and trusting machines is “the key problem to solve” in AI safety, and it’s necessary today. He explains, “Since machine learning is at the core of pretty much every AI success story, it’s really important for us to be able to understand what it is that the machine learned.”

As machine learning (ML) systems assume greater control in healthcare, transportation, and finance, trusting their decisions becomes increasingly important. If researchers can program AIs to explain their decisions and answer questions, as Weld is trying to do, we can better assess whether they will operate safely on their own.


Teaching Machines to Explain Themselves

Weld has worked on techniques that expose blind spots in ML systems, or “unknown unknowns.”

When an ML system faces a “known unknown,” it recognizes that it is uncertain about the situation. However, when it encounters an unknown unknown, it won’t even recognize that the situation is uncertain: the system will have extremely high confidence that its result is correct, but it will be wrong. Often, classifiers have this confidence because they were “trained on data that had some regularity in it that’s not reflected in the real world,” Weld says.

Consider an ML system that has been trained to classify images of dogs, but has only been trained on images of brown and black dogs. If this system sees a white dog for the first time, it might confidently assert that it’s not a dog. This is an “unknown unknown” – trained on incomplete data, the classifier has no idea that it’s completely wrong.

ML systems can be programmed to ask for human oversight on known unknowns, but since they don’t recognize unknown unknowns, they can’t easily ask for oversight. Weld’s research team is developing techniques to facilitate this, and he believes that it will complement explainability. “After finding unknown unknowns, the next thing the human probably wants is to know WHY the learner made those mistakes, and why it was so confident,” he explains.
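
To make the distinction concrete, here is a small hypothetical sketch (not Weld’s actual method): a one-feature dog-vs-cat classifier trained only on dark-coated dogs. Near its decision boundary it is unsure and can defer to a human, a known unknown; on a white dog, far outside anything it saw in training, it is confidently wrong and nothing in its output flags the problem, an unknown unknown.

```python
import math

# Hypothetical toy: coat darkness perfectly separates the training data, so the
# classifier leans on that one feature and then extrapolates with confidence.

# (coat darkness in [0, 1], label: 1 = dog, 0 = cat)
train = [(0.90, 1), (0.85, 1), (0.80, 1),   # brown and black dogs only
         (0.55, 0), (0.50, 0), (0.45, 0)]   # cats

w, b = 0.0, 0.0
for _ in range(5000):                       # plain logistic regression, gradient ascent
    for x, y in train:
        p = 1 / (1 + math.exp(-(w * x + b)))
        w += 0.5 * (y - p) * x
        b += 0.5 * (y - p)

def classify(darkness, defer_threshold=0.9):
    p_dog = 1 / (1 + math.exp(-(w * darkness + b)))
    confidence = max(p_dog, 1 - p_dog)
    label = "dog" if p_dog >= 0.5 else "cat"
    needs_human = confidence < defer_threshold
    return label, round(confidence, 3), needs_human

# Known unknown: near the decision boundary, confidence is low, so defer to a human.
print(classify(0.68))

# Unknown unknown: a white dog (darkness 0.1) lies far outside the training data.
# The model confidently calls it a cat, and nothing in its output flags the error.
print(classify(0.10))
```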

Machines don’t “think” like humans do, but that doesn’t mean researchers can’t engineer them to explain their decisions.

One research group jointly trained an ML classifier to recognize images of birds and generate captions. If the AI recognizes a toucan, for example, the researchers can ask “why.” The neural net can then generate an explanation that the huge, colorful bill indicated a toucan.

While AI developers will prefer certain concepts explained graphically, consumers will need these interactions to involve natural language and more simplified explanations. “Any explanation is built on simplifying assumptions, but there’s a tricky judgment question about what simplifying assumptions are OK to make. Different audiences want different levels of detail,” says Weld.

Explaining the bird’s huge, colorful bill might suffice in image recognition tasks, but with medical diagnoses and financial trades, researchers and users will want more. Like a teacher-student relationship, human and machine should be able to discuss what the AI has learned and where it still needs work, drilling down on details when necessary.

“We want to find mistakes in their reasoning, understand why they’re making these mistakes, and then work towards correcting them,” Weld adds.    


Managing Unpredictable Behavior

Yet, ML systems will inevitably surprise researchers. Weld explains, “The system can and will find some way of achieving its objective that’s different from what you thought.”

Governments and businesses can’t afford to deploy highly intelligent AI systems that make unexpected, harmful decisions, especially if these systems control the stock market, power grids, or data privacy. To control this unpredictability, Weld wants to engineer AIs to get approval from humans before executing novel plans.

“It’s a judgment call,” he says. “If it has seen humans executing actions 1-3, then that’s a normal thing. On the other hand, if it comes up with some especially clever way of achieving the goal by executing this rarely-used action number 5, maybe it should run that one by a live human being.”

Over time, this process will create norms for AIs, as they learn which actions are safe and which actions need confirmation.
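
The gating idea can be sketched in a few lines, purely as a hypothetical illustration rather than an implementation of any particular system of Weld’s: keep counts of how often each action appeared in human demonstrations, and route any sufficiently rare action to a person for approval before executing it.

```python
from collections import Counter

# Actions seen often enough in human demonstrations run directly; rare or unseen
# actions are routed to a person first, and repeated approvals become the norm.

class ApprovalGate:
    def __init__(self, demonstrations, min_count=5):
        self.seen = Counter(demonstrations)   # how often humans took each action
        self.min_count = min_count

    def execute(self, action, ask_human, do_action):
        if self.seen[action] >= self.min_count:
            return do_action(action)          # familiar action: just do it
        if ask_human(action):                 # novel action: require sign-off
            self.seen[action] += 1            # approvals accumulate into norms
            return do_action(action)
        return None                           # vetoed: do nothing

def ask_human(action):
    # Stand-in for a real approval interface (a UI, an on-call reviewer, etc.).
    print(f"Requesting human approval for unusual action {action!r}")
    return False                              # in this toy run, the human declines

gate = ApprovalGate(demonstrations=[1, 2, 3] * 10)   # actions 1-3 are routine
gate.execute(2, ask_human, do_action=print)          # runs without asking
gate.execute(5, ask_human, do_action=print)          # asks a human, gets vetoed
```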


Implications for Current AI Systems

The people that use AI systems often misunderstand their limitations. The doctor using an AI to catch disease hasn’t trained the AI and can’t understand its machine learning. And the AI system, not programmed to explain its decisions, can’t communicate problems to the doctor.

Weld wants to see an AI system that interacts with a pre-trained ML system and learns how the pre-trained system might fail. This system could analyze the doctor’s new diagnostic software to find its blind spots, such as its unknown unknowns. Explainable AI software could then enable the AI to converse with the doctor, answering questions and clarifying uncertainties.

And the applications extend to finance algorithms, personal assistants, self-driving cars, and even predicting recidivism in the legal system, where explanation could help root out bias. ML systems are so complex that humans may never be able to understand them completely, but this back-and-forth dialogue is a crucial first step.

“I think it’s really about trust and how can we build more trustworthy AI systems,” Weld explains. “The more you interact with something, the more shared experience you have, the more you can talk about what’s going on. I think all those things rightfully build trust.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Artificial Intelligence: The Challenge to Keep It Safe

Safety Principle: AI systems should be safe and secure throughout their operational lifetime and verifiably so where applicable and feasible.

When a new car is introduced to the world, it must pass various safety tests to satisfy not just government regulations, but also public expectations. In fact, safety has become a top selling point among car buyers.

And it’s not just cars. Whatever the latest generation of any technology happens to be — from appliances to airplanes — manufacturers know that customers expect their products to be safe from start to finish.

Artificial intelligence is no different. So, on the face of it, the Safety Principle seems like a “no brainer,” as Harvard psychologist Joshua Greene described it. It’s obviously not in anyone’s best interest for an AI product to injure its owner or anyone else. But, as Greene and other researchers highlight below, this principle is much more complex than it appears at first glance.

“This is important, obviously,” said University of Connecticut philosopher Susan Schneider, but she expressed uncertainty about our ability to verify that we can trust a system as it gets increasingly intelligent. She pointed out that at a certain level of intelligence, the AI will be able to rewrite its own code, and with superintelligent systems “we may not even be able to understand the program to begin with.”

What Is AI Safety?

This principle gets to the heart of the AI safety research initiative: how can we ensure safety for a technology that is designed to learn how to modify its own behavior?

Artificial intelligence is designed so that it can learn from interactions with its surroundings and alter its behavior accordingly, which could provide incredible benefits to humanity. Because AI can address so many problems more effectively than people, it has huge potential to improve health and wellbeing for everyone. But it’s not hard to imagine how this technology could go awry. And we don’t need to achieve superintelligence for this to become a problem.

Microsoft’s chatbot, Tay, is a recent example of how an AI can learn negative behavior from its environment, producing results quite the opposite of what its creators had in mind. Meanwhile, the Tesla car accident, in which the vehicle mistook a white truck for a clear sky, offers an example of an AI misunderstanding its surroundings and taking deadly action as a result.

Researchers can try to learn from AI gone astray, but current designs often lack transparency, and much of today’s artificial intelligence is essentially a black box. AI developers can’t always figure out how or why AIs take various actions, and this will likely only grow more challenging as AI becomes more complex.

However, Ian Goodfellow, a research scientist at Google Brain, is hopeful, pointing to efforts already underway to address these concerns.

“Applying traditional security techniques to AI gives us a concrete path to achieving AI safety,” Goodfellow explains. “If we can design a method that prevents even a malicious attacker from causing an AI to take an undesirable action, then it is even less likely that the AI would choose an undesirable action independently.”

AI safety may be a challenge, but there’s no reason to believe it’s insurmountable. So what do other AI experts say about how we can interpret and implement the Safety Principle?

What Does ‘Verifiably’ Mean?

‘Verifiably’ was the word that caught the eye of many researchers as a crucial part of this Principle.

John Havens, an Executive Director with IEEE, first considered the Safety Principle in its entirety, saying,  “I don’t know who wouldn’t say AI systems should be safe and secure. … ‘Throughout their operational lifetime’ is actually the more important part of the sentence, because that’s about sustainability and longevity.”

But then, he added, “My favorite part of the sentence is ‘and verifiably so.’ That is critical. Because that means, even if you and I don’t agree on what ‘safe and secure’ means, but we do agree on verifiability, then you can go, ‘well, here’s my certification, here’s my checklist.’ And I can go, ‘Great, thanks.’ I can look at it, and say, ‘oh, I see you got things 1-10, but what about 11-15?’ Verifiably is a critical part of that sentence.”

AI researcher Susan Craw noted that the Principle “is linked to transparency.” She explained, “Maybe ‘verifiably so’ would be possible with systems if they were a bit more transparent about how they were doing things.”

Greene also noted the complexity and challenge presented by the Principle when he suggested:

“It depends what you mean by ‘verifiably.’ Does ‘verifiably’ mean mathematically, logically proven? That might be impossible. Does ‘verifiably’ mean you’ve taken some measures to show that a good outcome is most likely? If you’re talking about a small risk of a catastrophic outcome, maybe that’s not good enough.”

Safety and Value Alignment

Any consideration of AI safety must also include value alignment: how can we design artificial intelligence that can align with the global diversity of human values, especially taking into account that, often, what we ask for is not necessarily what we want?

“Safety is not just a technical problem,” Patrick Lin, a philosopher at California Polytechnic told me. “If you just make AI that can align perfectly with whatever values you set it to, well the problem is, people can have a range of values, and some of them are bad. Just merely matching AI, aligning it to whatever value you specify I think is not good enough. It’s a good start, it’s a good big picture goal to make AI safe, and the technical element is a big part of it; but again, I think safety also means policy and norm-setting.”

And the value-alignment problem becomes even more of a safety issue as the artificial intelligence gets closer to meeting — and exceeding — human intelligence.

“Consider the example of the Japanese androids that are being developed for elder care,” said Schneider. “They’re not smart; right now, the emphasis is on physical appearance and motor skills. But imagine when one of these androids is actually engaged in elder care … It has to multitask and exhibit cognitive flexibility. … That raises the demand for household assistants that are AGIs. And once you get to the level of artificial general intelligence, it’s harder to control the machines. We can’t even make sure fellow humans have the right goals; why should we think AGI will have values that align with ours, let alone that a superintelligence would.”

Defining Safety

But perhaps it’s time to reconsider the definition of safety, as Lin alluded to above. Havens also requested “words that further explain ‘safe and secure,’” suggesting that we need to expand the definition beyond “physically safe” to “provide increased well being.”

Anca Dragan, an associate professor at UC Berkeley, was particularly interested in the definition of “safe.”

“We all agree that we want our systems to be safe,” said Dragan. “More interesting is what do we mean by ‘safe’, and what are acceptable ways of verifying safety.

“Traditional methods for formal verification that prove (under certain assumptions) that a system will satisfy desired constraints seem difficult to scale to more complex and even learned behavior. Moreover, as AI advances, it becomes less clear what these constraints should be, and it becomes easier to forget important constraints. … we need to rethink what we mean by safe, perhaps building in safety from the get-go as opposed to designing a capable system and adding safety after.”

What Do You Think?

What does it mean for a system to be safe? Does it mean the owner doesn’t get hurt? Are “injuries” limited to physical ailments, or does safety also encompass financial or emotional damage? And what if an AI is being used for self-defense or by the military? Can an AI harm an attacker? How can we ensure that a robot or software program or any other AI system remains verifiably safe throughout its lifetime, even as it continues to learn and develop on its own? How much risk are we willing to accept in order to gain the potential benefits that increasingly intelligent AI — and ultimately superintelligence — could bestow?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Countries Sign UN Treaty to Outlaw Nuclear Weapons

Update 9/25/17: 53 countries have now signed and 3 have ratified.

Today, 50 countries took an important step toward a nuclear-free world by signing the United Nations Treaty on the Prohibition of Nuclear Weapons. This is the first treaty to legally ban nuclear weapons, just as we’ve seen done previously with chemical and biological weapons.

A Long Time in the Making

In 1933, Leo Szilard first came up with the idea of a nuclear chain reaction. Only a few years later, the Manhattan Project was underway, culminating in the nuclear attacks against Hiroshima and Nagasaki in 1945. In the following decades of the Cold War, the U.S. and Russia amassed arsenals that peaked at over 70,000 nuclear weapons in total, though that number is significantly less today. The U.K, France, China, Israel, India, Pakistan, and North Korea have also built up their own, much smaller arsenals.

Over the decades, the United Nations has established many treaties relating to nuclear weapons, including the non-proliferation treaty, START I, START II, the Comprehensive Nuclear Test Ban Treaty, and New START. Though a few other countries began nuclear weapons programs, most of those were abandoned, and the majority of the world’s countries have rejected nuclear weapons outright.

Now, over 70 years since the bombs were first dropped on Japan, the United Nations finally has a treaty outlawing nuclear weapons.

The Treaty

The Treaty on the Prohibition of Nuclear Weapons was adopted on July 7, with a vote of approval from 122 countries. As part of the treaty, the states who sign agree that they will never “[d]evelop, test, produce, manufacture, otherwise acquire, possess or stockpile nuclear weapons or other nuclear explosive devices.” Signatories also promise not to assist other countries with such efforts, and no signatory will “[a]llow any stationing, installation or deployment of any nuclear weapons or other nuclear explosive devices in its territory or at any place under its jurisdiction or control.”

Not only had 50 countries signed the treaty at the time this article was written, but 3 of them had also already ratified it. The treaty will enter into force 90 days after it’s ratified by 50 countries.

The International Campaign to Abolish Nuclear Weapons (ICAN) is tracking progress of the treaty, with a list of countries that have signed and ratified it so far.

At the ceremony, UN Secretary General António Guterres said, “The Treaty on the Prohibition of Nuclear Weapons is the product of increasing concerns over the risk posed by the continued existence of nuclear weapons, including the catastrophic humanitarian and environmental consequences of their use.”

Still More to Do

Though countries that don’t currently have nuclear weapons are eager to see the treaty ratified, no one is foolish enough to think that will magically rid the world of nuclear weapons.

“Today we rightfully celebrate a milestone.  Now we must continue along the hard road towards the elimination of nuclear arsenals,” Guterres added in his statement.

There are still over 15,000 nuclear weapons in the world today. While that’s significantly less than we’ve had in the past, it’s still more than enough to kill most people on earth.

The U.S. and Russia hold most of these weapons, but as we’re seeing from the news out of North Korea, a country doesn’t need to have thousands of nuclear weapons to present a destabilizing threat.

Susi Snyder, author of Pax’s Don’t Bank on the Bomb and a leading advocate of the treaty, told FLI:

“The countries signing the treaty are the responsible actors we need in these times of uncertainty, fire, fury, and devastating threats. They show it is possible and preferable to choose diplomacy over war.”

Earlier this summer, some of the world’s leading scientists also came together in support of the nuclear ban in a video that was presented to the United Nations.

Stanislav Petrov

The signing of the treaty occurred within a week of both the news of Stanislav Petrov’s death and Petrov Day. On September 26, 1983, Petrov chose to follow his gut rather than rely on what turned out to be faulty satellite data. In doing so, he prevented what could easily have escalated into a full-scale global nuclear war.