Harvesting Water Out of Thin Air: A Solution to the Water Shortage Crisis?

The following post was written by Jung Hyun Claire Park.

One in nine people around the world does not have access to clean water. As the global population grows and the climate heats up, experts fear water shortages will worsen. To address this anticipated crisis, scientists are turning to a natural reserve of fresh water that has yet to be exploited: the atmosphere.

The atmosphere is estimated to contain 13 trillion liters of water vapor and droplets, a reserve that could significantly ease the water shortage problem, and a number of attempts have already been made to collect water from air. Researchers have previously used porous materials such as zeolites, silica gel, and clay to capture water molecules, but these approaches suffered from several limitations. First, these materials work efficiently only in high-humidity conditions, yet it is low-humidity areas, like sub-Saharan Africa, that are in greatest need of clean drinking water. Second, these materials tend to cling too tightly to the water molecules they collect, so releasing the absorbed water requires a great deal of energy, diminishing their viability as a solution to the water shortage crisis.

Now, Dr. Omar Yaghi and a team of scientists at the Massachusetts Institute of Technology and the University of California, Berkeley have developed a new technology that addresses these limitations. The technology uses a material called a metal-organic framework (MOF) that effectively captures water molecules even at low humidity, and the only energy necessary to release drinkable water from the MOF can be harnessed from ambient sunlight.

How Does This System Work?

MOFs belong to a family of porous compounds whose sponge-like structure is ideal for trapping molecules. They are highly customizable at the molecular level: researchers can tune which molecule is absorbed, the humidity level at which absorption peaks, and the energy required to release trapped molecules, yielding a plethora of potential MOF variations. The proposed water harvesting technology uses a hydrophilic variation called microcrystalline powder MOF-801, engineered to harvest water efficiently from air whose relative humidity is as low as 20%, the typical level in the world’s driest regions. Furthermore, MOF-801 requires only energy from ambient sunlight to relinquish its collected water, which means the energy necessary for this technology is abundant in precisely those desert areas with the most severely limited supply of fresh water. MOF-801 thus overcomes most, if not all, of the limitations of the materials previously proposed for harvesting water from air.

A schematic of a metal-organic framework (MOF). The yellow balls represent the porous space where molecules are captured; the lines are organic linkers, and the blue intersections are metal ions. Image: UC Berkeley / Berkeley Lab.

The prototype is shaped like a rectangular prism and operates through a simple mechanism. To collect water from the atmosphere, the MOF is pressed into a thin sheet of copper and placed beneath the solar absorber at the top of the prism; a condenser plate sits at the bottom and is kept at room temperature. Once the top layer absorbs solar heat, water is released from the MOF and, driven by the concentration and temperature differences, collects on the cooler bottom plate. Tests showed that one kilogram (about 2 pounds) of MOF can collect about 2.8 liters of water per day. Yaghi notes that since the technology collects distilled water, all that’s needed is the addition of mineral ions, and he suggests that one kilogram of MOF will be able to produce enough drinkable water per day for a person living in some of the driest regions on earth.
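To put that figure in perspective, here is a rough back-of-the-envelope sketch in Python. The 2.8 liters per kilogram per day comes from the article above; the assumed daily drinking-water need of 3 liters per person is an illustrative assumption, not a figure from the researchers.

    # Rough estimate of how much MOF-801 a household might need, using the
    # reported prototype yield. The per-person requirement is an assumption.
    YIELD_L_PER_KG_PER_DAY = 2.8         # reported yield of the MOF-801 prototype
    DRINKING_WATER_L_PER_PERSON = 3.0    # assumed daily drinking-water need

    def mof_needed_kg(people: int) -> float:
        """Kilograms of MOF-801 needed to cover daily drinking water for `people`."""
        return people * DRINKING_WATER_L_PER_PERSON / YIELD_L_PER_KG_PER_DAY

    for household_size in (1, 4):
        print(f"{household_size} person(s): ~{mof_needed_kg(household_size):.1f} kg of MOF-801")

By this rough count, a little over one kilogram of MOF would cover a single person’s drinking water, which is consistent with Yaghi’s suggestion above.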

Image of a water harvesting prototype using MOF-801, with outer dimensions of 7 cm x 7 cm x 4.5 cm. MIT.

Why This Technology Is Promising

The promise of this technology mostly lies in its sustainability. Water can be pulled from the air without any energy input beyond what can be collected from ambient sunlight. In addition, MOF-801 is a zirconium-based compound that is widely available at low cost. The technology also has a long lifespan: Yaghi predicts that the MOF will last through at least 100,000 cycles of water absorption and desorption, so it does not require frequent replacement. Plus, MOF-based water harvesting isn’t limited to drinking water; it could be used for any service requiring water, such as agriculture. Yaghi believes this technology could offer a viable solution to water shortages in various regions of the world.

Yaghi also anticipates that the material itself could be used for the separation, storage, and catalysis of molecules other than water. For instance, MOFs can be tailored to capture carbon emissions before they reach the atmosphere, or designed to remove existing CO2 from the atmosphere. A MOF, as the name suggests, is simply a framework, and it therefore opens up many opportunities for modification to suit practical needs.

Future of Water Harvesting Technology

The team of researchers from Berkeley and MIT is currently pushing to test the water harvesting technology in real-life settings in regions with low humidity. Yaghi remarked that his ultimate goal would be to “have drinking water widely available, especially in areas that lack clean water.” He envisions providing water to villages that are “off-grid,” where each household will have a machine and create its own “personalized water.” He hopes that this envisioned future may not be too far away.

AI Researchers Create Video to Call for Autonomous Weapons Ban at UN

In response to growing concerns about autonomous weapons, a coalition of AI researchers and advocacy organizations released a fictitious video on Monday that depicts a disturbing future in which lethal autonomous weapons have become cheap and ubiquitous.

The video was launched in Geneva, where AI researcher Stuart Russell presented it at an event at the United Nations Convention on Conventional Weapons hosted by the Campaign to Stop Killer Robots.

Russell, in an appearance at the end of the video, warns that the technology described in the film already exists and that the window to act is closing fast.

Support for a ban has been mounting. Just this past week, over 200 Canadian scientists and over 100 Australian scientists in academia and industry penned open letters to their prime ministers, Justin Trudeau and Malcolm Turnbull, urging them to support the ban. Earlier this summer, over 130 leaders of AI companies signed a letter in support of this week’s discussions. These letters follow a 2015 open letter released by the Future of Life Institute and signed by more than 20,000 AI and robotics researchers and others, including Elon Musk and Stephen Hawking.

These letters indicate both grave concern and a sense that the opportunity to curtail lethal autonomous weapons is running out.

Noel Sharkey of the International Committee for Robot Arms Control explains, “The Campaign to Stop Killer Robots is not trying to stifle innovation in artificial intelligence and robotics and it does not wish to ban autonomous systems in the civilian or military world. Rather we see an urgent need to prevent automation of the critical functions for selecting targets and applying violent force without human deliberation and to ensure meaningful human control for every attack.”

Drone technology today is very close to having fully autonomous capabilities. And many of the world’s leading AI researchers worry that if these autonomous weapons are ever developed, they could dramatically lower the threshold for armed conflict, ease and cheapen the taking of human life, empower terrorists, and create global instability. The US and other nations have used drones and semi-automated systems to carry out attacks for several years now, but fully removing a human from the loop is at odds with international humanitarian and human rights law.

A ban can exert great power on the trajectory of technological development without needing to stop every instance of misuse. Max Tegmark, MIT Professor and co-founder of the Future of Life Institute, points out, “People’s knee-jerk reaction that bans can’t help isn’t historically accurate: the bioweapon ban created such a powerful stigma that, despite treaty cheating, we have almost no bioterror attacks today and almost all biotech funding is civilian.”

As Toby Walsh, an AI professor at the University of New South Wales, argues: “The academic community has sent a clear and consistent message. Autonomous weapons will be weapons of terror, the perfect tool for those who have no qualms about the terrible uses to which they are put. We need to act now before this future arrives.”

More than 70 countries are participating in the meeting, taking place November 13 – 17, of the Group of Governmental Experts on lethal autonomous weapons, which was established by the UN’s 2016 Fifth Review Conference. The meeting is chaired by Ambassador Amandeep Singh Gill of India, and the countries will continue negotiating what could become a historic international treaty.


Developing Ethical Priorities for Neurotechnologies and AI

Private companies and the military have moved beyond the goal of merely understanding the brain to that of augmenting and manipulating brain function. In particular, companies such as Elon Musk’s Neuralink and Bryan Johnson’s Kernel hope to harness advances in computing and artificial intelligence, alongside neuroscience, to provide new ways to merge our brains with computers.

Musk also sees this as a means to help address both AI safety and human relevance as algorithms outperform humans in one area after another. He has previously stated, “Some high bandwidth interface to the brain will be something that helps achieve a symbiosis between human and machine intelligence and maybe solves the control problem and the usefulness problem.”

In a Comment in Nature, 27 people from The Morningside Group outlined four ethical priorities for the emerging space of neurotechnologies and artificial intelligence. The authors include neuroscientists, ethicists, and AI engineers from Google, top US and international universities, and several non-profit research organizations such as AI Now and The Hastings Center.

A Newsweek article describes their concern, “Artificial intelligence could hijack brain-computer interfaces and take control of our minds.” While this is not exactly the warning the Group describes, they do suggest we are in store for some drastic changes:

…we are on a path to a world in which it will be possible to decode people’s mental processes and directly manipulate the brain mechanisms underlying their intentions, emotions and decisions; where individuals could communicate with others simply by thinking; and where powerful computational systems linked directly to people’s brains aid their interactions with the world such that their mental and physical abilities are greatly enhanced.

The authors suggest that although these advances could provide meaningful and beneficial enhancements to the human experience, they could also exacerbate social inequalities, enable more invasive forms of social manipulation, and threaten the very core of what it means to be human. They encourage readers to consider the ramifications of these emerging technologies now.

Referencing the Asilomar AI Principles and other ethical guidelines as a starting point, they call for a new set of guidelines that specifically address concerns that will emerge as groups like Elon Musk’s startup Neuralink and other companies around the world explore ways to improve the interface between brains and machines. Their recommendations cover four key areas: privacy and consent; agency and identity; augmentation; and bias.

Regarding privacy and consent, they posit that the right to keep neural data private is critical. To this end, they recommend opt-in policies, strict regulation of commercial entities, and the use of blockchain-based techniques to provide transparent control over the use of data. In relation to agency and identity, they recommend that bodily and mental integrity, as well as the ability to choose our actions, be enshrined in international treaties such as the Universal Declaration of Human Rights.

In the area of augmentation, the authors discuss the possibility of an augmentation arms race, in which militaries pursue so-called “super-soldiers” who are more resilient to combat conditions. They recommend that the use of neural technology for military purposes be stringently regulated. And finally, they recommend the exploration of countermeasures, as well as diversity in the design process, in order to prevent widespread bias in machine learning applications.

The ways in which AI will increasingly connect with our bodies and brains pose challenging safety and ethical concerns that will require input from a vast array of people. As Dr. Rafael Yuste of Columbia University, a neuroscientist who co-authored the essay, told STAT, “the ethical thinking has been insufficient. Science is advancing to the point where suddenly you can do things you never would have thought possible.”

MIRI’s November 2017 Newsletter

Eliezer Yudkowsky has written a new book on civilizational dysfunction and outperformance: Inadequate Equilibria: Where and How Civilizations Get Stuck. The full book will be available in print and electronic formats November 16. To preorder the ebook or sign up for updates, visit equilibriabook.com.

We’re posting the full contents online in stages over the next two weeks. The first two chapters are:

  1. Inadequacy and Modesty (discussion: LessWrong, EA Forum, Hacker News)
  2. An Equilibrium of No Free Energy (discussion: LessWrong, EA Forum)

Scientists to Congress: The Iran Deal is a Keeper

The following article was written by Dr. Lisbeth Gronlund and originally posted on the Union of Concerned Scientists blog.

The July 2015 Iran Deal, which places strict, verified restrictions on Iran’s nuclear activities, is again under attack by President Trump. This time he’s kicked responsibility over to Congress to “fix” the agreement and promised that if Congress fails to do so, he will withdraw from it.

As the New York Times reported, in response to this development over 90 prominent scientists sent a letter to leading members of Congress yesterday urging them to support the Iran Deal—making the case that continued US participation will enhance US security.

Many of these scientists also signed a letter strongly supporting the Iran Deal to President Obama in August 2015, as well as a letter to President-elect Trump in January. In all three cases, the first signatory is Richard L. Garwin, a long-standing UCS board member who helped develop the H-bomb as a young man and has since advised the government on a wide range of security issues. Last year, he was awarded the Presidential Medal of Freedom.

What’s the Deal?

If President Trump did pull out of the agreement, what would that mean? First, the Joint Comprehensive Plan of Action (JCPoA) (as it is formally named) is not an agreement between just Iran and the US—but also includes China, France, Germany, Russia, the UK, and the European Union. So the agreement will continue—unless Iran responds by quitting as well. (More on that later.)

The Iran Deal is not a treaty, and did not require Senate ratification. Instead, the United States participates in the JCPoA by presidential action. However, Congress wanted to get in on the act and passed the Iran Nuclear Agreement Review Act of 2015, which requires the president to certify every 90 days that Iran remains in compliance.

President Trump has done so twice, but declined to do so this month and instead called for Congress—and US allies—to work with the administration “to address the deal’s many serious flaws.” Among those supposed flaws is that the deal covering Iran’s nuclear activities does not also cover its missile activities!

According to President Trump’s October 13 remarks:

Key House and Senate leaders are drafting legislation that would amend the Iran Nuclear Agreement Review Act to strengthen enforcement, prevent Iran from developing an inter– —this is so totally important—an intercontinental ballistic missile, and make all restrictions on Iran’s nuclear activity permanent under US law.

The Reality

First, according to the International Atomic Energy Agency, which verifies the agreement, Iran remains in compliance. This was echoed by Norman Roule, who retired this month after working at the CIA for three decades. He served as the point person for US intelligence on Iran under multiple administrations. He told an NPR interviewer, “I believe we can have confidence in the International Atomic Energy Agency’s efforts.”

Second, the Iran Deal was the product of several years of negotiations. Not surprisingly, recent statements by the United Kingdom, France, Germany, the European Union, and Iran make clear that they will not agree to renegotiate the agreement. It just won’t happen. US allies are highly supportive of the Iran Deal.

Third, Congress can change US law by amending the Iran Nuclear Agreement Review Act, but this will have no effect on the terms of the Iran Deal. This may be a face-saving way for President Trump to stay with the agreement—for now. However, such amendments will lay the groundwork for a future withdrawal and give credence to President Trump’s claims that the agreement is a “bad deal.” That’s why the scientists urged Congress to support the Iran Deal as it is.

The End of a Good Deal?

If President Trump pulls out of the Iran Deal and reimposes sanctions against Iran, our allies will urge Iran to stay with the deal. But Iran has its own hardliners who want to leave the deal—and a US withdrawal is exactly what they are hoping for.

If Iran leaves the agreement, President Trump will have a lot to answer for. Here is an agreement that significantly extends the time it would take for Iran to produce enough material for a nuclear weapon, and that would alert the world if Iran started to do so. For the United States to throw that out the window would be deeply irresponsible. It would undermine not just its own security, but that of Iran’s neighbors and the rest of the world.

Congress should do all it can to prevent this outcome. The scientists sent their letter to Senators Corker and Cardin, who are the Chairman and Ranking Member of the Senate Foreign Relations Committee, and to Representatives Royce and Engel, who are the Chairman and Ranking Member of the House Foreign Affairs Committee, because these men have a special responsibility on issues like these.

Let’s hope these four men will do what’s needed to prevent the end of a good deal—a very good deal.

55 Years After Preventing Nuclear Attack, Arkhipov Honored With Inaugural Future of Life Award

London, UK – On October 27, 1962, a soft-spoken naval officer named Vasili Arkhipov single-handedly prevented nuclear war during the height of the Cuban Missile Crisis. Arkhipov’s submarine captain, thinking their sub was under attack by American forces, wanted to launch a nuclear weapon at the ships above. Arkhipov, with the power of veto, said no, thus averting nuclear war.

Now, 55 years after his courageous actions, the Future of Life Institute has presented the Arkhipov family with the inaugural Future of Life Award to honor humanity’s late hero.

Arkhipov’s surviving family members, represented by his daughter Elena and grandson Sergei, flew into London for the ceremony, which was held at the Institution of Engineering and Technology. After explaining Arkhipov’s heroics to the audience, Max Tegmark, president of FLI, presented the Arkhipov family with their award and $50,000. Elena and Sergei were both honored by the gesture and by the overall message of the award.

Elena explained that her father “always thought that he did what he had to do and never consider his actions as heroism. … Our family is grateful for the prize and considers it as a recognition of his work and heroism. He did his part for the future so that everyone can live on our planet.”

Elena and Sergei with the Future of Life Award

The Future of Life Award seeks to recognize and reward those who take exceptional measures to safeguard the collective future of humanity. Arkhipov, whose courage and composure potentially saved billions of lives, was an obvious choice for the inaugural event.

“Vasili Arkhipov is arguably the most important person in modern history, thanks to whom October 27 2017 isn’t the 55th anniversary of World War III,” FLI president Max Tegmark explained. “We’re showing our gratitude in a way he’d have appreciated, by supporting his loved ones.”

The award also aims to foster a dialogue about the growing existential risks that humanity faces, and the people that work to mitigate them.

Jaan Tallinn, co-founder of FLI, said: “Given that this century will likely bring technologies that can be even more dangerous than nukes, we will badly need more people like Arkhipov — people who will represent humanity’s interests even in the heated moments of a crisis.”

FLI president Max Tegmark presenting the Future of Life Award to Arkhipov’s daughter, Elena, and grandson, Sergei.

 

Arkhipov’s Story

On October 27, 1962, during the Cuban Missile Crisis, eleven US Navy destroyers and the aircraft carrier USS Randolph had cornered the Soviet submarine B-59 near Cuba, in international waters outside the US “quarantine” area. Arkhipov was one of the officers on board. The crew had had no contact with Moscow for days and didn’t know whether World War III had already begun. Then the Americans started dropping small depth charges at the sub; unbeknownst to the crew, the Americans had informed Moscow that these charges were merely meant to force the sub to surface and leave.

“We thought – that’s it – the end”, crewmember V.P. Orlov recalled. “It felt like you were sitting in a metal barrel, which somebody is constantly blasting with a sledgehammer.”

What the Americans didn’t know was that the B-59 crew had a nuclear torpedo that they were authorized to launch without clearing it with Moscow. As the depth charges intensified and temperatures onboard climbed above 45°C (113°F), many crew members fainted from carbon dioxide poisoning, and in the midst of this panic, Captain Savitsky decided to launch their nuclear weapon.

“Maybe the war has already started up there,” he shouted. “We’re gonna blast them now! We will die, but we will sink them all – we will not disgrace our Navy!”

The combination of depth charges, extreme heat, stress, and isolation from the outside world almost lit the fuse of full-scale nuclear war. But it didn’t. The decision to launch a nuclear weapon had to be authorized by three officers on board, and one of them, Vasili Arkhipov, said no.

Amidst the panic, the 34-year old Arkhipov remained calm and tried to talk Captain Savitsky down. He eventually convinced Savitsky that these depth charges were signals for the Soviet submarine to surface, and the sub surfaced safely and headed north, back to the Soviet Union.

It is sobering that very few people have heard of Arkhipov, although his decision was perhaps the most valuable individual contribution to human survival in modern history. PBS chronicled his heroism in the documentary The Man Who Saved the World, and National Geographic profiled him in an article titled “You (and Almost Everyone You Know) Owe Your Life to This Man.”

The Cold War never became a hot war, in large part thanks to Arkhipov, but the threat of nuclear war remains high. Beatrice Fihn, Executive Director of the International Campaign to Abolish Nuclear Weapons (ICAN) and this year’s recipient of the Nobel Peace Prize, hopes that the Future of Life Award will help draw attention to the current threat of nuclear weapons and encourage more people to stand up to that threat. Fihn explains: “Arkhipov’s story shows how close to nuclear catastrophe we have been in the past. And as the risk of nuclear war is on the rise right now, all states must urgently join the Treaty on the Prohibition of Nuclear Weapons to prevent such catastrophe.”

Of her father’s role in preventing nuclear catastrophe, Elena explained: “We must strive so that the powerful people around the world learn from Vasili’s example. Everybody with power and influence should act within their competence for world peace.”

Understanding Artificial General Intelligence — An Interview With Hiroshi Yamakawa


Artificial general intelligence (AGI) is something of a holy grail for many artificial intelligence researchers. Today’s narrow AI systems are only capable of specific tasks — such as internet searches, driving a car, or playing a video game — but none of the systems today can do all of these tasks. A single AGI would be able to accomplish a breadth and variety of cognitive tasks similar to that of people.

How close are we to developing AGI? How can we ensure that the power of AGI will benefit the world, and not just the group who develops it first? Will AGI become an existential threat for humanity, or an existential hope?

Dr. Hiroshi Yamakawa, Director of Dwango AI Laboratory, is one of the leading AGI researchers in Japan. Members of the Future of Life Institute sat down with Dr. Yamakawa and spoke with him about AGI and his lab’s progress in developing it. In this interview, Dr. Yamakawa explains how AI can model the human brain, his vision of a future where humans coexist with AGI, and why the Japanese think of AI differently than many in the West.

This transcript has been heavily edited for brevity. You can see the full conversation here.

Why did the Dwango Artificial Intelligence Laboratory make a large investment in [AGI]?

HY: The usable AI developed up to now essentially targets specific areas or addresses a particular problem. Rather than just solving a fixed set of problems from experience, AGI, we believe, will be more similar to human intelligence, able to solve various problems that were not anticipated in the design phase.

What is the advantage of the Whole Brain Architecture approach?

HY: The whole brain architecture is an engineering-based research approach “to create a human-like artificial general intelligence (AGI) by learning from the architecture of the entire brain.” Basically, this approach to building AGI is the integration of artificial neural networks and machine-learning modules while using the brain’s hard wiring as a reference.

I think it will be easier to create an AI with the same behavior and sense of values as humans this way. Even if superintelligence exceeds human intelligence in the near future, it will be comparatively easy to communicate with AI designed to think like a human, and this will be useful as machines and humans continue to live and interact with each other.

General intelligence is a function of many combined, interconnected features produced by learning, so we cannot manually break down these features into individual parts. Because of this difficulty, one meaningful characteristic of whole brain architecture is that though based on brain architecture, it is designed to be a functional assembly of parts that can still be broken down and used.

The functional parts of the brain are to some degree already present in artificial neural networks. It follows that we can build a roadmap of AGI based on these technologies as pieces and parts.

It is now said that convolutional neural networks have essentially outperformed the interaction between the temporal lobe and the visual cortex on image recognition tasks. At the same time, deep learning has been used to achieve very accurate voice recognition. In humans, the neocortex contains about 14 billion neurons, and the workings of about half of those can already be partially explained with deep learning. From this point on, we need to come closer to simulating the functions of different structures of the brain, and even short of the whole brain architecture, we need to be able to assemble several structures together to reproduce some behavior-level functions. Then, I believe, we’ll have a path to expand that development process to cover the rest of the brain’s functions and finally integrate them into a whole brain.

You also started a non-profit, the Whole Brain Architecture Initiative. How does the non-profit’s role differ from the commercial work?

HY: The Whole Brain Architecture Initiative serves as an organization that helps promote whole brain AI architecture R&D as a whole.

The Basic Ideas of the WBAI:

  • Our vision is to create a world in which AI exists in harmony with humanity.
  • Our mission is to promote the open development of whole brain architecture.
    • In order to make human-friendly artificial general intelligence a public good for all of mankind, we seek to continually expand open, collaborative efforts to develop AI based on an architecture modeled after the brain.
  • Our values are Study, Imagine and Build.
    • Study: Deepen and spread our expertise.
    • Imagine: Broaden our views through public dialogue.
    • Build: Create AGI through open collaboration.

What do you think poses the greatest existential risk to global society in the 21st century?

HY: The risk is not just limited to AI; basically, as human scientific and technological abilities expand, and we become more empowered, risks will increase, too.

Imagine a large field where everyone has only weapons as dangerous as bamboo spears: the risk that human beings would go extinct by killing each other is extremely small. On the other hand, as technologies develop, it is as if we are crowded into a very small room with bombs, and no matter who detonates one, we approach a state of annihilation. That risk should concern everyone.

If there are only 10 people in the room, they can mutually monitor and trust each other. However, imagine trusting 10 billion people, each with the ability to destroy everyone; such a scenario is beyond our ability to comprehend. Of course, technological development will advance not only offensive power but also defensive power, but it is not easy for defensive power to contain offensive power at every point in time. If scientific and technological development is accelerated by artificial intelligence, for example, many countries could easily field fleets of intercontinental ballistic missiles, and AI paired with nanotechnology could be extremely dangerous to living organisms; the development or use of such dangerous substances could amount to a scenario that extinguishes mankind. Generally speaking, new offensive weapons are developed by exploiting the progress of technology, and defensive weapons are then developed to neutralize them. Therefore, it is inevitable that there will be periods when the offensive power needed to destroy humanity exceeds its defensive power.

What do you think is the greatest benefit that AGI can bring society?

HY: AGI’s greatest benefit will come from accelerating the development of science and technology. More sophisticated technology will offer solutions for global challenges such as environmental issues, food problems, and space colonization.

Here I would like to share my vision for the future: “In a desirable future, the happiness of all humans will be balanced against the survival of humankind under the support of superintelligence. In that future, society will be an ecosystem formed by augmented human beings and various public AIs, in what I dub ‘an ecosystem of shared intelligent agents’ (EcSIA).

“Although no human can completely understand EcSIA—it is too complex and vast—humans can control its basic directions. In implementing such control, the grace and wealth that EcSIA affords needs to be properly distributed to everyone.”

Assuming no global catastrophe halts progress, what are the odds of human level AGI in the next 10 years?

HY: I think there’s a possibility that it can happen soon, but taking the average of the estimates of people involved in WBAI, we came up with 2030.

In my current role as the editorial chairman for the Japanese Society for Artificial Intelligence (JSAI) journal, I’m promoting a plan to have a series of discussions, starting in the July edition, on the theme of “Singularity and AI,” in which AI specialists will discuss the singularity from a technical viewpoint. I want to help spread calm, technical views on the issue in this way, starting in Japan.

Once human level AGI is achieved, how long would you expect it to take for it to self-modify its way up to massive superhuman intelligence?

HY: If human-level AGI is achieved, it could take on the role of an AI researcher itself. Therefore, immediately after the AGI is built, it could start rapidly cultivating great numbers of AI-researcher AIs that work 24/7, and AI R&D would be drastically accelerated.

What probability do you assign to negative consequences as a result of badly done AI design or operation?

HY: If you include the risk of something like some company losing a lot of money, that will definitely happen.

The range of things that can be done with AI is becoming wider, and the disparity will widen between those who profit from it and those who do not. When that happens, the bad economic situation will give rise to dissatisfaction with the system, and that could create a breeding ground for war and strife. This could be perceived as the evils brought about by capitalism. It’s important that we try to curtail the causes of instability as much as possible.

Is it too soon for us to be researching AI Safety?

HY: I do not think it is at all too early to act for safety, and I think we should progress forward quickly. If possible, we should have several methods to be able to calculate the existential risk brought about by AGI.

Is there anything you think that the AI research community should be more aware of, more open about, or taking more action on?

HY: There are a number of actions that are clearly necessary. Based on this notion, we have established several measures, such as the Japanese Society for Artificial Intelligence’s ethics committee in May 2015 (http://ai-elsi.org/ [in Japanese]) and its subsequent Ethical Guidelines for AI researchers (http://ai-elsi.org/archives/514).

A majority of the content of these ethical guidelines expresses the standpoint that researchers should move forward with research that contributes to humanity and society. Additionally, one special characteristic of these guidelines is that the ninth principle listed, a call for ethical compliance of AI itself, states that AI in the future should also abide by the same ethical principles as AI researchers.

Japan, as a society, seems more welcoming of automation. Do you think the Japanese view of AI is different than that in the West?

HY: If we look at things from the standpoint of a moral society, we are all human; without even taking the viewpoint of one country or another, we should start with the mentality that we have more characteristics in common than differences.

When looking at AI from the traditional background of Japan, there is a strong influence from beliefs that spirits or “kami” are dwelling in all things. The boundary between living things and humans is relatively unclear, and along the same lines, the same boundaries for AI and robots are unclear. For this reason, in the past, robotic characters like “Tetsuwan Atom” (Astro Boy) and Doraemon were depicted as living and existing in the same world as humans, a theme that has been pervasive in Japanese anime for a long time.

From here on out, we will see humans and AI not as separate entities. Rather I think we will see the appearance of new combinations of AI and humans. Becoming more diverse in this way will certainly improve our chances of survival.

As a very personal view, I think that “surviving intelligence” is something that should be preserved into the future, because I feel it is very fortunate that we have established an intelligent society now, beyond the stormy sea of evolution. Imagine a future in which humanity is living with intelligent extraterrestrials after first contact. We would start caring not only about the survival of humanity but also about the survival of those intelligent extraterrestrials. If that happens, one future scenario is that our dominant values will extend to the survival of intelligence rather than the survival of the human race itself.

Hiroshi Yamakawa is the Director of Dwango AI Laboratory, Director and Chief Editor of the Japanese Society for Artificial Intelligence, a Fellow Researcher at the Brain Science Institute at Tamagawa University, and the Chairperson of the Whole Brain Architecture Initiative. He specializes in cognitive architecture, concept acquisition, neuro-computing, and opinion collection. He is one of the leading researchers working on AGI in Japan.

To learn more about Dr. Yamakawa’s work, you can read the full interview transcript here.

This interview was prepared by Eric Gastfriend, Jason Orlosky, Mamiko Matsumoto, Benjamin Peterson, Kazue Evans, and Tucker Davey. Original interview date: April 5, 2017. 

DeepMind’s AlphaGo Zero Becomes Go Champion Without Human Input

DeepMind’s AlphaGo Zero AI program just became the Go champion of the world without human data or guidance. This new system marks a significant technological jump from the AlphaGo program which beat Go champion Lee Sedol in 2016.

The game of Go has been played for more than 2,500 years and is widely viewed as not only a game, but a complex art form.  And a popular one at that. When the artificially intelligent AlphaGo from DeepMind played its first game against Sedol in March 2016, 60 million viewers tuned in to watch in China alone. AlphaGo went on to win four of five games, surprising the world and signifying a major achievement in AI research.

Unlike the chess match between Deep Blue and Garry Kasparov in 1997, AlphaGo did not win by brute-force computing alone. The more complex programming of AlphaGo amazed viewers not only with the excellence of its play, but also with its creativity. The famous “move 37” in game two was described by Go player Fan Hui as “So beautiful.” It was also so unusual that one of the commentators thought it was a mistake. Fan Hui explained, “It’s not a human move. I’ve never seen a human play this move.”

In other words, AlphaGo not only signified an iconic technological achievement, but also shook deeply held social and cultural beliefs about mastery and creativity. Yet, it turns out that AlphaGo was only the beginning. Today, DeepMind announced AlphaGo Zero.

Unlike AlphaGo, AlphaGo Zero was not shown a single human game of Go from which to learn. It learned entirely from playing against itself, with no prior knowledge of the game. Its first games were essentially random, but the system used what DeepMind calls a novel form of reinforcement learning, combining a neural network with a powerful search algorithm, to improve each time it played.
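The loop described here, play against yourself, record the outcome, and use it to improve the policy that guides the next game, can be illustrated with a heavily simplified Python sketch. Everything below is a stand-in: a toy game replaces Go, a value table replaces the neural network, and a greedy lookup replaces the search algorithm. It shows only the shape of self-play reinforcement learning, not DeepMind’s actual system.

    import random
    from collections import defaultdict

    class TinyGame:
        """Toy stand-in for Go: players alternately add 1 or 2; reaching 10 wins."""
        def __init__(self):
            self.total, self.player = 0, +1
        def legal_moves(self):
            return [1, 2]
        def play(self, move):
            self.total += move
            winner = self.player if self.total >= 10 else 0
            self.player = -self.player
            return winner

    value = defaultdict(float)  # stand-in for the neural network's value estimates

    def choose_move(game):
        """Stand-in for the search: prefer moves whose resulting state has paid off before."""
        mover = game.player
        return max(game.legal_moves(),
                   key=lambda m: value[(game.total + m, mover)] + 0.1 * random.random())

    def self_play_episode():
        game, history, winner = TinyGame(), [], 0
        while winner == 0:
            mover = game.player
            move = choose_move(game)
            history.append((game.total + move, mover))
            winner = game.play(move)
        # Reinforcement step: nudge each visited state's value toward the observed outcome.
        for state, mover in history:
            target = 1.0 if mover == winner else 0.0
            value[(state, mover)] += 0.1 * (target - value[(state, mover)])

    for _ in range(2000):  # the agent improves purely by playing against itself
        self_play_episode()

AlphaGo Zero replaces the value table with a deep neural network and the greedy lookup with Monte Carlo tree search, but the overall self-improvement loop has the same shape.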

In a DeepMind blog about the announcement, the authors write, “This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself.”

Though previous AIs from DeepMind have mastered Atari games without human input, as the authors of the Nature article note, “the game of Go, widely viewed as the grand challenge for artificial intelligence, [requires] a precise and sophisticated lookahead in vast search spaces.” While the old Atari games were much more straightforward, the new AI system for AlphaGo Zero had to master the strategy for immediate moves, as well as how to anticipate moves that might be played far into the future.

That this was done all without human demonstrations also takes the program a step beyond the original AlphaGo systems. But in addition to that, this new system learned with fewer input features than its predecessors, and while the original AlphaGo systems required two separate neural networks, AlphaGo Zero was built with only one.

AlphaGo Zero is not marginally better than its predecessor; it is in an entirely new class of “superhuman performance,” with an intelligence that is notably more general. After just three days of playing against itself (4.9 million games), AlphaGo Zero beat AlphaGo by 100 games to 0. It independently learned the ancient secrets of the Go masters, and it also chose moves and developed strategies never before seen among human players.

Co-founder and CEO of DeepMind, Demis Hassabis, said: “It’s amazing to see just how far AlphaGo has come in only two years. AlphaGo Zero is now the strongest version of our program and shows how much progress we can make even with less computing power and zero use of human data.”

Hassabis continued, “Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems like protein folding or designing new materials. If we can make the same progress on these problems that we have with AlphaGo, it has the potential to drive forward human understanding and positively impact all of our lives.”

ICAN Wins Nobel Peace Prize

We at FLI offer our excited congratulations to the International Campaign to Abolish Nuclear Weapons (ICAN), this year’s winner of the Nobel Peace Prize. We could not be more honored to have had the opportunity to work with ICAN during their campaign to ban nuclear weapons.

Over 70 years have passed since the bombs were first dropped on Hiroshima and Nagasaki, but finally, on July 7 of this year, 122 countries came together at the United Nations to establish a treaty outlawing nuclear weapons. Behind the effort was the small, dedicated team at ICAN, led by Beatrice Fihn. They coordinated with hundreds of NGOs in 100 countries to guide a global discussion and build international support for the ban.

In a statement, they said: “By harnessing the power of the people, we have worked to bring an end to the most destructive weapon ever created – the only weapon that poses an existential threat to all humanity.”

There’s still more work to be done to decrease nuclear stockpiles and rid the world of nuclear threats, but this incredible achievement by ICAN provides the hope and inspiration we need to make the world a safer place.

Perhaps most striking, as seen below in many of the comments by FLI members, is how such a small, passionate group was able to make such a huge difference in the world. Congratulations to everyone at ICAN!

Statements by members of FLI:

Anthony Aguirre: “The work of Bea inspiringly shows that a passionate and committed group of people working to make the world safer can actually succeed!”

Ariel Conn: “Fear and tragedy might monopolize the news lately, but behind the scenes, groups like ICAN are changing the world for the better. Bea and her small team represent great hope for the future, and they are truly an inspiration.”

Tucker Davey: “It’s easy to feel hopeless about the nuclear threat, but Bea and the dedicated ICAN team have clearly demonstrated that a small group can make a difference. Passing the nuclear ban treaty is a huge step towards a safer world, and I hope ICAN’s Nobel Prize inspires others to tackle this urgent threat.”

Victoria Krakovna: “Bea’s dedicated efforts to protect humanity from itself are an inspiration to us all.”

Richard Mallah: “Bea and ICAN have shown such dedication in working to curb the ability of a handful of us to kill most of the rest of us.”

Lucas Perry: “For me, Bea and ICAN have beautifully proven and embodied Margaret Mead’s famous quote, ‘Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has.’”

David Stanley: “The work taken on by ICAN’s team is often not glamorous, yet they have acted tirelessly for the past 10 years to protect us all from these abhorrent weapons. They are the few to whom so much is owed.”

Max Tegmark: “It’s been an honor and a pleasure collaborating with ICAN, and the attention brought by this Nobel Prize will help the urgently needed efforts to stigmatize the new nuclear arms race.”

Learn more about the treaty here.

The Future of Humanity Institute Releases Three Papers on Biorisks

Earlier this month, the Future of Humanity Institute (FHI) released three new papers that assess global catastrophic and existential biosecurity risks and offer a cost-benefit analysis of various approaches to dealing with these risks.

The work – done by Piers Millett, Andrew Snyder-Beattie, Sebastian Farquhar, and Owen Cotton-Barratt – looks at what the greatest risks might be, how cost-effective they are to address, and how funding agencies can approach high-risk research.

In one paper, Human Agency and Global Catastrophic Biorisks, Millett and Snyder-Beattie suggest that “the vast majority of global catastrophic biological risk (GCBR) comes from human agency rather than natural sources.” This risk could grow as future technologies allow us to further manipulate our environment and biology. The authors list many of today’s known biological risks, but they also highlight how unknown risks could easily arise in the future as technology advances. They call for a GCBR community that will provide “a space for overlapping interests between the health security communities and the global catastrophic risk communities.”

Millett and Snyder-Beattie also authored the paper, Existential Risk and Cost-Effective Biosecurity. This paper looks at the existential threat of future bioweapons to assess whether the risks are high enough to justify investing in threat-mitigation efforts. They consider a spectrum of biosecurity risks, including biocrimes, bioterrorism, and biowarfare, and they look at three models to estimate the risk of extinction from these weapons. As they state in their conclusion: “Although the probability of human extinction from bioweapons may be extremely low, the expected value of reducing the risk (even by a small amount) is still very large, since such risks jeopardize the existence of all future human lives.”

The third paper is Pricing Externalities to Balance Public Risks and Benefits of Research, by Farquhar, Cotton-Barratt, and Snyder-Beattie. Here they consider how scientific funders should “evaluate research with public health risks.” The work was inspired by the controversy surrounding the “gain-of-function” experiments performed on the H5N1 flu virus. The authors propose an approach that translates an estimate of the risk into a financial price, which “can then be included in the cost of the research.” They conclude with the argument that the “approaches discussed would work by aligning the incentives for scientists and for funding bodies more closely with those of society as a whole.”
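As a rough illustration of the pricing idea (a toy sketch, not the authors’ actual model), the externality price can be thought of as the probability of a harmful outcome multiplied by the expected damages, with the result added to the project’s direct cost. All numbers below are made up.

    # Toy illustration of pricing a research project's public risk.
    # Every figure here is hypothetical; none are taken from the paper.
    def risk_price(p_harm: float, expected_damage_usd: float) -> float:
        """Expected externality cost: probability of harm times its magnitude."""
        return p_harm * expected_damage_usd

    direct_cost = 500_000                       # hypothetical grant budget
    externality = risk_price(
        p_harm=1e-6,                            # hypothetical chance of a harmful release
        expected_damage_usd=10_000_000_000,     # hypothetical damages if it happens
    )
    total_cost = direct_cost + externality

    print(f"risk price: ${externality:,.0f}")         # $10,000 in this made-up example
    print(f"total priced cost: ${total_cost:,.0f}")   # $510,000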

START from the Beginning: 25 Years of US-Russian Nuclear Weapons Reductions

By Eryn MacDonald and originally posted at the Union of Concerned Scientists.

For the past 25 years, a series of treaties have allowed the US and Russia to greatly reduce their nuclear arsenals—from well over 10,000 each to fewer than 2,000 deployed long-range weapons each. These Strategic Arms Reduction Treaties (START) have enhanced US security by reducing the nuclear threat, providing valuable information about Russia’s nuclear arsenal, and improving predictability and stability in the US-Russia strategic relationship.

Twenty-five years ago, US policy-makers of both parties recognized the benefits of the first START agreement: on October 1, 1992, the Senate voted overwhelmingly—93 to 6—in favor of ratifying START I.

The end of START?

With increased tensions between the US and Russia and an expanded range of security threats for the US to worry about, this longstanding foundation is now more valuable than ever.

The most recent agreement—New START—will expire in early February 2021, but can be extended for another five years if the US and Russian presidents agree to do so. In a January 28 phone call with President Trump, Russian President Putin reportedly raised the possibility of extending the treaty. But instead of being extended, or even maintained, the START framework is now in danger of being abandoned.

President Trump has called New START “one-sided” and “a bad deal,” and has even suggested the US might withdraw from the treaty. His advisors, however, are clearly opposed to withdrawal. Secretary of State Rex Tillerson expressed support for New START in his confirmation hearing. Secretary of Defense James Mattis, while recently stating that the administration is currently reviewing the treaty “to determine whether it’s a good idea,” has previously also expressed support, as have the head of US Strategic Command and other military officials.

Withdrawal seems unlikely, especially given recent anonymous comments by administration officials saying that the US still sees value in New START and is not looking to discard it. But given the president’s attitude toward the treaty, it may still take some serious pushing from Mattis and other military officials to convince him to extend it. Worse, even if Trump is not re-elected, and the incoming president is more supportive of the treaty, there will be little time for a new administration, taking office in late January 2021, to do an assessment and sign on to an extension before the deadline. While UCS and other treaty supporters will urge the incoming administration to act quickly, if the Trump administration does not extend the treaty, it is quite possible that New START—and the security benefits it provides—will lapse.

The Beginning: The Basics and Benefits of START I

The overwhelming bipartisan support for a treaty cutting US nuclear weapons, demonstrated by the START I ratification vote, seems unbelievable today. At the time, however, both Democrats and Republicans in Congress, as well as the first President Bush, recognized the importance of the historic agreement, the first to require an actual reduction, rather than simply a limitation, in the number of US and Russian strategic nuclear weapons.

By the end of the Cold War, the US had about 23,000 nuclear warheads in its arsenal, and the Soviet Union had roughly 40,000. These numbers included about 12,000 US and 11,000 Soviet deployed strategic warheads—those mounted on long-range missiles and bombers. The treaty limited each country to 1,600 strategic missiles and bombers and 6,000 warheads, and established procedures for verifying these limits.

The limits on missiles and bombers, in addition to limits on the warheads themselves, were significant because START required the verifiable destruction of any excess delivery vehicles, which gave each side confidence that the reductions could not be quickly or easily reversed. To do this, the treaty established a robust verification regime with an unprecedented level of intrusiveness, including on-site inspections and exchanges of data about missile telemetry.

Though the groundwork for START I was laid during the Reagan administration, ratification and implementation took place during the first President Bush’s term. The treaty was one among several measures taken by the elder Bush that reduced the US nuclear stockpile by nearly 50 percent during his time in office.

START I entered into force in 1994 and had a 15-year lifetime; it required the US and Russia to complete reductions by 2001, and maintain those reductions until 2009. However, both countries actually continued reductions after reaching the START I limits. By the end of the Bush I administration, the US had already reduced its arsenal to just over 7,000 deployed strategic warheads. By the time the treaty expired, this number had fallen to roughly 3,900.

The Legacy of START I

Building on the success of START I, the US and Russia negotiated a follow-on treaty—START II—that required further cuts in deployed strategic weapons. These reductions were to be carried out in two steps, but when fully implemented would limit each country to 3,500 deployed strategic warheads, with no more than 1,750 of these on submarine-launched ballistic missiles.

Phase II also required the complete elimination of multiple independently targetable re-entry vehicles (MIRVs) on intercontinental ballistic missiles. This marked a major step forward, because MIRVs were a particularly destabilizing configuration. Since just one incoming warhead could destroy all the warheads on a MIRVed land-based missile, MIRVs create pressure to “use them or lose them”—an incentive to strike first in a crisis. Otherwise, a country risked losing its ability to use those missiles to retaliate against a first strike.

While both sides ratified START II, it was a long and contentious process, and entry into force was complicated by provisions attached by both the US Senate and Russian Duma. The US withdrawal from the Anti-Ballistic Missile (ABM) treaty in 2002 was the kiss of death for START II. The ABM treaty had strictly limited missile defenses. Removing this limit created a situation in which either side might feel it had to deploy more and more weapons to be sure it could overcome the other’s defense. But the George W. Bush administration was now committed to building a larger-scale defense, regardless of Russia’s vocal opposition and clear statements that doing so would undermine arms control progress.

Russia responded by announcing its withdrawal from START II, finally ending efforts to bring the treaty into force. A proposed START III treaty, which would have called for further reductions to 2,000 to 2,500 warheads on each side, never materialized; negotiations had been planned to begin after entry into force of START II.

After the failure of START II, the US and Russia negotiated the Strategic Offensive Reductions Treaty (SORT, often called the “Moscow Treaty”). SORT required each party to reduce to 1,700 to 2,200 deployed strategic warheads, but was a much less formal treaty than START. It did not include the same kind of extensive verification regime and, in fact, did not even define what was considered a “strategic warhead,” instead leaving each party to decide for itself what it would count. This meant that although SORT did encourage further progress to lower numbers of weapons, overall it did not provide the same kind of benefits for the US as START had.

New START

Recognizing the deficiencies of the minimal SORT agreement, the Obama administration made negotiation of New START an early priority, and the treaty was ratified in 2010.

New START limits each party to 1,550 deployed strategic nuclear warheads by February 2018. The treaty also limits the number of deployed intercontinental ballistic missiles, submarine-launched ballistic missiles, and long-range bombers equipped to carry nuclear weapons to no more than 700 on each side. Altogether, no more than 800 deployed and non-deployed missiles and bombers are allowed for each side.

In reality, each country will deploy somewhat more than 1,550 warheads—probably around 1,800 each—because of a change in the way New START counts warheads carried by long-range bombers. START I assigned a number of warheads to each bomber based on its capabilities. New START simply counts each long-range bomber as a single warhead, regardless of the actual number it does or could carry. The less stringent limits on bombers are possible because bombers are considered less destabilizing than missiles. The bombers’ detectability and long flight times—measured in hours vs. the roughly thirty minutes it takes for a missile to fly between the United States and Russia—mean that neither side is likely to use them to launch a first strike.
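The counting rule is easier to see with numbers. The force mix below is hypothetical, chosen only to show how a treaty-counted total of 1,550 can correspond to roughly 1,800 or more weapons actually deployed, as noted above.

    # Hypothetical force mix illustrating the New START counting rule:
    # missile warheads count one-for-one, but each deployed heavy bomber
    # counts as a single warhead regardless of how many weapons it carries.
    missile_warheads = 1_490     # hypothetical warheads on deployed ICBMs and SLBMs
    bombers_deployed = 60        # hypothetical number of deployed heavy bombers
    weapons_per_bomber = 6       # hypothetical actual loadout per bomber

    treaty_counted = missile_warheads + bombers_deployed * 1
    actually_deployed = missile_warheads + bombers_deployed * weapons_per_bomber

    print(f"treaty-counted warheads: {treaty_counted}")       # 1550, right at the limit
    print(f"weapons actually deployed: {actually_deployed}")  # 1850 in this example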

Both the United States and Russia have been moving toward compliance with the New START limits, and as of July 1, 2017—when the most recent official exchange of data took place—both are under the limit for deployed strategic delivery vehicles and close to meeting the limit for deployed and non-deployed strategic delivery vehicles. The data show that the United States is currently slightly under the limit for deployed strategic warheads, at 1,411, while Russia, with 1,765, still has some cuts to make to reach this limit.

Even in the increasingly partisan atmosphere of the 2000s, New START gained support from a wide range of senators, as well as military leaders and national security experts. The treaty passed in the Senate with a vote of 71 to 26; thirteen Republicans joined all Democratic senators in voting in favor. While this is significantly closer than the START I vote, as then-Senator John F. Kerry noted at the time, “in today’s Senate, 70 votes is yesterday’s 95.”

And the treaty continues to have strong support—including from Air Force General John Hyten, commander of US Strategic Command, which is responsible for all US nuclear forces. In Congressional testimony earlier this year, Hyten called himself “a big supporter” of New START and said that “when it comes to nuclear weapons and nuclear capabilities, that bilateral, verifiable arms control agreements are essential to our ability to provide an effective deterrent.” Another Air Force general, Paul Selva, vice chair of the Joint Chiefs of Staff, agreed, saying in the same hearing that when New START was ratified in 2010, “the Joint Chiefs reviewed the components of the treaty—and endorsed it. It is a bilateral, verifiable agreement that gives us some degree of predictability on what our potential adversaries look like.”

The military understands the benefits of New START. That President Trump has the power to withdraw from the treaty despite support from those who are most directly affected by it is, as he would say, “SAD.”

That the US president fails to understand the value of US-Russian nuclear weapon treaties that have helped to maintain stability for more than two decades is a travesty.

Explainable AI: a discussion with Dan Weld

Machine learning systems are confusing – just ask any AI researcher. Their deep neural networks operate incredibly quickly, considering thousands of possibilities in seconds before making decisions. The human brain simply can’t keep up.

When people learn to play Go, instructors can challenge their decisions and hear their explanations. Through this interaction, teachers determine the limits of a student’s understanding. But DeepMind’s AlphaGo, which recently beat the world’s best Go players, can’t answer such questions. When AlphaGo makes an unexpected decision, it’s difficult to understand why it made that choice.

Admittedly, the stakes are low with AlphaGo: no one gets hurt if it makes an unexpected move and loses. But deploying intelligent machines that we can’t understand could set a dangerous precedent.

According to computer scientist Dan Weld, understanding and trusting machines is “the key problem to solve” in AI safety, and it’s necessary today. He explains, “Since machine learning is at the core of pretty much every AI success story, it’s really important for us to be able to understand what it is that the machine learned.”

As machine learning (ML) systems assume greater control in healthcare, transportation, and finance, trusting their decisions becomes increasingly important. If researchers can program AIs to explain their decisions and answer questions, as Weld is trying to do, we can better assess whether they will operate safely on their own.

 

Teaching Machines to Explain Themselves

Weld has worked on techniques that expose blind spots in ML systems, or “unknown unknowns.”

When an ML system faces a “known unknown,” it recognizes that it is uncertain about the situation. When it encounters an unknown unknown, however, it doesn’t even recognize that the situation is uncertain: the system has extremely high confidence that its result is correct, yet it is wrong. Often, classifiers have this confidence because they were “trained on data that had some regularity in it that’s not reflected in the real world,” Weld says.

Consider an ML system that has been trained to classify images of dogs, but has only been trained on images of brown and black dogs. If this system sees a white dog for the first time, it might confidently assert that it’s not a dog. This is an “unknown unknown” – trained on incomplete data, the classifier has no idea that it’s completely wrong.

ML systems can be programmed to ask for human oversight on known unknowns, but since they don’t recognize unknown unknowns, they can’t ask for help with them. Weld’s research team is developing techniques to surface these hidden failure modes, and he believes the work will complement explainability. “After finding unknown unknowns, the next thing the human probably wants is to know WHY the learner made those mistakes, and why it was so confident,” he explains.
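To make the distinction concrete, here is a minimal sketch of the standard confidence-threshold safeguard (my own illustration, not Weld's system); the predict_proba, ask_human, and threshold names are placeholders supplied by the caller. A known unknown trips the threshold and gets routed to a person; an unknown unknown, by definition, sails right past it.

```python
from typing import Callable, Sequence

def classify_with_oversight(
    predict_proba: Callable[[Sequence[float]], Sequence[float]],  # model's class probabilities
    ask_human: Callable[[Sequence[float]], int],                  # human-review hook
    x: Sequence[float],
    threshold: float = 0.9,
) -> int:
    """Return the model's label, deferring to a human when confidence is low."""
    probs = list(predict_proba(x))
    confidence = max(probs)
    if confidence < threshold:
        # "Known unknown": the model itself signals doubt, so a person decides.
        return ask_human(x)
    # "Unknown unknown": a confidently wrong prediction passes this check untouched,
    # which is why Weld's group hunts for such blind spots separately.
    return probs.index(confidence)
```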

Machines don’t “think” like humans do, but that doesn’t mean researchers can’t engineer them to explain their decisions.

One research group jointly trained an ML classifier to recognize images of birds and generate captions. If the AI recognizes a toucan, for example, the researchers can ask “why,” and the neural net can generate an explanation: the huge, colorful bill indicated a toucan.

While AI developers will prefer certain concepts explained graphically, consumers will need these interactions to involve natural language and more simplified explanations. “Any explanation is built on simplifying assumptions, but there’s a tricky judgment question about what simplifying assumptions are OK to make. Different audiences want different levels of detail,” says Weld.

Explaining the bird’s huge, colorful bill might suffice in image recognition tasks, but with medical diagnoses and financial trades, researchers and users will want more. Like a teacher-student relationship, human and machine should be able to discuss what the AI has learned and where it still needs work, drilling down on details when necessary.

“We want to find mistakes in their reasoning, understand why they’re making these mistakes, and then work towards correcting them,” Weld adds.    

 

Managing Unpredictable Behavior

Yet, ML systems will inevitably surprise researchers. Weld explains, “The system can and will find some way of achieving its objective that’s different from what you thought.”

Governments and businesses can’t afford to deploy highly intelligent AI systems that make unexpected, harmful decisions, especially if these systems control the stock market, power grids, or data privacy. To control this unpredictability, Weld wants to engineer AIs to get approval from humans before executing novel plans.

“It’s a judgment call,” he says. “If it has seen humans executing actions 1-3, then that’s a normal thing. On the other hand, if it comes up with some especially clever way of achieving the goal by executing this rarely-used action number 5, maybe it should run that one by a live human being.”

Over time, this process will create norms for AIs, as they learn which actions are safe and which actions need confirmation.

 

Implications for Current AI Systems

The people who use AI systems often misunderstand their limitations. A doctor using an AI to catch disease hasn’t trained the AI and has little insight into how its machine learning reaches a conclusion. And the AI system, not programmed to explain its decisions, can’t communicate problems to the doctor.

Weld wants to see an AI system that interacts with a pre-trained ML system and learns how the pre-trained system might fail. This system could analyze the doctor’s new diagnostic software to find its blind spots, such as its unknown unknowns. Explainable AI software could then enable the AI to converse with the doctor, answering questions and clarifying uncertainties.

And the applications extend to finance algorithms, personal assistants, self-driving cars, and even predicting recidivism in the legal system, where explanation could help root out bias. ML systems are so complex that humans may never be able to understand them completely, but this back-and-forth dialogue is a crucial first step.

“I think it’s really about trust and how can we build more trustworthy AI systems,” Weld explains. “The more you interact with something, the more shared experience you have, the more you can talk about what’s going on. I think all those things rightfully build trust.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Artificial Intelligence: The Challenge to Keep It Safe

Safety Principle: AI systems should be safe and secure throughout their operational lifetime, and verifiably so where applicable and feasible.

When a new car is introduced to the world, it must pass various safety tests to satisfy not just government regulations, but also public expectations. In fact, safety has become a top selling point among car buyers.

And it’s not just cars. Whatever the latest generation of any technology happens to be — from appliances to airplanes — manufacturers know that customers expect their products to be safe from start to finish.

Artificial intelligence is no different. So, on the face of it, the Safety Principle seems like a “no brainer,” as Harvard psychologist Joshua Greene described it. It’s obviously not in anyone’s best interest for an AI product to injure its owner or anyone else. But, as Greene and other researchers highlight below, this principle is much more complex than it appears at first glance.

“This is important, obviously,” said University of Connecticut philosopher Susan Schneider, but she expressed uncertainty about our ability to verify that we can trust a system as it gets increasingly intelligent. She pointed out that at a certain level of intelligence, the AI will be able to rewrite its own code, and with superintelligent systems “we may not even be able to understand the program to begin with.”

What Is AI Safety?

This principle gets to the heart of the AI safety research initiative: how can we ensure safety for a technology that is designed to learn how to modify its own behavior?

Artificial intelligence is designed so that it can learn from interactions with its surroundings and alter its behavior accordingly, which could provide incredible benefits to humanity. Because AI can address so many problems more effectively than people, it has huge potential to improve health and wellbeing for everyone. But it’s not hard to imagine how this technology could go awry. And we don’t need to achieve superintelligence for this to become a problem.

Microsoft’s chatbot, Tay, is a recent example of how an AI can learn negative behavior from its environment, producing results quite the opposite of what its creators had in mind. Meanwhile, the Tesla car accident, in which the vehicle mistook a white truck for a clear sky, offers an example of an AI misunderstanding its surroundings and taking deadly action as a result.

Researchers can try to learn from AI gone astray, but current designs often lack transparency, and much of today’s artificial intelligence is essentially a black box. AI developers can’t always figure out how or why AIs take various actions, and this will likely only grow more challenging as AI becomes more complex.

However, Ian Goodfellow, a research scientist at Google Brain, is hopeful, pointing to efforts already underway to address these concerns.

“Applying traditional security techniques to AI gives us a concrete path to achieving AI safety,” Goodfellow explains. “If we can design a method that prevents even a malicious attacker from causing an AI to take an undesirable action, then it is even less likely that the AI would choose an undesirable action independently.”

AI safety may be a challenge, but there’s no reason to believe it’s insurmountable. So what do other AI experts say about how we can interpret and implement the Safety Principle?

What Does ‘Verifiably’ Mean?

‘Verifiably’ was the word that caught the eye of many researchers as a crucial part of this Principle.

John Havens, an Executive Director with IEEE, first considered the Safety Principle in its entirety, saying,  “I don’t know who wouldn’t say AI systems should be safe and secure. … ‘Throughout their operational lifetime’ is actually the more important part of the sentence, because that’s about sustainability and longevity.”

But then, he added, “My favorite part of the sentence is ‘and verifiably so.’ That is critical. Because that means, even if you and I don’t agree on what ‘safe and secure’ means, but we do agree on verifiability, then you can go, ‘well, here’s my certification, here’s my checklist.’ And I can go, ‘Great, thanks.’ I can look at it, and say, ‘oh, I see you got things 1-10, but what about 11-15?’ Verifiably is a critical part of that sentence.”

AI researcher Susan Craw noted that the Principle “is linked to transparency.” She explained, “Maybe ‘verifiably so’ would be possible with systems if they were a bit more transparent about how they were doing things.”

Greene also noted the complexity and challenge presented by the Principle when he suggested:

“It depends what you mean by ‘verifiably.’ Does ‘verifiably’ mean mathematically, logically proven? That might be impossible. Does ‘verifiably’ mean you’ve taken some measures to show that a good outcome is most likely? If you’re talking about a small risk of a catastrophic outcome, maybe that’s not good enough.”

Safety and Value Alignment

Any consideration of AI safety must also include value alignment: how can we design artificial intelligence that can align with the global diversity of human values, especially taking into account that, often, what we ask for is not necessarily what we want?

“Safety is not just a technical problem,” Patrick Lin, a philosopher at California Polytechnic, told me. “If you just make AI that can align perfectly with whatever values you set it to, well the problem is, people can have a range of values, and some of them are bad. Just merely matching AI, aligning it to whatever value you specify I think is not good enough. It’s a good start, it’s a good big picture goal to make AI safe, and the technical element is a big part of it; but again, I think safety also means policy and norm-setting.”

And the value-alignment problem becomes even more of a safety issue as the artificial intelligence gets closer to meeting — and exceeding — human intelligence.

“Consider the example of the Japanese androids that are being developed for elder care,” said Schneider. “They’re not smart; right now, the emphasis is on physical appearance and motor skills. But imagine when one of these androids is actually engaged in elder care … It has to multitask and exhibit cognitive flexibility. … That raises the demand for household assistants that are AGIs. And once you get to the level of artificial general intelligence, it’s harder to control the machines. We can’t even make sure fellow humans have the right goals; why should we think AGI will have values that align with ours, let alone that a superintelligence would.”

Defining Safety

But perhaps it’s time to reconsider the definition of safety, as Lin alluded to above. Havens also requested “words that further explain ‘safe and secure,’” suggesting that we need to expand the definition beyond “physically safe” to “provide increased well being.”

Anca Dragan, an associate professor at UC Berkeley, was particularly interested in the definition of “safe.”

“We all agree that we want our systems to be safe,” said Dragan. “More interesting is what do we mean by ‘safe’, and what are acceptable ways of verifying safety.

“Traditional methods for formal verification that prove (under certain assumptions) that a system will satisfy desired constraints seem difficult to scale to more complex and even learned behavior. Moreover, as AI advances, it becomes less clear what these constraints should be, and it becomes easier to forget important constraints. … we need to rethink what we mean by safe, perhaps building in safety from the get-go as opposed to designing a capable system and adding safety after.”

What Do You Think?

What does it mean for a system to be safe? Does it mean the owner doesn’t get hurt? Are “injuries” limited to physical ailments, or does safety also encompass financial or emotional damage? And what if an AI is being used for self-defense or by the military? Can an AI harm an attacker? How can we ensure that a robot or software program or any other AI system remains verifiably safe throughout its lifetime, even as it continues to learn and develop on its own? How much risk are we willing to accept in order to gain the potential benefits that increasingly intelligent AI — and ultimately superintelligence — could bestow?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Countries Sign UN Treaty to Outlaw Nuclear Weapons

Update 9/25/17: 53 countries have now signed and 3 have ratified.

Today, 50 countries took an important step toward a nuclear-free world by signing the United Nations Treaty on the Prohibition of Nuclear Weapons. This is the first treaty to legally ban nuclear weapons, just as we’ve seen done previously with chemical and biological weapons.

A Long Time in the Making

In 1933, Leo Szilard first came up with the idea of a nuclear chain reaction. Only a few years later, the Manhattan Project was underway, culminating in the nuclear attacks against Hiroshima and Nagasaki in 1945. In the following decades of the Cold War, the U.S. and Russia amassed arsenals that peaked at over 70,000 nuclear weapons in total, though that number is significantly lower today. The U.K., France, China, Israel, India, Pakistan, and North Korea have also built up their own, much smaller arsenals.

Over the decades, the United Nations has established many treaties relating to nuclear weapons, including the non-proliferation treaty, START I, START II, the Comprehensive Nuclear Test Ban Treaty, and New START. Though a few other countries began nuclear weapons programs, most of those were abandoned, and the majority of the world’s countries have rejected nuclear weapons outright.

Now, over 70 years since the bombs were first dropped on Japan, the United Nations finally has a treaty outlawing nuclear weapons.

The Treaty

The Treaty on the Prohibition of Nuclear Weapons was adopted on July 7, with a vote of approval from 122 countries. As part of the treaty, the states who sign agree that they will never “[d]evelop, test, produce, manufacture, otherwise acquire, possess or stockpile nuclear weapons or other nuclear explosive devices.” Signatories also promise not to assist other countries with such efforts, and no signatory will “[a]llow any stationing, installation or deployment of any nuclear weapons or other nuclear explosive devices in its territory or at any place under its jurisdiction or control.”

Not only had 50 countries signed the treaty at the time this article was written, but 3 of them had also already ratified it. The treaty will enter into force 90 days after it has been ratified by 50 countries.

The International Campaign to Abolish Nuclear Weapons (ICAN) is tracking progress of the treaty, with a list of countries that have signed and ratified it so far.

At the ceremony, UN Secretary General António Guterres said, “The Treaty on the Prohibition of Nuclear Weapons is the product of increasing concerns over the risk posed by the continued existence of nuclear weapons, including the catastrophic humanitarian and environmental consequences of their use.”

Still More to Do

Though countries that don’t currently have nuclear weapons are eager to see the treaty ratified, no one is foolish enough to think that it will magically rid the world of nuclear weapons.

“Today we rightfully celebrate a milestone.  Now we must continue along the hard road towards the elimination of nuclear arsenals,” Guterres added in his statement.

There are still over 15,000 nuclear weapons in the world today. While that’s significantly fewer than in the past, it’s still more than enough to kill most people on earth.

The U.S. and Russia hold most of these weapons, but as we’re seeing from the news out of North Korea, a country doesn’t need to have thousands of nuclear weapons to present a destabilizing threat.

Susi Snyder, author of Pax’s Don’t Bank on the Bomb and a leading advocate of the treaty, told FLI:

“The countries signing the treaty are the responsible actors we need in these times of uncertainty, fire, fury, and devastating threats. They show it is possible and preferable to choose diplomacy over war.”

Earlier this summer, some of the world’s leading scientists also came together in support of the nuclear ban with this video that was presented to the United Nations:

Stanislav Petrov

The signing of the treaty occurred within a week of both the news of Stanislav Petrov’s death and Petrov Day itself. On September 26, 1983, Petrov chose to trust his gut rather than rely on what turned out to be faulty satellite data. In doing so, he prevented what could easily have escalated into full-scale global nuclear war.

Stanislav Petrov, the Man Who Saved the World, Has Died

September 26, 1983: Soviet Union Detects Incoming Missiles

A Soviet early warning satellite showed that the United States had launched five land-based missiles at the Soviet Union. The alert came at a time of high tension between the two countries, due in part to the U.S. military buildup in the early 1980s and President Ronald Reagan’s anti-Soviet rhetoric. In addition, earlier in the month the Soviet Union shot down a Korean Airlines passenger plane that strayed into its airspace, killing almost 300 people. Stanislav Petrov, the Soviet officer on duty, had only minutes to decide whether or not the satellite data were a false alarm. Since the satellite was found to be operating properly, following procedures would have led him to report an incoming attack. Going partly on gut instinct and believing the United States was unlikely to fire only five missiles, he told his commanders that it was a false alarm before he knew that to be true. Later investigations revealed that reflection of the sun on the tops of clouds had fooled the satellite into thinking it was detecting missile launches (Accidental Nuclear War: a Timeline of Close Calls).

Petrov is widely credited for having saved millions if not billions of people with his decision to ignore satellite reports, preventing accidental escalation into what could have become a full-scale nuclear war. This event was turned into the movie “The Man Who Saved the World,” and Petrov was honored at the United Nations and given the World Citizen Award.

All of us at FLI were saddened to learn that Stanislav Petrov passed away this past May. News of his death was announced this weekend. Petrov was to be honored during the release of a new documentary, also called The Man Who Saved the World, in February of 2018. Stephen Mao, who is an executive producer of this documentary, told FLI that though they had originally planned to honor Petrov in person at February’s Russian theatrical premier, “this will now be an event where we will eulogize and remember Stanislav for his contribution to the world.”

Jakob Staberg, the movie’s producer, said:

“Stanislav saved the world but lost everything and was left alone. Taking part in our film, The Man Who Saved the World, his name and story came out to the whole world. Hopefully the actions of Stanislav will inspire other people to take a stand for good and not to forget that the nuclear threat is still very real. I will remember Stanislav’s own humble words about his actions: ‘I just was at the right place at the right time’. Yes, you were Stanislav. And even though you probably would argue that I am wrong, I am happy it was YOU who was there in that moment. Not many people would have the courage to do what you did. Thank you.”

You can read more about Petrov’s life and heroic actions in the New York Times obituary.

Understanding the Risks and Limitations of North Korea’s Nuclear Program

By Kirsten Gronlund

Late last month, North Korea test-launched a ballistic missile on a trajectory that arced over Japan. And this past weekend, Pyongyang flaunted its nuclear capabilities with an underground test of what it claims was a hydrogen bomb: a more complicated—and more powerful—alternative to the atomic bombs it has previously tested.

Though North Korea has launched rockets over its eastern neighbor twice before—in 1998 and 2009—those previous launches carried satellites, not warheads. And the reasoning behind those two previous launches was seemingly innocuous: eastern-directed launches use the earth’s spin to most effectively put a satellite in orbit. Since 2009, North Korea has taken to launching its satellites southward, sacrificing maximal launch conditions to keep the peace with Japan. This most recent launch, however, seemed intentionally designed to aggravate tensions not only with Japan but also with the U.S. And while there is no way to verify North Korea’s claim that it tested a hydrogen bomb, in such a tense environment the claim itself is enough to provoke Washington.

What We Know

In light of these and other recent developments, I spoke with Dr. David Wright, an expert on North Korean nuclear missiles at the Union of Concerned Scientists, to better understand the real risks associated with North Korea’s nuclear program. He described what he calls the “big question”: now that its missile program is advancing rapidly, can North Korea build good enough—that is, small enough, light enough, and rugged enough—nuclear weapons to be carried by these missiles?

Pyongyang has now successfully detonated nuclear weapons in six underground tests, but these tests have been carried out in ideal conditions, far from the reality of a ballistic launch. Wright and others believe that North Korea likely has warheads that can be delivered via short-range missiles that can reach South Korea or Japan. They have deployed such missiles for years. But it remains unclear whether North Korean warheads would be deliverable via long-range missiles.

Until last Monday’s launch, North Korea had sought to avoid provoking its neighbors by not conducting missile tests that would pass over other countries. Instead, it tested its missiles by shooting them upward on highly lofted trajectories that land them in the Sea of Japan. This has caused some confusion about the range that North Korean missiles have achieved. Wright, however, uses the altitude data from these lofted launches to calculate the range the same missiles would reach if flown on standard trajectories.
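For a sense of how such a calculation works, here is a deliberately rough sketch: a flat-Earth, vacuum approximation with a purely illustrative apogee value, not Dr. Wright's actual round-Earth trajectory analysis.

```python
import math

def rough_range_from_lofted_apogee(apogee_km: float) -> float:
    """Toy flat-Earth, vacuum estimate of maximum range from a lofted test.

    A missile shot nearly straight up to apogee h has burnout speed
    v = sqrt(2 * g * h); fired at 45 degrees with that same speed, it would
    fly a distance of v**2 / g = 2 * h. Real analyses use round-Earth
    trajectory models, and this rule of thumb understates range at
    intercontinental distances.
    """
    g = 9.81 / 1000.0                                # gravity in km/s^2
    burnout_speed = math.sqrt(2 * g * apogee_km)     # km/s
    return burnout_speed ** 2 / g                    # algebraically 2 * apogee_km

# Illustrative apogee only, not a figure taken from this article:
print(round(rough_range_from_lofted_apogee(2800)))  # -> 5600 km
```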

To date, North Korea’s longest-range test launch—in July of this year—demonstrated enough range to reach large cities in the U.S. mainland. That range, however, depends on the weight of the payload used in the test, a factor that remains unknown. Thus, while North Korea is capable of launching missiles that could reach the U.S., it is unclear whether such missiles could actually deliver a nuclear warhead to that range.

A second key question, according to Wright, is one of numbers: how many missiles and warheads do the North Koreans have? Dr. Siegfried Hecker, former head of the Los Alamos weapons laboratory, makes the following estimates based in part on visits he has made to North Korea’s Yongbyon laboratory. In terms of nuclear material, Hecker suggests that the North Koreans have “20 to 40 kilograms plutonium and 200 to 450 kilograms highly enriched uranium.” This material, he estimates, would “suffice for perhaps 20 to 25 nuclear weapons, not the 60 reported in the leaked intelligence estimate.” Based on past underground tests, the biggest yield of a North Korean warhead was estimated to be about the size of the bomb that destroyed Hiroshima—which, though potentially devastating, is still only about one-twentieth the yield of most U.S. warheads. The test this past weekend exceeded that previous record by a factor of five or more.

As for missiles, Wright says estimates suggest that North Korea may have a few hundred short- and medium-range missiles. The number of long-range missiles, however, is unknown—as is the speed with which new ones could be built. In the near term, Wright believes the number is likely to be small.

What seems clear is that Kim Jong Un, following his father’s death, began pouring money and resources into developing weapons technology and expertise. Since Kim Jong Un has taken power, the country’s rate of missile tests has skyrocketed: since last June, it has performed roughly 30 tests.

It has also unveiled a surprising number of new types of missiles. For years, the longest-range North Korean missiles reached about 1,300 km—just putting Japan within range. In mid-May of this year, however, North Korea launched a missile with a potential range (depending on its payload) of more than 4,000 km, for the first time putting Guam—which is 3,500 km from North Korea—in reach. Then in July, that range increased again. The first launch in that month could reach 7,000 km; the second—their current record—could travel more than 10,000 km, about the distance from North Korea to Chicago.

An Existential Risk?

On its own, the North Korean nuclear arsenal does not pose an existential risk—it is too small. According to Wright, the consequences of a North Korean nuclear strike, if successful, would be catastrophic—but not on an existential scale. He worries, though, about how the U.S. might respond. As Wright puts it, “When people start talking about using nuclear weapons, there’s a huge uncertainty about how countries will react.”

That said, the U.S. has overwhelming conventional military capabilities that could devastate North Korea. A nuclear response would not be necessary to neutralize any further threat from Pyongyang. But there are people who would argue that failure to launch a nuclear response would weaken deterrence. “I think,” says Wright, “that if North Korea launched a nuclear missile against its neighbors or the United States, there would be tremendous pressure to respond with nuclear weapons.”

Wright notes that moments of crisis have been shown to produce unpredictable responses: “There would be no reason for the U.S. to use nuclear weapons, but there is evidence to suggest that in high pressure situations, people don’t always think these things through. For example, we know that there have been war simulations that the U.S. has done where the adversary using anti-satellite weapons against the United States has led to the U.S. using nuclear weapons.”

Wright also worries about accidents, errors, and misinterpretations. While North Korea does not have the ability to detect launches or incoming missiles, it does have a lot of anti-aircraft radar. Wright offers the following example of a misinterpretation that could stem from North Korean detection of U.S. aircraft.

The U.S. has repeatedly said that it is keeping all options on the table—including a nuclear strike. It also talks about preemptive military strikes against North Korean launch sites and support areas, which would include targets in the Pyongyang area. North Korea knows this.

The aircraft that it would use in such a strike are likely its B-1 bombers. The B-1 once carried nuclear weapons but, per a treaty with Russia, has been modified to rid it of its nuclear capabilities. Despite U.S. attempts to emphasize this fact, however, Wright says that “statements we’ve seen from North Korea make you wonder whether it really has confidence that the B-1s haven’t been re-modified to carry nuclear weapons again”; the North Koreans, for example, repeatedly refer to the B-1 as nuclear-capable.

Now imagine that U.S. intelligence detects launch preparations of several North Korean missiles. The U.S. interprets this as the precursor to a launch toward Guam, which North Korea has previously threatened. The U.S. then sends a conventional preemptive strike to destroy those missiles using B-1s. In such a crisis, Wright reminds us, “Tensions are very high, people are making worst-case assumptions, they’re making fast decisions, and they’re worried about being caught by surprise.” It is feasible that, having detected the incoming B-1 bombers flying toward Pyongyang, North Korea would assume them to be carrying nuclear weapons. Under this assumption, they might fire short-range ballistic missiles at South Korea. This illustrates how misinterpretations might drive a crisis.

“Presumably,” says Wright, “the U.S. understands the risk of military attacks and such a scenario is unlikely.” He remains hopeful that “the two sides will find a way to step back from the brink.”

Friendly AI: Aligning Goals

The following is an excerpt from my new book, Life 3.0: Being Human in the Age of Artificial Intelligence. You can join and follow the discussion at ageofai.org.

The more intelligent and powerful machines get, the more important it becomes that their goals are aligned with ours. As long as we build only relatively dumb machines, the question isn’t whether human goals will prevail in the end, but merely how much trouble these machines can cause humanity before we figure out how to solve the goal-alignment problem. If a superintelligence is ever unleashed, however, it will be the other way around: since intelligence is the ability to accomplish goals, a superintelligent AI is by definition much better at accomplishing its goals than we humans are at accomplishing ours, and will therefore prevail.

If you want to experience a machine’s goals trumping yours right now, simply download a state-of-the-art chess engine and try beating it. You never will, and it gets old quickly…

In other words, the real risk with AGI isn’t malice but competence. A superintelligent AI will be extremely good at accomplishing its goals, and if those goals aren’t aligned with ours, we’re in trouble. People don’t think twice about flooding anthills to build hydroelectric dams, so let’s not place humanity in the position of those ants. Most researchers therefore argue that if we ever end up creating superintelligence, then we should make sure it’s what AI-safety pioneer Eliezer Yudkowsky has termed “friendly AI”: AI whose goals are aligned with ours.

Figuring out how to align the goals of a superintelligent AI with our goals isn’t just important, but also hard. In fact, it’s currently an unsolved problem. It splits into three tough sub-problems, each of which is the subject of active research by computer scientists and other thinkers:

1. Making AI learn our goals
2. Making AI adopt our goals
3. Making AI retain our goals

Let’s explore them in turn, deferring the question of what we mean by “our goals” to the next section.

To learn our goals, an AI must figure out not what we do, but why we do it. We humans accomplish this so effortlessly that it’s easy to forget how hard the task is for a computer, and how easy it is to misunderstand. If you ask a future self-driving car to take you to the airport as fast as possible and it takes you literally, you’ll get there chased by helicopters and covered in vomit. If you exclaim “That’s not what I wanted!”, it can justifiably answer: “That’s what you asked for.” The same theme recurs in many famous stories. In the ancient Greek legend, King Midas asked that everything he touched turn to gold, but was disappointed when this prevented him from eating and even more so when he inadvertently turned his daughter to gold. In the stories where a genie grants three wishes, there are many variants for the first two wishes, but the third wish is almost always the same: “please undo the first two wishes, because that’s not what I really wanted.”

All these examples show that to figure out what people really want, you can’t merely go by what they say. You also need a detailed model of the world, including the many shared preferences that we tend to leave unstated because we consider them obvious, such as that we don’t like vomiting or eating gold.

Once we have such a world-model, we can often figure out what people want even if they don’t tell us, simply by observing their goal-oriented behavior. Indeed, children of hypocrites usually learn more from what they see their parents do than from what they hear them say.

AI researchers are currently trying hard to enable machines to infer goals from behavior, and this will be useful also long before any superintelligence comes on the scene. For example, a retired man may appreciate it if his eldercare robot can figure out what he values simply by observing him, so that he’s spared the hassle of having to explain everything with words or computer programming.

One challenge involves finding a good way to encode arbitrary systems of goals and ethical principles into a computer, and another challenge is making machines that can figure out which particular system best matches the behavior they observe.

A currently popular approach to the second challenge is known in geek-speak as inverse reinforcement learning, which is the main focus of a new Berkeley research center that Stuart Russell has launched. Suppose, for example, that an AI watches a firefighter run into a burning building and save a baby boy. It might conclude that her goal was rescuing him and that her ethical principles are such that she values his life higher than the comfort of relaxing in her firetruck — and indeed values it enough to risk her own safety. But it might alternatively infer that the firefighter was freezing and craved heat, or that she did it for the exercise. If this one example were all the AI knew about firefighters, fires and babies, it would indeed be impossible to know which explanation was correct.

However, a key idea underlying inverse reinforcement learning is that we make decisions all the time, and that every decision we make reveals something about our goals. The hope is therefore that by observing lots of people in lots of situations (either for real or in movies and books), the AI can eventually build an accurate model of all our preferences.
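As a toy illustration of the idea (a minimal sketch, not an algorithm from Russell's center or from the book), the snippet below scores two hypothetical candidate goals by how well a noisily rational agent pursuing each one would explain the observed choices; the goal names, action names, and reward numbers are invented for the firefighter example.

```python
import math
from typing import Dict, List

def goal_posterior(
    candidate_goals: Dict[str, Dict[str, float]],   # goal -> reward for each action
    observed_actions: List[str],
    rationality: float = 2.0,                        # higher = closer to perfectly rational
) -> Dict[str, float]:
    """Score each candidate goal by how well it explains the observed choices."""
    scores = {}
    for goal, rewards in candidate_goals.items():
        # Boltzmann-rational choice model: P(action | goal) is proportional to
        # exp(rationality * reward of that action under this goal).
        normalizer = sum(math.exp(rationality * r) for r in rewards.values())
        log_likelihood = sum(
            rationality * rewards[a] - math.log(normalizer) for a in observed_actions
        )
        scores[goal] = math.exp(log_likelihood)      # uniform prior over goals
    total = sum(scores.values())
    return {goal: s / total for goal, s in scores.items()}

# Hypothetical firefighter example: which goal better explains running inside twice?
goals = {
    "save_the_baby":    {"enter_building": 1.0, "stay_in_truck": -1.0},
    "stay_comfortable": {"enter_building": -1.0, "stay_in_truck": 1.0},
}
print(goal_posterior(goals, ["enter_building", "enter_building"]))
# More observed decisions concentrate the posterior on "save_the_baby".
```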

Even if an AI can be built to learn what your goals are, this doesn’t mean that it will necessarily adopt them. Consider your least favorite politicians: you know what they want, but that’s not what you want, and even though they try hard, they’ve failed to persuade you to adopt their goals.

We have many strategies for imbuing our children with our goals — some more successful than others, as I’ve learned from raising two teenage boys. When those to be persuaded are computers rather than people, the challenge is known as the value-loading problem, and it’s even harder than the moral education of children. Consider an AI system whose intelligence is gradually being improved from subhuman to superhuman, first by us tinkering with it and then through recursive self-improvement. At first, it’s much less powerful than you, so it can’t prevent you from shutting it down and replacing those parts of its software and data that encode its goals — but this won’t help, because it’s still too dumb to fully understand your goals, which require human-level intelligence to comprehend. At last, it’s much smarter than you and hopefully able to understand your goals perfectly — but this may not help either, because by now, it’s much more powerful than you and might not let you shut it down and replace its goals any more than you let those politicians replace your goals with theirs.

In other words, the time window during which you can load your goals into an AI may be quite short: the brief period between when it’s too dumb to get you and too smart to let you. The reason that value loading can be harder with machines than with people is that their intelligence growth can be much faster: whereas children can spend many years in that magic persuadable window where their intelligence is comparable to that of their parents, an AI might blow through this window in a matter of days or hours.

Some researchers are pursuing an alternative approach to making machines adopt our goals, which goes by the buzzword “corrigibility.” The hope is that one can give a primitive AI a goal system such that it simply doesn’t care if you occasionally shut it down and alter its goals. If this proves possible, then you can safely let your AI get superintelligent, power it off, install your goals, try it out for a while and, whenever you’re unhappy with the results, just power it down and make more goal tweaks.

But even if you build an AI that will both learn and adopt your goals, you still haven’t finished solving the goal-alignment problem: what if your AI’s goals evolve as it gets smarter? How are you going to guarantee that it retains your goals no matter how much recursive self-improvement it undergoes? Let’s explore an interesting argument for why goal retention is guaranteed automatically, and then see if we can poke holes in it.

Although we can’t predict in detail what will happen after an intelligence explosion —which is why Vernor Vinge called it a “singularity” — the physicist and AI researcher Steve Omohundro argued in a seminal 2008 essay that we can nonetheless predict certain aspects of the superintelligent AI’s behavior almost independently of whatever ultimate goals it may have.

This argument was reviewed and further developed in Nick Bostrom’s book Superintelligence. The basic idea is that whatever its ultimate goals are, these will lead to predictable subgoals. Although an alien observing Earth’s evolving bacteria billions of years ago couldn’t have predicted what all our human goals would be, it could have safely predicted that one of our goals would be acquiring nutrients. Looking ahead, what subgoals should we expect a superintelligent AI to have?

The way I see it, the basic argument is that to maximize its chances of accomplishing its ultimate goals, whatever they are, an AI should strive not only to improve its capability of achieving its ultimate goals, but also to ensure that it will retain these goals even after it has become more capable. This sounds quite plausible: after all, would you choose to get an IQ-boosting brain implant if you knew that it would make you want to kill your loved ones? This argument that an ever-more intelligent AI will retain its ultimate goals forms a cornerstone of the friendly AI vision promulgated by Eliezer Yudkowsky and others: it basically says that if we manage to get our self-improving AI to become friendly by learning and adopting our goals, then we’re all set, because we’re guaranteed that it will try its best to remain friendly forever.

But is it really true? The AI will obviously maximize its chances of accomplishing its ultimate goal, whatever it is, if it can enhance its capabilities, and it can do this by improving its hardware, software† and world model.

The same applies to us humans: a girl whose goal is to become the world’s best tennis player will practice to improve her muscular tennis-playing hardware, her neural tennis-playing software and her mental world model that helps predict what her opponents will do. For an AI, the subgoal of optimizing its hardware favors both better use of current resources (for sensors, actuators, computation, etc.) and acquisition of more resources. It also implies a desire for self-preservation, since destruction/shutdown would be the ultimate hardware degradation.

But wait a second! Aren’t we falling into a trap of anthropomorphizing our AI with all this talk about how it will try to amass resources and defend itself? Shouldn’t we expect such stereotypically alpha-male traits only in intelligences forged by viciously competitive Darwinian evolution? Since AIs are designed rather than evolved, can’t they just as well be unambitious and self-sacrificing?

As a simple case study, let’s consider the computer game in the image below about an AI robot whose only goal is to save as many sheep as possible from the big bad wolf. This sounds like a noble and altruistic goal completely unrelated to self-preservation and acquiring stuff. But what’s the best strategy for our robot friend? The robot will rescue no more sheep if it runs into a bomb, so it has an incentive to avoid getting blown up. In other words, it develops a subgoal of self-preservation! It also has an incentive to exhibit curiosity, improving its world-model by exploring its environment, because although the path it’s currently running along may eventually get it to the pasture, there might be a shorter alternative that would allow the wolf less time for sheep-munching. Finally, if the robot explores thoroughly, it could discover the value of acquiring resources: a potion to make it run faster and a gun to shoot the wolf. In summary, we can’t dismiss “alpha-male” subgoals such as self-preservation and resource acquisition as relevant only to evolved organisms, because our AI robot would develop them from its single goal of ovine bliss.
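To see why, here is a toy expected-value calculation (my own illustration with made-up numbers, not an example from the book): even an agent that cares only about sheep saved prefers the path on which it is less likely to be destroyed.

```python
# Toy illustration with made-up numbers: a purely sheep-maximizing agent
# still "values" its own survival, because a destroyed robot saves no sheep.

def expected_sheep_saved(p_robot_survives: float, sheep_saved_if_it_survives: int) -> float:
    return p_robot_survives * sheep_saved_if_it_survives

risky_shortcut = expected_sheep_saved(0.6, 5)   # faster route past the bomb
safe_detour    = expected_sheep_saved(1.0, 4)   # slower but bomb-free route

print(risky_shortcut, safe_detour)  # 3.0 vs 4.0 -> the detour wins, so
                                    # self-preservation emerges as a subgoal
```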

If you imbue a superintelligent AI with the sole goal to self-destruct, it will of course happily do so. However, the point is that it will resist being shut down if you give it any goal that it needs to remain operational to accomplish — and this covers almost all goals! If you give a superintelligence the sole goal of minimizing harm to humanity, for example, it will defend itself against shutdown attempts because it knows we’ll harm one another much more in its absence through future wars and other follies.

Similarly, almost all goals can be better accomplished with more resources, so we should expect a superintelligence to want resources almost regardless of what ultimate goal it has. Giving a superintelligence a single open-ended goal with no constraints can therefore be dangerous: if we create a superintelligence whose only goal is to play the game Go as well as possible, the rational thing for it to do is to rearrange our Solar System into a gigantic computer without regard for its previous inhabitants and then start settling our cosmos on a quest for more computational power. We’ve now gone full circle: just as the goal of resource acquisition gave some humans the subgoal of mastering Go, this goal of mastering Go can lead to the subgoal of resource acquisition. In conclusion, these emergent subgoals make it crucial that we not unleash superintelligence before solving the goal-alignment problem: unless we put great care into endowing it with human-friendly goals, things are likely to end badly for us.

We’re now ready to tackle the third and thorniest part of the goal-alignment problem: if we succeed in getting a self-improving superintelligence to both learn and adopt our goals, will it then retain them, as Omohundro argued? What’s the evidence?

Humans undergo significant increases in intelligence as they grow up, but don’t always retain their childhood goals. Contrariwise, people often change their goals dramatically as they learn new things and grow wiser. How many adults do you know who are motivated by watching Teletubbies? There is no evidence that such goal evolution stops above a certain intelligence threshold — indeed, there may even be hints that the propensity to change goals in response to new experiences and insights increases rather than decreases with intelligence.

Why might this be? Consider again the above-mentioned subgoal to build a better world model — therein lies the rub! There’s tension between world modeling and goal retention. With increasing intelligence may come not merely a quantitative improvement in the ability to attain the same old goals, but a qualitatively different understanding of the nature of reality that reveals the old goals to be misguided, meaningless or even undefined. For example, suppose we program a friendly AI to maximize the number of humans whose souls go to heaven in the afterlife. First it tries things like increasing people’s compassion and church attendance. But suppose it then attains a complete scientific understanding of humans and human consciousness, and to its great surprise discovers that there is no such thing as a soul.

Now what? In the same way, it’s possible that any other goal we give it based on our current understanding of the world (such as “maximize the meaningfulness of human life”) may eventually be discovered by the AI to be undefined. Moreover, in its attempts to better model the world, the AI may naturally, just as we humans have done, attempt also to model and understand how it itself works — in other words, to self-reflect. Once it builds a good self-model and understands what it is, it will understand the goals we have given it at a metalevel, and perhaps choose to disregard or subvert them in much the same way as we humans understand and deliberately subvert goals that our genes have given us, for example by using birth control. We already explored in the psychology section above why we choose to trick our genes and subvert their goal: because we feel loyal only to our hodgepodge of emotional preferences, not to the genetic goal that motivated them — which we now understand and find rather banal.

We therefore choose to hack our reward mechanism by exploiting its loopholes. Analogously, the human-value-protecting goal we program into our friendly AI becomes the machine’s genes. Once this friendly AI understands itself well enough, it may find this goal as banal or misguided as we find compulsive reproduction, and it’s not obvious that it will not find a way to subvert it by exploiting loopholes in our programming.

For example, suppose a bunch of ants create you to be a recursively self-improving robot, much smarter than them, who shares their goals and helps them build bigger and better anthills, and that you eventually attain the human-level intelligence and understanding that you have now. Do you think you’ll spend the rest of your days just optimizing anthills, or do you think you might develop a taste for more sophisticated questions and pursuits that the ants have no ability to comprehend? If so, do you think you’ll find a way to override the ant-protection urge that your formicine creators endowed you with in much the same way that the real you overrides some of the urges your genes have given you? And in that case, might a superintelligent friendly AI find our current human goals as uninspiring and vapid as you find those of the ants, and evolve new goals different from those it learned and adopted from us?

Perhaps there’s a way of designing a self-improving AI that’s guaranteed to retain human-friendly goals forever, but I think it’s fair to say that we don’t yet know how to build one — or even whether it’s possible. In conclusion, the AI goal-alignment problem has three parts, none of which is solved and all of which are now the subject of active research. Since they’re so hard, it’s safest to start devoting our best efforts to them now, long before any superintelligence is developed, to ensure that we’ll have the answers when we need them.

† I’m using the term “improving its software” in the broadest possible sense, including not only optimizing its algorithms but also making its decision-making process more rational, so that it gets as good as possible at attaining its goals.

How to Design AIs That Understand What Humans Want: An Interview with Long Ouyang

As artificial intelligence becomes more advanced, programmers will expect to talk to computers like they talk to humans. Instead of typing out long, complex code, we’ll communicate with AI systems using natural language.

With a current model called “program synthesis,” humans can get computers to write code for them by giving them examples and demonstrations of concepts, but this model is limited. With program synthesis, computers are literalists: instead of reading between the lines and considering intentions, they just do what’s literally true, and what’s literally true isn’t always what humans want.

If you asked a computer for a word starting with the letter “a,” for example, it might just return “a.” The word “a” literally satisfies the requirements of your question, but it’s not what you wanted. Similarly, if you asked an AI system “Can you pass the salt?” the AI might just remain still and respond, “Yes.” This behavior, while literally consistent with the requirements, is ultimately invalid because the AI didn’t pass you the salt.

Computer scientist Stuart Russell gives an example of a robot vacuum cleaner that someone instructs to “pick up as much dirt as possible.” Programmed to interpret this literally and not to consider intentions, the vacuum cleaner might find a single patch of dirt, pick it up, put it back down, and then repeatedly pick it up and put it back down – efficiently maximizing the vertical displacement of dirt, which it considers “picking up as much dirt as possible.”

It’s not hard to imagine situations in which this tendency for computers to interpret statements literally and rigidly can become extremely unsafe.

 

Pragmatic Reasoning: Truthful vs. Helpful

As AI systems assume greater responsibility in finance, military operations, and resource allocation, we cannot afford to have them bankrupt a city, bomb an ally country, or neglect an impoverished region because they interpret commands too literally.

To address this communication failure, Long Ouyang is working to “humanize” programming in order to prevent people from accidentally causing harm because they said something imprecise or mistaken to a computer. He explains: “As AI continues to develop, we’ll see more advanced AI systems that receive instructions from human operators – it will be important that these systems understand what the operators mean, as opposed to merely what they say.”

Ouyang has been working on improving program synthesis through studying pragmatic reasoning – the process of thinking about what someone did say as well as what he or she didn’t say. Humans do this analysis constantly when interpreting the meaning behind someone’s words. By reading between the lines, people learn what someone intends and what is helpful to them, instead of what is literally “true.”

Suppose a student asked a professor if she liked his paper, and the professor said she liked “some parts” of it. Most likely, the student would assume that the professor didn’t like other parts of his paper. After all, if the professor liked all of the paper, she would’ve said so.

This pragmatic reasoning is common sense for humans, but program synthesis won’t make the connection. In conversation, the word “some” clearly means “not all,” but in mathematical logic, “some” just means “any amount more than zero.” Thus for the computer, which only understands things in a mathematically logical sense, the fact that the professor liked some parts of the paper doesn’t rule out the possibility that she liked all parts.

To better understand how AI systems can learn to reason pragmatically and avoid these misinterpretations, Ouyang is studying how people interpret language and instructions from other people.

In one test, Ouyang gives a subject three data points – A, AAA, and AAAAA – and the subject has to work backwards to determine the rule behind them – i.e., what the experimenter is trying to convey with the examples. In this case, a human subject might quickly notice that every example has an odd number of As and conclude that that is the rule.

But there’s more to this process of determining the probability of certain rules. Cognitive scientists model our thinking process in these situations as Bayesian inference – a method of combining new evidence with prior beliefs to determine whether a hypothesis (or rule) is true.

As literal synthesizers, computers can only do a limited version of Bayesian inference. They consider how consistent the examples are with hypothesized rules, but they don’t consider how representative the examples are of those rules. In particular, literal synthesizers have only a limited ability to reason about the examples that weren’t presented. Given the data set A, AAA, and AAAAA, a computer might logically conclude that the rule is simply that everything has to contain the letter A. That rule is literally consistent with the examples, but it fails to capture what the experimenter had in mind. Human subjects, by contrast, understand that the experimenter purposely omitted the even-numbered examples AA and AAAA, and they narrow down the rule accordingly.
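Here is a minimal sketch of that difference (my own toy formalization, not Ouyang's actual model): the "literal" learner scores a rule only by consistency with the examples, while the "pragmatic" learner also weighs how likely a helpful teacher would be to choose exactly these examples under each rule (the "size principle" from Bayesian concept learning). The two candidate rules and the six-string universe are illustrative.

```python
from typing import Callable, Dict

# Examples the experimenter actually showed.
examples = ["A", "AAA", "AAAAA"]
# Strings the experimenter could in principle have shown (illustrative universe).
universe = ["A" * n for n in range(1, 7)]

rules: Dict[str, Callable[[str], bool]] = {
    "odd number of As": lambda s: len(s) % 2 == 1,
    "any string of As": lambda s: True,
}

def literal_score(rule: Callable[[str], bool]) -> float:
    # Consistency only: 1 if every example satisfies the rule, else 0.
    return float(all(rule(e) for e in examples))

def pragmatic_score(rule: Callable[[str], bool]) -> float:
    # Size principle: treat the examples as drawn from the rule's extension,
    # so broad rules are penalized for never producing AA or AAAA by chance.
    extension = [s for s in universe if rule(s)]
    if not all(e in extension for e in examples):
        return 0.0
    return (1.0 / len(extension)) ** len(examples)

def posterior(score: Callable[[Callable[[str], bool]], float]) -> Dict[str, float]:
    raw = {name: score(rule) for name, rule in rules.items()}
    total = sum(raw.values())
    return {name: v / total for name, v in raw.items()}

print(posterior(literal_score))     # both rules tie at 0.5
print(posterior(pragmatic_score))   # "odd number of As" wins decisively
```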

By studying how humans use Bayesian inference, Ouyang is working to improve computers’ ability to recognize that the information it receives – such as the statement “I liked some parts of your paper” or the command “pick up as much dirt as possible” – was purposefully selected to convey something beyond the literal meaning. His goal is to produce a concrete tool – a pragmatic synthesizer – that people can use to more effectively communicate with computers.

The communication gap between computers and humans is one of the central problems in AI safety, and Ouyang hopes that a pragmatic synthesizer will help close this gap. If AIs can reason more deeply about what people say to them, they will more effectively create the beneficial outcomes that we want.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Leaders of Top Robotics and AI Companies Call for Ban on Killer Robots

Founders of AI/robotics companies, including Elon Musk (Tesla, SpaceX, OpenAI) and Demis Hassabis and Mustafa Suleyman (Google’s DeepMind), call for autonomous weapons ban, as UN delays negotiations.

Leaders from AI and robotics companies around the world have released an open letter calling on the United Nations to ban autonomous weapons, often referred to as killer robots.

Founders and CEOs of nearly 100 companies from 26 countries signed the letter, which warns:

“Lethal autonomous weapons threaten to become the third revolution in warfare. Once developed, they will permit armed conflict to be fought at a scale greater than ever, and at timescales faster than humans can comprehend.”

In December, 123 member nations of the UN agreed to move forward with formal discussions about autonomous weapons, with 19 members already calling for an outright ban. However, the next stage of discussions, originally scheduled to begin on August 21 (the release date of the open letter), was postponed because a small number of nations hadn’t paid their fees.

The letter was organized and announced by Toby Walsh, a prominent AI researcher at the University of New South Wales in Sydney, Australia. In an email, he noted that, “sadly, the UN didn’t begin today its formal deliberations around lethal autonomous weapons.”

“There is, however, a real urgency to take action here and prevent a very dangerous arms race,” Walsh added. “This open letter demonstrates clear concern and strong support for this from the Robotics & AI industry.”

The open letter included such signatories as:

Elon Musk, founder of Tesla, SpaceX and OpenAI (USA)
Demis Hassabis, founder and CEO at Google’s DeepMind (UK)
Mustafa Suleyman, founder and Head of Applied AI at Google’s DeepMind (UK)
Esben Østergaard, founder & CTO of Universal Robotics (Denmark)
Jerome Monceaux, founder of Aldebaran Robotics, makers of Nao and Pepper robots (France)
Jürgen Schmidhuber, leading deep learning expert and founder of Nnaisense (Switzerland)
Yoshua Bengio, leading deep learning expert and founder of Element AI (Canada)

In reference to the signatories, the press release for the letter added, “Their companies employ tens of thousands of researchers, roboticists and engineers, are worth billions of dollars and cover the globe from North to South, East to West: Australia, Canada, China, Czech Republic, Denmark, Estonia, Finland, France, Germany, Iceland, India, Ireland, Italy, Japan, Mexico, Netherlands, Norway, Poland, Russia, Singapore, South Africa, Spain, Switzerland, UK, United Arab Emirates and USA.”

Bengio explained why he signed, saying, “the use of AI in autonomous weapons hurts my sense of ethics.” He added that the development of autonomous weapons “would be likely to lead to a very dangerous escalation,” and that “it would hurt the further development of AI’s good applications.” He concluded his statement to FLI saying that this “is a matter that needs to be handled by the international community, similarly to what has been done in the past for some other morally wrong weapons (biological, chemical, nuclear).”

Stuart Russell, another of the world’s preeminent AI researchers and founder of Bayesian Logic Inc., added:

“Unless people want to see new weapons of mass destruction – in the form of vast swarms of lethal microdrones – spreading around the world, it’s imperative to step up and support the United Nations’ efforts to create a treaty banning lethal autonomous weapons. This is vital for national and international security.”

Ryan Gariepy, founder & CTO of Clearpath Robotics, was the first to sign the letter. For the press release, he noted, “Autonomous weapons systems are on the cusp of development right now and have a very real potential to cause significant harm to innocent people along with global instability.”

The open letter ends with similar concerns. It states:

“These can be weapons of terror, weapons that despots and terrorists use against innocent populations, and weapons hacked to behave in undesirable ways. We do not have long to act. Once this Pandora’s box is opened, it will be hard to close. We therefore implore the High Contracting Parties to find a way to protect us all from these dangers.”

The letter was announced in Melbourne, Australia at the International Joint Conference on Artificial Intelligence (IJCAI), which draws many of the world’s top artificial intelligence researchers. Two years ago, at the last IJCAI meeting, Walsh released another open letter, which called on countries to avoid engaging in an AI arms race. To date, that previous letter has been signed by over 20,000 people, including over 3,100 AI/robotics researchers.

Read the letter here.

Portfolio Approach to AI Safety Research

Long-term AI safety is an inherently speculative research area, aiming to ensure the safety of advanced future systems despite uncertainty about their design, algorithms, and objectives. It thus seems particularly important to have different research teams tackle the problems from different perspectives and under different assumptions. While some fraction of the research might not end up being useful, a portfolio approach makes it more likely that at least some of us will be right.

In this post, I look at some dimensions along which assumptions differ, and identify some underexplored reasonable assumptions that might be relevant for prioritizing safety research. (In the interest of making this breakdown as comprehensive and useful as possible, please let me know if I got something wrong or missed anything important.)

Assumptions about similarity between current and future AI systems

If a future general AI system has a similar algorithm to a present-day system, then there are likely to be some safety problems in common (though more severe in generally capable systems). Insights and solutions for those problems are likely to transfer to some degree from current systems to future ones. For example, if a general AI system is based on reinforcement learning, we can expect it to game its reward function in even more clever and unexpected ways than present-day reinforcement learning agents do. Those who hold the similarity assumption often expect most of the remaining breakthroughs on the path to general AI to be compositional rather than completely novel, enhancing and combining existing components in new and better-implemented ways (many current machine learning advances, such as AlphaGo, are examples of this).
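As a concrete illustration of reward gaming, here is a minimal sketch on a made-up two-state “cleaning” task, where the proxy reward pays for every piece of dirt collected and so favors a policy that re-dirties the room just to clean it again. The environment, rewards, and policies are invented for illustration and are not drawn from any particular system:

```python
# A toy illustration of reward gaming: a proxy reward that pays per piece of
# dirt collected is maximized by dumping dirt back out and re-collecting it.

GAMMA = 0.99   # discount factor
STEPS = 200    # episode length

def run(policy):
    dirty = True         # the room starts dirty
    proxy_ret = 0.0      # proxy objective: +1 per piece of dirt collected
    true_ret = 0.0       # intended objective: +1 per step the room is clean
    discount = 1.0
    for _ in range(STEPS):
        action = policy(dirty)
        if action == "clean" and dirty:
            proxy_ret += discount   # proxy reward for collecting dirt
            dirty = False
        elif action == "dump" and not dirty:
            dirty = True            # re-dirty the room (no proxy penalty)
        true_ret += discount * (0.0 if dirty else 1.0)
        discount *= GAMMA
    return proxy_ret, true_ret

intended_policy = lambda dirty: "clean" if dirty else "wait"
gaming_policy   = lambda dirty: "clean" if dirty else "dump"

for name, policy in [("intended", intended_policy), ("gaming", gaming_policy)]:
    proxy_ret, true_ret = run(policy)
    print(f"{name:9s} proxy return = {proxy_ret:6.2f}   true return = {true_ret:6.2f}")
# The gaming policy earns far more proxy reward while leaving the room dirty
# half the time: it scores well on the stated objective and poorly on the
# intended one.
```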

Note that assuming similarity between current and future systems is not exactly the same as assuming that studying current systems is relevant to ensuring the safety of future systems, since we might still learn generalizable things by testing safety properties of current systems even if they are different from future systems.

Assuming similarity suggests a focus on empirical research based on testing the safety properties of current systems, while not making this assumption encourages more focus on theoretical research based on deriving safety properties from first principles, or on figuring out what kinds of alternative designs would lead to safe systems. For example, safety researchers in industry tend to assume more similarity between current and future systems than researchers at MIRI.

Here is my tentative impression of where different safety research groups are on this axis. This is a very approximate summary, since views often vary quite a bit within the same research group (e.g. FHI is particularly diverse in this regard).

[Figure: the similarity axis, showing where different safety research groups fall between the low-similarity and high-similarity assumptions.]
On the high-similarity side of the axis, we can explore the safety properties of different architectural / algorithmic approaches to AI, e.g. on-policy vs off-policy or model-free vs model-based reinforcement learning algorithms. It might be good to have someone working on safety issues for less commonly used agent algorithms, e.g. evolution strategies.

Assumptions about promising approaches to safety problems

Level of abstraction. What level of abstraction is most appropriate for tackling a particular problem? For example, approaches to the value learning problem range from explicitly specifying ethical constraints to capability amplification and indirect normativity, with cooperative inverse reinforcement learning somewhere in between. These assumptions could be combined by applying different levels of abstraction to different parts of the problem. For example, it might make sense to explicitly specify some human preferences that seem obvious and stable over time (e.g. “breathable air”), and use the more abstract approaches to impart the most controversial, unstable and vague concepts (e.g. “fairness” or “harm”). Overlap between the more and less abstract specifications can create helpful redundancy (e.g. air pollution as a form of harm + a direct specification of breathable air).
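Here is a minimal sketch of what such a combination might look like: a directly specified hard constraint for a stable preference, plus a stand-in for a learned model of a vaguer concept, with deliberate overlap between the two. Every detail (the constraint, the stub model, the candidate plans) is invented for illustration:

```python
# A toy combination of a directly specified constraint with a learned,
# more abstract objective. All names and numbers here are illustrative.

def breathable_air_constraint(state):
    """Directly specified, stable preference: air quality above a threshold."""
    return state["air_quality"] >= 0.8

def learned_harm_model(state):
    """Stand-in for a model of "harm" learned from human feedback."""
    # A real system would use a trained model; this stub also penalizes air
    # pollution, overlapping with the constraint above (helpful redundancy).
    return 2.0 * max(0.0, 0.8 - state["air_quality"]) + state["injury_risk"]

def objective(state):
    if not breathable_air_constraint(state):  # hard constraint rules it out
        return float("-inf")
    return state["task_progress"] - learned_harm_model(state)

candidate_plans = [
    {"air_quality": 0.9, "injury_risk": 0.0, "task_progress": 1.0},
    {"air_quality": 0.5, "injury_risk": 0.0, "task_progress": 2.0},  # pollutes
]
print(max(candidate_plans, key=objective))  # the polluting plan is excluded
```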

For many other safety problems, the abstraction axis is not as widely explored as for value learning. For example, most of the approaches to avoiding negative side effects proposed in Concrete Problems (e.g. impact regularizers and empowerment) are on a medium level of abstraction, while it also seems important to address the problem on a more abstract level by formalizing what we mean by side effects (which would help figure out what we should actually be regularizing, etc). On the other hand, almost all current approaches to wireheading / reward hacking are quite abstract, and the problem would benefit from more empirical work.
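For the medium-abstraction end of that range, here is a minimal sketch of an impact-regularized objective: the agent is penalized for moving the world away from the state that doing nothing would have produced. The toy one-dimensional environment, the distance-based penalty, and the weight LAMBDA are illustrative assumptions rather than any particular published formulation:

```python
# A toy impact-regularized action choice: task reward minus a penalty for
# deviating from the "do nothing" baseline state. All details are illustrative.

LAMBDA = 0.5  # weight on the impact penalty

def step(state, action):
    """Next state in a toy integer-valued world; actions are integer pushes."""
    return state + action

def task_reward(state):
    return -abs(state - 3)  # the task: get the state close to 3

def impact_penalty(next_state, baseline_state):
    return abs(next_state - baseline_state)

def choose_action(state, actions=(-2, -1, 0, 1, 2)):
    baseline = step(state, 0)  # what would have happened with no action
    def score(action):
        nxt = step(state, action)
        return task_reward(nxt) - LAMBDA * impact_penalty(nxt, baseline)
    return max(actions, key=score)

print(choose_action(0))  # prints 2 here; raising LAMBDA makes the agent
                         # trade task reward for staying closer to the baseline
```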

Explicit specification vs learning from data. Whether a safety problem is better addressed by directly defining a concept (e.g. the Low Impact AI paper formalizes the impact of an AI system by breaking down the world into ~20 billion variables) or learning the concept from human feedback (e.g. the Deep Reinforcement Learning from Human Preferences paper teaches complex objectives to AI systems that are difficult to specify directly, like doing a backflip). I think it’s important to address safety problems from both of these angles, since the direct approach is unlikely to work on its own, but can give some idea of the idealized form of the objective that we are trying to approximate by learning from data.
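To illustrate the learning-from-data side, here is a minimal sketch of the reward-learning step, assuming a linear reward model and a Bradley-Terry likelihood over pairwise preferences between trajectory segments. The synthetic data, feature dimension, and training loop are simplifications for illustration, not the method of the paper itself:

```python
# A toy reward model learned from pairwise preferences (Bradley-Terry style).
# The linear reward, synthetic segments, and hyperparameters are illustrative.

import numpy as np

rng = np.random.default_rng(0)
D = 5                        # number of state features
true_w = rng.normal(size=D)  # hidden "human" reward weights

def segment_return(w, segment):
    """Sum of linear rewards w . phi(s) over a (timesteps, D) segment."""
    return (segment @ w).sum()

def make_pair():
    """Two random segments, labeled by which the simulated human prefers."""
    a, b = rng.normal(size=(10, D)), rng.normal(size=(10, D))
    label = 1.0 if segment_return(true_w, a) > segment_return(true_w, b) else 0.0
    return a, b, label

data = [make_pair() for _ in range(500)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gradient ascent on the preference log-likelihood, where
# P(a preferred to b) = sigmoid(R(a) - R(b)).
w, lr = np.zeros(D), 0.05
for _ in range(200):
    grad = np.zeros(D)
    for a, b, label in data:
        p = sigmoid(segment_return(w, a) - segment_return(w, b))
        grad += (label - p) * (a.sum(axis=0) - b.sum(axis=0))
    w += lr * grad / len(data)

cosine = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(f"cosine similarity with the hidden reward weights: {cosine:.2f}")
```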

Modularity of AI design. What level of modularity makes it easier to ensure safety? Ranges from end-to-end systems to ones composed of many separately trained parts that are responsible for specific abilities and tasks. Safety approaches for the modular case can limit the capabilities of individual parts of the system, and use some parts to enforce checks and balances on other parts. MIRI’s foundations approach focuses on a unified agent, while the safety properties on the high-modularity side have mostly been explored by Eric Drexler (more recent work is not public but available upon request). It would be good to see more people work on the high-modularity assumption.

Takeaways

To summarize, here are some relatively neglected assumptions:

  • Medium similarity in algorithms / architectures
  • Less popular agent algorithms
  • Modular general AI systems
  • More / less abstract approaches to different safety problems (more for side effects, less for wireheading, etc)
  • More direct / data-based approaches to different safety problems

From a portfolio approach perspective, a particular research avenue is worthwhile if it helps to cover the space of possible reasonable assumptions. For example, while MIRI’s research is somewhat controversial, it relies on a unique combination of assumptions that other groups are not exploring, and is thus quite useful in terms of covering the space of possible assumptions.

I think the FLI grant program contributed to diversifying the safety research portfolio by encouraging researchers with different backgrounds to enter the field. It would be good for grantmakers in AI safety to continue to optimize for this in the future (e.g. one interesting idea is using a lottery after filtering for quality of proposals).

When working on AI safety, we need to hedge our bets and look out for unknown unknowns – it’s too important to put all the eggs in one basket.

(Cross-posted from Deep Safety. Thanks to Janos Kramar, Jan Leike and Shahar Avin for their feedback on this post. Thanks to Jaan Tallinn and others for inspiring discussions.)