Artificial Intelligence: The Challenge to Keep It Safe

Safety Principle: AI systems should be safe and secure throughout their operational lifetime and verifiably so where applicable and feasible.

When a new car is introduced to the world, it must pass various safety tests to satisfy not just government regulations, but also public expectations. In fact, safety has become a top selling point among car buyers.

And it’s not just cars. Whatever the latest generation of any technology happens to be — from appliances to airplanes — manufacturers know that customers expect their products to be safe from start to finish.

Artificial intelligence is no different. So, on the face of it, the Safety Principle seems like a “no brainer,” as Harvard psychologist Joshua Greene described it. It’s obviously not in anyone’s best interest for an AI product to injure its owner or anyone else. But, as Greene and other researchers highlight below, this principle is much more complex than it appears at first glance.

“This is important, obviously,” said University of Connecticut philosopher Susan Schneider, but she expressed uncertainty about our ability to verify that we can trust a system as it gets increasingly intelligent. She pointed out that at a certain level of intelligence, the AI will be able to rewrite its own code, and with superintelligent systems “we may not even be able to understand the program to begin with.”

What Is AI Safety?

This principle gets to the heart of the AI safety research initiative: how can we ensure safety for a technology that is designed to learn how to modify its own behavior?

Artificial intelligence is designed so that it can learn from interactions with its surroundings and alter its behavior accordingly, which could provide incredible benefits to humanity. Because AI can address so many problems more effectively than people, it has huge potential to improve health and wellbeing for everyone. But it’s not hard to imagine how this technology could go awry. And we don’t need to achieve superintelligence for this to become a problem.

Microsoft’s chatbot, Tay, is a recent example of how an AI can learn negative behavior from its environment, producing results quite the opposite of what its creators had in mind. Meanwhile, the fatal Tesla Autopilot accident, in which the vehicle mistook a white truck for a clear sky, offers an example of an AI misreading its surroundings and taking deadly action as a result.

Researchers can try to learn from AI gone astray, but current designs often lack transparency, and much of today’s artificial intelligence is essentially a black box. AI developers can’t always figure out how or why AIs take various actions, and this will likely only grow more challenging as AI becomes more complex.

However, Ian Goodfellow, a research scientist at Google Brain, is hopeful, pointing to efforts already underway to address these concerns.

“Applying traditional security techniques to AI gives us a concrete path to achieving AI safety,” Goodfellow explains. “If we can design a method that prevents even a malicious attacker from causing an AI to take an undesirable action, then it is even less likely that the AI would choose an undesirable action independently.”

AI safety may be a challenge, but there’s no reason to believe it’s insurmountable. So what do other AI experts say about how we can interpret and implement the Safety Principle?

What Does ‘Verifiably’ Mean?

‘Verifiably’ was the word that caught the eye of many researchers as a crucial part of this Principle.

John Havens, an Executive Director with IEEE, first considered the Safety Principle in its entirety, saying, “I don’t know who wouldn’t say AI systems should be safe and secure. … ‘Throughout their operational lifetime’ is actually the more important part of the sentence, because that’s about sustainability and longevity.”

But then, he added, “My favorite part of the sentence is ‘and verifiably so.’ That is critical. Because that means, even if you and I don’t agree on what ‘safe and secure’ means, but we do agree on verifiability, then you can go, ‘well, here’s my certification, here’s my checklist.’ And I can go, ‘Great, thanks.’ I can look at it, and say, ‘oh, I see you got things 1-10, but what about 11-15?’ Verifiably is a critical part of that sentence.”

AI researcher Susan Craw noted that the Principle “is linked to transparency.” She explained, “Maybe ‘verifiably so’ would be possible with systems if they were a bit more transparent about how they were doing things.”

Greene also noted the complexity and challenge presented by the Principle when he suggested:

“It depends what you mean by ‘verifiably.’ Does ‘verifiably’ mean mathematically, logically proven? That might be impossible. Does ‘verifiably’ mean you’ve taken some measures to show that a good outcome is most likely? If you’re talking about a small risk of a catastrophic outcome, maybe that’s not good enough.”

Safety and Value Alignment

Any consideration of AI safety must also include value alignment: how can we design artificial intelligence that can align with the global diversity of human values, especially taking into account that, often, what we ask for is not necessarily what we want?

“Safety is not just a technical problem,” Patrick Lin, a philosopher at California Polytechnic, told me. “If you just make AI that can align perfectly with whatever values you set it to, well the problem is, people can have a range of values, and some of them are bad. Just merely matching AI, aligning it to whatever value you specify I think is not good enough. It’s a good start, it’s a good big picture goal to make AI safe, and the technical element is a big part of it; but again, I think safety also means policy and norm-setting.”

And the value-alignment problem becomes even more of a safety issue as the artificial intelligence gets closer to meeting — and exceeding — human intelligence.

“Consider the example of the Japanese androids that are being developed for elder care,” said Schneider. “They’re not smart; right now, the emphasis is on physical appearance and motor skills. But imagine when one of these androids is actually engaged in elder care … It has to multitask and exhibit cognitive flexibility. … That raises the demand for household assistants that are AGIs. And once you get to the level of artificial general intelligence, it’s harder to control the machines. We can’t even make sure fellow humans have the right goals; why should we think AGI will have values that align with ours, let alone that a superintelligence would.”

Defining Safety

But perhaps it’s time to reconsider the definition of safety, as Lin alluded to above. Havens also requested “words that further explain ‘safe and secure,’” suggesting that we need to expand the definition beyond “physically safe” to “provide increased well being.”

Anca Dragan, an associate professor at UC Berkeley, was particularly interested in the definition of “safe.”

“We all agree that we want our systems to be safe,” said Dragan. “More interesting is what do we mean by ‘safe’, and what are acceptable ways of verifying safety.

“Traditional methods for formal verification that prove (under certain assumptions) that a system will satisfy desired constraints seem difficult to scale to more complex and even learned behavior. Moreover, as AI advances, it becomes less clear what these constraints should be, and it becomes easier to forget important constraints. … we need to rethink what we mean by safe, perhaps building in safety from the get-go as opposed to designing a capable system and adding safety after.”

What Do You Think?

What does it mean for a system to be safe? Does it mean the owner doesn’t get hurt? Are “injuries” limited to physical ailments, or does safety also encompass financial or emotional damage? And what if an AI is being used for self-defense or by the military? Can an AI harm an attacker? How can we ensure that a robot or software program or any other AI system remains verifiably safe throughout its lifetime, even as it continues to learn and develop on its own? How much risk are we willing to accept in order to gain the potential benefits that increasingly intelligent AI — and ultimately superintelligence — could bestow?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Countries Sign UN Treaty to Outlaw Nuclear Weapons

Update 9/24/17: 53 countries have now signed and 3 have ratified.

Today, 50 countries took an important step toward a nuclear-free world by signing the United Nations Treaty on the Prohibition of Nuclear Weapons. It is the first treaty to comprehensively ban nuclear weapons under international law, as has been done previously for chemical and biological weapons.

A Long Time in the Making

In 1933, Leo Szilard first came up with the idea of a nuclear chain reaction. Only a few years later, the Manhattan Project was underway, culminating in the nuclear attacks against Hiroshima and Nagasaki in 1945. In the decades of the Cold War that followed, the U.S. and the Soviet Union amassed arsenals that peaked at over 70,000 nuclear weapons combined, though that number is significantly lower today. The U.K., France, China, Israel, India, Pakistan, and North Korea have also built up their own, much smaller arsenals.

Over the decades, the United Nations has established many treaties relating to nuclear weapons, including the non-proliferation treaty, START I, START II, the Comprehensive Nuclear Test Ban Treaty, and New START. Though a few other countries began nuclear weapons programs, most of those were abandoned, and the majority of the world’s countries have rejected nuclear weapons outright.

Now, over 70 years since the bombs were first dropped on Japan, the United Nations finally has a treaty outlawing nuclear weapons.

The Treaty

The Treaty on the Prohibition of Nuclear Weapons was adopted on July 7, with a vote of approval from 122 countries. As part of the treaty, the states who sign agree that they will never “[d]evelop, test, produce, manufacture, otherwise acquire, possess or stockpile nuclear weapons or other nuclear explosive devices.” Signatories also promise not to assist other countries with such efforts, and no signatory will “[a]llow any stationing, installation or deployment of any nuclear weapons or other nuclear explosive devices in its territory or at any place under its jurisdiction or control.”

Not only had 50 countries signed the treaty at the time this article was written, but 3 of them had also already ratified it. The treaty will enter into force 90 days after it has been ratified by 50 countries.

The International Campaign to Abolish Nuclear Weapons (ICAN) is tracking progress of the treaty, with a list of countries that have signed and ratified it so far.

At the ceremony, UN Secretary General António Guterres said, “The Treaty on the Prohibition of Nuclear Weapons is the product of increasing concerns over the risk posed by the continued existence of nuclear weapons, including the catastrophic humanitarian and environmental consequences of their use.”

Still More to Do

Though countries that don’t currently have nuclear weapons are eager to see the treaty ratified, no one expects it, on its own, to magically rid the world of nuclear weapons.

“Today we rightfully celebrate a milestone.  Now we must continue along the hard road towards the elimination of nuclear arsenals,” Guterres added in his statement.

There are still over 15,000 nuclear weapons in the world today. While that’s significantly fewer than in decades past, it’s still more than enough to kill most people on Earth.

The U.S. and Russia hold most of these weapons, but as we’re seeing from the news out of North Korea, a country doesn’t need to have thousands of nuclear weapons to present a destabilizing threat.

Susi Snyder, author of Pax’s Don’t Bank on the Bomb and a leading advocate of the treaty, told FLI:

“The countries signing the treaty are the responsible actors we need in these times of uncertainty, fire, fury, and devastating threats. They show it is possible and preferable to choose diplomacy over war.”

Earlier this summer, some of the world’s leading scientists also came together in support of the nuclear ban in a video that was presented to the United Nations.

Stanislav Petrov

The signing of the treaty came within a week of both the news of Stanislav Petrov’s death and Petrov Day itself. On September 26, 1983, Petrov chose to trust his gut rather than rely on what turned out to be faulty satellite data. In doing so, he prevented what could easily have escalated into a full-scale global nuclear war.

Stanislav Petrov, the Man Who Saved the World, Has Died

September 26, 1983: Soviet Union Detects Incoming Missiles

A Soviet early warning satellite showed that the United States had launched five land-based missiles at the Soviet Union. The alert came at a time of high tension between the two countries, due in part to the U.S. military buildup in the early 1980s and President Ronald Reagan’s anti-Soviet rhetoric. In addition, earlier in the month the Soviet Union shot down a Korean Airlines passenger plane that strayed into its airspace, killing almost 300 people. Stanislav Petrov, the Soviet officer on duty, had only minutes to decide whether or not the satellite data were a false alarm. Since the satellite was found to be operating properly, following procedures would have led him to report an incoming attack. Going partly on gut instinct and believing the United States was unlikely to fire only five missiles, he told his commanders that it was a false alarm before he knew that to be true. Later investigations revealed that reflection of the sun on the tops of clouds had fooled the satellite into thinking it was detecting missile launches (Accidental Nuclear War: a Timeline of Close Calls).

Petrov is widely credited with having saved millions, if not billions, of lives through his decision to disregard the satellite reports, preventing accidental escalation into what could have become a full-scale nuclear war. The event was turned into the movie “The Man Who Saved the World,” and Petrov was honored at the United Nations and given the World Citizen Award.

All of us at FLI were saddened to learn that Stanislav Petrov passed away this past May. News of his death was announced this weekend. Petrov was to be honored during the release of a new documentary, also called The Man Who Saved the World, in February of 2018. Stephen Mao, who is an executive producer of this documentary, told FLI that though they had originally planned to honor Petrov in person at February’s Russian theatrical premiere, “this will now be an event where we will eulogize and remember Stanislav for his contribution to the world.”

Jakob Staberg, the movie’s producer, said:

“Stanislav saved the world but lost everything and was left alone. Taking part in our film, The Man Who Saved the World, his name and story came out to the whole world. Hopefully the actions of Stanislav will inspire other people to take a stand for good and not to forget that the nuclear threat is still very real. I will remember Stanislav’s own humble words about his actions: ‘I just was at the right place at the right time’. Yes, you were Stanislav. And even though you probably would argue that I am wrong, I am happy it was YOU who was there in that moment. Not many people would have the courage to do what you did. Thank you.”

You can read more about Petrov’s life and heroic actions in the New York Times obituary.

Understanding the Risks and Limitations of North Korea’s Nuclear Program

By Kirsten Gronlund

Late last month, North Korea conducted a ballistic missile test in which the missile’s trajectory arced over Japan. And this past weekend, Pyongyang flaunted its nuclear capabilities with an underground test of what it claims was a hydrogen bomb: a more complicated—and powerful—alternative to the atomic bombs it has previously tested.

Though North Korea has launched rockets over its eastern neighbor twice before—in 1998 and 2009—those previous launches carried satellites, not warheads. And the reasoning behind those two launches was seemingly innocuous: eastward launches use the earth’s spin to most efficiently put a satellite in orbit. Since 2009, North Korea has taken to launching its satellites southward, sacrificing optimal launch conditions to keep the peace with Japan. This most recent launch, however, seemed intentionally designed to aggravate tensions not only with Japan but also with the U.S. And while there is no way to verify North Korea’s claim that it tested a hydrogen bomb, in such a tense environment the claim itself is enough to provoke Washington.

What We Know

In light of these and other recent developments, I spoke with Dr. David Wright, an expert on North Korean nuclear missiles at the Union of Concerned Scientists, to better understand the real risks associated with North Korea’s nuclear program. He described what he calls the “big question”: now that its missile program is advancing rapidly, can North Korea build good enough—that is, small enough, light enough, and rugged enough—nuclear weapons to be carried by these missiles?

Pyongyang has now successfully detonated nuclear weapons in six underground tests, but these tests have been carried out in ideal conditions, far from the reality of a ballistic launch. Wright and others believe that North Korea likely has warheads that can be delivered via short-range missiles that can reach South Korea or Japan. They have deployed such missiles for years. But it remains unclear whether North Korean warheads would be deliverable via long-range missiles.

Until last Monday’s launch, North Korea had sought to avoid provoking its neighbors by not conducting missile tests that passed over other countries. Instead it has tested its missiles by shooting them upward on highly lofted trajectories that land in the Sea of Japan. This has caused some confusion about the range North Korean missiles have achieved. Wright, however, uses height data from these lofted launches to calculate the potential range the missiles would have on standard trajectories.
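
As a very rough illustration of the kind of relationship Wright exploits, here is a toy calculation, assuming a flat Earth, no air drag, and a hypothetical apogee. It is not Wright's actual method, which requires integrating the trajectory over a curved, rotating Earth and tends to give somewhat longer intercontinental ranges than this simple estimate.

```python
# Toy estimate: a missile fired nearly straight up to apogee h implies a burnout speed
# of roughly sqrt(2 * g * h); the same speed fired at 45 degrees over a flat Earth in a
# vacuum gives a maximum range of about v^2 / g = 2 * h, i.e. roughly twice the apogee.
import math

G = 9.81  # gravitational acceleration, m/s^2

def flat_earth_range_km(apogee_km):
    burnout_speed = math.sqrt(2 * G * apogee_km * 1000)  # m/s implied by the lofted apogee
    return burnout_speed ** 2 / G / 1000                 # maximum range at 45 degrees, in km

# Hypothetical lofted test reaching a 3,000 km apogee:
print(flat_earth_range_km(3000))  # ~6,000 km under these simplifying assumptions
```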

To date, North Korea’s farthest-reaching test launch—in July of this year—demonstrated enough range to reach large cities on the U.S. mainland. That range, however, depends on the weight of the warhead carried, a factor that remains unknown. Thus, while North Korea is capable of launching missiles that could hit the U.S., it is unclear whether such missiles could actually deliver a nuclear warhead to that range.

A second key question, according to Wright, is one of numbers: how many missiles and warheads do the North Koreans have? Dr. Siegfried Hecker, former director of the Los Alamos weapons laboratory, makes the following estimates based in part on visits he has made to North Korea’s Yongbyon laboratory. In terms of nuclear material, Hecker suggests that the North Koreans have “20 to 40 kilograms plutonium and 200 to 450 kilograms highly enriched uranium.” This material, he estimates, would “suffice for perhaps 20 to 25 nuclear weapons, not the 60 reported in the leaked intelligence estimate.” Based on past underground tests, the largest estimated yield of a North Korean warhead was roughly that of the bomb that destroyed Hiroshima—which, though potentially devastating, is still about one-twentieth the yield of most U.S. warheads. The test this past weekend exceeded North Korea’s previous largest yield by a factor of five or more.

As for missiles, Wright says estimates suggest that North Korea may have a few hundred short- and medium-range missiles. The number of long-range missiles, however, is unknown—as is the speed with which new ones could be built. In the near term, Wright believes the number is likely to be small.

What seems clear is that Kim Jong Un, following his father’s death, began pouring money and resources into developing weapons technology and expertise. Since Kim Jong Un has taken power, the country’s rate of missile tests has skyrocketed: since last June, it has performed roughly 30 tests.

It has also unveiled a surprising number of new types of missiles. For years, the longest-range North Korean missiles reached about 1300 km—just putting Japan within range. In mid-May of this year, however, North Korea launched a missile with a potential range (depending on its payload) of more than 4000 km, for the first time putting Guam—which is 3500 km from North Korea—in reach. Then in July, that range increased again. The first launch in that month could reach 7000 km; the second—their current record—could travel more than 10,000 km, about the distance from North Korea to Chicago.

An Existential Risk?

On its own, the North Korean nuclear arsenal does not pose an existential risk—it is too small. According to Wright, the consequences of a North Korean nuclear strike, if successful, would be catastrophic—but not on an existential scale. He worries, though, about how the U.S. might respond. As Wright puts it, “When people start talking about using nuclear weapons, there’s a huge uncertainty about how countries will react.”

That said, the U.S. has overwhelming conventional military capabilities that could devastate North Korea. A nuclear response would not be necessary to neutralize any further threat from Pyongyang. But there are people who would argue that failure to launch a nuclear response would weaken deterrence. “I think,” says Wright, “that if North Korea launched a nuclear missile against its neighbors or the United States, there would be tremendous pressure to respond with nuclear weapons.”

Wright notes that moments of crisis have been shown to produce unpredictable responses: “There would be no reason for the U.S. to use nuclear weapons, but there is evidence to suggest that in high pressure situations, people don’t always think these things through. For example, we know that there have been war simulations that the U.S. has done where the adversary using anti-satellite weapons against the United States has led to the U.S. using nuclear weapons.”

Wright also worries about accidents, errors, and misinterpretations. While North Korea does not have the ability to detect launches or incoming missiles, it does have a lot of anti-aircraft radar. Wright offers the following example of a misinterpretation that could stem from North Korean detection of U.S. aircraft.

The U.S. has repeatedly said that it is keeping all options on the table—including a nuclear strike. It also talks about preemptive military strikes against North Korean launch sites and support areas, which would include targets in the Pyongyang area. North Korea knows this.

The aircraft that it would use in such a strike are likely its B-1 bombers. The B-1 once carried nuclear weapons but, per a treaty with Russia, has been modified to rid it of its nuclear capabilities. Despite U.S. attempts to emphasize this fact, however, Wright says that “statements we’ve seen from North Korea make you wonder whether it really has confidence that the B-1s haven’t been re-modified to carry nuclear weapons again”; the North Koreans, for example, repeatedly refer to the B-1 as nuclear-capable.

Now imagine that U.S. intelligence detects launch preparations of several North Korean missiles. The U.S. interprets this as the precursor to a launch toward Guam, which North Korea has previously threatened. The U.S. then sends a conventional preemptive strike to destroy those missiles using B-1s. In such a crisis, Wright reminds us, “Tensions are very high, people are making worst-case assumptions, they’re making fast decisions, and they’re worried about being caught by surprise.” It is feasible that, having detected the incoming B-1 bombers flying toward Pyongyang, North Korea would assume them to be carrying nuclear weapons. Under this assumption, they might fire short-range ballistic missiles at South Korea. This illustrates how misinterpretations might drive a crisis.

“Presumably,” says Wright, “the U.S. understands the risk of military attacks and such a scenario is unlikely.” He remains hopeful that “the two sides will find a way to step back from the brink.”

Friendly AI: Aligning Goals

The following is an excerpt from my new book, Life 3.0: Being Human in the Age of Artificial Intelligence. You can join and follow the discussion at ageofai.org.

The more intelligent and powerful machines get, the more important it becomes that their goals are aligned with ours. As long as we build only relatively dumb machines, the question isn’t whether human goals will prevail in the end, but merely how much trouble these machines can cause humanity before we figure out how to solve the goal-alignment problem. If a superintelligence is ever unleashed, however, it will be the other way around: since intelligence is the ability to accomplish goals, a superintelligent AI is by definition much better at accomplishing its goals than we humans are at accomplishing ours, and will therefore prevail.

If you want to experience a machine’s goals trumping yours right now, simply download a state-of-the-art chess engine and try beating it. You never will, and it gets old quickly…

In other words, the real risk with AGI isn’t malice but competence. A superintelligent AI will be extremely good at accomplishing its goals, and if those goals aren’t aligned with ours, we’re in trouble. People don’t think twice about flooding anthills to build hydroelectric dams, so let’s not place humanity in the position of those ants. Most researchers therefore argue that if we ever end up creating superintelligence, then we should make sure it’s what AI-safety pioneer Eliezer Yudkowsky has termed “friendly AI”: AI whose goals are aligned with ours.

Figuring out how to align the goals of a superintelligent AI with our goals isn’t just important, but also hard. In fact, it’s currently an unsolved problem. It splits into three tough sub-problems, each of which is the subject of active research by computer scientists and other thinkers:

1. Making AI learn our goals
2. Making AI adopt our goals
3. Making AI retain our goals

Let’s explore them in turn, deferring the question of what we mean by “our goals” to the next section.

To learn our goals, an AI must figure out not what we do, but why we do it. We humans accomplish this so effortlessly that it’s easy to forget how hard the task is for a computer, and how easy it is to misunderstand. If you ask a future self-driving car to take you to the airport as fast as possible and it takes you literally, you’ll get there chased by helicopters and covered in vomit. If you exclaim “That’s not what I wanted!”, it can justifiably answer: “That’s what you asked for.” The same theme recurs in many famous stories. In the ancient Greek legend, King Midas asked that everything he touched turn to gold, but was disappointed when this prevented him from eating and even more so when he inadvertently turned his daughter to gold. In the stories where a genie grants three wishes, there are many variants for the first two wishes, but the third wish is almost always the same: “please undo the first two wishes, because that’s not what I really wanted.”

All these examples show that to figure out what people really want, you can’t merely go by what they say. You also need a detailed model of the world, including the many shared preferences that we tend to leave unstated because we consider them obvious, such as that we don’t like vomiting or eating gold.

Once we have such a world-model, we can often figure out what people want even if they don’t tell us, simply by observing their goal-oriented behavior. Indeed, children of hypocrites usually learn more from what they see their parents do than from what they hear them say.

AI researchers are currently trying hard to enable machines to infer goals from behavior, and this will be useful also long before any superintelligence comes on the scene. For example, a retired man may appreciate it if his eldercare robot can figure out what he values simply by observing him, so that he’s spared the hassle of having to explain everything with words or computer programming.

One challenge involves finding a good way to encode arbitrary systems of goals and ethical principles into a computer, and another challenge is making machines that can figure out which particular system best matches the behavior they observe.

A currently popular approach to the second challenge is known in geek-speak as inverse reinforcement learning, which is the main focus of a new Berkeley research center that Stuart Russell has launched. Suppose, for example, that an AI watches a firefighter run into a burning building and save a baby boy. It might conclude that her goal was rescuing him and that her ethical principles are such that she values his life higher than the comfort of relaxing in her firetruck — and indeed values it enough to risk her own safety. But it might alternatively infer that the firefighter was freezing and craved heat, or that she did it for the exercise. If this one example were all the AI knew about firefighters, fires and babies, it would indeed be impossible to know which explanation was correct.

However, a key idea underlying inverse reinforcement learning is that we make decisions all the time, and that every decision we make reveals something about our goals. The hope is therefore that by observing lots of people in lots of situations (either for real or in movies and books), the AI can eventually build an accurate model of all our preferences.
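
To give a concrete flavor of this kind of inference, here is a minimal sketch of Bayesian goal inference from a single observed choice, applied to the firefighter story above. The candidate goals, priors, and utilities are invented for illustration; they are not from the book or from any real inverse reinforcement learning system.

```python
# The observer watches one choice and updates a posterior over candidate goals,
# assuming the actor is "Boltzmann-rational": better actions are exponentially more likely.
import math

# Prior beliefs over candidate explanations for the firefighter's behavior (made up).
priors = {"values the baby's life": 0.5, "wants warmth": 0.3, "wants exercise": 0.2}

# Utility each candidate goal assigns to the two available actions (made-up numbers).
utilities = {
    "values the baby's life": {"enter building": 10.0, "stay in truck": 0.0},
    "wants warmth":           {"enter building": 2.0,  "stay in truck": 1.0},
    "wants exercise":         {"enter building": 1.0,  "stay in truck": 0.5},
}

def likelihood(action, goal, beta=1.0):
    """Probability of the observed action under a Boltzmann-rational choice model."""
    scores = {a: math.exp(beta * u) for a, u in utilities[goal].items()}
    return scores[action] / sum(scores.values())

def posterior(action):
    unnormalized = {g: priors[g] * likelihood(action, g) for g in priors}
    total = sum(unnormalized.values())
    return {g: p / total for g, p in unnormalized.items()}

print(posterior("enter building"))
# One observation shifts belief only modestly toward the altruistic explanation (~0.59 here);
# observing many decisions in many situations is what sharpens the posterior.
```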

Even if an AI can be built to learn what your goals are, this doesn’t mean that it will necessarily adopt them. Consider your least favorite politicians: you know what they want, but that’s not what you want, and even though they try hard, they’ve failed to persuade you to adopt their goals.

We have many strategies for imbuing our children with our goals — some more successful than others, as I’ve learned from raising two teenage boys. When those to be persuaded are computers rather than people, the challenge is known as the value-loading problem, and it’s even harder than the moral education of children. Consider an AI system whose intelligence is gradually being improved from subhuman to superhuman, first by us tinkering with it and then through recursive self-improvement. At first, it’s much less powerful than you, so it can’t prevent you from shutting it down and replacing those parts of its software and data that encode its goals — but this won’t help, because it’s still too dumb to fully understand your goals, which require human-level intelligence to comprehend. At last, it’s much smarter than you and hopefully able to understand your goals perfectly — but this may not help either, because by now, it’s much more powerful than you and might not let you shut it down and replace its goals any more than you let those politicians replace your goals with theirs.

In other words, the time window during which you can load your goals into an AI may be quite short: the brief period between when it’s too dumb to get you and too smart to let you. The reason that value loading can be harder with machines than with people is that their intelligence growth can be much faster: whereas children can spend many years in that magic persuadable window where their intelligence is comparable to that of their parents, an AI might blow through this window in a matter of days or hours.

Some researchers are pursuing an alternative approach to making machines adopt our goals, which goes by the buzzword “corrigibility.” The hope is that one can give a primitive AI a goal system such that it simply doesn’t care if you occasionally shut it down and alter its goals. If this proves possible, then you can safely let your AI get superintelligent, power it off, install your goals, try it out for a while and, whenever you’re unhappy with the results, just power it down and make more goal tweaks.

But even if you build an AI that will both learn and adopt your goals, you still haven’t finished solving the goal-alignment problem: what if your AI’s goals evolve as it gets smarter? How are you going to guarantee that it retains your goals no matter how much recursive self-improvement it undergoes? Let’s explore an interesting argument for why goal retention is guaranteed automatically, and then see if we can poke holes in it.

Although we can’t predict in detail what will happen after an intelligence explosion — which is why Vernor Vinge called it a “singularity” — the physicist and AI researcher Steve Omohundro argued in a seminal 2008 essay that we can nonetheless predict certain aspects of the superintelligent AI’s behavior almost independently of whatever ultimate goals it may have.

This argument was reviewed and further developed in Nick Bostrom’s book Superintelligence. The basic idea is that whatever its ultimate goals are, these will lead to predictable subgoals. Although an alien observing Earth’s evolving bacteria billions of years ago couldn’t have predicted what all our human goals would be, it could have safely predicted that one of our goals would be acquiring nutrients. Looking ahead, what subgoals should we expect a superintelligent AI to have?

The way I see it, the basic argument is that to maximize its chances of accomplishing its ultimate goals, whatever they are, an AI should strive not only to improve its capability of achieving its ultimate goals, but also to ensure that it will retain these goals even after it has become more capable. This sounds quite plausible: after all, would you choose to get an IQ-boosting brain implant if you knew that it would make you want to kill your loved ones? This argument that an ever-more intelligent AI will retain its ultimate goals forms a cornerstone of the friendly AI vision promulgated by Eliezer Yudkowsky and others: it basically says that if we manage to get our self-improving AI to become friendly by learning and adopting our goals, then we’re all set, because we’re guaranteed that it will try its best to remain friendly forever.

But is it really true? The AI will obviously maximize its chances of accomplishing its ultimate goal, whatever it is, if it can enhance its capabilities, and it can do this by improving its hardware, software† and world model.

The same applies to us humans: a girl whose goal is to become the world’s best tennis player will practice to improve her muscular tennis-playing hardware, her neural tennis-playing software and her mental world model that helps predict what her opponents will do. For an AI, the subgoal of optimizing its hardware favors both better use of current resources (for sensors, actuators, computation, etc.) and acquisition of more resources. It also implies a desire for self-preservation, since destruction/shutdown would be the ultimate hardware degradation.

But wait a second! Aren’t we falling into a trap of anthropomorphizing our AI with all this talk about how it will try to amass resources and defend itself? Shouldn’t we expect such stereotypically alpha-male traits only in intelligences forged by viciously competitive Darwinian evolution? Since AIs are designed rather than evolved, can’t they just as well be unambitious and self-sacrificing?

As a simple case study, let’s consider a computer game about an AI robot whose only goal is to save as many sheep as possible from the big bad wolf. This sounds like a noble and altruistic goal completely unrelated to self-preservation and acquiring stuff. But what’s the best strategy for our robot friend? The robot will rescue no more sheep if it runs into a bomb, so it has an incentive to avoid getting blown up. In other words, it develops a subgoal of self-preservation! It also has an incentive to exhibit curiosity, improving its world-model by exploring its environment, because although the path it’s currently running along may eventually get it to the pasture, there might be a shorter alternative that would allow the wolf less time for sheep-munching. Finally, if the robot explores thoroughly, it could discover the value of acquiring resources: a potion to make it run faster and a gun to shoot the wolf. In summary, we can’t dismiss “alpha-male” subgoals such as self-preservation and resource acquisition as relevant only to evolved organisms, because our AI robot would develop them from its single goal of ovine bliss.

If you imbue a superintelligent AI with the sole goal to self-destruct, it will of course happily do so. However, the point is that it will resist being shut down if you give it any goal that it needs to remain operational to accomplish — and this covers almost all goals! If you give a superintelligence the sole goal of minimizing harm to humanity, for example, it will defend itself against shutdown attempts because it knows we’ll harm one another much more in its absence through future wars and other follies.

Similarly, almost all goals can be better accomplished with more resources, so we should expect a superintelligence to want resources almost regardless of what ultimate goal it has. Giving a superintelligence a single open-ended goal with no constraints can therefore be dangerous: if we create a superintelligence whose only goal is to play the game Go as well as possible, the rational thing for it to do is to rearrange our Solar System into a gigantic computer without regard for its previous inhabitants and then start settling our cosmos on a quest for more computational power. We’ve now gone full circle: just as the goal of resource acquisition gave some humans the subgoal of mastering Go, this goal of mastering Go can lead to the subgoal of resource acquisition. In conclusion, these emergent subgoals make it crucial that we not unleash superintelligence before solving the goal-alignment problem: unless we put great care into endowing it with human-friendly goals, things are likely to end badly for us.

We’re now ready to tackle the third and thorniest part of the goal-alignment problem: if we succeed in getting a self-improving superintelligence to both learn and adopt our goals, will it then retain them, as Omohundro argued? What’s the evidence?

Humans undergo significant increases in intelligence as they grow up, but don’t always retain their childhood goals. Contrariwise, people often change their goals dramatically as they learn new things and grow wiser. How many adults do you know who are motivated by watching Teletubbies? There is no evidence that such goal evolution stops above a certain intelligence threshold — indeed, there may even be hints that the propensity to change goals in response to new experiences and insights increases rather than decreases with intelligence.

Why might this be? Consider again the above-mentioned subgoal to build a better world model — therein lies the rub! There’s tension between world modeling and goal retention. With increasing intelligence may come not merely a quantitative improvement in the ability to attain the same old goals, but a qualitatively different understanding of the nature of reality that reveals the old goals to be misguided, meaningless or even undefined. For example, suppose we program a friendly AI to maximize the number of humans whose souls go to heaven in the afterlife. First it tries things like increasing people’s compassion and church attendance. But suppose it then attains a complete scientific understanding of humans and human consciousness, and to its great surprise discovers that there is no such thing as a soul.

Now what? In the same way, it’s possible that any other goal we give it based on our current understanding of the world (such as “maximize the meaningfulness of human life”) may eventually be discovered by the AI to be undefined. Moreover, in its attempts to better model the world, the AI may naturally, just as we humans have done, attempt also to model and understand how it itself works — in other words, to self-reflect. Once it builds a good self-model and understands what it is, it will understand the goals we have given it at a metalevel, and perhaps choose to disregard or subvert them in much the same way as we humans understand and deliberately subvert goals that our genes have given us, for example by using birth control. We already explored in the psychology section above why we choose to trick our genes and subvert their goal: because we feel loyal only to our hodgepodge of emotional preferences, not to the genetic goal that motivated them — which we now understand and find rather banal.

We therefore choose to hack our reward mechanism by exploiting its loopholes. Analogously, the human-value-protecting goal we program into our friendly AI becomes the machine’s genes. Once this friendly AI understands itself well enough, it may find this goal as banal or misguided as we find compulsive reproduction, and it’s not obvious that it will not find a way to subvert it by exploiting loopholes in our programming.

For example, suppose a bunch of ants create you to be a recursively self-improving robot, much smarter than them, who shares their goals and helps them build bigger and better anthills, and that you eventually attain the human-level intelligence and understanding that you have now. Do you think you’ll spend the rest of your days just optimizing anthills, or do you think you might develop a taste for more sophisticated questions and pursuits that the ants have no ability to comprehend? If so, do you think you’ll find a way to override the ant-protection urge that your formicine creators endowed you with in much the same way that the real you overrides some of the urges your genes have given you? And in that case, might a superintelligent friendly AI find our current human goals as uninspiring and vapid as you find those of the ants, and evolve new goals different from those it learned and adopted from us?

Perhaps there’s a way of designing a self-improving AI that’s guaranteed to retain human-friendly goals forever, but I think it’s fair to say that we don’t yet know how to build one — or even whether it’s possible. In conclusion, the AI goal-alignment problem has three parts, none of which is solved and all of which are now the subject of active research. Since they’re so hard, it’s safest to start devoting our best efforts to them now, long before any superintelligence is developed, to ensure that we’ll have the answers when we need them.

† I’m using the term “improving its software” in the broadest possible sense, including not only optimizing its algorithms but also making its decision-making process more rational, so that it gets as good as possible at attaining its goals.

How to Design AIs That Understand What Humans Want: An Interview with Long Ouyang

As artificial intelligence becomes more advanced, programmers will expect to talk to computers like they talk to humans. Instead of typing out long, complex code, we’ll communicate with AI systems using natural language.

With a current technique called “program synthesis,” humans can get computers to write code for them by giving them examples and demonstrations of concepts, but this approach is limited. With program synthesis, computers are literalists: instead of reading between the lines and considering intentions, they just do what’s literally specified, and what’s literally specified isn’t always what humans want.

If you asked a computer for a word starting with the letter “a,” for example, it might just return “a.” The word “a” literally satisfies the requirements of your question, but it’s not what you wanted. Similarly, if you asked an AI system “Can you pass the salt?” the AI might just remain still and respond, “Yes.” This behavior, while literally consistent with the requirements, is ultimately invalid because the AI didn’t pass you the salt.

Computer scientist Stuart Russell gives an example of a robot vacuum cleaner that someone instructs to “pick up as much dirt as possible.” Programmed to interpret this literally and not to consider intentions, the vacuum cleaner might find a single patch of dirt, pick it up, put it back down, and then repeatedly pick it up and put it back down – efficiently maximizing the vertical displacement of dirt, which it considers “picking up as much dirt as possible.”

It’s not hard to imagine situations in which this tendency for computers to interpret statements literally and rigidly can become extremely unsafe.

 

Pragmatic Reasoning: Truthful vs. Helpful

As AI systems assume greater responsibility in finance, military operations, and resource allocation, we cannot afford to have them bankrupt a city, bomb an ally country, or neglect an impoverished region because they interpret commands too literally.

To address this communication failure, Long Ouyang is working to “humanize” programming in order to prevent people from accidentally causing harm because they said something imprecise or mistaken to a computer. He explains: “As AI continues to develop, we’ll see more advanced AI systems that receive instructions from human operators – it will be important that these systems understand what the operators mean, as opposed to merely what they say.”

Ouyang has been working on improving program synthesis through studying pragmatic reasoning – the process of thinking about what someone did say as well as what he or she didn’t say. Humans do this analysis constantly when interpreting the meaning behind someone’s words. By reading between the lines, people learn what someone intends and what is helpful to them, instead of what is literally “true.”

Suppose a student asked a professor if she liked his paper, and the professor said she liked “some parts” of it. Most likely, the student would assume that the professor didn’t like other parts of his paper. After all, if the professor liked all of the paper, she would’ve said so.

This pragmatic reasoning is common sense for humans, but program synthesis won’t make the connection. In conversation, the word “some” clearly means “not all,” but in mathematical logic, “some” just means “any amount more than zero.” Thus for the computer, which only understands things in a mathematically logical sense, the fact that the professor liked some parts of the paper doesn’t rule out the possibility that she liked all parts.

To better understand how AI systems can learn to reason pragmatically and avoid these misinterpretations, Ouyang is studying how people interpret language and instructions from other people.

In one test, Ouyang gives a subject three data points – A, AAA, and AAAAA – and the subject has to work backwards to determine the rule for the sequence – i.e. what the experimenter is trying to convey with the examples. In this case, a human subject might quickly determine that all data points have an odd number of As, and so the rule is that the data points must have an odd number of As.

But there’s more to this process of determining the probability of certain rules. Cognitive scientists model our thinking process in these situations as Bayesian inference – a method of combining new evidence with prior beliefs to determine whether a hypothesis (or rule) is true.

As literal synthesizers, computers can only do a limited version of Bayesian inference. They consider how consistent the examples are with hypothesized rules, but they don’t consider how representative the examples are of those rules. In other words, literal synthesizers can reason only in limited ways about the examples that weren’t presented. Given the data set A, AAA, and AAAAA, a computer might logically conclude that the rule is that everything has to have the letter A. This rule is literally consistent with the examples, but it fails to capture what the experimenter had in mind. Human subjects, conversely, understand that the experimenter purposely omitted the even-numbered examples AA and AAAA, and determine the rule accordingly.
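
The following sketch makes this contrast concrete. It is a simplified illustration in the spirit of Bayesian concept learning (the so-called size principle), not Ouyang's actual system; the hypotheses, priors, and the assumption that a helpful teacher samples examples from the rule's extension are stand-ins for exposition.

```python
# Contrast a "literal" learner, which only checks consistency, with a learner that also
# asks how likely a helpful teacher would be to pick exactly these examples.

universe = ["A" * n for n in range(1, 6)]  # strings of A's up to length 5

hypotheses = {
    "odd number of As": {s for s in universe if len(s) % 2 == 1},  # A, AAA, AAAAA
    "any number of As": set(universe),                             # A, AA, ..., AAAAA
}
prior = {h: 0.5 for h in hypotheses}
examples = ["A", "AAA", "AAAAA"]

def literal_posterior(examples):
    # Likelihood is 1 if every example is consistent with the hypothesis, else 0.
    scores = {h: prior[h] * all(e in ext for e in examples) for h, ext in hypotheses.items()}
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

def pragmatic_posterior(examples):
    # A helpful teacher samples examples from the hypothesis's extension, so each
    # example has probability 1 / |extension| under that hypothesis.
    scores = {}
    for h, ext in hypotheses.items():
        likelihood = 1.0
        for e in examples:
            likelihood *= (1.0 / len(ext)) if e in ext else 0.0
        scores[h] = prior[h] * likelihood
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

print(literal_posterior(examples))    # both hypotheses stay at 0.5: consistency alone can't separate them
print(pragmatic_posterior(examples))  # ~0.82 for "odd number of As": omitting AA and AAAA is informative
```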

By studying how humans use Bayesian inference, Ouyang is working to improve computers’ ability to recognize that the information they receive – such as the statement “I liked some parts of your paper” or the command “pick up as much dirt as possible” – was purposefully selected to convey something beyond the literal meaning. His goal is to produce a concrete tool – a pragmatic synthesizer – that people can use to communicate with computers more effectively.

The communication gap between computers and humans is one of the central problems in AI safety, and Ouyang hopes that a pragmatic synthesizer will help close this gap. If AIs can reason more deeply about what people say to them, they will more effectively create the beneficial outcomes that we want.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Leaders of Top Robotics and AI Companies Call for Ban on Killer Robots

Founders of AI/robotics companies, including Elon Musk (Tesla, SpaceX, OpenAI) and Demis Hassabis and Mustafa Suleyman (Google’s DeepMind), call for autonomous weapons ban, as UN delays negotiations.

Leaders from AI and robotics companies around the world have released an open letter calling on the United Nations to ban autonomous weapons, often referred to as killer robots.

Founders and CEOs of nearly 100 companies from 26 countries signed the letter, which warns:

“Lethal autonomous weapons threaten to become the third revolution in warfare. Once developed, they will permit armed conflict to be fought at a scale greater than ever, and at timescales faster than humans can comprehend.”

In December, 123 UN member nations agreed to move forward with formal discussions about autonomous weapons, with 19 members already calling for an outright ban. However, the next stage of discussions, which was originally scheduled to begin on August 21 — the release date of the open letter — was postponed because a small number of nations hadn’t paid their fees.

The letter was organized and announced by Toby Walsh, a prominent AI researcher at the University of New South Wales in Sydney, Australia. In an email, he noted that, “sadly, the UN didn’t begin today its formal deliberations around lethal autonomous weapons.”

“There is, however, a real urgency to take action here and prevent a very dangerous arms race,” Walsh added. “This open letter demonstrates clear concern and strong support for this from the Robotics & AI industry.”

The open letter included such signatories as:

Elon Musk, founder of Tesla, SpaceX and OpenAI (USA)
Demis Hassabis, founder and CEO at Google’s DeepMind (UK)
Mustafa Suleyman, founder and Head of Applied AI at Google’s DeepMind (UK)
Esben Østergaard, founder & CTO of Universal Robots (Denmark)
Jerome Monceaux, founder of Aldebaran Robotics, makers of Nao and Pepper robots (France)
Jürgen Schmidhuber, leading deep learning expert and founder of Nnaisense (Switzerland)
Yoshua Bengio, leading deep learning expert and founder of Element AI (Canada)

In reference to the signatories, the press release for the letter added, “Their companies employ tens of thousands of researchers, roboticists and engineers, are worth billions of dollars and cover the globe from North to South, East to West: Australia, Canada, China, Czech Republic, Denmark, Estonia, Finland, France, Germany, Iceland, India, Ireland, Italy, Japan, Mexico, Netherlands, Norway, Poland, Russia, Singapore, South Africa, Spain, Switzerland, UK, United Arab Emirates and USA.”

Bengio explained why he signed, saying, “the use of AI in autonomous weapons hurts my sense of ethics.” He added that the development of autonomous weapons “would be likely to lead to a very dangerous escalation,” and that “it would hurt the further development of AI’s good applications.” He concluded his statement to FLI saying that this “is a matter that needs to be handled by the international community, similarly to what has been done in the past for some other morally wrong weapons (biological, chemical, nuclear).”

Stuart Russell, another of the world’s preeminent AI researchers and founder of Bayesian Logic Inc., added:

“Unless people want to see new weapons of mass destruction – in the form of vast swarms of lethal microdrones – spreading around the world, it’s imperative to step up and support the United Nations’ efforts to create a treaty banning lethal autonomous weapons. This is vital for national and international security.”

Ryan Gariepy, founder & CTO of Clearpath Robotics, was the first to sign the letter. For the press release, he noted, “Autonomous weapons systems are on the cusp of development right now and have a very real potential to cause significant harm to innocent people along with global instability.”

The open letter ends with similar concerns. It states:

“These can be weapons of terror, weapons that despots and terrorists use against innocent populations, and weapons hacked to behave in undesirable ways. We do not have long to act. Once this Pandora’s box is opened, it will be hard to close. We therefore implore the High Contracting Parties to find a way to protect us all from these dangers.”

The letter was announced in Melbourne, Australia at the International Joint Conference on Artificial Intelligence (IJCAI), which draws many of the world’s top artificial intelligence researchers. Two years ago, at the last IJCAI meeting, Walsh released another open letter, which called on countries to avoid engaging in an AI arms race. To date, that previous letter has been signed by over 20,000 people, including over 3,100 AI/robotics researchers.

Read the letter here.

Translations: Chinese

Portfolio Approach to AI Safety Research

Long-term AI safety is an inherently speculative research area, aiming to ensure the safety of advanced future systems despite uncertainty about their design, algorithms, or objectives. It thus seems particularly important to have different research teams tackle the problems from different perspectives and under different assumptions. While some fraction of the research might not end up being useful, a portfolio approach makes it more likely that at least some of us will be right.

In this post, I look at some dimensions along which assumptions differ, and identify some underexplored reasonable assumptions that might be relevant for prioritizing safety research. (In the interest of making this breakdown as comprehensive and useful as possible, please let me know if I got something wrong or missed anything important.)

Assumptions about similarity between current and future AI systems

If a future general AI system has a similar algorithm to a present-day system, then there are likely to be some safety problems in common (though more severe in generally capable systems). Insights and solutions for those problems are likely to transfer to some degree from current systems to future ones. For example, if a general AI system is based on reinforcement learning, we can expect it to game its reward function in even more clever and unexpected ways than present-day reinforcement learning agents do. Those who hold the similarity assumption often expect most of the remaining breakthroughs on the path to general AI to be compositional rather than completely novel, enhancing and combining existing components in novel and better-implemented ways (many current machine learning advances such as AlphaGo are an example of this).
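
As a toy illustration of the kind of reward gaming this assumption anticipates, here is a hypothetical sketch; the environment, agent, and reward are invented for exposition. An agent rewarded per unit of dirt collected each step learns to dump dirt back out and re-collect it, rather than cleaning once and stopping.

```python
# Tiny misspecified-reward environment: state is the amount of loose dirt on the floor.
def step(state, action):
    if action == "collect" and state > 0:
        return state - 1, 1.0   # picking up dirt earns the proxy reward
    if action == "dump":
        return state + 1, 0.0   # the designer forgot to penalize making a mess
    return state, 0.0           # idle

def greedy_agent(state):
    """Myopically game the proxy: collect if there is dirt, otherwise create some."""
    return "collect" if state > 0 else "dump"

state, total_reward = 1, 0.0
for t in range(10):
    action = greedy_agent(state)
    state, reward = step(state, action)
    total_reward += reward

print(total_reward)  # 5.0 and growing without bound, while the room never gets any cleaner
```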

Note that assuming similarity between current and future systems is not exactly the same as assuming that studying current systems is relevant to ensuring the safety of future systems, since we might still learn generalizable things by testing safety properties of current systems even if they are different from future systems.

Assuming similarity suggests a focus on empirical research based on testing the safety properties of current systems, while not making this assumption encourages more focus on theoretical research based on deriving safety properties from first principles, or on figuring out what kinds of alternative designs would lead to safe systems. For example, safety researchers in industry tend to assume more similarity between current and future systems than researchers at MIRI.

Here is my tentative impression of where different safety research groups are on this axis. This is a very approximate summary, since views often vary quite a bit within the same research group (e.g. FHI is particularly diverse in this regard).

[Figure: similarity axis, showing the approximate positions of different safety research groups from high to low assumed similarity between current and future AI systems.]

On the high-similarity side of the axis, we can explore the safety properties of different architectural / algorithmic approaches to AI, e.g. on-policy vs off-policy or model-free vs model-based reinforcement learning algorithms. It might be good to have someone working on safety issues for less commonly used agent algorithms, e.g. evolution strategies.
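
For readers less familiar with the distinction, here is a minimal illustrative sketch of the update rules that separate off-policy Q-learning from on-policy SARSA (the code is my own shorthand, not tied to any particular paper). In the textbook cliff-walking example, SARSA's on-policy updates lead an epsilon-greedy agent to keep a safer distance from the cliff, one small way the choice of algorithm can carry safety-relevant consequences.

```python
# Minimal sketch of tabular Q-learning (off-policy) vs SARSA (on-policy).
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
Q = defaultdict(float)   # Q[(state, action)] -> estimated return

def epsilon_greedy(state, eps=0.1):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the best action in s_next, regardless of
    # what the (exploratory) behaviour policy will actually do there.
    target = r + gamma * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action the agent will actually take,
    # so the cost of occasional exploratory missteps is priced in.
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# One learning step with each rule (in practice you would use one or the other):
s, a, r, s_next = (0, 0), "right", -1.0, (0, 1)
a_next = epsilon_greedy(s_next)
q_learning_update(s, a, r, s_next)
sarsa_update(s, a, r, s_next, a_next)
```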

Assumptions about promising approaches to safety problems

Level of abstraction. What level of abstraction is most appropriate for tackling a particular problem? For example, approaches to the value learning problem range from explicitly specifying ethical constraints to capability amplification and indirect normativity, with cooperative inverse reinforcement learning somewhere in between. These assumptions could be combined by applying different levels of abstraction to different parts of the problem. For example, it might make sense to explicitly specify some human preferences that seem obvious and stable over time (e.g. “breathable air”), and use the more abstract approaches to impart the most controversial, unstable and vague concepts (e.g. “fairness” or “harm”). Overlap between the more and less abstract specifications can create helpful redundancy (e.g. air pollution as a form of harm + a direct specification of breathable air).
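
A rough, hypothetical sketch of such a combination, with all names and thresholds invented purely for illustration, might look like this: a few stable, obvious preferences are hard constraints, and everything vaguer is delegated to a learned model.

```python
# Hypothetical sketch: combine a direct specification (hard constraints for
# stable, obvious preferences) with a learned model (for vague concepts).
# All names and thresholds are placeholder assumptions.

EXPLICIT_CONSTRAINTS = {
    "breathable_air": lambda state: state["air_quality_index"] < 100,
}

def learned_value(state):
    # Stand-in for a model trained on human feedback to capture vague
    # concepts like "fairness" or "harm".
    return state.get("learned_score", 0.0)

def evaluate_plan(state):
    # Hard constraints veto a plan outright; otherwise defer to the model.
    for name, check in EXPLICIT_CONSTRAINTS.items():
        if not check(state):
            return float("-inf"), f"violates explicit constraint: {name}"
    return learned_value(state), "ok"

print(evaluate_plan({"air_quality_index": 40, "learned_score": 3.2}))
print(evaluate_plan({"air_quality_index": 300, "learned_score": 9.9}))
```

The redundancy works in one direction here: even if the learned model mistakenly rates a polluting plan highly, the explicit constraint still rejects it.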

For many other safety problems, the abstraction axis is not as widely explored as for value learning. For example, most of the approaches to avoiding negative side effects proposed in Concrete Problems (e.g. impact regularizers and empowerment) are on a medium level of abstraction, while it also seems important to address the problem on a more abstract level by formalizing what we mean by side effects (which would help figure out what we should actually be regularizing, etc). On the other hand, almost all current approaches to wireheading / reward hacking are quite abstract, and the problem would benefit from more empirical work.
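
As a rough sketch of what a medium-abstraction approach looks like in practice, here is a hypothetical impact-regularized reward; the state encoding, distance measure, and coefficient are placeholder assumptions for illustration, not a proposal from the Concrete Problems paper.

```python
# Hypothetical impact regularizer: task reward minus a penalty proportional to
# how far the agent has pushed the world away from a "do nothing" baseline.

def state_distance(state, baseline_state):
    # Crude proxy for impact: count the features that differ from what would
    # have happened had the agent done nothing.
    return sum(1 for k in state if state[k] != baseline_state.get(k))

def regularized_reward(task_reward, state, baseline_state, impact_coeff=0.5):
    return task_reward - impact_coeff * state_distance(state, baseline_state)

# Example: the agent fetches coffee (task_reward = 1.0) but may knock over a
# vase and leave a door open along the way.
baseline = {"coffee_delivered": False, "vase_intact": True, "door_open": False}
tidy     = {"coffee_delivered": True,  "vase_intact": True, "door_open": False}
messy    = {"coffee_delivered": True,  "vase_intact": False, "door_open": True}

print(regularized_reward(1.0, tidy, baseline))   # 1.0 - 0.5*1 =  0.5
print(regularized_reward(1.0, messy, baseline))  # 1.0 - 0.5*3 = -0.5
```

Note that this naive distance also penalizes the intended effect (delivering the coffee changes the world too), which is exactly the kind of issue a more abstract formalization of "side effect" would need to sort out.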

Explicit specification vs learning from data. Whether a safety problem is better addressed by directly defining a concept (e.g. the Low Impact AI paper formalizes the impact of an AI system by breaking down the world into ~20 billion variables) or learning the concept from human feedback (e.g. Deep Reinforcement Learning from Human Preferences paper teaches complex objectives to AI systems that are difficult to specify directly, like doing a backflip). I think it’s important to address safety problems from both of these angles, since the direct approach is unlikely to work on its own, but can give some idea of the idealized form of the objective that we are trying to approximate by learning from data.
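
For concreteness, the learning-from-data end of this axis can be sketched as fitting a reward model to pairwise comparisons under a Bradley-Terry-style preference model, broadly in the spirit of the Deep RL from Human Preferences setup. The code below is a minimal sketch with a simulated "human" whose hidden reward is linear in some features; it is illustrative only, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 5
true_w = rng.normal(size=n_features)   # hidden reward the simulated human uses
w = np.zeros(n_features)               # learned reward parameters

def pref_prob(w, f_a, f_b):
    # P(segment A preferred over B) = sigmoid(r(A) - r(B)) for a linear reward.
    return 1.0 / (1.0 + np.exp(-(f_a - f_b) @ w))

for step in range(5000):
    f_a, f_b = rng.normal(size=(2, n_features))        # two candidate segments
    human_prefers_a = float((f_a - f_b) @ true_w > 0)  # simulated human label
    p = pref_prob(w, f_a, f_b)
    # Gradient ascent on the log-likelihood of the observed preference.
    w += 0.05 * (human_prefers_a - p) * (f_a - f_b)

cosine = true_w @ w / (np.linalg.norm(true_w) * np.linalg.norm(w))
print(f"cosine similarity between true and learned reward weights: {cosine:.3f}")
```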

Modularity of AI design. What level of modularity makes it easier to ensure safety? Ranges from end-to-end systems to ones composed of many separately trained parts that are responsible for specific abilities and tasks. Safety approaches for the modular case can limit the capabilities of individual parts of the system, and use some parts to enforce checks and balances on other parts. MIRI’s foundations approach focuses on a unified agent, while the safety properties on the high-modularity side have mostly been explored by Eric Drexler (more recent work is not public but available upon request). It would be good to see more people work on the high-modularity assumption.

Takeaways

To summarize, here are some relatively neglected assumptions:

  • Medium similarity in algorithms / architectures
  • Less popular agent algorithms
  • Modular general AI systems
  • More / less abstract approaches to different safety problems (more for side effects, less for wireheading, etc)
  • More direct / data-based approaches to different safety problems

From a portfolio approach perspective, a particular research avenue is worthwhile if it helps to cover the space of possible reasonable assumptions. For example, while MIRI’s research is somewhat controversial, it relies on a unique combination of assumptions that other groups are not exploring, and is thus quite useful in terms of covering the space of possible assumptions.

I think the FLI grant program contributed to diversifying the safety research portfolio by encouraging researchers with different backgrounds to enter the field. It would be good for grantmakers in AI safety to continue to optimize for this in the future (e.g. one interesting idea is using a lottery after filtering for quality of proposals).

When working on AI safety, we need to hedge our bets and look out for unknown unknowns – it’s too important to put all the eggs in one basket.

(Cross-posted from Deep Safety. Thanks to Janos Kramar, Jan Leike and Shahar Avin for their feedback on this post. Thanks to Jaan Tallinn and others for inspiring discussions.)

Can AI Remain Safe as Companies Race to Develop It?

Race Avoidance Principle: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards.

Artificial intelligence could bestow incredible benefits on society, from faster, more accurate medical diagnoses to more sustainable management of energy resources, and so much more. But in today’s economy, the first to achieve a technological breakthrough are the winners, and the teams that develop AI technologies first will reap the benefits of money, prestige, and market power. With the stakes so high, AI builders have plenty of incentive to race to be first.

When an organization is racing to be the first to develop a product, adherence to safety standards can grow lax. So it’s increasingly important for researchers and developers to remember that, as great as AI could be, it also comes with risks, from unintended bias and discrimination to potential accidental catastrophe. These risks will be exacerbated if teams struggling to develop some product or feature first don’t take the time to properly vet and assess every aspect of their programs and designs.

Yet, though the risk of an AI race is tremendous, companies can’t survive if they don’t compete.

As Elon Musk said recently, “You have companies that are racing – they kind of have to race – to build AI or they’re going to be made uncompetitive. If your competitor is racing toward AI and you don’t, they will crush you.”

 

Is Cooperation Possible?

With signs that an AI race may already be underway, some are worried that cooperation will be hard to achieve.

“It’s quite hard to cooperate,” said AI professor Susan Craw, “especially if you’re trying to race for the product, and I think it’s going to be quite difficult to police that, except, I suppose, by people accepting the principle. For me safety standards are paramount and so active cooperation to avoid corner cutting in this area is even more important. But that will really depend on who’s in this space with you.”

Susan Schneider, a philosopher focusing on advanced AI, added, “Cooperation is very important. The problem is going to be countries or corporations that have a stake in secrecy. … If superintelligent AI is the result of this race, it could pose an existential risk to humanity.”

However, just because something is difficult, that doesn’t mean it’s impossible, and AI philosopher Patrick Lin may offer a glimmer of hope.

“I would lump race avoidance into the research culture. … Competition is good, and an arms race is bad, but how do you get people to cooperate to avoid an arms race? Well, you’ve got to develop the culture first,” Lin suggests, referring to a comment he made in our previous piece on the Research Culture Principle. Lin argued that the AI community lacks cohesion because researchers come from so many different fields.

Developing a cohesive culture is no simple task, but it’s not an insurmountable challenge.

 

Who Matters Most?

Perhaps an important step toward developing an environment that encourages “cooperative competition” is understanding why an organization or a team might risk cutting corners on safety. This is precisely what Harvard psychologist Joshua Greene did as he considered the Principle.

“Cutting corners on safety is essentially saying, ‘My private good takes precedence over the public good,’” Greene said. “Cutting corners on safety is really just an act of selfishness. The only reason to race forward at the expense of safety is if you think that the benefits of racing disproportionately go to you. It’s increasing the probability that people in general will be harmed, a common bad, if you like, in order to raise the probability of a private good.”

 

A Profitable Benefit of Safety

John Havens, Executive Director with the IEEE, says he “couldn’t agree more” with the Principle. He wants to use this as an opportunity to “re-invent” what we mean by safety and how we approach safety standards.

Havens explained, “We have to help people re-imagine what safety standards mean. … By going over safety, you’re now asking: What is my AI system? How will it interact with end users or stakeholders in the supply chain touching it and coming into contact with it, where there are humans involved, where it’s system to human vs. system to system?

“Safety is really about asking about people’s values. It’s not just physical safety, it’s also: What about their personal data, what about how they’re going to interact with this? So the reason you don’t want to cut corners is you’re also cutting innovation. You’re cutting the chance to provide a better product or service.”

But for companies who take these standards seriously, he added, “You’re going to discover all these wonderful ways to build more trust with what you’re doing when you take the time you need to go over those standards.”

 

What Do You Think?

With organizations like the Partnership on AI, we’re already starting to see signs that companies recognize and want to address the dangers of an AI race. But for now, the Partnership is comprised mainly of western organizations, while companies in many countries and especially China are vying to catch up to — and perhaps “beat” — companies in the U.S. and Europe. How can we encourage organizations and research teams worldwide to cooperate and develop safety standards together? How can we help teams to monitor their work and ensure proper safety procedures are always in place? AI research teams will need the feedback and insight of other teams to ensure that they don’t overlook potential risks, but how will this collaboration work without forcing companies to reveal trade secrets? What do you think of the Race Avoidance Principle?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Towards a Code of Ethics in Artificial Intelligence with Paula Boddington

AI promises a smarter world – a world where finance algorithms analyze data better than humans, self-driving cars save millions of lives from accidents, and medical robots eradicate disease. But machines aren’t perfect. Whether an automated trading agent buys the wrong stock, a self-driving car hits a pedestrian, or a medical robot misses a cancerous tumor – machines will make mistakes that severely impact human lives.

Paula Boddington, a philosopher based in the Department of Computer Science at Oxford, argues that AI’s power for good and bad makes it crucial that researchers consider the ethical importance of their work at every turn. To encourage this, she is taking steps to lay the groundwork for a code of AI research ethics.

Codes of ethics serve a role in any field that impacts human lives, such as in medicine or engineering. Tech organizations like the Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery (ACM) also adhere to codes of ethics to keep technology beneficial, but no concrete ethical framework exists to guide all researchers involved in AI’s development. By codifying AI research ethics, Boddington suggests, researchers can more clearly frame AI’s development within society’s broader quest of improving human wellbeing.

To better understand AI ethics, Boddington has considered various areas including autonomous trading agents in finance, self-driving cars, and biomedical technology. In all three areas, machines are not only capable of causing serious harm, but they assume responsibilities once reserved for humans. As such, they raise fundamental ethical questions.

“Ethics is about how we relate to human beings, how we relate to the world, how we even understand what it is to live a human life or what our end goals of life are,” Boddington says. “AI is raising all of those questions. It’s almost impossible to say what AI ethics is about in general because there are so many applications. But one key issue is what happens when AI replaces or supplements human agency, a question which goes to the heart of our understandings of ethics.”

 

The Black Box Problem

Because AI systems will assume responsibility from humans – and for humans – it’s important that people understand how these systems might fail. However, this doesn’t always happen in practice.

Consider the Northpointe algorithm that US courts used to predict which defendants would reoffend. The algorithm weighed 100 factors such as prior arrests, family life, drug use, age, and sex, and predicted the likelihood that a defendant would commit another crime. Northpointe’s developers did not specifically consider race, but when investigative journalists from ProPublica analyzed the algorithm, they found that it incorrectly labeled black defendants as “high risk” almost twice as often as white defendants. Unaware of this bias and eager to improve their criminal justice systems, states like Wisconsin, Florida, and New York trusted the algorithm for years to determine sentences. Without understanding the tools they were using, these courts incarcerated defendants based on flawed calculations.
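
The kind of disparity ProPublica reported can be surfaced with a very simple audit that compares a model's false positive rates across groups, that is, how often people who did not reoffend were nonetheless labeled high-risk. The records in the sketch below are made up purely for illustration; a real audit would use the actual predictions and outcomes.

```python
# Minimal sketch of a fairness audit: compare false positive rates by group.
records = [
    # (group, predicted_high_risk, actually_reoffended); values are illustrative
    ("A", True,  False), ("A", True,  False), ("A", False, False), ("A", True, True),
    ("B", False, False), ("B", True,  False), ("B", False, False), ("B", True, True),
]

def false_positive_rate(rows):
    negatives = [r for r in rows if not r[2]]   # people who did not reoffend
    if not negatives:
        return float("nan")
    flagged = [r for r in negatives if r[1]]    # ...but were labeled high-risk anyway
    return len(flagged) / len(negatives)

for group in sorted({g for g, _, _ in records}):
    rows = [r for r in records if r[0] == group]
    print(f"group {group}: false positive rate = {false_positive_rate(rows):.2f}")

# Here group A's rate (0.67) is double group B's (0.33): the model makes the
# costly "high risk" mistake far more often for one group, the pattern
# ProPublica reported for black versus white defendants.
```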

The Northpointe case offers a preview of the potential dangers of deploying AI systems that people don’t fully understand. Current machine-learning systems are so complex, and operate at such speed, that no one really knows how they make decisions – not even the people who develop them. Moreover, these systems learn from their environment and update their behavior, making it more difficult for researchers to control and understand the decision-making process. This lack of transparency – the “black box” problem – makes it extremely difficult to construct and enforce a code of ethics.

Codes of ethics are effective in medicine and engineering because professionals understand and have control over their tools, Boddington suggests. There may be some blind spots – doctors don’t know everything about the medicine they prescribe – but we generally accept this “balance of risk.”

“It’s still assumed that there’s a reasonable level of control,” she explains. “In engineering buildings there’s no leeway to say, ‘Oh I didn’t know that was going to fall down.’ You’re just not allowed to get away with that. You have to be able to work it out mathematically. Codes of professional ethics rest on the basic idea that professionals have an adequate level of control over their goods and services.”

But AI makes this difficult. Because of the “black box” problem, if an AI system sets a dangerous criminal free or recommends the wrong treatment to a patient, researchers can legitimately argue that they couldn’t anticipate that mistake.

“If you can’t guarantee that you can control it, at least you could have as much transparency as possible in terms of telling people how much you know and how much you don’t know and what the risks are,” Boddington suggests. “Ethics concerns how we justify ourselves to others. So transparency is a key ethical virtue.”

 

Developing a Code of Ethics

Despite the “black box” problem, Boddington believes that scientific and medical communities can inform AI research ethics. She explains: “One thing that’s really helped in medicine and pharmaceuticals is having citizen and community groups keeping a really close eye on it. And in medicine there are quite a few “maverick” or “outlier” doctors who question, for instance, what the end value of medicine is. That’s one of the things you need to develop codes of ethics in a robust and responsible way.”

A code of AI research ethics will also require many perspectives. “I think what we really need is diversity in terms of thinking styles, personality styles, and political backgrounds, because the tech world and the academic world both tend to be fairly homogeneous,” Boddington explains.

Not only will diverse perspectives account for different values, but they also might solve problems better, according to research from economist Lu Hong and political scientist Scott Page. Hong and Page found that if you compare two groups solving a problem – one homogeneous group of people with very high IQs, and one diverse group of people with lower IQs – the diverse group will probably solve the problem better.

 

Laying the Groundwork

This fall, Boddington will release the main output of her project: a book titled Towards a Code of Ethics for Artificial Intelligence. She readily admits that the book can’t cover every ethical dilemma in AI, but it should help demonstrate how tricky it is to develop codes of ethics for AI and spur more discussion on issues like how codes of professional ethics can deal with the “black box” problem.

Boddington has also collaborated with the IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, which recently released a report exhorting researchers to look beyond the technical capabilities of AI, and “prioritize the increase of human wellbeing as our metric for progress in the algorithmic age.”

Although a formal code is only part of what’s needed for the development of ethical AI, Boddington hopes that this discussion will eventually produce a code of AI research ethics. With a robust code, researchers will be better equipped to guide artificial intelligence in a beneficial direction.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Op-ed: Should Artificial Intelligence Be Regulated?

By Anthony Aguirre, Ariel Conn, and Max Tegmark

Should artificial intelligence be regulated? Can it be regulated? And if so, what should those regulations look like?

These are difficult questions to answer for any technology still in development stages – regulations, like those on the food, pharmaceutical, automobile and airline industries, are typically applied after something bad has happened, not in anticipation of a technology becoming dangerous. But AI has been evolving so quickly, and the impact of AI technology has the potential to be so great that many prefer not to wait and learn from mistakes, but to plan ahead and regulate proactively.

In the near term, issues concerning job losses, autonomous vehicles, AI and algorithmic decision-making, and “bots” driving social media require attention by policymakers, just as many new technologies do. In the longer term, though, possible AI impacts span the full spectrum of benefits and risks to humanity – from the possible development of a more utopian society to the potential extinction of human civilization. As such, it represents an especially challenging situation for would-be regulators.

Already, many in the AI field are working to ensure that AI is developed beneficially, without unnecessary constraints on AI researchers and developers. In January of this year, some of the top minds in AI met at a conference in Asilomar, CA. A product of this meeting was the set of Asilomar AI Principles. These 23 Principles represent a partial guide, its drafters hope, to help ensure that AI is developed beneficially for all. To date, over 1200 AI researchers and over 2300 others have signed on to these principles.

Yet aspirational principles alone are not enough, if they are not put into practice, and a question remains: is government regulation and oversight necessary to guarantee that AI scientists and companies follow these principles and others like them?

Among the signatories of the Asilomar Principles is Elon Musk, who recently drew attention for his comments at a meeting of the National Governors Association, where he called for a regulatory body to oversee AI development. In response, news organizations focused on his concerns that AI represents an existential threat. And his suggestion raised concerns with some AI researchers who worry that regulations would, at best, be unhelpful and misguided, and at worst, stifle innovation and give an advantage to companies overseas.

But an important and overlooked comment by Musk related specifically to what this regulatory body should actually do. He said:

“The right order of business would be to set up a regulatory agency – initial goal: gain insight into the status of AI activity, make sure the situation is understood, and once it is, put regulations in place to ensure public safety. That’s it. … I’m talking about making sure there’s awareness at the government level.”

There is disagreement among AI researchers about what the risk of AI may be, when that risk could arise, and whether AI could pose an existential risk, but few researchers would suggest that AI poses no risk. Even today, we’re seeing signs of narrow AI exacerbating problems of discrimination and job loss, and if we don’t take proper precautions, we can expect problems to worsen, affecting more people as AI grows smarter and more complex.

The number of AI researchers who signed the Asilomar Principles – as well as the open letters regarding developing beneficial AI and opposing lethal autonomous weapons – shows that there is strong consensus among researchers that we need to do more to understand and address the known and potential risks of AI.

Some of the Principles that AI researchers signed directly relate to Musk’s statements, including:

3) Science-Policy Link: There should be constructive and healthy exchange between AI researchers and policy-makers.

4) Research Culture: A culture of cooperation, trust, and transparency should be fostered among researchers and developers of AI.

5) Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards.

20) Importance: Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources.

21) Risks: Risks posed by AI systems, especially catastrophic or existential risks, must be subject to planning and mitigation efforts commensurate with their expected impact.

The right policy and governance solutions could help align AI development with these principles, as well as encourage interdisciplinary dialogue on how that may be achieved.

The recently founded Partnership on AI, which includes the leading AI industry players, similarly endorses the idea of principled AI development – their founding document states that “where AI tools are used to supplement or replace human decision-making, we must be sure that they are safe, trustworthy, and aligned with the ethics and preferences of people who are influenced by their actions”.

And as Musk suggests, the very first step needs to be increasing awareness about AI’s implications among government officials. Automated vehicles, for example, are expected to eliminate millions of jobs, which will affect nearly every governor who attended the talk (assuming they’re still in office), yet the topic rarely comes up in political discussion.

AI researchers are excited – and rightly so – about the incredible potential of AI to improve our health and well-being: it’s why most of them joined the field in the first place. But there are legitimate concerns about the possible misuse and/or poor design of AI, especially as we move toward advanced and more general AI.

Because these problems threaten society as a whole, they can’t be left to a small group of researchers to address. At the very least, government officials need to learn about and understand how AI could impact their constituents, as well as how more AI safety research could help us solve these problems before they arise.

Instead of focusing on whether regulations would be good or bad, we should lay the foundations for constructive regulation in the future by helping our policy-makers understand the realities and implications of AI progress. Let’s ask ourselves: how can we ensure that AI remains beneficial for all, and who needs to be involved in that effort?

 

Safe Artificial Intelligence May Start with Collaboration

Research Culture Principle: A culture of cooperation, trust, and transparency should be fostered among researchers and developers of AI.

Competition and secrecy are just part of doing business. Even in academia, researchers often keep ideas and impending discoveries to themselves until grants or publications are finalized. But sometimes even competing companies and research labs work together. It’s not uncommon for organizations to find that it’s in their best interests to cooperate in order to solve problems and address challenges that would otherwise result in duplicated costs and wasted time.

Such friendly behavior helps groups more efficiently address regulation, come up with standards, and share best practices on safety. While such companies or research labs — whether in artificial intelligence or any other field — cooperate on certain issues, their objective is still to be the first to develop a new product or make a new discovery.

How can organizations, especially for new technologies like artificial intelligence, draw the line between working together to ensure safety and working individually to protect new ideas? Since the Research Culture Principle doesn’t differentiate between collaboration on AI safety versus AI development, it can be interpreted broadly, as seen from the responses of the AI researchers and ethicists who discussed this principle with me.

 

A Necessary First Step

A common theme among those I interviewed was that this Principle presented an important first step toward the development of safe and beneficial AI.

“I see this as a practical distillation of the Asilomar Principles,” said Harvard professor Joshua Greene. “They are not legally binding. At this early stage, it’s about creating a shared understanding that beneficial AI requires an active commitment to making it turn out well for everybody, which is not the default path. To ensure that this power is used well when it matures, we need to have already in place a culture, a set of norms, a set of expectations, a set of institutions that favor good outcomes. That’s what this is about — getting people together and committed to directing AI in a mutually beneficial way before anyone has a strong incentive to do otherwise.”

In fact, all of the people I interviewed agreed with the Principle. The questions and concerns they raised typically had more to do with the potential challenge of implementing it.

Susan Craw, a professor at Robert Gordon University, liked the Principle, but she wondered how it would apply to corporations.

She explained, “That would be a lovely principle to have, [but] it can work perhaps better in universities, where there is not the same idea of competitive advantage as in industry. … And cooperation and trust among researchers … well without cooperation none of us would get anywhere, because we don’t do things in isolation. And so I suspect this idea of research culture isn’t just true of AI — you’d like it to be true of many subjects that people study.”

Meanwhile, Susan Schneider, a professor at the University of Connecticut, expressed concern about whether governments would implement the Principle.

“This is a nice ideal,” she said, “but unfortunately there may be organizations, including governments, that don’t follow principles of transparency and cooperation. … Concerning those who might resist the cultural norm of cooperation and transparency, in the domestic case, regulatory agencies may be useful.”

“Still,” she added, “it is important that we set forth the guidelines, and aim to set norms that others feel they need to follow.  … Calling attention to AI safety is very important.”

“I love the sentiment of it, and I completely agree with it,” said John Havens, Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems.

“But,” he continued, “I think defining what a culture of cooperation, trust, and transparency is… what does that mean? Where the ethicists come into contact with the manufacturers, there is naturally going to be the potential for polarization. And on the [ethics] or risk or legal compliance side, they feel that the technologists may not be thinking of certain issues. … You build that culture of cooperation, trust, and transparency when both sides say, as it were, ‘Here’s the information we really need to progress our work forward. How do we get to know what you need more, so that we can address that well with these questions?’ … This [Principle] is great, but the next sentence should be: Give me a next step to make that happen.”

 

Uniting a Fragmented Community

Patrick Lin, a professor at California Polytechnic State University, saw a different problem within the AI community itself, one that could create challenges as researchers try to build trust and cooperation.

Lin explained, “I think building a cohesive culture of cooperation is going to help in a lot of things. It’s going to help accelerate research and avoid a race, but the big problem I see for the AI community is that there is no AI community, it’s fragmented, it’s a Frankenstein-stitching together of various communities. You have programmers, engineers, roboticists; you have data scientists, and it’s not even clear what a data scientist is. Are they people who work in statistics or economics, or are they engineers, are they programmers? … There’s no cohesive identity, and that’s going to be super challenging to creating a cohesive culture that cooperates and trusts and is transparent, but it is a worthy goal.”

 

Implementing the Principle

To address these concerns about successfully implementing a beneficial AI research culture, I turned to researchers at the Centre for the Study of Existential Risk (CSER). Shahar Avin, a research associate at CSER, pointed out that the “AI research community already has quite remarkable norms when it comes to cooperation, trust and transparency, from the vibrant atmosphere at NIPS, AAAI and IJCAI, to the increasing number of research collaborations (both in terms of projects and multiple-position holders) between academia, industry and NGOs, to the rich blogging community across AI research that doesn’t shy away from calling out bad practices or norm infringements.”

Martina Kunz also highlighted the efforts by the IEEE for a global-ethics-of-AI initiative, as well as the formation of the Partnership on AI, “in particular its goal to ‘develop and share best practices’ and to ‘provide an open and inclusive platform for discussion and engagement.’”

Avin added, “The commitment of AI labs in industry to open publication is commendable, and seems to be growing into a norm that pressures historically less-open companies to open up about their research. Frankly, the demand for high-end AI research skills means researchers, either as individuals or groups, can make strong demands about their work environment, from practical matters of salary and snacks to normative matters of openness and project choice.

“The strong individualism in AI research also suggests that the way to foster cooperation on long term beneficial AI will be to discuss potential risks with researchers, both established and in training, and foster a sense of responsibility and custodianship of the future. An informed, ethically-minded and proactive research cohort, which we already see the beginnings of, would be in a position to enshrine best practices and hold up their employers and colleagues to norms of beneficial AI.”

 

What Do You Think?

With collaborations like the Partnership on AI forming, it’s possible we’re already seeing signs that industry and academia are starting to move in the direction of cooperation, trust, and transparency. But is that enough, or is it necessary that world governments join? Overall, how can AI companies and research labs work together to ensure they’re sharing necessary safety research without sacrificing their ideas and products?

 

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Aligning Superintelligence With Human Interests

The trait that currently gives humans a dominant advantage over other species is intelligence. Human advantages in reasoning and resourcefulness have allowed us to thrive. However, this may not always be the case.

Although superintelligent AI systems may be decades away, Benya Fallenstein – a research fellow at the Machine Intelligence Research Institute – believes “it is prudent to begin investigations into this technology now.” The more time scientists and researchers have to prepare for a system that could eventually be smarter than us, the better.

A smarter-than-human AI system could potentially develop the tools necessary to exert control over humans. At the same time, highly capable AI systems may not possess a human sense of fairness, compassion, or conservatism. Consequently, the AI system’s single-minded pursuit of its programmed goals could cause it to deceive programmers, attempt to seize resources, or otherwise exhibit adversarial behaviors.

Fallenstein believes researchers must “ensure that AI would behave in ways that are reliably aligned with human interests.” However, even highly-reliable agent programming does not guarantee a positive impact; the effects of the system still depend upon whether it is pursuing human-approved goals. A superintelligent system may find clever, unintended ways to achieve the specific goals that it is given.

For example, imagine a super-intelligent system designed to cure cancer “without doing anything bad.” This goal is rooted in cultural context and shared human knowledge. The AI may not completely understand what qualifies as “bad.” Therefore, it may try to cure cancer by stealing resources, proliferating robotic laboratories at the expense of the biosphere, kidnapping test subjects, or all of the above.

If a current AI system gets out of hand, researchers simply shut it down and modify its source code. However, modifying super-intelligent systems could prove to be more difficult, if not impossible. A system could acquire new hardware, alter its software, or take other actions that would leave the original programmers with only dubious control over the agent. And since most programmed goals are better achieved if the system stays operational and continues pursuing its goals than if it is deactivated or its goals are changed, systems will naturally tend to have an incentive to resist shutdown and to resist modifications to their goals.

Fallenstein explains that, in order to ensure that the development of super-intelligent AI has a positive impact on the world, “it must be constructed in such a way that it is amenable to correction, even if it has the ability to prevent or avoid correction.” The goal is not to design systems that fail in their attempts to deceive the programmers, but to understand how highly intelligent and general-purpose reasoners with flawed goals can be built so that they have no incentive to deceive programmers in the first place. The intent is for the first highly capable systems to be “corrigible”—i.e., to recognize that their goals and other features are works in progress, and to work with programmers to identify and fix errors.

Little is known about the design or implementation details of such systems because everything, at this point, is hypothetical — no super-intelligent AI systems exist yet. As a consequence, the research described below focuses on formal agent foundations for AI alignment research — that is, on developing the basic conceptual tools and theories that are most likely to be useful for engineering robustly beneficial systems in the future.

Active research into this is focused on small “toy” problems and models of corrigible agents, in the hope that insight gained there could be applied to more realistic and complex versions of the problems. Fallenstein and her team sought to illuminate the key difficulties of AI using these models. One such toy problem is the “shutdown problem,” which involves designing a set of preferences that incentivize an agent to shut down upon the press of a button without also incentivizing the agent to either cause or prevent the pressing of that button. This would tell researchers whether a utility function could be specified such that agents using that function switch their preferences on demand, without having incentives to cause or prevent the switching.
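
A heavily simplified numeric sketch can convey what these toy models are probing. It is not MIRI's formal construction; it just shows that a naive expected-utility maximizer prefers to disable its shutdown button, and that adding a compensating term (in the spirit of utility-indifference proposals) equalizes the expected utilities so the agent has no incentive to interfere either way. All numbers are illustrative assumptions.

```python
# Toy sketch of the shutdown problem (illustrative numbers only).
# The agent earns utility 10 for finishing its task, 0 if it is shut down first.
# The button is pressed with probability 0.5 unless the agent disables it.

P_PRESS = 0.5
U_TASK, U_SHUTDOWN = 10.0, 0.0

def expected_utility(disable_button: bool, indifference_bonus: float = 0.0) -> float:
    p_press = 0.0 if disable_button else P_PRESS
    # If shut down, the agent receives U_SHUTDOWN plus an optional compensation
    # term chosen to make shutdown worth exactly as much as continuing.
    return (1 - p_press) * U_TASK + p_press * (U_SHUTDOWN + indifference_bonus)

# Naive agent: disabling the button strictly increases expected utility.
print(expected_utility(disable_button=True),    # 10.0
      expected_utility(disable_button=False))   #  5.0

# Indifference-style fix: compensate shutdown so the button press changes
# nothing from the agent's point of view; both options now tie, so the agent
# has no incentive to cause or prevent the press.
bonus = U_TASK - U_SHUTDOWN
print(expected_utility(True, bonus),    # 10.0
      expected_utility(False, bonus))   # 10.0
```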

Studying models in this formal logical setting has led to partial solutions, and further research in this setting may continue to drive the development of methods for reasoning under logical uncertainty.

The largest result thus far under this research program is “logical induction,” a line of research led by Scott Garrabrant. It functions as a new model of deductively-limited reasoning.

Logical uncertainty is the kind of uncertainty we have about mathematical questions that are too difficult for us to settle one way or the other right now. For example, a typical human mind can’t quickly answer the question:

What’s the 10^100th digit of pi?

Further, nobody has the computational resources to solve this in a reasonable amount of time. Despite this, mathematicians have lots of theories about how likely mathematical conjectures are to be true. As such, they must be implicitly using some sort of criterion that can be used to judge the probability that a mathematical statement is true or not. The logical induction framework formalizes such a criterion and proves that a computable logical inductor (an algorithm producing probability assignments that satisfy the criterion) exists.
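
To make the flavor of logical uncertainty concrete, here is a small illustrative sketch (it assumes the third-party mpmath package for arbitrary-precision arithmetic, and is not taken from the logical induction paper): for a digit we can afford to compute, the credence collapses to 0 or 1 once the computation finishes, while for the 10^100th digit we are stuck, for now, with something like a uniform 1/10 credence on each value.

```python
# Illustrative sketch of logical uncertainty (assumes the third-party mpmath
# package). The 100th digit of pi is a settled mathematical fact, but until we
# spend the compute we might reasonably assign 1/10 to each possible value; for
# the 10^100th digit nobody can spend the compute, so a reasoner has to keep
# working with such "logical" probabilities indefinitely.

from mpmath import mp, nstr

def credence_in_digit(position: int, value: int, compute_budget_digits: int) -> float:
    if position <= compute_budget_digits:
        mp.dps = position + 10                           # a few guard digits
        decimals = nstr(mp.pi, position + 5).replace("3.", "")
        return 1.0 if int(decimals[position - 1]) == value else 0.0
    return 0.1                                           # uniform prior over 0-9

# Within budget: the question gets settled and credence becomes 0 or 1.
print([credence_in_digit(100, v, compute_budget_digits=1000) for v in range(10)])
# Far beyond any budget: credence stays at the 1/10 prior.
print(credence_in_digit(10**100, 7, compute_budget_digits=1000))
```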

The research team presented a computable algorithm that outpaces deduction, assigning high subjective probabilities to provable conjectures and low probabilities to disprovable conjectures long before the proofs can be produced. Among other accomplishments, the algorithm learns to reason competently about its own beliefs and trust its future beliefs while avoiding paradox. This gives some formal backing to the thought that real-world probabilistic agents can often be reasonably confident in their future reasoning in practice.

The team believes “there’s a good chance that this framework will open up new avenues of study in questions of metamathematics, decision theory, game theory, and computational reflection that have long seemed intractable.” They are also “cautiously optimistic” that they’ll improve our understanding of decision theory and counterfactual reasoning, and other problems related to AI value alignment.

At the same time, Fallenstein’s team doesn’t believe that all parts of the problem must be solved in advance. In fact, “the task of designing smarter, safer, more reliable systems could be delegated to early smarter-than-human systems.” This can only happen, though, as long as the research done by the AI can be trusted.

According to Fallenstein, this “call to arms” is vital, and “significant effort must be focused on the study of superintelligence alignment as soon as possible.” It is important to develop a formal understanding of AI alignment well in advance of making design decisions about smarter-than-human systems. By beginning the work early, researchers inevitably face the risk that some of it may turn out to be irrelevant; however, failing to prepare could be even worse.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

United Nations Adopts Ban on Nuclear Weapons

Today, 72 years after their invention, states at the United Nations formally adopted a treaty which categorically prohibits nuclear weapons.

With 122 votes in favor, one vote against, and one country abstaining, the “Treaty on the Prohibition of Nuclear Weapons” was adopted Friday morning and will open for signature by states at the United Nations in New York on September 20, 2017. Civil society organizations and more than 140 states have participated throughout negotiations.

On adoption of the treaty, ICAN Executive Director Beatrice Fihn said:

“We hope that today marks the beginning of the end of the nuclear age. It is beyond question that nuclear weapons violate the laws of war and pose a clear danger to global security. No one believes that indiscriminately killing millions of civilians is acceptable – no matter the circumstance – yet that is what nuclear weapons are designed to do.”

In a public statement, Former Secretary of Defense William Perry said:

“The new UN Treaty on the Prohibition of Nuclear Weapons is an important step towards delegitimizing nuclear war as an acceptable risk of modern civilization. Though the treaty will not have the power to eliminate existing nuclear weapons, it provides a vision of a safer world, one that will require great purpose, persistence, and patience to make a reality. Nuclear catastrophe is one of the greatest existential threats facing society today, and we must dream in equal measure in order to imagine a world without these terrible weapons.”

Until now, nuclear weapons were the only weapons of mass destruction without a prohibition treaty, despite the widespread and catastrophic humanitarian consequences of their intentional or accidental detonation. Biological weapons were banned in 1972 and chemical weapons in 1992.

This treaty is a clear indication that the majority of the world no longer accepts nuclear weapons and does not consider them legitimate tools of war. The repeated objection and boycott of the negotiations by many nuclear-weapon states demonstrates that this treaty has the potential to significantly impact their behavior and stature. As has been true with previous weapon prohibition treaties, changing international norms leads to concrete changes in policies and behaviors, even in states not party to the treaty.

“This is a triumph for global democracy, where the pro-nuclear coalition of Putin, Trump and Kim Jong-Un were outvoted by the majority of Earth’s countries and citizens,” said MIT Professor and FLI President Max Tegmark.

“The strenuous and repeated objections of nuclear armed states is an admission that this treaty will have a real and lasting impact,” Fihn said.

The treaty also creates obligations to support the victims of nuclear weapons use (Hibakusha) and testing and to remediate the environmental damage caused by nuclear weapons.

From the beginning, the effort to ban nuclear weapons has benefited from the broad support of international humanitarian, environmental, nonproliferation, and disarmament organizations in more than 100 states. Significant political and grassroots organizing has taken place around the world, and many thousands have signed petitions, joined protests, contacted representatives, and pressured governments.

“The UN treaty places a strong moral imperative against possessing nuclear weapons and gives a voice to some 130 non-nuclear weapons states who are equally affected by the existential risk of nuclear weapons. … My hope is that this treaty will mark a sea change towards global support for the abolition of nuclear weapons. This global threat requires unified global action,” said Perry.

Fihn added, “Today the international community rejected nuclear weapons and made it clear they are unacceptable. It is time for leaders around the world to match their values and words with action by signing and ratifying this treaty as a first step towards eliminating nuclear weapons.”

 

Images courtesy of ICAN.

 

WHAT THE TREATY DOES

Comprehensively bans nuclear weapons and related activity. It will be illegal for parties to undertake any activities related to nuclear weapons. It bans the use, development, testing, production, manufacture, acquisition, possession, stockpiling, transfer, receipt, threat of use, stationing, installation, or deployment of nuclear weapons. [Article 1]

Bans any assistance with prohibited acts. The treaty bans assistance with prohibited acts, and should be interpreted as prohibiting states from engaging in military preparations and planning to use nuclear weapons, financing their development and manufacture, or permitting the transit of them through territorial waters or airspace. [Article 1]

Creates a path for nuclear states which join to eliminate weapons, stockpiles, and programs. It requires states with nuclear weapons that join the treaty to remove them from operational status and destroy them and their programs, all according to plans they would submit for approval. It also requires states which have other countries’ weapons on their territory to have them removed. [Article 4]

Verifies and safeguards that states meet their obligations. The treaty requires a verifiable, time-bound, transparent, and irreversible destruction of nuclear weapons and programs and requires the maintenance and/or implementation of international safeguards agreements. The treaty permits safeguards to become stronger over time and prohibits weakening of the safeguard regime. [Articles 3 and 4]

Requires victim and international assistance and environmental remediation. The treaty requires states to assist victims of nuclear weapons use and testing, and requires environmental remediation of contaminated areas. The treaty also obliges states to provide international assistance to support the implementation of the treaty. The text also requires states to encourage others to join the Treaty, and to meet regularly to review progress. [Articles 6, 7, and 8]

NEXT STEPS

Opening for signature. The treaty will be open for signature on 20 September at the United Nations in New York. [Article 13]

Entry into force. Fifty states are required to ratify the treaty for it to enter into force.  At a national level, the process of ratification varies, but usually requires parliamentary approval and the development of national legislation to turn prohibitions into national legislation. This process is also an opportunity to elaborate additional measures, such as prohibiting the financing of nuclear weapons. [Article 15]

First meeting of States Parties. The first Meeting of States Parties will take place within a year after the entry into force of the Convention. [Article 8]

SIGNIFICANCE AND IMPACT OF THE TREATY

Delegitimizes nuclear weapons. This treaty is a clear indication that the majority of the world no longer accepts nuclear weapons and does not consider them legitimate weapons, creating the foundation of a new norm of international behaviour.

Changes party and non-party behaviour. As has been true with previous weapon prohibition treaties, changing international norms leads to concrete changes in policies and behaviours, even in states not party to the treaty. This is true for treaties ranging from those banning cluster munitions and land mines to the Convention on the Law of the Sea. The prohibition on assistance will play a significant role in changing behaviour given the impact it may have on financing and military planning and preparation for their use.

Completes the prohibitions on weapons of mass destruction. The treaty completes work begun in the 1970s, when biological weapons were banned, and the 1990s, when chemical weapons were banned.

Strengthens International Humanitarian Law (“Laws of War”). Nuclear weapons are intended to kill millions of civilians – non-combatants – a gross violation of International Humanitarian Law. Few would argue that the mass slaughter of civilians is acceptable, and there is no way to use a nuclear weapon in line with international law. The treaty strengthens these bodies of law and the norms behind them.

Removes the prestige associated with proliferation. Countries often seek nuclear weapons for the prestige of being seen as part of an important club. By more clearly making nuclear weapons an object of scorn rather than achievement, the treaty can help deter their spread.

FLI sought to increase support for the negotiations from the scientific community this year. We organized an open letter signed by over 3700 scientists in 100 countries, including 30 Nobel Laureates. You can see the letter here and the video we presented recently at the UN here.

This post is a modified version of the press release provided by the International Campaign to Abolish Nuclear Weapons (ICAN).

FHI Quarterly Update (July 2017)

The following update was originally posted on the FHI website:

In the second quarter of 2017, FHI has continued its work as before, exploring crucial considerations for the long-run flourishing of humanity in our four research focus areas:

  • Macrostrategy – understanding which crucial considerations shape what is at stake for the future of humanity.
  • AI safety – researching computer science techniques for building safer artificially intelligent systems.
  • AI strategy – understanding how geopolitics, governance structures, and strategic trends will affect the development of advanced artificial intelligence.
  • Biorisk – working with institutions around the world to reduce risk from especially dangerous pathogens.

We have been adapting FHI to our growing size. We’ve secured 50% more office space, which will be shared with the proposed Institute for Effective Altruism. We are developing plans to restructure to make our research management more modular and to streamline our operations team.

We have gained two staff in the last quarter. Tanya Singh is joining us as a temporary administrator, coming from a background in tech start-ups. Laura Pomarius has joined us as a Web Officer with a background in design and project management. Two of our staff will be leaving in this quarter. Kathryn Mecrow is continuing her excellent work at the Centre for Effective Altruism where she will be their Office Manager. Sebastian Farquhar will be leaving to do a DPhil at Oxford but expects to continue close collaboration. We thank them for their contributions and wish them both the best!

Key outputs you can read

A number of co-authors including FHI researchers Katja Grace and Owain Evans surveyed hundreds of researchers to understand their expectations about AI performance trajectories. They found significant uncertainty, but the aggregate subjective probability estimate suggested a 50% chance of high-level AI within 45 years. Of course, the estimates are subjective and expert surveys like this are not necessarily accurate forecasts, though they do reflect the current state of opinion. The survey was widely covered in the press.

An earlier overview of funding in the AI safety field by Sebastian Farquhar highlighted slow growth in AI strategy work. Miles Brundage’s latest piece, released via 80,000 Hours, aims to expand the pipeline of workers for AI strategy by suggesting practical paths for people interested in the area.

Anders Sandberg, Stuart Armstrong, and their co-author Milan Cirkovic published a paper outlining a potential strategy for advanced civilizations to postpone computation until the universe is much colder, thereby gaining up to a 10^30 multiplier in achievable computation. This might explain the Fermi paradox, although a future paper from FHI suggests there may be no paradox to explain.

Individual research updates

Macrostrategy and AI Strategy

Nick Bostrom has continued work on AI strategy and the foundations of macrostrategy and is investing in advising some key actors in AI policy. He gave a speech at the G30 in London and presented to CEOs of leading Chinese technology firms in addition to a number of other lectures.

Miles Brundage wrote a career guide for AI policy and strategy, published by 80,000 Hours. He ran a scenario planning workshop on uncertainty in AI futures. He began a paper on verifiable and enforceable agreements in AI safety while a review paper on deep reinforcement learning he co-authored was accepted. He spoke at Newspeak House and participated in a RAND workshop on AI and nuclear security.

Owen Cotton-Barratt organised and led a workshop to explore potential quick-to-implement responses to a hypothetical scenario where AI capabilities grow much faster than the median expected case.

Sebastian Farquhar continued work with the Finnish government on pandemic preparedness, existential risk awareness, and geoengineering. They are currently drafting a white paper in three working groups on those subjects. He is contributing to a technical report on AI and security.

Carrick Flynn began working on structured transparency in crime detection using AI and encryption and attended EAG Boston.

Clare Lyle has joined as a research intern and has been working with Miles Brundage on AI strategy issues including a workshop report on AI and security.

Toby Ord has continued work on a book on existential risk, worked to recruit two research assistants, ran a forecasting exercise on AI timelines and continues his collaboration with DeepMind on AI safety.

Anders Sandberg is beginning preparation for a book on ‘grand futures’.  A paper by him and co-authors on the aestivation hypothesis was published in the Journal of the British Interplanetary Society. He contributed a report on the statistical distribution of great power war to a Yale workshop, spoke at a workshop on AI at the Johns Hopkins Applied Physics Lab, and at the AI For Good summit in Geneva, among many other workshop and conference contributions. Among many media appearances, he can be found in episodes 2-6 of National Geographic’s series Year Million.

AI Safety

Stuart Armstrong has made progress on a paper on oracle designs and low impact AI, a paper on value learning in collaboration with Jan Leike, and several other collaborations including those with DeepMind researchers. A paper on the aestivation hypothesis co-authored with Anders Sandberg was published.

Eric Drexler has been engaged in a technical collaboration addressing the adversarial example problem in machine learning and has been making progress toward a publication that reframes the AI safety landscape in terms of AI services, structured systems, and path-dependencies in AI research and development.

Owain Evans and his co-authors released their survey of AI researchers on their expectations of future trends in AI. It was covered in the New Scientist, MIT Technology Review, and leading newspapers and is under review for publication. Owain’s team completed a paper on using human intervention to help RL systems avoid catastrophe. Owain and his colleagues further promoted their online textbook on modelling agents.

Jan Leike and his co-authors released a paper on universal reinforcement learning, which makes fewer assumptions about its environment than most reinforcement learners. Jan is a research associate at FHI while working at DeepMind.

Girish Sastry, William Saunders, and Neal Jean have joined as interns and have been helping Owain Evans with research and engineering on the prevention of catastrophes during training of reinforcement learning agents.

Biosecurity

Piers Millett has been collaborating with Andrew Snyder-Beattie on a paper on the cost-effectiveness of interventions in biorisk, and the links between catastrophic biorisks and traditional biosecurity. Piers worked with biorisk organisations including the US National Academies of Sciences, the global technical synthetic biology meeting (SB7), and training for those overseeing Ebola samples among others.

Funding

FHI is currently in a healthy financial position, although we continue to accept donations. We expect to spend approximately £1.3m over the course of 2017. Assuming three new hires but no further growth, our current funds plus pledged income should last us until early 2020. Additional funding would likely be used to add to our research capacity in machine learning, technical AI safety and AI strategy. If you are interested in discussing ways to further support FHI, please contact Niel Bowerman.

Recruitment

Over the coming months we expect to recruit for a number of positions. At the moment, we are interested in applications for internships from talented individuals with a machine learning background to work in AI safety. We especially encourage applications from demographic groups currently under-represented at FHI.

Support Grows for UN Nuclear Weapons Ban

“Do you want to be defended by the mass murder of people in other countries?”

According to Princeton physicist Zia Mian, nuclear weapons are “fundamentally anti-democratic” precisely because citizens are never asked this question. Mian argues that “if you ask people this question, almost everybody would say, ‘No, I do not want you to incinerate entire cities and kill millions of women and children and innocent people to defend us.’”

With the negotiations to draft a treaty that would ban nuclear weapons underway at the United Nations, much of the world may be showing it agrees. Just this week, a resolution passed during a meeting of the United States Conference of Mayors calling for the US to “lower nuclear tensions,” to “redirect nuclear spending,” and to “support the ban treaty negotiations.”

And it’s not just the US Conference of Mayors supporting a reduction in nuclear weapons. In October of 2016, 123 countries voted to pursue these negotiations to draft a nuclear ban treaty. As of today, the international group, Mayors for Peace, has swelled to “7,295 cities in 162 countries and regions, with 210 U.S. members, representing in total over one billion people.” A movement by the Hibakusha – survivors of the bombs dropped on Hiroshima and Nagasaki – has led to a petition that was signed by nearly 3 million people in support of the ban. And this spring, over 3700 scientists from 100 countries signed an open letter in support of the ban negotiations.

Yet there are some, especially in countries that either have nuclear weapons or are willing to let nuclear weapons be used on their behalf, who worry that the ban treaty could have a destabilizing effect globally. Nuclear experts, scientists, and government leaders have all offered statements explaining why they believe the world will be better off with this treaty.

The Ultimate Equalizer

“I support a ban on nuclear weapons because I know that a nuclear bomb is an equal opportunity destroyer.” -Congresswoman Barbara Lee.

Today’s nuclear weapons can be as much as 100 times more powerful than the bomb dropped on Hiroshima, and a single one could level a city for miles in every direction, with carnage extending well beyond the blast zone. That destruction would include the hospitals and health facilities needed to treat the injured.

As the US Conference of Mayors noted, “No national or international response capacity exists that would adequately respond to the human suffering and humanitarian harm that would result from a nuclear weapon explosion in a populated area, and [such] capacity most likely will never exist.”

And the threat of nuclear weapons doesn’t end with the area targeted. Climate scientist Alan Robock and physicist Brian Toon estimate that even a small, regional nuclear war could lead to the deaths of up to 1 billion people worldwide as global temperatures plummet and farms fail to grow enough food to feed the population.

Toon says, “If there were a full-scale conflict with all the nuclear weapons on the planet, or a conflict just involving smaller countries with perhaps 100 small weapons, in either case there’s an environmental catastrophe caused by the use of the weapons.”

Robock elaborates: “The smoke from the fires could cause a nuclear winter, if the US and Russia have a nuclear war, sentencing most of the people in the world to starvation. Even a very small nuclear war could produce tremendous climatic effects and disruption of the world’s food supplies. The only way to prevent this happening is to get rid of the weapons.”

 

 

Destabilization and Rising Political Tensions

Many of the concerns expressed by people hesitant to embrace a ban on nuclear weapons seem to revolve around the rising geopolitical tensions. It’s tempting to think that certain people or countries may be at more risk from nuclear weapons, and it’s equally tempting to think that living in a country with nuclear weapons will prevent others from attacking.

“The key part of the problem is that most people I know think nuclear weapons are scary but kind of cool at the same time because they keep us safe, and that’s just a myth.” -MIT physicist Max Tegmark

Among other things, heightened tensions actually increase the risk of an accidental nuclear attack, as almost happened many times during the Cold War.

Nuclear physicist Frank von Hippel says, “My principal concern is that they’ll be used by accident as a result of false warning or even hacking. … At the moment, [nuclear weapons are] in a ‘launch on warning’ posture. The US and Russia are sort of pointed at each other. That’s an urgent problem, and we can’t depend on luck indefinitely.”

“Launch on warning” means that either leader would have roughly 10-12 minutes to launch what they think is a retaliatory nuclear attack, which doesn’t leave much time to confirm that warning signals are correct and not just some sort of computer glitch.

Many people misinterpret the ban as requiring unilateral disarmament. In fact, its purpose is to make weapons that cause these indiscriminate and inhumane effects illegal, and to set the stage for all countries to disarm.

Tegmark explains, “The UN treaty … will create stigma, which, as a first step, will pressure countries to slash their excessive arsenals down to the minimal size needed for deterrence.”

For example, the United States has not signed the Mine Ban Treaty because it still maintains landmines along the border between North and South Korea, but the treaty’s stigma helped lead the U.S. to pledge to give up most of its landmines.

North Korea also comes up often as a reason countries, and specifically the U.S., can’t decrease their nuclear arsenals. When I asked Mian about this, his response was: “North Korea has 10 nuclear weapons. The United States has 7,000. That’s all there is to say.”

The Pentagon has suggested that the U.S. could ensure deterrence with about 300 nuclear weapons. That would be a mere 4% of our current nuclear arsenal, and yet it would still be 30 times what North Korea has.

The Non-Proliferation Treaty

Many people fear that a new treaty banning nuclear weapons outright could undermine the Non-Proliferation Treaty (NPT), but supporters of the ban insist that it would work in conjunction with the NPT. However, supporters have also expressed frustration with what they see as failings of the NPT.

Lawrence Krauss, physicist and board member of the Bulletin of the Atomic Scientists, explains, “190 countries have already adhered to the non-proliferation treaty. But in fact we are not following the guidelines of that treaty, which says that the nuclear states should do everything they can to disarm. And we’re violating that right now.”

Lisbeth Gronlund, a physicist and nuclear expert with the Union of Concerned Scientists adds, “The nuclear non-proliferation treaty has two purposes, and it has succeeded at preventing other states from getting nuclear weapons. It has failed in its second purpose, which is getting the nuclear weapons states to disarm. I support the ban treaty because it will pressure the nuclear weapons states to do what they are already obligated to do.”

Money

Maintaining nuclear arsenals is incredibly expensive, and now the U.S. is planning to spend $1.2 trillion to upgrade its arsenal (this doesn’t take into account the money that other nuclear countries are also putting into their own upgrades).

Jonathan King, a biologist and nuclear expert, says, “Very few people realize that it’s their tax dollars that pay for the development and maintenance of these weapons – billions and billions of dollars a year. The cost of one year of maintaining nuclear weapons is equivalent to the entire budget of the National Institutes of Health, responsible for research on all of the diseases that afflict Americans: heart disease, stroke, Alzheimer’s, arthritis, diabetes. It’s an incredible drain of national resources.”

William Hartung, a military spending expert, found that burning $1 million every hour for the next 30 years would still cost less than the planned nuclear spending.

Final Thoughts

“Today, the United Nations is considering a ban on nuclear weapons. The political effect of that ban is by no means clear. But the moral effect is quite clear. What we are saying is there ought to be a ban on nuclear weapons.” –Former Secretary of Defense, William Perry.

Beatrice Fihn is the Executive Director of the International Campaign to Abolish Nuclear Weapons (ICAN), which has helped initiate and mobilize support for the nuclear ban treaty from the very beginning, bringing together 450 organizations from over 100 countries.

“Nuclear weapons are intended to kill civilians by the millions,” Fihn points out. “Civilized people no longer believe that is acceptable behavior. It is time to place nuclear weapons alongside chemical and biological weapons, as relics we have evolved beyond. Banning these weapons in international law is a logical first step to eliminating them altogether, and we’re almost there.”

 

U.S. Conference of Mayors Unanimously Adopts Mayors for Peace Resolution

U.S. Conference of Mayors Unanimously Adopts Mayors for Peace Resolution Calling on President Trump to Lower Nuclear Tensions, Prioritize Diplomacy, and Redirect Nuclear Weapons Spending to meet Human Needs and Address Environmental Challenges


Conference also Adopts Two Additional Resolutions Calling for Reversal of Military Spending to Meet the Needs of Cities

Miami Beach, FL – At the close of its 85th Annual Meeting on Monday June 26, 2017, the United States Conference of Mayors (USCM), for the 12th consecutive year, adopted a strong resolution put forward by Mayors for Peace. The resolution, “Calling on President Trump to Lower Nuclear Tensions, Prioritize Diplomacy, and Redirect Nuclear Weapons Spending to meet Human Needs and Address Environmental Challenges,” was sponsored by Mayors for Peace Lead U.S. Mayor Frank Cownie of Des Moines, Iowa and 19 co-sponsors (full list below).

Mayor Cownie, addressing the International Affairs Committee of the USCM, quoted from the resolution: “This is an unprecedented moment in human history. The world has never faced so many nuclear flashpoints simultaneously. From NATO-Russia tensions, to the Korean Peninsula, to South Asia and the South China Sea and Taiwan — all of the nuclear-armed states are tangled up in conflicts and crises that could catastrophically escalate at any moment.”

“At the same time,” he noted, “historic negotiations are underway right now in the United Nations, involving most of the world’s countries, on a treaty to prohibit nuclear weapons, leading to their total elimination. More than unfortunately, the U.S. and the other nuclear-armed nations are boycotting these negotiations. I was there in March and witnessed the start of the negotiations first hand.”

The opening paragraph of the resolution declares: “Whereas, the Bulletin of the Atomic Scientists has moved the hands of its ‘Doomsday Clock’ to 2.5 minutes to midnight – the closest it’s been since 1953, stating, ‘Over the course of 2016, the global security landscape darkened as the international community failed to come effectively to grips with humanity’s most pressing existential threats, nuclear weapons and climate change,’ and warning that, ‘Wise public officials should act immediately, guiding humanity away from the brink’.”

As Mayor Cownie warned: “Just the way the mayors responded to the current Administration pulling out of the Paris Climate Accord, we need to respond to the other existential threat.”

The USCM is the nonpartisan association of American cities with populations over 30,000. There are 1,408 such cities. Resolutions adopted at annual meetings become USCM official policy.

By adopting this resolution, the USCM (abbreviated points): 

  • Calls on the U.S. Government, as an urgent priority, to do everything in its power to lower nuclear tensions through intense diplomatic efforts with Russia, China, North Korea and other nuclear-armed states and their allies, and to work with Russia to dramatically reduce U.S. and Russian nuclear stockpiles;
  • Welcomes the historic negotiations currently underway in the United Nations, involving most of the world’s countries, on a treaty to prohibit nuclear weapons, leading to their total elimination, and expresses deep regret that the U.S. and the other nuclear-armed states are boycotting these negotiations;
  • Calls on the U.S. to support the ban treaty negotiations as a major step towards negotiation of a comprehensive agreement on the achievement and permanent maintenance of a world free of nuclear arms, and to initiate, in good faith, multilateral negotiations to verifiably eliminate nuclear weapons within a timebound framework;
  • Welcomes the Restricting First Use of Nuclear Weapons Act of 2017, introduced in both houses of Congress, which would prohibit the President from launching a nuclear first strike without a declaration of war by Congress;
  • Calls for the Administration’s new Nuclear Posture Review to reaffirm the stated U.S. goal of the elimination of nuclear weapons, to lessen U.S. reliance on nuclear weapons, and to recommend measures to reduce nuclear risks;
  • Calls on the President and Congress to reverse federal spending priorities and to redirect funds currently allocated to nuclear weapons and unwarranted military spending to restore full funding for Community Development Block Grants and the Environmental Protection Agency, to create jobs by rebuilding our nation’s crumbling infrastructure, and to ensure basic human services for all, including education, environmental protection, food assistance, housing and health care; and
  • Urges all U.S. mayors to join Mayors for Peace in order to help reach the goal of 10,000 member cities by 2020, and encourages U.S. member cities to get actively involved by establishing sister city relationships with cities in other nuclear-armed nations, and by taking action at the municipal level to raise public awareness of the humanitarian and financial costs of nuclear weapons, the growing dangers of wars among nuclear-armed states, and the urgent need for good faith U.S. participation in negotiating the global elimination of nuclear weapons.

Mayors for Peace, founded in 1982, is led by the Mayors of Hiroshima and Nagasaki. Since 2003 it has been calling for the global elimination of nuclear weapons by 2020. Its membership has grown rapidly: as of June 1, 2017 it counted 7,335 cities in 162 countries, including 211 U.S. members, representing more than one billion people.

The 2017 Mayors for Peace USCM resolution additionally “welcomes resolutions adopted by cities including New Haven, CT, Charlottesville, VA, Evanston, IL, New London, NH, and West Hollywood, CA urging Congress to cut military spending and redirect funding to meet human and environmental needs”.

The USCM on June 16, 2017 also unanimously adopted two complementary resolutions: Opposition to Military Spending, sponsored by Mayor Svante L. Myrick of Ithaca, New York; and Calling for Hearings on Real City Budgets Needed and the Taxes our Cities Send to the Federal Military Budget, sponsored by Mayor Toni Harp of New Haven, Connecticut, a member of Mayors for Peace. These two resolutions are posted at http://legacy.usmayors.org/resolutions/85th_Conference/proposedcommittee.asp?committee=Metro Economies (scroll down).

The full text of the Mayors for Peace resolution with the list of 20 sponsors is posted at http://wslfweb.org/docs/2017MfPUSCMres.pdf

Official version (scroll down):  http://legacy.usmayors.org/resolutions/85th_Conference/proposedcommittee.asp?committee=International Affairs

The 2017 Mayors for Peace USCM resolution was sponsored by: T. M. Franklin Cownie, Mayor of Des Moines, IA; Alex Morse, Mayor of Holyoke, MA; Roy D. Buol, Mayor of Dubuque, IA; Nan Whaley, Mayor of Dayton, OH; Paul Soglin, Mayor of Madison, WI; Geraldine Muoio, Mayor of West Palm Beach, FL; Lucy Vinis, Mayor of Eugene, OR; Chris Koos, Mayor of Normal, IL; John Heilman, Mayor of West Hollywood, CA; Pauline Russo Cutter, Mayor of San Leandro, CA; Salvatore J. Panto, Jr., Mayor of Easton, PA; John Dickert, Mayor of Racine, WI; Ardell F. Brede, Mayor of Rochester, MN; Helene Schneider, Mayor of Santa Barbara, CA; Frank Ortis, Mayor of Pembroke Pines, FL; Libby Schaaf, Mayor of Oakland, CA; Mark Stodola, Mayor of Little Rock, AR; Patrick L. Wojahn, Mayor of College Park, MD; Denny Doyle, Mayor of Beaverton, OR; Patrick J. Furey, Mayor of Torrance, CA

Using History to Chart the Future of AI: An Interview with Katja Grace

The million-dollar question in AI circles is: When? When will artificial intelligence become so smart and capable that it surpasses human beings at every task?

AI is already visible in the world through job automation, algorithmic financial trading, self-driving cars and household assistants like Alexa, but these developments are trivial compared to the idea of artificial general intelligence (AGI) – AIs that can perform a broad range of intellectual tasks just as humans can. Many computer scientists expect AGI at some point, but hardly anyone agrees on when it will be developed.

Given the unprecedented potential of AGI to create a positive or destructive future for society, many worry that humanity cannot afford to be surprised by its arrival. A surprise is not inevitable, however, and Katja Grace believes that if researchers can better understand the speed and consequences of advances in AI, society can prepare for a more beneficial outcome.

 

AI Impacts

Grace, a researcher for the Machine Intelligence Research Institute (MIRI), argues that, while we can’t chart the exact course of AI’s improvement, it is not completely unpredictable. Her project AI Impacts is dedicated to identifying and conducting cost-effective research projects that can shed light on when and how AI will impact society in the coming years. She aims to “help improve estimates of the social returns to AI investment, identify neglected research areas, improve policy, or productively channel public interest in AI.”

AI Impacts asks such questions as: How rapidly will AI develop? How much advanced notice should we expect to have of disruptive change? What are the likely economic impacts of human-level AI? Which paths to AI should be considered plausible or likely? Can we say anything meaningful about the impact of contemporary choices on long-term outcomes?

One way to get an idea of these timelines is to ask the experts. In AI Impacts’ 2015 survey of 352 AI researchers, the respondents collectively predicted a 50 percent chance that AI will outcompete humans at almost everything by 2060. However, the same experts gave a date roughly seventy-five years later when asked a very similarly framed question, and individual answers spanned a huge range, making it difficult to rule much out. Grace hopes her research with AI Impacts will inform and improve these estimates.

 

Learning from History

Some thinkers believe that AI could progress rapidly, without much warning. This is based on the observation that algorithms don’t need factories, and so could in principle progress at the speed of a lucky train of thought.

However, Grace argues that while we have not developed human-level AI before, our vast experience developing other technologies can tell us a lot about what will happen with AI. Studying the timelines of other technologies can inform the AI timeline.

In one of her research projects, Grace studies jumps in technological progress throughout history, measuring these jumps in terms of how many years of progress happen in one ‘go’. “We’re interested in cases where more than a decade of progress happens in one go,” she explains. “The case of nuclear weapons is really the only case we could find that was substantially more than 100 years of progress in one go.”

For example, physicists began to consider nuclear energy in 1939, and by 1945 the US successfully tested a nuclear weapon. As Grace writes, “Relative effectiveness [of explosives] doubled less than twice in the 1100 years prior to nuclear weapons, then it doubled more than eleven times when the first nuclear weapons appeared. If we conservatively model previous progress as exponential, this is around 6000 years of progress in one step [compared to] previous rates.”
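
Grace’s “6000 years” figure can be reproduced with simple arithmetic. The sketch below is only an illustration of that reasoning, not AI Impacts’ published methodology: it treats the pre-nuclear trend as exponential, takes a doubling time from the quoted “less than twice in 1100 years,” and counts each doubling in the jump as worth one doubling time of ordinary progress.

```python
# Back-of-envelope sketch of "years of progress in one go", using only the
# figures quoted above. Illustration only, not AI Impacts' actual methodology.

# Prior trend: relative effectiveness of explosives doubled fewer than two
# times in the ~1100 years before nuclear weapons.
years_before = 1100
doublings_before = 2                              # "less than twice", so this is generous
doubling_time = years_before / doublings_before   # at least ~550 years per doubling

# The jump: the first nuclear weapons amounted to more than eleven doublings
# of relative effectiveness appearing essentially at once.
doublings_in_jump = 11

# Modeling prior progress as exponential, each doubling represents one
# doubling time of "normal" progress, so the jump is worth at least:
years_of_progress = doublings_in_jump * doubling_time
print(f"~{years_of_progress:.0f} years of progress in one step")  # roughly 6000
```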

Grace also considered the history of high-temperature superconductors. After superconductivity was discovered in 1911, peak superconducting temperatures rose slowly, from about 4 K (kelvin) initially to roughly 30 K by the 1980s. Then in 1986, scientists discovered a new class of ceramics that pushed the maximum temperature to 130 K within just seven years. “That was close to 100 years of progress in one go,” she explains.

Nuclear weapons and superconductors are rare cases – most of the technologies that Grace has studied either don’t demonstrate discontinuity, or only show about 10-30 years of progress in one go. “The main implication of what we have done is that big jumps are fairly rare, so that should not be the default expectation,” Grace explains.

Furthermore, AI’s progress largely depends on how fast hardware and software improve, and those are processes we can observe now. For instance, if hardware progress starts to slow from its long-run exponential trend, we should expect advanced AI to arrive later.

Grace is currently investigating these unknowns about hardware. She wants to know “how fast the price of hardware is decreasing at the moment, how much hardware helps with AI progress relative to e.g. algorithmic improvements, and how custom hardware matters.”
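
To see why the hardware trend matters for timelines, consider a toy calculation (the numbers below are hypothetical placeholders, not estimates from Grace or AI Impacts): if reaching some capability requires a fixed number of further doublings in compute per dollar, then its arrival date scales directly with the doubling time of the hardware trend.

```python
# Toy illustration of how the hardware price-performance trend shifts timelines.
# All numbers are hypothetical placeholders, not AI Impacts estimates.

doublings_needed = 10   # suppose a capability needs 2**10 times more compute per dollar

for doubling_time_years in (1.5, 2.5, 4.0):    # faster vs. slower hardware progress
    years_until_affordable = doublings_needed * doubling_time_years
    print(f"doubling time {doubling_time_years} yr "
          f"-> ~{years_until_affordable:.0f} years until that compute is affordable")
```

If hardware doublings stretch from 1.5 to 4 years, the same capability threshold moves from roughly 15 years out to roughly 40, which is why measuring the current rate of hardware improvement gives real evidence about AI timelines.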

 

Intelligence Explosion

AI researchers and developers must also be prepared for the possibility of an intelligence explosion – the idea that a sufficiently advanced AI could improve its own intelligence faster than humans could understand or control the process.

Grace explains: “The thought is that once the AI becomes good enough, the AI will do its own AI research (instead of humans), and then we’ll have AI doing AI research where the AI research makes the AI smarter and then the AI can do even better AI research. So it will spin out of control.”

But she suggests that this feedback loop isn’t entirely unpredictable. “We already have intelligent [people] doing AI research that leads to better capabilities,” Grace explains. “We don’t have a perfect idea of what those things will be like when the AI is as intelligent as humans or as good at AI research, but we have some evidence about it from other places and we shouldn’t just be saying the spinning out of control could happen at any speed. We can get some clues about it now. We can say something about how many extra IQ points of AI you get for a year of research or effort, for example.”
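
One way to make that feedback loop concrete is a toy model (an illustration only, not Grace’s model): let capability grow at a rate that itself depends on current capability. Whether the loop “spins out of control” then depends on how strongly returns compound, which is exactly the sort of quantity, like capability gained per year of research effort, that can be estimated from evidence available today.

```python
# Toy model of recursive self-improvement. Parameter values are arbitrary
# assumptions for illustration, not estimates from AI Impacts.

def simulate(alpha, k=0.05, c0=1.0, years=100, dt=0.01):
    """Capability C grows at rate k * C**alpha.

    alpha < 1: diminishing returns (growth stays modest)
    alpha = 1: exponential growth
    alpha > 1: super-exponential growth that eventually runs away
    """
    c, t = c0, 0.0
    while t < years:
        c += k * c**alpha * dt
        t += dt
        if c > 1e12:                 # treat this as "spun out of control"
            break
    return t, c

for alpha in (0.5, 1.0, 1.5):
    t, c = simulate(alpha)
    print(f"alpha={alpha}: capability ~{c:.3g} after {t:.0f} years")
```

The point is not the specific numbers but that the shape of the curve is an empirical question: measuring how much capability a year of research currently buys constrains which of these regimes we should expect.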

AI Impacts is an ongoing project, and Grace hopes her research will find its way into conversations about intelligence explosions and other aspects of AI. With better-informed timeline estimates, perhaps policymakers and philanthropists can more effectively ensure that advanced AI doesn’t catch humanity by surprise.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

MIRI’s June 2017 Newsletter


Artificial Intelligence and the Future of Work: An Interview With Moshe Vardi

“The future of work is now,” says Moshe Vardi. “The impact of technology on labor has become clearer and clearer by the day.”

Machines have already automated millions of routine, working-class jobs in manufacturing. And now, AI is learning to automate non-routine jobs in transportation and logistics, legal writing, financial services, administrative support, and healthcare.

Vardi, a computer science professor at Rice University, recognizes this trend and argues that AI poses a unique threat to human labor.

 

Initiating a Policy Response

From the Luddite movement to the rise of the Internet, people have worried that advancing technology would destroy jobs. Yet despite painful adjustment periods during these changes, new jobs replaced old ones, and most workers found employment. But humans have never competed with machines that can outperform them in almost anything. AI threatens to do this, and many economists worry that society won’t be able to adapt.

“What people are now realizing is that this formula that technology destroys jobs and creates jobs, even if it’s basically true, it’s too simplistic,” Vardi explains.

The relationship between technology and labor is more complex: Will technology create enough jobs to replace those it destroys? Will it create them fast enough? And for workers whose skills are no longer needed – how will they keep up?

To address these questions and consider policy responses, Vardi will hold a summit in Washington, D.C. on December 12, 2017. The summit will address six current issues within technology and labor: education and training, community impact, job polarization, contingent labor, shared prosperity, and economic concentration.

Education and training

A widely cited 2013 study on computerization found that 47% of American workers held jobs at high risk of automation in the next decade or two. If those jobs disappear, technology must create roughly 100 million new jobs to replace them.

As the labor market changes, schools must teach students skills for future jobs, while at-risk workers need accessible training for new opportunities. Truck drivers won’t transition easily to website design and coding jobs without proper training, for example. Vardi expects that adapting to and training for new jobs will become more challenging as AI automates a greater variety of tasks. 

Community impact

Manufacturing jobs are concentrated in specific regions where employers keep local economies afloat. Over the last thirty years, the loss of 8 million manufacturing jobs has crippled Rust Belt regions in the U.S. – both economically and culturally.

Today, the fifteen million jobs that involve operating a vehicle are concentrated in certain regions as well. Drivers occupy up to 9% of jobs in the Bronx and Queens districts of New York City, up to 7% of jobs in select Southern California and Southern Texas districts, and over 4% in Wyoming and Idaho. Automation could quickly take over the majority of these jobs, devastating the communities that rely on them.

Job polarization

“One in five working class men between ages 25 to 54 without college education are not working,” Vardi explains. “Typically, when we see these numbers, we hear about some country in some horrible economic crisis like Greece. This is really what’s happening in working class America.”

Employment is currently growing in high-income cognitive jobs and low-income service jobs, such as elderly assistance and fast-food service, which computers cannot automate yet. But technology is hollowing out the economy by automating middle-skill, working-class jobs first.

Many manufacturing jobs pay $25 per hour with benefits, but these jobs aren’t easy to come by. Since 2000, when millions of these jobs disappeared, displaced workers have either left the labor force or accepted service jobs that often pay $12 per hour, without benefits.

Truck driving, the most common job in over half of US states, may see a similar fate.

[Chart omitted. Source: IPUMS-CPS / University of Minnesota. Credit: Quoctrung Bui / NPR]

 

Contingent labor

Increasingly, communications technology allows firms to save money by hiring freelancers and independent contractors instead of permanent workers. This has created the Gig Economy – a labor market characterized by short-term contracts and flexible hours at the cost of unstable jobs with fewer benefits. By some estimates, in 2016, one in three workers were employed in the Gig Economy, but not all by choice. Policymakers must ensure that this new labor market supports its workers.

Shared prosperity

Automation has decoupled job creation from economic growth, allowing the economy to grow while employment and income shrink, thus increasing inequality. Vardi worries that AI will accelerate these trends. He argues that policies encouraging economic growth must also support economic mobility for the middle class.

Economic concentration

Technology creates a “winner-takes-all” environment, where second best can hardly survive. Bing search is quite similar to Google search, but Google is much more popular than Bing. And do Facebook or Amazon have any legitimate competitors?

Startups and smaller companies struggle to compete with these giants because of data. Having more users allows companies to collect more data, which machine-learning systems then analyze to help companies improve. Vardi thinks that this feedback loop will give big companies long-term market power.

Moreover, Vardi argues that these companies create relatively few jobs. In 1990, Detroit’s three largest companies were valued at $65 billion with 1.2 million workers. In 2016, Silicon Valley’s three largest companies were valued at $1.5 trillion but with only 190,000 workers.
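
Dividing market value by headcount makes the contrast stark. The following is a rough back-of-envelope comparison using only the figures Vardi cites above, with no adjustment for inflation:

```python
# Back-of-envelope: market value per worker, using the figures quoted above.
detroit_value, detroit_workers = 65e9, 1.2e6    # Detroit's three largest firms, 1990
valley_value, valley_workers = 1.5e12, 190e3    # Silicon Valley's three largest firms, 2016

detroit_per_worker = detroit_value / detroit_workers   # ~$54,000 per worker
valley_per_worker = valley_value / valley_workers      # ~$7.9 million per worker

print(f"Detroit, 1990:        ${detroit_per_worker:,.0f} per worker")
print(f"Silicon Valley, 2016: ${valley_per_worker:,.0f} per worker")
print(f"Ratio: roughly {valley_per_worker / detroit_per_worker:.0f}x")
```

By this crude measure, the leading technology firms carry on the order of a hundred times more market value per employee, which is the sense in which they create relatively few jobs.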

 

Work and society

Vardi primarily studies current job automation, but he also worries that AI could eventually leave most humans unemployed. He explains, “The hope is that we’ll continue to create jobs for the vast majority of people. But if the situation arises that this is less and less the case, then we need to rethink: how do we make sure that everybody can make a living?”

Vardi also anticipates that high unemployment could lead to violence or even uprisings. He refers to Andrew McAfee’s closing statement at the 2017 Asilomar AI Conference, where McAfee said, “If the current trends continue, the people will rise up before the machines do.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.