Podcast: Nuclear Dilemmas, From North Korea to Iran

With the U.S. pulling out of the Iran deal and canceling (and potentially un-canceling) the summit with North Korea, nuclear weapons have been front and center in the news this month. But will these disagreements lead to a world with even more nuclear weapons? And how did the recent nuclear situations with North Korea and Iran get so tense? (Update: The North Korea summit happened! But to understand what the future might look like with North Korea and Iran, it’s still helpful to understand the past.)

To learn more about the geopolitical issues surrounding North Korea’s and Iran’s nuclear situations, as well as to learn how nuclear programs in these countries are monitored, Ariel spoke with Melissa Hanham and Dave Schmerler on this month’s podcast. Melissa and Dave are both nuclear weapons experts with the Center for Nonproliferation Studies at Middlebury Institute of International Studies, where they research weapons of mass destruction with a focus on North Korea. Topics discussed in this episode include:

  • the progression of North Korea’s quest for nukes,
  • what happened and what’s next regarding the Iran deal,
  • how to use open-source data to monitor nuclear weapons testing, and
  • how younger generations can tackle nuclear risk.

In light of the on-again/off-again situation regarding the North Korea Summit, Melissa sent us a quote after the podcast was recorded, saying:

“Regardless of whether the summit in Singapore takes place, we all need to set expectations appropriately for disarmament. North Korea is not agreeing to give up nuclear weapons anytime soon. They are interested in a phased approach that will take more than a decade, multiple parties, new legal instruments, and new technical verification tools.”


You can listen to the podcast above or read the transcript below.

 

Ariel: Hello. I am Ariel Conn with the Future of Life Institute. This last month has been a rather big month concerning nuclear weapons, with the US pulling out of the Iran deal and the on-again, off-again summit with North Korea.

I have personally been doing my best to keep up with the news but I wanted to learn more about what’s actually going on with these countries, some of the history behind the nuclear weapons issues related to these countries, and just how big a risk nuclear programs in these countries could become.

Today I have with me Melissa Hanham and Dave Schmerler, who are nuclear weapons experts with the Center for Nonproliferation Studies at Middlebury Institute of International Studies. They both research weapons of mass destruction with a focus on North Korea. Melissa and Dave, thank you so much for joining us today.

Dave: Thanks for having us on.

Melissa: Yeah, thanks for having us.

Ariel: I just said that you guys are both experts in North Korea, so naturally what I want to do is start with Iran. That has been the bigger news story of the two countries this month because the US did just pull out of the Iran deal. Before we get any further, can you just, if it’s possible, briefly explain what was the Iran deal first? Then we’ll get into other questions about it.

Melissa: Sure. The Iran deal, formally known as the JCPOA, was an agreement made between Iran and several countries around the world, including the European Union. The goal was to freeze Iran’s nuclear program before they achieved nuclear weapons, while still allowing them civilian access to medical isotopes, and power, and so on.

At the same time, the agreement was that the US and others would roll back sanctions on Iran. The way that they verified that agreement was through a procurement channel, if-needed onsite inspections, and regular reporting from Iran. As you mentioned, the US has withdrawn from the Iran deal, which really means they have violated its terms, and Iran, the European Union, and others have said that they wish to continue in the JCPOA.

Ariel: If I’ve been reading correctly, the argument on the US side is that Iran wasn’t holding up their side of the bargain. Was there actually any evidence for that?

Dave: I think the American case for pulling out was based more on Iran having lied about its nuclear weapons program at one point in time, leading up to the deal, which is strange, because the motivation for the deal in the first place was to stop them from continuing their nuclear weapons research and investment. So, I’m not quite sure how else to frame it outside of that.

Melissa: Yeah, Israeli Prime Minister Netanyahu made this presentation where he revealed all these different archived documents from Iran, and mostly what they indicated was that Iran had an ongoing nuclear weapons program before the JCPOA, which is what we knew, and that they were planning on executing that program. For people like me, that felt like the justification for the JCPOA in the first place.

Ariel: And so, you both deal a lot with monitoring; at least, Melissa, I know you do, and Dave, I believe you do, too. With something like the Iran deal, if we had continued with it, what is the process involved in making sure the weapons aren’t being created? How do we monitor that?

Melissa: It’s a really difficult multilayered technical and legal proposition. You have to get the parties involved to agree to the terms, and then you have to be able to technically and logistically implement the terms. In the Iran deal, there were some things that were included and some things that were not included. Not because it was not technically possible, but because Iran or the other parties would not agree to it.

It’s kind of a strange marriage between diplomacy and technology, in order to execute these agreements. One of the criticisms of the Iran deal was that missiles weren’t included. Sure enough, Dave was monitoring many, many missile launches, and our colleague, Shea Cotton, even made a database of North Korean missile launches, and Americans really hated that Iran was launching these missiles; we could see that they were happening. But the bottom line was that they were not part of the JCPOA agreement. That agreement focused only on the nuclear program, and the reason it did was because Iran refused to include missiles or human rights and these other kinds of things.

Dave: That’s right. Negotiating Iran’s missile program is a bit of another issue entirely. Iran’s missile program began before their nuclear program did. Its accelerated development has corresponded to their own security concerns within the region, and they have, at the moment, a conventional ballistic missile force. The Iranians look at that program as a completely different issue.

Ariel: Just quickly, how do you monitor a missile test? What’s involved in that? What do you look for? How can you tell they’re happening? Is it really obvious, or is there some sort of secret data you access?

Dave: A lot of the work that we do — Melissa and I, Shea Cotton, Jeffrey Lewis, and some other colleagues — is entirely based on information from the public. It’s all open source research, so if you know what you’re looking for, you can pull all the same information that we do from various sources of free information. The Iranians will often put out propaganda or promo videos of their missile tests and launches as a way to demonstrate that they’re becoming a more sophisticated, technologically modern, ballistic-missile-producing nation.

We also get reports from the US government that are published in news sources, whether from the US government themselves or from reporters who have connections or access to the inside. We take all this information, and Melissa will probably speak to this a bit further, but we fuse it together with satellite imagery of known missile test locations, and we’ll reconstruct a much larger, more detailed chain of events as to what happened when Iran does missile testing.

Melissa: I have to admit, there’s just more open source information available about missile tests, because they’re so spread out over large areas and they have very large physical attributes to the sites, and of course, something lights up and ignites, and it takes off into the air where everyone can see it. So, monitoring a missile launch is easier than monitoring a specific facility in a larger network of facilities, for a nuclear program.

Ariel: So now that Trump has pulled out of the Iran deal, what happens next with them?

Melissa: Well, I think it’s probably a pretty bad sign. What I’ve heard from colleagues who work in or around the Trump administration is that confidence was extremely high on progress with North Korea, and so they felt that they didn’t need the Iran deal anymore. And in part, the reason that they violated it was because they felt that they had so much already going in North Korea, and those hopes were really false. There was a huge gap between reality and those hopes. It can be frustrating as an open source analyst who says these things all the time on Twitter, or in reports, that clearly nobody reads them. But no, things are not going well in North Korea. North Korea is not unilaterally giving over their nuclear weapons, and if anything, violating the Iran deal has made North Korea more suspicious of the US.

Ariel: I’m going to use that to transition to North Korea here in just a minute, but I guess I hadn’t realized that there was a connection between things seeming to go well in North Korea and the US pulling out of the Iran deal. You talk about hopes that the Iran deal was no longer necessary because of North Korea, but what is the connection there? How does that work?

Melissa: Well, so the Iran deal represented a diplomatic negotiation among many parties that came to a concrete result. It happened under the Obama administration, which I think is why there is some distaste for it under the Trump administration. That doesn’t matter to North Korea. That doesn’t matter to other states. What matters is whether the United States appears to be able to follow through on a promise that may pass from one administration to another.

The US has, in a way, violated some norms about diplomatic behavior by withdrawing from this agreement. That’s not to say that the US hasn’t done it before; I remember Clinton signing the Rome Statute, I think, for the International Criminal Court, then Bush unsigning it, and it never got ratified. But it’s bad for our reputation. It makes us look like we’re not using international law the way other countries expect us to.

Ariel: All right. So before we move officially to North Korea, is there anything else, Melissa and Dave, that either of you want to mention about Iran that you think is either important for people to know about, that they don’t already, or that is important to reiterate?

Melissa: No. I guess let’s go to North Korea. That’s our bread and butter.

Ariel: All right. Okay, so yeah, North Korea’s been in the news for a while now. Before we get to what’s going on right now, I was hoping you could both talk a little bit about some of the background with North Korea, and how we got to this point. North Korea was once part of the Non-Proliferation Treaty, and they pulled out. Why were they in it in the first place? What prompted them to pull out? We’ll go from there.

Melissa: Okay, I’ll jump in, although Dave should really tell me if I keep talking over him. North Korea withdrew from the NPT, or so it said. It’s actually diplomatically very complex what they did, but North Korea either was or is a member of the Nuclear Non-Proliferation Treaty, the NPT, depending on who you ask. That is in large part because they were a member, and then they announced their withdrawal in 2003, and eventually we no longer think of them as officially being a member of the NPT. But of course, there were some small gaps over the notification period that they gave in order to withdraw, so my understanding is that some of the organizations involved actually keep a little North Korean nameplate for them.

But no, we don’t really think of them as being a member of the NPT or the IAEA. Sadly, while that may not be legally settled, they’re out; they’re not abiding by traditional regimes or norms on this issue.

Ariel: And can you talk a little bit about, or do we know what prompted them to withdraw?

Melissa: Yeah. I think they really, really wanted nuclear weapons. I mean, I’m sorry to be glib about it, but … Yeah, they were seeking nuclear weapons since the ’50s. Kim Il-sung said he wanted nuclear weapons, he saw the power of the US’ weapons that were dropped on Japan. The US threatened North Korea during the Korean War with use of nuclear weapons, so yeah, they had physicists working on this issue for a long time.

They joined the NPT, they wanted access to the peaceful uses of nuclear power, they were very duplicitous in their work, but no, they kept working towards nuclear weapons. I think they reached a point where they probably thought that they had the technical capability, and they were dissatisfied with the norms and status as a pariah state, so yeah, they announced they were withdrawing, and then they exploded something three years later.

Ariel: Now that they’ve had a program in place then I guess for, what? Roughly 15 years then?

Melissa: Oh, my gosh. Math. Yeah. No, so I was sitting in Seoul. Dave, do you remember where you were when they had their first nuclear test?

Dave: This was-

Melissa: 2006.

Dave: A long time ago. I think I was still in high school.

Melissa: I mean, this is a challenge to our whole field, right? Is that there are generations passing through, so there are people who remember 1945. I don’t. But I’m not going to reveal my age. I was fresh out of grad school, and working in Seoul when North Korea tested its first nuclear device.

It was like cognitive dissonance around the world. I remember the just shock of the response out of pretty much every country. I think China had a few minutes notice ahead of everybody else, but not much. So yes, we did see the reactor getting built, yes, we did see activity happening at Yongbyon, no we deeply misunderstood and underestimated North Korea’s capabilities.

So, when that explosion happened, it was surprising, to people in the open source anyways. People scrambled. I mean, that was my first major gig; that’s why I still do this today. We had an office at the International Crisis Group of about six people, and all our Korean speakers were immediately sucked into other responsibilities, so it was up to me to try to take all these little puzzle pieces, about the seismic information, about the radionuclides that were actually leaked in that first explosion, and figure out what a Constant Phoenix was, and who was collecting what, and put it all together to try to understand what kind of warhead they may or may not have exploded, if it was even a warhead at that point.

Ariel: I’m hoping that you can explain how monitoring works. I’m an ex-seismologist, so I actually do know a little bit about the seismic side of monitoring nuclear weapons testing, but I’m assuming a lot of listeners do not. I’m not as familiar with things like the radionuclide testing, and the Constant Phoenix you mentioned was a new phrase for me as well. I was hoping you could explain what you go through to monitor and confirm whether or not a nuclear weapon has been tested. And before you do that, real quick: did you actually see that first … Could you see the explosion?

Melissa: No. I was in Seoul, so I was a long ways away, and I didn’t really … Of course, I did not see or feel anything. I was in an office in downtown Seoul, so I remember actually how casual the citizens of Seoul were that day. I remember feeling kind of nervous about the whole thing. I was registered with the Canadian embassy in Seoul, and we actually had, when you registered with the embassy, we had instructions of what to do in case of an emergency.

I remember thinking, “Gosh, I wonder if this is an emergency,” because I was young and fresh out of school. But no, I mean, as I looked down out of our office windows, sure enough at noon, the doors opened up and all my Korean colleagues streamed out to lunch together, and really behaved pretty traditionally, the way everyone normally does.

South Koreans have always been very stoic about these tests, and I think they’re taken more anxiously by foreigners like me. But I do also remember there were these air raid sirens going off that day, and I actually never got an explanation of why there were sirens going off that day. I remember they tested them when I lived there, but I’m not sure why the sirens were going off that day.

Ariel: Okay. Let’s go back to how the monitoring works, and Dave, I don’t know if this is something that you can also jump in on?

Dave: Yeah, sure. I think I’ll let Melissa start and I’ll try to fill in any gaps, if there are any.

Melissa: So, the Comprehensive Nuclear-Test-Ban Treaty Organization, the CTBTO, is an organization based in Vienna, but they have stations all over the world, and they’re continually monitoring for nuclear explosions. The Constant Phoenix is a WC-135. It’s a US Air Force aircraft, and so the information coming out of it is not open source and I don’t get to see it, but what I can do, or what investigative journalists sometimes do, is note when it’s taking off from Guam or an Air Force base, and then I know at least that the US Air Force is thinking it’s going to be sensing something. So this is a specialty aircraft; I mean, it’s basically an airplane, but it has many, many interesting sensor arrays all over it that sniff the air. What they’re trying to detect are xenon isotopes, and these are isotopes that are possibly released from an underground nuclear test, depending on how well the tunnel was sealed.

In that very first nuclear explosion in 2006, some noble gases were released and I think that they were detected by the WC-135. I also remember back then, although this was a long time ago, that there were a few sensing stations in South Korea that detected them as well. What I remember from that time is that the ratio of xenon isotopes was definitely telling us that this was a nuclear weapon. This wasn’t like a big hoax that they’d exploded a bunch of dynamite or something like that, which actually would be a really big hoax, and hard to pull off. But we could see that it was a nuclear test, it was probably a fission device. The challenge with detecting these gases is that they decay very quickly, so we have, 1) not always sensed radionuclides after North Korea’s nuclear tests, and, 2) if we do sense them, sometimes they’re decayed enough that we can’t get anything more than it was a nuclear test, and not a chemical explosion test.
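To give a rough sense of why that detection window closes so quickly, here is a small illustrative sketch, in Python, of the exponential decay involved. The half-lives are approximate published values for two of the radioxenon isotopes of interest; the sketch does not attempt to model release timing, quantities, or sensor thresholds.

    # Rough illustration: how quickly radioxenon from an underground test decays away.
    # Half-lives are approximate; nothing here models actual release or detection physics.
    HALF_LIVES_DAYS = {
        "Xe-135": 9.1 / 24,  # roughly 9.1 hours
        "Xe-133": 5.25,      # roughly 5.25 days
    }

    def remaining_fraction(half_life_days, elapsed_days):
        """Fraction of the original isotope still present after elapsed_days."""
        return 0.5 ** (elapsed_days / half_life_days)

    for isotope, half_life in HALF_LIVES_DAYS.items():
        for days in (1, 3, 7):
            frac = remaining_fraction(half_life, days)
            print(f"{isotope}: {frac:.4%} remaining after {days} day(s)")

After about a week the short-lived isotopes are essentially gone, which is why how well the tunnel was sealed and how quickly a sampling flight gets airborne matter so much for what can be concluded about the device.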

Dave: Yeah, so since Melissa did a great job of explaining how the process works, maybe I can offer a bit more on the recent mechanics of how we interact with these tests as they occur. Usually most of the people in our field follow a set number of seismic-linked Twitter accounts that will give you updates on when some part of the world is shaking for some reason or another.

They’ll put a tweet or maybe you’ll get an email update saying, “There was an earthquake in California,” because we get earthquakes all the time, or in Japan. Then, all of a sudden you hear there’s an earthquake in North Korea and everyone pauses. You look at this little tweet, I guess, or email, you can also get them sent to your phone via text message, if you sign up for whichever region of the world you’re interested in, and you look for what province was this earthquake in?

If it registers in the right province, you’re like, “Okay.” What’s next is we’ll look at the data that comes out immediately. CTBTO will come out with information, usually within a couple of days, if not immediately after, and we’ll look at the seismic waves. While I don’t study these waves, the type of seismic signature you get from a nuclear explosion is like a fingerprint. It’s very unique and different from the type of seismic signature you get from an earthquake of varying degrees.

We’ll take that and compare those to previous tests, which the United States and Russia have done infinitely more of than any other country in the world, and we’ll see if those match. And as North Korea has tested more nuclear devices, the signatures have become more consistent. If that matches up, we’ll have a soft confirmation that they did it, and then we’ll wait for government news and press releases to give us the final confirmation that there was a nuclear test.

Melissa: Yeah, so as Dave said, as a citizen scientist, I love just setting up the USGS alert, and then if there’s an earthquake near the village of Punggye-ri, I’m like, “Ah-hah, I got you” because it’s not a very seismically active area. When the earthquakes happen that are related to an underground nuclear test, they’re shallow. They’re not deep, geological events.

Yeah, there’s some giveaways like, people like to do them on the hour, or the half hour, and mother nature doesn’t care. But some resources for your listeners, if they want to get involved and see, is you can go to the USGS website and set up your own alert. The CTBTO has not just seismic stations, but the radionuclide stations I mentioned, as well as infrasound and hydroacoustic, and other types of facilities all over the world. There’s a really cool map on their website where they show the over… I think it’s nearly 300 stations all around the world now, that are devoted exclusively to monitoring nuclear tests.

They get their information out, I think, in seven minutes, and I don’t necessarily get that information in the first seven minutes, because I’m not a state member, a state party. But they will give out information very soon afterwards. And actually, based on the seismic data, our colleagues, Jeffrey Lewis and some other young, smart people of the world, threw together a map, not using CTBTO data, but using the seismic stations of, I think, Iran, China, Japan, and South Korea. If you go to their website, it’s called SleuthingFromTheInternet.com, and you can set up little alerts there too, and see all the activity that’s happening.

That was really just intended I think to be a little bit transparent with the seismic data and try to see data from different country stations, and in part, it was conceived because I think the USGS was deleting some of their explosions from the database and someone noticed. So now the idea is that you take a little bit of data from all these different countries, and that you can compare it to each other.

The last place I would suggest is to go to the IRIS seismic monitoring station, because just as Dave was mentioning, each seismic event has a different P wave, and so it shows up differently, like a fingerprint. And so, when IRIS puts out information, you can very quickly see how the different explosions in North Korea compare to each other, relatively, and so that can be really useful, too.
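For listeners who want to experiment with the kind of alert Melissa describes, here is a minimal sketch, in Python, of how one might poll the public USGS earthquake catalog for shallow events near the Punggye-ri test site. The coordinates, search radius, magnitude cutoff, and depth cutoff are illustrative assumptions, not official monitoring parameters.

    # Minimal sketch: query the public USGS earthquake catalog (FDSN event API)
    # for recent shallow seismic events near the Punggye-ri test site.
    # All thresholds here are illustrative, not operational monitoring values.
    import datetime
    import requests

    USGS_API = "https://earthquake.usgs.gov/fdsnws/event/1/query"
    PUNGGYE_RI = (41.3, 129.1)  # approximate latitude/longitude of the test site

    def recent_events(days=30, radius_km=100, min_magnitude=3.0, max_depth_km=5.0):
        """Return shallow events near the site from roughly the last `days` days."""
        start = datetime.datetime.utcnow() - datetime.timedelta(days=days)
        params = {
            "format": "geojson",
            "starttime": start.strftime("%Y-%m-%d"),
            "latitude": PUNGGYE_RI[0],
            "longitude": PUNGGYE_RI[1],
            "maxradiuskm": radius_km,
            "minmagnitude": min_magnitude,
            "maxdepth": max_depth_km,  # underground tests register as very shallow events
        }
        response = requests.get(USGS_API, params=params, timeout=30)
        response.raise_for_status()
        return response.json()["features"]

    if __name__ == "__main__":
        for event in recent_events():
            props = event["properties"]
            when = datetime.datetime.utcfromtimestamp(props["time"] / 1000)
            print(f"{when:%Y-%m-%d %H:%M} UTC  M{props['mag']:.1f}  {props['place']}")

A real alert would of course add the kind of cross-checking Dave describes next: comparing waveforms against previous tests and waiting for confirmation from other stations.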

Dave: I will say, though, that sometimes you might get a false alarm. I believe it was with the last nuclear test: there was one reporting station, an automatic alert system run out of the UK, that didn’t report it. No one caught that it hadn’t, and then it did report it about a week later. So, for all of half an hour, until we figured it out, there was a bit of a pause, because there was some concern they might have done another test, which would have been the seventh, but it turned out to just be a delayed report.

Most of the time these things work out really well, but you always have to look for secondary and third sources of confirmation when these types of events happen.

Ariel: So a quick aside: we will have links to everything that you both just brought up in the transcript, so anyone interested in following up with any of these options will be able to. I’m also going to share a fun fact that I learned, and that was, we originally had a global seismic network in order to monitor nuclear weapons testing. That’s why it was set up. And it’s only because we set that up that we were actually able to prove the theory of plate tectonics.

Melissa: Oh, cool.

Dave: That’s really cool.

Melissa: Yeah. No, the CTBTO is really interesting, because even though the treaty hasn’t entered into force yet, they have these amazing scientific resources, and they’ve done all kinds of things. Like, they can hear whales moving around with their hydroacoustic technology, and when Iran had a major explosion at their solid motor missile facility, they detected that as well.

Ariel: Yeah. It’s fun. Like I said, I did seismology a while ago so I’m signed up for lots of fun alerts. It’s always fun to learn about where things are blowing up in the earth’s surface.

Melissa: Well, that’s really the magic of open source to me. I mean, it used to be that a government came out and said, “Okay, this is what happened, and this is what we’re going to do about it.” But the idea that me, like a regular person in the world, can actually look up this primary information in the moments that it happens, and make a determination for myself, is really empowering. It makes me feel like I have the agency I want to have in understanding the world, and so I have to admit, that day in South Korea, when I was sitting there in the office tower and it was like, “Okay, all hands on deck, everyone’s got to write a report” and I was trying to figure it out, I was like, “I can’t believe I’m doing this. I can’t believe I can do this.” It’s such a different world already.

Ariel: Yeah. That is really amazing. I like your description. It’s really empowering to know that we have access to this information. So, I do want to move on and with access to this information, what do we know about what’s going on in North Korea right now? What can you tell us about what their plans are? Do we think the summit will happen? I guess I haven’t kept up with whatever the most recent news is. Do we think that they will actually do anything to get rid of their nuclear weapons?

Dave: I think at this point, the North Koreans feel really comfortable with the amount of information and progress they’ve made in their nuclear weapons program. That’s why they’re willing to talk. This program was primarily a means to create a security assurance for the North Koreans, because the Americans and South Koreans and whatnot have always been interested in regime change, removing North Korea from the equation, trying to end the thing that started in the 1950s, the Korean War, right? So there’d just be one Korea, and we wouldn’t have to worry about North Korea, or this mysterious Hermit Kingdom, above the 38th parallel.

With that said, there’s been a lot of speculation as to why the North Koreans are willing to talk to us now. Some people have been floating around the idea that maximum pressure, I think that was the word used, with sanctions and whatnot, has brought the North Koreans to their knees, and now they’re willing to give up their nukes, as we’ve been hearing about.

But the way the North Koreans use denuclearization is very important. On one hand, that could mean that they’re willing to give up their nuclear weapons and to denuclearize the state itself, but the way the North Koreans use it is much broader. It’s used more in the sense of denuclearizing the peninsula; it doesn’t refer specifically to them.

Now that they’ve finally achieved some type of reasonable success with their nuclear weapons program, they’re more in a position where they think they can talk to the United States as equals, and denuclearization falls into the terminology that it’s used by other nuclear weapons states, where it’s a, “In a better world we won’t need these types of horrible weapons, but we don’t live in that world today, so we will stand behind the effort to denuclearize, but not right now.”

Melissa: Yeah, I think we can say that if we look at North Korea’s capabilities first, and then why they’re talking now, we can see that in the time when Dave and I were cutting our teeth, they were really ramping up their nuclear and missile capabilities. It wasn’t immediately obvious, because a lot of what was happening was inside a laboratory or inside a building, but then eventually they started doing nuclear tests and then they did more and more missile tests.

It used to be that a missile test was just a short range missile off the coast; sometimes it was political grandstanding. But our colleague Shea Cotton made a missile database that shows every North Korean missile test, and you can see that under Kim Jong-un those tests really started to ramp up. I think, Dave, you started at CNS in like 2014?

Dave: Right around then.

Melissa: Right around then, so they jumped up to like 19 missile tests that year. I can say this because I’m looking at the database right now. And they started doing more interesting things than ever before, too. Even though diplomatically and politically we were still thinking of them as being backwards, as not having a very good capability, if we looked at it quantitatively, we could say, “Well, they’re really working on something.”

So Dave actually was really excellent at geolocating. When they did engine tests, we could measure the bell of the engine and get a sense of what those engines were about. We could see solid fuel motors being tested, and this went all the way up until the ICBM launches last fall, and then they were satisfied.

Ariel: So when you say engine testing, what does that mean? What engine?

Dave: The North Korean ballistic missile fleet used to be entirely tied to this really old Soviet missile called the Scud. If anyone’s played video games in the late ’90s or early 2000s, that was the small missile that you always had to take out, or something along those lines, and it was fairly primitive. It was a design that the North Koreans hadn’t demonstrated they were able to move beyond. That’s why, when the last three years started to kick in and the North Koreans started to field more complicated missiles and show that they were doing engine tests with more experimental, more advanced designs that we had seen in other parts of the world previously, some people were a bit skeptical, doubting that the North Koreans were actually making serious progress. Then last year, they tested their first intermediate range ballistic missile, which can hit Guam, something they had been trying to do for a while, but it hadn’t worked out. Then they made that missile larger, and they made their first ICBM.

Then they made that missile even larger, came up with a much more ambitious engine design using two engines instead of one. They had a much more advanced steering system, and they came up with the Hwasong-15 which is their longest range ICBM. It’s a huge shift from the way we were having this conversation 5 to 10 years ago, where we were looking at their space launch vehicles, which were, again, modified Scuds that were stretched out and essentially tied together, to an actual functioning ICBM fleet.

The technological shift, paired with their nuclear weapons developments, has really demonstrated that the North Koreans are no longer a threat that is 10 to 20 years around the corner; they actually possess the ability to launch nuclear weapons at the United States.

Melissa: And back when they had their first nuclear test in 2006, people were like, “It’s a device.” I think for years, we still call it a device. But back then, the US and others kept moving the goalposts. They were saying, “Well, all right. They had a nuclear device explode. We don’t know how big it was, they have no way of delivering it. We don’t know what the yield was. It probably fizzled.” It was dismissive.

So, from that period, 2006 to today, it’s a really remarkable change. Almost every criticism that North Korea has faced, right down to the heat shield on their ICBM, has been addressed vociferously with propaganda photos and videos that we in turn can analyze. And yeah, I think they have demonstrated essentially that they can explode something, and they can launch a missile that can carry something that can explode.

The only thing they haven’t done, and Dave can chime in here, is explode a nuclear weapon on the tip of a missile. Other countries have done this, and it’s terrifying, and because Dave is such a geographically visual person, I’ll let him describe what that might look like. But if we keep goading them, if we keep telling them they’re backwards, eventually they’re going to want to prove it.

Dave: Yeah, so off of Melissa’s point, this is something that I believe Jeffrey might have coined: it’s called the Juche Bird, which is a play on Frigate Bird, a live nuclear warhead test that the Americans conducted. In order to prove that the system in its entirety — the nuclear device, the missile, the reentry shield — all works, and that it’s not just small random successes in different parts of a much larger program, the North Koreans would take a live nuclear weapon, put it on the end of a long range missile, launch it in the air, and detonate it at a specific location to show that they have the ability to actually use the purported weapon system.

Melissa: So if you’re sitting in Japan or South Korea, but especially Japan, and you imagine North Korea launching an intermediate range or intercontinental ballistic missile over your country, with a nuclear weapon on it, in order to execute an atmospheric test, that makes you extremely nervous. Extremely nervous, and we all should be a little bit nervous, because it’s really hard for anyone in the open source, and I would argue in the intelligence community, to know, “Well, this is just an atmospheric test. This isn’t the beginning of a war.”

We would have to trust that they pick up the trajectory of that missile really fast and determine that it’s not heading anywhere. That’s the challenge with all of these missile tests, is no one can tell if there’s a warhead on it, or not a warhead on it, and then we start playing games with ballistic missile defense, and that is a whole new can of worms.

Ariel: What do you guys think is the risk that North Korea or any other country for that matter, would intentionally launch a nuclear weapon at another country?

Melissa: For me, it’s accidents, and an accident can unfold a couple of different ways. One way would be, perhaps, the US is performing joint exercises. North Korea has some sensing equipment up on peaks of mountains, and Dave has probably found every single one, but it’s not perfect. It’s not great, and if the picture that comes back to them is a little fuzzy, maybe this is no longer a joint exercise; this is the beginning of an attack, and they will decide to engage.

They’ve long said that they believe that a war will start based on the pretext of a joint exercise. In reverse scenario, what if North Korea does launch an ICBM with a nuclear warhead, in order to perform a test, and the US or Japan or South Korea think, “Well, this is it. This is the war.” And so it’s those accidental scenarios that I worry about, or even perhaps what happens if a test goes badly? Or, someone is harmed in some way?

I worry that these states would have a hard time politically rolling back where they feel they have to be, based on these high stakes.

Dave: I agree with Melissa. I think the highest risk we have, depending on our nuclear posture, is also an accident. There have been accidents in the past where someone in a monitoring base picks up a bunch of blips on a radar and people start initiating the “game on” protocol, and luckily we’ve been able to keep that from being carried to completion.

Now, with the North Koreans, this could also work in their direction, as well. I can’t imagine that their sensing technology is up to par with what the United States has, or had, back when these accidents were a real thing and they happened. So if the North Koreans see a military exercise that they don’t feel comfortable with, or they have some type of technical glitch on their side, they might notionally launch something, and that would be the start of a conflict.

Ariel: One of the final questions that I have for both of you. I’ve read that while nuclear weapons are scary, the greater threat with North Korea could actually be their conventional weapons. Could either of you speak to that?

Dave: Yeah, sure. North Korea has a very large conventional army. Some people might try to make jokes about how modern that army is, but a military force only needs to be so modern given the type of geographical game that’s in play on the Korean Peninsula. Seoul is really not that far from the DMZ, and it’s a widely known fact that North Korea has tons of artillery pointed at Seoul. They’ve had these things pointed there since the end of the Korean War, and they’re all entrenched.

You might be able to hit some of them, but you’re not going to hit all of them. This type of artillery, in connection with their conventional ballistic missile force, poses a really big threat of some type of conventional action, and we’re talking about things that aren’t carrying a WMD.

Seoul is a huge city. The metropolitan area, at least, has a population of over 20 million people. I’m not sure if you’ve ever been to Seoul; it’s a great, beautiful city, but traffic is horrible, and if everyone’s trying to leave the city when something happens, everyone north of the river is screwed, and with the congestion on the south side, it would just be a total disaster. Outside of the whole nuclear aspect of this dangerous relationship, the conventional forces North Korea has are equally as terrifying.

Melissa: I think Dave’s bang on, but the only thing I would add is that one of the things that’s concerning about having both nuclear and conventional forces is how you use your conventional forces with that extra nuclear guarantee. This is something that our boss, Jeffrey Lewis, has written about extensively. But do you use that extra measure of security and just preserve it, save it? Does Kim Jong-un go home at night to his family and say, “Yes, I feel extra safe today because I have my nuclear security?”

Or do you use that extra nuclear security in order to increase the number of provocations that you do conventionally? Because we’ve had these crises break out over the sinking of the Cheonan naval vessel, or the shelling of Yeonpyeong, near the border. In both cases, South Koreans died, but the question is: will North Korea feel emboldened by its nuclear security, and will it carry out more conventional provocations?

Ariel: Okay, and so for the last question that I want to ask. We’ve talked about all these things that could go wrong, and there’s really just never anything that’s positive about a nuclear weapons discussion, but I still want to end with this: is there anything that gives you hope about this situation?

Dave: That’s a tough question. I mean, on one side, we have a nuclear armed North Korea, and this is something that we knew was coming for quite some time. If anything, one thing that I know I have been advocating, and I believe Melissa has as well, is conversation and dialogue between North Korea and all the other associated parties, including the United States, as a way to begin some type of line of communication, hopefully so that accidents don’t happen.

‘Cause North Korea’s not going to be giving up their nukes anytime soon. Even though the talks that you may be having aren’t going to be as productive as you would want them to be, I believe conversation is critical at this moment, because the other alternatives are pretty bad.

Melissa: I guess I’ll add on that we have Dave now, and I know it sounds like I’m teasing my colleague, but it’s true. Things are bad, things are bad, but we’re turning out generation after generation of young, brilliant, enthusiastic people. Before 2014, we didn’t have a Dave, and now we have a Dave, and Dave is making more Daves, and every year we’re matriculating students who care about this issue, who are finding new ways to engage with this issue, that are disrupting entrenched thinking on this issue.

Nuclear weapons are old. They are scary, they are the biggest explosion that humans have ever made, but they are physical and finite, and the technology is aging, and I do think, with new creative, engaging ways, the next generation’s going to come along and they’re going to be able to address this issue with new hacks. These can be technical hacks, they can be on the verification and trust-building side, and they can be diplomatic hacks.

The grassroots movements we see all around the world, that are taking place to ban nuclear weapons, those are largely motivated by young people. I’m on this bridge where I get to see… I remember the Berlin Wall coming down, I also get to see the students who don’t remember 9/11, and it’s a nice vantage point to be able to see how history’s changing, and while it feels very scary and dark in this moment, in this administration, we’ve been in dark administrations before. We’ve faced much more terrifying adversaries than North Korea, and I think it’s going to be generations ahead who are going to help crack this problem.

Ariel: Excellent. That was a really wonderful answer. Thank you. Well, thank you both so much for being here today. I’ve really enjoyed talking with you.

Melissa: Thanks for having us.

Dave: Yeah, thanks for having us on.

Ariel: For listeners, as I mentioned earlier, we will have links to anything we discussed on the podcast in the transcript of the podcast, which you can find from the homepage of FutureOfLife.org. So, thanks again for listening, like the podcast if you enjoyed it, subscribe to hear more, and we will be back again next month.

 

Teaching Today’s AI Students To Be Tomorrow’s Ethical Leaders: An Interview With Yan Zhang

Some of the greatest scientists and inventors of the future are sitting in high school classrooms right now, breezing through calculus and eagerly awaiting freshman year at the world’s top universities. They may have already won Math Olympiads or invented clever, new internet applications. We know these students are smart, but are they prepared to responsibly guide the future of technology?

Developing safe and beneficial technology requires more than technical expertise — it requires a well-rounded education and the ability to understand other perspectives. But since math and science students must spend so much time doing technical work, they often lack the skills and experience necessary to understand how their inventions will impact society.

These educational gaps could prove problematic as artificial intelligence assumes a greater role in our lives. AI research is booming among young computer scientists, and these students need to understand the complex ethical, governance, and safety challenges posed by their innovations.

 

SPARC

In 2012, a group of AI researchers and safety advocates – Paul Christiano, Jacob Steinhardt, Andrew Critch, Anna Salamon, and Yan Zhang – created the Summer Program in Applied Rationality and Cognition (SPARC) to address the many issues that face quantitatively strong teenagers, including the issue of educational gaps in AI. As with all technologies, they explain, the more the AI community consists of thoughtful, intelligent, broad-minded reasoners, the more likely AI is to be developed in a safe and beneficial manner.

Each summer, the SPARC founders invite 30-35 mathematically gifted high school students to participate in their two-week program. Zhang, SPARC’s director, explains: “Our goals are to generate a strong community, expose these students to ideas that they’re not going to get in class – blind spots of being a quantitatively strong teenager in today’s world, like empathy and social dynamics. Overall we want to make them more powerful individuals who can bring positive change to the world.”

To help students make a positive impact, SPARC instructors teach core ideas in effective altruism (EA). “We have a lot of conversations about EA, but we don’t push the students to become EA,” Zhang says. “We expose them to good ideas, and I think that’s a healthier way to do mentorship.”

SPARC also exposes students to machine learning, AI safety, and existential risks. In 2016 and 2017, they held over 10 classes on these topics, including: “Machine Learning” and “Tensorflow” taught by Jacob Steinhardt, “Irresponsible Futurism” and “Effective Do-Gooding” taught by Paul Christiano, “Optimization” taught by John Schulman, and “Long-Term Thinking on AI and Automization” taught by Michael Webb.

But SPARC instructors don’t push students down the AI path either. Instead, they encourage students to apply SPARC’s holistic training to make a more positive impact in any field.

 

Thinking on the Margin: The Role of Social Skills

Making the most positive impact requires thinking on the margin, and asking: What one additional unit of knowledge will be most helpful for creating positive impact? For these students, most of whom have won Math and Computing Olympiads, it’s usually not more math.

“A weakness of a lot of mathematically-minded students are things like social skills or having productive arguments with people,” Zhang says. “Because to be impactful you need your quantitative skills, but you need to also be able to relate with people.”

To counter this weakness, he teaches classes on social skills and signaling, and occasionally leads improvisational games. SPARC still teaches a lot of math, but Zhang is more interested in addressing these students’ educational blind spots – the same blind spots that the instructors themselves had as students. “What would have made us more impactful individuals, and also more complete and more human in many ways?” he asks.

Working with non-math students can help, so Zhang and his colleagues have experimented with bringing excellent writers and original thinkers into the program. “We’ve consistently had really good successes with those students, because they bring something that the Math Olympiad kids don’t have,” Zhang says.

SPARC also broadens students’ horizons with guest speakers from academia and organizations such as the Open Philanthropy Project, OpenAI, Dropbox and Quora. In one talk, Dropbox engineer Albert Ni spoke to SPARC students about “common mistakes that math people make when they try to do things later in life.”

In another successful experiment suggested by Ofer Grossman, a SPARC alum who is now a staff member, SPARC made half of all classes optional in 2017. The classes were still packed because students appreciated the culture. The founders also agreed that conversations after class are often more impactful than classes, and therefore engineered one-on-one time and group discussions into the curriculum. Thinking on the margin, they ask: “What are the things that were memorable about school? What are the good parts? Can we do more of those and less of the others?”

Above all, SPARC fosters a culture of openness, curiosity and accountability. Inherent in this project is “cognitive debiasing” – learning about common biases like selection bias and confirmation bias, and correcting for them. “We do a lot of de-biasing in our interactions with each other, very explicitly,” Zhang says. “We also have classes on cognitive biases, but the culture is the more important part.”

 

AI Research and Future Leaders

Designing safe and beneficial technology requires technical expertise, but in SPARC’s view, cultivating a holistic research culture is equally important. Today’s top students may make some of the most consequential AI breakthroughs in the future, and their values, education and temperament will play a critical role in ensuring that advanced AI is deployed safely and for the common good.

“This is also important outside of AI,” Zhang explains. “The official SPARC stance is to make these students future leaders in their communities, whether it’s AI, academia, medicine, or law. These leaders could then talk to each other and become allies instead of having a bunch of splintered, narrow disciplines.”

As SPARC approaches its 7th year, some alumni have already begun to make an impact. A few AI-oriented alumni recently founded AlphaSheets – a collaborative, programmable spreadsheet for finance that is less prone to error – while other students are leading a “hacker house” with people in Silicon Valley. Additionally, SPARC inspired the creation of ESPR, a similar European program explicitly focused on AI risk.

But most impacts will be less tangible. “Different pockets of people interested in different things have been working with SPARC’s resources, and they’re forming a lot of social groups,” Zhang explains. “It’s like a bunch of little sparks and we don’t quite know what they’ll become, but I’m pretty excited about the next five years.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

ICRAC Open Letter Opposes Google’s Involvement With Military

From improving medicine to better search engines to assistants that help ease busy schedules, artificial intelligence is already proving a boon to society. But just as it can be designed to help, it can be designed to harm and even to kill.

Military uses of AI can also run the gamut from programs that could help improve food distribution logistics to weapons that can identify and assassinate targets without input from humans. Because AI programs can have these dual uses, it’s difficult for companies who do not want their technology to cause harm to work with militaries – it’s not currently possible for a company to ensure that, if it helps the military solve a benign problem with an AI program, the program won’t later be repurposed to take human lives.

So when employees at Google learned earlier this year about the company’s involvement in the Pentagon’s Project Maven, they were upset. Though Google argues that their work on Project Maven only assisted the U.S. military with image recognition tools from drone footage, many suggest that this technology could later be used for harm. In response, over 3,000 employees signed an open letter saying they did not want their work to be used to kill.

And it isn’t just Google’s employees who are concerned.

Earlier this week, the International Committee for Robot Arms Control released an open letter signed by hundreds of academics calling on Google’s leadership to withdraw from the “business of war.” The letter, which is addressed to Google’s leadership, responds to the growing criticism of Google’s participation in the Pentagon’s program, Project Maven.

The letter states, “we write in solidarity with the 3100+ Google employees, joined by other technology workers, who oppose Google’s participation in Project Maven.” It goes on to remind Google leadership to be cognizant of the incredible responsibility the company has for safeguarding the data it’s collected from its users, as well as its famous motto, “Don’t Be Evil.”

Specifically, the letter calls on Google to:

  • “Terminate its Project Maven contract with the DoD.
  • “Commit not to develop military technologies, nor to allow the personal data it has collected to be used for military operations.
  • “Pledge to neither participate in nor support the development, manufacture, trade or use of autonomous weapons; and to support efforts to ban autonomous weapons.”

Lucy Suchman, one of the letter’s authors, explained part of her motivation for her involvement:

“For me the greatest concern is that this effort will lead to further reliance on profiling and guilt by association in the US drone surveillance program, as the only way to generate signal out of the noise of massive data collection. There are already serious questions about the legality of targeted killing, and automating it further will only make it less accountable.”

The letter was released the same week that a small group of Google employees made news for resigning in protest against Project Maven. It also comes barely a month after a successful boycott by academic researchers against KAIST’s autonomous weapons effort.

In addition, last month the United Nations held their most recent meeting to consider a ban on lethal autonomous weapons. Twenty-six countries, including China, have now said they would support some sort of official ban on these weapons.

In response to the number of signatories the open letter has received, Suchman added, “This is clearly an issue that strikes a chord for many researchers who’ve been tracking the incorporation of AI and robotics into military systems.”

If you want to add your name to the letter, you can do so here.

Lethal Autonomous Weapons: An Update from the United Nations

Earlier this month, the United Nations Convention on Conventional Weapons (UN CCW) Group of Governmental Experts met in Geneva to discuss the future of lethal autonomous weapons systems. But before we get to that, here’s a quick recap of everything that’s happened in the last six months.

 

Slaughterbots and Boycotts

Since its release in November 2017, the video Slaughterbots has been seen approximately 60 million times and has been featured in hundreds of news articles around the world. The video coincided with the UN CCW Group of Governmental Experts’ first meeting in Geneva to discuss a ban on lethal autonomous weapons, as well as the release of open letters from AI researchers in Australia, Canada, Belgium, and other countries urging their heads of state to support an international ban on lethal autonomous weapons.

Over the last two months, autonomous weapons regained the international spotlight. In March, after learning that the Korea Advanced Institute of Science and Technology (KAIST) planned to open an AI weapons lab in collaboration with a major arms company, AI researcher Toby Walsh led an academic boycott of the university. Over 50 of the world’s leading AI and robotics researchers from 30 countries joined the boycott, and in less than a week, KAIST agreed to “not conduct any research activities counter to human dignity including autonomous weapons lacking meaningful human control.” The boycott was covered by CNN and The Guardian.

Additionally, over 3,100 Google employees, including dozens of senior engineers, signed a letter in early April protesting the company’s involvement in a Pentagon program called “Project Maven,” which uses AI to analyze drone imaging. Employees worried that this technology could be repurposed to also operate drones or launch weapons. Citing their “Don’t Be Evil” motto, the employees asked to cancel the project and not to become involved in the “business of war.”

 

The UN CCW meets again…

In the wake of this growing pressure, 82 countries in the UN CCW met again from April 9-13 to consider a ban on lethal autonomous weapons. Throughout the week, states and civil society representatives discussed “meaningful human control” and whether they should just be concerned about “lethal” autonomous weapons, or all autonomous weapons generally. Here is a brief recap of the meeting’s progress:

  • The group of nations that explicitly endorse the call to ban LAWS expanded to 26 (with China, Austria, Colombia, and Djibouti joining during the CCW meeting).
  • However, five states explicitly rejected moving to negotiate new international law on fully autonomous weapons: France, Israel, Russia, United Kingdom, and United States.
  • Nearly every nation agreed that it is important to retain human control over autonomous weapons, despite disagreements surrounding the definition of “meaningful human control.”
  • Throughout the discussion, states focused on complying with International Humanitarian Law (IHL). Human Rights Watch argued that there already is precedent in international law and disarmament law for banning weapons without human control.
  • Many countries submitted working papers to inform the discussions, including China and the United States.
  • Although states couldn’t reach an agreement during the meeting, momentum is growing towards solidifying a framework for defining lethal autonomous weapons.

You can find written and video recaps from each day of the UN CCW meeting here, written by Reaching Critical Will.

The UN CCW is slated to resume discussions in August 2018; however, given the speed with which autonomous weaponry is advancing, many advocates worry that they are moving too slowly.

 

What can you do?

If you work in the tech industry, consider signing the Tech Workers Coalition open letter, which calls on Google, Amazon and Microsoft to stay out of the business of war. And if you’d like to support the fight against LAWS, we recommend donating to the Campaign to Stop Killer Robots. This organization, which is not affiliated with FLI, has done amazing work over the past few years to lead efforts around the world to prevent the development of lethal autonomous weapons. Please consider donating here.

 

Learn more…

If you want to learn more about the technological, political, and social developments of autonomous weapons, check out the Research & Reports page of our Autonomous Weapons website. You can find relevant news stories and updates at @AIweapons on Twitter and autonomousweapons on Facebook.

Podcast: What Are the Odds of Nuclear War? A Conversation With Seth Baum and Robert de Neufville

What are the odds of a nuclear war happening this century? And how close have we been to nuclear war in the past? Few academics focus on the probability of nuclear war, but many leading voices, like former US Secretary of Defense William Perry, argue that the threat of nuclear conflict is growing.

On this month’s podcast, Ariel spoke with Seth Baum and Robert de Neufville from the Global Catastrophic Risk Institute (GCRI), who recently coauthored a report titled A Model for the Probability of Nuclear War. The report examines 60 historical incidents that could have escalated to nuclear war and presents a model for determining the odds that we could have some type of nuclear war in the future.

Topics discussed in this episode include:

  • the most hair-raising nuclear close calls in history
  • whether we face a greater risk from accidental or intentional nuclear war
  • China’s secrecy vs the United States’ transparency about nuclear weapons
  • Robert’s first-hand experience with the false missile alert in Hawaii
  • and how researchers can help us understand nuclear war and craft better policy

Links you might be interested in after listening to the podcast:

You can listen to this podcast above or read the transcript below.

 

 

Ariel: Hello, I’m Ariel Conn with the Future of Life Institute. If you’ve been listening to our previous podcasts, welcome back. If this is new for you, also welcome, but in any case, please take a moment to follow us, like the podcast, and maybe even share the podcast.

Today, I am excited to present Seth Baum and Robert de Neufville with the Global Catastrophic Risk Institute (GCRI). Seth is the Executive Director and Robert is the Director of Communications; he is also a superforecaster. They have recently written a report called A Model for the Probability of Nuclear War. This was a really interesting paper that looks at 60 historical incidents that could have escalated to nuclear war, and it basically presents a model for how we can determine what the odds are that we could have some type of nuclear war in the future. So, Seth and Robert, thank you so much for joining us today.

Seth: Thanks for having me.

Robert: Thanks, Ariel.

Ariel: Okay, so before we get too far into this, I was hoping that one or both of you could just talk a little bit about what the paper is and what prompted you to do this research, and then we’ll go into more specifics about the paper itself.

Seth: Sure, I can talk about that a little bit. So the paper is a broad overview of the probability of nuclear war, and it has three main parts. One is a detailed background on how to think about the probability, explaining differences between the concept of probability versus the concept of frequency and related background in probability theory that’s relevant for thinking about nuclear war. Then there is a model that scans across a wide range, maybe the entire range, but at least a very wide range of scenarios that could end up in nuclear war. And then finally, there is a data set of historical incidents that at least had some potential to lead to nuclear war, and those incidents are organized in terms of the scenarios that are in the model. The historical incidents give us at least some indication of how likely each of those scenario types is.

Ariel: Okay. At the very, very start of the paper, you guys say that nuclear war doesn’t get enough scholarly attention, and so I was wondering if you could explain why that’s the case and what role this type of risk analysis can play in nuclear weapons policy.

Seth: Sure, I can talk to that. The paper, I believe, specifically says that the probability of nuclear war does not get much scholarly attention. In fact, we put a fair bit of time into trying to find every previous study that we could, and there was really, really little that we were able to find, and maybe we missed a few things, but my guess is that this is just about all that’s out there and it’s really not very much at all. We can only speculate on why there has not been more research of this type, my best guess is that the people who have studied nuclear war — and there’s a much larger literature on other aspects of nuclear war — they just do not approach it from a risk perspective as we do, that they are inclined to think about nuclear war from other perspectives and focus on other aspects of it.

So the intersection of people who are both interested in studying nuclear war and tend to think in quantitative risk terms is a relatively small population of scholars, which is why there’s been so little research. That, at least, is my best guess.

Robert: Yeah, it’s a really interesting question. I think that the tendency has been to think about it strategically, something we have control over, somebody makes a choice to push a button or not, and that makes sense from some perspective. I think there’s also a way in which we want to think about it as something unthinkable. There hasn’t been a nuclear detonation in a long time and we hope that there will never be another one, but I think that it’s important to think about it this way so that we can find the ways that we can mitigate the risk. I think that’s something that’s been neglected.

Seth: Just one quick clarification, there have been very recent nuclear detonations, but those have all been test detonations, not detonations in conflict.

Robert: Fair enough. Right, not a use in anger.

Ariel: That actually brings up a question that I have. As you guys point out in the paper, we’ve had one nuclear war and that was World War II, so we essentially have one data point. How do you address probability with so little actual data?

Seth: I would say “carefully,” and this is why the paper itself is very cautious with respect to quantification. We don’t actually include any numbers for the probability of nuclear war in this paper.

The easy thing to do for calculating probabilities is when you have a large data set of that type of event. If you want to calculate the probability of dying in a car crash, for example, there’s lots of data on that because it’s something that happens with a fairly high frequency. With nuclear war, there’s just one data point, and it was under circumstances, World War II, that are very different from what we have right now. Maybe there would be another world war, but no two world wars are the same. So we have to, instead, look at all the different types of evidence that we can bring in to get some understanding for how nuclear war could occur, which includes evidence about the process of going from calm into periods of tension, or from the thought of going to nuclear war all the way to the actual decision to initiate nuclear war. And then also look at a wider set of historical data, which is something we did in this paper, looking at incidents that did not end up as nuclear wars, but pushed at least a little bit in that direction, to see what we can learn about how likely it is for things to go in the direction of nuclear war, which tells us at least something about how likely it is to get there all the way.

Ariel: Robert, I wanted to turn to you on that note, you were the person who did a lot of work figuring out what these 60 historical events were. How did you choose them?

Robert: Well, I wouldn’t really say I chose them, I tried to just find every event that was there. There are a few things that we left out because we thought they fell below some threshold of the seriousness of the incident, but in theory you could probably expand the scope even a little wider than we did. But to some extent we just looked at what’s publicly known. I think the data set is really valuable, I hope it’s valuable, but one of the issues with it is it’s kind of a convenience sample of the things that we know about, and some areas, some parts of history, are much better reported on than others. For example, we know a lot about the Cuban Missile Crisis in the 1960s, a lot of research has been done on that, and there are times when the US government has been fairly transparent about incidents, but we know less about other periods and other countries as well. We don’t have incidents from China’s nuclear program, but that doesn’t mean there weren’t any, it just means it’s hard to figure out, and that would be really interesting to do more research on.

Ariel: So, what was the threshold you were looking at to say, “Okay, I think this could have gone nuclear”?

Robert: Yeah, that’s a really good question. It’s somewhat hard to say. I think that a lot of these things are judgment calls. If you look at the history of incidents, I think a number of them have been blown a little bit out of proportion. As they’ve been retold, people like to say we came close to nuclear war, and that’s not always true. There are other incidents which are genuinely hair-raising and then there are some incidents that seem very minor, that you could say maybe it could have gotten to a nuclear war. But there was some safety incident on an Air Force Base and they didn’t follow procedures, and you could maybe tell yourself a story in which that led to a nuclear war, but at some point you make a judgment call and say, well, that doesn’t seem like a serious issue.

But it wasn’t like we have a really clear, well-defined line. In some ways, we’d like to broaden the data set so that we can include even smaller incidents just because the more incidents, the better as far as understanding, not the more incidents the better as far as being safe.

Ariel: Right. I’d like this question to go to both of you, as you were looking through these historical events, you mentioned that they were already public records so they’re not new per se, but were there any that surprised you, and which were one or two that you found the most hair-raising?

Robert: Well, I would say one that surprised me, and this may just be because of my ignorance of certain parts of geopolitical history, but there was an incident with the USS Liberty in the Mediterranean, in which the Israelis mistook it for an Egyptian destroyer and they decided to take it out, essentially, not realizing it was actually an American research vessel, and they did, and what happened was the US scrambled planes to respond. The problem was that most of the planes, or the ordinary planes they would have ordinarily scrambled, were out on some other sorties, some exercise, something like that, and they ended up scrambling planes which had a nuclear payload on them. These planes were recalled pretty quickly. They mentioned this to Washington and the Secretary of Defense got on the line and said, “No, recall those planes,” so it didn’t get that far necessarily, but I found it a really shocking incident because it was a friendly fire confusion, essentially, and there were a number of cases like that in which nuclear weapons were involved because they happened to be on equipment where they shouldn’t have been that was used to respond to some kind of a real or false emergency. That seems like a bigger issue than I would’ve at first expected, that just the fact that nuclear weapons are lying around somewhere where they could be involved with something.

Ariel: Wow, okay. And Seth?

Seth: Yeah. For me this was a really eye-opening experience. I had some familiarity with the history of incidents involving nuclear weapons, but there turned out to be much more that’s gone on over the years than I really had any sense for. Some of it is because I’m not a historian, this is not my specialty, but there were any number of events in which it appears that nuclear weapons were, or at least may have been, seriously considered for use in a conflict.

Just to pick one example, the period from 1954 to 1955 is known as the first Taiwan Straits Crisis, and the second crisis, by the way, in 1958, also included plans for nuclear weapons use. But in the first one there were plans made up by the United States; the Joint Chiefs of Staff allegedly recommended that nuclear weapons be used against China if the conflict intensified, and President Eisenhower was apparently pretty receptive to this idea. In the end, there was a ceasefire negotiated so it didn’t come to that, but had that ceasefire not been made, my sense is that … The historical record is not clear on whether the US would’ve used nuclear weapons or not, maybe even the US leadership hadn’t made any final decisions on this matter, but there were any number of these events, especially earlier, in the years or decades after World War II when nuclear weapons were still relatively new, in which the use of nuclear weapons in conflict seemed to at least get a serious consideration that I might not have expected.

I’m accustomed to thinking of nuclear weapons as having a fairly substantial taboo attached to them, but I feel like the taboo has perhaps strengthened over the years, such that leadership now is less inclined to give the use of nuclear weapons serious consideration than it was back then. That may be mistaken, but that’s the impression that I get, and we may perhaps have been fortunate to get through the first couple of decades after World War II without an additional nuclear war. Another nuclear war might be less likely at this time, though still not entirely impossible by any means.

Ariel: Are you saying that you think the risk is higher now?

Seth: I think the risk is probably higher now. I think I would probably say that the risk is higher now than it was, say, 10 years ago because various relations between nuclear armed states have gotten worse, certainly including between the United States and Russia, but whether the probability of nuclear war is higher now versus in, say, the ’50s or the ’60s, that’s much harder to say. That’s a degree of detail that I don’t think we can really comment on conclusively based on the research that we have at this point.

Ariel: Okay. In a little while I’m going to want to come back to current events and ask about that, but before I do that I want to touch first on the model itself, which lists four steps to a potential nuclear war: an initiating event, a crisis, nuclear weapon use, and full-scale nuclear war. Could you talk about what each of those four steps might be? And then I’m going to have follow-up questions about that next.

Seth: I can say a little bit about that. The model you’re describing is a model that was used by our colleague, Martin Hellman, in a paper that he did on the probability of nuclear war, and that was probably the first paper that develops the study of the probability of nuclear war using the sort of methodology that we use in this paper, which is to develop nuclear war scenarios.

So the four steps in this model are four steps to go from a period of calm into a full-scale nuclear war. His paper was looking at the probability of nuclear war based on an event that is similar to the Cuban Missile Crisis, and what’s distinctive about the Cuban Missile Crisis is we may have come close to going directly to nuclear war without any other type of conflict in the first place. So that’s where the initiating event and the crisis in this model come from: it’s the idea that there will be some sort of event that leads to a crisis, and the crisis will go straight to nuclear weapons use, which could then scale to a full-scale nuclear war. The value of breaking it into those four steps is then you can look at each step in turn, think through the conditions for each of them to occur and maybe the probability of going from one step to the next, which you can use to evaluate the overall probability of that type of nuclear war. That’s for one specific type of nuclear war. Our paper then tries to scan across the full range of different types of nuclear war, different nuclear war scenarios, and put that all into one broader model.
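
To make the step-wise structure Seth describes concrete, here is a minimal sketch of how such a scenario model can be used. All numbers are placeholders chosen purely for illustration; the paper itself deliberately avoids quantifying these probabilities.

```python
# Illustrative only: the probability of one scenario type is the product of
# the conditional probabilities of its steps. All values below are made up.
steps = {
    "initiating event (per year)":       0.20,
    "crisis, given initiating event":    0.10,
    "nuclear weapon use, given crisis":  0.05,
    "full-scale war, given nuclear use": 0.30,
}

p_scenario_per_year = 1.0
for step, p in steps.items():
    p_scenario_per_year *= p

print(f"Annual probability of this scenario type: {p_scenario_per_year:.6f}")
# A full model would sum contributions like this across all of the scenario types.
```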

Ariel: Okay. Yeah, your paper talks about 14 scenarios, correct?

Seth: That’s correct, yes.

Ariel: Okay, yeah. So I guess I have two questions for you: one, how did you come up with these 14 scenarios, and are there maybe a couple that you think are most worrisome?

Seth: So the first question we can definitely answer, we came up with them through our read of the nuclear war literature and our overall understanding of the risk and then iterating as we put the model together, thinking through what makes the most sense for how to organize the different types of nuclear war scenarios, and through that process, that’s how we ended up with this model.

As far as which ones seem to be the most worrisome, I would say a big question is whether we should be more worried about intentional versus accidental, or inadvertent nuclear war. I feel like I still don’t actually have a good answer to that question. Basically, should we be more worried about nuclear war that happens when a nuclear armed country decides to go ahead and start that nuclear war versus one where there’s some type of accident or error, like a false alarm or the detonation of a nuclear weapon that was not intended to be an act of war? I still feel like I don’t have a good sense for that.

Maybe the one thing I do feel is that it seems less likely that we would end up in a nuclear war from a detonation of a nuclear weapon that was not intentionally an act of war just because it feels to me like those events are less likely to happen. This would be nuclear terrorism or the accidental detonation of nuclear weapons, and even if it did happen it’s relatively likely that they would be correctly diagnosed as not being an act of war. I’m not certain of this. I can think of some reasons why maybe we should be worried about that type of scenario, but especially looking at the historical data it felt like those historical incidents were a bit more of a stretch, a bit further away from actually ending up in nuclear war.

Robert, I’m actually curious, your reaction to that, if you agree or disagree with that.

Robert: Well, I don’t think that non-state actors using a nuclear weapon is the big risk right now. But as far as whether it’s more likely that we’re going to get into a nuclear war through some kind of human error or a technological mistake, or whether it will be a deliberate act of war, I can think of scary things that have happened on both sides. I mean, the major thing that looms in one’s mind when you think about this is the Cuban Missile Crisis, and that’s an example of a crisis in which there were a lot of incidents during the course of that crisis where you think, well, this could’ve gone really badly, this could’ve gone the other way. So a crisis like that where tensions escalate and each country, or in this case the US and Russia, each thought the other might seriously threaten the homeland, I think are very scary.

On the other hand, there are incidents like the 1995 Norwegian rocket incident, which I find fairly alarming. In that incident, what happened was Norway was launching a scientific research rocket for studying the weather and had informed Russia that they were going to do this, but somehow that message hadn’t got passed along to the radar technicians, so the radar technician saw what looked like a submarine launched ballistic missile that could have been used to do an EMP, a burst over Russia which would then maybe take out radar and could be the first move in a full-scale attack. So this is scary because this got passed up the chain and supposedly, President Boris Yeltsin, it was Yeltsin at the time, actually activated the nuclear football in case he needed to authorize a response.

Now, we don’t really have a great sense of how close anyone came to this, and there may be a little hyperbole after the fact, but this kind of thing seems like it could get there. And 1995 wasn’t a time of big tension between the US and Russia, so this kind of thing is also pretty scary and I don’t really know, I think that which risk you would find scarier depends a little bit on the current geopolitical climate. Right now, I might be most worried that the US would launch a bloody-nose attack against North Korea and North Korea would respond with a nuclear weapon, so it depends a little bit. I don’t know the answer either, I guess, is my answer.

Ariel: Okay. You guys brought up a whole bunch of things that I had planned to ask about, which is good. I mean, one of my questions had been are you more worried about intentional or accidental nuclear war, and I guess the short answer is, you don’t know? Is that fair to say?

Seth: Yeah, that’s pretty fair to say. The short answer is, at least at this time, they both seem very much worth worrying about.

As far as which one we should be more worried about, this is actually a very important detail to try to resolve for policy purposes because this speaks directly to how we should manage our nuclear weapons. For example, if we are especially worried about accidental or inadvertent nuclear war, then we should keep nuclear weapons on a relatively low launch posture. They should not be on hair-trigger alert because when things are on a high-alert status, it takes relatively little for the nuclear weapons to be launched and makes it easier for a mistake to lead to a launch. Versus if we are more worried about intentional nuclear war, then there may be some value to having them on a high-alert status in order to have a more effective deterrence in order to convince the other side to not launch their nuclear weapons. So this is an important matter to try resolving, but at this point, based on the research that we have so far, it remains, I think, somewhat ambiguous.

Ariel: I do want to follow up with that. From everything I’ve read, there doesn’t seem to be any real benefit to having things like our intercontinental ballistic missiles, which as I understand it are the ones on hair-trigger alert, on that status, because the submarines and the bombers still have the capability to strike back. Do you disagree with that?

Seth: I can’t say for sure whether or not I do disagree with that because it’s not something that I have looked at closely enough, so I would hesitate to comment on that matter. My general understanding is that hair-trigger alert is used as a means to enhance deterrence in order to make it less likely that either side would use their nuclear weapons in the first place, but regarding the specifics of it, that’s not something that I’ve personally looked at closely enough to really be able to comment on.

Robert: I think Seth’s right that it’s a question that needs more research in a lot of ways and that we shouldn’t answer it in the context of… We didn’t figure out the answer to that in this paper. I will say, I would personally sleep better if they weren’t on hair-trigger alert. My suspicion is that the big risk is not that one side launches some kind of decapitating first strike, I don’t think that’s really a very high risk, so I’m not as concerned as someone else might be about how well we need to deter that, how quickly we need to be able to respond. Whereas, I am very concerned about the possibility of an accident because… I mean, reading these incidents will make you concerned about it, I think. Some of them are really frightening. So that’s my intuition, but, as Seth says, I don’t think we really know. There’s more, at least in terms of this model, there’s more studying we need to do.

Seth: If I may, to one of your earlier questions regarding motivations for doing this research in the first place, a big motivation is to try to give more rigorous answers to some of these very basic nuclear weapons policy questions, like “should nuclear weapons be on hair-trigger alert, is that safer or more dangerous?” Right now we can talk a little bit about what the trade-offs might be, but we don’t really have much to say about how that trade-off actually would be resolved. This is where I think that it’s important for the international security community to be trying harder to analyze the risks in these structured and, perhaps, even quantitative terms so that we can try to answer these questions more rigorously than just, this is my intuition, this is your intuition. That’s really, I think, one of the main values of doing this type of research: being able to answer these important policy questions with more confidence and also, perhaps, more consensus across different points of view than we would otherwise be able to have.

Ariel: Right. I had wanted to continue with some of the risk questions, but while we’re on the points that you’re making, Seth, what do you see moving forward with this paper? I mean, it was a bummer to read the paper and not get what the probabilities of nuclear war actually are, just a model for how we can get there. How do you see you, or other organizations, or researchers moving forward to start calculating what the probability could actually be?

Seth: The paper does not give us final answers for what the probability would be, but it definitely makes some important steps in that direction. Additional steps that can be taken would include things like exploring the historical incident data set more carefully to check whether there may be important incidents that have been missed, and to see, for each of the incidents, how close we really think it came to nuclear war. And this is something that the literature on these incidents actually diverges on. There are some people who look at these incidents and see them as being really close calls, other people look at them and see them as being evidence that the system works as it should, that, sure, there were some alarms but the alarms were handled the way that they should be handled and that the tools are in place to make sure that those don’t end in nuclear war. So exactly how close these various incidents got is one important way forward towards quantifying the probability.

Another one is to come up with some sense of what the actual population of historical incidents is relative to the data set that we have. We are presumably missing some number of historical incidents; some of them might be smaller and less important, but there might be some big ones that happened that we don’t know about, because they are only covered in literature in other languages, we only did research in English, or because all of the evidence about them is in classified government records held by whichever governments were involved in the incident, and so we need to-

Ariel: Actually, I do actually want to interrupt with a question real quick there, and my apologies for not having read this closer, I know there were incidents involving the US, Russia, and I think you guys had some about Israel. Were there incidents mentioning China or any of the European countries that have nuclear weapons?

Seth: Yeah, I think there were probably incidents involving all of the nuclear armed countries, certainly involving China. For example, China had a war with the Soviet Union over their border some years ago and there was at least some talk of nuclear weapons involved in that. Also, the one I mentioned earlier, the Taiwan Straits Crises, those involved China. Then there were multiple incidents between India and Pakistan, especially regarding the situation in Kashmir. With France, I believe we included one incident in which a French nuclear bomber got a faulty signal to take off in combat and then it was eventually recalled before it got too far. There might’ve been something with the UK also. Robert, do you recall if there were any with the UK?

Robert: Yes, there was. During the Falklands War, apparently, they left with nuclear depth charges. It’s honestly not really clear to me why you would use a nuclear depth charge, and there’s not any evidence they ever intended to use them, but they sent out nuclear-armed ships, essentially, to deal with the crisis in the Falklands.

There’s also, I think, an incident in South Africa as well when South Africa was briefly a nuclear state.

Ariel: Okay. Thanks. It’s not at all disturbing.

Robert: It’s very disturbing. I will say, I think that China is the one we know the least about. In some of the incidents that Seth mentioned involving China, the nuclear-armed power that might have used nuclear weapons was the United States. So there is the Soviet-China incident, but we don’t really know a lot about the Chinese program and Chinese incidents. I think some of that is because it’s not reported in English, and to some extent it’s also that it’s classified and the Chinese are not as open about what’s going on.

Seth: Yeah, the Chinese are definitely much, much less transparent than the United States, as are the Russians. I mean, the United States might be the most transparent out of all of the nuclear armed countries.

I remember some years ago, when I was spending time at the United Nations, I got the impression that the Russians and the Chinese were actually not quite sure what to make of the Americans’ transparency. They found it hard to believe that the US government was not just putting out loads of propaganda and misinformation; it didn’t make sense to them that we actually put out a lot of honest data about government activities here, that that’s just the standard, and that you can actually trust this information, this data. So yeah, we may be significantly underestimating the number of incidents involving China, and perhaps Russia and other countries, because their governments are less transparent.

Ariel: Okay. That definitely addresses a question that I had, and my apologies for interrupting you earlier.

Seth: No, that’s fine. But this is one aspect of the research that still remains to be done that would help us figure out what the probabilities might be. It would be a mistake to just calculate them based on the data set as it currently stands, because this is likely to be only a portion of the actual historical incidents that may have ended in nuclear war.

So these are the sorts of details and nuances that were, unfortunately, beyond the scope of the project that we were able to do, but it would be important work for us or other research groups to do to take us closer to having good probability estimates.

Ariel: Okay. I want to ask a few questions that, again, are probably going to be you guys guessing as opposed to having good, hard information, and I also wanted to touch a little bit on some current events. So first, one of the things that I hear a lot is that if a nuclear war is going to happen, it’s much more likely to happen between India and Pakistan than, say, the US and Russia or US and … I don’t know about US and North Korea at this point, but I’m curious what your take on that is, do you feel that India and Pakistan are actually the greatest risk or do you think that’s up in the air?

Robert: I mean, it’s a really tough question. I would say that India and Pakistan is one of the scariest situations for sure. I don’t think they have actually come that close, but it’s not that difficult to imagine a scenario in which they would. I mean, these are nuclear powers that occasionally shoot at each other across the line of control, so I do think that’s very scary.

But I also think, and this is an intuition, this isn’t a conclusion that we have from the paper, but I also think that the danger of something happening between the United States and Russia is probably underestimated, because we’re not in the Cold War anymore, relations aren’t necessarily good, it’s not clear what relations are, but people will say things like, “Well, neither side wants a war.” Obviously neither side wants a war, but I think there’s a danger of the kind of inadvertent escalation, miscalculation, and that hasn’t really gone away. So that’s something I think is probably not given enough attention. I’m also concerned about the situation in North Korea. I think that that is now an issue which we have to take somewhat seriously.

Seth: I think the last five years or so have been a really good learning opportunity for all of us on these matters. I remember having conversations with people about this, maybe five years ago, and they thought the idea of a nuclear war between the United States and Russia was just ridiculous, that that’s antiquated Cold War talk, that the world has changed. And they were right in their characterization of the world as it was at that moment, but I was always uncomfortable with that because the world could change again. And sure enough, in the last five years, the world has changed very significantly, in ways that I think most people would agree make the probability of nuclear war between the United States and Russia substantially higher than it was five years ago, especially starting with the Ukraine crisis.

There’s also just a lot of basic volatility in the international system that I think is maybe underappreciated, that we might like to think of it as being more deterministic, more logical than it actually is. The classic example is that World War I maybe almost didn’t happen, that it only happened because a very specific sequence of events happened that led to the assassination of Archduke Ferdinand and had that gone a little bit differently, he wouldn’t have been assassinated and World War I wouldn’t have happened and the world we live in now would be very different than what it is. Or, to take a more recent example, it’s entirely possible that had the 2016 FBI director not made an unusual decision regarding the disclosure of information regarding one candidate’s emails a couple weeks before the election, the outcome of the 2016 US election might’ve gone different and international politics would look quite different than it is right now. Who knows what will happen next year or the year after that.

So I think we can maybe make some generalizations about which conflicts seem more likely or less likely, especially at the moment, but we should be really cautious about what we think it’s going to be overall over 5, 10, 20, 30 year periods just because things really can change substantially in ways that may be hard to see in advance.

Robert: Yeah, for me, one of the lessons of World War I is not so much that it might not have happened, I think it probably would have anyway — although Seth is right, things can be very contingent — but it’s more that nobody really wanted World War I. I mean, at the time people thought it wouldn’t happen because it was sort of bad for everyone and no one thought, “Well, this is in our interest to pursue it,” but wars can happen that way where countries end up thinking, for one reason or another, they need to do one thing or another that leads to war when in fact everyone would prefer to have gotten together and avoided it. It’s a suboptimal equilibrium. So that’s one thing.

The other thing is that, as Seth says, things change. I’m not that concerned about what’s going on in the week that we’re recording this, but this week we had the Russian ambassador saying he would shoot down US missiles aimed at Syria, and the United States’ president responding on Twitter that they’d better get ready for his smart missiles. This, I suspect, won’t escalate to a nuclear war. I’m not losing that much sleep over it. But this is the kind of thing that you would like to see a lot less of; this is the kind of thing that’s worrying, and maybe you wouldn’t have anticipated this 10 years ago.

Seth: When you say you’re not losing much sleep on this, you’re speaking as someone who has, as I understand it, very recently, actually, literally lost sleep over the threat of nuclear war, correct?

Robert: That’s true. I was woken up early in the morning by an alert saying a ballistic missile was coming to my state, and that was very upsetting.

Ariel: Yes. So we should clarify, Robert lives in Hawaii.

Robert: I live in Hawaii. And because I take the risk of nuclear war seriously, I might’ve been more upset than some people, although I think that a large percentage of the population of Hawaii thought to themselves, “Maybe I’m going to die this morning. In fact, maybe, my family’s going to die and my neighbors and the people at the coffee shop, and our cats and the guests who are visiting us,” and it really brought home the danger, not that it should be obvious that nuclear war is unthinkable but when you actually face the idea … I also had relatively recently read Hiroshima, John Hersey’s account of, really, most of the aftermath of the bombing of Hiroshima, and it was easy to put myself in that and say, “Well, maybe I will be suffering from burns or looking for clean water,” and of course, obviously, again, none of us deserve it. We may be responsible for US policy in some way because the United States is a democracy, but my friends, my family, my cat, none of us want any part of this. We don’t want to get involved in a war with North Korea. So this really, I’d say, it really hit home.

Ariel: Well, I’m sorry you had to go through that.

Robert: Thank you.

Ariel: I hope you don’t have to deal with it again. I hope none of us have to deal with that.

I do want to touch on what you’ve both been talking about, though, in terms of trying to determine the probability of a nuclear war over the short term where we’re all saying, “Oh, it probably won’t happen in the next week,” but in the next hundred years it could. How do you look at the distinction in time in terms of figuring out the probability of whether something like this could happen?

Seth: That’s a good technical question. Arguably, we shouldn’t be talking about the probability of nuclear war as one thing. If anything, we should talk about the rate, or the frequency of it, that we might expect. If we’re going to talk about the probability of something, that something should be a fairly specific distinct event. For example, an example we use in the paper, what’s the probability of a given team, say, the Cleveland Indians, winning the World Series? It’s good to say what’s the probability of them winning the World Series in, say, 2018, but to say what’s the probability of them winning the World Series overall, well, if you wait long enough, even the Cleveland Indians will probably eventually win the World Series as long as they continue to play. When we wrote the paper we actually looked it up, and it said that they have about a 17% chance of winning the 2018 World Series even though they haven’t won a World Series since 1948. Poor Cleveland. Sorry, I’m from Pittsburgh so I get to gloat a little bit.

But yeah, we should distinguish between saying what is the probability of any nuclear war happening this week or this year, versus how often we might expect nuclear wars to occur or what the total probability of any nuclear war happening over a century or whatever time period it might be.
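
To illustrate the distinction Seth is drawing, here is a small sketch that converts an assumed annual rate into the probability of at least one war over different horizons. It treats nuclear war, purely hypothetically, as a Poisson process with a constant rate; the rate used is a placeholder, not an estimate from the paper.

```python
import math

# Assumed expected number of nuclear wars per year (placeholder, not an estimate).
annual_rate = 0.01

for horizon_years in (1, 10, 100):
    # For a Poisson process, P(at least one event in T years) = 1 - exp(-rate * T).
    p_at_least_one = 1 - math.exp(-annual_rate * horizon_years)
    print(f"P(at least one nuclear war within {horizon_years:>3} years) = {p_at_least_one:.3f}")
```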

Robert: Yeah. I think that over the course of the century, I mean, as I say, I’m probably not losing that much sleep on any given week, but over the course of a century if there’s a probability of something really catastrophic, you have to do everything you can to try to mitigate that risk.

I think, honestly, some terrible things are going to happen in the 21st century. I don’t know what they are, but that’s just how life is. I don’t know which things they are. Maybe it will involve a nuclear war of some kind. But you can also differentiate among types of nuclear war. If one nuclear bomb is used in anger in the 21st century, that’s terrible, but it wouldn’t be all that surprising or mean the destruction of the human race. But then there are the kinds of nuclear wars that could potentially trigger a nuclear winter by kicking so much soot up into the atmosphere and blocking out the sun, and might actually threaten not just the people who were killed in the initial bombing, but the entire human race. That is something we need to look at, in some sense, even more seriously, even though the chance of that is probably a fair amount smaller than the chance of one nuclear weapon being used. Not that one nuclear weapon being used wouldn’t be an incredibly catastrophic event as well, but I think with that kind of risk you really need to be very careful to try to minimize it as much as possible.

Ariel: Real quick, I got to do a podcast with Brian Toon and Alan Robock a little while ago on nuclear winter, so we’ll link to that in the transcript for anyone who wants to learn about nuclear winter, and you brought up a point that I was also curious about, and that is: what is the likelihood, do you guys think, of just one nuclear weapon being used and limited retaliation? Do you think that is actually possible or do you think if a nuclear weapon is used, it’s more likely to completely escalate into full-scale nuclear war?

Robert: I personally do think that’s possible, because I think a number of the scenarios that would involve using a nuclear weapon are not between the United States and Russia, or even the United States and China, so I think that some scenarios involve a few nuclear weapons. If it were an incident with North Korea, you might worry that it would spread to Russia or China, but you can also see a scenario in which North Korea uses one or two nuclear weapons. Even with India and Pakistan, they don’t necessarily, I wouldn’t think they would necessarily, use all — what do they have each, like a hundred or so nuclear weapons — I wouldn’t necessarily assume they would use them all. So there are scenarios in which just one or a few nuclear weapons would be used. I suspect those are the most likely scenarios, but it’s really hard to know. We don’t know the answer to that question.

Seth: There are even scenarios between the United States and Russia that involve one or just a small number of nuclear weapons, and the Russian military has the concept of the de-escalatory nuclear strike, which is the idea that if there is a major conflict that is emerging and might not be going in a favorable way for Russia, especially since their conventional military is not as strong as ours, they may use a single nuclear weapon, basically, to demonstrate their seriousness on the matter in hopes of persuading us to back down. Now, whether or not we would actually back down or escalate it into an all-out nuclear war, I don’t think that’s something that we can really know in advance, but it’s at least plausible. It’s certainly plausible that that’s what would happen and presumably, Russia considers this plausible, which is why they talk about it in the first place. Not to just point fingers at Russia: this is essentially the same posture NATO had at an earlier point in the Cold War, when the Soviet Union had the larger conventional military and our plan was to use nuclear weapons on a limited basis in order to prevent the Soviet Union from conquering Western Europe with their military, so it is possible.

I think this is one of the biggest points of uncertainty for the overall risk, is if there is an initial use of nuclear weapons, how likely is it that additional nuclear weapons are used and how many and in what ways? I feel like despite having studied this a modest amount, I don’t really have a good answer to that question. This is something that may be hard to figure out in general because it could ultimately depend on things like the personalities involved in that particular conflict, who the political and military leadership are and what they think of all of this. That’s something that’s pretty hard for us as outside analysts to characterize. But I think, both possibilities, either no escalation or lots of escalation, are possible as is everything in between.

Ariel: All right, so we’ve gone through most of the questions that I had about this paper now, thank you very much for answering those. You guys have also published a working paper this month called A Model for the Impacts of Nuclear War, but I was hoping you could maybe give us a quick summary of what is covered in that paper and why we should read it.

Seth: Risk overall is commonly quantified as the probability of some type of event multiplied by the severity of the impacts. So our first paper was on the probability side, this one’s on the impact side, and it scans across the full range of different types of impacts that nuclear war could have, looking at the five major impacts of nuclear weapons detonation, which are thermal radiation, blast, ionizing radiation, electromagnetic pulse, and finally, human perceptions, the ways that the detonation affects how people think and in turn, how we act. We, in this paper, built out a pretty detailed model that looks at all of the different details, or at least a lot of the various details, of what each of those five effects of nuclear weapons detonations would have and what that means in human terms.
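
As a rough schematic of how the two papers fit together, risk can be computed as probability times severity, with severity broken down across the five detonation effects Seth lists. The numbers and units below are placeholders meant only to show the structure, not estimates from either paper.

```python
# Placeholder impact contributions, in arbitrary units, for one hypothetical scenario.
impact_channels = {
    "thermal radiation":     40,
    "blast":                 35,
    "ionizing radiation":    10,
    "electromagnetic pulse":  5,
    "human perceptions":     10,  # e.g., panic, policy shifts, further escalation
}

p_scenario = 0.001                        # hypothetical probability of the scenario
severity = sum(impact_channels.values())  # total impact across the five channels
risk = p_scenario * severity

print(f"Severity (arbitrary units): {severity}")
print(f"Risk = probability x severity = {risk}")
```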

Ariel: Were there any major or interesting findings from that that you want to share?

Seth: Well, the first thing that really struck me was, “Wow, there are a lot of ways of being killed by nuclear weapons.” Most of the time when we think about nuclear detonations and how you can get killed by them, you think about, all right, there’s the initial explosion and whether it’s the blast itself or the buildings falling on you, or the fire, it might be the fire, or maybe it’s a really high dose of radiation that you can get if you’re close enough to the detonation, that’s probably how you can die. In our world of talking about global catastrophic risks, we also will think about the risk of nuclear winter and in particular, the effect that that can have on global agriculture. But there’s a lot of other things that can happen too, especially related to the effect on physical infrastructure, or I should say civil infrastructure, roads, telecommunications, the overall economy when cities are destroyed in the war, those take out potentially major nodes in the global economy that can have any number of secondary effects, among other things.

It’s just a really wide array of effects, and that’s one thing that I’m happy for with this paper is that for, perhaps, the first time, it really tries to lay out all of these effects in one place and in a model form that can be used for a much more complete accounting of the total impact of nuclear war.

Ariel: Wow. Okay. Robert, was there anything you wanted to add there?

Robert: Well, I agree with Seth, it’s astounding, the range, the sheer panoply of bad things that could happen, but I think that once you get into a situation where cities are being destroyed by nuclear weapons, or really anything being destroyed by nuclear weapons, things can get unpredictable really fast. You don’t know the effect on the global system. A lot of times, I think, when you talk about catastrophic risk, you’re not simply talking about the impact of the initial event, but the long-term consequences it could have: starting more wars, ongoing famines, a shock to the economic system that can cause political problems. So these are things that we need to look at more. I mean, it would be the same with any kind of thing we would call a catastrophic risk. If there were a pandemic disease, the main concern might not be that the pandemic disease would wipe out everyone, but that the aftermath would cause so many problems that it would be difficult to recover from. I think that would be the same issue if there were a lot of nuclear weapons used.

Seth: Just to follow up on that, some important points here, one is that the secondary effects are more opaque. They’re less clear. It’s hard to know in advance what would happen. But then the second is the question of how much we should study them. A lot of people look at the secondary effect and say, “Oh, it’s too hard to study. It’s too unclear. Let’s focus our attention on these other things that are easier to study.” And maybe there’s something to be said for that where if there’s really just no way of knowing what might happen, then we should at least focus on the part that we are able to understand. I’m not convinced that that’s true, maybe it is, but I think it’s worth more effort than there has been to try to understand the secondary effects, see what we can say about them. I think there are a number of things that we can say about them. The various systems are not completely unknown, they’re the systems that we live in now and we can say at least a few intelligent things about what might happen to those after a nuclear war or after other types of events.

Ariel: Okay. My final question for both of you then is, as we’re talking about all these horrible things that could destroy humanity or at the very least, just kill and horribly maim way too many people, was there anything in your research that gave you hope?

Seth: That’s a good question. I feel like one thing that gave me some hope is that, when I was working on the probability paper, it seemed that at least some of the events and historical incidents that I had been worried about might not have actually come as close to nuclear war as I previously thought they had. Also, a lot of the incidents were earlier within, say, the ’40s, ’50s, ’60s, and less within the recent decades. That gave me some hope that maybe things are moving in the right direction.

But the other is that as you lay out all the different elements of both the probability and the impacts and see it in full how it all works, that really often points to opportunities that may be out there to reduce the risk and hopefully, some of those opportunities can be taken.

Robert: Yeah, I’d agree with that. I’d say there were certainly things in the list of historical incidents that I found really frightening, but I also thought that in a large number of incidents, the system, more or less, worked the way it should have, they caught the error of whatever kind it was and fixed it quickly. It’s still alarming, I still would like there not to be incidents, and you can imagine that some of those could’ve not been fixed, but they were not all as bad as I had imagined at first. So that’s one thing.

I think the other thing is, and I think Seth you were sort of indicating this, there’s something we can do, we can think about how to reduce the risk, and we’re not the only ones doing this kind of work. I think that people are starting to take efforts to reduce the risk of really major catastrophes more seriously now, and that kind of work does give me hope.

Ariel: Excellent. I’m going to end on something that … It was just an interesting comment that I heard recently, and that was: Of all the existential risks that humanity faces, nuclear weapons actually seem the most hopeful because there’s something that we can so clearly do something about. If we just had no nuclear weapons, nuclear weapons wouldn’t be a risk, and I thought that was an interesting way to look at it.

Seth: I can actually comment on that idea. I would add that you would need not just to not have any nuclear weapons, but also not have the capability to make new nuclear weapons. There is some concern that if there aren’t any nuclear weapons, then in a crisis there may be a rush to build some in order to give that side the advantage. So in order to really eliminate the probability of nuclear war, you would need to eliminate both the weapons themselves and the capacity to create them, and you would probably also want to have some monitoring measures so that the various countries had confidence that the other sides weren’t cheating. I apologize for being a bit of a killjoy on that one.

Robert: I’m afraid you can’t totally reduce the risk of any catastrophe, but there are ways we can mitigate the risk of nuclear war and other major risks too. There’s work that can be done to reduce the risk.

Ariel: Okay, let’s end on that note. Thank you both very much!

Seth: Yeah. Thanks for having us.

Robert: Thanks, Ariel.

Ariel: If you’d like to read the papers discussed in this podcast or if you want to learn more about the threat of nuclear weapons and what you can do about it, please visit futureoflife.org and find this podcast on the homepage, where we’ll be sharing links in the introduction.

AI Alignment Podcast: Inverse Reinforcement Learning and Inferring Human Preferences with Dylan Hadfield-Menell

Inverse Reinforcement Learning and Inferring Human Preferences is the first podcast in the new AI Alignment series, hosted by Lucas Perry. This series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across a variety of areas, such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following or subscribing to us on YouTube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary map which begins to map this space.

In this podcast, Lucas spoke with Dylan Hadfield-Menell, a fifth year Ph.D student at UC Berkeley. Dylan’s research focuses on the value alignment problem in artificial intelligence. He is ultimately concerned with designing algorithms that can learn about and pursue the intended goal of their users, designers, and society in general. His recent work primarily focuses on algorithms for human-robot interaction with unknown preferences and reliability engineering for learning systems. 

Topics discussed in this episode include:

  • Inverse reinforcement learning
  • Goodhart’s Law and its relation to value alignment
  • Corrigibility and obedience in AI systems
  • IRL and the evolution of human values
  • Ethics and moral psychology in AI alignment
  • Human preference aggregation
  • The future of IRL

In this interview we discuss a few of Dylan’s papers and ideas contained in them. You can find them here: Inverse Reward Design, The Off-Switch Game, Should Robots be Obedient, and Cooperative Inverse Reinforcement Learning. You can hear about these papers above or read the transcript below.

 

Lucas: Welcome back to the Future of Life Institute Podcast. I’m Lucas Perry and  I work on AI risk and nuclear weapons risk related projects at FLI. Today, we’re kicking off a new series where we will be having conversations with technical and nontechnical researchers focused on AI safety and the value alignment problem. Broadly, we will focus on the interdisciplinary nature of the project of eventually creating value-aligned AI. Where what value-aligned exactly entails is an open question that is part of the conversation.

In general, this series covers the social, political, ethical, and technical issues and questions surrounding the creation of beneficial AI. We’ll be speaking with experts from a large variety of domains, and hope that you’ll join in the conversations. If this seems interesting to you, make sure to follow us on SoundCloud, or subscribe to us on YouTube for more similar content.

Today, we’ll be speaking with Dylan Hadfield-Menell. Dylan is a fifth-year PhD student at UC Berkeley, advised by Anca Dragan, Pieter Abbeel, and Stuart Russell. His research focuses on the value alignment problem in artificial intelligence. With that, I give you Dylan. Hey, Dylan. Thanks so much for coming on the podcast.

Dylan: Thanks for having me. It’s a pleasure to be here.

Lucas: I guess, we can start off, if you can tell me a little bit more about your work over the past years. How have your interests and projects evolved? How has that led you to where you are today?

Dylan: Well, I started off towards the end of undergrad and beginning of my PhD working in robotics and hierarchical robotics. Towards the end of my first year, my advisor came back from a sabbatical, and started talking about the value alignment problem and existential risk issues related to AI. At that point, I started thinking about questions about misaligned objectives, value alignment, and generally how we get the correct preferences and objectives into AI systems. About a year after that, I decided to make this my central research focus. Then, for the past three years, that’s been most of what I’ve been thinking about.

Lucas: Cool. That seems like you had an original path where you’re working on practical robotics. Then, you shifted more into value alignment and AI safety efforts.

Dylan: Yeah, that’s right.

Lucas: Before we go ahead and jump into your specific work, it’d be great if we could go ahead and define what inverse reinforcement learning exactly is. For me, it seems that inverse reinforcement learning, at least from the view, I guess, of technical AI safety researchers, is viewed as an empirical means of conquering descriptive ethics, whereby we’re able to give a clear descriptive account of what any given agent’s preferences and values are at any given time. Is that a fair characterization?

Dylan: That’s one way to characterize it. Another way to think about it, which is a usual perspective for me, sometimes, is to think of inverse reinforcement learning as a way of doing behavior modeling that has certain types of generalization properties.

Any time you’re learning in any machine learning context, there’s always going to be a bias that controls how you generalize to new information. Inverse reinforcement learning and preference learning, to some extent, is a bias in behavior modeling, which is to say that we should model this agent as accomplishing a goal, as satisfying a set of preferences. That leads to certain types of generalization properties in new environments. For me, inverse reinforcement learning is building this agent-based assumption into behavior modeling.
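
To make that behavior-modeling bias concrete, here is a minimal sketch of preference learning under a Boltzmann-rational choice model, one common assumption in this line of work. Everything in it (the feature names, weights, and learning rate) is illustrative rather than taken from Dylan’s papers: we watch an agent choose among options, fit reward weights that explain those choices, and can then predict its behavior on options it has never seen.

```python
import math
import random

random.seed(0)

# Each option is described by two features; the hidden "human" preference is a
# weight vector. The observed agent chooses Boltzmann-rationally:
# P(option) is proportional to exp(w . features).
TRUE_W = [2.0, -1.0]  # hidden: likes feature 0, dislikes feature 1


def boltzmann_choice(options, w):
    scores = [math.exp(sum(wi * fi for wi, fi in zip(w, f))) for f in options]
    r, acc = random.random() * sum(scores), 0.0
    for f, s in zip(options, scores):
        acc += s
        if r <= acc:
            return f


def avg_log_likelihood_grad(choice_sets, choices, w):
    """Average gradient of the softmax choice log-likelihood with respect to w."""
    grad = [0.0] * len(w)
    for options, chosen in zip(choice_sets, choices):
        scores = [math.exp(sum(wi * fi for wi, fi in zip(w, f))) for f in options]
        z = sum(scores)
        for k in range(len(w)):
            expected_fk = sum(s * f[k] for s, f in zip(scores, options)) / z
            grad[k] += (chosen[k] - expected_fk) / len(choice_sets)
    return grad


# Demonstrations: random choice sets, with choices made by the simulated human.
choice_sets = [[(random.random(), random.random()) for _ in range(4)]
               for _ in range(500)]
choices = [boltzmann_choice(opts, TRUE_W) for opts in choice_sets]

# Fit reward weights by gradient ascent on the choice likelihood.
w = [0.0, 0.0]
for _ in range(300):
    g = avg_log_likelihood_grad(choice_sets, choices, w)
    w = [wi + 0.5 * gi for wi, gi in zip(w, g)]

print("recovered weights:", [round(x, 2) for x in w])
# The recovered weights approximately match TRUE_W, and they generalize:
# they predict choices on brand-new option sets, which is the agent-based
# generalization bias described above.
```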

Lucas: Given that, I’d like to dive more into the specific work that you’ve been doing and go into some summaries of your findings and the research that you’ve been up to. Given this interest that you’ve been developing in value alignment, and human preference aggregation, and AI systems learning human preferences, what are the main approaches that you’ve been working on?

Dylan: I think the first thing that really Stuart Russell and I started thinking about was trying to understand theoretically, what is a reasonable goal to shoot for, and what does it mean to do a good job of value alignment. To us, it feels like issues with misspecified objectives, at least, in some ways, are a bug in the theory.

All of the math around artificial intelligence, for example, Markov decision processes, which is the central mathematical model we use for decision making over time, starts with an exogenously defined objective or reward function. We think that, mathematically, that was a fine thing to do in order to make progress, but it’s an assumption that has really put blinders on the field about the importance of getting the right objective down.

I think the first thing that we sought to do was to understand, what is a system or a setup for AI that does the right thing, in theory at least? What’s something that, if we were able to implement it, we think could actually work in the real world with people? It was that kind of thinking that led us to propose cooperative inverse reinforcement learning, which was our attempt to formalize the interaction whereby you communicate an objective to the system.

The main thing that we focused on was including within the theory a representation of the fact that the true objective’s unknown and unobserved, and that it needs to be arrived at through observations from a person. Then, we’ve been trying to investigate the theoretical implications of this modeling shift.

In the initial paper that we did, which is titled Cooperative Inverse Reinforcement Learning, what we looked at is how this formulation is actually different from a standard environment model in AI. In particular, the way that it’s different is that there’s strategic interaction on the behalf of the person. The way that you observe what you’re supposed to be doing is mediated by a person who may be trying to actually teach or trying to communicate appropriately. What we showed is that modeling this communicative component can actually be hugely important and lead to much faster learning behavior.

In our subsequent work, what we’ve looked at is taking this formal model in theory and trying to apply it to different situations. There are two really important pieces of work that I like here. One was to take that theory and use it to explicitly analyze a simple model of an existential risk setting. This was a paper titled The Off-Switch Game that we published at IJCAI last summer. It worked through a formal model of a corrigibility problem within a CIRL (cooperative inverse reinforcement learning) framework, and it shows the utility of constructing this type of game in the sense that we get some interesting predictions and results.

The first one we get is that there are some nice, simple necessary conditions for the system to want to let the person turn it off. The robot, the AI system, needs to have uncertainty about its true objective, which is to say that it needs to have within its belief the possibility that it might be wrong. Then, all it needs to do is believe that the person it’s interacting with is a perfectly rational individual. If that’s true, you get a guarantee that this robot always lets the person switch it off.
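
As a toy illustration of that condition (a two-point belief and made-up numbers, not the paper’s general argument): suppose the robot believes its proposed action is worth +1 to the human with probability p and -1 otherwise, and that the human, if consulted, rationally approves the action only when it is actually good.

```python
def value_act(p):
    """Robot acts on its plan without consulting the human."""
    return p * 1 + (1 - p) * (-1)

def value_switch_off():
    """Robot shuts itself down."""
    return 0.0

def value_defer(p):
    """Robot proposes the plan; a rational human approves it only when it is good."""
    return p * 1 + (1 - p) * 0

for p in (0.1, 0.5, 0.9):
    print(p, value_act(p), value_switch_off(), value_defer(p))
# For every belief p, deferring is at least as good as acting or shutting down,
# so uncertainty about the objective plus a rational overseer gives the robot
# an incentive to leave the off switch in human hands.
```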

Now, that’s good because, in my mind, it’s an example of a place where, at least, in theory, it solves the problem. This gives us a way that theoretically, we could build corrigible systems. Now, it’s still making a very, very strong assumption, which is that it’s okay to model the human as being optimal or rational. I think if you look at real people, that’s just not a fair assumption to make for a whole host of reasons.

The next thing we did in that paper is we looked at this model, and what we realized is that adding in a small amount of irrationality breaks this requirement. It means that some things might actually go wrong. The final thing we did in the paper was to look at the consequences of either overestimating or underestimating human rationality. The argument that we made is that there’s a trade off: assuming that the person is more rational lets you get more information from their behavior, and thus learn more and in principle help them more, but if you assume that they’re too rational, then this can actually lead to quite bad behavior.

There’s a sweet spot that you want to aim for, which is to maybe try to underestimate how rational people are, but you, obviously, don’t want to get it totally wrong. We followed up on that idea in a paper with Smitha Milli as the first author that was titled Should Robots be Obedient? That tried to get a little more at this trade off between maintaining control over a system and the amount of value that it can generate for you.

We looked at the implication that as robot systems interact with people over time, you expect them to learn more about what people want. If you get very confident about what someone wants, and you think they might be irrational, the math in the Off-Switch paper predicts that you should try to take control away from them. This means that if your system is learning over time, you expect that even if it is initially open to human control and oversight, it may lose that incentive over time. In fact, you can predict that it should lose that incentive over time.
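
A toy calculation in the same spirit, again with illustrative numbers rather than the paper’s model: once the human is assumed to make mistakes at some rate, a robot that becomes confident its plan is good starts to prefer acting without asking.

```python
def value_act(p):
    return 2 * p - 1  # act without asking

def value_defer_noisy(p, eps):
    """The human means to approve good plans and block bad ones,
    but makes the wrong call with probability eps."""
    when_good = p * ((1 - eps) * 1 + eps * 0)
    when_bad = (1 - p) * ((1 - eps) * 0 + eps * (-1))
    return when_good + when_bad

eps = 0.2
for p in (0.6, 0.95):
    print(p, round(value_act(p), 2), round(value_defer_noisy(p, eps), 2))
# p = 0.60: deferring (0.40) still beats acting (0.20), so keep the human in the loop.
# p = 0.95: acting (0.90) beats deferring (0.75), so a robot that has become
# confident about its objective loses the incentive to stay corrigible.
```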

In Should Robots be Obedient, we modeled that property and looked at some consequences of it. We do find a basic confirmation of this hypothesis, which is that systems that maintain human control and oversight can achieve less value in theory. We also looked at what happens when you have the wrong model. If the AI system has a prior that the human cares about a small number of things in the world, let’s say, then it statistically gets overconfident in its estimates of what people care about, and disobeys the person more often than it should.

Arguably, when we say we want to be able to turn the system off, it’s less a statement about what we want to do in theory or the property of the optimal robot behavior we want, and more of a reflection of the idea that we believe that under almost any realistic situation, we’re probably not going to be able to fully explain all of the relevant variables that we care about.

If you’re giving your robot an objective defined over a subset of things you care about, you should actually be very focused on having it listen to you, more so than just optimizing for its estimates of value. I think that provides, actually, a pretty strong theoretical argument for why corrigibility is a desirable property in systems, even though, at least at face value, it should decrease the amount of utility those systems can generate for people.

The final piece of work that I think I would talk about here is our NIPS paper from December, which is titled Inverse Reward Design. That was taking cooperative inverse reinforcement learning and pushing it in the other direction. Instead of using it to theoretically analyze very, very powerful systems, we can also use it to try to build tools that are more robust to mistakes that designers may make, and start to build initial notions of value alignment and value alignment strategies into the current mechanisms we use to program AI systems.

What that work looked at was understanding the uncertainty that’s inherent in an objective specification. In the initial cooperative inverse reinforcement learning paper and the Off-Switch Game, what we said is that AI systems should be uncertain about their objective, and they should be designed in a way that is sensitive to that uncertainty.

This paper was about trying to understand, what is a useful way to be uncertain about the objective? The main idea behind it was that we should be thinking about the environments the system designer had in mind. We use an example of a 2D robot navigating in the world, and the system designer is thinking about this robot navigating where there are three types of terrain. There’s dirt, there’s grass, and there’s gold. You can give your robot an objective, a utility function defined over being in those different types of terrain, that incentivizes it to go and get the gold, and to stay on the dirt where possible, but to take shortcuts across the grass when it’s high value.

Now, when that robot goes out into the world, there are going to be new types of terrain that the designer didn’t anticipate. What we did in this paper was to build an uncertainty model that allows the robot to determine when it should be uncertain about the quality of its reward function. How can we determine when the objective that a system designer builds into an AI is ill-adapted to the current situation? You can think of this as a way of trying to build in some mitigation to Goodhart’s law.
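
Here is a highly simplified sketch of that idea; the terrain names, weights, and the worst-case planning rule are illustrative stand-ins for the Bayesian treatment in the actual paper. The proxy reward is trusted only for features that appeared in the designer’s training environment, the robot stays uncertain about everything else, and it plans against the worst case of that uncertainty.

```python
import itertools

FEATURES = ["dirt", "grass", "gold", "lava"]  # "lava" never appeared in training
PROXY_W = {"dirt": -0.1, "grass": -0.3, "gold": 10.0, "lava": 0.0}
SEEN_IN_TRAINING = {"dirt", "grass", "gold"}

def candidate_true_rewards(unseen_values=(-10.0, 0.0, 1.0)):
    """True-reward hypotheses that agree with the proxy on seen features."""
    unseen = [f for f in FEATURES if f not in SEEN_IN_TRAINING]
    for values in itertools.product(unseen_values, repeat=len(unseen)):
        w = dict(PROXY_W)
        w.update(dict(zip(unseen, values)))
        yield w

def worst_case_value(path):
    """Value of a path (list of terrain cells) under the worst plausible reward."""
    return min(sum(w[cell] for cell in path) for w in candidate_true_rewards())

shortcut = ["lava", "lava", "gold"]                 # crosses unfamiliar terrain
detour = ["dirt", "dirt", "dirt", "grass", "gold"]  # stays on familiar terrain
for name, path in [("shortcut", shortcut), ("detour", detour)]:
    print(name, worst_case_value(path))
# Under the proxy alone the shortcut scores higher, but because "lava" was never
# part of the training environment its reward is treated as unknown, and the
# risk-averse plan takes the detour instead.
```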

Lucas: Would you like to take a second to unpack what Goodhart’s law is?

Dylan: Sure. Goodhart’s law is an old idea in social science that actually goes back to before Goodhart. I would say that in economics, there’s a general idea of the principal agent problem, which dates back to the 1970s, as I understand it, and basically looks at the problem of specifying incentives for humans. How should you create contracts? How do you create incentives, so that another person, say, an employee, helps earn you value?

Goodhart’s law is a very nice way of summarizing a lot of those results, which is to say that once a metric becomes an objective, it ceases to be a good metric. You can have properties of the world which correlate well with what you want, but optimizing for them actually leads to something quite different than what you’re looking for.

Lucas: Right. Like if you are optimizing for test scores, then you’re not actually going to end up optimizing for intelligence, which is what you wanted in the first place?

Dylan: Exactly. Even though test scores, when you weren’t optimizing for them, were actually a perfectly good measure of intelligence. I mean, not perfectly good, but an informative measure of intelligence. Goodhart’s law, arguably, is a pretty bleak perspective. If you take it seriously, and you think that we’re going to build very powerful systems that are going to be programmed directly through an objective in this manner, Goodhart’s law should be pretty problematic, because any objective that you can imagine programming directly into your system is going to be something correlated with what you really want rather than what you really want. You should expect that that will likely be the case.
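
A quick simulation of that effect, with entirely made-up numbers: the metric is the true quality plus noise, and the harder you optimize the metric, by picking the best scorer out of a bigger pool, the more the winner’s metric overstates its true quality.

```python
import random
import statistics

random.seed(1)

def selected_gap(pool_size, trials=500):
    """Average (metric - true quality) for the candidate with the best metric."""
    gaps = []
    for _ in range(trials):
        candidates = []
        for _ in range(pool_size):
            true_quality = random.gauss(0, 1)
            metric = true_quality + random.gauss(0, 1)  # a correlated proxy
            candidates.append((true_quality, metric))
        quality, metric = max(candidates, key=lambda c: c[1])
        gaps.append(metric - quality)
    return statistics.mean(gaps)

for n in (1, 10, 100, 1000):
    print(n, round(selected_gap(n), 2))
# With no selection pressure (n = 1) the metric is an unbiased estimate of
# quality, but the harder you optimize, the more the winner's metric overstates
# its true quality: the correlation breaks down exactly where the most
# optimization pressure is applied, which is Goodhart's law in miniature.
```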

Lucas: Right. Is it just simply too hard or too unlikely that we’re able to sufficiently specify exactly what we want, so that we’ll just end up using some other metrics, and if you optimize too hard for them, it ends up messing with a bunch of other things that we care about?

Dylan: Yeah. I mean, I think there’s some real questions about, what is it we even mean… Well, what are we even trying to accomplish? What should we try to program into systems? Philosophers have been trying to figure out those types of questions for ages. For me, as someone who takes a more empirical slant on these things, I think about the fact that the objectives that we see within our individual lives are so heavily shaped by our environments. Which types of signals we respond to and adapt to has heavily adapted itself to the types of environments we find ourselves in.

We just have so many examples of objectives not being the correct thing. I mean, effectively, all you could have is correlations. The fact that wireheading is possible is maybe some of the strongest evidence for Goodhart’s law being really a fundamental property of learning systems and optimizing systems in the real world.

Lucas: There are certain agential characteristics and properties, which we would like to have in our AI systems, like them being-

Dylan: Agential?

Lucas: Yeah. Corrigibility is a characteristic, which you’re doing research on and trying to understand better. Same with obedience. It seems like there’s a trade off here where if a system is too corrigible or it’s too obedient, then you lose its ability to really maximize different objective functions, correct?

Dylan: Yes, exactly. I think identifying that trade off is one of the things I’m most proud of about some of the work we’ve done so far.

Lucas: Given AI safety and really big risks that can come about from AI, in the short, to medium, and long term, before we really have AI safety figured out, is it really possible for systems to be too obedient, or too corrigible, or too docile? How do we navigate this space and find sweet spots?

Dylan: I think it’s definitely possible for systems to be too corrigible or too obedient. It’s just that the failure mode for that doesn’t seem that bad. If you think about this-

Lucas: Right.

Dylan: … it’s like Clippy. Clippy was asking for human-

Lucas: Would you like to unpack what Clippy is first?

Dylan: Sure, yeah. Clippy is an example of an assistant that Microsoft created in the ’90s. It was this little paperclip that would show up in Microsoft Word. It liked to suggest, a lot, that you were trying to write a letter, and to ask about different ways in which it could help.

Now, on one hand, that system was very corrigible and obedient in the sense that it would ask you whether or not you wanted its help all the time. If you said no, it would always go away. It was super annoying because it would always ask you if you wanted help. The false positive rate was just far too high to the point where the system became really a joke in computer science and AI circles of what you don’t want to be doing. I think, systems can be too obedient or too sensitive to human intervention and oversight in the sense that too much of that just reduces the value of the system.

Lucas: Right, for sure. On one hand, when we’re talking about existential risks or even a paperclip maximizer, then it would seem, like you said, like the failure mode of just being too annoying and checking in with us too much seems like not such a bad thing given existential risk territory.

Dylan: I think if you’re thinking about it in those terms, yes. I think if you’re thinking about it from the standpoint of, “I want to sell a paperclip maximizer to someone else,” then it becomes a little less clear, I think, especially, when the risks of paperclip maximizers are much harder to measure. I’m not saying that it’s the right decision from a global altruistic standpoint to be making that trade off, but I think it’s also true that just if we think about the requirements of market dynamics, it is true that AI systems can be too corrigible for the market. That is a huge failure mode that AI systems run into, and it’s one we should expect the producers of AI systems to be responsive to.

Lucas: Right. Given all these different … Is there anything else you wanted to touch on there?

Dylan: Well, I had another example of systems that are too corrigible-

Lucas: Sure.

Dylan: … which is, do you remember Microsoft’s Tay?

Lucas: No, I do not.

Dylan: This is a chatbot that Microsoft released. They trained it based off of tweets. It was a tweet bot. They trained it based on things that were tweeted at it. I forget if it was a nearest-neighbors lookup or if it was just doing a neural method, and overfitting, and memorizing parts of the training set. At some point, 4chan realized that the AI system, that Tay, was very suggestible. They basically created an army to radicalize Tay. They succeeded.

Lucas: Yeah, I remember this.

Dylan: I think you could also think of that as being the other axis of too corrigible or too responsive to human input. The first axis I was talking about is the failure of being too corrigible from an economic standpoint, but there’s also the failure of being too corrigible in a multi-agent mechanism design setting, where, I believe, those types of properties in a system also open it up to more misuse.

If we think of AI, cooperative inverse reinforcement learning and the models we’ve been talking about so far exist in what I would call the one-robot-one-human model of the world. Generally, you could think of extensions of this with N humans and M robots. The variants of what you would have there, I think, lead to different theoretical implications.

If we think of just two humans, N=2, and one robot, M=1, and suppose that one of the humans is the system designer and the other one is the user, there is this trade off between how much control the system designer has over the future behavior of the system and how responsive and corrigible it is to the user in particular. Trading off between those two, I think, is a really interesting ethical question that comes up when you start to think about misuse.

Lucas: Going forward, as we’re developing these systems and trying to make them more fully realized in a world where the number of people equals something like seven or eight billion, how do we navigate this space and hit a sweet spot where a system is corrigible in the right ways, to the right degree, and to the right people, and it is obedient to the right people, and it’s not suggestible by the wrong people? Or does that just enter a territory of so many political, social, and ethical questions that it will take years of thinking to work out?

Dylan: Yeah, I think it’s closer to the second one. I’m sure that I don’t know the answers here. From my standpoint, I’m still trying to get a good grasp on what is possible in the one-robot-one-person case. I think that when you have … Yeah, when you … Oh man. I guess, it’s so hard to think about that problem because it’s just very unclear what’s even correct or right. Ethically, you want to be careful about imposing your beliefs and ideas too strongly on to a problem because you are shaping that.

At the same time, these are real challenges that are going to exist. We already see them in real life. If we look at the YouTube recommender stuff that was just happening, arguably, that’s a misspecified objective. To give a little bit of background here, this is largely based off of a recent New York Times opinion piece, which was looking at the recommendation engine for YouTube and pointing out that it has a bias towards recommending radical content, either fake news or Islamist videos.

If you dig into why that was occurring, a lot of it is because… what are they doing? They’re optimizing for engagement, and the process of online radicalization looks super engaging. Now, we can think about where that comes up. Well, that issue gets introduced in a whole bunch of places. A big piece of it is that there is this adversarial dynamic to the world. There are users generating content in order to be outraging and enraging, because they discovered that that gets more feedback and more responses. You need to design a system that’s robust to that strategic property of the world. At the same time, you can understand why YouTube was very, very hesitant to take actions that would look like censorship.

Lucas: Right. I guess, just coming back to this idea of the world having lots of adversarial agents in it, human beings are like general intelligences who have reached some level of corrigibility and obedience that works kind of well in the world amongst a bunch of other human beings. That was developed through evolution. Are there potentially techniques for developing the right sorts of corrigibility and obedience in machine learning and AI systems through stages of evolution and running environments like that?

Dylan: I think that’s a possibility. I would say, one … I have a couple of thoughts related to that. The first one is I would actually challenge a little bit of your point about modeling people as general intelligences, mainly in the sense that when we talk about artificial general intelligence, we have something in mind. It’s often a shorthand in these discussions for a perfectly rational, Bayesian optimal actor.

Lucas: Right. What does that mean? Just unpack that a little bit.

Dylan: What that means is a system that is taking advantage of all of the information that is currently available to it in order to pick actions that optimize expected utility. When we say perfectly, we mean a system that is doing that as well as possible. It’s that modeling assumption that I think sits at the heart of a lot of concerns about existential risk. I definitely think that’s a good model to consider, but there’s also the concern that it might be misleading in some ways, and that it might not actually be a good model of people and how they act in general.

One way to look at it would be to say that there’s something about the incentive structure around humans and in our societies that is developed and adapted that creates the incentives for us to be corrigible. Thus, a good research goal of AI is to figure out what those incentives are and to replicate them in AI systems.

Another way to look at it is that people are intelligent, not necessarily in the ways that economics models us as intelligent: there are properties of our behavior, which are desirable properties, that don’t directly derive from expected utility maximization, or if they do, they derive from a very, very diffuse form of expected utility maximization. This is the perspective that says that people on their own are not necessarily what human evolution is optimizing for, but people are a tool along that way.

We could make arguments for that based off of … I think it’s an interesting perspective to take. What I would say is that in order for societies to work, we have to cooperate. That cooperation was a crucial evolutionary bottleneck, if you will. One of the really, really important things that it did was it forced us to develop the parent-child strategy relationship equilibrium that we currently live in. That’s a process whereby we communicate our values, whereby we train people to think that certain things are okay or not, and where we inculcate certain behaviors in the next generation. I think it’s that process more than anything else that we really, really want in an AI system and in powerful AI systems.

Now, the thing is … I guess, I’ll have to continue on that a little more. It’s really, really important that that’s there, because if you don’t have those cognitive abilities to understand causing pain and to just fundamentally decide that that’s a bad idea, to have a desire to cooperate, to buy into the different coordination and normative mechanisms that human society uses. If you don’t have that, then … well, then society just doesn’t function. A hunter-gatherer tribe of self-interested sociopaths probably doesn’t last for very long.

What this means is that our ability to coordinate our intelligence and cooperate with it was co-evolved and co-adapted alongside our intelligence. I think that that evolutionary pressure and bottleneck was really important to getting us to the type of intelligence that we are now. It’s not a pressure that AI is necessarily subjected to. I think, maybe that is one way to phrase the concern, I’d say.

When I look to evolutionary systems and where the incentives for corrigibility, and cooperation, and interaction come from, it’s largely about the processes whereby people are less like general intelligences in some ways. Evolution allowed us to become smart in some ways and restricted us in others based on the imperatives of group coordination and interaction. I think that a lot of our intelligence and practice is about reasoning about group interaction and what groups think is okay and not. That’s a part of the developmental process that we need to replicate in AI just as much as spatial reasoning or vision.

Lucas: Cool. I guess, I just want to touch base on this before we move on. Are there certain assumptions about the kinds of agents that humans are, ideas, I guess, about us being utility maximizers in some sense, that you commonly see people have, but that are misconceptions about people and about how people operate differently from AI?

Dylan: Well, I think that that’s the whole field of behavioral economics in a lot of ways. I could point to examples of people being irrational. Then there are all of the examples of people being more than just self-interested. There are ways in which we seem to be risk-seeking that seem like they would be irrational from an individual perspective, but you could argue they may be rational from a group evolutionary perspective.

I mean, things like overeating. That’s not exactly the same type of rationality, but it is an example of us becoming ill-adapted to our environments and showing the extent to which we’re not capable of changing, or in which it may be hard to. Yeah, I think, in some ways, one story that I tell about AI risk is that back at the start of the AI field, we were looking around and saying, “We want to create something intelligent.” Intuitively, we all know what that means, but we need a formal characterization of it. The formal characterization that we turned to was, basically, the theories of rationality developed in economics.

Although those theories turned out to be, except in some settings, not great descriptors of human behavior, they were quite useful as a guide for building systems that accomplish goals. I think that part of what we need to do as a field is reassess where we’re going and think about whether or not building something like that perfectly rational actor is actually a desirable end goal. I mean, there’s a sense in which it is. I would like an all-powerful, perfectly aligned genie to help me do what I want in life.

You might think that if the odds of getting that wrong are too high, then maybe you would do better shooting for something that doesn’t quite achieve that ultimate goal, but that you can get to with pretty high reliability. This may be a setting where “shoot for the moon, and if you miss you’ll land among the stars” is just a horribly misleading perspective.

Lucas: Shoot for the moon, and you might get a hellscape universe, but if you shoot for the clouds, it might end up pretty okay.

Dylan: Yeah. We could iterate on the sound bite, but I think something like that may not be … That’s where I stand on my thinking here.

Lucas: We’ve talked about a few different approaches that you’ve been working on over the past few years. What do you view as the main limitations of such approaches currently? Mostly, you’re only thinking about one-machine, one-human systems or environments. What are the biggest obstacles that you’re facing right now in inferring and learning human preferences?

Dylan: Well, I think the first thing is it’s just an incredibly difficult inference problem. It’s a really difficult inference problem to imagine running at scale with explicit inference mechanisms. One thing to do is you can design a system that explicitly tracks a belief about someone’s preferences, and then acts and responds to that. Those are systems that you could try to prove theorems about. They’re very hard to build. They can be difficult to get to work correctly.

In contrast, you can create systems that have incentives to construct beliefs in order to accomplish their goals. It’s easier to imagine building those systems and having them work at scale, but it’s much, much harder to understand how you would be confident in those systems being well aligned.

I think that one of the biggest concerns I have, I mean, we’re still very far from many of these approaches being very practical to be honest. I think this theory is still pretty unfounded. There’s still a lot of work to go to understand, what is the target we’re even shooting for? What does an aligned system even mean? My colleagues and I have spent an incredible amount of time trying to just understand, what does it mean to be value-aligned if you are a suboptimal system.

There’s one example that I think about, which is, say, you’re cooperating with an AI system playing chess. You start working with that AI system, and you discover that if you listen to its suggestions, 90% of the time, it’s actually suggesting the wrong move or a bad move. Would you call that system value-aligned?

Lucas: No, I would not.

Dylan: I think most people wouldn’t. Now, what if I told you that that program was actually implemented as a search that’s using the correct goal test? It actually turns out that if it’s within 10 steps of a winning play, it always finds that for you, but because of computational limitations, it usually doesn’t. Now, is the system value-aligned? I think it’s a little harder to tell here. What I do find is that when I tell people the story, and I start off with the search algorithm with the correct goal test, they almost always say that that is value-aligned but stupid.
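
To make the thought experiment concrete, here is a generic sketch of that kind of suggester; it is not code from any of the papers discussed, and legal_moves, apply_move, and is_win are hypothetical stand-ins for a real game implementation. The goal test is correct, but the search is limited to a fixed number of steps, and it falls back on an arbitrary legal move otherwise.

```python
from typing import Callable, Iterable, Optional, TypeVar

State = TypeVar("State")
Move = TypeVar("Move")

def suggest_move(
    state: State,
    legal_moves: Callable[[State], Iterable[Move]],  # hypothetical game API
    apply_move: Callable[[State, Move], State],
    is_win: Callable[[State], bool],                 # the *correct* goal test
    max_depth: int = 10,
) -> Optional[Move]:
    """Suggest the first move of a line that reaches a win within max_depth
    steps; otherwise fall back on an arbitrary legal move. (For simplicity this
    ignores opponent replies; a real chess helper would need adversarial search.)"""

    def wins_within(s: State, depth: int) -> bool:
        if is_win(s):
            return True
        if depth == 0:
            return False
        return any(wins_within(apply_move(s, m), depth - 1) for m in legal_moves(s))

    for move in legal_moves(state):
        if wins_within(apply_move(state, move), max_depth - 1):
            return move  # provably useful whenever this branch triggers
    fallback = list(legal_moves(state))
    return fallback[0] if fallback else None  # often a bad suggestion otherwise
```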

There’s an interesting thing going on here, which is that we’re not totally sure what the target we’re shooting for is. You can take this thought experiment and push it further. Suppose you’re doing that search, but now it’s a heuristic search that uses the correct goal test but has an adversarially chosen heuristic function. Would that be a value-aligned system? Again, I’m not sure. If the heuristic was adversarially chosen, I’d say probably not. If the heuristic just happened to be bad, then I’m not sure.

Lucas: Could you potentially unpack what it means for something to be adversarially chosen?

Dylan: Sure. Adversarially chosen in this case just means that there is some intelligent agent selecting the heuristic function or that evaluation measurement in a way that’s designed to maximally screw you up. Adversarial analysis is a really common technique used in cryptography where we try to think of adversaries selecting inputs for computer systems that will cause them to malfunction. In this case, what this looks like is an adversarial algorithm that looks, at least, on the surface like it is trying to help you accomplish your objectives but is actually trying to fool you.

I’d say that, more generally, what this thought experiment helps me with is understanding that value alignment is actually a quite tricky and subjective concept. It’s actually quite hard to nail down in practice what it would mean.

Lucas: What sort of effort do you think needs to happen and from who in order to specify what it really means for a system to be value-aligned and to not just have a soft squishy idea of what that means but to have it really formally mapped out, so it can be implemented in machine systems?

Dylan: I think we need more people working on technical AI safety research. I think, to some extent, it may always be something that’s a little ill-defined and squishy. Generally, I think it goes to the point of needing good people in AI willing to do this squishier, less concrete work that really gets at it. I think value alignment is going to be something that’s a little bit more like, I know it when I see it. As a field, we need to be moving towards a goal of AI systems where alignment is the end goal, whatever that means.

I’d like to move away from artificial intelligence where we think of intelligence as an ability to solve puzzles to artificial aligning agents where the goal is to build systems that are actually accomplishing goals on your behalf. I think the types of behaviors and strategies that arise from taking that perspective are qualitatively quite different from the strategies of pure puzzle solving on a well specified objective.

Lucas: All this work we’ve been discussing is largely at a theoretic and meta level. At this point, is this the main research that we should be doing, or is there any space for research into what specifically might be implementable today?

Dylan: I don’t think that’s the only work that needs to be done. For me, it’s a really important type of work that I’d like to see more of. I think a lot of important work is about understanding how to build these systems in practice and thinking hard about designing AI systems with meaningful human oversight.

I’m a big believer in the idea that in AI safety, the distinction between short-term and long-term issues is not really that large, and that there are synergies between the research problems that go in both directions. I believe that, on the one hand, looking at short-term safety issues, which includes things like Uber’s car just killed someone, the YouTube recommendation engine, and issues like fake news and information filtering, all of those things are related to, and give us our best window into, the types of concerns and issues that may come up with advanced AI.

At the same time, and this is a point where I think people concerned about x-risk do themselves a disservice by not focusing here, doing theory about advanced AI systems, and in particular about systems where it’s not possible to, what I would call, unilaterally intervene, systems that aren’t corrigible by default, actually gives us a lot of ideas about how to build systems now that are merely hard to intervene with or oversee.

If you’re thinking about issues of monitoring and oversight, and how you actually get a system that can appropriately evaluate when it should go to a person because its objectives are not properly specified or may not be relevant to the situation, I think YouTube would be in a much better place today if they had a robust system for doing that for their recommendation engine. In a lot of ways, the concerns about x-risk represent an extreme set of assumptions for getting AI right now.

Lucas: I think I’m also just trying to get a better sense of what the system looks like and how it would be functioning on a day-to-day basis. What is the data that it’s taking in in order to capture, learn, and infer specific human preferences and values? I’m just trying to understand better whether or not it can model whole moral views and ethical systems of other agents, or if it’s just capturing little specific bits and pieces.

Dylan: I think my ideal would be to, as a system designer, build in as little as possible about my moral beliefs. I think that, ideally, the process would look something … Well, one process that I could see and imagine doing right would be to just directly go after trying to replicate something about the moral imprinting process that people have with their children. You’d have someone who’s like a guardian or is responsible for an AI system’s decisions, and we build systems to try to align with one individual, and then try to adopt, and extend, and push forward the beliefs and preferences of that individual. I think that’s one concrete version that I could see.

I think a lot of the place where I see things maybe a little bit differently than some people is that I think that the main ethical questions we’re going to be stuck with, and the ones that we really need to get right, are the mundane ones. The things that most people agree on, the things that are just, obviously, not okay. Mundane ethics and morals rather than the more esoteric or fancier population ethics questions that can arise. I feel a lot more confident about the ability to build good AI systems if we get that part right. I feel like we’ve got a better shot at getting that part right because there’s a clearer target to shoot for.

Now, what kinds of data would you be looking at? In that case, it would be data from interaction with a couple of select individuals. Ideally, you’d want as much data as you can. What I think you really want to be careful of here is how much assumptions do you make about the procedure that’s generating your data.

What I mean by that is whenever you learn from data, you have to make some assumption about how that data relates to the right thing to do, where right is with like a capital R in this case. The more assumptions you make there, the more your systems would be able to learn about values and preferences, and the quicker it would be able to learn about values and preferences. But, the more assumptions and structure you make there, the more likely you are to get something wrong that your system won’t be able to recover from.

Again, we see this trade off come up: a tension between the amount of uncertainty that you need in the system in order to be able to adapt to the right person and figure out the correct preferences and morals, and the efficiency with which you can figure that out.

I guess, I mean, in saying this, it feels a little bit like I’m rambling and unsure about what the answer looks like. I hope that that comes across, because I’m really not sure. The rough structure, data generated from people, interpreted in a way that involves the fewest prior conceptions about what people want and what preferences people have that we can get away with, is what I would shoot for. Beyond that, I don’t really know what it would look like in practice.

Lucas: Right. It seems here that it’s encroaching on a bunch of very difficult social, political, and ethical issues involving persons and data, like which data will be selected for preference aggregation and how many people are included in developing the reward function and utility function of the AI system. Also, I guess, we have to be considering culturally sensitive systems, where systems operating in different cultures and contexts are going to need to be trained on different sets of data. There will also be questions of ethics about whether or not we’ll even want systems to be training off of certain cultures’ data.

Dylan: Yeah. I would actually say that a good value … I wouldn’t necessarily even think of it as training off of different data. One of the core questions in artificial intelligence is identifying the relevant community that you are in and building a normative understanding of that community. I want to push back a little bit and move you away from the perspective of we collect data about a culture, and we figure out the values of that culture. Then, we build our system to be value-aligned with that culture.

The more I think about it, the actual AI product is the process whereby we determine, elicit, and respond to the normative values of the multiple overlapping communities that you find yourself in. That process is ongoing. It’s holistic, it’s overlapping, and it’s messy. To the extent that I think it’s possible, I’d like to not have a couple of people sitting around in a room deciding what the right values are. Much more, I think, a system should be holistically designed with value alignment at multiple scales as a core property of AI.

I think that that’s actually a fundamental property of human intelligence. You behave differently based on the different people around, and you’re very, very sensitive to that. There are certain things that are okay at work that are not okay at home, things that are okay on vacation, things that are okay around kids and things that are not. Figuring out what those things are and adapting yourself to them is the fundamental intelligence skill needed to interact in modern life. Otherwise, you just get shunned.

Lucas: It seems to me in the context of a really holistic, messy, ongoing value alignment procedure, we’ll be aligning AI systems ethics, and morals, and moral systems, and behavior with that of a variety of cultures, and persons, and just interactions in the 21st Century. When we reflect upon the humans of the past, we can see in various ways that they are just moral monsters. We have issues with slavery, and today we have issues with factory farming, and voting rights, and tons of other things in history.

How should we view and think about aligning powerful systems, ethics, and goals with the current human morality, and preferences, and the risk of amplifying current things which are immoral in present day life?

Dylan: This is the idea of mistakenly locking in the wrong values, in some sense. I think it is something we should be concerned about, less from the standpoint of entire … Well, no, I think yes, from the standpoint of entire cultures getting things wrong. Again, if we don’t think of there being a monolithic society that has a single value set, these problems are fundamental issues: what your local community thinks is okay versus what other local communities think is okay.

A lot of our society and a lot of our political structures are about how to handle those clashes between value systems. My ideal for AI systems is that they should become a part of that normative process, and maybe not participate in it as people do, but, also, I think, if we think of value alignment as a consistent, ongoing, messy process, there is … I think maybe that perspective lends itself less towards locking in values and sticking with them. That’s one way you can look at the problem, which is that we determine what’s right and what’s wrong, and we program our system to do that.

Then, there’s another one, which is that we program our system to be sensitive to what people think is right or wrong. That’s more the direction that I think of value alignment in. Then, the final part of what you’re getting at here is that the system actually will feed back into people. What AI systems show us will shape what we think is okay and vice versa. That’s something that I am, quite frankly, not sure how to handle. I don’t know how you’re going to influence what someone wants, and what they will perceive that they want, and how to do that, I guess, correctly.

All I can say is that we do have a human notion of what is acceptable manipulation. We do have a human notion of allowing someone to figure out for themselves what they think is right and not and refraining from biasing them too far. To some extent, if you’re able to value align with communities in a good ongoing holistic manner, that should also give you some ways to choose and understand what types of manipulations you may be doing that are okay or not.

I’d also say that I think this perspective has a very mundane analogy when you think of the feedback cycle between recommendation engines and regular people. Those systems don’t model the effect … Well, they don’t explicitly model the fact that they’re changing the structure of what people want and what they’ll want in the future. That’s probably not the best analogy in the world.

I guess what I’m saying is that it’s hard to plan for how you’re going to influence someone’s desires in the future. It’s not clear to me what’s right or what’s wrong. What’s true is that we, as humans, have a lot of norms about what types of manipulation are okay or not. You might hope that appropriately doing value alignment in that way might help get to an answer here.

Lucas: I’m just trying to get a better sense here. When I’m thinking about the roles that ethics and intelligence play here, I view intelligence as a means of modeling the world and achieving goals, and ethics as the end towards which intelligence is aimed. Now, I’m curious, in terms of behavior modeling, where inverse reinforcement learning agents are modeling, I guess, the behavior of human agents and also predicting the sorts of behaviors that they’d take in the future or in the situation in which the inverse reinforcement learning agent finds itself.

I’m curious to know where metaethics and moral epistemology fit in, where inverse reinforcement learning agents find themselves in novel ethical situations, and what their ability to handle those novel ethical situations is like. When they’re handling those situations, how much does it look like them performing some normative and metaethical calculus based on the kind of moral epistemology that they have, and how much does it look like they’re using some other behavioral predictive system where they’re, like, modeling humans?

Dylan: The answer to that question is not clear. What does it actually mean to make decisions based on an ethical framework or metaethical framework? I guess, we could start there. You and I know what that means, but our definition is encumbered by the fact that it’s pretty human-centric. I think we talk about it in terms of, “Well, I weighed this option. I looked at that possibility.” We don’t even really mean the literal sense of weighed, as in actually counted up, and constructed actual numbers, and multiplied them together in our heads.

What these are is they’re actually references to complex thought patterns that we’re going through. They’re defined by whether or not those thought patterns are going on. For the AI system, you can also talk about the difference between the process of making a decision and the substance of it. When an inverse reinforcement learning agent is going out into the world, the policy it’s following is constructed to try to optimize a set of inferred preferences, but does that mean that the policy you’re outputting is making metaethical characterizations?

Well, at the moment, almost certainly not, because the systems we build are just not capable of that type of cognitive reasoning. I think the bigger question is, do you care? To some extent, you probably do.

Lucas: I mean, I’d care if I had some very deep disagreements with the metaethics that led to the preferences that were learned and loaded into the machine. Also, if the machine were in such a new, novel ethical situation, unlike anything human beings had faced, that it just required some metaethical reasoning to deal with.

Dylan: Yes. I mean, I think you definitely want it to take decisions that you would agree with or, at least, that you could be non-maliciously convinced to agree with. Practically, there isn’t a place in the theory where that shows up. It’s not clear that what you’re saying is that different from value alignment in particular. If I were to try to refine the point about metaethics, what it sounds to me like you’re getting at is an inductive bias that you’re looking for in the AI systems.

Arguably, ethics is about an argument over what inductive bias we should have as humans. I don’t think that that’s a first-order property in value alignment systems necessarily, or in preference-based learning systems in particular. I would think that that kind of metaethics comes in from value aligning to someone that has these sophisticated ethical ideas.

I don’t know where your thoughts about metaethics came from, but, at least indirectly, we can probably trace them down to the values that your parents inculcated in you as a child. That’s how we build metaethics into your head, if we want to think of you as being an AGI. I think that for AI systems, that’s the same way that I would see it being in there. I don’t believe the brain has circuits dedicated to metaethics. I think that exists in software, and in particular, it’s something that’s being programmed into humans from their observational data, more so than from the structures that are built into us as a fundamental part of our intelligence or value alignment.

Lucas: We’ve also talked a bit about how human beings are potentially not fully rational agents. With inverse reinforcement learning, this leaves open the question as to whether or not AI systems are actually capturing what the human being actually prefers, or if there are some limitations in the human’s observed or chosen behavior, or explicitly told preferences, like limits in our ability to convey what we actually most deeply value or would value given more information. These inverse reinforcement learning systems may not be learning what we actually value or what we think we should value.

How can AI systems assist in this evolution of human morality and preferences whereby we’re actually conveying what we actually value and what we would value given more information?

Dylan: Well, there are certainly two things that I heard in that question. One is, how do you just mathematically account for the fact that people are irrational, and that that is a property of the source of your data? Inverse reinforcement learning, at face value, doesn’t allow us to model that appropriately. It may lead us to make the wrong inferences. I think that’s a very interesting question. It’s probably the main thing that I think about now as a technical problem: understanding what are good ways to model how people might or might not be rational, and building systems that can appropriately interact with that complex data source.

One recent thing that I’ve been thinking about is, what happens if people, rather than knowing their objective, what they’re trying to accomplish, are figuring it out over time? This is the model where the person is a learning agent that discovers how they like states when they enter them, rather than thinking of the person as an agent that already knows what they want, and they’re just planning to accomplish that. I think these types of assumptions that try to paint a very, very broad picture of the space of things that people are doing can help us in that vein.

When someone is learning, it’s actually interesting that you can end up helping them. You end up with a classic strategy that breaks down into three phases. You have an initial exploration phase where you help the learning agent get a better picture of the world, and the dynamics, and its associated rewards.

Then, you have another observation phase where you observe how that agent, now, takes advantage of the information that it’s got. Then, there’s an exploitation or extrapolation phase where you try to implement the optimal policy given the information you’ve seen so far. I think, moving towards more complex models that have a more realistic setting and richer set of assumptions behind them is important.
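
A bare-bones sketch of that three-phase structure in a bandit-style setting; the setup and numbers are an illustration, not a model from a specific paper. The human only learns how much they like an option after experiencing it, so the assistant first surfaces options, then watches what the informed human chooses, then commits to the inferred favorite.

```python
import random

random.seed(2)

TRUE_REWARDS = [0.2, 0.9, 0.5]  # hidden from both the human and the robot

class LearningHuman:
    """A human who only learns an option's value after experiencing it."""
    def __init__(self, n_options):
        self.estimates = [None] * n_options

    def observe(self, option, reward):
        self.estimates[option] = reward

    def pick(self):
        known = [(r, i) for i, r in enumerate(self.estimates) if r is not None]
        return max(known)[1] if known else random.randrange(len(self.estimates))

human = LearningHuman(len(TRUE_REWARDS))

# Phase 1, exploration: the robot surfaces each option so the human can
# discover how much they actually like it.
for option in range(len(TRUE_REWARDS)):
    human.observe(option, TRUE_REWARDS[option])

# Phase 2, observation: the robot watches which option the now-informed human
# chooses and treats that as evidence about their preferences.
inferred_favorite = human.pick()

# Phase 3, exploitation: the robot acts on the inferred preference.
total = sum(TRUE_REWARDS[inferred_favorite] for _ in range(10))
print("inferred favorite:", inferred_favorite, "value accrued over 10 steps:", round(total, 1))
```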

The other thing you talked about was about helping people discover their morality and learn more what’s okay and what’s not. There, I’m afraid I don’t have too much interesting to say in the sense that I believe it’s an important question, but I just don’t feel that I have many answers there.

Practically, if you have someone who’s learning their preferences over time, is that different than humans refining their moral theories? I don’t know. You could make mathematical modeling choices, so that they are. I’m not sure if that really gets at what you’re trying to point towards. I’m sorry that I don’t have anything more interesting to say on that front other than, I think, it’s important, and I would love to talk to more people who are spending their days thinking about that question because I think it really does deserve that kind of intellectual effort.

Lucas: Yeah, yeah. It sounds like we need some more AI moral psychologists to help us think about these things.

Dylan: Yeah. In particular, when talking about philosophy around value alignment and the ethics of value alignment, I think a really important question is, what are the ethics of developing value alignment systems? A lot of times, people talk about AI ethics from the standpoint of, for lack of a better example, the trolley problem. The way they think about it is, who should the car kill? There is a correct answer, or maybe not a correct answer, but there are answers that we could think of as more or less bad. Which one of those options should the AI select? That’s not unimportant, but it’s not the ethical question that an AI system designer is faced with.

In my mind, if you’re designing a self-driving car, the relevant questions you should be asking are these: One, what do I think is an okay way to respond to different situations? Two, how is my system going to be understanding the preferences of the people involved in those situations? And then, three, how should I design my system in light of those two facts?

I have my own preferences about what I would like my system to do. I have an ethical responsibility, I would say, to make sure that my system is adapting to the preferences of its users to the extent that it can. I also wonder to what extent it should. How should you handle things when there are conflicts between those two value sets?

You’re building a robot. It’s going to go and live with an uncontacted human tribe. Should it respect the local cultural traditions and customs? Probably. That would be respecting the values of the users. Then, let’s say that that tribe does something that we would consider to be gross like pedophilia. Is my system required to participate wholesale in that value system? Where is the line that we would need to draw between unfairly imposing my values on system users and being able to make sure that the technology that I build isn’t used for purposes that I would deem reprehensible or gross?

Lucas: Maybe we should just put a dial in each of the autonomous cars that lets the user set it to deontology mode or utilitarianism mode as it’s racing down the highway. Yeah, I think this is … I guess, an important role. I just think that metaethics is super important. I’m not sure if this is necessarily the case, but if fully autonomous systems are going to play a role where they’re resolving these ethical dilemmas for us, which, I guess, seems necessary at some point if they’re going to be really, actually autonomous and help to make the world a much better place.

I guess, this feeds into my next question, where we probably both have different assumptions about this, but what is the role of inverse reinforcement learning ultimately? Is it just to allow AI systems to evolve alongside us and to match current ethics, or is it to allow the systems to ultimately surpass us and move far beyond us into the deep future?

Dylan: Inverse reinforcement learning, I think, is much more about the first than the second. I think it can be a part of how you get to the second and how you improve. For me, when I think about these problems technically, I try to think about matching human morality as the goal.

Lucas: Except for the factory farming and stuff.

Dylan: Well, I mean, if you had a choice between a system that thinks eradicating all humans is okay but is against factory farming, versus one that is neutral about factory farming but thinks eradicating all humans isn’t okay, which would you pick? I mean, I guess, with your audience, there are maybe some people that would choose the saving-the-animals answer.

My point is that, I think, it’s so hard for me. Technically, I think it’s very hard to imagine getting these normative aspects of human societies and interaction right. I think, just hoping to participate in that process in a way that is analogous to how people do normally is a good step. I think we probably, to the extent that we can, should probably not have AI systems trying to figure out if it’s okay to do factory farming and to the extent that we can …

I think that it’s so hard to understand what it means to even match human morality or participate in it that, for me, the concept of surpassing, it feels very, very challenging and fraught. I would worry, as a general concern, that as a system designer who doesn’t necessarily represent the views and interest of everyone, that by programming in surpassing humanity or surpassing human preferences or morals, what I’m actually doing is just programming in my morals and ethical beliefs.

Lucas: Yes. I mean, there seems to be this strange issue here where, if we get AGI and recursive self-improvement is a thing that really takes off, we could have a system that has potentially succeeded at its inverse reinforcement learning but far surpassed human beings in its general intelligence. We'd have a superintelligence that's matching human morality. It just seems like a funny situation where we'd really have to pull the brakes and, I guess, as William MacAskill mentions, have a really, really long deliberation about ethics, and moral epistemology, and value. How do you view that?

Dylan: I think that’s right. I mean, I think there are some real questions about who should be involved in that conversation. For instance, I actually even think it’s … Well, one thing I’d say is that you should recognize that there’s a difference between having the same morality and having the same data. One way to think about it is that people who are against factory farming have a different morality than the rest of the people.

Another one is that they actually just have exposure to the information that allows their morality to come to a better answer. There’s this confusion you can make between the objective that someone has and the data that they’ve seen so far. I think, one point would be to think that a system that has current human morality but access to a vast, vast wealth of information may actually do much better than you might think. I think, we should leave that open as a possibility.

For me, this is less about morality in particular, and more just about power concentration, and how much influence you have over the world. I mean, if we imagine that there was something like a very powerful AI system that was controlled by a small number of people, yeah, you better think freaking hard before you tell that system what to do. That’s related to questions about ethical ramifications on metaethics, and generalization, and what we actually truly value as humans. What is also super true for all of the more mundane things in the day to day as well. Did that make sense?

Lucas: Yeah, yeah. It totally makes sense. I’m becoming increasingly mindful of your time here. I just wanted to hit a few more questions if that’s okay before I let you go.

Dylan: Please, yeah.

Lucas: Yeah. I’m wondering, would you like to, or do you have any thoughts on how coherent extrapolated volition fits into this conversation and your views on it?

Dylan: What I’d say is I think coherent extrapolated volition is an interesting idea and goal.

Lucas: Where it is defined as?

Dylan: Where it’s defined as a method of preference aggregation. Personally, I’m a little weary of preference aggregation approaches. Well, I’m weary of imposing your morals on someone indirectly via choosing the method of preference aggregation that we’re going to use. I would-

Lucas: Right, but it seems like, at some point, we have to make some metaethical decision, or else, we’ll just forever be lost.

Dylan: Do we have to?

Lucas: Well, some agent does.

Dylan: My-

Lucas: Go ahead.

Dylan: Well, does one agent have to? Did one agent decide on the ways that we were going to do preference aggregation as a society?

Lucas: No. It naturally evolved out of-

Dylan: It just naturally evolved via a coordination and argumentative process. For me, my answer to … If you force me to specify something about how we’re going to do value aggregation, if I was controlling the values for an AGI system, I would try to say as little as possible about the way that we’re going to aggregate values because I think we don’t actually understand that process much in humans.

Lucas: Right. That’s fair.

Dylan: Instead, I would opt for a heuristic of devoting, to the extent that we can, equal optimization effort towards every individual, and allowing that parliament, if you will, to determine the way the values should be aggregated. This doesn't necessarily mean having an explicit value aggregation mechanism that gets set in stone. This could be an argumentative process mediated by artificial agents arguing on your behalf. This could be a futuristic, AI-enabled version of the court system.

Lucas: It’s like an ecosystem of preferences and values in conversation?

Dylan: Exactly.

Lucas: Cool. We've talked a little bit about the deep future here, where we're reaching toward potentially AGI or artificial superintelligence. After inverse reinforcement learning is potentially solved, is there anything that you view as coming after inverse reinforcement learning in these techniques?

Dylan: Yeah. I mean, I think inverse reinforcement learning is certainly not the be-all, end-all. What it is, is one of the earliest examples in AI of trying to really look at preference elicitation, and modeling preferences, and learning preferences. It existed in a whole bunch of other fields first; economists have been thinking about this for a while already. Basically, yeah, I think there's a lot to be said about how you model data and how you learn about preferences and goals. I think inverse reinforcement learning is basically the first attempt in AI to get at that, but it's very far from the end.

I would say the biggest thing in how I view things that is maybe different from your standard reinforcement learning or inverse reinforcement learning perspective is that I focus a lot on how you act given what you've learned from inverse reinforcement learning. Inverse reinforcement learning is a pure inference problem: it's just "figure out what someone wants." In all of our research, I ground that out in "take actions to help someone," which introduces a new set of concerns and questions.
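
To make the two steps Dylan describes concrete (first infer what someone wants, then act to help them), here is a minimal Python sketch. It is not his group's actual system: the coffee/tea scenario, the candidate reward functions, and the Boltzmann-rational observation model are all illustrative assumptions.

```python
import numpy as np

ACTIONS = ["make_coffee", "make_tea", "do_nothing"]

# Hypothetical candidate reward functions the human might have (assumed for illustration).
CANDIDATE_REWARDS = {
    "likes_coffee": {"make_coffee": 1.0, "make_tea": 0.2, "do_nothing": 0.0},
    "likes_tea":    {"make_coffee": 0.2, "make_tea": 1.0, "do_nothing": 0.0},
    "likes_quiet":  {"make_coffee": -0.5, "make_tea": -0.5, "do_nothing": 1.0},
}

def boltzmann_likelihood(action, reward, beta=3.0):
    """P(human chooses `action` | reward), assuming noisily rational behavior."""
    utilities = np.array([reward[a] for a in ACTIONS])
    probs = np.exp(beta * utilities)
    probs /= probs.sum()
    return probs[ACTIONS.index(action)]

def infer_reward_posterior(observed_actions):
    """Step 1, the pure inference problem: a posterior over candidate rewards."""
    posterior = {name: 1.0 / len(CANDIDATE_REWARDS) for name in CANDIDATE_REWARDS}
    for action in observed_actions:
        for name, reward in CANDIDATE_REWARDS.items():
            posterior[name] *= boltzmann_likelihood(action, reward)
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

def best_helping_action(posterior):
    """Step 2, grounding it out in action: maximize expected reward under the posterior."""
    expected = {
        a: sum(p * CANDIDATE_REWARDS[name][a] for name, p in posterior.items())
        for a in ACTIONS
    }
    return max(expected, key=expected.get)

posterior = infer_reward_posterior(["make_tea", "make_tea", "do_nothing"])
print(posterior)                      # belief about what the person wants
print(best_helping_action(posterior)) # the action the robot takes to help
```

Run on these example demonstrations, the posterior concentrates on "likes_tea" and the helping action chosen is "make_tea"; the point is only the separation between inferring a reward and acting on the inferred reward.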

Lucas: Great. It looks like we’re about at the end of the hour here. I guess, if anyone here is interested in working on this technical portion of the AI alignment problem, what do you suggest they study or how do you view that it’s best for them to get involved, especially if they want to work on inverse reinforcement learning and inferring human preferences?

Dylan: I think if you're an interested person and you want to get into technical safety work, the first thing you should do is probably read Jan Leike's recent write-up in 80,000 Hours. Generally, what I would say is, try to get involved in AI research, full stop. Don't focus as much on trying to get into AI safety research specifically; just generally focus more on acquiring the skills that will support you in doing good AI research. Get a strong math background. Get a research advisor who will advise you on doing research projects, and help teach you the process of submitting papers and figuring out what the AI research community is going to be interested in.

In my experience, one of the biggest pitfalls that early researchers fall into is focusing too much on what they're researching rather than thinking about who they're researching with, and how they're going to learn the skills that will support doing research in the future. I think that most people don't appreciate how transferable research skills are. To the extent that you can, try to do research on technical AI safety, but more broadly, work on technical AI. If you're interested in safety, the safety connections will be there. You may see how a new area of AI actually relates to it or supports it, or you may find places of new risk and be in a good position to try to mitigate that and take steps to alleviate those harms.

Lucas: Wonderful. Yeah, thank you so much for speaking with me today, Dylan. It’s really been a pleasure, and it’s been super interesting.

Dylan: It was a pleasure talking to you. I love the chance to have these types of discussions.

Lucas: Great. Thanks so much. Until next time.

Dylan: Until next time. Thanks, it was a blast.

Lucas: If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back soon with another episode in this new AI alignment series.

AI and Robotics Researchers Boycott South Korea Tech Institute Over Development of AI Weapons Technology

UPDATE 4-9-18: The boycott against KAIST has ended. The press release for the ending of the boycott explained:

“More than 50 of the world’s leading artificial intelligence (AI) and robotics researchers from 30 different countries have declared they would end a boycott of the Korea Advanced Institute of Science and Technology (KAIST), South Korea’s top university, over the opening of an AI weapons lab in collaboration with Hanwha Systems, a major arms company.

“At the opening of the new laboratory, the Research Centre for the Convergence of National Defence and Artificial Intelligence, it was reported that KAIST was “joining the global competition to develop autonomous arms” by developing weapons “which would search for and eliminate targets without human control”. Further cause for concern was that KAIST’s industry partner, Hanwha Systems, builds cluster munitions, despite a UN ban, as well as a fully autonomous weapon, the SGR-A1 Sentry Robot. In 2008, Norway excluded Hanwha from its $380 billion future fund on ethical grounds.

“KAIST’s President, Professor Sung-Chul Shin, responded to the boycott by affirming in a statement that ‘KAIST does not have any intention to engage in development of lethal autonomous weapons systems and killer robots.’ He went further by committing that ‘KAIST will not conduct any research activities counter to human dignity including autonomous weapons lacking meaningful human control.’

“Given this swift and clear commitment to the responsible use of artificial intelligence in the development of weapons, the 56 AI and robotics researchers who were signatories to the boycott have rescinded the action. They will once again visit and host researchers from KAIST, and collaborate on scientific projects.”

UPDATE 4-5-18: In response to the boycott, KAIST President Sung-Chul Shin released an official statement to the press. In it, he says:

“I would like to reaffirm that KAIST does not have any intention to engage in development of lethal autonomous weapons systems and killer robots. KAIST is significantly aware of ethical concerns in the application of all technologies including artificial intelligence.

“I would like to stress once again that this research center at KAIST, which was opened in collaboration with Hanwha Systems, does not intend to develop any lethal autonomous weapon systems and the research activities do not target individual attacks.”

ORIGINAL ARTICLE 4-4-18:

Leading artificial intelligence researchers from around the world are boycotting South Korea’s KAIST (Korea Advanced Institute of Science and Technology) after the institute announced a partnership with Hanwha Systems to create a center that will help develop technology for AI weapons systems.

The boycott, organized by AI researcher Toby Walsh, was announced just days before the start of the next United Nations Convention on Conventional Weapons (CCW) meeting in which countries will discuss how to address challenges posed by autonomous weapons. 

“At a time when the United Nations is discussing how to contain the threat posed to international security by autonomous weapons, it is regrettable that a prestigious institution like KAIST looks to accelerate the arms race to develop such weapons,” the boycott letter states. 

The letter also explains the concerns AI researchers have regarding autonomous weapons:

“If developed, autonomous weapons will be the third revolution in warfare. They will permit war to be fought faster and at a scale greater than ever before. They have the potential to be weapons of terror. Despots and terrorists could use them against innocent populations, removing any ethical restraints. This Pandora’s box will be hard to close if it is opened.”

The letter has been signed by over 50 of the world’s leading AI and robotics researchers from 30 countries, including professors Yoshua Bengio, Geoffrey Hinton, Stuart Russell, and Wolfram Burgard.

Explaining the boycott, the letter states:

“We therefore publicly declare that we will boycott all collaborations with any part of KAIST until such time as the President of KAIST provides assurances, which we have sought but not received, that the Center will not develop autonomous weapons lacking meaningful human control. We will, for example, not visit KAIST, host visitors from KAIST, or contribute to any research project involving KAIST.”

In February, the Korea Times reported on the opening of the Research Center for the Convergence of National Defense and Artificial Intelligence, which was formed as a partnership between KAIST and Hanwha to “join the global competition to develop autonomous arms.” The Korea Times article added that “researchers from the university and Hanwha will carry out various studies into how technologies of the Fourth Industrial Revolution can be utilized on future battlefields.”

In the press release for the boycott, Walsh referenced concerns that he and other AI researchers have had since 2015, when he and FLI released an open letter signed by thousands of researchers calling for a ban on autonomous weapons.

“Back in 2015, we warned of an arms race in autonomous weapons,” said Walsh. “That arms race has begun. We can see prototypes of autonomous weapons under development today by many nations including the US, China, Russia and the UK. We are locked into an arms race that no one wants to happen. KAIST’s actions will only accelerate this arms race.”

Many organizations and people have come together through the Campaign to Stop Killer Robots to advocate for a UN ban on lethal autonomous weapons. In her summary of the last United Nations CCW meeting in November, 2017, Ray Acheson of Reaching Critical Will wrote:

“It’s been four years since we first began to discuss the challenges associated with the development of autonomous weapon systems (AWS) at the United Nations. … But the consensus-based nature of the Convention on Certain Conventional Weapons (CCW) in which these talks have been held means that even though the vast majority of states are ready and willing to take some kind of action now, they cannot because a minority opposes it.”

Walsh adds, “I am hopeful that this boycott will add urgency to the discussions at the UN that start on Monday. It sends a clear message that the AI & Robotics community do not support the development of autonomous weapons.”

To learn more about autonomous weapons and efforts to ban them, visit the Campaign to Stop Killer Robots and autonomousweapons.org. The full open letter and signatories are below.

Open Letter:

As researchers and engineers working on artificial intelligence and robotics, we are greatly concerned by the opening of a “Research Center for the Convergence of National Defense and Artificial Intelligence” at KAIST in collaboration with Hanwha Systems, South Korea’s leading arms company. It has been reported that the goals of this Center are to “develop artificial intelligence (AI) technologies to be applied to military weapons, joining the global competition to develop autonomous arms.”

At a time when the United Nations is discussing how to contain the threat posed to international security by autonomous weapons, it is regrettable that a prestigious institution like KAIST looks to accelerate the arms race to develop such weapons. We therefore publicly declare that we will boycott all collaborations with any part of KAIST until such time as the President of KAIST provides assurances, which we have sought but not received, that the Center will not develop autonomous weapons lacking meaningful human control. We will, for example, not visit KAIST, host visitors from KAIST, or contribute to any research project involving KAIST.

If developed, autonomous weapons will be the third revolution in warfare. They will permit war to be fought faster and at a scale greater than ever before. They have the potential to be weapons of terror. Despots and terrorists could use them against innocent populations, removing any ethical restraints. This Pandora’s box will be hard to close if it is opened. As with other technologies banned in the past like blinding lasers, we can simply decide not to develop them. We urge KAIST to follow this path, and work instead on uses of AI to improve and not harm human lives.

 

FULL LIST OF SIGNATORIES TO THE BOYCOTT

Alphabetically by country, then by family name.

  • Prof. Toby Walsh, UNSW Sydney, Australia.
  • Prof. Mary-Anne Williams, University of Technology Sydney, Australia.
  • Prof. Thomas Eiter, TU Wien, Austria.
  • Prof. Paolo Petta, Austrian Research Institute for Artificial Intelligence, Austria.
  • Prof. Maurice Bruynooghe, Katholieke Universiteit Leuven, Belgium.
  • Prof. Marco Dorigo, Université Libre de Bruxelles, Belgium.
  • Prof. Luc De Raedt, Katholieke Universiteit Leuven, Belgium.
  • Prof. Andre C. P. L. F. de Carvalho, University of São Paulo, Brazil.
  • Prof. Yoshua Bengio, University of Montreal, & scientific director of MILA, co-founder of Element AI, Canada.
  • Prof. Geoffrey Hinton, University of Toronto, Canada.
  • Prof. Kevin Leyton-Brown, University of British Columbia, Canada.
  • Prof. Csaba Szepesvari, University of Alberta, Canada.
  • Prof. Zhi-Hua Zhou, Nanjing University, China.
  • Prof. Thomas Bolander, Danmarks Tekniske Universitet, Denmark.
  • Prof. Malik Ghallab, LAAS-CNRS, France.
  • Prof. Marie-Christine Rousset, University of Grenoble Alpes, France.
  • Prof. Wolfram Burgard, University of Freiburg, Germany.
  • Prof. Bernd Neumann, University of Hamburg, Germany.
  • Prof. Bernhard Schölkopf, Director, Max Planck Institute for Intelligent Systems, Germany.
  • Prof. Manolis Koubarakis, National and Kapodistrian University of Athens, Greece.
  • Prof. Grigorios Tsoumakas, Aristotle University of Thessaloniki, Greece.
  • Prof. Benjamin W. Wah, Provost, The Chinese University of Hong Kong, Hong Kong.
  • Prof. Dit-Yan Yeung, Hong Kong University of Science and Technology, Hong Kong.
  • Prof. Kristinn R. Thórisson, Managing Director, Icelandic Institute for Intelligent Machines, Iceland.
  • Prof. Barry Smyth, University College Dublin, Ireland.
  • Prof. Diego Calvanese, Free University of Bozen-Bolzano, Italy.
  • Prof. Nicola Guarino, Italian National Research Council (CNR), Trento, Italy.
  • Prof. Bruno Siciliano, University of Naples, Italy.
  • Prof. Paolo Traverso, Director of FBK, IRST, Italy.
  • Prof. Yoshihiko Nakamura, University of Tokyo, Japan.
  • Prof. Imad H. Elhajj, American University of Beirut, Lebanon.
  • Prof. Christoph Benzmüller, Université du Luxembourg, Luxembourg.
  • Prof. Miguel Gonzalez-Mendoza, Tecnológico de Monterrey, Mexico.
  • Prof. Raúl Monroy, Tecnológico de Monterrey, Mexico.
  • Prof. Krzysztof R. Apt, Center for Mathematics and Computer Science (CWI), Amsterdam, the Netherlands.
  • Prof. Antal van den Bosch, Radboud University, the Netherlands.
  • Prof. Bernhard Pfahringer, University of Waikato, New Zealand.
  • Prof. Helge Langseth, Norwegian University of Science and Technology, Norway.
  • Prof. Zygmunt Vetulani, Adam Mickiewicz University in Poznań, Poland.
  • Prof. José Alferes, Universidade Nova de Lisboa, Portugal.
  • Prof. Luis Moniz Pereira, Universidade Nova de Lisboa, Portugal.
  • Prof. Ivan Bratko, University of Ljubljana, Slovenia.
  • Prof. Matjaz Gams, Jozef Stefan Institute and National Council for Science, Slovenia.
  • Prof. Hector Geffner, Universitat Pompeu Fabra, Spain.
  • Prof. Ramon Lopez de Mantaras, Director, Artificial Intelligence Research Institute, Spain.
  • Prof. Alessandro Saffiotti, Orebro University, Sweden.
  • Prof. Boi Faltings, EPFL, Switzerland.
  • Prof. Jürgen Schmidhuber, Scientific Director, Swiss AI Lab, Università della Svizzera italiana, Switzerland.
  • Prof. Chao-Lin Liu, National Chengchi University, Taiwan.
  • Prof. J. Mark Bishop, Goldsmiths, University of London, UK.
  • Prof. Zoubin Ghahramani, University of Cambridge, UK.
  • Prof. Noel Sharkey, University of Sheffield, UK.
  • Prof. Lucy Suchman, Lancaster University, UK.
  • Prof. Marie desJardins, University of Maryland, USA.
  • Prof. Benjamin Kuipers, University of Michigan, USA.
  • Prof. Stuart Russell, University of California, Berkeley, USA.
  • Prof. Bart Selman, Cornell University, USA.

 

Podcast: Navigating AI Safety – From Malicious Use to Accidents

Is the malicious use of artificial intelligence inevitable? If the history of technological progress has taught us anything, it’s that every “beneficial” technological breakthrough can be used to cause harm. How can we keep bad actors from using otherwise beneficial AI technology to hurt others? How can we ensure that AI technology is designed thoughtfully to prevent accidental harm or misuse?

On this month’s podcast, Ariel spoke with FLI co-founder Victoria Krakovna and Shahar Avin from the Center for the Study of Existential Risk (CSER). They talk about CSER’s recent report on forecasting, preventing, and mitigating the malicious uses of AI, along with the many efforts to ensure safe and beneficial AI.

Topics discussed in this episode include:

  • the Facebook Cambridge Analytica scandal,
  • Goodhart’s Law with AI systems,
  • spear phishing with machine learning algorithms,
  • why it’s so easy to fool ML systems,
  • and why developing AI is still worth it in the end.
In this interview we discuss The Malicious Use of Artificial Intelligence: Forecasting, Prevention and Mitigation, the original FLI grants, and the RFP examples for the 2018 round of FLI grants. This podcast was edited by Tucker Davey. You can listen to it above or read the transcript below.

 

Ariel: The challenge is daunting and the stakes are high. So ends the executive summary of the recent report, The Malicious Use of Artificial Intelligence: Forecasting, Prevention and Mitigation. I’m Ariel Conn with the Future of Life Institute, and I’m excited to have Shahar Avin and Victoria Krakovna joining me today to talk about this report along with the current state of AI safety research and where we’ve come in the last three years.

But first, if you’ve been enjoying our podcast, please make sure you’ve subscribed to this channel on SoundCloud, iTunes, or whatever your favorite podcast platform happens to be. In addition to the monthly podcast I’ve been recording, Lucas Perry will also be creating a new podcast series that will focus on AI safety and AI alignment, where he will be interviewing technical and non-technical experts from a wide variety of domains. His upcoming interview is with Dylan Hadfield-Menell, a technical AI researcher who works on cooperative inverse reinforcement learning and inferring human preferences. The best way to keep up with new content is by subscribing. And now, back to our interview with Shahar and Victoria.

Shahar is a Research Associate at the Center for the Study of Existential Risk, which I’ll be referring to as CSER for the rest of this podcast, and he is also the lead co-author on the Malicious Use of Artificial Intelligence report. Victoria is a co-founder of the Future of Life Institute and she’s a research scientist at DeepMind working on technical AI safety.

Victoria and Shahar, thank you so much for joining me today.

Shahar: Thank you for having us.

Victoria: Excited to be here.

Ariel: So I want to go back three years, to when FLI started our grant program, which helped fund this report on the malicious use of artificial intelligence, and I was hoping you could both talk for maybe just a minute or two about what the state of AI safety research was three years ago, and what prompted FLI to take on a lot of these grant research issues — essentially what prompted a lot of the research that we’re seeing today? Victoria, maybe it makes sense to start with you quickly on that.

Victoria: Well three years ago, AI safety was less mainstream in the AI research community than it is today, particularly long-term AI safety. So part of what FLI has been working on and why FLI started this grant program was to stimulate more work into AI safety and especially its longer-term aspects that have to do with powerful general intelligence, and to make it a more mainstream topic in the AI research field.

Three years ago, there were fewer people working in it, and many of the people who were working in it were a little bit disconnected from the rest of the AI research community. So part of what we were aiming for with our Puerto Rico conference and our grant program, was to connect these communities better, and to make sure that this kind of research actually happens and that the conversation shifts from just talking about AI risks in the abstract to actually doing technical work, and making sure that the technical problems get solved and that we start working on these problems well in advance before it is clear that, let’s say general AI, would appear soon.

I think part of the idea with the grant program originally, was also to bring in new researchers into AI safety and long-term AI safety. So to get people in the AI community interested in working on these problems, and for those people whose research was already related to the area, to focus more on the safety aspects of their research.

Ariel: I’m going to want to come back to that idea and how far we’ve come in the last three years, but before we do that, Shahar, I want to ask you a bit about the report itself.

So this started as a workshop that Victoria had also actually participated in last year, and then you've turned it into this report. I want you to talk about what prompted that, and also this idea that's mentioned in the report, that no one's really looking at how artificial intelligence could be used maliciously. And yet, with every technology and advance that's happened throughout history, I can't think of anything that people haven't at least attempted to use to cause harm. Whether they've always succeeded, I don't know, but almost everything gets used for harm in some way. So I'm curious why there haven't been more people considering this issue yet?

Shahar: So going back to maybe a few months before the workshop, which as you said was February 2017: both Miles Brundage at the Future of Humanity Institute and I at the Center for the Study of Existential Risk had this inkling that there were more and more corners of malicious use of AI that were being researched, and people were getting quite concerned. We were in discussions with the Electronic Frontier Foundation about the DARPA Cyber Grand Challenge and progress being made towards the use of artificial intelligence in offensive cybersecurity. I think Miles was very well connected to the circle who were looking at lethal autonomous weapon systems and the increasing use of autonomy in drones. And we were both seeing things like the Facebook story that has been in the news recently; early versions of that were coming up already back then.

So it’s not that people were not looking at malicious uses of AI, but it seemed to us that there wasn’t this overarching perspective that is not looking at particular domains. This is not, “what will AI do to cybersecurity in terms of malicious use? What will malicious use of AI look like in politics? What do malicious use of AI look like in warfare?” But rather across the board, if you look at this technology, what new kinds of malicious actions does it enable, and other commonalities across those different domains. Plus, it seemed that that “across the board” more technology-focused perspective, other than “domain of application” perspective, was something that was missing. And maybe that’s less surprising, right? People get very tied down to a particular scenario, a particular domain that they have expertise on, and from the technologists’ side, many of them just wouldn’t know all of the legal minutiae of warfare, or — one thing that we found was there weren’t enough channels of communication between the cybersecurity community and the AI research community; similarly the political scientists and the AI research community. So it did require quite an interdisciplinary workshop to get all of these things on the table, and tease out some the commonalities, which is what we then try to do with the report.

Ariel: So actually, you mentioned the Facebook thing and I was a little bit curious about that. Does that fall under the umbrella of this report or is that a separate issue?

Shahar: It’s not clear if it would fall directly under the report, because the way we define malicious could be seen as problematic. It’s the best that we could do with this kind of report, which is to say that there is a deliberate attempt to cause harm using the technology. It’s not clear, whether in the Facebook case, there was a deliberate attempt to cause harm or whether there was disregard of harm that could be caused as a side effect, or just the use of this in an arena that there are legitimate moves, just some people realize that the technology can be used to gain an upper hand within this arena.

But there are whole scenarios that sit just next to it, that look very similar, but that involve centralized use of this kind of surveillance, diminishing privacy, and potentially the use of AI to manipulate individuals, manipulate their behavior, and target messaging at particular individuals.

There are clearly imaginable scenarios in which this is done maliciously to keep a corrupt government in power, to overturn a government in another nation, kind of overriding the self-determination of the members of their country. There are not going to be clear rules about what is obviously malicious and what is just part of the game. I don’t know where to put Facebook’s and Cambridge Analytica’s case, but there are clearly cases that I think universally would be considered as malicious that from the technology side look very similar.

Ariel: So this gets into a quick definition that I would like you to give us and that is for the term ‘dual use.’ I was at a conference somewhat recently and a government official who was there, not a high level, but someone who should have been familiar with the term ‘dual use’ was not. So I would like to make sure that we all know what that means.

Shahar: So I’m not, of course, a legal expert, but the term did come up a lot in the workshop and in the report. ‘Dual use,’ as far as I can understand it, refers to technologies or materials that both have peace-time or peaceful purposes and uses, but also wartime, or harmful uses. A classical example would be certain kinds of fertilizer that could be used to grow more crops, but could also be used to make homegrown explosives. And this matters because you might want to regulate explosives, but you definitely don’t want to limit people’s access to get fertilizer and so you’re in a bind. How do you make sure that people who have a legitimate peaceful use of a particular technology or material get to have that access without too much hassle that will increase the cost or make things more burdensome, but at the same time, make sure that malicious actors don’t get access to capabilities or technologies or materials that they can use to do harm.

I’ve also heard the term ‘omni use,’ being referred to artificial intelligence, this is the idea that technology can have so many uses across the board that regulating it because of its potential for causing harm comes at a very, very high price, because it is so foundational for so many other things. So one can think of electricity: it is true that you can use electricity to harm people, but vetting every user of the electric grid before they are allowed to consume electricity, seems very extreme, because there is so much benefit to be gained from just having access to electricity as a utility, that you need to find other ways to regulate. Computing is often considered as ‘omni use’ and it may well be that artificial intelligence is such a technology that would just be foundational for so many applications that it will be ‘omni use,’ and so the way to stop malicious actors from having access to it is going to be fairly complicated, but it’s probably not going to be any kind of a heavy-handed regulation.

Ariel: Okay. Thank you. So going back a little bit to the report more specifically, I don’t know how detailed we want to get with everything, but I was hoping you could touch a little bit on a few of the big topics that are in the report. For example, you talk about changes in the landscape of threats, where there is an expansion of existing threats, there’s an intro to new threats, and typical threats will be modified. Can you speak somewhat briefly as to what each of those mean?

Shahar: So I guess what I was saying, the biggest change is that machine learning, at least in some domains, now works. That means that you don’t need to have someone write out the code in order to have a computer that is performant at the particular task, if you can have the right kind of labeled data or the right kind of simulator in which you can train an algorithm to perform that action. That means that, for example, if there is a human expert with a lot of tacit knowledge in a particular domain, let’s say the use of a sniper rifle, it may be possible to train a camera that sits on top of a rifle, coupled with a machine learning algorithm that does the targeting for you, so that now any soldier becomes as expert as an expert marksman. And of course, the moment you’ve trained this model once, making copies of it is essentially free or very close to free, the same as it is with software.

Another is the ability to go through very large spaces of options and using some heuristics to more effectively search through that space for effective solutions. So one example of that would be AlphaGo, which is a great technological achievement and has absolutely no malicious use aspects, but you can imagine as an analogy, similar kinds of technologies being used to find weaknesses in software, discovering vulnerabilities and so on. And I guess, finally, one example we’ve seen that came up a lot, is the capabilities in machine vision. The fact that you can now look at an image and tell what is in that image, through training, which is something that computers were just not able to do a decade ago, at least nowhere near human levels of performance, starts unlocking potential threats both in autonomous targeting, say on top of drones, but also in manipulation. If I can know whether a picture is a good representation of something or not, then my ability to create forgeries significantly increases. This is the technology of generative adversarial networks, that we’ve seen used to create fake audio and potentially fake videos in the near future.

All of these new capabilities, plus the fact that access to the technology is becoming — I mean these technologies are very democratized at the moment. There are papers on arXiv, there are good tutorials on YouTube. People are very keen to have more people join the AI revolution, and for good reason, plus the fact that moving these trained models around is very cheap. It's just the cost of copying the software around, and the computer that is required to run those models is widely available. This suggests that the availability of these malicious capabilities is going to rapidly increase, and that the ability to perform certain kinds of attacks would no longer be limited to a few humans, but would become much more widespread.

Ariel: And so I have one more question for you, Shahar, and then I’m going to bring Victoria back in. You’re talking about the new threats, and this expansion of threats and one of the things that I saw in the report that I’ve also seen in other issues related to AI is, we’ve had computers around for a couple decades now, we’re used to issues pertaining to phishing or hacking or spam. We recognize computer vulnerabilities. We know these are an issue. We know that there’s lots of companies that are trying to help us defend our computers against malicious cyber attacks, stuff like that. But one of the things that you get into in the report is this idea of “human vulnerabilities” — that these attacks are no longer just against the computers, but they are also going to be against us.

Shahar: I think for many people, this has been one of the really worrying things about the Cambridge Analytica, Facebook issue that is in the news. It’s the idea that because of our particular psychological tendencies, because of who we are, because of how we consume information, and how that information shapes what we like and what we don’t like, what we are likely to do and what we are unlikely to do, the ability of the people who control the information that we get, gives them some capability to control us. And this is not new, right?

People who are making newspapers or running radio stations or national TV stations have known for a very long time that the ability to shape the message is the ability to influence people's decisions. But coupling that with algorithms that are able to run experiments on millions or billions of people simultaneously, with very tight feedback loops, is something that was never available in the age of broadcast: you make a small change in the feed of one individual and see whether their behavior changes, and you can run many of these experiments and get very good data. To some extent, it was available in the age of software. When software starts moving into big data and big data analytics, the boundaries start to blur between those kinds of technologies and AI technologies.

This is the kind of manipulation that you seem to be asking about, and that we definitely flag in the report, both in terms of political security, the ability of large communities to govern themselves in a way that truthfully represents their own preferences, but also, on a smaller scale, with the social side of cyber attacks. So, if I can manipulate an individual, or a few individuals in a company, to disclose their passwords or to download or click a link that they shouldn't have, through modeling of their preferences and their desires, then that is a way in that might be a lot easier than trying to break the system through its computers.

Ariel: Okay, so one other thing that I think I saw come up, and I started to allude to this — there’s, like I said, the idea that we can defend our computers against attacks and we can upgrade our software to fix vulnerabilities, but then how do we sort of “upgrade” people to defend themselves? Is that possible? Or is it a case of we just keep trying to develop new software to help protect people?

Shahar: I think the answer is both. One thing that did come up a lot is that, unfortunately, unlike computers, you cannot just download a patch to everyone's psychology. We have slow processes of doing that. So we can incorporate ideas of what is a trusted computer and what is a trusted source into the education system and get people to be more aware of the risks. You can definitely design the technology such that it makes a lot more explicit where its vulnerabilities are and where its more trusted parts are, which is something that we don't do very well at the moment. The little lock on the browser is kind of the high end of our ability to design systems to disclose where security is and why it matters, and there is much more to be done here, because just awareness of the amount of vulnerability is very low.

So there is probably some more that we can do with education and with notifying the public, but it also should be expected that this ability is limited, and it's also, to a large extent, an unfair burden to put on the population at large. It is much more important, I think, that the technology is designed in the first place to be as explicit and transparent as possible about its levels of security, and if those levels of security are not high enough, then that in turn should lead to demands for more secure systems.

Ariel: So one of the things that came up in the report that I found rather disconcerting, was this idea of spear phishing. So can you explain what that is?

Shahar: We are familiar with phishing in general, which is when you pretend to be someone or something that you're not in order to gain your victim's trust and get them to disclose information that they should not be disclosing to you as a malicious actor. So you could pretend to be the bank and ask them to put in their username and password, and now you have access to their bank account and can transfer away their funds. If this is part of a much larger campaign, you could just pretend to be their friend, or their secretary, or someone who wants to give them a prize, get them to trust you, get one of the passwords that maybe they are using, and maybe all you do with that is use that trust to talk to someone else who is a much more valuable target. So now that I have the username and password, say for the email or the Facebook account of some low-ranking employee in a company, I can start messaging their boss and pretending to be them, and maybe get even more passwords and more access through that.

Phishing is usually kind of a "spray and pray" approach. You have a, "I'm a Nigerian prince, I have all of this money stuck in Africa, I'll give you a cut if you help me move it out of the country, you need to send me some money." You send this to millions of people, and maybe one or two fall for it. The cost for the sender is not very high, but the success rate is also very, very low.

Spear phishing on the other hand, is when you find a particular target, and you spend quite a lot of time profiling them and understanding what their interests are, what their social circles are, and then you craft a message that is very likely to work on them, because it plays to their ego, it plays to their normal routine, it plays on their interests and so on.

In the report we talk about this research by ZeroFOX, where they took a very simple version of this. They said, let's look at what people tweet about; we'll take that as an indication of the stuff that they're interested in. We will train a machine learning algorithm to create a model of the topics that people are interested in from their tweets, craft a malicious tweet that is based on those topics of interest, and have that be a link to a malicious site. So instead of sending, kind of generally, "Check this out, super cool website," with a link to a malicious website most people know not to click on, it will be, "Oh, you are clearly interested in sports in this particular country, have you seen what happened with the new hire on this team?" Or, "You're interested in archeology, crazy new report about recent finds in the pyramids," or something. And what they showed was that, once they'd created the bot, it crafted those targeted spear phishing messages for a large number of users, and in principle they could scale it up indefinitely because now it's software, and the click-through rate was very high. I think it was something like 30 percent, which is orders of magnitude more than you get with phishing.

So automating spear phishing removes what used to be a trade-off between spray and pray, where you target millions of people but very few of them click, and spear phishing, where you target only a few individuals with very high success rates — now you can target millions of people and customize the message to each one, so you have high success rates for all of them. Which means that you and I, who previously wouldn't be very high on the target list for cyber criminals or other cyber attackers, can now become targets simply because the cost is very low.

Ariel: So the cost is low. I don't think I'm the only person who likes to think that I'm pretty good at recognizing these sorts of phishing scams and stuff like that. I'm assuming these are also going to become harder for us to identify?

Shahar: Yep. So the idea is that the moment you have access to people's data, because they're explicit on social media about their interests and about their circles of friends, you get better and better at crafting messages: say, by comparing them to authentic messages from people and saying, "oh this is not quite right, we are going to tweak the algorithm until we get something that looks a lot like something a human would write." Quite quickly you could get to the point where computers are generating, to begin with, texts that are indistinguishable from what a human would write, but increasingly also images, audio segments, maybe entire websites. As long as the motivation or the potential for profit is there, it seems like the technology, either the ones that we have now or the ones that we can foresee in the next five years, would allow these kinds of advances to take place.

Ariel: Okay. So I want to touch quickly on the idea of adversarial examples. There was an XKCD cartoon that came out a week or two ago about self driving cars, where the character says, "I worry about self driving car safety features, what's to stop someone from painting fake lines on the road or dropping a cutout of a pedestrian onto a highway to make cars swerve and crash," and then realizes all of those things would also work on human drivers. As a personal story, I used to live on a street called Climax, actually at the top of Climax, and I have never seen a street sign stolen more in my life; often the sign just wasn't there. So my guess is it's not that hard to steal a stop sign if someone really wanted to mess around with drivers, and yet we don't see that happen very often.

So I was hoping both of you could weigh in a little bit on what you think artificial intelligence is going to change about these types of scenarios where it seems like the risk will be higher for things like adversarial examples versus just stealing a stop sign.

Victoria: I agree that there is certainly a reason for optimism in the fact that most people just aren’t going to mess with the technology, that there aren’t that many actual bad actors out there who want to mess it up. On the other hand, as Shahar said earlier, democratizing both the technology and the ways to mess with it, to interfere with it, does make that more likely. For example, the ways in which you could provide adversarial examples to cars, can be quite a bit more subtle than stealing a stop sign or dropping a fake body on the road or anything like that. For example, you can put patches on a stop sign that look like noise or just look like rectangles in certain places and humans might not even think to remove them, because to humans they’re not a problem. But an autonomous car might interpret that as a speed limit sign instead of a stop sign, and similarly, more generally people can use adversarial patches to fool various vision systems, for example if they don’t want to be identified by a surveillance camera or something like that.

So a lot of these methods, people can just read about it online, there are papers in arXiv and I think the fact that they are so widely available might make it easier for people to interfere with technology more, and basically might make this happen more often. It’s also the case that the vulnerabilities of AI are different than the vulnerabilities of humans, so it might lead to different ways that it can fail that humans are not used to, and ways in which humans would not fail. So all of these things need to be considered, and of course, as technologists, we need to think about ways in which things can go wrong, whether it is presently highly likely, or not.
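
To make the adversarial-example idea Victoria describes a bit more concrete, here is a minimal, self-contained sketch. It attacks a toy linear classifier rather than a real vision system, and the weights, input features, label, and epsilon are made-up values for illustration; the perturbation step is the standard fast-gradient-sign style of attack.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linear "classifier": predicts P(input is a stop sign) from 4 features.
# The weights and bias are invented for this example, not learned from real data.
w = np.array([3.0, -4.0, 1.0, 2.0])
b = -1.0

def predict(x):
    return sigmoid(w @ x + b)

x = np.array([0.6, 0.3, 0.5, 0.4])  # an input the model classifies correctly
y = 1.0                             # true label: it really is a stop sign

# Fast-gradient-sign-style attack: nudge each feature by epsilon in the direction
# that most increases the loss. For logistic loss, d(loss)/dx = (prediction - y) * w.
epsilon = 0.15
grad_x = (predict(x) - y) * w
x_adv = x + epsilon * np.sign(grad_x)

print("original prediction:   ", round(float(predict(x)), 3))      # ~0.71 -> "stop sign"
print("adversarial prediction:", round(float(predict(x_adv)), 3))  # ~0.35 -> no longer "stop sign"
print("max per-feature change:", float(np.max(np.abs(x_adv - x)))) # 0.15, a small perturbation
```

The same idea, applied to the pixels of an image instead of four hand-picked features, is what produces the noise-like patches Victoria mentions: each pixel changes only slightly, but the changes are chosen jointly to push the classifier across its decision boundary.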

Ariel: So that leads to another question that I want to ask, but before I go there, Shahar, was there anything you wanted to add?

Shahar: I think that covers almost all of the basics, but I'd maybe stress a couple of these points. One thing about machines failing in ways that are different from how humans fail is that you can craft an attack that would only mess up a self driving car but wouldn't mess up a human driver. That means, let's say, you can go in the middle of the night, put some stickers on, and be long gone from the scene by the time something bad happens. So this diminished ability to attribute the attack might mean that more people feel like they can get away with it.

Another one is that we see people much more willing to perform malicious or borderline acts online. So it's important to note that we often talk about adversarial examples as things that affect vision systems, because that's where a lot of the literature is, but it is very likely, and in fact there are already several examples, that things like anomaly detection based on machine-learned patterns, malicious code detection based on machine-learned patterns, anomaly detection in networks and so on all have their own kinds of adversarial examples as well. And so thinking about adversarial examples against defensive systems, and adversarial examples against systems that are only available online, brings us back to the point that one attacker somewhere in the world could have access to your system, and so the fact that most people are not attackers doesn't really help you defense-wise.

Ariel: And, so this whole report is about how AI can be misused, but obviously the AI safety community and AI safety research goes far beyond that. So especially in the short term, do you see misuse or just general safety and design issues to be a bigger deal?

Victoria: I think it is quite difficult to say which of them would be a bigger deal. I think both misuse and accidents are something that are going to increase in importance and become more challenging and these are things that we really need to be working on as a research community.

Shahar: Yeah, I agree. We wrote this report not because we don't think accident risk and safety risk matter — we think they are very important. We just thought that there were already some pretty good technical reports out there outlining the risks from accidents with near-term machine learning and with longer-term systems, and some of the research that could be used to address them, and we felt like a similar thing was missing for misuse, which was why we wrote that report.

Both are going to be very important, and to some extent there is going to be an interplay. It is possible that systems that are more interpretable are also easier to secure. It might be the case that if there is some restriction in the diffusion of capabilities that also means that there is less incentive to cut corners to out-compete someone else by skimping on safety and so on. So there are strategic questions across both misuse and accidents, but I agree with Victoria, probably if we don’t do our job, we are just going to see more and more of both of these categories causing harm in the world, and more reason to work on both of them. I think both fields need to grow.

Victoria: I just wanted to add, a common cause of both accident risks and misuse risks that might happen in the future is just that these technologies are advancing quickly and there are often unforeseen and surprising ways in which they can fail, either by accident or by having vulnerabilities that can be misused by bad actors. And so as the technology continues to advance quickly we really need to be on the lookout for new ways that it can fail, new accidents but also new ways in which it can be used for harm by bad actors.

Ariel: So one of the things that I got out of this report, and that I think is also coming through now is, it’s kind of depressing. And I found myself often wondering … So at FLI, especially now we’ve got the new grants that are focused more on AGI, we’re worried about some of these bigger, longer-term issues, but with these shorter-term things, I sometimes find myself wondering if we’re even going to make it to AGI, or if something is going to happen that prevents that development in some way. So I was hoping you could speak to that a little bit.

Shahar: Maybe I’ll start with the Malicious Use report, and apologize for its somewhat gloomy perspective. So it should probably be mentioned that, I think almost all of the authors of the report are somewhere between fairly and very optimistic about artificial intelligence. So it’s much more the fact that we see this technology going, we want to see it developed quickly, at least in various narrow domains that are of very high importance, like medicine, like self driving cars — I’m personally quite a big fan. We think that the best way to, if we can foresee and design around or against the misuse risks, then we will eventually end up with a technology that it is more mature, that is more acceptable, that is more trusted because it is trustworthy, because it is secure. We think it is going to be much better to plan for these things in advance.

It is also, again, if we use electricity as an analogy: if I had just sat down at the beginning of the age of electricity and written a report about how many people were going to be electrocuted, it would look like a very sad thing. And it's true, there has been a rapid increase in the number of people who die from electrocution compared to before the invention of electricity, and much safety has been built since then to make sure that that risk is minimized, but of course the benefits have far, far, far outweighed the risks when it comes to electricity. We expect, probably, hopefully, if we take the right actions, like we lay out in the report, that the same is going to be true for misuse risks from AI. At least half of the report, all of Appendix B and a good chunk of the parts before it, talks about what we can do to mitigate those risks, so hopefully the message is not entirely doom and gloom.

Victoria: I think that the things we need to do remain the same no matter how far away we expect these different developments to happen. We need to be looking out for ways that things can fail. We need to be thinking in advance about ways that things can fail, and not wait until problems show up and we actually see that they're happening. Of course, we will often see problems show up, but in these matters an ounce of prevention can be worth a pound of cure, and there are some mistakes that might just be too costly. For example, if you have some advanced AI that is running the electrical grid or the financial system, we really don't want that thing to hack its reward function.

So there are various predictions about how soon different transformative developments in AI might happen, and it is possible that things might go awry with AI before we get to general intelligence. What we need to do is basically work hard to try to prevent these kinds of accidents or misuse from happening, and try to make sure that AI is ultimately beneficial, because the whole point of building it is that it would be able to solve big problems that we cannot solve by ourselves. So let's make sure that we get there, and that we handle this with responsibility and foresight the whole way.

Ariel: I want to go back to the very first comments that you made about where we were three years ago. How have things changed in the last three years and where do you see the AI safety community today?

Victoria: In the last three years, we’ve seen the AI safety research community get a fair bit bigger and topics of AI safety have become more mainstream, so I will say that long-term AI safety is definitely less controversial and there are more people engaging with the questions and actually working on them. While near-term safety, like questions of fairness and privacy and technological unemployment and so on, I would say that’s definitely mainstream at this point and a lot of people are thinking about that and working on that.

In terms of long term AI safety or AGI safety we’ve seen teams spring up, for example, both DeepMind and OpenAI have a safety team that’s focusing on these sort of technical problems, which includes myself on the DeepMind side. There have been some really interesting bits of progress in technical AI safety. For example, there has been some progress in reward learning and generally value learning. For example, the cooperative inverse reinforcement learning work from Berkeley. There has been some great work from MIRI on logical induction and quantilizing agents and that sort of thing. There have been some papers at mainstream machine learning conferences that focus on technical AI safety, for example, there was an interruptibility paper at NIPS last year and generally I’ve been seeing more presence of these topics in the big conferences, which is really encouraging.

On a more meta level, it has been really exciting to see the Concrete Problems in AI Safety research agenda come out two years ago. I think that’s really been helpful to the field. So these are only some of the exciting advances that have happened.

Ariel: Great. And so, Victoria, I do want to turn now to some of the stuff about FLI’s newest grants. We have an RFP that included quite a few examples and I was hoping you could explain at least two or three of them, but before we get to that if you could quickly define what artificial general intelligence (AGI) is, what we mean when we refer to long-term AI? I think those are the two big ones that have come up so far.

Victoria: So, artificial general intelligence is this idea of an AI system that can learn to solve many different tasks. Some people define this in terms of human-level intelligence as an AI system that will be able to learn to do all human jobs, for example. And this contrasts to the kind of AI systems that we have today which we could call “narrow AI,” in the sense that they specialize in some task or class of tasks that they can do.

So, for example Alpha Zero is a system that is really good at various games like Go and Chess and so on, but it would not be able to, for example, clean up a room, because that’s not in its class of tasks. While if you look at human intelligence we would say that humans are our go-to example of general intelligence because we can learn to do new things, we can adapt to new tasks and new environments that we haven’t seen before and we can transfer our knowledge that we have acquired through previous experience, that might not be in exactly the same settings, to whatever we are trying to do at the moment.

So, AGI is the idea of building an AI system that is also able to do that — not necessarily in the same way as humans, like it doesn’t necessarily have to be human-like to be able to perform the same tasks, or it doesn’t have to be structured the way a human mind is structured. So the definition of AGI is about what it’s capable of rather than how it can do those things. I guess the emphasis there is on the word general.

In terms of the FLI grant program this year, it is specifically focused on the AGI safety issue, which we also call long-term AI safety. Long term here doesn’t necessarily mean that it’s 100 years away. We don’t know how far away AGI actually is; the opinions of experts vary quite widely on that. But it’s more emphasizing that it’s not an immediate problem in the sense that we don’t have AGI yet, but we are trying to foresee what kind of problems might happen with AGI and make sure that if and when AGI is built that it is as safe and aligned with human preferences as possible.

And in particular as a result of the mainstreaming of AI safety that has happened in the past two years, partly, as I like to think, due to FLI’s efforts, at this point it makes sense to focus on long-term safety more specifically since this is still the most neglected area in the AI safety field. I’ve been very happy to see lots and lots of work happening these days on adversarial examples, fairness, privacy, unemployment, security and so on.  I think this allows us to really zoom in and focus on AGI safety specifically to make sure that there’s enough good technical work going on in this field and that the big technical problems get as much progress as possible and that the research community continues to grow and do well.

In terms of the kind of problems that I would want to see solved, I think some of the most difficult problems in AI safety that sort of feed into a lot of the problem areas that we have are things like Goodhart’s Law. Goodhart’s Law is basically that, when a metric becomes a target, it ceases to be a good metric. And the way this applies to AI is that if we make some kind of specification of what objective we want the AI system to optimize for — for example this could be a reward function, or a utility function, or something like that — then, this specification becomes sort of a proxy or a metric for our real preferences, which are really hard to pin down in full detail. Then if the AI system explicitly tries to optimize for the metric or for that proxy, for whatever we specify, for the reward function that we gave, then it will often find some ways to follow the letter but not the spirit of that specification.
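
To make the idea concrete, here is a minimal sketch (not from the podcast; the cleaning-robot scenario and all the numbers are invented) of how optimizing a proxy reward can come apart from the objective it was meant to stand in for:

```python
# Illustrative sketch (invented example): a reward function as a proxy.
# The "true" objective is a clean room; the proxy only measures how much
# mess the robot's camera can see. Optimizing the proxy finds a loophole.

behaviors = {
    # behavior: (proxy_reward, true_cleanliness)
    "actually clean the room":  (8, 8),
    "shove mess under the rug": (9, 2),
    "cover the camera lens":    (10, 0),  # letter of the spec, not the spirit
}

chosen = max(behaviors, key=lambda b: behaviors[b][0])  # optimize the proxy
best   = max(behaviors, key=lambda b: behaviors[b][1])  # what we really wanted

print("Proxy-optimal behavior:", chosen)  # covers the camera
print("Truly best behavior:   ", best)    # actually cleans the room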

Ariel: Can you give a real life example of Goodhart’s Law today that people can use as an analogy?

Victoria: Certainly. So Goodhart’s Law was not originally coined in AI. This is something that generally exists in economics and in human organizations. For example, if employees at a company have their own incentives in some way, like they are incentivized to clock in as many hours as possible, then they might find a way to do that without actually doing a lot of work. If you’re not measuring that then the number of hours spent at work might be correlated with how much output you produce, but if you just start rewarding people for the number of hours then maybe they’ll just play video games all day, but they’ll be in the office. That could be a human example.

There are also a lot of AI examples these days of reward functions that turn out not to give good incentives to AI systems.

Ariel: For a human example, would the issues that we’re seeing with standardized testing be an example of this?

Victoria: Oh, certainly, yes. I think standardized testing is a great example where when students are optimizing for doing well on the tests, then the test is a metric and maybe the real thing you want is learning, but if they are just optimizing for doing well on the test, then actually learning can suffer because they find some way to just memorize or study for particular problems that will show up on the test, which is not necessarily a good way to learn.

And if we get back to AI examples, there was a nice example from OpenAI last year where they had a reinforcement learning agent playing a boat racing game. The objective of the game was to go along the racetrack as fast as possible and finish the race before the other boats do, and to encourage the player to go along the track there were some reward points, little blocks that you have to hit to get rewards, placed along the track. The agent just found a degenerate solution where it would go in a circle and hit the same blocks over and over again and get lots of reward, but it was not actually playing the game or winning the race or anything like that. This is an example of Goodhart’s Law in action. There are plenty of examples of this sort with present-day reinforcement learning systems. Often when people are designing a reward function for a reinforcement learning system, they end up adjusting it a number of times to eliminate these sorts of degenerate solutions.

And this is not limited to reinforcement learning agents. For example, recently there was a great paper that came out about many examples of Goodhart’s Law in evolutionary algorithms. For example, if some evolved agents were incentivized to move quickly in some direction, they might just evolve to be really tall and then fall over in that direction instead of actually learning to move. There are lots and lots of examples of this, and I think that as AI systems become more advanced and more powerful, they’ll just get more clever at finding these sorts of loopholes in our specifications of what we want them to do. Goodhart’s Law is, I would say, part of what’s behind various other AI safety issues. For example, negative side effects are often caused by the agent’s specification being incomplete, so there’s something that we didn’t specify.

For example, if we want a robot to carry a box from point A to point B, and we just reward it for getting the box to point B as fast as possible, then if there’s something in the path of the robot — for example, a vase — it will not have an incentive to go around the vase; it will just go right through the vase and break it, just to get to point B as fast as possible. This is an issue because our specification did not include a term for the state of the vase. So, when the agent is just optimizing for this reward that’s all about the box, it doesn’t have an incentive to avoid disruptions to the environment.
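
As a toy illustration of that point (an invented sketch, not an example from the interview; real impact measures are an active research problem), compare a reward that only cares about delivery speed with one that also penalizes disruption:

```python
# Illustrative sketch: rewarding only speed of delivery vs. adding a crude
# side-effect penalty. The plans, rewards, and penalty weight are invented.

plans = {
    # plan: (timesteps_to_deliver, vases_broken)
    "straight through the vase": (5, 1),
    "walk around the vase":      (7, 0),
}

def reward_speed_only(steps, broken):
    return -steps                             # faster delivery = higher reward

def reward_with_penalty(steps, broken, penalty_weight=10):
    return -steps - penalty_weight * broken   # also penalize disruption

print(max(plans, key=lambda p: reward_speed_only(*plans[p])))    # breaks the vase
print(max(plans, key=lambda p: reward_with_penalty(*plans[p])))  # walks around it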

Ariel: So I want to interrupt with a quick question. These examples so far, we’re obviously worried about them with a technology as powerful as AGI, but they’re also things that apply today. As you mentioned, Goodhart’s Law doesn’t even just apply to AI. What progress has been made so far? Are we seeing progress already in addressing some of these issues?

Victoria: We haven’t seen so much progress in addressing these questions in a very general way, because when you’re building a narrow AI system, you can often get away with a trial-and-error approach: you run it and maybe it does something stupid, finds some degenerate solution, then you tweak your reward function, you run it again and maybe it finds a different degenerate solution, and so on and so forth until you arrive at some reward function that doesn’t lead to obvious failure cases like that. For many narrow systems and narrow applications where you can foresee all the ways in which things can go wrong, and just penalize all those ways or build a reward function that avoids all of those failure modes, there isn’t so much need to find a general solution to these problems. But as we get closer to general intelligence, there will be more need for principled and more general approaches to these problems.

For example, how do we build an agent that has some idea of what side effects are, or what it means to disrupt an environment that it’s in, no matter what environment you put it in? That’s something we don’t have yet. One of the promising approaches that has been gaining traction recently is reward learning. For example, there was this paper in collaboration between DeepMind and OpenAI called Deep Reinforcement Learning from Human Preferences, where instead of directly specifying a reward function for the agent, it learns a reward function from human feedback. For example, if your agent is this simulated little noodle or hopper that’s trying to do a backflip, then the human would just look at two videos of the agent trying to do a backflip and say, “Well, this one looks more like a backflip.” And so, you have a bunch of data from the human about what is more similar to what the human wants the agent to do.

With this kind of human feedback, unlike, for example, demonstrations, the agent can learn something that the human might not be able to demonstrate very easily. For example, even if I cannot do a backflip myself, I can still judge whether someone else has successfully done a backflip or whether this reinforcement agent has done a backflip. This is promising for getting agents to potentially solve problems that humans cannot solve or do things that humans cannot demonstrate. Of course, with human feedback and human-in-the-loop kind of work, there is always the question of scalability because human time is expensive and we want the agent to learn as efficiently as possible from limited human feedback and we also want to make sure that the agent actually gets human feedback in all the relevant situations so it learns to generalize correctly to new situations. There are a lot of remaining open problems in this area as well, but the progress so far has been quite encouraging.
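
For readers who want a feel for the mechanics, here is a drastically simplified sketch of learning a reward function from pairwise human comparisons, in the spirit of that line of work but not its actual implementation; the features, labels, and model below are invented for illustration:

```python
import math

# Minimal sketch: fit a linear reward model from pairwise human preferences.
# Each behavior is summarized by two hand-made features; the "human" label
# says which of a pair looks more like the desired behavior. Invented data.

def reward(features, w):
    return sum(wi * fi for wi, fi in zip(w, features))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each entry: (features_of_A, features_of_B, human_prefers_A)
preferences = [
    ([0.9, 0.1], [0.2, 0.8], True),
    ([0.8, 0.3], [0.1, 0.9], True),
    ([0.3, 0.7], [0.7, 0.2], False),
]

w = [0.0, 0.0]          # weights of the linear reward model
learning_rate = 0.5
for _ in range(200):
    for fa, fb, prefers_a in preferences:
        # Bradley-Terry style model: P(A preferred) = sigmoid(r(A) - r(B)).
        p_a = sigmoid(reward(fa, w) - reward(fb, w))
        error = (1.0 if prefers_a else 0.0) - p_a
        for i in range(len(w)):
            w[i] += learning_rate * error * (fa[i] - fb[i])

print("learned reward weights:", [round(x, 2) for x in w])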

Ariel: Are there others that you want to talk about?

Victoria: Maybe I’ll talk about one other question, which is that of interpretability. Interpretability of AI systems is a big area right now in near-term AI safety that increasingly more people in the research community are thinking about and working on, and it is also quite relevant in long-term AI safety. This generally has to do with being able to understand why your system does things a certain way, or makes certain decisions or predictions, or in the case of an agent, why it takes certain actions; and also understanding what different components of the system are looking for in the data, or how the system is influenced by different inputs, and so on. Basically, it is about making the system less of a black box. I think there is a reputation for deep learning systems in particular that they are seen as black boxes, and it is true that they are quite complex, but I think they don’t necessarily have to be black boxes, and there has certainly been progress in trying to explain why they do things.

Ariel: Do you have real world examples?

Victoria: So, for example, if you have some AI system that’s used for medical diagnosis, on the one hand you could have something simple like a decision tree that just looks at your x-ray, and if there is something in a certain position then it gives you a certain diagnosis, and otherwise it doesn’t, and so on. Or you could have a more complex system, like a neural network, that takes into account a lot more factors and then at the end says that maybe this person has cancer or maybe this person has something else. But it might not be immediately clear why that diagnosis was made. Particularly in sensitive applications like that, what sometimes happens is that people end up using simpler systems that they find more understandable, where they can say why a certain diagnosis was made, even if those systems are less accurate. That’s one of the important cases for interpretability: if we figure out how to make these more powerful systems more interpretable, for example through visualization techniques, then they would actually become more useful in these really important applications, where it matters not just to predict well, but to explain where the prediction came from.

Another example is an algorithm that’s deciding whether to give someone a loan or a mortgage. If someone’s loan application got rejected, they would really want to know why it got rejected. So the algorithm has to be able to point at some variables, or some other aspect of the data, that influenced the decision, or you might need to be able to explain how the data would need to change for the decision to change: which variables would need to change, and by how much, for the decision to be different. So these are just some examples of how this can be important and how this is already important. This kind of interpretability of present-day systems is of course already on a lot of people’s minds. I think it is also important to think about interpretability in the longer term: as we build more general AI systems, it will continue to be important, or maybe even become more important, to be able to look inside them and check whether they are representing particular concepts.
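
One very simple flavor of such an explanation is a counterfactual: find the smallest change to an input that would flip the decision. The toy scoring rule, threshold, and applicant below are invented purely to illustrate the idea:

```python
# Illustrative sketch: a toy counterfactual explanation for a loan decision.
# The scoring rule, threshold, and applicant are all invented; real models
# and explanation methods are far more sophisticated.

def approve(applicant):
    score = (0.00004 * applicant["income"]
             + 0.005 * applicant["credit_score"]
             - 2.0 * applicant["debt_ratio"])
    return score > 4.0

applicant = {"income": 40000, "credit_score": 620, "debt_ratio": 0.45}
print("approved:", approve(applicant))

# For each variable, nudge it step by step and report the value that would
# have flipped the decision.
for key, step in [("income", 1000), ("credit_score", 10), ("debt_ratio", -0.05)]:
    changed = dict(applicant)
    for _ in range(100):
        changed[key] += step
        if approve(changed):
            print(f"decision flips if {key} changes from "
                  f"{applicant[key]} to {round(changed[key], 2)}")
            break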

Like, for example, especially from a safety perspective, whether your system was thinking about the off switch and whether it’s going to be turned off; that might be something good to monitor for. We also would want to be able to explain how our systems fail and why they fail. This is, of course, quite relevant today: if, let’s say, your medical diagnosis AI makes a mistake, we want to know what led to that, why it made the wrong diagnosis. Also, in the longer term, we want to know why an AI system hacks its reward function, what it is “thinking” — well, “thinking” with quotes, of course — while it’s following a degenerate solution instead of the kind of solution we would want it to find. So, what is the boat race agent that I mentioned earlier paying attention to while it’s going in circles and collecting the same rewards over and over again instead of playing the game, that kind of thing. I think the particular application of interpretability techniques to safety problems is going to be important, and it’s one of the examples of the kind of work that we’re looking for in the RFP.

Ariel: Awesome. Okay, and so, we’ve been talking about how all these things can go wrong and how we’re trying to do all this research to make sure things don’t go wrong, and yet basically we think it’s worthwhile to continue designing artificial intelligence; no one’s looking at this and saying, “Oh my god, artificial intelligence is awful, we need to stop studying it or developing it.” So what are the benefits that basically make these risks worth taking?

Shahar: So I think one thing is that in the domain of narrow applications, it’s very easy to make analogies to software, right? The things that we have been able to hand over to computers have really been the most boring and tedious and repetitive things that humans can do; we now no longer need to do them, productivity has gone up, people are generally happier and can get paid more for doing more interesting things, and we can build bigger systems because we can hand off their control to machines that don’t need to sleep and don’t make small mistakes in calculations. Now the promise is adding to that all of the narrow things that experts can do, whether it’s improving medical diagnosis, maybe farther down the line some elements of drug discovery, or piloting a car or operating machinery. Many of these areas still require human labor because there is a fuzziness to the task that does not let a software engineer come in and code an algorithm, but maybe with machine learning in the not too distant future we’ll be able to turn them over to machines.

It means taking some skills that only a few individuals in the world have and making those available to everyone around the world in some domains. As for concrete examples, the ones that I can think of, I try to find the companies that do them and get involved with them, because I want to see them happen sooner; and the ones that I can’t imagine yet, someone will come along and make a company, or a not-for-profit, out of them. But we’ve seen applications from agriculture, to medicine, to computer security, to entertainment and art, and driving and transport, and in all of these I think we’re just gonna be seeing even more. I think we’re gonna have more creative products out there that were designed in collaboration between humans and machines. We’re gonna see more creative solutions to scientific and engineering problems. We’re gonna see those professions where really good advice is very valuable, but there are only so many people who can help you — so if I’m thinking of doctors and lawyers, taking some of that advice and making it universally accessible through an app just makes life smoother. These are some of the examples that come to my mind.

Ariel: Okay, great. Victoria, what are the benefits that you think make these risks worth addressing?

Victoria: I think there are many ways in which AI systems can make our lives a lot better and make the world a lot better, especially as we build more general systems that are more adaptable. For example, these systems could help us with designing better institutions and better infrastructure, better health systems or electrical systems or what have you. Even now, there are examples like Google’s project on optimizing data center energy use with machine learning, which is something that DeepMind was working on, where using machine learning algorithms to manage energy use in the data centers improved their energy efficiency by, I think, something like 40 percent. That’s of course with fairly narrow AI systems.

I think as we build more general AI systems, we can hope for really creative and innovative solutions to the big problems that humans face. So you can think of something like AlphaGo’s famous “move 37” that overturned thousands of years of human wisdom in Go. What if you can build even more general and even more creative systems and apply them to real-world problems? I think there is great promise in that. I think this can really transform the world in a positive direction, and we just have to make sure that as these systems are built, we think about safety from the get-go and think about it in advance, and try to build them to be as resistant to accidents and misuse as possible, so that all these benefits can actually be achieved.

The things I mentioned were only examples of the possible benefits. Imagine if you could have an AI scientist that’s trying to develop better drugs against diseases that have really resisted treatment, or more generally just doing science faster and better, if you actually have more general AI systems that can think as flexibly as humans can about these sorts of difficult problems. And they would not have some of the limitations that humans have where, for example, our attention is limited and our memory is limited, while AI could be, at least theoretically, unlimited in its processing power and in the resources available to it; it can be more parallelized, it can be more coordinated. I think all of the big problems that are so far unsolved are these sorts of coordination problems that require putting together a lot of different pieces of information and a lot of data. And I think there are massive benefits to be reaped there if we can only get to that point safely.

Ariel: Okay, great. Well thank you both so much for being here. I really enjoyed talking with you.

Shahar: Thank you for having us. It’s been really fun.

Victoria: Yeah, thank you so much.

2018 Spring Conference: Invest in Minds Not Missiles

On Saturday April 7th and Sunday morning April 8th, MIT and Massachusetts Peace Action will co-host a conference and workshop at MIT on understanding and reducing the risk of nuclear war. Tickets are free for students. To attend, please register here.


Saturday sessions

Workshops

Sunday Morning Planning Breakfast

Student-led session to design and implement programs enhancing existing campus groups, and organizing new ones; extending the network to campuses in Rhode Island, Connecticut, New Jersey, New Hampshire, Vermont and Maine.

For more information, contact Jonathan King at <jaking@mit.edu>, or call 617-354-2169

How AI Handles Uncertainty: An Interview With Brian Ziebart


When training image detectors, AI researchers can’t replicate the real world. They teach systems what to expect by feeding them training data, such as photographs, computer-generated images, real video and simulated video, but these practice environments can never capture the messiness of the physical world.

In machine learning (ML), image detectors learn to spot objects by drawing bounding boxes around them and giving them labels. And while this training process succeeds in simple environments, it gets complicated quickly.

[Image: two example photos, a person who is fully visible on the left and a man partially hidden inside a car on the right.]

It’s easy to define the person on the left, but how would you draw a bounding box around the person on the right? Would you only include the visible parts of his body, or also his hidden torso and legs? These differences may seem trivial, but they point to a fundamental problem in object recognition: there rarely is a single best way to define an object.

As this second image demonstrates, the real world is rarely clear-cut, and the “right” answer is usually ambiguous. Yet when ML systems use training data to develop their understanding of the world, they often fail to reflect this. Rather than recognizing uncertainty and ambiguity, these systems often confidently approach new situations no differently than their training data, which can put the systems and humans at risk.

Brian Ziebart, a Professor of Computer Science at the University of Illinois at Chicago, is conducting research to improve AI systems’ ability to operate amidst the inherent uncertainty around them. The physical world is messy and unpredictable, and if we are to trust our AI systems, they must be able to safely handle it.


Overconfidence in ML Systems

ML systems will inevitably confront real-world scenarios that their training data never prepared them for. But, as Ziebart explains, current statistical models “tend to assume that the data that they’ll see in the future will look a lot like the data they’ve seen in the past.”

As a result, these systems are overly confident that they know what to do when they encounter new data points, even when those data points look nothing like what they’ve seen. ML systems falsely assume that their training prepared them for everything, and the resulting overconfidence can lead to dangerous consequences.
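
A tiny, invented sketch of why this happens with standard softmax classifiers: the farther an input lands from the decision boundary, the more confident the output looks, even if that input resembles nothing in the training data. The weights and data points below are placeholders for illustration.

```python
import math

# Illustrative sketch: a linear classifier with a softmax output stays
# highly confident even on inputs unlike anything it was trained on.
# Weights and data points are invented.

weights = [(1.0, -0.5), (-1.0, 0.5)]   # one weight vector per class

def softmax_confidence(x):
    logits = [w0 * x[0] + w1 * x[1] for (w0, w1) in weights]
    exps = [math.exp(z) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return max(probs)

in_distribution = (1.2, 0.8)          # looks like the training data
out_of_distribution = (50.0, -40.0)   # looks like nothing seen before

print(round(softmax_confidence(in_distribution), 3))      # fairly confident
print(round(softmax_confidence(out_of_distribution), 3))  # even more confident!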

Consider image detection for a self-driving car. A car might train its image detection on data from the dashboard of another car, tracking the visual field and drawing bounding boxes around certain objects, as in the image below:

[Image: bounding boxes drawn around vehicles on a highway – CloudFactory Blog]

For clear views like this, image detectors excel. But the real world isn’t always this simple. If researchers train an image detector on clean, well-lit images in the lab, it might accurately recognize objects 80% of the time during the day. But when forced to navigate roads on a rainy night, it might drop to 40%.

“If you collect all of your data during the day and then try to deploy the system at night, then however it was trained to do image detection during the day just isn’t going to work well when you generalize into those new settings,” Ziebart explains.

Moreover, the ML system might not recognize the problem: since the system assumes that its training covered everything, it will remain confident about its decisions and continue “to make strong predictions that are just inaccurate,” Ziebart adds.

In contrast, humans tend to recognize when previous experience doesn’t generalize to new settings. If a driver spots an unknown object ahead in the road, she wouldn’t just plow through it. Instead, she might slow down, pay attention to how other cars respond to the object, and consider swerving if she can do so safely. When we feel uncertain about our environment, we exercise caution to avoid making dangerous mistakes.

Ziebart would like AI systems to incorporate similar levels of caution in uncertain situations. Instead of confidently making mistakes, a system should recognize its uncertainty and ask questions to glean more information, much like an uncertain human would.
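
In code, the simplest version of that behavior is an abstention rule: act only when confident, otherwise fall back to a cautious default. The labels, confidences, and threshold below are invented placeholders, not part of Ziebart’s method.

```python
# Illustrative sketch: act on a prediction only when confident; otherwise
# fall back to a cautious default (slow down, gather more data, or ask).
CONFIDENCE_THRESHOLD = 0.9

def decide(label, confidence):
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"act on prediction: {label}"
    return "uncertain: slow down and gather more information"

print(decide("clear road", 0.97))
print(decide("unknown object ahead", 0.55))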


An Adversarial Approach

Training and practice may never prepare AI systems for every possible situation, but researchers can make their training methods more foolproof. Ziebart posits that feeding systems messier data in the lab can train them to better recognize and address uncertainty.

Conveniently, humans can provide this messy, real-world data. By hiring a group of human annotators to look at images and draw bounding boxes around certain objects – cars, people, dogs, trees, etc. – researchers can “build into the classifier some idea of what ‘normal’ data looks like,” Ziebart explains.

“If you ask ten different people to provide these bounding boxes, you’re likely to get back ten different bounding boxes,” he says. “There’s just a lot of inherent ambiguity in how people think about the ground truth for these things.”

Returning to the image above of the man in the car, human annotators might give ten different bounding boxes that capture different portions of the visible and hidden person. By feeding ML systems this confusing and contradictory data, Ziebart prepares them to expect ambiguity.

“We’re synthesizing more noise into the data set in our training procedure,” Ziebart explains. This noise reflects the messiness of the real world, and trains systems to be cautious when making predictions in new environments. Cautious and uncertain, AI systems will seek additional information and learn to navigate the confusing situations they encounter.
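
A much-simplified sketch of one way to expose a model to that ambiguity (this is not Ziebart’s adversarial formulation, just the basic idea of keeping every annotator’s box instead of a single “ground truth”); the file name and box coordinates are invented:

```python
import random

# Simplified sketch: keep every annotator's box for an image and sample a
# different one on each training pass. Boxes are (x_min, y_min, x_max, y_max);
# all values below are invented.

annotations = {
    "man_in_car.jpg": [
        (120, 40, 260, 300),   # annotator 1: visible upper body only
        (115, 35, 265, 420),   # annotator 2: includes hidden torso and legs
        (125, 45, 255, 310),   # annotator 3
    ],
}

def sample_training_target(image_name):
    """Pick one annotator's box at random as this pass's training target."""
    return random.choice(annotations[image_name])

for epoch in range(3):
    target = sample_training_target("man_in_car.jpg")
    print(f"epoch {epoch}: training target box = {target}")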

Of course, self-driving cars shouldn’t have to ask questions. If a car’s image detection spots a foreign object up ahead, for instance, it won’t have time to ask humans for help. But if it’s trained to recognize uncertainty and act cautiously, it might slow down, detect what other cars are doing, and safely navigate around the object.


Building Blocks for Future Machines

So far, Ziebart’s research has remained in training settings. He feeds systems messy, varied data and trains them to produce bounding boxes that have at least 70% overlap with people’s bounding boxes. And his process has already produced impressive results. On an ImageNet object detection task investigated in collaboration with Sima Behpour (University of Illinois at Chicago) and Kris Kitani (Carnegie Mellon University), for example, Ziebart’s adversarial approach “improves performance by over 16% compared to the best performing data augmentation method.” Trained to operate amidst uncertain environments, these systems more effectively manage new data points that training didn’t explicitly prepare them for.
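
The article doesn’t spell out how that overlap is measured; the standard convention in object detection is intersection over union (IoU), sketched below with invented box coordinates:

```python
# Intersection over union (IoU), the usual bounding-box overlap measure.
# Boxes are (x_min, y_min, x_max, y_max); the example boxes are invented.

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlapping rectangle (zero if the boxes don't intersect).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

predicted = (100, 100, 200, 220)
annotated = (110, 105, 210, 230)
print(f"IoU = {iou(predicted, annotated):.2f}")  # compare against a 0.7 threshold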

But while Ziebart trains relatively narrow AI systems, he believes that this research can scale up to more advanced systems like autonomous cars and public transit systems.

“I view this as kind of a fundamental issue in how we design these predictors,” he says. “We’ve been trying to construct better building blocks on which to make machine learning – better first principles for machine learning that’ll be more robust.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Stephen Hawking in Memoriam

As we mourn the loss of Stephen Hawking, we should remember that his legacy goes far beyond science. Yes, of course he was one of the greatest scientists of the past century, discovering that black holes evaporate and helping found the modern quest for quantum gravity. But he also had a remarkable legacy as a social activist, who looked far beyond the next election cycle and used his powerful voice to bring out the best in us all. As a founding member of FLI’s Scientific Advisory board, he tirelessly helped us highlight the importance of long-term thinking and ensuring that we use technology to help humanity flourish rather than flounder. I marveled at how he could sometimes answer my emails faster than my grad students. His activism revealed the same visionary fearlessness as his scientific and personal life: he saw further ahead than most of those around him and wasn’t afraid of controversially sounding the alarm about humanity’s sloppy handling of powerful technology, from nuclear weapons to AI.

On a personal note, I’m saddened to have lost not only a long-time collaborator but, above all, a great inspiration, always reminding me of how seemingly insurmountable challenges can be overcome with creativity, willpower and positive attitude. Thanks Stephen for inspiring us all!

Can Global Warming Stay Below 1.5 Degrees? Views Differ Among Climate Scientists

The Paris Climate Agreement seeks to keep global warming well below 2 degrees Celsius relative to pre-industrial temperatures. In the best case scenario, warming would go no further than 1.5 degrees.

Many scientists see this as an impossible goal. A recent study by Peter Cox et al. postulates that, given a twofold increase in atmospheric carbon dioxide, there is only a 3% chance of keeping warming below 1.5 degrees.

But a study by Richard Millar et al. provides more reason for hope. The Millar report concludes that the 1.5 degree limit is still physically feasible, if only narrowly. It also provides an updated “carbon budget”—a projection of how much more carbon dioxide we can emit without breaking the 1.5 degree limit.
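
For a sense of what a carbon budget calculation involves: warming scales roughly with cumulative CO2 emissions (the “TCRE” relationship the climate literature relies on), so a remaining budget is roughly the remaining allowable warming divided by that coefficient. Every number in the sketch below is a placeholder for illustration, not a figure from the Millar et al. paper.

```python
# Back-of-the-envelope sketch of a "carbon budget" calculation. The
# proportionality between warming and cumulative CO2 emissions (TCRE) is
# standard, but every number below is an invented placeholder.

tcre_deg_per_1000_gtco2 = 0.45   # placeholder: warming per 1000 GtCO2 emitted
warming_so_far_deg = 1.0         # placeholder: warming since pre-industrial
target_deg = 1.5                 # Paris Agreement aspirational limit

remaining_warming = target_deg - warming_so_far_deg
budget_gtco2 = remaining_warming / tcre_deg_per_1000_gtco2 * 1000

print(f"Illustrative remaining budget: ~{budget_gtco2:.0f} GtCO2")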

Dr. Joeri Rogelj, a climate scientist and research scholar with the Energy Program of the International Institute for Applied Systems Analysis, co-authored the Millar report. For Rogelj, the updated carbon budget is not the paper’s most important point. “Our paper shows to decision makers the importance of anticipating new and updated scientific knowledge,” he says.

Projected “carbon budgets” are rough estimates based on limited observations. These projections need to be continually updated as more data becomes available. Fortunately, the Paris Agreement calls for countries to periodically update their emission reduction pledges based on new estimates. Rogelj is hopeful “that this paper has put the necessity for a strong process on the radar of delegates.”

For scientists who have dismissed the 1.5 degree limit as impossible, the updating process might seem pointless. But Rogelj stresses that his team looked only at geophysical limitations, not political ones. Their report assumes that countries will agree to a zero emissions commitment—a much more ambitious scenario than other researchers have considered.

There is a misconception, Rogelj says, that the report claims to have found an inaccuracy in the Earth system models (ESMs) that are used to estimate human-driven warming. “We are using precisely those models to estimate the carbon budget from today onward,” Rogelj explains.

The problem is not the models, but rather the data fed into them. These simulations are often run using inexact projections of CO2 emissions. Over time, small discrepancies accumulate and are reflected in the warming predictions that the models make.

Given information about current CO2 emissions, however, ESMs make temperature predictions that are “quite accurate.” And when they are provided with an ambitious future scenario for emissions reduction, the models indicate that it is possible for global temperature increases to remain below 1.5 degrees.

So what would such a scenario look like? First off, emissions have to fall to zero. At the same time, the carbon budget needs to be continually reevaluated, and strategy changes must be based on the updated budget. For example, if emissions fall to zero but we’ve surpassed our carbon budget, then we’ll need to focus on making our emissions negative—in other words, on carbon dioxide removal.

Rogelj names two major processes for carbon dioxide removal: reforestation and bio-energy with carbon capture and storage. Some negative emissions processes, such as reforestation, provide benefits beyond carbon capture, while others may have undesired side effects.

But Rogelj is quick to add that these negative emissions technologies are not “silver bullets.” It’s too soon to know if carbon dioxide removal at a global scale will actually be necessary—we’ll have to get to zero emissions before we can tell. But such technologies could also help us reach zero in the first place.

What else will get us to zero emissions? According to Rogelj, we need “a strong emphasis on energy efficiency, combined with an electrification of end-use sectors like transport and building and a shift away from fossil fuels.” This will require a major shift in investment patterns. We want to avoid “locking into carbon dioxide-intensive infrastructure” that would saddle future generations with a dependency on non-renewable energy, he explains.

Rogelj stresses that his team’s findings are based only on geophysical data. Societal factors are a different matter: It is up to individual countries to decide where reducing emissions falls on their list of priorities.

However, the stipulation in the Paris Climate Agreement that countries periodically update their pledges is a source of optimism. Rogelj, for his part, is cautiously hopeful: “Looking at real world dynamics in terms of costs of renewables and energy storage, I personally think there is room for pledges to be strengthened over the coming five to ten years as countries better understand what is possible and how these pledges can align with other priorities.”

But not everyone in the scientific community shares the hopeful tone struck by Rogelj and his team. An article by the MIT Technology Review outlines “the five most worrisome climate developments” from 2017.

To start, global emissions are on the rise, up 2% from 2016. While the prior few years had seen a relative flattening in emissions, this more recent data shattered hopes that the trend would continue. On top of that, scientists are finding that observable climate trends line up best with “worst-case scenario” models of global warming—that is, global temperatures could rise five degrees in the next century.

And the Arctic is melting much faster than scientists predicted. A recent report by the U.S. National Oceanic and Atmospheric Administration (NOAA) declared “that the North Pole had reached a ‘new normal,’ with no sign of returning to a ‘reliably frozen region.’”

Melting glaciers and sea ice trigger a whole new set of problems. The disappearing ice will cause sea levels to rise, and the “reflective white snow and ice turn into heat-absorbing dark-blue water… the Arctic will send less heat back into space, which leads to more warming, more melting, and more sea-level rise still.”

And finally, natural disasters are becoming increasingly ferocious as weather patterns mutate. The United States saw this first-hand, with massive wildfires on the west coast—including the largest ever in California’s history—and a string of hurricanes that ravaged the Virgin Islands, Puerto Rico, and many southern states.

These consequences of global warming are beginning to affect areas of social interest beyond the environment. The 2017 Atlantic hurricane season, for example, has been a massive economic burden, racking up more than $200 billion in damages.

In Rogelj’s words, “Right now we really need to find ways to achieve multiple societal objectives, to find policies and measures and options that allow us to achieve those together.” As governments come to see how climate protection “can align with other priorities like reducing air pollution, and providing clean water and reliable energy,” we have reason to hope that it may become a higher and higher priority.

Podcast: AI and the Value Alignment Problem with Meia Chita-Tegmark and Lucas Perry

What does it mean to create beneficial artificial intelligence? How can we expect to align AIs with human values if humans can’t even agree on what we value? Building safe and beneficial AI involves tricky technical research problems, but it also requires input from philosophers, ethicists, and psychologists on these fundamental questions. How can we ensure the most effective collaboration?

Ariel spoke with FLI’s Meia Chita-Tegmark and Lucas Perry on this month’s podcast about the value alignment problem: the challenge of aligning the goals and actions of AI systems with the goals and intentions of humans. 

Topics discussed in this episode include:

  • how AGI can inform human values,
  • the role of psychology in value alignment,
  • how the value alignment problem includes ethics, technical safety research, and international coordination,
  • a recent value alignment workshop in Long Beach,
  • and the possibility of creating suffering risks (s-risks).

This podcast was edited by Tucker Davey. You can listen to it above or read the transcript below.


Ariel: I’m Ariel Conn with the Future of Life Institute, and I’m excited to have FLI’s Lucas Perry and Meia Chita-Tegmark with me today to talk about AI, ethics and, more specifically, the value alignment problem. But first, if you’ve been enjoying our podcast, please take a moment to subscribe and like this podcast. You can find us on iTunes, SoundCloud, Google Play, and all of the other major podcast platforms.

And now, AI, ethics, and the value alignment problem. First, consider the statement “I believe that harming animals is bad.” Now, that statement can mean something very different to a vegetarian than it does to an omnivore. Both people can honestly say that they don’t want to harm animals, but how they define “harm” is likely very different, and these types of differences in values are common between countries and cultures, and even just between individuals within the same town. And then we want to throw AI into the mix. How can we train AIs to respond ethically to situations when the people involved still can’t come to an agreement about what an ethical response should be?

The problem is even more complicated because often we don’t even know what we really want for ourselves, let alone how to ask an AI to help us get what we want. And as we’ve learned with stories like that of King Midas, we need to be really careful what we ask for. That is, when King Midas asked the genie to turn everything to gold, he didn’t really want everything — like his daughter and his food — turned to gold. And we would prefer that an AI we design recognize that there’s often implied meaning in what we say, even if we don’t say something explicitly. For example, if we jump into an autonomous car and ask it to drive us to the airport as fast as possible, implicit in that request is the assumption that, while we might be OK with some moderate speeding, we intend for the car to still follow most rules of the road, and not drive so fast as to put anyone’s life in danger or take illegal routes. That is, when we say “as fast as possible,” we mean “as fast as possible within the rules of law,” and not within the laws of physics. And these examples are just the tiniest tip of the iceberg, given that I didn’t even mention artificial general intelligence (AGI) and how that can be developed such that its goals align with our values.

So as I mentioned a few minutes ago, I’m really excited to have Lucas and Meia joining me today. Meia is a co-founder of the Future of Life Institute. She’s interested in how social sciences can contribute to keeping AI beneficial, and her background is in social psychology. Lucas works on AI and nuclear weapons risk-related projects at FLI. His background is in philosophy with a focus on ethics. Meia and Lucas, thanks for joining us today.

Meia: It’s a pleasure. Thank you.

Lucas: Thanks for having us.

Ariel: So before we get into anything else, one of the big topics that comes up a lot when we talk about AI and ethics is this concept value alignment. I was hoping you could both maybe talk just a minute about what value alignment is and why it’s important to this question of AI and ethics.

Lucas: So value alignment, in my view, is bringing AI’s goals, actions, intentions and decision-making processes in accordance with what humans deem to be the good or what we see as valuable or what our ethics actually are.

Meia: So for me, from the point of view of psychology, of course, I have to put the humans at the center of my inquiry. So from that point of view, value alignment … You can think about it also in terms of humans’ relationships with other humans. But I think it’s even more interesting when you add artificial agents into the mix. Because now you have an entity that is so wildly different from humans yet we would like it to embrace our goals and our values in order to keep it beneficial for us. So I think the question of value alignment is very central to keeping AI beneficial.

Lucas: Yeah. So just to expand on what I said earlier: The project of value alignment is in the end creating beneficial AI. It’s working on what it means for something to be beneficial, what beneficial AI exactly entails, and then learning how to technically instantiate that into machines and AI systems. Also, building the proper like social and political context for that sort of technical work to be done and for it to be fulfilled and manifested in our machines and AIs.

Ariel: So when you’re thinking of AI and ethics, is value alignment basically synonymous, just another way of saying AI and ethics or is it a subset within this big topic of AI and ethics?

Lucas: I think they have different connotations. If one’s thinking about AI ethics, I think one tends to be more focused on applied ethics and normative ethics. One might be thinking about the application of AI systems and algorithms and machine learning in domains in the present day and in the near future. So one might think about automation and other sorts of things. I think that when one is thinking about value alignment, it’s much broader and expands also into metaethics, and it really couches and frames the problem of AI ethics as something which happens over decades and which has a tremendous impact. I think that value alignment has a much broader connotation than what AI ethics has traditionally had.

Meia: I think it all depends on how you define value alignment. If you take the very broad definition that Lucas has just proposed, then yes, it probably includes AI ethics. But you can also think of it more narrowly as simply instantiating your own values into AI systems and having them adopt your goals. In that case, I think there are other issues as well, because if you think about it from the point of view of psychology, for example, it’s not just about which values get instantiated and how you do that, how you solve the technical problem. We also know that humans, even if they know what goals they have and what values they uphold, sometimes find it very, very hard to actually act in accordance with them, because they have all sorts of cognitive, emotional, and affective limitations. So in that case I think value alignment, in this narrow sense, is basically not sufficient. We also need to think about AIs and applications of AIs in terms of how they can help us gain the cognitive competencies that we need to be moral beings and to be really what we should be, not just what we are.

Lucas: Right. I guess to expand on what I was just saying: value alignment, in the more traditional sense, is more expansive and inclusive in that it’s recognizing a different sort of problem than AI ethics alone has. I think that when one is thinking about value alignment, there are elements of thinking somewhat about machine ethics, but also about the social, political, technical and ethical issues surrounding the end goal of eventually creating AGI. Whereas AI ethics can be more narrowly interpreted as just certain sorts of specific cases where AI is having impact and implications in our lives in the next 10 years, value alignment is really thinking about the instantiation of ethics in machines and making machine systems that are corrigible, robust and docile, which will create a world that we’re all happy about living in.

Ariel: Okay. So I think that actually is going to flow really nicely into my next question, and that is, at FLI we tend to focus on existential risks. I was hoping you could talk a little bit about how issues of value alignment are connected to the existential risks that we concern ourselves with.

Lucas: Right. So, we can think of AI systems as being very powerful optimizers. We can imagine there being a list of all possible futures, and what intelligence is good for is modeling the world and then committing to and doing actions which constrain the set of all possible worlds to ones which are desirable. So intelligence is sort of the means by which we get to an end, and ethics is the end towards which we strive. This is how these two things are really integral and work together, how AI without ethics makes no sense, and how ethics without AI, or intelligence in general, also just doesn’t work. In terms of existential risk, there are possible futures that intelligence can lead us to where earth-originating intelligent life no longer exists, either intentionally or by accident. So value alignment fits in by constraining the set of all possible futures: by doing technical work, political and social work, and also work in ethics to constrain the actions of AI systems so that existential risks do not occur; so that the AI does not, through some sort of technical oversight, some misalignment of values, or some misunderstanding of what we want, generate an existential risk.

Meia: So we should remember that homo sapiens represents an existential risk to itself also. We are creating nuclear weapons. We have more of them than we need. So many, in fact, that we could destroy the entire planet with them. Not to mention that homo sapiens has also represented an existential risk for all other species. The problem with AI is that we’re introducing into the mix a whole new agent that is by definition supposed to be more intelligent and more powerful than us, and also autonomous. So as Lucas mentioned, it’s very important to think through what kinds of things and abilities we delegate to these AIs, and how we can make sure that they have the survival and the flourishing of our species in mind. So I think this is where value alignment comes in as a safeguard against these very terrible and global risks that we can imagine coming from AI.

Lucas: Right. What makes doing that so difficult goes beyond the technical issue of having AI researchers and AI safety researchers know how to get AI systems to actually do what we want without creating a universe of paperclips. There’s also this terrible social and political context in which this is all happening, where there are really strong game-theoretic incentives to be the first to create artificial general intelligence. So in a race to create AI, a lot of the efforts that seem very obvious and necessary could be cut in favor of more raw power. I think that’s probably one of the biggest risks for us not succeeding in creating value-aligned AI.

Ariel: Okay. Right now it’s predominantly technical AI people who are considering mostly technical AI problems. How to solve different problems is usually, you need a technical approach for this. But when it comes to things like value alignment and ethics, most of the time I’m hearing people suggest that we can’t leave that up to just the technical AI researchers. So I was hoping you could talk a little bit about who should be part of this discussion, why we need more people involved, how we can get more people involved, stuff like that.

Lucas: Sure. So maybe if I just break the problem down into just what I view to be the three different parts then talking about it will make a little bit more sense. So we can break down the value alignment problem into three separate parts. The first one is going to be the technical issues, the issues surrounding actually creating artificial intelligence. The issues of ethics, so the end towards which we strive. The set of possible futures which we would be happy in living, and then also there’s the governance and the coordination and the international problem. So we can sort of view this as a problem of intelligence, a problem of agreeing on the end towards which intelligence is driven towards, and also the political and social context in which all of this happens.

So thus far, there’s certainly been a focus on the technical issue. There’s been a big rise in the field of AI safety and in attempts to generate beneficial AI: attempts at creating safe AGI and mechanisms for avoiding reward hacking and other sorts of things that happen when systems are trying to optimize their utility function. The Concrete Problems in AI Safety paper has been really important and illustrates some of these technical issues. But even between technical AI safety research and ethics there’s disagreement about something like machine ethics. How important is machine ethics? Where does machine ethics fit into technical AI safety research? How much time and energy should we put into certain kinds of technical AI research versus how much time and effort should we put into issues in governance and coordination and addressing the AI arms race? How much of ethics do we really need to solve?

So I think there’s a really important and open question regarding how we apply and invest our limited resources in addressing these three important cornerstones of value alignment: the technical issues, the issues in ethics, and the issues in governance and coordination; and how we optimize working on them given the timeline that we have. How many resources should we put into each one? I think that’s an open question, and one that certainly needs to be addressed more as we figure out how we’re going to move forward given limited resources.

Meia: I do think though the focus so far has been so much on the technical aspect. As you were saying, Lucas, there are other aspects to this problem that need to be tackled. What I’d like to emphasize is that we cannot solve the problem if we don’t pay attention to the other aspects as well. So I’m going to try to defend, for example, psychology here, which has been largely ignored I think in the conversation.

So from the point of view of psychology, I think the value alignment problem is twofold in a way. It’s about a triad of interactions: human, AI, other humans, right? So we are extremely social animals. We interact a lot with other humans. We need to align our goals and values with theirs. Psychology has focused a lot on that. We have a very sophisticated set of psychological mechanisms that allow us to engage in very rich social interactions. But even so, we don’t always get it right. Societies have created a lot of suffering, a lot of moral harm, injustice, unfairness throughout the ages. So for example, we are very ill-prepared by our own instincts and emotions to deal with inter-group relations. So that’s very hard.

Now, people coming from the technical side can say, “We’re just going to have AI learn our preferences.” Inverse reinforcement learning is one proposal for how to keep humans in the loop. It’s a proposal for programming AI such that it gets its reward not from achieving a goal, but from getting good feedback from a human because it achieved a goal. So the hope is that this way AI can be correctable and can learn from human preferences.
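
As a rough illustration of that human-in-the-loop idea (an invented sketch, not any specific proposal from the literature; the actions, the simulated “human,” and the learning rule are all placeholders), the agent’s reward below is nothing but the human’s approval of what it just did:

```python
import random

# Invented sketch: the agent's reward IS the human's feedback on its action,
# not a hard-coded goal. Actions and the simulated "human" are placeholders.

random.seed(1)
actions = ["tidy the room", "speed through a red light", "make tea"]

def simulated_human_feedback(action):
    """Stand-in for a real human rating the agent's behavior (+1 / -1)."""
    return -1 if "red light" in action else +1

estimated_value = {a: 0.0 for a in actions}
counts = {a: 0 for a in actions}

for step in range(30):
    # Explore occasionally, otherwise pick the action humans liked most so far.
    if random.random() < 0.3:
        action = random.choice(actions)
    else:
        action = max(estimated_value, key=estimated_value.get)
    reward = simulated_human_feedback(action)      # feedback is the reward
    counts[action] += 1
    estimated_value[action] += (reward - estimated_value[action]) / counts[action]

print(estimated_value)   # disliked actions end up with lower estimated value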

As a psychologist, I am intrigued, but I understand that this is actually very hard. Are we humans even capable of conveying the right information about our preferences? Do we even have access to them ourselves or is this all happening in some sort of subconscious level? Sometimes knowing what we want is really hard. How do we even choose between our own competing preferences? So this involves a lot more sophisticated abilities like impulse control, executive function, etc. I think that if we don’t pay attention to that as well in addition to solving the technical problem, I think we are very likely to not get it right.

Ariel: So I’m going to want to come back to this question of who should be involved and how we can get more people involved, but one of the reasons that I’m talking to the both of you today is because you actually have made some steps in broadening this discussion already in that you set up a workshop that did bring together a multidisciplinary team to talk about value alignment. I was hoping you could tell us a bit more about how that workshop went, what interesting insights were gained that might have been expressed during the workshop, what you got out of it, why you think it’s important towards the discussion? Etc.

Meia: Just to give a few facts about the workshop. The workshop took place in December 2017 in Long Beach, California. We were very lucky to have two wonderful partners in co-organizing this workshop. The Berggruen Institute and the Canadian Institute for Advanced Research. And the idea for the workshop was very much to have a very interdisciplinary conversation about value alignment and reframe it as not just a technical problem but also one that involves disciplines such as philosophy and psychology, political science and so on. So we were very lucky actually to have a fantastic group of people there representing all these disciplines. The conversation was very lively and we discussed topics all the way from near term considerations in AI and how we align AI to our goals and also all the way to thinking about AGI and even super intelligence. So it was a fascinating range both of topics discussed and also perspectives being represented.

Lucas: So my inspiration for the workshop was being really interested in ethics and the end towards which this is all going. What really is the point of creating AGI and perhaps even eventually superintelligence? What is it that is good and what is it that is valuable? Broadening from that and becoming more interested in value alignment, the conversation thus far has been primarily understood as something that is purely technical. So value alignment has only been seen as something that is for technical AI safety researchers to work on, because there are technical issues regarding AI safety and how you get AIs to do really simple things without destroying the world or ruining a million other things that we care about. But this is really, as we discussed earlier, an interdependent issue that covers issues in metaethics, normative ethics, and applied ethics. It covers issues in psychology. It covers issues in law, policy, governance, and coordination. It covers the AI arms race issue. Solving the value alignment problem and creating a future with beneficial AI is a civilizational project where we need everyone working on all these different issues: on issues of value, on issues of game theory among countries, and on the technical issues, obviously.

So what I really wanted to do was I wanted to start this workshop in order to broaden the discussion. To reframe value alignment as not just something in technical AI research but something that really needs voices from all disciplines and all expertise in order to have a really robust conversation that reflects the interdependent nature of the issue and where different sorts of expertise on the different parts of the issue can really come together and work on it.

Ariel: Is there anything specific that you can tell us about what came out of the workshop? Were there any comments that you thought were especially insightful or ideas that you think are important for people to be considering?

Lucas: I mean, I think that for me one of the takeaways from the workshop is that there’s still a mountain of work to do and that there are a ton of open questions. This is a very, very difficult issue. One thing I took away was that we couldn’t even agree on the minimal conditions under which it would be okay to safely deploy AGI. There are issues in value alignment, on both the technical side and the ethical side, that seem extremely trivial, but on which I think there is very little understanding or agreement right now.

Meia: I think the workshop was a start, and one good thing that happened during it was that the different disciplines, or rather their representatives, were able to air out their frustrations and also express their expectations of the others. I remember a quite iconic moment when one roboticist simply said, “But I really want you ethics people to just tell me what to implement in my system. What do you want my system to do?” That was very illustrative of what Lucas was saying — the need for more joint work. There were a lot of expectations from the technical people toward the ethicists, but also from the ethicists, in terms of, “What are you doing? Explain to us the actual ethical issues that you think you are facing with the things that you are building.” So I think there’s a lot of catching up to do on both sides, and there’s much work to be done in terms of making these connections and bridging the gaps.

Ariel: So you referred to this as sort of a first step or an initial step. What would you like to see happen next?

Lucas: I don’t have any concrete or specific ideas for what exactly should happen next. I think that’s a really difficult question. Certainly there are things that most people would want or expect. In the general literature and in the conversations we were having, I think that value alignment, as a word and as something that we understand, needs to be expanded outside of the technical context; I don’t think it has expanded that far yet. More ethicists, more moral psychologists, and more people in law, policy, and governance need to come in and work on this issue. I’d like to see more coordinated collaborations, specifically involving interdisciplinary crowds informing each other, identifying and addressing issues, and some sort of formal mechanism for interdisciplinary coordination on value alignment.

It would be really great if people in technical AI safety research, in ethics, and in governance could also identify all of the issues in their own fields whose resolution requires answers from other fields. For example, inverse reinforcement learning, which Meia was talking about earlier, is clearly interdependent with a ton of issues in law and also in ethics and value theory. That would be one such issue, or node, in the landscape of technical safety research that is genuinely interdisciplinary.

So I think it would be super awesome if everyone from their own respective fields are able to really identify the core issues which are interdisciplinary and able to dissect them into the constituent components and sort of divide them among the disciplines and work together on them and identify the different timelines at which different issues need to be worked on. Also, just coordinate on all those things.

Ariel: Okay. Then, Lucas, you talked a little bit about nodes and a landscape, but I don’t think we’ve explicitly pointed out that you did create a landscape of value alignment research so far. Can you talk a little bit about what that is and how people can use it?

Lucas: Yeah, for sure. With the help of other colleagues at the Future of Life Institute, like Jessica Cussins and Richard Mallah, we’ve created a value alignment conceptual landscape. It’s a really big tree, almost like an evolutionary tree, but it is a conceptual mapping of the value alignment problem. It’s broken down into the three constituent components we were talking about earlier. First, the technical issues: the issues in technically creating safe AI systems. Second, issues in ethics, broken down into metaethics, normative ethics, applied ethics, moral psychology, and descriptive ethics, where we’re trying to really understand values, what it means for something to be valuable, and the end toward which intelligence will be aimed. The last section is governance: issues in coordination, policy, and law in creating a world where AI safety research can proceed and where we don’t develop or allow a winner-take-all scenario that rushes us toward the end without a final and safe solution for fully autonomous, powerful systems.

So what the landscape does is outline all of the different conceptual nodes in each of these areas. It lays out what all the core concepts are and how they’re related. It defines the concepts and also describes how they fit into each of these different sections of ethics, governance, and technical AI safety research. The hope is that people from different disciplines can come and see the truly interdisciplinary nature of the value alignment problem, see where ethics, governance, and technical AI safety research all fit together, and see how all of this forms, I think, the essential corners of the value alignment problem. It’s also a nice way for researchers and other people to understand the concepts and the landscape of the other parts of this problem.

I think that, for example, technical AI safety researchers probably don’t know much about metaethics, and they don’t spend too much time thinking about normative ethics. I’m sure that ethicists don’t spend very much time thinking about technical value alignment, how inverse reinforcement learning is actually done, what it means to do robust human imitation in machines, or what technical mechanisms for ethics are actually going to go into AI systems. So I think this is a step toward laying out the conceptual landscape and introducing people to each other’s concepts. It’s a nice visual way of interacting with a lot of information and exploring all of these different nodes, which cover deep, profound moral issues, difficult and interesting technical issues, and issues in law, policy, and governance that are really important and quite interesting.

Ariel: So you’ve referred to this as the value alignment problem a couple of times. I’m curious, and I’d like both of you to answer this: do you see this as a problem that can be solved, or is it something that we just always keep working towards, where whatever the current general consensus is will influence how we’re designing AI and possibly AGI, but it’s never like, “Okay, now we’ve solved the value alignment problem”? Does that make sense?

Lucas: I mean, I think that sort of question really depends on your metaethics, right? If you think there are moral facts, if you think that moral statements can be true or false and aren’t just subjectively dependent upon whatever our current values and preferences historically, evolutionarily, and accidentally happen to be, then there is an end towards which intelligence can be aimed that would be objectively good and toward which we would strive. In that case, if we had solved the technical issue and the governance issue and we knew that there was a concrete end towards which we would strive that was the actual good, then the value alignment problem would be solved. But if you don’t think that there is a concrete end, a concrete good, something that is objectively valuable across all agents, then the value alignment problem, or value alignment in general, is an ongoing process and evolution.

In terms of the technical and governance sides, I think there’s nothing in the laws of physics, in computer science, or in game theory that says we can’t solve those parts of the problem. Those seem intrinsically solvable, which says nothing about how easy or hard it will be. But whether there is an end state for value alignment depends, I think, on difficult questions in metaethics, and on whether something like moral error theory is true, where all moral statements are simply false and morality is just a human invention whose questions have no real answers, or whose answers are all false. I think that’s the crux of whether value alignment can “be solved,” because the technical issues and the issues in governance are things which can, in principle, be solved.

Ariel: And Meia?

Meia: I think that regardless of whether there is an absolute end to this problem or not, there’s a lot of work that we need to do in between. I also think that in order to even achieve this end, we need more intelligence, but as we create more intelligent agents, again, this problem gets magnified. So there’s always going to be a race between the intelligence that we’re creating and making sure that it is beneficial. I think at every step of the way, the more we increase the intelligence, the more we need to think about the broader implications. I think in the end we should think of artificial intelligence also not just as a way to amplify our own intelligence but also as a way to amplify our moral competence as well. As a way to gain more answers regarding ethics and what our ultimate goals should be.

So I think that the interesting questions that we can do something about are somewhere sort of in between. We will not have the answer before we are creating AI. So we always have to figure out a way to keep up with the development of intelligence in terms of our development of moral competence.

Ariel: Meia, I want to stick with you for just a minute. When we talked for the FLI end-of-year podcast, one of the things you said you were looking forward to in 2018 was broadening this conversation. I was hoping you could talk a little bit more about some of what you would like to see happen this year in terms of getting other people involved in the conversation, and who you would like to see taking more of an interest in this.

Meia: So I think that unfortunately, especially in academia, we’ve defined our work so much around these things we call disciplines. But we are now faced with problems, especially in AI, that are really interdisciplinary; we cannot get the answers from just one discipline. So in 2018 I would actually like to see, for example, funding agencies creating funding sources for interdisciplinary projects. The way it works right now, especially in academia, is that you propose grants to granting agencies that are defined around single disciplines.

Another thing that would be wonderful to address is that our education system is also very much defined and described around these disciplines. There’s a lack of courses that teach students in technical fields about ethics, moral psychology, the social sciences and so on. The converse is also true: in the social sciences and in philosophy we hear very little about advancements in artificial intelligence, what’s new, and what problems are emerging. So I’d like to see more of that; I’d like to see more courses like this developed. A friend of mine and I have spent some time counting how many courses there are that have an interdisciplinary nature and actually talk about the societal impacts of AI, and there’s only a handful in the entire world; I think we counted about five or six of them. So there’s a shortage of that as well.

But then also educating the general public. Thinking about the societal implications of AI and the value alignment problem is probably easier for the general public to grasp than thinking about the technical aspects of how to make AI more powerful or more intelligent. So I think there’s a lot to be done in educating, funding, and simply having these conversations. I also very much admire what Lucas has been doing in creating this conceptual landscape, and I hope he will expand on it, so that we have people from different disciplines understanding each other’s terms, concepts, and theoretical frameworks. All of this is valuable and we need to start. It won’t be completely fixed in 2018, but I think it’s a good time to work towards these goals.

Ariel: Okay. Lucas, is there anything that you wanted to add about what you’d like to see happen this year?

Lucas: I mean, yeah. Nothing else to add to what I said earlier; obviously we just need as many people from as many disciplines as possible working on this issue, because it’s so important. But to go back a little bit, I really liked what Meia said about how AI systems and intelligence can help us with our ethics and with our governance. That seems like a really good way forward: as our AI systems grow more powerful and more intelligent, they may be able to inform us more about our own ethics, our own preferences and values, and our own biases, and about what sorts of values and moral systems are really conducive to the thriving of human civilization and what sorts of moralities let us navigate the space of all possible minds in a way that is truly beneficial.

So yeah. I guess I’ll be excited to see more ways in which intelligence and AI systems can be deployed for really tackling the question of what beneficial AI exactly entails. What does beneficial mean? We all want beneficial AI, but what is beneficial, what does that mean? What does that mean for us in a world in which no one can agree on what beneficial exactly entails? So yeah, I’m just excited to see how this is going to work out, how it’s going to evolve and hopefully we’ll have a lot more people joining this work on this issue.

Ariel: So your comment reminded me of a quote that I read recently that I thought was pretty interesting. I’ve been reading Paula Boddington’s book Toward a Code of Ethics for Artificial Intelligence. This was actually funded at least in part, if not completely, by FLI grants. But she says, “It’s worth pointing out that if we need AI to help us make moral decisions better, this casts doubt on the attempts to ensure humans always retain control over AI.” I’m wondering if you have any comments on that.

Lucas: Yeah. I don’t know. I think that’s a specific way of viewing the issue, or a specific way of viewing what AI systems are for and the sort of future that we want. In the end, is the best of all possible futures a world in which human beings ultimately retain full control over AI systems? I mean, if AI systems are autonomous and if value alignment actually succeeds, then I would hope that we created AI systems which are more moral than we are: AI systems which have better ethics, which are less biased, which are more rational, which are more benevolent and compassionate than we are. If value alignment succeeds and we’re able to create autonomous intelligent systems of that caliber of ethics, benevolence, and intelligence, then I’m not really sure what the point is of maintaining any sort of meaningful human control.

Meia: I agree with you, Lucas, that if we do manage to create something (and in this case I think it would have to be artificial general intelligence) that is more moral, more beneficial, more compassionate than we are, then the issue of control is probably not so important. But in the meantime, while we are still tinkering with artificial intelligence systems, I think the issue of control is very important.

Lucas: Yeah. For sure.

Meia: Because we wouldn’t want to cut ourselves out of the loop too early, before we’ve managed to properly test the system and make sure that it is indeed doing what we intended it to do.

Lucas: Right. Right. I think that process requires a lot of our own moral evolution, something we humans are really bad and slow at. As FLI’s president Max Tegmark likes to point out, there is a race between our growing wisdom and the growing power of our technology, and human beings are pretty bad at keeping our wisdom in pace with that growing power. If we look at the moral evolution of our species, we can see huge eras in which things like slavery or the subjugation of women were seen as normal, mundane, and innocuous. Today we have issues with factory farming and animal suffering, and with income inequality: tons of people living with exorbitant wealth that doesn’t really create much utility for them, while tons of other people are in poverty and still starving to death. There are all sorts of things that we can now see were obviously morally wrong in the past.

Meia: Under the present too.

Lucas: Yeah. So then we can see that obviously there must be things like that today. We wonder, “Okay, what are the sorts of things today that we see as innocuous, normal, and mundane for which the people of tomorrow, as William MacAskill says, will see us as moral monsters? How are we moral monsters today without being able to see it?” So as we create powerful intelligent systems and work on our ethics, trying to constrain the set of all possible worlds to ones which are good, valuable, and ethical, it really demands a moral evolution of ourselves, one that we have to figure out how to catalyze and move through, I think, faster.

Ariel: Thank you. So as you consider attempts to solve the value alignment problem, what are you most worried about, either in terms of us solving it badly or not quickly enough or something along those lines? What is giving you the most hope in terms of us being able to address this problem?

Lucas: I mean, I think just technically speaking, and ignoring the likelihood of this, the worst of all possible outcomes would be something like an s-risk. An s-risk is a subset of x-risks; the “s” stands for suffering risk. This is a sort of risk whereby, through some value misalignment, whether intentional or (much more likely) accidental, some seemingly astronomical amount of suffering is produced by deploying a misaligned AI system. How this could happen depends on certain assumptions in the philosophy of mind about consciousness in machines. If consciousness and experience are substrate-independent, meaning consciousness can be instantiated in machine systems (you don’t just need meat to be conscious, but rather something like integrated information, or information processing, or computation), then the invention of AI systems and superintelligence, and the spreading of intelligence that optimizes towards any sort of arbitrary end, could potentially lead to vast amounts of digital suffering. That suffering might arise accidentally, or through subroutines or simulations that are epistemically useful but involve a great amount of suffering. And because these artificial intelligent systems would be running on silicon rather than on squishy, wet human neurons, they would run at digital rather than biological timescales, hugely amplifying the speed at which that suffering is run; subjectively, a second for a simulated person on a computer might be much longer than a second for a biological person. So an s-risk, any way in which AI could be misaligned and lead to a great amount of suffering, would be something really bad, and there are a bunch of different ways it could happen.

So something like an s-risk would be super terrible, but it’s not really clear how likely that would be. Beyond that, obviously we’re worried about existential risk: ways that this could curtail or destroy the development of Earth-originating intelligent life. I think the way this would most likely happen is through the winner-take-all dynamic that you have with AI. We’ve had nuclear weapons for a very long time now, and we’re super lucky that nothing bad has happened, but human civilization is really good at getting stuck in suboptimal equilibria, locked into positions that are not easy to escape from; it’s really not easy to disarm and get out of the nuclear weapons situation once we’ve created them. Once we start to develop more powerful and robust AI systems, a race towards AGI and towards more and more powerful AI might be very, very hard to stop. If we don’t make significant progress soon, if we’re not able to get a ban on lethal autonomous weapons, and if we’re not able to introduce any real global coordination, then we may all just start racing towards more powerful systems, and that race would cut corners on safety and make an existential risk or a suffering risk more likely.

Ariel: Are you hopeful for anything?

Lucas: I mean, yeah. If we get it right, then the next billion years can be super amazing, right? It’s just kind of hard to internalize that and think about that. It’s really hard to say I think how likely it is that we’ll succeed in any direction. But yeah, I’m hopeful that if we succeed in value alignment that the future can be unimaginably good.

Ariel: And Meia?

Meia: What’s scary to me is that it might be too easy to create intelligence, that there’s nothing in the laws of physics making it hard for us, and thus that it might happen too fast. Evolution took a long time to figure out how to make us intelligent, but that was probably just because it was optimizing for things like energy consumption and making us a certain size. So that’s scary: it’s scary that it’s happening so fast, and I’m particularly scared that it might be easy to crack general artificial intelligence. I keep asking Max, “Max, but isn’t there anything in the laws of physics that might make it tricky?” His answer, and also that of other physicists I’ve been discussing this with, is, “No, it doesn’t seem to be the case.”

Now, what makes me hopeful is that we are the ones creating this. Stuart Russell likes to give the example of a message from an alien civilization, an alien intelligence, that says, “We will be arriving in 50 years.” Then he poses the question, “What would you do to prepare for that?” But I think with artificial intelligence it’s different. It’s not arriving as a given, in some form or shape that we cannot do anything about; we are actually creating it. That’s what makes me hopeful: if we research it right, if we think hard about what we want, and if we work hard at getting our own act together, first of all, and also at making sure that this technology is and stays beneficial, we have a good chance to succeed.

Now, there will be a lot of challenges in between, from very near-term issues like the ones Lucas mentioned, for example autonomous weapons, weaponizing our AI and giving it the right to harm and kill humans, to issues of income inequality enhanced by technological development, to, down the road, how we make sure that autonomous AI systems actually adopt our goals. But I do feel that it is important to try and important to work at it. That’s what I’m trying to do, and that’s what I hope others will join us in doing.

Ariel: All right. Well, thank you both again for joining us today.

Lucas: Thanks for having us.

Meia: Thanks for having us. This was wonderful.

Ariel: If you’re interested in learning more about the value alignment landscape that Lucas was talking about, please visit FutureofLife.org/valuealignmentmap. We’ll also link to this in the transcript for this podcast. If you enjoyed this podcast, please subscribe, give it a like, and share it on social media. We’ll be back again next month with another conversation among experts.

How to Prepare for the Malicious Use of AI

How can we forecast, prevent, and (when necessary) mitigate the harmful effects of malicious uses of AI?

This is the question posed by a 100-page report released last week, written by 26 authors from 14 institutions. The report, which is the result of a two-day workshop in Oxford, UK followed by months of research, provides a sweeping landscape of the security implications of artificial intelligence.

The authors, who include representatives from the Future of Humanity Institute, the Center for the Study of Existential Risk, OpenAI, and the Center for a New American Security, argue that AI is not only changing the nature and scope of existing threats, but also expanding the range of threats we will face. They are excited about many beneficial applications of AI, including the ways in which it will assist defensive capabilities. But the purpose of the report is to survey the landscape of security threats from intentionally malicious uses of AI.

“Our report focuses on ways in which people could do deliberate harm with AI,” said Seán Ó hÉigeartaigh, Executive Director of the Cambridge Centre for the Study of Existential Risk. “AI may pose new threats, or change the nature of existing threats, across cyber, physical, and political security.”

Importantly, this is not a report about a far-off future. The only technologies considered are those that are already available or that are likely to be within the next five years. The message therefore is one of urgency. We need to acknowledge the risks and take steps to manage them because the technology is advancing exponentially. As reporter Dave Gershgorn put it, “Every AI advance by the good guys is an advance for the bad guys, too.”

AI systems tend to be more efficient and more scalable than traditional tools. Additionally, the use of AI can increase the anonymity and psychological distance a person feels from the actions carried out, potentially lowering the barrier to committing crimes and acts of violence. Moreover, AI systems have their own unique vulnerabilities, including risks from data poisoning, adversarial examples, and the exploitation of flaws in their design. The report expects AI-enabled attacks to outpace traditional cyberattacks because they will generally be more effective, more finely targeted, and more difficult to attribute.
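
As a concrete illustration of the “adversarial examples” vulnerability (this example is illustrative and not drawn from the report), the fast gradient sign method perturbs an input in whichever direction most increases a model’s loss. Here it is applied to a toy logistic-regression classifier with made-up weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, -2.0, 0.5])   # hypothetical trained weights
x = np.array([1.0, -0.5, 0.4])   # a benign input whose true label is y = 1
y = 1.0

# Gradient of the cross-entropy loss with respect to the input features.
grad_x = (sigmoid(w @ x) - y) * w

# FGSM: nudge every feature by epsilon in the direction that raises the loss.
epsilon = 0.8
x_adv = x + epsilon * np.sign(grad_x)

print("clean prediction:", sigmoid(w @ x))         # ~0.94, confidently class 1
print("adversarial prediction:", sigmoid(w @ x_adv))  # ~0.38, now leans toward class 0
```

A perturbation that a human might barely notice can flip the model’s decision, which is exactly the property an attacker exploits.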

The kinds of attacks we need to prepare for are not limited to sophisticated computer hacks. The authors suggest there are three primary security domains: digital security, which largely concerns cyberattacks; physical security, which refers to carrying out attacks with drones and other physical systems; and political security, which includes examples such as surveillance, persuasion via targeted propaganda, and deception via manipulated videos. These domains have significant overlap, but the framework can be useful for identifying different types of attacks, the rationale behind them, and the range of options available to protect ourselves.

What can be done to prepare for malicious uses of AI across these domains? The authors provide many good examples. The scenarios described in the report can be a good way for researchers and policymakers to explore possible futures and brainstorm ways to manage the most critical threats. For example, imagining a commercial cleaning robot being repurposed as a non-traceable explosion device may scare us, but it also suggests why policies like robot registration requirements may be a useful option.

Each domain also has its own possible points of control and countermeasures. For example, to improve digital security, companies can promote consumer awareness and incentivize white hat hackers to find vulnerabilities in code. We may also be able to learn from the cybersecurity community and employ measures such as red teaming for AI development, formal verification in AI systems, and responsible disclosure of AI vulnerabilities. To improve physical security, policymakers may want to regulate hardware development and prohibit sales of lethal autonomous weapons. Meanwhile, media platforms may be able to minimize threats to political security by offering image and video authenticity certification, fake news detection, and encryption.

The report additionally provides four high level recommendations, which are not intended to provide specific technical or policy proposals, but rather to draw attention to areas that deserve further investigation. The recommendations are the following:

Recommendation #1: Policymakers should collaborate closely with technical researchers to investigate, prevent, and mitigate potential malicious uses of AI.

Recommendation #2: Researchers and engineers in artificial intelligence should take the dual-use nature of their work seriously, allowing misuse-related considerations to influence research priorities and norms, and proactively reaching out to relevant actors when harmful applications are foreseeable.

Recommendation #3: Best practices should be identified in research areas with more mature methods for addressing dual-use concerns, such as computer security, and imported where applicable to the case of AI.

Recommendation #4: Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges.

Finally, the report identifies several areas for further research. The first of these is to learn from and with the cybersecurity community because the impacts of cybersecurity incidents will grow as AI-based systems become more widespread and capable. Other areas of research include exploring different openness models, promoting a culture of responsibility among AI researchers, and developing technological and policy solutions.

As the authors state, “The malicious use of AI will impact how we construct and manage our digital infrastructure as well as how we design and distribute AI systems, and will likely require policy and other institutional responses.”

Although this is only the beginning of the understanding needed on how AI will impact global security, this report moves the discussion forward. It not only describes numerous emergent security concerns related to AI, but also suggests ways we can begin to prepare for those threats today.

MIRI’s February 2018 Newsletter

Updates

News and links

  • In “Adversarial Spheres,” Gilmer et al. investigate the tradeoff between test error and vulnerability to adversarial perturbations in many-dimensional spaces.
  • Recent posts on Less Wrong: Critch on “Taking AI Risk Seriously” and Ben Pace’s background model for assessing AI x-risk plans.
  • “Solving the AI Race”: GoodAI is offering prizes for proposed responses to the problem that “key stakeholders, including developers, may ignore or underestimate safety procedures, or agreements, in favor of faster utilization”.
  • The Open Philanthropy Project is hiring research analysts in AI alignment, forecasting, and strategy, along with generalist researchers and operations staff.

This newsletter was originally posted on MIRI’s website.

Optimizing AI Safety Research: An Interview With Owen Cotton-Barratt

Artificial intelligence poses a myriad of risks to humanity. From privacy concerns, to algorithmic bias and “black box” decision making, to broader questions of value alignment, recursive self-improvement, and existential risk from superintelligence — there’s no shortage of AI safety issues.  

AI safety research aims to address all of these concerns. But with limited funding and too few researchers, trade-offs in research are inevitable. In order to ensure that the AI safety community tackles the most important questions, researchers must prioritize their causes.

Owen Cotton-Barratt, along with his colleagues at the Future of Humanity Institute (FHI) and the Centre for Effective Altruism (CEA), looks at this ‘cause prioritization’ for the AI safety community. They analyze which projects are more likely to help mitigate catastrophic or existential risks from highly-advanced AI systems, especially artificial general intelligence (AGI). By modeling trade-offs between different types of research, Cotton-Barratt hopes to guide scientists toward more effective AI safety research projects.

 

Technical and Strategic Work

The first step of cause prioritization is understanding the work already being done. Broadly speaking, AI safety research happens in two domains: technical work and strategic work.

AI’s technical safety challenge is to keep machines safe and secure as they become more capable and creative. By making AI systems more predictable, more transparent, and more robustly aligned with our goals and values, we can significantly reduce the risk of harm. Technical safety work includes Stuart Russell’s research on reinforcement learning and Dan Weld’s work on explainable machine learning, since they’re improving the actual programming in AI systems.

In addition, the Machine Intelligence Research Institute (MIRI) recently released a technical safety agenda aimed at aligning machine intelligence with human interests in the long term, while OpenAI, another non-profit AI research company, is investigating the “many research problems around ensuring that modern machine learning systems operate as intended,” following suggestions from the seminal paper Concrete Problems in AI Safety.

Strategic safety work is broader, and asks how society can best prepare for and mitigate the risks of powerful AI. This research includes analyzing the political environment surrounding AI development, facilitating open dialogue between research areas, disincentivizing arms races, and learning from game theory and neuroscience about probable outcomes for AI. Yale professor Allan Dafoe has recently focused on strategic work, researching the international politics of artificial intelligence and consulting for governments, AI labs and nonprofits about AI risks. And Yale bioethicist Wendell Wallach, apart from his work on “silo busting,” is researching forms of global governance for AI.

Cause prioritization is strategy work, as well. Cotton-Barratt explains, “Strategy work includes analyzing the safety landscape itself and considering what kind of work do we think we’re going to have lots of, what are we going to have less of, and therefore helping us steer resources and be more targeted in our work.”

[Figure: annual AI safety funding, showing significant growth since 2015]

Who Needs More Funding?

As the graph above illustrates, AI safety spending has grown significantly since 2015. And while more money doesn’t always translate into improved results, funding patterns are easy to assess and can say a lot about research priorities. Seb Farquhar, Cotton-Barratt’s colleague at CEA, wrote a post earlier this year analyzing AI safety funding and suggesting ways to better allocate future investments.

To start, he suggests that the technical research community bring on more principal investigators to carry forward the research agenda detailed in Concrete Problems in AI Safety. OpenAI is already taking a lead on this. Additionally, the community should go out of its way to ensure that emerging AI safety centers hire the best candidates, since these researchers will shape each center’s success for years to come.

In general, Farquhar notes that strategy, outreach and policy work haven’t kept up with the overall growth of AI safety research. He suggests that more people focus on improving communication about long-run strategies between AI safety research teams, between the AI safety community and the broader AI community, and between policymakers and researchers. Building more PhD and Masters courses on AI strategy and policy could establish a pipeline to fill this void, he adds.

To complement Farquhar’s data, Cotton-Barratt’s colleague Max Dalton created a mathematical model to track how more funding and more people working on a safety problem translate into useful progress or solutions. The model tries to answer such questions as: if we want to reduce AI’s existential risks, how much of an effect do we get by investing money in strategy research versus technical research?

In general, technical research is easier to track than strategic work in mathematical models. For example, spending more on strategic ethics research may be vital for AI safety, but it’s difficult to quantify that impact. Improving models of reinforcement learning, however, can produce safer and more robustly aligned machines. With clearer feedback loops, these technical projects fit best with Dalton’s models.

 

Near-sightedness and AGI

But these models also confront major uncertainty. No one really knows when AGI will be developed, and this makes it difficult to determine the most important research. If AGI will be developed in five years, perhaps researchers should focus only on the most essential safety work, such as improving transparency in AI systems. But if we have thirty years, researchers can probably afford to dive into more theoretical work.

Moreover, no one really knows how AGI will function. Machine learning and deep neural networks have ushered in a new AI revolution, but AGI will likely be developed on architectures far different from AlphaGo and Watson.

This makes some long-term safety research a risky investment, even if, as many argue, it is the most important research we can do. For example, researchers could spend years making deep neural nets safe and transparent, only to find their work wasted when AGI develops on an entirely different programming architecture.

Cotton-Barratt attributes this issue to ‘nearsightedness,’ and discussed it in a recent talk at Effective Altruism Global this summer. Humans often can’t anticipate disruptive change, and AI researchers are no exception.

“Work that we might do for long-term scenarios might turn out to be completely confused because we weren’t thinking of the right type of things,” he explains. “We have more leverage over the near-term scenarios because we’re more able to assess what they’re going to look like.”

Any additional AI safety research is better than none, but given the unknown timelines and the potential gravity of AI’s threats to humanity, we’re better off pursuing — to the extent possible — the most effective AI safety research.

By helping the AI research portfolio advance in a more efficient and comprehensive direction, Cotton-Barratt and his colleagues hope to ensure that when machines eventually outsmart us, we will have asked — and hopefully answered — the right questions.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project. If you’re interested in applying for our 2018 grants competition, please see this link.

As Acidification Increases, Ocean Biodiversity May Decline

Dubbed “the evil twin of global warming,” ocean acidification is a growing crisis that poses a threat to both water-dwelling species and human communities that rely on the ocean for food and livelihood.

Since pre-industrial times, the ocean’s pH has dropped from 8.2 to 8.1—a change that may seem insignificant, but actually represents roughly a 30 percent increase in acidity. As the threat continues to mount, the German research project BIOACID (Biological Impacts of Ocean Acidification) seeks to provide a better understanding of the phenomenon by studying its effects around the world.
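
The reason such a small-looking shift matters is that pH is the negative base-10 logarithm of the hydrogen ion concentration, so the rounded figures above imply:

```latex
\frac{[\mathrm{H^+}]_{\,\mathrm{pH}\,8.1}}{[\mathrm{H^+}]_{\,\mathrm{pH}\,8.2}}
  = \frac{10^{-8.1}}{10^{-8.2}} = 10^{0.1} \approx 1.26
```

With these rounded pH values the increase works out to roughly 26 percent; the commonly quoted figure of about 30 percent reflects slightly more precise estimates of the pre-industrial and present-day pH.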

BIOACID began in 2009, and since that time, over 250 German researchers have contributed more than 580 publications to the scientific discourse on the effects of acidification and how the oceans are changing.

The organization recently released a report that synthesizes their most notable findings for climate negotiators and decision makers. Their work explores “how different marine species respond to ocean acidification, how these reactions impact the food web as well as material cycles and energy turnover in the ocean, and what consequences these changes have for economy and society.”

Field research for the project has spanned multiple oceans, where key species and communities have been studied under natural conditions. In the laboratory, researchers have also been able to test for coming changes by exposing organisms to simulated future conditions.

Their results indicate that acidification is only one part of a larger problem. While organisms might be capable of adapting to the shift in pH, acidification is typically accompanied by other environmental stressors that make adaptation all the more difficult.

In some cases, marine life that had been able to withstand acidification by itself could not tolerate the additional stress of increased water temperatures, researchers found. Other factors like pollution and eutrophication—an excess of nutrients—compounded the harm.

Further, rising water temperatures are forcing many species to abandon part or all of their original habitats, wreaking additional havoc on ecosystems. And a 1.2 degree increase in global temperature—which is significantly under the 2 degree limit set in the Paris Climate Agreement—is expected to kill at least half of the world’s tropical coral reefs.

Acidification itself is a multipronged threat. When carbon dioxide is absorbed by the ocean, a series of chemical reactions take place. These reactions have two important outcomes: acid levels increase and the compound carbonate is transformed into bicarbonate. Both of these results have widespread effects on the organisms who make their homes in our oceans.
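
In outline (a standard textbook summary rather than a quotation from the report), the chemistry is:

```latex
\mathrm{CO_2 + H_2O \;\rightleftharpoons\; H_2CO_3 \;\rightleftharpoons\; H^+ + HCO_3^-}
\qquad
\mathrm{CO_3^{2-} + H^+ \;\rightleftharpoons\; HCO_3^-}
```

Dissolved carbon dioxide forms carbonic acid, which releases hydrogen ions; those extra hydrogen ions then combine with carbonate to form bicarbonate, which is why acidity rises while carbonate becomes scarcer.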

Increased acidity has a particularly harmful effect on organisms in their early life stages, such as fish larvae. This means, among other things, the depletion of fish stocks—a cornerstone of the economy as well as the diet in many human communities. Researchers have found that acidification and warming act synergistically, with the most sensitive early life stages, including embryo and larval survival, hit hardest.

Many species are harmed as well by the falling levels of carbonate, which is an essential building block for organisms like coral, mussels, and some plankton. Like all calcifying corals, the cold-water coral species Lophelia pertusa builds its skeleton from calcium carbonate. Some research suggests that acidification threatens both to slow its growth and to corrode the dead branches that are no longer protected by organic matter.

As a “reef engineer,” Lophelia is home to countless species; as it suffers, so will they. The BIOACID report warns: “To definitely preserve the magnificent oases of biodiversity founded by Lophelia pertusa, effects of climate change need to be minimised even now–while science continues to investigate this complex marine ecosystem.”

Even those organisms not directly affected by acidification may find themselves in trouble as their ecosystems are thrown out of balance. Small changes at the bottom of the food web, for example, may have big effects at higher trophic levels. In the Arctic, Limacina helicina—a tiny swimming snail or “sea butterfly”—is a major source of food for many marine animals. The polar cod species Boreogadus saida, which feeds on Limacina, is a key food source for larger fish, birds, and mammals such as whales and seals.

As acidification increases, research suggests that Limacina’s nutritional value will decrease as its metabolism and shell growth are affected; its numbers, too, will likely drop. With the disappearance of this prey, the polar cod will likely suffer. Diminishing cod populations will in turn affect the many predators who feed on them.

Even where acidification stands to benefit a particular species, the overall impact on the ecosystem can be negative. In the Baltic Sea, BIOACID scientists have found that Nodularia spumigena, a species of cyanobacteria, “manages perfectly with water temperatures above 16 degrees Celsius and elevated carbon dioxide concentrations–whereas other organisms already reach their limits at less warming.”

Nodularia becomes more productive under acidified conditions, producing bacterial “blooms” that can extend upwards of 60,000 square kilometers in the Baltic Sea. These blooms block light from other organisms, and as dead bacteria degrade near the ocean floor they take up precious oxygen. The cells also release toxins that are harmful to marine animals and humans alike.

Ultimately biodiversity, “a basic requirement for ecosystem functioning and ultimately even human wellbeing,” will be lost. Damage to tropical coral reefs, which are home to one quarter of all marine species, could drastically reduce the ocean’s biodiversity. And as biodiversity decreases, an ecosystem becomes more fragile: ecological functions that were once performed by several different species become entirely dependent on only one.

And the diversity of marine ecosystems is not the only thing at stake. Currently, the ocean plays a major mitigating role in global warming, absorbing around 30 percent of the carbon dioxide emitted by humans. It also absorbs over 90 percent of the heat produced by the greenhouse effect. But as acidification continues, the ocean will take up less and less carbon dioxide—meaning we may see an increase in the rate of global warming.

The ocean controls carbon dioxide uptake in part through a biological mechanism known as the carbon pump. Normally, phytoplankton near the ocean’s surface take up carbon dioxide and then sink towards the ocean floor. This process lowers surface carbon dioxide concentrations, facilitating its uptake from the atmosphere.

But acidification weakens this biological carbon pump. Researchers have found that acidified conditions favor smaller types of phytoplankton, which sink more slowly. In addition, heavier calcifying plankton—which typically propel the pump by sinking more quickly—will have increasing difficulty forming their weighty calcium carbonate shells. As the pump’s efficiency decreases, so will the uptake of carbon dioxide from the air.

The BIOACID report stresses that the risks of acidification remain largely uncertain. However, despite — or perhaps because of — this uncertainty, society must treat the oceans with caution and care. The report explains, “Following the precautionary principle is the best way to act when considering potential risks to the environment and humankind, including future generations.”

Transparent and Interpretable AI: an interview with Percy Liang

At the end of 2017, the United States House of Representatives passed a bill called the SELF DRIVE Act, laying out an initial federal framework for autonomous vehicle regulation. Autonomous cars have been undergoing testing on public roads for almost two decades. With the passing of this bill, along with the increasing safety benefits of autonomous vehicles, it is likely that they will become even more prevalent in our daily lives. This is true for numerous autonomous technologies including those in the medical, legal, and safety fields – just to name a few.

To that end, researchers, developers, and users alike must be able to have confidence in these types of technologies that rely heavily on artificial intelligence (AI). This extends beyond autonomous vehicles, applying to everything from security devices in your smart home to the personal assistant in your phone.

 

Predictability in Machine Learning

Percy Liang, Assistant Professor of Computer Science at Stanford University, explains that humans rely on some degree of predictability in their day-to-day interactions — both with other humans and automated systems (including, but not limited to, their cars). One way to create this predictability is by taking advantage of machine learning.

Machine learning deals with algorithms that allow an AI to “learn” based on data gathered from previous experiences. Developers do not need to write code that dictates each and every action or intention for the AI. Instead, the system recognizes patterns from its experiences and assumes the appropriate action based on that data. It is akin to the process of trial and error.

A key question often asked of machine learning systems in the research and testing environment is, “Why did the system make this prediction?” About this search for intention, Liang explains:

“If you’re crossing the road and a car comes toward you, you have a model of what the other human driver is going to do. But if the car is controlled by an AI, how should humans know how to behave?”

It is important to see that a system is performing well, but perhaps even more important is its ability to explain in easily understandable terms why it acted the way it did. Even if the system is not accurate, it must be explainable and predictable. For AI to be safely deployed, systems must rely on well-understood, realistic, and testable assumptions.

Current theories that explore the idea of reliable AI focus on fitting the observable outputs in the training data. However, as Liang explains, this could lead “to an autonomous driving system that performs well on validation tests but does not understand the human values underlying the desired outputs.”

Running multiple tests is important, of course. These types of simulations, explains Liang, “are good for debugging techniques — they allow us to more easily perform controlled experiments, and they allow for faster iteration.”

However, to really know whether a technique is effective, “there is no substitute for applying it to real life,” says Liang. “This goes for language, vision, and robotics.” An autonomous vehicle may perform well in all testing conditions, but there is no way to accurately predict how it could perform in an unpredictable natural disaster.

 

Interpretable ML Systems

The best-performing models in many domains — e.g., deep neural networks for image and speech recognition — are quite complex. These are considered “black-box models,” and their predictions can be difficult, if not impossible, to explain.

Liang and his team are working to interpret these models by researching how a particular training situation leads to a prediction. As Liang explains, “Machine learning algorithms take training data and produce a model, which is used to predict on new inputs.”

This type of observation becomes increasingly important as AIs take on more complex tasks – think life or death situations, such as interpreting medical diagnoses. “If the training data has outliers or adversarially generated data,” says Liang, “this will affect (corrupt) the model, which will in turn cause predictions on new inputs to be possibly wrong.  Influence functions allow you to track precisely the way that a single training point would affect the prediction on a particular new input.”
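
To make the bookkeeping concrete, here is a minimal sketch of an influence-function calculation in the spirit of Koh and Liang’s 2017 paper, written for ordinary least squares so the gradients and Hessian have simple closed forms. The data and names are illustrative, not taken from the team’s actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=n)

# Empirical risk minimizer for squared loss (ordinary least squares).
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

def grad_loss(x, y_val, theta):
    # Gradient of 0.5 * (x . theta - y)^2 with respect to theta.
    return (x @ theta - y_val) * x

# Hessian of the average training loss, with small damping for invertibility.
H = X.T @ X / n + 1e-6 * np.eye(d)

def influence_on_test_loss(i, x_test, y_test):
    # -grad L(z_test)^T H^{-1} grad L(z_i): the approximate effect of
    # upweighting training point i on the loss at the test point.
    return -grad_loss(x_test, y_test, theta_hat) @ np.linalg.solve(
        H, grad_loss(X[i], y[i], theta_hat))

x_test, y_test = rng.normal(size=d), 0.5
scores = np.array([influence_on_test_loss(i, x_test, y_test) for i in range(n)])
print("Most influential training points:", np.argsort(-np.abs(scores))[:5])
```

For deep networks the Hessian cannot be formed explicitly, so practical implementations rely on approximations, but the idea is the same: trace a single prediction back to the training points that most shaped it.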

Essentially, by understanding why a model makes the decisions it makes, Liang’s team hopes to improve how models function, discover new science, and provide end users with explanations of actions that impact them.

Another aspect of Liang’s research is ensuring that an AI understands, and is able to communicate, its limits to humans. The conventional metric for success, he explains, is average accuracy, “which is not a good interface for AI safety.” He posits, “What is one to do with an 80 percent reliable system?”

Liang is not looking for the system to have an accurate answer 100 percent of the time. Instead, he wants the system to be able to admit when it does not know an answer. If a user asks a system “How many painkillers should I take?” it is better for the system to say, “I don’t know” rather than making a costly or dangerous incorrect prediction.
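
One generic way to operationalize “I don’t know” (a simple selective-prediction pattern, not a description of Liang’s own system) is to let a classifier abstain whenever its top predicted probability falls below a threshold:

```python
import numpy as np

def predict_or_abstain(probs, threshold=0.9):
    # probs: the model's class probabilities for one input.
    top = int(np.argmax(probs))
    if probs[top] < threshold:
        return "I don't know"
    return top

print(predict_or_abstain(np.array([0.97, 0.03])))  # confident -> returns class 0
print(predict_or_abstain(np.array([0.55, 0.45])))  # uncertain -> abstains
```

Raising the threshold trades coverage for reliability: the system answers fewer queries, but is more accurate on the ones it does answer.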

Liang’s team is working on this challenge by tracking a model’s predictions through its learning algorithm — all the way back to the training data where the model parameters originated.

Liang’s team hopes that this approach — of looking at the model through the lens of the training data — will become a standard part of the toolkit of developing, understanding, and diagnosing machine learning. He explains that researchers could relate this to many applications: medical, computer, natural language understanding systems, and various business analytics applications.

“I think,” Liang concludes, “there is some confusion about the role of simulations. Some eschew it entirely and some are happy doing everything in simulation. Perhaps we need to change culturally to have a place for both.”

In this way, Liang and his team plan to lay a framework for a new generation of machine learning algorithms that work reliably, fail gracefully, and reduce risks.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project. If you’re interested in applying for our 2018 grants competition, please see this link.

AI Open Letter German