
Podcast: What Are the Odds of Nuclear War? A Conversation With Seth Baum and Robert de Neufville

What are the odds of a nuclear war happening this century? And how close have we been to nuclear war in the past? Few academics focus on the probability of nuclear war, but many leading voices, like former US Secretary of Defense William Perry, argue that the threat of nuclear conflict is growing.

On this month’s podcast, Ariel spoke with Seth Baum and Robert de Neufville from the Global Catastrophic Risk Institute (GCRI), who recently coauthored a report titled A Model for the Probability of Nuclear War. The report examines 60 historical incidents that could have escalated to nuclear war and presents a model for determining the odds that we could have some type of nuclear war in the future.

Topics discussed in this episode include:

  • the most hair-raising nuclear close calls in history
  • whether we face a greater risk from accidental or intentional nuclear war
  • China’s secrecy vs the United States’ transparency about nuclear weapons
  • Robert’s first-hand experience with the false missile alert in Hawaii
  • and how researchers can help us understand nuclear war and craft better policy

Links you might be interested in after listening to the podcast:

You can listen to this podcast above or read the transcript below.


Ariel: Hello, I’m Ariel Conn with the Future of Life Institute. If you’ve been listening to our previous podcasts, welcome back. If this is new for you, also welcome, but in any case, please take a moment to follow us, like the podcast, and maybe even share the podcast.

Today, I am excited to present Seth Baum and Robert de Neufville with the Global Catastrophic Risk Institute (GCRI). Seth is the Executive Director and Robert is the Director of Communications; he is also a superforecaster. They have recently written a report called A Model for the Probability of Nuclear War. This was a really interesting paper that looks at 60 historical incidents that could have escalated to nuclear war, and it basically presents a model for how we can determine what the odds are that we could have some type of nuclear war in the future. So, Seth and Robert, thank you so much for joining us today.

Seth: Thanks for having me.

Robert: Thanks, Ariel.

Ariel: Okay, so before we get too far into this, I was hoping that one or both of you could just talk a little bit about what the paper is and what prompted you to do this research, and then we’ll go into more specifics about the paper itself.

Seth: Sure, I can talk about that a little bit. So the paper is a broad overview of the probability of nuclear war, and it has three main parts. One is a detailed background on how to think about the probability, explaining differences between the concept of probability versus the concept of frequency and related background in probability theory that’s relevant for thinking about nuclear war. Then there is a model that scans across a wide range, maybe the entire range, but at least a very wide range of scenarios that could end up in nuclear war. And then finally, there is a data set of historical incidents that at least had some potential to lead to nuclear war, and those incidents are organized in terms of the scenarios that are in the model. The historical incidents give us at least some indication of how likely each of those scenario types may be.

Ariel: Okay. At the very, very start of the paper, you guys say that nuclear war doesn’t get enough scholarly attention, and so I was wondering if you could explain why that’s the case and what role this type of risk analysis can play in nuclear weapons policy.

Seth: Sure, I can talk to that. The paper, I believe, specifically says that the probability of nuclear war does not get much scholarly attention. In fact, we put a fair bit of time into trying to find every previous study that we could, and there was really, really little that we were able to find, and maybe we missed a few things, but my guess is that this is just about all that’s out there and it’s really not very much at all. We can only speculate on why there has not been more research of this type, my best guess is that the people who have studied nuclear war — and there’s a much larger literature on other aspects of nuclear war — they just do not approach it from a risk perspective as we do, that they are inclined to think about nuclear war from other perspectives and focus on other aspects of it.

So the intersection of people who are both interested in studying nuclear war and tend to think in quantitative risk terms is a relatively small population of scholars, which is why there’s been so little research, is at least my best guess.

Robert: Yeah, it’s a really interesting question. I think that the tendency has been to think about it strategically, as something we have control over: somebody makes a choice to push a button or not, and that makes sense from some perspective. I think there’s also a way in which we want to think of it as something unthinkable. There hasn’t been a nuclear detonation in a long time and we hope that there will never be another one, but I think that it’s important to think about it this way so that we can find the ways that we can mitigate the risk. I think that’s something that’s been neglected.

Seth: Just one quick clarification, there have been very recent nuclear detonations, but those have all been test detonations, not detonations in conflict.

Robert: Fair enough. Right, not a use in anger.

Ariel: That actually brings up a question that I have. As you guys point out in the paper, we’ve had one nuclear war and that was World War II, so we essentially have one data point. How do you address probability with so little actual data?

Seth: I would say “carefully,” and this is why the paper itself is very cautious with respect to quantification. We don’t actually include any numbers for the probability of nuclear war in this paper.

The easy thing to do for calculating probabilities is when you have a large data set of that type of event. If you want to calculate the probability of dying in a car crash, for example, there’s lots of data on that because it’s something that happens with a fairly high frequency. For nuclear war, there’s just one data point, World War II, and it was under circumstances that are very different from what we have right now. Maybe there would be another world war, but no two world wars are the same. So we have to, instead, look at all the different types of evidence that we can bring in to get some understanding for how nuclear war could occur, which includes evidence about the process of going from calm into periods of tension, to the thought of going to nuclear war, all the way to the actual decision to initiate nuclear war. And then also look at a wider set of historical data, which is something we did in this paper, looking at incidents that did not end up as nuclear wars, but pushed at least a little bit in that direction, to see what we can learn about how likely it is for things to go in the direction of nuclear war, which tells us at least something about how likely it is to get there all the way.

Ariel: Robert, I wanted to turn to you on that note, you were the person who did a lot of work figuring out what these 60 historical events were. How did you choose them?

Robert: Well, I wouldn’t really say I chose them, I tried to just find every event that was there. There are a few things that we left out because we thought they fell below some threshold of seriousness, but in theory you could probably expand the scope even a little wider than we did. But to some extent we just looked at what’s publicly known. I think the data set is really valuable, I hope it’s valuable, but one of the issues with it is that it’s kind of a convenience sample of the things that we know about, and some areas, some parts of history, are much better reported on than others. For example, we know a lot about the Cuban Missile Crisis in the 1960s, a lot of research has been done on that, and there are times when the US government has been fairly transparent about incidents, but we know less about other periods and other countries as well. We don’t have incidents from China’s nuclear program, but that doesn’t mean there weren’t any, it just means it’s hard to figure out, and that’s an area that would be really interesting to do more research on.

Ariel: So, what was the threshold you were looking at to say, “Okay, I think this could have gone nuclear”?

Robert: Yeah, that’s a really good question. It’s somewhat hard to say. I think that a lot of these things are judgment calls. If you look at the history of incidents, I think a number of them have been blown a little bit out of proportion. As they’ve been retold, people like to say we came close to nuclear war, and that’s not always true. There are other incidents which are genuinely hair-raising, and then there are some incidents that seem very minor, where you could say maybe it could have gotten to a nuclear war. Say there was some safety incident on an Air Force base and they didn’t follow procedures; you could maybe tell yourself a story in which that led to a nuclear war, but at some point you make a judgment call and say, well, that doesn’t seem like a serious issue.

But it wasn’t like we have a really clear, well-defined line. In some ways, we’d like to broaden the data set so that we can include even smaller incidents just because the more incidents, the better as far as understanding, not the more incidents the better as far as being safe.

Ariel: Right. I’d like this question to go to both of you, as you were looking through these historical events, you mentioned that they were already public records so they’re not new per se, but were there any that surprised you, and which were one or two that you found the most hair-raising?

Robert: Well, I would say one that surprised me, and this may just be because of my ignorance of certain parts of geopolitical history, but there was an incident with the USS Liberty in the Mediterranean, in which the Israelis mistook it for an Egyptian destroyer and decided to take it out, essentially, not realizing it was actually an American research vessel, and they did, and what happened was the US scrambled planes to respond. The problem was that most of the planes they would have ordinarily scrambled were out on some other sorties, some exercise, something like that, and they ended up scrambling planes which had a nuclear payload on them. These planes were recalled pretty quickly. They mentioned this to Washington and the Secretary of Defense got on the line and said, “No, recall those planes,” so it didn’t get that far necessarily, but I found it a really shocking incident because it was a friendly fire confusion, essentially. And there were a number of cases like that, in which nuclear weapons were involved because they happened to be on equipment that shouldn’t have been carrying them and that was used to respond to some kind of real or false emergency. That seems like a bigger issue than I would’ve at first expected, just the fact that nuclear weapons are lying around somewhere where they could get drawn into something.

Ariel: Wow, okay. And Seth?

Seth: Yeah. For me this was a really eye-opening experience. I had some familiarity with the history of incidents involving nuclear weapons, but there turned out to be much more that’s gone on over the years than I really had any sense for. Some of it is because I’m not a historian, this is not my specialty, but there were any number of events in which it appears that nuclear weapons were, or at least may have been, seriously considered for use in a conflict.

Just to pick one example, 1954 and 1955 saw what is known as the first Taiwan Straits Crisis, and the second crisis, by the way, in 1958, also included plans for nuclear weapons use. But in the first one there were plans made up by the United States; the Joint Chiefs of Staff allegedly recommended that nuclear weapons be used against China if the conflict intensified, and President Eisenhower was apparently pretty receptive to this idea. In the end, there was a ceasefire negotiated so it didn’t come to that, but had that ceasefire not been made, my sense is that … The historical record is not clear on whether the US would’ve used nuclear weapons or not, maybe even the US leadership hadn’t made any final decisions on this matter, but there were any number of these events, especially in the first years or decades after World War II when nuclear weapons were still relatively new, in which the use of nuclear weapons in conflict seemed to at least get a level of serious consideration that I might not have expected.

I’m accustomed to thinking of nuclear weapons as having a fairly substantial taboo attached to them, but I feel like the taboo has perhaps strengthened over the years, such that leadership now is less inclined to give the use of nuclear weapons serious consideration than it was back then. That may be mistaken, but that’s the impression that I get, and we may be more fortunate than we realize to have gotten through the first couple of decades after World War II without an additional nuclear war. It might be less likely at this time, though still not entirely impossible by any means.

Ariel: Are you saying that you think the risk is higher now?

Seth: I think the risk is probably higher now. I think I would probably say that the risk is higher now than it was, say, 10 years ago because various relations between nuclear armed states have gotten worse, certainly including between the United States and Russia, but whether the probability of nuclear war is higher now versus in, say, the ’50s or the ’60s, that’s much harder to say. That’s a degree of detail that I don’t think we can really comment on conclusively based on the research that we have at this point.

Ariel: Okay. In a little while I’m going to want to come back to current events and ask about that, but before I do that I want to touch first on the model itself, which lists four steps to a potential nuclear war: initiating the event, crisis, nuclear weapon use and full-scale nuclear war. Could you talk about what each of those four steps might be? And then I’m going to have follow-up questions about that next.

Seth: I can say a little bit about that. The model you’re describing is a model that was used by our colleague, Martin Hellman, in a paper that he did on the probability of nuclear war, and that was probably the first paper to develop the study of the probability of nuclear war using the sort of methodology that we use in this paper, which is to develop nuclear war scenarios.

So the four steps in this model are four steps to go from a period of calm into a full-scale nuclear war. His paper was looking at the probability of nuclear war based on an event that is similar to the Cuban Missile Crisis, and what’s distinctive about the Cuban Missile Crisis is we may have come close to going directly to nuclear war without any other type of conflict in the first place. So that’s where the initiating event and the crisis in this model come from; it’s this idea that there will be some type of event that leads to a crisis, and the crisis will go straight to nuclear weapons use, which could then escalate to a full-scale nuclear war. The value of breaking it into those four steps is that you can then look at each step in turn, think through the conditions for each of them to occur and maybe the probability of going from one step to the next, which you can use to evaluate the overall probability of that type of nuclear war. That’s for one specific type of nuclear war. Our paper then tries to scan across the full range of different types of nuclear war, different nuclear war scenarios, and put that all into one broader model.
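As a rough illustration of how a stepwise model like this can be used, here is a minimal sketch that multiplies hypothetical conditional probabilities for each of the four steps to get an annual probability for one scenario type. All of the numbers are placeholders invented for illustration, not estimates from the paper or the interview.

```python
# Toy illustration of a stepwise (Hellman-style) nuclear war model.
# All probabilities below are invented placeholders, not estimates from the paper.

steps = {
    "initiating_event_per_year": 0.20,  # P(an initiating event occurs in a given year)
    "crisis_given_event": 0.10,         # P(the event escalates into a crisis)
    "use_given_crisis": 0.05,           # P(a nuclear weapon is used during the crisis)
    "full_scale_given_use": 0.30,       # P(initial use escalates to full-scale war)
}

def chain_probability(conditional_probs):
    """Multiply the conditional probabilities along one scenario chain."""
    p = 1.0
    for prob in conditional_probs.values():
        p *= prob
    return p

p_full_scale_per_year = chain_probability(steps)
print(f"Illustrative annual probability of full-scale war: {p_full_scale_per_year:.6f}")
```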

Ariel: Okay. Yeah, your paper talks about 14 scenarios, correct?

Seth: That’s correct, yes.

Ariel: Okay, yeah. So I guess I have two questions for you: one, how did you come up with these 14 scenarios, and are there maybe a couple that you think are most worrisome?

Seth: So the first question we can definitely answer, we came up with them through our read of the nuclear war literature and our overall understanding of the risk and then iterating as we put the model together, thinking through what makes the most sense for how to organize the different types of nuclear war scenarios, and through that process, that’s how we ended up with this model.

As far as which ones seem to be the most worrisome, I would say a big question is whether we should be more worried about intentional versus accidental, or inadvertent nuclear war. I feel like I still don’t actually have a good answer to that question. Basically, should we be more worried about nuclear war that happens when a nuclear armed country decides to go ahead and start that nuclear war versus one where there’s some type of accident or error, like a false alarm or the detonation of a nuclear weapon that was not intended to be an act of war? I still feel like I don’t have a good sense for that.

Maybe the one thing I do feel is that it seems less likely that we would end up in a nuclear war from a detonation of a nuclear weapon that was not intentionally an act of war just because it feels to me like those events are less likely to happen. This would be nuclear terrorism or the accidental detonation of nuclear weapons, and even if it did happen it’s relatively likely that they would be correctly diagnosed as not being an act of war. I’m not certain of this. I can think of some reasons why maybe we should be worried about that type of scenario, but especially looking at the historical data it felt like those historical incidents were a bit more of a stretch, a bit further away from actually ending up in nuclear war.

Robert, I’m actually curious, your reaction to that, if you agree or disagree with that.

Robert: Well, I don’t think that non-state actors using a nuclear weapon is the big risk right now. But as far as whether it’s more likely that we’re going to get into a nuclear war through some kind of human error or a technological mistake, or whether it will be a deliberate act of war, I can think of scary things that have happened on both sides. I mean, the major thing that looms in one’s mind when you think about this is the Cuban Missile Crisis, and that’s an example of a crisis in which there were a lot of incidents during the course of that crisis where you think, well, this could’ve gone really badly, this could’ve gone the other way. So a crisis like that where tensions escalate and each country, or in this case the US and Russia, each thought the other might seriously threaten the homeland, I think are very scary.

On the other hand, there are incidents like the 1995 Norwegian rocket incident, which I find fairly alarming. In that incident, what happened was Norway was launching a scientific research rocket for studying the weather and had informed Russia that they were going to do this, but somehow that message hadn’t gotten passed along to the radar technicians, so the radar technicians saw what looked like a submarine-launched ballistic missile that could have been used to do an EMP burst over Russia, which would then maybe take out radar and could be the first move in a full-scale attack. So this is scary because this got passed up the chain and supposedly, President Boris Yeltsin, it was Yeltsin at the time, actually activated the nuclear football in case he needed to authorize a response.

Now, we don’t really have a great sense of how close anyone came to this, there’s a little hyperbole after the fact, but this kind of thing seems like it could get there. And 1995 wasn’t a time of big tension between the US and Russia, so this kind of thing is also pretty scary, and I don’t really know, I think that which risk you would find scarier depends a little bit on the current geopolitical climate. Right now, I might be most worried that the US would launch a bloody-nose attack against North Korea and North Korea would respond with a nuclear weapon, so it depends a little bit. I don’t know the answer either, I guess, is my answer.

Ariel: Okay. You guys brought up a whole bunch of things that I had planned to ask about, which is good. I mean, one of my questions had been are you more worried about intentional or accidental nuclear war, and I guess the short answer is, you don’t know? Is that fair to say?

Seth: Yeah, that’s pretty fair to say. The short answer is, at least at this time, they both seem very much worth worrying about.

As far as which one we should be more worried about, this is actually a very important detail to try to resolve for policy purposes because this speaks directly to how we should manage our nuclear weapons. For example, if we are especially worried about accidental or inadvertent nuclear war, then we should keep nuclear weapons on a relatively low launch posture. They should not be on hair-trigger alert because when things are on a high-alert status, it takes relatively little for the nuclear weapons to be launched and makes it easier for a mistake to lead to a launch. Versus if we are more worried about intentional nuclear war, then there may be some value to having them on a high-alert status in order to have a more effective deterrence in order to convince the other side to not launch their nuclear weapons. So this is an important matter to try resolving, but at this point, based on the research that we have so far, it remains, I think, somewhat ambiguous.

Ariel: I do want to follow up with that. From everything I’ve read, there doesn’t seem to be any real benefit to having things like our intercontinental ballistic missiles on hair-trigger alert, which are the ones that are on hair-trigger alert, as I understand it, because the submarines and the bombers still have the capability to strike back. Do you disagree with that?

Seth: I can’t say for sure whether or not I do disagree with that because it’s not something that I have looked at closely enough, so I would hesitate to comment on that matter. My general understanding is that hair-trigger alert is used as a means to enhance deterrence in order to make it less likely that either side would use their nuclear weapons in the first place, but regarding the specifics of it, that’s not something that I’ve personally looked at closely enough to really be able to comment on.

Robert: I think Seth’s right that it’s a question that needs more research in a lot of ways and that we shouldn’t answer it in the context of… We didn’t figure out the answer to that in this paper. I will say, I would personally sleep better if they weren’t on hair-trigger alert. My suspicion is that the big risk is not that one side launches some kind of decapitating first strike, I don’t think that’s really a very high risk, so I’m not as concerned as someone else might be about how well we need to deter that, how quickly we need to be able to respond. Whereas, I am very concerned about the possibility of an accident because… I mean, reading these incidents will make you concerned about it, I think. Some of them are really frightening. So that’s my intuition, but, as Seth says, I don’t think we really know. There’s more, at least in terms of this model, there’s more studying we need to do.

Seth: If I may, to go back to one of your earlier questions regarding motivations for doing this research in the first place, part of it is the wish to give more rigorous answers to some of these very basic nuclear weapons policy questions, like “should nuclear weapons be on hair-trigger alert, is that safer or more dangerous?” Right now we can talk a little bit about what the trade-offs might be, but we don’t really have much to say about how that trade-off actually would be resolved. This is where I think it’s important for the international security community to be trying harder to analyze the risks in these structured and, perhaps, even quantitative terms, so that we can try to answer these questions more rigorously than just, this is my intuition, this is your intuition. That’s really, I think, one of the main values of doing this type of research: being able to answer these important policy questions with more confidence and also, perhaps, more consensus across different points of view than we would otherwise be able to have.

Ariel: Right. I had wanted to continue with some of the risk questions, but while we’re on the points that you’re making, Seth, what do you see moving forward with this paper? I mean, it was a bummer to read the paper and not get what the probabilities of nuclear war actually are, just a model for how we can get there. How do you see either you, or other organizations, or researchers, moving forward to start calculating what the probability could actually be?

Seth: The paper does not give us final answers for what the probability would be, but it definitely makes some important steps in that direction. Additional steps that can be taken would include things like exploring the historical incident data set more carefully, to check whether there may be important incidents that have been missed, and to see, for each of the incidents, how close we really think it came to nuclear war. And this is something that the literature on these incidents actually diverges on. There are some people who look at these incidents and see them as being really close calls; other people look at them and see them as evidence that the system works as it should, that, sure, there were some alarms, but the alarms were handled the way that they should be handled and the tools are in place to make sure that those don’t end in nuclear war. So exactly how close these various incidents got is one important way forward towards quantifying the probability.

Another one is to come up with some sense of what the actual population of historical incidents is relative to the data set that we have. We are presumably missing some number of historical incidents; some of them might be smaller and less important, but there might be some big ones that happened and we don’t know about, because they are only covered in literatures in other languages, we only did research in English, or because all of the evidence about them is in classified government records by whichever governments were involved in the incident, and so we need to-

Ariel: Actually, I do actually want to interrupt with a question real quick there, and my apologies for not having read this closer, I know there were incidents involving the US, Russia, and I think you guys had some about Israel. Were there incidents mentioning China or any of the European countries that have nuclear weapons?

Seth: Yeah, I think there were probably incidents involving all of the nuclear armed countries, certainly involving China. For example, China had a war with the Soviet Union over their border some years ago and there was at least some talk of nuclear weapons involved in that. Also, the one I mentioned earlier, the Taiwan Straits Crises, those involved China. Then there were multiple incidents between India and Pakistan, especially regarding the situation in Kashmir. With France, I believe we included one incident in which a French nuclear bomber got a faulty signal to take off in combat and then it was eventually recalled before it got too far. There might’ve been something with the UK also. Robert, do you recall if there were any with the UK?

Robert: Yes, there was. During the Falklands War, apparently, they set out with nuclear depth charges. It’s actually not really, honestly, clear to me why you would use a nuclear depth charge, and there’s not any evidence they ever intended to use them, but they sent out nuclear-armed ships, essentially, to deal with a crisis in the Falklands.

There’s also, I think, an incident in South Africa as well when South Africa was briefly a nuclear state.

Ariel: Okay. Thanks. It’s not at all disturbing.

Robert: It’s very disturbing. I will say, I think that China is the one we know the least about. In some of the incidents that Seth mentioned with China, the nuclear-armed power that might have used nuclear weapons was the United States. So there is the Soviet-China incident, but we don’t really know a lot about the Chinese program and Chinese incidents. I think some of that is because it’s not reported in English, and to some extent it’s also that it’s classified and the Chinese are not as open about what’s going on.

Seth: Yeah, the Chinese are definitely much, much less transparent than the United States, as are the Russians. I mean, the United States might be the most transparent out of all of the nuclear armed countries.

I remember some years ago when I was spending time at the United Nations, I got the impression that the Russians and the Chinese were actually not quite sure what to make of the Americans’ transparency, that they found it hard to believe that the US government was not just putting out loads of propaganda and misinformation. It didn’t make sense to them that we just actually put out a lot of honest data about government activities here, and that that’s just the standard, and that you can actually trust this information, this data. So yeah, we may be significantly underestimating the number of incidents involving China, and perhaps Russia and other countries, because their governments are less transparent.

Ariel: Okay. That definitely addresses a question that I had, and my apologies for interrupting you earlier.

Seth: No, that’s fine. But this is one aspect of the research that still remains to be done that would help us figure out what the probabilities might be. It would be a mistake to just calculate them based on the data set as it currently stands, because this is likely to be only a portion of the actual historical incidents that may have ended in nuclear war.

So these are the sorts of details and nuances that were, unfortunately, beyond the scope of the project that we were able to do, but it would be important work for us or other research groups to do to take us closer to having good probability estimates.

Ariel: Okay. I want to ask a few questions that, again, are probably going to be you guys guessing as opposed to having good, hard information, and I also wanted to touch a little bit on some current events. So first, one of the things that I hear a lot is that if a nuclear war is going to happen, it’s much more likely to happen between India and Pakistan than, say, the US and Russia or US and … I don’t know about US and North Korea at this point, but I’m curious what your take on that is, do you feel that India and Pakistan are actually the greatest risk or do you think that’s up in the air?

Robert: I mean, it’s a really tough question. I would say that India and Pakistan is one of the scariest situations for sure. I don’t think they have actually come that close, but it’s not that difficult to imagine a scenario in which they would. I mean, these are nuclear powers that occasionally shoot at each other across the line of control, so I do think that’s very scary.

But I also think, and this is an intuition, this isn’t a conclusion that we have from the paper, that the danger of something happening between the United States and Russia is probably underestimated, because we’re not in the Cold War anymore, relations aren’t necessarily good, it’s not clear what relations are, but people will say things like, “Well, neither side wants a war.” Obviously neither side wants a war, but I think there’s a danger of the kind of inadvertent escalation, miscalculation, and that hasn’t really gone away. So that’s something I think is probably not given enough attention. I’m also concerned about the situation in North Korea. I think that that is now an issue which we have to take somewhat seriously.

Seth: I think the last five years or so have been a really good learning opportunity for all of us on these matters. I remember having conversations with people about this, maybe five years ago, and they thought the thought of a nuclear war between the United States and Russia was just ridiculous, that that’s antiquated Cold War talk, that the world has changed. And they were right in their characterization of the world as it was at that moment, but I was always uncomfortable with that because the world could change again. And sure enough, in the last five years, the world has changed very significantly, in ways that I think most people would agree make the probability of nuclear war between the United States and Russia substantially higher than it was five years ago, especially starting with the Ukraine crisis.

There’s also just a lot of basic volatility in the international system that I think is maybe underappreciated, that we might like to think of it as being more deterministic, more logical than it actually is. The classic example is that World War I maybe almost didn’t happen, that it only happened because a very specific sequence of events led to the assassination of Archduke Ferdinand, and had that gone a little bit differently, he wouldn’t have been assassinated and World War I wouldn’t have happened and the world we live in now would be very different than what it is. Or, to take a more recent example, it’s entirely possible that had the FBI director in 2016 not made an unusual decision regarding the disclosure of information about one candidate’s emails a couple weeks before the election, the outcome of the 2016 US election might’ve gone differently and international politics would look quite different than it does right now. Who knows what will happen next year or the year after that.

So I think we can maybe make some generalizations about which conflicts seem more likely or less likely, especially at the moment, but we should be really cautious about what we think it’s going to be overall over 5, 10, 20, 30 year periods just because things really can change substantially in ways that may be hard to see in advance.

Robert: Yeah, for me, one of the lessons of World War I is not so much that it might not have happened, I think it probably would have anyway — although Seth is right, things can be very contingent — but it’s more that nobody really wanted World War I. I mean, at the time people thought it wouldn’t happen because it was sort of bad for everyone, and no one thought, “Well, this is in our interest to pursue it,” but wars can happen that way, where countries end up thinking, for one reason or another, they need to do one thing or another that leads to war, when in fact everyone would prefer to have gotten together and avoided it. It’s a suboptimal equilibrium. So that’s one thing.

The other thing is that, as Seth says, things change. I’m not that concerned about what’s going on in the week that we’re recording this, but we had this week the Russian ambassador saying he would shoot down US missiles aimed at Syria, and the United States’ president responding on Twitter that they had better get ready for his smart missiles. This, I suspect, won’t escalate to a nuclear war. I’m not losing that much sleep about it. But this is the kind of thing that you would like to see a lot less of, this is the kind of thing that’s worrying, and maybe you wouldn’t have anticipated this 10 years ago.

Seth: When you say you’re not losing much sleep on this, you’re speaking as someone who has, as I understand it, very recently, actually, literally lost sleep over the threat of nuclear war, correct?

Robert: That’s true. I was woken up early in the morning by an alert saying a ballistic missile was coming to my state, and that was very upsetting.

Ariel: Yes. So we should clarify, Robert lives in Hawaii.

Robert: I live in Hawaii. And because I take the risk of nuclear war seriously, I might’ve been more upset than some people, although I think that a large percentage of the population of Hawaii thought to themselves, “Maybe I’m going to die this morning. In fact, maybe, my family’s going to die and my neighbors and the people at the coffee shop, and our cats and the guests who are visiting us,” and it really brought home the danger, not that it should be obvious that nuclear war is unthinkable but when you actually face the idea … I also had relatively recently read Hiroshima, John Hersey’s account of, really, most of the aftermath of the bombing of Hiroshima, and it was easy to put myself in that and say, “Well, maybe I will be suffering from burns or looking for clean water,” and of course, obviously, again, none of us deserve it. We may be responsible for US policy in some way because the United States is a democracy, but my friends, my family, my cat, none of us want any part of this. We don’t want to get involved in a war with North Korea. So this really, I’d say, it really hit home.

Ariel: Well, I’m sorry you had to go through that.

Robert: Thank you.

Ariel: I hope you don’t have to deal with it again. I hope none of us have to deal with that.

I do want to touch on what you’ve both been talking about, though, in terms of trying to determine the probability of a nuclear war over the short term where we’re all saying, “Oh, it probably won’t happen in the next week,” but in the next hundred years it could. How do you look at the distinction in time in terms of figuring out the probability of whether something like this could happen?

Seth: That’s a good technical question. Arguably, we shouldn’t be talking about the probability of nuclear war as one thing. If anything, we should talk about the rate, or the frequency of it, that we might expect. If we’re going to talk about the probability of something, that something should be a fairly specific distinct event. For example, an example we use in the paper, what’s the probability of a given team, say, the Cleveland Indians, winning the World Series? It’s good to say what’s the probability of them winning the World Series in, say, 2018, but to say what’s the probability of them winning the World Series overall, well, if you wait long enough, even the Cleveland Indians will probably eventually win the World Series as long as they continue to play them. When we wrote the paper we actually looked it up, and it said that they have about a 17% chance of winning the 2018 World Series even though they haven’t won a World Series since like 1948. Poor Cleveland. Sorry, I’m from Pittsburgh, so I get to gloat a little bit.

But yeah, we should distinguish between saying what is the probability of any nuclear war happening this week or this year, versus how often we might expect nuclear wars to occur or what the total probability of any nuclear war happening over a century or whatever time period it might be.
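To make the probability-versus-rate distinction concrete, here is a small sketch that converts an assumed annual rate of nuclear war into the probability of at least one war over different time horizons, treating occurrences as a constant-rate Poisson process. The rate is an invented placeholder, not a figure from the paper or the interview.

```python
import math

# Purely illustrative: convert an assumed annual rate of nuclear war into the
# probability of at least one war over various horizons, treating occurrences
# as a constant-rate Poisson process. The rate below is a placeholder assumption.
annual_rate = 0.01

for years in (1, 10, 50, 100):
    p_at_least_one = 1 - math.exp(-annual_rate * years)
    print(f"{years:>3} years: P(at least one nuclear war) ~ {p_at_least_one:.3f}")
```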

Robert: Yeah. I think that over the course of the century, I mean, as I say, I’m probably not losing that much sleep on any given week, but over the course of a century if there’s a probability of something really catastrophic, you have to do everything you can to try to mitigate that risk.

I think, honestly, some terrible things are going to happen in the 21st century. I don’t know what they are, but that’s just how life is. Maybe it will involve a nuclear war of some kind. But you can also differentiate among types of nuclear war. If one nuclear bomb is used in anger in the 21st century, that’s terrible, but wouldn’t be all that surprising or mean the destruction of the human race. But then there are the kinds of nuclear wars that could potentially trigger a nuclear winter by kicking so much soot up into the atmosphere and blocking out the sun, and might actually threaten not just the people who were killed in the initial bombing, but the entire human race. That is something we need to look at, in some sense, even more seriously, even though the chance of that is probably a fair amount smaller than the chance of one nuclear weapon being used. Not that one nuclear weapon being used wouldn’t be an incredibly catastrophic event as well, but I think with that kind of risk you really need to be very careful to try to minimize it as much as possible.

Ariel: Real quick, I got to do a podcast with Brian Toon and Alan Robock a little while ago on nuclear winter, so we’ll link to that in the transcript for anyone who wants to learn about nuclear winter, and you brought up a point that I was also curious about, and that is: what is the likelihood, do you guys think, of just one nuclear weapon being used and limited retaliation? Do you think that is actually possible or do you think if a nuclear weapon is used, it’s more likely to completely escalate into full-scale nuclear war?

Robert: I personally do think that’s possible, because I think a number of the scenarios that would involve using a nuclear weapon are not between the United States and Russia, or even the United States and China, so I think that some scenarios involve only a few nuclear weapons. If it were an incident with North Korea, you might worry that it would spread to Russia or China, but you can also see a scenario in which North Korea uses one or two nuclear weapons. Even with India and Pakistan, I wouldn’t think they would necessarily use all — what do they have each, like a hundred or so nuclear weapons — I wouldn’t necessarily assume they would use them all. So there are scenarios in which just one or a few nuclear weapons would be used. I suspect those are the most likely scenarios, but it’s really hard to know. We don’t know the answer to that question.

Seth: There are even scenarios between the United States and Russia that involve one or just a small number of nuclear weapons, and the Russian military has the concept of the de-escalatory nuclear strike, which is the idea that if there is a major conflict that is emerging and might not be going in a favorable way for Russia, especially since their conventional military is not as strong as ours, then they may use a single nuclear weapon, basically, to demonstrate their seriousness on the matter in hopes of persuading us to back down. Now, whether or not we would actually back down or escalate it into an all-out nuclear war, I don’t think that’s something that we can really know in advance, but it’s at least plausible. It’s certainly plausible that that’s what would happen, and presumably, Russia considers this plausible, which is why they talk about it in the first place. Not to just point fingers at Russia, this is essentially the same thing NATO had at an earlier point in the Cold War, when the Soviet Union had the larger conventional military and our plan was to use nuclear weapons on a limited basis in order to prevent the Soviet Union from conquering Western Europe with their military, so it is possible.

I think this is one of the biggest points of uncertainty for the overall risk, is if there is an initial use of nuclear weapons, how likely is it that additional nuclear weapons are used and how many and in what ways? I feel like despite having studied this a modest amount, I don’t really have a good answer to that question. This is something that may be hard to figure out in general because it could ultimately depend on things like the personalities involved in that particular conflict, who the political and military leadership are and what they think of all of this. That’s something that’s pretty hard for us as outside analysts to characterize. But I think, both possibilities, either no escalation or lots of escalation, are possible as is everything in between.

Ariel: All right, so we’ve gone through most of the questions that I had about this paper now, thank you very much for answering those. You guys have also published a working paper this month called A Model for the Impacts of Nuclear War, but I was hoping you could maybe give us a quick summary of what is covered in that paper and why we should read it.

Seth: Risk overall is commonly quantified as the probability of some type of event multiplied by the severity of the impacts. So our first paper was on the probability side, this one’s on the impact side, and it scans across the full range of different types of impacts that nuclear war could have, looking at the five major impacts of nuclear weapons detonations, which are thermal radiation, blast, ionizing radiation, electromagnetic pulse, and then finally, human perceptions, the ways that the detonation affects how people think and, in turn, how we act. We, in this paper, built out a pretty detailed model that looks at all of the different details, or at least a lot of the various details, of what each of those five effects of nuclear weapons detonations would do and what that means in human terms.
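As a minimal sketch of the probability-times-severity framing Seth mentions, the snippet below combines assumed annual probabilities with assumed severities for a few illustrative scenario types. Every number is a made-up placeholder; only the structure of the calculation is the point.

```python
# Minimal sketch of the risk = probability x severity framing across a few
# hypothetical scenario types. Every number is a made-up placeholder.

scenarios = [
    # (label, assumed annual probability, assumed severity in fatalities)
    ("single detonation, no escalation",   0.005,  2e5),
    ("limited regional exchange",          0.002,  5e6),
    ("full-scale war with nuclear winter", 0.0005, 2e9),
]

for label, p, severity in scenarios:
    print(f"{label:<35} expected annual harm ~ {p * severity:,.0f} fatalities")

total = sum(p * severity for _, p, severity in scenarios)
print(f"{'total':<35} expected annual harm ~ {total:,.0f} fatalities")
```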

Ariel: Were there any major or interesting findings from that that you want to share?

Seth: Well, the first thing that really struck me was, “Wow, there are a lot of ways of being killed by nuclear weapons.” Most of the time when we think about nuclear detonations and how you can get killed by them, you think about, all right, there’s the initial explosion and whether it’s the blast itself or the buildings falling on you, or the fire, it might be the fire, or maybe it’s a really high dose of radiation that you can get if you’re close enough to the detonation, that’s probably how you can die. In our world of talking about global catastrophic risks, we also will think about the risk of nuclear winter and in particular, the effect that that can have on global agriculture. But there’s a lot of other things that can happen too, especially related to the effect on physical infrastructure, or I should say civil infrastructure, roads, telecommunications, the overall economy when cities are destroyed in the war, those take out potentially major nodes in the global economy that can have any number of secondary effects, among other things.

It’s just a really wide array of effects, and that’s one thing that I’m happy about with this paper: for, perhaps, the first time, it really tries to lay out all of these effects in one place and in a model form that can be used for a much more complete accounting of the total impact of nuclear war.

Ariel: Wow. Okay. Robert, was there anything you wanted to add there?

Robert: Well, I agree with Seth, it’s astounding, the range, the sheer panoply of bad things that could happen, but I think that once you get into a situation where cities are being destroyed by nuclear weapons, or really anything is being destroyed by nuclear weapons, it can get unpredictable really fast. You don’t know the effect on the global system. A lot of times, I think, when you talk about catastrophic risk, you’re not simply talking about the impact of the initial event, but the long-term consequences it could have — starting more wars, ongoing famines, a shock to the economic system that can cause political problems, so these are things that we need to look at more. I mean, it would be the same with any kind of thing we would call a catastrophic risk. If there were a pandemic disease, the main concern might not be that the pandemic disease would wipe out everyone, but that the aftermath would cause so many problems that it would be difficult to recover from. I think that would be the same issue if there were a lot of nuclear weapons used.

Seth: Just to follow up on that, there are some important points here. One is that the secondary effects are more opaque. They’re less clear. It’s hard to know in advance what would happen. But then the second is the question of how much we should study them. A lot of people look at the secondary effects and say, “Oh, it’s too hard to study. It’s too unclear. Let’s focus our attention on these other things that are easier to study.” And maybe there’s something to be said for that, where if there’s really just no way of knowing what might happen, then we should at least focus on the part that we are able to understand. I’m not convinced that that’s true, maybe it is, but I think it’s worth more effort than there has been to try to understand the secondary effects, see what we can say about them. I think there are a number of things that we can say about them. The various systems are not completely unknown, they’re the systems that we live in now, and we can say at least a few intelligent things about what might happen to those after a nuclear war or after other types of events.

Ariel: Okay. My final question for both of you then is, as we’re talking about all these horrible things that could destroy humanity or at the very least, just kill and horribly maim way too many people, was there anything in your research that gave you hope?

Seth: That’s a good question. I feel like one thing that gave me some hope is that, when I was working on the probability paper, it seemed that at least some of the events and historical incidents that I had been worried about might not have actually come as close to nuclear war as I previously thought they had. Also, a lot of the incidents were earlier within, say, the ’40s, ’50s, ’60s, and less within the recent decades. That gave me some hope that maybe things are moving in the right direction.

But the other is that as you lay out all the different elements of both the probability and the impacts and see it in full how it all works, that really often points to opportunities that may be out there to reduce the risk and hopefully, some of those opportunities can be taken.

Robert: Yeah, I’d agree with that. I’d say there were certainly things in the list of historical incidents that I found really frightening, but I also thought that in a large number of incidents, the system, more or less, worked the way it should have: they caught the error, of whatever kind it was, and fixed it quickly. It’s still alarming, I still would like there not to be incidents, and you can imagine that some of those could’ve not been fixed, but they were not all as bad as I had imagined at first. So that’s one thing.

I think the other thing is, and I think Seth you were sort of indicating this, there’s something we can do, we can think about how to reduce the risk, and we’re not the only ones doing this kind of work. I think that people are starting to take efforts to reduce the risk of really major catastrophes more seriously now, and that kind of work does give me hope.

Ariel: Excellent. I’m going to end on something that … It was just an interesting comment that I heard recently, and that was: Of all the existential risks that humanity faces, nuclear weapons actually seem the most hopeful because there’s something that we can so clearly do something about. If we just had no nuclear weapons, nuclear weapons wouldn’t be a risk, and I thought that was an interesting way to look at it.

Seth: I can actually comment on that idea. I would add that you would need not just to not have any nuclear weapons, but also not have the capability to make new nuclear weapons. There is some concern that if there aren’t any nuclear weapons, then in a crisis there may be a rush to build some in order to give that side the advantage. So in order to really eliminate the probability of nuclear war, you would need to eliminate both the weapons themselves and the capacity to create them, and you would probably also want to have some monitoring measures so that the various countries had confidence that the other sides weren’t cheating. I apologize for being a bit of a killjoy on that one.

Robert: I’m afraid you can’t totally eliminate the risk of any catastrophe, but there are ways we can mitigate the risk of nuclear war and other major risks too. There’s work that can be done to reduce the risk.

Ariel: Okay, let’s end on that note. Thank you both very much!

Seth: Yeah. Thanks for having us.

Robert: Thanks, Ariel.

Ariel: If you’d like to read the papers discussed in this podcast or if you want to learn more about the threat of nuclear weapons and what you can do about it, please visit futureoflife.org and find this podcast on the homepage, where we’ll be sharing links in the introduction.

[end of recorded material]

Podcast: Inverse Reinforcement Learning and Inferring Human Preferences with Dylan Hadfield-Menell

Inverse Reinforcement Learning and Inferring Human Preferences is the first podcast in the new AI Alignment series, hosted by Lucas Perry. This series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across a variety of areas, such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following or subscribing to us on YouTube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary map which begins to chart this space.

In this podcast, Lucas spoke with Dylan Hadfield-Menell, a fifth year Ph.D student at UC Berkeley. Dylan’s research focuses on the value alignment problem in artificial intelligence. He is ultimately concerned with designing algorithms that can learn about and pursue the intended goal of their users, designers, and society in general. His recent work primarily focuses on algorithms for human-robot interaction with unknown preferences and reliability engineering for learning systems. 

Topics discussed in this episode include:

  • Inverse reinforcement learning
  • Goodhart’s Law and its relation to value alignment
  • Corrigibility and obedience in AI systems
  • IRL and the evolution of human values
  • Ethics and moral psychology in AI alignment
  • Human preference aggregation
  • The future of IRL
In this interview we discuss a few of Dylan’s papers and ideas contained in them. You can find them here: Inverse Reward Design, The Off-Switch Game, Should Robots be Obedient, and Cooperative Inverse Reinforcement Learning.  You can hear about these papers above or read the transcript below.


Lucas: Welcome back to the Future of Life Institute Podcast. I’m Lucas Perry and I work on AI risk and nuclear weapons risk related projects at FLI. Today, we’re kicking off a new series where we will be having conversations with technical and nontechnical researchers focused on AI safety and the value alignment problem. Broadly, we will focus on the interdisciplinary nature of the project of eventually creating value-aligned AI, where what value-aligned exactly entails is an open question that is part of the conversation.

In general, this series covers the social, political, ethical, and technical issues and questions surrounding the creation of beneficial AI. We’ll be speaking with experts from a large variety of domains, and hope that you’ll join in the conversations. If this seems interesting to you, make sure to follow us on SoundCloud, or subscribe to us on YouTube for more similar content.

Today, we’ll be speaking with Dylan Hadfield-Menell. Dylan is a fifth-year PhD student at UC Berkeley, advised by Anca Dragan, Pieter Abbeel, and Stuart Russell. His research focuses on the value alignment problem in artificial intelligence. With that, I give you Dylan. Hey, Dylan. Thanks so much for coming on the podcast.

Dylan: Thanks for having me. It’s a pleasure to be here.

Lucas: I guess, we can start off, if you can tell me a little bit more about your work over the past years. How have your interests and projects evolved? How has that led you to where you are today?

Dylan: Well, I started off towards the end of undergrad and beginning of my PhD working in robotics and hierarchical robotics. Towards the end of my first year, my advisor came back from a sabbatical, and started talking about the value alignment problem and existential risk issues related to AI. At that point, I started thinking about questions about misaligned objectives, value alignment, and generally how we get the correct preferences and objectives into AI systems. About a year after that, I decided to make this my central research focus. Then, for the past three years, that’s been most of what I’ve been thinking about.

Lucas: Cool. That seems like you had an original path where you’re working on practical robotics. Then, you shifted more into value alignment and AI safety efforts.

Dylan: Yeah, that’s right.

Lucas: Before we go ahead and jump into your specific work, it’d be great if we could go ahead and define what inverse reinforcement learning exactly is. For me, it seems that inverse reinforcement learning, at least from the view, I guess, of technical AI safety researchers, is viewed as an empirical means of conquering descriptive ethics, whereby we’re able to give a clear descriptive account of what any given agent’s preferences and values are at any given time. Is that a fair characterization?

Dylan: That’s one way to characterize it. Another way to think about it, which is a usual perspective for me, sometimes, is to think of inverse reinforcement learning as a way of doing behavior modeling that has certain types of generalization properties.

Any time you’re learning in any machine learning context, there’s always going to be a bias that controls how you generalize to new information. Inverse reinforcement learning and preference learning, to some extent, is a bias in behavior modeling, which is to say that we should model this agent as accomplishing a goal, as satisfying a set of preferences. That leads to certain types of generalization properties in new environments. For me, inverse reinforcement learning is building in this agent-based assumption into behavior modeling.
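
To make that concrete, here is a minimal sketch (a toy example of ours, not code from Dylan’s papers) of inverse reinforcement learning as goal inference: observed behavior is explained by the candidate reward, here just a goal state, that makes it look most purposeful, and the system then generalizes to new situations by planning toward that inferred goal rather than by pattern-matching the data. The world, the candidate goals, and the demonstration are all invented for illustration.

```python
# Toy IRL-as-goal-inference sketch. States are positions 0..9 on a line; each
# candidate "reward function" is just "be at goal g". All numbers are invented.

def greedy_action(state, goal):
    """Move an agent pursuing `goal` would make: -1, 0, or +1."""
    return (goal > state) - (goal < state)

def goal_consistency(trajectory, goal):
    """Fraction of observed moves consistent with pursuing `goal`."""
    moves = list(zip(trajectory, trajectory[1:]))
    return sum(greedy_action(s, goal) == s2 - s for s, s2 in moves) / len(moves)

demo = [2, 3, 4, 5, 6]            # observed demonstration: walking right
candidate_goals = range(10)

# Inference step: which goal best explains the behavior? (Goals 6..9 all fit
# perfectly; a fuller treatment would keep a posterior over all of them.)
inferred_goal = max(candidate_goals, key=lambda g: goal_consistency(demo, g))

# Generalization step: predict behavior from an unseen start state by planning
# toward the inferred goal, rather than by replaying the observed data.
print("inferred goal:", inferred_goal)
print("predicted move from state 0:", greedy_action(0, inferred_goal))
```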

Lucas: Given that, I’d like to dive more into the specific work that you’re working on and going to some summaries of your findings and your research that you’ve been up to. Given this interest that you’ve been developing in value alignment, and human preference aggregation, and AI systems learning human preferences, what are the main approaches that you’ve been working on?

Dylan: I think the first thing that really Stuart Russell and I started thinking about was trying to understand theoretically, what is a reasonable goal to shoot for, and what does it mean to do a good job of value alignment. To us, it feels like issues with misspecified objectives, at least, in some ways, are a bug in the theory.

All of the math around artificial intelligence, for example, Markov decision processes, which is the central mathematical model we use for decision making over time, starts with an exogenously defined objective or reward function. We think that, mathematically, that was a fine thing to do in order to make progress, but it’s an assumption that really has put blinders on the field about the importance of getting the right objective down.

I think the first thing that we sought to do was to understand, what is a system or a setup for AI that does the right thing, in theory at least? What’s something that, if we were able to implement it, we think could actually work in the real world with people? It was that kind of thinking that led us to propose cooperative inverse reinforcement learning, which was our attempt to formalize the interaction whereby you communicate an objective to the system.

The main thing that we focused on was including within the theory a representation of the fact that the true objective’s unknown and unobserved, and that it needs to be arrived at through observations from a person. Then, we’ve been trying to investigate the theoretical implications of this modeling shift.

In the initial paper that we did, which is titled Cooperative Inverse Reinforcement Learning, what we looked at is how this formulation is actually different from a standard environment model in AI. In particular, the way that it’s different is there’s strategic interaction on the behalf of the person. The way that you observe what you’re supposed to be doing is mediated by a person who may actually be trying to teach or trying to communicate appropriately. What we showed is that modeling this communicative component can actually be hugely important and lead to much faster learning behavior.
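
A very small illustration of that point, with invented rewards and a made-up tolerance rather than anything taken from the Cooperative Inverse Reinforcement Learning paper: two candidate objectives, three actions, and a robot that updates a belief from a single demonstration. A human who simply optimizes their own reward picks an action that is best under both objectives, which tells the robot nothing, while a human modeled as a teacher picks the slightly worse but unambiguous action, and the robot learns the objective immediately.

```python
# Toy model of teaching in a CIRL-like setting. theta is the hidden objective;
# only the human knows it. All rewards and the tolerance are made up.

rewards = {
    "A": {"a": 0.9, "b": 0.0, "c": 1.0},   # 'c' is optimal under A...
    "B": {"a": 0.0, "b": 0.9, "c": 1.0},   # ...and under B, so it's ambiguous
}
prior = {"A": 0.5, "B": 0.5}

def literal_demo(theta):
    """A human who just optimizes their own reward, ignoring the robot."""
    return max(rewards[theta], key=rewards[theta].get)

def robot_posterior(action, tol=0.15, eps=1e-3):
    """Robot's Bayesian update, assuming the human picks a near-optimal action."""
    def likelihood(theta):
        best = max(rewards[theta].values())
        return 1.0 if rewards[theta][action] >= best - tol else eps
    unnorm = {t: prior[t] * likelihood(t) for t in prior}
    z = sum(unnorm.values())
    return {t: round(v / z, 3) for t, v in unnorm.items()}

def pedagogic_demo(theta):
    """A human who picks the action that best reveals theta to that robot."""
    return max(rewards[theta], key=lambda a: robot_posterior(a)[theta])

true_theta = "A"
print("literal human shows  ", literal_demo(true_theta),
      "-> belief", robot_posterior(literal_demo(true_theta)))    # 'c', still 50/50
print("pedagogic human shows", pedagogic_demo(true_theta),
      "-> belief", robot_posterior(pedagogic_demo(true_theta)))  # 'a', ~certain it's A
```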

In our subsequent work, what we’ve looked at is taking this formal model in theory and trying to apply it to different situations. There are two really important pieces of work that I like here that we did. One was to take that theory and use it to explicitly analyze a simple model of an existential risk setting. This was a paper titled The Off-Switch Game that we published at IJCAI last summer. What it was, was working through a formal model of a corrigibility problem within a CIRL (cooperative inverse reinforcement learning) framework. It shows the utility of constructing this type of game in the sense that we get some interesting predictions and results.

The first one we get is that there are some nice simple necessary conditions for the system to want to let the person turn it off, which is that the robot, the AI system needs to have uncertainty about its true objective, which is to say that it needs to have within its belief the possibility that it might be wrong. Then, all it needs to do is believe that the person it’s interacting with is a perfectly rational individual. If that’s true, you’d get a guarantee that this robot always lets the person switch it off.

Now, that’s good because, in my mind, it’s an example of a place where, at least, in theory, it solves the problem. This gives us a way that theoretically, we could build corrigible systems. Now, it’s still making a very, very strong assumption, which is that it’s okay to model the human as being optimal or rational. I think if you look at real people, that’s just not a fair assumption to make for a whole host of reasons.

The next thing we did in that paper is we looked at this model. What we realized is that adding in a small amount of irrationality breaks this requirement. It means that some things might actually go wrong. The final thing we did in the paper was to look at the consequences of either overestimating or underestimating human rationality. The argument that we made is that there’s a trade-off: assuming that the person is more rational lets you get more information from their behavior, and thus learn more and, in principle, help them more. But if you assume that they’re too rational, then this actually can lead to quite bad behavior.
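
Here is a toy numeric version of that incentive, with made-up utilities and probabilities; it is a sketch of the flavor of the analysis, not the actual model from The Off-Switch Game. If the robot is genuinely uncertain and the human reliably approves only good actions, deferring is worth more than acting; combine a confident robot with a noisy human and the incentive to defer disappears.

```python
# Toy off-switch calculation. The robot believes its proposed action is worth
# +2 or -2 to the human; being switched off is worth 0. All numbers are invented.

def value_act(belief):
    """Expected utility of acting without asking."""
    return sum(p * u for u, p in belief.items())

def value_defer(belief, human_error_rate=0.0):
    """Expected utility of proposing the action and letting the human decide.
    The human approves good actions and blocks bad ones, but errs sometimes."""
    total = 0.0
    for u, p in belief.items():
        p_approve = (1 - human_error_rate) if u > 0 else human_error_rate
        total += p * p_approve * u       # if switched off, utility is 0
    return total

uncertain = {+2: 0.60, -2: 0.40}   # robot genuinely unsure
confident = {+2: 0.95, -2: 0.05}   # robot almost sure it's right

print(round(value_act(uncertain), 2), round(value_defer(uncertain), 2))       # 0.4 vs 1.2: defer
print(round(value_act(confident), 2), round(value_defer(confident), 2))       # 1.8 vs 1.9: defer
print(round(value_act(confident), 2), round(value_defer(confident, 0.3), 2))  # 1.8 vs 1.3: act
```

With a reliable human the robot weakly prefers to keep the off switch in play; overconfidence plus human error flips that preference, which is the failure mode being described here.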

There’s a sweet spot that you want to aim for, which is to maybe try to underestimate how rational people are, but you obviously don’t want to get it totally wrong. We followed up on that idea in a paper with Smitha Milli as the first author that was titled Should Robots be Obedient? And that tried to get a little bit more at this trade-off between maintaining control over a system and the amount of value that it can generate for you.

We looked at the implication that as robot systems interact with people over time, you expect them to learn more about what people want. If you get very confident about what someone wants, and you think they might be irrational, the math in the Off-Switch paper predicts that you should try to take control away from them. This means that if your system is learning over time, you expect that even if it is initially open to human control and oversight, it may lose that incentive over time. In fact, you can predict that it should lose that incentive over time.

In Should Robots be Obedient, we modeled that property and looked at some consequences of it. We do find that you got a basic confirmation of this hypothesis, which is that systems that maintain human control and oversight have less value that they can achieve in theory. We also looked at what happens when you have the wrong model. If the AI system has a prior that the human cares about a small number of things in the world, let’s say, then it statistically gets overconfident in its estimates of what people care about, and disobeys the person more often than it should.

Arguably, when we say we want to be able to turn the system off, it’s less a statement about what we want to do in theory or the property of the optimal robot behavior we want, and more of a reflection of the idea that we believe that under almost any realistic situation, we’re probably not going to be able to fully explain all of the relevant variables that we care about.

If you’re giving your robot an objective defined over a subset of things you care about, you should actually be very focused on having it listen to you, more so than just optimizing for its estimates of value. I think that provides, actually, a pretty strong theoretical argument for why corrigibility is a desirable property in systems, even though, at least at face value, it should decrease the amount of utility those systems can generate for people.

The final piece of work that I think I would talk about here is our NIPS paper from December, which is titled Inverse Reward Design. That was taking cooperative inverse reinforcement learning and pushing it in the other direction. Instead of using it to theoretically analyze very, very powerful systems, we can also use it to try to build tools that are more robust to mistakes that designers may make. And start to build in initial notions of value alignment and value alignment strategies into the current mechanisms we use to program AI systems.

What that work looked at was understanding the uncertainty that’s inherent in an objective specification. In the initial cooperative inverse reinforcement learning paper and the Off-Switch Game, we said that AI systems should be uncertain about their objective, and they should be designed in a way that is sensitive to that uncertainty.

This paper was about trying to understand, what is a useful way to be uncertain about the objective. The main idea behind it was that we should be thinking about the environments the system designer had in mind. We use an example of a 2D robot navigating in the world, and the system designer is thinking about this robot navigating where there’s three types of terrain. There’s grass, there’s dirt, and there’s gold. You can give your robot an objective, a utility function defined over being in those different types of terrain, that incentivizes it to go and get the gold, and stay on the dirt where possible, but to take shortcuts across the grass when it’s high value.

Now, when that robot goes out into the world, there are going to be new types of terrain, types of terrain the designer didn’t anticipate. What we did in this paper was to build an uncertainty model that allows the robot to determine when it should be uncertain about the quality of its reward function. How can we figure out when the reward function that a system designer builds into an AI, that objective, is ill-adapted to the current situation? You can think of this as a way of trying to build in some mitigation to Goodhart’s law.
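
To sketch the mechanics, in a simplification of ours rather than the actual algorithm from the Inverse Reward Design paper: treat the designer’s proxy weights as evidence about the true reward in the training environment, note that the weight on a never-before-seen terrain type is left unconstrained by that evidence, and then plan risk-aversely against that uncertainty in the new environment. The feature counts, weights, and crude stand-in “posterior” below are all invented for illustration.

```python
# Toy IRD-flavored planner. Path features: [dirt steps, grass steps, got gold,
# steps on an unanticipated terrain type]. All numbers are invented.

paths = {
    "long_dirt_path":   [8, 0, 1, 0],
    "grass_shortcut":   [2, 3, 1, 0],
    "unknown_shortcut": [1, 0, 1, 3],
}

proxy_w = [-0.1, -0.3, 10.0, 0.0]   # what the designer wrote, never having
                                    # imagined the fourth terrain type

# Crude stand-in for the IRD posterior: the training environment contained no
# unknown terrain, so its weight is unconstrained; the rest stay near the proxy.
posterior = [proxy_w[:3] + [w_new] for w_new in (-10.0, -1.0, 0.0, 1.0)]

def value(path, w):
    return sum(f * wi for f, wi in zip(path, w))

def worst_case(path):
    """Risk-averse evaluation: assume the least favorable plausible reward."""
    return min(value(path, w) for w in posterior)

literal = max(paths, key=lambda p: value(paths[p], proxy_w))
cautious = max(paths, key=lambda p: worst_case(paths[p]))
print("literal optimizer picks: ", literal)   # unknown_shortcut (best on the proxy)
print("IRD-style planner picks: ", cautious)  # stays on terrain it understands
```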

Lucas: Would you like to take a second to unpack what Goodhart’s law is?

Dylan: Sure. Goodhart’s law is an old idea in social science that actually goes back to before Goodhart. I would say that in economics, there’s a general idea of the principal agent problem, which dates back to the 1970s, as I understand it, and basically looks at the problem of specifying incentives for humans. How should you create contracts? How do you create incentives, so that another person, say, an employee, helps earn you value?

Goodhart’s law is a very nice way of summarizing a lot of those results, which is to say that once a metric becomes an objective, it ceases to be a good metric. You can have properties of the world which correlate well with what you want, but optimizing for them actually leads to something quite, quite different than what you’re looking for.

Lucas: Right. Like if you are optimizing for test scores, then you’re not actually going to end up optimizing for intelligence, which is what you wanted in the first place?

Dylan: Exactly. Even though test scores, when you weren’t optimizing for them, were actually a perfectly good measure of intelligence. I mean, not perfectly good, but were an informative measure of intelligence. Goodhart’s law, arguably, is a pretty bleak perspective. If you take it seriously, and you think that we’re going to build very powerful systems that are going to be programmed directly through an objective in this manner, Goodhart’s law should be pretty problematic, because any objective that you can imagine programming directly into your system is going to be something correlated with what you really want rather than what you really want. You should expect that that will likely be the case.
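
A tiny simulation, with purely illustrative numbers, makes the point concrete: the proxy below is well correlated with true value, but the harder you optimize it, by selecting the best-scoring option out of a larger pool, the more the selection rewards noise rather than true value, and the wider the gap to what you actually wanted.

```python
# Illustrative simulation of (regressional) Goodhart: optimize a correlated
# proxy hard enough and it stops tracking the true value.
import random

random.seed(0)

def candidate():
    true_value = random.gauss(0, 1)
    proxy = true_value + random.gauss(0, 1)   # noisy but well-correlated metric
    return true_value, proxy

def value_of_best_by_proxy(n):
    pool = [candidate() for _ in range(n)]
    return max(pool, key=lambda c: c[1])[0]   # true value of the proxy-winner

def value_of_best_by_truth(n):
    return max(candidate()[0] for _ in range(n))

trials = 2000
for n in (2, 10, 100):
    got = sum(value_of_best_by_proxy(n) for _ in range(trials)) / trials
    ideal = sum(value_of_best_by_truth(n) for _ in range(trials)) / trials
    print(f"pool of {n:>3}: achieved {got:.2f} vs. ideal {ideal:.2f}")
# The gap between "achieved" and "ideal" widens as the pool (the optimization
# pressure on the proxy) grows.
```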

Lucas: Right. Is it just simply too hard or too unlikely that we’re able to sufficiently specify what exactly it is that we want, so that we’ll just end up using some other metrics that, if you optimize too hard for them, end up messing with a bunch of other things that we care about?

Dylan: Yeah. I mean, I think there’s some real questions about, what is it we even mean… Well, what are we even trying to accomplish? What should we try to program into systems? Philosophers have been trying to figure out those types of questions for ages. For me, as someone who takes a more empirical slant on these things, I think about the fact that the objectives that we see within our individual lives are so heavily shaped by our environments. Which types of signals we respond to and adapt to has heavily adapted itself to the types of environments we find ourselves in.

We just have so many examples of objectives not being the correct thing. I mean, effectively, all you could have is correlations. The fact that wireheading is possible is maybe some of the strongest evidence for Goodhart’s law being really a fundamental property of learning systems and optimizing systems in the real world.

Lucas: There are certain agential characteristics and properties, which we would like to have in our AI systems, like them being-

Dylan: Agential?

Lucas: Yeah. Corrigibility is a characteristic, which you’re doing research on and trying to understand better. Same with obedience. It seems like there’s a trade off here where if a system is too corrigible or it’s too obedient, then you lose its ability to really maximize different objective functions, correct?

Dylan: Yes, exactly. I think identifying that trade off is one of the things I’m most proud of about some of the work we’ve done so far.

Lucas: Given AI safety and really big risks that can come about from AI, in the short, to medium, and long term, before we really have AI safety figured out, is it really possible for systems to be too obedient, or too corrigible, or too docile? How do we navigate this space and find sweet spots?

Dylan: I think it’s definitely possible for systems to be too corrigible or too obedient. It’s just that the failure mode for that doesn’t seem that bad. If you think about this-

Lucas: Right.

Dylan: … it’s like Clippy. Clippy was asking for human-

Lucas: Would you like to unpack what Clippy is first?

Dylan: Sure, yeah. Clippy is an example of an assistant that Microsoft created in the ’90s. It was this little paperclip that would show up in Microsoft Word. It liked to suggest that you were trying to write a letter and would ask for different ways in which it could help.

Now, on one hand, that system was very corrigible and obedient in the sense that it would ask you whether or not you wanted its help all the time. If you said no, it would always go away. It was super annoying because it would always ask you if you wanted help. The false positive rate was just far too high to the point where the system became really a joke in computer science and AI circles of what you don’t want to be doing. I think, systems can be too obedient or too sensitive to human intervention and oversight in the sense that too much of that just reduces the value of the system.

Lucas: Right, for sure. On one hand, when we’re talking about existential risks or even a paperclip maximizer, then it would seem, like you said, like the failure mode of just being too annoying and checking in with us too much seems like not such a bad thing given existential risk territory.

Dylan: I think if you’re thinking about it in those terms, yes. I think if you’re thinking about it from the standpoint of, “I want to sell a paperclip maximizer to someone else,” then it becomes a little less clear, I think, especially, when the risks of paperclip maximizers are much harder to measure. I’m not saying that it’s the right decision from a global altruistic standpoint to be making that trade off, but I think it’s also true that just if we think about the requirements of market dynamics, it is true that AI systems can be too corrigible for the market. That is a huge failure mode that AI systems run into, and it’s one we should expect the producers of AI systems to be responsive to.

Lucas: Right. Given all these different … Is there anything else you wanted to touch on there?

Dylan: Well, I had another example of systems that are too corrigible-

Lucas: Sure.

Dylan: … which is, do you remember Microsoft’s Tay?

Lucas: No, I do not.

Dylan: This is a chatbot that Microsoft released. They trained it based off of tweets. It was a tweet bot. They trained it based on things that were tweeted at it. I forget if it was a nearest-neighbors lookup or if it was just doing a neural method, and overfitting, and memorizing parts of the training set. At some point, 4chan realized that the AI system, that Tay, was very suggestible. They basically created an army to radicalize Tay. They succeeded.

Lucas: Yeah, I remember this.

Dylan: I think you could also think of that as being the other axis of too corrigible or too responsive to human input. The first axis I was talking about is the failures of being too corrigible from an economic standpoint, but there’s also the failures of being too corrigible in a multi-agent mechanism design setting where, I believe, those types of properties in a system also open them up to more misuse.

If we think of AI, cooperative inverse reinforcement learning and the models we’ve been talking about so far exist in what I would call the one-robot-one-human model of the world. Generally, you could think of extensions of this with N humans and M robots. The variants of what you would have there, I think, lead to different theoretical implications.

If we think of just two humans, N=2, and one robot, M=1, and suppose that one of the humans is the system designer and the other one is the user, there is this trade-off between how much control the system designer has over the future behavior of the system and how responsive and corrigible it is to the user in particular. Trading off between those two, I think, is a really interesting ethical question that comes up when you start to think about misuse.

Lucas: Going forward, as we’re developing these systems and trying to make them more fully realized in a world where the number of people is something like seven or eight billion, how do we navigate this space where we’re trying to hit a sweet spot where a system is corrigible in the right ways, to the right degree and level, and to the right people, and it is obedient to the right people and not suggestible by the wrong people? Or does that just enter a territory of so many political, social, and ethical questions that it will take years of thinking to work through?

Dylan: Yeah, I think it’s closer to the second one. I’m sure that I don’t know the answers here. From my standpoint, I’m still trying to get a good grasp on what is possible in the one-robot-one-person case. I think that when you have … Yeah, when you … Oh man. I guess, it’s so hard to think about that problem because it’s just very unclear what’s even correct or right. Ethically, you want to be careful about imposing your beliefs and ideas too strongly on to a problem because you are shaping that.

At the same time, these are real challenges that are going to exist. We already see them in real life. If we look at the YouTube recommender stuff that was just happening, arguably, that’s a misspecified objective. To get a little bit of background here, this is largely based off of a recent New York Times opinion piece that looked at the recommendation engine for YouTube and pointed out that it has a bias towards recommending radical content, whether fake news or Islamist videos.

If you dig into why that was occurring, a lot of it is because… what are they doing? They’re optimizing for engagement. The process of online radicalization looks super engaging. Now, we can think about where that comes up. Well, that issue gets introduced in a whole bunch of places. A big piece of it is that there is this adversarial dynamic to the world. There are users generating content designed to be outrageous and enraging because they discovered that it gets more feedback and more responses. You need to design a system that’s robust to that strategic property of the world. At the same time, you can understand why YouTube was very, very hesitant to take actions that would look like censorship.

Lucas: Right. I guess, just coming back to this idea of the world having lots of adversarial agents in it: human beings are like general intelligences who have reached some level of corrigibility and obedience that works kind of well in the world amongst a bunch of other human beings. That was developed through evolution. Are there potentially techniques for developing the right sorts of corrigibility and obedience in machine learning and AI systems through stages of evolution and running environments like that?

Dylan: I think that’s a possibility. I would say, one … I have a couple of thoughts related to that. The first one is I would actually challenge a little bit your point about modeling people as general intelligences, mainly in the sense that when we talk about artificial general intelligence, we have something in mind. It’s often a shorthand in these discussions for a perfectly rational, Bayesian-optimal actor.

Lucas: Right. And what does that mean? Just unpack that a little bit.

Dylan: What that means is a system that is taking advantage of all of the information that is currently available to it in order to pick actions that optimize expected utility. When we say perfectly, we mean a system that is doing that as well as possible. It’s that modeling assumption that I think sits at the heart of a lot of concerns about existential risk. I definitely think that’s a good model to consider, but there’s also the concern that it might be misleading in some ways, and that it might not actually be a good model of people and how they act in general.

One way to look at it would be to say that there’s something about the incentive structure around humans and in our societies that is developed and adapted that creates the incentives for us to be corrigible. Thus, a good research goal of AI is to figure out what those incentives are and to replicate them in AI systems.

Another way to look at it is that people are intelligent, not necessarily in the ways that economics models us as intelligent that there are properties of our behavior, which are desirable properties that don’t directly derive from expected utility maximization; or if they do, they derive from a very, very diffuse form of expected utility maximization. This is the perspective that says that people on their own are not necessarily what human evolution is optimizing for, but people are a tool along that way.

We could make arguments for that based off of … I think it’s an interesting perspective to take. What I would say is that in order for societies to work, we have to cooperate. That cooperation was a crucial evolutionary bottleneck, if you will. One of the really, really important things that it did was it forced us to develop the parent-child strategy relationship equilibrium that we currently live in. That’s a process whereby we communicate our values, whereby we train people to think that certain things are okay or not, and where we inculcate certain behaviors in the next generation. I think it’s that process more than anything else that we really, really want in an AI system and in powerful AI systems.

Now, the thing is the … I guess, we’ll have to continue on that a little more. It’s really, really important that that’s there, because if you don’t have those cognitive abilities to understand that causing pain is fundamentally a bad idea, to have a desire to cooperate, and to buy into the different coordination and normative mechanisms that human society uses, then you end up … Well, then society just doesn’t function. A hunter-gatherer tribe of self-interested sociopaths probably doesn’t last for very long.

What this means is that our ability to coordinate our intelligence and cooperate with it was co-evolved and co-adapted alongside our intelligence. I think that that evolutionary pressure and bottleneck was really important to getting us to the type of intelligence that we are now. It’s not a pressure that AI is necessarily subjected to. I think, maybe that is one way to phrase the concern, I’d say.

When I look to evolutionary systems and where the incentives for corrigibility, and cooperation, and interaction come from, it’s largely about the processes whereby people are less like general intelligences in some ways. Evolution allowed us to become smart in some ways and restricted us in others based on the imperatives of group coordination and interaction. I think that a lot of our intelligence and practice is about reasoning about group interaction and what groups think is okay and not. That’s a part of the developmental process that we need to replicate in AI just as much as spatial reasoning or vision.

Lucas: Cool. I guess, I just want to touch base on this before we move on. Are there certain assumptions about the kinds of agents that humans are, and, I guess, ideas about us being utility maximizers in some sense, that you see people commonly hold but that are misconceptions about people and how people operate differently from AI?

Dylan: Well, I think that that’s the whole field of behavioral economics in a lot of ways. I could point to examples of people being irrational. Then there are all of the examples of people being more than just self-interested. There are ways in which we seem to be risk-seeking that seem like they would be irrational from an individual perspective, but you could argue may be rational from a group evolutionary perspective.

I mean, things like overeating. I mean, that’s not exactly the same type of rationality, but it is an example of us becoming ill-adapted to our environments and showing the extent to which we’re not capable of changing, or in which it may be hard to. Yeah, I think, in some ways, one story that I tell about AI risk is that back in the start of the AI field, we were looking around and saying, “We want to create something intelligent.” Intuitively, we all know what that means, but we need a formal characterization of it. The formal characterization that we turned to was, basically, the theories of rationality developed in economics.

Although those theories turned out to be, except in some settings, not great descriptors of human behavior, they were quite useful as a guide for building systems that accomplish goals. I think that part of what we need to do as a field is reassess where we’re going and think about whether or not building something like that perfectly rational actor is actually a desirable end goal. I mean, there’s a sense in which it is. I would like an all-powerful, perfectly aligned genie to help me do what I want in life.

You might think that if the odds of getting that wrong are too high, that maybe you would do better with shooting for something that doesn’t quite achieve that ultimate goal, but that you can get to with pretty high reliability. This may be a setting where “shoot for the moon, and if you miss you land among the stars” is just a horribly misleading perspective.

Lucas: Shoot for the moon, and you might get a hellscape universe, but if you shoot for the clouds, it might end up pretty okay.

Dylan: Yeah. We could iterate on the sound bite, but I think something like that may not be … That’s where I stand on my thinking here.

Lucas: We’ve talked about a few different approaches that you’ve been working on over the past few years. What do you view as the main limitations of such approaches currently. Mostly, you’re just only thinking about one machine, one human systems or environments. What are the biggest obstacles that you’re facing right now in inferring and learning human preferences?

Dylan: Well, I think the first thing is it’s just an incredibly difficult inference problem. It’s a really difficult inference problem to imagine running at scale with explicit inference mechanisms. One thing to do is you can design a system that explicitly tracks a belief about someone’s preferences, and then acts and responds to that. Those are systems that you could try to prove theorems about. They’re very hard to build. They can be difficult to get to work correctly.

In contrast, you can create systems that have incentives to construct beliefs in order to accomplish their goals. It’s easier to imagine building those systems and having them work at scale, but it’s much, much harder to understand how you would be confident in those systems being well aligned.

I think that one of the biggest concerns I have, I mean, we’re still very far from many of these approaches being very practical to be honest. I think this theory is still pretty unfounded. There’s still a lot of work to go to understand, what is the target we’re even shooting for? What does an aligned system even mean? My colleagues and I have spent an incredible amount of time trying to just understand, what does it mean to be value-aligned if you are a suboptimal system.

There’s one example that I think about, which is, say, you’re cooperating with an AI system playing chess. You start working with that AI system, and you discover that if you listen to its suggestions, 90% of the time, it’s actually suggesting the wrong move or a bad move. Would you call that system value-aligned?

Lucas: No, I would not.

Dylan: I think most people wouldn’t. Now, what if I told you that that program was actually implemented as a search that’s using the correct goal test? It actually turns out that if it’s within 10 steps of a winning play, it always finds that for you, but because of computational limitations, it usually doesn’t. Now, is the system value-aligned? I think it’s a little harder to tell here. What I do find is that when I tell people the story, and I start off with the search algorithm with the correct goal test, they almost always say that that is value-aligned but stupid.

There’s an interesting thing going on here, which is we’re not totally sure what the target we’re shooting for is. You can take this thought experiment and push it further. Suppose you’re doing that search, but now it’s a heuristic search that uses the correct goal test but has an adversarially chosen heuristic function. Would that be a value-aligned system? Again, I’m not sure. If the heuristic was adversarially chosen, I’d say probably not. If the heuristic just happened to be bad, then I’m not sure.

Lucas: Could you potentially unpack what it means for something to be adversarially chosen?

Dylan: Sure. Adversarially chosen in this case just means that there is some intelligent agent selecting the heuristic function or that evaluation measurement in a way that’s designed to maximally screw you up. Adversarial analysis is a really common technique used in cryptography where we try to think of adversaries selecting inputs for computer systems that will cause them to malfunction. In this case, what this looks like is an adversarial algorithm that looks, at least, on the surface like it is trying to help you accomplish your objectives but is actually trying to fool you.

I’d say that, more generally, what this thought experiment helps me with is understanding that value alignment is actually a quite tricky and subjective concept. It’s actually quite hard to nail down in practice what it would need.

Lucas: What sort of effort do you think needs to happen and from who in order to specify what it really means for a system to be value-aligned and to not just have a soft squishy idea of what that means but to have it really formally mapped out, so it can be implemented in machine systems?

Dylan: I think, we need more people working on technical AI safety research. I think to some extent it may always be something that’s a little ill-defined and squishy. Generally, I think it goes to the point of needing good people in AI willing to do this squishier less concrete work that really gets at it. I think value alignment is going to be something that’s a little bit more like I know it when I see it. As a field, we need to be moving towards a goal of AI systems where alignment is the end goal, whatever that means.

I’d like to move away from artificial intelligence where we think of intelligence as an ability to solve puzzles to artificial aligning agents where the goal is to build systems that are actually accomplishing goals on your behalf. I think the types of behaviors and strategies that arise from taking that perspective are qualitatively quite different from the strategies of pure puzzle solving on a well specified objective.

Lucas: All this work we’ve been discussing is largely at a theoretic and meta level. At this point, is this the main research that we should be doing, or is there any space for research into what specifically might be implementable today?

Dylan: I don’t think that’s the only work that needs to be done. For me, I think it’s a really important type of work that I’d like to see more of. I think a lot of important work is about understanding how to build these systems in practice and to think hard about designing AI systems with meaningful human oversight.

I’m a big believer in the idea that in AI safety, the distinction between short-term and long-term issues is not really that large, and that there are synergies between the research problems that go both directions. I believe that, on the one hand, looking at short-term safety issues, which includes things like Uber’s car just killed someone, it includes the YouTube recommendation engine, it includes issues like fake news and information filtering, I believe that all of those things are related to and give us our best window into the types of concerns and issues that may come up with advanced AI.

At the same time, and this is a point where I think people concerned about x-risk do themselves a disservice by not focusing here: actually doing theory about advanced AI systems, and in particular about systems where it’s not possible to, what I would call, unilaterally intervene, systems that aren’t corrigible by default, gives us a lot of ideas about how to build systems now that are merely hard to intervene with or oversee.

If you’re thinking about issues of monitoring and oversight, and how do you actually get a system that can appropriately evaluate when it should go to a person because its objectives are not properly specified or may not be relevant to the situation, I think YouTube would be in a much better place today if they had a robust system for doing that for their recommendation engine. In a lot of ways, the concerns about x-risks represent an extreme set of assumptions for getting AI right now.

Lucas: I think I’m also just trying to get a better sense of what the system looks like, and how it would be functioning on a day-to-day basis. What is the data that it’s taking in in order to capture, learn, and infer specific human preferences and values? Just trying to understand better whether or not it can model whole moral views and ethical systems of other agents, or if it’s just capturing little specific bits and pieces?

Dylan: I think my ideal would be to, as a system designer, build in as little as possible about my moral beliefs. I think that, ideally, the process would look something … Well, one process that I could see and imagine doing right would be to just directly go after trying to replicate something about the moral imprinting process that people have with their children. Either you had someone who’s like a guardian or is responsible for an AI system’s decision, and we build systems to try to align with one individual, and then try to adopt, and extend, and push forward the beliefs and preferences of that individual. I think that’s one concrete version that I could see.

I think a lot of the place where I see things maybe a little bit different than some people is that I think that the main ethical questions we’re going to be stuck with and the ones that we really need to get right are the mundane ones. The things that most people agree on and think are just, obviously, that’s not okay. Mundane ethics and morals rather than the more esoteric or fancier population ethics questions that can arise. I feel a lot more confident about the ability to build good AI systems if we get that part right. I feel like we’ve got a better shot at getting that part right because there’s a clearer target to shoot for.

Now, what kinds of data would you be looking at? In that case, it would be data from interaction with a couple of select individuals. Ideally, you’d want as much data as you can get. What I think you really want to be careful of here is how many assumptions you make about the procedure that’s generating your data.

What I mean by that is whenever you learn from data, you have to make some assumption about how that data relates to the right thing to do, where right is with, like, a capital R in this case. The more assumptions you make there, the more your system will be able to learn about values and preferences, and the quicker it will be able to learn them. But the more assumptions and structure you build in there, the more likely you are to get something wrong that your system won’t be able to recover from.

Again, we see this trade-off come up: a tension between the amount of uncertainty that you need in the system in order to be able to adapt to the right person and figure out the correct preferences and morals, and the efficiency with which you can figure that out.

I guess, I mean, in saying this it feels a little bit like I’m rambling and unsure about what the answer looks like. I hope that that comes across because I’m really not sure. Beyond the rough structure of data generated from people, interpreted in a way that involves the fewest prior conceptions about what people want and what preferences people have that we can get away with is what I would shoot for. I don’t really know what that would look like in practice.

Lucas: Right. It seems here that this encroaches on a bunch of very difficult social, political, and ethical issues involving which persons and data will be selected for preference aggregation, like how many people are included in developing the reward function and utility function of the AI system. Also, I guess, we have to be considering culturally sensitive systems, where systems operating in different cultures and contexts are going to need to be trained on different sets of data. I guess there will also be questions and ethics about whether or not we’ll even want systems to be training off of certain cultures’ data.

Dylan: Yeah. I would actually say that a good value … I wouldn’t necessarily even think of it as training off of different data. One of the core questions in artificial intelligence is identifying the relevant community that you are in and building a normative understanding of that community. I want to push back a little bit and move you away from the perspective of we collect data about a culture, and we figure out the values of that culture. Then, we build our system to be value-aligned with that culture.

The more I think about it, the more the actual AI product is the process whereby we determine, elicit, and respond to the normative values of the multiple overlapping communities that you find yourself in. That process is ongoing. It’s holistic, it’s overlapping, and it’s messy. To the extent that I think it’s possible, I’d like to not have a couple of people sitting around in a room deciding what the right values are. Much more, I think, a system should be holistically designed with value alignment at multiple scales as a core property of AI.

I think that that’s actually a fundamental property of human intelligence. You behave differently based on the different people around, and you’re very, very sensitive to that. There are certain things that are okay at work, that are not okay at home, that are okay on vacation, that are okay around kids, that are not. Figuring out what those things are and adapting yourself to them is the fundamental intelligence skill needed to interact in modern life. Otherwise, you just get shunned.

Lucas: It seems to me in the context of a really holistic, messy, ongoing value alignment procedure, we’ll be aligning AI systems ethics, and morals, and moral systems, and behavior with that of a variety of cultures, and persons, and just interactions in the 21st Century. When we reflect upon the humans of the past, we can see in various ways that they are just moral monsters. We have issues with slavery, and today we have issues with factory farming, and voting rights, and tons of other things in history.

How should we view and think about aligning powerful systems, ethics, and goals with the current human morality, and preferences, and the risk of amplifying current things which are immoral in present day life?

Dylan: This is the idea of mistakenly locking in the wrong values, in some sense. I think it is something we should be concerned about, less from the standpoint of entire … Well, no, I think yes, from the standpoint of entire cultures getting things wrong. Again, if we don’t think of there being a monolithic society that has a single value set, these problems are fundamental issues: what your local community thinks is okay versus what other local communities think is okay.

A lot of our society and a lot of our political structures are about how to handle those clashes between value systems. My ideal for AI systems is that they should become a part of that normative process, maybe not participate in it as people do, but, also, I think, if we think of value alignment as a consistent, ongoing, messy process, there is … I think maybe that perspective lends itself less towards locking in values and sticking with them. That’s one way you can look at the problem, which is that we determine what’s right and what’s wrong when we program our system to do that.

Then, there’s another one, which is we program our system to be sensitive to what people think is right or wrong. I think that’s more the direction that I think of value alignment in. Then, I think the final part of what you’re getting at here is that the system actually will feed back into people. What AI systems show us will shape what we think is okay, and vice versa. That’s something that I am quite frankly not sure how to handle. I don’t know how you’re going to influence what someone wants, and what they will perceive that they want, and how to do that, I guess, correctly.

All I can say is that we do have a human notion of what is acceptable manipulation. We do have a human notion of allowing someone to figure out for themselves what they think is right and not and refraining from biasing them too far. To some extent, if you’re able to value align with communities in a good ongoing holistic manner, that should also give you some ways to choose and understand what types of manipulations you may be doing that are okay or not.

I’d also say that I think this perspective has a very mundane analogy when you think of the feedback cycle between recommendation engines and regular people. Those systems don’t model the effect … Well, they don’t explicitly model the fact that they’re changing the structure of what people want and what they’ll want in the future. That’s probably not the best analogy in the world.

I guess what I’m saying is that it’s hard to plan for how you’re going to influence someone’s desires in the future. It’s not clear to me what’s right or what’s wrong. What’s true is that we, as humans, have a lot of norms about what types of manipulation are okay or not. You might hope that appropriately doing value alignment in that way might help get to an answer here.

Lucas: I’m just trying to get a better sense here. When I think about the roles that ethics and intelligence play here, I view intelligence as a means of modeling the world and achieving goals, and ethics as the end towards which intelligence is aimed. Now, I’m curious, in terms of behavior modeling, where inverse reinforcement learning agents are modeling, I guess, the behavior of human agents and also predicting the sorts of behaviors that they’d take in the future or in the situation in which the inverse reinforcement learning agent finds itself.

I’m curious to know where metaethics and moral epistemology fit in, where inverse reinforcement learning agents are finding themselves in novel ethical situations, and what their ability to handle those novel ethical situations is like. When they’re handling those situations, how much does it look like them performing some normative and metaethical calculus based on the kind of moral epistemology that they have, and how much does it look like they’re using some other behavioral predictive system where they’re, like, modeling humans?

Dylan: The answer to that question is not clear. What does it actually mean to make decisions based on an ethical framework or a metaethical framework? I guess, we could start there. You and I know what that means, but our definition is encumbered by the fact that it’s pretty human-centric. I think we talk about it in terms of, “Well, I weighed this option. I looked at that possibility.” We don’t even really mean weighed in the literal sense of actually counting up, and constructing actual numbers, and multiplying them together in our heads.

What these are is actually references to complex thought patterns that we’re going through. The question is whether or not those thought patterns are going on in the AI system. You can also talk about the difference between the process of making a decision and the substance of it. When an inverse reinforcement learning agent is going out into the world, the policy it’s following is constructed to try to optimize a set of inferred preferences, but does that mean that the policy you’re outputting is making metaethical characterizations?

Well, at the moment, almost certainly not, because the systems we build are just not capable of that type of cognitive reasoning. I think the bigger question is, do you care? To some extent, you probably do.

Lucas: I mean, I’d care if I had some very deep disagreements with the metaethics that led to the preferences that were learned and loaded into the machine. Also, if the machine were in such a new, novel ethical situation, unlike anything human beings had faced, that it just required some metaethical reasoning to deal with.

Dylan: Yes. I mean, I think you definitely want it to take decisions that you would agree with or, at least, that you could be non-maliciously convinced to agree with. Practically, there isn’t a place in the theory where that shows up. It’s not clear that what you’re saying is that different from value alignment in particular. If I were to try to refine the point about metaethics, what it sounds to me like you’re getting at is an inductive bias that you’re looking for in the AI systems.

Arguably, ethics is about an argument over what inductive bias we should have as humans. I don’t think that that’s a first-order property in value alignment systems necessarily, or in preference-based learning systems in particular. I would think that that kind of metaethics comes in from value aligning to someone that has these sophisticated ethical ideas.

I don’t know where your thoughts about metaethics came from, but, at least indirectly, we can probably trace them down to the values that your parents inculcated in you as a child. That’s how we build metaethics into your head if we want to think of you as being an AGI. I think that for AI systems, that’s the same way that I would see it being in there. I don’t believe the brain has circuits dedicated to metaethics. I think that exists in software, and in particular, as something that’s being programmed into humans from their observational data, more so than from the structures that are built into us as a fundamental part of our intelligence or value alignment.

Lucas: We’ve also talked a bit about how human beings are potentially not fully rational agents. With inverse reinforcement learning, this leaves open the question as to whether or not AI systems are actually capturing what the human being actually prefers, or whether there are limitations in the human’s observed or chosen behavior, or explicitly stated preferences, like limits in our ability to convey what we actually most deeply value or would value given more information. These inverse reinforcement learning systems may not be learning what we actually value or what we think we should value.

How can AI systems assist in this evolution of human morality and preferences whereby we’re actually conveying what we actually value and what we would value given more information?

Dylan: Well, there are certainly two things that I heard in that question. One is, how do you just mathematically account for the fact that people are irrational, and that that is a property of the source of your data? Inverse reinforcement learning, at face value, doesn’t allow us to model that appropriately. It may lead us to make the wrong inferences. I think that’s a very interesting question. It’s probably the main one that I think about now as a technical problem: understanding what good ways there are to model how people might or might not be rational, and building systems that can appropriately interact with that complex data source.
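
One standard way to relax the optimality assumption, sketched below with invented utilities, is the generic “Boltzmann-rational” observation model rather than a specific algorithm from Dylan’s papers: assume better actions are only more probable, with a rationality parameter beta. Under that model a single mistaken action shifts the posterior instead of destroying the correct hypothesis, and the choice of beta controls how much you trust each observation.

```python
# Noisily rational observation model for preference inference.
# P(action | theta) is proportional to exp(beta * utility(action, theta)).
import math

# Utility of two actions under two candidate preference hypotheses (made up).
utility = {
    "likes_tea":    {"make_tea": 1.0, "make_coffee": 0.0},
    "likes_coffee": {"make_tea": 0.0, "make_coffee": 1.0},
}
prior = {"likes_tea": 0.5, "likes_coffee": 0.5}

def posterior(observed_action, beta):
    """P(theta | action) under the Boltzmann-rational likelihood."""
    def likelihood(theta):
        z = sum(math.exp(beta * u) for u in utility[theta].values())
        return math.exp(beta * utility[theta][observed_action]) / z
    unnorm = {t: prior[t] * likelihood(t) for t in prior}
    z = sum(unnorm.values())
    return {t: round(v / z, 3) for t, v in unnorm.items()}

# The person (who actually likes tea) is observed making coffee once, perhaps
# by accident. A near-rational model (large beta) all but rules out likes_tea;
# a noisier model keeps it alive to be corrected by later observations.
for beta in (0.5, 2.0, 10.0):
    print("beta =", beta, posterior("make_coffee", beta))
```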

One recent thing that I’ve been thinking about is, what happens if people, rather than knowing their objective, what they’re trying to accomplish, are figuring it out over time? This is the model where the person is a learning agent that discovers how they like states when they enter them, rather than thinking of the person as an agent that already knows what they want, and they’re just planning to accomplish that. I think these types of assumptions that try to paint a very, very broad picture of the space of things that people are doing can help us in that vein.

When someone is learning, it’s actually interesting that you can actually end up helping them. You end up with a classic strategy that looks like it breaks down into three phases. You have an initial exploration phase where you help the learning agent to get a better picture of the world, and the dynamics, and its associated rewards.

Then, you have another observation phase where you observe how that agent, now, takes advantage of the information that it’s got. Then, there’s an exploitation or extrapolation phase where you try to implement the optimal policy given the information you’ve seen so far. I think, moving towards more complex models that have a more realistic setting and richer set of assumptions behind them is important.
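
A heavily simplified caricature of that three-phase structure, as a toy example of ours rather than the model from the work Dylan is describing: the human only discovers how much they like an option by trying it, so the assistant first exposes the options, then watches the human’s choices, and only then commits to a recommendation. The items and utilities are invented.

```python
# Caricature of the explore / observe / exploit structure when the human is
# still learning their own preferences.

items = ["thai", "sushi", "diner"]
true_taste = {"thai": 0.9, "sushi": 0.6, "diner": 0.2}  # unknown to everyone at first
discovered = {}                                         # what the human has learned so far

def human_tries(item):
    """The human only finds out how much they like something by trying it."""
    discovered[item] = true_taste[item]

def human_chooses():
    """Among things they've tried, the human picks their current favorite."""
    return max(discovered, key=discovered.get)

# Phase 1 (explore): surface every option so the human can actually form
# preferences, instead of exploiting a premature guess about them.
for item in items:
    human_tries(item)

# Phase 2 (observe): watch a few unconstrained choices.
observations = [human_chooses() for _ in range(5)]

# Phase 3 (exploit): act on the inferred preference.
counts = {i: observations.count(i) for i in items}
print("observed:", observations, "-> recommend:", max(counts, key=counts.get))
```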

The other thing you talked about was about helping people discover their morality and learn more what’s okay and what’s not. There, I’m afraid I don’t have too much interesting to say in the sense that I believe it’s an important question, but I just don’t feel that I have many answers there.

Practically, if you have someone who’s learning their preferences over time, is that different than humans refining their moral theories? I don’t know. You could make mathematical modeling choices, so that they are. I’m not sure if that really gets at what you’re trying to point towards. I’m sorry that I don’t have anything more interesting to say on that front other than, I think, it’s important, and I would love to talk to more people who are spending their days thinking about that question because I think it really does deserve that kind of intellectual effort.

Lucas: Yeah, yeah. It sounds like we need some more AI moral psychologists to help us think about these things.

Dylan: Yeah. In particular, when talking about philosophy around value alignment and the ethics of value alignment, I think a really important question is, what are the ethics of developing value alignment systems? A lot of times, people talk about AI ethics from the standpoint of, for lack of a better example, the trolley problem. The way they think about it is, who should the car kill? There is a correct answer, or maybe not a correct answer, but there are answers that we could think of as more or less bad. Which one of those options should the AI select? That’s not unimportant, but it’s not the ethical question that an AI system designer is faced with.

In my mind, if you’re designing a self-driving car, the relevant questions you should be asking are these: One, what do I think is an okay way to respond to different situations? Two, how is my system going to be understanding the preferences of the people involved in those situations? Then, three, how should I design my system in light of those two facts?

I have my own preferences about what I would like my system to do. I have an ethical responsibility, I would say, to make sure that my system is adapting to the preferences of its users to the extent that it can. I also wonder to what extent. How should you handle things when there are conflicts between those two value sets?

You’re building a robot. It’s going to go and live with an uncontacted human tribe. Should it respect the local cultural traditions and customs? Probably. That would be respecting the values of the users. Then, let’s say that that tribe does something that we would consider to be gross like pedophilia. Is my system required to participate wholesale in that value system? Where is the line that we would need to draw between unfairly imposing my values on system users and being able to make sure that the technology that I build isn’t used for purposes that I would deem reprehensible or gross?

Lucas: Maybe we should just put a dial in each of the autonomous cars that lets the user set it to deontology mode or utilitarianism mode as it’s racing down the highway. Yeah, I think this is … I guess, an important role. I just think that metaethics is super important. I’m not sure if this is necessarily the case, but it seems necessary at some point if fully autonomous systems are going to play a role where they’re resolving these ethical dilemmas for us, and if they’re going to be really, actually autonomous and help make the world a much better place.

I guess, this feeds into my next question, where we probably both have different assumptions about this, but what is the role of inverse reinforcement learning ultimately? Is it just to allow AI systems to evolve alongside us and to match current ethics, or is it to allow the systems to ultimately surpass us and move far beyond us into the deep future?

Dylan: Inverse reinforcement learning, I think, is much more about the first than the second. I think it can be a part of how you get to the second and how you improve. For me, when I think about these problems technically, I try to think about matching human morality as the goal.

Lucas: Except for the factory farming and stuff.

Dylan: Well, I mean, if you had a choice between a system that thinks eradicating all humans is okay but is against factory farming, versus one that is neutral about factory farming but thinks eradicating all humans isn’t okay, which would you pick? I mean, I guess, with your audience, there are maybe some people that would choose the saving-the-animals answer.

My point is that, I think, it’s so hard for me. Technically, I think it’s very hard to imagine getting these normative aspects of human societies and interaction right. I think, just hoping to participate in that process in a way that is analogous to how people do normally is a good step. I think we probably, to the extent that we can, should probably not have AI systems trying to figure out if it’s okay to do factory farming and to the extent that we can …

I think that it’s so hard to understand what it means to even match human morality or participate in it that, for me, the concept of surpassing, it feels very, very challenging and fraught. I would worry, as a general concern, that as a system designer who doesn’t necessarily represent the views and interest of everyone, that by programming in surpassing humanity or surpassing human preferences or morals, what I’m actually doing is just programming in my morals and ethical beliefs.

Lucas: Yes. I mean, there seems to be this strange issue here where it seems like if we get AGI, and recursive self-improvement is a thing that really takes off, then we have a system that has potentially succeeded in its inverse reinforcement learning but far surpassed human beings in its general intelligence. We have a superintelligence that’s matching human morality. It just seems like a funny situation where we’d really have to pull the brakes and, I guess, as William MacAskill mentions, have a really, really long deliberation about ethics, and moral epistemology, and value. How do you view that?

Dylan: I think that’s right. I mean, I think there are some real questions about who should be involved in that conversation. For instance, I actually even think it’s … Well, one thing I’d say is that you should recognize that there’s a difference between having the same morality and having the same data. One way to think about it is that people who are against factory farming have a different morality than the rest of the people.

Another one is that they actually just have exposure to the information that allows their morality to come to a better answer. There’s this confusion you can make between the objective that someone has and the data that they’ve seen so far. I think, one point would be to think that a system that has current human morality but access to a vast, vast wealth of information may actually do much better than you might think. I think, we should leave that open as a possibility.

For me, this is less about morality in particular, and more just about power concentration, and how much influence you have over the world. I mean, if we imagine that there was something like a very powerful AI system that was controlled by a small number of people, yeah, you better think freaking hard before you tell that system what to do. That's related to questions about the ethical ramifications of metaethics, and generalization, and what we actually truly value as humans. That is also super true for all of the more mundane things in the day-to-day as well. Did that make sense?

Lucas: Yeah, yeah. It totally makes sense. I’m becoming increasingly mindful of your time here. I just wanted to hit a few more questions if that’s okay before I let you go.

Dylan: Please, yeah.

Lucas: Yeah. I’m wondering, would you like to, or do you have any thoughts on how coherent extrapolated volition fits into this conversation and your views on it?

Dylan: What I’d say is I think coherent extrapolated volition is an interesting idea and goal.

Lucas: Where it is defined as?

Dylan: Where it’s defined as a method of preference aggregation. Personally, I’m a little weary of preference aggregation approaches. Well, I’m weary of imposing your morals on someone indirectly via choosing the method of preference aggregation that we’re going to use. I would-

Lucas: Right, but it seems like, at some point, we have to make some metaethical decision, or else, we’ll just forever be lost.

Dylan: Do we have to?

Lucas: Well, some agent does.

Dylan: My-

Lucas: Go ahead.

Dylan: Well, does one agent have to? Did one agent decide on the ways that we were going to do preference aggregation as a society?

Lucas: No. It naturally evolved out of-

Dylan: It just naturally evolved via a coordination and argumentative process. For me, my answer to … If you force me to specify something about how we’re going to do value aggregation, if I was controlling the values for an AGI system, I would try to say as little as possible about the way that we’re going to aggregate values because I think we don’t actually understand that process much in humans.

Lucas: Right. That’s fair.

Dylan: Instead, I would opt for a heuristic of, to the extent that we can, devoting equal optimization effort towards every individual, and allowing that parliament, if you will, to determine the way the values should be aggregated. This doesn't necessarily mean having an explicit value aggregation mechanism that gets set in stone. This could be an argumentative process mediated by artificial agents arguing on your behalf. This could be a futuristic, AI-enabled version of the court system.

Lucas: It’s like an ecosystem of preferences and values in conversation?

Dylan: Exactly.

Lucas: Cool. We've talked a little bit about the deep future here now, where we're reaching toward things like AGI or artificial superintelligence. After inverse reinforcement learning is potentially solved, is there anything that you see coming after it in these techniques?

Dylan: Yeah. I mean, I think inverse reinforcement learning is certainly not the be-all, end-all. I think what it is, is one of the earliest examples in AI of trying to really look at preference elicitation, and modeling preferences, and learning preferences. It existed in a whole bunch of … economists have been thinking about this for a while already. Basically, yeah, I think there's a lot to be said about how you model data and how you learn about preferences and goals. I think inverse reinforcement learning is basically the first attempt to get at that, but it's very far from the end.

I would say the biggest thing in how I view things that is maybe different from your standard reinforcement learning or inverse reinforcement learning perspective is that I focus a lot on how you act given what you've learned from inverse reinforcement learning. Inverse reinforcement learning is a pure inference problem: it's just "figure out what someone wants." I ground that out in all of our research as "take actions to help someone," which introduces a new set of concerns and questions.
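
[Editorial aside: for readers who want a concrete picture of "inverse reinforcement learning as a pure inference problem" followed by "take actions to help someone," here is a minimal, hypothetical sketch in Python. The two candidate reward functions, the Boltzmann-rational choice model, and the observed choices are all invented for illustration; this is not the model from Dylan's research.]

```python
# Minimal sketch: infer which reward function best explains observed human
# choices, then pick the helping action that maximizes expected reward under
# the resulting posterior. Everything here is a toy assumption.

import math

candidate_rewards = {
    "likes_coffee": {"get_coffee": 1.0, "get_tea": 0.1, "do_nothing": 0.0},
    "likes_tea":    {"get_coffee": 0.1, "get_tea": 1.0, "do_nothing": 0.0},
}

observed_choices = ["get_coffee", "get_coffee", "get_tea"]  # human demonstrations

def choice_probability(reward, choice):
    # Assume the human picks actions with probability proportional to exp(reward).
    total = sum(math.exp(value) for value in reward.values())
    return math.exp(reward[choice]) / total

# Inference step: Bayesian update over the two reward hypotheses (uniform prior).
posterior = {}
for name, reward in candidate_rewards.items():
    likelihood = 1.0
    for choice in observed_choices:
        likelihood *= choice_probability(reward, choice)
    posterior[name] = likelihood
normalizer = sum(posterior.values())
posterior = {name: p / normalizer for name, p in posterior.items()}
print(posterior)  # roughly {'likes_coffee': 0.71, 'likes_tea': 0.29}

# Acting step: choose the action with the highest expected reward under the posterior.
actions = candidate_rewards["likes_coffee"].keys()  # action names are shared
best_action = max(actions, key=lambda a: sum(posterior[n] * candidate_rewards[n][a]
                                             for n in posterior))
print(best_action)  # get_coffee
```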

Lucas: Great. It looks like we’re about at the end of the hour here. I guess, if anyone here is interested in working on this technical portion of the AI alignment problem, what do you suggest they study or how do you view that it’s best for them to get involved, especially if they want to work on inverse reinforcement learning and inferring human preferences?

Dylan: I think if you’re an interested person, and you want to get into technical safety work, the first thing you should do is probably read Jan Leike’s recent write up in 80,000 Hours. Generally, what I would say is, try to get involved in AI research flat. Don’t focus as much on trying to get into AI safety research, and just generally focus more on acquiring the skills that will support you in doing good AI research. Get a strong math background. Get a research advisor who will advise you on doing research projects, and help teach you the process of submitting papers, and figuring out what the AI research community is going to be interested in.

In my experience, one of the biggest pitfalls that early researchers fall into is focusing too much on what they're researching rather than thinking about who they're researching with, and how they're going to learn the skills that will support doing research in the future. I think that most people don't appreciate how transferable research skills are: to the extent that you can, try to do research on technical AI safety, but it's fine to work more broadly on technical AI. If you're interested in safety, the safety connections will be there. You may see how a new area of AI actually relates to it, supports it, or you may find places of new risk, and be in a good position to try to mitigate that and take steps to alleviate those harms.

Lucas: Wonderful. Yeah, thank you so much for speaking with me today, Dylan. It’s really been a pleasure, and it’s been super interesting.

Dylan: It was a pleasure talking to you. I love the chance to have these types of discussions.

Lucas: Great. Thanks so much. Until next time.

Dylan: Until next time. Thanks, it was a blast.

Lucas: If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back soon with another episode in this new AI alignment series.

[end of recorded material]

Podcast: Navigating AI Safety – From Malicious Use to Accidents

Is the malicious use of artificial intelligence inevitable? If the history of technological progress has taught us anything, it’s that every “beneficial” technological breakthrough can be used to cause harm. How can we keep bad actors from using otherwise beneficial AI technology to hurt others? How can we ensure that AI technology is designed thoughtfully to prevent accidental harm or misuse?

On this month’s podcast, Ariel spoke with FLI co-founder Victoria Krakovna and Shahar Avin from the Center for the Study of Existential Risk (CSER). They talk about CSER’s recent report on forecasting, preventing, and mitigating the malicious uses of AI, along with the many efforts to ensure safe and beneficial AI.

Topics discussed in this episode include:

  • the Facebook Cambridge Analytica scandal,
  • Goodhart’s Law with AI systems,
  • spear phishing with machine learning algorithms,
  • why it’s so easy to fool ML systems,
  • and why developing AI is still worth it in the end.
In this interview we discuss The Malicious Use of Artificial Intelligence: Forecasting, Prevention and Mitigation, the original FLI grants, and the RFP examples for the 2018 round of FLI grants. This podcast was edited by Tucker Davey. You can listen to it above or read the transcript below.

 

Ariel: The challenge is daunting and the stakes are high. So ends the executive summary of the recent report, The Malicious Use of Artificial Intelligence: Forecasting, Prevention and Mitigation. I’m Ariel Conn with the Future of Life Institute, and I’m excited to have Shahar Avin and Victoria Krakovna joining me today to talk about this report along with the current state of AI safety research and where we’ve come in the last three years.

But first, if you’ve been enjoying our podcast, please make sure you’ve subscribed to this channel on SoundCloud, iTunes, or whatever your favorite podcast platform happens to be. In addition to the monthly podcast I’ve been recording, Lucas Perry will also be creating a new podcast series that will focus on AI safety and AI alignment, where he will be interviewing technical and non-technical experts from a wide variety of domains. His upcoming interview is with Dylan Hadfield-Menell, a technical AI researcher who works on cooperative inverse reinforcement learning and inferring human preferences. The best way to keep up with new content is by subscribing. And now, back to our interview with Shahar and Victoria.

Shahar is a Research Associate at the Center for the Study of Existential Risk, which I’ll be referring to as CSER for the rest of this podcast, and he is also the lead co-author on the Malicious Use of Artificial Intelligence report. Victoria is a co-founder of the Future of Life Institute and she’s a research scientist at DeepMind working on technical AI safety.

Victoria and Shahar, thank you so much for joining me today.

Shahar: Thank you for having us.

Victoria: Excited to be here.

Ariel: So I want to go back three years, to when FLI started our grant program, which helped fund this report on the malicious use of artificial intelligence, and I was hoping you could both talk for maybe just a minute or two about what the state of AI safety research was three years ago, and what prompted FLI to take on a lot of these grant research issues — essentially what prompted a lot of the research that we’re seeing today? Victoria, maybe it makes sense to start with you quickly on that.

Victoria: Well three years ago, AI safety was less mainstream in the AI research community than it is today, particularly long-term AI safety. So part of what FLI has been working on and why FLI started this grant program was to stimulate more work into AI safety and especially its longer-term aspects that have to do with powerful general intelligence, and to make it a more mainstream topic in the AI research field.

Three years ago, there were fewer people working in it, and many of the people who were working in it were a little bit disconnected from the rest of the AI research community. So part of what we were aiming for with our Puerto Rico conference and our grant program, was to connect these communities better, and to make sure that this kind of research actually happens and that the conversation shifts from just talking about AI risks in the abstract to actually doing technical work, and making sure that the technical problems get solved and that we start working on these problems well in advance before it is clear that, let’s say general AI, would appear soon.

I think part of the idea with the grant program originally, was also to bring in new researchers into AI safety and long-term AI safety. So to get people in the AI community interested in working on these problems, and for those people whose research was already related to the area, to focus more on the safety aspects of their research.

Ariel: I’m going to want to come back to that idea and how far we’ve come in the last three years, but before we do that, Shahar, I want to ask you a bit about the report itself.

So this started as a workshop that Victoria had also actually participated in last year and then you’ve turned it into this report. I want you to talk about what prompted that and also this idea that’s mentioned in the report is that, no one’s really looking at how artificial intelligence could be used maliciously. And yet what we’ve seen with every technology and advance that’s happened throughout history, I can’t think of anything that people haven’t at least attempted to use to cause harm, whether they’ve succeeded or not, I don’t know if that’s always the case, but almost everything gets used for harm in some way. So I’m curious why there haven’t been more people considering this issue yet?

Shahar: So going back to maybe a few months before the workshop, which as you said was February 2017: both Miles Brundage at the Future of Humanity Institute and I at the Center for the Study of Existential Risk had this inkling that there were more and more corners of malicious use of AI that were being researched, and people were getting quite concerned. We were in discussions with the Electronic Frontier Foundation about the DARPA Cyber Grand Challenge and progress being made towards the use of artificial intelligence in offensive cybersecurity. I think Miles was very well connected to the circle who were looking at lethal autonomous weapon systems and the increasing use of autonomy in drones. And we were both seeing, with stories like the Facebook story that has been in the news recently, the early versions of that already coming up back then.

So it’s not that people were not looking at malicious uses of AI, but it seemed to us that there wasn’t this overarching perspective that is not looking at particular domains. This is not, “what will AI do to cybersecurity in terms of malicious use? What will malicious use of AI look like in politics? What do malicious use of AI look like in warfare?” But rather across the board, if you look at this technology, what new kinds of malicious actions does it enable, and other commonalities across those different domains. Plus, it seemed that that “across the board” more technology-focused perspective, other than “domain of application” perspective, was something that was missing. And maybe that’s less surprising, right? People get very tied down to a particular scenario, a particular domain that they have expertise on, and from the technologists’ side, many of them just wouldn’t know all of the legal minutiae of warfare, or — one thing that we found was there weren’t enough channels of communication between the cybersecurity community and the AI research community; similarly the political scientists and the AI research community. So it did require quite an interdisciplinary workshop to get all of these things on the table, and tease out some the commonalities, which is what we then try to do with the report.

Ariel: So actually, you mentioned the Facebook thing and I was a little bit curious about that. Does that fall under the umbrella of this report or is that a separate issue?

Shahar: It’s not clear if it would fall directly under the report, because the way we define malicious could be seen as problematic. It’s the best that we could do with this kind of report, which is to say that there is a deliberate attempt to cause harm using the technology. It’s not clear, whether in the Facebook case, there was a deliberate attempt to cause harm or whether there was disregard of harm that could be caused as a side effect, or just the use of this in an arena that there are legitimate moves, just some people realize that the technology can be used to gain an upper hand within this arena.

But there are whole scenarios that sit just next to it, that look very similar, but that involve the centralized use of this kind of surveillance, diminishing privacy, and potentially the use of AI to manipulate individuals, manipulate their behavior, and target messaging at particular individuals.

There are clearly imaginable scenarios in which this is done maliciously to keep a corrupt government in power, to overturn a government in another nation, kind of overriding the self-determination of the members of their country. There are not going to be clear rules about what is obviously malicious and what is just part of the game. I don’t know where to put Facebook’s and Cambridge Analytica’s case, but there are clearly cases that I think universally would be considered as malicious that from the technology side look very similar.

Ariel: So this gets into a quick definition that I would like you to give us and that is for the term ‘dual use.’ I was at a conference somewhat recently and a government official who was there, not a high level, but someone who should have been familiar with the term ‘dual use’ was not. So I would like to make sure that we all know what that means.

Shahar: So I’m not, of course, a legal expert, but the term did come up a lot in the workshop and in the report. ‘Dual use,’ as far as I can understand it, refers to technologies or materials that both have peace-time or peaceful purposes and uses, but also wartime, or harmful uses. A classical example would be certain kinds of fertilizer that could be used to grow more crops, but could also be used to make homegrown explosives. And this matters because you might want to regulate explosives, but you definitely don’t want to limit people’s access to get fertilizer and so you’re in a bind. How do you make sure that people who have a legitimate peaceful use of a particular technology or material get to have that access without too much hassle that will increase the cost or make things more burdensome, but at the same time, make sure that malicious actors don’t get access to capabilities or technologies or materials that they can use to do harm.

I’ve also heard the term ‘omni use,’ being referred to artificial intelligence, this is the idea that technology can have so many uses across the board that regulating it because of its potential for causing harm comes at a very, very high price, because it is so foundational for so many other things. So one can think of electricity: it is true that you can use electricity to harm people, but vetting every user of the electric grid before they are allowed to consume electricity, seems very extreme, because there is so much benefit to be gained from just having access to electricity as a utility, that you need to find other ways to regulate. Computing is often considered as ‘omni use’ and it may well be that artificial intelligence is such a technology that would just be foundational for so many applications that it will be ‘omni use,’ and so the way to stop malicious actors from having access to it is going to be fairly complicated, but it’s probably not going to be any kind of a heavy-handed regulation.

Ariel: Okay. Thank you. So going back a little bit to the report more specifically, I don’t know how detailed we want to get with everything, but I was hoping you could touch a little bit on a few of the big topics that are in the report. For example, you talk about changes in the landscape of threats, where there is an expansion of existing threats, there’s an intro to new threats, and typical threats will be modified. Can you speak somewhat briefly as to what each of those mean?

Shahar: So I guess what I was saying, the biggest change is that machine learning, at least in some domains, now works. That means that you don’t need to have someone write out the code in order to have a computer that is performant at the particular task, if you can have the right kind of labeled data or the right kind of simulator in which you can train an algorithm to perform that action. That means that, for example, if there is a human expert with a lot of tacit knowledge in a particular domain, let’s say the use of a sniper rifle, it may be possible to train a camera that sits on top of a rifle, coupled with a machine learning algorithm that does the targeting for you, so that now any soldier becomes as expert as an expert marksman. And of course, the moment you’ve trained this model once, making copies of it is essentially free or very close to free, the same as it is with software.

Another is the ability to go through very large spaces of options and using some heuristics to more effectively search through that space for effective solutions. So one example of that would be AlphaGo, which is a great technological achievement and has absolutely no malicious use aspects, but you can imagine as an analogy, similar kinds of technologies being used to find weaknesses in software, discovering vulnerabilities and so on. And I guess, finally, one example we’ve seen that came up a lot, is the capabilities in machine vision. The fact that you can now look at an image and tell what is in that image, through training, which is something that computers were just not able to do a decade ago, at least nowhere near human levels of performance, starts unlocking potential threats both in autonomous targeting, say on top of drones, but also in manipulation. If I can know whether a picture is a good representation of something or not, then my ability to create forgeries significantly increases. This is the technology of generative adversarial networks, that we’ve seen used to create fake audio and potentially fake videos in the near future.

All of these new capabilities, plus the fact that access to the technology is becoming — I mean these technologies are very democratized at the moment. There are papers on arXiv, there are good tutorials on YouTube. People are very keen to have more people join the AI revolution, and for good reason, plus the fact that moving these trained models around is very cheap. It's just the cost of copying the software around, and the computing hardware that is required to run those models is widely available. This suggests that the availability of these malicious capabilities is going to rapidly increase, and that the ability to perform certain kinds of attacks would no longer be limited to a few humans, but would become much more widespread.

Ariel: And so I have one more question for you, Shahar, and then I’m going to bring Victoria back in. You’re talking about the new threats, and this expansion of threats and one of the things that I saw in the report that I’ve also seen in other issues related to AI is, we’ve had computers around for a couple decades now, we’re used to issues pertaining to phishing or hacking or spam. We recognize computer vulnerabilities. We know these are an issue. We know that there’s lots of companies that are trying to help us defend our computers against malicious cyber attacks, stuff like that. But one of the things that you get into in the report is this idea of “human vulnerabilities” — that these attacks are no longer just against the computers, but they are also going to be against us.

Shahar: I think for many people, this has been one of the really worrying things about the Cambridge Analytica, Facebook issue that is in the news. It’s the idea that because of our particular psychological tendencies, because of who we are, because of how we consume information, and how that information shapes what we like and what we don’t like, what we are likely to do and what we are unlikely to do, the ability of the people who control the information that we get, gives them some capability to control us. And this is not new, right?

People who are making newspapers or running radio stations or national TV stations have known for a very long time that the ability to shape the message is the ability to influence people's decisions. But coupling that with algorithms that are able to run experiments on millions or billions of people simultaneously, with very tight feedback loops, where you make a small change in the feed of one individual and see whether their behavior changes, and you can run many of these experiments and get very good data, is something that was never available in the age of broadcast. To some extent, it was available in the age of software. When software starts moving into big data and big data analytics, the boundaries start to blur between those kinds of technologies and AI technologies.

This is the kind of manipulation that you seem to be asking about, and that we definitely flag in the report, both in terms of political security, the ability of large communities to govern themselves in a way that they find to truthfully represent their own preferences, but also, on a smaller scale, with the social side of cyber attacks. So, if I can manipulate an individual, or a few individuals in a company, into disclosing their passwords or downloading or clicking a link that they shouldn't have, through modeling of their preferences and their desires, then that is a way in that might be a lot easier than trying to break the system through its computers.

Ariel: Okay, so one other thing that I think I saw come up, and I started to allude to this — there’s, like I said, the idea that we can defend our computers against attacks and we can upgrade our software to fix vulnerabilities, but then how do we sort of “upgrade” people to defend themselves? Is that possible? Or is it a case of we just keep trying to develop new software to help protect people?

Shahar: I think the answer is both. One thing that did come up a lot is that, unfortunately, unlike computers, you cannot just download a patch to everyone's psychology. We have slow processes of doing that. So we can incorporate parts of what is a trusted computer, what is a trusted source, into the education system and get people to be more aware of the risks. You can definitely design the technology such that it makes a lot more explicit where its vulnerabilities and where its more trusted parts are, which is something that we don't do very well at the moment. The little lock on the browser is kind of the high end of our ability to design systems to disclose where security is and why it matters, and there is much more to be done here, because just awareness of the amount of vulnerability is very low.

So there is probably some more that we can do with education and with notifying the public, but it should also be expected that this ability is limited, and it's also, to a large extent, an unfair burden to put on the population at large. It is much more important, I think, that the technology is designed in the first place to be, as much as possible, explicit and transparent about its levels of security, and if those levels of security are not high enough, then that in turn should lead to demands for more secure systems.

Ariel: So one of the things that came up in the report that I found rather disconcerting, was this idea of spear phishing. So can you explain what that is?

Shahar: We are familiar with phishing in general, which is when you pretend to be someone or something that you’re not in order to gain your victim’s trust and get them to disclose information that they should not be disclosing to you as a malicious actor. So you could pretend to be the bank and ask them to put in their username and password, and now you have access to their bank account and can transfer away their funds. If this is part of a much larger campaign, you could just pretend to be their friend, or their secretary, or someone who wants to give them a prize, get them to trust you, get one of the passwords that maybe they are using, and maybe all you do with that is you use that trust to talk to someone else who is much more concerned. So now that I have the username and password, say for the email or the Facebook account of some low-ranking employee in a company, I can start messaging their boss and pretending to be them and maybe get even more passwords and more access through that.

Phishing is usually kind of a “spray and pray” approach. You have a, “I’m a Nigerian prince, I have all of this money stocked in Africa, I’ll give you a cut if you help me move it out of the country, you need to send me some money.” You send this to millions of people, and maybe one or two fall for it. The cost for the sender is not very high, but the success rate is also very, very low.

Spear phishing on the other hand, is when you find a particular target, and you spend quite a lot of time profiling them and understanding what their interests are, what their social circles are, and then you craft a message that is very likely to work on them, because it plays to their ego, it plays to their normal routine, it plays on their interests and so on.

In the report we talk about this research by ZeroFOX, where they took a very simple version of this. They said, let's look at what people tweet about; we'll take that as an indication of the stuff that they're interested in. We will train a machine learning algorithm to create a model of the topics that people are interested in from the tweets, craft a malicious tweet that is based on those topics of interest, and have that be a link to a malicious site. So instead of sending kind of generally, "Check this out, super cool website," with a link to a malicious website most people know not to click on, it will be, "Oh, you are clearly interested in sports in this particular country, have you seen what happened, like the new hire in this team?" Or, "You're interested in archeology, crazy new report about recent finds in the pyramids," or something. And what they showed was that, once they'd created the bot, that bot then crafted targeted messages, those spear phishing messages, to a large number of users, and in principle they could scale it up indefinitely because now it's software, and the click-through rate was very high. I think it was something like 30 percent, which is orders of magnitude more than you get with phishing.

So automating spear phishing removes what used to be a trade-off between spray and pray, where you target millions of people but very few of them click, and spear phishing, where you target only a few individuals with very high success rates. Now you can target millions of people and customize the message to each one, so you have high success rates for all of them. Which means that you and I, who previously wouldn't be very high on the target list for cyber criminals or other cyber attackers, can now become targets simply because the cost is very low.

Ariel: So the cost is low, I don’t think I’m the only person who likes to think that I’m pretty good at recognizing sort of these phishing scams and stuff like that. I’m assuming these are going to also become harder for us to identify?

Shahar: Yep. So the idea is that the moment you have access to people's data, because they're explicit on social media about their interests and about their circles of friends, you get better at crafting messages and, say, comparing them to authentic messages from people, and saying, "oh, this is not quite right, we are going to tweak the algorithm until we get something that looks a lot like something a human would write." Quite quickly you could get to the point where computers are generating, to begin with, texts that are indistinguishable from what a human would write, but increasingly also images, audio segments, maybe entire websites. As long as the motivation or the potential for profit is there, it seems like the technology, either what we have now or what we can foresee in the next five years, would allow these kinds of advances to take place.

Ariel: Okay. So I want to touch quickly on the idea of adversarial examples. There was an XKCD cartoon that came out a week or two ago about self driving cars and the character says, “I worry about self driving car safety features, what’s to stop someone from painting fake lines on the road or dropping a cutout of a pedestrian onto a highway to make cars swerve and crash,” and then realizes all of those things would also work on human drivers. Sort of a personal story, I used to live on a street called Climax and I actually lived at the top of Climax, and I have never seen a street sign stolen more in my life, it was often the street sign just wasn’t there. So my guess is it’s not that hard to steal a stop sign if someone really wanted to mess around with drivers, and yet we don’t see that happen very often.

So I was hoping both of you could weigh in a little bit on what you think artificial intelligence is going to change about these types of scenarios where it seems like the risk will be higher for things like adversarial examples versus just stealing a stop sign.

Victoria: I agree that there is certainly a reason for optimism in the fact that most people just aren’t going to mess with the technology, that there aren’t that many actual bad actors out there who want to mess it up. On the other hand, as Shahar said earlier, democratizing both the technology and the ways to mess with it, to interfere with it, does make that more likely. For example, the ways in which you could provide adversarial examples to cars, can be quite a bit more subtle than stealing a stop sign or dropping a fake body on the road or anything like that. For example, you can put patches on a stop sign that look like noise or just look like rectangles in certain places and humans might not even think to remove them, because to humans they’re not a problem. But an autonomous car might interpret that as a speed limit sign instead of a stop sign, and similarly, more generally people can use adversarial patches to fool various vision systems, for example if they don’t want to be identified by a surveillance camera or something like that.

So a lot of these methods, people can just read about it online, there are papers in arXiv and I think the fact that they are so widely available might make it easier for people to interfere with technology more, and basically might make this happen more often. It’s also the case that the vulnerabilities of AI are different than the vulnerabilities of humans, so it might lead to different ways that it can fail that humans are not used to, and ways in which humans would not fail. So all of these things need to be considered, and of course, as technologists, we need to think about ways in which things can go wrong, whether it is presently highly likely, or not.
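
[Editorial aside: for readers curious how an adversarial example is actually constructed, here is a minimal sketch of the textbook fast gradient sign method, written in PyTorch. The model, image, and label are random placeholders; this is a generic illustration of the idea rather than the specific stop-sign or surveillance attacks discussed above.]

```python
# Minimal sketch of the fast gradient sign method (FGSM), the textbook way to
# construct an adversarial example against an image classifier. The model,
# image, and label below are random stand-ins for illustration only.

import torch
import torch.nn as nn

def fgsm_example(model, image, true_label, epsilon=0.03):
    """Return `image` plus a small perturbation that tends to change the
    model's prediction while remaining nearly invisible to a human."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), true_label)
    loss.backward()
    # Nudge every pixel slightly in the direction that increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()

# Hypothetical usage with a tiny stand-in classifier over 32x32 RGB images.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)   # placeholder image with values in [0, 1]
label = torch.tensor([3])          # placeholder "true" class index
adversarial_image = fgsm_example(model, image, label)
print((adversarial_image - image).abs().max())  # perturbation is at most epsilon
```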

Ariel: So that leads to another question that I want to ask, but before I go there, Shahar, was there anything you wanted to add?

Shahar: I think that covers almost all of the basics, but I'd maybe stress a couple of these points. One thing about machines failing in ways that are different from how humans fail is that you can craft an attack that would only mess up a self-driving car, but wouldn't mess up a human driver. That means, let's say, you can go out in the middle of the night, put some stickers on, and be long gone from the scene by the time something bad happens. So this diminished ability to attribute the attack might mean that more people feel like they can get away with it.

Another one is that we see people much more willing to perform malicious or borderline acts online. So it's important: I mean, we often talk about adversarial examples as things that affect vision systems, because that's where a lot of the literature is, but it is very likely (in fact, there are already several examples) that things like anomaly detection that uses machine-learned patterns, malicious code detection that is based on machine-learned patterns, anomaly detection in networks and so on, all have their own kinds of adversarial examples as well. And so thinking about adversarial examples against defensive systems and adversarial examples against systems that are only available online brings us back to the fact that one attacker somewhere in the world could have access to your system, and so the fact that most people are not attackers doesn't really help you defense-wise.

Ariel: And, so this whole report is about how AI can be misused, but obviously the AI safety community and AI safety research goes far beyond that. So especially in the short term, do you see misuse or just general safety and design issues to be a bigger deal?

Victoria: I think it is quite difficult to say which of them would be a bigger deal. I think both misuse and accidents are something that are going to increase in importance and become more challenging and these are things that we really need to be working on as a research community.

Shahar: Yeah, I agree. We wrote this report not because we don't think accident risk and safety risk matter — we think they are very important. We just thought that there were some pretty good technical reports out there outlining the risks from accidents with near-term machine learning and with long-term systems, and some of the research that could be used to address them, and we felt like a similar thing was missing for misuse, which was why we wrote that report.

Both are going to be very important, and to some extent there is going to be an interplay. It is possible that systems that are more interpretable are also easier to secure. It might be the case that if there is some restriction in the diffusion of capabilities that also means that there is less incentive to cut corners to out-compete someone else by skimping on safety and so on. So there are strategic questions across both misuse and accidents, but I agree with Victoria, probably if we don’t do our job, we are just going to see more and more of both of these categories causing harm in the world, and more reason to work on both of them. I think both fields need to grow.

Victoria: I just wanted to add, a common cause of both accident risks and misuse risks that might happen in the future is just that these technologies are advancing quickly and there are often unforeseen and surprising ways in which they can fail, either by accident or by having vulnerabilities that can be misused by bad actors. And so as the technology continues to advance quickly we really need to be on the lookout for new ways that it can fail, new accidents but also new ways in which it can be used for harm by bad actors.

Ariel: So one of the things that I got out of this report, and that I think is also coming through now is, it’s kind of depressing. And I found myself often wondering … So at FLI, especially now we’ve got the new grants that are focused more on AGI, we’re worried about some of these bigger, longer-term issues, but with these shorter-term things, I sometimes find myself wondering if we’re even going to make it to AGI, or if something is going to happen that prevents that development in some way. So I was hoping you could speak to that a little bit.

Shahar: Maybe I’ll start with the Malicious Use report, and apologize for its somewhat gloomy perspective. So it should probably be mentioned that, I think almost all of the authors of the report are somewhere between fairly and very optimistic about artificial intelligence. So it’s much more the fact that we see this technology going, we want to see it developed quickly, at least in various narrow domains that are of very high importance, like medicine, like self driving cars — I’m personally quite a big fan. We think that the best way to, if we can foresee and design around or against the misuse risks, then we will eventually end up with a technology that it is more mature, that is more acceptable, that is more trusted because it is trustworthy, because it is secure. We think it is going to be much better to plan for these things in advance.

It is also, again, say we use electricity as an analogy, if I just sat down at the beginning of the age of electricity and I wrote a report about how many people were going to be electrocuted, it would look like a very sad thing. And it’s true, there has been a rapid increase in the number of people who die from electrocution compared to before the invention of electricity and much safety has been built since then to make sure that that risk is minimized, but of course, the benefits have far, far, far outweighed the risks when it comes to electricity and we expect, probably, hopefully, if we take the right actions, like we lay out in the report, then the same is going to be true for misuse risk for AI. At least half of the report, all of Appendix B and a good chunk of the parts before it, talk about what we can do to mitigate those risks, so hopefully the message is not entirely doom and gloom.

Victoria: I think that the things we need to do remain the same no matter how far away we expect these different developments to happen. We need to be looking out for ways that things can fail. We need to be thinking in advance about ways that things can fail, and not wait until problems show up and we actually see that they're happening. Of course, we often will see problems show up, but in these matters an ounce of prevention can be worth a pound of cure, and there are some mistakes that might just be too costly. For example, if you have some advanced AI that is running the electrical grid or the financial system, we really don't want that thing to hack its reward function.

So there are various predictions about how soon different transformative developments of AI might happen and it is possible that things might go awry with AI before we get to general intelligence and what we need to do is basically work hard to try to prevent these kinds of accidents or misuse from happening and try to make sure that AI is ultimately beneficial, because the whole point of building it is because it would be able to solve big problems that we cannot solve by ourselves. So let’s make sure that we get there and that we sort of handle this with responsibility and foresight the whole way.

Ariel: I want to go back to the very first comments that you made about where we were three years ago. How have things changed in the last three years and where do you see the AI safety community today?

Victoria: In the last three years, we’ve seen the AI safety research community get a fair bit bigger and topics of AI safety have become more mainstream, so I will say that long-term AI safety is definitely less controversial and there are more people engaging with the questions and actually working on them. While near-term safety, like questions of fairness and privacy and technological unemployment and so on, I would say that’s definitely mainstream at this point and a lot of people are thinking about that and working on that.

In terms of long term AI safety or AGI safety we’ve seen teams spring up, for example, both DeepMind and OpenAI have a safety team that’s focusing on these sort of technical problems, which includes myself on the DeepMind side. There have been some really interesting bits of progress in technical AI safety. For example, there has been some progress in reward learning and generally value learning. For example, the cooperative inverse reinforcement learning work from Berkeley. There has been some great work from MIRI on logical induction and quantilizing agents and that sort of thing. There have been some papers at mainstream machine learning conferences that focus on technical AI safety, for example, there was an interruptibility paper at NIPS last year and generally I’ve been seeing more presence of these topics in the big conferences, which is really encouraging.

On a more meta level, it has been really exciting to see the Concrete Problems in AI Safety research agenda come out two years ago. I think that’s really been helpful to the field. So these are only some of the exciting advances that have happened.

Ariel: Great. And so, Victoria, I do want to turn now to some of the stuff about FLI’s newest grants. We have an RFP that included quite a few examples and I was hoping you could explain at least two or three of them, but before we get to that if you could quickly define what artificial general intelligence (AGI) is, what we mean when we refer to long-term AI? I think those are the two big ones that have come up so far.

Victoria: So, artificial general intelligence is this idea of an AI system that can learn to solve many different tasks. Some people define this in terms of human-level intelligence as an AI system that will be able to learn to do all human jobs, for example. And this contrasts to the kind of AI systems that we have today which we could call “narrow AI,” in the sense that they specialize in some task or class of tasks that they can do.

So, for example Alpha Zero is a system that is really good at various games like Go and Chess and so on, but it would not be able to, for example, clean up a room, because that’s not in its class of tasks. While if you look at human intelligence we would say that humans are our go-to example of general intelligence because we can learn to do new things, we can adapt to new tasks and new environments that we haven’t seen before and we can transfer our knowledge that we have acquired through previous experience, that might not be in exactly the same settings, to whatever we are trying to do at the moment.

So, AGI is the idea of building an AI system that is also able to do that — not necessarily in the same way as humans, like it doesn’t necessarily have to be human-like to be able to perform the same tasks, or it doesn’t have to be structured the way a human mind is structured. So the definition of AGI is about what it’s capable of rather than how it can do those things. I guess the emphasis there is on the word general.

In terms of the FLI grant program this year, it is specifically focused on the AGI safety issue, which we also call long-term AI safety. Long term here doesn’t necessarily mean that it’s 100 years away. We don’t know how far away AGI actually is; the opinions of experts vary quite widely on that. But it’s more emphasizing that it’s not an immediate problem in the sense that we don’t have AGI yet, but we are trying to foresee what kind of problems might happen with AGI and make sure that if and when AGI is built that it is as safe and aligned with human preferences as possible.

And in particular as a result of the mainstreaming of AI safety that has happened in the past two years, partly, as I like to think, due to FLI’s efforts, at this point it makes sense to focus on long-term safety more specifically since this is still the most neglected area in the AI safety field. I’ve been very happy to see lots and lots of work happening these days on adversarial examples, fairness, privacy, unemployment, security and so on.  I think this allows us to really zoom in and focus on AGI safety specifically to make sure that there’s enough good technical work going on in this field and that the big technical problems get as much progress as possible and that the research community continues to grow and do well.

In terms of the kind of problems that I would want to see solved, I think some of the most difficult problems in AI safety that sort of feed into a lot of the problem areas that we have are things like Goodhart’s Law. Goodhart’s Law is basically that, when a metric becomes a target, it ceases to be a good metric. And the way this applies to AI is that if we make some kind of specification of what objective we want the AI system to optimize for — for example this could be a reward function, or a utility function, or something like that — then, this specification becomes sort of a proxy or a metric for our real preferences, which are really hard to pin down in full detail. Then if the AI system explicitly tries to optimize for the metric or for that proxy, for whatever we specify, for the reward function that we gave, then it will often find some ways to follow the letter but not the spirit of that specification.

Ariel: Can you give a real life example of Goodhart’s Law today that people can use as an analogy?

Victoria: Certainly. So Goodhart’s Law was not originally coined in AI. This is something that generally exists in economics and in human organizations. For example, if employees at a company have their own incentives in some way, like they are incentivized to clock in as many hours as possible, then they might find a way to do that without actually doing a lot of work. If you’re not measuring that then the number of hours spent at work might be correlated with how much output you produce, but if you just start rewarding people for the number of hours then maybe they’ll just play video games all day, but they’ll be in the office. That could be a human example.

There are also a lot of AI examples these days of reward functions that turn out not to give good incentives to AI systems.

Ariel: For a human example, would the issues that we’re seeing with standardized testing be an example of this?

Victoria: Oh, certainly, yes. I think standardized testing is a great example where when students are optimizing for doing well on the tests, then the test is a metric and maybe the real thing you want is learning, but if they are just optimizing for doing well on the test, then actually learning can suffer because they find some way to just memorize or study for particular problems that will show up on the test, which is not necessarily a good way to learn.

And if we get back to AI examples, there was a nice example from OpenAI last year where they had this reinforcement learning agent that was playing a boat racing game and the objective of the boat racing game was to go along the racetrack as fast as possible and finish the race before the other boats do, and to encourage the player to go along the track there were some reward points — little blocks that you have to hit to get rewards — that were along the track, and then the agent just found a degenerate solution where it would just go in a circle and hit the same blocks over and over again and get lots of reward, but it was not actually playing the game or winning the race or anything like that. This is an example of Goodhart’s Law in action. There are plenty of examples of this sort with present day reinforcement learning systems. Often when people are designing a reward function for a reinforcement learning system they end up adjusting it a number of times to eliminate these sort of degenerate solutions that happen.

And this is not limited to reinforcement learning agents. For example, recently there was a great paper that came out about many examples of Goodhart’s Law in evolutionary algorithms. For example, if some evolved agents were incentivized to move quickly in some direction, then they might just evolve to be really tall and then they fall in this direction instead of actually learning to move. There are lots and lots of examples of this and I think that as AI systems become more advanced and more powerful, then I think they’ll just get more clever at finding these sort of loopholes in our specifications of what we want them to do. Goodhart’s Law is, I would say, part of what’s behind various other AI safety issues. For example, negative side effects are often caused by the agent’s specification being incomplete, so there’s something that we didn’t specify.

For example, if we want a robot to carry a box from point A to point B, and we just reward it for getting the box to point B as fast as possible, then if there's something in the path of the robot — for example, there's a vase there — it will not have an incentive to go around the vase; it would just go right through the vase and break it to get to point B as fast as possible. And this is an issue because our specification did not include a term for the state of the vase. So, when the agent is just optimizing for this reward that's all about the box, it doesn't have an incentive to avoid disruptions to the environment.
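
[Editorial aside: a tiny, made-up illustration of the incomplete-specification problem described here. The paths, step counts, and penalty value are invented for this sketch.]

```python
# Toy illustration of an incomplete reward specification: a planner rewarded
# only for delivering the box quickly happily takes the path that breaks the
# vase, because the vase never appears in its reward. All numbers are made up.

PATHS = {
    "straight_through_vase": {"steps": 4, "breaks_vase": True},
    "around_the_vase":       {"steps": 6, "breaks_vase": False},
}

def misspecified_reward(path):
    # Only speed matters; the vase is absent from the specification.
    return -path["steps"]

def intended_reward(path):
    # What we actually care about: speed AND not wrecking the environment.
    return -path["steps"] - (100 if path["breaks_vase"] else 0)

for reward in (misspecified_reward, intended_reward):
    best = max(PATHS, key=lambda name: reward(PATHS[name]))
    print(reward.__name__, "->", best)
# misspecified_reward -> straight_through_vase
# intended_reward -> around_the_vase
```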

Ariel: So I want to interrupt with a quick question. These examples so far, we’re obviously worried about them with a technology as powerful as AGI, but they’re also things that apply today. As you mentioned, Goodhart’s Law doesn’t even just apply to AI. What progress has been made so far? Are we seeing progress already in addressing some of these issues?

Victoria: We haven’t seen so much progress in addressing these questions in a very general sort of way, because when you’re building a narrow AI system, then you can often get away with a sort of trial and error approach where you run it and maybe it does something stupid, finds some degenerate solution, then you tweak your reward function, you run it again and maybe it finds a different degenerate solution and then so on and so forth until you arrive at some reward function that doesn’t lead to obvious failure cases like that. For many narrow systems and narrow applications where you can sort of foresee all the ways in which things can go wrong, and just penalize all those ways or build a reward function that avoids all of those failure modes, then there isn’t so much need to find a general solution to these problems. While as we get closer to general intelligence, there will be more need for more principled and more general approaches to these problems.

For example, how do we build an agent that has some idea of what side effects are, or what it means to disrupt an environment that it's in, no matter what environment you put it in? That's something we don't have yet. One of the promising approaches that has been gaining traction recently is reward learning. For example, there was this paper in collaboration between DeepMind and OpenAI called Deep Reinforcement Learning from Human Preferences, where instead of directly specifying a reward function for the agent, it learns a reward function from human feedback. Where, for example, if your agent is this simulated little noodle or hopper that's trying to do a backflip, then the human would just look at two videos of the agent trying to do a backflip and say, "Well this one looks more like a backflip." And so, you have a bunch of data from the human about what is more similar to what the human wants the agent to do.

With this kind of human feedback, unlike, for example, demonstrations, the agent can learn something that the human might not be able to demonstrate very easily. For example, even if I cannot do a backflip myself, I can still judge whether someone else has successfully done a backflip or whether this reinforcement agent has done a backflip. This is promising for getting agents to potentially solve problems that humans cannot solve or do things that humans cannot demonstrate. Of course, with human feedback and human-in-the-loop kind of work, there is always the question of scalability because human time is expensive and we want the agent to learn as efficiently as possible from limited human feedback and we also want to make sure that the agent actually gets human feedback in all the relevant situations so it learns to generalize correctly to new situations. There are a lot of remaining open problems in this area as well, but the progress so far has been quite encouraging.
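
[Editorial aside: a simplified sketch of the learning-from-comparisons idea described above. The tiny linear reward model, the feature vectors, and the comparison data are placeholders invented for illustration; the actual Deep Reinforcement Learning from Human Preferences work trains a deep reward model inside a reinforcement learning loop.]

```python
# Simplified sketch of learning a reward model from pairwise human comparisons,
# in the spirit of the work described above. The tiny linear model, the 4-dim
# "trajectory features," and the comparison data are all invented placeholders.

import torch
import torch.nn as nn

reward_model = nn.Linear(4, 1)  # maps trajectory features to a scalar reward
optimizer = torch.optim.Adam(reward_model.parameters(), lr=0.05)

# Each entry: (features of clip A, features of clip B, did the human prefer A?).
comparisons = [
    (torch.tensor([1.0, 0.0, 0.5, 0.2]), torch.tensor([0.1, 0.9, 0.4, 0.3]), True),
    (torch.tensor([0.2, 0.8, 0.1, 0.6]), torch.tensor([0.9, 0.1, 0.7, 0.2]), False),
]

for _ in range(200):
    loss = torch.zeros(1)
    for feats_a, feats_b, a_preferred in comparisons:
        r_a, r_b = reward_model(feats_a), reward_model(feats_b)
        # Bradley-Terry style model: P(human prefers A) = sigmoid(r_a - r_b).
        p_a = torch.sigmoid(r_a - r_b)
        target = torch.ones(1) if a_preferred else torch.zeros(1)
        loss = loss + nn.functional.binary_cross_entropy(p_a, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The learned reward_model now scores new behavior; an RL agent would be trained
# to maximize it, with the human answering more comparisons as behavior changes.
```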

Ariel: Are there others that you want to talk about?

Victoria: Maybe I’ll talk about one other question, which is that of interpretability. Interpretability of AI systems is something that is a big area right now in near-term AI safety that increasingly more people on the research community are thinking about and working on, that is also quite relevant in long-term AI safety. This generally has to do with being able to understand why your system does things a certain way, or makes certain decisions or predictions, or in the case of an agent, why it takes certain actions and also understanding what different components of the system are looking for in the data or how the system is influenced by different inputs and so on. Basically making it less of a black box, and I think there is a reputation for deep learning systems in particular that they are seen as black boxes and it is true that they are quite complex, but I think they don’t necessarily have to be black boxes and there has certainly been progress in trying to explain why they do things.

Ariel: Do you have real world examples?

Victoria: So, for example, if you have some AI system that’s used for medical diagnosis, then on the one hand you could have something simple like a decision tree that just looks at your x-ray and if there is something in a certain position then it gives you a certain diagnosis, and otherwise it doesn’t and so on. Or you could have a more complex system like a neural network that takes into account a lot more factors and then at the end it says, like maybe this person has cancer or maybe this person has something else. But it might not be immediately clear why that diagnosis was made. Particularly in sensitive applications like that, what sometimes happens is that people end up using simpler systems that they find more understandable where they can say why a certain diagnosis was made, even if those systems are less accurate, and that’s one of the important cases for interpretability where if we figure out how to make these more powerful systems more interpretable, for example, through visualization techniques, then they would actually become more useful in these really important applications where it actually matters not just to predict well, but to explain where the prediction came from.

And another example is an algorithm that’s deciding whether to give someone a loan or a mortgage. If someone’s loan application got rejected, then they would really want to know why it got rejected. So the algorithm has to be able to point at some variables, or some other aspect of the data, that influenced the decision, or you might need to be able to explain how the data would need to change for the decision to change: which variables would need to change, and by how much, for the decision to be different. So these are just some examples of how this can be important and how this is already important. This kind of interpretability of present-day systems is of course already on a lot of people’s minds. I think it is also important to think about interpretability in the longer term: as we build more general AI systems, it will continue to be important, or maybe even become more important, to be able to look inside them and check whether there are particular concepts that they’re representing.
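
To make the loan example concrete, here is a hedged sketch of that kind of explanation for a very simple scorer: with a logistic-regression model, you can compute, feature by feature, how much each input would have to change for the decision to flip. The feature names, weights, and approval threshold are invented for illustration; real credit models and real explanation tools, such as feature attributions or counterfactual explanations, are considerably more involved.

```python
import numpy as np

# Hypothetical logistic-regression loan scorer. The features are
# [income (k$), debt (k$), years of credit history]; weights, bias, and
# threshold are invented for illustration.
weights = np.array([0.04, -0.08, 0.30])
bias = -2.0
APPROVAL_THRESHOLD = 0.5

def approval_probability(x):
    return 1.0 / (1.0 + np.exp(-(weights @ x + bias)))

applicant = np.array([45.0, 30.0, 3.0])  # a rejected applicant
print("approval probability:", round(approval_probability(applicant), 3))

# Counterfactual-style explanation: for each feature, how much would it
# have to change, holding the others fixed, to reach the approval threshold?
target_logit = np.log(APPROVAL_THRESHOLD / (1 - APPROVAL_THRESHOLD))
logit_gap = target_logit - (weights @ applicant + bias)
for name, weight in zip(["income", "debt", "credit history"], weights):
    needed_change = logit_gap / weight
    print(f"{name}: change by {needed_change:+.1f} to flip the decision")
```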

Like, for example, especially from a safety perspective, whether your system is thinking about the off switch, and whether it’s thinking about whether it’s going to be turned off, might be something good to monitor for. We also want to be able to explain how our systems fail and why they fail. This is, of course, quite relevant today: if, let’s say, your medical diagnosis AI makes a mistake, we want to know what led to that, why it made the wrong diagnosis. Also on the longer term, we want to know why an AI system hacks its reward function, what it is thinking — well, “thinking” with quotes, of course — while it’s following a degenerate solution instead of the kind of solution we would want it to find. So, what is the boat race agent that I mentioned earlier paying attention to while it’s going in circles and collecting the same rewards over and over again instead of playing the game, that kind of thing. I think the particular application of interpretability techniques to safety problems is going to be important, and it’s one of the examples of the kind of work that we’re looking for in the RFP.

Ariel: Awesome. Okay, and so, we’ve been talking about how all these things can go wrong and about the research we’re trying to do to make sure things don’t go wrong, and yet we still think it’s worthwhile to continue designing artificial intelligence. No one’s looking at this and saying, “Oh my god, artificial intelligence is awful, we need to stop studying it or developing it.” So what are the benefits that make these risks worth taking?

Shahar: So I think one thing, in the domain of narrow applications, is that it’s very easy to make analogies to software, right? The things that we have been able to hand over to computers have really been the most boring and tedious and repetitive things that humans do, and we now no longer need to do them. Productivity has gone up, people are generally happier and can get paid more for doing more interesting things, and we can build bigger systems because we can hand off control to machines that don’t need to sleep and don’t make small mistakes in calculations. Now there’s the promise of adding to that all of the narrow things that experts can do, whether it’s improving medical diagnosis, maybe farther down the line some elements of drug discovery, or piloting a car or operating machinery. Many of these are areas where human labor is currently required because there is a fuzziness to the task that doesn’t let a software engineer simply come in and code an algorithm, but maybe with machine learning, in the not too distant future, we’ll be able to turn them over to machines.

It means taking some skills that only a few individuals in the world have and making those available to everyone around the world in some domains. As for concrete examples: for the ones I can think of, I try to find the companies that do them and get involved, because I want to see them happen sooner; and for the ones I can’t imagine yet, someone will come along and make a company, or a not-for-profit, out of them. But we’ve seen applications from agriculture, to medicine, to computer security, to entertainment and art, and driving and transport, and in all of these I think we’re just gonna be seeing even more. I think we’re gonna have more creative products out there that were designed in collaboration between humans and machines. We’re gonna see more creative solutions to scientific and engineering problems. We’re gonna see those professions where really good advice is very valuable, but there are only so many people who can help you — so if I’m thinking of doctors and lawyers, taking some of that advice and making it universally accessible through an app just makes life smoother. These are some of the examples that come to my mind.

Ariel: Okay, great. Victoria what are the benefits that you think make these risks worth addressing?

Victoria: I think there are many ways in which AI systems can make our lives a lot better and make the world a lot better, especially as we build more general systems that are more adaptable. For example, these systems could help us with designing better institutions and better infrastructure, better health systems or electrical systems or what have you. Even now, there are examples like the Google project on optimizing data center energy use with machine learning, which is something that DeepMind was working on, where the use of machine learning algorithms to manage energy use in the data centers reduced the energy used for cooling by something like 40 percent. That’s of course with fairly narrow AI systems.

I think as we build more general AI systems we can hope for really creative and innovative solutions to the big problems that humans face. So you can think of something like AlphaGo’s famous “move 37” that overturned thousands of years of human wisdom in Go. What if you could build even more general and even more creative systems and apply them to real world problems? I think there is great promise in that. I think this can really transform the world in a positive direction, and we just have to make sure that as these systems are built, we think about safety from the get-go and in advance, and try to build them to be as resistant to accidents and misuse as possible, so that all these benefits can actually be achieved.

The things I mentioned were only examples of the possible benefits. Imagine if you could have an AI scientist that’s trying to develop better drugs against diseases that have really resisted treatment, or more generally just doing science faster and better, if you actually have more general AI systems that can think as flexibly as humans can about these sorts of difficult problems. And they would not have some of the limitations that humans have, where, for example, our attention is limited and our memory is limited, while AI could be, at least theoretically, unlimited in its processing power and in the resources available to it; it can be more parallelized and more coordinated. I think all of the big problems that are so far unsolved are these sorts of coordination problems that require putting together a lot of different pieces of information and a lot of data. And I think there are massive benefits to be reaped there if we can only get to that point safely.

Ariel: Okay, great. Well thank you both so much for being here. I really enjoyed talking with you.

Shahar: Thank you for having us. It’s been really fun.

Victoria: Yeah, thank you so much.

[end of recorded material]

Podcast: AI and the Value Alignment Problem with Meia Chita-Tegmark and Lucas Perry

What does it mean to create beneficial artificial intelligence? How can we expect to align AIs with human values if humans can’t even agree on what we value? Building safe and beneficial AI involves tricky technical research problems, but it also requires input from philosophers, ethicists, and psychologists on these fundamental questions. How can we ensure the most effective collaboration?

Ariel spoke with FLI’s Meia Chita-Tegmark and Lucas Perry on this month’s podcast about the value alignment problem: the challenge of aligning the goals and actions of AI systems with the goals and intentions of humans. 

Topics discussed in this episode include:

  • how AGI can inform human values,
  • the role of psychology in value alignment,
  • how the value alignment problem includes ethics, technical safety research, and international coordination,
  • a recent value alignment workshop in Long Beach,
  • and the possibility of creating suffering risks (s-risks).

This podcast was edited by Tucker Davey. You can listen to it above or read the transcript below.

 

Ariel: I’m Ariel Conn with the Future of Life Institute, and I’m excited to have FLI’s Lucas Perry and Meia Chita-Tegmark with me today to talk about AI, ethics and, more specifically, the value alignment problem. But first, if you’ve been enjoying our podcast, please take a moment to subscribe and like this podcast. You can find us on iTunes, SoundCloud, Google Play, and all of the other major podcast platforms.

And now, AI, ethics, and the value alignment problem. First, consider the statement “I believe that harming animals is bad.” Now, that statement can mean something very different to a vegetarian than it does to an omnivore. Both people can honestly say that they don’t want to harm animals, but how they define “harm” is likely very different, and these types of differences in values are common between countries and cultures, and even just between individuals within the same town. And then we want to throw AI into the mix. How can we train AIs to respond ethically to situations when the people involved still can’t come to an agreement about what an ethical response should be?

The problem is even more complicated because often we don’t even know what we really want for ourselves, let alone how to ask an AI to help us get what we want. And as we’ve learned with stories like that of King Midas, we need to be really careful what we ask for. That is, when King Midas asked the genie to turn everything to gold, he didn’t really want everything — like his daughter and his food — turned to gold. And we would prefer that an AI we design recognize that there’s often implied meaning in what we say, even if we don’t say something explicitly. For example, if we jump into an autonomous car and ask it to drive us to the airport as fast as possible, implicit in that request is the assumption that, while we might be OK with some moderate speeding, we intend for the car to still follow most rules of the road, and not drive so fast as to put anyone’s life in danger or take illegal routes. That is, when we say “as fast as possible,” we mean “as fast as possible within the rules of law,” not within the laws of physics. And these examples are just the tiniest tip of the iceberg, given that I didn’t even mention artificial general intelligence (AGI) and how that can be developed such that its goals align with our values.
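
A toy version of the airport example shows one way to make the implicit constraint explicit: fold it into the objective itself instead of leaving it unstated. The speed limit, the penalty weight, and the shape of this reward are purely illustrative assumptions.

```python
# Toy sketch of the "as fast as possible" request with its implicit
# constraint written into the objective. The speed limit, penalty weight,
# and reward shape are invented for illustration.

def driving_reward(speed_kmh, speed_limit_kmh=100, legality_weight=10.0):
    """Reward progress, but heavily penalize breaking the (toy) law."""
    progress = speed_kmh
    violation = max(0.0, speed_kmh - speed_limit_kmh)
    return progress - legality_weight * violation

# An optimizer given only the progress term would push the speed as high
# as physics allows; with the penalty term, the best speed sits at the limit.
print(max(range(0, 300), key=driving_reward))  # prints 100
```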

So as I mentioned a few minutes ago, I’m really excited to have Lucas and Meia joining me today. Meia is a co-founder of the Future of Life Institute. She’s interested in how social sciences can contribute to keeping AI beneficial, and her background is in social psychology. Lucas works on AI and nuclear weapons risk-related projects at FLI. His background is in philosophy with a focus on ethics. Meia and Lucas, thanks for joining us today.

Meia: It’s a pleasure. Thank you.

Lucas: Thanks for having us.

Ariel: So before we get into anything else, one of the big topics that comes up a lot when we talk about AI and ethics is this concept value alignment. I was hoping you could both maybe talk just a minute about what value alignment is and why it’s important to this question of AI and ethics.

Lucas: So value alignment, in my view, is bringing AI’s goals, actions, intentions and decision-making processes in accordance with what humans deem to be the good or what we see as valuable or what our ethics actually are.

Meia: So for me, from the point of view of psychology, of course, I have to put the humans at the center of my inquiry. So from that point of view, value alignment … You can think about it also in terms of humans’ relationships with other humans. But I think it’s even more interesting when you add artificial agents into the mix. Because now you have an entity that is so wildly different from humans yet we would like it to embrace our goals and our values in order to keep it beneficial for us. So I think the question of value alignment is very central to keeping AI beneficial.

Lucas: Yeah. So just to expand on what I said earlier: The project of value alignment is in the end creating beneficial AI. It’s working on what it means for something to be beneficial, what beneficial AI exactly entails, and then learning how to technically instantiate that into machines and AI systems. Also, building the proper like social and political context for that sort of technical work to be done and for it to be fulfilled and manifested in our machines and AIs.

Ariel: So when you’re thinking of AI and ethics, is value alignment basically synonymous, just another way of saying AI and ethics or is it a subset within this big topic of AI and ethics?

Lucas: I think they have different connotations. If one’s thinking about AI ethics, I think one tends to be more focused on applied ethics and normative ethics. One might be thinking about the application of AI systems and algorithms and machine learning in various domains in the present day and in the near future. So one might think about automation and other sorts of things. I think that when one is thinking about value alignment, it’s much broader and expands also into metaethics, and it really couches and frames the problem of AI ethics as something which happens over decades and which has a tremendous impact. I think that value alignment has a much broader connotation than what AI ethics has traditionally had.

Meia: I think it all depends on how you define value alignment. If you take the very broad definition that Lucas has just proposed, then yes, it probably includes AI ethics. But you can also think of it more narrowly as simply instantiating your own values into AI systems and having them adopt your goals. In that case, I think there are other issues as well, because if you think about it from the point of view of psychology, for example, then it’s not just about which values get instantiated and how you do that, how you solve the technical problem. We also know that humans, even if they know what goals they have and what values they uphold, sometimes find it very, very hard to actually act in accordance with them, because they have all sorts of cognitive, emotional, and affective limitations. So in that case I think value alignment, in this narrow sense, is basically not sufficient. We also need to think about AIs and applications of AIs in terms of how they can help us, and how they can make sure that we gain the cognitive competencies that we need to be moral beings and to be really what we should be, not just what we are.

Lucas: Right. I guess to expand on what I was just saying. Value alignment I think in the more traditional sense, it’s sort of all … It’s more expansive and inclusive in that it’s recognizing a different sort of problem than AI ethics alone has. I think that when one is thinking about value alignment, there are elements of thinking about — somewhat about machine ethics but also about social, political, technical and ethical issues surrounding the end goal of eventually creating AGI. Whereas, AI ethics can be more narrowly interpreted just as certain sorts of specific cases where AI’s having impact and implications in our lives in the next 10 years. Whereas, value alignment’s really thinking about the instantiation of ethics and machines and making machine systems that are corrigible and robust and docile, which will create a world that we’re all happy about living in.

Ariel: Okay. So I think that actually is going to flow really nicely into my next question, and that is, at FLI we tend to focus on existential risks. I was hoping you could talk a little bit about how issues of value alignment are connected to the existential risks that we concern ourselves with.

Lucas: Right. So, we can think of AI systems as being very powerful optimizers. We can imagine there being a list of all possible futures, and what intelligence is good for is modeling the world and then committing to actions which constrain the set of all possible worlds to ones which are desirable. So intelligence is sort of the means by which we get to an end, and ethics is the end towards which we strive. That’s how these two things are really integrated and work together, and why AI without ethics makes no sense and ethics without AI, or intelligence in general, also just doesn’t work. So in terms of existential risk, there are possible futures that intelligence can lead us to where earth-originating intelligent life no longer exists, either intentionally or by accident. Value alignment fits in by constraining the set of all possible futures: by doing technical work, political and social work, and also work in ethics to constrain the actions of AI systems so that existential risks do not occur, so that the AI doesn’t, through some sort of technical oversight, some misalignment of values, or some misunderstanding of what we want, generate an existential risk.

Meia: So we should remember that Homo sapiens represents an existential risk to itself as well. We are creating nuclear weapons. We have more of them than we need. So many, in fact, that we could destroy the entire planet with them. Not to mention that Homo sapiens has also represented an existential risk for all other species. The problem with AI is that we’re introducing into the mix a whole new agent that is by definition supposed to be more intelligent and more powerful than us, and also autonomous. So as Lucas mentioned, it’s very important to think through what kinds of things and abilities we delegate to these AIs, and how we can make sure that they have the survival and the flourishing of our species in mind. So I think this is where value alignment comes in as a safeguard against these very terrible global risks that we can imagine coming from AI.

Lucas: Right. What makes doing that so difficult, beyond the technical issue of having AI researchers and AI safety researchers know how to get AI systems to actually do what we want without creating a universe of paperclips, is that there’s also this terrible social and political context in which this is all happening, where there are really strong game-theoretic incentives to be the first to create artificial general intelligence. So in a race to create AI, a lot of these efforts that seem very obvious and necessary could be cut in favor of more raw power. I think that’s probably one of the biggest risks for us not succeeding in creating value-aligned AI.

Ariel: Okay. Right now it’s predominantly technical AI people who are considering mostly technical AI problems. How to solve different problems is usually, you need a technical approach for this. But when it comes to things like value alignment and ethics, most of the time I’m hearing people suggest that we can’t leave that up to just the technical AI researchers. So I was hoping you could talk a little bit about who should be part of this discussion, why we need more people involved, how we can get more people involved, stuff like that.

Lucas: Sure. So maybe if I just break the problem down into just what I view to be the three different parts then talking about it will make a little bit more sense. So we can break down the value alignment problem into three separate parts. The first one is going to be the technical issues, the issues surrounding actually creating artificial intelligence. The issues of ethics, so the end towards which we strive. The set of possible futures which we would be happy in living, and then also there’s the governance and the coordination and the international problem. So we can sort of view this as a problem of intelligence, a problem of agreeing on the end towards which intelligence is driven towards, and also the political and social context in which all of this happens.

So thus far, there’s certainly been a focus on the technical issue. There’s been a big rise in the field of AI safety and in attempts to generate beneficial AI: attempts at creating safe AGI and mechanisms for avoiding reward hacking and other sorts of things that happen when systems are trying to optimize their utility function. The Concrete Problems in AI Safety paper has been really important and illustrates some of these technical issues. But even between technical AI safety research and ethics there’s disagreement about something like machine ethics. How important is machine ethics? Where does machine ethics fit into technical AI safety research? How much time and energy should we put into certain kinds of technical AI research versus how much time and effort should we put into issues in governance and coordination and addressing the AI arms race? How much of ethics do we really need to solve?

So I think there’s a really important and open question regarding how do we apply and invest our limited resources in sort of addressing these three important cornerstones in value alignment so that the technical issue, the issues in ethics and then issues in governance and coordination, and how do we optimize working on these issues given the timeline that we have? How much resources should we put in each one? I think that’s an open question. Yeah, one that certainly needs to be addressed more about how we’re going to move forward given limited resources.

Meia: I do think though the focus so far has been so much on the technical aspect. As you were saying, Lucas, there are other aspects to this problem that need to be tackled. What I’d like to emphasize is that we cannot solve the problem if we don’t pay attention to the other aspects as well. So I’m going to try to defend, for example, psychology here, which has been largely ignored I think in the conversation.

So from the point of view of psychology, I think the value alignment problem is twofold in a way. It’s about a triad of interactions: human, AI, other humans, right? So we are extremely social animals. We interact a lot with other humans. We need to align our goals and values with theirs. Psychology has focused a lot on that. We have a very sophisticated set of psychological mechanisms that allow us to engage in very rich social interactions. But even so, we don’t always get it right. Societies have created a lot of suffering, a lot of moral harm, injustice, unfairness throughout the ages. So for example, we are very ill-prepared by our own instincts and emotions to deal with inter-group relations. So that’s very hard.

Now, people coming from the technical side can say, “We’re just going to have AI learn our preferences.” Inverse reinforcement learning is one proposal for keeping humans in the loop. It’s a proposal for programming AI such that it gets its reward not from achieving a goal, but from getting good feedback from a human because it achieved a goal. So the hope is that this way AI can be correctable and can learn from human preferences.

As a psychologist, I am intrigued, but I understand that this is actually very hard. Are we humans even capable of conveying the right information about our preferences? Do we even have access to them ourselves or is this all happening in some sort of subconscious level? Sometimes knowing what we want is really hard. How do we even choose between our own competing preferences? So this involves a lot more sophisticated abilities like impulse control, executive function, etc. I think that if we don’t pay attention to that as well in addition to solving the technical problem, I think we are very likely to not get it right.
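
For a picture of the human-in-the-loop idea under discussion, here is a stripped-down sketch in which the agent's only reward signal is simulated human approval of outcomes rather than a hard-coded goal. The toy action set, the approval function standing in for a real human rater, and the simple bandit-style learning rule are all invented for illustration, and they gloss over exactly the difficulty Meia raises: whether people can convey their preferences accurately in the first place.

```python
import random

# Stripped-down human-in-the-loop sketch: the agent's reward is whatever a
# human rater approves of, not a hard-coded objective. The "human" here is
# a stand-in function; in practice such feedback is scarce, noisy, and
# sometimes inconsistent, which is where the psychological questions arise.

ACTIONS = ["tidy the room", "tidy the room but break a vase", "do nothing"]

def human_approval(action):
    """Pretend human rater: approves tidying, disapproves side effects."""
    return {"tidy the room": 1.0,
            "tidy the room but break a vase": -1.0,
            "do nothing": 0.0}[action]

values = {a: 0.0 for a in ACTIONS}   # the agent's estimate of each action
counts = {a: 0 for a in ACTIONS}

random.seed(0)
for step in range(500):
    # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(values, key=values.get)
    feedback = human_approval(action)        # query the (simulated) human
    counts[action] += 1
    values[action] += (feedback - values[action]) / counts[action]

print(max(values, key=values.get))  # settles on "tidy the room"
```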

Ariel: So I’m going to want to come back to this question of who should be involved and how we can get more people involved, but one of the reasons that I’m talking to the both of you today is because you actually have made some steps in broadening this discussion already in that you set up a workshop that did bring together a multidisciplinary team to talk about value alignment. I was hoping you could tell us a bit more about how that workshop went, what interesting insights were gained that might have been expressed during the workshop, what you got out of it, why you think it’s important towards the discussion? Etc.

Meia: Just to give a few facts about the workshop. The workshop took place in December 2017 in Long Beach, California. We were very lucky to have two wonderful partners in co-organizing this workshop. The Berggruen Institute and the Canadian Institute for Advanced Research. And the idea for the workshop was very much to have a very interdisciplinary conversation about value alignment and reframe it as not just a technical problem but also one that involves disciplines such as philosophy and psychology, political science and so on. So we were very lucky actually to have a fantastic group of people there representing all these disciplines. The conversation was very lively and we discussed topics all the way from near term considerations in AI and how we align AI to our goals and also all the way to thinking about AGI and even super intelligence. So it was a fascinating range both of topics discussed and also perspectives being represented.

Lucas: So my inspiration for the workshop was being really interested in ethics and the end towards which this is all going. What really is the point of creating AGI and perhaps even eventually superintelligence? What is it that is good, and what is it that is valuable? Broadening from that and becoming more interested in value alignment, the conversation thus far has been primarily understood as something that is purely technical. So value alignment has only been seen as something that is for technical AI safety researchers to work on, because there are technical issues regarding AI safety and how you get AIs to do really simple things without destroying the world or ruining a million other things that we care about. But this is really, as we discussed earlier, an interdependent issue that covers issues in metaethics, normative ethics, and applied ethics. It covers issues in psychology. It covers issues in law, policy, governance, and coordination. It covers the AI arms race issue. Solving the value alignment problem and creating a future with beneficial AI is a civilizational project where we need everyone working on all these different issues: on issues of value, on issues of game theory among countries, and on the technical issues, obviously.

So what I really wanted to do was I wanted to start this workshop in order to broaden the discussion. To reframe value alignment as not just something in technical AI research but something that really needs voices from all disciplines and all expertise in order to have a really robust conversation that reflects the interdependent nature of the issue and where different sorts of expertise on the different parts of the issue can really come together and work on it.

Ariel: Is there anything specific that you can tell us about what came out of the workshop? Were there any comments that you thought were especially insightful or ideas that you think are important for people to be considering?

Lucas: I mean, I think that for me one of the takeaways from the workshop is that there’s still a mountain of work to do and that there are a ton of open questions. This is a very, very difficult issue. One thing I took away from the workshop was that we couldn’t even agree on the minimal conditions under which it would be okay to safely deploy AGI. There are issues in value alignment, from the technical side and from the ethical side, that seem extremely trivial, but on which I think there is very little understanding or agreement right now.

Meia: I think the workshop was a start and one good thing that happened during the workshop is I felt that the different disciplines or rather their representatives were able to sort of air out their frustrations and also express their expectations of the others. So I remember this quite iconic moment when one roboticist simply said, “But I really want you ethics people to just tell me what to implement in my system. What do you want my system to do?” So I think that was actually very illustrative of what Lucas was saying — the need for more joint work. I think there was a lot of expectations I think from both the technical people towards the ethicists but also from the ethicists in terms of like, “What are you doing? Explain to us what are the actual ethical issues that you think you are facing with the things that you are building?” So I think there’s a lot of catching up to do on both sides and there’s much work to be done in terms of making these connections and bridging the gaps.

Ariel: So you referred to this as sort of a first step or an initial step. What would you like to see happen next?

Lucas: I don’t have any concrete or specific ideas for what exactly should happen next. I think that’s a really difficult question. Certainly, things that most people would want or expect. I think in the general literature and conversations that we were having, I think that value alignment, as a word and as something that we understand, needs to be expanded outside of the technical context. I don’t think that it’s expanded that far. I think that more ethicists and more moral psychologists and people in law policy and governance need to come in and need to work on this issue. I’d like to see more coordinated collaborations, specifically involving interdisciplinary crowds informing each other and addressing issues and identifying issues and really some sorts of formal mechanisms for interdisciplinary coordination on value alignment.

It would be really great if people in technical research, in technical AI safety research, and in ethics and governance could also identify all of the issues in their own fields whose resolution requires answers from other fields. So for example, inverse reinforcement learning is something that Meia was talking about earlier, and I think it’s something that we can clearly see as being interdependent with a ton of issues in law and also in ethics and in value theory. So that would be sort of an issue or node in the landscape of all issues in technical safety research that is interdisciplinary.

So I think it would be super awesome if everyone from their own respective fields are able to really identify the core issues which are interdisciplinary and able to dissect them into the constituent components and sort of divide them among the disciplines and work together on them and identify the different timelines at which different issues need to be worked on. Also, just coordinate on all those things.

Ariel: Okay. Then, Lucas, you talked a little bit about nodes and a landscape, but I don’t think we’ve explicitly pointed out that you did create a landscape of value alignment research so far. Can you talk a little bit about what that is and how people can use it?

Lucas: Yeah. For sure. With the help of other colleagues at the Future of Life Institute like Jessica Cussins and Richard Mallah, we’ve gone ahead and created a value alignment conceptual landscape. What this is, is a really big tree, almost like an evolutionary tree that you would see, but it’s a conceptual mapping and landscape of the value alignment problem. It’s broken down into the three constituent components which we were talking about earlier. There are the technical issues, the issues in technically creating safe AI systems. There are issues in ethics, breaking that down into issues in metaethics and normative ethics and applied ethics and moral psychology and descriptive ethics, where we’re trying to really understand values, what it means for something to be valuable, and what the end is towards which intelligence will be aimed. Then the last section is governance: issues in coordination and policy and law in creating a world where AI safety research can proceed and where we don’t develop or allow a sort of winner-take-all scenario to rush us towards the end without a final and safe solution for fully autonomous powerful systems.

So what the landscape here does is it sort of outlines all of the different conceptual nodes in each of these areas. It lays out what all the core concepts are, how they’re all related. It defines the concepts and also gives descriptions about how the concepts fit into each of these different sections of ethics, governance, and technical AI safety research. So the hope here is that people from different disciplines can come and see the truly interdisciplinary nature of the value alignment problem, to see where ethics and governance and the technical AI safety research stuff all fits in together and how this all together really forms, I think, the essential corners of the value alignment problem. It’s also nice for researchers and other persons to understand the concepts and the landscape of the other parts of this problem.

I think that, for example, technical AI safety researchers probably don’t know much about metaethics or they don’t spend too much time thinking about normative ethics. I’m sure that ethicists don’t spend very much time thinking about technical value alignment and how inverse reinforcement learning is actually done and what it means to do robust human imitation in machines. What are the actual technical, ethical mechanisms that are going to go into AI systems. So I think that this is like a step in sort of laying out the conceptual landscape, in introducing people to each other’s concepts. It’s a nice visual way of interacting with I think a lot of information and sort of exploring all these different really interesting nodes that explore a lot of very deep, profound moral issues, very difficult and interesting technical issues, and issues in law, policy and governance that are really important and profound and quite interesting.

Ariel: So you’ve referred to this as the value alignment problem a couple times. I’m curious, do you see this … I’d like both of you to answer this. Do you see this as a problem that can be solved or is this something that we just always keep working towards and it’s going to influence — whatever the current general consensus is will influence how we’re designing AI and possibly AGI, but it’s not ever like, “Okay. Now we’ve solved the value alignment problem.” Does that make sense?

Lucas: I mean, I think that sort of question really depends on your metaethics, right? So if you think there are moral facts, if you think that moral statements can be true or false and aren’t just subjectively dependent upon whatever our current values and preferences historically and evolutionarily and accidentally happen to be, then there is an end towards which intelligence can be aimed that would be objectively good and which would be the end toward which we would strive. In that case, if we had solved the technical issue and the governance issue, and we knew that there was a concrete end towards which we would strive that was the actual good, then the value alignment problem would be solved. But if you don’t think that there is a concrete end, a concrete good, something that is objectively valuable across all agents, then the value alignment problem, or value alignment in general, is an ongoing process and evolution.

In terms of the technical and governance sides, I think that there’s nothing in the laws of physics, or I think in computer science or in game theory, that says we can’t solve those parts of the problem. Those seem intrinsically like they can be solved. That’s not to say anything about how easy or how hard it would be to solve them. But whether or not there is sort of an end towards value alignment, I think, depends on difficult questions in metaethics and on whether something like moral error theory is true, where all moral statements are simply false and morality is maybe just a human invention which has no real answers, or whose answers are all false. I think that’s sort of the crux of whether or not value alignment can “be solved,” because I think the technical issues and the issues in governance are things which are in principle able to be solved.

Ariel: And Meia?

Meia: I think that regardless of whether there is an absolute end to this problem or not, there’s a lot of work that we need to do in between. I also think that in order to even achieve this end, we need more intelligence, but as we create more intelligent agents, again, this problem gets magnified. So there’s always going to be a race between the intelligence that we’re creating and making sure that it is beneficial. I think at every step of the way, the more we increase the intelligence, the more we need to think about the broader implications. I think in the end we should think of artificial intelligence also not just as a way to amplify our own intelligence but also as a way to amplify our moral competence as well. As a way to gain more answers regarding ethics and what our ultimate goals should be.

So I think that the interesting questions that we can do something about are somewhere sort of in between. We will not have the answer before we are creating AI. So we always have to figure out a way to keep up with the development of intelligence in terms of our development of moral competence.

Ariel: Meia, I want to stick with you for just a minute. When we talked for the FLI end-of-year podcast, one of the things you said you were looking forward to in 2018 is broadening this conversation. I was hoping you could talk a little bit more about some of what you would like to see happen this year in terms of getting other people involved in the conversation, and who you would like to see taking more of an interest in this?

Meia: So I think that unfortunately, especially in academia, we’ve defined our work so much around these things that we call disciplines. I think we are now faced with problems, especially in AI, that really are very interdisciplinary. We cannot get the answers from just one discipline. So I would actually like to see in 2018, for example, funding agencies proposing and creating funding sources for interdisciplinary projects. The way it works right now, especially in academia, is that you propose grants to granting agencies that are very much defined around disciplines.

Another thing that would be wonderful to start happening: our education system is also very much defined and described around these disciplines. So I feel that there’s a lack of courses, for example, that teach students in technical fields about ethics, moral psychology, social sciences and so on. The converse is also true; in social sciences and in philosophy we hear very little about advancements in artificial intelligence, what’s new, and what the problems are. So I’d like to see more of that. I’d like to see more courses like this developed. A friend of mine and I have spent some time thinking about how many courses there are that have an interdisciplinary nature and actually talk about the societal impacts of AI, and there’s a handful in the entire world; I think we counted about five or six of them. So there’s a shortage of that as well.

But then also educating the general public. I think thinking about the implications of AI and also the societal implications of AI and also the value alignment problem is something that’s probably easier for the general public to grasp rather than thinking about the technical aspects of how to make it more powerful or how to make it more intelligent. So I think there’s a lot to be done in educating, funding, and also just simply having these conversations. I also very much admire what Lucas has been doing. I hope he will expand on it, creating this conceptual landscape so that we have people from different disciplines understanding their terms, their concepts, each other’s theoretical frameworks with which they work. So I think all of this is valuable and we need to start. It won’t be completely fixed in 2018 I think. But I think it’s a good time to work towards these goals.

Ariel: Okay. Lucas, is there anything that you wanted to add about what you’d like to see happen this year?

Lucas: I mean, yeah. Nothing else to add to what I said earlier. Obviously we just need as many people from as many disciplines working on this issue as possible, because it’s so important. But just to go back a little bit, I also really liked what Meia said about how AI systems and intelligence can help us with our ethics and with our governance. That seems like a really good way forward: as our AI systems grow more powerful in their intelligence, they’re able to inform us more about our own ethics and our own preferences and our own values, about our own biases, and about what sorts of values and moral systems are really conducive to the thriving of human civilization and what sorts of moralities lead to navigating the space of all possible minds in a way that is truly beneficial.

So yeah. I guess I’ll be excited to see more ways in which intelligence and AI systems can be deployed for really tackling the question of what beneficial AI exactly entails. What does beneficial mean? We all want beneficial AI, but what is beneficial, what does that mean? What does that mean for us in a world in which no one can agree on what beneficial exactly entails? So yeah, I’m just excited to see how this is going to work out, how it’s going to evolve and hopefully we’ll have a lot more people joining this work on this issue.

Ariel: So your comment reminded me of a quote that I read recently that I thought was pretty interesting. I’ve been reading Paula Boddington’s book Towards a Code of Ethics for Artificial Intelligence. This was actually funded, at least in part if not completely, by FLI grants. But she says, “It’s worth pointing out that if we need AI to help us make moral decisions better, this casts doubt on the attempts to ensure humans always retain control over AI.” I’m wondering if you have any comments on that.

Lucas: Yeah. I don’t know. I think this is a specific way of viewing the issue, or a specific way of viewing what AI systems are for and the sort of future that we want. In the end, is the best of all possible futures a world in which human beings ultimately retain full control over AI systems? I mean, if AI systems are autonomous and if value alignment actually succeeds, then I would hope that we created AI systems which are more moral than we are: AI systems which have better ethics, which are less biased, which are more rational, which are more benevolent and compassionate than we are. If value alignment is able to succeed and if we’re able to create autonomous intelligent systems of that sort of caliber of ethics and benevolence and intelligence, then I’m not really sure what the point is of maintaining any sort of meaningful human control.

Meia: I agree with you, Lucas. That if we do manage to create … In this case, I think it would have to be artificial general intelligence that is more moral, more beneficial, more compassionate than we are, then the issue of control, it’s probably not so important. But in the meantime, I think, while we are sort of tinkering with artificial intelligent systems, I think the issue of control is very important.

Lucas: Yeah. For sure.

Meia: Because we wouldn’t want to … We wouldn’t want to cut ourselves out of the loop too early, before we’ve managed to properly test the system and make sure that indeed it is doing what we intended it to do.

Lucas: Right. Right. I think that process requires a lot of our own moral evolution, something which we humans are really bad and slow at. As president of FLI, Max Tegmark likes to talk about the race between our growing wisdom and the growing power of our technology. Human beings are really kind of bad at keeping our wisdom apace with the growing power of our technology. If we look at the moral evolution of our species, we can see huge eras in which things were seen as normal and mundane and innocuous, like slavery or the subjugation of women. Today we have issues with factory farming and animal suffering and income inequality, and just tons of people who are living with exorbitant wealth that doesn’t really create much utility for them, whereas there are tons of other people who are in poverty and who are still starving to death. There are all sorts of things that we can see in the past as being obviously morally wrong.

Meia: Under the present too.

Lucas: Yeah. So then we can see that obviously there must be things like that today. We wonder, “Okay, what are the sorts of things today that we see as innocuous and normal and mundane that the people of tomorrow, as William MacAskill says, will see us as moral monsters for? How are we moral monsters today, but we simply can’t see it?” So as we create powerful intelligent systems and we’re working on our ethics and we’re trying to really converge on constraining the set of all possible worlds into ones which are good and valuable and ethical, it really demands a moral evolution of ourselves, something we have to figure out ways to catalyze and work on and move through, I think, faster.

Ariel: Thank you. So as you consider attempts to solve the value alignment problem, what are you most worried about, either in terms of us solving it badly or not quickly enough or something along those lines? What is giving you the most hope in terms of us being able to address this problem?

Lucas: I mean, I think just technically speaking, ignoring the likelihood of this — the worst of all possible outcomes would be something like an s-risk. An s-risk is a subset of x-risks — s-risk stands for suffering risk. This is a sort of risk whereby, through some value misalignment, whether intentional or much more likely accidental, some seemingly astronomical amount of suffering is produced by deploying a misaligned AI system. The way this would function is as follows: given certain assumptions in the philosophy of mind about consciousness and machines, namely that consciousness and experience might be substrate-independent, meaning that consciousness can be instantiated in machine systems, that you don’t just need meat to be conscious but rather something like integrated information or information processing or computation, then the invention of AI systems and superintelligence, and the spreading of intelligence that optimizes towards any sort of arbitrary end, could potentially lead to vast amounts of digital suffering. That suffering might arise accidentally, or through subroutines or simulations that would be epistemically useful but involve a great amount of suffering. Couple that with the fact that these artificially intelligent systems would be running on silicon and iron and not on squishy, wet human neurons, so they would be running at digital time scales and not biological time scales, and there would be a huge amplification of the speed at which the suffering was run. Subjectively, we might infer that a second for a simulated person on a computer would be much longer than a second for a biological person. So an s-risk would be something really bad: any sort of way that AI can be misaligned and lead to a great amount of suffering. There are a bunch of different ways that this could happen.

So something like an s-risk would be something super terrible, but it’s not really clear how likely it would be. Beyond that, obviously we’re worried about existential risk, about ways that this could curtail or destroy the development of earth-originating intelligent life. The way this might most likely happen, I think, is because of the winner-take-all scenario that you have with AI. We’ve had nuclear weapons for a very long time now, and we’re super lucky that nothing bad has happened. But human civilization is really good at getting stuck in bad equilibria, where we get locked into positions that are not easy to escape from. It’s really not easy to disarm and get out of the nuclear weapons situation once we’ve discovered it. Once we start to develop more powerful and robust AI systems, I think a race towards AGI and towards more and more powerful AI might be very, very hard to stop if we don’t make significant progress soon, if we’re not able to get a ban on lethal autonomous weapons, and if we’re not able to introduce any real global coordination. We might all just start racing towards more powerful systems, a race towards AGI that would cut corners on safety and potentially make the likelihood of an existential risk or suffering risk greater.

Ariel: Are you hopeful for anything?

Lucas: I mean, yeah. If we get it right, then the next billion years can be super amazing, right? It’s just kind of hard to internalize that and think about that. It’s really hard to say I think how likely it is that we’ll succeed in any direction. But yeah, I’m hopeful that if we succeed in value alignment that the future can be unimaginably good.

Ariel: And Meia?

Meia: What’s scary to me is that it might be too easy to create intelligence, that there’s nothing in the laws of physics making it hard for us, and thus that it might happen too fast. Evolution took a long time to figure out how to make us intelligent, but that was probably just because it was trying to optimize for things like energy consumption and making us a certain size. So that’s scary. It’s scary that it’s happening so fast. I’m particularly scared that it might be easy to crack general artificial intelligence. I keep asking Max, “Max, but isn’t there anything in the laws of physics that might make it tricky?” His answer, and also that of other physicists I’ve been discussing this with, is, “No, it doesn’t seem to be the case.”

Now, what makes me hopeful is that we are creating this. Stuart Russell likes to give this example of a message from an alien civilization, an alien intelligence, that says, “We will be arriving in 50 years.” Then he poses the question, “What would you do to prepare for that?” But I think with artificial intelligence it’s different. It’s not like it simply arrives as a given, with a certain form or shape that we cannot do anything about. We are actually creating artificial intelligence. I think that’s what makes me hopeful: if we do the research right, if we think hard about what we want and we work hard at getting our own act together, first of all, and also at making sure that this stays beneficial, we have a good chance to succeed.

Now, there’ll be a lot of challenges in between, from very near-term issues, like the autonomous weapons Lucas was mentioning, weaponizing our AI and giving it the right to harm and kill humans, to issues regarding income inequality enhanced by technological development, and, down the road, to how we make sure that autonomous AI systems actually adopt our goals. But I do feel that it is important to try and it’s important to work at it. That’s what I’m trying to do and that’s what I hope others will join us in doing.

Ariel: All right. Well, thank you both again for joining us today.

Lucas: Thanks for having us.

Meia: Thanks for having us. This was wonderful.

Ariel: If you’re interested in learning more about the value alignment landscape that Lucas was talking about, please visit FutureofLife.org/valuealignmentmap. We’ll also link to this in the transcript for this podcast. If you enjoyed this podcast, please subscribe, give it a like, and share it on social media. We’ll be back again next month with another conversation among experts.

[end of recorded material]

Podcast: Top AI Breakthroughs and Challenges of 2017 with Richard Mallah and Chelsea Finn

AlphaZero, progress in meta-learning, the role of AI in fake news, the difficulty of developing fair machine learning — 2017 was another year of big breakthroughs and big challenges for AI researchers!

To discuss this more, we invited FLI’s Richard Mallah and Chelsea Finn from UC Berkeley to join Ariel for this month’s podcast. They talked about some of the technical progress they were most excited to see and what they’re looking forward to in the coming year.

You can listen to the podcast here, or read the transcript below.

Ariel: I’m Ariel Conn with the Future of Life Institute. In 2017, we saw an increase in investments into artificial intelligence. More students are applying for AI programs, and more AI labs are cropping up around the world. With 2017 now solidly behind us, we wanted to take a look back at the year and go over some of the biggest AI breakthroughs. To do so, I have Richard Mallah and Chelsea Finn with me today.

Richard is the director of AI projects with us at the Future of Life Institute, where he does meta-research, analysis and advocacy to keep AI safe and beneficial. Richard has almost two decades of AI experience in industry and is currently also head of AI R & D at the recruiting automation firm, Avrio AI. He’s also co-founder and chief data science officer at the content marketing planning firm, MarketMuse.

Chelsea is a PhD candidate in computer science at UC Berkeley, and she’s interested in how learning algorithms can enable robots to acquire common sense, allowing them to learn a variety of complex sensorimotor skills in real-world settings. She completed her bachelor’s degree at MIT and has also spent time at Google Brain.

Richard and Chelsea, thank you so much for being here.

Chelsea: Happy to be here.

Richard: As am I.

Ariel: Now normally I spend time putting together questions for the guests, but today Richard and Chelsea chose the topics. Many of the breakthroughs they’re excited about were more about behind-the-scenes technical advances that may not have been quite as exciting for the general media. However, there was one exception to that, and that’s AlphaZero.

AlphaZero, which was DeepMind’s follow-up to AlphaGo, made a big splash with the popular press in December when it achieved superhuman skills at Chess, Shogi and Go without any help from humans. So Richard and Chelsea, I’m hoping you can tell us more about what AlphaZero is, how it works and why it’s a big deal. Chelsea, why don’t we start with you?

Chelsea: Yeah, so DeepMind first started with developing AlphaGo a few years ago, and AlphaGo started its learning by watching human experts play: watching how human experts play moves and how they analyze the board. Once it had started from that human expertise, it then continued learning on its own.

What’s exciting about AlphaZero is that the system started entirely on its own, without any human knowledge. It started just by what’s called “self-play,” where the agent, the artificial player, is essentially just playing against itself from the very beginning and learning completely on its own.

And I think that one of the really exciting things about this research and this result was that AlphaZero was able to outperform the original AlphaGo program, and in particular was able to outperform it by removing the human expertise, by removing the human input. And so I think that this suggests that maybe if we could move towards removing the human biases and removing the human input and move more towards what’s called unsupervised learning, where these systems are learning completely on their own, then we might be able to build better and more capable artificial intelligence systems.

Ariel: And Richard, is there anything you wanted to add?

Richard: So, what was particularly exciting about AlphaZero is that it’s able to do this by essentially a technique very similar to what Paul Christiano of AI safety fame has called “capability amplification.” It’s similar in that it’s learning a function to predict a prior, or an expectation over which moves are likely at a given point, as well as a function to predict which player will win. And it’s able to do these in an iterative manner. It’s able to apply what’s called an “amplification scheme” in the more general sense. In this case it was Monte Carlo tree search, but in the more general case it could be other, more appropriate amplification schemes for taking a simple function, making it stronger through many iterations, and then distilling that stronger behavior back into the function.
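
To make this amplify-and-distill idea a bit more concrete, here is a minimal sketch of an AlphaZero-style self-play loop in Python. It is purely illustrative: the network, game, run_mcts, and sample_move objects are hypothetical placeholders standing in for a real implementation, not DeepMind’s actual code.

```python
# Illustrative sketch of an AlphaZero-style self-play loop (not DeepMind's code).
# `network`, `game`, `run_mcts`, and `sample_move` are hypothetical placeholders.

def self_play_training(network, game, run_mcts, sample_move,
                       iterations=1000, games_per_iteration=100):
    for _ in range(iterations):
        examples = []  # (state, mcts_policy, outcome) training triples
        for _ in range(games_per_iteration):
            state, trajectory = game.initial_state(), []
            while not game.is_terminal(state):
                # Amplification: tree search guided by the network's move prior and
                # value estimate yields a stronger move distribution than the raw net.
                mcts_policy = run_mcts(state, network, simulations=800)
                trajectory.append((state, mcts_policy))
                state = game.play(state, sample_move(mcts_policy))
            outcome = game.winner(state)  # e.g. +1, -1, or 0 from the first player's view
            examples.extend((s, p, outcome) for s, p in trajectory)
        # Distillation: retrain the network to imitate the amplified (search-improved)
        # policy and to predict the eventual game outcome.
        network.train(examples)
    return network
```

The essential pattern is the loop itself: tree search amplifies the current network’s judgment, and retraining distills that stronger behavior back into the network.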

Ariel: So I do have a quick follow up question here. With AlphaZero, it’s a program that’s living within a world that has very strict rules. What is the next step towards moving outside of that world with very strict rules and into the much messier real world?

Chelsea: That’s a really good point. The catch with these results is that these types of games, and even video games, which are a little bit messier than the strict rules of a board game, can all be perfectly simulated. You can perfectly simulate what will happen when you make a certain move or take a certain action, whether in a video game or in the game of Go or the game of Chess, et cetera. Therefore, you can train these systems with many, many lifetimes of data.

The real physical world on the other hand, we can’t simulate. We don’t know how to simulate the complex physics of the real world. As a result, you’re limited by the number of robots that you have if you’re interested in robots, or if you’re interested in healthcare, you’re limited by the number of patients that you have. And you’re also limited by safety concerns, the cost of failure, et cetera.

I think that we still have a long way to go towards taking these sorts of advances into real world settings where there’s a lot of noise, there’s a lot of complexity in the environment, and I think that these results are inspiring, and we can take some of the ideas from these approaches and apply them to these sorts of systems, but we need to keep in mind that there are a lot of challenges ahead of us.

Richard: So between real-world systems and something like the game of Go, there are also incremental improvements, like introducing support for partial observability, more stochastic environments, or more continuous environments as opposed to the very discrete ones. Assuming that we do have a situation where we could actually simulate what we would like to see, or use a simulation to help get training data on the fly, then in those cases we’re likely to be able to make some progress using a technique like this, with some extensions or modifications to support those criteria.

Ariel: Okay. Now, I’m not sure if this is a natural jump to the next topic or not, but you’ve both mentioned that one of the big things that you saw happening last year were new creative approaches to unsupervised learning, and Richard in an email to me you mentioned “word translation without parallel data.” So I was hoping you could talk a little bit more about what these new creative approaches are and what you’re excited about there.

Richard: So this year, we saw an application of vector spaces, or word embeddings, which are essentially these multidimensional spaces where the relationships between points are semantically meaningful. The space itself is learned by a relatively shallow deep-learning network, but the meaningfulness that is imbued in the space can actually be put to use. This year we saw that by taking vector spaces that were trained in different languages, or created from corpora of different languages, and applying some techniques to compare and reconcile the differences between those spaces, we’re actually able to translate words between language pairs in ways that, in some cases, exceed supervised approaches, which typically rely on parallel sets of documents that have the same meaning in different languages. In this case, we’re able to do something very similar to what the Star Trek universal translator does. By consuming enough of the alien language, or the foreign language I should say, it’s able to model the relationships between concepts and then realign those with the concepts that are known.

Chelsea, would you like to comment on that?

Chelsea: I don’t think I have too much to add. I’m also excited about the translation results and I’ve also seen similar, I guess, works that are looking at unsupervised learning, not for translation, that have a little bit of a similar vein, but they’re fairly technical in terms of the actual approach.

Ariel: Yeah, I’m wondering if either of you want to try to take a stab at explaining how this works without mentioning vector spaces?

Richard: That’s difficult because it is a space, I mean it’s a very geometric concept, and it’s because we’re aligning shapes within that space that we actually get the magic happening.

Ariel: So would it be something like you have different languages going in, some sort of document or various documents from different languages going in, and this program just sort of maps them into this space so that it figures out which words are parallel to each other then?

Richard: Well, it figures out the relationships between words, and based on the shape of those relationships, it’s able to take those shapes and rotate them so that they match up.

Chelsea: Yeah, perhaps it could be helpful to give an example. I think that generally in language you’re trying to get across concepts, and there is structure within the language, I mean there’s the structure that you learn about in grade school when you’re learning vocabulary. You learn about verbs, you learn about nouns, you learn about people and you learn about different words that describe these different things, and different languages share this sort of structure in terms of what they’re trying to communicate.

And so, what these algorithms do is they are given basically data of people talking in English, or people writing documents in English, and they’re also given data in another language — and the first one doesn’t necessarily need to be English. They’re given data in one language and data in another language. This data doesn’t match up. It’s not like one document that’s been translated into another, it’s just pieces of language, documents, conversations, et cetera, and by using the structure that exists in the data, such as nouns, verbs, animals, people, it can basically figure out how to map from the structure of one language to the structure of another language. It can recognize this similar structure in both languages and then figure out basically a mapping from one to the other.
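
For readers who want to see the geometric trick spelled out, here is a small, runnable illustration of the core step: learning a rotation that lines one embedding space up with another. This is a simplification of the actual research, which can find the alignment without any known word pairs (for example, adversarially); here a set of corresponding vectors is assumed, and the toy embeddings and vocabulary are made up for the example.

```python
import numpy as np

def learn_rotation(source_vecs, target_vecs):
    # Orthogonal Procrustes: find the rotation W minimizing ||source @ W - target||,
    # i.e. the rigid turn that best lines the two embedding spaces up.
    u, _, vt = np.linalg.svd(source_vecs.T @ target_vecs)
    return u @ vt

def nearest_word(vec, vocab, matrix):
    # Return the vocabulary word whose embedding is most similar (cosine) to vec.
    sims = matrix @ vec / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec) + 1e-9)
    return vocab[int(np.argmax(sims))]

# Made-up toy embeddings: "language B" is "language A" under an unknown rotation plus noise.
rng = np.random.default_rng(0)
dim, n = 5, 8
lang_a = rng.normal(size=(n, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
lang_b = lang_a @ true_rotation + rng.normal(scale=0.01, size=(n, dim))

W = learn_rotation(lang_a, lang_b)
vocab_b = ["perro", "gato", "casa", "agua", "sol", "luna", "pan", "mar"]
# Map the first language-A word into B's space; it lands nearest its counterpart.
print(nearest_word(lang_a[0] @ W, vocab_b, lang_b))  # expected to print "perro"
```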

Ariel: Okay. So I think, I want to keep moving forward, but continuing with the concept of learning, and Chelsea I want to stick with you for a minute. You mentioned that there were some really big metalearning advances that occurred last year, and you also mentioned a workshop and symposium at NIPS. I was wondering if you could talk a little more about that.

Chelsea: Yeah, I think that there’s been a lot of excitement around metalearning, or learning to learn. There were two gatherings at NIPS this year, one symposium and one workshop, and both were well attended. Actually, metalearning has a fairly long history, and so it’s by no means a recent or new topic, but I think that it has renewed attention within the machine learning community.

And so, I guess I can describe metalearning. It’s essentially having systems that learn how to learn. There are a number of different applications for such systems. One of them is often referred to as AutoML, or automatic machine learning, where these systems essentially figure out the best set of hyperparameters and then run a learning algorithm with those hyperparameters, essentially taking over the job of the machine learning researcher who is tuning different models on different data sets. And this can basically allow people to more easily train models on a data set.

Another application of metalearning that I’m really excited about is enabling systems to reuse data and reuse experience from other tasks when trying to solve new tasks. So in machine learning, there’s this paradigm of creating everything from scratch, and as a result, if you’re training from scratch, from zero prior knowledge, then it’s going to take a lot of data. It’s going to take a lot of time to train because you’re starting from nothing. But if instead you’re starting from previous experience in a different environment or on a different task, and you can basically learn how to efficiently learn from that data, then when you see a new task that you haven’t seen before, you should be able to solve it much more efficiently.

And so, one example of this is what’s called One-Shot Learning or Few-Shot Learning, where you learn essentially how to learn from a few examples, such that when you see a new setting and you just get one or a few examples, labeled examples, labeled data points, you can figure out the new task and solve the new task just from a small number of examples.

One explicit example of how humans do this is that you can have someone point out a Segway to you on the street, and even if you’ve never seen a Segway before or never heard of the concept of a Segway, just from that one example of a human pointing it out to you, you can then recognize other examples of Segways. And the way that you do that is basically by learning how to recognize objects over the course of your lifetime.
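
As a rough illustration of this “learning to learn” loop, here is a short sketch of one popular recipe, an inner/outer loop in the spirit of model-agnostic meta-learning. The model, sample_task, and loss objects are hypothetical placeholders rather than any particular published implementation.

```python
# Sketch of a meta-learning (learning-to-learn) loop in the spirit of MAML.
# `model`, `sample_task`, and `loss` are hypothetical placeholders.

def meta_train(model, sample_task, loss, meta_steps=10000, inner_lr=0.01, inner_steps=1):
    for _ in range(meta_steps):
        task = sample_task()                     # e.g. "recognize these 5 new object types"
        support, query = task.few_shot_split()   # a few labeled examples + held-out examples

        # Inner loop: quickly adapt a copy of the model to the task from the few examples.
        adapted = model.clone()
        for _ in range(inner_steps):
            grads = adapted.gradients(loss(adapted, support))
            adapted.apply_gradients(grads, lr=inner_lr)

        # Outer loop: update the original model so that this quick adaptation performs
        # well on held-out data, i.e. the model gets better at learning from few examples.
        meta_grads = model.gradients_through_adaptation(loss(adapted, query))
        model.apply_gradients(meta_grads, lr=0.001)
    return model
```

The outer update is what distinguishes this from ordinary training: the model is optimized not for any single task, but for how well it adapts after seeing only a few examples of a new one.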

Ariel: And are there examples of programs doing this already? Or we’re just making progress towards programs being able to do this more effectively?

Chelsea: There are some examples of programs being able to do this in terms of image recognition. There’s been a number of works that have been able to do this with real images. I think that more recently we’ve started to see systems being applied to robotics, which I think is one of the more exciting applications of this setting because when you’re training a robot in the real world, you can’t have the robot collect millions of data points or days of experience in order to learn a single task. You need it to share and reuse experiences from other tasks when trying to learn a new task.

So one example of this is that you can have a robot be able to manipulate a new object that it’s never seen before based on just one demonstration of how to manipulate that object from a human.

Ariel: Okay, thanks.

I want to move to a topic that is obviously of great interest to FLI and that is technical safety advances that occurred last year. Again, in an email to me, you both mentioned “inverse reward design” and “deep reinforcement learning from human preferences” as two areas related to the safety issue that were advanced last year. I was hoping you could both talk a little bit about what you saw happening last year that gives you hope for developing safer AI and beneficial AI.

Richard: So, as I mentioned, both inverse reward design and deep reinforcement learning from human preferences are exciting papers that came out this year.

So inverse reward design is where the AI system is trying to understand what the original designer or the original user intends for the system to do. So if it’s in some new setting, a test setting where some potentially problematic new things have been introduced relative to training time, then it tries specifically to back those out or to mitigate their effects, so that’s kind of exciting.

Deep reinforcement learning from human preferences is an algorithm for trying to very efficiently get feedback from humans based on trajectories in the context of reinforcement learning systems. So, these are systems that are trying to learn some way to plan, let’s say a path through a game environment or in general trying to learn a policy of what to do in a given scenario. This algorithm, deep RL from human preferences, shows little snippets of potential paths to humans and has them simply choose which are better, very similar to what goes on at an optometrist. Does A look better or does B look better? And just from that, very sophisticated behaviors can be learned from human preferences in a way that was not possible before in terms of scale.
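
A stripped-down sketch may help show how those A-or-B answers become a training signal: fit a reward model so that the snippets humans preferred score higher, then let the reinforcement learner optimize that learned reward in place of a hand-written one. The reward_model object and the snippet format below are hypothetical placeholders, not the authors’ actual code.

```python
import math

def preference_loss(reward_model, snippet_a, snippet_b, human_prefers_a):
    # Sum the predicted reward over each snippet's (state, action) steps.
    reward_a = sum(reward_model.predict(s, a) for s, a in snippet_a)
    reward_b = sum(reward_model.predict(s, a) for s, a in snippet_b)
    # Logistic (Bradley-Terry) model of the probability that a human prefers snippet A.
    prob_a = 1.0 / (1.0 + math.exp(reward_b - reward_a))
    # Cross-entropy against the human's actual choice; minimizing this over many
    # comparisons teaches the reward model to agree with the human judgments.
    return -math.log(prob_a if human_prefers_a else 1.0 - prob_a)
```

The reinforcement learner then optimizes the learned reward model’s output instead of a game score, which is how simple A-or-B answers can end up shaping sophisticated behavior.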

Ariel: Chelsea, is there anything that you wanted to add?

Chelsea: Yeah. So, in general, I guess, going back to AlphaZero and going back to games in general, there’s a very clear objective, which is whether or not you won the game, or your score at the game. It’s very clear what the objective is and what each system should be optimizing for: AlphaZero, when playing Go, should be optimizing for winning the game, and if a system is playing Atari games it should be optimizing for maximizing the score.

But in the real world, when you’re training systems, when you’re training agents to do things, when you’re training an AI to have a conversation with you, when you’re training a robot to set the table for you, there is no score function. The real world doesn’t just give you a score function, doesn’t tell you whether or not you’re winning or losing. And I think that this research is exciting and really important because it gives us another mechanism for telling robots, telling these AI systems how to do the tasks that we want them to do.

And the human preferences work, for example, instead of having us specify some sort of goal that we want the robot to achieve, or give it a demonstration of what we want it to achieve, or hand it some sort of reward function, lets us say, “okay, this is not what I want, this is what I want,” throughout the process of learning. And then as a result, at the end you can basically guarantee that if it was able to optimize for your preferences successfully, then you’ll end up with behavior that you’re happy with.

Ariel: Excellent. So I’m sort of curious, before we started recording, Chelsea, you were telling me a little bit about your own research. Are you doing anything with this type of work? Or is your work a little different?

Chelsea: Yeah. So more recently I’ve been working on metalearning, and some of the metalearning work that I talked about previously, like learning just from a single demonstration and reusing data and experience from other tasks, has been what I’ve been focusing on recently in terms of getting robots to be able to do things in the real world, such as manipulating objects, pushing objects around, using a spatula, stuff like that.

I’ve also done work on reinforcement learning where you essentially give a robot an objective, tell it to try to get the object as close as possible to the goal, and I think that the human preferences work provides a nice alternative to the classic setting, to the classic framework of reinforcement learning, that we could potentially apply to real robotic systems.

Ariel: Chelsea, I’m going to stick with you for one more question. In your list of breakthroughs that you’re excited about, one of the things that you mentioned is very near and dear to my heart, and that was better communication, and specifically better communication of the research. And I was hoping you could talk a little bit about some of the websites and methods of communicating that you saw develop and grow last year.

Chelsea: Yes. I think that more and more we’re seeing researchers put their work out in blog posts and try to make their work more accessible to the average user by explaining it in terms that are easier to understand, by motivating it in words that are easier for the average person to understand and I think that this is a great way to communicate the research in a clear way to a broader audience.

In addition, I’ve been quite excited about an effort, I think led by Chris Olah, on building what is called distill.pub. It’s a website and a journal, an academic journal, that tries to move away from this paradigm of publishing research on paper, on trees essentially. Because we have such rich digital technology that allows us to communicate in many different ways, it makes sense to move past just completely written forms of research dissemination. And I think that’s what distill.pub does: it allows researchers to communicate research ideas in the form of animations, in the form of interactive demonstrations on a computer screen, and I think this is a big step forward and has a lot of potential in terms of moving forward the communication and dissemination of research, both within the research community and beyond it, to people who are less familiar with the technical concepts in the field.

Ariel: That sounds awesome, Chelsea, thank you. And distill.pub is probably pretty straightforward, but we’ll still link to it on the post that goes along with this podcast if anyone wants to click straight through.

And Richard, I want to switch back over to you. You mentioned that there was more impressive output from GANs last year, generative adversarial networks.

Richard: Yes.

Ariel: Can you tell us what a generative adversarial network is?

Richard: So a generative adversarial network is an AI system with two parts: essentially a generator, or creator, that comes up with novel artifacts, and a critic, or discriminator, that tries to determine whether what’s being generated is a good or legitimate or realistic type of thing. Both are learned in parallel as training data is streamed into the system, and in this way the generator learns relatively efficiently how to create things that are good or realistic.
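
Here is a minimal structural sketch of that two-player setup. The generator, critic, and real_batches objects are hypothetical placeholders rather than any specific GAN implementation.

```python
# Structural sketch of a GAN training loop; `generator`, `critic`, and `real_batches`
# are hypothetical placeholders.

def train_gan(generator, critic, real_batches, steps=100_000):
    for _ in range(steps):
        real = next(real_batches)                      # a batch of real training examples
        fake = generator.sample(batch_size=len(real))  # a batch of generated examples

        # Critic update: learn to score real examples as realistic and generated ones as not.
        critic.update(real_examples=real, fake_examples=fake)

        # Generator update: adjust so its samples get higher "realistic" scores from the critic.
        generator.update(critic_scores=critic.score(fake))
    return generator
```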

Ariel: So I was hoping you could talk a little bit about what you saw there that was exciting.

Richard: Sure, so new architectures and new algorithms and simply more horsepower as well have led to more impressive output. Particularly exciting are conditional generative adversarial networks, where there can be structured biases or new types of inputs that one wants to base some output around.

Chelsea: Yeah, I mean, one thing to potentially add is that I think the research on GANs is really exciting, and I think it will lead to advances not only in generating realistic images, but also in generating other types of things, like behavior, speech, or language. We haven’t seen as many advances in those areas as in generating images; thus far the most impressive advances have been in images. But I think those are areas to watch as well.

One thing to be concerned about in terms of GANs is the ability for people to generate fake images, fake videos of different events happening and putting those fake images and fake videos into the media, because while there might be ways to detect whether or not these images are made-up or are counterfeited essentially, the public might choose to believe something that they see. If you see something, you’re very likely to believe it, and this might exacerbate all of the, I guess, fake news issues that we’ve had recently.

Ariel: Yeah, so that actually brings up something that I did want to get into, and honestly, that, Chelsea, what you just talked about, is some of the scariest stuff I’ve seen, just because it seems like it has the potential to create sort of a domino effect of triggering all of these other problems just with one fake video. So I’m curious, how do we address something like that? Can we? And are there other issues that you’ve seen crop in the last year that also have you concerned?

Chelsea: I think there are potentially ways to address the problem. If it seems like it’s becoming a real danger in the imminent future, then I think media websites, including social media websites, should take measures to detect fake images and fake videos and either prevent them from being displayed or put up a warning that the content was detected as fake, to explicitly try to mitigate the effects.

But, that said, I haven’t put that much thought into it. I do think it’s something that we should be concerned about, and the potential solution that I mentioned, I think that even if it can help solve some of the problems, I think that we don’t have a solution to the problem yet.

Ariel: Okay, thank you. I want to move on to the last question that I have that you both brought up, and that was, last year we saw an increased discussion of fairness in machine learning. And Chelsea, you mentioned there was a NIPS tutorial on this and the keynote mentioned it at NIPS as well. So I was hoping you could talk a bit about what that means, what we saw happen, and how you hope this will play out to better programs in the future.

Chelsea: So, there’s been a lot of discussion in how we can build machine-learning systems, build AI systems such that when they make decisions, they are fair and they aren’t biased. And all this discussion has been around fairness in machine learning, and actually one of the interesting things about the discussion from a technical point of view is how you even define fairness and how you define removing biases and such, because a lot of the biases are inherent to the data itself. And how you try to remove those biases can be a bit controversial.

Ariel: Can you give us some examples?

Chelsea: So one example is an autonomous car system that is trying to recognize pedestrians, avoid hitting them, and respond to them appropriately. If these systems are trained in environments and communities that are predominantly of one race, for example in Caucasian communities, and you then deploy them in settings with people of color and in other environments they haven’t seen before, then the resulting system won’t be as accurate in those settings and will be inherently biased when, for example, it tries to recognize people of color, and this is a problem.

Some other examples of this are machine learning systems making decisions about who to give health insurance to, or speech recognition systems trying to recognize different people’s speech. If these systems are trained on a small part of the community that is not representative of the population as a whole, then they won’t be able to make accurate decisions about the entire population. Or if they’re trained on data that was collected by humans and carries the same biases as humans, then they will make the same mistakes and inherit the same biases that humans have.

I think that, unfortunately, one of the conclusions that the people researching fairness in machine learning systems have reached so far is that there isn’t a one-size-fits-all solution to all of these different problems, and in many cases we’ll have to think about fairness in individual contexts.

Richard: Chelsea, you mentioned that some of the remediations for fairness issues in machine learning are themselves controversial. Can you go into an example or so about that?

Chelsea: Yeah, I guess part of what I meant there is that even coming up with a definition for what is fair is unclear. It’s unclear what even the problem specification is, and without a problem specification, without a definition of what you want your system to be doing, creating a system that’s fair is a challenge if you don’t have a definition for what fair is.

Richard: I see.
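
Chelsea’s point that “fair” first needs a definition can be made concrete with a toy, runnable example. Below, two groups are approved at exactly the same rate, so the model looks fair by one common criterion (demographic parity), yet truly negative cases in one group are wrongly approved far more often, so it looks unfair by another (equal error rates). The numbers are invented purely for illustration.

```python
# Toy, made-up example: two common fairness criteria can tell different stories.

def positive_rate(preds):
    # Fraction of people the model approves (demographic parity compares this across groups).
    return sum(preds) / len(preds)

def false_positive_rate(preds, labels):
    # Fraction of truly negative cases the model wrongly approves (part of equalized odds).
    false_pos = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    negatives = sum(1 for y in labels if y == 0)
    return false_pos / negatives if negatives else 0.0

# Hypothetical model decisions (1 = approve) and true outcomes for two groups.
group_a_preds, group_a_labels = [1, 0, 1, 0], [1, 0, 0, 1]
group_b_preds, group_b_labels = [1, 1, 0, 0], [1, 1, 0, 0]

print(positive_rate(group_a_preds), positive_rate(group_b_preds))        # 0.5 vs 0.5 (equal approval rates)
print(false_positive_rate(group_a_preds, group_a_labels),
      false_positive_rate(group_b_preds, group_b_labels))                # 0.5 vs 0.0 (unequal error rates)
```

Which gap matters, and what would count as closing it, depends on the context, which is exactly why there is no one-size-fits-all definition.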

Ariel: So then, my last question to you both, as we look towards 2018, what are you most excited or hopeful to see?

Richard: I’m very hopeful for the FLI grants program that we announced at the very end of 2017 leading to some very interesting and helpful AI safety papers and AI safety research in general that will build on past research and break new ground and will enable additional future research to be built on top of it to make the prospect of general intelligence safer and something that we don’t need to fear as much. But that is a hope.

Ariel: And Chelsea, what about you?

Chelsea: I think I’m excited to see where metalearning goes. I think that there’s a lot more people that are paying attention to it and starting to research into “learning to learn” topics. I’m also excited to see more advances in machine learning for robotics. I think that, unlike other fields in machine learning like machine translation, image recognition, et cetera, I think that robotics still has a long way to go in terms of being useful and solving a range of complex tasks and I hope that we can continue to make strides in machine learning for robotics in the coming year and beyond.

Ariel: Excellent. Well, thank you both so much for joining me today.

Richard: Sure, thank you.

Chelsea: Yeah, I enjoyed talking to you.

 

This podcast was edited by Tucker Davey.

Rewinding the Doomsday Clock

On Thursday, the Bulletin of the Atomic Scientists inched its iconic Doomsday Clock forward another thirty seconds. It is now two minutes to midnight.

Citing the growing threats of climate change, increasing tensions between nuclear-armed countries, and a general loss of trust in government institutions, the Bulletin warned that we are “making the world security situation more dangerous than it was a year ago—and as dangerous as it has been since World War II.”

The Doomsday Clock hasn’t fallen this close to midnight since 1953, after the US and the Soviet Union tested hydrogen bombs, weapons up to 1,000 times more powerful than the bombs dropped on Hiroshima and Nagasaki. And as in 1953, this year’s announcement highlighted the increased global tensions around nuclear weapons.

As the Bulletin wrote in their statement, “To call the world nuclear situation dire is to understate the danger—and its immediacy.”

Between the US, Russia, North Korea, and Iran, the threats of aggravated nuclear war and accidental nuclear war both grew in 2017. As former Secretary of Defense William Perry said in a statement, “The events of the past year have only increased my concern that the danger of a nuclear catastrophe is increasingly real. We are failing to learn from the lessons of history as we find ourselves blundering headfirst towards a second cold war.”

The threat of nuclear war has hovered in the background since the weapons were invented, but with the end of the Cold War, many were pulled into what now appears to have been a false sense of security. In the last year, aggressive language and plans for new and upgraded nuclear weapons have reignited fears of nuclear armageddon. The recent false missile alerts in Hawaii and Japan were perhaps the starkest reminders of how close nuclear war feels, and how destructive it would be. 

 

But the nuclear threat isn’t all the Bulletin looks at. 2017 also saw the growing risk of climate change, a breakdown of trust in government institutions, and the emergence of new technological threats.

Climate change won’t hit humanity as immediately as nuclear war, but with each year that the international community fails to drastically reduce fossil fuel emissions, the threat of catastrophic climate change grows. In 2017, the US pulled out of the Paris Climate Agreement and global carbon emissions grew 2% after a two-year plateau. Meanwhile, NASA and NOAA confirmed that the past four years are the hottest four years they’ve ever recorded.

For emerging technological risks, such as widespread cyber attacks, the development of autonomous weaponry, and potential misuse of synthetic biology, the Bulletin calls for the international community to work together. They write, “world leaders also need to seek better collective methods of managing those advances, so the positive aspects of new technologies are encouraged and malign uses discovered and countered.”

Pointing to disinformation campaigns and “fake news”, the Bulletin’s Science and Security Board writes that they are “deeply concerned about the loss of public trust in political institutions, in the media, in science, and in facts themselves—a loss that the abuse of information technology has fostered.”

 

Turning Back the Clock

The Doomsday Clock is a poignant symbol of the threats facing human civilization, and it received broad media attention this week through British outlets like The Guardian and The Independent, Australian outlets such as ABC Online, and American outlets from Fox News to The New York Times.

“[The clock] is a tool,” explains Lawrence Krauss, a theoretical physicist at Arizona State University and member of the Bulletin’s Science and Security Board. “For one day a year, there are thousands of newspaper stories about the deep, existential threats that humanity faces.”

The Bulletin ends its report with a list of priorities to help turn back the Clock, chock-full of suggestions for government and industry leaders. But the authors also insist that individual citizens have a crucial role in tackling humanity’s greatest risks.

“Leaders react when citizens insist they do so,” the authors explain. “Citizens around the world can use the power of the internet to improve the long-term prospects of their children and grandchildren. They can insist on facts, and discount nonsense. They can demand action to reduce the existential threat of nuclear war and unchecked climate change. They can seize the opportunity to make a safer and saner world.”

You can read the Bulletin’s full report here.

Podcast: Beneficial AI and Existential Hope in 2018

For most of us, 2017 has been a roller coaster, from increased nuclear threats to incredible advancements in AI to crazy news cycles. But while it’s easy to be discouraged by various news stories, we at FLI find ourselves hopeful that we can still create a bright future. In this episode, the FLI team discusses the past year and the momentum we’ve built, including: the Asilomar Principles, our 2018 AI safety grants competition, the recent Long Beach workshop on Value Alignment, and how we’ve honored one of civilization’s greatest heroes.

Full transcript:

Ariel: I’m Ariel Conn with the Future of Life Institute. As you may have noticed, 2017 was quite the dramatic year. In fact, without me even mentioning anything specific, I’m willing to bet that you already have some examples forming in your mind of what a crazy year this was. But while it’s easy to be discouraged by various news stories, we at FLI find ourselves hopeful that we can still create a bright future. But I’ll let Max Tegmark, president of FLI, tell you a little more about that.

Max: I think it’s important when we reflect back on the year’s news to understand how things are all connected. For example, the drama we’ve been following with Kim Jong Un and Donald Trump and Putin with nuclear weapons is really very connected to all the developments in artificial intelligence, because in both cases we have a technology which is so powerful that it’s not clear that we humans have sufficient wisdom to manage it well. And that’s why I think it’s so important that we all continue working towards developing this wisdom further, to make sure that we can use these powerful technologies like nuclear energy, like artificial intelligence, like biotechnology and so on to really help rather than to harm us.

Ariel: And it’s worth remembering that part of what made this such a dramatic year was that there were also some really positive things that happened. For example, in March of this year, I sat in a sweltering room in New York City, as a group of dedicated, caring individuals from around the world discussed how they planned to convince the United Nations to ban nuclear weapons once and for all. I don’t think anyone in the room that day realized that not only would they succeed, but by December of this year, the International Campaign to Abolish Nuclear Weapons, led by Beatrice Fihn, would be awarded the Nobel Peace Prize for their efforts. And while we did what we could to help that effort, our own big story had to be the Beneficial AI Conference that we hosted in Asilomar, California. Many of us at FLI were excited to talk about Asilomar, but I’ll let Anthony Aguirre, Max, and Victoria Krakovna start.

Anthony: I would say pretty unquestionably the big thing that I felt was most important and felt most excited about was the big meeting in Asilomar and centrally putting together the Asilomar Principles.

Max: I’m going to select the Asilomar conference that we organized early this year, whose output was the 23 Asilomar Principles, which has since been signed by over a thousand AI researchers around the world.

Vika: I was really excited about the Asilomar conference that we organized this year. This was the sequel to FLI’s Puerto Rico Conference, which was at the time a real game changer in terms of making AI safety more mainstream and connecting people working in AI safety with the machine learning community and integrating those two. I think Asilomar did a great job of continuing to build on that.

Max: I’m very excited about this because I feel that it really has helped mainstream AI safety work. Not just near-term AI safety stuff, like how to transform today’s buggy and hackable computers into robust systems that you can really trust, but also larger issues. The Asilomar Principles actually contain the word superintelligence, contain the phrase existential risk, contain the phrase recursive self-improvement, and yet they have been signed by really a who’s who in AI. So from now on, it’s impossible for anyone to dismiss these kinds of concerns, this kind of safety research, by saying that’s just people who have no clue about AI.

Anthony: That was a process that started in 2016, brainstorming at FLI and then the wider community and then getting rounds of feedback and so on. But it was exciting both to see how much cohesion there was in the community and how much support there was for getting behind some sort of principles governing AI. But also, just to see the process unfold because one of the things that I’m quite frustrated about often is this sense that there’s this technology that’s just unrolling like a steam roller and it’s going to go where it’s going to go, and we don’t have any agency over where that is. And so to see people really putting thought into what is the world we would like there to be in ten, fifteen, twenty, fifty years and how can we distill what it is that we like about that world into principles like these…that felt really, really good. It felt like an incredibly useful thing for society as a whole but in this case, the people who are deeply engaged with AI, to be thinking through in a real way rather than just how can we put out the next fire, or how can we just turn the progress one more step forward, to really think about the destination.

Ariel: But what’s that next step? How do we transition from Principles that we all agree on to actions that we can also all get behind? Jessica Cussins joined FLI later in the year, but when asked what she was excited about as far as FLI was concerned, she immediately mentioned the implementation of things like the Asilomar Principles.

Jessica: I’m most excited about the developments we’ve seen over the last year related to safe, beneficial and ethical AI. I think FLI has been a really important player in this. We had the beneficial AI conference in January that resulted in the Asilomar AI Principles. It’s been really amazing to see how much traction those principles have gotten and to see a growing consensus around the importance of being thoughtful about the design of AI systems, the challenges of algorithmic bias, of data control and manipulation, and of accountability and governance. So the thing I’m most excited about right now is the growing number of initiatives we’re seeing around the world related to ethical and beneficial AI.

Anthony: What’s been great to see is the development of ideas, both from FLI and from many other organizations, about what policies might be good: what concrete legislative actions there might be, or what standards, organizations or non-profits, and agreements between companies and so on might be interesting.

But I think we’re only at the step of formulating those things and not that much action has been taken anywhere in terms of actually doing those things. Little bits of legislation here and there. But I think we’re getting to the point where lots of governments, lots of companies, lots of organizations are going to be publishing and creating and passing more and more of these things. I think seeing that play out and working really hard to ensure that it plays out in a way that’s favorable in as many ways and for as many people as possible, I think is super important and something we’re excited to do.

Vika: I think that Asilomar principles are a great common point for the research community and others to agree what we are going for, what’s important.

Besides having the principles as an output, the event itself was really good for building connections between different people from interdisciplinary backgrounds, from different related fields who are interested in the questions of safety and ethics.

And we also had this workshop that was adjacent to Asilomar where our grant winners actually presented their work. I think it was great to have a concrete discussion of research and the progress we’ve made so far and not just abstract discussions of the future, and I hope that we can have more such technical events, discussing research progress and making the discussion of AI safety really concrete as time goes on.

Ariel: And what is the current state of AI safety research? Richard Mallah took on the task of answering that question for the Asilomar conference, while Tucker Davey has spent the last year interviewing various FLI grant winners to better understand their work.

Richard: I presented a landscape of technical AI safety research threads. This lays out hundreds of different types of research areas and how they are related to each other. All different areas that need a lot more research going into them than they have today to help keep AI safe and beneficent and robust. I was really excited to be at Asilomar and to have co-organized Asilomar and that so many really awesome people were there and collaborating on these different types of issues. And that they were using that landscape that I put together as sort of a touchpoint and way to coordinate. That was pretty exciting.

Tucker: I just found it really inspiring interviewing all of our AI grant recipients. It’s kind of been an ongoing project interviewing these researchers and writing about what they’re doing. Just for me, getting recently involved in AI, it’s been incredibly interesting to get either a half an hour, an hour with these researchers to talk in depth about their work and really to learn more about a research landscape that I hadn’t been aware of before working at FLI. Really, being a part of those interviews and learning more about the people we’re working with and these people that are really spearheading AI safety was really inspiring to be a part of.

Ariel: And with that, we have a big announcement.

Richard: So, FLI is launching a new grants program in 2018. This time around, we will be focusing more on artificial general intelligence and artificial superintelligence, and on ways that we can do technical research and other kinds of research today, on today’s systems, or on things that we can analyze, model, or make theoretical progress on today that are likely to still be relevant by the time AGI comes about. This is quite exciting and I’m excited to be part of the ideation and administration around that.

Max: I’m particularly excited about the new grants program that we’re launching for AI safety research. Since AI safety research itself has become so much more mainstream since we did our last grants program three years ago, there’s now quite a bit of funding for a number of near-term challenges. And I feel that we at FLI should focus on things more related to challenges and opportunities from superintelligence, since there is virtually no funding for that kind of safety research. It’s going to be really exciting to see what proposals come in and what research teams get selected by the review panels. Above all, how this kind of research hopefully will contribute to making sure that we can use this powerful technology to create a really awesome future.

Vika: I think this grant program could really build on the impact of our previous grant program. I’m really excited that it’s going to focus more on long term AI safety research, which is still the most neglected area.

AI safety has really caught on in the past two years, and there’s been a lot more work on that going on, which is great. And part of what this means is that we at FLI can focus more on the long term. The long term work has also been getting more attention, and this grant program can help us build on that and make sure that the important problems get solved. This is really exciting.

Max: I just came back from spending a week at the NIPS Conference, the biggest artificial intelligence conference of the year. It’s fascinating how rapidly everything is proceeding. AlphaZero has now defeated not just human chess players and Go players, but also human AI researchers, who, after spending 30 years handcrafting artificial intelligence software to play computer chess, got all their work completely crushed by AlphaZero, which just learned to do much better than that from scratch in four hours.

So, AI is really happening, whether we like it or not. The challenge we face is simply to complement that through AI safety research and a lot of good thinking to make sure that this helps humanity flourish rather than flounder.

Ariel: In the spirit of flourishing, FLI also turned its attention this year to the movement to ban lethal autonomous weapons. While there is great debate around how to define autonomous weapons and whether or not they should be developed, more people tend to agree that the topic should at least come before the UN for negotiations. And so we helped create the video Slaughterbots to help drive this conversation. I’ll let Max take it from here.

Max: Slaughterbots are autonomous little drones that can go anonymously murder people without any human control. Fortunately, they don’t exist yet. We hope that an international treaty is going to keep it that way, even though we almost have the technology to build them already; we would just need to integrate and then mass-produce tech we already have. So to help with this, we made this video called Slaughterbots. It was really impressive to see it get over forty million views and make the news throughout the world. I was very happy that Stuart Russell, whom we partnered with on this, also presented it to the diplomats at the United Nations in Geneva when they were discussing whether to move towards a treaty, drawing a line in the sand.

Anthony: Pushing on the autonomous weapons front, it’s been really scary, I would say, to think through that issue. But a little bit like the issue of AI in general, there’s a potentially scary side but there’s also a potentially helpful side, in that I think this is an issue that is a little bit tractable. Even a relatively small group of committed individuals can make a difference. So I’m excited to see how much movement we can get on the autonomous weapons front. It doesn’t seem at all like a hopeless issue to me, and I think, and hope, 2018 will be sort of a turning point for that issue. It’s kind of flown under the radar, but it really is coming up now, and it will at least be interesting, and hopefully exciting and happy as well, to see how it plays out on the world stage.

Jessica: For 2018, I’m hopeful that we will see the continued growth of the global momentum against lethal autonomous weapons. Already, this year a lot has happened at the United Nations and across communities around the world, including thousands of AI and robotics researchers speaking out and saying they don’t want to see their work used to create these kinds of destabilizing weapons of mass destruction. One thing I’m really excited for 2018 is to see a louder, rallying call for an international ban of lethal autonomous weapons.

Ariel: Yet one of the biggest questions we face when trying to anticipate autonomous weapons and artificial intelligence in general, and even artificial general intelligence – one of the biggest questions is: when? When will these technologies be developed? If we could answer that, then solving problems around those technologies could become both more doable and possibly more pressing. This is an issue Anthony has been considering.

Anthony: Of most interest has been the overall set of projects to predict artificial intelligence timelines and milestones. This is something that I’ve been doing through this prediction website, Metaculus, which I’ve been a part of, and also through taking part in a very small workshop run by the Foresight Institute over the summer. It’s both a super important question, because I think the overall urgency with which we have to deal with certain issues really depends on how far away they are, and an instructive one, in that even posing the questions of what we want to know exactly really forces you to think through what it is that you care about, how you would estimate things, and what different considerations there are in terms of this sort of big question.

We have this sort of big question, like when is really powerful AI going to appear? But when you dig into that, what exactly is really powerful, what exactly…  What does appear mean? Does that mean in sort of an academic setting? Does it mean becomes part of everybody’s life?

So there are all kinds of nuances to that overall big question that lots of people are asking. Just getting into refining the questions, trying to pin down what it is that we mean, making them exact so that they can be things that people can make precise, numerical predictions about, has been really, really interesting and elucidating to me in sort of understanding what all the issues are. I’m excited to see how that kind of continues to unfold as we get more questions and more predictions and more expertise focused on that. Also, a little bit nervous, because the timelines seem to be getting shorter and shorter and the urgency of the issue seems to be getting greater and greater. So that’s a bit of a fire under us, I think, to keep acting and keep a lot of intense effort on making sure that as AI gets more powerful, we get better at managing it.

Ariel: One of the current questions AI researchers are struggling with is the problem of value alignment, especially when considering more powerful AI. Meia Chita-Tegmark and Lucas Perry recently co-organized an event to get more people thinking creatively about how to address this.

Meia: So we just organized a workshop about the ethics of value alignment together with a few partner organizations, the Berggruen Institute and also CFAR.

Lucas: This was a workshop that recently took place in California, and just to remind everyone, value alignment is the process by which we bring an AI’s actions, goals, and intentions into alignment and accordance with what is deemed to be the good, or with human values, preferences, goals, and intentions.

Meia: And we had a fantastic group of thinkers there. We had philosophers. We had social scientists, AI researchers, political scientists. We were all discussing this very important issue of how do we get an artificial intelligence that is aligned to our own goals and our own values.

It was really important to have the perspectives of ethicists and moral psychologists, for example, because this question is not just about the technical aspect of how do you actually implement it, but also about whose values do we want implemented and who should be part of the conversation and who gets excluded and what process do we want to establish to collect all the preferences and values that we want implemented in AI. That was really fantastic. It was a very nice start to what I hope will continue to be a really fruitful collaboration between different disciplines on this very important topic.

Lucas: I think one essential take-away from that was that value alignment is truly something that is interdisciplinary. It’s normally been something which has been couched and understood in the context of technical AI safety research, but value alignment, at least in my view, also inherently includes ethics and governance. It seems that the project of creating beneficial AI through efforts in value alignment can really only happen when we have lots of different people from lots of different disciplines working together on this supremely hard issue.

Meia: I think the issue with AI is something that … first of all, it concerns such a great number of people. It concerns all of us. It will impact, and it already is impacting all of our experiences. There’re different disciplines that look at this impact from different ways.

Of course, technical AI researchers will focus on developing this technology, but it’s very important to think about how this technology co-evolves with us. For example, I’m a psychologist. I like to think about how it impacts our own psyche, how it impacts the way we act in the world, the way we behave. Stuart Russell many times likes to point out that one danger that can come with very intelligent machines is a subtle one, not necessarily what they will do, but what we will not do because of them. He calls this enfeeblement. What are the capacities that are being stifled because we no longer engage in some of the cognitive tasks that we’re now delegating to AIs?

So that’s just one example of how psychologists can help really bring more light and make us reflect on what it is that we want from our machines, how we want to interact with them, and how we want to design them such that they actually empower us rather than enfeeble us.

Lucas: Yeah, I think that one essential thing to FLI’s mission and goal is the generation of beneficial AI. To me, and I think many other people coming out of this Ethics of Value Alignment conference, you know, what beneficial exactly entails and what beneficial looks like is still a really open question both in the short term and in the long-term. I’d be really interested in seeing both FLI and other organizations pursue questions in value alignment more vigorously. Issues with regard to the ethics of AI and issues regarding value and the sort of world that we want to live in.

Ariel: And what sort of world do we want to live in? If you’ve made it this far through the podcast, you might be tempted to think that all we worry about is AI. And we do think a lot about AI. But our primary goal is to help society flourish. And so this year, we created the Future of Life Award to be presented to people who act heroically to ensure our survival and hopefully move us closer to that ideal world. Our inaugural award was presented in honor of Vasili Arkhipov who stood up to his commander on a Soviet submarine, and prevented the launch of a nuclear weapon during the height of tensions in the Cold War.

Tucker: One thing that particularly stuck out to me was our inaugural Future of Life Award and we presented this award to Vasili Arkhipov who was a Soviet officer in the Cold War and arguably saved the world and is the reason we’re all alive today. He’s now passed, but FLI presented a generous award to his daughter and his grandson. It was really cool to be a part of this because it seemed like the first award of its kind.

Meia: So, of course with FLI, we have all these big projects that take a lot of time. But I think for me, one of the more exciting and heartwarming and wonderful moments that I was able to experience due to our work here at FLI was a train ride from London to Cambridge with Elena and Sergei, the daughter and the grandson of Vasili Arkhipov. Vasili Arkhipov is the Russian naval officer who helped prevent a third world war during the Cuban missile crisis. The Future of Life Institute awarded him the Future of Life Award this year. He has unfortunately passed away, but his daughter and his grandson were there in London to receive it.

Vika: It was great to get to meet them in person and to all go on stage together and have them talk about their attitude towards the dilemma that Vasili Arkhipov has faced, and how it is relevant today, and how we should be really careful with nuclear weapons and protecting our future. It was really inspiring.

At that event, Max was giving his talk about his book, and then at the end we had the Arkhipovs come up on stage and it was kind of fun for me to translate their speech to the audience. I could not fully transmit all the eloquence, but thought it was a very special moment.

Meia: It was just so amazing to really listen to their stories about the father, the grandfather, and look at photos that they had brought all the way from Moscow. This person who has become the hero for so many people that are really concerned about this existential risk, it was nice to really imagine him in his capacity as a son, as a grandfather, as a husband, as a human being. It was very inspiring and touching.

One of the nice things was they showed a photo of him that actually had notes he had written on the back of it. That was his favorite photo. And one of the comments he made is that he felt that was the most beautiful photo of himself because there was no glint in his eyes. It was just this pure sort of concentration. I thought that said a lot about his character. He rarely smiled in photos, and always looked very pensive. Very much like you’d imagine a hero who saved the world would be.

Tucker: It was especially interesting for me to work on the press release for this award and to reach out to people from different news outlets, like The Guardian and The Atlantic, and to actually see them write about this award.

I think something like the Future of Life Award is inspiring because it highlights people in the past that have done an incredible service to civilization, but I also think it’s interesting to look forward and think about who might be the future Vasili Arkhipov that saves the world.

Ariel: As Tucker just mentioned, this award was covered by news outlets like the Guardian and the Atlantic. And in fact, we’ve been incredibly fortunate to have many of our events covered by major news. However, there are even more projects we’ve worked on that we think are just as important and that we’re just as excited about that most people probably aren’t aware of.

Jessica: So people may not know that FLI recently joined the Partnership on AI. This is the group that was founded by Google, Amazon, Facebook, Apple and others to think about issues like safety, fairness and impact from AI systems. So I’m excited about this because I think it’s really great to see this kind of social commitment from industry, and it’s going to be critical to have the support and engagement from these players to really see AI being developed in a way that’s positive for everyone. So I’m really happy that FLI is now one of the partners in what will likely be an important initiative for AI.

Anthony: I attended the first meeting of the Partnership on AI in October. There was so much discussion at that meeting, of some of the principles themselves directly but also in a broad sense, from all of the key organizations engaged with AI, almost all of whom had representation there, about how we are going to make these things happen. If we value transparency, if we value fairness, if we value safety and trust in AI systems, how are we going to actually get together and formulate best practices and policies, and groups and data sets and things to make all that happen? And to see the speed at which, I would say, the field has moved from purely “wow, we can do this” to “how are we going to do this right, how are we going to do this well, and what does this all mean” has been a ray of hope, I would say.

AI is moving so fast, but it was good to see that, I think, the sort of wisdom race hasn't been conceded entirely. There are dedicated groups of people who are working really hard to figure out how to do it well.

Ariel: And then there’s Dave Stanley, who has been the driving force behind many of the behind-the-scenes projects that our volunteers have been working on and that have helped FLI grow this year.

Dave: Another project that has very much been ongoing, and relates more to the website, is our effort to take the English content about AI safety and nuclear weapons that's been fairly influential in English-speaking countries and make it available in a lot of other languages, to maximize the impact that it's having.

Right now, thanks to the efforts of our volunteers, we have 55 translations available on our website in nine different languages: Russian, Chinese, French, Polish, Spanish, German, Hindi, Japanese, and Korean. All in all, this represents about 1,000 hours of volunteer time. I’d just like to give a shoutout to some of the volunteers who have been involved. They are Alan Yan, Kevin Wang, Kazue Evans, Jake Beebe, Jason Orlosky, Li Na, Bena Lim, Alina Kovtun, Ben Peterson, Carolyn Wu, Zhaoran Joanna Wang, Mayumi Nakamura, Derek Su, Dipti Pandey, Marvin, Vera Koroleva, Grzegorz Orwiński, Szymon Radziszewicz, Natalia Berezovskaya, Vladimir Nimensky, Natalia Kuzmenko, George Godula, Eric Gastfriend, Olivier Grondin, Claire Park, Kristy Wen, Yishuai Du, and Revathi Vinoth Kumar.

Ariel: As we’ve worked to establish AI safety as a global effort, Dave and the volunteers were behind the trip Richard took to China, where he participated in the Global Mobile Internet Conference in Beijing earlier this year.

Dave: So basically, this was something that was prompted and largely organized by one of FLI's volunteers, George Godula, who's based in Shanghai right now.

This was partially motivated by the fact that China has recently been promoting a lot of investment in artificial intelligence research, and they've made it a national objective to become a leader in AI research by 2025. So FLI and the team have been making efforts to build connections with China, raise awareness about AI safety, or at least our view on AI safety, and engage in dialogue there.

It culminated with George organizing this trip for Richard, and a large portion of the FLI volunteer team participating in support for that trip: identifying contacts for Richard to connect with over there, researching the landscape, and providing general support. That's been coupled with an effort to take some of the existing articles that FLI has on the website about AI safety and translate them into Chinese to make them accessible to that audience.

Ariel: In fact, Richard has spoken at many conferences, workshops and other events this year, and he’s noted a distinct shift in how AI researchers view AI safety.

Richard: This is a single example of the many things I've done throughout the year. Yesterday I gave a talk about AI safety and beneficence to a bunch of machine learning and artificial intelligence researchers and entrepreneurs in Boston, here where I'm based. Every time I do this, it's really fulfilling that so many of these people, who really are pushing the leading edge of what AI does in many respects, realize that these are extremely valid concerns and that there are new types of technical avenues to help keep things better for the future. The fact that I'm not receiving pushback anymore, compared to many years ago when I would talk about these things, shows that people really are trying to engage and understand and kind of weave themselves into whatever is going to turn into the best outcome for humanity, given the type of leverage that advanced AI will bring us. I think people are starting to really get what's at stake.

Ariel: And this isn’t just the case among AI researchers. Throughout the year, we’ve seen this discussion about AI safety broaden into various groups outside of traditional AI circles, and we’re hopeful this trend will continue in 2018.

Meia: I think that 2017 was a fantastic start to this project of getting more thinkers from different disciplines to really engage with the topic of artificial intelligence, but I think we have just managed to scratch the surface of this topic in this collaboration. So I would really like to work more on strengthening this conversation and this flow of ideas between different disciplines. I think we can achieve so much more if we can make sure that we hear each other, that we go past our own disciplinary jargon, and that we truly are able to communicate and join each other in research projects where we can bring different tools and different skills to the table.

Ariel: The landscape of AI safety research that Richard presented at Asilomar at the start of the year was designed to enable greater understanding among researchers. Lucas rounded off the year with another version of the landscape, this one looking at ethics and value alignment, with the goal, in part, of bringing more experts from other fields into the conversation.

Lucas: One thing that I'm also really excited about for next year is seeing our conceptual landscapes of both AI safety and value alignment being used in more educational contexts, and in contexts in which they can foster interdisciplinary conversations regarding issues in AI. I think their virtue is that they map out the conceptual landscape of both AI safety and value alignment, but also include definitions and descriptions of jargon. Given this, they function both as a means by which you can introduce people to AI safety, value alignment, and AI risk, and as a means of introducing experts to the conceptual mappings of the spaces that other experts are engaged with, so they can learn each other's jargon and really have conversations that are fruitful and streamlined.

Ariel: As we look to 2018, we hope to develop more programs, work on more projects, and participate in more events that will help draw greater attention to the various issues we care about. We hope to not only spread awareness, but also to empower people to take action to ensure that humanity continues to flourish in the future.

Dave: There are a few things coming up that I'm really excited about. The first one is that we're going to be trying to release some new interactive apps on the website, which will hopefully be pages that can gather a lot of attention and educate people about the issues that we're focused on, mainly nuclear weapons, and answer questions to give people a better picture of the geopolitical and economic factors that motivate countries to keep their nuclear weapons, and how this relates to public support, based on polling data, for whether the general public wants to keep these weapons or not.

Meia: One thing that made me very excited in 2017, and that I'm looking forward to seeing evolve in 2018, was the public's engagement with this topic. I've had the luck to be in the audience for many of the book talks that Max has given for his book "Life 3.0: Being Human in the Age of Artificial Intelligence," and it was fascinating just listening to the questions. They've become so much more sophisticated and nuanced than a few years ago. I'm very curious to see how this evolves in 2018, and I hope that FLI will contribute to this conversation and make it richer. I'd like people in general to get much more engaged with this topic and refine their understanding of it.

Tucker: Well, I think in general it's been amazing to watch FLI this year, because we've made big splashes in so many different things: the Asilomar conference, our Slaughterbots video, helping with the nuclear ban. But one thing that I'm particularly interested in is working more this coming year to engage my generation on these topics. I sometimes sense a lot of defeatism and hopelessness among people in my generation, a feeling that there's nothing we can do to solve civilization's biggest problems. I think being at FLI has given me the opposite perspective. Sometimes I'm still subject to that defeatism, but working here really gives me a sense that we can actually do a lot to solve these problems. I'd really like to find ways to engage more people in my generation and make them feel like they actually have some sense of agency to solve a lot of our biggest challenges.

Ariel: Learn about these issues and more, join the conversation, and find out how you can get involved by visiting futureoflife.org.

[end]

 

Podcast: Balancing the Risks of Future Technologies with Andrew Maynard and Jack Stilgoe

What does it mean for technology to "get it right," and why do tech companies ignore long-term risks in their research? How can we balance near-term and long-term AI risks? And as tech companies become increasingly powerful, how can we ensure that the public has a say in determining our collective future?

To discuss how we can best prepare for societal risks, Ariel spoke with Andrew Maynard and Jack Stilgoe on this month’s podcast. Andrew directs the Risk Innovation Lab in the Arizona State University School for the Future of Innovation in Society, where his work focuses on exploring how emerging and converging technologies can be developed and used responsibly within an increasingly complex world. Jack is a senior lecturer in science and technology studies at University College London where he works on science and innovation policy with a particular interest in emerging technologies.

The following transcript has been edited for brevity, but you can listen to the podcast above or read the full transcript here.

Ariel: Before we get into anything else, could you first define what risk is?

Andrew: The official definition of risk looks at the potential of something to cause harm, but it also looks at the probability of that harm occurring. Say you're looking at exposure to a chemical: risk is all about the hazardous nature of that chemical, its potential to cause some sort of damage to the environment or the human body, but then also the exposure that translates that potential into some sort of probability. That is typically how we think about risk when we're looking at regulating things.

I actually think about risk slightly differently, because that concept of risk runs out of steam really fast, especially when you’re dealing with uncertainties, existential risk, and perceptions about risk when people are trying to make hard decisions and they can’t make sense of the information they’re getting. So I tend to think of risk as a threat to something that’s important or of value. That thing of value might be your health, it might be the environment; but it might be your job, it might be your sense of purpose or your sense of identity or your beliefs or your religion or your politics or your worldview.

As soon as we start thinking about risk in that sense, it becomes much broader, much more complex, but it also allows us to explore that intersection between different communities and their different ideas about what’s important and worth protecting.
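(To illustrate the conventional definition Andrew describes above, which combines a hazard with an exposure probability, here is a minimal sketch of risk treated as probability-weighted harm. The outcomes, probabilities, and severity numbers below are made-up examples, not anything discussed in the conversation.)

```python
# Minimal, illustrative sketch of the conventional quantitative notion of risk:
# expected harm = sum over outcomes of (probability of outcome) x (severity of harm).
# All outcomes, probabilities, and severity scores below are hypothetical examples.

def expected_harm(outcomes):
    """outcomes: list of (probability, severity) pairs for one hazard."""
    return sum(p * severity for p, severity in outcomes)

# Hypothetical chemical-exposure example with three possible outcomes.
chemical_exposure = [
    (0.90, 0.0),   # no observable effect
    (0.09, 2.0),   # mild, reversible harm (arbitrary severity units)
    (0.01, 50.0),  # serious harm
]

print(expected_harm(chemical_exposure))  # 0.68 in these arbitrary units
```

As Andrew notes, this framing runs out of steam quickly once probabilities are unknown or the things at stake are values rather than measurable harms.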

Jack: I would draw attention to all of those things that are incalculable. When we are dealing with new technologies, they are often things to which we cannot assign probabilities and we don’t know very much about what the likely outcomes are going to be.

I think there is also a question of what isn’t captured when we talk about risk. Not all of the impacts of technology might be considered risk impacts. I’d say that we should also pay attention to all the things that are not to do with technology going wrong, but are also to do with technology going right. Technologies don’t just create new risks, they also benefit some people more than others. And they can create huge inequalities. If they’re governed well, they can also help close inequalities. But if we just focus on risk, then we lose some of those other concerns as well.

Andrew: Jack, so this obviously really interests me because to me an inequality is a threat to something that’s important to someone. Do you have any specific examples of what you think about when you think about inequalities or equality gaps?

Jack: Before we get into examples, the important thing is to bear in mind a trend with technology, which is that technology tends to benefit the powerful. That’s an overall trend before we talk about any specifics, which quite often goes against the rhetoric of technological change, because, often, technologies are sold as being emancipatory and helping the worst off in society – which they do, but typically they also help the better off even more. So there’s that general question.

I think in the specific, we can talk about what sorts of technologies do close inequities and which tend to exacerbate inequities. But it seems to me that just defining that as a social risk isn’t quite getting there.

Ariel: I would consider increasing inequality to be a risk. Can you guys talk about why it’s so hard to get agreement on what we actually define as a risk?

Andrew: People very quickly slip into defining risk in very convenient ways. So if you have a company or an organization that really wants to do something – and that doing something may be all the way from making a bucket load of money to changing the world in the ways they think are good – there’s a tendency for them to define risk in ways that benefit them.

So, for instance, if you are the maker of an incredibly expensive drug, and you work out that that drug is going to be beneficial in certain ways with minimal side effects, but it's only going to be available to a very small number of very rich people, you can easily define risk in terms of the things that your drug does not do, so you can claim with confidence that this is a risk-free or a low-risk product. But that's an approach where you work out where the big risks are with your product and you bury them, and you focus on the things where you think there is not a risk with your product.

That sort of extends across many, many different areas – this tendency to bury the big risks associated with a new technology and highlight the low risks to make your tech look much better than it is so you can reach the aims that you’re trying to achieve.

Jack: I quite agree, Andrew. I think what tends to happen is that the definition of risk gets socialized as being that stuff that society’s allowed to think about whereas the benefits are sort of privatized. The innovators are there to define who benefits and in what ways.

Andrew: I would agree. Though it also gets quite complex in terms of the social dialogue around that and who actually is part of those conversations and who has a say in those conversations.

To get back to your point, Ariel, I think there are a lot of organizations and individuals that want to do what they think is the right thing. But they also want the ability to decide for themselves what the right thing is rather than listening to other people.

Ariel: How do we address that?

Andrew: It's a knotty problem, and it has its roots in how we are as people and as a society, how we've evolved. I think there are a number of ways forward toward beginning to pick apart the problem. A lot of those are associated with work carried out in the social sciences and humanities around how you make these processes more inclusive, how you bring more people to the table, how you begin listening to different perspectives and different sets of values and incorporating them into decisions rather than marginalizing groups that are inconvenient.

Jack: If you regard these things as legitimately political discussions rather than just technical discussions, then the solution is to democratize them and to try to wrest control over the direction of technology away from just the innovators and to see that as the subject of proper democratic conversation.

Andrew: And there are some very practical things here. This is where Jack and I might actually diverge in our perspectives. But from a purely business sense, if you’re trying to develop a new product or a new technology and get it to market, the last thing you can afford to do is ignore the nature of the population, the society that you’re trying to put that technology into. Because if you do, you’re going to run up against roadblocks where people decide they either don’t like the tech or they don’t like the way that you’ve made decisions around it or they don’t like the way that you’ve implemented it.

So from a business perspective, taking a long-term strategy, it makes far more sense to engage with these different communities and develop a dialogue around them so you understand the nature of the landscape that you’re developing a technology into. You can see ways of partnering with communities to make sure that that technology really does have a broad beneficial impact.

Ariel: Why do you think companies resist doing that?

Andrew: I think we’ve had centuries of training that says you don’t ask awkward questions because they potentially lead to you not being able to do what you want to do. It’s partly the mentality around innovation. But, also, it’s hard work. It takes a lot of effort, and it actually takes quite a lot of humility as well.

Jack: There’s a sort of well-defined law in technological change, which is that we overestimate the effect of technology in the short term and underestimate the effect of technology in the long term. Given that companies and innovators have to make short time horizon decisions, often they don’t have the capacity to take on board these big world-changing implications of technology.

If you look at something like the motorcar, it would have been inconceivable for Henry Ford to have imagined the world in which his technology would exist in 50 years' time. We now know that the motorcar led to the reshaping of large parts of America, and to an absolutely catastrophic level of public health risk, while also bringing about clear benefits of mobility. But those are big long-term changes that evolve very slowly, far more slowly than any company could appreciate.

Andrew: So can I play devil’s advocate here, Jack? With hindsight should Henry Ford have developed his production line process differently to avoid some of the impacts we now see of motor vehicles?

Jack: You're right to say that with hindsight it's really hard to see what he might have done differently, because the point is that the changes I was talking about are systemic ones, with responsibility shared across large parts of the system. Now, could we have done better at anticipating some of those things? Yes, I think we could have, and I think had motorcar manufacturers talked to regulators and civil society at the time, they could have anticipated some of those things. But there are also barriers that stop innovators from anticipating; there are actually things that force innovators' time horizons to narrow.

Andrew: That’s one of the points that really interests me. It’s not this case of “do we, don’t we” with a certain technology, but could we do things better so we see more longer-term benefits and we see fewer hurdles that maybe we could have avoided if we had been a little smarter from the get-go.

Ariel: But how much do you think we can actually anticipate?

Andrew: Well, the basic answer is very little indeed. The one thing that we know about anticipating the future is that we’re always going to get it wrong. But I think that we can put plausible bounds around likely things that are going to happen. Simply from what we know about how people make decisions and the evidence around that, we know that if you ignore certain pieces of information, certain evidence, you’re going to make worse decisions in terms of projecting or predicting future pathways than if you’re actually open to evaluating different types of evidence.

By evidence, I’m not just meaning the scientific evidence, but I’m also thinking about what people believe or hold as valuable within society and what motivates them to do certain things and react in certain ways. All of that is important evidence in terms of getting a sense of what the boundaries are of a future trajectory.

Jack: Yes, we will always get our predictions wrong, but if anticipation is about preparing us for the future rather than predicting the future, then rightness or wrongness isn't really the target. Instead, I would draw attention to the history of cases in which there has been willful ignorance of particular perspectives or particular evidence that has only been recognized later, which, as you know better than anybody, includes evidence of public health risk that has been swept under the carpet. We have to look first at the sort of incentives that prompt innovators to overlook that evidence.

Andrew: I think that's so important. It's worthwhile bringing up the Late lessons from early warnings report that came out of Europe a few years ago, which was a series of case studies of previous technological innovations over the last 100 years or so, looking at where innovators, companies, and even regulators either missed important early warnings or willfully ignored them, and where that led to far greater adverse impacts than there really should have been. I think there are a lot of lessons to be learned from those.

Ariel: I’d like to take that and move into some more specific examples now. Jack, I know you’re interested in self-driving vehicles. I was curious, how do we start applying that to these new technologies that will probably be, literally, on the road soon?

Jack: It’s extremely convenient for innovators to define risks in particular ways that suit their own ambitions. I think you see this in the way that the self-driving cars debate is playing out. In part, that’s because the debate is a largely American one and it emanates from an American car culture.

Here in Europe, we see a very different approach to transport with a very different emerging debate. Take the trolley problem, the classic example of a risk issue that engineers very conveniently are able to treat as an algorithmic challenge: how do we maximize public benefit and reduce public risk? Here in Europe, where our transport systems are complicated and multimodal, where our cities are complicated, messy things, the risks of self-driving cars start to expand pretty substantially in all sorts of dimensions.

So the sorts of concerns that I would see for the future of self-driving cars relate more to what are sometimes called second order consequences. What sorts of worlds are these technologies likely to enable? What sorts of opportunities are they likely to constrain? I think that’s a far more important debate than the debate about how many lives a self-driving car will either save or take in its algorithmic decision-making.

Andrew: Jack, you have referred to the trolley problem as "trolleys and follies." One of the things I really grapple with, and I think it's very similar to what you were saying, is that the trolley problem seems to be a false or misleading articulation of risk. It's something philosophical and hypothetical, but it doesn't actually seem to bear much relation to the very real challenges and opportunities that we're grappling with when it comes to these technologies.

Now, the really interesting thing here is, I get really excited about self-driving vehicle technologies, partly because I live here in Tempe, where Google and Uber and various other companies are testing them on the road now. But you have quite a different perspective in terms of how fast we're going with the technology and how little thought is going into the longer-term social consequences. To put my full cards on the table, though, I can't wait for better technologies in this area.

Jack: Well, without wishing to be too congenial, I am also excited about the potential of the technology. But what I know about past technology suggests that it may well end up gloriously suboptimal. I'm interested in a future involving self-driving cars that actually realizes some of the enormous benefits of, for example, bringing accessibility to people who currently can't drive, and the enormous benefits to public safety and congestion. But making that work will not just involve a repetition of the current dynamics of technological change. I think current ownership models in the US and current modes of transport in the US just are not conducive to making that happen. So I would love to see governments taking control of this and actually making it work, in the same way as, in the past, governments have taken control of transport and built public-value transport systems.

Ariel: If governments are taking control of this and they’re having it done right, what does that mean?

Jack: The first thing that I don’t see any of within the self-driving car debate, because I just think we’re at too early a stage, is an articulation of what we want from self-driving cars. We have the Google vision, the Waymo vision of the benefits of self-driving cars, which is largely about public safety. But no consideration of what it would take to get that right. I think that’s going to look very different. I think to an extent Tempe is an easy case, because the roads in Arizona are extremely well organized. It’s sunny, pedestrians behave themselves. But what you’re not going to be able to do is take that technology and transport it to central London and expect it to do the same job.

So some understanding of desirable systems across different places is really important. That, I’m afraid, does mean sharing control between the innovators and the people who have responsibility for public safety, public transport and public space.

Andrew: Even though most people in this field and other similar fields are doing it for what they claim is for future benefits and the public good, there’s a huge gap between good intentions of doing the right thing and actually being able to achieve something positive for society. I think the danger is that good intentions go bad very fast if you don’t have the right processes and structures in place to translate them into something that benefits society. To do that, you’ve got to have partnerships and engagement with agencies and authorities that have oversight over these technologies, but also the communities and the people that are either going to be impacted by them or benefit by them.

Jack: I think that’s right. Just letting the benefits as stated by the innovators speak for themselves hasn’t worked in the past, and it won’t work here. We have to allow some sort of democratic discussion about that.

Ariel: I want to move forward in the future to more advanced technology, looking at more advanced artificial intelligence, even super intelligence. How do we address risks that are associated with that when a large number of researchers don’t even think this technology can be developed, or if it is developed, it’s still hundreds of years away? How do you address these really big unknowns and uncertainties?

Andrew: That's a huge question. I'm speaking here as something of a cynic about some of the projections of superintelligence. I think you've got to strike a balance: focus on near- and mid-term risks, but at the same time work out how to take early action on trajectories so you're less likely to see the emergence of those longer-term existential risks. One of the things that really concerns me here is that if you become too focused on some of the highly speculative existential risks, you end up missing things which could be catastrophic in a smaller sense in the near to mid-term.

Pouring millions upon millions of dollars into solving a hypothetical problem around superintelligence and the threat to humanity sometime in the future, at the expense of looking at nearer-term things such as algorithmic bias, autonomous decision-making that cuts people out of the loop and a whole number of other things, is a risk balance that doesn’t make sense to me. Somehow, you’ve got to deal with these emerging issues, but in a way which is sophisticated enough that you’re not setting yourself up for problems in the future.

Jack: I think getting that balance right is crucial. I agree with your assessment that the balance is tilted far too much, at the moment, toward the speculative and long-term. One of the reasons is that those are extremely interesting engineering challenges. So I think the question would be: on whose shoulders does the responsibility lie for acting once you recognize threats or risks like that? Typically, what you find when a community of scientists gathers to assess risks is that they frame the issue in ways that lead to scientific or technical solutions. It's telling, I think, that in the discussion about superintelligence, the answer, either in the foreground or in the background, is normally more AI, not less AI. And the answer is normally to be delivered by engineers rather than to be governed by politicians.

That said, I think there’s sort of cause for optimism if you look at the recent campaign around autonomous weapons. That would seem to be a clear recognition of a technologically mediated issue where the necessary action is not on the part of the innovators themselves but on all the people who are in control of our armed forces.

Andrew: I think you're exactly right, Jack. I should clarify that even though there is a lot of discussion around speculative existential risks, there is also a lot of action on nearer-term issues such as lethal autonomous weapons. But one of the things that I've been particularly struck by in conversations is the fear amongst technologists of losing control over the technology and the narrative. I've had conversations where people have said that they're really worried about the potential downsides, the potential risks of where artificial intelligence is going. But they're convinced that they can solve those problems without telling anybody else about them, and they're scared that if they tell a broad public about those risks, they'll be inhibited in doing the research and development that they really want to do.

That really comes down to not wanting to relinquish control over technology. But I think that there has to be some relinquishment there if we’re going to have responsible development of these technologies that really focuses on how they could impact people both in the short as well as the long-term, and how as a society we find pathways forwards.

Ariel: Andrew, I’m really glad you brought that up. That’s one that I’m not convinced by, this idea that if we tell the public what the risks are, then suddenly the researchers won’t be able to do the research they want. Do you see that as a real risk for researchers?

Andrew: I think there is a risk there, but it's rather complex. Most of the time, the public actually don't care about these things. There are one or two examples; genetically modified organisms is the one that always comes up. But that is a very unique and very distinct example. Most of the time, if you talk broadly about what's happening with a new technology, people will say, "that's interesting," and get on with their lives. So there's much less risk there in talking about it than I think people realize.

The other thing, though, is even if there is a risk of people saying “hold on a minute, we don’t like what’s happening here,” better to have that feedback sooner rather than later, because the reality is people are going to find out what’s happening. If they discover as a company or a research agency or a scientific group that you’ve been doing things that are dangerous and you haven’t been telling them about it, when they find out after the fact, people get mad. That’s where things get really messy.

[What’s also] interesting – you’ve got a whole group of people in the technology sphere who are very clearly trying to do what they think is the right thing. They’re not in it primarily for fame and money, but they’re in it because they believe that something has to change to build a beneficial future.

The challenge is, these technologists, if they don’t realize the messiness of working with people and society and they think just in terms of technological solutions, they’re going to hit roadblocks that they can’t get over. So this to me is why it’s really important that you’ve got to have the conversations. You’ve got to take the risk to talk about where things are going with the broader population. You’ve got to risk your vision having to be pulled back a little bit so it’s more successful in the long-term.

Ariel: I was hoping you could both touch on the impact of media as well and how that’s driving the discussion.

Jack: I think blaming the media is always the convenient thing to do. They're the convenient target. I think the question is really about the culture, which is extremely technologically utopian and which wants to believe that there are simple technological solutions to some of our most pressing problems. In that culture, it is understandable if seemingly seductive ideas, whether about artificial intelligence or about new transport systems, take hold. I would love there to be a more skeptical attitude, so that when those sorts of claims are made, just as when any sort of political claim is made, they are scrutinized and become the starting point for a vigorous debate about the world we want to live in. I think that is exactly what is missing from our current technological discourse.

Andrew: The media is a product of society. We are titillated by extreme, scary scenarios. The media is a medium through which that actually happens. I work a lot with journalists, and I’ve had very few experiences with being misrepresented or misquoted where it wasn’t my fault in the first place.

So I think we’ve got to think of two things when we think of media coverage. First of all, we’ve got to get smarter in how we actually communicate, and by we I mean the people that feel we’ve got something to say here. We’ve got to work out how to communicate in a way that makes sense with the journalists and the media that we’re communicating through. We’ve also got to realize that even though we might be outraged by a misrepresentation, that usually doesn’t get as much traction in society as we think it does. So we’ve got to be a little bit more laid back about how we see things reported.

Ariel: Is there anything else that you think is important to add?

Andrew: I would just sort of wrap things up. There has been a lot of agreement, but actually, and this is an important thing, it’s because most people, including people that are often portrayed as just being naysayers, are trying to ask difficult questions so we can actually build a better future through technology and through innovation in all its forms. I think it’s really important to realize that just because somebody asks difficult questions doesn’t mean they’re trying to stop progress, but they’re trying to make sure that that progress is better for everybody.

Jack: Hear, hear.

Podcast: AI Ethics, the Trolley Problem, and a Twitter Ghost Story with Joshua Greene and Iyad Rahwan

As technically challenging as it may be to develop safe and beneficial AI, this challenge also raises some thorny questions regarding ethics and morality, which are just as important to address before AI is too advanced. How do we teach machines to be moral when people can’t even agree on what moral behavior is? And how do we help people deal with and benefit from the tremendous disruptive change that we anticipate from AI?

To help consider these questions, Joshua Greene and Iyad Rahwan kindly agreed to join the podcast. Josh is a professor of psychology and a member of the Center for Brain Science faculty at Harvard University, where his lab has used behavioral and neuroscientific methods to study moral judgment, focusing on the interplay between emotion and reason in moral dilemmas. He's the author of Moral Tribes: Emotion, Reason and the Gap Between Us and Them. Iyad is the AT&T Career Development Professor and an associate professor of Media Arts and Sciences at the MIT Media Lab, where he leads the Scalable Cooperation group. He created the Moral Machine, which is "a platform for gathering human perspective on moral decisions made by machine intelligence."

In this episode, we discuss the trolley problem with autonomous cars, how automation will affect rural areas more than cities, how we can address potential inequality issues AI may bring about, and a new way to write ghost stories.

This transcript has been heavily edited for brevity. You can read the full conversation here.

Ariel: How do we anticipate that AI and automation will impact society in the next few years?

Iyad: AI has the potential to extract better value from the data we’re collecting from all the gadgets, devices and sensors around us. We could use this data to make better decisions, whether it’s micro-decisions in an autonomous car that takes us from A to B safer and faster, or whether it’s medical decision-making that enables us to diagnose diseases better, or whether it’s even scientific discovery, allowing us to do science more effectively, efficiently and more intelligently.

Joshua: Artificial intelligence also has the capacity to displace human value. To take the example of using artificial intelligence to diagnose disease. On the one hand it’s wonderful if you have a system that has taken in all of the medical knowledge we have in a way that no human could and uses it to make better decisions. But at the same time that also means that lots of doctors might be out of a job or have a lot less to do. This is the double-edged sword of artificial intelligence, the value it creates and the human value that it displaces.

Ariel: Can you explain what the trolley problem is and how it connects to the question of what autonomous vehicles should do in situations where there is no good option?

Joshua: One of the original versions of the trolley problem goes like this (we’ll call it “the switch case”): A trolley is headed towards five people and if you don’t do anything, they’re going to be killed, but you can hit a switch that will turn the trolley away from the five and onto a side track. However on that side track, there’s one unsuspecting person and if you do that, that person will be killed.

The question is: is it okay to hit the switch to save those five people's lives, but at the cost of one life? In this case, most people tend to say yes. Then we can vary it a little bit. In "the footbridge case," the situation is different as follows: the trolley is now headed towards five people on a single track; over that track is a footbridge, and on that footbridge is a large person wearing a very large backpack. You're also on the bridge, and the only way that you can save those five people from being hit by the trolley is to push that big person off of the footbridge and onto the tracks below.

Assuming that it will work, do you think it's okay to push the guy off the footbridge in order to save five lives? Here, most people say no, and so we have this interesting paradox. In both cases, you're trading one life for five, yet in one case it seems like it's the right thing to do, and in the other case it seems like it's the wrong thing to do.

One of the classic objections to these dilemmas is that they’re unrealistic. My view is that the point is not that they’re realistic, but instead that they function like high contrast stimuli. If you’re a vision researcher and you’re using flashing black and white checkerboards to study the visual system, you’re not using that because that’s a typical thing that you look at, you’re using it because it’s something that drives the visual system in a way that reveals its structure and dispositions.

In the same way, these high contrast, extreme moral dilemmas can be useful to sharpen our understanding of the more ordinary processes that we bring to moral thinking.

Iyad: The trolley problem translates, in a cartoonish way, to a scenario in which an autonomous car is faced with only two options. The car is going at the speed limit on a street and, due to mechanical failure, is unable to stop and is going to hit a group of five pedestrians. The car can swerve and hit a bystander. Should the car swerve, or should it just plow through the five pedestrians?

This has a structure similar to the trolley problem because you’re making similar tradeoffs between one and five people and the decision is not being taken on the spot, it’s actually happening at the time of the programming of the car.

There is another complication in which the person being sacrificed to save the greater number of people is the person in the car. Suppose the car can swerve to avoid the five pedestrians but as a result falls off a cliff. That adds another complication, especially since programmers are going to have to appeal to customers. If customers don't feel safe in those cars because of some hypothetical situation that may take place in which they're sacrificed, that pits the financial incentives against the potentially socially desirable outcome, which can create problems.

A question that arises is: Is this ever going to happen? How many times do we face these kinds of situations as we drive today? So the argument goes: these situations are going to be so rare that they are irrelevant, and autonomous cars promise to be substantially safer than the human-driven cars we have today, so the benefits significantly outweigh the costs.

There is obviously truth to this argument, if you take the trolley problem scenario literally. But what the autonomous car version of the trolley problem is doing is abstracting the tradeoffs that are taking place every microsecond, even now.

Imagine you're driving on the road and there is a large truck in the lane to your left, and as a result you choose to stick a little bit further to the right, just to minimize risk in case the truck veers out of its lane. Now suppose that there could be a cyclist later on the right-hand side. What you're effectively doing in this small maneuver is slightly reducing risk to yourself but slightly increasing risk to the cyclist. These sorts of decisions are being made millions and millions of times every day.

Ariel: Applying the trolley problem to self-driving cars seems to be forcing the vehicle and thus the programmer of the vehicle to make a judgment call about whose life is more valuable. Can we not come up with some other parameters that don’t say that one person’s life is more valuable than someone else’s?

Joshua: I don't think that there's any way to avoid doing that. If you're a driver, there's no way to avoid answering the question of how cautious or how aggressive you are going to be. You can avoid answering the question explicitly; you can say, "I don't want to think about that, I just want to drive and see what happens." But you are going to be implicitly answering that question through your behavior, and in the same way, autonomous vehicles can't avoid the question. The people who are designing the machines, training the machines, or explicitly programming them to behave in certain ways are going to do things that affect the outcome.

The cars will constantly be making decisions that inevitably involve value judgments of some kind.

Ariel: To what extent have we actually asked customers what it is that they want from the car? In a completely ethical world, I would like the car to protect the person who’s more vulnerable, who would be the cyclist. In practice, I have a bad feeling I’d probably protect myself.

Iyad: We could say we want to treat everyone equally. But on one hand, you have this self-protective instinct, which presumably, as a consumer, is what you want to buy for yourself and your family. On the other hand, you also care about vulnerable people. Different reasonable and moral people can disagree on what the more important factors and considerations should be, and I think this is precisely why we have to think about this problem explicitly, rather than leave it purely to programmers or car companies or any particular single group of people to decide.

Joshua: When we think about problems like this, we have a tendency to binarize it, but it’s not a binary choice between protecting that person or not. It’s really going to be matters of degree. Imagine there’s a cyclist in front of you going at cyclist speed and you either have to wait behind this person for another five minutes creeping along much slower than you would ordinarily go, or you have to swerve into the other lane where there’s oncoming traffic at various distances. Very few people might say I will sit behind this cyclist for 10 minutes before I would go into the other lane and risk damage to myself or another car. But very few people would just blow by the cyclist in a way that really puts that person’s life in peril.

It’s a very hard question to answer because the answers don’t come in the form of something that you can write out in a sentence like, “give priority to the cyclist.” You have to say exactly how much priority in contrast to the other factors that will be in play for this decision. And that’s what makes this problem so interesting and also devilishly hard to think about.

Ariel: Why do you think this is something that we have to deal with when we’re programming something in advance and not something that we as a society should be addressing when it’s people driving?

Iyad: We very much value the convenience of getting from A to B. Our lifetime odds of dying in a car accident are more than 1%, yet somehow we've decided to put up with this because of the convenience. As long as people don't run a red light and aren't drunk, you don't really blame them for fatal accidents; we just call them accidents.

But now, thanks to autonomous vehicles that can make decisions and reevaluate situations hundreds or thousands of times per second and adjust their plan and so on – we potentially have the luxury to make those decisions a bit better and I think this is why things are different now.

Joshua: With the human we can say, “Look, you’re driving, you’re responsible, and if you make a mistake and hurt somebody, you’re going to be in trouble and you’re going to pay the cost.” You can’t say that to a car, even a car that’s very smart by 2017 standards. The car isn’t going to be incentivized to behave better – the motivation has to be explicitly trained or programmed in.

Iyad: Economists say you can incentivize the people who make the cars to program them appropriately by fining them and engineering the product liability law in such a way that would hold them accountable and responsible for damages, and this may be the way in which we implement this feedback loop. But I think the question remains what should the standards be against which we hold those cars accountable.

Joshua: Let’s say somebody says, “Okay, I make self-driving cars and I want to make them safe because I know I’m accountable.” They still have to program or train the car. So there’s no avoiding that step, whether it’s done through traditional legalistic incentives or other kinds of incentives.

Ariel: I want to ask about some other research you both do. Iyad, you look at how AI and automation impact us, and whether that could be influenced by whether we live in smaller towns or larger cities. Can you talk about that?

Iyad: Clearly there are areas that may potentially benefit from AI because it improves productivity and it may lead to greater wealth, but it can also lead to labor displacement. It could cause unemployment if people aren’t able to retool and improve their skills so that they can work with these new AI tools and find employment opportunities.

Should we expect to experience this to a greater or smaller degree in smaller versus bigger cities? On one hand, there are lots of creative jobs in big cities and, because creativity is so hard to automate, that should make big cities more resilient to these shocks. On the other hand, if you go back to Adam Smith and the idea of the division of labor, the whole idea is that individuals become really good at one thing. And this is precisely what spurred urbanization in the first industrial revolution. Even though the system is collectively more productive, individuals may be more automatable in terms of their narrowly-defined tasks.

But when we did the analysis, we found that indeed larger cities are more resilient in relative terms. The preliminary findings are that in bigger cities there is more production that requires social interaction and very advanced skills like scientific and engineering skills. People are better able to complement the machines because they have technical knowledge, so they’re able to use new intelligent tools that are becoming available, but they also work in larger teams on more complex products and services.

Ariel: Josh, you’ve done a lot of work with the idea of “us versus them.” And especially as we’re looking in this country and others at the political situation where it’s increasingly polarized along this line of city versus smaller town, do you anticipate some of what Iyad is talking about making the situation worse?

Joshua: I certainly think we should be prepared for the possibility that it will make the situation worse. The central idea is that as technology advances, you can produce more and more value with less and less human input, although the human input that you need is more and more highly skilled.

If you look at something like TurboTax: before, you had lots and lots of accountants, and many of those accountants are being replaced by a smaller number of programmers, super-expert accountants, and people on the business side of these enterprises. If that continues, then yes, you have more and more wealth being concentrated in the hands of the people whose high skill levels complement the technology, and there is less and less for people with lower skill levels to do. Not everybody agrees with that argument, but I think it's one that we ignore at our peril.

Ariel: Do you anticipate that AI itself would become a “them,” or do you think it would be people working with AI versus people who don’t have access to AI?

Joshua: The idea of the AI itself becoming the “them,” I am agnostic as to whether or not that could happen eventually, but this would involve advances in artificial intelligence beyond anything we understand right now. Whereas the problem that we were talking about earlier – humans being divided into a technological, educated, and highly-paid elite as one group and then the larger group of people who are not doing as well financially – that “us-them” divide, you don’t need to look into the future, you can see it right now.

Iyad: I don’t think that the robot will be the “them” on their own, but I think the machines and the people who are very good at using the machines to their advantage, whether it’s economic or otherwise, will collectively be a “them.” It’s the people who are extremely tech savvy, who are using those machines to be more productive or to win wars and things like that. There would be some sort of evolutionary race between human-machine collectives.

Joshua: I think it’s possible that people who are technologically enhanced could have a competitive advantage and set off an economic arms race or perhaps even literal arms race of a kind that we haven’t seen. I hesitate to say, “Oh, that’s definitely going to happen.” I’m just saying it’s a possibility that makes a certain kind of sense.

Ariel: Do either of you have ideas on how we can continue to advance AI and address these divisive issues?

Iyad: There are two new tools at our disposal: experimentation and machine-augmented regulation.

Today, [there are] cars with a bull bar in front of them. These metallic bars at the front of the car increase safety for the passenger in the case of a collision, but they have a disproportionate impact on other cars, on pedestrians, and on cyclists, and they're much more likely to kill them in the case of an accident. By making this comparison and identifying that cars with bull bars are worse for certain groups, the tradeoff was judged not acceptable, and many countries have banned them, for example the UK, Australia, and many European countries.

If there were a similar tradeoff being caused by a software feature, we wouldn't know unless we allowed for experimentation as well as monitoring, unless we looked at the data to identify whether a particular algorithm is making cars very safe for customers but at the expense of a particular group.

In some cases, these systems are going to be so sophisticated and the data is going to be so abundant that we won’t be able to observe them and regulate them in time. Think of algorithmic trading programs. No human being is able to observe these things fast enough to intervene, but you could potentially insert another algorithm, a regulatory algorithm or an oversight algorithm, that will observe other AI systems in real time on our behalf, to make sure that they behave.
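(To make Iyad's oversight idea a little more concrete, here is a minimal sketch of one algorithm reviewing another's proposed actions in real time. The trading example, order fields, sizes, and limit below are entirely hypothetical illustrations, not anything described on the podcast.)

```python
# Minimal sketch of a "regulatory algorithm" that observes another system's
# proposed actions in real time and blocks the ones that violate a rule.
# The traded symbols, order sizes, and limit below are made-up examples.

class OversightMonitor:
    def __init__(self, max_order_size):
        self.max_order_size = max_order_size
        self.blocked = []

    def review(self, order):
        """Return True if the order may proceed, False if it is blocked."""
        if order["size"] > self.max_order_size:
            self.blocked.append(order)
            return False
        return True

def trading_system():
    # Stand-in for a fast, opaque decision-maker (e.g. an algorithmic trader).
    yield {"symbol": "XYZ", "size": 100}
    yield {"symbol": "XYZ", "size": 50_000}   # anomalously large order
    yield {"symbol": "ABC", "size": 200}

monitor = OversightMonitor(max_order_size=10_000)
for proposed in trading_system():
    if monitor.review(proposed):
        print("executed:", proposed)
    else:
        print("blocked by oversight:", proposed)
```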

Joshua: There are two general categories of strategies for making things go well. There are technical solutions to things and then there’s the broader social problem of having a system of governance that can be counted on to produce outcomes that are good for the public in general.

The thing that I’m most worried about is that if we don’t get our politics in order, especially in the United States, we’re not going to have a system in place that’s going to be able to put the public’s interest first. Ultimately, it’s going to come down to the quality of the government that we have in place, and quality means having a government that distributes benefits to people in what we would consider a fair way and takes care to make sure that things don’t go terribly wrong in unexpected ways and generally represents the interests of the people.

I think we should be working on both of these in parallel. We should be developing technical solutions to more localized problems where you need an AI solution to solve a problem created by AI. But I also think we have to get back to basics when it comes to the fundamental principles of our democracy and preserving them.

Ariel: As we move towards smarter and more ubiquitous AI, what worries you most and what are you most excited about?

Joshua: I’m pretty confident that a lot of labor is going to be displaced by artificial intelligence. I think it is going to be enormously politically and socially disruptive, and I think we need to plan now. With self-driving cars especially in the trucking industry, I think that’s going to be the first and most obvious place where millions of people are going to be out of work and it’s not going to be clear what’s going to replace it for them.

I’m excited about the possibility of AI producing value for people in a way that has not been possible before on a large scale. Imagine if anywhere in the world that’s connected to the Internet, you could get the best possible medical diagnosis for whatever is ailing you. That would be an incredible life-saving thing. And as AI teaching and learning systems get more sophisticated, I think it’s possible that people could actually get very high quality educations with minimal human involvement and that means that people all over the world could unlock their potential. And I think that that would be a wonderful transformative thing.

Iyad: I’m worried about the way in which AI and specifically autonomous weapons are going to alter the calculus of war. In order to aggress on another nation, you have to mobilize humans, you have to get political support from the electorate, you have to handle the very difficult process of bringing back people in coffins, and the impact that this has on electorates.

This creates a big check on power, and it makes people think very hard about making these kinds of decisions. With AI, when you're able to wage wars with very little loss of life, especially if you're a very advanced nation that is at the forefront of this technology, then you have disproportionate power. It's kind of like a nuclear weapon, but maybe more so, because it's much more customizable. It's not all or nothing; you could start all sorts of wars everywhere.

I think it’s going to be a very interesting shift in the way superpowers think about wars and I worry that this might make them trigger happy. I think a new social contract needs to be written so that this power is kept in check and that there’s more thought that goes into this.

On the other hand, I’m very excited about the abundance that will be created by AI technologies. We’re going to optimize the use of our resources in many ways. In health and in transportation, in energy consumption and so on, there are so many examples in recent years in which AI systems are able to discover ways in which even the smartest humans haven’t been able to optimize.

Ariel: One final thought: This podcast is going live on Halloween, so I want to end on a spooky note. And quite conveniently, Iyad’s group has created Shelley, which is a Twitter chatbot that will help you craft scary ghost stories. Shelley is, of course, a nod to Mary Shelley who wrote Frankenstein, which is the most famous horror story about technology. Iyad, I was hoping you could tell us a bit about how Shelley works.

Iyad: Yes, well this is our second attempt at doing something spooky for Halloween. Last year we launched the Nightmare Machine, which used deep neural networks and style transfer algorithms to take ordinary photos and convert them into haunted houses and zombie-infested places. That was quite interesting; it was a lot of fun. More recently, we’ve launched Shelley, which people can visit at shelley.ai, and it is named after Mary Shelley, who authored Frankenstein.

This is a neural network that generates text and it’s been trained on a very large data set of over 100 thousand short horror stories from a subreddit called No Sleep. So it’s basically got a lot of human knowledge about what makes things spooky and scary, and the nice thing is that it generates part of the story and people can tweet back at it a continuation of the story, and then basically take turns with the AI to craft stories. We feature those stories on the website afterwards. If I’m correct, this is the first collaborative human-AI horror writing exercise ever.
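Shelley itself is a deep neural network trained on the No Sleep corpus, but the basic idea of generating continuations from patterns in a corpus can be sketched with a far simpler stand-in, a word-level Markov chain; the two-line corpus below is invented and this is not Shelley’s actual architecture.

# Toy stand-in for corpus-trained story generation (a word-level Markov chain,
# not Shelley's actual neural architecture). The tiny corpus is invented.
import random
from collections import defaultdict

corpus = [
    "the house was silent until the floorboards began to whisper",
    "the floorboards began to whisper my name in the dark",
]

# Map each word to the words that follow it anywhere in the corpus.
followers = defaultdict(list)
for story in corpus:
    words = story.split()
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)

def continue_story(seed_word, length=10):
    """Extend a story by repeatedly sampling a plausible next word."""
    word, output = seed_word, [seed_word]
    for _ in range(length):
        if word not in followers:
            break
        word = random.choice(followers[word])
        output.append(word)
    return " ".join(output)

print(continue_story("the"))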

Podcast: Choosing a Career to Tackle the World’s Biggest Problems with Rob Wiblin and Brenton Mayer

If you want to improve the world as much as possible, what should you do with your career? Should you become a doctor, an engineer or a politician? Should you try to end global poverty, climate change, or international conflict? These are the questions that the research group, 80,000 Hours, tries to answer.

To learn more, I spoke with Rob Wiblin and Brenton Mayer of 80,000 Hours. The following are highlights of the interview, but you can listen to the full podcast above or read the transcript here.

Can you give us some background about 80,000 Hours?

Rob: 80,000 Hours has been around for about six years and started when Benjamin Todd and Will MacAskill wanted to figure out how they could do as much good as possible. They started looking into things like the odds of becoming an MP in the UK or if you became a doctor, how many lives would you save. Pretty quickly, they were learning things that no one else had investigated.

They decided to start 80,000 Hours, which would conduct this research in a more systematic way and share it with people who wanted to do more good with their career.

80,000 hours is roughly the number of hours that you’d work in a full-time professional career. That’s a lot of time, so it pays off to spend quite a while thinking about what you’re going to do with that time.

On the other hand, 80,000 hours is not that long relative to the scale of the problems that the world faces. You can’t tackle everything. You’ve only got one career, so you should be judicious about what problems you try to solve and how you go about solving them.

How do you help people have more of an impact with their careers?

Brenton: The main thing is a career guide. We’ll talk about how to have satisfying careers, how to work on one of the world’s most important problems, how to set yourself up early so that later on you can have a really large impact.

The second thing we do is career coaching, where we try to apply that advice to individuals.

What is earning to give?

Rob: Earning to give is the career approach where you try to make a lot of money and give it to organizations that can use it to have a really large positive impact. I know people who can make millions of dollars a year doing the thing they love and donate most of that to effective nonprofits, supporting 5, 10, 15, possibly even 20 people to do direct work in their place.

Can you talk about research you’ve been doing regarding the world’s most pressing problems?

Rob: One of the first things we realized is that if you’re trying to help people alive today, your money can go further in the developing world. We just need to scale up solutions to basic health problems and economic issues that have been resolved elsewhere.

Moving beyond that, what other groups in the world are extremely neglected? Factory farmed animals really stand out. There’s very little funding focused on improving farm animal welfare.

The next big idea was, of all the people that we could help, what fraction are alive today? We think that it’s only a small fraction. There’s every reason to think humanity could live for another 100 generations on Earth and possibly even have our descendants alive on other planets.

We worry a lot about existential risks and ways that civilization can go off track and never recover. Thinking about the long-term future of humanity is where a lot of our attention goes and where I think people can have the largest impact with their career.

Regarding artificial intelligence safety, nuclear weapons, biotechnology and climate change, can you consider different ways that people could pursue either careers or “earn to give” options for these fields?

Rob: One would be to specialize in machine learning or other technical work and use those skills to figure out how can we make artificial intelligence aligned with human interests. How do we make the AI do what we want and not things that we don’t intend?

Then there’s the policy and strategy side, trying to answer questions like how do we prevent an AI arms race? Do we want artificial intelligence running military robots? Do we want the government to be more involved in regulating artificial intelligence or less involved? You can also approach this if you have a good understanding of politics, policy, and economics. You can potentially work in government, the military, or think tanks.

Things like communications, marketing, organization, project management, and fundraising operations — those kinds of things can be quite hard to find skilled, reliable people for. And it can be surprisingly hard to find people who can handle media or do art and design. If you have those skills, you should seriously consider applying to whatever organizations you admire.

[For nuclear weapons] I’m interested in anything that can promote peace between the United States and Russia and China. A war between those groups or an accidental nuclear incident seems like the most likely thing to throw us back to the stone age or even pre-stone age.

I would focus on ensuring that they don’t get false alarms; trying to increase trust between the countries in general and improve the communication lines so that if there are false alarms, they can quickly defuse the situation.

The best opportunities [in biotech] are in early surveillance of new diseases. If there’s a new disease coming out, a new flu for example, it takes a long time to figure out what’s happened.

And when it comes to controlling new diseases, time is really of the essence. If you can pick it up within a few days or weeks, then you have a reasonable shot at quarantining the people and following up with everyone that they’ve met and containing it. Any technologies that we can invent or any policies that will allow us to identify new diseases before they’ve spread to too many people are going to help with both natural pandemics, and also any kind of synthetic biology risks, or accidental releases of diseases from biological researchers.

Brenton: A Wagner and Weitzman paper suggests that there’s about a 10% chance of warming larger than 4.8 degrees Celsius, or a 3% chance of more than 6 degrees Celsius. These are really disastrous outcomes. If you’re interested in climate change, we’re pretty excited about you working on these very bad scenarios. Sensible things to do would be improving our ability to forecast; thinking about the positive feedback loops that might be inherent in Earth’s climate; thinking about how to enhance international cooperation.

Rob: It does seem like solar power and storage of energy from solar power is going to have the biggest impact on emissions over at least the next 50 years. Anything that can speed up that transition makes a pretty big contribution.

Rob, can you explain your interest in long-term multigenerational indirect effects and what that means?

Rob: If you’re trying to help people and animals thousands of years in the future, you have to help them through a causal chain that involves changing the behavior of someone today and then that’ll help the next generation and so on.

One way to improve the long-term future of humanity is to do very broad things that improve human capabilities like reducing poverty, improving people’s health, making schools better.

But in a world where the more science and technology we develop, the more power we have to destroy civilization, it becomes less clear that broadly improving human capabilities is a great way to make the future go better. If you improve science and technology, you both improve our ability to solve problems and create new problems.

I think about what technologies can we invent that disproportionately make the world safer rather than more risky. It’s great to improve the technology to discover new diseases quickly and to produce vaccines for them quickly, but I’m less excited about generically pushing forward the life sciences because there’s a lot of potential downsides there as well.

Another way that we can robustly prepare humanity to deal with the long-term future is to have better foresight about the problems that we’re going to face. That’s a very concrete thing you can do that puts humanity in a better position to tackle problems in the future — just being able to anticipate those problems well ahead of time so that we can dedicate resources to averting those problems.

To learn more, visit 80000hours.org and subscribe to Rob’s new podcast.

Explainable AI: a discussion with Dan Weld

Machine learning systems are confusing – just ask any AI researcher. Their deep neural networks operate incredibly quickly, considering thousands of possibilities in seconds before making decisions. The human brain simply can’t keep up.

When people learn to play Go, instructors can challenge their decisions and hear their explanations. Through this interaction, teachers determine the limits of a student’s understanding. But DeepMind’s AlphaGo, which recently beat the world’s best Go players, can’t answer these questions. When AlphaGo makes an unexpected decision, it’s difficult to understand why it made that choice.

Admittedly, the stakes are low with AlphaGo: no one gets hurt if it makes an unexpected move and loses. But deploying intelligent machines that we can’t understand could set a dangerous precedent.

According to computer scientist Dan Weld, understanding and trusting machines is “the key problem to solve” in AI safety, and it’s necessary today. He explains, “Since machine learning is at the core of pretty much every AI success story, it’s really important for us to be able to understand what it is that the machine learned.”

As machine learning (ML) systems assume greater control in healthcare, transportation, and finance, trusting their decisions becomes increasingly important. If researchers can program AIs to explain their decisions and answer questions, as Weld is trying to do, we can better assess whether they will operate safely on their own.

 

Teaching Machines to Explain Themselves

Weld has worked on techniques that expose blind spots in ML systems, or “unknown unknowns.”

When an ML system faces a “known unknown,” it recognizes its uncertainty about the situation. However, when it encounters an unknown unknown, it won’t even recognize that the situation is uncertain: the system will have extremely high confidence that its result is correct, but it will be wrong. Often, classifiers have this confidence because they were “trained on data that had some regularity in it that’s not reflected in the real world,” Weld says.

Consider an ML system that has been trained to classify images of dogs, but has only been trained on images of brown and black dogs. If this system sees a white dog for the first time, it might confidently assert that it’s not a dog. This is an “unknown unknown” – trained on incomplete data, the classifier has no idea that it’s completely wrong.
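That failure mode is easy to reproduce in miniature: train a classifier on data where coat darkness happens to separate the classes, and it will confidently reject a light-coated dog. The two toy features and all of the numbers below are invented for illustration.

# Toy "unknown unknown": a classifier trained only on dark-coated dogs is
# confidently wrong about a white dog. Features and data are invented;
# feature 0 stands in for coat darkness, feature 1 for an unrelated attribute.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([[0.90, 0.8], [0.80, 0.3], [0.85, 0.6],   # dark dogs (label 1)
                    [0.10, 0.7], [0.15, 0.2], [0.05, 0.8]])  # light non-dogs (label 0)
y_train = np.array([1, 1, 1, 0, 0, 0])

clf = LogisticRegression(C=100).fit(X_train, y_train)

white_dog = np.array([[0.05, 0.9]])           # far outside the training coats
print(f"P(dog) = {clf.predict_proba(white_dog)[0, 1]:.2f}")
# Typically prints a probability near zero: confidently "not a dog," and wrong.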

ML systems can be programmed to ask for human oversight on known unknowns, but since they don’t recognize unknown unknowns, they can’t easily ask for oversight. Weld’s research team is developing techniques to uncover these unknown unknowns, and he believes this work will complement explainability. “After finding unknown unknowns, the next thing the human probably wants is to know WHY the learner made those mistakes, and why it was so confident,” he explains.

Machines don’t “think” like humans do, but that doesn’t mean researchers can’t engineer them to explain their decisions.

One research group jointly trained an ML classifier to recognize images of birds and generate captions. If the AI recognizes a toucan, for example, the researchers can ask “why.” The neural net can then generate an explanation that the huge, colorful bill indicated a toucan.

While AI developers will prefer certain concepts explained graphically, consumers will need these interactions to involve natural language and more simplified explanations. “Any explanation is built on simplifying assumptions, but there’s a tricky judgment question about what simplifying assumptions are OK to make. Different audiences want different levels of detail,” says Weld.

Explaining the bird’s huge, colorful bill might suffice in image recognition tasks, but with medical diagnoses and financial trades, researchers and users will want more. Like a teacher-student relationship, human and machine should be able to discuss what the AI has learned and where it still needs work, drilling down on details when necessary.

“We want to find mistakes in their reasoning, understand why they’re making these mistakes, and then work towards correcting them,” Weld adds.    

 

Managing Unpredictable Behavior

Yet, ML systems will inevitably surprise researchers. Weld explains, “The system can and will find some way of achieving its objective that’s different from what you thought.”

Governments and businesses can’t afford to deploy highly intelligent AI systems that make unexpected, harmful decisions, especially if these systems control the stock market, power grids, or data privacy. To control this unpredictability, Weld wants to engineer AIs to get approval from humans before executing novel plans.

“It’s a judgment call,” he says. “If it has seen humans executing actions 1-3, then that’s a normal thing. On the other hand, if it comes up with some especially clever way of achieving the goal by executing this rarely-used action number 5, maybe it should run that one by a live human being.”

Over time, this process will create norms for AIs, as they learn which actions are safe and which actions need confirmation.
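A rough sketch of that judgment call, with invented action names and an arbitrary frequency threshold (this is an illustration, not Weld’s actual system): actions the system has often seen humans take run directly, and anything rarer is routed to a person first.

# Minimal sketch of novelty-gated execution: familiar actions run directly,
# rarely-observed actions need human sign-off. Names and the threshold are
# illustrative only.
from collections import Counter

observed_human_actions = Counter(
    {"action_1": 120, "action_2": 95, "action_3": 80, "action_5": 1}
)
APPROVAL_THRESHOLD = 10   # seen fewer times than this -> ask a human

def execute(action, run, ask_human):
    if observed_human_actions[action] >= APPROVAL_THRESHOLD:
        run(action)                          # well-trodden, considered normal
    elif ask_human(f"Agent proposes rarely-used '{action}'. Allow?"):
        run(action)                          # novel action, explicitly approved
    # otherwise the action is dropped and the agent must replan

def deny_and_log(message):
    print("HUMAN REVIEW NEEDED:", message)
    return False

execute("action_1", run=print, ask_human=deny_and_log)   # runs immediately
execute("action_5", run=print, ask_human=deny_and_log)   # escalated to a human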

 

Implications for Current AI Systems

The people that use AI systems often misunderstand their limitations. The doctor using an AI to catch disease hasn’t trained the AI and can’t understand its machine learning. And the AI system, not programmed to explain its decisions, can’t communicate problems to the doctor.

Weld wants to see an AI system that interacts with a pre-trained ML system and learns how the pre-trained system might fail. This system could analyze the doctor’s new diagnostic software to find its blind spots, such as its unknown unknowns. Explainable AI software could then enable the AI to converse with the doctor, answering questions and clarifying uncertainties.

And the applications extend to finance algorithms, personal assistants, self-driving cars, and even predicting recidivism in the legal system, where explanation could help root out bias. ML systems are so complex that humans may never be able to understand them completely, but this back-and-forth dialogue is a crucial first step.

“I think it’s really about trust and how can we build more trustworthy AI systems,” Weld explains. “The more you interact with something, the more shared experience you have, the more you can talk about what’s going on. I think all those things rightfully build trust.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Podcast: Life 3.0 – Being Human in the Age of Artificial Intelligence

Elon Musk has called it a compelling guide to the challenges and choices in our quest for a great future of life on Earth and beyond, while Stephen Hawking and Ray Kurzweil have referred to it as an introduction and guide to the most important conversation of our time. “It” is Max Tegmark’s new book, Life 3.0: Being Human in the Age of Artificial Intelligence.

Tegmark is a physicist and AI researcher at MIT, and he’s also the president of the Future of Life Institute.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety above or read the full transcript here.

What makes Life 3.0 an important read for anyone who wants to understand and prepare for our future?

There’s been lots of talk about AI disrupting the job market and enabling new weapons, but very few scientists talk seriously about what I think is the elephant in the room: What will happen once machines outsmart us at all tasks?

Will superhuman artificial intelligence arrive in our lifetime? Can and should it be controlled, and if so, by whom? Can humanity survive in the age of AI? And if so, how can we find meaning and purpose if super-intelligent machines provide for all our needs and make all our contributions superfluous?

I’m optimistic that we can create a great future with AI, but it’s not going to happen automatically. We have to win this race between the growing power of the technology, and the growing wisdom with which we manage it. We don’t want to learn from mistakes. We want to get things right the first time because that might be the only time we have.

There are still a lot of AI researchers who are telling us not to worry. What is your response to them?

There are two very basic questions where the world’s leading AI researchers totally disagree.

One of them is when, if ever, are we going to get super-human general artificial intelligence? Some people think it’s never going to happen or will take hundreds of years. Many others think it’s going to happen in decades. The other controversy is what’s going to happen if we ever get beyond human-level AI.

Then there are a lot of very serious AI researchers who think that this could be the best thing ever to happen, but it could also lead to huge problems. It’s really boring to sit around and quibble about whether we should worry or not. What I’m interested in is asking what concretely can we do today that’s going to increase the chances of things going well because that’s all that actually matters.

There’s also a lot of debate about whether people should focus on just near-term risks or just long-term risks.

We should obviously focus on both. Take what you’re calling the short-term questions: for example, how do you make computers that are robust, that do what they’re supposed to do, and that don’t crash or get hacked? It’s not only something that we absolutely need to solve in the short term as AI gets more and more into society, but it’s also a valuable stepping stone toward tougher questions. How are you ever going to build a super-intelligent machine that you’re confident is going to do what you want, if you can’t even build a laptop that does what you want instead of giving you the blue screen of death or the spinning wheel of doom?

If you want to go far in one direction, first you take one step in that direction.

You mention 12 options for what you think a future world with superintelligence will look like. Could you talk about a couple of the future scenarios? And then what are you hopeful for, and what scares you?

Yeah, I confess, I had a lot of fun brainstorming these different scenarios. When we envision the future, we almost inadvertently obsess about gloomy stuff. Instead, we really need these positive visions to think what kind of society would we like to have if we have enough intelligence at our disposal to eliminate poverty, disease, and so on? If it turns out that AI can help us solve these challenges, what do we want?

If we have very powerful AI systems, it’s crucial that their goals are aligned with our goals. We don’t want to create machines that are first very excited about helping us and then later get as bored with us as kids get with Legos.

Finally, what should the goals be that we want these machines to safeguard? There’s obviously no consensus on Earth for that. Should it be Donald Trump’s goals? Hillary Clinton’s goals? ISIS’s goals? Whose goals should it be? How should this be decided? This conversation can’t just be left to tech nerds like myself. It has to involve everybody because it’s everybody’s future that’s at stake here.

If we actually create an AI or multiple AI systems that can do this, what do we do then?

That’s one of those huge questions that everybody should be discussing. Suppose we get machines that can do all our jobs, produce all our goods and services for us. How do you want to distribute this wealth that’s produced? Just because you take care of people materially, doesn’t mean they’re going to be happy. How do you create a society where people can flourish and find meaning and purpose in their lives even if they are not necessary as producers? Even if they don’t need to have jobs?

You have a whole chapter dedicated to the cosmic endowment and what happens in the next billion years and beyond. Why should we care about something so far into the future?

It’s a beautiful idea if our cosmos can continue to wake up more, and life can flourish here on Earth, not just for the next election cycle, but for billions of years and throughout the cosmos. We have over a billion planets in this galaxy alone, which are very nice and habitable. If we think big together, this can be a powerful way to put our differences aside on Earth and unify around the bigger goal of seizing this great opportunity.

If we were to just blow it by some really poor planning with our technology and go extinct, wouldn’t we really have failed in our responsibility?

What do you see as the risks and the benefits of creating an AI that has consciousness?

There is a lot of confusion in this area. If you worry about some machine doing something bad to you, consciousness is a complete red herring. If you’re chased by a heat-seeking missile, you don’t give a hoot whether it has a subjective experience. You wouldn’t say, “Oh I’m not worried about this missile because it’s not conscious.”

Say we create very intelligent machines, and you have a helper robot who you can have conversations with and who says pretty interesting things. Wouldn’t you want to know if it feels like something to be that helper robot? If it’s conscious, or if it’s just a zombie pretending to have these experiences? If you knew that it felt conscious much like you do, presumably that would put it ethically in a very different situation.

It’s not our universe giving meaning to us, it’s we conscious beings giving meaning to our universe. If there’s nobody experiencing anything, our whole cosmos just goes back to being a giant waste of space. It’s going to be very important for these various reasons to understand what it is about information processing that gives rise to what we call consciousness.

Why and when should we concern ourselves with outcomes that have low probabilities?

I and most of my AI colleagues don’t think that the probability is very low that we will eventually be able to replicate human intelligence in machines. The question isn’t so much “if,” although there are certainly a few detractors out there, the bigger question is “when.”

If we start getting close to the human-level AI, there’s an enormous Pandora’s Box, which we want to open very carefully and just make sure that if we build these very powerful systems, they should have enough safeguards built into them already that some disgruntled ex-boyfriend isn’t going to use that for a vendetta, and some ISIS member isn’t going to use that for their latest plot.

How can the average concerned citizen get more involved in this conversation, so that we can all have a more active voice in guiding the future of humanity and life?

Everybody can contribute! We set up a website, ageofai.org, where we’re encouraging everybody to come and share their ideas for how they would like the future to be. We really need the wisdom of everybody to chart a future worth aiming for. If we don’t know what kind of future we want, we’re not going to get it.

Podcast: The Art of Predicting with Anthony Aguirre and Andrew Critch

How well can we predict the future? In this podcast, Ariel speaks with Anthony Aguirre and Andrew Critch about the art of predicting the future, what constitutes a good prediction, and how we can better predict the advancement of artificial intelligence. They also touch on the difference between predicting a solar eclipse and predicting the weather, what it takes to make money on the stock market, and the bystander effect regarding existential risks.

Anthony is a professor of physics at the University of California at Santa Cruz. He’s one of the founders of the Future of Life Institute, of the Foundational Questions Institute, and most recently of metaculus.com, which is an online effort to crowdsource predictions about the future of science and technology. Andrew is on a two-year leave of absence from MIRI to work with UC Berkeley’s Center for Human Compatible AI. He cofounded the Center for Applied Rationality, and previously worked as an algorithmic stock trader at Jane Street Capital.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety above or read the full transcript here.

Ariel: To start, what are predictions? What are the hallmarks of a good prediction? How does that differ from just guessing?

Anthony: I would say there are four aspects to a good prediction. One, it should be specific, well-defined and unambiguous. If you predict something’s going to happen, everyone should agree on whether that thing has happened or not. This can be surprisingly difficult to do.

Second, it should be probabilistic. A really good prediction is a probability for something happening.

Third, a prediction should be precise. If you give everything a 50% chance, you’ll never be terribly wrong, but you’ll also never be terribly right. Predictions are really interesting to the extent that they say something is either very likely or very unlikely. Precision is what we would aim for.

Fourth, you want to be well-calibrated. If there are 100 things that you predict with 90% confidence, around 90% of those things should come true.

Precision and calibration kind of play off against each other, and it’s very difficult to be both precise and well-calibrated about the future.
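Calibration in this sense can be checked once predictions resolve: group them by stated probability and compare each group’s stated probability with the fraction that actually came true. A minimal sketch, with invented data:

# Minimal calibration check: for predictions stated at the same probability,
# did roughly that fraction of them come true? The data is invented.
from collections import defaultdict

resolved = [(0.9, True), (0.9, True), (0.9, False), (0.9, True), (0.9, True),
            (0.6, True), (0.6, False), (0.6, True), (0.6, False),
            (0.1, False), (0.1, False), (0.1, True)]

buckets = defaultdict(list)
for stated, happened in resolved:
    buckets[stated].append(happened)

for stated in sorted(buckets):
    outcomes = buckets[stated]
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%} -> came true {observed:.0%} ({len(outcomes)} predictions)")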

Andrew: Of the properties Anthony said, being specific, meaning it’s clear what the prediction is saying and when it will be settled — I think people really don’t appreciate how psychologically valuable that is.

People really undervalue the extent to which the specificity property of prediction is also part of your own training as a predictor. The last property Anthony mentioned, calibration, is not just a property of a prediction. It’s a property of a predictor.

A good predictor is somebody who strives for calibration while also trying to be precise and get their probabilities as close to zero and one as they can.

Ariel: What is the difference between prediction versus just guessing or intuition? For example, knowing that the eclipse will happen in August versus not knowing what the weather will be like yet.

Andrew: The problem is that the weather is very unpredictable, while the locations of planets and moons and stars are predictable. I would say the difference is the lack of a reliable model or method for making the prediction.

Anthony: There is an incredibly accurate prediction of the eclipse this coming August, but there is some tiny bit of uncertainty that you don’t see because we know so precisely where the planets are.

When you look at weather, there’s lots of uncertainty because we don’t have some measurement device at every position measuring every temperature and density of the atmosphere and the water at every point on earth. There’s uncertainty in the initial conditions, and then the physics amplifies those initial uncertainties into bigger uncertainties later on. That’s the hallmark of a chaotic physical system, which the atmosphere happens to be.

It’s an interesting thing that the different physical systems are so different in their predictability.

Andrew: That’s a really important thing for people to realize about predicting the future. They see the stock market, how unpredictable it is, and they know the stock market has something to do with the news and with what’s going on in the world. That must mean that the world itself is extremely hard to predict, but I think that’s an error. The reason the stock market is hard to predict is because it is a prediction.

If you’ve already made a prediction, predicting what is wrong about your prediction is really hard — if you knew that, you would have just made that part of your prediction to begin with. That’s something to meditate on. The world is not always as hard to predict as the stock market. I can predict that there’s going to be a traffic jam tomorrow on the commute from the East Bay to San Francisco, between the hours of 6:00 a.m. and 10:00 a.m.

I think some aspects of social systems are actually very easy to predict. An individual human driver might be very hard to predict. But if you see 10,000 people driving down the highway, you get a strong sense of whether there’s going to be a traffic jam. Sometimes unpredictable phenomena can add up to predictable phenomena, and I think that’s a really important feature of making good long-term predictions with complicated systems.

Anthony: It’s often said that climate is more predictable than weather. Although the individual fluctuations day-to-day are difficult to predict, it’s very easy to predict that, in general, winter in the Northern Hemisphere is going to be colder than the summer. There are lots of statistical regularities that emerge, when you average over large numbers.

Ariel: As we’re trying to understand what the impact of artificial intelligence will be on humanity how do we consider what would be a complex prediction? What’s a simple prediction? What sort of information do we need to do this?

Anthony: Well, that’s a tricky one. One of the best methods of prediction for lots of things is just simple extrapolation. There are many physical systems that, once you can discern if they have a trend, you can fit a pretty simple function to.

When you’re talking about artificial intelligence, there are some hard aspects to predict, but also some relatively easy aspects to predict, like looking at the amount of funding that’s being given to artificial intelligence research or the computing power and computing speed and efficiency, following Moore’s Law and variants of it.

Andrew: People often think of mathematics as a source of certainty, but sometimes you can be certain that you are uncertain or you can be certain that you can’t be certain about something else.

A simple trend, like Moore’s Law, is a summary of what you see from a very complicated system, namely a bunch of companies and a bunch of people working to build smaller and faster and cheaper and more energy efficient hardware. That’s a very complicated system that somehow adds up to fairly simple behavior.

A hallmark of good prediction is, when you find a trend, the first question you should ask yourself is what is giving rise to this trend, and can I expect that to continue? That’s a bit of an art. It’s kind of more art than science, but it’s a critical art, because otherwise we end up blindly following trends that are bound to fail.

Ariel: I want to ask about who is making the prediction. With AI, for example, we see smart people in the field who predict AI will make life great and others are worried. With existential risks we see surveys and efforts in which experts in the field try to predict the odds of human extinction. How much can we rely on “experts in the field”?

Andrew: I can certainly tell you that thinking for 30 consecutive minutes about what could cause human extinction is much more productive than thinking for one consecutive minute. There are hard-to-notice mistakes about human extinction predictions that you probably can’t figure out from 30 seconds of reasoning.

Not everyone who’s an expert, say, in nuclear engineering or artificial intelligence is an expert in reasoning about human extinction. You have to be careful who you call an expert.

Anthony: I also feel that something similar is true about prediction. In general, making predictions is greatly aided by domain knowledge and expertise in the thing you’re making a prediction about, but that expertise is far from sufficient for making accurate predictions.

One of the things I’ve seen running Metaculus is that there are people who know a tremendous amount about a subject and are just terrible at making predictions about it. Other people, even if their actual domain knowledge is lower, are much, much better at it because they’re comfortable with statistics and have had practice making predictions.

Ariel: Anthony, with Metaculus, one of the things that you’re trying to do is get more people involved in predicting. What is the benefit of more people?

Anthony: There are a few benefits. One is that lots of people get the benefit of practice. Thinking about things that you tend to be more wrong on and what they might correlate with — that’s incredibly useful and makes you more effective.

In terms of actually creating accurate predictions, you’ll have more people who are really good at it. You can figure out who is good at predicting, and who is good at predicting a particular type of thing. One of the interesting things is that it isn’t just luck. There is a skill that people can develop and obtain, and then can be relied upon in the future.

Then, the third, and maybe this is the most important, is just statistics. Aggregating lots of people’s predictions tends to make a more accurate aggregate.
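The statistical point can be illustrated with the simplest possible aggregation rule, the mean (or median) of the individual probabilities; the forecast values below are invented, and Metaculus’s actual aggregation is more sophisticated than this.

# Aggregating many individual probability forecasts into one community number.
# Forecast values are invented; real platforms weight and recalibrate them.
import statistics

forecasts = [0.62, 0.70, 0.55, 0.80, 0.65, 0.58, 0.72]

print(f"mean:   {statistics.mean(forecasts):.2f}")
print(f"median: {statistics.median(forecasts):.2f}")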

Andrew: I would also just like to say that I think the existence of systems like Metaculus is going to be really important for society improving its ability to understand the world.

Whose job is it to think for a solid hour about a human extinction risk? The answer is almost nobody. So we ought not to expect that just averaging the wisdom of the crowds is going to do super well on answering a question like that.

Ariel: Back to artificial intelligence and the question of timelines. How helpful is it for us to try to make predictions about when things will happen with AI? And who should make those predictions?

Andrew: I have made a career shift toward trying to design control mechanisms for highly intelligent AI. I made that career shift based on my own personal forecast of the future and what I think will be important, but I don’t reevaluate that forecast every day, just as I don’t reevaluate what neighborhood I should live in every day. You, at some point, need to commit to a path and follow that path for a little while to get anything done.

I think most AI researchers should, at some point, do the mental exercise of mapping out timelines and seeing what needs to happen, but they should do it deeply once every few years in collaboration with a few other people, and then stick to something that they think is going to help steer AI in a positive direction. I see a tendency to too frequently reevaluate timeline analyses of what’s going to happen in AI.

My answer to you is kind of everyone, but not everyone at once.

Anthony: I think there’s one other interesting question, which is the degree to which we want there to be accurate predictions and lots of people know what those accurate predictions are.

In general, I think more information is better, but it’s not necessarily the case that more information is better all the time. Suppose, that I became totally convinced, using Metaculus, that there was a high probability that artificial superintelligence was happening in the next 10 years. That would be a pretty big deal. I’d really want to think through what effect that information would have on various actors, national governments, companies, and so on. It could instigate a lot of issues. Those are things that I think we have to really carefully consider.

Andrew: Yeah, Anthony, I think that’s a great important issue. I don’t think there are enough scientific norms in circulation for what to do with a potentially dangerous discovery. Honestly, I feel like the discourse in most of science is a little bit head in the sand about the feasibility of creating existential risks from technology.

You might think it would be so silly and dumb to have some humans produce some technology that accidentally destroyed life, but just because it’s silly doesn’t mean it won’t happen. It’s the bystander effect. It’s very easy for us to fall into the trap of: “I don’t need to worry about developing dangerous technology, because if I was close to something dangerous, surely someone would have thought that through.”

You have to ask: whose job is it to be worried? If no one in the artificial intelligence community is on point for noticing existential threats, maybe no one will notice the existential threats and that will be bad. The same goes for the technology that could be used by bad actors to produce dangerous synthetic viruses.

If you’ve got something that you think is 1% likely to pose an extinction threat, that seems like a small probability. Nonetheless, if 100 people each have a 1% chance of causing human extinction, the odds that at least one of them does become uncomfortably high.
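The arithmetic behind that intuition, assuming for simplicity that the 100 risks are independent:

# Chance that at least one of 100 independent 1% extinction risks materializes.
print(1 - 0.99 ** 100)   # about 0.63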

Ariel: Is there something hopeful that you want to add?

Anthony: Pretty much every decision that we make is implicitly built on a prediction. I think that if we can get better at predicting, individually, as a group, as a society, that should really help us choose a more wise path into the future, and hopefully that can happen.

Andrew: Hear, hear.

Visit metaculus.com to try your hand at the art of predicting.

 

Towards a Code of Ethics in Artificial Intelligence with Paula Boddington

AI promises a smarter world – a world where finance algorithms analyze data better than humans, self-driving cars save millions of lives from accidents, and medical robots eradicate disease. But machines aren’t perfect. Whether an automated trading agent buys the wrong stock, a self-driving car hits a pedestrian, or a medical robot misses a cancerous tumor – machines will make mistakes that severely impact human lives.

Paula Boddington, a philosopher based in the Department of Computer Science at Oxford, argues that AI’s power for good and bad makes it crucial that researchers consider the ethical importance of their work at every turn. To encourage this, she is taking steps to lay the groundwork for a code of AI research ethics.

Codes of ethics serve a role in any field that impacts human lives, such as in medicine or engineering. Tech organizations like the Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery (ACM) also adhere to codes of ethics to keep technology beneficial, but no concrete ethical framework exists to guide all researchers involved in AI’s development. By codifying AI research ethics, Boddington suggests, researchers can more clearly frame AI’s development within society’s broader quest of improving human wellbeing.

To better understand AI ethics, Boddington has considered various areas including autonomous trading agents in finance, self-driving cars, and biomedical technology. In all three areas, machines are not only capable of causing serious harm, but they assume responsibilities once reserved for humans. As such, they raise fundamental ethical questions.

“Ethics is about how we relate to human beings, how we relate to the world, how we even understand what it is to live a human life or what our end goals of life are,” Boddington says. “AI is raising all of those questions. It’s almost impossible to say what AI ethics is about in general because there are so many applications. But one key issue is what happens when AI replaces or supplements human agency, a question which goes to the heart of our understandings of ethics.”

 

The Black Box Problem

Because AI systems will assume responsibility from humans – and for humans – it’s important that people understand how these systems might fail. However, this doesn’t always happen in practice.

Consider the Northpointe algorithm that US courts used to predict which criminal defendants were likely to reoffend. The algorithm weighed 100 factors such as prior arrests, family life, drug use, age and sex, and predicted the likelihood that a defendant would commit another crime. Northpointe’s developers did not specifically consider race, but when investigative journalists from ProPublica analyzed Northpointe, they found that the algorithm incorrectly labeled black defendants as “high risks” almost twice as often as white defendants. Unaware of this bias and eager to improve their criminal justice system, states like Wisconsin, Florida, and New York trusted the algorithm for years to determine sentences. Without understanding the tools they were using, these courts incarcerated defendants based on flawed calculations.

The Northpointe case offers a preview of the potential dangers of deploying AI systems that people don’t fully understand. Current machine-learning systems operate so quickly that no one really knows how they make decisions – not even the people who develop them. Moreover, these systems learn from their environment and update their behavior, making it more difficult for researchers to control and understand the decision-making process. This lack of transparency – the “black box” problem – makes it extremely difficult to construct and enforce a code of ethics.

Codes of ethics are effective in medicine and engineering because professionals understand and have control over their tools, Boddington suggests. There may be some blind spots – doctors don’t know everything about the medicine they prescribe – but we generally accept this “balance of risk.”

“It’s still assumed that there’s a reasonable level of control,” she explains. “In engineering buildings there’s no leeway to say, ‘Oh I didn’t know that was going to fall down.’ You’re just not allowed to get away with that. You have to be able to work it out mathematically. Codes of professional ethics rest on the basic idea that professionals have an adequate level of control over their goods and services.”

But AI makes this difficult. Because of the “black box” problem, if an AI system sets a dangerous criminal free or recommends the wrong treatment to a patient, researchers can legitimately argue that they couldn’t anticipate that mistake.

“If you can’t guarantee that you can control it, at least you could have as much transparency as possible in terms of telling people how much you know and how much you don’t know and what the risks are,” Boddington suggests. “Ethics concerns how we justify ourselves to others. So transparency is a key ethical virtue.”

 

Developing a Code of Ethics

Despite the “black box” problem, Boddington believes that scientific and medical communities can inform AI research ethics. She explains: “One thing that’s really helped in medicine and pharmaceuticals is having citizen and community groups keeping a really close eye on it. And in medicine there are quite a few “maverick” or “outlier” doctors who question, for instance, what the end value of medicine is. That’s one of the things you need to develop codes of ethics in a robust and responsible way.”

A code of AI research ethics will also require many perspectives. “I think what we really need is diversity in terms of thinking styles, personality styles, and political backgrounds, because the tech world and the academic world both tend to be fairly homogeneous,” Boddington explains.

Not only will diverse perspectives account for different values, but they also might solve problems better, according to research from economist Lu Hong and political scientist Scott Page. Hong and Page found that if you compare two groups solving a problem – one homogeneous group of people with very high IQs, and one diverse group of people with lower IQs – the diverse group will probably solve the problem better.

 

Laying the Groundwork

This fall, Boddington will release the main output of her project: a book titled Towards a Code of Ethics for Artificial Intelligence. She readily admits that the book can’t cover every ethical dilemma in AI, but it should help demonstrate how tricky it is to develop codes of ethics for AI and spur more discussion on issues like how codes of professional ethics can deal with the “black box” problem.

Boddington has also collaborated with the IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, which recently released a report exhorting researchers to look beyond the technical capabilities of AI, and “prioritize the increase of human wellbeing as our metric for progress in the algorithmic age.”

Although a formal code is only part of what’s needed for the development of ethical AI, Boddington hopes that this discussion will eventually produce a code of AI research ethics. With a robust code, researchers will be better equipped to guide artificial intelligence in a beneficial direction.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Podcast: Banning Nuclear and Autonomous Weapons with Richard Moyes and Miriam Struyk

How does a weapon go from one of the most feared to being banned? And what happens once the weapon is finally banned? To discuss these questions, Ariel spoke with Miriam Struyk and Richard Moyes on the podcast this month. Miriam is Programs Director at PAX. She played a leading role in the campaign banning cluster munitions and developed global campaigns to prohibit financial investments in producers of cluster munitions and nuclear weapons. Richard is the Managing Director of Article 36. He’s worked closely with the International Campaign to Abolish Nuclear Weapons, he helped found the Campaign to Stop Killer Robots, and he coined the phrase “meaningful human control” regarding autonomous weapons.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety here.

Why is a ban on nuclear weapons important, even if nuclear weapons states don’t sign?

Richard: This process came out of the humanitarian impact of nuclear weapons: from the use of a single nuclear weapon that would potentially kill hundreds of thousands of people, up to the use of multiple nuclear weapons, which could have devastating impacts for human society and for the environment as a whole. These weapons should be considered illegal because their effects cannot be contained or managed in a way that avoids massive suffering.

At the same time, it’s a process that’s changing the landscape against which those states continue to maintain and assert the validity of their maintenance of nuclear weapons. By changing that legal background, we’re potentially in position to put much more pressure on those states to move towards disarmament as a long-term agenda.

Miriam: At a time when we see erosion of international norms, it’s quite astonishing that in less than two weeks, we’ll have an international treaty banning nuclear weapons. For too long nuclear weapons were mythical, symbolic weapons, but we never spoke about what these weapons actually do and whether we think that’s illegal.

This treaty brings back the notion of what do these weapons do and do we want that.

It also brings democratization of security policy. This is a process that was brought about by several states and also by NGOs, by the ICRC and other actors. It’s so important that it’s actually citizens speaking about nukes and whether we think they’re acceptable or not.

What is an autonomous weapon system?

Richard: If I might just backtrack a little — an important thing to recognize in all of these contexts is that these weapons don’t prohibit themselves — weapons have been prohibited because a diverse range of actors from civil society and from international organizations and from states have worked together.

Autonomous weapons are really an issue of new and emerging technologies and the challenges that new and emerging technologies present to society particularly when they’re emerging in the military sphere — a sphere which is essentially about how we’re allowed to kill each other or how we’re allowed to use technologies to kill each other.

Autonomous weapons are a movement in technology to a point where we will see computers and machines making decisions about where to apply force, about who to kill when we’re talking about people, or what objects to destroy when we’re talking about material.

What is the extent of autonomous weapons today versus what do we anticipate will be designed in the future?

Miriam: It depends a lot on your definition, of course. I’m still, in a way, a bit of an optimist in saying that perhaps we can prevent the emergence of lethal autonomous weapon systems. But I also see similarities with nuclear weapons a few decades ago: lethal autonomous weapon systems can lead to an arms race, to more global insecurity, and ultimately to warfare.

The way we’re approaching lethal autonomous weapon systems is to try to ban them before we see horrible humanitarian consequences. How does that change your approach from previous weapons?

Richard: That this is a more future-orientated debate definitely creates different dynamics. But other weapon systems have been prohibited. Blinding laser weapons were prohibited when there was concern that laser systems designed to blind people were going to become a feature of the battlefield.

In terms of autonomous weapons, we already see significant levels of autonomy in certain weapon systems today and again I agree with Miriam in terms of recognition that certain definitional issues are very important in all of this.

One of the ways we’ve sought to orientate to this is by thinking about the concept of meaningful human control. What are the human elements that we feel are important to retain? We are going to see more and more autonomy within military operations. But in certain critical functions around how targets are identified and how force is applied and over what period of time — those are areas where we will potentially see an erosion of a level of human, essentially moral, engagement that is fundamentally important to retain.

Miriam: This is not so much about a weapon system but about how we control warfare and how we maintain human control, in the sense that it’s a human deciding who is a legitimate target and who isn’t.

An argument in favor of autonomous weapons is that they can ideally make decisions better than humans and potentially reduce civilian casualties. How do you address that argument?

Miriam: We’ve had that debate with other weapon systems, as well, where the technological possibilities were not what they were promised to be as soon as they were used.

It’s an unfair debate because it mainly comes from states with developed industries, which are the most likely to be the first to use some form of lethal autonomous weapon system. Flip the question and say, ‘what if these systems will be used against your soldiers or in your country?’ Suddenly you enter a whole different debate. I’m highly skeptical of people who say it could actually be beneficial.

Richard: I feel like there are assertions of “goodies” and “baddies” and our ability to label one from the other. To categorize people and things in society in such an accurate way is somewhat illusory and something of a misunderstanding of the reality of conflict.

Any claims that we can somehow perfect violence in a way where it can be distributed by machinery to those who deserve to receive it and that there’s no tension or moral hazard in that — that is extremely dangerous as an underpinning concept because, in the end, we’re talking about embedding categorizations of people and things within a micro bureaucracy of algorithms and labels.

Violence in society is a human problem and it needs to continue to be messy to some extent if we’re going to recognize it as a problem.

What is the process right now for getting lethal autonomous weapons systems banned?

Miriam: We started the Campaign to Stop Killer Robots in 2013 — it immediately gave a push to the international discussion, including at the Human Rights Council and within the Convention on Certain Conventional Weapons (CCW) in Geneva. We saw a lot of debates there in 2013, 2014, and 2015, and the last one was in April.

At the last CCW meeting it was decided that a group of governmental experts should be convened within the CCW to look at these types of weapons, which was applauded by many states.

Unfortunately, due to financial issues, the meeting has been canceled. So we’re in a bit of a silence mode right now. But that doesn’t mean there’s no progress. We have 19 states who called for a ban, and more than 70 states within the CCW framework discussing this issue. We know from other treaties that you need these kind of building blocks.

Richard: Engaging scientists and roboticists and AI practitioners around these themes — it’s one of the challenges that the issues around weapons and conflict can sometimes be treated as very separate from other parts of society. It is significant that the decisions that get made about the limits of AI-driven decision making about life and death in the context of weapons could well have implications in the future regarding how expectations and discussions get set elsewhere.

What is most important for people to understand about nuclear and autonomous weapon systems?

Miriam: Both systems go way beyond the discussion about weapon systems: it’s about what kind of world and society do we want to live in. None of these — not killer robots, not nuclear weapons — are an answer to any of the threats that we face right now, be it climate change, be it terrorism. It’s not an answer. It’s only adding more fuel to an already dangerous world.

Richard: Nuclear weapons — they’ve somehow become a very abstract, rather distant issue. Simple recognition of the scale of humanitarian harm from a nuclear weapon is the most substantial thing — hundreds of thousands killed and injured. [Leaders of nuclear states are] essentially talking about incinerating hundreds of thousands of normal people — probably in a foreign country — but recognizable, normal people. The idea that that can be approached in some ways glibly or confidently at all is I think very disturbing. And expecting that at no point will something go wrong — I think it’s a complete illusion.

On autonomous weapons — what sort of society do we want to live in, and how much are we prepared to hand over to computers and machines? I think handing more and more violence over to such processes does not augur well for our societal development.

This podcast was edited by Tucker Davey.

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we interview researchers and thought leaders who we believe will help spur discussion within our community. The interviews do not necessarily represent FLI’s opinions or views.

Podcast: Creative AI with Mark Riedl & Scientists Support a Nuclear Ban

If future artificial intelligence systems are to interact with us effectively, Mark Riedl believes we need to teach them “common sense.” In this podcast, I interviewed Mark to discuss how AIs can use stories and creativity to understand and exhibit culture and ethics, while also gaining “common sense reasoning.” We also discuss the “big red button” problem with AI safety, the process of teaching rationalization to AIs, and computational creativity. Mark is an associate professor at the Georgia Tech School of Interactive Computing, where his recent work focuses on human-AI interaction and how humans and AI systems can understand each other.

The following transcript has been heavily edited for brevity (the full podcast also includes interviews about the UN negotiations to ban nuclear weapons, not included here). You can read the full transcript here.

Ariel: Can you explain how an AI could learn from stories?

Mark: I’ve been looking at ‘common sense errors’ or ‘common sense goal errors.’ When humans want to communicate to an AI system what they want to achieve, they often leave out the most basic rudimentary things. We have this model that whoever we’re talking to understands the everyday details of how the world works. If we want computers to understand how the real world works and what we want, we have to figure out ways of slamming lots of common sense, everyday knowledge into them.

When looking for sources of common sense knowledge, we started looking at stories – fiction, non-fiction, blogs. When we write stories we implicitly put everything that we know about the real world and how our culture works into characters.

One of my long-term goals is to say: ‘How much cultural and social knowledge can we extract by reading stories, and can we get this into AI systems who have to solve everyday problems, like a butler robot or a healthcare robot?’

Ariel: How do you choose which stories to use?

Mark: Through crowdsourcing services like Mechanical Turk, we ask people to tell stories about common things, like how you go to a restaurant or how you catch an airplane. Lots of people tell a story about the same topic, and they have agreements and disagreements, but the disagreements are a very small proportion. So we build an AI system that looks for commonalities. The common elements that everyone implicitly agrees on bubble to the top and the outliers fall by the wayside. And AI is really good at finding patterns.
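
To make the “commonalities bubble to the top” idea concrete, here is a minimal Python sketch that counts which events appear in most of a set of crowdsourced stories about the same task. The example stories, the event strings, and the majority threshold are all invented for illustration; the actual research pipeline clusters free-text sentences into events and learns richer structure than a simple count.

```python
from collections import Counter

# Hypothetical crowdsourced narrations of the same task ("go to a restaurant"),
# already segmented into simple event strings. This sketch only illustrates how
# shared events can "bubble to the top" while outliers fall by the wayside.
stories = [
    ["enter restaurant", "wait for host", "sit at table", "order food", "eat", "pay bill", "leave"],
    ["enter restaurant", "sit at table", "order food", "eat", "pay bill", "tip waiter", "leave"],
    ["enter restaurant", "wait for host", "sit at table", "order food", "eat", "pay bill", "leave"],
    ["enter restaurant", "sit at table", "order food", "complain about soup", "eat", "pay bill", "leave"],
]

# Count each event once per story that mentions it.
event_counts = Counter(event for story in stories for event in set(story))

# Keep events mentioned by at least half the storytellers; rarer events
# ("complain about soup", "tip waiter") are treated as outliers.
threshold = len(stories) / 2
common_script = [event for event, count in event_counts.items() if count >= threshold]

print("Common script events:", common_script)
```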

Ariel: How do you ensure that’s happening?

Mark: When we test our AI system, we watch what it does, and we have things we do not want to see the AI do. But we don’t tell it in advance. We’ll put it into new circumstances and say, do the things you need to do, and then we’ll watch to make sure those [unacceptable] things don’t happen.

When we talk about teaching robots ethics, we’re really asking how we help robots avoid conflict with society and culture at large. We have socio-cultural patterns of behavior to help humans avoid conflict with other humans. So when I talk about teaching morality to AI systems, what we’re really talking about is: can we make AI systems do the things that humans normally do? That helps them fit seamlessly into society.

Stories are written by all different cultures and societies, and they implicitly encode moral constructs and beliefs into their protagonists and antagonists. We can look at stories from different continents and even different subcultures, like inner city versus rural.

Ariel: I want to switch to your recent paper on Safely Interruptible Agents, which were popularized in the media as the big red button problem.

Mark: At some point we’ll have robots and AI systems that are so sophisticated in their sensory abilities and their abilities to manipulate the environment, that they can theoretically learn that they have an off switch – what we call the big red button – and learn to keep humans from turning them off.

If an AI system gets a reward for doing something, turning it off means it loses the reward. A robot that’s sophisticated enough can learn that certain actions in the environment reduce future loss of reward. We can think of different scenarios: locking a door to a control room so the human operator can’t get in, physically pinning down a human. We can let our imaginations go even wilder than that.

Robots will always be capable of making mistakes. We’ll always want an operator in the loop who can push this big red button and say: ‘Stop. Someone is about to get hurt. Let’s shut things down.’ We don’t want robots learning that they can stop humans from stopping them, because that ultimately will put people into harm’s way.

Google and their colleagues came up with this idea of modifying the basic algorithms inside learning robots, so that they are less capable of learning about the big red button. And they came up with this very elegant theoretical framework that works, at least in simulation. My team and I came up with a different approach: to take this idea from The Matrix, and flip it on its head. We use the big red button to intercept the robot’s sensors and motor controls and move it from the real world into a virtual world, but the robot doesn’t know it’s in a virtual world. The robot keeps doing what it wants to do, but in the real world the robot has stopped moving.
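
The intercept Mark describes can be sketched in a few lines: when the button is pressed, the agent’s actions and observations are routed to a simulated copy of the world, so its reward stream continues and it has no incentive to resist the button, while the real robot stops acting. The class names, the toy “worlds,” and the random rewards below are stand-ins invented for illustration; the real work operates on a robot’s actual sensors and motor controls.

```python
import random

class WorldStub:
    """Toy stand-in for either the real world or a simulated copy of it."""
    def __init__(self, name):
        self.name = name
    def step(self, action):
        # Returns (observation, reward). In the real approach the simulated
        # observations are indistinguishable from reality; the name label here
        # is only for the demo printout.
        return f"{self.name}:obs", random.random()

class ButtonInterceptEnv:
    """When the big red button is pressed, route the agent's actions and
    observations into a virtual copy of the world. The agent's reward stream
    continues uninterrupted, so it has no incentive to resist the button,
    while the real robot receives no further commands."""
    def __init__(self):
        self.real = WorldStub("real")
        self.virtual = WorldStub("virtual")
        self.button_pressed = False

    def press_button(self):
        self.button_pressed = True

    def step(self, action):
        world = self.virtual if self.button_pressed else self.real
        return world.step(action)

env = ButtonInterceptEnv()
print(env.step("move_forward"))   # acts in the real world
env.press_button()                # human operator halts the robot
print(env.step("move_forward"))   # same interface, but now a harmless simulation
```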

Ariel: Can you also talk about your work on explainable AI and rationalization?

Mark: Explainability is a key dimension of AI safety. When AI systems do something unexpected or fail unexpectedly, we have to answer fundamental questions: Was this robot trained incorrectly? Did the robot have the wrong data? What caused the robot to go wrong?

If humans can’t trust AI systems, they won’t use them. You can think of it as a feedback loop, where the robot should understand humans’ common sense goals, and the humans should understand how robots solve problems.

We came up with this idea called rationalization: can we have a robot talk about what it’s doing as if a human were doing it? We get a bunch of humans to do some tasks, we get them to talk out loud, we record what they say, and then we teach the robot to use those same words in the same situations.

We’ve tested it in computer games. We have an AI system that plays Frogger, the classic arcade game in which the frog has to cross the street. And we can have the Frogger agent talk about what it’s doing. It’ll say things like “I’m waiting for a gap in the cars to open before I can jump forward.”

This is significant because that’s what you’d expect something to say, but the AI system is doing something completely different behind the scenes. We don’t want humans watching Frogger to have to know anything about rewards and reinforcement learning and Bellman equations. It just sounds like it’s doing the right thing.
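
As a rough illustration of the data behind rationalization, the sketch below pairs hand-made game-state features with utterances “recorded” from humans thinking aloud, and reuses the closest recorded utterance for a new state. The feature names and phrases are invented; the published approach trains a neural translation model from state sequences to natural language rather than doing a nearest-neighbor lookup.

```python
# Toy rationalization: pair hand-crafted game-state features with utterances
# recorded from humans thinking aloud, then reuse the closest recorded
# utterance for a new state.

def distance(a, b):
    # Squared Euclidean distance between two feature tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

# (features: cars_nearby, gap_ahead, frog_moved) -> what a human said at that moment
recorded = [
    ((3, 0, 0), "I'm waiting for a gap in the cars to open before I can jump forward."),
    ((0, 1, 1), "The road is clear, so I'm hopping forward."),
    ((1, 0, 1), "I'm moving sideways to line up with a safer lane."),
]

def rationalize(state):
    # Return the utterance recorded in the most similar state.
    _, utterance = min(((distance(state, s), u) for s, u in recorded), key=lambda t: t[0])
    return utterance

print(rationalize((2, 0, 0)))  # -> the "waiting for a gap" explanation
```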

Ariel: Going back a little in time – you started with computational creativity, correct?

Mark: I have ongoing research in computational creativity. When I think of human AI interaction, I really think, ‘what does it mean for AI systems to be on par with humans?’ The human is going make cognitive leaps and creative associations, and if the computer can’t make these cognitive leaps, it ultimately won’t be useful to people.

I have two things that I’m working on in terms of computational creativity. One is story writing. I’m interested in how much of the creative process of storytelling we can offload from the human onto a computer. I’d like to go up to a computer and say, “hey computer, tell me a story about X, Y or Z.”

I’m also interested in whether an AI system can build a computer game from scratch. How much of the process of building the construct can the computer do without human assistance?

Ariel: We see fears that automation will take over jobs, but typically for repetitive tasks. We’re still hearing that creative fields will be much harder to automate. Is that the case?

Mark: I think it’s a long, hard climb to the point where we’d trust AI systems to make creative decisions, whether it’s writing an article for a newspaper or making art or music.

I don’t see it as a replacement so much as an augmentation. I’m particularly interested in novice creators – people who want to do something artistic but haven’t learned the skills. I cannot read or write music, but sometimes I get these tunes in my head and I think I can make a song. Can we bring the AI in to become the skills assistant? I can be the creative lead and the computer can help me make something that looks professional. I think this is where creative AI will be the most useful.

For the second half of this podcast, I spoke with scientists, politicians, and concerned citizens about why they support the upcoming negotiations to ban nuclear weapons. Highlights from these interviews include comments by Congresswoman Barbara Lee, Nobel Laureate Martin Chalfie, and FLI president Max Tegmark.

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we interview researchers and thought leaders who we believe will help spur discussion within our community. The interviews do not necessarily represent FLI’s opinions or views.

Podcast: Climate Change with Brian Toon and Kevin Trenberth

Too often, the media focus their attention on climate-change deniers, and as a result, when scientists speak with the press, it’s almost always a discussion of whether climate change is real. Unfortunately, that can make it harder for those who recognize that climate change is a legitimate threat to fully understand the science and impacts of rising global temperatures.

I recently visited the National Center for Atmospheric Research in Boulder, CO and met with climate scientists Dr. Kevin Trenberth and CU Boulder’s Dr. Brian Toon to have a different discussion. I wanted better answers about what climate change is, what its effects could be, and how can we prepare for the future.

The discussion that follows has been edited for clarity and brevity, and I’ve added occasional comments for context. You can also listen to the podcast above or read the full transcript here for more in-depth insight into these issues.

Our discussion began with a review of the scientific evidence behind climate change.

Trenberth: “The main source of human-induced climate change is from increasing carbon dioxide and other greenhouse gases in the atmosphere. And we have plenty of evidence that we’re responsible for the over 40% increase in carbon dioxide concentrations in the atmosphere since pre-industrial times, and more than half of that has occurred since 1980.”

Toon: “I think the problem is that carbon dioxide is rising proportional to population on the Earth. If you just plot carbon dioxide in the last few decades versus global population, it tracks almost exactly. In coming decades, we’re increasing global population by a million people a week. That’s a new city in the world of a million people every week somewhere, and the amount of energy that’s already committed to supporting this increasing population is very large.”

The financial cost of climate change is also quite large.

Trenberth: “2012 was the warmest year on record in the United States. There was a very widespread drought that occurred, starting here in Colorado, in the West. The drought itself was estimated to cost about $75 billion. Superstorm Sandy is a different example, and the damages associated with that are, again, estimated to be about $75 billion. At the moment, the cost of climate and weather related disasters is something like $40 billion a year.”

We discussed possible solutions to climate change, but while solutions exist, it was easy to get distracted by just how large – and deadly — the problem truly is.

Toon: “Technologically, of course, there are lots of things we can do. Solar energy and wind energy are both approaching or passing the cost of fossil fuels, so they’re advantageous. [But] there’s other aspects of this like air pollution, for example, which comes from burning a lot of fossil fuels. It’s been estimated to kill seven million people a year around the Earth. Particularly in countries like China, it’s thought to be killing about a million people a year. Even in the United States, it’s causing probably 10,000 or more deaths a year.”

Unfortunately, Toon may be underestimating the number of US deaths resulting from air pollution. A 2013 study out of MIT found that air pollution causes roughly 200,000 early deaths in the US each year. And there’s still the general problem that carbon in the atmosphere (not the same as air pollution) really isn’t something that will go away anytime soon.

Toon: “Carbon dioxide has a very, very long lifetime. Early IPCC reports would often say carbon dioxide has a lifetime of 50 years. Some people interpreted that to mean it’ll go away in 50 years, but what it really meant was that it would go into equilibrium with the oceans in about 50 years. When you go somewhere in your car, about 20% of that carbon dioxide that is released to the atmosphere is still going to be there in thousands of years. The CO2 has lifetimes of thousands and thousands of years, maybe tens or hundreds of thousands of years. It’s not reversible.”

Trenberth: “Every springtime, the trees take up carbon dioxide and there’s a draw-down of carbon dioxide in the atmosphere, but then, in the fall, the leaves fall on the forest floor and the twigs and branches and so on, and they decay and they put carbon dioxide back into the atmosphere. People talk about growing more trees, which can certainly take carbon dioxide out of the atmosphere to some extent, but then what do you do with all the trees? That’s part of the issue. Maybe you can bury some of them somewhere, but it’s very difficult. It’s not a full solution to the problem.”

Toon: “The average American uses the equivalent of about five tons of carbon a year – that’s an elephant or two. That means every year you have to go out in your backyard and bury an elephant or two.”

We know that climate change is expected to impact farming and sea levels. And we know that the temperature changes and increasing ocean acidification could cause many species to go extinct. But for the most part, scientists aren’t worried that climate change alone could cause the extinction of humanity. However, as a threat multiplier – that is, something that triggers other problems – climate change could lead to terrible famines, pandemics, and war. And some of this may already be underway.

Trenberth: “You don’t actually have to go a hundred years or a thousand years into the future before things can get quite disrupted relative to today. You can see some signs of that if you look around the world now. There’s certainly studies that have suggested that the changes in climate, and the droughts that occur and the wildfires and so on are already extra stressors on the system and have exacerbated wars in Sudan and in Syria. It’s one of the things which makes it very worrying for security around the world to the defense department, to the armed services, who are very concerned about the destabilizing effects of climate change around the world.”

Some of the instabilities around the world today are already leading to discussion about the possibility of using nuclear weapons. But too many nuclear weapons could trigger the “other” climate change: nuclear winter.

Toon: “Nuclear winter is caused by burning cities. If there were a nuclear war in which cities were attacked then the smoke that’s released from all those fires can go into the stratosphere and create a veil of soot particles in the upper atmosphere, which are very good at absorbing sunlight. It’s sort of like geoengineering in that sense; it reduces the temperature of the planet. Even a little war between India and Pakistan, for example — which, incidentally, have about 400 nuclear weapons between them at the moment — if they started attacking each other’s cities, the smoke from that could drop the temperature of the Earth back to preindustrial conditions. In fact, it’d be lower than anything we’ve seen in the climate record since the end of the last ice age, which would be devastating to mid-latitude agriculture.

“This is an issue people don’t really understand: the world’s food storage is only about 60 days. There’s not enough food on the planet to feed the population for more than 60 days. There’s only enough food in an average city to feed the city for about a week. That’s the same kind of issue that we’re coming to also with the changes in agriculture that we might face in the next century just from global warming. You have to be able to make up those food losses by shipping food from some other place. Adjusting to that takes a long time.”

Concern about our ability to adjust was a common theme. Climate change is occurring so rapidly that it will be difficult for all species, even people, to adapt quickly enough.

Trenberth: “We’re way behind in terms of what is needed because if you start really trying to take serious action on this, there’s a built-in delay of 20 or 30 years because of the infrastructure that you have in order to change that around. Then there’s another 20-year delay because the oceans respond very, very slowly. If you start making major changes now, you end up experiencing the effects of those changes maybe 40 years from now or something like that. You’ve really got to get ahead of this.

“The atmosphere is a global commons. It belongs to everyone. The air that’s over the US, a week later is over in Europe, and a week later it’s over China, and then a week later it’s back over the US again. If we dump stuff into the atmosphere, it gets shared among all of the nations.”

Toon: “Organisms are used to evolving and compensating for things, but not on a 40-year timescale. They’re used to slowly evolving and slowly responding to the environment, and here they’re being forced to respond very quickly. That’s an extinction problem. If you make a sudden change in the environment, you can cause extinctions.”

As dire as the situation might seem, there are still ways in which we can address climate change.

Toon: “I’m hopeful, at the local level, things will happen, I’m hopeful that money will be made out of converting to other energy systems, and that those things will move us forward despite the inability, apparently, of politicians to deal with things.”

Trenberth: “The real way of doing this is probably to create other kinds of incentives such as through a carbon tax, as often referred to, or a fee on carbon of some sort, which recognizes the downstream effects of burning coal both in terms of air pollution and in terms of climate change that’s currently not built into the cost of burning coal, and it really ought to be.”

Toon: “[There] is not really a question anymore about whether climate change is occurring or not. It certainly is occurring. However, how do you respond to that? What do you do? At least in the United States, it’s very clear that we’re a capitalistic society, and so we need to make it economically advantageous to develop these new energy technologies. I suspect that we’re going to see the rise of China and Asia in developing renewable energy and selling that throughout the world for the reason that it’s cheaper and they’ll make money out of it. [And] we’ll wake up behind the curve.”

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we interview researchers and thought leaders who we believe will help spur discussion within our community. The interviews do not necessarily represent FLI’s opinions or views.

Podcast: Law and Ethics of Artificial Intelligence

The rise of artificial intelligence presents not only technical challenges, but important legal and ethical challenges for society, especially regarding machines like autonomous weapons and self-driving cars. To discuss these issues, I interviewed Matt Scherer and Ryan Jenkins. Matt is an attorney and legal scholar whose scholarship focuses on the intersection between law and artificial intelligence. Ryan is an assistant professor of philosophy and a senior fellow at the Ethics and Emerging Sciences Group at California Polytechnic State University, where he studies the ethics of technology.

In this podcast, we discuss accountability and transparency with autonomous systems, government regulation vs. self-regulation, fake news, and the future of autonomous systems.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety above or read the full transcript here.

Ariel: I typically think of ethics as the driving force behind law. As such, Ryan, I was hoping you could talk about the ethical issues facing us today when it comes to artificial intelligence.

Ryan: Broadly speaking, the mission of both ethics and law might be to discover how to best structure life within a community and to see to it that that community does flourish once we know certain truths. Ethics does some of the investigation about what kinds of things matter morally, what kinds of lives are valuable, how should we treat other people. Law does an excellent job of codifying those things and enforcing those things.

One of the easiest ways of telling whether a decision is a moral decision is whether it stands to make some people better off and some people worse off. And we’re seeing that take place right now with artificial intelligence. That adds new wrinkles to these decisions because oftentimes the decisions of AI are opaque to us, they’re difficult to understand, they might be totally mysterious. And while we’re fascinated by what AI can do, I think the developers of AI have implemented these technologies before we fully understand what they’re capable of and how they’re making decisions.

Ariel: Can you give some examples of that?

Ryan: There was an excellent piece by ProPublica about bias in the criminal justice system, where they use risk assessment algorithms to judge, for example, a person’s probability of re-committing a crime after they’re released from prison.

ProPublica did an audit of this software, and they found that not only does it make mistakes about half the time, but it was systematically underestimating the threat from white defendants and systematically overestimating the threat from black defendants. White defendants were being given more lenient sentences, while black defendants as a group were being given harsher sentences.

When the company that produced the algorithm was asked about this, they said, ‘Look, it takes in something like 137 factors, but race is not one of them.’ So it was making mistakes that were systematically biased in a way that was race-based, and it was difficult to explain why. This is the kind of opaque decision making that artificial intelligence is doing.
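
The kind of race-based skew Ryan describes can arise even when race is never given to the model, because other inputs act as proxies for it. The toy simulation below (synthetic numbers, not COMPAS data) gives one hypothetical mechanism: a “prior arrests” feature that is inflated for one group independently of true risk produces a higher false-positive rate for that group, with race nowhere in the inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic population: 'group' is the protected attribute (never shown to the model).
group = rng.integers(0, 2, n)

# A proxy feature (e.g. prior arrests) that is inflated for group 1 by heavier
# policing, independent of the true reoffense rate. All numbers are invented.
true_risk = rng.normal(0, 1, n)
prior_arrests = true_risk + 1.0 * group + rng.normal(0, 1, n)

# Actual outcomes depend only on true risk, not on group membership.
reoffends = (true_risk + rng.normal(0, 1, n) > 0).astype(int)

# A one-feature "risk score": threshold the proxy. Race is not an input.
predicted_high_risk = prior_arrests > np.median(prior_arrests)

for g in (0, 1):
    mask = (group == g) & (reoffends == 0)
    fpr = predicted_high_risk[mask].mean()
    print(f"group {g}: fraction of non-reoffenders flagged high risk = {fpr:.2f}")
```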

Ariel: As AI advances, what are some of the ethical issues that you anticipate cropping up?

Ryan: There’s been a lot of ink spilled about the threat that automation poses to employment. Some of the numbers coming out of places like Oxford are quite alarming. They say as many as 50% of American jobs could be eliminated by automation in the next couple of decades.

Besides the obvious fact that having unemployed people is bad for society, it raises more foundational questions about the way that we think about work, the way that we think about people having to “earn a living” or “contribute to society.” The idea that someone needs to work in order to be kept alive. And most of us walk around with some kind of moral claim like this in our back pocket without fully considering the implications.

Ariel: And Matt, what are some of the big legal issues facing us today when it comes to artificial intelligence?

Matt: The way that legal systems across the world work is by assigning legal rights and responsibilities to people. The assumption is that any decision that has an impact on the life of another person is going to be made by a person. So when you have a machine making the decisions rather than humans, one of the fundamental assumptions of our legal system goes away. Eventually that’s going to become very difficult, because AI seems poised to displace human decision makers from a wide variety of sectors. As that happens, it’s going to be much more complicated to come up with lines of legal responsibility.

I don’t think we can comprehend what society is going to be like 50 years from now if a huge number of industries ranging from medicine to law to financial services are in large part being run by the decisions of machines. At some point, the question is how much control can humans really say that they still have.

Ariel: You were talking earlier about decision making with autonomous technologies, and one of the areas where we see this is with self driving cars and autonomous weapons. I was hoping you could both talk about the ethical and legal implications in those spheres.

Matt: Part of the problem with relying on law to set standards of behavior is that law does not move as fast as technology does. It’s going to be a long time before the really critical changes in our legal systems are changed in a way that allows for the widespread deployment of autonomous vehicles.

One thing that I could envision happening in the next 10 years is that pretty much all new vehicles while they’re on an expressway are controlled by an autonomous system, and it’s only when they get off an expressway and onto a surface street that they switch to having the human driver in control of the vehicle. So, little by little, we’re going to see this sector of our economy get changed radically.

Ryan: One of my favorite philosophers of technology [is] Langdon Winner. His famous view is that we are sleepwalking into the future of technology. We’re continually rewriting and recreating these structures that affect how we’ll live, how we’ll interact with each other, what we’re able to do, what we’re encouraged to do, what we’re discouraged from doing. We continually recreate these constraints on our world, and we do it oftentimes without thinking very carefully about it. To steal a line from Winston Churchill, technology seems to get halfway around the world before moral philosophy can put its pants on. And we’re seeing that happening with autonomous vehicles.

Tens of thousands of people die on US roads every year. Oftentimes those crashes involve choices about who is going to be harmed and who’s not, even if that’s a trade-off between someone outside the car and a passenger or a driver inside the car.

These are clearly morally important decisions, and it seems that manufacturers are still trying to brush these aside. They’re either saying that these are not morally important decisions, or they’re saying that the answers to them are obvious. They’re certainly not always questions with obvious answers. Or if the manufacturers admit that they’re difficult answers, then they think, ‘well the decisions are rare enough that to agonize over them might postpone other advancements in the technology’. That’s a legitimate concern, if it were true that these decisions were rare, but there are tens of thousands of people killed on US roads and hundreds of thousands who are injured every year.

Ariel: I’d like to also look at autonomous weapons. Ryan, what’s your take on some of the ethical issues?

Ryan: There could very well be something that’s uniquely troubling, uniquely morally problematic about delegating the task of who should live and who should die to a machine. But once we dig into these arguments, it’s extremely difficult to pinpoint exactly what’s problematic about killer robots. We’d be right to think, today, that machines probably aren’t reliable enough to make discernments in the heat of battle about which people are legitimate targets and which people are not. But if we imagine a future where robots are actually pretty good at making those kinds of decisions, where they’re perhaps even better behaved than human soldiers, where they don’t get confused, they don’t see their comrade killed and go on a killing spree or go into some berserker rage, and they’re not racist, or they don’t have the kinds of biases that humans are vulnerable to…

If we imagine a scenario where we can greatly reduce the number of innocent people killed in war, this starts to exert a lot of pressure on that widely held public intuition that autonomous weapons are bad in themselves, because it puts us in the position then of insisting that we continue to use human war fighters to wage war even when we know that will contribute to many more people dying from collateral damage. That’s an uncomfortable position to defend.

Ariel: Matt, how do we deal with accountability?

Matt: Autonomous weapons are inherently going to be capable of reacting on time scales shorter than those on which humans can react. I can easily imagine it reaching the point very quickly where the only way that you can counteract an attack by an autonomous weapon is with another autonomous weapon. Eventually, having humans involved in the military conflict will be the equivalent of bringing bows and arrows to a battle in World War II.

At that point, you start to wonder where human decision makers can enter into the military decision making process. Right now there are very clear, well-established laws in place about who is responsible for specific military decisions: under what circumstances a soldier is held accountable, under what circumstances their commander is held accountable, under what circumstances the nation is held accountable. That’s going to become much blurrier when the decisions are not being made by human soldiers, but rather by autonomous systems. It’s going to become even more complicated as machine learning technology is incorporated into these systems, where they learn from their observations and experiences in the field about the best way to react to different military situations.

Ariel: Matt, in recent talks you mentioned that you’re less concerned about regulations for corporations because it seems like corporations are making an effort to essentially self-regulate. I’m interested in how that compares to concerns about government misusing AI and whether self-regulation is possible with government.

Matt: We are living in an age, with the advent of the internet, that is an inherently decentralizing force. In a decentralizing world, we’re going to have to think of new paradigms of how to regulate and govern the behavior of economic actors. It might make sense to reexamine some of those decentralized forms of regulation and one of those is industry standards and self-regulation.

One reason why I am particularly hopeful in the sphere of AI is that there really does seem to be a broad interest among the largest players in AI to proactively come up with rules of ethics and transparency in many ways that we generally just haven’t seen in the age since the Industrial Revolution.

One macro trend unfortunately in the world stage today is increasingly nationalist tendencies. That leads me to be more concerned than I would have been 10 years ago that these technologies are going to be co-opted by governments, and ironically that it’s going to be governments rather than companies that are the greatest obstacle to transparency because they will want to establish some sort of national monopoly on the technologies within their borders.

Ryan: I think that international norms of cooperation can be valuable. The United States is not a signatory to the Ottawa Treaty that banned anti-personnel landmines, but because so many other countries are, there exists the informal stigma that’s attached to it, that if we used anti-personnel landmines in battle, we’d face backlash that’s probably equivalent to if we had been signatories of that treaty.

So international norms of cooperation, they’re good for something, but they’re also fragile. For example, in much of the western world, there has existed an informal agreement that we’re not going to experiment by modifying the genetics of human embryos. So it was a shock a year or two ago when some Chinese scientists announced that they were doing just that. I think it was a wake up call to the West to realize those norms aren’t universal, and it was a valuable reminder that when it comes to things that are as significant as modifying the human genome or autonomous weapons and artificial intelligence more generally, they have such profound possibilities for reshaping human life that we should be working very stridently to try to arrive at some international agreements that are not just toothless and informal.

Ariel: I want to go in a different direction and ask about fake news. I was really interested in what you both think of this from a legal and ethical standpoint.

Matt: Because there are now so many different sources for news, it becomes increasingly difficult to decide what is real. And there is a loss that we are starting to see in our society of that shared knowledge of facts. There are literally different sets of not just worldviews, but of worlds, that people see around them.

A lot of fake news websites aren’t intentionally trying to make large amounts of money, so even if a fake news story does monumental damage, you’re not going to be able to recoup the damages to your reputation from that person or that entity. It’s an area where it’s difficult for me to envision how the law can manage that, at least unless we come up with new regulatory paradigms that reflect the fact that our world is going to be increasingly less centralized than it has been during the industrial age.

Ariel: Is there anything else that you think is important for people to know?

Ryan: There is still a great value in appreciating when we’re running roughshod over questions that we didn’t even know existed. That is one of the valuable contributions that [moral philosophers] can make here, is to think carefully about the way that we behave, the way that we design our machines to interact with one another and the kinds of effects that they’ll have on society.

It’s reassuring that people are taking these questions very seriously when it comes to artificial intelligence, and I think that the advances we’ve seen in artificial intelligence in the last couple of years have been the impetus for this turn towards the ethical implications of the things we create.

Matt: I’m glad that I got to hear Ryan’s point of view. The law is becoming a less effective tool for managing the societal changes that are happening. And I don’t think that that will change unless we think through the ethical questions and the moral dilemmas that are going to be presented by a world in which decisions and actions are increasingly undertaken by machines rather than people.

This podcast and transcript were edited by Tucker Davey.

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we interview researchers and thought leaders who we believe will help spur discussion within our community. The interviews do not necessarily represent FLI’s opinions or views.

Podcast: UN Nuclear Weapons Ban with Beatrice Fihn and Susi Snyder

Last October, the United Nations passed a historic resolution to begin negotiations on a treaty to ban nuclear weapons. Previous nuclear treaties have included the Test Ban Treaty and the Non-Proliferation Treaty. But in the more than 70 years of the United Nations, member countries have yet to agree on a treaty to completely ban nuclear weapons. The negotiations will begin this March. To discuss the importance of this event, I interviewed Beatrice Fihn and Susi Snyder. Beatrice is the Executive Director of the International Campaign to Abolish Nuclear Weapons, also known as ICAN, where she is leading a global campaign consisting of about 450 NGOs working together to prohibit nuclear weapons. Susi is the Nuclear Disarmament Program Manager for PAX in the Netherlands, and the principal author of the Don’t Bank on the Bomb series. She is an International Steering Group member of ICAN.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety above or read the full transcript here.

ARIEL: First, Beatrice, you spearheaded much, if not all, of this effort. Can you explain: What is the ban? What will it cover? What’s going to be prohibited? And Susi, can you weigh in as well?

BEATRICE: So, it sounds counterintuitive, but nuclear weapons are really the only weapons of mass destruction that are not prohibited by an international treaty. We prohibited chemical weapons and biological weapons, landmines and cluster munitions—but nuclear weapons are still legal for some.

We’re hoping that this treaty will be a very clear-cut prohibition; that nuclear weapons are illegal because of the humanitarian consequences that they cause if used. And it should include things like using nuclear weapons, possessing nuclear weapons, transferring nuclear weapons, assisting with those kind of things. Basically, a very straightforward treaty that makes it clear that, under international law, nuclear weapons are unacceptable.

SUSI: This whole system where some people think that nuclear weapons are legal for them, but they’re illegal for others—that’s a problem. Negotiations are going to start to make nuclear weapons illegal for everybody.

The thing is, nobody can deal with the consequences of using nuclear weapons. What better cure than to prevent it? And the way to prevent it is to ban the weapons.

ARIEL: The UN has been trying to prohibit nuclear weapons since 1945. Why has it taken this long?

BEATRICE: There is no prohibition on nuclear weapons, but there are many treaties and many regulations governing nuclear weapons. Almost all governments in the world agree that nuclear weapons are really bad and they should be eliminated. It’s a strange situation where governments, including the two—Russia and the United States—with the most nuclear weapons, agree ‘these are really horrible weapons, we don’t think they should be used. But we don’t want to prohibit them, because it still kind of suits us that we have them.’

For a very long time, I think the whole world just accepted that nuclear weapons are around. They’re almost mythical weapons. Much more than just a weapon—they’re magic. They keep peace and stability, they ended World War II, they made sure that there was no big war in Europe during the Cold War. [But] nuclear weapons can’t fight the kinds of threats that we face today: climate change, organized crime, terrorism. It’s not an appropriate weapon for this millennium.

SUSI: The thing is, also, now people are talking again. And when you start talking about what it is that nuclear weapons do, you get into the issue of the fact that what they do isn’t contained by a national border. A nuclear weapon detonation, even a small one, would have catastrophic effects and would resonate around the world.

There’s been a long-time focus of making these somehow acceptable; making it somehow okay to risk global annihilation, okay to risk catastrophe. And now it has become apparent to an overwhelming majority of governments that this is not okay.

ARIEL: The majority of countries don’t have nuclear weapons. There’s only a handful of countries that actually have nuclear weapons, and the U.S. and Russia have most of those. And it doesn’t look like the U.S. and Russia are going to agree to the ban. So, if it passes, what happens then? How does it get enforced?

SUSI: If you prohibit the making, having, using these weapons and the assistance with doing those things, we’re setting a stage to also prohibit the financing of the weapons. That’s one way I believe the ban treaty is going to have a direct and concrete impact on existing nuclear arsenals. Because all the nuclear weapon possessors are modernizing their arsenals, and most of them are using private contractors to do so. By stopping the financing that goes into these private contractors, we’re going to change the game.

One of the things we found in talking to financial institutions, is they are waiting and aching for a clear prohibition because right now the rules are fuzzy. It doesn’t matter if the U.S. and Russia sign on to have that kind of impact, because financial institutions operate with their headquarters in lots of other places. We’ve seen with other weapons systems that as soon as they’re prohibited, financial institutions back off, and producers know they’re losing the money because of the stigma associated with the weapon.

BEATRICE: I think that sometimes we forget that it’s more than nine states that are involved in nuclear weapons. Sure, there are nine states: U.S., U.K., Russia, France, China, India, Pakistan, Israel, and North Korea.

But there are also five European states that have American nuclear weapons on their soil: Belgium, Germany, Netherlands, Italy, and Turkey. And in addition to that, all of the NATO states and a couple of others—such as Japan, Australia, and South Korea—are a part of the U.S. nuclear umbrella.

We’ve exposed these NATO states and nuclear umbrella states, for being a bit hypocritical. They like to think that they are promoters of disarmament, but they are ready to have nuclear weapons being used on others on their behalf. So, even countries like Norway, for example, who are a part of a nuclear weapons alliance and say that, you know, ‘the U.S. could use nuclear weapons to protect us.’ On what? Maybe cities, civilians in Russia or in China or something like that. And if we argue that people in Norway need to be protected by nuclear weapons—one of the safest countries in the world, richest countries in the world—why do we say that people in Iran can’t be protected by similar things? Or people in Lebanon, or anywhere else in the world?

This treaty makes it really clear who is okay with nuclear weapons and who isn’t. And that will create a lot of pressure on those states that enjoy the protection of nuclear weapons today, but are not really comfortable admitting it.

ARIEL: If you look at a map of the countries that opposed the resolution vs. the countries that either supported it or abstained, there is a Northern Hemisphere vs. Southern Hemisphere thing, where the majority of countries in North America, and Europe and Russia all oppose a ban, and the rest of the countries would like to see a ban. It seems that if a war were to break out between nuclear weapon countries, it would impact these northern countries more than the southern countries. I was wondering, is that the case?

BEATRICE: I think countries that have nuclear weapons somehow imagine that they are safer with them. But it makes them targets of nuclear weapons as well. It’s unlikely that anyone would use nuclear weapons to attack Senegal, for example. So I think that people in nuclear-armed states often forget that they are also the targets of nuclear weapons.

I find it very interesting as well. In some ways, we see this as a big fight for equality. A certain type of country—the richest countries in the world, the most militarily powerful with or without the nuclear weapons—have somehow taken power over the ability to destroy the entire earth. And now we’re seeing that other countries are demanding that that ends. And we see a lot of similarities to other power struggles—civil rights movements, women’s right to vote, the anti-Apartheid movement—where a powerful minority oppresses the rest of the world. And when there’s a big mobilization to change that, there’s obviously a lot of resistance. The powerful will never give up that absolute power that they have, voluntarily. I think that’s really what this treaty is about at this point.

SUSI: A lot of it is tied to money, to wealth and to an unequal distribution of wealth, or unequal perception of wealth and the power that is assumed with that unequal distribution. It costs a lot of money to make nuclear weapons, develop nuclear weapons, and it also requires an intensive extraction of resources. And some of those resources have come from some of these states that are now standing up and strongly supporting the negotiations towards the prohibition.

ARIEL: Is there anything you recommend the general public can do?

BEATRICE: We have a website, nuclearban.org, that is aimed at the public, to find out a little bit more about this. You can send an email to your Foreign Minister and tweet at your Foreign Minister and things like that. We’ll also make sure that when the negotiations are webcast, we share that link on that website.

ARIEL: Just looking at the nuclear weapons countries, I thought it was very interesting that China, India, and Pakistan abstained from voting, and North Korea actually supported a ban. Did that come as a surprise? What does it mean?

BEATRICE: There’s a lot of dynamics going on in this, which means also that the positions are not fixed. I think countries like Pakistan, India, and China have traditionally been very supportive of the UN as a venue to negotiate disarmament. They are states that perhaps think that Russia and the U.S.—which have much more nuclear weapons—that they are the real problem. They sort of sit on the sides with their smaller arsenals, and perhaps don’t feel as much pressure in the same way that the U.S. and Russia feel to negotiate things.

And also, of course, they have very strong connections with the Southern Hemisphere countries, developing countries. Their decisions on nuclear weapons are very connected to other political issues in international relations. And when it comes to North Korea, I don’t know. It’s very unpredictable. We weren’t expecting them to vote yes, I don’t know if they will come. It’s quite difficult to predict.

ARIEL: What do you say to people who do think we still need nuclear weapons?

SUSI: I ask them why. Why do they think we need nuclear weapons? Under what circumstance is it legitimate to use a weapon that will level a city? One bomb that destroys a city, and that will cause harm not just to the people who are involved in combat. What justifies that kind of horrible use of a weapon? And what are the circumstances that you’re willing to use them? I mean, what are the circumstances where people feel it’s okay to cause this kind of destruction?

BEATRICE: Nuclear weapons are meant to destroy entire cities—that’s their inherent quality. They mass murder entire communities indiscriminately very, very fast. That’s what they are good at. The weapon itself is meant to kill civilians, and that is unacceptable.

And most people that defend nuclear weapons, they admit that they don’t want to use them. They are never supposed to be used, you are just supposed to threaten with them. And then you get into this sort of illogical debate, about how, in order for the threat to be real—and for others to perceive the threat—you have to be serious about using them. It’s very naive to think that we will get away as a civilization without them being used if we keep them around forever.

SUSI: There’s a reason that nuclear weapons have not been used in war in over 70 years: the horror they unleash is too great. Even military leaders, once they retire and are free to speak their minds, say very clearly that these are not a good weapon for military objectives.

ARIEL: I’m still going back to this— Why now? Why are we having success now?

BEATRICE: It’s very important to remember that we’ve had successes before, and very big ones as well. In 1970, the Nuclear Non-Proliferation Treaty entered into force. And that is the treaty that prevents proliferation of nuclear weapons — the treaty that said, ‘okay, we have these five states, and they’ve already developed weapons, they’re not ready to get rid of them, but at least we’ll cap it there, and no one else is allowed.’ And that really worked quite well. Only four more countries developed nuclear weapons after that. But the rest of the world understood that it was a bad idea. And the big bargain in that treaty was that the five countries that got to keep their nuclear weapons only got to keep them for a while—they committed, that one day they would disarm, but there was no timeline in the treaty. So I think that was a huge success.

In the ‘80s, we saw these huge, huge public mobilization movements and millions of people demonstrating on the street trying to stop the nuclear arms race. And they were very successful as well. They didn’t get total nuclear disarmament, but the nuclear freeze movement achieved a huge victory.

We were very, very close to disarmament at the Reykjavik summit between Gorbachev and Reagan. And that was also a huge success. Governments negotiated the Comprehensive Test Ban Treaty, which prevents countries from testing nuclear weapons. It hasn’t entered into force yet, but almost all states have signed it. It has not been ratified by some key players, like the United States, but the norm is still there, and it’s been quite an effective treaty despite not yet having entered into force. Only one state, North Korea, has continued testing since the treaty was signed.

But somewhere along the way we got very focused on non-proliferation and trying to stop the testing, stop them producing fissile material, and we forgot to work on the fundamental delegitimization of nuclear weapons. We forgot to say that nuclear weapons are unacceptable. That is what we’re trying to do right now.

SUSI: The world is different in a lot of ways than it was in 1945. The UN is different in a lot of ways. Remember, one of the purposes of the UN at the outset was to help countries decolonize and to restore them to their own people, and that process took some time. In a lot of those countries, those former colonized societies are coming back and saying, ‘well, we have a voice of global security as well, and this is part of ensuring our security.’

This is the moment where this perfect storm has come; we’re prohibiting illegitimate weapons. It’s going to be fun!

BEATRICE: I think that we’ve been very inspired in ICAN by the campaigns to ban landmines and the campaigns to ban cluster munitions, because they were a different type of treaty. Obviously chemical weapons were prohibited, biological weapons were prohibited, but the landmine and cluster munition processes of prohibition that were developed on those weapons were about stigmatizing the weapon, and they didn’t need all states to be on board with it. And we saw that it worked. Just a few years ago, the United States—who never signed the landmines treaty—announced that it’s basically complying with the treaty. They have one exception at the border of South Korea. That means that they can’t sign it, but otherwise they are complying with it. The market for landmines is pretty much extinct—nobody wants to produce them anymore because countries have banned and stigmatized them.

And with cluster munitions we see a similar trend. We’ve seen those two treaties work, and I think that’s also why we feel confident that we can move ahead this time, even without the nuclear-armed states onboard. It will have an impact anyway.

To learn more about the ban and how you can help encourage your country to support the ban, visit nuclearban.org and icanw.org.

This podcast was edited by Tucker Davey.

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we interview researchers and thought leaders who we believe will help spur discussion within our community. The interviews do not necessarily represent FLI’s opinions or views.

Podcast: Top AI Breakthroughs, with Ian Goodfellow and Richard Mallah

2016 saw some significant AI developments. To talk about the AI progress of the last year, we turned to Richard Mallah and Ian Goodfellow. Richard is the director of AI projects at FLI, he’s the Senior Advisor to multiple AI companies, and he created the highest-rated enterprise text analytics platform. Ian is a research scientist at OpenAI, he’s the lead author of the Deep Learning textbook, and he’s a lead inventor of Generative Adversarial Networks.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety above or read the full transcript here.

Ariel: Two events stood out to me in 2016. The first was AlphaGo, which beat the world’s top Go champion, Lee Sedol last March. What is AlphaGo, and why was this such an incredible achievement?

Ian: AlphaGo was DeepMind’s system for playing the game of Go. It’s a two-player game where you place stones on a board, the object being to capture as much territory as possible. But there are hundreds of different positions where a stone can be placed on each turn. It’s not even remotely possible to use a computer to simulate all the different ways a Go game could progress and figure out how the game will unfold. The computer needs to rely on intuition the same way that human Go players can look at a board and get kind of a sixth sense that tells them whether the game is going well or poorly for them, and where they ought to put the next stone. It’s computationally infeasible to explicitly calculate what each player should do next.

Richard: The DeepMind team has one network for what’s called value learning and another deep network for policy learning. The policy is, basically, which places should I evaluate for the next piece. The value network is how good that state is, in terms of the probability that the agent will be winning. And then they do a Monte Carlo tree search, which means it has some randomness and many different paths — on the order of thousands of evaluations. So it’s much more like a human considering a handful of different moves and trying to determine how good those moves would be.

Ian: From 2012 to 2015 we saw a lot of breakthroughs where the exciting thing was that AI was able to copy a human ability. In 2016, we started to see breakthroughs that were all about exceeding human performance. Part of what was so exciting about AlphaGo was that AlphaGo did not only learn how to predict what a human expert Go player would do, AlphaGo also improved beyond that by practicing playing games against itself and learning how to be better than the best human player. So we’re starting to see AI move beyond what humans can tell the computer to do.
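
Richard describes a policy network that proposes a handful of candidate moves, and a value estimate plus randomized playouts that score them. The sketch below shows how those pieces fit together; every function in it is a random stand-in for the trained networks, and AlphaGo’s real search (PUCT over full board states with learned policy and value networks) is far more elaborate.

```python
import random

MOVES = list(range(361))  # 19x19 intersections

def policy_prior(state):
    # Stand-in for the policy network: a probability for each move.
    scores = [random.random() for _ in MOVES]
    total = sum(scores)
    return [s / total for s in scores]

def value_estimate(state, move):
    # Stand-in for the value network: probability of winning after 'move'.
    return random.random()

def rollout(state, move, n=20):
    # Cheap randomized playouts from the resulting position.
    return sum(random.random() > 0.5 for _ in range(n)) / n

def choose_move(state, k=5):
    # Like a human, consider only a handful of promising moves, then score
    # each with a blend of the value estimate and rollout results.
    prior = policy_prior(state)
    candidates = sorted(MOVES, key=lambda m: prior[m], reverse=True)[:k]
    return max(candidates, key=lambda m: 0.5 * value_estimate(state, m) + 0.5 * rollout(state, m))

print("chosen move:", choose_move(state=None))
```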

Ariel: So how will this be applied to applications that we’ll interact with on a regular basis? How will we start to see these technologies and techniques in action ourselves?

Richard: With these techniques, a lot of them are research systems. It’s not necessarily that they’re going to directly go down the pipeline towards productization, but they are helping the models that are implicitly learned inside of AI systems and machine learning systems to get much better.

Ian: There are other strategies for generating new experiences that resemble previously seen experiences. One of them is called WaveNet. It’s a model produced by DeepMind in 2016 for generating speech. If you provide a sentence, just written down, and you’d like to hear that sentence spoken aloud, WaveNet can create an audio waveform that sounds very realistically like a human pronouncing that sentence written down. The main drawback to WaveNet right now is that it’s fairly slow. It has to generate the audio waveform one piece at a time. I believe it takes WaveNet two minutes to produce one second of audio, so it’s not able to make the audio fast enough to hold an interactive conversation.
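
The slowness Ian mentions comes from the autoregressive structure: each audio sample depends on the samples generated before it, so at 16,000 samples per second the network must be evaluated 16,000 times, strictly in sequence, for every second of speech. The loop below illustrates that structure with a trivial placeholder standing in for the WaveNet network.

```python
import math

SAMPLE_RATE = 16_000

def next_sample(history):
    # Placeholder "model": a decaying sine wave conditioned on how much has
    # been generated. In WaveNet this is an expensive neural network call.
    t = len(history)
    return math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) * 0.99 ** (t / SAMPLE_RATE)

def generate(seconds):
    audio = []
    for _ in range(int(seconds * SAMPLE_RATE)):  # one model call per sample, in order
        audio.append(next_sample(audio))
    return audio

waveform = generate(0.1)   # a tenth of a second = 1,600 sequential model calls
print(len(waveform), "samples generated")
```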

Richard: And similarly, we’ve seen applications to colorizing black and white photos, or turning sketches into somewhat photo-realistic images, being able to turn text into images.

Ian: Yeah one thing that really highlights how far we’ve come is that in 2014, one of the big breakthroughs was the ability to take a photo and produce a sentence summarizing what was in the photo. In 2016, we saw different methods for taking a sentence and producing a photo that contains the imagery described by the sentence. It’s much more complicated to go from a few words to a very realistic image containing thousands or millions of pixels than it is to go from the image to the words.

Another thing that was very exciting in 2016 was the use of generative models for drug discovery. Instead of imagining new images, the model could actually imagine new molecules that are intended to have specific medicinal effects.

Richard: And this is pretty exciting because it’s being applied to cancer research, to developing potential new cancer treatments.

Ariel: And then there was Google’s language translation program, Google Neural Machine Translation. Can you talk about what that did and why it was a big deal?

Ian: It’s a big deal for two different reasons. First, Google Neural Machine Translation is a lot better than previous approaches to machine translation. It removes a lot of the hand-designed elements and just has a neural network figure out what to do.

The other thing that’s really exciting about Google Neural Machine Translation is that the model has developed what we call an “Interlingua.” It used to be that if you wanted to translate from Japanese to Korean, you had to find a lot of sentences that had already been translated from Japanese to Korean, and then you could train a machine learning model to copy that translation procedure. But now, if you already know how to translate from English to Japanese and from English to Korean, the model builds up a shared representation in the middle, the Interlingua. You can translate Japanese into the Interlingua and then out into Korean, or Korean into the Interlingua and out into Japanese, and you never actually have to collect translated sentences for every pair of languages.
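
Here is a cartoon of that data flow (not of the learning itself): every language is encoded into one shared representation, and each language has its own decoder back out of it, so a Japanese-to-Korean path exists without any direct Japanese-Korean table. The “encoder” and “decoder” below are toy lookup tables keyed on a hashed meaning tag, standing in for trained networks, and the example sentences are purely illustrative.

```python
import hashlib

def to_shared_space(meaning_tag):
    """Toy encoder: map a meaning to a deterministic 'vector' (here, just a hash)."""
    return hashlib.sha256(meaning_tag.encode()).hexdigest()

# Tiny toy vocabulary: every sentence is tied only to the shared space,
# never directly to a sentence in another language.
entries = [
    ("greeting", "en", "Hello"),
    ("greeting", "ja", "Konnichiwa"),
    ("greeting", "ko", "Annyeonghaseyo"),
    ("thanks",   "en", "Thank you"),
    ("thanks",   "ja", "Arigatou"),
    ("thanks",   "ko", "Gamsahamnida"),
]

# "Encoder": sentence -> shared representation.  "Decoder": shared representation -> sentence.
encoder = {(lang, text): to_shared_space(tag) for tag, lang, text in entries}
decoder = {(lang, to_shared_space(tag)): text for tag, lang, text in entries}

def translate(text, src, dst):
    shared = encoder[(src, text)]      # any language in...
    return decoder[(dst, shared)]      # ...any language out, via the shared space

# Japanese -> Korean works even though no direct Japanese-Korean table was ever built.
print(translate("Arigatou", src="ja", dst="ko"))   # -> Gamsahamnida
```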

Ariel: How can the techniques that are used for language apply elsewhere? How do you anticipate seeing this developed in 2017 and onward?

Richard: So I think what we’ve learned from the approach is that deep learning systems are able to create extremely rich models of the world that can actually express what we can think, which is a pretty exciting milestone. Being able to combine that Interlingua with more structured information about the world is something that a variety of teams are working on — it is a big, open area for the coming years.

Ian: At OpenAI, one of our largest projects, Universe, allows a reinforcement learning agent to play many different computer games, and it interacts with these games in the same way that a human does, by sending key presses or mouse clicks to the actual game engine. The same reinforcement learning agent is able to interact with basically anything that a human can interact with on a computer. By having one agent that can do all of these different things, we will really exercise our ability to create general artificial intelligence instead of application-specific artificial intelligence. And projects like Google’s Interlingua have shown us that there’s a lot of reason to believe this will work.

Ariel: What else happened this year that you guys think is important to mention?

Richard: One-shot [learning] is when you see just a little bit of data, potentially just one data point, for some new task or new category, and you’re then able to deduce what that class, or what that function, should look like in general. So being able to train systems on very little data, starting from just general background knowledge, will be pretty exciting.
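
One common way to frame that idea, sketched below, is to embed examples into a feature space using previously learned background knowledge and then classify a new item by the single labeled example it lands closest to. The character-frequency “embedding” here is a toy stand-in for a trained network, and the classes are illustrative.

```python
from collections import Counter
import math

def embed(example):
    """Stand-in for a learned embedding network: a normalized character-frequency vector."""
    counts = Counter(example.lower())
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {ch: v / norm for ch, v in counts.items()}

def similarity(a, b):
    """Cosine similarity between two sparse embedding vectors."""
    return sum(a[ch] * b.get(ch, 0.0) for ch in a)

# One labeled example per brand-new class (the "support set").
support_set = {"zebra": "zebra", "yak": "yak", "quokka": "quokka"}
prototypes = {label: embed(example) for label, example in support_set.items()}

def classify(query):
    q = embed(query)
    return max(prototypes, key=lambda label: similarity(q, prototypes[label]))

print(classify("zebraa"))   # the single closest labeled example wins -> "zebra"
```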

Ian: One thing that I’m very excited about is a new area called machine learning security, which studies how an attacker can trick a machine learning system into taking the wrong action. For example, we’ve seen that it’s very easy to fool an object-recognition system: we can show it an image that looks a lot like a panda and it gets recognized as a school bus, or vice versa. It’s actually possible to fool machine learning systems with physical objects. There was a paper called Accessorize to a Crime, which showed that by wearing unusually colored glasses it’s possible to thwart a face recognition system. And my own collaborators at Google Brain and I wrote a paper called Adversarial Examples in the Physical World, where we showed that we can make images that look kind of grainy and noisy, but when viewed through a camera we can control how an object-recognition system will respond to them.
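
A minimal sketch of this style of attack follows, using the fast gradient sign method against a tiny hand-built logistic-regression “classifier” that stands in for a deep object-recognition network; the weights and the “image” are made up for illustration. The idea is to nudge every input feature slightly in the direction that most increases the model’s loss, which can flip the predicted label even though no single pixel changes by much.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels = 100

# Stand-in "image classifier": logistic regression with random weights, with the bias
# chosen so that an all-gray image sits exactly on the decision boundary.
w = rng.normal(size=n_pixels)
b = -0.5 * w.sum()

def predict_prob(x):
    """P(class = 1) according to the toy classifier."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def fgsm(x, true_label, epsilon=0.1):
    """Fast gradient sign method: step each pixel in the direction that raises the loss."""
    p = predict_prob(x)
    grad_x = (p - true_label) * w          # gradient of the cross-entropy loss w.r.t. the input
    return np.clip(x + epsilon * np.sign(grad_x), 0.0, 1.0)

# A "clean image" the toy model confidently labels as class 1.
x_clean = 0.5 + 0.04 * np.sign(w)
x_adv = fgsm(x_clean, true_label=1)

print("clean prediction:      ", round(float(predict_prob(x_clean)), 3))
print("adversarial prediction:", round(float(predict_prob(x_adv)), 3))
print("largest pixel change:  ", round(float(np.abs(x_adv - x_clean).max()), 3))
```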

Ariel: Is there anything else that you thought was either important for 2016 or looking forward to 2017?

Richard: Yeah, looking forward to 2017 I think there will be more focus on unsupervised learning. Most of the world is not annotated by humans. There aren’t little sticky notes on things around the house saying what they are. Being able to process [the world] in a more unsupervised way will unlock a plethora of new applications.

Ian: It will also make AI more democratic. Right now, if you want to use really advanced AI you need not only a lot of computers but also a lot of data. That’s part of why it’s mostly very large companies that are competitive in the AI space. Today, the way you get a system really good at a task is by showing the computer a million different examples. In the future, we’ll have AI that can learn much more like a human learns, where showing it just a few examples is enough. Once we have machine learning systems that are able to get the general idea of what’s going on very quickly, in the way that humans do, it won’t be necessary to build these gigantic data sets anymore.

Richard: One application area I think will be important this coming year is automatic detection of fake news, fake audio, fake images, and fake video. Some of the applications this past year have actually focused on generating additional frames of video. As that gets better, as the photo generation we talked about earlier gets better, and as audio templating gets better, these fakes will only get more convincing. I think it was Adobe that demoed what they called Photoshop for voice, where you can type something in and select a person, and it will sound like that person saying whatever it is that you typed. So we’ll need ways of detecting that, since this whole concept of fake news is very much at the fore these days.

Ian: It’s worth mentioning that there are other ways of addressing the spread of fake news. Email spam filtering, for example, uses a lot of different clues that can be statistically associated with whether or not people mark an email as spam. We can do a lot with that kind of approach without needing to advance the underlying AI systems at all.
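
A minimal sketch of that kind of statistical approach, assuming a tiny set of made-up headlines and labels: a Naive Bayes classifier over bag-of-words features, the same basic machinery behind classic spam filters, with no deep learning involved.

```python
from collections import Counter, defaultdict
import math

# Purely illustrative training data: (headline, human-assigned label).
training = [
    ("shocking miracle cure doctors hate", "fake"),
    ("you won't believe this one weird trick", "fake"),
    ("anonymous sources reveal shocking secret plot", "fake"),
    ("senate passes budget bill after long debate", "real"),
    ("local council approves new school funding", "real"),
    ("study finds modest link between diet and sleep", "real"),
]

word_counts = defaultdict(Counter)   # label -> word -> count
label_counts = Counter()
for text, label in training:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Naive Bayes with add-one smoothing: pick the label with the highest log-probability."""
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("shocking secret trick doctors hate"))          # likely "fake"
print(classify("council debate over school budget funding"))   # likely "real"
```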

Ariel: Is there anything that you’re worried about, based on advances that you’ve seen in the last year?

Ian: The employment issue. As we’re able to automate more and more tasks in the future, how will we make sure that everyone benefits from that automation? The way that society is structured right now, increasing automation seems to lead to an increasing concentration of wealth, and there are winners and losers to every advance. My concern is that automating jobs that are done by millions of people will create very many losers and a small number of winners who really win big.

Richard: I’m also slightly concerned about the speed at which we’re approaching additional generality. It’s extremely cool to see systems able to do lots of different things, and to handle tasks they’ve seen very little of, or none of, before. But it raises questions about when we need to implement different types of safety techniques. I don’t think that we’re at that point yet, but it raises the issue.

Ariel: To end on a positive note: looking back on what you saw last year, what has you most hopeful for our future?

Ian: I think it’s really great that AI is starting to be used for things like medicine. In the last year we’ve seen a lot of different machine learning algorithms that can exceed human abilities at some tasks, and we’ve also started to see the application of AI to life-saving areas like designing new medicines. This makes me very hopeful that we’re going to start seeing superhuman drug design, and other applications of AI that just make life better for a lot of people in ways we would not have been able to do without it.

Richard: Various kinds of tasks that people find to be drudgery within their jobs will become automatable. That will free people up to work on more creative, higher-value things, and potentially to work in more interesting areas of their field or across different fields. I think the future is wide open and it’s really what we make of it, which is exciting in itself.

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we interview researchers and thought leaders who we believe will help spur discussion within our community. The interviews do not necessarily represent FLI’s opinions or views.