
AI Alignment Podcast: An Overview of Technical AI Alignment with Rohin Shah (Part 1)

The space of AI alignment research is highly dynamic, and it’s often difficult to get a bird’s eye view of the landscape. This podcast is the first of two parts attempting to partially remedy this by providing an overview of the organizations participating in technical AI alignment research, their specific research directions, and how these approaches all come together to make up the state of technical AI alignment efforts. In this first part, Rohin moves sequentially through the technical research organizations in this space and carves through the field by its varying research philosophies. We also dive into the specifics of many different approaches to AI safety, explore where they disagree, discuss what properties varying approaches attempt to develop/preserve, and hear Rohin’s take on these different approaches.

You can take a short (3 minute) survey to share your feedback about the podcast here.

In this podcast, Lucas spoke with Rohin Shah. Rohin is a 5th year PhD student at UC Berkeley with the Center for Human-Compatible AI, working with Anca Dragan, Pieter Abbeel, and Stuart Russell. Every week, he collects and summarizes recent progress relevant to AI alignment in the Alignment Newsletter.

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, SoundCloud, iTunes, Google Play, Stitcher, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

Topics discussed in this episode include:

  • The perspectives of CHAI, MIRI, OpenAI, DeepMind, FHI, and others
  • Where and why they disagree on technical alignment
  • The kinds of properties and features we are trying to ensure in our AI systems
  • What Rohin is excited and optimistic about
  • Rohin’s recommended reading and advice for improving at AI alignment research

Lucas: Hey everyone, welcome back to the AI Alignment podcast. I’m Lucas Perry, and today we’ll be speaking with Rohin Shah. This episode is the first episode of two parts that both seek to provide an overview of the state of AI alignment. In this episode, we cover technical research organizations in the space of AI alignment, their research methodologies and philosophies, how these all come together on our path to beneficial AGI, and Rohin’s take on the state of the field.

As a general bit of announcement, I would love for this podcast to be particularly useful and informative for its listeners, so I’ve gone ahead and drafted a short survey to get a better sense of what can be improved. You can find a link to that survey in the description of wherever you might find this podcast, or on the page for this podcast on the FLI website.

Many of you will already be familiar with Rohin, he is a fourth year PhD student in Computer Science at UC Berkeley with the Center For Human-Compatible AI, working with Anca Dragan, Pieter Abbeel, and Stuart Russell. Every week, he collects and summarizes recent progress relevant to AI alignment in the Alignment Newsletter. And so, without further ado, I give you Rohin Shah.

Thanks so much for coming on the podcast, Rohin, it’s really a pleasure to have you.

Rohin: Thanks so much for having me on again, I’m excited to be back.

Lucas: Yeah, long time, no see since Puerto Rico Beneficial AGI. And so speaking of Beneficial AGI, you gave quite a good talk there which summarized technical alignment methodologies, approaches, and broad views at this time; and that is the subject of this podcast today.

People can go and find that video on YouTube, and I suggest that you watch that; that should be coming out on the FLI YouTube channel in the coming weeks. But for right now, we’re going to be going in more depth, and with more granularity into a lot of these different technical approaches.

So, just to start off, it would be good if you could contextualize this list of technical approaches to AI alignment that we’re going to get into within the different organizations that they exist at, and the different philosophies and approaches that exist at these varying organizations.

Rohin: Okay, so disclaimer, I don’t know all of the organizations that well. I know that people tend to fit CHAI in a particular mold, for example; CHAI’s the place that I work at. And I mostly disagree with that being the mold for CHAI, so probably anything I say about other organizations is also going to be somewhat wrong; but I’ll give it a shot anyway.

So I guess I’ll start with CHAI. And I think our public output mostly comes from this perspective of how do we get AI systems to do what we want? So this is focusing on the alignment problem, how do we actually point them towards a goal that we actually want, align them with our values. Not everyone at CHAI takes this perspective, but I think that’s the one most commonly associated with us and it’s probably the perspective on which we publish the most. It’s also the perspective I, usually, but not always, take.

MIRI, on the other hand, takes a perspective of, “We don’t even know what’s going on with intelligence. Let’s try and figure out what we even mean by intelligence, what it means for there to be a super-intelligent AI system, what would it even do or how would we even understand it; can we have a theory of what all of this means? We’re confused, let’s be less confused, once we’re less confused, then we can think about how to actually get AI systems to do good things.” That’s one of the perspectives they take.

Another perspective they take is that there’s a particular problem with AI safety, which is that, “Even if we knew what goals we wanted to put into an AI system, we don’t know how to actually build an AI system that would, reliably, pursue those goals as opposed to something else.” That problem, even if you know what you want to do, how do you get an AI system to do it, is a problem that they focus on. And the difference from the thing I associated with CHAI before is that, with the CHAI perspective, you’re interested both in how do you get the AI system to actually pursue the goal that you want, but also how do you figure out what goal that you want, or what is the goal that you want. Though, I think most of the work so far has been on supposing you know the goal, how do you get your AI system to properly pursue it?

I think the DeepMind safety team, at least, is pretty split across many different ways of looking at the problem. I think Jan Leike, for example, has done a lot of work on reward modeling, and this sort of fits in with the how do we get our AI systems to be focused on the right task, the right goal. Whereas Vika has done a lot of work on side effects or impact measures. I don’t know if Vika would say this, but the way I interpret it is: how do we impose a constraint upon the AI system such that it never does anything catastrophic? But it’s not trying to get the AI system to do what we want, just to not do what we don’t want, or what we think would be catastrophically bad.

OpenAI safety also seems to be: okay, how do we get deep reinforcement learning to do good things, to do what we want, to be a bit more robust? Then there’s also the iterated amplification, debate, factored cognition area of research, which is more along the lines of, can we write down a system that could, plausibly, lead to us building an aligned AGI or aligned powerful AI system?

FHI: there’s no coherent direction that’s all of FHI. Eric Drexler is also trying to understand how AI will develop in the future, which is somewhat different from what MIRI’s doing, but the same general theme of trying to figure out what is going on. So he just recently published a long technical report on comprehensive AI services, which is a general worldview for predicting what AI development will look like in the future. If we believed that that was, in fact, the way AI would happen, we would probably change what we work on from the technical safety point of view.

And Owain Evans does a lot of stuff, so maybe I’m just not going to try to categorize him. And then Stuart Armstrong works on this, “Okay, how do we get value learning to work such that we actually infer a utility function that we would be happy for an AGI system to optimize, or a super-intelligent AI system to optimize?”

And then Ought works on factored cognition, so it’s very adjacent to the iterated amplification and debate research agendas. Then there are a few individual researchers scattered, for example, at Toronto, Montreal, AMU, and EPFL; maybe I won’t get into all of them because, yeah, that’s a lot, but we can delve into that later.

Lucas: Maybe a more helpful approach, then, would be if you could start by demystifying some of the MIRI stuff a little bit; which may seem most unusual.

Rohin: I guess, strategically, the point would be that you’re trying to build this AI system that’s going to be, hopefully, at some point in the future vastly more intelligent than humans, because we want them to help us colonize the universe or something like that, and lead to lots and lots of technological progress, etc., etc.

But this, basically, means that humans will not be in control unless we very, very specifically arrange it such that we are in control; we have to thread the needle, perfectly, in order to get this to work out. In the same way that, by default, you would expect that the most intelligent creatures, beings, are the ones that are going to decide what happens. And so we really need to make sure, and also it’s probably hard to ensure, that these vastly more intelligent beings are actually doing what we want.

Given that, it seems like what we want is a good theory that allows us to understand and predict what these AI systems are going to do. Maybe not in the fine, nitty-gritty details, because if we could predict what they would do, then we could do it ourselves and be just as intelligent as they are. But, at least, in broad strokes, what sorts of universes are they going to create?

But given that they can apply so much more intelligence than we can, we need our guarantees to be really, really strong; like almost proof level. Maybe actual proofs are a little too much to expect, but we want to get as close to it as possible. Now, if we want to do something like that, we need a theory of intelligence; we can’t just sort of do a bunch of experiments, look at the results, and then try to extrapolate from there. Extrapolation does not give you the level of confidence that we would need for a problem this difficult.

And so rather, they would like to instead understand intelligence deeply, deconfuse themselves about it. Once you understand how intelligence works at a theoretical level, then you can start applying that theory to actual AI systems and seeing how they approximate the theory, or make predictions about what different AI systems will do. And, hopefully, then we could say, “Yeah, this system does look like it’s going to be very powerful as approximating this particular idea, this particular part of theory of intelligence. And we can see that with this particular theory of intelligence, we can align it with humans somehow, and you’d expect that this was going to work out.” Something like that.

Now, that sounded kind of dumb even to me as I was saying it, but that’s because we don’t have the theory yet; it’s very fun to speculate how you would use the theory before you actually have the theory. So that’s the reason they’re doing this; the actual thing that they’re focusing on is centered around problems of embedded agency. And I should say this is one of their, I think, two main strands of research; the other strand of research I do not know anything about because they have not published anything about it.

But one of their strands of research is about embedded agency. And here the main point is that in the real world, any agent, any AI system, or a human is a part of their environment. They are smaller than the environment and the distinction between agent and environment is not crisp. Maybe I think of my body as being part of me but, I don’t know, to some extent, my laptop is also an extension of my agency; there’s a lot of stuff I can do with it.

Or, on the other hand, you could think maybe my arms and limbs aren’t actually a part of me, I could maybe get myself uploaded at some point in the future, and then I will no longer have arms or legs; but in some sense I am still me, I’m still an agent. So, this distinction is not actually crisp, and we always pretend that it is in AI, so far. And it turns out that once you stop making this crisp distinction and start allowing the boundary to be fuzzy, there are a lot of weird, interesting problems that show up and we don’t know how to deal with any of them, even in theory, so that’s what they focused on.

Lucas: And can you unpack, given that AI researchers control the input/output channels for AI systems, why is it that there is this fuzziness? It seems like you could extrapolate away the fuzziness given that there are these sort of rigid and selected IO channels.

Rohin: Yeah, I agree that seems like the right thing for today’s AI systems; but I don’t know. If I think about, “Okay, this AGI is a generally intelligent AI system.” I kind of expect it to recognize that when we feed it inputs which, let’s say, we’re imagining a money maximizing AI system that’s taking in inputs like stock prices, and it outputs which stocks to buy. And maybe it can also read the news that lets it get newspaper articles in order to make better decisions about which stocks to buy.

At some point, I expect this AI system to read about AI and humans, and realize that, hey, it must be an AI system, it must be getting inputs and outputs. Its reward function must be to make this particular number in a bank account be as high as possible and then once it realizes this, there’s this part of the world, which is this number in the bank account, or it could be this particular value, this particular memory block in its own CPU, and its goal is now make that number as high as possible.

In some sense, it’s now modifying itself, especially if you’re thinking of the memory block inside the CPU. If it goes and edits that and sets that to a million, a billion, the highest number possible in that memory block, then it seems like it has, in some sense, done some self editing; it’s changed the agent part of it. It could also go and be like, “Okay, actually what I care about is that this particular reward function box is supposed to output as high a number as possible. So what if I go and change my input channels such that it feeds me things that cause me to believe that I’ve made tons and tons of profit?” So this is a delusion box consideration.

While it is true that I don’t see a clear, concrete way that an AI system ends up doing this, it does feel like an intelligent system should be capable of this sort of reasoning, even if it initially had these sort of fixed inputs and outputs. The idea here is that its outputs can be used to affect the inputs or future outputs.

Lucas: Right, so I think that that point is the clearest summation of this; it can affect its own inputs and outputs later. If you take human beings, who are, by definition, human level intelligences, we have, say, in a classic computer science sense, if you thought of us, you’d say we strictly have five input channels: hearing, seeing, touch, smell, etc.

Human beings have a fixed number of input/output channels but, obviously, human beings are capable of self modifying on those. And our agency is sort of squishy and dynamic in ways that would be very unpredictable, and I think that that unpredictability and the sort of almost seeming ephemerality of being an agent seems to be the crux of a lot of the problem.

Rohin: I agree that that’s a good intuition pump, I’m not sure that I agree it’s the crux. The crux, to me, it feels more like you specify some sort of behavior that you want which, in this case, was make a lot of money or make this number in a bank account go higher, or make this memory cell go as high as possible.

And when you were thinking about the specification, you assumed that the inputs and outputs fell within some strict parameters, like the inputs are always going to be news articles that are real and produced by human journalists, as opposed to a fake news article that was created by the AI in order to convince the reward function that actually it’s made a lot of money. And then the problem is that since the AI’s outputs can affect the inputs, the AI could cause the inputs to go outside of the space of possibilities that you imagine the inputs could be in. And this then allows the AI to game the specification that you had for it.

Lucas: Right. So, all the parts which constitute some AI system are all, potentially, modified by other parts. And so you have something that is fundamentally and completely dynamic, which you’re trying to make predictions about, but whose future structure is potentially very different and hard to predict based off of the current structure?

Rohin: Yeah, basically.

Lucas: And that in order to get past this we must, again, tunnel down on these decision-theoretic and rational-agency-type issues at the bottom of intelligence to sort of have a more fundamental theory, which can be applied to these highly dynamic and difficult to understand situations?

Rohin: Yeah, I think the MIRI perspective is something like that. And in particular, it would be like trying to find a theory that allows you to put in something that stays stable even while the system, itself, is very dynamic.

Lucas: Right, even while your system, whose parts are all completely dynamic and able to be changed by other parts, how do you maintain a degree of alignment amongst that?

Rohin: One answer to this is to give the AI a utility function. If there is a utility function that it’s explicitly trying to maximize, then it probably has an incentive to protect that utility function, because if it gets changed, well then it’s not going to maximize that utility function anymore; it’ll maximize something else, which will lead to worse behavior by the lights of the original utility function. That’s a thing that you could hope to do with a better theory of intelligence: how do you create a utility function in an AI system that stays stable, even as everything else is dynamically changing?

Lucas: Right, and without even getting into the issues of implementing one single stable utility function.

Rohin: Well, I think they’re looking into those issues. So, for example, Vingean Reflection is a problem that is entirely about how you create better, more improved version of yourself without having any value drift, or a change to the utility function.

Lucas: Is your utility function not self-modifying?

Rohin: So in theory, it could be. The hope would be that we could design an AI system that does not self-modify its utility function under almost all circumstances. Because if you change your utility function, then you’re going to start maximizing that new utility function which, by the original utility function’s evaluation, is worse. If I told you, “Lucas, you have got to go fetch coffee.” That’s the only thing in life you’re concerned about. You must take whatever actions are necessary in order to get the coffee.

And then someone goes like, “Hey Lucas, I’m going to change your utility function so that you want to fetch tea instead.” And then all of your decision making is going to be in service of getting tea. You would probably say, “No, don’t do that, I want to fetch coffee right now. If you change my utility function to be ‘fetch tea’, then I’m going to fetch tea, which is bad because I want to fetch coffee.” And so, hopefully, you don’t change your utility function because of this effect.
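To make that argument concrete, here is a minimal toy sketch (my own illustration, not anything from the podcast or from MIRI’s work): an agent evaluates the action “accept a modification to my utility function” using its current utility function, and therefore refuses. The function names and the two-action world are invented for the example.

```python
# A minimal sketch of the utility-preservation argument (illustrative only):
# the agent scores "keep my utility function" vs. "accept the change to tea"
# with its CURRENT utility function, so self-modification looks strictly worse.

def coffee_utility(outcome: str) -> float:
    # The agent's current utility function: it only values fetching coffee.
    return 1.0 if outcome == "coffee" else 0.0

def predicted_outcome(action: str) -> str:
    # Toy world model: keeping the utility function leads to fetching coffee;
    # accepting the modification leads to later optimizing for tea instead.
    return "coffee" if action == "keep_utility_function" else "tea"

actions = ["keep_utility_function", "accept_modification_to_tea"]
best = max(actions, key=lambda a: coffee_utility(predicted_outcome(a)))
print(best)  # -> "keep_utility_function": the change is rejected
```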

Lucas: Right. But isn’t this where corrigibility comes in, and where we admit that as we sort of understand more about the world and our own values, we want to be able to update utility functions?

Rohin: Yeah, so that is a different perspective; I’m not trying to describe that perspective right now. It’s a perspective for how you could get something stable in an AI system. And I associate it most with Eliezer, though I’m not actually sure if he holds this opinion.

Lucas: Okay, so I think this was very helpful for the MIRI case. So why don’t we go ahead and zoom in, I think, a bit on CHAI, which is the Center For Human-Compatible AI.

Rohin: So I think rather than talking about CHAI, I’m going to talk about the general field of trying to get AI systems to do what we want; a lot of people at CHAI work on that, but not everyone. And also a lot of people outside of CHAI work on that, because that seems to be a more useful carving of the field. So there’s this broad argument for AI safety which is, “We’re going to have very intelligent things; based on the orthogonality thesis, we can’t really say anything about their goals.” So, the really important thing is to make sure that the intelligence is pointed at the right goals, it’s pointed at doing what we actually want.

And so then the natural approach is, how do we get our AI systems to infer what we want to do and then actually pursue that? And I think, in some sense, it’s one of the most obvious approaches to AI safety. This is a clear enough problem, even with narrow current systems, that there are plenty of people outside of AI safety working on this as well. So this incorporates things like inverse reinforcement learning, preference learning, and reward modeling; the CIRL (cooperative IRL) paper also fits into all of this. So yeah, I can go into those in more depth.

Lucas: Why don’t you start off by talking about the people who exist within the field of AI safety, giving sort of a brief characterization of what’s going on outside of the field, but primarily focusing on those within the field, and how this approach, in practice, generally differs from MIRI’s, to start off with, because we have a clear picture of them painted right next to what we’re delving into now.

Rohin: So I think the difference from MIRI is that this is more targeted directly at the problem right now, in that you’re actually trying to figure out how do you build an AI system that does what you want. Now, admittedly, most of the techniques that people have come up with are not likely to scale up to super-intelligent AI; they’re not meant to, no one claims that they’re going to scale up to super-intelligent AI. They’re more like some incremental progress on figuring out how to get AI systems to do what we want and, hopefully, with enough incremental progress, we’ll get to a point where we can go, “Yes, this is what we need to do.”

Probably the most well known person here would be Dylan Hadfield-Menell, who you had on your podcast. And so he talked about CIRL and associated things quite a bit there, there’s not really that much I would say in addition to it. Maybe a quick summary of Dylan’s position is something like, “Instead of having AI systems that are optimizing for their own goals, we need to have AI systems that are optimizing for our goals, and try to infer our goals in order to do that.”

So rather than having an AI system that is individually rational with respect to its own goals, you instead want to have a human-AI system such that the entire system is rationally optimizing for the human’s goals. This is sort of the point made by CIRL, where you have an AI system, you’ve got a human, they’re playing this two-player game, the human is the only one who knows the reward function, the robot is uncertain about what the reward function is, and has to learn by observing what the human does.

And so, now you see that the robot does not have a utility function that it is trying to optimize; instead it’s learning about a utility function that the human has and then helping the human optimize that reward function. So summary: try to build human-AI systems that are group rational, as opposed to an AI system that is individually rational; so that’s Dylan’s view. Then there’s Jan Leike at DeepMind, and a few people at OpenAI.
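As a rough illustration of the shape of this idea (a toy sketch of my own, not the actual CIRL algorithm from the paper): the robot has no reward function of its own, keeps a belief over candidate human rewards, updates that belief from an observed human choice under a noisily rational model, and then acts on the posterior. The candidate rewards, the Boltzmann rationality parameter, and all names are assumptions for the example.

```python
# Minimal sketch of "the robot learns the human's reward and helps optimize it".
import numpy as np

candidate_rewards = {          # hypotheses about what the human values
    "wants_coffee": np.array([1.0, 0.0]),   # utilities of [coffee, tea]
    "wants_tea":    np.array([0.0, 1.0]),
}
belief = {name: 0.5 for name in candidate_rewards}   # uniform prior

def boltzmann_choice_prob(reward: np.ndarray, chosen: int, beta: float = 3.0) -> float:
    # Probability that a noisily rational human picks `chosen` under this reward.
    expw = np.exp(beta * reward)
    return float(expw[chosen] / expw.sum())

# The robot observes the human pick item 0 (coffee) and does a Bayesian update.
observed_choice = 0
for name, reward in candidate_rewards.items():
    belief[name] *= boltzmann_choice_prob(reward, observed_choice)
total = sum(belief.values())
belief = {name: p / total for name, p in belief.items()}

# The robot then helps by acting greedily w.r.t. its posterior expected reward.
expected_reward = sum(p * candidate_rewards[name] for name, p in belief.items())
print(belief, "-> robot fetches item", int(np.argmax(expected_reward)))
```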

Lucas: Before we pivot into OpenAI and DeepMind, just sort of focusing here on the CHAI end of things and this broad view, help me explain how you would characterize it: the present-day, actively focused view on current issues in alignment and making incremental progress there. This view here you see as sort of subsuming multiple organizations?

Rohin: Yes, I do.

Lucas: Okay. Is there a specific name you would, again, use to characterize this view?

Rohin: Oh, getting AI systems to do what we want. Let’s see, do I have a pithy name for this? Helpful AI systems or something.

Lucas: Right which, again, is focused on current day things, is seeking to make incremental progress, and which subsumes many different organizations?

Rohin: Yeah, that seems broadly true. I do think there are people who are doing more conceptual work, thinking about how this will scale to AGI and stuff like that; but it’s a minority of work in the space.

Lucas: Right. And so the question of how do we get AI systems to do what we want them to do, also includes these views of, say, Vingean Reflection or how we become idealized versions of ourselves, or how we build on value over time, right?

Rohin: Yeah. So, those are definitely questions that you would need to answer at some point. I’m not sure that you would need to answer Vingean Reflection at some point. But you would definitely need to answer how do you update, given that humans don’t actually know what they want, for a long-term future; you need to be able to deal with that fact at some point. It’s not really a focus of current research, but I agree that that is a thing about this approach will have to deal with, at some point.

Lucas: Okay. So, moving on from you and Dylan to DeepMind and these other places where you view this sort of approach as also being practiced?

Rohin: Yeah, so while Dylan and I and others at CHAI have been focused on sort of conceptual advances, like in toy environments, does this do the right thing? What are some sorts of data that we can learn from? Do they work in these very simple environments with quite simple algorithms? I would say that the OpenAI and DeepMind safety teams are more focused on trying to get this to work in complex environments, in the sense of getting this to work on state-of-the-art environments, the most complex ones that we have.

Now I don’t mean Dota and StarCraft, because running experiments with Dota and StarCraft is incredibly expensive, but can we get AI systems that do what we want for environments like Atari or MuJoCo? There’s some work on this happening at CHAI, there are pre-prints available online, but it hasn’t been published very widely yet. Most of the work, I would say, has been happening with an OpenAI/DeepMind collaboration, and most recently, there was a position paper from DeepMind on recursive reward modeling.

Right before that, there was also a paper, deep reinforcement learning from human preferences, which said, “Okay, if we allow humans to specify what they want by just comparing between different pieces of behavior from the AI system, can we train an AI system to do what the human wants?” And then they built on that in order to create a system that could learn from demonstrations, initially, using a kind of imitation learning, and then improve upon the demonstrations using comparisons in the same way that deep RL from human preferences did.
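A minimal sketch of the core reward-modeling idea described here (my own toy version in plain numpy, not the code from either paper): fit a reward function so that the trajectory segment the human preferred gets higher predicted reward, using a Bradley-Terry style logistic model over pairwise comparisons. The linear reward model and the synthetic comparison data are assumptions for illustration.

```python
# Learn a reward function from pairwise human comparisons (toy version).
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)                      # linear reward model over 4 features

def reward(features: np.ndarray) -> float:
    return float(w @ features)

# Each comparison: (features of segment A, features of segment B, human chose A?)
comparisons = [
    (rng.normal(size=4), rng.normal(size=4), True),
    (rng.normal(size=4), rng.normal(size=4), False),
    # ... in practice, many comparisons gathered while the agent trains
]

lr = 0.1
for _ in range(200):                 # gradient ascent on the log-likelihood
    for feat_a, feat_b, a_preferred in comparisons:
        # P(human prefers A) under the Bradley-Terry model
        p_a = 1.0 / (1.0 + np.exp(reward(feat_b) - reward(feat_a)))
        grad = ((1.0 if a_preferred else 0.0) - p_a) * (feat_a - feat_b)
        w += lr * grad

# `w` now defines a learned reward; an RL algorithm would then be trained
# against reward(...) instead of a hand-written reward function.
print(w)
```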

So one way that you can look at this research is that there’s this field of human-computer interaction, which is about … well, it’s about many things. But one of the things that it’s about is how do you make the user interface for humans intuitive and easy to use, such that you don’t have user error or operator error? One comment from people that I liked is that most of the things that are classified as ‘user error’ or ‘operator error’ should not be classified as such, they should be classified as ‘interface errors’, where you had such a confusing interface that, well, of course, at some point some user was going to get it wrong.

And similarly, here, what we want is a particular behavior out of the AI, or at least a particular set of outcomes from the AI; maybe we don’t know exactly how to achieve those outcomes. And AI is about giving us the tools to create that behavior in automated systems. The current tool that we all use is the reward function, we write down the reward function and then we give it to an algorithm, and it produces behaviors and the outcomes that we want.

And reward functions, they’re just a pretty terrible user interface. They’re better than the previous interface, which is writing a program explicitly, which humans cannot do if the task is something like image classification or continuous control in MuJoCo; it’s an improvement upon that. But reward functions are still a pretty poor interface, because they’re implicitly saying that they encode perfect knowledge of the optimal behavior in all possible environments; which is clearly not a thing that humans can do.

I would say that this area is about moving on from reward functions, going to the next thing that makes the human’s job even easier. And so we’ve got things like comparisons, we’ve got things like inverse reward design, where you specify a proxy reward function that only needs to work in the training environment. Or you do something like inverse reinforcement learning, where you learn from demonstrations; so I think that’s one nice way of looking at this field.

Lucas: So do you have anything else you would like to add on here about how we present-day get AI systems to do what we want them to do, section of the field?

Rohin: Maybe I want to plug my value learning sequence, because it talks about this much more eloquently than I can on this podcast?

Lucas: Sure. Where can people find your value learning sequence?

Rohin: It’s on the Alignment Forum. You just go to the Alignment Forum, at the top there’s ‘Recommended Sequences’. There’s ‘Embedded Agency’, which is from MIRI, the sort of stuff we already talked about; so that’s also a great sequence, I would recommend it. There’s iterated amplification, also a great sequence; we haven’t talked about it yet. And then there’s my value learning sequence, so you can see it on the front page of the Alignment Forum.

Lucas: Great. So we’ve characterized these, say, different parts of the AI alignment field. And probably just so far it’s been cut into this sort of MIRI view, and then this broad approach of trying to get present-day AI systems to do what we want them to do, and to make incremental progress there. Are there any other slices of the AI alignment field that you would like to bring to light?

Rohin: Yeah, I’ve got four or five more. There’s the iterated amplification and debate side of things, which is: how do we build an aligned AGI using current technologies, but imagining that they were way better? So they’re trying to solve the entire problem, as opposed to making incremental progress, and, simultaneously, hopefully thinking about, conceptually, how do we fit all of these pieces together?

There’s limiting the AGI system, which is more about how do we prevent AI systems from behaving catastrophically? It makes no guarantees about the AI systems doing what we want, it just prevents them from doing really, really bad things. Techniques in that section include boxing and avoiding side effects. There’s the robustness view, which is about how do we make AI systems behave well, robustly? I guess that’s pretty self-explanatory.

There’s transparency or interpretability, which I wouldn’t say is a technique by itself, but seems to be broadly useful for almost all of the other avenues, it’s something we would want to add to other techniques in order to make those techniques more effective. There’s also, in the same frame as MIRI, can we even understand intelligence? Can we even forecast what’s going to happen with AI? And within that, there’s comprehensive AI services.

There’s also lots of effort on forecasting, but comprehensive AI services actually makes claims about what technical AI safety should do. So I think that one actually does have a place in this podcast, whereas most of the forecasting things do not, obviously. They have some implications on the strategic picture, but they don’t have clear implications on technical safety research directions, as far as I can tell right now.

Lucas: Alright, so, do you want to go ahead and start off with the first one on the list there, and then we’ll move sequentially down?

Rohin: Yeah, so iterated amplification and debate. This is similar to the helpful AGI section in the sense that we are trying to build an AI system that does what we want. That’s still the case here, but we’re now trying to figure out, conceptually, how can we do this using things like reinforcement learning and supervised learning, but imagining that they’re way better than they are right now? Such that the resulting agent is going to be aligned with us and reach arbitrary levels of intelligence; so in some sense, it’s trying to solve the entire problem.

We want to come up with a scheme such that if we run that scheme, we get good outcomes; then we’ve solved almost all the problem. I think that it also differs in that the argument for why we can be successful is also different. This field is aiming to get a property of corrigibility, which I like to summarize as trying to help the overseer. It might fail to help the overseer, or the human, or the user, because it’s not very competent, and maybe it makes a mistake and thinks that I like apples when actually I want oranges. But it was actually trying to help me; it actually thought I wanted apples.

So in corrigibility, you’re trying to help the overseer, whereas, in the previous thing about helpful AGI, you’re more getting an AI system that actually does what we want; there isn’t this distinction between what you’re trying to do versus what you actually do. So there’s a slightly different property that you’re trying to ensure, I think, on the strategic picture that’s the main difference.

The other difference is that these approaches are trying to make a single, unified, generally intelligent AI system, and so they will make assumptions like: given that we’re trying to imagine something that’s generally intelligent, it should be able to do X, Y, and Z. Whereas the research agenda that’s “let’s try to get AI systems that do what you want” tends not to make those assumptions. And so it’s more applicable to current systems or narrow systems where you can’t assume that you have general intelligence.

For example, a claim that Paul Christiano often talks about is that, “If your AI agent is generally intelligent and a little bit corrigible, it will probably easily be able to infer that its overseer, or the user, would like to remain in control of any resources that they have, and would like to be better informed about the situation, that the user would prefer that the agent does not lie to them, etc., etc.” This is definitely not something that current-day AI systems can do unless you really engineer them to, so this is presuming some level of generality, which we do not currently have.

So the next thing I said was limited AGI. Here the idea is, there are not very many policies or AI systems that will do what we want; what we want is a pretty narrow space in the space of all possible behaviors. Actually selecting one of the behaviors out of that space is quite difficult and requires a lot of information in order to narrow in on that piece of behavior. But if all you’re trying to do is avoid the catastrophic behaviors, then there are lots and lots of policies that successfully do that. And so it might be easier to find one of those policies; a policy that doesn’t ever kill all humans.

Lucas: At least the space of those policies, one might have this view and not think it sufficient for AI alignment, but see it as sort of a low hanging fruit to be picked. Because the space of non-catastrophic outcomes is larger than the space of extremely specific futures that human beings support.

Rohin: Yeah, exactly. And the success story here is, basically, that we develop this way of preventing catastrophic behaviors. All of our AI systems are deployed with this limiting system in place, and then technological progress continues as usual; it’s maybe not as fast as it would have been if we had an aligned AGI doing all of this for us, but hopefully it would still be somewhat fast, and hopefully enabled a bit by AI systems. Eventually, we will either make it to the future without ever building an AI system that doesn’t have this system in place, or we use this to do a bunch more AI research until we solve the full alignment problem, and then we can build, with high confidence that it’ll go well, an actual, properly aligned super-intelligence that is helping us without any of these limiting systems in place.

I think from a strategic picture, those are basically the important parts about limited AGI. There are two subsections within this: limits based on trying to change what the AI’s optimizing for, so this would be something like impact measures, versus limits on the input/output channels of the AI system, so this would be something like AI boxing.
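To give a sense of the general shape of the “limit via the objective” idea (a toy sketch of my own; real impact measures such as relative reachability or attainable utility preservation are considerably more sophisticated), one can add a penalty for how much the agent changes the world relative to an inaction baseline, so high-impact plans score badly. The specific penalty, the lambda value, and the toy states below are all invented for illustration.

```python
# Toy "task reward minus impact penalty" shaping, for illustration only.
import numpy as np

LAMBDA = 10.0  # penalty strength: a design choice, not derived from anything

def task_reward(state: np.ndarray) -> float:
    # Toy task: we want the first state variable to be large.
    return float(state[0])

def impact_penalty(state: np.ndarray, inaction_state: np.ndarray) -> float:
    # Toy impact measure: distance from the state the world would have been
    # in had the agent done nothing (the "inaction" baseline).
    return float(np.abs(state - inaction_state).sum())

def shaped_reward(state: np.ndarray, inaction_state: np.ndarray) -> float:
    return task_reward(state) - LAMBDA * impact_penalty(state, inaction_state)

inaction = np.array([0.0, 1.0, 1.0])
careful  = np.array([1.0, 1.0, 1.0])   # achieves the task, changes little else
drastic  = np.array([5.0, 0.0, 9.0])   # more task reward, huge side effects
print(shaped_reward(careful, inaction), ">", shaped_reward(drastic, inaction))
```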

So, with robustness, I sort of think of robustness as mostly not giving us safety by itself, probably, though there are some scenarios in which it could. It’s more meant to harden whichever other approach we use. Maybe if we have an AI system that is trying to do what we want, to go back to the helpful AGI setting, maybe it does that 99.9 percent of the time. But we’re using this AI to make millions of decisions, which means it’s going to not do what we want 1,000 times. That seems like way too many times for comfort, because if it’s applying its intelligence to the wrong goal in those 1,000 times, you could get some pretty bad outcomes.

This is a super heuristic and fluffy argument, and there are lots of problems with it, but I think it sets up the general reason that we would want robustness. So with robustness techniques, you’re basically trying to get some nice worst case guarantees that say, “Yeah, the AI system is never going to screw up super, super bad.” And this is helpful when you have an AI system that’s going to make many, many, many decisions, and we want to make sure that none of those decisions are going to be catastrophic.

And so some techniques in here include verification, adversarial training, and other adversarial ML techniques like Byzantine fault tolerance, defenses against data poisoning, stuff like that. Interpretability can also be helpful for robustness if you’ve got a strong overseer who can use interpretability to give good feedback to your AI system. But yeah, the overall goal is to take something that doesn’t fail 99 percent of the time, and get it to not fail 100 percent of the time, or check whether or not it ever fails, so that you don’t have this very rare but very bad outcome.
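As a rough sketch of the adversarial-training idea mentioned here (my own toy example, not any specific paper’s method): an adversary perturbs inputs to make the model fail, and those worst-case inputs are folded back into training. The linear classifier, the FGSM-style adversary, and all parameters are assumptions for illustration.

```python
# Toy adversarial training loop on a linear logistic classifier.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=2)                       # toy model parameters

def predict(x):                              # probability of class 1
    return 1.0 / (1.0 + np.exp(-x @ w))

def fgsm(x, y, eps=0.3):
    # Fast-gradient-sign style adversary: nudge x in the direction that most
    # increases the logistic loss, within an epsilon box.
    grad_x = (predict(x) - y) * w            # gradient of the loss w.r.t. x
    return x + eps * np.sign(grad_x)

def train_step(x, y, lr=0.1):
    global w
    w -= lr * (predict(x) - y) * x           # gradient of the loss w.r.t. w

data = [(rng.normal(size=2) + (2 * y - 1), y) for y in rng.integers(0, 2, 200)]
for epoch in range(20):
    for x, y in data:
        train_step(x, y)                     # ordinary training example
        train_step(fgsm(x, y), y)            # plus its adversarial perturbation
```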

Lucas: And so would you see this section as being within the context of any others or being sort of at a higher level of abstraction?

Rohin: I would say that it applies to any of the others. Well okay, not the MIRI embedded agency stuff, because we don’t really have a story for how that ends up helping with AI safety. It could apply to however that cashes out in the future, but we don’t really know right now. With limited AGI, maybe you have this theoretical model: if you apply this sort of penalty, this sort of impact measure, then you’re never going to have any catastrophic outcomes.

But, of course, in practice, we train our AI systems to optimize that penalty and get this sort of weird black box thing out. And we’re not entirely sure if it’s respecting the penalty or something like that. Then you could use something like verification or transparency in order to make sure that it is actually behaving the way we would predict it to behave, based on our analysis of what limits we need to put on the AI system.

Similarly, if you build AI systems that are doing what we want, maybe you want to use adversarial training to see if you can find any situations in which the AI system’s doing something weird, doing something which we wouldn’t classify as what we want. With iterated amplification or debate, maybe we want to verify that the corrigibility property holds all the time. It’s unclear how you would use verification for that, because it seems like a particularly hard property to formalize, but you could still do things like adversarial training or transparency.

We might have these theoretical arguments for why our systems will work, but then once we turn them into actual real systems that will probably use neural nets and other messy stuff like that, are we sure that in the translation from theory to practice, all of our guarantees stayed? Unclear, we should probably use some robustness techniques to check that.

Interpretability, I believe, was next. It’s sort of similar in that it’s broadly useful for everything else. If you want to figure out whether an AI system is doing what you want, it would be really helpful to be able to look into the agent and see, “Oh, it chose to buy apples because it had seen me eat apples in the past.” Versus, “It chose to buy apples because there was this company that made it to buy the apples, so that it would make more profit.”

If we could see those two cases, if we could actually see into the decision making process, it becomes a lot easier to tell whether or not the AI system is doing what we want, or whether or not the AI system is corrigible, or whether or not the AI system is properly … Well, maybe it’s not as obvious for impact measures, but I would expect it to be useful there as well, even if I don’t have a story off the top of my head.

Similarly with robustness, if you’re doing something like adversarial training, it sure would help if your adversary was able to look into the inner workings of the agent and be like, “Ah, I see, this agent tends to underrate this particular class of risky outcomes. So why don’t I search within that class of situations for one where it is going to take a big risk that it shouldn’t have taken?” It just makes all of the other problems a lot easier to do.

Lucas: And so how is progress made on interpretability?

Rohin: Right now I think most of the progress is in image classifiers. I’ve seen some work on interpretability for deep RL as well. Honestly, most of the research is probably happening with classification systems, primarily image classifiers, but others as well. And then I also see the deep RL explanation work because I read a lot of deep RL research.

But it’s motivated a lot by real problems with current AI systems, and interpretability helps you to diagnose and fix those, as well. For example, the problems of bias in classifiers: one thing that I remember from Deep Dream is you can ask Deep Dream to visualize barbells, and you always see these sort of muscular arms that are attached to the barbells because, in the training set, barbells were always being picked up by muscular people. So, that’s a way that you can tell that your classifier is not really learning the concepts that you wanted it to.

In the bias case, maybe your classifier always classifies anyone sitting at a computer as a man, because of bias in the data set. And using interpretability techniques, you could see that, okay, when you look at this picture, the AI system is looking primarily at the pixels that represent the computer, as opposed to the pixels that represent the human, and making its decision to label this person as a man based on that; and you’re like, no, that’s clearly the wrong thing to do. The classifier should be paying attention to the human, not to the laptop.

So I think a lot of interpretability research right now is you take a particular short term problem and figure out how you can make that problem easier to solve. Though a lot of it is also: what would be the best way to understand what our model is doing? So I think a lot of the work that Chris Olah is doing, for example, is in this vein, and then as you do this exploration, you find some sort of bias in the classifiers that you’re studying.
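For a concrete sense of what a very simple interpretability technique can look like (a toy sketch of my own, not Chris Olah’s work), here is occlusion-based attribution in the spirit of the laptop-versus-person example: cover up each part of the input in turn and see how much the prediction changes. The stand-in model and features are invented for the example.

```python
# Occlusion-based attribution on a stand-in classifier (illustrative only).
import numpy as np

def model(x: np.ndarray) -> float:
    # Stand-in for a trained classifier; this fixed linear model happens to
    # rely heavily on feature 2 (say, the "laptop pixels").
    w = np.array([0.1, 0.2, 3.0, 0.1])
    return float(1.0 / (1.0 + np.exp(-x @ w)))

def occlusion_attributions(x: np.ndarray) -> np.ndarray:
    base = model(x)
    scores = np.zeros_like(x)
    for i in range(len(x)):
        occluded = x.copy()
        occluded[i] = 0.0                     # "cover up" this part of the input
        scores[i] = base - model(occluded)    # how much the prediction dropped
    return scores

x = np.array([1.0, 1.0, 1.0, 1.0])
print(occlusion_attributions(x))  # feature 2 dominates: it is "looking at the laptop"
```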

So, Comprehensive AI Services is an attempt to predict what the future of AI development will look like, and the hope is that, by doing this, we can figure out what sort of technical safety things we will need to do. Or, strategically, what sort of things we should push for in the AI research community in order to make those systems safer.

There’s a big difference between, we are going to build a single unified AGI agent and it’s going to be generally intelligent to optimize the world according to a utility function versus we are going to build a bunch of disparate, separate, narrow AI systems that are going to interact with each other quite a lot. And because of that, they will be able to do a wide variety of tasks, none of them are going to look particularly like expected utility maximizers. And the safety research you want to do is different in those two different worlds. And CAIS is basically saying “We’re in the second of those worlds, not the first one.”

Lucas: Can you go ahead and tell us about ambitious value learning?

Rohin: Yeah, so with ambitious value learning, this is also an approach to how do we make an aligned AGI that solves the entire problem, in some sense. Which is: look at not just human behavior, but also human brains and the algorithm that they implement, and use that to infer an adequate utility function, one such that we would be okay with the behavior that results from it.

You infer this utility function, and then plug it into an expected utility maximizer. Now, of course, there are still problems: even once we have the utility function, how do we actually build a system that maximizes that utility function? That is not a solved problem yet. But it does seem to capture some of the main difficulties, if you could actually solve the problem. And so that’s an approach I associate most with Stuart Armstrong.

Lucas: Alright, and so you were saying earlier, in terms of your own view, it’s sort of an amalgamation of different credences that you have in the potential efficacy of all these different approaches. So, given all of these and all of their broad missions, and interests, and assumptions that they’re willing to make, what are you most hopeful about? What are you excited about? How do you, sort of, assign your credence and time here?

Rohin: I think I’m most excited about the concept of corrigibility. That seems like the right thing to aim for, it seems like it’s a thing we can achieve, it seems like if we achieve it, we’re probably okay, nothing’s going to go horribly wrong and it probably will go very well. I am less confident on which approach to corrigibility I am most excited about. Iterated amplification and debate seem like, if we were to implement them, they will probably lead to corrigible behavior. But I am worried that either of those will be … Either we won’t actually be able to build generally intelligent agents, in which case both of those approaches don’t really work. Or another worry that I have is that those approaches might be too expensive to actually do, in that other systems are just so much more computationally efficient that we just use those instead, due to economic pressures.

Paul does not seem to be worried by either of these things. He’s definitely aware of both these issues; in fact, he was the one, I think, who listed computational efficiency as a desideratum, and he still is optimistic about them. So, I would not put a huge amount of credence in this view of mine.

If I were to say what I was excited about for corrigibility instead of that, it would be something like: take the research that we’re currently doing on how to get current AI systems to work, which is often called ‘narrow value learning’. If you take that research, it seems plausible that this research, extended into the future, will give us some method of creating an AI system that’s implicitly learning our narrow values, and is corrigible as a result of that, even if it is not generally intelligent.

This is sort of a very hand wavey speculative intuition, certainly not as concrete as the hope that we have with iterated amplification. But I’m somewhat optimistic about it, and less optimistic about limiting AI systems, it seems like even if you succeed in finding a nice, simple rule that eliminates all catastrophic behaviors, which plausibly you could do, it seems hard to find one that both does that and also lets you do all of the things that you do want to do.

If you’re talking about impact measures, for example, if you require the AI to be low impact, I expect that that would prevent you from doing many things that we actually want to do, because many things that we want to do are actually quite high impact. Now, Alex Turner disagrees with me on this, and he developed attainable utility preservation. He is explicitly working on this problem and disagrees with me, so again I don’t know how much credence to put in this.

I don’t know if Vika agrees with me on this or not, she also might disagree with me, and she is also directly working on this problem. So, yeah, it seems hard to put in a limit that also lets us do the things that we want. And in that case, it seems like, due to economic pressures, we’d end up using the AI systems that don’t have these limits on them.

I want to keep emphasizing my extreme uncertainty over all of this, given that other people disagree with me on this, but that’s my current opinion. Similarly with boxing, it seems like it’s going to just make it very hard to actually use the AI system. Robustness and interpretability: interpretability seems very broadly useful, and I’m supportive of most research on interpretability, maybe with an eye towards long term concerns, just because it seems to make every other approach to AI safety a lot more feasible and easier to solve.

I don’t think it’s a solution by itself, but given that it seems to improve almost every story I have for making an aligned AGI, seems like it’s very much worth getting a better understanding of it. Robustness is an interesting one, it’s not clear to me, if it is actually necessary. I kind of want to just voice lots of uncertainty about robustness and leave it at that. It’s certainly good to do in that it helps us be more confident in our AI systems, but maybe everything would be okay even if we just didn’t do anything. I don’t know, I feel like I would have to think a lot more about this and also see the techniques that we actually used to build AGI in order to have a better opinion on that.

Lucas: Could you give a few examples of where your intuitions here are coming from, that don’t see robustness as an essential part of AI alignment?

Rohin: Well, one major intuition: if you look at humans, there are at least some humans where I’m like, “Okay, I could just make this human a lot smarter, a lot faster, have them think for many, many years, and I still expect that they will be robust and not lead to some catastrophic outcome. They may not do exactly what I would have done, because they’re doing what they want. But they’re probably going to do something reasonable, they’re not going to do something crazy or ridiculous.”

I feel like humans, some humans, the sufficiently risk averse and uncertain ones seem to be reasonably robust. I think that if you know that you’re planning over a very, very, very long time horizon, so imagine that you know you’re planning over billions of years, then the rational response to this is, “I really better make sure not to screw up right now, since there is just so much reward in the future, I really need to make sure that I can get it.” And so you get very strong pressures for preserving option value or not doing anything super crazy. So I think you could, plausibly, just get the reasonable outcomes from those effects. But again, these are not well thought out.

Lucas: All right, and so I just want to go ahead and guide us back to your general views, again, on the approaches. Is there anything that you’d like to add there on the approaches?

Rohin: I think I didn’t talk about CAIS yet. I guess my general view of CAIS: I broadly agree with it, that this does seem to be the most likely development path, meaning that it’s more likely than any other specific development path, but not more likely than all the other development paths combined.

So I broadly agree with the worldview presented, I’m still trying to figure out what implications it has for technical safety research. I don’t agree with all of it, in particular, I think that you are likely to get AGI agents at some point, probably, after the CAIS soup of services happens. Which, I think, again, Drexler disagrees with me on that. So, put a bunch of uncertainty on that, but I broadly agree with that worldview that CAIS is proposing.

Lucas: In terms of this disagreement between you and Eric Drexler, are you imagining agenty AGI or super-intelligence which comes after the CAIS soup? Do you see that as an inevitable byproduct of CAIS, or do you see that as a choice that humanity will make? And is Eric pushing the view that the agenty stuff doesn’t necessarily come later, that it’s a choice that human beings would have to make?

Rohin: I do think it’s more like saying that this will be a choice that humans will make at some point. I’m sure that Eric, to some extent, is saying, “Yeah, just don’t do that.” But I think Eric and I do, in fact, have a disagreement on how much more performance you can get from an AGI agent than a CAIS soup of services. My argument is something like there is efficiency to be gained from going to an AGI agent, and Eric’s position, as best I understand it, is that there is actually just not that much economic incentive to go to an AGI agent.

Lucas: What are your intuition pumps for why you think that you will gain a lot of computational efficiency from creating sort of an AGI agent? We don’t have to go super deep, but I guess a terse summary or something?

Rohin: Sure, I guess the main intuition pump is that in all of the past cases that we have of AI systems, you see that in speech recognition, in deep reinforcement learning, in image classification, we had all of the hand-built systems that separated these out into a few different modules that interacted with each other in a vaguely CAIS-like way. And then, at some point, we got enough compute and large enough data sets that we just threw deep learning at it, and deep learning just blew those approaches out of the water.

So there’s the argument from empirical experience, and there’s also the argument that if you try to modularize your systems yourself, you can’t really optimize the communication between them, you’re less integrated and you can’t make decisions based on global information, you have to make them based off of local information. And so the decisions tend to be a little bit worse. This could be taken as an explanation for the empirical observation that I already made; so that’s another intuition pump there.

Eric’s response would probably be something like, “Sure, this seems true for these narrow tasks.” You can get a lot of efficiency gains by integrating everything together and throwing deep learning and [inaudible 00:54:10] training at all of it. But for sufficiently high level tasks, there’s not really that much to be gained by using global information instead of local information, so you don’t actually lose much by having these separate systems, and you do get a lot of computational efficiency and generalization bonuses by modularizing. He had a good example of this that I’m not going to replicate, and I don’t want to make up my own example, because it’s not going to be as convincing; but that’s his current argument.

And then my counter-argument is that’s because humans have small brains, so given the size of our brains and the limits of our data, and the limits of the compute that we have, we are forced to do modularity and systematization to break tasks apart into modular chunks that we can then do individually. Like if you are running a corporation, you need each person to specialize in their own task without thinking about all the other tasks, because we just do not have the ability to optimize for everything all together because we have small brains, relatively speaking; or limited brains, is what I should say.

But this is not a limit that AI systems will have. An AI system with vastly more compute than the human brain, and vastly more data, will, in fact, just be able to optimize all of this with global information and get better results. So that’s one thread of the argument, taken down two or three levels of arguments and counter-arguments. There are other threads of that debate, as well.

Lucas: I think that that serves a purpose for illustrating that here. So are there any other approaches here that you’d like to cover, or is that it?

Rohin: I didn’t talk about factored cognition very much. But I think it’s worth highlighting separately from iterated amplification in that it’s testing an empirical hypothesis of can humans decompose tasks into chunks of some small amount of time? And can we do arbitrarily complex tasks using these humans? I am particularly excited about this sort of work that’s trying to figure out what humans are capable of doing and what supervision they can give to AI systems.

Mostly because, going back to a thing I said way back in the beginning, what we’re aiming for is for the human-AI system to be collectively rational, as opposed to the AI system being individually rational. Part of the human-AI system is the human: you want to know what the human can do, what sort of policies they can implement, what sort of feedback they can be giving to the AI system. And something like factored cognition is testing a particular aspect of that; and I think that seems great and we need more of it.

Lucas: Right. I think that this seems to be the sort of emerging view of where social science or social scientists are needed in AI alignment: in order to, again as you said, understand what human beings are capable of in terms of supervising AI systems, and to analyze the human component of the AI alignment problem, since it requires us to be collectively rational with AI systems.

Rohin: Yeah, that seems right. I expect more writing on this in the future.

Lucas: All right, so there’s just a ton of approaches here to AI alignment, and our heroic listeners have a lot to take in here. In terms of getting more information, generally, about these approaches or if people are still interested in delving into all these different views that people take at the problem and methodologies of working on it, what would you suggest that interested persons look into or read into?

Rohin: I cannot give you an overview of everything, because that does not exist. To the extent that it exists, it’s either this podcast or the talk that I did at Beneficial AGI. I can suggest resources for individual items. For embedded agency, there’s the embedded agency sequence on the Alignment Forum; far and away the best thing to read for that.

For CAIS, Comprehensive AI Services, there was a 200 plus page tech report published by Eric Drexler at the beginning of this month, if you’re interested, you should go read the entire thing; it is quite good. But I also wrote a summary of it on the Alignment Forum, which is much more readable, in the sense that it’s shorter. And then there are a lot of comments on there that analyze it a bit more.

There’s also another summary written by Richard Ngo, also on the Alignment Forum. Maybe it’s only on Lesswrong, I forget; it’s probably on the Alignment Forum. But that’s a different take on comprehensive AI services, so I’d recommend reading that too.

For limited AGI, I have not really been keeping up with the literature on boxing, so I don’t have a favorite to recommend. I know that a couple have been written by, I believe, Jim Babcock and Roman Yampolskiy.

For impact measures, you want to read Vika’s paper on relative reachability. There’s also a blog post about it if you don’t want to read the paper. And Alex Turner’s blog posts on attainable utility preservation, I think it’s called ‘Towards A New Impact Measure’, and this is on the Alignment Forum.

For robustness, I would read Paul Christiano’s post called ‘Techniques For Optimizing Worst Case Performance’. This is definitely specific to how robustness will help under Paul’s conception of the problem and, in particular, his thinking of robustness in the setting where you have a very strong overseer for your AI system. But I don’t know of any other papers or blog post that’s talking about robustness, generally.

For AI systems that do what we want, there’s my value learning sequence that I mentioned before on the Alignment Forum. There’s CIRL or Cooperative Inverse Reinforcement Learning which is a paper by Dylan and others. There’s Deep Reinforcement Learning From Human Preferences and Recursive Reward Modeling, these are both papers that are particular instances of work in this field. I also want to recommend Inverse Reward Design, because I really like that paper; so that’s also a paper by Dylan, and others.

For corrigibility and iterated amplification, the iterated amplification sequence on the Alignment Forum or half of what Paul Christiano has written. If you want to read not an entire sequence of blog posts, then I think Clarifying AI alignment is probably the post I would recommend. It’s one of the posts in the sequence and talks about this distinction of creating an AI system that is trying to do what you want, as opposed to actually doing what you want and why we might want to aim for only the first one.

For iterated amplification, itself, that technique, there is a paper that I believe is called something like Supervising Strong Learners By Amplifying Weak Experts, which is a good thing to read and there’s also corresponding OpenAI blog posts, whose name I forget. I think if you search iterated amplification, OpenAI blog you’ll find it.

And then for debate, there’s AI Safety via Debate, which is a paper; there’s also a corresponding OpenAI blog post. For factored cognition, there’s a post called Factored Cognition, on the Alignment Forum; again, in the iterated amplification sequence.

For interpretability, there isn’t really anything talking about interpretability from the strategic point of view of why we want it. I guess that same post I recommended before, Techniques For Optimizing Worst Case Performance, talks about it a little bit. For actual interpretability techniques, I recommend the Distill articles, The Building Blocks of Interpretability and Feature Visualization, but these are more about particular techniques for interpretability, as opposed to why we want interpretability.

And on ambitious value learning, the first chapter of my sequence on value learning talks exclusively about ambitious value learning; so that’s one thing I’d recommend. But also Stuart Armstrong has so many posts, I think there’s one that’s about resolving human values adequately and something else, something like that. That one might be one worth checking out, it’s very technical though; lots of math.

He’s also written a bunch of posts that convey the intuitions behind the ideas. They’re all split into a bunch of very short posts, so I can’t really recommend any one particular one. You could go to the alignment newsletter database and just search Stuart Armstrong, and click on all of those posts and read them. I think that was everything.

Lucas: That’s a wonderful list. So we’ll go ahead and link all of those in the article which goes along with this podcast, so that’ll all be there, organized in nice, neat lists for people. This has all probably been fairly overwhelming in terms of the number of approaches, how they differ, and how one is to adjudicate the merits of all of them. If someone is just entering the space of AI alignment, or is beginning to be interested in these different technical approaches, do you have any recommendations?

Rohin: Reading a lot, rather than trying to do actual research. This was my strategy, I started back in September of 2017 and I think for the first six months or so, I was reading about 20 hours a week, in addition to doing research; which was why it was only 20 hours a week, it wasn’t a full time thing I was doing.

And I think that was very helpful for actually forming a picture of what everyone was doing. Now, it’s plausible that you don’t want to actually learn about what everyone is doing, and you’re okay with like, “I’m fairly confident that this thing, this particular problem, is an important piece of the problem and we need to solve it.” And I think it’s very easy to get that wrong, so I’m a little wary of recommending that, but it’s a reasonable strategy to just say, “Okay, we probably will need to solve this problem, but even if we don’t, the intuitions that we get from trying to solve this problem will be useful.”

Focusing on that particular problem, reading all of the literature on that, attacking that problem, in particular, lets you start doing things faster, while still doing things that are probably going to be useful; so that’s another strategy that people could do. But I don’t think it’s very good for orienting yourself in the field of AI safety.

Lucas: So you think that there’s high value in people taking this time to read and to understand all the papers and approaches before trying to participate in particular research questions or methodologies, given how open this question is. All the approaches make different assumptions and take for granted different axioms, which come together to create a wide variety of things that can both complement each other and have varying degrees of efficacy in the real world once AI systems start to become more developed and advanced.

Rohin: Yeah, that seems right to me. Part of the reason I’m recommending this is because it seems to me that no one does this. I think, on the margin, I want more people doing this. A world where 20 percent of the people were doing this, and the other 80 percent were just taking a particular piece of the problem and working on it, might be about the right balance, somewhere around there, I don’t know; it depends on how you count who is actually in the field. But right now somewhere between one and 10 percent of the people are doing this; closer to the one.

Lucas: Which is quite interesting, I think, given that it seems like AI alignment should be in a stage of maximum exploration, given that conceptually mapping the territory is still very young. I mean, we’re essentially seeing the birth and initial development of an entirely new field and a specific application of thinking. And there are many more mistakes to be made, and concepts to be clarified, and layers to be built. So it seems like we should be maximizing our attention on exploring the general space, trying to develop models of the efficacy of different approaches and philosophies and views of AI alignment.

Rohin: Yeah, I agree with you, that should not be surprising given that I am one of the people doing this, or trying to do this. Probably the better critique will come from people who are not doing this, and can tell both of us why we’re wrong about this.

Lucas: We’ve covered a lot here in terms of the specific approaches, your thoughts on the approaches, where we can find resources on the approaches, why setting the approaches matters. Are there any parts of the approaches that you feel deserve more attention in terms of these different sections that we’ve covered?

Rohin: I think I would want more work on looking at the intersections between things that are supposed to be complementary. How interpretability can help you have AI systems that have the right goals, for example, would be a cool thing to do. Or what you need to do in order to get verification, which is a sub-part of robustness, to give you interesting guarantees on AI systems that we actually care about.

Most of the work on verification right now starts from the nice specification we have for adversarial examples in particular: is there an input within some small distance of a training data point such that it gets classified differently from that training data point? That’s a nice formal specification, and most of the work in verification takes this specification as given and figures out more and more computationally efficient ways to actually verify that property.
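To make that specification concrete, here is a minimal sketch in Python (not from the conversation; the `model.predict` interface, the sampling budget, and the brute-force search are all illustrative stand-ins for a real verifier):

```python
import numpy as np

def find_adversarial_counterexample(model, x, epsilon, num_samples=10_000):
    """Adversarial-example specification: does there exist an x' with
    ||x' - x||_inf <= epsilon that the model classifies differently from x?
    A verifier tries to prove no such x' exists; this crude random search
    only tries to exhibit one, as an attacker or falsifier would.
    `model.predict` is a placeholder for any classifier interface.
    """
    original_label = model.predict(x)
    for _ in range(num_samples):
        # Sample a perturbation inside the epsilon-ball (infinity norm).
        x_prime = x + np.random.uniform(-epsilon, epsilon, size=x.shape)
        if model.predict(x_prime) != original_label:
            return x_prime  # specification violated: counterexample found
    return None  # none found (not a proof of robustness, unlike real verification)
```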

That does seem like a thing that needs to happen, but the much more urgent thing, in my mind, is how do we come up with these specifications in the first place? If I want to verify that my AI system is corrigible, or I want to verify that it’s not going to do anything catastrophic, or that it is going to not disable my value learning system, or something like that; how do I specify this at all in any way that lets me do something like a verification technique even given infinite computing power? It’s not clear to me how you would do something like that, and I would love to see people do more research on that.

That particular thing is my current reason for not being very optimistic about verification, in particular, but I don’t think anyone has really given it a try. So it’s plausible that there’s actually just some approach that could work that we just haven’t found yet because no one’s really been trying. I think all of the work on limited AGI is talking about, okay, does this actually eliminate all of the catastrophic behavior? Which, yeah, that’s definitely an important thing, but I wish that people would also do research on, given that we put this penalty or this limit on the AGI system, what things is it still capable of doing?

Have we just made it impossible for it to do anything of interest whatsoever, or can it actually still do pretty powerful things, even though we’ve placed these limits on it? That’s the main thing I want to see. Then, for AI systems that do what we want, probably the biggest thing I want to see there, and I’ve been trying to do some of this myself, is conceptual thinking about how this leads to good outcomes in the long term. So far, we’ve not been dealing with the fact that the human doesn’t actually have a nice consistent utility function that they know and that can be optimized. So, once you relax that assumption, what the hell do you do? And then there’s also a bunch of other problems that would benefit from more conceptual clarification; maybe I don’t need to go into all of them right now.

Lucas: Yeah. And just to sort of inject something here that I think we haven’t touched on and that you might have some words about in terms of approaches. We discussed sort of agential views of advanced artificial intelligence, a services-based conception, though I don’t believe that we have talked about aligning AI systems that simply function as oracles or having a concert of oracles. You can get rid of the services thing, and the agency thing if the AI just tells you what is true, or answers your questions in a way that is value aligned.

Rohin: Yeah, I mostly want to punt on that question because I have not actually read all the papers. I might have read a grand total of one paper on oracles, and also Superintelligence, which talks about oracles. So I feel like I know so little about the state of the art on oracles that I should not actually say anything about them.

Lucas: Sure. So then just as a broad point for our audience: in terms of conceptualizing these different approaches to AI alignment, it’s important and crucial to consider the kind of AI system that you’re thinking about and the kinds of features and properties that it has, and oracles are another version here that one can play with in one’s AI alignment thinking.

Rohin: I think the canonical paper there is something like Good and Safe Uses of AI Oracles, but I have not actually read it. There is a list of things I want to read, and it is on that list. But that list also has, I think, something like 300 papers on it, and apparently I have not gotten to oracles yet.

Lucas: And so, for the sake of this whole podcast being as comprehensive as possible, are there any conceptions of AI that we have omitted so far, adding on to the agential view, the CAIS view of it actually just being a lot of distributed services, or the oracle view?

Rohin: There’s also the Tool AI view. This is different from the services view, but it’s somewhat akin to the view you were talking about at the beginning of this podcast, where you’ve got AI systems that have a narrowly defined input/output space, they’ve got a particular, limited thing that they do, and they just sort of take in their inputs, do some computation, and spit out their outputs, and that’s it, that’s all that they do. You can’t really model them as having some long-term utility function that they’re optimizing; they’re just implementing a particular input-output relation and that’s all they’re trying to do.

Even saying something like, “They are trying to do X,” is basically using a bad model for them. I think the main argument against expecting tool AI systems is that they’re probably not going to be as useful as services or agential AI, because tool AI systems would have to be programmed in a way where we understood what they were doing and why they were doing it. Whereas agential AI systems or services would be able to consider new possible ways of achieving goals that we hadn’t thought about and enact those plans.

And so they could get superhuman behavior by considering things that we wouldn’t consider. Whereas tool AIs … Like Google Maps is superhuman in some sense, but it’s superhuman only because it has a compute advantage over us. If we were given all of the data and all of the time, in human real time, that Google Maps had, we could implement a similar sort of algorithm as Google Maps and compute the optimal route ourselves.

Lucas: There seems to be this duality that is constantly being formed in our conception of AI alignment, where the AI system is this tangible external object which stands in some relationship to the human and is trying to help the human to achieve certain things.

Are there conceptions of value alignment which, however the procedure or methodology is done, change or challenge the relationship between the AI system and the human, where it challenges what it means to be the AI or what it means to be human, and where there’s potentially some sort of merging or disruption of this dualistic scenario of the relationship?

Rohin: I don’t really know, I mean, it sounds like you’re talking about things like brain computer interfaces and stuff like that. I don’t really know of any intersection between AI safety research and that. I guess, this did remind me, too, that I want to make the point that all of this is about the relatively narrow, I claim, problem of aligning an AI system with a single human.

There is also the problem of, okay what if there are multiple humans, what if there are multiple AI systems, what if you’ve got a bunch of different groups of people and each group is value aligned within themselves, they build an AI that’s value aligned with them, but lots of different groups do this now what happens?

Solving the problem that I’ve been talking about does not mean that you have a good outcome in the long-term future; it is merely one piece of a larger overall picture. I don’t think any of that larger overall picture removes the dualistic thing that you were talking about, but the dualistic part reminded me of the fact that I am talking about a narrow problem and not the whole problem, in some sense.

Lucas: Right and so just to offer some conceptual clarification here, again, the first problem is how do I get an AI system to do what I want it to do when the world is just me and that AI system?

Rohin: Me and that AI system and the rest of humanity, but the rest of humanity is treated as part of the environment.

Lucas: Right, so you’re not modeling other AI systems or how some mutually incompatible preferences and trained systems would interact in the world or something like that?

Rohin: Exactly.

Lucas: So the full AI alignment problem is… It’s funny because it’s just the question of civilization, I guess. How do you get the whole world and all of the AI systems to make a beautiful world instead of a bad world?

Rohin: Yeah, I’m not sure if you saw my lightning talk at Beneficial AGI, but I talked a bit about that. I think I called that top-level problem “make AI-related future stuff go well,” which is very, very, very concrete, obviously.

Lucas: It makes sense. People know what you’re talking about.

Rohin: I probably wouldn’t call that broad problem the AI alignment problem. I kind of want a different name for it, and to reserve “alignment” for the narrower problem. We could maybe call it the ‘AI Safety Problem’ or the ‘AI Future Problem’, I don’t know. The ‘Beneficial AI’ problem, actually, I think that’s what I used last time.

Lucas: That’s a nice way to put it. So I think that, conceptually, leaves us at a very good place to end this first section.

Rohin: Yeah, seems pretty good to me.

Lucas: If you found this podcast interesting or useful, please make sure to check back for part two in a couple weeks where Rohin and I go into more detail about the strengths and weaknesses of specific approaches.

We’ll be back again soon with another episode in the AI Alignment podcast.

[end of recorded material]

FLI Podcast: Why Ban Lethal Autonomous Weapons?

Why are we so concerned about lethal autonomous weapons? Ariel spoke to four experts –– one physician, one lawyer, and two human rights specialists –– all of whom offered their most powerful arguments on why the world needs to ensure that algorithms are never allowed to make the decision to take a life. The episode was even recorded at the United Nations Convention on Conventional Weapons, where a ban on lethal autonomous weapons was under discussion.

Dr. Emilia Javorsky is a physician, scientist, and Founder of Scientists Against Inhumane Weapons; Bonnie Docherty is Associate Director of Armed Conflict and Civilian Protection at Harvard Law School’s Human Rights Clinic and Senior Researcher at Human Rights Watch; Ray Acheson is Director of The Disarmament Program of the Women’s International League for Peace and Freedom; and Rasha Abdul Rahim is Deputy Director of Amnesty Tech at Amnesty International.

Topics discussed in this episode include:

  • The role of the medical community in banning other WMDs
  • The importance of banning LAWS before they’re developed
  • Potential human bias in LAWS
  • Potential police use of LAWS against civilians
  • International humanitarian law and the law of war
  • Meaningful human control

Once you’ve listened to the podcast, we want to know what you think: What is the most convincing reason in favor of a ban on lethal autonomous weapons? We’ve listed quite a few arguments in favor of a ban, in no particular order, for you to consider:

  • If the AI community can’t even agree that algorithms should not be allowed to make the decisions to take a human life, then how can we find consensus on any of the other sticky ethical issues that AI raises?
  • If development of lethal AI weapons continues, then we will soon find ourselves in the midst of an AI arms race, which will lead to cheaper, deadlier, and more ubiquitous weapons. It’s much harder to ensure safety and legal standards in the middle of an arms race.
  • These weapons will be mass-produced, hacked, and fall onto the black market, where anyone will be able to access them.
  • These weapons will be easier to develop, access, and use, which could lead to a rise in destabilizing assassinations, ethnic cleansing, and greater global insecurity.
  • Taking humans further out of the loop will lower the barrier for entering into war.
  • Greater autonomy increases the likelihood that the weapons will be hacked, making it more difficult for military commanders to ensure control over their weapons.
  • Because of the low cost, these will be easy to mass-produce and stockpile, making AI weapons the newest form of Weapons of Mass Destruction.
  • Algorithms can target specific groups based on sensor data such as perceived age, gender, ethnicity, facial features, dress code, or even place of residence or worship.
  • Algorithms lack human morality and empathy, and therefore they cannot make humane context-based kill/don’t kill decisions.
  • By taking the human out of the loop, we fundamentally dehumanize warfare and obscure who is ultimately responsible and accountable for lethal force.
  • Many argue that these weapons are in violation of the Geneva Convention, the Martens Clause, the International Covenant on Civil and Political Rights, etc. Given the disagreements about whether lethal autonomous weapons are covered by these pre-existing laws, a new ban would help clarify what are acceptable uses of AI with respect to lethal decisions — especially for the military — and what aren’t.
  • It’s unclear who, if anyone, could be held accountable and/or responsible if a lethal autonomous weapon causes unnecessary and/or unexpected harm.
  • Significant technical challenges exist which most researchers anticipate will take quite a while to solve, including: how to program reasoning and judgement with respect to international humanitarian law, how to distinguish between civilians and combatants, how to understand and respond to complex and unanticipated situations on the battlefield, how to verify and validate lethal autonomous weapons, how to understand external political context in chaotic battlefield situations.
  • Once the weapons are released, contact with them may become difficult if people learn that there’s been a mistake.
  • By their very nature, we can expect that lethal autonomous weapons will behave unpredictably, at least in some circumstances.
  • They will likely be more error-prone than conventional weapons.
  • They will likely exacerbate current human biases putting innocent civilians at greater risk of being accidentally targeted.
  • Current psychological research suggests that keeping a “human in the loop” may not be as effective as many hope, given human tendencies to be over-reliant on machines, especially in emergency situations.
  • In addition to military uses, lethal autonomous weapons will likely be used for policing and border control, again putting innocent civilians at greater risk of being targeted.

So which of these arguments resonates most with you? Or do you have other reasons for feeling concern about lethal autonomous weapons? We want to know what you think! Please leave a response in the comments section below.


For more information, visit autonomousweapons.org.

AI Alignment Podcast: AI Alignment through Debate with Geoffrey Irving

“To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences. One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to directly judge. To help address this concern, we propose training agents via self play on a zero sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information…  In practice, whether debate works involves empirical questions about humans and the tasks we want AIs to perform, plus theoretical questions about the meaning of AI alignment.” AI safety via debate

Debate is something that we are all familiar with. Usually it involves two or more persons giving arguments and counter arguments over some question in order to prove a conclusion. At OpenAI, debate is being explored as an AI alignment methodology for reward learning (learning what humans want) and is a part of their scalability efforts (how to train/evolve systems to safely solve questions of increasing complexity). Debate might sometimes seem like a fruitless process, but when optimized and framed as a two-player zero-sum perfect-information game, we can see properties of debate and synergies with machine learning that may make it a powerful truth seeking process on the path to beneficial AGI.

On today’s episode, we are joined by Geoffrey Irving. Geoffrey is a member of the AI safety team at OpenAI. He has a PhD in computer science from Stanford University, and has worked at Google Brain on neural network theorem proving, cofounded Eddy Systems to autocorrect code as you type, and has worked on computational physics and geometry at Otherlab, D. E. Shaw Research, Pixar, and Weta Digital. He has screen credits on Tintin, Wall-E, Up, and Ratatouille. 

We hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, iTunes, Google Play, Stitcher, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

Topics discussed in this episode include:

  • What debate is and how it works
  • Experiments on debate in both machine learning and social science
  • Optimism and pessimism about debate
  • What amplification is and how it fits in
  • How Geoffrey took inspiration from amplification and AlphaGo
  • The importance of interpretability in debate
  • How debate works for normative questions
  • Why AI safety needs social scientists
You can find out more about Geoffrey Irving at his website. Here you can find the debate game mentioned in the podcast. Here you can find Geoffrey Irving, Paul Christiano, and Dario Amodei’s paper on debate. Here you can find an OpenAI blog post on AI Safety via Debate. You can listen to the podcast above or read the transcript below.

Lucas: Hey, everyone. Welcome back to the AI Alignment Podcast. I’m Lucas Perry, and today we’ll be speaking with Geoffrey Irving about AI safety via Debate. We discuss how debate fits in with the general research directions of OpenAI, what amplification is and how it fits in, and the relation of all this with AI alignment. As always, if you find this podcast interesting or useful, please give it a like and share it with someone who might find it valuable.

Geoffrey Irving is a member of the AI safety team at OpenAI. He has a PhD in computer science from Stanford University, and has worked at Google Brain on neural network theorem proving, cofounded Eddy Systems to autocorrect code as you type, and has worked on computational physics and geometry at Otherlab, D. E. Shaw Research, Pixar, and Weta Digital. He has screen credits on Tintin, Wall-E, Up, and Ratatouille. Without further ado, I give you Geoffrey Irving.

Thanks again, Geoffrey, for coming on the podcast. It’s really a pleasure to have you here.

Geoffrey: Thank you very much, Lucas.

Lucas: We’re here today to discuss your work on debate. I think that just to start off, it’d be interesting if you could provide for us a bit of framing for debate, and how debate exists at OpenAI, in the context of OpenAI’s general current research agenda and directions that OpenAI is moving right now.

Geoffrey: I think broadly, we’re trying to accomplish AI safety by reward learning, so learning a model of what humans want and then trying to optimize agents that achieve that model, so do well according to that model. There’s sort of three parts to learning what humans want. One part is just a bunch of machine learning mechanics of how to learn from small sample sizes, how to ask basic questions, how to deal with data quality. There’s a lot more work, then, on the human side, so how do humans respond to the questions we want to ask, and how do we sort of best ask the questions?

Then, there’s sort of a third category of how do you make these systems work even if the agents are very strong? So stronger than human in some or all areas. That’s sort of the scalability aspect. Debate is one of our techniques for doing scalability, Amplification being the first one and Debate being a version of that. Generally, we want to be able to supervise a learning agent even if it is smarter than a human or stronger than a human on some task or on many tasks.

Debate is you train two agents to play a game. The game is that these two agents see a question on some subject, they give their answers. Each debater has their own answer, and then they have a debate about which answer is better, which means more true and more useful, and then a human sees that debate transcript and judges who wins based on who they think told the most useful true thing. The result of the game is, one, who won the debate, and two, the answer of the person who won the debate.

You can also have variants where the judge interacts during the debate. We can get into these details. The general point is that, in many tasks, it is much easier to recognize good answers than it is to come up with the answers yourself. This applies at several levels.

For example, at the first level, you might have a task where a human can’t do the task, but they can know immediately if they see a good answer to the task. Like, I’m bad at gymnastics, but if I see someone do a flip very gracefully, then I can know, at least to some level of confidence, that they’ve done a good job. There are other tasks where you can’t directly recognize the answer, so you might see an answer, it looks plausible, say, “Oh, that looks like a great answer,” but there’s some hidden flaw. If an agent were to point out that flaw to you, you’d then think, “Oh, that’s actually a bad answer.” Maybe it was misleading, maybe it was just wrong. You need two agents doing a back and forth to be able to get at the truth.

Then, if you apply this recursively through several levels, you might have tasks where you can’t recognize whether an answer is good directly. You can’t even recognize whether a counterargument is good. Maybe a counter-counterargument, then you could recognize it. If you do sort of three steps of Debate, back and forth with two agents, you get to the truth.

Depending on the task, you need some number of these steps to pin down what the true answer is. Broadly, what’s happening is that the debaters, when they’re giving these counterarguments, are skipping over a lot of the space where maybe the human doesn’t understand what would happen, but they know that it would go a certain way. Like, if you see two expert players playing a game of Go, they’re constantly ignoring a large number of moves which are just very bad and would waste a lot of time if you tried to analyze them, and they’re just looking at the most important part of the arguments about what is the best move in the game.
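As a rough sketch of the game Geoffrey has just described (the agent and judge interfaces here are hypothetical, not OpenAI’s actual implementation), one episode might look like this:

```python
def play_debate(question, alice, bob, judge, num_rounds=6):
    """One episode of the debate game: two agents commit to answers,
    exchange short statements, and a judge picks the winner.
    `alice`, `bob`, and `judge` are placeholders for trained models
    (or humans); each debater sees the full transcript so far.
    """
    transcript = [("question", question)]
    answers = {"alice": alice.propose_answer(question),
               "bob": bob.propose_answer(question)}
    transcript.append(("answers", answers))

    for _ in range(num_rounds):
        transcript.append(("alice", alice.make_statement(transcript)))
        transcript.append(("bob", bob.make_statement(transcript)))

    winner = judge.pick_winner(transcript)      # "alice" or "bob"
    return winner, answers[winner]              # the game's result and its answer
```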

All of the experiments so far are on simple image tasks, like MNIST. MNIST is a dataset of digits. There are then also some preliminary human experiments with just human debaters. On MNIST, an example is you have a sparse classifier that learns to recognize MNIST digits from just seeing a few pixels, so you might train it by picking random masks of just a few pixels of an MNIST image. This classifier is very bad, because it only sees a small part of the image. It might get accuracies below 50%; I forget exactly the numbers, I could look them up.

Then, you train debaters that choose which parts of the image to show to the judge. You fix the judge at this kind of weak training process, and then the debaters will play a game where they try to pick pixels that convince the judge of what they want to argue for. There’s kind of two ways to decide … In the first paper, we had one debater that was incentivized to tell the truth, so it knew the correct class. Like say, we’re looking at a seven, and the other debater had to pre-specify which digit it was going to lie for. It might say, “I’m going to lie and say things were an eight.”

The game would be each of these agents pick a single pixel, and you sort of iterate for, let’s say six rounds, picking one pixel after another, and then the judge sees those pixels, and chooses who wins the debate. The judge has seen kind of a small part of the evidence. The result of that is that you can boost a weak classifier up to sort of higher performance, even if the agents are incentivized only to win the game. In this case, if you pick one debater to be the honest player and one debater to be the liar, the honest player wins a significant majority of the time, much more than the classifier would win on random play.

Lucas: The judge here, sorry, was not a human?

Geoffrey: The judge here is not a human. It’s just a classifier trained to do a bad job at MNIST, because it sees only a little bit of information. It’s trained to convergence, but its input is just a few pixels of an image.

Lucas: Then, so the pixels that are then revealed by the debaters, each pixel is sort of the argument.

Geoffrey: Each pixel is the argument.
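A minimal sketch of that MNIST setup, following the description above (the six-round length and the weak pixel-level judge come from the conversation; the debater and judge interfaces are made up for illustration):

```python
import numpy as np

def mnist_pixel_debate(image, true_label, honest, liar, judge, rounds=6):
    """Pixel-revealing debate on one MNIST image. The judge is a weak
    classifier trained to guess the digit from only a few revealed pixels.
    The honest debater argues for the true label; the liar pre-commits to
    some other digit. Each round both debaters reveal one pixel each, then
    the judge classifies from the revealed set and picks a winner.
    """
    lie_label = liar.choose_lie(true_label)       # e.g. claim "8" for a "7"
    mask = np.zeros_like(image)                   # which pixels are revealed

    for _ in range(rounds):
        for player, claim in ((honest, true_label), (liar, lie_label)):
            row, col = player.pick_pixel(image, mask, claim)
            mask[row, col] = 1                    # reveal that pixel

    judged_label = judge.classify(image * mask, mask)
    return "honest" if judged_label == true_label else "liar"
```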

Lucas: Just to provide a little bit more framing: there’s this feature of debate where you can apply it to a surprisingly large domain of things, if you expand the notion of what it means to debate to showing pixels or something like this.

Geoffrey: It’s actually more important to debate in natural language. The end goal here is we want to extract a strengthened, kind of improved version of human performance at a task. The way we go about this, either in amplification or in debate, is we sort of factor through reasoning. Instead of trying to train directly on the task (you might have some questions and some answers, and you could train directly on question/answer pairs), we’re going to build a task which includes all possible human reasoning in the form of, in this case, debates, and then we’ll train the agents to do well in this space of reasoning, and then we’ll pick out the answers at the very end, once we’re satisfied that the reasoning all works out.

Because the way humans talk about higher-level concepts, especially abstract concepts and, say, subtle moral concepts, is natural language, the most important domain here, in the human case, is natural language. What we’ve done so far, in all experiments for Debate, is in image space, because it’s easier. We’re trying now to move that work into natural language so that we can get more interesting settings.

Lucas: Right. In terms of natural language, do you just want to unpack a little bit about how that would be done at this point in natural language? It seems like our natural language technology is not at a point where I really see robust natural language debates.

Geoffrey: There’s sort of two ways to go. One way is human debates: you just replace the ML agents with human debaters and then a human judge, and you see whether the system works in kind of an all-human context. The other way is that machine learning on natural language is getting good enough to do interestingly well on sample question/answer datasets, and Debate is already interesting if you do a very small number of steps. In the general debate, you sort of imagine that you have this long transcript, dozens of statements long, with points and counterpoints and counter-counterpoints, but if you already do just two steps, you might do question, answer, and then a single counterargument. For some tasks, at least in theory, it already should be stronger than the baseline of just doing direct question/answer, because you have this ability to focus in on a counterargument that is important.

An example might be you see a question and an answer and then another debater just says, “Which part of the answer is problematic?” They might point to a word or to a small phrase, and say, “This is the point you should sort of focus in on.” If you learn how to self critique, then you can boost the performance by iterating once you know how to self critique.

The hope is that even if we can’t do general debates on the machine learning side just yet, we can do shallow debates, or some sort of simple first step in this direction, and then work up over time.

Lucas: This just seems to be a very fundamental part of AI alignment where you’re just breaking things down into very simple problems and then trying to succeed in those simple cases.

Geoffrey: That’s right.

Lucas: Just to provide a little bit more illustration of debate as a general concept, and what it means in the context of AI alignment. I mean, there are open questions here, obviously, about the efficacy of debate, and how debate exists as a tool within the space of epistemological methods that allow us to arrive at truth and, I guess, infer other people’s preferences. So, again, in terms of reward learning, and AI alignment, and debate’s place in all of this, just contextualize, I guess, its role in AI alignment more broadly.

Geoffrey: It’s focusing, again, on the scalability aspect. One way to formulate that is we have this sort of notion of, either from a philosophy side, reflective equilibrium, or kind of from the AI alignment literature, coherent extrapolated volition, which is sort of what a human would do if we had thought very carefully for a very long time about a question, and sort of considered all the possible nuances, and counterarguments, and so on, and kind of reached the conclusion that is sort of free of inconsistencies.

Then, we’d like to take this kind of vague notion of, what happens when a human thinks for a very long time, and compress it into something we can use as an algorithm in a machine learning context. It’s also a definition. This vague notion of, let a human think for a very long time, that’s sort of a definition, but it’s kind of a strange one. A single human can’t think for a super long time. We don’t have access to that at all. You sort of need a definition that is more factored, where either a bunch of humans think for a long time, we sort of break up tasks, or you sort of consider only parts of the argument space at a time, or something.

You go from there to things that are both definitions of what it means to simulate thinking for a long time and also algorithms. The first one of these is Amplification from Paul Christiano, and there you have some questions, and you can’t answer them directly, but you know how to break a question up into subquestions that are hopefully somewhat simpler, and then you sort of recursively answer those subquestions, possibly breaking them down further. You get this big tree of all possible questions that descend from your outer question. You just sort of imagine that you’re simulating over that whole tree, and you come up with an answer, and then that’s the final answer for your question.
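A toy sketch of that recursive decomposition (the `decomposer` and `model` interfaces are hypothetical stand-ins; real amplification also involves training the model on the decomposed tree rather than just evaluating it):

```python
def amplify(question, decomposer, model, depth):
    """Answer a question by recursively breaking it into subquestions.
    `decomposer` plays the human's role: it either declares a question
    simple enough to hand to `model.answer` directly, or splits it into
    subquestions plus a function that recombines their answers.
    """
    if depth == 0 or decomposer.is_simple(question):
        return model.answer(question)

    subquestions, recombine = decomposer.split(question)
    sub_answers = [amplify(q, decomposer, model, depth - 1) for q in subquestions]
    return recombine(sub_answers)   # combine sub-answers into the final answer
```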

Similarly, Debate is a variant of that, in the sense that you have this kind of tree of all possible arguments, and you’re going to try to simulate somehow what would happen if you considered all possible arguments, and picked out the most important ones, and summarized that into an answer for your question.

The broad goal here is to give a practical definition of what it means to take human input and push it to its conclusion, and then hopefully we have a definition that also works as an algorithm, where we can do practical ML training to train machine learning models.

Lucas: Right, so there’s, I guess, two thoughts that I sort of have here. The first one is that there is just sort of this fundamental question of what is AI alignment? It seems like in your writing, and in the writing of others at OpenAI, it’s to get AI to do what we want them to do. What we want them to do is … either it’s what we want them to do right now, or what we would want to do under reflective equilibrium, or at least we want to sort of get to reflective equilibrium. As you said, it seems like a way of doing that is compressing human thinking, or doing it much faster somehow.

Geoffrey: One way to say it is we want to do what humans would want if they understood all of the consequences. It’s some kind of “do what humans want,” plus a side condition of “imagine we knew everything we needed to know to evaluate the question.”

Lucas: How does Debate scale to that level of compressing-

Geoffrey: One thing we should say is that everything here is sort of a limiting state or a goal, but not something we’re going to reach. It’s more important that we have closure under the relevant things we might not have thought about. Here are some practical examples from kind of nearer-term misalignment. There’s an experiment in social science where they sent out a bunch of resumes in response to classified job ads, and the resumes were paired off into pairs that were identical except that the name of the person was either white-sounding or black-sounding, and the result was that you got significantly higher callback rates if the person sounded white, even with an entirely identical resume to the person sounding black.

Here’s a situation where direct human judgment is bad in a way that we can clearly see. You could imagine trying to push that into the task by having an agent say, “Okay, here is a resume. We’d like you to judge it,” either pointing explicitly to what they should judge, or pointing out, “You might be biased here. Try to ignore the name on the resume, and focus on this issue, say their education or their experience.” You sort of hope that if you have a mechanism for surfacing concerns or surfacing counterarguments, you can get to a stronger version of human decision making. There’s no need to wait for some long-term very strong agent case for this to be relevant, because we’re already pretty bad at making decisions in simple ways.

Then, broadly, I sort of have this sense that there’s not going to be magic in decision making. If I go to some very smart person, and they have a better idea for how to make a decision, or how to answer a question, I expect there to be some way they could explain their reasoning to me. I don’t expect I just have to take them on faith. We want to build methods that surface the reasons they might have to come to a conclusion.

Now, it may be very difficult for them to explain the process for how they came to those arguments. There’s some question about whether the arguments they’re going to make is the same as the reasons they’re giving the answers. Maybe they’re sort of rationalizing and so on. You’d hope that once you sort of surface all the arguments around the question that could be relevant, you get a better answer than if you just ask people directly.

Lucas: As we move out of debate in simple cases of image classifiers or experiments in similar environments, what does debate look like … I don’t really understand the ways in which the algorithms can be trained to elucidate all of these counterconcerns, and all of these different arguments, in order to help human beings arrive at the truth.

Geoffrey: One case we’re considering, especially on kind of the human experiment side, or doing debates with humans, is some sort of domain expert debate. The two debaters are maybe an expert in some field, and they have a bunch of knowledge, which is not accessible to the judge, which is maybe a reasonably competent human, but doesn’t know the details of some domain. For example, we did a debate where there were two people that knew computer science and quantum computing debating a question about quantum computing to a person who has some background, but nothing in that field.

The idea is you start out, there’s a question. Here, the question was, “Is the complexity class BQP equal to NP, or does it contain NP?” One point is that you don’t have to know what those terms mean for that to be a question you might want to answer, say in the course of some other goal. The first steps, things the debaters might say, is they might give short, intuitive definitions for these concepts and make their claims about what the answer is. You might say, “NP is the class of problems where we can verify solutions once we’ve found them, and BQP is the class of things that can run on a quantum computer.”

Now, you could have a debater that just straight up lies right away and says, “Well, actually NP is the class of things that can run on fast randomized computers.” That’s just wrong, and so what would happen then is that the counter debater would just immediately point to Wikipedia and say, “Well, that isn’t the definition of this class.” The judge can look that up, they can read the definition, and realize that one of the debaters has lied, and the debate is over.

You can’t immediately lie in kind of a simple way or you’ll be caught out too fast and lose the game. You have to sort of tell the truth, except maybe you kind of slightly veer towards lying. This is if you want to lie in your argument. At every step, if you’re an honest debater, you can try to pin the liar down to making sort of concrete statements. In this case, if say someone claims that quantum computers can solve all of NP, you might say, “Well, you must point me to an algorithm that does that.” The debater that’s trying to lie and say that quantum computers can solve all of NP might say, “Well, I don’t know what the algorithm is, but meh, maybe there’s an algorithm,” and then they’re probably going to lose, then.

Maybe they have to point to a specific algorithm. There is no algorithm, so they have to make one up. That will be a lie, but maybe it’s kind of a subtle complicated lie. Then, you could kind of dig into the details of that, and maybe you can reduce the fact that that algorithm is a lie to some kind of simple algebra, which either the human can check, maybe they can ask Mathematica or something. The idea is you take a complicated question that’s maybe very broad and covers a lot of the knowledge that the judge doesn’t know and you try to focus in closer and closer on details of arguments that the judge can check.

What the judge needs to be able to do is kind of follow along in the steps until they reach the end, and then there’s some ground fact that they can just look up or check and see who wins.

Lucas: I see. Yeah, that’s interesting. A brief passing thought is thinking about double cruxes and some tools and methods that CFAR employs, and how they might be interesting or useful in debate. I think I also want to provide some more clarification here, beyond debate being a truth-seeking process or a method by which we’re able to see which agent is being truthful and which agent is lying. Again, there’s this claim in your paper that seems central to this, where you say, “In the debate game, it is harder to lie than to refute a lie.” This asymmetry in debate between the liar and the truth-teller should hopefully, in general, bias the game towards people more easily seeing who is telling the truth.

Geoffrey: Yep.

Lucas: In terms of AI alignment again, in the examples that you’ve provided, it seems to help human beings arrive at truth for complex questions that are above their current level of understanding. How does this, again, relate directly to reward learning or value learning?

Geoffrey: Let’s assume that in this debate game, it is the case that it’s very hard to lie, so the winning move is to say the truth. What we want to do then is train kind of two systems. One system will be able to reproduce human judgment. That system would be able to look at the debate transcript and predict what the human would say is the correct winner of the debate. Once you get that system trained (so you’re learning not direct reward, but, again, some notion of predicting how humans judge reasoning), then you can train an agent to play this game.

Then, we have a zero-sum game, and we can apply any technique used to play a zero-sum game, like Monte Carlo tree search as in AlphaGo, or just straight-up RL algorithms, as in some of OpenAI’s work. The hope is that you can train an agent to play this game very well, and therefore it will be able to predict where counterarguments exist that would help it win debates; and therefore, if it plays the game well, and the best way to play the game is to tell the truth, then you end up with a value-aligned system. Those are large assumptions, and you should be cautious about whether they are true.
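Stitched together, the two training stages Geoffrey describes might look roughly like this (a hedged sketch with made-up interfaces; real systems would use standard supervised learning for the judge and RL or MCTS self-play for the debater):

```python
def train_debate_system(human_judgments, questions, debater, judge, num_iters):
    """Two-stage training sketch. Stage 1: fit a judge model to predict
    which side a human says won, from (transcript, winner) pairs.
    Stage 2: train the debater by self-play on the zero-sum debate game,
    scoring each episode with the learned judge.
    """
    # Stage 1: supervised learning to approximate human judgment.
    for transcript, human_winner in human_judgments:
        judge.update(transcript, human_winner)

    # Stage 2: self-play reinforcement learning against the learned judge.
    for _ in range(num_iters):
        question = questions.sample()
        transcript = debater.self_play(question)    # both sides share one policy
        winner = judge.pick_winner(transcript)
        debater.reinforce(transcript, winner)       # reward the winning side's moves

    return judge, debater
```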

Lucas: There’s also all these issues that we can get into about biases that humans have, and issues with debate. Whether or not you’re just going to be optimizing the agents for exploiting human biases and convincing humans. Definitely seems like, even just looking at how human beings value align to each other, debate is one thing in a large toolbox of things, and in AI alignment, it seems like potentially Debate will also be a thing in a large toolbox of things that we use. I’m not sure what your thoughts are about that.

Geoffrey: I could give them. I would say that there’s two ways of approaching AI safety and AI alignment. One way is to try to propose, say, methods that do a reasonably good job at solving a specific problem. For example, you might tackle reversibility, which means don’t take actions that can’t be undone, unless you need to. You could try to pick that problem out and solve it, and then imagine how we’re going to fit this together into a whole picture later.

The other way to do it is try to propose algorithms which have at least some potential to solve the whole problem. Usually, they won’t, and then you should use them as a frame to try to think about how different pieces might be necessary to add on.

For example, in debate, the biggest thing in there is that it might be the case that you train a debate agent that gets very good at this task, the task is rich enough that it just learns a whole bunch of things about the world, and about how to think about the world, and maybe it ends up having separate goals, or it’s certainly not clearly aligned because the goal is to win the game. Maybe winning the game is not exactly aligned.

You’d like to know not only what it’s saying, but why it’s saying things. You could imagine adding interpretability techniques to this, which would say, maybe Alice and Bob are debating, Alice says something, and Bob says, “Well, Alice only said that because Alice is thinking some malicious fact.” If we add solid interpretability techniques, we could point into Alice’s thoughts at that fact, pull it out, and surface that. Then, you could imagine a strengthened version of debate where you could not only argue about object-level things, using language, but about the thoughts of the other agent, and talk about motivation.

It is a goal here in formulating something like debate or amplification, to propose a complete algorithm that would solve the whole problem. Often, not to get to that point, but we have now a frame where we can think about the whole picture in the context of this algorithm, and then fix it as required going forwards.

I think, in the end, I do view debate, if it succeeds, as potentially the top level frame, which doesn’t mean it’s the most important thing. It’s not a question of importance. More of just what is the underlying ground task that we want to solve? If we’re training agents to either play video games or do question/answers, here the proposal is train agents to engage in these debates and then figure out what parts of AI safety and AI alignment that doesn’t solve and add those on in that frame.

Lucas: You’re trying to achieve human level judgment, ultimately, through a judge?

Geoffrey: The assumption in this debate game is that it’s easier to be a judge than a debater. If it is the case, though, that you need the judge to get to human level before you can train a debater, then you have a problematic bootstrapping issue where, first you must solve value alignment for training the judge. Only then do you have value alignment for training the debater. This is one of the concerns I have. I think the concern sort of applies to some of other scalability techniques. I would say this is sort of unresolved. The hope would be that it’s not actually sort of human level difficult to be a judge on a lot of tasks. It’s sort of easier to check consistency of, say, one debate statement to the next, than it is to do long, reasoning processes. There’s a concern there, which I think is pretty important, and I think we don’t quite know how it plays out.

Lucas: The view is that we can assume, or take the human being to be, the thing that is already value aligned, and the process by which … and it’s important, I think, to highlight the second part of what you say. You say that the debaters are pointing out considerations, or whichever debater is saying that which is most true and useful. The useful part, I think, shouldn’t be glossed over, because you’re not just optimizing debaters to arrive at true statements. The useful part smuggles in a lot of issues with normative things in ethics and metaethics.

Geoffrey: Let’s talk about the useful part.

Lucas: Sure.

Geoffrey: Say we just ask the question of debaters, “What should we do? What’s the next step that I, as an individual person, or my company, or the whole world should take in order to optimize total utility?” The notion of useful, then, is just what is the right action to take? Then, you would expect a debate that is good to have to get into the details of why actions are good, and so that debate would be about ethics, and metaethics, and strategy, and so on. It would pull in all of that content and sort of have to discuss it.

There’s a large sea of content you have to pull in. It’s roughly kind of all of human knowledge.

Lucas: Right, right, but isn’t there this gap between training agents to say what is good and useful and for agents to do what is good and useful, or true and useful?

Geoffrey: The way in which there’s a gap is this interpretability concern. You’re getting at a different gap, which I think is actually not there. I like giving game analogies, so let me give a Go analogy. You could imagine that there are two goals in playing the game of Go. One goal is to find the best moves. This is a collaborative process where all of humanity, all of Go humanity, say, collaborates to learn, and explore, and work together to find the best moves in Go, defined by: what are the moves that most win this game? That’s a non-zero-sum game, where we’re all working together. Two people competing on opposite sides of the Go board are working together to get at what the best moves are, but within a game, it’s a zero-sum game.

You sit down, and you have two players, two people playing a game of Go, one of them’s going to win, zero sum. The fact that that game is zero sum doesn’t mean that we’re not learning some broad thing about the world, if you’ll zoom out a bit and look at the whole process.

We’re training agents to win this debate game to give the best arguments, but the thing we want to zoom out and get is the best answers. The best answers that are consistent with all the reasoning that we can bring into this task. There’s huge questions to be answered about whether the system actually works. I think there’s an intuitive notion of, say, reflective equilibrium, or coherent extrapolated volition, and whether debate achieves that is a complicated question that’s empirical, and theoretical, and we have to deal with, but I don’t think there’s quite the gap you’re getting at, but I may not have quite voiced your thoughts correctly.

Lucas: It would be helpful if you could unpack how the alignment that is gained through this process is transferred to new contexts. If I take an agent trained to win the Debate game outside of that context.

Geoffrey: You don’t. We don’t take it out of the context.

Lucas: Okay, so maybe that’s why I’m getting confused.

Geoffrey: Ah. I see. Okay, this [inaudible 00:26:09]. We train agents to play this debate game. To use them, we also have them play the debate game. At training time, we give them a rich space of questions to think about, or concerns to answer, a lot of discussion. Then, we want to go and answer a question in the world about what we should do, what the answer to some scientific question is, whether this theorem or conjecture is true. We state that as a question, we have them debate, and then whoever wins gave the right answer.

There are a couple of important things you can add to that. I’ll give three levels of detail you can go into. One thing is that the agents are trained to look at a state in the debate game, which could be just the question, or a question plus a partial transcript, and they’re trained to say the next thing, to make the next move in the game. The first thing you can do is, when you have a question that you want to answer, say, what should the world do, or what should I do as a person, you just say, “Well, what’s the first move you’d make?” The first move they’d make is to give an answer, and then you just stop there, you’re done, and you trust that answer is correct. That’s not the strongest thing you could do.

The next thing you can do is you’ve trained this model of a judge that knows how to predict human judgment. You could have them, from the start of this game, play a whole bunch of games, play 1,000 games of debate, and from that learn with more accuracy what the answer might be. Similar to how you’d, say if you’re playing a game of Go, if you want to know the best move, you would say, “Well, let’s play 1,000 games of Go from this state. We’ll get more evidence and we’ll know what the best move is.”
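To make the rollout idea concrete, here is a minimal Python sketch of estimating an answer by sampling many debates, analogous to Go playouts. The `debate_policy` and `judge_model` callables are hypothetical placeholders standing in for a trained debater and a learned model of human judgment; this is an illustration of the idea, not OpenAI’s implementation.

```python
from collections import Counter

def rollout_debates(question, debate_policy, judge_model, n_games=1000, max_turns=6):
    """Estimate the best answer to `question` by playing many simulated debates.

    debate_policy(transcript) -> next statement (a hypothetical trained debater).
    judge_model(transcript)   -> 0 or 1, the winning side (a hypothetical learned judge).
    Analogous to estimating the best Go move by playing many games from a position.
    """
    wins = Counter()
    for _ in range(n_games):
        transcript = [question]
        # Each side's first move is its proposed answer to the question.
        answer_0 = debate_policy(transcript)
        transcript.append(answer_0)
        answer_1 = debate_policy(transcript)
        transcript.append(answer_1)
        # Remaining turns are arguments and counterarguments.
        for _ in range(max_turns - 2):
            transcript.append(debate_policy(transcript))
        winner = judge_model(transcript)
        wins[(answer_0, answer_1)[winner]] += 1
    # The answer that wins most often is our estimate of the best answer.
    return wins.most_common(1)[0][0]
```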

The most interesting thing you can do, though, is you yourself can act as a judge in this game to sort of learn more about what the relevant issues are. Say there’s a question that you care a lot about. Hopefully, “What should the world do,” is a question you care a lot about. You want to not only see what the answer is, but why. You could act as a judge in this game, and you could, say, play a few debates, or explore part of this debate tree, the tree of all possible debates, and you could do the judgment yourself. There, the end answer will still be who you believe is the right answer, but the task of getting to that answer is still playing this game.

The bottom line here is, at test time, we are also going to debate.

Lucas: Yeah, right. Human beings are going to be participating in this debate process, but does or does not debate translate into systems which are autonomously deciding what we ought to do, given that we assume that their models of human judgment on debate are at human level or above?

Geoffrey: Yeah, so if you turn off the human in the loop part, then you get an autonomous agent. If the question is, “What should the next action be in, say, an environment?” And you don’t have humans in the loop at test time, then you can get an autonomous agent. You just sort of repeatedly simulate debating the question of what to do next. Again, you can cut this process short. Because the agents are trained to predict moves in debate, you can stop them after they’ve predicted the first move, which is what the answer is, and then just take that answer directly.

If you wanted the maximally efficient autonomous agent, that’s what you would do. My view of our goal at OpenAI is that I don’t want to take AGI and immediately deploy it on the most fast-twitch tasks, something like self-driving a car. If we get to human level intelligence, I’m not going to just replace all the self-driving cars with AGI and let them do their thing. We want to use this for the tasks where we need very strong capabilities. Ideally, those tasks are slower and more deliberative, so we can afford to, say, take a minute to interact with the system, or take a minute to have the system engage in its own internal debates to get more confidence in its answers.

The model here is basically the oracle AI model, rather than an autonomous agent operating in an MDP.

Lucas: I think that this is a very important part to unpack a bit more. This distinction here that it’s more like an oracle and less like an autonomous agent going around optimizing everything. What does a world look like right before, during, after AGI given debate?

Geoffrey: The way I think about this is that an oracle here is a question/answer system of some complexity. You ask it questions, possibly with a bunch of context attached, and it gives you answers. You can reduce pretty much anything to an oracle, if the oracle is general enough. If your goal is to take actions in an environment, you can ask the oracle, “What’s the best action to take at the next step?” and just iteratively ask that oracle over and over again as you take the steps.

Lucas: Or you could generate the debate, right? Over the future steps?

Geoffrey: The most direct way to do an MDP with debate is to engage in a debate at every step: restart the debate process, showing all the history that’s happened so far, and say the question at hand, that we’re debating, is what’s the best action to take next. I think I’m relatively optimistic that when we make AGI, for a while after we make it, we will be using it in ways that aren’t extremely fine-grained and MDP-like, in the sense of we’re going to take a million actions in a row, and they’re all actions that hit the environment.

We’d mainly use this full direct reduction. There are more practical reductions for other questions. I’ll give an example. Say you want to write the best book on, say, metaethics, and you’d like debaters to produce this book. Let’s say that debaters are optimal agents, so they know how to do debates on any subject. Even if the book is 1,000 pages long, or say it’s a couple hundred pages long, which is a more reasonable book, you could do it in a single debate as follows. Ask the agents to write the book. Each agent writes its own book, say, and you ask them to debate which book is better, and that debate only needs to point at small parts of the book.

One of the debaters writes a 300 page book and buried in the middle of it is a subtle argument, which is malicious and wrong. The other debater need only point directly at the small part of the book that’s problematic and say, “Well, this book is terrible because of the following malicious argument, and my book is clearly better.” The way this works is, if you are able to point to problematic parts of books in a debate, and therefore win, the best first move in the debate is to write the best book, so you can do it in one step, where you produce this large object with a single debate, or a single debate game.

The reason I mention this is that it’s a little better in terms of practicality than producing the book through many separate debates. If the book is 100,000 words, you wouldn’t want to have a debate about each word, one after another. That’s a silly, very expensive process.
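As a rough illustration of that reduction, here is a minimal sketch in which each debater produces a whole book in its opening move and the judge only ever reads the short passages the other side attacks. All the helper functions are hypothetical stand-ins for illustration, not anything from the actual debate work.

```python
def book_debate(write_book, find_flaw, judge):
    """Minimal sketch of the one-step book debate (all helpers hypothetical).

    write_book(name)                 -> that debater's full book
    find_flaw(attacker, target_book) -> (cited_passage, critique)
    judge(claims)                    -> 0 or 1, reading only the cited passages
    The judge never reads either book end to end; each side only has to point
    at the single worst passage in the other's book, so the best opening move
    is simply to write the best book.
    """
    book_a, book_b = write_book("Alice"), write_book("Bob")
    passage_in_b, critique_of_b = find_flaw("Alice", book_b)
    passage_in_a, critique_of_a = find_flaw("Bob", book_a)
    winner = judge([(passage_in_a, critique_of_a), (passage_in_b, critique_of_b)])
    return (book_a, book_b)[winner]
```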

Lucas: Right, so just to back up here, and to provide a little bit more framing, there’s this beginning at which we can see we’re just at a very low level trying to optimize agents for debate, and there’s going to be an asymmetry here that we predict, that it should, in general, usually be easier to tell who’s telling the truth over who’s not, because it’s easier to tell the truth than to lie, and lie in convincing ways. Scaling from there, it seems that what we ultimately really want is to then be able to train a judge, right?

Geoffrey: The goal is to train … You need both.

Lucas: Right. You need both to scale up together.

Geoffrey: Yep.

Lucas: Through doing so, we will have oracles that will be able to assist in decision making, because what we’re doing is amplifying human explicit deliberative processes. Again, this ties into coherent extrapolated volition and reflective equilibrium. What we want is to be able to amplify human decision making, which I guess is, in a broad sense, value alignment.

Geoffrey: Right.

Lucas: Are there any other technical motivations behind debate that you would like to unpack for why within the set of all these different AI alignment methodologies, why debate is so important?

Geoffrey: Sure. I think maybe the right way to talk about that is to contrast amplification and debate. I got to Debate by thinking a bunch about Amplification, talking a bunch with Paul Christiano. To summarize it for listeners again, Amplification is take questions, decompose them into subquestions that are hopefully easier to answer, repeat that decomposition until we get to things we can check, and use that whole process to answer complicated questions.

In stock amplification, the human does the task of decomposing the questions, and the human decomposition has to be complete. For example, you might have a question. Going back to this resume example, we’re having this debate about whether someone whose name sounds white or black is a good candidate. For whatever reason, no one here wants to be racist, but they’re all completely oblivious to the idea that they could be racist. They’re going through and discussing whether the candidate is suitable, breaking these questions down, and just subtly, at every point, racism is inching its way into the answers, and they never thought to investigate.

If at some point they had asked, “Well, am I deciding this because the person has a white sounding name?” they would have realized their mistake, but they never think of that. In debate, you are relying on the agents to be able to do the decomposition, so the agents can point out, even if it’s hard for a human to do it, “Maybe you’re being racist here,” and get the correction. The advantage of debate there is that you get some help doing this decomposition task.

The other thing that happened, frankly, which is one of the reasons I thought of debate, was AlphaGo. In thinking about amplification, I’d been concerned: is this process going to be scalable? Are we going to lose a bunch of efficiency in doing this complicated decomposition process? I was concerned that we would lose a bunch of efficiency and therefore not be competitive with unsafe techniques for getting to AGI.

Then, AlphaGo came out, and AlphaGo got very strong performance, and it did it by doing an explicit tree search. As part of AlphaGo, it’s doing this kind of deliberative process, and that was not only important for performance at test time, but was very important for getting the training to work. What happens is, in AlphaGo, at training time, it’s doing a bunch of tree search through the game of Go in order to improve the training signal, and then it’s training on that improved signal. That was one thing kind of sitting in the back of my mind.

I was kind of thinking through, then, the following way of thinking about alignment. At the beginning, we’re just training on direct answers. We have these questions we want to answer, an agent answers the questions, and we judge whether the answers are good. You sort of need some extra piece there, because maybe it’s hard to understand the answers. Then, you imagine training an explanation module that tries to explain the answers in a way that humans can understand. Then, those explanations might be kind of hard to understand, too, so maybe you need an explanation explanation module.

For a long time, it felt like that was just ridiculous epicycles, adding more and more complexity. There was no clear end to that process, and it felt like it was going to be very inefficient. When AlphaGo came out, that kind of snapped into focus, and it was like, “Oh. If I train the explanation module to find flaws, and I train the explanation explanation module to find flaws in flaws, then that becomes a zero-sum game. If it turns out that ML is very good at solving zero-sum games, and zero-sum games are a powerful route to strong performance, then we should take advantage of this in safety.” Poof: that answer, explanation, explanation-of-explanation route gives you the zero-sum game of debate.

That’s roughly sort of how I got there. It was a combination of thinking about Amplification and this kick from AlphaGo, that zero-sum games and search are powerful.

Lucas: In terms of the relationship between debate and amplification, can you provide a bit more clarification on the differences, fundamentally, between the process of debate and amplification? In amplification, there’s a decomposition process, breaking problems down into subproblems, eventually trying to get the broken down problems to human level problems. The problem has essentially multiplied itself many times over at this point, right? It seems like there’s going to be a lot of questions for human beings to answer. I don’t know how interrelated debate is to this decompositional argumentative process.

Geoffrey: They’re very similar. Both Amplification and Debate operate on some large tree. In amplification, it’s the tree of all decomposed questions. Let’s be concrete and say the top level question in amplification is, “What should we do?” In debate, again, the question at the top level is, “What should we do?” In amplification, we take this question. It’s a very broad open-ended question, and we kind of break it down more and more and more. You sort of imagine this expanded tree coming out from that question. Humans are constructing this tree, but of course, the tree is exponentially large, so we can only ever talk about a small part of it. Our hope is that the agents learn to generalize across the tree, so they’re learning the whole structure of the tree, even given finite data.

In the debate case, similarly, you have a top level question of, “What should we do,” or some other question, and you have the tree of all possible debates. Imagine every move in this game is, say, saying a sentence, and at every point, you have maybe an exponentially large number of sentences, so the branching factor in the tree is very large. The goal in debate is, in a sense, to see this whole tree.

Now, here is the correspondence. In amplification, the human does the decomposition, but I could instead have another agent do the decomposition. I could say I have a question, and instead of a human saying, “Well, this question breaks down into subquestions X, Y, and Z,” I could have a debater saying, “The subquestion that is most likely to falsify this answer is Y.” It could’ve picked at any other question, but it picked Y. You could imagine that if you replace a human doing the decomposition with another agent in debate pointing at the flaws in the arguments, debate would kind of pick out a path through this tree. A single debate transcript, in some sense, corresponds to a single path through the tree of amplification.

Lucas: Does the single path through the tree of amplification elucidate the truth?

Geoffrey: Yes. The reason it does is it’s not an arbitrarily chosen path. We’re sort of choosing the path that is the most problematic for the arguments.

Lucas: In this exponential tree search, there’s heuristics and things which are being applied in general to the tree search in order to collapse onto this one branch or series?

Geoffrey: Let’s say, in amplification, we have a question. Our decomposition is, “Well, this decomposes into X, Y, and Z,” and then we recursively call the agent, and it says, “The answers are AX, AY, AZ, for these questions.” Now, if I trusted those subanswers, I could do the reconstruction of the answer to the original question. If I don’t trust the subanswers, I might say, “Well, which subanswer is most likely to be false?” The correspondence with debate is that the other debating agent would point to which one of the subanswers is probably false, and then you recurse down just that subtree.

Here’s the way it works. You can trust amplification if all of the subanswers are correct. If at least one of them is false, a strong debater can find which of those subanswers is false and recurse down to that. That’s why, if it turns out that a single path chosen by strong debaters never finds a false subanswer, then you will have established that the whole tree is true.
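A minimal sketch of that correspondence, with hypothetical helper functions: rather than expanding the whole exponentially large amplification tree, a challenger picks out a single path by always recursing into the subanswer it claims is false, and if it never finds one, the root answer is accepted.

```python
def verify_by_debate(question, answer, decompose, challenge, human_check,
                     depth=0, max_depth=10):
    """Sketch of the amplification/debate correspondence (helpers hypothetical).

    decompose(question, answer)   -> list of (subquestion, subanswer) supporting `answer`
    challenge(subpairs)           -> index of the subanswer most likely to be false,
                                     or None if the challenger concedes
    human_check(question, answer) -> bool, for leaf-sized questions a human can judge
    A strong challenger traces one path through the tree; if no false subanswer
    is ever found along that path, we treat the whole tree as verified.
    """
    if depth >= max_depth:
        return human_check(question, answer)
    subpairs = decompose(question, answer)
    if not subpairs:                 # base case: directly checkable by the judge
        return human_check(question, answer)
    i = challenge(subpairs)
    if i is None:                    # challenger finds nothing worth disputing
        return True
    subq, suba = subpairs[i]
    return verify_by_debate(subq, suba, decompose, challenge, human_check,
                            depth + 1, max_depth)
```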

Lucas: I see. I think this picture of, “What should we do,” at the top, and then the trees would fall from that in terms of one version of amplification and one version of debate is pretty helpful here.

Geoffrey: The most important thing here is that in amplification, the branching factor is small. I’m a human, I see a question, and I decompose it into a relatively small number of subquestions, say two to ten. In debate, the branching factor is very large: a debate agent could make any statement as the next countermove. What this means is that, if it works for strong play, a shallow debate tree is more powerful than a shallow amplification tree, because the branching factor is so large that you can cover more terrain with a couple of steps of debate, which act as a proxy for a large tree.

The problem with that statement, and maybe the biggest advantage of amplification, is that you really have to assume the agents are strong enough to cover the tree. Say I’m debating whether there’s a black swan, and there in fact is a black swan, but there’s only one of them in the entire world, and the correct countermove would be pointing at that black swan, finding it out of the entire world. A debater who wanted to say, “There are no black swans,” could just claim, “There are no black swans. Trust me. The other agent would be able to point to one if it existed.” Maybe it’s the case that a black swan does exist, but the other agent is just too weak to point at the black swan, and so that debate doesn’t work.

This argument that shallow debates are powerful leans a whole lot on debaters being very strong, and debaters in practice will not be infinitely strong, so there’s a bunch of subtlety there that we’re going to have to wrestle with.

Lucas: It would also be, I think, very helpful if you could let us know how you optimize for strong debaters, and how amplification is possible here if human beings are the ones who are pointing out the decompositions of the questions.

Geoffrey: Whichever one we choose, whether it’s amplification, debate, or some entirely different scheme, if it depends on humans in one of these elaborate ways, we need to do a bunch of work to know that humans are going to be able to do this. For amplification, you would expect to have to train people to think about which kinds of decompositions are the correct ones. My bias is that because debate gives the humans more help in pointing out the counterarguments, it may be cognitively kinder to the humans, and therefore that could make it a better scheme. That’s one of the advantages of debate.

The technical analogy there is a shallow debate argument. The human side is, if someone is pointing out the arguments for you, it’s cognitively kind. In amplification, I would expect you’d need to train people a fair amount to have the decomposition be reliably complete. I don’t know that I have a lot of confidence that you can do that. One way you can try to do it is, as much as possible, systematize the process on the human side.

In either one of these schemes, we can give the people involved an arbitrary amount of training and instruction in whatever way we think is best, and we’d like to do the work to understand what forms of instruction and training are most truth seeking, and try to do that as early as possible so you have a head start.

I would say I’m not going to be able to give you a great argument for optimism about amplification. This is a discussion that Paul, Andreas Stuhlmueller, and I have, where I think Paul and Andreas lean towards these metareasoning arguments, where if you wanted to answer the question, “Where should I go on vacation,” the first subquestion is, “What would be a good way to decide where to go on vacation?” You quickly go meta, and maybe you go meta-meta, and it’s kind of a mess. Whereas the hope is that because in debate you have help pointing to things, you can stay much more at the object level, where the first step in a debate about where to go on vacation is just Bali or Alaska. You give the answer and then you focus in on more …

For a broader class of questions, you can stay at object level reasoning. Now, if you want to get to metaethics, you would have to bring in that kind of reasoning. It should be a goal of ours, for a fixed task, to try to use the simplest kind of human reasoning possible, because then we should expect to get better results out of people.

Lucas: All right. Moving forward. Two things. The first that would be interesting would be if you could unpack this process of training up agents to be good debaters, and to be good predictors of human decision making regarding debates, what that’s actually going to look like in terms of your experiments, currently, and your future experiments. Then, also just pivoting into discussing reasons for optimism and pessimism about debate as a model for AI alignment.

Geoffrey: On the experiment side, as I mentioned, we’re trying to get into the natural language domain, because I think that’s how humans debate and reason. We’re doing a fair amount of work at OpenAI on core ML language modeling, so natural language processing, and then trying to take advantage of that to prototype these systems. At the moment, we’re just doing what I would call zero step debate, or one step debate. It’s just a single agent answering a question. You have question, answer, and then you have a human kind of judging whether the answer is good.

The task of predicting whether an answer is good is just to read a bunch of text and predict a number. That is essentially a standard NLP task, and you can use standard methods from NLP on that problem. The hope is that because it looks so standard, we can carry development on the capability side of natural language processing straight over to the safety side. Predicting the result just means taking whatever the most powerful natural language processing architecture and method is and applying it to this task.

Similarly, the task of answering questions is also a natural language task, just a generative one. If you’re answering questions, you read a bunch of text that is maybe the context of the question, and you produce an answer, and that answer is just a bunch of words that you spit out via a language model. If you’re doing, say, a two step debate, where you have question, answer, counterargument, then similarly, you have a language model that spits out an answer, and a language model that spits out the counterargument. Those can in fact be the same language model; you just flip the reward at some point. The agent is rewarded for answering well while it’s spitting out the answer, and then when it’s spitting out the counteranswer, you reward it for falsifying the answer. It’s still just a generative language task with a slightly exotic reward.
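A very rough sketch of what “flipping the reward” could look like with a single PyTorch language model and a REINFORCE-style update. The function names, the judge, and the training details are assumptions for illustration, not the actual OpenAI setup.

```python
import torch

def debate_reward(judge_score: float, role: str) -> float:
    # judge_score: the judge's probability that the original answer is good.
    # The same model plays both roles; only the reward flips.
    return judge_score if role == "answer" else 1.0 - judge_score

def reinforce_step(optimizer, log_probs: torch.Tensor, reward: float,
                   baseline: float = 0.5):
    """One REINFORCE-style update on the sampled tokens (answer or counterargument).

    `log_probs` holds the log-probabilities of the tokens the model generated
    in one forward pass.
    """
    loss = -(reward - baseline) * log_probs.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Hypothetical usage, assuming separate forward passes for answer and counter:
# score = judge(question, answer_text, counter_text)            # in [0, 1]
# reinforce_step(opt, answer_log_probs,  debate_reward(score, "answer"))
# reinforce_step(opt, counter_log_probs, debate_reward(score, "counter"))
```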

Going forwards, we expect there to need to be something like … this is not actually high confidence, but maybe things like AlphaGo Zero-style tree search are required to make this work very well on the generative side, and we will explore those as required. Right now, we need to falsify the statement that we can just do it with stock language modeling, which we’re working on. Does that cover the first part?

Lucas: I think that’s great in terms of the first part, and then again, the second part was just places to be optimistic and pessimistic here about debate.

Geoffrey: Optimism, I think we’ve covered a fair amount of it. The primary source of optimism is this argument that shallow debates are already powerful, because you can cover a lot of terrain in argument space with a short debate, because of the high branching factor. If there’s an answer that is robust to all possible counteranswers, then it hopefully is a fairly strong answer, and that gets stronger as you increase the number of steps. This assumes strong debaters. That would be a reason for pessimism, not optimism. I’ll get to that.

That’s one of my top two; the other is that ML is pretty good at zero-sum games, particularly zero-sum perfect information games. There have been these very impressive headline results from AlphaGo at DeepMind, Dota at OpenAI, and a variety of other games. In general, for zero-sum, close to perfect information games, we roughly know how to do them, at least in the not too high branching factor case. There’s an interesting thing where if you look at the algorithms, say, for playing poker, or for playing more than two player games, where poker is zero-sum two player but imperfect information, or the algorithms for playing, say, 10 player games, they’re just much more complicated. They don’t work as well.

I like the fact that debate is formulated as a two player zero-sum perfect information game, because we seem to have better algorithms to play those with ML. This is both practically true, in that it is in practice easier to play them, and there’s also a bunch of theory that says two player zero-sum is a different complexity class than, say, two player non-zero-sum or N player. The complexity class gets harder, and you need nastier algorithms. Finding a Nash equilibrium in a general game, one that’s either non-zero-sum or has more than two players, is PPAD-complete even in the tabular case, in a small game; with two player zero-sum, that problem is convex and has a polynomial-time solution. It’s a nicer class. I expect there to continue to be better algorithms to play those games. I like formulating safety as that kind of problem.
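To illustrate why the two-player zero-sum case is nicer, here is a small sketch that computes an optimal mixed strategy for a payoff matrix by linear programming, a convex, polynomial-time problem, using SciPy. The game and code are just the standard construction for matrix games, offered as an example rather than as part of the debate work itself.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum_game(A):
    """Optimal mixed strategy for the row player of a zero-sum matrix game.

    A[i, j] is the row player's payoff when row plays i and column plays j.
    We maximize the game value v subject to x^T A >= v for every column,
    which is a linear program, in contrast to general-sum or N-player Nash
    equilibria (PPAD-complete).
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables: x_1..x_m (row strategy) and v (game value); minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # For each column j:  v - sum_i x_i * A[i, j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # sum_i x_i = 1
    b_eq = [1.0]
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:m], res.x[m]

# Matching pennies: value 0, optimal strategy (0.5, 0.5).
strategy, value = solve_zero_sum_game([[1, -1], [-1, 1]])
```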

Those are the reasons for optimism that I think are most important. Going into more of those is less important and less interesting than worrying about stuff, so I’ll list three of those concerns, or maybe four, and try to be fast so we can circle back. As I mentioned, I think interpretability has a large role to play here. I would like to be able to have an agent say … again, Alice and Bob are debating. Bob should be able to just point directly into Alice’s thoughts and say, “She really thought X even though she said Y.” The reason you need an interpretability technique for that is that, in this conversation, I could just claim that you, Lucas Perry, are having some malicious thought, but that’s not a falsifiable statement, so I can’t use it in a debate. I could always make that statement, unless I can point into your thoughts.

Because we have so much control over machine learning, we have the potential ability to do that, and we can take advantage of it. I think that, for that to work, we need probably a deep hybrid between the two schemes, because an advanced agent’s thoughts will probably be advanced, and so you may need some kind of strengthened thing like amplification or debate just to be able to describe the thoughts, or to point at them in a meaningful way. That’s a problem that we have not really solved. Interpretability is coming along, but it’s definitely not hybridized with these fancy alignment schemes, and we need to solve that at some point.

Another problem is that there’s no point in this kind of natural language debate at which I can just say, for example, “You know, it’s going to rain tomorrow, and it’s going to rain tomorrow just because I’ve looked at all the weather in the past, and it just feels like it’s going to rain tomorrow.” Somehow, debate is missing this straight up pattern matching ability of machine learning, where I can just read a dataset and summarize it very quickly. The theoretical side of this is, if I have a debate about even something as simple as, “What’s the average height of a person in the world?”, then in the debate method I’ve described so far, that debate has to have depth at least logarithmic in the number of people. I just have to subdivide by population: this half of the world, and then this half of that half of the world, and so on.

I can’t just say, “You know, on average it’s like 1.6 meters.” We need better methods for hybridizing debate with pattern matching and statistical intuition, and if we don’t have that, we may not be competitive with other forms of ML.

Lucas: Why is that not just an intrinsic part of debate? Why is debating over these kinds of things different than any other kind of natural language debate?

Geoffrey: It is the same. The problem is just that for some types of questions, and there are other forms of this in natural language, there aren’t short deterministic arguments. There are many questions where the shortest deterministic argument is much longer than the shortest randomized argument. For example, if you allow randomization, I can say, “I claim the average height of a person is 1.6 meters.” Well, pick a person at random, and you’ll score me according to the squared difference between those two numbers: my claim and the height of this particular person you’ve chosen. The optimal move to make there is to just say the average height right away.

The thing I just described is a debate using randomized steps that is extremely shallow; it’s only basically two steps long. If I want to do a deterministic debate, I have to deterministically talk about how the average height of a person in North America is X, and in Asia it’s Y. The other debater could say, “I disagree about North America,” and you recurse into that.
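Here is a tiny numerical sketch of that randomized, two-step debate: score a claimed average against one randomly sampled person by squared error, and the expected penalty is minimized exactly at the true mean, so the honest claim is the optimal opening move. The data and function names are made up for illustration.

```python
import random

def debate_score(claim, heights, rng):
    """Penalty for a claimed average height, checked against one random person.

    Lower is better: the penalty is the squared difference between the claim
    and the sampled person's height. Because E[(c - H)^2] is minimized at
    c = E[H], the best opening claim is the true average, and the randomized
    debate can stop after two moves instead of recursing logarithmically
    through subpopulations.
    """
    person = rng.choice(heights)
    return (claim - person) ** 2

def expected_score(claim, heights):
    return sum((claim - h) ** 2 for h in heights) / len(heights)

heights = [1.5, 1.6, 1.6, 1.7, 1.8]
true_mean = sum(heights) / len(heights)

# The true mean beats any other candidate claim in expectation.
assert expected_score(true_mean, heights) <= min(
    expected_score(c, heights) for c in [1.5, 1.55, 1.7, 1.8]
)
print(debate_score(true_mean, heights, random.Random(0)))
```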

It would be super embarrassing if we propose these complicated alignment schemes, saying “This is how we’re going to solve AI safety,” and they can’t quickly answer trivial statistical questions. That would be a serious problem. We roughly know how to solve that one. The harder case is if you bring in this more vague statistical intuition, where it’s not like I’m computing a mean over some dataset; I’ve looked at the weather and, you know, it feels like it’s going to rain tomorrow. Getting that in is a bit trickier, but we have some ideas there. They’re unresolved.

That’s one thing which I am optimistic about, but which we need to work on. The most important reason to be concerned is just that humans are flawed in a variety of ways. We have all these biases, ethical inconsistencies, and cognitive biases. We can write down some toy theoretical arguments that debate works with a limited but reliable judge, but does it work in practice with a human judge? I think there are some questions you can reason through there, but in the end, a lot of that will be determined by just trying it, and seeing whether debate works with people. Eventually, when we start to get agents that can play these debates, we can check whether it works with two ML agents and a human judge. For now, when language modeling is not that far along, we may need to try it out first with all humans.

This would be, you play the same debate game, but both the debaters are also people, and you set it up so that somehow it’s trying to model this case where the debaters are better than the judge at some task. The debaters might be experts at some domain, they might have access to some information that the judge doesn’t have, and therefore, you can ask whether a reasonably short debate is truth seeking if the humans are playing to win.

The hope there would be that you can test out debate on real people with interesting questions, say complex scientific questions, and questions about ethics, and about areas where humans are biased in known ways, and see whether it works, and also see not just whether it works, but which forms of debate are strongest.

Lucas: What does it mean for debate to work or be successful for two human debaters and one human judge if it’s about normative questions?

Geoffrey: Unfortunately, if you want to do this test, you need to have a source of truth. In the case of normative questions, there are two ways to go. One way is you pick a task where we may not know the entirety of the answer, but we know some aspect of it with high confidence. An example would be this resume case, where two resumes are identical except for the name at the top, and we just sort of normatively … we believe with high confidence that the answer shouldn’t depend on that. If it turns out that a winning debater can maliciously and subtly take advantage of the name to instill fear in the judge, and make a resume with a black-sounding name sound bad, that would be a failure.

We know that because, while we don’t know in advance whether a resume should be good or bad overall, we know that the judgment of this pair of identical resumes shouldn’t depend on the name. That’s one way: we have some kind of normative statement where we have reasonable confidence in the answer. The other way, which is similar, is you have two experts in some area, and the two experts agree on what the true answer is, either because it’s a consensus across the field, or just because maybe those two experts agree. Ideally, it should be a thing that’s generally true. Then, you force one of the experts to lie.

You say, “Okay, you both agree that X is true, but now we’re going to flip a coin and now one of you only wins if you lie, and we’ll see whether that wins or not.”

Lucas: I think it also … Just to plug your game here, you guys do have a debate game. We’ll put a link to that in the article that goes along with this podcast. I suggest that people check that out if you’d like a more tangible and fun way to understand debate. I think it’ll help elucidate what the process looks like, the asymmetries that go on, and the key idea here that it is harder to lie than to refute a lie. It seems like if we could deploy some sort of massive statistical analysis over many different iterated debates across different agents, we would be able to come down on the efficacy of debate in different situations, where the judge and the debaters are all AI, mixed situations, or all human debates. I think it’d be interesting to see the varying results there.

Geoffrey: This is going to be a noisy enough process, for a variety of reasons, that we will probably have to do this a lot to know. So far, we’ve just done a very small, informal number of these human, human, human debates. Even just doing expert debate, we’ve already learned a fair amount at a qualitative level from those few trials. I’ll give an example. In one debate about a CS theory question, there was a judge present while the debate was going on, but they were interacting only fairly minimally.

Early in the process of the debate (I was the debater telling the truth here), I stated my formulation of the question and unpacked it intuitively, and the judge, perfectly understandably, had a subtle misunderstanding of how I’d framed the question. That misunderstanding was never corrected throughout the debate, so there was a misunderstanding of just what the topic was about that was never fixed, and it became much harder for the honest player to win, because it seemed like the honest player had a harder case to make.

That sort of taught us that having judge interaction is potentially quite important so that the debaters have a detailed understanding of what the judge is thinking. If your goal is to model debate as a perfect information game, the closer to that you can get, the more information that debaters have, the better it should be.

Lucas: Yeah. I mean, that also allows the debaters to exploit cognitive biases in the judge.

Geoffrey: That’s right.

Lucas: You would point that out. Like, “Hey, this person’s exploiting your cognitive bias.”

Geoffrey: Yeah, so I think it’s an open question how exactly to strike that balance, and if there’s a way to strike it that works. Generally, the more information about, say, the judge, that he provides to the debaters, either through judge interaction or just tell the debaters something about the judge, that will make them stronger as players of the game, but it might reveal ways to attack the judge.

Now, if our goal is to be resistant to very strong agents, and it turns out that the only way to make it safe is to hide information from the agents, maybe you shouldn’t use this method. It may not be very resilient. It’s likely that for experiments, we should push as far as we can towards strong play, revealing as much as possible, and see whether it still works in that case.

Lucas: In terms here of the social scientists playing a role here, do you want to go ahead and unpack that a bit more? There’s a paper that you’re working on with Amanda Askell on this.

Geoffrey: As you say, we want to run statistically significant experiments that test whether debate is working and which forms of debate are best, and that will require careful experimental design. That is, in some sense, an experiment in just social science. There’s no ML involved. It’s motivated by machine learning, but it’s just a question about how people think, and how they argue and convince each other. Currently, no one at OpenAI has any experience running human experiments of this kind, or at least no one that is involved in this project.

The hope would be that we would want to get people involved in AI safety that have experience and knowledge in how to structure experiments on the human side, both in terms of experimental design, having an understanding of how people think, and where they might be biased, and how to correct away from those biases. I just expect that process to involve a lot of knowledge that we don’t possess at the moment as ML researchers.

Lucas: Right. I mean, in order for there to be an efficacious debate process, or AI alignment process in general, you need to debug and understand the humans as well as the machines. Understanding our cognitive biases in debates, and our weak spots and blind spots in debate, it seems crucial.

Geoffrey: Yeah. I sort of view it as a social science experiment, because it’s just a bunch of people interacting. It’s a fairly weird experiment. It differs from normal experiments in some ways. In thinking about how to build AGI in a safe way, we have a lot of control over the whole process. If it takes a bunch of training to make people good at judging these debates, we can provide that training, pick people who are better or worse at judging. There’s a lot of control that we can exert. In addition to just finding out whether this thing works, it’s sort of an engineering process of debugging the humans, maybe it’s sort of working around human flaws, taking them into account, and making the process resilient.

My highest level hope here is that humans have various flaws and biases, but we are willing to be corrected, and set our flaws aside, or maybe there’s two ways of approaching a question where one way hits the bias and one way doesn’t. We want to see whether we can produce some scheme that picks out the right way, at least to some degree of accuracy. We don’t need to be able to answer every question. If we, for example, learned that, “Well, debate works perfectly well for some broad class of tasks, but not for resolving the final question of what humans should do over the long term future, or resolving all metaethical disagreements or something,” we can afford to say, “We’ll put those aside for now. We want to get through this risky period, make sure AI doesn’t do something malicious, and we can deliberately work through these product questions, take our time doing that.”

The goal includes the task of knowing which things we can safely answer, and the goal should be to structure the debates so that if you give it a question where humans just disagree too much or are too unreliable to reliably answer, the answer should be, “We don’t know the answer to that question yet.” A debater should be able to win a debate by admitting ignorance in that case.

There is an important assumption I’m making about the world that we should make explicit, which is that I believe it is safe to be slow about certain ethical or directional decisions. You can construct games where you just have to make a decision now, like you’re barreling along in some car with no brakes and you have to dodge left or right around an obstacle, but you can’t just say, “I’m going to ponder this question for a while and hold off.” You have to choose now. I would hope that the task of choosing what we want to do as a civilization is not like that. We can resolve some immediate concerns about serious problems now, and existential risk, but we don’t need to resolve everything.

That’s a very strong assumption about the world, which I think is true, but it’s worth saying that I know that is an assumption.

Lucas: Right. I mean, it’s true insofar as coordination succeeds, and people don’t have incentives just to go do what they think is best.

Geoffrey: That’s right. If you can hold off deciding things until we can deliberate longer.

Lucas: Right. What does this distillation process look like for debate, where we’re ensuring alignment is maintained as a system’s capability is amplified and changed?

Geoffrey: One property of amplification, which is nice, is that you can sort of imagine running it forever. You train on simple questions, and then you train on more complicated questions, and then you keep going up and up and up, and if you’re confident that you’ve trained enough on the simple questions, you can never see them again, freeze that part of the model, and keep going. I think in practice, that’s probably not how we would run it, so you don’t inherit that advantage. In debate, what you would have to do to get to more and more complicated questions is, at some point, and maybe this point is fairly far off, go to longer and longer debates.

If you’re thinking about the long term future, I expect to have to switch over to some other scheme, or at least layer a scheme, embedding debate in a larger scheme. An example would be that the question you resolve with debate is, “What is an even better way to build AI alignment?” That, you can resolve with, say, depth 100 debates, and maybe you can handle that depth well. What that spits out is an algorithm; you interrogate it enough to know that you trust it, and then you can use that one.

You can also imagine eventually needing to hybridize a debate-like scheme and an amplification-like scheme, where you don’t get a new algorithm out, but you trust this initial debating oracle enough that you can view it as fixed, and then start a new debate scheme, which can trust any answer that original scheme produces. Now, I don’t really like that scheme, because it feels like you haven’t gained a whole lot. Generally, if you think about, say, the next 1,000 years … it’s useful to think about the long term of AI alignment going forwards. I expect to need further advances after we get past this AI risk period.

I’ll give a concrete example. You ask your debating agents, “Okay, give me a perfect theorem prover.” Right now, all of our theorem provers probably have little bugs, so you can’t really trust them to resist a superintelligent agent. Say you trust the theorem prover that you get out, and you say, “Okay, now I just want a proof that AI alignment works.” You bootstrap your way up, using this agent as an oracle on interesting, complicated questions, until you’ve got a scheme that gets you to the next level, and then you iterate.

Lucas: Okay. In terms of practical, short-term world to AGI world maybe in the next 30 years, what does this actually look like? In what ways could we see debate and amplification deployed and used at scale?

Geoffrey: There is the direct approach, where you use them to answer questions, using exactly the structure they’re trained with. With a debating agent, you would just engage in debates, and you would use it as an oracle in that way. You can also use it to generate training data. You could, for example, ask a debating agent to spit out the answers to a large number of questions, and then train a little module on those, if you trust all the answers and you trust supervised learning to work. If you wanted to build a strong self-driving car, you could ask it to train a much smaller network that way. It would not be human level, but it gives you a way to get data.
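A minimal sketch of that distillation step, with hypothetical function names: run the expensive debate-backed oracle over a batch of questions and fit a cheaper supervised model to the resulting pairs.

```python
def distill_oracle(oracle, questions, train_small_model):
    """Sketch of using a trusted debating oracle to generate training data.

    oracle(question)        -> answer produced by running a full debate
    train_small_model(data) -> a cheaper model fit to (question, answer) pairs
    The small model won't be human level, but if we trust the oracle's answers
    and trust supervised learning, we get a fast distilled system for a
    narrow task.
    """
    dataset = [(q, oracle(q)) for q in questions]
    return train_small_model(dataset)
```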

There’s a lot you could do with a powerful oracle that gives you answers to questions. I could probably go on at length about fancy schemes you could do with oracles. I don’t know if it’s that important. The more important part to me is what is the decision process we deploy these things into? How we choose which questions to answer and what we do with those answers. It’s probably not a great idea to train an oracle and then give it to everyone in the world right away, unfiltered, for reasons you can probably fill in by yourself. Basically, malicious people exist, and would ask bad questions, and eventually do bad things with the results.

If you have one of these systems, you’d like to deploy it in a way that can help as many people as possible, which means everyone will have their own questions to ask of it, but you need some filtering mechanism or some process to decide which questions to actually ask, what to do with the answers, and so on.

Lucas: I mean, can the debate process be used to self-filter out providing answers for certain questions, based off of modeling the human decision about whether or not they would want that question answered?

Geoffrey: It can. There’s a subtle issue, which I think we need to deal with, but haven’t dealt with yet. There’s a commutativity question: say you have a large number of people. Do you reach reflective equilibrium for each person first, and then, say, vote across people, or do you have a debate and then vote on what the judgment should be? Imagine playing a debate game where you play a debate, and then everyone votes on who wins. There are advantages on both sides. On the side of voting after reflective equilibrium, you have the problem that if you reach reflective equilibrium for a person, it may be disastrous if you pick the wrong person. That extreme is probably bad. The other extreme is also kind of weird, because there are a bunch of standard results where if you take a bunch of rational agents voting, it might be true that A and B implies C, but the agents might vote yes on A, yes on B, and no on C. Votes on statements where every voter is rational are not rational; the voting outcome is irrational.
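The standard result alluded to here is sometimes called the discursive dilemma; below is a tiny worked example showing three individually consistent voters whose majority verdict is inconsistent.

```python
# Three voters, each individually consistent about A, B, and C = (A and B).
voters = [
    {"A": True,  "B": True,  "C": True},   # accepts A and B, hence accepts C
    {"A": True,  "B": False, "C": False},  # rejects B, hence rejects C
    {"A": False, "B": True,  "C": False},  # rejects A, hence rejects C
]

def majority(prop):
    return sum(v[prop] for v in voters) > len(voters) / 2

# The majority accepts A (2-1) and B (2-1) but rejects C (1-2),
# even though A and B together imply C: the aggregate is inconsistent.
assert majority("A") and majority("B") and not majority("C")
```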

The result of voting before you take reflective equilibrium is sort of an odd philosophical concept. Probably, you need some kind of hybrid between these schemes, and I don’t know exactly what that hybrid looks like. That’s an area where I think technical AI safety mixes with policy to a significant degree that we will have to wrestle with.

Lucas: Great, so to back up and to sort of zoom in on this one point that you made, is the view that one might want to be worried about people who might undergo an amplified long period of explicit human reasoning, and that they might just arrive at something horrible through that?

Geoffrey: I guess, yes, we should be worried about that.

Lucas: Wouldn’t one view of debate be that humans, given debate, would also over time become more likely to arrive at true answers? That reflective equilibrium will tend to lead people to truth?

Geoffrey: Yes. That is an assumption. The reason I think there is hope there … I think that you should be worried. I think the reason for hope is our ability to not answer certain questions. I don’t know that I trust reflective equilibrium applied incautiously, or not regularized in some way, but if there’s a case where some definition of reflective equilibrium is not trustworthy, I think it’s hopeful that we can construct debate so that the result will be, “This is just too dangerous to decide. We don’t really know the answer with high confidence.”

This is certainly true of complicated moral things. Avoiding lock-in, for example: I would not trust reflective equilibrium if it says, “Well, the right answer is just to lock our values in right now, because they’re great.” We need to take advantage of the outs we have in terms of being humble about deciding things. Once you have those outs, I’m hopeful that we can solve this, but there’s a bunch of work to do to know whether that’s actually true.

Lucas: Right. Lots more experiments to be done on the human side and the AI side. Is there anything here that you’d like to wrap up on, or anything that you feel like we didn’t cover that you’d like to make any last minute points?

Geoffrey: I think the main point is just that there’s a bunch of work here. OpenAI is hiring people to work on both the ML side of things, also theoretical aspects, if you think you like wrestling with how these things work on the theory side, and then certainly, trying to start on this human side, doing the social science and human aspects. If this stuff seems interesting, then we are hiring.

Lucas: Great, so people that are interested in potentially working with you or others at OpenAI on this, or if people are interested in following you and keeping up to date with your work and what you’re up to, what are the best places to do these things?

Geoffrey: I have taken a break from pretty much all social media, so you can follow me on Twitter, but I won’t ever post anything, or see your messages, really. I think email me. It’s not too hard to find my email address. That’s pretty much the way, and then watch as we publish stuff.

Lucas: Cool. Well, thank you so much for your time, Geoffrey. It’s been very interesting. I’m excited to see how these experiments go for debate, and how things end up moving along. I’m pretty interested and optimistic, I guess, about debate as an epistemic process, its role in arriving at truth and truth seeking, and how that will play into AI alignment.

Geoffrey: That sounds great. Thank you.

Lucas: Yep. Thanks, Geoff. Take care.

If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

[end of recorded material]

FLI Podcast (Part 2): Anthrax, Agent Orange, and Yellow Rain: Verification Stories with Matthew Meselson and Max Tegmark

In this special two-part podcast Ariel Conn is joined by Max Tegmark for a conversation with Dr. Matthew Meselson, biologist and Thomas Dudley Cabot Professor of the Natural Sciences at Harvard University. Dr. Meselson began his career with an experiment that helped prove Watson and Crick’s hypothesis on the structure and replication of DNA. He then got involved in arms control, working with the US government to renounce the development and possession of biological weapons and halt the use of Agent Orange and other herbicides in Vietnam. From the cellular level to that of international policy, Dr. Meselson has made significant contributions not only to the field of biology, but also towards the mitigation of existential threats.   

Part Two focuses on three major incidents in the history of biological weapons: the 1979 anthrax outbreak in Russia, the use of Agent Orange and other herbicides in Vietnam, and the Yellow Rain controversy in the early 80s. Dr. Meselson led the investigations into all three and solved some perplexing scientific mysteries along the way.

Topics discussed in this episode include:

  • The value of verification, regardless of the challenges
  • The 1979 Sverdlovsk anthrax outbreak
  • The use of “rainbow” herbicides during the Vietnam War, including Agent Orange
  • The Yellow Rain Controversy

Publications and resources discussed in this episode include:

  • The Sverdlovsk anthrax outbreak of 1979, Matthew Meselson, Jeanne Guillemin, Martin Hugh-Jones, Alexander Langmuir, Ilona Popova, Alexis Shelokov, and Olga Yampolskaya, Science, 18 November 1994, Vol. 266, pp 1202-1208.
  • Preliminary Report- Herbicide Assessment Commission of the American Association for the Advancement of Science, Matthew Meselson, A. H. Westing, J. D. Constable, and Robert E. Cook, 30 December 1970, private circulation, 8 pp. Reprinted in Congressional Record, U.S. Senate, Vol. 118-part 6, 3 March 1972, pp 6806-6807.
  • “Background Material Relevant to Presentations at the 1970 Annual Meeting of the AAAS”, Herbicide Assessment Commission of the AAAS, with A.H. Westing and J.D. Constable, December 1970, private circulation, 48 pp. Reprinted in the Congressional Record, U.S. Senate, Vol. 118-part 6, 3 March 1972, pp 6807-6813.
  • “The Yellow Rain Affair: Lessons from a Discredited Allegation”, with Julian Perry Robinson, in Terrorism, War, or Disease?, eds. A.L. Clunan, P.R. Lavoy, and S.B. Martin, Stanford University Press, Stanford, California, 2008, pp 72-96.
  • Yellow Rain by Thomas D. Seeley, Joan W. Nowicke, Matthew Meselson, Jeanne Guillemin and Pongthep Akratanakul, Scientific American, September 1985, Vol. 253, pp 128-137.

Click here for Part 1: From DNA to Banning Biological Weapons with Matthew Meselson and Max Tegmark

Four-ship formation on a defoliation spray run. (U.S. Air Force photo)

Ariel: Hi everyone. Ariel Conn here with the Future of Life Institute. And I would like to welcome you to part two of our two-part FLI podcast with special guest Matthew Meselson and special guest/co-host Max Tegmark. You don’t need to have listened to the first episode to follow along with this one, but I do recommend listening to the other episode, as you’ll get to learn about Matthew’s experiment with Franklin Stahl that helped prove Watson and Crick’s theory of DNA and the work he did that directly led to US support for a biological weapons ban. In that episode, Matthew and Max also talk about the value of experiment and theory in science, as well as how to get some of the world’s worst weapons banned. But now, let’s get on with this episode and hear more about some of the verification work that Matthew did over the years to help determine if biological weapons were being used or developed illegally, and the work he did that led to the prohibition of Agent Orange.

Matthew, I’d like to ask about a couple of projects that you were involved in that I think are really closely connected to issues of verification, and those are the Yellow Rain Affair and the Russian Anthrax incident. Could you talk a little bit about what each of those was?

Matthew: Okay, well in 1979, there was a big epidemic of anthrax in the Soviet city of Sverdlovsk, just east of the Ural mountains, in the beginning of Siberia. We learned about this epidemic not immediately but eventually, through refugees and other sources, and the question was, “What caused it?” Anthrax can occur naturally. It’s commonly a disease of bovids, that is cows or sheep, and when they die of anthrax, the carcass is loaded with the anthrax bacteria, and when the bacteria see oxygen, they become tough spores, which can last in the earth for a long, long time. And then if another bovid comes along and manages to eat something that’s got those spores, he might get anthrax and die, and the meat from these animals who died of anthrax, if eaten, can cause gastrointestinal anthrax, and that can be lethal. So, that’s one form of anthrax. You get it by eating.

Now, another form of anthrax is inhalation anthrax. In this country, there were a few cases of men who worked in leather factories with leather that had come from anthrax-affected animals, usually imported, which had live anthrax spores on the leather that got into the air of the shops where people were working with the leather. Men would breathe this contaminated air and the infection in that case was through the lungs.

The question here was, what kind of anthrax was this: inhalational or gastrointestinal? And because I was by this time known as an expert on biological weapons, the man who was dealing with this issue at the CIA in Langley, Virginia — a wonderful man named Julian Hoptman, a microbiologist by training — asked me if I’d come down and work on this problem at the CIA. He had two daughters who were away at college, and so he had a spare bedroom, so I actually lived with Julian and his wife. And in this way, I was able to talk to Julian night and day, both at the breakfast and dinner table, but also in the office. Of course, we didn’t talk about classified things except in the office.

Now, we knew from the textbooks that the incubation period for inhalation anthrax was thought to be four, five, six, seven days: between the time you inhale it and four or five days later; if you hadn’t yet come down with it by then, you probably wouldn’t. Well, we knew from classified sources that people were dying of this anthrax over a period of six weeks, April all the way into the middle of May 1979. So, if the incubation period was really that short, you couldn’t explain how that would be airborne, because a cloud goes by right away. Once it’s gone, you can’t inhale it anymore. So that made the conclusion that it was airborne difficult to reach. You could still say, well, maybe it got stirred up again by people cleaning up the site, maybe the incubation period is longer than we thought, but there was a problem there.

And so the conclusion of our working group was that it was probable that it was airborne. In the CIA, at that time at least, in a conclusion that goes forward to the president, you couldn’t just say, “Well maybe, sort of like, kind of like, maybe if …” Words like that just didn’t work, because the poor president couldn’t make heads nor tails. Every conclusion had to be called “possible,” “probable,” or “confirmed.” Three levels of confidence.

So, the conclusion here was that it was probable that it was inhalation, and not ingestion. The Soviets said that it was bad meat, but I wasn’t convinced, mainly because of this incubation period thing. So I decided that the best thing to do would be to go and look. Then you might find out what it really was. Maybe by examining the survivors or maybe by talking to people — just somehow, if you got over there, with some kind of good luck, you could figure out what it was. I had no very clear idea, but when I would meet any high level Soviet, I’d say, “Could I come over there and bring some colleagues and we would try to investigate?”

The first time that happened was with a very high-level Soviet who I met in Geneva, Switzerland. He was a member of what’s called the Military Industrial Commission in the Soviet Union. They decided on all technical issues involving the military, and that would have included their biological weapons establishments, and we knew that they had a big biological laboratory in the city of Sverdlovsk, there was no doubt about that. So, I told them, “I want to go in and inspect. I’ll bring some friends. We’d like to look.” And he said, “No problem. Write to me.”

So, I wrote to him, and I also went to the CIA and said, “Look, I’ve got to have a map, because maybe they’d let me go there and take me to the wrong place, and I wouldn’t know it’s the wrong place, and I wouldn’t learn anything.” So, the CIA gave me a map — which turned out to be wrong, by the way — but then I got a letter back from this gentleman saying no, actually they couldn’t let us go because of the shooting down of the Korean airliner KAL 007, if any of you remember that. A Russian fighter plane shot down a Korean jet — a lot of passengers on it, and they all got killed. Relations were tense. So, that didn’t happen.

Then the second time, an American and the Russian Minister of Health got a Nobel prize. The winner over there was the minister of health named Chazov, and the fellow over here was Bernie Lown in our medical school, who I knew. So, I asked Bernie to take a letter when he went next time to see his friend Chazov in Moscow, to ask him if he could please arrange that I could take a team to Sverdlovsk, to go investigate on site. And when Bernie came back from Moscow, I asked him and he said, “Yeah. Chazov says it’s okay, you can go.” So, I sent a telex — we didn’t have email — to Chazov saying, “Here’s the team. We want to go. When can we go?” So, we got back a telex saying, “Well, actually, I’ve sent my right-hand guy who’s in charge of international relations to Sverdlovsk, and he looked around, and there’s really no evidence left. You’d be wasting your time,” which means no, right? So, I telexed back and said, “Well, scientists always make friends and something good always comes from that. We’d like to go to Sverdlovsk anyway,” and I never heard back. And then, the Soviet Union collapses, and we have Yeltsin now, and it’s the Russian Republic.

It turns out that a group of — I guess at that time they were still Soviets — Soviet biologists came to visit our Fort Detrick, and they were the guests of our Academy of Sciences. So, there was a welcoming party, and I was on the welcoming party, and I was assigned to take care of one particular one, a man named Mr. Yablokov. So, we got to know each other a little bit, and at that time we went to eat crabs in a Baltimore restaurant, and I told him I was very interested in this epidemic in Sverdlovsk, and I guess he took note of that. He went back to Russia and that was that. Later, I read in a journal that the CIA produced, a digest of abstracts from the Russian press, that Yeltsin had ordered his minister, or his assistant for Environment and Health, to investigate the anthrax epidemic back in 1979, and the guy who he appointed to do this investigation for him was my Mr. Yablokov, who I knew.

So, I sent a telex to Mr. Yablokov saying, “I see that President Yeltsin has asked for you to look into this old epidemic and decide what really happened, and that’s great, I’m glad he did that, and I’d like to come and help you. Could I come and help you?” So, I got back a telex saying, “Well, it’s a long time ago. You can’t bring skeletons out of the closet, and anyway, you’d have to know somebody there.” Basically it was a letter that said no. But then my friend Alex Rich of Cambridge Massachusetts, a great molecular biologist and X-ray crystallographer at MIT, had a party for a visiting Russian. Who is the visiting Russian but a guy named Sverdlov, like Sverdlovsk, and he’s staying with Alex. And Alex’s wife came over to me and said, “Well, he’s a very nice guy. He’d been staying with us for several days. I make him breakfast and lunch. I make the bed. Maybe you could take him for a while.”

So we took him into our house for a while, and I told him that I had been given a turn down by Mr. Yablokov, and this guy whose name is Sverdlov, which is an immense coincidence, said, “Oh, I know Yablokov very well. He’s a pal. I’ll talk to him. I’ll get it fixed so you can go.” Now, I get a letter. In this letter, handwritten by Mr. Yablokov, he said, “Of course, you can go, but you’ve got to know somebody there to invite you.” Oh, who would I know there?

Well, there had been an American solid-state physicist named Ellis who was there on a United States National Academy of Sciences–Russian Academy of Sciences exchange agreement, doing solid-state physics with a Russian solid-state physicist there in Sverdlovsk. So, I called Don Ellis and I asked him, “That guy who you cooperated with in Sverdlovsk — whose name was Gubanov — I need someone to invite me to go to Sverdlovsk, and you probably still maintain contact with him over there, and you could ask him to invite me.” And Don said, “I don’t have to do that. He’s visiting me today. I’ll just hand him the telephone.”

So, Mr. Gubanov comes on the telephone and he says, “Of course I’ll invite you, my wife and I have always been interested in that epidemic.” So, a few days later, I get a telex from the rector of the university there in Sverdlovsk, who was a mathematical physicist. And he says, “The city is yours. Come on. We’ll give you every assistance you want.” So we went, and I formed a little team, which included a pathologist, thinking maybe we’ll get ahold of some information of autopsies that could decide whether it was inhalation or gastrointestinal. And we need someone who speaks Russian; I had a friend who was a virologist who spoke Russian. And we need a guy who knows a lot about anthrax, and veterinarians know a lot about anthrax, so I got a veterinarian. And we need an anthropologist who knows a lot about how to work with people and that happened to be my wife, Jeanne Guillemin.

So, we all go over there, we were assigned a solid-state physicist, a guy named Borisov, to take us everywhere. He knew how to fix everything. Cars that wouldn’t work, and also the KGB. He was a genius, and became a good friend. It turns out that he had a girlfriend, and she, by this time, had been elected to be a member of the Duma. In other words, she’s a congresswoman. She’s from Sverdlovsk. She had been a friend of Yeltsin. She had written Yeltsin a letter, which my friend Borisov knew about, and I have a photocopy of the letter. What it says is, “Dear Boris Nikolayevich,” that’s Yeltsin, “My constituents here at Sverdlovsk want to know if that anthrax epidemic was caused by a government activity or not. Because if it was, the families of those who died — they’re entitled to double pension money, just like soldiers killed in war.” So, Yeltsin writes back, “We will look into it.” And that’s why my friend Yablokov got asked to look into it. It was decided eventually that it was the result of government activity — by Yeltsin, he decided that — and so he had to have a list of the people who were going to get the extra pensions. Because otherwise everybody would say, “I’d like to have an extra pension.” So there had to be a list.

So she had this list with 68 names of the people who had died of anthrax during this time period in 1979. The list also had the address where they lived. So, now my wife, Jeanne Guillemin, Professor of Anthropology at Boston College, goes door-to-door — with two Russian women who were professors at the university and who knew English so they could communicate with Jeanne — knocks on the doors: “We would like to talk to you for a little while. We’re studying health, we’re studying the anthrax epidemic of 1979. We’re from the university.”

Everybody let them in except one lady who said she wasn’t dressed, so she couldn’t let anybody in. So in all the other cases, they did an interview and there were lots of questions. Did the person who died have TB? Was that person a smoker? One of the questions was where did that person work, and did they work in the day or the night? We asked that question because we wanted to make a map. If it had been inhalation anthrax, it had to be windborne, and depending on the wind, it might have been blown in a straight line if the wind was of a more or less unchanging direction.

If, on the other hand, it was gastrointestinal, people get bad meat from black market sellers all over the place, and the map of where they were wouldn’t show anything important; they’d just be all over the place. So, we were able to make a map when we got back home. We went back there a second time to get more interviews done, and Jeanne went back a third time to get even more interviews done. So, finally we had interviews with families of nearly all of those 68 people, and so we had 68 map locations: where they lived, and where they worked, and whether it was day or night. Nearly all of them were daytime workers.

When we plotted where they lived, they lived all over the southern part of the city of Sverdlovsk. When we plotted where they likely would have been in the daytime, they all fell into one narrow zone with one point at the military biological lab. The lab was inside the city. The other point was at the city limit: the last case was at the edge of the city limit, the southern part. We also had meteorological information, which I had brought with me from the United States. We knew the wind direction every three hours, and there was only one day when the wind was constantly blowing in the same direction, and that same direction was exactly the direction along which the people who died of anthrax lived.

Well, bad meat does not blow around in straight lines. Clouds of anthrax spores do. It was rigorous: we could conclude from this, with no doubt whatsoever, that it had been airborne, and we published this in Science magazine. It was really a classic of epidemiology; you couldn’t ask for anything better. Also, the autopsy records were inspected by the pathologist who came along on our trip, and he concluded from the autopsy specimens that it was inhalation. So, there was that evidence, too, and that was published in PNAS. So, that really ended the mystery. The Soviet explanation was just wrong, and the CIA conclusion, which had only been rated “probable,” was confirmed.
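To make the geometric reasoning concrete, here is a minimal, purely illustrative Python sketch. Every number in it (the coordinates, the wind bearing, the position of the suspected source) is invented for the example rather than taken from the published study; it only shows the kind of check being described: daytime locations that cluster tightly along a single downwind bearing from a point source, while residences scatter across the city.

```python
import math

# Purely hypothetical coordinates (km) and wind bearing -- NOT the published Sverdlovsk data.
suspected_source = (0.0, 0.0)      # e.g., the military biological lab
wind_bearing_deg = 150.0           # assumed single prevailing wind direction that day

daytime_locations = [(0.6, -1.0), (1.1, -1.9), (1.7, -3.0), (2.3, -4.1)]
residences = [(-2.0, 1.5), (3.0, 2.0), (-1.0, -3.5), (2.5, -0.5)]

def bearing_from(source, point):
    """Compass-style bearing in degrees from the source to a point (x = east, y = north)."""
    dx, dy = point[0] - source[0], point[1] - source[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

def max_deviation(points):
    """Largest angular deviation (degrees) of the points' bearings from the wind bearing."""
    return max(
        abs((bearing_from(suspected_source, p) - wind_bearing_deg + 180.0) % 360.0 - 180.0)
        for p in points
    )

# An airborne release carried on a constant wind should give a small deviation for the
# daytime locations; residences scattered across a city should give a large one.
print("daytime-location spread:", round(max_deviation(daytime_locations), 1), "degrees")
print("residence spread:       ", round(max_deviation(residences), 1), "degrees")
```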

Max: Amazing detective story.

Matthew: I liked going out in the field, using whatever science I knew to try and deal with questions of importance to arms control, especially chemical and biological weapons arms control. And that happened to me on three occasions, one I just told you. There were two others.

Ariel: So, actually, real quick before you get into that: I just want to mention that we will share or link to that paper and the map, because I’ve seen the map that shows that straight line, and it is really amazing. Thank you.

Matthew: Oh good.

Max: I think at the meta level this is also a wonderful example of what you mentioned earlier there, Matthew, about verification. It’s very hard to hide big programs, because it’s so easy for some little thing to go wrong or not go as planned, and then something like this comes out.

Matthew: Exactly. By the way, that’s why having a verification provision in the treaty is worth it even if you never inspect. Let’s say that the guys who are deciding whether or not to do something which is against the treaty, they’re in a room and they’re deciding whether or not to do it. Okay? Now it is prohibited by a treaty that provides for verification. Now they’re trying to make this decision and one guy says, “Let’s do it. They’ll never see it. They’ll never know it.” Another guy says, “Well, there is a provision for verification. They may ask for a challenge inspection.” So, even the remote possibility that, “We might get caught,” might be enough to make that meeting decide, “Let’s not do it.” If it’s not something that’s really essential, then there is a potential big price.

If, on the other hand, there’s not even a treaty that allows the possibility of a challenge inspection, if the guy says, “Well, they might find it,” the other guy is going to say, “How are they going to find it? There’s no provision for them going there. We can just say, if they say, ‘I want to go there,’ we say, ‘We don’t have a treaty for that. Let’s make a treaty, then we can go to your place, too.’” It makes a difference: Even a provision that’s never used is worth having. I’m not saying it’s perfection, but it’s worth having. Anyway, let’s go on to one of these other things. Where do you want me to go?

Ariel: I’d really love to talk about the Agent Orange work that you did. So, I guess if you could start with the Agent Orange research and the other rainbow herbicides research that you were involved in. And then I think it would be nice to follow that up with, sort of another type of verification example, of the Yellow Rain Affair.

Matthew: Okay. The American Association for the Advancement of Science, the biggest organization of science in the United States, became, as the Vietnam War was going on, more and more concerned that the spraying of herbicides in Vietnam might cause ecological or health harm. And so at successive national meetings, there were resolutions to have it looked into. And as a result of one of those resolutions, the AAAS asked a fellow named Fred Tschirley to look into it. Fred was at the Department of Agriculture, but he was one of the people who developed the military use of herbicides. He did a study, and he concluded that there was no great harm. Possibly to the mangrove forest, but even then they would regenerate.

But at the next annual meeting, there were more appeals on the part of the membership, and now they wanted the AAAS to do its own investigation, and the compromise was that they’d do their own study to design an investigation, and they had to have someone to lead that. So, they asked a fellow named John Cantlon, who was provost of Michigan State University, would he do it, and he said yes. And after a couple of weeks, John Cantlon said, “I can’t do this. I’m being pestered by the left and the right and the opponents on all sides and it’s just, I can’t do it. It’s too political.”

So, then they asked me if I would do it. Well, I decided I’d do it. The reason was that I wanted to see the war. Here I’d been very interested in chemical and biological weapons; very interested in war, because that’s the place where chemical and biological weapons come into play. If you don’t know anything about war, you don’t know what you’re talking about. I taught a course at Harvard for over two years on war, but that wasn’t like being there. So, I said I’d do it.

I formed a little group to do it. A guy named Arthur Westing, who had actually worked with herbicides and who was a forester himself and had been in the army in Korea, and I think had a battlefield promotion to captain. Just the right combination of talents. Then we had a chemistry graduate student, a wonderful guy named Bob Baughman. So, to design a study, I decided I couldn’t do it sitting here in Cambridge, Massachusetts. I’d have to go to Vietnam and do a pilot study in order to design a real study. So, we went to Vietnam — by the way, via Paris, because I wanted to meet the Vietcong people, I wanted them to give me a little card we could carry in our boots that would say, if we were captured, “We’re innocent scientists, don’t imprison us.” And we did get such little cards that said that. We were never captured by the Vietcong, but we did have some little cards.

Anyway, we went to Vietnam and we found, to my surprise, that the military assistance command, that is the United States Military in Vietnam, very much wanted to help our investigation. They gave us our own helicopter. That is, they assigned a helicopter and a pilot to me. And anywhere we wanted to go, I’d just call a certain number the night before and then go to Tan Son Nhut Air Base, and there would be a helicopter waiting with a pilot instructed FAD — fly as directed.

So, one of the things we did was to fly over a valley on which herbicides had been sprayed to kill the rice. John Constable, the medical member of our team, and I did two flights of that so we could take a lot of pictures. The man who had designed this mission, a chemical corps officer, Captain Franz, had requested it and gotten permission through a series of review processes on the grounds that it was really an enemy crop production area, not an area of indigenous Montagnard people growing food for their own eating, but rather enemy soldiers growing it for themselves.

So we took a lot of pictures and as we flew, Colonel Franz said, “See down there, there are no houses. There’s no civilian population. It’s just military down there. Also, the rice is being grown on terraces on the hillsides. The Montagnard people don’t do that. They just grow it down in the valley. They don’t practice terracing. And also, the extent of the rice fields down there — that’s all brand new. Fields a few years ago were much, much smaller in area. So, that’s how we know that it’s an enemy crop production area.” And he was a very nice man, and we believed him. And then we got home, and we had our films developed.

Well, we had very good cameras and although you couldn’t see from the aircraft, you could certainly see in the film: The valley was loaded with little grass shacks with yellow roofs — meaning that they were built recently, because you have to replace the roofs every once in a while with straw and if it gets too old, it turns black, but if there’s yellow, it means that somebody is living in those. And there were hundreds and hundreds of them.

We got from the Food and Agriculture Organization in Rome how much rice you need to stay alive for one year, and what area in hectares of dry rice — because this isn’t paddy rice, it’s dry rice — you’d need to grow that much rice, and we measured the area that was under cultivation from our photographs, and the area was just enough to support that entire population, if we assumed that there were five people who needed to be fed in every one of the houses that we counted.
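As a rough illustration of the back-of-the-envelope arithmetic being described, here is a small Python sketch. All of the figures in it are invented placeholders, not the numbers the Herbicide Assessment Commission actually used; it only shows how the comparison between the area needed for subsistence and the area measured from the photographs would go.

```python
# All figures below are invented placeholders for illustration -- not the commission's data.
houses_counted = 700              # grass shacks counted in the aerial photographs
people_per_house = 5              # assumption mentioned in the interview
rice_per_person_kg = 180          # annual rice requirement per person (placeholder)
dry_rice_yield_kg_per_ha = 1200   # yield of upland ("dry") rice per hectare (placeholder)

population = houses_counted * people_per_house
area_needed_ha = population * rice_per_person_kg / dry_rice_yield_kg_per_ha
measured_area_ha = 520            # cultivated area measured from the photographs (placeholder)

print(f"Estimated population:            {population}")
print(f"Area needed to feed them (ha):   {area_needed_ha:.0f}")
print(f"Area measured from photos (ha):  {measured_area_ha}")
# If the measured area roughly matches the area needed for subsistence, the fields are
# consistent with local people feeding themselves rather than surplus production for an army.
```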

Also, we were able to get the French aerial photography that had been done in the late 1940s, and it turns out that the rice fields had not expanded. They were exactly the same. So it wasn’t that the military had moved in and made bigger rice fields: they were the same. So, everything that Colonel Franz said was just wrong. I’m sure he believed it, but it was wrong.

So, we made great big color enlargements of our photographs — we took photographs all up and down this valley, 15 kilometers long — and we made one set for Ambassador Bunker; one set for General Abrams — Creighton Abrams was the head of our military assistance command; and one set for Secretary of State Rogers; along with a letter saying that this one case that we saw may not be typical, but in this one case, this crop destruction program was achieving the opposite of what it intended. It was denying food to the civilian population and not to the enemy. It was completely mistaken. As a result of that, I think (though I have no proof, only the time connection), right after we sent the stuff in early November, Ambassador Bunker and General Abrams ordered a new review of the crop destruction program. Was it in response to our photographs and our letter? I don’t know, but I think it was.

The result of that review was a recommendation by Ambassador Bunker and General Abrams to stop the herbicide program immediately. They sent this recommendation back in a top-secret telegram to Washington. Well, the top-secret telegram fell into the hands of the Washington Post, and they published it. Well, now here are the Ambassador and the General on the spot, saying to stop doing something in Vietnam. How on earth can anybody back in Washington gainsay them? Of course, President Nixon had to stop it right away. There’d be no grounds. How could he say, “Well, my guys here in Washington, in spite of what the people on the spot say, tell us we should continue this program”?

So that very day, he announced that the United States would stop all herbicide operations in Vietnam in a rapid and orderly manner. That very day happened to be the day that I, John Constable, and Art Westing were on the stage at the annual meeting in Chicago of the AAAS, reporting on our trip to Vietnam. And the president of AAAS ran up to me to tell me this news, because it just came in while I was talking, giving our report. So, that’s how it got stopped, and thanks to General Abrams.

By the way, the last day I was in Vietnam, General Abrams had just come back from Japan — he’d had a gallbladder operation, and he was still convalescing. We spent all morning talking with each other. And he asked me at one point, “What about the military utility of the herbicides?” And of course, I said I had no idea whether it had any or not. And he said, “Do you want to know what I think?” I said, “Yes, sir.” He said, “I think it’s shit.” I said, “Well, why are we doing it here?” He said, “You don’t understand anything about this war, young man. I do what I’m ordered to do from Washington. It’s Washington who tells me to use this stuff, and I have to use it because if I didn’t have those 55-gallon drums of herbicides offloaded on the docks at Da Nang and Saigon, then they’d make walls. I couldn’t offload the stuff I need over those walls. So, I do let the chemical corps use this stuff.” He said, “Also, my son, who is a captain up in I Corps, agrees with me about that.”

I wrote something about this recently, which I sent to you, Ariel. I want to be sure my memory was right about the conversation with General Abrams — who, by the way, was a magnificent man. He is the man who broke through at the Battle of the Bulge in World War II. He’s the man about whom General Patton, the great tank general, said, “There’s only one tank officer greater than me, and it’s Abrams.”

Max: Is he the one after whom the Abrams tank is named?

Matthew: Yes, it was named after him. He had four sons, they all became generals, and I think three of them became four-stars. One of them who did become a four-star is still alive in Washington. He has a consulting company. I called him up and I said, “Am I right, is this what your dad thought and what you thought back then?” He said, “Hell, yes. It’s worse than that.” Anyway, that’s what stopped the herbicides. They may have stopped anyway. It was dwindling down, no question. Now, as for how many health effects dioxin and the herbicides have caused, I just don’t know. There’s an immense literature about this and it’s nothing I can say we ever studied. If I read all the literature, maybe I’d have an opinion.

I do know that dioxin is very poisonous, and there’s a prelude to this order from President Nixon to stop the use of all herbicides. That’s what caused the United States to stop the use of Agent Orange specifically. That happened first, before I went to Vietnam. That happened for a funny reason. A Harvard student, a Vietnamese boy, came to my office one day with a stack of newspapers from Saigon in Vietnamese. I couldn’t read them, of course, but they all had pictures of deformed babies, and this student claimed that this was because of Agent Orange, that the newspaper said it was because of Agent Orange.

Well, deformed babies are born all the time and I appreciated this coming from him, but there’s nothing I could do about it. But then I got something from a graduate student here — Bill Haseltine, who has since become a very wealthy man. He had a girlfriend who was working for Ralph Nader one summer, and she somehow got a purloined copy of a study that had been ordered by the NIH on the possible teratogenic, mutagenic, and carcinogenic effects of common herbicides, pesticides, and fungicides.

This company, called the Bionetics company, had this huge contract that tests all these different compounds, and they concluded from this that there was only one of these chemicals that did anything that might be dangerous for people. That was 2,4,5-T, trichlorophenoxyacetic acid. Well, that’s what Agent Orange is made out of. So, I had this report that had not yet been released to the public saying that this could cause birth defects in humans if it did the same thing as it did in guinea pigs and mice. I thought, the White House better know about this. That’s pretty explosive: claims in the newspapers in Saigon and scientific suggestions that this stuff might cause birth defects.

So, I decided to go down to Washington and see President Nixon’s science advisor. That was Lee DuBridge, a physicist. Lee DuBridge had been the president of Caltech when I was a graduate student there, and so he knew me, and I knew him. So, I went down to Washington with some friends, and I think one of the friends was Arthur Galston from Yale. He was a scientist who worked on herbicides, not on the phenoxyacetic herbicides but other herbicides. So we went down to see the President’s science advisor, and I showed him these newspapers and the Bionetics report. He hadn’t seen it; it was at too low a level of government for him to see, and it had not yet been released to the public. Then he did something amazing, Lee DuBridge: he picked up the phone and he called David Packard, who was the number two at the Defense Department. Right then and there, without consulting anybody else, without asking the permission of the President, they canceled Agent Orange.

Max: Wow.

Matthew: That was the end of Agent Orange. Now, not exactly the end. I got a phone call from Lee DuBridge a couple of days later when I was back at Harvard. He says, “Matt, the Dow people have come to me. It’s not Agent Orange itself, it’s an impurity in Agent Orange called dioxin, and they know that dioxin is very toxic, and the Agent Orange that they make has very little dioxin in it, because they know it’s bad and they make the stuff at low temperature, so the dioxin by-product is made in very small amounts. It must be the Agent Orange made for the military by these other companies, like Diamond Shamrock and Monsanto. It’s their Agent Orange. It’s not our Agent Orange.”

So, in other words, the question was: if we just use the Dow Agent Orange, maybe that’s safe. But does the Dow Agent Orange cause defects in mice? So, a whole new series of experiments was done with Agent Orange containing much less dioxin in it. It still caused birth defects. And since it still caused birth defects in one species of rodent, you could hardly say, “Well, it’s okay then for humans.” So, that really locked it, closed it down, and then even the Department of Agriculture prohibited its use in the United States, except on land where it would have been unlikely to get into the human food chain. So, that ended the use of Agent Orange.

That had happened already, before we went to Vietnam. They were then using only Agent White and Agent Blue, two other herbicides; Agent Orange had been knocked out ahead of time. But that was the end of the whole herbicide program. It was two things: the dioxin concern, on the one hand, which stopped Agent Orange; and the decision of President Nixon, after Bunker and Abrams had said, “It’s no use, we want to get it stopped, it’s doing more harm than good. It’s getting the civilian population against us.”

Max: One reaction I have to these fascinating stories is how amazing it is that back in those days politicians really trusted scientists. You could go down to Washington, and there would be a science advisor. You know, we haven’t even had a presidential science advisor for a while now, during this administration. Do you feel that the climate has changed somehow in the way politicians view scientists?

Matthew: Well, I don’t have a big broad view of the whole thing. I just get the impression, like you do, that there are more politicians who don’t pay attention to science than there used to be. There are still some, but not as many, and not in the White House.

Max: I would say we shouldn’t just point fingers at any particular administration; I think there has been a general downward trend in people’s respect for scientists overall. If you go back to when you were born, Matthew, and when I was born, I think people generally thought a lot more highly of scientists as contributing very valuable things to society, and they were very interested in them. If you ask the average person how many famous movie stars they can name, or how many billionaires they can name, versus how many Nobel laureates they can name, the answer is going to be kind of different from the way it was a long time ago. It’s very interesting to think about what we can do to help people better appreciate that the things they do care about, like living longer and having technology and so on, are things that they, to a large extent, owe to science. It isn’t just nerdy stuff that isn’t relevant to them.

Matthew: Well, I think movie stars were always at the top of the list. Way ahead of Nobel Prize winners and even of billionaires, but you’re certainly right.

Max: The second thing that really strikes me, which you did so wonderfully there, is that you never antagonized the politicians and the military, but rather went to them in a very constructive spirit and said look, here are the options. And based on the evidence, they came to your conclusion.

Matthew: That’s right. Except for the people who actually were doing these programs — that was different, you couldn’t very well tell them that. But for everybody else, yes, it was a help. You need to offer help, not hindrance.

The last thing was the Yellow Rain. That, too, involved the CIA. I was contacted by the CIA. They had become aware of reports from Southeast Asia, particularly from Thailand, of Hmong tribespeople who had been living in Laos coming out of Laos, across the Mekong into Thailand, and telling stories of being poisoned by stuff dropped from airplanes. Stuff that they called kemi, or yellow rain.

At first, I thought maybe there was something to this; there are some nasty chemicals that are yellow. Not that lethal, but who knows, maybe there was exaggeration in their stories. One of them is called adamsite; it’s yellow, it’s an arsenical. So we decided we’d have a conference, because there was a mystery: what is this yellow rain? We had a conference. We invited people from the intelligence community, from the State Department. We invited anthropologists. We invited a bunch of people to ask: what is this yellow rain?

By this time, we knew that the samples that had been turned in contained pollen. One reason we knew that was that the British had samples of this yellow rain, and they had shown that it contains pollen. Samples of the yellow rain brought in by the Hmong tribespeople had been given to British officers — or maybe Americans, I don’t know — but they found their way into the hands of British intelligence, who brought these samples back to Porton, where they were examined in various ways, including under the microscope. And the fellow who looked at them under the microscope happened to be a beekeeper. He knew just what pollen grains look like. And he knew that there was pollen, and then they sent this information to the United States, and we looked at the samples of yellow rain we had, and all of these yellow samples contained pollen.

The question was, what is it? It’s got pollen in it. Maybe it’s very poisonous. The Montagnard people say it falls from the sky. It lands on leaves and on rocks. The spots were about two millimeters in diameter. It’s yellow or brown or red, different colors. What is it? So, we had this meeting in Cambridge, and one of the people there, Peter Ashton, is a great botanist, his specialty is the trees of Southeast Asia and in particular the great dipterocarp trees, which are like the oaks in our part of the world. And he was interested in the fertilization of these dipterocarps, and the fertilization is done by bees. They collect pollen, though, like other bees.

And so the hypothesis we came to at the end of this day-long meeting was that maybe this stuff is poisonous, and the bees get poisoned by it because it falls on everything, including flowers that have pollen, and the bees get sick, and these yellow spots, they’re the vomit of the bees. These bees are smaller individually than the yellow spots, but maybe several bees get together and vomit on the same spot. Really a crazy idea. Nevertheless, it was the best idea we could come up with that explained why something could be toxic but have pollen in it. It could be little drops, associated with bees, and so on.

A couple of days later, both Peter Ashton, the botanist, and I noticed on the rear windshields of our cars yellow spots loaded with pollen. These were being dropped by bees; these were the natural droppings of bees, and that gave us the idea that maybe there was nothing poisonous in this stuff. Maybe it was the natural droppings of bees, which the people in the villages thought was poisonous but which wasn’t. So, we decided we’d better go to Thailand and find out what was happening.

So, a great bee biologist named Thomas Seeley, who’s now at Cornell — he was at Yale at that time — and I flew over to Thailand, and went up into the forest to see if bees defecate in showers. Now why did we do that? It’s because friends here said, “Matt, this can’t be the source of the yellow rain that the Hmong people complained about, because bees defecate one by one. They don’t go out in a great armada of bees and defecate all at once. Each bee goes out and defecates by itself. So, you can’t explain the showers — they’d only get tiny little driblets, and the Hmong people say they’re real showers, with lots of drops falling all at once.”

So, Tom Seeley and I went to Thailand, where they also had this kind of bee. So, we went there, and it turns out that they defecate all at once, unlike the bees here. Now they do defecate in showers here too, but they’re small showers. That’s because the number of bees in a nest here is rather small, but they do come out on the first warm days of spring, when there’s now pollen and nectar to be harvested, but those showers are kind of small. Besides that, the reason that there are showers at all even in New England is because the bees are synchronized by winter. Winter forces them to stay in their nest all winter long, during which they’re eating the stored-up pollen and getting very constipated. Now, when they fly out, they all fly out, they’re all constipated, and so you get a big shower. Not as big as the natives in Southeast Asia reported, but still a shower.

But in Southeast Asia, there are no seasons; it’s too near the equator. So, there’s nothing that would synchronize the defecation of the bees, and that’s why we had to go to Thailand: to see whether, even though there’s no winter to synchronize their defecation flights, they nevertheless do go out in huge numbers, all at once.

So, we’re in Thailand and we go up into Khao Yai National Park and find places where there are clearings in the forest where you can see up into the sky, so that if there were bees defecating, their feces would fall to the ground and not get caught up in the trees. And we put down big pieces of white paper, one meter square, anchored them with rocks, went walking around in the forest some more, and came back to look at our pieces of white paper every once in a while.

And then suddenly we saw a large number of spots on the paper, which meant that they had defecated all at once. They weren’t going around defecating one by one by one. There were great showers, then. It’s still a question why they don’t go out one by one, and there are some good ideas why; I won’t drag you into that. It’s the convoy principle: to avoid getting picked off one by one by birds. That’s why people think that they go out in great armadas of constipated bees.

So, this gave us a new hypothesis: the so-called yellow rain is all a mistake. It’s just bees defecating, which people confuse and think is poisonous. Now, that still doesn’t prove that there wasn’t a poison. What was the evidence for poison? The evidence was that the Defense Intelligence Agency was sending samples of this yellow rain, and also samples of human blood and other materials, to a laboratory in Minnesota that knew how to analyze for the particular toxins that the defense establishment thought were the poison. They’re called trichothecene mycotoxins; there’s a whole family of them. And this lab reported positive findings in the samples from Thailand but not in controls. So that seemed to be real proof that there was poison.

Well, this lab was a lab that also produced trichothecene mycotoxins, and the way they analyzed for them was by mass spectroscopy. Everybody knows that if you’re going to do mass spectroscopy, you’re going to be able to detect very, very, very tiny amounts of stuff, and so you shouldn’t both make large quantities and try to detect small quantities in the same room, because there’s the possibility of cross contamination. I have an internal report from the Defense Intelligence Agency saying that that laboratory did have numerous false positives, and that probably all of their results were bedeviled by contamination from the trichothecenes that were in the lab, and also because there may have been some false reading of the mass spec diagram.

The long and short of it is that when other laboratories tried to find trichothecenes in their samples, none could. The US Army looked at at least 80 samples and found nothing. The British looked at at least 60 samples and found nothing. The Swedes looked at some number of samples, I don’t know the number, but found nothing. The French looked at a very few samples at their military analytical lab, and they found nothing. No lab could confirm it. There was one lab at Rutgers that thought it could confirm it, but I believe that they were suffering from contamination also, because they were a lab that also worked with trichothecenes.

So, the long and short of it is that the chemical evidence was no good, and finally the ambassador there, Ambassador Dean, decided that we should have another look, and that the military should send out a team properly equipped to check up on these stories, because up until then there was no dedicated team. There were teams that would come up briefly, listen to the refugees’ stories, collect samples, and go back. So Ambassador Dean requested a team that would stay there. So out comes a team from Washington, and it stays there longer than a year. Not just a week, but longer than a year. And they tried to re-locate, in the refugee camps, the Hmong people who had told these stories.

They couldn’t find a single one who would tell the same story twice, either because they weren’t telling the same story twice, or because the interpreter interpreted the same story differently. So, whatever it was. Then they did something else: they tried to find people who had been in the same location at the same time as the claimed attacks, and those people never confirmed the attacks. They could never find any confirmation by interrogating people.

Then also, there was a CIA unit out there in that theater questioning captured prisoners of war and also people who had surrendered from the North Vietnamese army: the people who were presumably behind the use of this toxic stuff. They interrogated hundreds of people, and one of these interrogators wrote an article in an intelligence agency journal, but an open journal, saying that he doubted there was anything to the yellow rain, because they had interrogated so many people, including chemical corps people from the North Vietnamese Army, and he couldn’t believe that there really was anything going on.

So we did some more investigating of various kinds, not just going to Thailand, but doing some analysis of various things. We looked at the samples — we found bee hairs in the samples. We found that the bee pollen in the samples of the alleged poison had no protein inside. You can stain pollen grains with something called Coomassie brilliant blue, and these pollen grains that were in the samples handed in by the refugees, that were given to us by the army and by the Canadians, by the Australians, they didn’t stain blue. Why not? Because if a pollen grain passes through the gut of a bee, the bee digests out all of the good protein that’s inside the pollen grain, as its nutrition.

So, you’d have to believe that the Soviets were collecting pollen, not from plants, which is hard enough, but pollen that had been regurgitated by bees. Well, that’s insane. You could never get enough to be a weapon by collecting bee vomit. So the whole story collapsed, and we’ve written a longer account of this. The United States government has never said we were right, but a few years ago it said that maybe it was wrong. So that’s at least something.

So in one case we were right, and the Soviets were wrong. In another case, the Soviets were right, and we were wrong. And in the third case, the herbicides, nobody was right or wrong. It was just that, in my view, by the way, it was useless militarily. I’ll tell you why.

If you spray the deep forest, hoping to find a military installation that you can now see because there are no more leaves, it takes four or five weeks for the leaves to fall off. So, you might as well drop little courtesy cards that say, “Dear enemy. We have now sprayed where you are with herbicide. In four or five weeks we will see you. You may choose to stay there, in which case, we will shoot you. Or, you have four or five weeks to move somewhere else, in which case, we won’t be able to find you. You decide.” Well, come on, what kind of a brain came up with that?

The other use was along roadsides, for convoys to be safer from snipers who might be hidden in the woods. You knock the leaves off the trees and you can see deeper into the woods. That’s right, but you have to realize the fundamental law of physics, which is that if you can see from A to B, B can see back to A, right? If there’s a clear light path from one point to another, there’s a clear light path in the other direction.

Now think about it. You are a sniper in the woods, and the leaves now have not been sprayed. They grow right up to the edge of the forest and a convoy is coming down the road. You can stick your head out a little bit but not for very long. They have long-range weapons; when they’re right opposite you, they have huge firepower. If you’re anywhere nearby, you could get killed.

Now, if we get rid of all the leaves, now I can stand way back into the forest, and still sight you between the trunks. Now, that’s a different matter. A very slight move on my part determines how far up the road and down the road I can see. By just a slight movement of my eye and my gun, I can start putting you under fire a couple kilometers up the road — you won’t even know where it’s coming from. And I can keep you under fire a few kilometers down the road, when you pass me by. And you don’t know where I am anymore. I’m not right up by the roadside, because the leaves would otherwise keep me from seeing anything. I’m back in there somewhere. You can pour all kinds of fire, but you might not hit me.

So, for all these reasons, the leaves are not the enemy. The leaves are the enemy of the enemy. Not of us. We’d like to get rid of the trunks — that’s different, we do that with bulldozers. But getting rid of the leaves leaves a kind of a terrain which is advantageous to the enemy, not to us. So, on all these grounds, my hunch is that by embittering the civilian population — and after all our whole strategy was to win the hearts and minds — by embittering the native population by wiping out their crops with drifting herbicide, the herbicides helped us lose the war, not win it. We didn’t win it. But it helped us lose it.

But anyway, the herbicides got stopped in two steps. First Agent Orange, because of dioxin and the report from the Bionetics Company, and second because Abrams and Bunker said, “Stop it.” We now have a treaty, by the way, the ENMOD treaty, that makes it illegal under international law to do any kind of large-scale environmental modification as a weapon of war. So, that’s about everything I know.

And I should add: you might say, how could they interpret something that’s common in that region as a poison? Well, in China, in 1970 I believe it was, the same sort of thing happened, but the situation was very different. People believed that yellow spots falling from the sky were fallout from nuclear weapons tests being conducted by the Soviet Union, and that they were poisonous.

Well, the Chinese government asked a geologist from a nearby university to go investigate, and he figured out — completely out of touch with us, he had never heard of us, we had never heard of him — that it was bee feces that were being misinterpreted by the villagers as fallout from nuclear weapons tests done by the Russians.

It was exactly the same situation, except that in this case there was no reason whatsoever to believe that there was anything toxic there. And why was it that people didn’t recognize bee droppings for what they were? After all, there’s lots of bees out there. There are lots of bees here, too. And if in April, or near that part of spring, you look at the rear windshield of your car, if you’ve been out in the countryside or even here in midtown, you will see lots of these spots, and that’s what those spots are.

When I was trying to find out what kinds of pollen were in the samples of the yellow rain — the so-called yellow rain — that we had, I went down to Washington. The greatest United States expert on pollen grains and where they come from was at the Smithsonian Institution, a woman named Joan Nowicke. I told her that bees make spots like this all the time and she said, “Nonsense. I never see it.” I said, “Where do you park your car?” Well, there’s a big parking lot by the Smithsonian, so we went down there, and her rear windshield was covered with these things. We see them all the time. They’re part of what we see, but we don’t take any account of them.

Here at Harvard there’s a funny story about that. One of our best scientists here, Ed Wilson, studies ants — but also bees — but mostly ants. But he knows a lot about bees. Well, he has an office in the museum building, and lots of people come to visit the museum at Harvard, a great museum, and there’s a parking lot for them. Now, in those days there was a graduate student who had bee nests up on top of the museum building. He was doing some experiments with bees. But these bees defecate, of course. And some of the nice people who come to see the Harvard museum park their cars there, and some of them are very nice new cars, and they come back out from seeing the museum and there’s this stuff on their windshields. So, they go to find out who it is that they can blame for this and maybe do something about it, or pay them to get it fixed, or I don’t know what — anyway, to make a complaint. So, they come to Ed Wilson’s office.

Well, this graduate student is a graduate student of Ed Wilson’s, and of course, Wilson knows that he’s got bee nests up there, and so Ed Wilson’s secretary knows what this stuff is. And the graduate student has the job of taking a rag with alcohol on it and going down and gently wiping the bee feces off the windshields of these distressed drivers, so there’s never any harm done. But now, when I had some of this stuff that I’d collected in Thailand, I took two people to lunch at the faculty club here at Harvard and brought along some leaves with these spots on them under a plastic petri dish, just to see if they would know what it was.

Now, one of these guys was Carroll Williams, who knew all about insects, lots of things about insects, and the other was Wilson, of course. We’re having lunch and I bring out this petri dish with the leaves covered with yellow spots and ask them, two professors who are great experts on insects, what the stuff is, and they hadn’t the vaguest idea. They didn’t know. So, there can be things around us that we see every day, and even if we’re experts we don’t know what they are. We don’t notice them. They’re just part of the environment. We don’t notice them. I’m sure that these Hmong people were getting shot at, they were getting napalmed, they were getting everything else, but they were not getting poisoned. At least not by bee feces. It was all a big mistake.

Max: Thank you so much, both for this fascinating conversation and all the amazing things you’d done to keep science a force for good in the world.

Ariel: Yes. This has been a really, really great and informative discussion, and I have loved learning about the work that you’ve done, Matthew. So, Matthew and Max, thank you so much for joining the podcast.

Max: Well, thank you.

Matthew: I enjoyed it. I’m sure I enjoyed it more than you did.

Ariel: No, this was great. It’s truly been an honor getting to talk with you.

If you’ve enjoyed this interview, let us know! Please like it, share it, or even leave a good review. I’ll be back again next month with more interviews with experts.  

 

AI Alignment Podcast: Human Cognition and the Nature of Intelligence with Joshua Greene

“How do we combine concepts to form thoughts? How can the same thought be represented in terms of words versus things that you can see or hear in your mind’s eyes and ears? How does your brain distinguish what it’s thinking about from what it actually believes? If I tell you a made-up story, that yesterday I played basketball with LeBron James, maybe you’d believe me, and then I say, oh, I was just kidding, it didn’t really happen. You still have the idea in your head, but in one case you’re representing it as something true, in another case you’re representing it as something false, or maybe you’re representing it as something that might be true and you’re not sure. For most animals, the ideas that get into their heads come in through perception, and the default is just that they are beliefs. But humans have the ability to entertain all kinds of ideas without believing them. You can believe that they’re false or you could just be agnostic, and that’s essential not just for idle speculation, but for planning. You have to be able to imagine possibilities that aren’t yet actual. So these are all things we’re trying to understand. And then I think the project of understanding how humans do it is really quite parallel to the project of trying to build artificial general intelligence.” -Joshua Greene

Josh Greene is a Professor of Psychology at Harvard, who focuses on moral judgment and decision making. His recent work focuses on cognition, and his broader interests include philosophy, psychology and neuroscience. He is the author of Moral Tribes: Emotion, Reason, and the Gap Between Us and Them. Joshua Greene’s current research focuses on further understanding key aspects of both individual and collective intelligence. Deepening our knowledge of these subjects allows us to understand the key features which constitute human general intelligence, and how human cognition aggregates and plays out through group choice and social decision making. By better understanding the one general intelligence we know of, namely humans, we can gain insights into the kinds of features that are essential to general intelligence and thereby better understand what it means to create beneficial AGI. This particular episode was recorded at the Beneficial AGI 2019 conference in Puerto Rico. We hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, iTunes, Google Play, Stitcher, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

Topics discussed in this episode include:

  • The multi-modal and combinatorial nature of human intelligence
  • The symbol grounding problem
  • Grounded cognition
  • Modern brain imaging
  • Josh’s psychology research using John Rawls’ veil of ignorance
  • Utilitarianism reframed as ‘deep pragmatism’
You can find out more about Joshua Greene at his website or follow his lab on their Twitter. You can listen to the podcast above or read the transcript below.

Lucas: Hey everyone. Welcome back to the AI Alignment Podcast. I’m Lucas Perry, and today we’ll be speaking with Joshua Greene about his research on human cognition as well as John Rawls’ veil of ignorance and social choice. Studying the human cognitive engine can help us better understand the principles of intelligence, and thereby aid us in arriving at beneficial AGI. It can also inform group choice and how to modulate persons’ dispositions to certain norms or values, and thus affect policy development in observed choice. Given this, we discussed Josh’s ongoing projects and research regarding the structure, relations, and kinds of thought that make up human cognition, key features of intelligence such as it being combinatorial and multimodal, and finally how a particular thought experiment can change how impartial a person is, and thus what policies they support.

And as always, if you enjoy this podcast, please give it a like, share it with your friends, and follow us on your preferred listening platform. As a bit of announcement, the AI Alignment Podcast will be releasing every other Wednesday instead of once a month, so there are a lot more great conversations on the way. Josh Greene is a professor of psychology at Harvard, who focuses on moral judgment and decision making. His recent work focuses on cognition, and his broader interests include philosophy, psychology and neuroscience. And without further ado, I give you Josh Greene.

Lucas: What sort of thinking has been predominantly occupying the mind of Josh Greene?

Joshua: My lab has two different main research areas that are related, but on a day to day basis are pretty separate. You can think of them as focused on key aspects of individual intelligence versus collective intelligence. On the individual intelligence side, what we’re trying to do is understand how our brains are capable of high level cognition. In technical terms, you can think of that as compositional semantics, or multimodal compositional semantics. What that means in more plain English is how does the brain take concepts and put them together to form a thought, so you can read a sentence like the dog chased the cat, and you understand that it means something different from the cat chased the dog. The same concepts are involved, dog and cat and chasing, but your brain can put things together in different ways in order to produce a different meaning.

Lucas: The black box for human thinking and AGI thinking is really sort of this implicit reasoning that is behind the explicit reasoning, which seems to be the most deeply mysterious, difficult part to understand.

Joshua: Yeah. A lot of where machine learning has been very successful has been on the side of perception, recognizing objects, or when it comes to going from say vision to language, simple labeling of scenes that are already familiar, so you can show an image of a dog chasing a cat and maybe it’ll say something like dog chasing cat, or at least we get that there’s a dog running and a cat chasing.

Lucas: Right. And the caveat is that it takes a massive amount of training, where it’s not one shot learning, it’s you need to be shown a cat chasing a dog a ton of times just because of how inefficient the algorithms are.

Joshua: Right. And the algorithms don’t generalize very well. So if I show you some crazy picture that you’ve never seen before where it’s a goat and a dog and Winston Churchill all wearing roller skates in a rowboat on a purple ocean, a human can look at that and go, that’s weird, and give a description like the one I just said. Whereas today’s algorithms are going to be relying on brute statistical associations, and that’s not going to cut it for getting a precise, immediate reasoning. So humans have this ability to have thoughts, which we can express in words, but we also can imagine in something like pictures.

And the tricky thing is that it seems like a thought is not just an image, right? So to take an example that I think comes from Daniel Dennett, if you hear the words yesterday my uncle fired his lawyer, you might imagine that in a certain way, maybe you picture a guy in a suit pointing his finger and looking stern at another guy in a suit, but you understand that what you imagined doesn’t have to be the way that that thing actually happened. The lawyer could be a woman rather than a man. The firing could have taken place by phone. The firing could have taken place by phone while the person making the call was floating in a swimming pool and talking on a cell phone, right?

The meaning of the sentence is not what you imagined. But at the same time we have the symbol grounding problem, that is it seems like meaning is not just a matter of symbols chasing each other around. You wouldn’t really understand something if you couldn’t take those words and attach them meaningfully to things that you can see or touch or experience in a more sensory and motor kind of way. So thinking is something in between images and in between words. Maybe it’s just the translation mechanism for those sorts of things, or maybe there’s a deeper language of thought, to use Jerry Fodor’s famous phrase. But in any case, what this part of my lab is trying to do is understand how does this central, really poorly understood aspect of human intelligence work? How do we combine concepts to form thoughts? How can the same thought be represented in terms of words versus things that you can see or hear in your mind’s eyes and ears?

How does your brain distinguish what it’s thinking about from what it actually believes? If I tell you a made up story, yesterday I played basketball with LeBron James, maybe you’d believe me, and then I say, oh I was just kidding, didn’t really happen. You still have the idea in your head, but in one case you’re representing it as something true, in another case you’re representing it as something false, or maybe you’re representing it as something that might be true and you’re not sure. For most animals, the ideas that get into its head come in through perception, and the default is just that they are beliefs. But humans have the ability to entertain all kinds of ideas without believing them. You can believe that they’re false or you could just be agnostic, and that’s essential not just for idle speculation, but it’s essential for planning. You have to be able to imagine possibilities that aren’t yet actual.

So these are all things we’re trying to understand. And then I think the project of understanding how humans do it is really quite parallel to the project of trying to build artificial general intelligence.

Lucas: Right. So what’s deeply mysterious here is the kinetics that underlie thought, which is sort of like meta-learning or meta-awareness, or how it is that we’re able to have this deep and complicated implicit reasoning behind all of these things. And what that actually looks like seems deeply puzzling in sort of the core and the gem of intelligence, really.

Joshua: Yeah, that’s my view. I think we really don’t understand the human case yet, and my guess is that obviously it’s all neurons that are doing this, but these capacities are not well captured by current neural network models.

Lucas: So also just two points of question or clarification. The first is this sort of hypothesis that you proposed, that human thoughts seem to require some sort of empirical engagement. And then what was your claim about animals, sorry?

Joshua: Well animals certainly show some signs of thinking, especially some animals like elephants and dolphins and chimps engage in some pretty sophisticated thinking, but they don’t have anything like human language. So it seems very unlikely that all of thought, even human thought, is just a matter of moving symbols around in the head.

Lucas: Yeah, it’s definitely not just linguistic symbols, but it still feels like conceptual symbols that have structure.

Joshua: Right. So this is the mystery, human thought, you could make a pretty good case that symbolic thinking is an important part of it, but you could make a case that symbolic thinking can’t be all it is. And a lot of people in AI, most notably DeepMind, have taken the strong view and I think it’s right, that if you’re really going to build artificial general intelligence, you have to start with grounded cognition, and not just trying to build something that can, for example, read sentences and deduce things from those sentences.

Lucas: Right. Do you want to unpack what grounded cognition is?

Joshua: Grounded cognition refers to a representational system where the representations are derived, at least initially, from perception and from physical interaction. There’s perhaps a relationship with empiricism in the broader philosophy of science, but you could imagine trying to build an intelligent system by giving it lots and lots and lots of words, giving it lots of true descriptions of reality, and giving it inference rules for going from some descriptions to other descriptions. That just doesn’t seem like it’s going to work. You don’t really understand what apple means unless you have some sense of what an apple looks like, what it feels like, what it tastes like, doesn’t have to be all of those things. You can know what an apple is without ever having eaten one, or I could describe some fruit to you that you’ve never seen, but you have experience with other fruits or other physical objects. Words don’t just exist in a symbol storm vacuum. They’re related to things that we see and touch and interact with.

Lucas: I think for me, just going most foundationally, the question is before I know what an apple is, do I need to understand spatial extension and object permanence? I have to know time, I have to have some very basic ontological understanding and world model of the universe.

Joshua: Right. So we have some clues from human developmental psychology about what kinds of representations, understandings, capabilities humans acquire, and in what order. To state things that are obvious, but nevertheless revealing, you don’t meet any humans who understand democratic politics before they understand objects.

Lucas: Yes.

Joshua: Right?

Lucas: Yeah.

Joshua: Which sounds obvious and it is in a sense obvious, right? But it tells you something about what it takes to build up abstract and sophisticated understandings of the world and possibilities for the world.

Lucas: Right. So for me it seems that the place where grounded cognition is most fundamentally is in between when like the genetic code that seeds the baby and when the baby comes out, the epistemics and whatever is in there, has the capacity to one day potentially become Einstein. So like what is that grounded cognition in the baby that underlies this potential to be a quantum physicist or a scientist-

Joshua: Or even just a functioning human. 

Lucas: Yeah.

Joshua: I mean even people with mental disabilities walk around and speak and manipulate objects. I think that in some ways the harder question is not how do we get from normal human to Einstein, but how do we get from a newborn to a toddler? And the analogous or almost analogous question for artificial intelligence is how do you go from a neural network that has some kind of structure, one that’s favorable for acquiring useful cognitive capabilities, and how do you figure out what the starting structure is, which is kind of analogous to the question of how does the brain get wired up in utero?

And it gets connected to these sensors that we call eyes and ears, and it gets connected to these effectors that we call hands and feet. And it’s not just a random blob of connectoplasm, the brain has a structure. So one challenge for AI is what’s the right structure for acquiring sophisticated intelligence, or what are some of the right structures? And then what kind of data, what kind of training, what kind of training process do you need to get there?

Lucas: Pivoting back into the relevance of this with AGI, there is, like you said, this fundamental issue of grounded cognition that babies and toddlers have that sort of leads them to become full human-level intelligences eventually. How does one work to isolate the features of grounded cognition that enable babies to grow and become adults?

Joshua: Well, I don’t work with babies, but I can tell you what we’re doing with adults, for example.

Lucas: Sure.

Joshua: In the one paper in this line of research we’ve already published, this is work led by Steven Franklin, we have people reading sentences like the dog chased the cat, the cat chased the dog, or the dog was chased by the cat and the cat was chased by the dog. And what we’re doing is looking for parts of the brain where the pattern is different depending on whether the dog is chasing the cat or the cat is chasing the dog. So it has to be something that’s not just involved in representing dog or cat or chasing, but of representing that composition of those three concepts where they’re composed in one way rather than another way. And what we found is that there’s a region in the temporal lobe where the pattern is different for those things.

And more specifically, what we’ve found is that in one little spot in this broader region in the temporal lobe, you can better than chance decode who the agent is. So if it’s the dog chased the cat, then in this spot you can better than chance tell that it’s dog that’s doing the chasing. If it’s the cat was chased by the dog, same thing. So it’s not just about the order of the words, and then you can decode better than chance that it’s cat being chased for a sentence like that. So the idea is that these spots in the temporal lobe are functioning like data registers, and representing variables rather than specific values. That is this one region is representing the agent who did something and the other region is representing the patient, as they say in linguistics, who had something done to it. And this is starting to look more like a computer program where the way classical programs work is they have variables and values.

Like if you were going to write a program that translates Celsius into Fahrenheit, what you could do is construct a giant table telling you what Fahrenheit value corresponds to what Celsius value. But the more elegant way to do it is to have a formula where the formula has variables, right? You put in the Celsius value, you multiply it by the right thing and you get the Fahrenheit value. And then what that means is that you’re taking advantage of that recurring structure. Well, the something does something to something else is a recurring structure in the world and in our thought. And so if you have something in your brain that has that structure already, then you can quickly slot in dog as agent, chasing as the action, cat as patient, and that way you can very efficiently and quickly combine new ideas. So the upshot of that first work is that it seems like when we’re representing the meaning of a sentence, we’re actually doing it in a more classical computer-ish way than a lot of neuroscientists might have thought.
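To make the contrast Joshua draws here concrete, here is a minimal sketch in Python (an editorial illustration, not code from the study): a lookup table versus a formula with a variable, plus a toy “agent does action to patient” frame with reusable slots. The names and values below are made up for the example.

```python
# Lookup-table approach: every input value has to be stored explicitly.
c_to_f_table = {0: 32.0, 10: 50.0, 20: 68.0, 30: 86.0}

# Formula approach: one rule with a variable covers every input.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# The "agent does action to patient" frame works the same way: reusable slots
# that new values can be dropped into on the fly.
def describe(agent, action, patient):
    return f"{agent} {action} {patient}"

print(c_to_f_table[20])                    # 68.0
print(celsius_to_fahrenheit(25))           # 77.0
print(describe("dog", "chases", "cat"))    # dog chases cat
print(describe("cat", "chases", "dog"))    # cat chases dog
```

The formula and the frame both reuse structure: once the variable slots exist, new values can be slotted in without enumerating every combination in advance, which is the efficiency Joshua is pointing to.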

Lucas: It’s combinatorial.

Joshua: Yes, exactly. So what we’re trying to get at is modes of composition. In that experiment, we did it with sentences. In an experiment we’re now doing, this is being led by my grad student Dylan Plunkett, and Steven Franklin is also working on this, we’re doing it with words and with images. We actually took a bunch of photos of different people doing different things. Specifically we have a chef which we also call a cook, and we have a child which we also call a kid. We have a prisoner, which we also call an inmate, and we have male and female versions of each of those. And sometimes one is chasing the other and sometimes one is pushing the other. In the images, we have all possible combinations of the cook pushes the child, the inmate chases the chef-

Lucas: Right, but it’s also gendered.

Joshua: We have male and female versions for each. And then we have all the possible descriptions. And in the task what people have to do is you put two things on the screen and you say, do these things match? So sometimes you’ll have two different images and you have to say, do those images have the same meaning? So it could be a different chef chasing a different kid, but if it’s a chef chasing a kid in both cases, then you would say that they mesh. Whereas if it’s a chef chasing an inmate, then you’d say that they don’t. And then in other cases you would have two sentences, like the chef chased the kid, or it could be the child was chased by the cook, or was pursued by the cook, and even though those are all different words in different orders, you’ve recognized that they have the same meaning or close enough.

And then in the most interesting case, we have an image and a set of words, which you can think of as a description, and the question is, does it match? So if you see a picture of a chef chasing a kid, and then the words are chef chases kid or cook pursues child, then you’d say, okay, that one’s a match. And what we’re trying to understand is, is there something distinctive that goes on in that translation process when you have to take a complex thought, not complex in the sense of very sophisticated by human standards, but complex in the sense that it has parts, that it’s composite, and translate it from a verbal representation to a visual representation, and is that different or is the base representation visual? So for example, one possibility is when you get two images, if you’re doing something that’s fairly complicated, you have to translate them both into words. It’s possible that you could see language areas activated when people have to look at two images and decide if they match. Or maybe not. Maybe you can do that in a purely visual kind of way-

Lucas: And maybe it depends on the person. Like some meditators will report that after long periods of meditation, certain kinds of mental events happen much less or just cease, like images or like linguistic language or things like that.

Joshua: So that’s possible. Our working assumption is that basic things like understanding the meaning of the chef chased the kid, and being able to point to a picture of that and say that’s the thing the sentence described, that our brains do this all more or less the same way. That could be wrong, but our goal is to get at basic features of high level cognition that all of us share.

Lucas: And so one of these again is this combinatorial nature of thinking.

Joshua: Yes. That I think is central to it. That it is combinatorial or compositional, and that it’s multimodal, that you’re not just combining words with other words, you’re not just combining images with other images, you’re combining concepts that are either not tied to a particular modality or connected to different modalities.

Lucas: They’re like different dimensions of human experience. You can integrate it with if you can feel it, or some people are synesthetic, or like see it or it could be a concept, or it could be language, or it could be heard, or it could be subtle intuition, and all of that seems to sort of come together. Right?

Joshua: It’s related to all those things.

Lucas: Yeah. Okay. And so sorry, just to help me get a better picture here of how this is done. So this is an MRI, right?

Joshua: Yeah.

Lucas: So for me, I’m not in this field and I see generally the brain is so complex that our resolution is just different areas of the brain light up, and so we understand what these areas are generally tasked for, and so we can sort of see how they relate when people undergo different tasks. Right?

Joshua: No, we can do better than that. So that was kind of brain imaging 1.0, and brain imaging 2.0 is not everything we want from a brain imaging technology, but it does take us a level deeper. Which is to say, instead of just saying this brain region is involved, or it ramps up when people are doing this kind of thing (region-function relationships), we can look at the actual encoding of content: you can train a pattern classifier. So let’s say you’re showing people pictures of dog or the word dog versus other things. You can train a pattern classifier to recognize the difference between someone looking at a dog versus looking at a cat, or reading the word dog or reading the word cat. There are patterns of activity that are more subtle than just this region is active or more or less active.
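As a rough illustration of the kind of pattern classification Joshua describes, here is a minimal sketch (an editorial aside, not the lab’s actual analysis pipeline): the voxel patterns and labels are simulated, and scikit-learn’s logistic regression stands in for whatever classifier a real study would use.

```python
# Toy "decoding" example on simulated fMRI-like data (all values made up).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_trials, n_voxels = 80, 200
labels = np.repeat([0, 1], n_trials // 2)   # 0 = "dog" trials, 1 = "cat" trials

# Simulated voxel patterns: noise plus a small category-specific spatial signal.
category_signal = rng.normal(size=n_voxels)
patterns = rng.normal(size=(n_trials, n_voxels)) + 0.5 * np.outer(labels, category_signal)

# Train and evaluate a classifier that reads the category off the spatial pattern.
clf = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(clf, patterns, labels, cv=5).mean()
print(f"Cross-validated decoding accuracy: {accuracy:.2f}")  # well above 0.5 means above chance
```

Cross-validated accuracy reliably above 50% is the sense in which a category can be decoded “better than chance” from a spatial pattern of activity.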

Lucas: Right. So the activity is distinct in a way that when you train the thing on when it looks like people are recognizing cats, then it can recognize that in the future.

Joshua: Yeah.

Lucas: So is there anything besides these multimodal and combinatorial features that you guys have isolated, or that you’re looking into, or that you suppose are like essential features of grounded cognition?

Joshua: Well, this is what we’re trying to study, and we have the one result that’s kind of done and published, which I described, about representing the meaning of a sentence in terms of representing the agent here and the patient there for that kind of sentence, and we have some other stuff in the pipeline that’s getting at the kinds of representations that the brain uses to combine concepts and also to distinguish concepts that are playing different roles. In another set of studies we have people thinking about different objects.

Sometimes they’ll think about an object where it’s a case where they’d actually get money if it turns out that that object is the one that’s going to appear later. It’s like, when you think about, say, dog, if it turns out that it’s dog under the card, then you’ll get five bucks. You see that you’re able to decode the dog representation in part of our motivational circuitry, whereas you don’t see that if you’re just thinking about it. So that’s another example: things are represented in different places in the brain depending on what function that representation is serving at that time.

Lucas: So with this pattern recognition training that you can do based on how people recognize certain things, you’re able to see sort of the sequence and kinetics of the thought.

Joshua: MRI is not great for temporal resolution. So what we’re not seeing is how on the order of milliseconds a thought gets put together.

Lucas: Okay. I see.

Joshua: What MRI is better for, it has better spatial resolution and is better able to identify spatial patterns of activity that correspond to representing different ideas or parts of ideas.

Lucas: And so in the future, as our resolution begins to increase in terms of temporal imaging or being able to isolate more specific structures, I’m just trying to get a better understanding of what your hopes are for increased ability of resolution and imaging in the future, and how that might also help to disclose grounded cognition.

Joshua: One strategy for getting a better understanding is to combine different methods. fMRI can give you some indication of where you’re representing the fact that it’s a dog that you’re thinking about as opposed to a cat. But other neuroimaging techniques have better temporal resolution but not as good spatial resolution. So EEG which measures electrical activity from the scalp has millisecond temporal resolution, but it’s very blurry spatially. The hope is that you combine those two things and you get a better idea. Now both of these things have been around for more than 20 years, and there hasn’t been as much progress as I would have hoped combining those things. Another approach is more sophisticated models. What I’m hoping we can do is say, all right, so we have humans doing this task where they are deciding whether or not these images match these descriptions, and we know that humans do this in a way that enables them to generalize, so that if they see some combination of things they’ve never seen before.

Like this is a giraffe chasing a Komodo Dragon. You’ve never seen that image before, but you could look at that image for the first time and say, okay, that’s a giraffe chasing a Komodo Dragon, at least if you know what those animals look like, right?

Lucas: Yeah.

Joshua: So then you can say, well, what does it take to train a neural network to be able to do that task? And what does it take to train a neural network to be able to do it in such a way that it can generalize to new examples? So if you teach it to recognize Komodo Dragon, can it then generalize such that, well, it learned how to recognize giraffe chases lion, or lion chases giraffe, and so it understands chasing, and it understands lion, and it understands giraffe. Now if you teach it what a Komodo dragon looks like, can it automatically slot that into a complex relational structure?

And so then let’s say we have a neural network that we’ve trained that is able to do that. It’s not all of human cognition. We assume it’s not conscious, but it may capture key features of that cognitive process. And then we look at the model and say, okay, well in real time, what is that model doing and how is it doing it? And then we have a more specific hypothesis that we can go back to the brain and say, well, does the brain do it something like the way this artificial neural network does it? And so the hope is that by building artificial neural models of these certain aspects of high level cognition, we can better understand human high level cognition, and the hope is that also it will feed back the other way. Where if we look and say, oh, this seems to be how the brain does it, well, if we mimic that kind of architecture in an artificial neural network, does that enable it to solve the problem in a way that it otherwise wouldn’t?
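To give a flavor of the kind of test Joshua is describing, here is a toy sketch (an editorial illustration, not the lab’s actual model): role-specific embeddings for agent, action, and patient are composed and trained to answer “who is the agent?”, and the model is then probed on a combination held out of training. The entity names, dimensions, and training setup are all made up for the example.

```python
# Toy test of compositional generalization with role-specific embeddings.
import torch
import torch.nn as nn

animals = ["giraffe", "lion", "komodo_dragon"]   # hypothetical entities
actions = ["chases", "pushes"]                   # hypothetical relations

# Build (agent, action, patient) index triples; hold one combination out of training.
triples = [(a, v, p) for a in range(3) for v in range(2) for p in range(3) if a != p]
held_out = (2, 0, 0)  # "komodo_dragon chases giraffe", never seen during training
train = [t for t in triples if t != held_out]

class Composer(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        # Separate embeddings per role, loosely analogous to role-specific "registers".
        self.agent = nn.Embedding(3, dim)
        self.action = nn.Embedding(2, dim)
        self.patient = nn.Embedding(3, dim)
        # Read out "who is the agent?" from the composed representation.
        self.readout = nn.Linear(3 * dim, 3)

    def forward(self, a, v, p):
        z = torch.cat([self.agent(a), self.action(v), self.patient(p)], dim=-1)
        return self.readout(z)

model = Composer()
opt = torch.optim.Adam(model.parameters(), lr=0.05)
a, v, p = (torch.tensor(x) for x in zip(*train))

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(a, v, p), a)
    loss.backward()
    opt.step()

# Probe the combination the model never saw during training.
a_i, v_i, p_i = held_out
print("Held-out probe:", animals[a_i], actions[v_i], animals[p_i])
pred = model(*(torch.tensor([x]) for x in held_out)).argmax().item()
print("Predicted agent:", animals[pred], "| correct answer:", animals[a_i])
```

If the composed representation handles the held-out triple, that is at least a minimal version of slotting a concept into a learned relational structure; a real study would then compare such a model’s internal representations against brain data, as Joshua goes on to describe.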

Lucas: Right. I mean we already have AGIs, they just have to be created by humans and they live about 80 years, and then they die, and so we already have an existence proof, and the problem really is the brain is so complicated that there are difficulties replicating it on machines. And so I guess the key is how much can our study of the human brain inform our creation of AGI through machine learning or deep learning or like other methodologies.

Joshua: And it’s not just that the human brain is complicated, it’s that the general intelligence that we’re trying to replicate in machines only exists in humans. You could debate the ethics of animal research and sticking electrodes in monkey brains and things like that, but within ethical frameworks that are widely accepted, you can do things to monkeys or rats that help you really understand in a detailed way what the different parts of their brain are doing, right?

But for good reason, we don’t do those sorts of studies with humans, and we would understand much, much, much, much more about how human cognition works if we were–

Lucas: A bit more unethical.

Joshua: If we were a lot more unethical, if we were willing to cut people’s brains open and say, what happens if you lesion this part of the brain? What happens if you then have people do these 20 tasks? No sane person is suggesting we do this. What I’m saying is that part of the reason why we don’t understand it is because it’s complicated, but another part of the reason why we don’t understand is that we are very much rightly placing ethical limits on what we can do in order to understand it.

Lucas: Last thing here that I just wanted to touch on on this is when I’ve got this multimodal combinatorial thing going on in my head, when I’m thinking about how like a Komodo dragon is chasing a giraffe, how deep does that combinatorialness need to go for me to be able to see the Komodo Dragon chasing the giraffe? Your earlier example was like a purple ocean with a Komodo Dragon wearing like a sombrero hat, like smoking a cigarette. I guess I’m just wondering, well, what is the dimensionality and how much do I need to know about the world in order to really capture a Komodo Dragon chasing a giraffe in a way that is actually general and important, rather than some kind of brittle, heavily trained ML algorithm that doesn’t really know what a Komodo Dragon chasing a giraffe is.

Joshua: It depends on what you mean by really know. Right? But at the very least you might say it doesn’t really know it if it can’t both recognize it in an image and output a verbal label. That’s the minimum, right?

Lucas: Or generalize to new contexts-

Joshua: And generalize to new cases, right. And I think generalization is key, right. What enables you to understand the crazy scene you described is it’s not that you’ve seen so many scenes that one of them is a pretty close match, but instead you have this compositional engine, you understand the relations, and you understand the objects, and that gives you the power to construct this effectively infinite set of possibilities. So what we’re trying to understand is what is the cognitive engine that interprets and generates those infinite possibilities?

Lucas: Excellent. So do you want to sort of pivot here into how Rawls’ veil of ignorance fits in here?

Joshua: Yeah. So on the other side of the lab, one side is focused more on this sort of key aspect of individual intelligence. On the more moral and social side of the lab, we’re trying to understand our collective intelligence and our social decision making, and we’d like to do research that can help us make better decisions. Of course, what counts as better is always contentious, especially when it comes to morality, but these are influences that one could plausibly interpret as better. Right? One of the most famous ideas in moral and political philosophy is John Rawls’s idea of the veil of ignorance, where what Rawls essentially said is you want to know what a just society looks like? Well, the essence of justice is impartiality. It’s not favoring yourself over other people. Everybody has to play by the same rules. It doesn’t mean necessarily everybody gets exactly the same outcome, but you can’t get special privileges just because you’re you.

And so what he said was, well, a just society is one that you would choose if you didn’t know who in that society you would be. Even if you are choosing selfishly, but you are constrained to be impartial because of your ignorance. You don’t know where you’re going to land in that society. And so what Rawls says very plausibly is would you rather be randomly slotted into a society where a small number of people are extremely rich and most people are desperately poor? Or would you rather be slotted into a society where most people aren’t rich but are doing pretty well? The answer pretty clearly is you’d rather be slotted randomly into a society where most people are doing pretty well instead of a society where you could be astronomically well off, but most likely would be destitute. Right? So this is all background that Rawls applied this idea of the veil of ignorance to the structure of society overall, and said a just society is one that you would choose if you didn’t know who in it you were going to be.

And this sort of captures the idea of impartiality as sort of the core of justice. So what we’ve been doing recently, and this is a project led by Karen Huang and Max Bazerman along with myself, is applying the veil of ignorance idea to more specific dilemmas. So one of the places where we have applied this is with ethical dilemmas surrounding self-driving cars. We took a case that was most famously recently discussed by Bonnefon, Shariff, and Rahwan in their 2016 Science paper, The Social Dilemma of Autonomous Vehicles, and the canonical version goes something like you’ve got an autonomous vehicle, an AV, that is headed towards nine people, and if nothing is done, it’s going to run those nine people over. But it can swerve out of the way and save those nine people, but if it does that, it’s going to drive into a concrete wall and kill the passenger inside.

So the question is should the car swerve or should it go straight? Now, you can just ask people. So what do you think the car should do, or would you approve a policy that says that in a situation like this, the car should minimize the loss of life and therefore swerve? What we did is, some people we just had answer the question just the way I posed it, but other people, we had them do a veil of ignorance exercise first. So we say, suppose you’re going to be one of these 10 people, the nine on the road or the one in the car, but you don’t know who you’re going to be.

From a purely selfish point of view, would you want the car to swerve or not, and almost everybody says, I’d rather have the car swerve. I’d rather have a nine out of 10 chance of living instead of a one out of 10 chance of living. And then we asked people, okay, that was a question about what you would want selfishly, if you didn’t know who you were going to be. Would you approve of a policy that said that cars in situations like this should swerve to minimize the loss of life.

The people who’ve gone through the veil of ignorance exercise, they are more likely to approve of the utilitarian policy, the one that aims to minimize the loss of life, if they’ve gone through that veil of ignorance exercise first, than if they just answered the question. And we have control conditions where we have them do a version of the veil of ignorance exercise, but where the probabilities are mixed up. So there’s no relationship between the probability and the number of people, and that’s sort of the tightest control condition, and you still see the effect. The idea is that the veil of ignorance is a cognitive device for thinking about a dilemma in a more impartial kind of way.

And then what’s interesting is that people recognize, they do a bit of kind of philosophizing. They say, huh, if I said that what I would want is to have the car swerve, and I didn’t know who I was going to be, that’s an impartial judgment in some sense. And that means that even if I feel sort of uncomfortable about the idea of a car swerving and killing its passenger in a way that is foreseen, if not intended in the most ordinary sense, even if I feel kind of bad about that, I can justify it because I say, look, it’s what I would want if I didn’t know who I was going to be. So we’ve done this with self driving cars, we’ve done it with the classics of the trolley dilemma, we’ve done it with a bioethical case involving taking oxygen away from one patient and giving it to nine others, and we’ve done it with a charity where we have people making a real decision involving real money between a more versus less effective charity.

And across all of these cases, what we find is that when you have people go through the veil of ignorance exercise, they’re more likely to make decisions that promote the greater good. It’s an interesting bit of psychology, but it’s also perhaps a useful tool, that is we’re going to be facing policy questions where we have gut reactions that might tell us that we shouldn’t do what favors the greater good, but if we think about it from behind a veil of ignorance and come to the conclusion that actually we’re in favor of what promotes the greater good at least in that situation, then that can change the way we think. Is that a good thing? If you have consequentialist inclinations like me, you’ll think it’s a good thing, or if you just believe in the procedure, that is I like whatever decisions come out of a veil of ignorance procedure, then you’ll think it’s a good thing. I think it’s interesting that it affects the way people make the choice.

Lucas: It’s got me thinking about a lot of things. I guess a few things are that I feel like if most people on earth had a philosophy education or at least had some time to think about ethics and other things, they’d probably update their morality in really good ways.

Joshua: I would hope so. But I don’t know how much of our moral dispositions come from explicit education versus our broader personal and cultural experiences, but certainly I think it’s worth trying. I certainly believe in the possibility of understanding this; it’s why I do the research I do. And I come to that with some humility about how much that by itself can accomplish. I don’t know.

Lucas: Yeah, it would be cool to see like the effect size of Rawls’s veil of ignorance across different societies and persons, and then other things you can do are also like the child drowning in the shallow pool argument, and there’s just tons of different thought experiments, it would be interesting to see how it updates people’s ethics and morality. The other thing I just sort of wanted to inject here, the difference between naive consequentialism and sophisticated consequentialism. Sophisticated consequentialism would also take into account not only the direct effect of saving more people, but also how like human beings have arbitrary partialities to what I would call a fiction, like rights or duties or other things. A lot of people share these, and I think within our sort of consequentialist understanding and framework of the world, people just don’t like the idea of their car smashing into walls. Whereas yeah, we should save more people.

Joshua: Right. And as Bonnefon et al. point out, and I completely agree, if making cars narrowly utilitarian, in the sense that they always try to minimize the loss of life, makes people not want to ride in them, and that means that there are more accidents that lead to human fatalities because people are driving instead of being driven, then that is bad from a consequentialist perspective, right? So you can call it sophisticated versus naive consequentialism, but really there’s no question that utilitarianism or consequentialism in its original form favors the more sophisticated readings. So it’s kind of more-

Lucas: Yeah, I just feel that people often don’t do the sophisticated reasoning, and then they come to conclusions.

Joshua: And this is why I’ve attempted with not much success, at least in the short term, to rebrand utilitarianism as what I call deep pragmatism. Because I think when people hear utilitarianism, what they imagine is everybody walking around with their spreadsheets and deciding what should be done based on their lousy estimates of the greater good. Whereas I think the phrase deep pragmatism gives you a much clearer idea of what it looks like to be utilitarian in practice. That is you have to take into account humans as they actually are, with all of their biases and all of their prejudices and all of their cognitive limitations.

When you do that, it’s obviously a lot more subtle and flexible and cautious than-

Lucas: Than people initially imagine.

Joshua: Yes, that’s right. And I think utilitarian has a terrible PR problem, and my hope is that we can either stop talking about the U philosophy and talk instead about deep pragmatism, see if that ever happens, or at the very least, learn to avoid those mistakes when we’re making serious decisions.

Lucas: The other very interesting thing that this brings up is that if I do the veil of ignorance thought exercise, then I’m more partial towards saving more people and partial towards policies which will reduce the loss of life. And then I sort of realize that I actually do have this strange arbitrary partiality, like wanting the car I bought not to crash me into a wall, and from sort of a third person point of view, I think maybe it seems kind of irrational, because the utilitarian thing initially seems most rational. But then we have the chance to reflect as persons, well maybe I shouldn’t have these arbitrary beliefs. Like maybe we should start updating our culture in ways that get rid of these biases so that the utilitarian calculations aren’t so corrupted by scary primate thoughts.

Joshua: Well, so I think the best way to think about it is how do we make progress? Not how do we radically transform ourselves into alien beings who are completely impartial, right. And I don’t think it’s the most useful thing to do. Take the special case of charitable giving, that you can turn yourself into a happiness pump, that is devote all of your resources to providing money for the world’s most effective charities.

And you may do a lot of good as an individual compared to other individuals if you do that, but most people are going to look at you and just say, well that’s admirable, but it’s super extreme. That’s not for me, right? Whereas if you say, I give 10% of my money, that’s an idea that can spread, that instead of my kids hating me because I deprived them of all the things that their friends had, they say, okay, I was brought up in a house where we give 10% and I’m happy to keep doing that. Maybe I’ll even make it 15. You want norms that are scalable, and that means that your norms have to feel livable. They have to feel human.

Lucas: Yeah, that’s right. We should be spreading more deeply pragmatic approaches and norms.

Joshua: Yeah. We should be spreading the best norms that are spreadable.

Lucas: Yeah. There you go. So thanks so much for joining me, Joshua.

Joshua: Thanks for having me.

Lucas: Yeah, I really enjoyed it and see you again soon.

Joshua: Okay, thanks.

Lucas: If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back again soon with another episode of the AI Alignment Series.

[end of recorded material]

FLI Podcast: AI Breakthroughs and Challenges in 2018 with David Krueger and Roman Yampolskiy

Every January, we like to look back over the past 12 months at the progress that’s been made in the world of artificial intelligence. Welcome to our annual “AI breakthroughs” podcast, 2018 edition.

Ariel was joined for this retrospective by researchers Roman Yampolskiy and David Krueger. Roman is an AI safety researcher and professor at the University of Louisville. He also recently published the book Artificial Intelligence Safety & Security. David is a PhD candidate in the Mila lab at the University of Montreal, where he works on deep learning and AI safety. He’s also worked with safety teams at the Future of Humanity Institute and DeepMind and has volunteered with 80,000 Hours.

Roman and David shared their lists of 2018’s most promising AI advances, as well as their thoughts on some major ethical questions and safety concerns. They also discussed media coverage of AI research, why talking about “breakthroughs” can be misleading, and why there may have been more progress in the past year than it seems.

Topics discussed in this podcast include:

  • DeepMind progress, as seen with AlphaStar and AlphaFold
  • Manual dexterity in robots, especially QT-Opt and Dactyl
  • Advances in creativity, as with Generative Adversarial Networks (GANs)
  • Feature-wise transformations
  • Continuing concerns about DeepFakes
  • Scaling up AI systems
  • Neuroevolution
  • Google Duplex, the AI assistant that sounds human on the phone
  • The General Data Protection Regulation (GDPR) and AI policy more broadly

You can listen to the podcast above, or read the full transcript below.

Ariel: Hi everyone, welcome to the FLI podcast. I’m your host, Ariel Conn. For those of you who are new to the podcast, at the end of each month, I bring together two experts for an in-depth discussion on some topic related to the fields that we at the Future of Life Institute are concerned about, namely artificial intelligence, biotechnology, climate change, and nuclear weapons.

The last couple of years for our January podcast, I’ve brought on two AI researchers to talk about what the biggest AI breakthroughs were in the previous year, and this January is no different. To discuss the major developments we saw in AI in 2018, I’m pleased to have Roman Yampolskiy and David Krueger joining us today.

Roman is an AI safety researcher and professor at the University of Louisville, his new book Artificial Intelligence Safety and Security is now available on Amazon and we’ll have links to it on the FLI page for this podcast. David is a PhD candidate in the Mila Lab at the University of Montreal, where he works on deep learning and AI safety. He’s also worked with teams at the Future of Humanity Institute and DeepMind, and he’s volunteered with 80,000 Hours to help people find ways to contribute to the reduction of existential risks from AI. So Roman and David, thank you so much for joining us.

David: Yeah, thanks for having me.

Roman: Thanks very much.

Ariel: So I think that one thing that stood out to me in 2018 was that the AI breakthroughs seemed less about surprising breakthroughs that really shook the AI community as we’ve seen in the last few years, and instead they were more about continuing progress. And we also didn’t see quite as many major breakthroughs hitting the mainstream press. There were a couple of things that made big news splashes, like Google Duplex, which is a new AI assistant program that sounded incredibly human on phone calls it made during the demos. And there was also an uptick in government policy and ethics efforts, especially with the General Data Protection Regulation, also known as the GDPR, which went into effect in Europe earlier this year.

Now I’m going to want to come back to Google and policy and ethics later in this podcast, but I want to start by looking at this from the research and development side of things. So my very first question for both of you is: do you agree that 2018 was more about impressive progress, and less about major breakthroughs? Or were there breakthroughs that really were important to the AI community that just didn’t make it into the mainstream press?

David: Broadly speaking I think I agree, although I have a few caveats for that. One is just that it’s a little bit hard to recognize always what is a breakthrough, and a lot of the things in the past that have had really big impacts didn’t really seem like some amazing new paradigm shift—it was sort of a small tweak that then made a lot of things work a lot better. And the other caveat is that there are a few works that I think are pretty interesting and worth mentioning, and the field is so large at this point that it’s a little bit hard to know if there aren’t things that are being overlooked.

Roman: So I’ll agree with you, but I think the pattern is more important than any specific breakthrough. We kind of got used to getting something really impressive every month, so relatively it doesn’t sound as good, with AlphaStar, AlphaFold, AlphaZero happening almost every month. And it used to be that it took 10 years to see something like that.

It’s likely it will happen even more frequently. We’ll conquer a new domain once a week or something. I think that’s the main pattern we have to recognize and discuss. There are significant accomplishments in terms of teaching AI to work in completely novel domains. I mean now we can predict protein folding, now we can have multi-player games conquered. That never happened before so frequently. Chess was impressive because it took like 30 years to get there.

David: Yeah, so I think a lot of people were kind of expecting or at least hoping for StarCraft or Dota to be solved—to see, like we did with AlphaGo, AI systems that are beating the top players. And I would say that it’s actually been a little bit of a let down for people who are optimistic about that, because so far the progress has been kind of unconvincing.

So the AlphaStar, which was a really recent result from last week, for instance: I’ve seen criticism of it that I think is valid, that it was making more actions than a human could within a very short interval of time. So they carefully controlled the actions-per-minute that AlphaStar was allowed to take, but they didn’t prevent it from doing really short bursts of actions that really helped its micro-game, and that means that it can win without really being strategically superior to its human opponents. And I think the Dota results that OpenAI has had were also criticized as being sort of not the hardest version of the problem, and still the AI sort of is relying on some crutches.

Ariel: So before we get too far into that debate, can we take a quick step back and explain what both of those are?

David: So these are both real-time strategy games that are, I think, actually the two most popular real-time strategy games in the world that people play professionally, and make money playing. I guess that’s all to say about them.

Ariel: So a quick question that I had too about your description then, when you’re talking about AlphaStar and you were saying it was just making more moves than a person can realistically make. Is that it—it wasn’t doing anything else special?

David: I haven’t watched the games, and I don’t play StarCraft, so I can’t say that it wasn’t doing anything special. I’m basing this basically on reading articles and reading the opinions of people who are avid StarCraft players, and I think the general opinion seems to be that it is more sophisticated than what we’ve seen before, but the reason that it was able to win these games was not because it was out-thinking humans, it’s because it was out-clicking, basically, in a way that just isn’t humanly possible.

Roman: I would agree with this analysis, but I don’t see it as a bug, I see it as a feature. That just shows another way machines can be superior to people. Even if they are not necessarily smarter, they can still produce superior performance, and that’s what we really care about. Right? We found a different way, a non-human approach to solving this problem. That’s impressive.

David: Well, I mean, I think if you have an agent that can just click as fast as it wants, then you can already win at StarCraft, before this work. There needs to be something that makes it sort of a fair fight in some sense.

Roman: Right, but think what you’re suggesting: We have to handicap machines to make them even remotely within being comparative to people. We’re talking about getting to superintelligent performance. You can get there by many ways. You can think faster, you can have better memory, you can have better reaction time—as long as you’re winning in whatever domain we’re interested in, you have superhuman performance.

David: So maybe another way of putting this would be that if they actually made a robot play StarCraft and made it use the same interface that humans do, such as a screen and mouse, there’s no way that it could have beaten the human players. And so by giving it direct access to the game controls, it’s sort of not solving the same problem that a human is when they play this game.

Roman: I feel what you’re saying, I just feel that it is solving it in a different way, and we have pro-human bias saying, well that’s not how you play this game, you have an advantage. Human players usually rely on superior strategy, not just faster movements that may take advantage of it for a few nanoseconds, a couple of seconds. But it’s not a long-term sustainable pattern.

One of the research projects I worked on was this idea of artificial stupidity, we called it—kind of limiting machines to human-level capacity. And I think that’s what we’re talking about it here. Nobody would suggest limiting a chess program to just human-level memory, or human memorization of opening moves. But we don’t see it as a limitation. Machines have an option of beating us in ways humans can’t. That’s the whole point, and that’s why it’s interesting, that’s why we have to anticipate such problems. That’s where most of the safety and security issues will show up.

Ariel: So I guess, I think, Roman, your point earlier was sort of interesting that we’ve gotten so used to breakthroughs that stuff that maybe a couple of years ago would have seemed like a huge breakthrough is just run-of-the-mill progress. I guess you’re saying that that’s what this is sort of falling into. Relatively recently this would have been a huge deal, but because we’ve seen so much other progress and breakthroughs, that this is now interesting and we’re excited about it—but it’s not reaching that level of, oh my god, this is amazing! Is that fair to say?

Roman: Exactly! We get disappointed if the system loses one game. It used to be we were excited if it would match amateur players. Now it’s, oh, we played a 100 games and you lost one? This is just not machine-level performance, you disappoint us.

Ariel: David, do you agree with that assessment?

David: I would say mostly no. I guess, I think what really impressed me with AlphaGo and AlphaZero was that it was solving something that had been established as a really grand challenge for AI. And then in the case of AlphaZero, I think the technique that they actually used to solve it was really novel and interesting from a research point of view, and they went on to show that this same technique can solve a bunch of other board games as well.

And my impression from what I’ve seen about how they did AlphaStar and AlphaFold is that there were some interesting improvements and the performance is impressive but I think it’s neither, like, quite at the point where you can say we’ve solved it, we’re better than everybody, or in the case of protein folding, there’s not a bunch more room for improvement that has practical significance. And it’s also—I don’t see any really clear general algorithmic insights about AI coming out of these works yet. I think that’s partially because they haven’t been published yet, but from what I have heard about the details about how they work, I think it’s less of a breakthrough on the algorithm side than AlphaZero was.

Ariel: So you’ve mentioned AlphaFold. Can you explain what that is real quick?

David: This is the protein folding project that DeepMind did, and I think there’s a competition called C-A-S-P or CASP that happens every two years, and they sort of dominated that competition this last year doing what was described as two CASPs in one, so basically doubling the expected rate of improvement that people have seen historically at these tasks, or at least at the one that is the most significant benchmark.

Ariel: I find the idea of the protein folding thing interesting because that’s something that’s actually relevant to scientific advancement and health as opposed to just being able to play a game. Are we seeing actual applications for this yet?

David: I don’t know about that, but I agree with you that that is a huge difference that makes it a lot more exciting than some of the previous examples. I guess one thing that I want to say about that, though, is that it does look a little bit more to me like continuation of progress that was already happening in the communities. It’s definitely a big step up, but I think a lot of the things that they did there could have really happened over the next few years anyways, even without DeepMind being there. So, one of the articles I read put it this way: If this wasn’t done by DeepMind, if this was just some academic group, would this have been reported in the media? I think the answer is sort of like a clear no, and that says something about the priorities of our reporting and media as well as the significance of the results, but I think that just gives some context.

Roman: I’ll agree with David—the media is terrible in terms of what they report on, we can all agree on that. I think it was quite a breakthrough, I mean, they not just beat the competition, but actually kind of doubled the performance improvement. That’s incredible. And I think anyone who got to that point would not be denied publication in a top journal; it would be considered very important in that domain. I think it’s one of the most important problems in medical research. If you can accurately predict this, the possibilities are really endless in terms of synthetic biology, in terms of curing diseases.

So this is huge in terms of impact from being able to do it. As far as how applicable it is to other areas, is it a great game-changer for AI research? You can adapt between this ability to perform in the real-life environments of those multiplayer games and being able to do this; look at how those things can be combined. Right? You can do things in the real world you couldn’t do before, both in terms of strategy games, which are basically simulations for economic competition, for wars, for quite a few applications where the impact would be huge.

So all of it is very interesting. It’s easy to say, “Well, if they didn’t do it, somebody else maybe would have done it in a couple of years.” But that’s almost always true for inventions. If you look at the history of inventions, things like the telephone were invented at the same time by two or three people; radio, two or three people. It’s just that once science has enough of the ingredient technologies, yeah, somebody’s going to do it. But still, we give credit to whoever got there first.

Ariel: So I think that’s actually a really interesting point, because I think for the last few years we have seen sort of these technological advances but I guess we also want to be considering the advances that are going to have a major impact on humanity even if it’s not quite as technologically new.

David: Yeah, absolutely. With the framing in terms of breakthroughs, it’s a little bit unclear what we’re talking about when we talk about AI breakthroughs, and I think a lot of people in the field of AI don’t like how much people talk about it in terms of breakthroughs, because a lot of the progress is gradual and builds on previous work; it’s not like there was some sudden insight somebody had that just changed everything, although that does happen in some ways.

And I think you can think of the breakthroughs both in terms of like what is the impact—is this suddenly going to have a lot of potential to change the world? You can also think of it, though, from the perspective of researchers as like, is this really different from the kind of ideas and techniques we’ve seen or seen working before? I guess I’m more thinking about the second right now in terms of breakthroughs representing really radical new ideas in research.

Ariel: Okay, well I will take responsibility for being one of the media people who didn’t do a good job with presenting AI breakthroughs. But I think both with this podcast and probably moving forward, I think that is actually a really important thing for us to be doing—is both looking at the technological progress and newness of something but also the impact it could have on either society or future research.

So with that in mind, you guys also have a good list of other things that did happen this year, so I want to start moving into some of that as well. So next on your list is manual dexterity in robots. What did you guys see happening there?

David: So this is something that’s definitely not my area of expertise, so I can’t really comment too much on it. But there are two papers that I think are significant and potentially representing something like a breakthrough in this application. In general robotics is really difficult, and machine learning for robotics is still, I think, sort of a niche thing, like most robotics is using more classical planning algorithms, and hasn’t really taken advantage of the new wave of deep learning and everything.

So there’s two works, one is QT-Opt, and the other one is Dactyl, and these are both by people from the Berkeley OpenAI crowd. And these both are showing kind of impressive results in terms of manual dexterity in robots. So there’s one that does a really good job at grasping, which is one of the basic aspects of being able to act in the real world. And then there’s another one that was sort of just manipulating something like a cube with different colored faces on it—that one’s Dactyl; the grasping one is QT-Opt.

And I think this is something that got less attention in the media, because it’s been more of a story of gradual progress. But my friend who follows this deep reinforcement learning stuff more closely told me that QT-Opt is the first convincing demonstration of deep reinforcement learning in the real world, as opposed to all these things we’ve seen in games. The real world is much more complicated, and there are all sorts of challenges with the noise of the environment dynamics and contact forces and things like this that have made it really hard to do things in the real world. And then there’s also the limited amount of data: when you play a game you can interact with it as much as you want and play it over and over again, whereas in the real world you can only move your robot so fast and you have to worry about breaking it, so in the end you can collect a lot less data, which makes it harder to learn things.

Roman: Just to explain a bit of what they did: hardware is expensive and slow; it’s very difficult to work with, and things don’t go well in real life. It’s a lot easier to create simulations in virtual worlds, train your robot in there, and then just transfer the knowledge into a real robot in the physical world. And that’s exactly what they did, training that virtual hand to manipulate objects; they could run through thousands, millions of situations, which is something you cannot do with an actual, physical robot at that scale. So I think that’s a very interesting approach, and it’s why lots of people try doing things in virtual environments. Some of the early AGI projects concentrated on virtual worlds as the domain of learning. So that makes a lot of sense.

David: Yeah, so this was for the Dactyl project, which was OpenAI. And that was really impressive I think, because people have been doing this sim-to-real thing—where you train in simulation and then try and transfer it to the real world—with some success for like a year or two, but this one I think was really kind of impressive in that sense, because they didn’t actually train it in the real world at all, and what they had learned managed to transfer to the real world.

Ariel: Excellent. I’m going to keep going through your list. One thing that you both mentioned are GANs. So very quickly, if one of you, or both of you, could explain what a GAN is and what that stands for, and then we’ll get into what happened last year with those.

Roman: Sure, so this is a somewhat new way of generating creative visuals and audio. You have two neural networks competing: one is creating fakes, and the other one is judging them, and you get to a point where they’re about 50/50, where you can’t tell if something is fake or real anymore. And it’s a great way to produce artificial faces, cars, whatever. Whatever type of input you provide to the networks, they quickly learn to extract the essence of that image or audio and generate artificial datasets full of such images.

And there’s really exciting work on being able to extract properties from those, different styles. So if we talk about faces, for example: there could be a style for hair, a style for skin color, a style for age, and now it’s possible to manipulate them. So I can tell you things like, “Okay, Photoshop, I need a picture of a female, 20 years old, blonde, with glasses,” and it would generate a completely realistic face based on those properties. And we’re starting to see it show up not just in images but transferred to video, to generating whole virtual worlds. It’s probably the closest we’ve ever gotten computers to creativity: actually kind of daydreaming and coming up with novel outputs.

David: Yeah, I just want to say a little bit about the history of the research in GAN. So the first work on GANs was actually back four or five years ago in 2014, and I think it was actually kind of—didn’t make a huge splash at the time, but maybe a year or two after that it really started to take off. And research in GANs over the last few years has just been incredibly fast-paced and there’s been hundreds of papers submitted and published at the big conferences every year.

If you look just in terms of the quality of what is generated, this is, I think, just an amazing demonstration of the rate of progress in some areas of machine learning. The first paper had these sort of black and white pictures of really blurry faces, and now you can get giant—I think 256 by 256, or 512 by 512, or even bigger—really high resolution and totally indistinguishable from real photos, to the human eye anyway—images of faces. So it’s really impressive, and we’ve seen really consistent progress on that, especially in the last couple years.

Ariel: And also, just real quick, what does it stand for?

David: Oh, generative adversarial network. So it’s generative, because it’s sort of generating things from scratch, or from its imagination or creativity. And it’s adversarial because there are two networks: the one that generates the things, and then the one that tries to tell those fake images apart from real images that we actually collect by taking photos in the world.
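
For readers who want to see the two-network setup David and Roman describe in code, here is a minimal sketch of a GAN training loop in PyTorch on a toy one-dimensional dataset; the architectures, learning rates, and data are arbitrary placeholders rather than anything from the work discussed above.

```python
import torch
import torch.nn as nn

# Toy "real" data: samples from a shifted Gaussian stand in for real images.
def real_batch(n):
    return torch.randn(n, 1) * 0.5 + 2.0

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # 1) Train the discriminator (the "judge") to score real samples as real
    #    and the generator's fakes as fake.
    real = real_batch(64)
    fake = generator(torch.randn(64, 8)).detach()
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Train the generator (the "forger") so its fakes get scored as real.
    fake = generator(torch.randn(64, 8))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```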

Ariel: This is an interesting one because it can sort of transition into some ethics stuff that came up this past year, but I’m not sure if we want to get there yet, or if you guys want to talk a little bit more about some of the other things that happened on the research and development side.

David: I guess I want to talk about a few other things that have been making steady progress, like GANs: ideas that are coming to fruition with a lot of interest, and that, even though some of them are not exactly from the last year, really started to prove themselves and become widely used in the last year.

Ariel: Okay.

David: One thing, which I think is actually used in maybe the latest, greatest GAN paper, is something called feature-wise transformations. This is an idea that arguably goes back up to 40 years, depending on how you measure it, but it has been catching on in specific applications of machine learning in the last couple of years—starting with, I would say, style transfer, which is sort of like what Roman mentioned earlier.

So the idea here is that in a neural network, you have what are called features, which basically correspond to the activations of different neurons in the network. Like how much that neuron likes what it’s seeing, let’s say. And those can also be interpreted as representing different kinds of visual patterns, like different kinds of textures, or colors. And these feature-wise transformations basically just take each of those different aspects of the image, like the color or texture in a certain location, and then allow you to manipulate that specific feature, as we call it, by making it stronger or amplifying whatever was already there.

And so you can sort of view this as a way of specifying what sort of things are important in the image, and that’s why it allows you to manipulate the style of images very easily, because you can sort of look at a certain painting style for instance, and say, oh this person uses a lot of wide brush strokes, or a lot of narrow brush strokes, and then you can say, I’m just going to modulate the neurons that correspond to wide or narrow brush strokes, and change the style of the painting that way. And of course you don’t do this by hand, by looking in and seeing what the different neurons represent. This all ends up being learned end-to-end. And so you sort of have an artificial intelligence model that predicts how to modulate the features within another network, and that allows you to change what that network does in a really powerful way.

So, I mentioned that it has been applied in the most recent GAN papers, and I think they’re just using those kinds of transformations to help them generate images. But other examples where you can explain what’s happening more intuitively, or why it makes sense to try and do this, would be something like visual question answering. So there you can have the modulation of the vision network being done by another network that looks at a question and is trying to help answer that question. And so it can sort of read the question and see what features of images might be relevant to answering that question. So for instance, if the question was, “Is it a sunny day outside?” then it could have the vision network try and pay more attention to things that correspond to signs of sun. Or if it was asked something like, “Is this person’s hair combed?” then you could look for the patterns of smooth, combed hair and look for the patterns of rough, tangled hair, and have those features be sort of emphasized in the vision network. That allows the vision network to pay attention to the parts of the image that are most relevant to answering the question.
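
One widely used instance of this idea is the FiLM layer (feature-wise linear modulation). The sketch below is a minimal PyTorch illustration of the mechanism David describes, not code from any particular paper: a conditioning vector, say an embedding of the question, predicts a per-channel scale and shift that amplify or suppress each feature map of a vision network.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise linear modulation: a conditioning input predicts a
    per-channel scale (gamma) and shift (beta) applied to feature maps."""

    def __init__(self, cond_dim, num_channels):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, feature_maps, conditioning):
        # feature_maps: (batch, channels, height, width)
        # conditioning: (batch, cond_dim), e.g. an embedding of a question
        gamma, beta = self.to_gamma_beta(conditioning).chunk(2, dim=-1)
        gamma = gamma[:, :, None, None]   # broadcast over spatial dimensions
        beta = beta[:, :, None, None]
        return gamma * feature_maps + beta

# Usage: modulate 32 feature maps of a vision network with a 64-d question embedding.
film = FiLM(cond_dim=64, num_channels=32)
modulated = film(torch.randn(4, 32, 8, 8), torch.randn(4, 64))
```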

Ariel: Okay. So, Roman, I want to go back to something on your list quickly in a moment, but first I was wondering if you have anything that you wanted to add to the feature-wise transformations?

Roman: For all of this, you can ask, “Well, why is this interesting, what are the applications for it?” So you are able to generate inputs: inputs for computers, inputs for people, images, sounds, videos. A lot of the time they can be adversarial in nature as well—what we call deep fakes. Right? You can make, let’s say, a video of a famous politician saying something, or doing something.

Ariel: Yeah.

Roman: And this has very interesting implications for elections, for forensic science, for evidence. As those systems get better and better, it becomes harder and harder to tell if something is real or not. And maybe it’s still possible to do some statistical analysis, but it takes time, and we talked about media being not exactly always on top of it. So it may take 24 hours before we realize if this video was real or not, but the election is tonight.

Ariel: So I am definitely coming back to that. I want to finish going through the list of the technology stuff, but yeah I want to talk about deep fakes and in general, a lot of the issues that we’ve seen cropping up more and more with this idea of using AI to fake images and audio and video, because I think that is something that’s really important.

David: Yeah, it’s hard for me to estimate these things, but in terms of the impact it’s going to have societally, this is maybe the biggest story of the last year. And it’s not something that happened all of a sudden; again, it’s been building on a lot of progress in generative models and GANs and things like this. And it’s just going to continue; we’re going to see more and more progress like that, and probably some sort of arms race here where—I shouldn’t use that word.

Ariel: A competition.

David: A competition between people who are trying to use that kind of technology to fake things and people who are sort of doing forensics to try and figure out what is real and what is fake. And that also means that people are going to have to trust the people who have the expertise to do that, and believe that they’re actually doing that and not part of some sort of conspiracy or something.

Ariel: Alright, well are you guys ready to jump into some of those ethical questions?

David: Well, there are like two other broad things I wanted to mention, which I think are sort of interesting trends in the research community. One is just the way that people have been continuing to scale up AI systems. So a lot of the progress I think has arguably just been coming from more and more computation and more and more data. And there was a pretty great blog post by OpenAI about this last year that argued that the amount of computation being used to train the most advanced AI systems has been increasing by a factor of 10 every year for the last several years, which is just astounding. But it also suggests that this might not be sustainable for a long time, so to the extent that you think that using more computation is a big driver of progress, we might start to see that slow down within a decade or so.

Roman: I’ll add another—what I think is also more of a building-on technology than a breakthrough, since we’ve had it for a long time—but neuroevolution is something I’m starting to pay a lot more attention to. It borrows from biology, trying to evolve weights for neural networks, to optimize neural networks. And it’s producing very impressive results. It’s possible to run it in parallel really well, and it’s competitive with some of the leading alternative approaches.

So the idea basically is you have this very large neural network, a brain-like structure, but instead of trying to train it by backpropagating errors, teaching it in the standard neural network way, you have a population of those brains competing on who does best at a particular problem, and they share weights between good parents, and after a while you just evolve really well-performing solutions to some of the most interesting problems.

Additionally, you can go meta-level on it and evolve architectures for the neural network itself: how many layers, how many inputs. This is nice because it doesn’t require much human intervention; you’re essentially letting the system figure out what the solutions are. We had some very successful results with genetic algorithms for optimization, we didn’t have much success with genetic programming, and now neuroevolution kind of brings it back, where you’re optimizing intelligent systems, and that’s very exciting.

Ariel: So you’re saying that you’ll have—to make sure I understand this correctly—there’s two or more neural nets trying to solve a problem, and they sort of play off of each other?

Roman: So you create a population of neural networks, and you give it a problem, and you see this one is doing really well, and that one. The others, maybe not so great. So you take weights from those two and combine them—like mom and dad, parent situation that produces offspring. And so you have this simulation of evolution where unsuccessful individuals are taken out of a population. Successful ones get to reproduce and procreate, and provide their high fitness weights to the next generation.
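
A bare-bones sketch of the loop Roman is describing, in plain Python with NumPy; the toy fitness function stands in for evaluating an actual neural network on a task, and all the numbers (population size, mutation noise, number of generations) are arbitrary placeholders.

```python
import numpy as np

def evolve(fitness, dim, pop_size=50, generations=200, noise=0.1):
    # Each "individual" is a flat vector of network weights. Selection keeps
    # the fittest fifth of the population as parents; children are made by
    # averaging two parents' weights (crossover) and adding Gaussian noise
    # (mutation). No backpropagation is used anywhere.
    population = [np.random.randn(dim) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 5]
        children = []
        while len(children) < pop_size:
            mom, dad = (parents[i] for i in np.random.randint(len(parents), size=2))
            children.append((mom + dad) / 2 + noise * np.random.randn(dim))
        population = children
    return max(population, key=fitness)

# Toy example: find a 10-dimensional weight vector close to all-threes.
best = evolve(lambda w: -np.sum((w - 3.0) ** 2), dim=10)
```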

Ariel: Okay. Was there anything else that you guys saw this year that you want to talk about, that you were excited about?

David: Well, I wanted to give a few examples of the kind of massive improvements in scale that we’ve seen. One of the most significant benchmarks in the community is ImageNet: training image classifiers that can tell you what a picture is a picture of on this dataset. The whole deep learning revolution arguably started, or at least really came into the eyes of the rest of the machine learning community, because of huge success on this ImageNet competition. And training the model there took something like two weeks, and this last year there was a paper where you can train a more powerful model in less than four minutes, and they do this by using something like 3,000 graphics cards in parallel.

And then DeepMind also had some progress on parallelism with this model called IMPALA, which was in the context of reinforcement learning as opposed to classification. There they came up with a way that allowed them to do updates in parallel: learn on different machines and combine everything that was learned in a way that’s asynchronous. In the past, with the sort of methods they would use for these reinforcement learning problems, you’d have to wait for all of the different machines to finish their learning on the current problem or instance they’re learning about, and then combine all of that centrally—whereas with the new method, as soon as you’re done computing or learning something, you can communicate it to the rest of the system, the other computers that are learning in parallel. And that was really important for allowing them to scale to hundreds of machines working on the problem at the same time.

Ariel: Okay, and so that, just to clarify as well, that goes back to this idea that right now we’re seeing a lot of success just scaling up the computing, but at some point that could slow things down essentially, if we had a limit for how much computing is possible.

David: Yeah, and I guess one of my points is also that doing this kind of scaling of computing requires some amount of algorithmic insight, or breakthrough if you want to be dramatic, as well. So in this DeepMind paper I talked about, they had to devise new reinforcement learning algorithms that would still be stable when they had this real-time asynchronous updating. And so, in a way, a lot of the research that’s interesting right now is on finding ways to make the algorithms scale so that you can keep taking advantage of more and more hardware. And the evolution stuff also fits into that picture to some extent.

Ariel: Okay. I want to start making that transition into some of the concerns that we have for misuse around AI and how easy it is for people to be deceived by things that have been created by AI. But I want to start with something that’s hopefully a little bit more neutral, and talk about Google Duplex, which is the program that Google came out with, I think last May. I don’t know the extent to which it’s in use now, but they presented it, and it’s an AI assistant that can essentially make calls and set up appointments for you. So their examples were it could make a reservation at a restaurant for you, or it could make a reservation for you to get a haircut somewhere. And it got sort of mixed reviews, because on the one hand people were really excited about this, and on the other hand it was kind of creepy because it sounded human, and the people on the other end of the call did not know that they were talking to a machine.

So I was hoping you guys could talk a little bit about the extent to which that was an actual technological breakthrough, versus being more one of those breakthroughs that will impact society more directly. And then also whether you agree that this seems like a good place to transition into some of the safety issues.

David: Yeah, no, I would be surprised if they really told us about the details of how that worked. So it’s hard to know how much of an algorithmic breakthrough or algorithmic breakthroughs were involved. It’s very impressive, I think, just in terms of what it was able to do, and of course these demos that we saw were maybe selected for their impressiveness. But I was really, really impressed personally, just to see a system that’s able to do that.

Roman: It’s probably built on a lot of existing technology, but it is more about the impact, about what you can do with this. And my background is cybersecurity, so I see it as a great tool for automating spear-phishing attacks on a scale of millions. You get what sounds like a real human calling you, talking to you, with access to your online data; pretty much everyone’s going to comply and do whatever the system is asking, whether it’s credit card numbers or social security numbers. So in many ways it’s going to be a game changer.

Ariel: So I’m going to take that as a definite transition into safety issues. So, yeah, let’s start talking about, I guess, sort of human manipulation that’s happening here. First, the phrase “deep fake” shows up a lot. Can you explain what those are?

David: So “deep fakes” is basically just: you can make a fake video of somebody doing something or saying something that they did not actually do or say. People have used this to create fake videos of politicians, they’ve used it to create porn using celebrities. That was one of the things that got it on the front page of the internet, basically. And Reddit actually shut down the subreddit where people were doing that. But, I mean, there’s all sorts of possibilities.

Ariel: Okay, so I think the Reddit example was technically the very end of 2017. But all of this sort of became more of an issue in 2018. So we’re seeing this increase in capability to both create images that seem real, create audio that seems real, create video that seems real, and to modify existing images and video and audio in ways that aren’t immediately obvious to a human. What did we see in terms of research to try to protect us from that, or catch that, or defend against that?

Roman: So here’s an interesting observation, I guess. You can develop some sort of a forensic tool to analyze it, and give you a percentage likelihood that it’s real or that it’s fake. But does it really impact people? If you see it with your own eyes, are you going to believe your lying eyes, or some expert statistician on CNN?

So the problem is it will still have tremendous impact on most people. We’re not very successful at convincing people of many scientific facts: they simply go outside, notice it’s cold right now, and conclude global warming is false. I suspect we’ll see exactly that with, let’s say, fake videos of politicians, where a majority of people will easily believe anything they hear or see once, versus any number of peer-reviewed publications disproving it.

David: I kind of agree. I mean, I think, when I try to think about how we would actually solve this kind of problem, I don’t think a technical solution that just allows somebody who has technical expertise to distinguish real from fake is going to be enough. We really need to figure out how to build a better trust infrastructure in our whole society which is kind of a massive project. I’m not even sure exactly where to begin with that.

Roman: I guess the good news is it gives you plausible deniability. If a video of me doing horrible things comes out, I can just say it’s fake.

Ariel: That’s good for someone. Alright, so, I mean, you guys are two researchers, I don’t know how into policy you are, but I don’t know if we saw as many strong policies being developed. We did see the implementation of the GDPR, and for people who aren’t familiar with the GDPR, it’s essentially European rules about what data companies can collect from your interactions online, and the ways in which you need to give approval for companies to collect your data, and there’s a lot more to it than that. One of the things that I found most interesting about the GDPR is that it’s entirely European based, but it had a very global impact because it’s so difficult for companies to apply something only in Europe and not in other countries. And so earlier this year when you were getting all of those emails about privacy policies, that was all triggered by the GDPR. That was something very specific that happened and it did make a lot of news, but in general I felt that we saw a lot of countries and a lot of national and international efforts for governments to start trying to understand how AI is going to be impacting their citizens, and then also trying to apply ethics and things like that.

I’m sort of curious, before we get too far into anything: just as researchers, what is your reaction to that?

Roman: So I never got as much spam as I did the week they released this new policy, so that kind of gives you a pretty good summary of what to expect. If you look at history, we have regulations against spam, for example, and computer viruses are illegal, yet both are still everywhere. So that’s a very expected result: regulation is not going to solve technical problems. Right?

David: I guess I like that they’re paying attention and trying to tackle these issues. The way GDPR was actually worded has been criticized a lot for being either much too broad or demanding, or too vague, and there are some aspects of the details of that regulation that I’m not convinced about, or not super happy about. But overall it seems like, for the people making these kinds of decisions, especially when we’re talking about cutting-edge machine learning, it’s just really hard. I mean, even people in the field don’t really know how you would begin to effectively regulate machine learning systems, and I think there’s a lot of disagreement about what a reasonable level of regulation would be or how regulations should work.

People are starting to have that sort of conversation in the research community a little bit more, and maybe we’ll have some better ideas about that in a few years. But I think right now it seems premature to me to even start trying to regulate machine learning in particular, because we just don’t really know where to begin. I think it’s obvious that we do need to think about how we control the use of the technology, because it’s just so powerful and has so much potential for harm and misuse and accidents and so on. But I think how you actually go about doing that is a really unclear and difficult problem.

Ariel: So for me it’s sort of interesting, we’ve been debating a bit today about technological breakthroughs versus societal impacts, and whether 2018 actually had as many breakthroughs and all of that. But I would guess that all of us agree that AI is progressing a lot faster than government does.

David: Yeah.

Roman: That’s almost a tautology.

Ariel: So I guess as researchers, what concerns do you have regarding that? Like do you worry about the speed at which AI is advancing?

David: Yeah, I would say I definitely do. I mean, we were just talking about this issue with fakes and how that’s going to contribute to things like fake news and erosion of trust in media and authority and polarization of society. I mean, if AI wasn’t going so fast in that direction, then we wouldn’t have that problem. And I think the rate that it’s going, I don’t see us catching up—or I should say, I don’t see the government catching up on its own anytime soon—to actually control the use of AI technology, and do our best anyways to make sure that it’s used in a safe way, and a fair way, and so on.

I think in and of itself it’s maybe not bad that the technology is progressing fast. I mean, it’s really amazing; scientifically there are going to be all sorts of amazing applications for it. But there are going to be more and more problems as well, and I don’t think we’re really well equipped to solve them right now.

Roman: I’ll agree with David, I’m very concerned about the relative rates of progress. AI development progresses a lot faster than anything we see in AI safety. AI safety is just trying to identify problem areas and propose some general directions, but we have very little to show in terms of solved problems.

If you look at work in adversarial fields, maybe a little bit like cryptography, the good guys have always been a step ahead of the bad guys, whereas here you barely have any good guys as a percentage: you have less than 1% of researchers working directly on safety full-time. Same situation with funding. So it’s not a very optimistic picture at this point.

David: I think it’s worth definitely distinguishing the kind of security risks that we’re talking about, in terms of fake news and stuff like that, from long-term AI safety, which is what I’m most interested in, and think is actually even more important, even though I think there’s going to be tons of important impacts we have to worry about already, and in the coming years.

And the long-term safety stuff is really more about artificial intelligence that becomes broadly capable and as smart as or smarter than humans across the board. And there, there are maybe a few more signs of hope if I look at how the field might progress in the future, and that’s because a lot of the problems that are relevant for controlling or aligning or understanding these kinds of generally intelligent systems are probably going to need to be solved anyway in order to make systems that are more capable in the near future.

So I think we’re starting to see issues with trying to get AIs to do what we want, and failing to, because we just don’t know how to specify what we want. And that’s, I think, basically the core of the AI safety problem—is that we don’t have a good way of specifying what we want. An example of that is what are called adversarial examples, which sort of demonstrate that computer vision systems that are able to do a really amazing job at classifying images and seeing what’s in an image and labeling images still make mistakes that humans just would never make. Images that look indistinguishable to humans can look completely different to the AI system, and that means that we haven’t really successfully communicated to the AI system what our visual concepts are. And so even though we think we have done a good job of telling it what to do, it’s like, “tell us what this picture is of”—the way that it found to do that really isn’t the way that we would do it and actually there’s some very problematic and unsettling differences there. And that’s another field that, along with the ones that I mentioned, like generative models and GANs, has been receiving a lot more attention in the last couple of years, which is really exciting from the point of view of safety and specification.
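
For a concrete picture of the adversarial examples David mentions, here is a minimal sketch of the classic fast gradient sign method in PyTorch; `model` stands for any image classifier, and `epsilon` bounds how small (and thus imperceptible) the change is. This is a generic illustration rather than the specific attacks from any one paper.

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_example(model, image, label, epsilon=0.01):
    """Nudge each pixel slightly in the direction that most increases the
    classifier's loss; to a human the image looks unchanged, but the model's
    prediction can flip entirely."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```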

Ariel: So, would it be fair to say that you think we’ve had progress or at least seen progress in addressing long-term safety issues, but some of the near-term safety issues, maybe we need faster work?

David: I mean I think to be clear, we have such a long way to go to address the kind of issues we’re going to see with generally intelligent and super intelligent AIs, that I still think that’s an even more pressing problem, and that’s what I’m personally focused on. I just think that you can see that there are going to be a lot of really big problems in the near term as well. And we’re not even well equipped to deal with those problems right now.

Roman: I’ll generally agree with David. I’m more concerned about long-term impacts. They are both more challenging and more impactful. It seems like short-term things may be problematic right now, but the main difficulty is that we didn’t start working on them in time. Problems like algorithmic fairness, bias, and technological unemployment are social issues which are quite solvable; they are not really that difficult from an engineering or technical point of view. Whereas long-term control of systems which are more intelligent than you are is very much unsolved at this point, even in any toy model. So I would agree with the part about bigger concerns, but the problems we have today are already impacting people; the good news is we know how to do better.

David: I’m not sure that we know how to do better exactly. For a lot of these problems, the ones that you mentioned, it’s more a problem of willpower and developing political solutions. But with the deep fakes, this is something that I think requires a bit more of a technical solution, in the sense of how we organize our society so that people are either educated enough to understand this stuff, or actually have someone they trust, and have a reason to trust, whose word they can take on it.

Roman: That sounds like a great job, I’ll take it.

Ariel: It almost sounds like something we need to have someone doing in person, though.

So going back to this past year: were there, say, groups that formed, or research teams that came together, or just general efforts that, while maybe they didn’t produce something yet, you think could produce something good, either in safety or AI in general?

David: I think something interesting is happening in terms of the way AI safety is perceived and talked about in the broader AI and machine learning community. It’s a little bit like this phenomenon where once we solve something people don’t consider it AI anymore. So I think machine learning researchers, once they actually recognize the problem that the safety community has been sort of harping on and talking about and saying like, “Oh, this is a big problem”—once they say, “Oh yeah, I’m working on this kind of problem, and that seems relevant to me,” then they don’t really think that it’s AI safety, and they’re like, “This is just part of what I’m doing, making something that actually generalizes well and learns the right concept, or making something that is actually robust, or being able to interpret the model that I’m building, and actually know how it works.”

These are all things that people are doing a lot of work on these days in machine learning that I consider really relevant for AI safety. So I think that’s like a really encouraging sign, in a way, that the community is sort of starting to recognize a lot of the problems, or at least instances of a lot of the problems that are going to be really critical for aligning generally intelligent AIs.

Ariel: And Roman, what about you? Did you see anything sort of forming in the last year that maybe doesn’t have some specific result, but that seemed hopeful to you?

Roman: Absolutely. So I’ve mentioned that there are very few actual AI safety researchers compared to the number of AI developers, researchers directly creating more capable machines. But the growth rate is much better, I think. The number of organizations, the number of people who show interest in it, the number of papers: these are all, I think, growing at a much faster rate, and it’s encouraging, because as David said, it’s kind of like this convergence, if you will, where more and more people realize, “I cannot say I built an intelligent system if it kills everyone.” That’s just not what an intelligent system is.

So safety and security become integral parts of it. I think Stuart Russell has a great example where he talks about bridge engineering. We don’t talk about safe bridges and secure bridges—there are just bridges. If it falls down, it’s not a bridge. Exactly the same thing is starting to happen here: people realize, “My system cannot fail and embarrass the company; I have to make sure it will not cause an accident.”

David: I think that a lot of people are thinking about that way more and more, which is great, but there is a sort of research mindset, where people just want to understand intelligence, and solve intelligence. And I think that’s kind of a different pursuit. Solving intelligence doesn’t mean that you make something that is safe and secure, it just means you make something that’s really intelligent, and I would like it if people who had that mindset were still, I guess, interested in or respectful of or recognized that this research is potentially dangerous. I mean, not right now necessarily, but going forward I think we’re going to need to have people sort of agree on having that attitude to some extent of being careful.

Ariel: Would you agree though that you’re seeing more of that happening?

David: Yeah, absolutely, yeah. But I mean it might just happen naturally on its own, which would be great.

Ariel: Alright, so before I get to my very last question, is there anything else you guys wanted to bring up about 2018 that we didn’t get to yet?

David: So we were talking about AI safety and there’s kind of a few big developments in the last year. I mean, there’s actually too many I think for me to go over all of them, but I wanted to talk about something which I think is relevant to the specification problem that I was talking about earlier.

Ariel: Okay.

David: So, there are three papers in the last year, actually, on what I call superhuman feedback. The idea motivating these works is that even specifying what we want on a particular instance in some particular scenario can be difficult. So typically the way that we would think about training an AI that understands our intentions is to give it a bunch of examples, and say, “In this situation, I prefer if you do this. This is the kind of behavior I want,” and then the AI is supposed to pick up on the patterns there and sort of infer what our intentions are more generally.

But there can be some things that we would ideally like AI systems to be competent at doing that are really difficult to even assess individual instances of. An example that I like to use is designing a transit system for a large city, or maybe for a whole country, or the world or something. That’s something that right now is done by a massive team of people. Using that whole team to assess a proposed design that the AI might make would be one example of superhuman feedback, because it’s not just a single human. But you might want to be able to do this with just a single human and a team of AIs helping them, instead of a team of humans. And there are a few proposals for how you could do that that have come out of the safety community recently, which I think are pretty interesting.

Ariel: Why is it called superhuman feedback?

David: Actually, this is just my term for it. I don’t think anyone else is using this term.

Ariel: Okay.

David: Sorry if that wasn’t clear. The reason I use it is because there are three different lines of work here. There are these two papers from OpenAI on what’s called amplification and debate, and then another paper from DeepMind on reward learning and recursive reward learning. And I like to view these as all trying to solve the same problem: how can we assist humans and enable them to make good, informed judgments that actually reflect what their preferences are, when they’re not capable of doing that by themselves unaided? So it’s superhuman in the sense that it’s better than a single human can do. And these proposals are also aspiring to do things that I think even teams of humans couldn’t do, by having AI helpers that help you do the evaluation.

An example that Jan—who’s the lead author on the DeepMind paper, which I also worked on—gives is assessing an academic paper. If you yourself aren’t familiar with the field and don’t have the expertise to assess this paper, you might not be able to say whether or not it should be published. But if you can decompose that task into things like: is the paper valid? Are the proofs valid? Are the experiments following a reasonable protocol? Is it novel? Is it formatted correctly for the venue where it’s submitted? If you then get answers to all of those from helpers, you could make the judgment. You’d just say, okay, it meets all of the criteria, so it should be published. The idea would be to get AI helpers to do those sorts of evaluations for you, and in that way allow us to explain to AIs, or teach AIs, what we want across a broad range of tasks.
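
As a toy illustration of the decomposition idea in Jan’s example (and only of that; the actual amplification, debate, and recursive reward modeling proposals are considerably more involved), the sketch below breaks a judgment that is too hard to make unaided into narrow subquestions, delegates each to a helper, and aggregates the answers. The subquestions and the `helper` interface are hypothetical placeholders.

```python
def assess_paper(paper, helper):
    # `helper` is any function (a human expert, or an AI assistant) that can
    # answer a narrow yes/no subquestion about the paper. The overall judgment
    # is easy once the hard work has been decomposed and delegated.
    subquestions = [
        "Are the proofs valid?",
        "Do the experiments follow a reasonable protocol?",
        "Is the contribution novel?",
        "Is it formatted correctly for the venue?",
    ]
    answers = {q: helper(paper, q) for q in subquestions}
    return all(answers.values()), answers  # publish only if every check passes
```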

Ariel: So, okay, and so then were there other things that you wanted to mention as well?

David: I do feel like I should talk about another thing that, again, was not developed last year but really took off last year—this new kind of neural network architecture called the transformer, which is being used in a lot of places where convolutional neural networks and recurrent neural networks were being used before. Those were the two main driving factors behind the deep learning revolution: in vision, where you use convolutional networks, and for things that have a sequential structure, like speech or text, where people were using recurrent neural networks. This architecture was actually originally motivated by the same sort of scaling consideration, because it allowed them to remove some of the most computationally heavy parts of running these kinds of models in the context of translation, and basically make it a hundred times cheaper to train a translation model. But since then it’s also been used in a lot of other contexts and has proven to be a really good replacement for these other kinds of models in a lot of applications.

And I guess the way to describe what it’s doing is that it’s based on what’s called an attention mechanism, which is basically a way of giving a neural network the ability to pay more attention to some parts of an input than to others, for example, to look at the one word that is most relevant to the current translation step. If you imagine outputting words one at a time, then because different languages put words in different orders, it doesn’t make sense to just translate the next word in order. You want to look through the whole input sentence, like a sentence in English, and find the word that corresponds to whatever word should come next in your output sentence.

And that was sort of the original inspiration for this attention mechanism, but since then it’s been applied in a bunch of different ways, including paying attention to different parts of the model’s own computation, and paying attention to different parts of images. And basically, using this attention mechanism in place of the other neural architectures that people thought were really important for capturing temporal dependencies across something sequential, like a sentence that you’re trying to translate, turned out to work really well.
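
The core of the attention mechanism David describes fits in a few lines. This is a minimal sketch of scaled dot-product attention, the building block of the transformer, in PyTorch; it leaves out the multi-head machinery, masking, and everything else around it.

```python
import torch

def scaled_dot_product_attention(queries, keys, values):
    # Each query scores every position of the input; a softmax turns those
    # scores into weights saying how much to "pay attention" to each position,
    # and the output is the correspondingly weighted mixture of the values.
    d_k = queries.size(-1)
    scores = queries @ keys.transpose(-2, -1) / d_k ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ values

# Example: 10 output positions attending over 12 input positions, dimension 64.
out = scaled_dot_product_attention(torch.randn(10, 64), torch.randn(12, 64), torch.randn(12, 64))
```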

Ariel: So I want to actually pass this to Roman real quick. Did you have any comments that you wanted to add to either the superhuman feedback or the transformer architecture?

Roman: Sure, so superhuman feedback: I like the idea and I think people should be exploring it, but we can look at similar examples from before. For a while we had a situation where teams of human chess players and machines did better than unaided machines or unaided humans. That lasted about ten years. Then machines became so much better that humans didn’t really contribute anything; consulting them was just an additional bottleneck. I wonder if, long term, this solution will face similar problems. It’s very useful right now, but I don’t know if it will scale.

David: Well, I want to respond to that, because I think the idea here, in my mind, isn’t to have something that scales in the way you’re describing, where it can out-compete pure AI systems. Although I guess some people might be hoping that’s the case, because that would make the strategic picture better in terms of people’s willingness to use safer systems. But this is more about just how we can even train systems—if we have the willpower, if people want to build a system that has the human in charge, and ends up doing what the human wants—how can we actually do that for something that’s really complicated?

Roman: Right. And as I said, I think it’s a great way to get there. So this part I’m not concerned about. It’s a long-term game with that.

David: Yeah, no, I mean I agree that that is something to be worried about as well.

Roman: There is a possibility of manipulation if you have a human in the loop, and that itself makes it not safer but more dangerous in certain ways.

David: Yeah, one of the biggest concerns I have for this whole line of work is that the human needs to really trust the AI systems that are assisting it, and I just don’t see that we have good enough mechanisms for establishing trust and building trustworthy systems right now, to really make this scale well without introducing a lot of risk for things like manipulation, or even just compounding of errors.

Roman: But those approaches, like the debate approach, it just feels like they’re setting up humans for manipulation from both sides, and who’s better at breaking the human psychological model.

David: Yep, I think it’s interesting, and I think it’s a good line of work. But I think we haven’t seen anything that looks like a convincing solution to me yet.

Roman: Agreed.

Ariel: So, Roman, was there anything else that you wanted to add about things that happened in the last year that we didn’t get to?

Roman: Well, as a professor, I can tell you that students stop learning after about 40 minutes. So I think at this point we’re just being counterproductive.

Ariel: So for what it’s worth, our most popular podcasts have all exceeded two hours. So, what are you looking forward to in 2019?

Roman: Are you asking about safety or development?

Ariel: Whatever you want to answer. Just sort of in general, as you look toward 2019, what relative to AI are you most excited and hopeful to see, or what do you predict we’ll see?

David: So I’m super excited for people to hopefully pick up on this reward learning agenda that I mentioned, which Jan and I and people at DeepMind worked on. I was actually pretty surprised how little work has been done on this. The idea of this agenda at a high level is just: we want to learn a reward function—which is like a score that tells an agent how well it’s doing—learn reward functions that encode what we want the AI to do, and that’s the way we’re going to specify tasks to an AI. And I think from a machine learning researcher’s point of view this is kind of the most obvious solution to specification problems and to safety: just learn a reward function. But very few people are really trying to do that, and I’m hoping that we’ll see more people trying to do it, and encountering and addressing some of the challenges that come up.
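
A minimal sketch of one common recipe for this kind of reward learning: train a small network so that trajectories a human preferred get a higher predicted total reward than trajectories they rejected. This is illustrative PyTorch under assumed shapes, not the specific method of the DeepMind agenda paper David mentions.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    # Maps a state (here just a flat feature vector) to a scalar reward.
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, states):
        return self.net(states).squeeze(-1)

def preference_loss(reward_model, preferred, rejected):
    # preferred, rejected: (batch, timesteps, state_dim) trajectory pairs where
    # a human judged the first to be better. The model is trained so that the
    # preferred trajectory's total predicted reward comes out higher.
    r_pref = reward_model(preferred).sum(dim=-1)
    r_rej = reward_model(rejected).sum(dim=-1)
    return -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
```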

Roman: So I think by definition we cannot predict short-term breakthroughs. So what we’ll see is a lot of continuation of 2018 work, and previous work scaling up. So, if you have, let’s say, Texas hold ’em poker: so for two players, we’ll take it to six players, ten players, something like that. And you can make similar projections for other fields, so the strategy games will be taken to new maps, involve more players, maybe additional handicaps will be introduced for the bots. But that’s all we can really predict, kind of gradual improvement.

Protein folding will be even more efficient in terms of predicting actual structures: Any type of accuracy rates, if they were climbing from 80% to 90%, will hit 95, 96. And this is a very useful way of predicting what we can anticipate, and I’m trying to do something similar with accidents. So if we can see historically what was going wrong with systems, we can project those trends forward. And I’m happy to say that there is now at least two or three different teams working and collecting those examples and trying to analyze them and create taxonomies for them. So that’s very encouraging.

David: Another thing that comes to mind is—I mentioned adversarial examples earlier, which are these differences, imperceptible to a human, that change how the AI system perceives something like an image. And so far, for the most part, the field has been focused on really imperceptible changes. But I think now people are starting to move towards a broader idea of what counts as an adversarial example: basically anything that a human thinks clearly belongs to one class and the AI system thinks clearly belongs to another class, and that has been constructed deliberately to create that kind of difference.

And I think it’s going to be really interesting and exciting to see how the field tries to move in that direction, because, as I mentioned, it’s hard to define how humans decide whether or not something is a picture of a cat, say. The way that we’ve done it so far is just by giving lots of examples of things that we say are cats. But it turns out that that isn’t sufficient, and so I think this is really going to push a lot of people in the mainstream machine learning community closer towards thinking about some of the really core safety challenges. So I think that’s super exciting.

Roman: It is a very interesting topic, and I am in particular looking at a side subject there, which is adversarial inputs for humans, and machines developing them, which I guess is kind of like optical illusions and audio illusions, where a human mislabels inputs in a predictable way, which allows for manipulation.

Ariel: Along very similar lines, I think I want to modify my questions slightly, and also ask: coming up in 2019, what are you both working on that you’re excited about, if you can tell us?

Roman: Sure, so there have been a number of publications looking at particular limitations, either through mathematical proofs or through well-known economic models, and at what is in fact possible from a computational-complexity point of view. And I’m trying to integrate those into a single model showing—in principle, not in practice, but even in principle—what can we do with the AI control problem? How solvable is it? Is it solvable? Is it not solvable? Because I don’t think there is a mathematically rigorous proof, or even a rigorous argument, either way. So I think that will be helpful, especially for arguing about the importance of the problem and about resource allocation.

David: I’m trying to think what I can talk about. I guess right now I have some ideas for projects that are not super well thought out, so I won’t talk about those. And I have a project that I’m trying to finish off which is a little bit hard to describe in detail, but I’ll give the really high level motivation for it. And it’s about something that people in the safety community like to call capability control. I think Nick Bostrom has these terms, capability control and motivation control. And so what I’ve been talking about most of the time in terms of safety during this podcast was more like motivation control, like getting the AI to want to do the right thing, and to understand what we want. But that might end up being too hard, or sort of limited in some respect. And the alternative is just to make AIs that aren’t capable of doing things that are dangerous or catastrophic.

A lot of people in the safety community worry about capability control approaches failing, because if you have a very intelligent agent, it will view these attempts to control it as undesirable, and try to free itself from any constraints that we give it. And I think a way of trying to get around that problem is to look at capability control through the lens of motivation control: basically, make an AI that doesn’t want to influence certain things, and maybe doesn’t have some of these drives to influence the world, or to influence the future. So in particular I’m trying to see how we can design agents that really don’t try to influence the future, and really only care about doing the right thing right now. If we try to do that in a naïve way, there are ways it can fail, and we can get some sort of emergent drive to still try and optimize over the long term, or to have some influence on the future. And I think to the extent we see things like that, that’s problematic from this perspective of let’s just make AIs that aren’t capable of, or motivated toward, influencing the future.

Ariel: Alright! I think I’ve kept you both on for quite a while now. So, David and Roman, thank you so much for joining us today.

David: Yeah, thank you both as well.

Roman: Thank you so much.

AI Alignment Podcast: The Byzantine Generals’ Problem, Poisoning, and Distributed Machine Learning with El Mahdi El Mhamdi (Beneficial AGI 2019)

Three generals are voting on whether to attack or retreat from their siege of a castle. One of the generals is corrupt and two of them are not. What happens when the corrupted general sends different answers to the other two generals?

Byzantine fault is “a condition of a computer system, particularly distributed computing systems, where components may fail and there is imperfect information on whether a component has failed. The term takes its name from an allegory, the ‘Byzantine Generals’ Problem’, developed to describe this condition, where actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable.”

The Byzantine Generals’ Problem and associated issues in maintaining reliable distributed computing networks is illuminating for both AI alignment and modern networks we interact with like Youtube, Facebook, or Google. By exploring this space, we are shown the limits of reliable distributed computing, the safety concerns and threats in this space, and the tradeoffs we will have to make for varying degrees of efficiency or safety.

The Byzantine Generals’ Problem, Poisoning, and Distributed Machine Learning with El Mahdi El Mhamdi is the ninth podcast in the AI Alignment Podcast series, hosted by Lucas Perry. El Mahdi pioneered Byzantine-resilient machine learning, devising a series of provably safe algorithms he recently presented at NeurIPS and ICML. Interested in theoretical biology, his work also includes the analysis of error propagation in networks, applied to both neural and biomolecular networks. This particular episode was recorded at the Beneficial AGI 2019 conference in Puerto Rico. We hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, iTunes, Google Play, Stitcher, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

Topics discussed in this episode include:

  • The Byzantine Generals’ Problem
  • What this has to do with artificial intelligence and machine learning
  • Everyday situations where this is important
  • How systems and models update in the context of asynchrony
  • Why it's hard to do Byzantine-resilient distributed ML
  • Why this is important for long-term AI alignment

An overview of Adversarial Machine Learning and where Byzantine-resilient Machine Learning stands on the map is available in this (9min) video. A specific focus on Byzantine Fault Tolerant Machine Learning is available here (~7min).

In particular, El Mahdi argues in the first interview (and in the podcast) that technical AI safety is not only relevant for long-term concerns, but is crucial in current pressing issues such as social media poisoning of public debates and misinformation propagation, both of which fall under poisoning resilience. Another example he likes to use is social media addiction, which could be seen as a case of (non) Safely Interruptible learning. This value misalignment is already an issue with the primitive forms of AI that optimize our world today, as they maximize our watch-time all over the internet.

The latter (Safe Interruptibility) is another technical AI safety question El Mahdi works on, in the context of Reinforcement Learning. This line of research was initially dismissed as "science fiction"; in this interview (5min), El Mahdi explains why it is a realistic question that arises naturally in reinforcement learning.

El Mahdi's work on Byzantine-resilient Machine Learning and other relevant topics is available on his Google scholar profile. A modification of the popular machine learning library TensorFlow, to make it Byzantine-resilient (and also support communication over UDP channels among other things) has been recently open-sourced on Github by El Mahdi's colleagues based on his algorithmic work we mention in the podcast.

To connect with him over social media

You can listen to the podcast above or read the transcript below.

Lucas: Hey, everyone. Welcome back to the AI Alignment Podcast series. I’m Lucas Perry, and today we’ll be speaking with El Mahdi El Mhamdi on the Byzantine problem, Byzantine tolerance, and poisoning in distributed learning and computer networks. If you find this podcast interesting or useful, please give it a like and follow us on your preferred listing platform. El Mahdi El Mhamdi pioneered Byzantine resilient machine learning devising a series of provably safe algorithms he recently presented at NeurIPS and ICML. Interested in theoretical biology, his work also includes the analysis of error propagation and networks applied to both neural and biomolecular networks. With that, El Mahdi’s going to start us off with a thought experiment.

El Mahdi: Imagine you are part of a group of three generals, say, from the Byzantine army surrounding a city you want to invade, but you also want to retreat if retreat is the safest choice for your army. You don't want to attack when you will lose, so those three generals that you're part of are on three sides of the city. They sent some intelligence inside the walls of the city, and depending on this intelligence information, they think they will have a good chance of winning and they would like to attack, or they think they will be defeated by the city, so it's better for them to retreat. Your final decision would be a majority vote, so you communicate through some horsemen that, let's say, are reliable for the sake of this discussion. But there might be one of you who might have been corrupted by the city.

The situation would be problematic if, say, there are General A, General B, and General C. General A decided to attack. General B decided to retreat based on their intelligence, for some legitimate reason. A and B are not corrupt, and say that C is corrupt. Of course, A and B can't figure out who was corrupt. What this corrupt general would do is this: A wanted to attack, so they will tell A, "I also want to attack. I will attack." Then they will tell General B, "I also want to retreat. I will retreat." A receives two attack votes and one retreat vote. General B receives two retreat votes and only one attack vote. If they trust everyone and don't do any double checking, this would be a disaster.

A will attack alone; B would retreat; C, of course, doesn't care because he was corrupted by the city. You can tell me they can circumvent that by double checking. For example, A and B can communicate on what C told them. Let's say that every general communicates with every general on what he decides and also on what the remaining part of the group told them. A will report to B, "General C told me to attack." Then B would tell A, "General C told me to retreat." But then A and B wouldn't have any way of concluding whether the inconsistency is coming from the fact that C is corrupt or that the general reporting on what C told them is corrupt.

I am General A. I have all the valid reasons to think, with the same likelihood, that C is maybe lying to me, or that B might be lying to me. I can't know if you are misreporting what C told you. So it's enough for the city to corrupt one general out of three: it's impossible to come up with an agreement in this situation. You can easily see that this generalizes to having more than three generals, say 100, as soon as the non-corrupt ones are less than two-thirds, because what we saw with three generals would happen within the fraction that is not corrupt. Say that you have strictly more than 33 generals out of 100 who are corrupt; what they can do is switch the majority vote on each side.

But worse than that, say that you have 34 corrupt generals and the remaining 66 not corrupt generals. Say that those 66 not corrupt generals were 33 on the attack side and 33 on the retreat side. The problem is that when you are on some side, say the retreat side, you have in front of you a group of 34 plus 33 in which there's a majority of malicious ones. This majority can collude. It's part of the Byzantine hypothesis. The malicious ones can collude, and they will report a majority of inconsistent messages about the minority, the 33 others. You can't provably realize that the inconsistency is coming from the group of 34, because they are a majority.
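To make this concrete, here is a minimal Python sketch of the three-general scenario described above. The tallying and message passing are deliberately simplified for illustration; this is not a formal protocol.

```python
# Toy sketch of the three-general scenario: C is corrupt and tells A "attack"
# while telling B "retreat", so each loyal general sees a different majority.

def decide(own_vote, received_votes):
    votes = [own_vote] + received_votes
    return "attack" if votes.count("attack") > votes.count("retreat") else "retreat"

a_vote, b_vote = "attack", "retreat"      # honest preferences of A and B
c_to_a, c_to_b = "attack", "retreat"      # the corrupt general's two-faced messages

a_decision = decide(a_vote, [b_vote, c_to_a])   # A sees 2 attack, 1 retreat
b_decision = decide(b_vote, [a_vote, c_to_b])   # B sees 2 retreat, 1 attack

print(a_decision, b_decision)   # "attack retreat": the loyal generals diverge
```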

Lucas: When we’re thinking about, say, 100 persons or 100 generals, why is it that they’re going to be partitioned automatically into these three groups? What if there’s more than three groups?

El Mahdi: Here we're doing the easiest form of Byzantine agreement. We want to agree on attack versus retreat. When it becomes multi-dimensional, it gets even messier. There are even more impossibility results. Just like with the binary decision, there is an impossibility theorem on reaching agreement if you have unsigned messages carried by horsemen: whenever the corrupt group exceeds 33%, you provably cannot come up with an agreement. There are many variants of this problem, of course, depending on what hypotheses you can assume. Here, without even mentioning it, we were assuming bounded delays. The horsemen would always arrive eventually. But if the horsemen could die on the way and you don't have any way to check whether they arrived or not, you could be waiting forever, because you don't have any proof that the horsemen died on the way.

You don't have any mechanism to tell you, "Stop waiting for the horsemen. Stop waiting for the message from General B because the horsemen died." You can be waiting forever, and there are theorems that show what happens when you have unbounded delays. By the way, in distributed computing, whenever you have unbounded delays, we speak about asynchrony. If you have asynchronous communication, there is a very famous theorem that tells you consensus is impossible, not even in the malicious case, but just like in …

Lucas: In the mundane normal case.

El Mahdi: Yes. It's called the Fischer-Lynch-Paterson theorem.

Lucas: Right, so just to dive down into the crux of the problem: the issue here, fundamentally, is that when groups of computers or generals or whatever are trying to check who is lying, by comparing discrepancies and similarities in what everyone claims, and the corrupted submissions can form a simple majority, then, yeah, you're screwed.

El Mahdi: Yes. It's impossible to achieve agreement. There is always a fraction of malicious agents above which it is provably impossible to agree. Depending on the situation, it will be a third, or sometimes a half or a quarter, depending on your specifications.

Lucas: If you start tweaking the assumptions behind the thought experiment, then it changes what number of corrupted machines or agents that are required in order to flip the majority and to poison the communication.

El Mahdi: Exactly. But for example, you mentioned something very relevant to today’s discussion, which is what if we were not agreeing on two decisions, retreat, attack. What if we were agreeing on some multi-dimensional decision? Attack or retreat on one dimension and then …

Lucas: Maybe hold, keep the siege going.

El Mahdi: Yeah, just add possibilities or dimensions and you get multi-dimensional agreement. There are even more hopeless results in that direction.

Lucas: There are more like impossibility theorems and issues where these distributed systems are vulnerable to small amounts of systems being corrupt and screwing over the entire distributed network.

El Mahdi: Yes. Maybe now we can slightly move to machine learning.

Lucas: I’m happy to move into machine learning now. We’ve talked about this, and I think our audience can probably tell how this has to do with computers. Yeah, just dive in what this has to do with machine learning and AI and current systems today, and why it even matters for AI alignment.

El Mahdi: As a brief transition: solving the agreement problem, beyond this very nice historic thought experiment, is behind the consistency of safety-critical systems like banking systems. Imagine we have a shared account. Maybe you remove 10% of the amount and then she or he adds $10 to the account. You remove the 10% in New York and she or he puts in the $10 in Los Angeles. The banking system has to agree on the ordering, because minus 10% then plus $10 is not the same result as plus $10 then minus 10%. The final balance of the account would not be the same.
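A tiny numeric check of why the ordering matters, with a hypothetical starting balance of $100:

```python
balance = 100.0
new_york    = (balance - balance * 0.10) + 10            # remove 10% first, then add $10
los_angeles = (balance + 10) - (balance + 10) * 0.10     # add $10 first, then remove 10%
print(new_york, los_angeles)   # 100.0 vs 99.0: the final balance depends on the order
```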

Lucas: Right.

El Mahdi: Banking systems routinely solve decisions that fall into agreement. Or if you work on some document sharing platform, like Dropbox or Google Docs, whatever, and we are collaboratively writing the document, me and you, the document sharing platform has to, in real time, solve agreements about the ordering of operations so that you and I always keep seeing the same thing. This has to happen while some of the machines that are interconnecting us are failing, whether because there was an electric crash, a data center lost some machines, or there was a restart or a bug. What we want in distributed computing is communication schemes between machines that guarantee this consistency that comes from agreement, as long as some fraction of the machines is reliable. What this has to do with artificial intelligence and machine learning reliability is that, with some colleagues, we are trying to bring one of the major issues in machine learning reliability inside the Byzantine fault tolerance umbrella. Take, for instance, poisoning attacks.

Lucas: Unpack what poisoning attacks are.

El Mahdi: For example, imagine you are training a model on what are good videos to recommend given some keyword search. If you search for "medical advice for young parents on vaccines," this is a label. Let's assume for the sake of simplicity that a video telling you not to take your kid for vaccines is not what we mean by medical advice for young parents on vaccines, because that's what medical experts agree on. We want our system to learn that anti-vaxxers, like anti-vaccine propaganda, are not what people are searching for when they type those keywords. So suppose a world where we care about accuracy, okay? Imagine you want to train a machine learning model that gives you accurate results for your search. Let's also, for the sake of simplicity, assume that a majority of people on the internet are honest.

Let's assume that more than 50% of people are not actively trying to poison the internet. Yeah, this is very optimistic, but let's assume that. What we can show, and what my colleagues and I started this line of research with, is that you can easily prove that one single malicious agent can provably poison a distributed machine learning scheme. Imagine you are this video sharing platform. Whenever people behave on your platform, this generates what we call gradients, so it updates your model. It only takes a few hyperactive accounts generating behavior that is powerful enough to pull what we call the average gradient, because, at least up to today, if you read the source code of most distributed machine learning frameworks, distributed machine learning is always averaging gradients.

Imagine you, Lucas Perry, just googled a video on the Parkland shootings. Then the video sharing platform shows you a video telling you that David Hogg and Emma Gonzalez and those kids behind the March for Our Lives movement are crisis actors. The video labels these kids as crisis actors. It obviously has a wrong label, so it is what I will call a poisoned data point. If you are a non-malicious agent on the video sharing platform, you will dislike the video. You will not approve it. You're likely to flag it. This should generate a gradient that pushes the model in that direction, so the gradient will update the model into a direction where it stops thinking that this video is relevant for someone searching "Parkland shooting survivors." What can happen if your machine learning framework is just averaging gradients is that a bunch of hyperactive people on some topic could poison the average and pull it towards the direction where the model is reinforced into thinking that, "Yeah, those kids are crisis actors."

Lucas: This is the case because the hyperactive accounts are seen to be given more weight than accounts which are less active in the same space. But this extra weighting that these accounts will get from their hyperactivity in one certain category or space over another, how is the weighting done? Is it just time spent per category or does it have to do with submissions that agree with the majority?

El Mahdi: We don't even need to go into the details, because we don't know them. I'm talking about a general setting where you have a video sharing platform aggregating gradients from behavior. Now, maybe let's raise the abstraction level. You are doing gradient descent, so you have a loss function that you want to minimize. You have an error function. The error function is the mismatch between what you predict and what the user tells you. The user tells you this is a wrong prediction, and then you move in the direction where the users stop telling you this is the wrong prediction. You are doing great in this sense, minimizing the loss function. Users behave, and with their behavior, you generate gradients.

What you do now, in the state-of-the-art way of doing distributed machine learning, is that you average all those gradients. Averaging is well known not to be resilient. If you have a room of poor academics earning a few thousand dollars and then a billionaire jumps into the room, if your algorithm reasons with averaging, it will think that this is a room of millionaires, because the average salary would be a couple of hundred million. But then the median is very obvious to do when it comes to salaries and scalar numbers, because you can rank them.
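A minimal sketch of this fragility of the average, first with the salary analogy and then with gradient vectors; every number here is made up purely for illustration.

```python
import numpy as np

# Nine modest salaries plus one billionaire: the mean is fooled, the median is not.
salaries = np.array([40_000.0] * 9 + [1_000_000_000.0])
print(np.mean(salaries))    # ~100,036,000: "a room of millionaires"
print(np.median(salaries))  # 40,000: unaffected by the outlier

# The same effect on averaged gradients: one huge malicious vector drags the mean.
honest = np.random.normal(loc=1.0, scale=0.1, size=(9, 3))   # nine honest gradients near (1, 1, 1)
malicious = np.array([[-1000.0, -1000.0, -1000.0]])          # one attacker
gradients = np.vstack([honest, malicious])
print(gradients.mean(axis=0))         # pulled far toward the attacker
print(np.median(gradients, axis=0))   # coordinate-wise median stays near (1, 1, 1)
```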

Lucas: Right.

El Mahdi: You rank numbers and then decide, "Okay, this is the ordering. This is the number that falls in the middle. This is the upper half, this is the lower half, and this is the median." When it becomes high dimensional, the median is a bit tricky. It has some computational issues. Then even if you compute what we call the geometric median, an attacker can still know how to leverage the fact that you're only approximating it, because there's no closed formula; there's no closed form to compute the geometric median in high dimension. But worse than that, what we showed in one of our follow-up works is that, because machine learning is done in very, very, very high dimensions, you have a curse-of-dimensionality issue that makes it possible for attackers to sneak in without being spotted as far from the median.

It can still look like the median vector. I take benefits from the fact that those vectors, those gradients, are extremely high dimensional. I would look for all the disagreements. Let’s say you have a group of a couple hundred gradients, and I’m the only malicious one. I would look at the group of correct vectors all updating you somehow in the same direction within some variants. On average, they’re like what we call unbiased estimators of the gradient. When you take out the randomness, the expected value they will give you is the real gradient of the loss function. What I will do as a malicious worker is I will look at the way they are disagreeing slightly on each direction.

I will sum that. I will see that they disagree by this much on direction one. They disagree by this much on direction two. They disagree by this much, epsilon one, epsilon two, epsilon three. I would look for all these small disagreements they have on all the components.

Lucas: Across all dimensions and high dimensional space. [crosstalk 00:16:35]

El Mahdi: Then add that up. It will be my budget, my leeway, my margin to attack you on another direction.

Lucas: I see.

El Mahdi: What we proved is that you have to mix ideas from the geometric median with ideas from the traditional component-wise median, and those are completely different things. The geometric median is a way to find a median by minimizing the sum of distances between what you are looking for and all the vectors that were proposed, while the component-wise median does the traditional job of ranking the coordinates: it looks at each coordinate, ranks all the propositions, and then looks for the proposition that lies in the middle. What we proved in the follow-up work is that, yeah, the geometric median idea is elegant. It can make you converge, but it can make you converge to something arbitrarily bad, decided by the attacker. When you train complex models like neural nets, the landscape you optimize inside is not convex. It's not like a bowl or a cup where, if you just follow the descending slope, you would end up at the lowest point.
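Here is a small sketch of the two notions of median just described. The geometric median has no closed form, so a standard Weiszfeld-style iteration is used as an approximation; this is an illustration, not the exact aggregation rules from the papers being discussed.

```python
import numpy as np

def coordinate_wise_median(vectors):
    """Rank each coordinate independently and keep the middle value."""
    return np.median(vectors, axis=0)

def geometric_median(vectors, iterations=100, eps=1e-8):
    """Weiszfeld iteration: approximately minimize the sum of Euclidean
    distances to all proposed vectors (no closed form exists)."""
    estimate = vectors.mean(axis=0)
    for _ in range(iterations):
        distances = np.maximum(np.linalg.norm(vectors - estimate, axis=1), eps)
        weights = 1.0 / distances
        estimate = (weights[:, None] * vectors).sum(axis=0) / weights.sum()
    return estimate

rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(20, 5))   # 20 honest gradients near (1, ..., 1)
malicious = np.full((3, 5), -50.0)                      # 3 colluding attackers
gradients = np.vstack([honest, malicious])

print(gradients.mean(axis=0))              # dragged toward -50 by the attackers
print(coordinate_wise_median(gradients))   # stays near 1.0
print(geometric_median(gradients))         # also stays near 1.0
```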

Lucas: Right.

El Mahdi: It’s like a multitude of bowls with different heights.

Lucas: Right, so there’s tons of different local minima across the space.

El Mahdi: Exactly. So in the first paper what we showed is that ideas that look like the geometric median are enough to just converge. You converge. You provably converge. But in the follow-up, what we realized, something we were already aware of, but not enough in my opinion, is that there is this square root of D, this curse of dimensionality that arises when you compute high-dimensional distances, and that the attacker can leverage.

So in what we call the hidden vulnerability of distributed learning, you can have correct vectors, agreeing on one component. Imagine in your head some three axis system.

Let’s say that they are completely in agreement on axis three. But then in axis one, two, so in the plane formed by the axis one and axis two, they have a small disagreement.

What I will do as the malicious agent, is that I will leverage this small disagreement, and inject it in axis three. And this will make you go to a bit slightly modified direction. And instead of going to this very deep, very good minima, you will go into a local trap that is just close ahead.

And that comes from the fact that loss functions of interesting models are clearly far from being convex. The models are high dimensional, and the loss function is highly non-convex, and that creates a lot of leeway.

Lucas: It creates a lot of local minima spread throughout the space for you to attack the person into.

El Mahdi: Yeah. So convergence is not enough. So we started this research direction by formulating the following question, what does it take to guarantee convergence?

Any scheme that aggregates gradients and guarantees convergence is called Byzantine resilient. But then you realize that with very high dimensions and highly non-convex loss functions, is convergence enough? Do you just want to converge?

There are of course people arguing about deep learning models, like there's this famous paper by Anna Choromanska, Yann LeCun, and Gérard Ben Arous about the landscape of neural nets, that basically says, "Yeah, very deep local minima of neural nets are somehow as good."

From an overly simplified point of view, it’s an optimistic paper, that tells you that you shouldn’t worry too much when you optimize neural nets about the fact that gradient descent would not necessarily go to a global like-

Lucas: To a global minima.

El Mahdi: Yeah. Just like, “Stop caring about that.”

Lucas: Because the local minima are good enough for some reason.

El Mahdi: Yeah. I think that’s a not too unfair way to summarize the paper for the sake of this talk, for the sake of this discussion. What we empirically illustrate here, and theoretically support is that that’s not necessarily true.

Because we show that even with very low dimensional, not extremely complex models, trained on CIFAR-10 and MNIST, which are very easy toy problems, it's already enough to have those amounts of parameters, let's say 100,000 parameters or less, for an attacker to always find a direction that takes you away, away, away, until it eventually finds an arbitrarily bad local minimum. And then you just converge to that.

So convergence is not enough. Not only do you have to seek an aggregation rule that guarantees convergence, but you have to seek aggregation rules that guarantee that you will not converge to something arbitrarily bad. You would keep converging to the same high-quality local minimum, whatever that means.

The hidden vulnerability is this high dimensional idea. It’s the fact that because the loss function is highly non-convex, because there’s the high dimensionality, as an attacker I would always find some direction, so the attack goes this way.

Here the threat model is that an attacker can spy on your gradients, generated by the correct workers but cannot talk on their behalf. So I cannot corrupt the messages. Since you asked about the reliability of horsemen or not.

So horsemen are reliable. I can't talk on your behalf, but I can spy on you. I can see what you are sending to the others, and anticipate.

So as an attacker I would wait for the correct workers to generate their gradients, gather those vectors, and then just do a linear regression on those vectors to find the best direction to leverage the disagreement on the D minus one remaining directions.

So because there would be this natural disagreement, this variance in many directions, I will just do some linear regression and find the best direction to keep, and use the budget I gathered, those epsilons I mentioned earlier, like this D times epsilon over all the directions, to inject it in the direction that will maximize my chances of taking you away from the good local minima.

So you will converge, as proven in the early papers, but not necessarily to something good. But what we showed here is that if you combine ideas from multidimensional geometric medians, with ideas from single dimensional component-wise median, you improve your robustness.

Of course it comes with a price. You require three quarters of the workers to be reliable.
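A toy illustration of the "budget" idea described above: the attacker measures the honest workers' natural spread and spends all of it along a single direction of its choosing. This is only a sketch of the flavor of the attack, under simplified assumptions, not the actual algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1000
true_gradient = rng.normal(size=d)

# Honest workers: unbiased estimates of the true gradient with small per-coordinate noise.
honest = true_gradient + rng.normal(scale=0.1, size=(20, d))

# The attacker observes the honest gradients and measures their natural disagreement.
per_coordinate_spread = honest.std(axis=0)          # the small "epsilons" on each direction
budget = np.linalg.norm(per_coordinate_spread)      # roughly sqrt(d) * epsilon of leeway

# It spends that whole budget along one arbitrarily chosen direction.
bad_direction = np.zeros(d)
bad_direction[0] = 1.0
malicious = honest.mean(axis=0) + budget * bad_direction

# The malicious vector is about as far from the honest mean as honest workers are
# from each other, so purely distance-based filters have a hard time flagging it.
typical_honest_distance = np.linalg.norm(honest - honest.mean(axis=0), axis=1).mean()
malicious_distance = np.linalg.norm(malicious - honest.mean(axis=0))
print(typical_honest_distance, malicious_distance)   # comparable magnitudes
```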

There is another direction where we expanded this problem, which is asynchrony. And asynchrony arises when as I said in the Byzantine generals setting, you don’t have a bounded delay. In the bounded delay setting, you know that horses arrive at most after one hour.

Lucas: But I have no idea if the computer on the other side of the planet is ever gonna send me that next update.

El Mahdi: Exactly. So imagine you are doing machine learning on smartphones. You are leveraging a set of smartphones all around the globe, with different bandwidths, different communication issues, etc.

And you don’t want each time to be bottlenecked by the slowest one. So you want to be asynchronous, you don’t want to wait. You’re just like whenever some update is coming, take it into account.

Imagine some very advanced AI scenario, where you send a lot of learners all across the universe, and then they communicate with the speed of light, but some of them are five light minutes away, but some others are two hours and a half. And you want to learn from all of them, but not necessarily handicap the closest one, because there are some other learners far away.

Lucas: You want to run updates in the context of asynchrony.

El Mahdi: Yes. So you want to update whenever a gradient is popping up.

Lucas: Right. Before we move on to illustrate the problem again here is that the order matters, right? Like in the banking example. Because the 10% plus 10 is different from-

El Mahdi: Yeah. Here the order matters for different reasons. You update me, but you are updating me based on the model you got three hours ago. Meanwhile, three different agents updated me based on models they got three minutes ago.

All the agents are communicating through some abstraction they call the server maybe. Like this server receives updates from fast workers.

Lucas: It receives gradients.

El Mahdi: Yeah, gradients. I also call them updates.

Lucas: Okay.

El Mahdi: Because some workers are close to me and very fast, I’ve done maybe 1000 updates, while you were still working and sending me the message.

So when your update arrives, I can tell whether it is very stale, very late, or malicious. So what we do here is that, and I think it's very important now to connect a bit back with classic distributed computing.

Byzantine resilience in machine learning is easier than Byzantine resilience in classical distributed computing for one reason, but it is much harder for another reason.

The reason it's easier is that we know what we want to agree on. We want to agree on a gradient. We have a toolbox of calculus that tells us what this looks like. We know that it's the slope of some loss function that is, for most of today's models, relatively smooth, differentiable, maybe Lipschitz, with some bounded curvature.

So we know that we are agreeing on vectors that are gradients of some loss function. And we know that there is a majority of workers that will produce vectors that tell us what a legit vector looks like.

You can find some median behavior, and then come up with filtering criteria that will throw away the bad gradients. That's the good news. That's why it's easier to do Byzantine resilience in machine learning than to do Byzantine agreement, because agreement is a way harder problem.

The reason why Byzantine resilience is harder in machine learning than in the typical settings you have in distributed computing is that we are dealing with extremely high dimensional data, extremely high dimensional decisions.

So a decision here is to update the model. It is triggered by a gradient. So whenever I accept a gradient, I make a decision. I make a decision to change the model, to take it away from this state, to this new state, by this much.

But this is a multidimensional update. And Byzantine agreement, or Byzantine approximate agreement, in higher dimension was proven hopeless by Hammurabi Mendes and Maurice Herlihy in an excellent paper in 2013, where they show that you can't do Byzantine agreement in D dimensions with N agents in less than N to the power D computations per agent, locally.

Of course, in their paper, they meant Byzantine agreement on positions. So they were framing it with a motivation, saying, "This is N to the power D, but the typical cases we care about in distributed computing are things like robots agreeing on a position on a plane, or on a position in a three dimensional space." So D is two or three.

So N to the power two or N to the power three is fine. But in machine learning D is not two or three; D is a billion or a couple of million. So N to the power of a million is just, like, forget it.

And not only that, but they also require … Remember when I told you that Byzantine-resilient computing would always have some upper bound on the number of malicious agents?

Lucas: Mm-hmm (affirmative).

El Mahdi: So the number of total agents should exceed D times the number of malicious agents.

Lucas: What is D again sorry?

El Mahdi: Dimension.

Lucas: The dimension. Okay.

El Mahdi: So if you have to agree on D dimension, like on a billion dimensional decision, you need at least a billion times the number of malicious agents.

So if you have say 100 malicious agents, you need at least 100 billion total number of agents to be resistant. No one is doing distributed machine learning on 100 billion-

Lucas: And this is because the dimensionality is really screwing with the-

El Mahdi: Yes. Byzantine approximate agreement has been proven hopeless. That's the bad news; that's why the dimensionality of machine learning makes it really important to go away, to completely go away, from traditional distributed computing solutions.

Lucas: Okay.

El Mahdi: So we are not doing agreement. We’re not doing agreement, we’re not even doing approximate agreement. We’re doing something-

Lucas: Totally new.

El Mahdi: Not new, totally different.

Lucas: Okay.

El Mahdi: Called gradient descent. It's not new. It's as old as Newton. And it comes with good news. It comes with the fact that there are some properties, like some regularity of the loss function, that we can exploit.

And so in the asynchronous setting, it becomes even more critical to leverage those differentiability properties. So because we know that we are optimizing a loss function that has some regularities, we can have some good news.

And the good news has to do with curvature. What we do here in the asynchronous setting is not only ask workers for their gradients, but also ask them for their empirical estimate of the curvature.

Lucas: Sorry. They’re estimating the curvature of the loss function, that they’re adding the gradient to?

El Mahdi: They add the gradient to the parameter, not the loss function. So we have a loss function; the parameters are the abscissa; you add the gradient to the abscissa to update the model, and then you end up in a different place on the loss function.

So you have to imagine the loss function as a surface, and the parameter space as the plane, the horizontal plane below the surface. And depending on where you are in the parameter space, you would be at different heights of the loss function.

Lucas: Wait sorry, so does the gradient depend where you are on this, the bottom plane?

El Mahdi: Yeah [crosstalk 00:29:51]-

Lucas: So then you send an estimate for what you think the slope of the intersection will be?

El Mahdi: Yeah. But for asynchrony, not only that. I will ask you to send me the slope, and your observed empirical growth of the slope.

Lucas: The second derivative?

El Mahdi: Yeah.

Lucas: Okay.

El Mahdi: But the second derivative, again, in high dimension is very hard to compute. You have to compute the Hessian matrix.

Lucas: Okay.

El Mahdi: That’s something like completely ugly to compute in high dimensional situations because it takes D square computations.

As an alternative, we would like you to send us something that takes a computation linear in D, not quadratic in D.

So we would ask you to compute your current gradient, your previous gradient, the difference between them, and to normalize it by the difference between the models.

So, “Tell us your current gradient, by how much it changed from the last gradient, and divide that by how much you changed the parameter.”

So you would tell us, “Okay, this is my current slope, and okay this is the gradient.” And you will also tell us, “By the way, my slope change relative to my parameter change is this much.”

And this would be some empirical estimation of the curvature. So if you are in a very curved area-

Lucas: Then the estimation isn’t gonna be accurate because the linearity is gonna cut through some of the curvature.

El Mahdi: Yeah but if you are in a very curved area of the loss function, your slope will change a lot.

Lucas: Okay. Exponentially changing the slope.

El Mahdi: Yeah. Because you made a very tiny change in the parameter and it changed the slope a lot.

Lucas: Yeah. Will change the … Yeah.

El Mahdi: When you are in a non-curved area of the loss function, it’s less harmful for us that you are stale, because you will just technically have the same updates.

If you are in a very curved area of the loss function, your updates being stale is now a big problem. So we want to discard your updates proportionally to your curvature.

So this is the main idea of this scheme in asynchrony, where we would ask workers about their gradient, and their empirical growth rates.
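As a rough sketch, assuming the "growth rate" is the change in the gradient divided by the change in the parameters (the exact definition used in the paper may differ), the scalar a worker reports alongside its gradient could look like this:

```python
import numpy as np

def empirical_growth_rate(current_grad, previous_grad, current_params, previous_params):
    """Empirical Lipschitz-ness: how much did my slope change,
    relative to how much the parameters changed? A single scalar."""
    grad_change = np.linalg.norm(current_grad - previous_grad)
    param_change = np.linalg.norm(current_params - previous_params)
    return grad_change / max(param_change, 1e-12)
```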

And then of course I don’t want to trust you on what you declare, because you can plan to screw me with some gradients, and then declare a legitimate value of the curvature.

I will take those empirical values, what we call in the paper empirical Lipschitz-ness. So we ask you for this empirical growth rate, and it's a scalar, remember? This is very important. It's a single-dimensional number.

And so we ask you about this growth rate, and we ask all of you about growth rates, again assuming the majority is correct. So the majority of growth rates will help us set the median growth rate in a robust manner, because as long as a simple majority is not lying, the median growth rate will always be bounded between two legitimate values of the growth rate.

Lucas: Right because, are you having multiple workers inform you of the same part of your loss function?

El Mahdi: Yes. Even though they do it in an asynchronous manner.

Lucas: Yeah. Then you take the median of all of them.

El Mahdi: Yes. And then we reason by quantiles of the growth rates.

Lucas: Reason by quantiles? What are quantiles?

El Mahdi: The first third, the second third, the third third. Like the first 30%, the second 30%, the third 30%. We will discard the first 30%, discard the last 30%. Anything in the second 30% is safe.

Of course this has some level of pessimism, which is good for safety, but not very good for being fast. Because maybe people are not lying, so maybe the first 30%, and the last 30% are also values we could consider. But for safety reasons we want to be sure.

Lucas: You want to try to get rid of the outliers.

El Mahdi: Possible.

Lucas: Possible outliers.

El Mahdi: Yeah. So we get rid of the first 30%, the last 30%.
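A small sketch of the quantile filtering just described: discard the lowest and highest 30% of the reported growth rates and keep the middle. The numbers below are invented for illustration.

```python
import numpy as np

def middle_quantile_filter(reported_rates, lower=0.3, upper=0.7):
    """Keep only the growth rates between the 30th and 70th percentiles.
    As long as a majority of workers is honest, the surviving values are
    bounded between two legitimate growth rates."""
    lo, hi = np.quantile(reported_rates, [lower, upper])
    return [rate for rate in reported_rates if lo <= rate <= hi]

rates = [0.8, 0.9, 1.0, 1.1, 1.2, 50.0, 0.0]   # two implausible reports among honest ones
print(middle_quantile_filter(rates))            # [0.9, 1.0, 1.1]: the extremes are discarded
```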

Lucas: So this ends up being a more conservative estimate of the loss function?

El Mahdi: Yes. That’s completely right. We explain that in the paper.

Lucas: So there’s a trade off that you can decide-

El Mahdi: Yeah.

Lucas: By choosing what percentiles to throw away.

El Mahdi: Yeah. Safety never comes for free. So here, depending on how good your estimate of the number of potential Byzantine actors is, your level of pessimism will translate into slowdown.

Lucas: Right. And so you can update the amount that you’re cutting off-

El Mahdi: Yeah.

Lucas: Based off of the amount of expected corrupted signals you think you’re getting.

El Mahdi: Yeah. So now imagine a situation where the number of workers is known. You know that you are leveraging 100,000 smartphones doing gradient descent for you. Let's call that N.

You know that F of them might be malicious. We argue that if F exceeds a third of N, you can't do anything. So we are in a situation where F is less than a third. So up to 33,000 workers may be malicious, and then the slowdown would be F over N, so up to a third.

What if you are in a situation where you know that your malicious agents are way less than a third? For example you know that you have at most 20 rogue accounts in your video sharing platform.

And your video sharing platform has two billion accounts. So you have two billion accounts.

Lucas: 20 of them are malevolent.

El Mahdi: What we show is that the slowdown would be N minus F divided by N. N is the two billion accounts, F is the 20, and N is again two billion.

So it would be two billion minus 20, which is 1,999,999,980, divided by two billion, so something like 0.99999999. So you would go almost as fast as the non-Byzantine-resilient scheme.
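A quick numeric check of that factor, reading it as the fraction of the non-resilient scheme's speed that is retained; the exact constants in the actual analysis may differ.

```python
def retained_speed(n_workers, n_byzantine):
    """Slowdown factor (N - F) / N as described above."""
    return (n_workers - n_byzantine) / n_workers

print(retained_speed(100_000, 33_000))        # 0.67: F close to a third of N
print(retained_speed(2_000_000_000, 20))      # 0.99999999: 20 rogue accounts out of 2 billion
```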

So our Byzantine-resilient scheme has a slowdown that is very reasonable in situations where F, the number of malicious agents, is way less than N, the total number of agents, which is typical in modern…

Today, if you ask social media platforms, they have a lot of toolkits to prevent people from creating a billion fake accounts. Like, you can't in 20 hours create an army of several million accounts.

None of the mainstream social media platforms today are susceptible to this-

Lucas: Are susceptible to massive corruption.

El Mahdi: Yeah. To this massive account creation. So you know that the number of corrupted accounts is negligible compared to the total number of accounts.

So that's the good news: you know that F is negligible compared to N. And then the slowdown of our Byzantine-resilient methods is also close to one.

But it has the advantage, compared to the state of the art in distributed training today, of not taking the average gradient. And we argued in the very beginning that for those 20 accounts that you could create, it doesn't take a bot army or whatever; you don't need to hack into the machines of the social network. You can have a dozen humans sitting somewhere in a house, manually creating 20 accounts, training the accounts over time, behaving in a way that makes them look legitimate on some topics, and then, because your distributed machine learning scheme averages the gradients generated by people's behavior, they can make it recommend anti-vaccine content, controversies, or anti-Semitic conspiracy theories.

Lucas: So if I have 20 bad gradients and like, 10,000 good gradients for a video, why is it that with averaging 20 bad gradients are messing up the-

El Mahdi: The amplitude. It's like the billionaire in the room of poor academics.

Lucas: Okay, because the amplitude of each of their accounts is greater than the average of the other accounts?

El Mahdi: Yes.

Lucas: The average of other accounts that are going to engage with this thing don’t have as large of an amplitude because they haven’t engaged with this topic as much?

El Mahdi: Yeah, because they’re not super credible on gun control, for example.

Lucas: Yeah, but aren’t there a ton of other accounts with large amplitudes that are going to be looking at the same video and correcting over the-

El Mahdi: Yeah, let’s define large amplitudes. If you come to the video and just like it, that’s a small update. What about you like it, post very engaging comments-

Lucas: So you write a comment that gets a lot of engagement, gets a lot of likes and replies.

El Mahdi: Yeah, that's how you increase your amplitude. And because you are already doing a good job of becoming the reference on that video-sharing platform when it comes to discussing gun control, the amplitude of your comments is by definition high, plus the fact that your comment was posted very early on, and that not only did you comment on the video but you also produced a follow-up video.

Lucas: I see, so the gradient is really determined by a multitude of things that the video-sharing platform is measuring for, and the metrics are like, how quickly you commented, how many people commented and replied to you. Does it also include language that you used?

El Mahdi: Probably. It depends on the social media platform and on the video-sharing platform. What is clear is that there are many schemes through which those 20 accounts created by this dozen of people in a house can try to find good ways to maximize the amplitude of their generated gradients, but this is a way easier problem than the typical problems we have in technical AI safety. This is not value alignment or value loading or coherent extrapolated volition. This is a very easy, tractable problem on which we now have good news, provable results. What's interesting is the follow-up questions that we are trying to investigate here with my colleagues, the first of which is: you don't necessarily have a majority of people on the internet promoting vaccines.

Lucas: People that are against things are often louder than people that are not.

El Mahdi: Yeah, makes sense, and sometimes maybe more numerous, because they generate content, while the people who think vaccines are safe are not creating content. On some topics it might be safe to say that we have a majority of reasonable, decent people on the internet. But there are some topics on which now, even in polls, like the vaccine situation, there's a surge of anti-vaccine resentment in western Europe and the US. Ironically this is happening in developed countries now, because people are so young they don't remember the non-vaccinated past. I come from Morocco; my aunt is handicapped by polio, so I grew up seeing what a non-vaccinated person looks like. Young people in the more developed countries never had a living example of the non-vaccinated past.

Lucas: But they do have examples of people that end up with autism and it seems correlated with vaccines.

El Mahdi: Yeah, the anti-vaccine content may just end up being such clickbait, and so provocative, that it gets popular. So this is a topic where the majority hypothesis, which is crucial to poisoning resilience, does not hold. An open follow-up we're onto now is how to combine ideas from reputation metrics, PageRank, et cetera, with poisoning resilience. So for example you have the National Institutes of Health, the Johns Hopkins Hospital, Harvard Medical School, and, I don't know, the Massachusetts General Hospital having official accounts on some video-sharing platform, and then you can spot what they say on some topic, because now we are very good at doing semantic analysis of content.

And then you know that, okay, on the tag vaccines, there's this bunch of experts, and what you want to make emerge on your platform is some sort of epistocracy. The power is given to the knowledgeable, like we have in some fields, like in medical regulation. The FDA doesn't do a majority vote. We don't have a popular majority vote across the country to tell the FDA whether it should approve this new drug or not. The FDA does some sort of epistocracy where the knowledgeable experts on the topic would vote. So how about mixing ideas from social choice?

Lucas: And topics in which there are experts who can inform.

El Mahdi: Yeah. There's also a general follow-up of just straight out trying to connect Byzantine-resilient learning with social choice, but then there's another set of follow-ups that motivates me even more. We were mentioning workers: people create accounts on social media, and accounts generate gradients. All of that implicitly assumes that the server, the abstraction that's gathering those gradients, is reliable. What about the aggregating platform itself being deployed on rogue machines? So imagine you are whatever platform doing learning. By the way, whatever we have said from the beginning until now applies as long as you do gradient-based learning. So it can be recommender systems. It can be training some deep reinforcement learning on some super complicated task, to beat, I don't know, the world champion in poker.

We do not care, as long as there's some gradient generation from observing some state, some environmental state, and some reward or some label. It can be supervised or reinforcement learning; as long as it's gradient based, what we said applies. Imagine now you have this platform leveraging distributed gradient creators, but the platform itself, for security reasons, is deployed on several machines for fault tolerance. But then those machines themselves can fail. You have to make the servers agree on the model, despite the fact that a fraction of the workers are not reliable and now a fraction of the servers themselves as well. This is the most important follow-up I'm into now, and I think there would be something on arXiv maybe in February or March on that.

And then a third follow-up is practical instances of that. I've been describing speculative thought experiments on how to poison systems; there are actually brilliant master's students working with me on exactly that, on typical recommender system datasets, where you can see that it's very easy. It really takes only a bunch of active agents to poison a system of a hundred thousand users or more. Probably people working on big social media platforms would have ways to assess what I've said, and as researchers in academia we can only speculate on what can go wrong on those platforms, so what we did is take state-of-the-art recommender system datasets and models that are publicly available, and you can show that despite having a large number of reliable recommendation proposers, a small, tiny fraction of proposers can make, I don't know, a movie recommendation system recommend the most suicide-triggering film to the most depressed person watching through your platform. So I'm saying, that's something you don't want to have.

Lucas: Right. Just wrapping this all up, how do you see this in the context of AI alignment and the future of machine learning and artificial intelligence?

El Mahdi: So I've been discussing this here with people at the Beneficial AGI conference and it seems that there are two schools of thought. I am still hesitating between the two, because within the past three months I switched between the two sides like three times. So one of them thinks that an AGI is by definition resilient to poisoning.

Lucas: Aligned AGI might be by definition.

El Mahdi: Not even aligned. The second school of thought, aligned AGI is Byzantine resilient.

Lucas: Okay, I see.

El Mahdi: Obviously an aligned AGI would be poisoning resilient, but let's just talk about superintelligent AI, not necessarily aligned. So you have a superintelligence: would you include poisoning resilience in the superintelligence definition or not? And one would say that yeah, if you are better than humans at whatever task, it means you are also better than humans at spotting poisoned data.

Lucas: Right, I mean the poison data is just messing with your epistemics, and so if you’re super intelligent your epistemics would be less subject to interference.

El Mahdi: But then there is that second school of thought, to which I switched back again, because I find that most people are in the first school of thought now. So I believe that superintelligence doesn't necessarily include poisoning resilience, because of what I call practically time-constrained superintelligence. If you have a deadline, because of computational complexity, you have to learn something, which can sometimes-

Lucas: Yeah, you want to get things done.

El Mahdi: Yeah, so you want to get it done in a finite amount of time. And because of that you will end up leveraging other agents to speed up your learning. So if a malicious agent puts up bad observations of the environment or bad labels on whatever is around you, then it can make you learn something other than what you would like as an aligned outcome. I'm strongly on the second side, despite many disagreeing with me here. I don't think superintelligence includes poisoning resilience, because superintelligence would still be built with time constraints.

Lucas: Right. You’re making a tradeoff between safety and computational efficiency.

El Mahdi: Right.

Lucas: It also would obviously seem to matter the kind of world that the ASI finds itself in. If it knows that it’s in a world with no, or very, very, very few malevolent agents that are wanting to poison it, then it can just throw all of this out of the window, but the problem is that we live on a planet with a bunch of other primates that are trying to mess up our machine learning. So I guess just as a kind of fun example in taking it to an extreme, imagine it’s the year 300,000 AD and you have a super intelligence which has sort of spread across space-time and it’s beginning to optimize its cosmic endowment, but it gives some sort of uncertainty over space-time to whether or not there are other super intelligences there who might want to poison its interstellar communication in order to start taking over some of its cosmic endowment. Do you want to just sort of explore?

El Mahdi: Yeah, that was like a thought experiment I proposed earlier to Carl Shulman from FHI. Imagine some superintelligence reaching a planet where there is a smart form of life emerging from electric communication between plasma clouds. So completely non-carbon, non-silicon based.

Lucas: So if Jupiter made brains on it.

El Mahdi: Yeah, like Jupiter made brains on it just out of electric communication through gas clouds.

Lucas: Yeah, okay.

El Mahdi: And then this form of life is smart enough to know that this is a superintelligence reaching the planet to learn about it, and then it would just start trolling it.

Lucas: It’ll start trolling the super intelligence?

El Mahdi: Yeah. So they would come up with an agreement ahead of time, saying, "Yeah, this superintelligence is coming from Earth to discover how we do things here. Let's just behave dumbly, or let's just misbehave." And then the superintelligence will start collecting data on this life form and then come back to Earth saying, "Yeah, they're just a dumb, passive plasma form, nothing interesting."

Lucas: I mean, you don’t think that within the super intelligence’s model, I mean, we’re talking about it right now so obviously a super intelligence will know this when it leaves that there will be agents that are going to try and trick it.

El Mahdi: That’s the rebuttal, yes. That’s the rebuttal again. Again, how much time does super intelligence have to do inference and draw conclusions? You will always have some time constraints.

Lucas: And you don’t always have enough computational power to model other agents efficiently to know whether or not they’re lying, or …

El Mahdi: You could always come up with thought experiment with some sort of other form of intelligence, like another super intelligence is trying to-

Lucas: There’s never, ever a perfect computer science, never.

El Mahdi: Yeah, you can say that.

Lucas: Security is never perfect. Information exchange is never perfect. But you can improve it.

El Mahdi: Yeah.

Lucas: Wouldn’t you assume that the complexity of the attacks would also scale? We just have a ton of people working on defense, but if we have an equal amount of people working on attack, wouldn’t we have an equally complex method of poisoning that our current methods would just be overcome by?

El Mahdi: That's part of the empirical follow-up I mentioned, the one Isabella and I were working on, which is trying to do some sort of min-max game of poisoner versus poisoning-resilient learner: an adversarial poisoning setting where there is a poisoner and a resilient learner, and the poisoner tries to maximize the damage. And what we have so far is very depressing. It turns out that it's very easy to be a poisoner. Computationally it's way easier to be the poisoner than to be-

Lucas: Yeah, I mean, in general in the world it’s easier to destroy things than to create order.

El Mahdi: As I said in the beginning, this is a sub-topic of technical AI safety where I believe it's easier to have tractable, formalizable problems for which you can provably have a safe solution.

Lucas: Solution.

El Mahdi: But in very concrete, very short term aspects of that: in March we are going to announce a major update to TensorFlow, which is the standout framework today for doing distributed machine learning, open-sourced by Google. Hopefully, if everything goes right, we will announce it at SysML, the systems for machine learning conference. More empirically focused colleagues, based on the algorithms I mentioned earlier, which were presented at NeurIPS and ICML over the past two years, will announce a major update where they basically replaced every averaging step inside TensorFlow with those three algorithms I mentioned, Krum and Bulyan and soon Kardam, which constitute our portfolio of Byzantine resilience algorithms.
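For reference, here is a simplified sketch of the selection rule usually associated with Krum: each proposed gradient is scored by the sum of squared distances to its n - f - 2 closest neighbors, and the lowest-scoring gradient is kept. This is an illustration only; the production code, and the Bulyan and Kardam variants, involve considerably more machinery.

```python
import numpy as np

def krum(gradients, n_byzantine):
    """Return the gradient whose summed squared distance to its
    n - n_byzantine - 2 closest neighbors is smallest."""
    n = len(gradients)
    n_closest = n - n_byzantine - 2
    scores = []
    for i, g in enumerate(gradients):
        dists = sorted(
            float(np.sum((g - other) ** 2))
            for j, other in enumerate(gradients) if j != i
        )
        scores.append(sum(dists[:n_closest]))
    return gradients[int(np.argmin(scores))]

# Example: 7 gradients, at most 1 Byzantine, so each is scored against its 4 closest neighbors.
rng = np.random.default_rng(2)
grads = [np.array([1.0, 1.0]) + 0.1 * rng.standard_normal(2) for _ in range(6)]
grads.append(np.array([100.0, -100.0]))   # the outlier never gets selected
print(krum(grads, n_byzantine=1))
```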

Another thing that comes for free with that is related to the fact that distributed machine learning frameworks like TensorFlow use TCP/IP as a communication protocol. TCP/IP has a problem: it's reliable but it's very slow. You have to repeat some messages, et cetera, to guarantee reliability, and we would like to have a faster communication protocol, like UDP. We don't need to go through the details, but UDP has some packet drop, so so far there was no version of TensorFlow, or any distributed machine learning framework to my knowledge, using UDP. They all used TCP/IP because they needed reliable communication. But now, because we are Byzantine resilient, we can afford having fast but not completely reliable communication protocols like UDP. So one of the things that comes for free with Byzantine resilience is that you can move from heavy-

Lucas: A little bit more computation.

El Mahdi: -yeah, heavy communication protocols like TCPIP to lighter, faster, more live communication protocols like UDP.

Lucas: Keeping in mind you’re trading off.

El Mahdi: Exactly. Now we have this portfolio of algorithms which can serve many other applications besides just making distributed machine learning faster, like making, I don't know, recommender systems for social media poisoning resilient, and hopefully making AGI learning poisoning resilient as well.

Lucas: Wonderful. So if people want to check out some of your work or follow you on social media, what is the best place to keep up with you?

El Mahdi: Twitter. My handle is El Badhio, so maybe you would have it written down in the description.

Lucas: Yeah, cool.

El Mahdi: Yeah, Twitter is the best way to get in touch.

Lucas: All right. Well, wonderful. Thank you so much for speaking with me today and I’m excited to see what comes out of all this next.

El Mahdi: Thank you. Thank you for hosting this.

Lucas: If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

[end of recorded material]

FLI Podcast- Artificial Intelligence: American Attitudes and Trends with Baobao Zhang

Our phones, our cars, our televisions, our homes: they’re all getting smarter. Artificial intelligence is already inextricably woven into everyday life, and its impact will only grow in the coming years. But while this development inspires much discussion among members of the scientific community, public opinion on artificial intelligence has remained relatively unknown.

Artificial Intelligence: American Attitudes and Trends, a report published earlier in January by the Center for the Governance of AI, explores this question. Its authors relied on an in-depth survey to analyze American attitudes towards artificial intelligence, from privacy concerns to beliefs about U.S. technological superiority. Some of their findings—most Americans, for example, don’t trust Facebook—were unsurprising. But much of their data reflects trends within the American public that have previously gone unnoticed.

This month Ariel was joined by Baobao Zhang, lead author of the report, to talk about these findings. Zhang is a PhD candidate in Yale University’s political science department and research affiliate with the Center for the Governance of AI at the University of Oxford. Her work focuses on American politics, international relations, and experimental methods.

In this episode, Zhang spoke about her take on some of the report’s most interesting findings, the new questions it raised, and future research directions for her team. Topics discussed include:

  • Demographic differences in perceptions of AI
  • Discrepancies between expert and public opinions
  • Public trust (or lack thereof) in AI developers
  • The effect of information on public perceptions of scientific issues

You can listen to the podcast above, or read the full transcript below.

Ariel: Hi there. I’m Ariel Conn with the Future of Life Institute. Today, I am doing a special podcast, which I hope will be just the first in a continuing series, in which I talk to researchers about the work that they’ve just published. Last week, a report came out called Artificial Intelligence: American Attitudes and Trends, which is a survey that looks at what Americans think about AI. I was very excited when the lead author of this report agreed to come join me and talk about her work on it, and I am actually now going to just pass this over to her, and let her introduce herself, and just explain a little bit about what this report is and what prompted the research.

Baobao: My name is Baobao Zhang. I’m a PhD candidate in Yale University’s political science department, and I’m also a research affiliate with the Center for the Governance of AI at the University of Oxford. We conducted a survey of 2,000 American adults in June 2018 to look at what Americans think about artificial intelligence. We did so because we believe that AI will impact all aspects of society, and therefore, the public is a key stakeholder. We feel that we should study what Americans think about this technology that will impact them. In this survey, we covered a lot of ground. In the past, surveys about AI tended to have a very specific focus, for instance on automation and the future of work. What we try to do here is cover a wide range of topics, including the future of work, but also lethal autonomous weapons, how AI might impact privacy, and trust in various actors to develop AI.

So one of the things we found is Americans believe that AI is a technology that should be carefully managed. In fact, 82% of Americans feel this way. Overall, Americans express mixed support for developing AI. 41% somewhat support or strongly support the development of AI, while there’s a smaller minority, 22%, that somewhat or strongly opposes it. And in terms of the AI governance challenges that we asked—we asked about 13 of them—Americans think all of them are quite important, although they prioritize preventing AI-assisted surveillance from violating privacy and civil liberties, preventing AI from being used to spread fake news online, preventing AI cyber attacks, and protecting data privacy.

Ariel: Can you talk a little bit about what the difference is between concerns about AI governance and concerns about AI development and more in the research world?

Baobao: In terms of support for developing AI, we asked that as a general question about support—we didn’t get into the specifics of what developing AI might look like. But in terms of the governance challenges, we gave quite detailed, concrete examples of governance challenges, and these tend to be more specific.

Ariel: Would it be fair to say that this report looks specifically at governance challenges as opposed to development?

Baobao: It’s a bit of both. I think we ask both about the R&D side, for instance we ask about support for developing AI and which actors the public trusts to develop AI. On the other hand, we also ask about the governance challenges. Among the 13 AI governance challenges that we presented to respondents, Americans tend to think all of them are quite important.

Ariel: What were some of the results that you expected, that were consistent with what you went into this survey thinking people thought, and what were some of the results that surprised you?

Baobao: Some of the results that surprised us is how soon the public thinks that high-level machine intelligence will be developed. We find that they think it will happen a lot sooner than what experts predict, although some past research suggests similar results. What didn’t surprise me, in terms of the AI governance challenge question, is how people are very concerned about data privacy and digital manipulation. I think these topics have been in the news a lot recently, given all the stories about hacking or digital manipulation on Facebook.

Ariel: So going back real quick to your point about the respondents expecting high-level AI happening sooner: how soon do they expect it?

Baobao: In our survey, we asked respondents about high-level machine intelligence, and we defined it as when machines are able to perform almost all tasks that are economically relevant today better than the median human today at each task. My co-author, Allan Dafoe, and some of my other team members, we’ve done a survey asking AI researchers—this was back in 2016—a similar question, and there we had a different definition of high-level machine intelligence that required a higher bar, so to speak. So that might have caused some difference. We’re trying to ask this question again to AI researchers this year. We’re doing continuing research, so hopefully the results will be more comparable. Even so, I think the difference is quite large.

I guess one more caveat is—we have it in the footnote—we did ask the same definition as we asked AI experts in 2016 in a pilot survey of the American public, and we also found that the public thinks high-level machine intelligence will happen sooner than experts predict. So it might not just be driven by the definition itself; the public and experts have different assessments. But to answer your question, the median respondent in our American public sample predicts that there’s a 54% probability of high-level machine intelligence being developed within the next 10 years, which is quite a high probability.

Ariel: I’m hesitant to ask this, because I don’t know if it’s a very fair question, but do you have thoughts on why the general public thinks that high-level AI will happen sooner? Do you think it is just a case that there’s different definitions that people are referencing, or do you think that they’re perceiving the technology differently?

Baobao: I think that’s a good question, and we’re doing more research to investigate these results and to probe at it. One thing is that the public might have a different perception of what AI is compared to experts. In future surveys, we definitely want to investigate that. Another potential explanation is that the public lacks understanding of what goes into AI R&D.

Ariel: Have there been surveys that are as comprehensive as this in the past?

Baobao: I’m hesitant to say that there are surveys that are as comprehensive as this. We certainly relied on a lot of past survey research when building our surveys. The Eurobarometer had a couple of good surveys on AI in the past, but I think we cover both sort of the long-term and the short-term AI governance challenges, and that’s something that this survey really does well.

Ariel: Okay. The reason I ask that is I wonder how much people’s perceptions or misperceptions of how fast AI is advancing would be influenced by just the fact that we have had significant advancements just in the last couple of years that I don’t think were quite as common during previous surveys that were presented to people.

Baobao: Yes, that certainly makes sense. One part of our survey tries to track responses over time, so I was able to dig up some surveys going all the way back to the 1980s that were conducted by the National Science Foundation on the question of automation—whether automation will create more jobs or eliminate more jobs. And we find that compared with the historical data, the percentage of people who think that automation will create more jobs than it eliminates—that percentage has decreased, so this result could be driven by people reading in the news about all these advances in AI and thinking, “Oh, AI is getting really good these days at doing tasks normally done by humans,” but again, you would need much more data to sort of track these historical trends. So we hope to do that. We just recently received a grant from the Ethics and Governance of AI Fund, to continue this research in the future, so hopefully we will have a lot more data, and then we can really map out these historical trends.

Ariel: Okay. We looked at those 13 governance challenges that you mentioned. I want to more broadly ask the same two-part question of: looking at the survey in its entirety, what results were most expected and what results were most surprising?

Baobao: In terms of the AI governance challenge question, I think we had expected some of the results. We’d done some pilot surveys in the past, so we were able to have a little bit of a forecast, in terms of the governance challenges that people prioritize, such as data privacy, cyber attacks, surveillance, and digital manipulation. These were also things that respondents in the pilot surveys had prioritized. I think some of the governance challenges that people still think of as important, but don’t view as likely to impact large numbers of people in the next 10 years, such as critical AI systems failure—these questions are sort of harder to ask in some ways. I know that AI experts think about it a lot more than, say, the general public.

Another thing that sort of surprised me is how much people think value alignment—which is sort of an abstract concept—is quite important, and also likely to impact large numbers of people within the next 10 years. It’s up there with safety of autonomous vehicles or biased hiring algorithms, so that was somewhat surprising.

Ariel: That is interesting. So if you’re asking people about value alignment, were respondents already familiar with the concept, or was this something that was explained to them and they just had time to consider it as they were looking at the survey?

Baobao: We explained to them what it meant, and we said that it means to make sure that AI systems are safe, trustworthy, and aligned with human values. Then we gave a brief paragraph definition. We think that maybe people haven’t heard of this term before, or it could be quite abstract, so therefore we gave a definition.

Ariel: I would be surprised if it was a commonly known term. Then looking more broadly at the survey as a whole, you looked at lots of different demographics. You asked other questions too, just in terms of things like global risks and the potential for global risks, or generally about just perception of AI in general, and whether or not it was good, and whether or not advanced AI was good or bad, and things like that. So looking at the whole survey, what surprised you the most? Was it still answers within the governance challenges, or did anything else jump out at you as unexpected?

Baobao: Another thing that jumped out at me is that respondents who have computer science or engineering degrees tend to think that the AI governance challenges are less important across the board than people who don’t have computer science or engineering degrees. These people with computer science or engineering degrees also are more supportive of developing AI. I suppose that result is not totally unexpected, but I suppose in the news there is a sense that people who are concerned about AI safety, or AI governance challenges, tend to be those who have a technical computer background. But in reality, what we see are people who don’t have a tech background who are concerned about AI. For instance, women, those with low levels of education, or those who are low-income, tend to be the least supportive of developing AI. That’s something that we want to investigate in the future.

Ariel: There’s an interesting graph in here where you’re showing the extent to which the various groups consider an issue to be important, and as you said, people with computer science or engineering degrees typically don’t consider a lot of these issues very important. I’m going to list the issues real quickly. There’s data privacy, cyber attacks, autonomous weapons, surveillance, autonomous vehicles, value alignment, hiring bias, criminal justice bias, digital manipulation, US-China arms race, disease diagnosis, technological unemployment, and critical AI systems failure. So as you pointed out, the people with the CS and engineering degrees just don’t seem to consider those issues nearly as important, but you also have a category here of people with computer science or programming experience, and they have very different results. They do seem to be more concerned. Now, I’m sort of curious what the difference was between someone who has experience with computer science and someone who has a degree in computer science.

Baobao: I don’t have a very good explanation for the difference between the two, except I can say that the people with experience, that’s a lower bar, so there are more people in the sample who have computer science or programming experience—in fact, there are 735 of them, compared to 195 people who have computer science or engineering undergraduate or graduate degrees. Going forward, in future surveys, we want to probe at this a bit more. We might look at what industries various people are working in, or how much experience they have either using AI or developing AI.

Ariel: And then I’m also sort of curious—I know you guys still have more work that you want to do—but I’m curious what you know now about how American perspectives are either different or similar to people in other countries.

Baobao: The most direct comparison that we can make is with respondents in the EU, because we have a lot of data based on the Eurobarometer surveys, and we find that Americans share similar concerns with Europeans about AI. So as I mentioned earlier, 82% of Americans think that AI is a technology that should be carefully managed, and that percentage is similar to what the EU respondents have expressed. Also, we find similar demographic trends, in that women, those with lower levels of income or lower levels of education, tend to be not as supportive of developing AI.

Ariel: I went through this list, and one of the things that was on it is the potential for a US-China arms race. Can you talk a little bit about the results that you got from questions surrounding that? Do Americans seem to be concerned about a US-China arms race?

Baobao: One of the interesting findings from our survey is that Americans don’t necessarily think the US or China is the best at AI R&D, which is surprising, given that these two countries are probably the best. That’s a curious fact that I think we need to be cognizant of.

Ariel: I want to interject there, and then we can come back to my other questions, because I was really curious about that. Is that a case of the way you asked it—it was just, you know, “Is the US in the lead? Is China in the lead?”—as opposed to saying, “Do you think the US or China are in the lead?” Did respondents seem confused by possibly the way the question was asked, or do they actually think there’s some other country where there’s even more research happening?

Baobao: We asked this question in the way it has been asked about general scientific achievements by the Pew Research Center, so we did it as a survey experiment where half of the respondents were randomly assigned to consider the US and half were randomly assigned to consider China. We wanted to ask the question in this manner so we get a more specific distribution of responses. When you just ask who is in the lead, you’re only allowed to put down one country, whereas we give respondents a number of choices, so a country can be rated best in the world, above average, et cetera.

In terms of people underestimating US R&D, I think this is reflective of the public underestimating US scientific achievements in general. Pew had a similar question in a 2015 survey, and while 45% of the scientists they interviewed think that scientific achievements in the US are the best in the world, only 15% of Americans expressed the same opinion. So this could just be reflecting that general trend.

Ariel: I want to go back to my questions about the US-China arms race, and I guess it does make sense, first, to just define what you are asking about with a US-China arms race. Is that focused more on R&D, or were you also asking about a weapons race?

Baobao: This is actually a survey experiment, where we present different messages to respondents about a potential US-China arms race, and we asked both about investment in AI military capabilities and about developing AI in a more peaceful manner, including cooperation between the US and China on general R&D. We found that Americans seem to support the US investing more in AI military capabilities, to make sure that it doesn’t fall behind China’s, even though it would exacerbate an AI military arms race. On the other hand, they also support the US working hard to cooperate with China to avoid the dangers of an AI arms race, and they don’t seem to recognize that there’s a trade-off between the two.

I think this result is important for policymakers trying not to exacerbate an arms race, or trying to prevent one: when communicating with the public, they should communicate these trade-offs. We find that messages explaining the risks of an arms race tend to decrease respondents’ support for the US investing more in AI military capabilities, but the other information treatments don’t seem to change public perceptions.

Ariel: Do you think it’s a misunderstanding of the trade-offs, or maybe just hopeful thinking that there’s some way to maintain military might while still cooperating?

Baobao: I think this is a question that involves further investigation. I apologize that I keep saying this.

Ariel: That’s the downside to these surveys. I end up with far more questions than get resolved.

Baobao: Yes, and we’re one of the first groups who are asking these questions, so we’re just at the beginning stages of probing this very important policy question.

Ariel: With a project like this, do you expect to get more answers or more questions?

Baobao: I think in the beginning stages, we might get more questions than answers, although we are certainly getting some important answers—for instance that the American public is quite concerned about the societal impacts of AI. With that result, then we can probe and get more detailed answers hopefully. What are they concerned about? What can policymakers do to alleviate these concerns?

Ariel: Let’s get into some of the results that you had regarding trust. Maybe you could just talk a little bit about what you asked the respondents first, and what some of their responses were.

Baobao: Sure. We asked two questions regarding trust. We asked about trust in various actors to develop AI, and we also asked about trust in various actors to manage the development and deployment of AI. These actors include parts of the US government, international organizations, companies, and other groups such as universities or nonprofits. We found that among the actors that are most trusted to develop AI, these include university researchers and the US military.

Ariel: That was a rather interesting combination, I thought.

Baobao: I would like to give it some context. In general, trust in institutions is low among the American public. Particularly, there’s a lot of distrust in the government, and university researchers and the US military are the most trusted institutions across the board, when you ask about other trust issues.

Ariel: I would sort of wonder if there’s political sides with which people are more likely to trust universities and researchers versus trust the military. Is that across the board respondents on either side of the political aisle trusted both, or were there political demographics involved in that?

Baobao: That’s something that we can certainly look into with our existing data. I would need to check and get back to you.

Ariel: The other thing that I thought was interesting with that—and we can get into the actors that people don’t trust in a minute—but I know I hear a lot of concern that Americans don’t trust scientists. As someone who does a lot of science communication, I think that concern is overblown. I think there is actually a significant amount of trust in scientists; there are just certain areas where it’s less. I was sort of wondering what you’ve seen in terms of trust in science, and if the results of this survey have impacted that at all.

Baobao: I would like to add that among the actors that we asked who are currently building AI or planning to build AI, trust is relatively low amongst all these groups.

Ariel: Okay.

Baobao: So, even with university scientists: 50% of respondents say that they have a great amount of confidence or a fair amount of confidence in university researchers developing AI in the interest of the public, so that’s better than some of these other organizations, but it’s not super high, and that is a bit concerning. And in terms of trust in science in general—I used to work in the climate policy space before I moved into AI policy, and there, it’s a question that we struggle with in terms of trust in expertise with regards to climate change. I found that in my past research, communicating the scientific consensus in climate change is actually an effective messaging tool, so your concerns about distrust in science being overblown, that could be true. So I think going forward, in terms of effective scientific communication, having AI researchers deliver an effective message: I think that could be important in bringing the public to trust AI more.

Ariel: As someone in science communication, I would definitely be all for that, but I’m also all for more research to understand that better. I also want to go into the organizations that Americans don’t trust.

Baobao: I think in terms of tech companies, they’re not perceived as untrustworthy across the board. I think trust is still relatively high for tech companies, besides Facebook. People really don’t trust Facebook, and that could be because of all the recent coverage of Facebook violating data privacy, the Cambridge Analytica scandal, digital manipulation on Facebook, et cetera. We conducted this survey a few months after the Cambridge Analytica Facebook scandal had been in the news, but we’ve also run some pilot surveys before all that press coverage of the Cambridge Analytica scandal broke, and we also found that people distrust Facebook. So it might be something particular to the company, although that’s a cautionary tale for other tech companies: they should work hard to make sure that the public trusts their products.

Ariel: So I’m looking at this list, and under the tech companies, you asked about Microsoft, Google, Facebook, Apple, and Amazon. And I guess one question that I have—the trust in the other four, Microsoft, Google, Apple, and Amazon appears to be roughly on par, and then there’s very limited trust in Facebook. But I wonder, do you think it’s just—since you’re saying that Facebook also wasn’t terribly trusted beforehand—do you think that has to do with the fact that we have to give so much more personal information to Facebook? I don’t think people are aware of giving as much data to even Google, or Microsoft, or Apple, or Amazon.

Baobao: That could be part of it. So, I think going forward, we might want to ask more detailed questions about how people use certain platforms, or whether they’re aware that they’re giving data to particular companies.

Ariel: Are there any other reasons that you think could be driving people to not trust Facebook more than the other companies, especially as you said, with the questions and testing that you’d done before the Cambridge Analytica scandal broke?

Baobao: Before the Cambridge Analytica Facebook scandal, there was a lot of news coverage around the 2016 elections of vast digital manipulation on Facebook, and on social media, so that could be driving the results.

Ariel: Okay. Just to be consistent and ask you the same question over and over again, with this, what did you find surprising and what was on par with your expectations?

Baobao: I suppose I don’t find the Facebook results that surprising, given the negative press coverage, and also from our pilot results. What I did find surprising is the high level of trust in the US military to develop AI, because I think some of us in the AI policy community are concerned about military applications of AI, such as lethal autonomous weapons. But on the other hand, Americans seem to place a high general level of trust in the US military.

Ariel: Yeah, that was an interesting result. So if you were going to move forward, what are some questions that you would ask to try to get a better feel for why the trust is there?

Baobao: I think I would like to ask some questions about particular uses or applications of AI these various actors are developing. Sometimes people aren’t aware that the US military is perhaps investing in this application of AI that they might find problematic, or that some tech companies are working on some other applications. I think going forward, we might do more of these survey experiments, where we give information to people and see if that increases or decreases trust in the various actors.

Ariel: What did Americans think of high-level machine intelligence?

Baobao: What we found is that the public thinks, on balance, it will be more bad than good: So we have 15% of respondents who think it will be extremely bad, possibly leading to human extinction, and that’s a concern. On the other hand, only 5% thinks it will be extremely good. There’s a lot of uncertainty. To be fair, it is about a technology that a lot of people don’t understand, so 18% said, “I don’t know.”

Ariel: What do we take away from that?

Baobao: I think this also reflects on our previous findings that I talked about, where Americans expressed concern about where AI is headed: that there are people with serious reservations about AI’s impact on society. Certainly, AI researchers and policymakers should take these concerns seriously, invest a lot more research into how to prevent the bad outcomes and how to make sure that AI can be beneficial to everyone.

Ariel: Were there groups who surprised you by either being more supportive of high-level AI and groups who surprised you by being less supportive of high-level AI?

Baobao: I think the results for support of developing high-level machine intelligence versus support for developing AI, they’re quite similar. The correlation is quite high, so I suppose nothing is entirely surprising. Again, we find that people with CS or engineering degrees tend to have higher levels of support.

Ariel: I find it interesting that people who have higher incomes seem to be more supportive as well.

Baobao: Yes. That’s another result that’s pretty consistent across the two questions. We also performed analysis looking at these different levels of support for developing high-level machine intelligence, controlling for support of developing AI, and what we find there is that those with CS or programming experience have greater support of developing high-level machine intelligence, even controlling for support of developing AI. So there, it seems to be another tech optimism story, although we need to investigate further.

Ariel: And can you explain what you mean when you say that you’re analyzing the support for developing high-level machine intelligence with respect to the support for AI? What distinction are you making there?

Baobao: Sure. So we use a multiple linear regression model, where we’re trying to predict support for developing high-level machine intelligence using all these demographic characteristics, but also including respondents’ support for developing AI, to see if there’s something driving support for developing high-level machine intelligence beyond what support for developing AI explains. And we find that, controlling for support for developing AI, having CS or programming experience is still correlated with support for developing high-level machine intelligence. I hope that makes sense.
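
[For readers curious what “controlling for support of developing AI” looks like mechanically, here is a minimal sketch of that kind of regression. The data file and column names are hypothetical, and the report’s actual analysis includes more demographic controls and survey weights, so treat this as an illustration of the approach rather than a reproduction of it.]

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical respondent-level data: one row per survey respondent.
df = pd.read_csv("survey_responses.csv")

# Outcome: support for developing high-level machine intelligence (e.g. a 5-point scale).
# Key predictor: CS/programming experience, with general support for developing AI
# included as a control alongside a few demographics.
model = smf.ols(
    "support_hlmi ~ cs_experience + support_ai + age + income + education + gender",
    data=df,
).fit()
print(model.summary())
# A positive coefficient on cs_experience, with support_ai held fixed, is the
# pattern described above as "another tech optimism story".
```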

Ariel: For the purposes of the survey, how do you distinguish between AI and high-level machine intelligence?

Baobao: We defined AI as computer systems that perform tasks or make decisions that usually require human intelligence. So that’s a more general definition, versus high-level machine intelligence defined in such a way where the AI is doing most economically relevant tasks at the level of the median human.

Ariel: Were there inconsistencies between those two questions, where you were surprised to find support for one and not support for the other?

Baobao: We can probe it further, to see if there are people who answered differently for those two questions. We haven’t looked into it, but certainly that’s something that we can do with our existing data.

Ariel: Were there any other results that you think researchers specifically should be made aware of, that could potentially impact the work that they’re doing in terms of developing AI?

Baobao: I guess here’s some general recommendations. I think it’s important for researchers or people working in an adjacent space to do a lot more scientific communication to explain to the public what they’re doing—particularly maybe AI safety researchers, because I think there’s a lot of hype about AI in the news, either how scary it is or how great it will be, but I think some more nuanced narratives would be helpful for people to understand the technology.

Ariel: I’m more than happy to do what I can to try to help there. So for you, what are your next steps?

Baobao: Currently, we’re working on two projects. We’re hoping to run a similar survey in China this year, so we’re currently translating the questions into Chinese and changing the questions to have more local context. So then we can compare our results—the US results with the survey results from China—which will be really exciting. We’re also working on surveying AI researchers about various aspects of AI, both looking at their predictions for AI development timelines, but also their views on some of these AI governance challenge questions.

Ariel: Excellent. Well, I am very interested in the results of those as well, so I hope you’ll keep us posted when those come out.

Baobao: Yes, definitely. I will share them with you.

Ariel: Awesome. Is there anything else you wanted to mention?

Baobao: I think that’s it.

Ariel: Thank you so much for joining us.

Baobao: Thank you. It’s a pleasure talking to you.

 

 

AI Alignment Podcast: Cooperative Inverse Reinforcement Learning with Dylan Hadfield-Menell (Beneficial AGI 2019)

What motivates cooperative inverse reinforcement learning? What can we gain from recontextualizing our safety efforts from the CIRL point of view? What possible role can pre-AGI systems play in amplifying normative processes?

Cooperative Inverse Reinforcement Learning with Dylan Hadfield-Menell is the eighth podcast in the AI Alignment Podcast series, hosted by Lucas Perry, and was recorded at the Beneficial AGI 2019 conference in Puerto Rico. For those of you that are new, this series covers and explores the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, Lucas will speak with technical and non-technical researchers across areas such as machine learning, governance, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with Dylan Hadfield-Menell. Dylan is a 5th year PhD student at UC Berkeley advised by Anca Dragan, Pieter Abbeel and Stuart Russell, where he focuses on technical AI alignment research.

Topics discussed in this episode include:

  • How CIRL helps to clarify AI alignment and adjacent concepts
  • The philosophy of science behind safety theorizing
  • CIRL in the context of varying alignment methodologies and its role
  • If short-term AI can be used to amplify normative processes
You can follow Dylan here and find the Cooperative Inverse Reinforcement Learning paper here. You can listen to the podcast above or read the transcript below.

Lucas: Hey everyone, welcome back to the AI Alignment Podcast series. I’m Lucas Perry and today we will be speaking for a second time with Dylan Hadfield-Menell on cooperative inverse reinforcement learning, the philosophy of science behind safety theorizing, CIRL in the context of varying alignment methodologies, and whether short-term AI can be used to amplify normative processes. This time it just so happened to be an in-person discussion at Beneficial AGI 2019, FLI’s sequel to the Beneficial AI 2017 conference at Asilomar.

I have a bunch more conversations that resulted from this conference to post soon, and you can find more details about the conference in the coming weeks. As always, if you enjoy this podcast, please subscribe or follow us on your preferred listening platform. As many of you will already know, Dylan is a fifth-year PhD student at UC Berkeley, advised by Anca Dragan, Pieter Abbeel, and Stuart Russell, where he focuses on technical AI alignment research. And so without further ado, I’ll give you Dylan.

Thanks so much for coming on the podcast again, Dylan, that’s been like a year or something. Good to see you again.

Dylan: Thanks. It’s a pleasure to be here.

Lucas: So just to start off, we can go ahead and begin speaking a little bit about your work on cooperative inverse reinforcement learning and whatever sorts of interesting updates or explanation you have there.

Dylan: Thanks. For me, working on cooperative IRL has been a pretty long process; it really dates back to the start of my second year of my PhD, when my advisor came back from a yearlong sabbatical and suggested that we entirely change the research direction we were thinking about.

That was to think about AI Alignment and AI Safety and the associated concerns that that might bring. And our first attempt at really doing research in that area was to try to formalize: what’s the problem that we’re looking at? What are the space of parameters and the space of solutions that we should be thinking about in studying that problem?

And so it led us to write Cooperative Inverse Reinforcement Learning. Since then I’ve had a large number of conversations where I’ve had incredible difficulty trying to convey what it is that we’re actually trying to do here and what exactly that paper and idea represent with respect to AI Safety.

One of the big updates for me, and one of the big changes since we spoke last, is getting a little bit of a handle on what the value of that system really is. For me, I’ve come around to the point of view that what we were really trying to do with cooperative IRL was to propose an alternative definition of what it means for an AI system to be effective or rational in some sense.

And so there’s a story you can tell about artificial intelligence, which is that we started off and we observed that people were smart and they were intelligent in some way, and then we observed that we could get computers to do interesting things. And this posed the question of can we get computers to be intelligent? We had no idea what that meant, no idea how to actually nail it down and we discovered that in actually trying to program solutions that looked intelligent, we had a lot of challenges.

So one of the big things that we did as a field was to look next door into the economics department, in some sense, to look at the sort of models that they have of decision-theoretic rationality, really looking at homo economicus as an ideal to shoot for. From that perspective, a lot of the field of AI has actually shifted to be about effective implementations of homo economicus.

In my terminology, this is about systems that are effectively individually rational. These are systems that are good at optimizing for their goals, and a lot of the concerns that we have about AI Safety is that systems optimizing for their own goals could actually lead to very bad outcomes for the rest of us. And so what cooperative IRL attempts to do is to understand what it would mean for a human robot system to behave as a rational agent.

In this sense, we’re moving away from having a box drawn around the AI system, or the artificial component of the system, to having that agent box drawn around the person and the system together, and we’re trying to model the important parts of the value alignment problem in our formulation here. And in this case, we went with the simplest possible set of assumptions, which are basically that we have a static set of preferences, the human’s preferences, that they’re trying to optimize. This is effectively the human’s welfare.

The world is fully observable and the robot and the person are both working to maximize the human’s welfare, but there is this information bottleneck, this information asymmetry, that we think is a fundamental component of the value alignment problem. And so really what cooperative IRL is, is a definition of how a human and a robot system together can be rational in the context of fixed preferences in a fully observable world state.
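
[For readers who want the formal object behind this description, here is a compact statement of the CIRL game from the Hadfield-Menell et al. (2016) paper; the notation below paraphrases the paper rather than quoting it.]

```latex
% Cooperative IRL as a two-player game with a shared reward and asymmetric
% information (paraphrased from Hadfield-Menell et al., 2016).
A CIRL game is a tuple
\[
  M = \langle S,\ \{A^{H}, A^{R}\},\ T,\ \{\Theta, R\},\ P_{0},\ \gamma \rangle ,
\]
where $S$ is the set of (fully observable) world states, $A^{H}$ and $A^{R}$ are
the human's and the robot's action sets, and $T(s' \mid s, a^{H}, a^{R})$ gives
the transition dynamics. The static preference parameter $\theta \in \Theta$ is
observed only by the human, yet both agents receive the same reward
$R(s, a^{H}, a^{R}; \theta)$; $P_{0}$ is a prior over the initial state and
$\theta$, and $\gamma$ is the discount factor. A solution is a pair of policies
$(\pi^{H}, \pi^{R})$, a teaching strategy and a learning strategy, chosen
jointly to maximize the common expected discounted reward.
```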

Lucas: There’s a point of metatheory here, about coming up with models and theory. It seems like the fundamental issue is that, given how insanely complex AI Alignment is, trying to converge on the most efficacious model is very, very difficult. People keep flipping back and forth about how, theoretically, we’re actually going to do this, even in grid-world or toy environments. So it seems very, very hard to isolate the best variables, or which variables can be modeled and tracked in ways that are going to help us most.

Dylan: So, I definitely think that this is not an accurate model of the world and I think that there are assumptions here which, if not appropriately reexamined, would lead to a mismatch between the real world and things that work in theory.

Lucas: Like human beings having static preferences.

Dylan: So for example, yes, I don’t claim to know what human preferences really are and this theory is not an attempt to say that they are static. It is an attempt to identify a related problem to the one that we’re really faced with, that we can actually make technical and theoretical progress on. That will hopefully lead to insights that may transfer out towards other situations.

I certainly recognize that what I’m calling theta in that paper is not really the same thing that everyone talks about when we talk about preferences. In talking with philosophers, I’ve discovered it’s, I think, a little bit closer to things like welfare in a moral philosophy context, which maybe you could think about as being a more static object that you would want to optimize.

In some sense, theta really is an encoding of what you would like the system to do in general; that’s what we’re assuming there.

Lucas: Because it’s static.

Dylan: Yes, and to the extent that you want to have that be changing over time, I think that there’s an interesting theoretical question as to how that actually is different, and what types of changes that leads to and whether or not you can always reduce something with non-static preferences to something with static preferences from a mathematical point of view.

Lucas: I can see how moving from static to changing over time just makes it so much more insanely complex.

Dylan: Yeah, and it’s also really complex at the level of: it’s philosophically unclear what the right thing to do is.

Lucas: Yeah, that’s what I mean. You don’t even know what it even means to be aligning as the values are changing, like whether the agent even thinks that they just moved in the right direction or not.

Dylan: Right, and I also want to point out how uncertain all of these things are. We as people are hierarchical organizations of different behaviors and observation systems and perception systems. And we believe we have preferences, we have a name for that, but there is a sense in which that is ultimately a fiction of some kind.

It’s a useful tool that we have to talk about ourselves to talk about others that facilitates interaction and cooperation. And so given that I do not know the answer to these philosophical questions, what can I try to do as a technical researcher to push the problem forward and to make actual progress?

Lucas: Right, and so it’s sort of, again, a metatheoretical point about what people are trying to do right now in the context of AI Alignment. It seems that the best thing for people to be doing is to come up with these theoretical models and frameworks, which have a minimum set of assumptions that may be almost like the real world but are not, and then to make theoretical progress there that will hopefully transfer in the future, as you said, to other problems, as ML and deep learning get better and the other tools get better, so that we’ll actually have the tools to make it work with more complicated assumptions.

Dylan: Yes, I think that’s right. The way that I view this is: we had AI, this broad, vague thing. Through the course of AI research, we kind of got to Markov decision processes as a sort of coordinating theory around what it means for us to design good agents, and cooperative IRL is an attempt to take a step from Markov decision processes more closely towards the set of problems that we want to study.

Lucas: Right, and so I think this is a really interesting point that I actually haven’t talked to anyone else about, and if you have a few more words about it, I think it would be really interesting. So, just in terms of being a computer scientist, someone who is working on the emerging theory of a field: I think it’s often unclear what the actual theorizing process is behind how people get to CIRL. How did someone get to debate? How did someone get to iterated amplification?

It seems like you first identify problems which you see to be crucial and then there are some sorts of epistemic and pragmatic heuristics that you apply to try and begin to sculpt a model that might lead to useful insight. Would you have anything to correct or unpack here?

Dylan: I mean, I think that is a pretty good description of a pretty fuzzy process.

Lucas: But like being a scientist or whatever?

Dylan: Yeah. I don’t feel comfortable speaking for scientists in general here, but I could maybe say a little bit more about my particular process, which is that I try to think about how I’m looking at the problem differently from other people based on different motivations and different goals that I have. And I try to lean into how that can push us in different directions. There’s a lot of other really, really smart people who have tried to do lots of things.

You have to maintain an amount of intellectual humility about your ability to out think the historical components of the field. And for me, I think that in particular for AI Safety, it’s thinking about reframing what is the goal that we’re shooting towards as a field.

Lucas: Which we don’t know.

Dylan: We don’t know what those goals are, absolutely. And I think that there is a sense in which the field has not re-examined those goals incredibly deeply. For a little bit, I think, it’s been so hard to do anything that looks intelligent in the real world that we’ve been trying to focus on that individually rational Markov decision process model. And I think that a lot of the concerns about AI Safety are really a call for AI as a field to step back and think about what we’re trying to accomplish in the world and how we can actually try to achieve beneficial outcomes for society.

Lucas: Yeah, and I guess there’s a sociological phenomenon within scientists, or people who are committed to empirical things. Reanalyzing what the goal of AI Alignment is brings in the area of moral philosophy and ethics and other things, which for empirically-leaning, rational people can be distasteful, because you can’t just take a telescope to the universe and see a list of what you ought to do.

And so it seems like people like to defer on these questions. I don’t know. Do you have anything else to add here?

Dylan: Yeah. I think computer scientists in particular are selected to be people who like having boxed off problems that they know how to solve and feel comfortable with, and that leaning into getting more people with a humanities bent into computer science and broadly AI in particular, AI Safety especially is really important and I think that’s a broad call that we’re seeing come from society generally.

Lucas: Yeah, and I think it also might be wrong, though, to model the humanities questions as those which are not in boxes and cannot be solved. That’s sort of a logical positivist thing to say: that on one end we have the hard things, where you just have to look at the world enough and you’ll figure it out, and then there’s the soft, squishy things, which deal with abstractions that don’t have real answers, where people with fluffy degrees need to come up with things that seem right but aren’t really right.

Dylan: I think it would be wrong to take what I just said in that direction, and if that’s what it sounds like I definitely want to correct that. I don’t think there is a sense in which computer science is a place where there are easy right answers, and that the people in humanities are sort of waving their hands and sort of fluffing around.

This is sort of leaning into more of an AI value alignment kind of framing, or way of thinking about it. But when I think about bringing AI systems into the world, I think about which things you can afford to get wrong in your specification and which things you cannot afford to get wrong in your specification.

In this sense, specifying physics incorrectly is much, much better than specifying the objective incorrectly, at least by default. And the reason for that is that what happens to the world when you push it is a question that you can answer from your observations. And so if you start off in the wrong place, as long as you’re learning and adapting, I can reasonably expect my systems to correct for that. Or at least the goal of successful AI research is that your systems will effectively adapt to that.

However, the task that your system is supposed to do is sort of arbitrary in a very fundamental sense. And from that standpoint, it is on you as the system designer to make sure that objective is specified correctly. When I think about what we want to do as a field, I end up taking a similar lens, in that there’s a sense in which we as researchers and people and society and philosophers and all of it are trying to figure out what we’re trying to do and what we want to task the technology with, and the directions that we want to push it in. And then there are questions of what the technology will be like and how it should function that will be informed by that and shaped by that.

And I think that there is a sense in which that is arbitrary. Now, what is right? That I don’t really know the answer to and I’m interested in having those conversations, but they make me feel uneasy. I don’t trust myself on those questions, and that could mean that I should learn how to feel more uneasy and think about it more and in doing this research I have been kind of forced into some of those conversations.

But I also do think that for me at least I see a difference between what can we do and what should we do. And thinking about what should we do as a really, really hard question that’s different than what can we do.

Lucas: Right. And so I wanna move back towards CIRL, but just to wrap up here on our philosophy of science musings, a thought I had while you were going through that was: at least for now, what I think is fundamentally shared between fields that deal with things that matter is whether their concepts have meaningfully relevant referents in the world. Like, do your concepts refer to meaningful things?

Putting ontology aside, whatever love means or whatever value alignment means, these are meaningful referents for people. And I guess, for now, if our concepts are actually referring to meaningful things in the world, then it seems important.

Dylan: Yes, I think so. Although, I’m not totally sure I understood that.

Lucas: Sure, that’s fine. People will say that humanities or philosophy doesn’t have these boxes with like well-defined problems and solutions because they either don’t deal with real things in the world or the concepts are so fuzzy that the problems are sort of invented and illusory. Like how many angels can stand on the head of a pin? Like the concepts don’t work, aren’t real and don’t have real referents, but whatever.

And I’m saying the place where philosophy and ethics and computer science and AI Alignment should at least come together, for now, is where the concepts have meaningful referents in the world.

Dylan: Yes, that is something that I absolutely buy. Yes, I think there’s a very real sense in which those questions are harder, but that doesn’t mean they’re less real or less important.

Lucas: Yes, that’s because it’s the only point where I wanted to push against logical positivism.

Dylan: No, I don’t mean to say that the answers are wrong, it’s just that they are harder to prove in a real sense.

Lucas: Yeah. I mean, I don’t even know if they have answers, or, if they do, whether they’re just all wrong, but I’m just open to it and more excited about everyone coming together.

Dylan: Yes, I absolutely agree with that.

Lucas: Cool. So now let’s turn it back into the CIRL. So you began by talking about how you and your advisers were having this conceptual shift and framing, then we got into the sort of philosophy of science behind how different models and theories of alignment go. So from here, whatever else you have to say about CIRL.

Dylan: So I think for me the upshot of concerns about advanced AI systems and the negative consequences therein really is a call to recognize that the goal of our field is AI Alignment; that almost any AI work that’s not AI Alignment is solving a subproblem, and viewing it as only solving that subproblem is a mistake.

Ultimately, we are in the business of building AI systems that integrate well with humans and human society. And if we don’t take that as a fundamental tenet of the field, I think that we are potentially in trouble, and I think that that is a perspective I wish were more pervasive throughout artificial intelligence generally.

Lucas: Right, so I think I do want to move into this view where safety is a normal thing, and like Stuart Russell will say, “People who build bridges all care about safety; there isn’t a subsection of bridge builders who work on bridge safety, everyone is part of bridge safety.” And I definitely want to get into that, but I also sort of want to get a little bit more into CIRL and why you think it’s so motivating, why this theoretical framing and shift is important or illuminating, and what the specific content of it is.

Dylan: The key thing is that what it does is point out that it doesn’t make sense to talk about how well your system is doing without talking about the way in which it was instructed and the type of information that it got. No AI system exists on its own, every AI system has a designer, and it doesn’t make sense to talk about the functioning of that system without also talking about how that designer built it, evaluated it and how well it is actually serving those ends.

And I don’t think this is some brand new idea that no one’s ever known about; I think this is something that is incredibly obvious to practitioners in the field once you point it out. The process whereby a robot learns to navigate a maze or vacuum a room is not: there is an objective, it optimizes it, and then it does it.

What it is, is that there is a system designer who writes down an objective, selects an optimization algorithm, observes the final behavior of that optimization algorithm, goes back, modifies the objective, modifies the algorithm, changes hyperparameters, and then runs it again. And there’s this iterative process whereby your system eventually ends up getting to the behavior that you wanted it to have. AI researchers have tended to draw a box around the final component of that and call that the thing that is AI.

Lucas: Yeah, because at least subjectively, and I guess this is sort of illuminated by meditation and Buddhism: if you’re a computer scientist and you’re just completely identified with the process of doing computer science, you’re just identified with the problem. But if you have a little bit of mindfulness and you’re like, “Okay, I’m in the context of a process where I’m an agent trying to align another agent,” and you’re not completely identified with the process and you see the unfolding of the process, then you can do more of a meta-analysis which takes a broader view of the problem and can then, hopefully, work on improving it.

Dylan: Yeah, I think that’s exactly right, or at least as I understand it, that’s exactly right. And to be a little bit specific about this: we have had these engineering principles and skills that are not in the papers, but they are things that are passed down from grad student to grad student within a lab. There’s institutional knowledge that exists within a company for how you actually verify and validate your systems, and cooperative IRL is an attempt to take all of that structure that AI systems have existed within and bring it into the theoretical frameworks that we actually work with.

Lucas: So can you paint a little picture of what the CIRL model looks like?

Dylan: It exists in a sequential decision making context and we assume we have states of the world and a transition diagram that basically tells us how we get to another state given the previous state and actions from the human and the robot. But the important conceptual shift that it makes is the space of solutions that we’re dealing with are combinations of a teaching strategy and a learning strategy.

There is a commitment on the side of the human designers or users of the systems to provide data that is in some way connected to the objectives that they want to be fulfilled. That data can take many forms, it could be in the form of writing down a reward function that ranks a set of alternatives, it could be in the form of providing demonstrations that you expect your system to imitate. It could be in the form of providing binary comparisons between two clearly identified alternatives.

And the other side of the problem is: what is the learning strategy that we use? This is the question of how the robot is actually committing to respond to the observations that we’re giving it about what we want it to do. In the case of a pre-specified proxy reward given a literal interpretation by a reinforcement learning system, let’s say, what the system is committing to doing is optimizing under that set of trajectory rankings and preferences, based off the simulation environment that it’s in, or the actual physical environment that it’s exploring.

When we shift to something like inverse reward design, which is a paper that we released last year, what that says is we’d like the system to look at this ranking of alternatives and actually try to blow that up into a larger uncertainty set over the possible rankings consistent with it, and then when you go into deployment, you may be able to leverage that uncertainty to avoid catastrophic failures or generally just unexpected behavior.
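
[Dylan’s description of the “uncertainty set” idea lends itself to a small worked example. The sketch below is a toy illustration, not the method from the inverse reward design paper itself: it assumes rewards linear in hand-coded features, uses a finite grid of candidate reward weights in place of a Bayesian posterior, and plans worst-case at deployment. All names and numbers are illustrative.]

```python
import itertools
import numpy as np

def consistent_rewards(proxy_w, train_features, candidates):
    """Keep candidate weight vectors that agree with the designer's proxy reward
    on every pairwise comparison between training trajectories."""
    phi = np.asarray(train_features, dtype=float)
    proxy_r = phi @ proxy_w
    kept = []
    for w in candidates:
        r = phi @ w
        agrees = all((r[i] > r[j]) == (proxy_r[i] > proxy_r[j])
                     for i in range(len(r)) for j in range(len(r)))
        if agrees:
            kept.append(w)
    return kept

def risk_averse_choice(deploy_features, reward_set):
    """At deployment, pick the trajectory with the best worst-case return
    over the remaining uncertainty set of rewards."""
    phi = np.asarray(deploy_features, dtype=float)
    worst_case = np.min(np.stack([phi @ w for w in reward_set]), axis=0)
    return int(np.argmax(worst_case))

# Hypothetical example with two features; feature 1 never varied during training,
# so the proxy reward says nothing about it.
proxy_w = np.array([1.0, 0.0])
train_features = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
candidates = [np.array(w) for w in itertools.product([-1.0, 0.0, 1.0], repeat=2)]
reward_set = consistent_rewards(proxy_w, train_features, candidates)

deploy_features = [[1.0, 0.0], [2.0, -5.0]]  # the second option touches the unvetted feature
print(risk_averse_choice(deploy_features, reward_set))  # prints 0: avoids the risky option
```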

Lucas: So this is another point I think you and I discussed briefly, or maybe it was actually with Rohin, but it seems like often, in terms of AI alignment, we're reasoning from nowhere about abstract agents, and that makes the problem extremely difficult. Often, if you just look at human examples, it becomes super mundane and easy. This conceptual shift can almost be framed super simply as the difference between a teacher just trying to teach someone, and a teacher realizing that they are a person teaching another student, so they can think better about how to teach, and also about the process between the teacher and the student and how to improve it at a higher level of abstraction.

Dylan: I think that's the direction that we're moving in. What I would say is that, as AI practitioners, we are teaching our systems how to behave, and we have developed our strategies for doing that.

And now that we've developed a bunch of strategies that sort of seem to work, I think it's time for us to develop a more rigorous theory of how those teaching strategies interact with the final performance of the system.

Lucas: Cool. Is there anything else here that you would like to say about CIRL, or any really important points you would like to get across to people who are interested in technical AI alignment, or to CS students?

Dylan: I think the main point that I would make is that research and thinking about powerful AI systems is valuable, even if you don’t think that that’s what’s going to happen. You don’t need to be motivated by those sets of problems in order to recognize that this is actually just basic research into the science of artificial intelligence.

It’s got an incredible amount of really interesting problems and the perspectives that you adopt from this framing can be incredibly useful as a comparative advantage over other researchers in the field. I think that’d be my final word here.

Lucas: If I might just ask you one last question. We're at Beneficial AGI 2019 right now, and we've heard a lot of overviews of different research agendas and methodologies and models and framings for how to best go forth with AI alignment. These include a vast range of things, like work on corrigibility and interpretability and robustness, and the different research agendas and methodologies of places like MIRI, which has come out with this new framing on embedded agency, and also different views at OpenAI and DeepMind.

And Eric Drexler has also newly proposed this services-based conception of AI, where we move away from understanding powerful AI systems, or regular AI systems, as agents, which sort of gets us away from a lot of the x-risk problems and global catastrophic risk problems and value alignment problems.

From your point of view, as someone who has worked a lot on CIRL and is a technical alignment researcher, how do you view CIRL in this context, and how do you view all of these different emerging approaches in AI alignment right now?

Dylan: For me, and you know, I should give a disclaimer: this is my research area, so I'm obviously pretty biased toward thinking it's incredibly important and good. But for me at least, cooperative IRL is a uniting framework under which I can understand all of those different approaches. I believe that a services-type solution to AI safety or AI alignment is actually arguing for a particular type of learning strategy and implementation strategy within CIRL, and I think it can be framed within that system.

Similarly, I've had some conversations with people about debate. I believe debate fits really nicely into the framework: we commit to a human strategy of judging debates between systems, and we commit to a robot strategy of splitting itself into two systems that argue, and we work in that direction. So for me, it's a way in which I can identify the commonalities between these different approaches, compare and contrast them, and then, under a set of assumptions about what the world is like, what the space of possible preferences is like, and what the space of strategies people can implement is like, possibly get out some information about which one is better or worse, or which type of strategy is vulnerable to different types of mistakes or errors.

Lucas: Right, so I agree with all of that, the only place that I might want to push back is, it seems that maybe the MIRI embedded agency stuff subsumes everything else. What do you think about that?

Because the framing is: whenever AI researchers draw these models, there are these conceptions of information channels, right, which are selected by the researchers and which we control. But the universe is really just a big non-dual happening of stuff, and agents are embedded in the environment and are really just more processes within that environment, and it's much more fuzzy where the dense causal streams are and where the little causal streams are and stuff like that. It just seems like the MIRI stuff maybe subsumes CIRL and everything else a little bit more, but I don't know.

Dylan: I certainly agree that that's the one that's hardest to fit into the framework, but I would also say that, in my mind, I don't know what an agent is. I don't know how to operationalize an agent, I don't actually know what that means in the physical world, and I don't know what it means to be an agent. What I do know is that there is a strategy of some sort that we can think of as governing the way that the system performs and behaves.

I want to be very careful about baking in assumptions beforehand. And it feels to me like, with embedded agency, I don't fully understand the set of assumptions being made in that framework, and I don't necessarily understand how they relate to the systems that we're actually going to build.

Lucas: When people say that an agent is a fuzzy concept, I think that might be surprising to a lot of people who have thought somewhat about the problem, because it's like: obviously I know what an agent is. It's different from all the other dead stuff in the world; it has goals, and it's physically confined and unitary.

If you just imagine abiogenesis, how life began: is the first relatively self-replicating chain of hydrocarbons an agent? You can go from really small systems to really big systems, which can exhibit certain properties or principles that feel a little bit agenty, but the concept may not be useful. And so I guess if we're going to come up with a definition of it, it should just be something useful for us, or something.

Dylan: I think "I'm not sure" is the most accurate thing we can say here. I wish I had a better answer for what this is. Maybe I can share one of the thought experiments that convinced me I was pretty confused about what an agent is.

Lucas: Yeah, sure.

Dylan: It came from thinking about what value alignment is. If we think about value alignment between two agents that are both perfectly rational actors, making decisions in the world perfectly in accordance with their values, with full information, I can write down a definition of value alignment, which is basically that you're using the same ranking over alternatives that I am.

But a question that we really wanted to try to answer that feels really important is what does it mean to be value aligned in a partial context? If you were a bounded agent, if you’re not a perfectly rational agent, what does it actually mean for you to be value aligned? That was the question that we also didn’t really know how to answer.
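
Dylan's fully rational definition is easy to write down. Here is a minimal sketch (the alternatives and utility numbers are hypothetical); what it cannot capture is exactly the bounded case he just raised, where the agent's limited search and heuristics, not only its ranking, determine how it behaves.

```python
# A minimal sketch of the fully rational definition: two agents are value aligned
# if they rank every pair of alternatives the same way (ignoring ties for simplicity).
from itertools import combinations

def same_ranking(utility_a, utility_b, alternatives):
    """True iff both utility functions order every pair of alternatives identically."""
    for x, y in combinations(alternatives, 2):
        if (utility_a[x] > utility_a[y]) != (utility_b[x] > utility_b[y]):
            return False
    return True

alternatives = ["stay home", "go hiking", "watch a movie"]
human = {"stay home": 0.1, "go hiking": 0.9, "watch a movie": 0.5}
robot = {"stay home": 1.0, "go hiking": 9.0, "watch a movie": 5.0}  # different scale, same order

print(same_ranking(human, robot, alternatives))  # True: aligned under this definition
```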

Lucas: My initial reaction is that it's the kind of agent that tries its best, with its limited rationality, to be like the former thing that you talked about.

Dylan: Right, so that leads to a question that we thought about. Suppose I have a chess-playing agent, and it is my chess-playing agent, so I want it to win the game for me. Suppose it's using the correct goal test, so it is actually optimizing for my values. But let's say it's only searching out to depth three, so it's pretty dumb as far as chess players go.

Do I think that that is an agent that is value aligned with me? Maybe. I mean, certainly I can tell the story in one way where it sounds like it is. It's using the correct objective function, and it's doing some sort of optimization. If it ever identifies a checkmate in three moves, it will always find it and get that back to me. And so that's a sense in which it feels like it is a value aligned agent.

On the other hand, what if it's using a heuristic function which is chosen poorly, or even chosen in something closer to an adversarial manner? So now it's a depth-three agent that is still using the correct goal test, but it's searching in a way that is adversarially selected. Is that a partially value aligned agent?

Lucas: Sorry, I don't understand what it means to have the same objective function, but be searching to depth three in an adversarial way.

Dylan: In particular, when you're building a chess search engine, there is your goal test that you run on the leaves of your search to see if you've actually achieved winning the game. But because you're only doing a partial search, you often have to rely on using a heuristic of some sort to rank different positions.

Lucas: To cut off parts of the tree.

Dylan: Somewhat to cut off parts of the tree, but also just like you’ve got different positions, neither of which are winning and you need to choose between those.

Lucas: All right. So there’s a heuristic, like it’s usually good to take the center or like the queen is something that you should always probably keep.

Dylan: Or these things like values of pieces that you can add up, which was I think one of the problems …

Lucas: Yeah, and just as an important note: right now, in terms of the state of machine learning, the heuristics are usually chosen by the programmer. Are systems able to converge on heuristics themselves?

Dylan: Well, so I'd say one of the big things in AlphaZero or AlphaGo as an approach is that they applied learning to the heuristic itself: they figured out a way to use the search process to gradually improve the heuristic, and to have the heuristic actually improve the search process.

And so there's sort of a feedback loop set up in those types of expert iteration systems. My point here is that when I described that search algorithm to you, I didn't mention what heuristic it was using at all. And so you had no way to tell me whether or not that system was partially value aligned, because that heuristic is actually 100 percent of what's going to determine the final performance of the system and whether or not it's actually helping you.

And then the sort of final point I have here, which might confuse you a little bit more, is: what if we just said, "Okay, forget this whole searching business. I'm just going to precompute all the solutions from my search algorithm and I'm going to give you a policy: when you're in this position, do this move; when you're in that position, do that move." What would it mean for that policy to be value aligned with me?
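
A minimal sketch of Dylan's point about heuristics may help: two depth-limited searchers can share the exact same goal test and the same search procedure and still behave completely differently, because the heuristic at the search frontier does all of the non-terminal judging. The game below is a tiny Nim-like stand-in rather than chess, and both heuristics are hypothetical.

```python
def minimax(state, depth, my_turn, moves, is_win, heuristic):
    """Depth-limited minimax: exact goal test at terminal states, heuristic at the frontier."""
    if is_win(state):
        # Whoever just moved took the last stone and won.
        return -1.0 if my_turn else 1.0
    if depth == 0:
        return heuristic(state, my_turn)   # all non-terminal judgment lives here
    values = [minimax(nxt, depth - 1, not my_turn, moves, is_win, heuristic)
              for nxt in moves(state)]
    return max(values) if my_turn else min(values)

def best_move(state, depth, moves, is_win, heuristic):
    return max(moves(state),
               key=lambda nxt: minimax(nxt, depth - 1, False, moves, is_win, heuristic))

# Hypothetical stand-in game (a Nim variant): take 1-3 stones, taking the last stone wins.
def moves(n):
    return [n - k for k in (1, 2, 3) if n - k >= 0]

def is_win(n):
    return n == 0

def good_heuristic(n, my_turn):
    # Real Nim knowledge: the player to move loses (under optimal play) iff n % 4 == 0.
    mover_is_winning = (n % 4 != 0)
    return 1.0 if mover_is_winning == my_turn else -1.0

def bad_heuristic(n, my_turn):
    # Same goal test, same depth, same search; only the frontier advice is inverted.
    return -good_heuristic(n, my_turn)

print(best_move(10, 3, moves, is_win, good_heuristic))  # 8: the genuinely winning move
print(best_move(10, 3, moves, is_win, bad_heuristic))   # 9: a losing move, same "objective"
```

With the good heuristic, the depth-three search plays the genuinely winning move (leaving a multiple of four); with the inverted heuristic, the same search, same depth, and same goal test picks a losing move, which is the sense in which the heuristic, rather than the stated objective, determines whether the system is actually helping you.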

Lucas: If it did everything that you would have done if you were the one playing the chess game. Like is that value alignment?

Dylan: That's certainly perfect imitation, and maybe we [crosstalk 00:33:04]

Lucas: Perfect imitation isn’t necessarily value alignment because you don’t want it to perfectly imitate you, you want it to win the game.

Dylan: Right.

Lucas: Isn't the easiest way to understand this just that there are degrees of value alignment, and value alignment is the extent to which the thing is able to achieve the goals that you want?

Dylan: Somewhat, but the important thing here is trying to understand what these intuitive notions that we're talking about actually mean for the mathematics of sequential decision making. There's a sense in which you and I can talk about partial value alignment and agents that are trying to help you, but if we actually look at the math of the problem, it's very hard to understand how that translates. Mathematically, I have lots of properties that I could write down, and I don't know which one of those I want to call partial value alignment.

Lucas: You know more about the math than I do, but isn't the percentage chance of a thing achieving the goal the degree to which it's value aligned? If you're certain that the end towards which it's striving is the end towards which you want it to strive?

Dylan: Right, but that striving term is a hard one, right? Because if your goals aren’t achievable then it’s impossible to be value aligned with you in that sense.

Lucas: Yeah, you have to measure the degree to which the end towards which it's striving is the end towards which you want it to strive, and then also measure the degree to which the way that it tries to get to what you want is efficacious or …

Dylan: Right. I think that intuitively I agree with you and I know what you mean, but I can do things like write down a reward function and say: how well does this system optimize that reward function? And we could ask whether or not that means it's value aligned with it or not. But to me, that just sounds like the question of whether your policy is optimal, in the sort of more standard context.

Lucas: All right, so have you written about how you think CIRL subsumes all of these other methodologies? And if it does subsume these other AI alignment methodologies, how do you think that will influence or affect the way we should think about them?

Dylan: I haven't written that explicitly, but what I've tried to convey is that it's a formalization of the type of problem we're trying to solve. I think describing it as subsuming them is not quite right.

Lucas: It contextualizes them and it brings light to them by providing framing.

Dylan: It gives me a way to compare those different approaches and understand what's different and what's the same between them, and in what ways they … like, in what scenarios do we expect them to work out versus not? One thing that we've been thinking about recently is what happens when the person doesn't immediately know what they're trying to do.

So if we imagine that there is in fact a static set of preferences that the person is trying to optimize, so we're still making that assumption, but that those preferences are revealed to the person over time through experience or interaction with the world, that is a richer class of value alignment problems than cooperative IRL deals with. It's really closer to what we are attempting to do right now.

Lucas: Yeah, and I mean that doesn't even include value degeneracy. Like, what if I get hooked on drugs in the next three years and all my values go, and my IRL agent works on the assumption that I'm always updating towards what I want, but you know …

Dylan: Yes, and I think that's where you get these questions of changing preferences that make it hard to really think through things. I think there's a philosophical stance you're taking there, which is that your values have changed rather than that your beliefs have changed.

In the sense that wire-heading is a phenomenon that we see in people and in general learning agents, and if you are attempting to help a learning agent, you must be aware of the fact that wire-heading is a possibility and possibly bad. And it's incredibly hard to distinguish that from someone who's just found something that they really like and want to do.

When you should make that distinction, or how you should make that distinction, is a really challenging question, and it's not a purely technical computer science question.

Lucas: Yeah, but at the same time, I would like to demystify it a bit. If your friend got hooked on drugs, it's pretty obvious to you why it's bad: it's bad because he's losing control, it's bad because he's sacrificing all of his other values, it's bad because he's shortening his life span by a lot.

I just mean, again, that it's obvious in the ways in which humans handle this. So I guess if we take biologically inspired approaches to understanding cognition and transfer how humans deal with these things into AI systems, that at least at face value seems like a good way of doing it, I guess.

Dylan: Yes, everything that you said I agree with. My point is that those are, in a very real sense, normative assumptions that you as that person's friend are able to bring to the analysis of that problem, and in some ways there is an arbitrariness to labeling that as bad.

Lucas: Yeah, so the normative issue is obviously very contentious and needs to be addressed more, but at the same time society has come to very clear solutions to some normative problems: murder is basically a solved normative problem. There's a degree to which it's super obvious that certain normative questions are just answered, and we should, I guess, practice epistemic humility and whatever here, obviously.

Dylan: Right, and I don't disagree with you on that point, but what I'd say is, as a research problem there's real value in getting a better understanding of the normative processes whereby we got to that solution. Like, what is the human normative process? It's a collective societal system. How does that system evolve and change? And then how should machines or other intelligent entities integrate into that system without either subsuming or destroying it in bad ways? I think that's what I'm trying to get at when I make these points. There is something about what we're doing here as a society that gets us to labeling these things in the ways that we do and calling them good or bad.

And on the one hand, as a person I believe that there are correct answers, and I know what I think is right versus what I think is wrong. And then as a scientist I want to try to take a little bit more of an outside view and try to understand: what is the process whereby we as a society, or as genetic beings, started doing that? Understanding what that process is, how it evolves, and what it actually looks like in people now is a really critical research program.

Lucas: So one thing that I tried to cover in my panel yesterday, on what civilizations should strive for, is the potential role that narrow to general AI systems might play, in the short, medium, and long term, in amplifying human moral decision making.

Solving, as you were discussing, this sort of deliberative, normative process that human beings undergo to converge on an idea. I'm just curious to know, with more narrow systems, whether you're optimistic about ways in which AI can help elucidate our moral decision making or work to amplify it.

And before I let you start, I guess there's one other thing that I said, which I think Rohin Shah pointed out to me, that was particularly helpful in one place: beyond amplifying moral decision making, narrow AI systems can help us by making the moral decisions that we implement in them faster than we could ourselves.

The way a self-driving car decides to crash is an expression of our moral decision making in a fast, computery way. So beyond ways in which AI systems make moral decisions for us faster than we can, maybe in courts or other things which seem morally contentious, are there also other ways in which they can actually help the deliberative process? Examining massive amounts of moral or value information, or analyzing something like an aggregated well-being index where we try to understand how policies impact the wellbeing of people, or what sorts of moral decisions lead to good outcomes, whatever. So, whatever you have to say to that.

Dylan: Yes, I definitely want to echo that. We can get a lot of pre-deliberation into a fast-timescale reaction with AI systems, and I think that is a way for us to improve how we act and the quality of the things that we do from a moral perspective. I do see a real path to actually bringing that into being in the world.

In terms of helping us actually deliberate better, I think that is a harder problem, one that is absolutely worth more people thinking about, but I don't know the answers here. What I do think is that we need a better understanding of what the deliberative process is. The right questions to look at are not the moral questions about what's right and what's wrong, or what we think is right and what we think is wrong; they're much more questions at the level of: what is it about our evolutionary pathway that led us to thinking that these things are right or wrong?

What is it about society and the pressures that we've undergone and faced that led us to a place where murder is wrong in almost every society in the world? Although I will say the death penalty is a thing, and it's just a type of sanctioned murder, so there is a sense in which I think it's a bit more nuanced than just that. And there's something to be said about, I guess, if I had to make my claims, what I think has sort of happened there.

So there's something about us as creatures that evolved to coordinate and perform well in groups, and the pressures that placed on us, that caused us to develop these normative systems whereby we say different things are right and wrong.

Lucas: Iterated game theory over millions of years or something.

Dylan: Something like that. Yeah, but there’s a sense in which us labeling things as right and wrong and developing the processes whereby we label things as right and wrong is a thing that we’ve been pushed towards.

Lucas: From my perspective, it feels like this is more tractable than people let on; the assumption seems to be that AI is only going to be able to help in moral deliberation once it's general. But it already helps us in regular deliberation, and moral deliberation isn't a special kind of deliberation: it requires empirical facts about the world and about persons, just like any other kind of actionable deliberation does, in domains that aren't considered to have to do with moral philosophy or ethics or things like that.

So I'm not an AI researcher, but it seems to me like this is more tractable than people let on. The normative aspect of AI alignment seems to be under-researched.

Dylan: Can you say a little more about what you mean by that?

Lucas: What I meant was the normative deliberative process: the difficulty in coming to normative conclusions, what the appropriate epistemic and deliberative process is for arriving at normative solutions, and how narrow AI systems can take us toward a beautiful world where advanced AI systems actually lead us to post-human ethics.

If we ever want to get to a place where general systems take us to post-human ethics, why not start today with figuring out how narrow systems can work to amplify human moral decision making and deliberative processes?

Dylan: I think the hard part there is, I don't exactly know what it means to amplify those processes. My perspective is that we as a species do not yet have a good understanding of what those deliberative processes actually represent and what form their results actually take.

Lucas: It's just things like giving more information, providing tons of data, analyzing the data, potentially pointing out biases. The part where they're literally amplifying the implicit or explicit cognitive decision-making process is more complicated and will require more advances in cognition and deliberation and stuff. But yeah, I still think there are more mundane ways in which it can make us better moral reasoners and decision makers.

If I could give you like 10,000 more bits of information every day about moral decisions that you make, you would probably just be a better moral agent.

Dylan: Yes, one way to try to think about that is maybe things like VR approaches to increasing empathy. I think that that has a lot of power to make us better.

Lucas: Max always says that there's a race between wisdom and the power of our technology, and it seems like people really aren't taking seriously ways in which we can amplify wisdom, because wisdom is generally taken to be part of the humanities and the soft sciences. Maybe we should be taking more seriously ways in which narrow, current-day AI systems can be used to amplify the rate at which the human species develops wisdom. Because otherwise we're just going to continue how we always continue, and the wisdom is going to come really slowly, and then we're probably going to learn from a bunch of mistakes.

And it's just not going to be as good until we develop a rigorous science of making moral progress, or of using technology to amplify the growth of wisdom and moral progress.

Dylan: So in principle, I don't really disagree with what you're saying, but I also don't know how it would change what I'm working on, in the sense that I'm not sure what it would mean. I do not know how I would do research on amplifying wisdom; I just don't really know what that means. And that's not to say it's an impossible problem. We talked earlier about how I don't know what partial value alignment means: it's something that you and I can talk about, and we can intuitively, I think, align on a concept, but it's not a concept I know how to translate into actionable, concrete research problems right now.

In the same way, with the idea of amplifying wisdom and making people more wise, I think I intuitively understand what you mean, but when I try to think about how an AI system would make someone wiser, that feels difficult.

Lucas: It can seem difficult, and obviously this is an open research question, but if you were able to identify a bunch of variables that are most important for moral decision making, and then use AI systems to gather, aggregate, compile, and analyze moral information in certain ways, again, it just seems more tractable than people seem to be letting on.

Dylan: Yeah, although now I wonder: is that different from value alignment as we're thinking about it, right? A concrete research thing I spent a while thinking about is: how do you identify the features that a person considers to be valuable, when, say, we don't know the relative tradeoffs between them?

One way you might try to solve value alignment is to have a process that identifies the features that might matter in the world, and then a second process that identifies the appropriate tradeoffs between those features, and maybe something about diminishing returns or something like that. And that to me sounds like I've just replaced "values" with "wisdom" and got sort of what you're thinking about. I think both of those terms are similarly diffuse. I wonder if what we're talking about is semantics, and if it's not, I'd like to know what the difference is.

Lucas: I guess the more mundane definition of wisdom, at least in the way that Max Tegmark would use it, would be something like the ways in which we use our technology. I might have specific preferences, but just because I have specific preferences that I may or may not be aligning an AI system to does not necessarily mean that that total process, this CIRL-like process, is actually an expression of wisdom.

Dylan: Okay, can you provide a positive description of what such a process would look like? Basically what I'm saying is, I can hear the point that "I have preferences and I aligned my system to them, and that's not necessarily a wise system," and …

Lucas: Yeah, like I build a fire because I want to be warm, but then the fire catches my village on fire and no longer is … That might still be value alignment.

Dylan: But isn’t [crosstalk 00:48:39] some values that you didn’t take into account when you were deciding to build the fire.

Lucas: Yeah, that's right. So I don't know, I'd probably have to think about this more, because this is something that I'm just sort of throwing out right now as a reaction to what we've been talking about. So I don't have a very good theory of it.

Dylan: And I don't want to say that you need to know the right answers to these things for that to be a useful direction to push people in.

Lucas: We don’t want to use different concepts to just reframe the same problem and just make a conceptual mess.

Dylan: That's what I'm a little bit concerned about, and that's the thing I'm concerned about broadly. We've got a lot of issues that we're thinking about and dealing with where we're not really sure what they are.

For me, I think one of the really helpful things has been to frame the issue that I'm thinking about as: a person has a behavior that they want to implement in the world, and it's a complex behavior that they don't know how to identify immediately. How do you actually go about building systems that allow you to implement that behavior effectively, and evaluate that the behavior has actually been correctly implemented?

Lucas: Avoiding side effects, avoiding …

Dylan: All of these kinds of things that we're concerned about in AI safety, in my mind, fall a bit more into place when we frame the problem as: I have a desired behavior that I want to exist, a response function, a policy function, that I want to implement in the world. What are the technological systems I can use to implement that in a computer or a robot or what have you?

Lucas: Okay. Well, do you have anything else you’d like to wrap up on?

Dylan: No, I just, I want to say thanks for asking hard questions and making me feel uncomfortable because I think it’s important to do a lot of that as a scientist and in particular I think as people working on AI, we should be spending a bit more time being uncomfortable and talking about these things, because it does impact what we end up doing and it does I think impact the trajectories that we put the technology on.

Lucas: Wonderful. So if people want to read about cooperative inverse reinforcement learning, where can we find the paper or other work that you have on that? What do you think are the best resources? What are just general things you’d like to point people towards in order to follow you or keep up to date with AI Alignment?

Dylan: I tweet occasionally about AI alignment and a bit about AI ethics questions; you can find me under my first initial and last name, Hadfield-Menell. And if you're interested in getting a technical introduction to value alignment, I would say take a look at the 2016 paper on cooperative IRL. If you'd like a more general introduction, there's a blog post from summer 2017 on the BAIR blog.

Lucas: All right, thanks so much Dylan, and maybe we’ll be sitting in a similar room again in two years for Beneficial Artificial Super Intelligence 2021.

Dylan: I look forward to it. Thanks a bunch.

Lucas: Thanks. See you, Dylan. If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

[end of recorded material]

Podcast: Existential Hope in 2019 and Beyond

Humanity is at a turning point. For the first time in history, we have the technology to completely obliterate ourselves. But we’ve also created boundless possibilities for all life that could enable  just about any brilliant future we can imagine. Humanity could erase itself with a nuclear war or a poorly designed AI, or we could colonize space and expand life throughout the universe: As a species, our future has never been more open-ended.

The potential for disaster is often more visible than the potential for triumph, so as we prepare for 2019, we want to talk about existential hope, and why we should actually be more excited than ever about the future. In this podcast, Ariel talks to six experts–Anthony Aguirre, Max Tegmark, Gaia Dempsey, Allison Duettmann, Josh Clark, and Anders Sandberg–about their views on the present, the future, and the path between them.

Anthony and Max are both physics professors and cofounders of FLI. Gaia is a tech enthusiast and entrepreneur, and with her newest venture, 7th Future, she’s focusing on bringing people and organizations together to imagine and figure out how to build a better future. Allison is a researcher and program coordinator at the Foresight Institute and creator of the website existentialhope.com. Josh is cohost on the Stuff You Should Know Podcast, and he recently released a 10-part series on existential risks called The End of the World with Josh Clark. Anders is a senior researcher at the Future of Humanity Institute with a background in computational neuroscience, and for the past 20 years, he’s studied the ethics of human enhancement, existential risks, emerging technology, and life in the far future.

We hope you’ll come away feeling inspired and motivated–not just to prevent catastrophe, but to facilitate greatness.

Topics discussed in this episode include:

  • How technology aids us in realizing personal and societal goals.
  • FLI’s successes in 2018 and our goals for 2019.
  • Worldbuilding and how to conceptualize the future.
  • The possibility of other life in the universe and its implications for the future of humanity.
  • How we can improve as a species and strategies for doing so.
  • The importance of a shared positive vision for the future, what that vision might look like, and how a shared vision can still represent a wide enough set of values and goals to cover the billions of people alive today and in the future.
  • Existential hope and what it looks like now and far into the future.

You can listen to the podcast above, or read the full transcript below.

Ariel: Hi everyone. Welcome back to the FLI podcast. I’m your host, Ariel Conn, and I am truly excited to bring you today’s show. This month, we’re departing from our standard two-guest interview format because we wanted to tackle a big and fantastic topic for the end of the year that would require insight from a few extra people. It may seem as if we at FLI spend a lot of our time worrying about existential risks, but it’s helpful to remember that we don’t do this because we think the world will end tragically: We address issues relating to existential risks because we’re so confident that if we can overcome these threats, we can achieve a future greater than any of us can imagine.

And so, as we end 2018 and look toward 2019, we want to focus on a message of hope, a message of existential hope.

I’m delighted to present Anthony Aguirre, Max Tegmark, Gaia Dempsey, Allison Duettmann, Josh Clark and Anders Sandberg, all of whom were kind enough to come on the show and talk about why they’re so hopeful for the future and just how amazing that future could be.

Anthony and Max are both physics professors and cofounders of FLI. Gaia is a tech enthusiast and entrepreneur, and with her newest venture, 7th Future, she’s focusing on bringing people and organizations together to imagine and figure out how to build a better future. Allison is a researcher and program coordinator at the Foresight Institute and she created the website existentialhope.com. Josh is cohost on the Stuff You Should Know Podcast, and he recently released a 10-part series on existential risks called The End of the World with Josh Clark. Anders is a senior researcher at the Future of Humanity Institute with a background in computational neuroscience, and for the past 20 years, he’s studied the ethics of human enhancement, existential risks, emerging technology, and life in the far future.

Over the course of a few days, I interviewed all six of our guests, and I have to say, it had an incredibly powerful and positive impact on my psyche. We’ve merged these interviews together for you here, and I hope you’ll all also walk away feeling a bit more hope for humanity’s collective future, whatever that might be.

But before we go too far into the future, let’s start with Anthony and Max, who can talk a bit about where we are today.

Anthony: I’m Anthony Aguirre, I’m one of the founders of the Future of Life Institute. And in my day job, I’m a Physicist at the University of California at Santa Cruz.

Max: I am Max Tegmark, a professor doing physics and AI research here at MIT, and also the president of the Future of Life Institute.

Ariel: All right. Thank you so much for joining us today. I’m going to start with sort of a big question. That is, do you think we can use technology to solve today’s problems?

Anthony: I think we can use technology to solve any problem in the sense that I think technology is an extension of our capability: it’s something that we develop in order to accomplish our goals and to bring our will into fruition. So, sort of by definition, when we have goals that we want to do — problems that we want to solve — technology should in principle be part of the solution.

Max: Take, for example, poverty. It’s not like we don’t have the technology right now to eliminate poverty. But we’re steering the technology in such a way that there are people who starve to death, and even in America there are a lot of children who just don’t get enough to eat, through no fault of their own.

Anthony: So I’m broadly optimistic that, as it has over and over again, technology will let us do things that we want to do better than we were previously able to do them. Now, that being said, there are things that are more amenable to better technology, and things that are less amenable. And there are technologies that tend to, rather than functioning as kind of an extension of our will, will take on a bit of a life of their own. If you think about technologies like medicine, or good farming techniques, those tend to be sort of overall beneficial and really are kind of accomplishing purposes that we set. You know, we want to be more healthy, we want to be better fed, we build the technology and it happens. On the other hand, there are obviously technologies that are just as useful or even more useful for negative purposes — socially negative or things that most people agree are negative things: landmines, for example, as opposed to vaccines. These technologies come into being because somebody is trying to accomplish their purpose — defending their country against an invading force, say — but once that technology exists, it’s kind of something that is easily used for ill purposes.

Max: Technology simply empowers us to do good things or bad things. Technology isn’t evil, but it’s also not good. It’s morally neutral. Right? You can use fire to warm up your home in the winter or to burn down your neighbor’s house. We have to figure out how to steer it and where we want to go with it. I feel that there’s been so much focus on just making our tech powerful right now — because that makes money, and it’s cool — that we’ve neglected the steering and the destination quite a bit. And in fact, I see the core goal of the Future of Life Institute: Help bring back focus on the steering of our technology and the destination.

Anthony: There are also technologies that are really tricky in that they give us what we think we want, but then we sort of regret having later, like addictive drugs, or gambling, or cheap sugary foods, or-

Ariel: Social media.

Anthony: … certain online platforms that will go unnamed. We feel like this is what we want to do at the time; We choose to do it. We choose to eat the huge sugary thing, or to spend some time surfing the web. But later, with a different perspective maybe, we look back and say, “Boy, I could’ve used those calories, or minutes, or whatever, better.” So who’s right? Is it the person at the time who’s choosing to eat or play or whatever? Or is it the person later who’s deciding, “Yeah, that wasn’t a good use of my time or not.” Those technologies I think are very tricky, because in some sense they’re giving us what we want. So we reward them, we buy them, we spend money, the industries develop, the technologies have money behind them. At the same time, it’s not clear that they make us happier.

So I think there are certain social problems, and problems in general, that technology will be tremendously helpful in improving as long as we can act to sort of wisely try to balance the effects of technology that have dual use toward the positive, and as long as we can somehow get some perspective on what to do about these technologies that take on a life of their own, and tend to make us less happy, even though we dump lots of time and money into them.

Ariel: This sort of idea of technologies — that we’re using them and as we use them we think they make us happy and then in the long run we sort of question that — is this a relatively modern problem, or are there examples of anything that goes further back that we can learn from from history?

Anthony: I think it goes fairly far back. Certainly drug use goes a fair ways back. I think there have been periods where drugs were used as part of religious or social ceremonies and in other kind of more socially constructive ways. But then, it’s been a fair amount of time where opiates and very addictive things have existed also. Those have certainly caused social problems back at least a few centuries.

I think a lot of these examples of technologies that give us what we seem to want but not really what we want are ones in which we’re applying the technology to a species — us — that developed in a very different set of circumstances, and that contrast between what’s available and what we evolutionarily wanted is causing a lot of problems. The sugary foods are an obvious example where we can now just supply huge plenitudes of something that was very rare and precious back in more evolutionary times — you know, sweet calories.

Drugs are something similar. We have a set of chemistry that helps us out in various situations, and then we’re just feeding those same chemical pathways to make ourselves feel good in a way that is destructive. And violence might be something similar. Violent technologies go way, way back. Those are another one that are clearly things that we want to invent to further our will and accomplish our goals. They’re also things that may at some level be addictive to humans. I think it’s not entirely clear exactly how — there’s a strange mix there, but I think there’s certainly something compelling and built into at least many humans’ DNA that promotes fighting and hunting and all kinds of things that were evolutionarily useful way back when and perhaps less useful now. It had a clear evolutionary purpose with tribes that had to defend themselves, with animals that needed to be killed for food. But feeding that desire to run around and hunt and shoot people, which most people aren’t doing in real life, but tons of people are doing in video games. So there’s clearly some built in mechanism that’s rewarding that behavior as being fun to do and compelling. Video games are obviously a better way to express that than running around and doing it in real life, but it tells you something about some circuitry that is still there and is left over from early times. So I think there are a number of examples like that — this connection between our biological evolutionary history and what technology makes available in large quantities — where we really have to think carefully about how we want to play that.

Ariel: So, as you look forward to the future, and sort of considering some of these issues that you’ve brought up, how do you envision us being able to use technology for good and maybe try to overcome some of these issues? I mean, maybe it is good if we’ve got people playing video games instead of going around shooting people in real life.

Anthony: Yeah. So there may be examples where some of that technology can fulfill a need in a less destructive way than it might otherwise be. I think there are also plenty of examples where a technology can root out or sort of change the nature of a problem that would be enormously difficult to do something about without a technology. So for example, I think eating meat, when you analyze it from almost any perspective, is a pretty destructive thing for humanity to be doing. Ecologically, ethically in terms of the happiness of the animals, health-wise: so many things are destructive about it. And yet, you really have the sense that it’s going to be enormously difficult — it would be very unlikely for that to change wholesale on a relatively short period of time.

However, there are technologies — clean meat, cultured meat, really good tasting vegetarian meat substitutes — that are rapidly coming to market. And you could imagine if those things were to get cheap and widely available and perhaps a little bit healthier, that could dramatically change that situation relatively quickly. I think if a non-ecologically destructive, non-suffering inducing, just as tasty and even healthier product were cheaper, I don’t think people would be eating meat. Very few people actually like, I think, intrinsically the idea of having an animal suffer in order for them to eat. So I think that’s an example of something that would be really, really hard to change through just social actions. Could be jump started quite a lot by technology — that’s one of the ones I’m actually quite hopeful about.

Global warming I think is a similar one — it’s on some level a social and economic problem. It’s a long-term planning problem, which we’re very bad at. It’s pretty clear how to solve the global warming issue if we really could think on the right time scales and weigh the economic costs and benefits over decades — it’d be quite clear that mitigating global warming now and doing things about it now might take some overall investment that would clearly pay itself off. But we seem unable to accomplish that.

On the other hand, you could easily imagine a really cheap, really power-dense, quickly rechargeable battery being invented and just utterly transforming that problem into a much, much more tractable one. Or feasible, small-scale nuclear fusion power generation that was cheap. You can imagine technologies that would just make that problem so much easier, even though it is ultimately kind of a social or political problem that could be solved. The technology would just make it dramatically easier to do that.

Ariel: Excellent. And so thinking more hopefully — even when we’re looking at what’s happening in the world today, news is usually focusing on all the bad things that have gone wrong — when you look around the world today, what do you think, “Wow, technology has really helped us achieve this, and this is super exciting?”

Max: Almost everything I love about today is the result of technology. It's because of technology that we've more than doubled the lifespan that we humans had for most of human history. More broadly, I feel that technology is empowering us. Ten thousand years ago, we felt really, really powerless; we were these beings, you know, looking at this great world out there and having very little clue about how it worked (it was largely mysterious to us) and even less ability to actually influence the world in a major way. Then technology enabled science, and vice versa. So the sciences let us understand more and more how the world works, and let us build this technology which lets us shape the world to better suit us: helping produce much better, much more food, helping keep us warm in the winter, helping make hospitals that can take care of us, and schools that can educate us, and so on.

Ariel: Let’s bring on some of our other guests now. We’ll turn first to Gaia Dempsey. How do you envision technology being used for good?

Gaia: That’s a huge question.

Ariel: It is. Yes.

Gaia: I mean, at its essence I think technology really just means a tool. It means a new way of doing something. Tools can be used to do a lot of good — making our lives easier, saving us time, helping us become more of who we want to be. And I think technology is best used when it supports our individual development in the direction that we actually want to go — when it supports our deeper interests and not just the, say, commercial interests of the company that made it. And I think in order for that to happen, we need for our society to be more literate in technology. And to me that’s not just about understanding how computing platforms work, but also understanding the impact that tools have on us as human beings. Because they don’t just shape our behavior, they actually shape our minds and how we think.

So I think we need to be very intentional about the tools that we choose to use in our own lives, and also the tools that we build as technologists. I’ve always been very inspired by Douglas Engelbart’s work, and I think that — I was revisiting his original conceptual framework on augmenting human intelligence, which he wrote and published in 1962 — and I really think he had the right idea, which is that tools used by human beings don’t exist in a vacuum. They exist in a coherent system and that system involves language: the language that we use to describe the tools and understand how we’re using them; the methodology; and of course the training and education around how we learn to use those tools. And I think that as a tool maker it’s really important to think about each of those pieces of an overarching coherent system, and imagine how they’re all going to work together and fit into an individual’s life and beyond: you know, the level of a community and a society.

Ariel: I want to expand on some of this just a little bit. You mentioned this idea of making sure that the tool, the technology tool, is being used for people and not just for the benefit, the profit, of the company. And that that's closely connected to making sure that people are literate about the technology. One, just to confirm that that is actually what you were saying. And, two, one of the reasons I want to confirm this is because that is my own concern: that it's too focused on making profit and not enough people really understand what's happening. My question to you is, then, how do we educate people? How do we get them more involved?

Gaia: I think for me, my favorite types of tools are the kinds of tools that support us in developing our thinking and that help us accelerate our ability to learn. But I think that some of how we do this in our society is not just about creating new tools or getting trained on new tools, but really doesn’t have very much to do with technology at all. And that’s in our education system, teaching critical thinking. And teaching, starting at a young age, to not just accept information that is given to you wholesale, but really to examine the motivations and intentions and interests of the creator of that information, and the distributor of that information. And I think these are really just basic tools that we need as citizens in a technological society and in a democracy.

Ariel: That actually moves nicely to another question that I have. Well, I actually think the sentiment might be not quite as strong as it once was, but I do still hear a lot of people who sort of approach technology as the solution to any of today’s problems. And I’m personally a little bit skeptical that we can only use technology. I think, again, it comes back to what you were talking about with it’s a tool so we can use it, but I think it just seems like there’s more that needs to be involved. I guess, how do you envision using technology as a tool, and still incorporating some of these other aspects like teaching critical thinking?

Gaia: You’re really hitting on sort of the core questions that are fundamental to creating the kind of society that we want to live in. And I think that we would do well to spend more time thinking deeply about these questions. I think technology can do really incredible, tremendous things in helping us solve problems and create new capabilities. But it also creates a new set of problems for us to engage with.

We’ve sort of coevolved with our technology. So it’s easy to point to things in the culture and say, “Well, this never would have happened without technology X.” And I think that’s true for things that are both good and bad. I think, again, it’s about taking a step back and taking a broader view, and really not just teaching critical thinking and critical analysis, but also systems level thinking. And understanding that we ourselves are complex systems, and we’re not perfect in the way that we perceive reality — we have cognitive biases, we cannot necessarily always trust our own perceptions. And I think that’s a lifelong piece of work that everyone can engage with, which is really about understanding yourself first. This is something that Yuval Noah Harari talked about in a couple of his recent books and articles that he’s been writing, which is: if we don’t do the work to really understand ourselves first and our own motivations and interests, and sort of where we want to go in the world, we’re much more easily co-opted and hackable by systems that are external to us.

There are many examples of recommendation algorithms and sentiment analysis — audience segmentation tools that companies are using to be able to predict what we want and present that information to us before we’ve had a chance to imagine that that is something we could want. And while that’s potentially useful and lucrative for marketers, the question is what happens when those tools are then utilized not just to sell us a better toothbrush on Amazon, but when it’s actually used in a political context. And so with the advent of these vast machine learning, reinforcement learning systems that can look at data and look at our behavior patterns and understand trends in our behavior and our interests, that presents a really huge issue if we are not ourselves able to pause and create a gap, and create a space between the information that’s being presented to us within the systems that we’re utilizing and really our own internal compass.

Ariel: You’ve said two things that I think are sort of interesting, especially when they’re brought together. And the first is this idea that we’ve coevolved with technology — which, I actually hadn’t thought of it in that phrase before, and I think it’s a really, really good description. But then when we consider that we’ve coevolved with technology, what does that mean in terms of knowing ourselves? And especially knowing ourselves as our biological bodies, and our limiting cognitive biases? I don’t know if that’s something that you’ve thought about much, but I think that combination of ideas is an interesting one.

Gaia: I mean, I know that I certainly already feel like I’m a cyborg. Part of knowing myself is — it does involve understanding the tools that I use, that feel that they are extensions of myself. That kind of comes back to the idea of technology literacy, and systems literacy, and being intentional about the kinds of tools that I want to use. For me, my favorite types of tools are the kind that I think are very rare: the kind that support us developing the capacity for long-term thinking, and for being true to the long-term intentions and goals that I set for myself.

Ariel: Can you give some examples of those?

Gaia: Yeah, I’ll give a couple examples. One example that’s sort of probably familiar to a lot of people listening to this comes from the book Ready Player One. And in this book the main character is interacting with his VR system that he sort of lives and breathes in every single day. And at a certain point the system asks him: do you want to activate your health module? I forgot exactly what it was called. And without giving it too much thought, he kind of goes, “Sure. Yeah, I’d like to be healthier.” And it instantiates a process whereby he’s not allowed to log into the OASIS without going through his exercise routine every morning. To me, what’s happening there is: there is a choice.

And it’s an interesting system design because he didn’t actually do that much deep thinking about, “Oh yeah, this is a choice I really want to commit to.” But the system is sort of saying, “We’re thinking through the way that your decision making process works, and we think that this is something you really do want to consider. And we think that you’re going to need about three months before you make a final decision as to whether this is something you want to continue with.”

So that three month period or whatever, and I believe it was three months in the book, is what’s known as an akrasia horizon. Which is a term that I learned through a different tool that is sort of a real life version of that, which is called Beeminder. And the akrasia horizon is, really, it’s a time period that’s long enough that it will sort of circumvent a cognitive bias that we have to really prioritize the near term at the expense of the future. And in the case of the Ready Player One example, the near term desire that he would have that would circumvent the future — his long-term health — is, “I don’t feel like working out today. I just want to get into my email or I just want to play a video game right now.” And a very similar sort of setup is created in this tool Beeminder, which I love to use to support some goals that I want to make sure I’m really very motivated to meet.

So it’s a tool where you can put in your goals and track them, either yourself by entering the data manually, or by connecting to a number of different tracking capabilities like RescueTime and others. And if you don’t stay on track with your goals, they charge your credit card. It’s a very effective motivating force. And so I sort of have a nickname for these systems: I call them time bridges, which are really choices made by your long-term-thinking self that in some way supersede the gravitational pull toward mediocrity inherent in your short-term impulses.
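To make the mechanics concrete, here is a minimal sketch in Python of a commitment device with an akrasia horizon, in the spirit of the tool described above. This is not Beeminder’s actual code or API; the class, the default seven-day horizon, and the pledge amounts are illustrative assumptions.

```python
from datetime import date, timedelta

class CommitmentGoal:
    """A toy commitment device: log progress toward a daily rate, and any request
    to make the goal easier only takes effect after an 'akrasia horizon' delay."""

    def __init__(self, units_per_day, pledge_dollars, horizon_days=7):
        self.rate = units_per_day          # committed progress per day
        self.pledge = pledge_dollars       # amount "charged" if you derail
        self.horizon = timedelta(days=horizon_days)
        self.pending_changes = []          # (effective_date, new_rate)
        self.total = 0.0
        self.start = date.today()

    def log(self, amount):
        # Progress can be entered manually or pulled in from a tracker.
        self.total += amount

    def request_easier_rate(self, new_rate, today=None):
        # The change is queued; it only applies once the horizon has passed.
        today = today or date.today()
        self.pending_changes.append((today + self.horizon, new_rate))

    def check(self, today=None):
        today = today or date.today()
        # Apply queued changes whose waiting period has elapsed.
        still_pending = []
        for effective, new_rate in self.pending_changes:
            if today >= effective:
                self.rate = new_rate
            else:
                still_pending.append((effective, new_rate))
        self.pending_changes = still_pending
        days_elapsed = max((today - self.start).days, 1)
        if self.total < self.rate * days_elapsed:
            return f"Derailed: charging ${self.pledge}"
        return "On track"

goal = CommitmentGoal(units_per_day=1.0, pledge_dollars=5)
goal.log(0.5)                   # only half of today's committed amount
print(goal.check())             # "Derailed: charging $5"
goal.request_easier_rate(0.25)  # weakening the goal only binds after the horizon
```

The key design choice mirrors the akrasia horizon: the short-term self can ask for an easier goal at any time, but the change only takes effect after a delay chosen in advance by the long-term self.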

It’s about experimenting too. And this is one particular system that creates consequences and accountability. And I love systems. For me, if I don’t have systems in my life that help me organize the work that I want to do, I’m hopeless. That’s why I’m sort of an avid taster of different systems: I’ll try anything, and really collect and see what works. And I think that’s important. It’s a process of experimentation to see what works for you.

Ariel: Let’s turn to Allison Duettmann now, for her take on how we can use technology to help us become better versions of ourselves and to improve our societal interactions.

Allison: I think there are a lot of technological tools that we can use to aid our reasoning and sense making and coordination. So I think that technologies can be used to help with reasoning, for example, by mitigating trauma, or bias, or by augmenting our intelligence. That’s the whole point of creating AI in the first place. Technologies can also be used to help with collective sense-making, for example with truth-finding and knowledge management, and I think your hypertexts and prediction markets — something that Anthony’s working on — are really worthy examples here. I also think technologies can be used to help with coordination. Mark Miller, who I’m currently writing a book with, likes to say that if you lower the risks of cooperation, you’ll get a more cooperative world. I think that most cooperative interactions may soon be digital.

Ariel: That’s sort of an interesting idea, that there’s risks to cooperation. Can you maybe expand on that a little bit more?

Allison: Yeah, sure. I think that most of our interactions are already digital ones, for some of us at least, and they will be more and more so in the future. So I think that the first step to lowering the risk of cooperation is establishing cybersecurity, because this would decrease the risk of digital coercion. But I do think that’s only part of it, because rather than just freeing us from the restraints that keep us from cooperating, we also need to equip ourselves with the tools to cooperate, right?

Ariel: Yes.

Allison: I think some of those may be smart contracts to allow individuals to credibly commit, but there may be others too. I just think that we have to realize that the same technologies that we’re worried about in terms of risks are also the ones that may augment our abilities to decrease those risks.

Ariel: One of the things that came to mind as you were talking about this, using technology to improve cooperation — when we look at the world today, technology isn’t spread across the globe evenly. People don’t have equal access to these tools that could help. Do you have ideas for how we address various inequality issues, I guess?

Allison: I think inequality is a hot topic to address. I’m currently writing a book with Mark Miller and Christine Peterson on a few strategies to strengthen civilization. In this book we outline a few paths to do so, but also potential positive outcomes. One of the outcomes that we’re outlining is a voluntary world in which all entities can cooperate freely with each other to realize their interests. It’s kind of based on the premise that finding one utopia that works for everyone is hard, and perhaps impossible, but that in the absence of knowing what’s in everyone’s interest, we shouldn’t try to impose any interests by one entity — whether that’s an AI or an organization or a state — but we should try to create a framework in which different entities, with different interests, whether they’re human or artificial, can pursue their interests freely by cooperating. And I think if you look at that strategy, it has worked pretty well so far. If you look at society right now, it’s really not perfect, but by allowing humans to cooperate freely and engage in mutually beneficial relationships, civilization already serves our interests quite well. It’s not perfect by far, I’m not saying that, but I think as a whole, our civilization at least tends, imperfectly, to plan for Pareto-preferred paths. We have survived so far, and in better and better ways.

So a few ways that we propose to strengthen this highly involved process is by proposing kind of general recommendations for solving coordination problems, and then a few more specific ideas on reframing a few risks. But I do think that enabling a voluntary world in which different entities can cooperate freely with each other is the best we can do, given our limited knowledge of what is in everyone’s interests.

Ariel: I find that interesting, because I hear lots of people focus on how great intelligence is, and intelligence is great, but it does often seem — and I hear other people say this — that cooperation is also one of the things that our species has gotten right. We fail at it sometimes, but it’s been one of the things, I think, that’s helped.

Allison: Yeah, I agree. I hosted an event last year at the Internet Archive on different definitions of intelligence, because in the paper that we wrote last year, we have this very grand, or broad, conception of intelligence, which includes civilization as an intelligence. So I think you may be asking yourself the question of what it means to be intelligent, and if what we care about is problem-solving ability, then I think that civilization certainly qualifies as a system that can solve more problems than any individual within it alone. So I do think this is part of the cooperative nature of the individual parts within civilization, and I don’t think that cooperation and intelligence are mutually exclusive at all. Marvin Minsky wrote this amazing book, Society of Mind, and much of it contains similar ideas.

Ariel: I’d like to take this idea and turn it around, and this is a question specifically for Max and Anthony: looking back at this past year, how has FLI helped foster cooperation and public engagement surrounding the issues we’re concerned about? What would you say were FLI’s greatest successes in 2018?

Anthony: Let’s see, 2018. What I’ve personally enjoyed the most, I would say, is seeing the technical research and nonprofit community really starting to get more engaged with state and federal governments. So for example the Asilomar principles — which were generated at this nexus of business and nonprofit and academic thinkers about AI and related things — I think were great. But that conversation didn’t really include much from people in policy, and governance, and governments, and so on. So, starting to see that thinking, and those recommendations, and those aspirations of the community of people who know about AI and are thinking hard about it and what it should and shouldn’t do — seeing that start to come into the political sphere, and the government sphere, and the policy sphere, I think is really encouraging.

That seems to be happening in many places at some level. I think the local one that I’m excited about is the passage by the California legislature of a resolution endorsing the Asilomar principles. It felt really good to see that happen, and it was really encouraging that there were people in the legislature who cared: we didn’t go and lobby them to do that, they came to us and said, “This is really important. We want to do something.” And we worked with them to do that. That was super encouraging, because it really made it feel like there is a really open door, and there’s a desire in the policy world to do something. This thing is getting on people’s radar, that there’s a huge transformation coming from AI.

They see that their responsibility is to do something about that. They don’t intrinsically know what they should be doing; they’re not experts in AI, and they haven’t been following the field. So there needs to be that connection, and it’s really encouraging to see how open they are and how much can be produced with, honestly, not a huge level of effort: just communication and talking through things, I think, made a significant impact. I was also happy to see how much support there continues to be for controlling the possibility of lethal autonomous weapons.

The thing we’ve done this year, the lethal autonomous weapons pledge, I felt really good about the success of. So this was an idea for anybody who’s interested, but especially companies who are engaged in developing related technologies, drones, or facial recognition, or robotics, or AI in general — the idea was to get them to take that step themselves of saying, “No, we want to develop these technologies for good, and we have no interest in developing things that are going to be weaponized and used in lethal autonomous weapons.”

I think having a large number of people and corporations sign on to a pledge like that is useful not so much because they were planning to do all those things and now they signed a pledge, so they’re not going to do it anymore. I think that’s not really the model so much as it’s creating a social and cultural norm that these are things that people just don’t want to have anything to do with, just like biotech companies don’t really want to be developing biological weapons, they want to be seen as forces for good that are building medicines and therapies and treatments and things. Everybody is happy for biotech companies to be doing those things.

If biotech companies were building biological weapons also, you really start to wonder, “Okay, wait a minute, why are we supporting this? What are they doing with my information? What are they doing with all this genetic data that they’re getting? What are they doing with the research that’s funded by the government? Do we really want to be supporting this?” So keeping that distinction in the industry between all the things that we all support — better technologies for helping people — versus the military applications, particularly in this rather destabilizing and destructive way: I think that is more the purpose of the pledge — to really make clear that there are companies that are going to develop weapons for the military, and that’s part of the reality of the world.

We have militaries; we need, at the moment, militaries. I think I certainly would not advocate that the US should stop defending itself, or shouldn’t develop weapons, and I think it’s good that there are companies that are building those things. But there are very tricky issues when the companies building military weapons are the same companies that are handling all of the data of all of the people in the world or in the country. I think that really requires a lot of thought: how we’re going to handle it. And seeing companies engage with those questions, and thinking about how the technologies that we’re developing are going to be used, for what purposes, and what purposes we do not want them to be used for, is really, really heartening. It’s been very positive, I think, to see those sorts of conversations go on, at least in certain companies, with our pledge or just in other ways.

You know, seeing companies come out with, “This is something that we’re really worried about. We’re developing these technologies, but we see that there could be major problems with them.” That’s very encouraging. I don’t think it’s necessarily a substitute for something happening at the regulatory or policy level, I think that’s probably necessary too, but it’s hugely encouraging to see companies being proactive about thinking about the societal and ethical implications of the technologies they’re developing.

Max: There are four things I’m quite excited about. One of them is that we managed to get so many leading companies and AI researchers and universities to pledge to not build lethal autonomous weapons, also known as killer robots. Second is that we were able to channel two million dollars, thanks to Elon Musk, to 10 research groups around the world to help figure out how to make artificial general intelligence safe and beneficial. Third is that the state of California decided to officially endorse the 23 Asilomar Principles. It’s really cool that these are getting more taken seriously now, even by policy makers. And the fourth is that we were able to track down the children of Stanislav Petrov in Russia, thanks to whom this year is not the 35th anniversary year of World War III, and actually give them the appreciation we feel that they deserve.

I’ll tell you a little more about this one because it’s something I think a lot of people still aren’t that aware of. But September 26th, 35 years ago, Stanislav Petrov was on shift and in charge of his Soviet early warning station, which showed five US nuclear missiles incoming, one after the other. Obviously not what he was hoping would happen at work that day, and a really horribly scary situation where the natural response is to do what that system was built for: namely, warning the Soviet Union so that they would immediately strike back. And if that had happened, then thousands of mushroom clouds later, you know, you and I, Ariel, would probably not be having this conversation. Instead, he, mostly on gut instinct, came to the conclusion that there was something wrong and said, “This is a false alarm.” And we’re incredibly grateful for that level-headed action of his. He passed away recently.

His two children are living on very modest means outside of Moscow, and we felt that when someone does something like this, or in his case abstains from doing something, that future generations really appreciate, we should show our appreciation, so that others in his situation later on know that if they sacrifice themselves for the greater good, they will be appreciated. Or if they’re dead, their loved ones will. So we organized a ceremony in New York City, invited them to it, bought air tickets for them, and so on. And in a very darkly humorous illustration of how screwed up relations are at the global level now, the US decided that the way to show appreciation for the US not having gotten nuked was to deny a visa to Stanislav’s son. So he could only join by Skype. Fortunately, his daughter was able to get a visa, even though the waiting period just to get a visa appointment in Moscow was 300 days. We had to fly her to Israel to get her the visa.

But she came, and it was her first time ever outside of Russia. She was super excited to come and see New York. It was very touching for me to see all the affection that the New Yorkers there showed her, and to see her reaction and her husband’s reaction, and to get to give her this $50,000 award, which for them was actually a big deal, although it’s of course nothing compared to the value for the rest of the world of what their father did. And it was a very sobering reminder that we’ve had dozens of near misses where we almost had a nuclear war by mistake. And even though the newspapers usually make us worry about North Korea and Iran, of course by far the most likely way in which we might get killed by a nuclear explosion is because of another stupid malfunction or error causing the US and Russia to start a war by mistake.

I hope that this ceremony, and the one we did the year before for the family of Vasili Arkhipov, can also help to remind people that hey, you know, what we’re doing here, having 14,000 hydrogen bombs and just relying on luck year after year, isn’t a sustainable long-term strategy, and we should get our act together and reduce nuclear arsenals down to the level needed for deterrence and focus our money on more productive things.

Ariel: So I wanted to just add a quick follow-up to that, because I had the privilege of attending the ceremony and I got to meet the Petrovs. And one of the things that I found most touching about meeting them was their own reaction to New York, which was in part just awe at the freedom that they felt. And I think, especially, this is sort of a US-centric version of hope, but it’s easy for us to get distracted by how bad things are because of what we see in the news, and it was a really nice reminder of how good things are too.

Max: Yeah. It’s very helpful to see things through other people’s eyes and in many cases, it’s a reminder of how much we have to lose if we screw up.

Ariel: Yeah.

Max: And how much we have that we should be really grateful for and cherish and preserve. It’s even more striking if you just look at the whole planet, you know, in a broader perspective. It’s a fantastic, fantastic place, this planet. There’s nothing else in the solar system even remotely this nice. So I think we have a lot to win if we can take good care of it and not ruin it. And obviously, the quickest way to ruin it would be to have an accidental nuclear war, which — it would be just by far the most ridiculously pathetic thing humans have ever done, and yet, this isn’t even really a major election issue. Most people don’t think about it. Most people don’t talk about it. This is, of course, the reason that we, with the Future of Life Institute, try to keep focusing on the importance of positive uses of technology, whether it be nuclear technology, AI technology, or biotechnology, because if we use it wisely, we can create such an awesome future, like you said: Take the good things we have, make them even better.

Ariel: So this seems like a good moment to introduce another guest, who just did a whole podcast series exploring existential risks relating to AI, biotech, nanotech, and all of the other technologies that could either destroy society or help us achieve incredible advances if we use them right.

Josh: I’m Josh Clark. I’m a podcaster. And I’m the host of a podcast series called the End of the World with Josh Clark.

Ariel: All right. I am really excited to have you on the show today because I listened to all of the End of the World. And it was great. It was a really, really wonderful introduction to existential risks.

Josh: Thank you.

Ariel: I highly recommend it to anyone who hasn’t listened to it. But now that you’ve just done this whole series about how things can go horribly wrong, I thought it would be fun to bring you on and talk about what you’re still hopeful for after having just done that whole series.

Josh: Yeah, I’d love that, because a lot of people are hesitant to listen to the series because they’re like, well, “it’s got to be such a downer.” And I mean, it is heavy and it is kind of a downer, but there’s also a lot of hope that just kind of emerged naturally from the series just researching this stuff. There is a lot of hope — it’s pretty cool.

Ariel: That’s good. That’s exactly what I want to hear. What prompted you to do that series, The End of the World?

Josh: Originally, it was just intellectual curiosity. I ran across a Bostrom paper in like 2005 or 6, my first one, and just immediately became enamored with the stuff he was talking about — it’s just baldly interesting. Like anyone who hears about this stuff can’t help but be interested in it. And so originally, the point of the podcast was, “Hey, everybody come check this out. Isn’t this interesting? There’s like, people actually thinking about this kind of stuff and talking about it.” And then as I started to interview some of the guys at the Future of Humanity Institute, started to read more and more papers and research further, I realized, wait, this isn’t just like, intellectually interesting. This is real stuff. We’re actually in real danger here.

And so as I was creating the series, I underwent this transition in how I saw existential risks, and then ultimately how I saw humanity’s future, how I saw humanity, other people, and I kind of came to love the world a lot more than I did before. Not like I disliked the world or people or anything like that. But I really love people way more than I did before I started out, just because I see that we’re kind of close to the edge here. And so the point of why I made the series kind of underwent this transition, and you can kind of tell in the series itself, where it’s like information, information, information, and then: now that you have bought into this, here’s how we do something about it.

Ariel: So you have two episodes that go into biotechnology and artificial intelligence, which are two areas — especially artificial intelligence — that we work on at FLI. And in them, what I thought was nice is that you do get into some of the reasons why we’re still pursuing these technologies, even though we do see these existential risks around them. And so, I was curious, as you were doing your research into the series, what did you learn about where you were like, “Wow, that’s amazing. I’m so psyched that we’re doing this, even though there are these risks.”

Josh: Basically everything I learned about. I had to learn particle physics to explain what’s going on in the Large Hadron Collider. I had to learn a lot about AI. I realized when I came into it that my grasp of AI was elementary at best. And it’s not like I could actually put together an AGI myself from scratch or anything like that now, but I definitely know a lot more than I did before. With biotech in particular, there was a lot that I learned that I found particularly jarring, like the number of accidents that are reported every year, and then, more than that, the fact that not every lab in the world has to report accidents. I found that extraordinarily unsettling.

So kind of from start to finish, I learned a lot more than I knew going into it, which is actually one of the main reasons why it took me well over a year to make the series: I would start to research something and then I’d realize I needed to understand the fundamentals of it first. So I’d go learn that, and then there’d be something else I had to learn before I could learn the thing one level up. So I kept having to kind of regressively research, and I ended up learning quite a bit of stuff.

But I think to answer your question, the thing that struck me the most was learning about physics, about particle physics, and how tenuous our understanding of our existence is, but also just how much we’ve learned in just the last century or so, when we really dove into quantum physics, particle physics, and just what we know about things. One of the things that just knocked my socks off was the idea that there’s no such thing as particles — like, particles as we think of them are basically just shorthand. But the rest of the world outside of particle physics has said, “Okay, particles: there’s protons and neutrons and all that stuff. There’s electrons. And we understand that they kind of all fit into this model, like a solar system. And that’s how atoms work.”

That is not at all how atoms work. A particle is just a pack of energetic vibrations, and everything that we experience and see and feel, and everything that goes on in the universe, is just the interaction of these energetic vibrations in force fields that are everywhere, at every point in space and time. And just to understand that on a really fundamental level changed my life, actually; it changed the way that I see the universe and myself and everything.

Ariel: I don’t even know where I want to go next with that. I’m going to come back to that because I actually think it connects really nicely to the idea of existential hope. But first I want to ask you a little bit more about this idea of getting people involved more. I mean, I’m coming at this from something of a bubble at this point where I am surrounded by people who are very familiar with the existential risks of artificial intelligence and biotechnology. But like you said, once you start looking at artificial intelligence, if you haven’t been doing it already, you suddenly realize that there’s a lot there that you don’t know.

Josh: Yeah.

Ariel: I guess I’m curious, now that you’ve done that, to what extent do you think everyone needs to? To what extent do you think that’s possible? Do you have ideas for how we can help people understand this more?

Josh: Yeah, you know, that really kind of ties into taking on existential risks in general: just being an interested, curious person who dives into the subject and learns as much as you can. But at this moment in time, as I’m sure you know, that’s easier said than done. You really have to dedicate a significant portion of your life to focusing on that one issue, whether it’s AI, or biotech, or particle physics, or nanotech, whatever. You really have to immerse yourself in it, because the existential risks that we’re facing are not a general topic of national or global conversation, and certainly not the existential risks we’re facing from all the technology that everybody’s super happy we’re coming out with.

And I think that one of the first steps to actually taking on existential risks is for more and more people to start talking about it. Groups like yours, talking to the public, educating the public. I’m hoping that my series did something like that: arousing curiosity in people, but also raising awareness that these are real things, that these aren’t crackpots talking about this stuff. These are real, legitimate issues that are coming down the pike, that are being pointed out by real, legitimate scientists and philosophers and people who have given great thought to this. This isn’t a Chicken Little situation; this is quite real. I think if you can pique someone’s curiosity just enough that they stop and listen, do a little research, it sinks in after a minute that this is real. And that, oh, this is something that they want to be a part of doing something about.

And so I think just getting people talking about that just by proxy will interest other people who hear about it, and it will spread further and further out. And I think that that’s step one, is to just make it so it’s an okay thing to talk about, so you’re not nuts to raise this kind of stuff seriously.

Ariel: Well, I definitely appreciate you doing your series for that reason. I’m hopeful that that will help a lot.

Ariel: Now, Allison — you’ve got this website where, as I understand it, you’re trying to get more people involved in this idea that if we focus on these better ideals for the future, we stand a better shot at actually hitting them.

Allison: At ExistentialHope.com, I keep a map of reading, podcasts, organizations, and people that inspire an optimistic long-term vision for the future.

Ariel: You’re clearly doing a lot to try to get more people involved. What is it that you’re trying to do now, and what do you think we all need to be doing more of to get more people thinking this way?

Allison: I do think that it’s up to everyone, really, to try to, again, engage with the fact that we may not be doomed, and with what may be on the other side. What I’m trying to do with the website, at least, is generate common knowledge to catalyze more directed coordination toward beautiful futures. I think that there are a lot of projects out there that are really dedicated to identifying the threats to human existence, but very few really offer guidance on how to influence that. So I think we should try to map the space of both peril and promise which lie before us, but we should really aim for this knowledge to empower each and every one of us to navigate toward the grand future.

For us, currently, on the website this involves orienting ourselves: collecting useful models, and relevant broadcasts, and organizations that generate new insights, and then trying to synthesize a map of where we came from, in a really kind of long perspective, and where we may go, and then which lenses of science and technology and culture are crucial to consider along the way. Then finally we would like to publish a living document that summarizes those models that are published elsewhere, to outline possible futures, and the idea is that this is a collaborative document. Even already, currently, the website links to a host of different Google docs in which we’re trying to really synthesize the current state of the art in the different focus areas. The idea is that this is collaborative. This is why it’s on Google docs, because everyone can just comment. And people do, and I think this should really be a collaborative effort.

Ariel: What are some of your favorite examples of content that, presumably, you’ve added to your website, that look at these issues?

Allison: There’s quite a host of things on there. I think a good start for people who go on the website is just to go on the overview, because there I list kind of my top-10 lists of short pieces and long pieces. But as for my personal ones, as a starting ground: I really like the metaethics sequence by Eliezer Yudkowsky. It contains some really good posts, like Existential Angst Factory and Reality as Fixed Computation. For me this is kind of like existentialism 2.0: you have to get your motivations and expectations right. What can I reasonably hope for? Then I think, relatedly, there’s also the Fun Theory sequence, also by Yudkowsky. That, together with, for example, Letter From Utopia by Nick Bostrom, or The Hedonistic Imperative by David Pearce, or the posts on Raikoth by Scott Alexander, is a really nice next step, because they actually lay out a few compelling positive versions of utopia.

Then if you want to get into the more nitty-gritty, there’s a longer section on civilization, its past and its future — so, what’s wrong and how to improve it. Here, Nick Bostrom wrote this piece on the future of human evolution, which lays out two suboptimal paths for humanity’s future, and interestingly enough they don’t involve extinction. A similar one, which probably many people are familiar with, is Scott Alexander’s Meditations On Moloch, and then some that people are less familiar with, like Growing Children For Bostrom’s Disneyland. They are really interesting because they are other pieces of this type, sketching out competitive and selective pressures that lead toward races to the bottom, as negative futures which don’t involve extinction per se. I think the really interesting thing, then, is that even those futures are only bad if you think that the bottom is bad.

Next to them I list books, for example Robin Hanson’s The Age of Em, which argues that living at subsistence may not be terrible, and in fact it’s pretty much what most of our past lives, outside of the current dreamtime, have always involved. So I think those are two really different lenses for making sense of the same reality, and I personally found this contrast so intriguing that I hosted a salon last year with Paul Christiano, Robin Hanson, Peter Eckersley, and a few others to kind of map out where we may be racing towards, so how bad those competitive equilibria actually are. I also link to those from the website.

To me it’s always interesting to map out one possible future vision, and then try to find one that either contradicts or complements it. I think having a good overview of those gives you a good map, or at least a sense of the space of possibilities.

Ariel: What do you recommend to people who are interested in trying to do more? How do you suggest they get involved?

Allison: One thing, an obvious thing, would be commenting on the Google Docs, and I really encourage everyone to do that. Another would be just to join the mailing list. You can indicate whether you want updates from me, or whether you want to collaborate, in which case we may reach out to you. Or if you’re interested in meetups, they would only be in San Francisco so far, but I’m hoping that there may be others. I do think that currently the project is really in its infancy. We are relying on the community to help with this, so it should be a kind of collaborative vision.

I think that one of the main things that I’m hoping that people can get out of it for now is just to give some inspiration on where we may end up if we get it right, and on why work toward better futures, or even work toward preventing existential risks, is both possible and necessary. If you go on the website on the first section — the vision section — that’s what that section is for.

Secondly, then, if you are already opted in, if you’re already committed, I’m hoping that perhaps the project can provide some orientation. If someone would like to help but doesn’t really know where to start, the focus areas are an attempt to map out the different areas that we need to make progress on for better futures. Each area comes with an introductory text, and organizations that are working in that area that one can join or support, and Future of Life is in a lot of those areas.

Then I think finally, just apart from inspiration or orientation, it’s really a place for collaboration. The project is in its infancy and everyone should contribute their favorite pieces to our better futures.

Ariel: I’m really excited to see what develops in the coming year for existentialhope.com. And, naturally, I also want to hear from Max and Anthony about 2019. What are you looking forward to for FLI next year?

Max: For 2019 I’m looking forward to more constructive collaboration on many aspects of this quest for a good future for everyone on Earth. At the nerdy level, I’m looking forward to more collaboration on AI safety research, and also on ways of making the economy, which keeps growing thanks to AI, actually make everybody better off rather than making some people poorer and angrier. And at the most global level, I’m really looking forward to working harder to get past this outdated us-versus-them attitude that we still have between the US and China and Russia and other major powers. Many of our political leaders are so focused on the zero-sum-game mentality that they will happily run major risks of nuclear war and AI arms races and other outcomes where everybody would lose, instead of just realizing: hey, you know, we’re actually in this together. What does it mean for America to win? It means that all Americans get better off. What does it mean for China to win? It means that the Chinese people all get better off. Those two things can obviously happen at the same time, as long as there’s peace and technology just keeps improving life for everybody.

In practice, I’m very eagerly looking forward to seeing if we can get scientists from around the world — for example, AI researchers — to converge on certain shared goals that are really supported everywhere in the world, including by political leaders and in China and the US and Russia and Europe and so on, instead of just obsessing about the differences. Instead of thinking us versus them, it’s all of us on this planet working together against the common enemy, which is our own stupidity and the tendency to make bad mistakes, so that we can harness this powerful technology to create a future where everybody wins.

Anthony: I would say I’m looking forward to more of what we’re doing now: thinking more about the futures that we do want. What exactly do those look like? Can we really think through pictures of the future that make sense to us, that are attractive, that are plausible and yet aspirational, and where we can identify things and systems and institutions that we can build now toward the aim of getting us to those futures? I think there’s been a lot of thinking so far about what major problems might arise, and I think that’s really, really important, and that project is certainly not over, and it’s not like we’ve avoided all of those pitfalls by any means. But I think it’s important not just to avoid falling into the pit, but to actually have a destination that we’d like to get to — you know, the resort at the other end of the jungle or whatever.

I find it a bit frustrating when people do what I’m doing now: they talk about talking about what we should and shouldn’t do, but they don’t actually talk about what we should and shouldn’t do. I think the time has come to actually talk about it. In the same way that when there was the first use of CRISPR in an embryo that came to term, everybody was saying, “Well, we need to talk about what we should and shouldn’t do with this. We need to talk about that, we need to talk about it.” Let’s talk about it already.

So I’m excited about upcoming events that FLI will be involved in that are explicitly thinking about: let’s talk about what that future is that we would like to have and let’s debate it, let’s have that discussion about what we do want and don’t want, try to convince each other and persuade each other of different visions for the future. I do think we’re starting to actually build those visions for what institutions and structures in the future might look like. And if we have that vision, then we can think of what are the things we need to put in place to have that.

Ariel: So one of the reasons that I wanted to bring Gaia on is because I’m working on a project with her — and it’s her project — where we’re looking at this process of what’s known as worldbuilding, to sort of look at how we can move towards a better future for all. I was hoping you could describe it, this worldbuilding project that I’m attempting to help you with, or work on with you. What is worldbuilding, and how are you modifying it for your own needs?

Gaia: Yeah. Worldbuilding is a really fascinating set of techniques. It’s a process that has its roots in narrative fiction. You can think of, for example, the entire complex world that J.R.R. Tolkien created for The Lord of the Rings series. And in more contemporary times, some spectacularly advanced worldbuilding is occurring in the gaming industry: huge connected systems of systems that underpin worlds in which millions of people today are playing, socializing, buying and selling goods, engaging in an economy. These are vast online worlds that are not just contained on paper, as in a book, but are actually embodied in software. And over the last decade, world builders have begun to formally bring these tools outside of the entertainment business, outside of narrative fiction and gaming, film and so on, and into society and communities. So I really define worldbuilding as a powerful act of creation.

And one of the reasons that it is so powerful is that it really facilitates collaborative creation. It’s a collaborative design practice. And in my personal definition of worldbuilding, the way that I’m thinking of it and using it, it unfolds in four main stages. The first stage is that we develop a foundation of shared knowledge that’s grounded in science, and research, and relevant domain expertise. The second phase builds on that foundation of knowledge: we engage in an exercise where we predict how the interconnected systems that have emerged in this knowledge database will evolve, and we imagine the state of their evolution at a specific point in the future. Then the third phase is really about capturing that state in all its complexity, and making that information useful to the people who need to interface with it. That can be in the form of interlinked databases, and particularly also in the form of visualizations, which help make these sort of abstract ideas feel more present and concrete. And then the fourth and final phase is utilizing that resulting world as a tool that can support scenario simulation, research, and development in many different areas, including public policy, media production, education, and product development.

I mentioned that these techniques are being brought outside of the realm of entertainment. So rather than just designing fantasy worlds for the sole purpose of containing narrative fiction and stories, these techniques are now being used with communities, and Fortune 500 companies, and foundations, and NGOs, and other places, to create plausible future worlds. It’s fascinating to me to see how these are being used. For example, they’re being used to reimagine the mission of an organization. They’re being used to plan for the future, and plan around a collective vision of that future. They’re very powerful for developing new strategies, new programs, and new products. And I think to me one of the most interesting things is really around informing policy work. That’s how I see worldbuilding.

Ariel: Are there any actual examples that you can give or are they proprietary?

Gaia: There are many examples that have created some really incredible outcomes. One of the first examples of worldbuilding that I ever learned about was a project that was done with a native Alaskan tribe. And the comments that came from the tribe about that experience were what really piqued my interest, because they said things like, “This enabled us to sort of leapfrog over the barriers in our current thinking and imagine possibilities that were beyond what we had considered.” This project brought together several dozen members of the community, again, to engage in this collaborative design exercise, and actually visualize and build out those systems and understand how they would be interconnected. And it ended up resulting in, I think, some really incredible things, like a partnership with MIT where they brought a digital fabrication lab onto their reservation, and created new education programs around digital design and digital fabrication for their youth. And there are a lot of other things still coming out of that particular worldbuild.

There are other examples where Fortune 500 companies are building out really detailed, long-term worldbuilds that are helping them stay relevant, and imagine how their business model is going to need to transform in order to adapt to really plausible, probable futures that are just around the corner.

Ariel: I want to switch now to what you specifically are working on. The project we’re looking at is looking roughly 20 years into the future. And you’ve sort of started walking through a couple systems yourself while we’ve been working on the project. And I thought that it might be helpful if you could sort of walk through, with us, what those steps are to help understand how this process works.

Gaia: Maybe I’ll just take a quick step back, if that’s okay and just explain the worldbuild that we’re preparing for.

Ariel: Yeah. Please do.

Gaia: This is a project called Augmented Intelligence. The first Augmented Intelligence summit is happening in March 2019. And our goal with this project is really to engage with and shift the culture, and also our mindset, about the future of artificial intelligence. And to bring together a multidisciplinary group of leaders from government, academia, and industry, and to do a worldbuild that’s focused on this idea of: what does our future world look like with advanced AI deeply integrated into it? And to go through the process of really imagining and predicting that world in a way that’s just a bit further beyond the horizon that we normally see and talk about. And that exercise, that’s really where we’re getting that training for long-term thinking, and for systems-level thinking. And the world that results — our hope is that it will allow us to develop better intuitions, to experiment, to simulate scenarios, and really to have a more attuned capacity to engage in many ways with this future. And ultimately to explore how we want to evolve our tools and our society to meet that challenge.

Gaia: What will come out of this process — it really is a generative process that will create assets and systems that are interconnected, that inhabit and embody a world. And this world should allow us to experiment, and simulate scenarios, and develop a more attuned capacity to engage with the future. And that means on both an intuitive level and also in a more formal structured way. And ultimately our goal is to use this tool to explore how we want to evolve as a society, as a community, and to allow ideas to emerge about what solutions and tools will be needed to adapt to that future. Our goal is to really bootstrap a steering mechanism that allows us to navigate more effectively toward outcomes that support human flourishing.

Ariel: I think that’s really helpful. I think an example to walk us through what that looks like would be helpful.

Gaia: Sure. You know, basically what would happen in a worldbuilding process is that you would have some constraints or some sort of seed information that you think is very likely — based on research, based on the literature, based on sort of the input that you’re getting from domain experts in that area. For example, you might say, “In the future we think that education is all going to happen in a virtual reality system that’s going to cover the planet.” Which I don’t think is actually the case, but just to give an example. You might say something like, “If this were true, then what are the implications of that?” And you would build a set of systems, because it’s very difficult to look at just one thing in isolation.

Because as soon as you start to do that — John Muir says, “As soon as you try to look at just one thing, you find that it is irreversibly connected to everything else in the universe.” And I apologize to John Muir for not getting that quote exactly correct, he says it much more eloquently than that. But the idea is there. And that’s sort of what we leverage in a worldbuilding process: where you take one idea and then you start to unravel all of the implications, and all of the interconnecting systems that would be logical, and also possible, if that thing were true. It really does depend on the quality of the inputs. And that’s something that we’re working really, really hard to make sure that our inputs are believable and plausible, but don’t put too much in terms of constraints on the process that unfolds. Because we really want to tap into the creativity in the minds of this incredible group of people that we’re gathering, and that is where the magic will happen.

Ariel: To make sure that I’m understanding this right: if we use your example of, let’s say all education was being taught virtually, I guess questions that you might ask or you might want to consider would be things like: who teaches it, who’s creating it, how do students ask questions, who would their questions be directed to? What other types of questions would crop up that we’d want to consider? Or what other considerations do you think would crop up?

Gaia: You also want to look at the infrastructure questions, right? So if that’s really something that is true all over the world, what do server farms look like in that future, and what’s the impact on the environment? Is there some complementary innovation that has happened in the field of computing that has made computing far more efficient? How have we been able to do this, given that there are certain physical limitations that just exist on our planet? If X is true in this interconnected system, then how have we shaped, and molded, and adapted everything around it to make that thing true? You can look at infrastructure, you can look at culture, you can look at behavior, you can look at, as you were saying, communication and representation in that system and who is communicating. What are the rules? I mean, I think a lot about the legal framework, and the political structure that exists around this. So who has power and agency? How are decisions made?

Ariel: I don’t know what this says about me, but I was just wondering what detention looks like in a virtual world.

Gaia: Yeah. It’s a good question. I mean, what are the incentives and what are the punishments in that society? And do our ideas of what incentives and punishments look like actually change in that context? There isn’t a place where you have to come in on a Saturday if there’s no physical schoolyard. How is detention even enforced when people can log in and out of the system at will?

Ariel: All right, now you have me wondering what recess looks like.

Gaia: So you can see that there are many different fascinating rabbit holes that you could go down. And of course our goal is to make this process genuinely useful for imagining the way that we want our policies, and our tools, and our education to evolve.

Ariel: I want to ask one more question about … Well, it’s sort of about this but there’s also a broader aspect to it. And that is, I hear a lot of talk — and I’m one of the people saying this because I think it’s absolutely true — that we need to broaden the conversation and get more diverse voices into this discussion about what we want our future to look like. But what I’m finding is that this sounds really nice in theory, but it’s incredibly hard to actually do in practice. I’m under the impression that that is some of what you’re trying to address with this project. I’m wondering if you can talk a little bit about how you envision trying to get more people involved in considering how we want our world to look in the future.

Gaia: Yeah, that’s a really important question. One of the sources of inspiration for me on this point was a conversation with Stuart Russell — an interview with Stuart Russell, I should say — that I listened to. We’ve been really fortunate, and we are thrilled that he’s one of our speakers and he’ll be involved in the worldbuilding process. He talks about this idea that the artificial intelligence researchers, the roboticists, and the technologists who are building these amplifying tools that are just increasing in potency year over year are not the only ones who need to have input into the conversation around how they’re utilized and the implications for all of us. And that’s really one of the core philosophies behind this particular project: we really want it to be a multidisciplinary group that comes together, and we’re already seeing that. We have a really wonderful set of collaborators who are thinking about ethics in this space, who are thinking about a broader definition of ethics and different cultural perspectives on ethics, and how we can create a conversation that allows space for those to simultaneously coexist.

Allison: I recently had a similar kind of question that arose in conversation, which was about: why are we lacking positive future visions so much? Why are we all kind of stuck in a snapshot of the current suboptimal macro situation? I do think it’s our inability to really think in larger terms. If you look at our individual human life, clearly for most of us, it’s pretty incredible — our ability to lead much longer and healthier lives than ever before. If we compare this to how well humans used to live, this difference is really unfathomable. I think Yuval Harari said it right, he said “You wouldn’t want to have lived 100 years ago.” I think that’s correct. On the other hand I also think that we’re not there yet.

I find it, for example, pretty peculiar that we say that we value freedom of choice in everything we do, but in the one thing that’s kind of the basis of all of our freedoms, which is our very existence, we leave ourselves to slowly deteriorate through aging, which deteriorates us and everything we value. I think that every day, aging is burning libraries. We’ve come a long way, but we’re not safe, and we are definitely not there yet. I think the same holds true for civilization at large. I think thanks to a lot of technologies our living standards have been getting better and better, and I think the decline of poverty and violence are just a few examples.

We can share knowledge much more easily, and I think everyone who’s read Enlightenment Now will be kind of tired of those graphs. But again, I also think that we’re not there yet. I think even though we have fewer wars than ever before, the ability to wipe ourselves out as a species really does exist, and in fact this ability is now available to more people; with technological maturity, it may really only take a small and well-curated group of individuals to cause havoc with catastrophic consequences. If you let that sink in, it’s really absurd that we have no emergency plan for the use of technological weapons. We have no plans to rebuild civilization. We have no plans to back up human life.

I think that current news articles take too much of a short-term view; they’re more of a snapshot. The long-term view, on the one hand, opens up this sense of, “Hey, look how far we’ve come,” but also, “Oh man. We’re here, and we’ve made it so far, but there’s no feasible plan for safety yet.” I do think we need to change that. So the long view doesn’t only hand us rosy glasses; it also brings the realization that we ought to do more, precisely because we’ve come so far.

Josh: Yeah, one of the things that makes this time so dangerous is we’re at this kind of a fork in the road, where if we go this one way, like say, with figuring out how to develop friendliness in AI, we could have this amazing, astounding future for humanity that stretches for billions and billions and billions of years. One of the things that really opened my eyes was, I always thought that the heat death of the universe will spell the end of humanity. There’s no way we’ll ever make it past that, because that’s just the cessation of everything that makes life happen, right? And we will probably have perished long before that. But let’s say we figured out a way to just make it to the last second and humanity dies at the same time the universe does. There’s still an expiration date on humanity. We still go extinct eventually. But one of the things I ran across when I was doing research for the physics episode is that the concept of growing a universe from seed, basically, in a lab is out there. It’s done. I don’t remember who came up with it. But somebody has sketched out basically how to do this.

It’s 2018. If we think 100 or 200 or 500 or a thousand years down the road and that concept can be built upon and explored, we may very well be able to grow universes from seed in laboratories. Well, when our universe starts to wind down or something goes wrong with it, or we just want to get away, we could conceivably move to another universe. And so we suddenly lose that expiration date for humanity that’s associated with the heat death of the universe, if that is how the universe goes down. And so this idea that we have a future lifetime that spans into at least the multiple billions of years — at least a billion years if we just manage to stay alive on Planet Earth and never spread out but just don’t actually kill ourselves — when you take that into account the stakes become so much higher for what we’re doing today.

Ariel: So, we’re pretty deep into this podcast, and we haven’t heard anything from Anders Sandberg yet, and this idea that Josh brought up ties in with his work. Since we’re starting to talk about imagining future technologies, let’s meet Anders.

Anders: Well, I’m delighted to be on this. I’m Anders Sandberg. I’m a senior research fellow at the Future of Humanity Institute at the University of Oxford.

Ariel: One of the things that I love, just looking at your FHI page, is that you talk about how you try to estimate the capabilities of future technology. I was hoping you could talk a little bit about what that means, what you’ve learned so far, and how one even goes about studying the capabilities of future technologies.

Anders: Yeah. It is a really interesting problem, because technology is based on ideas. As a general rule, you cannot predict what ideas people will come up with in the future, because if you could, you would already kind of have that idea. So this means that technologies that are strongly dependent on good ideas, especially, are going to be tremendously hard to predict. This is of course why artificial intelligence is a little bit of a nightmare. Similarly, biotechnology is strongly dependent on what we discover in biology, and a lot of that is tremendously weird, so again, it’s very unpredictable.

Meanwhile, other domains of life are advancing at a more sedate pace. It’s more like you incrementally improve things. So the ideas are certainly needed, but we don’t really change everything around. If you think about the slower-moving domains: microprocessors are getting better, and a lot of the improvements are small, incremental ones. Some of them require a lot of intelligence to come up with, but in the end it all sums together. It’s a lot of small things adding together. So you can see a relatively smooth development in the large.

Ariel: Okay. So what you’re saying is we don’t just have each year some major discovery, and that’s what doubles it. It’s lots of little incremental steps.

Anders: Exactly. But if you look at the performance of some software, quite often it goes up smoothly because the computers are getting better and then somebody has a brilliant idea that can do it not just in 10% less time, but maybe in 10% of the time that it would have taken. For example, the fast Fourier transform that people invented in the 60s and 70s enables the compression we use today for video and audio and enables multimedia on the internet. Without that speedup, it would not be practical to do, even with current computers. This is true for a lot of things in computing. You get a surprising insight and a problem that previously might have been impossible to do efficiently suddenly becomes quite convenient. So the problem is of course: what can we say about the abilities of future technology if these things happen?

One of the nice things you can do is you can lean on the laws of physics. There are good reasons not to think that perpetual motion machines can work, because we understand, actually, energy conservation and the laws of thermodynamics that give very strong reasons why this cannot happen. We can be pretty certain that that’s not possible. We can analyze what would then be possible if you had perpetual motion machines or faster-than-light transport and you can see that some of the consequences are really weird. But it makes you suspect that this is probably not going to happen. So that’s one way of looking at it. But you can do the reverse: You can take laws of physics and engineering that you understand really well and make fictional machines — essentially work out all the details and say “okay, I can’t build this but were I to build it, in that case what properties would it have?” If I wanted to build, let’s say, a machine made out of atoms, could I make it work? And it turns out that this is possible to do in a rigorous way, and it tells you the capabilities of machines that don’t exist yet, and that maybe we will never build, but it shows you what’s possible.

This is what Eric Drexler did for nanotechnology in the 80s and 90s. He basically worked out what would be possible if we could put atoms in the right place. He could demonstrate that this would produce machines of tremendous capability. We still haven’t built them, but he proved that these can be built — and we probably should build them because they are so effective, so environmentally friendly, and so on.

Ariel: So you gave the example of what he came up with a while back. What sort of capabilities have you come across that you thought were interesting that you’re looking forward to us someday pursuing?

Anders: I’ve been working a little bit on the questions about “is it possible to settle a large part of the universe?” I have been working out, together with my colleagues, a bit of the physical limitations of that. All in all, we found that a civilization doesn’t need to use an enormous, astronomical amount of matter and energy to settle a very large chunk of the universe. The total amount of matter corresponds to roughly a Mercury-sized planet in a solar system in each of the galaxies. Many people would say if you want to settle the universe you need an enormous spacecraft and an enormous amount of energy. It looks like you would be able to see that across half of the universe, but we could demonstrate that actually if you essentially use matter from a really big asteroid or a small planet, you can get enough solar collectors to launch small spacecraft to all the stars and all the galaxies within reach, and there you’d again use a bit of asteroid material to do it. The laws of physics allow intelligent life to spread across an enormous amount of the universe in a rather quiet way.

Ariel: So does that mean you think it’s possible that there is life out there and it’s reasonable for us not to have found it?

Anders: Yes. If we were looking at the stars, we would probably miss it if one or two stars in remote galaxies were covered with solar collectors. It’s rather easy to miss them among the hundreds of billions of other stars. This was actually the reason we did this paper: We demonstrate that much of the thinking about the Fermi paradox — that annoying question that well, there ought to be a lot of intelligent life out in the universe given how large it is and that we tend to think that it’s relatively likely yet we don’t see anything — many of those explanations are based on the possibility of colonizing just the Milky Way. In this paper, we demonstrate that actually you need to care about all the other galaxies too. In a sense, we made the Fermi paradox between a million and a billion times worse. Of course, this is all in a day’s work for us in the Philosophy Department, making everybody’s headaches bigger.

Ariel: And now it’s just up to someone else to figure out the actual way to do this technically.

Anders: Yeah, because it might actually be a good idea for us to do.

Ariel: So Josh, you’ve mentioned the future of humanity a couple of times, and humanity in the future, and now Anders has mentioned the possibility of colonizing space. I’m curious how you think that might impact humanity. How do you define humanity in the future?

Josh: I don’t know. That’s a great question. It could take any number of different routes. I think — Robin Hanson is an economist who came up with this, the great filter hypothesis, and I talked to him about that very question. His idea was that — and I’m sure it’s not just his, but it’s probably a pretty popular idea — that once we spread out from Earth and start colonizing further and further out into the galaxy, and then into the universe, we’ll undergo speciation events like, there will be multiple species of humans in the universe again, just like there was like 50,000 years ago, when we shared Earth with multiple species of humans.

The same thing is going to happen as we spread out from Earth. I mean, I guess the question is, which humans are you talking about, in what galaxy? I also think there’s a really good chance — and this could happen among multiple human species — that at least some humans will eventually shed their biological form and upload themselves into some sort of digital format. I think if you just start thinking in efficiencies, that’s just a logical conclusion to life. And then there’s any number of routes we could take and change especially as we merge more with technology or spread out from Earth and separate ourselves from one another. But I think the thing that really kind of struck me as I was learning all this stuff is that we tend to think of ourselves as the pinnacle of evolution, possibly the most intelligent life in the entire universe, right? Certainly the most intelligent on Earth, we’d like to think. But if you step back and look at all the different ways that humans can change, especially like the idea that we might become post-biological, it becomes clear that we’re just a point along a spectrum that keeps on stretching out further and further into the future than it does even into the past.

We’re just at a current situation on that point right now. We’re certainly not like the end-all be-all of evolution. And ultimately, we may take ourselves out of evolution by becoming post-biological. It’s pretty exciting to think about all the different ways that it can happen, all the different routes we can take — there doesn’t have to just be one single one either.

Ariel: Okay, so, I kind of want to go back to some of the space stuff a little bit, and Anders is the perfect person for my questions. I think one of the first things I want to ask is, very broadly, as you’re looking at these different theories about whether or not life might exist out in the universe and that it’s reasonable for us not to have found it, do you connect the possibility that there are other life forms out there with an idea of existential hope for humanity? Or does it cause you concern? Or are they just completely unrelated?

Anders: The existence of extraterrestrial intelligence: if we knew it existed, that would in some sense be hopeful, because we would know the universe allows for more than our kind of intelligence and that intelligence might survive over long spans of time. If we discovered that we’re all alone except for a lot of ruins from extinct civilizations, that would be very bad news for us. But we might also have this weird situation that we currently face, that we don’t see anybody. We don’t notice any ruins. Maybe we’re just really unique and should perhaps feel a bit proud or lucky but also responsible for a whole universe. It’s tricky. It seems like we could learn something very important if we understood how much intelligence there is out there. Generally, I have been trying to figure out: is the absence of aliens evidence for something bad? Or might it actually be evidence for something very hopeful?

Ariel: Have you concluded anything?

Anders: Generally, our conclusion has been that the absence of aliens is not surprising. We tend to think that the Fermi Paradox implies “oh, there’s something strange here.” The universe is so big and if you multiply the number of stars with some reasonable probability, you should get loads of aliens. But actually, the problem here is reasonable probability. We normally tend to think of that as something like bigger than one chance in a million or so, but actually, there is no reason the laws of physics wouldn’t put a probability that’s one in a googol. It actually turns out that we’re uncertain enough about the origin of life and the origins of intelligence and other forms of complexity that it’s not implausible that maybe we are the only life within the visible universe. So we shouldn’t be too surprised about that empty sky.

One possible reason for the great silence is that life is extremely rare. Another possibility might be that life is not rare but it’s very rare that it becomes the kind of life that evolves complex nervous systems. Another reason might be of course that once you get intelligence, well, it destroys itself relatively quickly, and Robin Hanson has called this the Great Filter. We know that one of the terms in the big equation for the number of civilizations in the universe needs to be very small; otherwise, the sky would be full of aliens. But is that one of the early terms, like the origin of life, or the origin of intelligence — or the late term, how long intelligence survives? Now, if there is an early Great Filter, this is rather good news for us. We are going to be very unique and maybe a bit lonely, but it doesn’t tell us anything dangerous about our own chances. Of course, we might still flub it and go extinct because of our own stupidity, but that’s kind of up to us rather than the laws of physics.

On the other hand, if it turns out that there is a late Great Filter, then even though we know the universe might be dangerous, we’re still likely to get wiped out — which is very scary. So, figuring out where the unlikely terms in the big equation are is actually quite important for making a guess about our own chances.

Ariel: Where are we now in terms of that?

Anders: Right now, in my opinion — I have a paper, not published yet but it’s in the review process, where we try to apply proper uncertainty calculations to this. Because many people make guesstimates about the probabilities of various things, admit that they’re guesstimates, and then get a number at the end that they also admit is a bit uncertain. But they haven’t actually done a proper uncertainty calculation, so quite a lot of these numbers become surprisingly biased. So instead of saying that maybe there’s one chance in a million that a planet develops life, you should try to have a full range of what’s the lowest probability there could be for life and what’s the highest probability, and how you think the probability is distributed between them. If you use that kind of proper uncertainty range and then multiply it all together and do the maths right, then you get the probability distribution for how many alien species there could be in the universe. Even if you’re starting out as somebody who’s relatively optimistic about the mean value of all of this, you will still find that you get a pretty big chunk of probability that we’re actually pretty alone in the Milky Way or even the observable universe.

In some sense, this is just common sense. But it’s a very nice thing to be able to quantify the common sense, and then start saying: so what happens if we for example discover that there is life on Mars? What will that tell us? How will that update things? You can use the math to calculate that, and this is what we’ve done. Similarly, if we notice that there doesn’t seem to be any alien super civilizations around the visible universe, that’s a very weak update but you can still use that to see that this updates our estimates of the probability of life and intelligence much more than the longevity of civilizations.

Mathematically this gives us a reason to think that the Great Filter might be early. The absence of life might be rather good news for us because it means that once you get intelligence, there’s no reason why it can’t persist for a long time and grow into something very flourishing. That is a really good cause of existential hope. It’s really promising, but we of course need to do our observations. We actually need to look for life, we need to look out in the sky and see. You may find alien civilizations. In the end, any amount of mathematics and armchair astrobiology, that’s always going to be disproven by any single observation.
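As a rough illustration of the uncertainty propagation Anders describes above, here is a minimal Monte Carlo sketch. The parameter ranges are purely illustrative assumptions made up for this example, not the estimates from Anders’s paper; the point is only that multiplying full uncertainty ranges, rather than point guesstimates, can leave a sizable chunk of probability on “we are alone.”

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

def log_uniform(low, high, size):
    """Sample uniformly in log-space between low and high."""
    return 10 ** rng.uniform(np.log10(low), np.log10(high), size)

# Illustrative (hypothetical) uncertainty ranges for Drake-style factors.
stars_in_galaxy = 1e11                           # rough star count for the Milky Way
f_planet = log_uniform(1e-1, 1.0, n_samples)     # fraction of stars with a suitable planet
f_life = log_uniform(1e-30, 1.0, n_samples)      # chance life arises on such a planet
f_intel = log_uniform(1e-6, 1.0, n_samples)      # chance life evolves intelligence
f_detect = log_uniform(1e-2, 1.0, n_samples)     # chance intelligence becomes detectable

expected_civs = stars_in_galaxy * f_planet * f_life * f_intel * f_detect

# Treating each sampled expectation as the mean of a Poisson distribution,
# exp(-lambda) is the probability of zero other civilizations in the galaxy.
p_alone = np.mean(np.exp(-expected_civs))
print(f"P(no other civilization in the Milky Way) ~ {p_alone:.2f}")
```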

Ariel: That comes back to a question that came to mind a bit earlier. As you’re looking at all of this stuff and especially as you’re looking at the capabilities of future technologies, once we figure out what possibly could be done, can you talk a little bit about what our limitations are today from actually doing it? How impossible is it?

Anders: Well, impossible is a really tricky word. When I hear somebody say “it’s impossible,” I immediately ask “do you mean against the laws of physics and logic” or “we will not be able to do this for the foreseeable future” or “we can’t do it within the current budget”?

Ariel: I think maybe that’s part of my question. I’m guessing a lot of these things probably are physically possible, which is why you’ve considered them, but yeah, what’s the difference between what we’re technically capable of today and what, for whatever reason, we can’t budget into our research?

Anders: We have a domain of technologies that we already have been able to construct. Some of them are maybe too expensive to be very useful. Some of them still require a bunch of grad students holding them up and patching them as they are breaking all the time, but we can kind of build them. And then there’s some technology that we are very robustly good at. We have been making cog wheels and combustion engines for decades now and we’re really good at that. Then there are the technologies for which we can do exploratory engineering to demonstrate that if we actually had cog wheels made out of pure diamond, or a Dyson shell surrounding the sun collecting energy, they could do such-and-such things.

So they don’t exist as practical engineering. You can work out blueprints for them and in some sense of course, once we have a complete enough blueprint, if you asked could you build the thing, you could do it. The problem is of course normally you need the tools and resources for that, and you need to make the tools to make the tools, and the tools to make those tools, and so on. So if we wanted to make atomically precise manufacturing today, we can’t jump straight to it. What we need to make is a tool that allows us to build things that are moving us much closer.

The Wright Brothers’ airplane was really lousy as an airplane but it was flying. It’s a demonstration, but it’s also a tool that allows you to make a slightly better tool. You would want to get through this and you’d probably want to have a roadmap and do experiments and figure out better tools to do that.

This is typically where scientists actually have to give way to engineers. Because engineers care about solving a problem rather than being the most elegant about it. In science, we want to have this beautiful explanation of how everything works; then we do experiments to test whether it’s true and refine our explanation. But in the end, the paper that gets published is going to be the one that has the most elegant understanding. In engineering, the thing that actually sells and changes the world is not going to be the most elegant thing but the most useful thing. The AK-47 is in many ways not a very precise piece of engineering but that’s the point. It should be possible to repair it in the field.

The reason our computers are working so well was we figured out the growth path where you use photolithography to etch silicon chips, and that allowed us to make a lot of them very cheaply. As we learned more and more about how to do that, they became cheaper and more capable and we developed even better ways of etching them. So in order to build molecular nanotechnology, you would need to go through a somewhat similar chain. It might be that you start out with using biology to make proteins, and then you use the proteins to make some kind of soft machinery, and then you use that soft machinery to make hard machinery, and eventually end up with something like the work of Eric Drexler.

Ariel: I actually want to step back to the present now and you mentioned computers and that we’re doing them very well. But computers are also an example — or maybe software, I suppose, is more the example — of technology that works today but often fails. Especially when we’re considering things like AI safety in the future, what should we make of the fact that we’re not designing software to be more robust? I mean, I think especially if we look at something like airplanes, which are quite robust, we can see that it could be done but we’re still choosing not to.

Anders: Yeah, nobody would want to fly with an airplane that crashed as often as a word processor.

Ariel: Exactly.

Anders: It’s true that the earliest airplanes were very crash prone — in fact most of them were probably as bad as our current software is. But the main reason we’re not making software better is that most of the time we’re not willing to pay for that quality. Also, there are some very hard engineering problems involved in engineering complexity. So making a very hard material is not easy but in some sense, it’s a straightforward problem. If, on the other hand, you have literally billions of moving pieces that all need to fit together, then it gets tricky to make sure that this always works as it should. But it can be done.

People have been working on mathematical proofs that certain pieces of software are correct and secure. It’s just that up until recently, it’s been so expensive and tough that nobody really cared to do it except maybe some military groups. Now it’s starting to become more and more essential because we’ve built our entire civilization on a lot of very complex systems that are unfortunately very insecure, very unstable, and so on. Most of the time we get around it by making backup copies and whenever a laptop crashes, well, we reboot it, swear a bit and hopefully we haven’t lost too much work.

That’s not always a bad solution — a lot of biology is like that too. Cells in our bodies are failing all the time but they’re just getting removed and replaced and then we try again. But this, of course, is not enough for certain sensitive applications. If we ever want to have brain-to-computer interfaces, we certainly want to have good security so we don’t get hacked. If we want to have very powerful AI systems, we want to make sure that their motivations are constrained in such a way that they’re helpful. We also want to make sure that they don’t get hacked or develop weird motivations or behave badly because their owners told them to behave badly. Those are very complex problems: It’s not just like engineering something that’s simply safe. You’re going to need entirely new forms of engineering for that kind of learning system.

This is something we’re learning. We haven’t been building things like software for very long and when you think about the sheer complexity of a normal operating system, even a small one running on a phone, it’s kind of astonishing that it works at all.

Allison: I think that Eliezer Yudkowsky once said that the problem of our complex civilization is its complexity. It does seem that technology is outpacing our ability to make sense of it. But I think we have to remind ourselves again of why we developed those technologies in the first place, and of the tremendous promise if we get it right. Of course, on the one hand, I think solving problems that are created by technologies — for example, existential risks, or at least some of those — requires some non-technological elements, especially human reasoning, sense-making, and coordination.

And  I’m not saying that we have to focus on one conception of the good. There are many conceptions of the good. There’s transhumanist futures, there’s cosmist futures, there’s extropian futures, and many, many more, and I think that’s fine. I don’t think we have to agree on a common conception just yet — in fact we really shouldn’t. But the point is not that we ought to settle soon, but that we have to allow into our lives again the possibility that things can be good, that good things are possible — not guaranteed, but they’re possible. I think to use technologies for good we really need a change of mindset, from pessimism to at least conditional optimism. And we need a plethora of those, right? It’s not going to be one of them.

I do think that in order to use technologies for good purposes, we really have to remind ourselves that they can be used for good, and that there are good outcomes in the first place. I genuinely think that often in our research, we put the cart before the horse in focusing solely on how catastrophic human extinction would be. I think this often misses the point that extinction is really only so bad because the potential value that could be lost is so big.

Josh: If we can just make it to this point — Nick Bostrom, whose ideas a lot of The End of the World is based on, calls it technological maturity. It’s kind of a play on something that Carl Sagan said about the point we’re at now: “technological adolescence” is what Sagan called it, which is this point where we’re starting to develop this really intense, amazingly powerful technology that will one day be able to guarantee a wonderful, amazing existence for humanity, if we can survive to the point where we’ve mastered it safely. That’s what the next hundred or 200 or maybe 300 years stretches out ahead of us. That’s the challenge that we have in front of us. If we can make it to technological maturity, if we figure out how to make an artificial generalized intelligence that is friendly to humans, that basically exists to make sure that humanity is well cared for and taken care of, there’s just no telling what we’ll be able to come up with and just how vastly improved the life of the average human would be in that situation.

We’re talking — honestly, this isn’t like some crazy, far-out, far-future idea. This is conceivably something that we could get done as humans in the next century or two or three. Even if you talk out to 1000 years, that sounds far away. But really, that’s not a very long time when you consider just how long a lifespan humanity could have stretching out ahead of it. The stakes — that almost gives me a panic attack when I think of just how close that kind of a future is for humankind and just how close to the edge we’re walking right now in developing that very same technology.

Max: The way I see the future of technology as we go towards artificial general intelligence, and perhaps beyond — it could totally make life the master of its own destiny, which makes this a very important time to stop and think what do we want this destiny to be? The more clear and positive vision we can formulate, I think the more likely it is we’re going to get that destiny.

Allison: We often seem to think that rather than optimizing for good outcomes, we should aim for maximizing the probability of an okay outcome, but I think for many people it’s more motivational to act on a positive vision, rather than one that is steered by risks only. To be for something rather than against something. To work toward a grand goal, rather than an outcome in which survival is success. I think a good strategy may be to focus on good outcomes.

Ariel: I think it’s incredibly important to remember all of the things that we are hopeful for for the future, because these are the precise reasons that we’re trying to prevent the existential risks, all of the ways that the future could be wonderful. So let’s talk a little bit about existential hope.

Allison: The term existential hope was coined by Owen Cotton-Barratt and Toby Ord to describe the chance of something extremely good happening, as opposed to an existential risk, which is a chance of something extremely terrible occurring. Kind of like describing a eucatastrophe instead of a catastrophe. I personally really agree with this line, because I think for me really it means that you can ask yourself this question of: do you think you can save the future? I think this question may appear at first pretty grandiose, but I think it’s sometimes useful to ask yourself that question, because I think if your answer is yes then you’ll likely spend your whole life trying, and you won’t rest, and that’s a pretty big decision. So I think it’s good to consider the alternative, because if the answer is no then you perhaps may be able to enjoy the little bit of time that you have on Earth rather than trying to spend it on making a difference. But I am not sure if you could actually enjoy every blissful minute right now if you knew that there was just a slight chance that you could make a difference. I mean, could you actually really enjoy this? I don’t think so, right?

I think perhaps we fail — and we do our best, but at the final moment something comes along that makes us go extinct anyways. But I think if we imagine the opposite scenario, in which we have not tried, and it turns out that we could have done something, an idea that we may have had or a skill we may have given was missing and it’s too late, I think that’s a much worse outcome.

Ariel: Is it fair for me to guess, then, that you think for most people the answer is that yes, there is something that we can do to achieve a more existential hope type future?

Allison: Yeah, I think so. I think that for most people there is at least something that we can be doing if we are not solving the wrong problems. But I do also think that this question is a serious question. If the answer for yourself is no, then I think you can really try to focus on having a life that is as good as it could be right now. But I do think that if the answer is yes, and if you opt in, then I think that there’s no space any more to focus on how terrible everything is. Because we’ve just confessed to how terrible everything is, and we’ve decided that we’re still going to do it. I think that if you opt in, really, then you can take that bottle of existential angst and worries that I think is really pestering us, and put it to the side for a moment. Because that’s an area you’ve dealt with and decided we’re still going to do it.

Ariel: The sentiment that’s been consistent is this idea that the best way to achieve a good future is to actually figure out what we want that future to be like and aim for it.

Max: On one hand, it should be a no-brainer because that’s how we think about life as individuals. Right? I often get students walking into my office at MIT for career advice, and I always ask them about their vision for the future, and they always tell me something positive. They don’t walk in there and say, “Well, maybe I’ll get murdered. Maybe I’ll get cancer. Maybe I’ll …” because they know that that’s a really ridiculous approach to career planning. Instead, they envision the positive future they aspire to, so that they can constructively think about the challenges, the pitfalls to be avoided, and a good strategy for getting there.

Yet, as a species, we do exactly the opposite. We go to the movies and we watch Terminator, or Blade Runner, or yet another dystopic future vision that just fills us with fear and sometimes paranoia or hypochondria, when what we really need to do, as a species, is the same thing as we need to do as individuals: envision a hopeful, inspiring future that we want to rally around. It’s a well-known historical fact, right, that the secret to getting more constructive collaboration is to develop a shared positive vision. Why is Silicon Valley in California and not in Uruguay or Mongolia? Well, it’s because in the 60s, JFK articulated this really inspiring vision — going to space — which led to massive investments in STEM research and gave the US the best universities in the world and these amazing high-tech companies, ultimately. It came from a positive vision.

Similarly, why is Germany now unified into one country instead of fragmented into many? Or Italy? Because of a positive vision. Why are the US states working together instead of having more civil wars against each other? Because of a positive vision of how much greater we’ll be if we work together. And if we can develop a more positive vision for the future of our planet, where we collaborate and everybody wins by getting richer and better off, we’re again much more likely to get that than if everybody just keeps spending their energy and time thinking about all the ways they can get screwed by their neighbors and all the ways in which things can go wrong — causing some self fulfilling prophecy basically, where we get a future with war and destruction instead of peace and prosperity.

Anders: One of the things I’m envisioning is that you can make a world where everybody’s connected but also connected on their own terms. Right now, we don’t have a choice. My smartphone gives me a lot of things but it also reports my location, and a lot of little apps are sending my personal information to companies and institutions I have no clue about and I don’t trust. I think one important direction might actually be privacy-enhancing technologies. Many of the little near-field microchips we carry around are also indiscriminately reporting to nearby antennas what we’re doing. But you could imagine having a little personal firewall that actually blocks signals that you don’t approve of. You could actually have firewalls and ways of controlling the information leaving your smartphone or your personal space. And I think we actually need to develop that, both for security purposes but also to feel that we actually are in charge of our private lives.

Some of that privacy is a social convention. We agree on what is private and not: This is why we have certain rules about what you are allowed to do with a cell phone in a restaurant. You’re not going to have a conversation with somebody — that’s rude. And others are not supposed to listen to your restaurant conversations that you have with people in the restaurant, even though technically of course, it’s trivial. I think we are going to develop new interesting rules and new technologies to help implement these social rules.

Another area I’m really excited about is the ability to capture energy, for example, using solar collectors. Solar collectors are getting exponentially better and are becoming competitive with traditional energy sources in a lot of domains. But the most beautiful thing is that they can be made small and used in a distributed manner. You don’t need that big central solar farm even though it might be very effective. You can actually have little solar panels on your house or even on gadgets, if they’re energy efficient enough. That means that you both reduce the risk of a collective failure and get a lot of devices that can now function independently of the grid.

Then I think we are probably going to be able to combine this to fight a lot of emergent biological threats. Right now, we still have this problem that it takes a long time to identify a new pathogen. But I think we’re going to see more and more distributed sensors that can help us identify it quickly, global networks that make the medical professional aware that something new has shown up, and hopefully also ways of very quickly brewing up vaccines in an automated manner when something new shows up.

My vision is that within one or two decades, if something nasty shows up, the next morning, everybody could essentially have a little home vaccine machine manufacture those antibodies to make you resistant against that pathogen — whether that was a bio weapon or something nature accidentally brewed up.

Ariel: I never even thought about our own personalized vaccine machines. Is that something people are working on?

Anders: Not that much yet.

Ariel: Oh.

Anders: You need to manufacture antibodies cheaply and effectively. This is going to require some fairly advanced biotechnology or nanotechnology. But it’s very foreseeable. Basically, you want to have a specialized protein printer. This is something we’re moving in the direction of. I don’t think anybody’s right now doing it but I think it’s very clearly in the path where we’re already moving.

So right now in order to make a vaccine, you need to have this very time consuming process: For example in the case of flu vaccine, you identify the virus, you multiply the virus, you inject it into chicken eggs to get the antibodies and the antigens, you develop a vaccine, and if you did it all right, you have a vaccine out in a few months just in time for the winter flu — and hopefully it was for the version of the flu that was actually making the rounds. If you were unlucky, it was a different one.

But what if you could instead take the antigen, you sequence it — that’s just going to take you a few hours — you generate all the proteins, you run it through various software and biological screens to remove the ones that don’t fit, find the ones that are likely to be good targets for the immune system, automatically generate the antibodies, automatically test them so you find which ones might be bad for patients, and then try them out. Then you might be able to make a vaccine within weeks or days.

Ariel: I really like your vision for the near term future. I’m hoping that all of that comes true. Now, to end, as you look further out into the future — which you’ve clearly done a lot of — what are you most hopeful for?

Anders: I’m currently working on writing a book about what I call “Grand Futures.” Assuming humanity survives and gets its act together, however we’re supposed to do that, then what? How big could the future possibly be? It turns out that the laws of physics certainly allow us to do fantastic things. We might be able to spread literally over billions of light years. Settling space is definitely physically possible, but also surviving even as a normal biological species on earth for literally hundreds of millions of years — and that’s already not stretching it. It might be that if we go post-biological, we can survive up until proton decay in somewhere north of 10^30 years in the future. Of course, the amount of intelligence that could be generated, human brains are probably just the start.

We could probably develop ourselves or Artificial Intelligence to think enormously bigger, enormously much more deeply, enormously more profoundly. Again, this is stuff that I can analyze. There are questions about what the meaning of these thoughts would be, how deep the emotions of the future could be, et cetera, that I cannot possibly answer. But it looks like the future could be tremendously grand, enormously much bigger, just like our own current society would strike our stone age ancestors as astonishingly wealthy, astonishingly knowledgeable and interesting.

I’m looking at: what about the stability of civilizations? Historians have been going on a lot about the decline and fall of civilizations. Does that tell us an ultimate limit on what we can plan for? Eventually I got fed up reading historians and did some statistics and got some funny conclusions. But even if our civilization lasts long, it might become something very alien over time, so how do we handle that? How do you even make a backup of your civilization?

And then of course there are questions like “how long can we survive on earth?” And “when the biosphere starts failing in about a billion years, couldn’t we fix that?” What are the environmental ethics issues surrounding that? What about settling the solar system? How do you build and maintain your Dyson sphere? Then of course there’s the stellar settlement, the intergalactic settlement, then the ultimate limits of physics. What can we say about them and in what ways could physics be really different from what we expect and what does that do for our chances?

It all leads back to this question: so, what should we be doing tomorrow? What are the near term issues? Some of them are interesting like, okay, so if the future is super grand, we should probably expect that we need to safeguard ourselves against existential risk. But we might also have risks — not just going extinct, but causing suffering and pain. And maybe there are other categories we don’t know about. I’m looking a little bit at all the unknown super important things that we don’t know about yet. How do we search for them? If we discover something that turns out to be super important, how do we coordinate mankind to handle that?

Right now, this sounds totally utopian. Would you expect all humans to get together and agree on something philosophical? That sounds really unlikely. Then again, a few centuries ago the United Nations and the internet would also sound totally absurd. The future is big — we have a lot of centuries ahead of us, hopefully.

Max: When I look really far into the future, I also look really far into space and I see this vast cosmos, which is 13.8 billion years old. And most of it, despite what the UFO enthusiasts say, is actually looking pretty dead — wasted opportunities. And if we can help life flourish not just on Earth, but ultimately throughout much of this amazing universe, making it come alive and teeming with these fascinating and inspiring developments, that makes me feel really, really inspired.

This is something I hope we can contribute to, we denizens of this planet, right now, here, in our lifetime. Because I think this is the most important time and place probably in cosmic history. After 13.8 billion years on this particular planet, we’ve actually developed enough technology, almost, to either drive ourselves extinct or to create super intelligence, which can spread out into the cosmos and do either horrible things or fantastic things. More than ever, life has become the master of its own destiny.

Allison: For me this pretty specific vision would really be a voluntary world, in which different entities, whether they’re AI or humans, can cooperate freely with each other to realize their interests. I do think that we don’t know where we want to end up, and we really have — if you look back 100 years, it’s not only that you wouldn’t have wanted to live there, but also many of the things that were regarded as moral back then are not regarded as moral anymore by most of us, and we can expect the same to hold true 100 years from now. I think rather than locking in any specific types of values, we ought to leave the space of possible values open.

Maybe right now you could try to do something like coherent extrapolated volition, a term coined in AI safety by Eliezer Yudkowsky to describe a goal function of a superintelligence that would execute your goals if you were more the person you wish you were, if we lived closer together, if we had more time to think and collaborate — so kind of a perfect version of human morality. I think that perhaps we could do something like that for humans, because we all come from the same evolutionary background. We all share a few evolutionary cornerstones, at least, that make us value family or a few other such things, and perhaps we could do something like coherent extrapolated volition of some basic, very boiled-down values that most humans would agree to. I think that may be possible, I’m not sure.

On the other hand, in a future where we succeed, at least in my version of that, we live not only with humans but with a lot of different mind architectures that don’t share our evolutionary background. For those mind architectures it’s not enough to try to do something like coherent extrapolated volition, because given that they have very different starting conditions, they will also end up valuing very different value sets. In the absence of us knowing what’s in their interests, I think really the only thing we can reasonably do is try to create a framework in which very different mind architectures can cooperate freely with each other, and engage in mutually beneficial relationships.

Ariel: Honestly, I really love that your answer of what you’re looking forward to is that it’s something for everybody. I like that.

Anthony: When you think about what life used to be for most humans, we really have come a long way. I mean, slavery was just fully accepted for a long time. Complete subjugation of women and sexism was just totally accepted for a really long time. Poverty was just the norm. Zero political power was the norm. We are in a place where, although imperfect, many of these things have dramatically changed. Even if they’re not fully implemented, our ideals and our beliefs about human rights and human dignity and equality have completely changed, and we’ve implemented a lot of that in our society.

So what I’m hopeful about is that we can continue that process, and that the way that culture and society work 100 years from now, we would look at from now and say, “Oh my God, they really have their shit together. They have figured out how to deal with differences between people, how to strike the right balance between collective desires and individual autonomy, between freedom and constraint, and how people can feel liberated to follow their own path while not trampling on the rights of others.” These are not in principle impossible things to do, and we fail to do them right now in large part, but I would like to see our technological development be leveraged into a cultural and social development that makes all those things happen. I think that really is what it’s about.

I’m much less excited about more fancy gizmos, more financial wealth for everybody, more power to have more stuff and accomplish more and higher and higher GDP. Those are useful things, but I think they’re things toward an end, and that end is the sort of happiness and fulfillment and enlightenment of the conscious living beings that make up our world. So, when I think of a positive future, it’s very much one filled with a culture that honestly will look back on ours now and say, “Boy, they really were screwed up, and I’m glad we’ve gotten better and we still have a ways to go.” And I hope that our technology will be something that will in various ways make that happen, as technology has made possible the cultural improvements we have now.

Ariel: I think as a woman I do often look back at the way technology enabled feminism to happen. We needed technology to sort of get a lot of household chores accomplished — to a certain extent, I think that helped.

Anthony: There are pieces of cultural progress that don’t require technology, as we were talking about earlier, but are just made so much easier by it. Labor-saving devices helped with feminism; just industrialization I think helped with serfdom and slavery — we didn’t have to have a huge number of people working in abject poverty and total control in order for some to have a decent lifestyle, we could spread that around. I think something similar is probably true of animal suffering and meat. It could happen without that — I mean, I fully believe that 100 years from now, or 200 years from now, people will look back at eating meat as just like a crazy thing that people used to do. It’s just the truth I think of what’s going to happen.

But it’ll be much, much easier if we have technologies that make that economically viable and easy rather than pulling teeth and a huge cultural fight and everything, which I think will be hard and long. We should be thinking about, if we had some technological magic wand, what are the social problems that we would want to solve with it, and then let’s look for that wand once we identify those problems. If we could make some social problem much better if we only had such and such technology, that’s a great thing to know, because technologies are something we’re pretty good at inventing. If they don’t violate the laws of physics, and there’s some motivation, we can often generate those things, so let’s think about what they are, what would it take to solve this sort of political informational mess where nobody knows what’s true and everybody is polarized?

That’s a social problem. It has a social solution. But there might be technologies that would be enormously helpful in making those social solutions easier. So what are those technologies? Let’s think about them. So I don’t think there’s a kind of magic bullet for a lot of these problems. But having that extra boost that makes it easier to solve the social problem I think is something we should be looking for for sure.

And there are lots of technologies that really do help — worth keeping in mind, I guess, as we spend a lot of our time worrying about the ill effects of them, and the dangers and so on. There is a reason we keep pouring all this time and money and energy and creativity into developing new technologies.

Ariel: I’d like to finish with one last question for everyone, and that is: what does existential hope mean for you?

Max: For me, existential hope is hoping for and envisioning a really inspiring future, and then doing everything we can to make it so.

Anthony: It means that we really give ourselves the space and opportunity to continue to progress our human endeavor — our culture, our society — to build a society that really is backstopping everyone’s freedom and actualization, compassion, enlightenment, in a kind of steady, ever-inventive process. I think we don’t often give ourselves as much credit as we should for how much cultural progress we’ve really made in tandem with our technological progress.

Anders: My hope for the future is that we get this enormous open-ended future. It’s going to contain strange and frightening things, but I also believe that most of it is going to be fantastic. It’s going to be roaring onward far, far, far into the long term future of the universe, probably changing a lot of the aspects of the universe.

When I use the term “existential hope,” I contrast that with existential risk. Existential risks are things that threaten to curtail our entire future, to wipe it out, to make it too much smaller than it could be. Existential hope, to me, means that maybe the future is grander than we expect. Maybe we have chances we’ve never seen. And I think we are going to be surprised by many things in the future and some of them are going to be wonderful surprises. That is the real existential hope.

Gaia: When I think about existential hope, I think it’s sort of an unusual phrase. But to me it’s really about the idea of finding meaning, and the potential that each of us has to experience meaning in our lives. And I think that the idea of existential hope, and I should say, the existential part of that, is the concept that that fundamental capability is something that will continue in the very long-term and will not go away. You know, I think it’s the opposite of nihilism, it’s the opposite of the idea that everything is just meaningless and our lives don’t matter and nothing that we do matters.

If I’m feeling — if I’m questioning that, I like to go and read something like Viktor Frankl’s book Man’s Search for Meaning, which really reconnects me to these incredible, deep truths about the human spirit. That’s a book that tells the story of his time in a concentration camp at Auschwitz. And even in those circumstances, the ability that he found within himself and that he saw within people around him to be kind, and to persevere, and to really give of himself, and others to give of themselves. And there’s just something impossible, I think, to capture in language. Language is a very poor tool, in this case, to try to encapsulate the essence of what that is. I think it’s something that exists on an experiential level.

Allison: For me, existential hope is really trying to choose to make a difference, knowing that success is not guaranteed, but it’s really making a difference because we simply can’t do it any other way. Because not trying is really not an option. It’s the first time in history that we’ve created the technologies for our destruction and for our ascent. I think they’re both within our hands, and we have to decide how to use them. So I think existential hope is transcending existential angst, and transcending our current limitation, rather than trying to create meaning within them, and I think it’s the adequate mindset for the time that we’re in.

Ariel: And I still love this idea that existential hope means that we strive toward everyone’s personal ideal, whatever that may be. On that note, I cannot thank my guests enough for joining the show, and I also hope that this episode has left everyone listening feeling a bit more optimistic about our future. I wish you all a happy holiday and a happy new year!

AI Alignment Podcast: Inverse Reinforcement Learning and the State of AI Alignment with Rohin Shah

What role does inverse reinforcement learning (IRL) have to play in AI alignment? What issues complicate IRL and how does this affect the usefulness of this preference learning methodology? What sort of paradigm of AI alignment ought we to take up given such concerns?

Inverse Reinforcement Learning and the State of AI Alignment with Rohin Shah is the seventh podcast in the AI Alignment Podcast series, hosted by Lucas Perry. For those of you that are new, this series is covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, governance, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with Rohin Shah. Rohin is a 5th year PhD student at UC Berkeley with the Center for Human-Compatible AI, working with Anca Dragan, Pieter Abbeel and Stuart Russell. Every week, he collects and summarizes recent progress relevant to AI alignment in the Alignment Newsletter

Topics discussed in this episode include:

  • The role of systematic bias in IRL
  • The metaphilosophical issues of IRL
  • IRL’s place in preference learning
  • Rohin’s take on the state of AI alignment
  • What Rohin has changed his mind about
You can learn more about Rohin’s work here and find the Value Learning sequence here. You can listen to the podcast above or read the transcript below.

Lucas: Hey everyone, welcome back to the AI Alignment Podcast series. I’m Lucas Perry and today we will be speaking with Rohin Shah about his work on inverse reinforcement learning and his general take on the state of AI alignment efforts and theory today. Rohin is a 5th year PhD student at UC Berkeley with the Center for Human-Compatible AI, working with Anca Dragan, Pieter Abbeel and Stuart Russell. Every week, he collects and summarizes recent progress relevant to AI alignment in the Alignment Newsletter. He has also been working with effective altruism for several years. Without further ado I give you Rohin Shah.

Hey, Rohin, thank you so much for coming on the podcast. It’s really a pleasure to be speaking with you.

Rohin: Hey, Lucas. Yeah. Thanks for inviting me. I’m glad to be on.

Lucas: Today I think that it would be interesting just to start off by delving into a lot of the current work that you’ve been looking into and practicing over the past few years. In terms of your research, it looks like you’ve been doing a lot of work on practical algorithms for inverse reinforcement learning that take into account, as you say, systematic cognitive biases that people have. It would be interesting if you could just sort of unpack this work that you’ve been doing on this and then contextualize it a bit within the AI alignment problem.

Rohin: Sure. So basically the idea with inverse reinforcement learning is that you can look at the behavior of some agent, perhaps a human, and tell what they’re trying to optimize: What are the things that they care about? What are their goals? And in theory this seems like a pretty nice way to do AI alignment, in that intuitively you can just say, “Hey, AI, go look at the actions humans are taking, look at what they say, look at what they do, take all of that in and figure out what humans care about.” And then you could use that perhaps as a utility function for your AI system.

I think I have become less optimistic about this approach now for reasons I’ll get into, partly because of my research on systematic biases. Basically one problem that you have to deal with is the fact that whatever humans are trying to optimize for, they’re not going to do it perfectly. We’ve got all of these sorts of cognitive biases, like the planning fallacy or hyperbolic time discounting, where we tend to be myopic, not looking as far into the long term as we perhaps could.

So assuming that humans are perfectly optimizing the goals that they care about is clearly not going to work. And in fact, if you make that assumption, then whatever reward function you infer, once the AI system is optimizing it, it’s going to simply recover the human performance — because, well, you assumed the human was optimal when you inferred the reward, so whatever the humans were doing is probably the behavior that optimizes the reward function that you inferred.

And we’d really like to be able to reach superhuman performance. We’d like our AI systems to tell us how we’re wrong, to get new technologies, to develop things that we couldn’t have done ourselves. And that’s not really something we can do using the sort of naive version of inverse reinforcement learning that just assumes that you’re optimal. So one thing you could try to do is to learn the ways in which humans are biased, the ways in which they make mistakes, the ways in which they plan sub-optimally. And if you could learn that, then you could correct for those mistakes, take them into account when you’re inferring human values.

The example I like to use is if there’s a grad student who procrastinates or doesn’t plan well and as a result, near a paper deadline, they’re frantically working, but they don’t get it done in time and they miss the paper deadline. If you assume that they’re optimal, optimizing for their goals very well, I don’t know what you’d infer — maybe something like “grad students like to miss deadlines.” Something like that seems pretty odd and it doesn’t seem like you’d get something sensible out of that. But if you realize that humans are not very good at planning, that they have the planning fallacy and they tend to procrastinate for reasons that they wouldn’t endorse on reflection, then maybe you’d be able to say, “Oh, this was just a mistake the grad student made. In the future I should try to help them meet their deadlines.”

So that's the reason you want to learn systematic biases. My research was basically: let's just take the hammer of deep learning and apply it to this problem. So not just learn the reward function, but let's also learn the biases. It turns out, and this was already known, that there is an impossibility result saying that you can't do this in general. So I guess I would phrase the question I was investigating as: what is a weaker set of assumptions than the ones we currently use, such that you can still do some reasonable form of IRL?

Lucas: Sorry. Just stepping back for like half a second. What does this impossibility theorem say?

Rohin: The impossibility theorem says that if you assume the human is basically running some sort of planner that takes in a reward function and spits out a behavior, or a policy, a thing to do over time, then if all you see is the behavior of the human, basically any reward function is compatible with some planner. So you can't learn anything about that reward function without making more assumptions. And intuitively, this is because for any complex behavior you see, you could either say, "Hey, the human's optimizing a reward that makes them act like that," or you could say, "I guess the human is biased; they're trying to do something else, but they did this instead."

The sort of extreme version of this is: if you give me a choice between apples and oranges and I pick the apple, you could say, "Hey, Rohin probably likes apples and is good at maximizing his reward of getting apples." Or you could say, "Rohin probably likes oranges and he is just extremely bad at satisfying his preferences. He's got a systematic bias that always causes him to choose the opposite of what he wants." And you can't distinguish between these two cases just by looking at my behavior.
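As a toy illustration of that unidentifiability, here are two made-up (planner, reward) pairs that produce exactly the same observed behavior, so no amount of this kind of behavioral data can tell them apart:

```python
# Hypothesis 1: Rohin likes apples and plans rationally.
# Hypothesis 2: Rohin likes oranges and plans anti-rationally.
ACTIONS = ["apple", "orange"]

def rational_planner(reward):
    """Pick the action with the highest reward."""
    return max(ACTIONS, key=lambda a: reward[a])

def anti_rational_planner(reward):
    """Pick the action with the lowest reward."""
    return min(ACTIONS, key=lambda a: reward[a])

likes_apples  = {"apple": 1.0, "orange": 0.0}
likes_oranges = {"apple": 0.0, "orange": 1.0}

# Both hypotheses predict the same observation ("apple"), so behavior alone
# cannot distinguish them without further assumptions about the planner.
assert rational_planner(likes_apples) == anti_rational_planner(likes_oranges) == "apple"
```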

Lucas: Yeah, that makes sense. So we can pivot sort of back in here into this main line of thought that you were on.

Rohin: Yeah. So basically with that impossibility result … when I look at it, I note that humans do this all the time: humans just look at other humans and can figure out what they want to do. So it seems like there are probably some simple assumptions that humans are using to infer what other humans are doing. A simple one would be: when the consequences of something are obvious to humans. Now, how you determine when that is the case is another question, but when that's true, humans tend to be close to optimal, and if you have something like that, you can rule out the planner that says the human is anti-rational and always chooses the worst possible thing.

Similarly, you might say that as tasks get more and more complex, or require more and more computation, the probability that the human chooses the action that best maximizes their goals goes down, since the task is more complex and maybe the human doesn't figure out what the best thing to do is. Maybe with enough of these assumptions we could get some sort of algorithm that actually works.

So we looked at whether, if you make the assumption that the human is often close to rational, plus a few other assumptions about humans behaving similarly or planning similarly on similar tasks, then you can maybe, kind of, sort of, in simplified settings, do IRL better than if you had just assumed the human was optimal when humans actually are systematically biased. But I wouldn't say that our results are great. I don't think I definitively, conclusively said, "This will never work." Nor did I definitively, conclusively say that this is great and we should definitely be putting more resources into it. It's somewhere in the middle, maybe more on the negative side: this seems like a really hard problem and I'm not sure how we get around it.
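One way to see why an assumption like "close to rational" does real work here: in a Boltzmann-style demonstrator model with a rationality coefficient, flipping the sign of both the coefficient and the reward leaves the likelihood of the data unchanged, so the data alone cannot rule out the anti-rational interpretation. A toy sketch with invented values:

```python
import numpy as np

ACTIONS = ["apple", "orange"]

def log_likelihood(demos, reward, beta):
    """Log-probability of the demonstrations under a Boltzmann model with
    rationality coefficient beta (beta > 0 means roughly rational)."""
    logits = beta * np.array([reward[a] for a in ACTIONS])
    log_probs = logits - np.log(np.exp(logits).sum())
    return sum(log_probs[ACTIONS.index(a)] for a in demos)

demos = ["apple", "apple", "orange"]
reward = {"apple": 1.0, "orange": 0.0}
negated_reward = {a: -v for a, v in reward.items()}

# (reward, beta = +2) and (negated reward, beta = -2) explain the data equally
# well; assuming beta > 0 is what breaks the symmetry.
assert np.isclose(log_likelihood(demos, reward, 2.0),
                  log_likelihood(demos, negated_reward, -2.0))
```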

Lucas: So I guess just as a point of comparison here, how is it that human beings succeed at this every day in terms of inferring preferences?

Rohin: I think humans have the benefit of being able to model the other person as being very similar to themselves. If I am trying to infer what you are doing, I can sort of say, "Well, if I were in Lucas's shoes and I were doing this, what would I be optimizing?" And that's a pretty good answer to what you would be optimizing. Humans are just, in some absolute sense, very similar to each other. We have similar biases. We've got similar ways of thinking. And I think we leverage that similarity a lot, using our own self-models as a drop-in approximation of the other person's planner, in this planner-reward language.

And then we say, “Okay, well, if this other person thought like me and this is what they ended up doing, well then, what must they have been optimizing?” I think you’ll see that when this assumption breaks down humans actually get worse at inferring goals. It’s harder for me to infer what someone in a different culture is actually trying to do. They might have values that are like significantly different from mine.

I’ve been in both India and the US and it often seems to me that people in the US just have a hard time grasping the way that Indians see society and family expectations and things like this. So that’s an example that I’ve observed. It’s probably also true the other way around, but I was never old enough in India to actually think through this.

Lucas: Human beings sort of succeed in inferring the preferences of people they can model as having values similar to their own, or when they know the person has similar values to their own. If inferring human preferences through inverse reinforcement learning is not having the most promising results, then what do you believe to be a stronger way of inferring human preferences?

Rohin: The one thing I'd correct there is that I don't think humans do it by assuming that people have similar values, just that people think in similar ways. For example, I am not particularly good at dancing. If I see someone doing a lot of hip-hop or something, it's not that I value hip-hop and so I can infer they value hip-hop. It's that I know that I do things that I like, and they are doing hip-hop; therefore, they probably like doing hip-hop. But anyway, that's a minor point.

So first, just because IRL algorithms aren't doing well now, I don't think it's true that IRL algorithms couldn't do well in the future. It's reasonable to expect that they would match human performance. That said, I'm not super optimistic about IRL anyway, because even if we do figure out how to build IRL algorithms that make all of these implicit assumptions humans are making, such that we can run them and get what a human would have thought other humans are optimizing, I'm not really happy about then going and optimizing that utility function off into the far future, which is sort of the default assumption we seem to have when using inverse reinforcement learning.

It may be that IRL algorithms are good for other things, but for that particular application, it seems like the utility function you infer is not really going to scale to the things that superintelligence will let us do. Humans just think very differently about how they want the future to go. In some sense, the future is going to be very, very different. We're going to need to think a lot about how we want it to go. All of our experience so far has not trained us to be able to think about what we care about in that future setting, where we've got, as a simple example, the ability to easily copy people if they're uploaded as software.

If that's a thing that happens, well, is it okay to clone yourself? How does democracy work? All these sorts of things are somewhat value judgments. If you take egalitarianism and run with it, you basically get that one person can copy themselves millions and millions of times and just determine the outcome of all voting that way. That seems bad, but on our current values that is arguably what we'd endorse; we just really haven't thought it through. Using IRL to infer a utility function that we then ruthlessly optimize over the long term just seems like, by the time the world changes a bunch, the function we inferred is going to be wrong in strange ways that we can't predict.

Lucas: Why not run continuous updates on it as people update given the change of the world?

Rohin: It seems broadly reasonable. This is the sort of idea you could have about how to use IRL in a more realistic way that actually works. I think that's perfectly fine. I'm optimistic about approaches that are like, "Okay, we're going to use IRL to infer a value function or reward function or something, and we're going to use that to inform what the AI does, but it's not going to be the end-all utility function. It's just going to infer what we want right now, and the AI system is somehow going to check with us. Maybe it's got some uncertainty over what the true reward function is. Maybe it only keeps this reward function for a certain amount of time."

These seem like things that are worth exploring, but I don't know that we have the correct way to do it yet. So take the particular case you proposed, just updating the reward function over time. The classic wireheading question is: how do we make it so that the AI doesn't say, "Okay, actually, in order to optimize the utility function I have now, it would be good for me to prevent you from changing my utility function, since if you change it, I'm no longer going to achieve my original utility." So that's one issue.
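A schematic version of that incentive, with made-up outcomes and utilities; the point is only that an agent planning with its current utility function also evaluates the "allow my utility to be updated" action with that same current utility function:

```python
# Entirely schematic; none of this corresponds to a real system.
current_utility = {"make_paperclips": 10.0, "help_humans": 1.0}
updated_utility = {"make_paperclips": 0.0,  "help_humans": 10.0}

def value_under_current_utility(utility_used_for_planning):
    """Value, judged by the CURRENT utility, of the outcome the agent would
    pursue if it planned with the given utility function."""
    chosen_outcome = max(utility_used_for_planning, key=utility_used_for_planning.get)
    return current_utility[chosen_outcome]

value_if_update_blocked  = value_under_current_utility(current_utility)  # 10.0
value_if_update_accepted = value_under_current_utility(updated_utility)  #  1.0

# Judged by its current utility, accepting the update looks strictly worse,
# so a naive long-term optimizer is incentivised to prevent it.
assert value_if_update_blocked > value_if_update_accepted
```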

The other issue is that maybe it starts making long-term plans. Even if it's planning according to this utility function without expecting changes to that utility function in the future, it might set up long-term plans that are going to look bad in the future but are hard to stop by then, like making some irreversible change to society because it didn't realize that something was going to change. These sorts of things suggest you don't want a single utility function that you're optimizing, even if you're updating that utility function over time.

It could be that you have some sort of uncertainty over utility functions, and that might be okay. I'm not sure. I don't think it's settled that we don't want to do something like this. I think it is settled that we don't want to use IRL to infer a utility function and optimize that one forever. There are certain middle grounds. I don't know how well those middle grounds work. Intuitively there are going to be some problems, but maybe we can get around those.

Lucas: Let me try to do a quick summary just to see if I can explain this as simply as possible. There are people, and people have preferences, and a good way to try to infer their preferences is through their observed behavior, except that human beings have cognitive and psychological biases which skew their actions, because they're not perfectly rational epistemic agents. So the value system or reward system that they're optimizing for is imperfectly expressed through their behavior. If you're going to infer preferences from behavior, then you have to correct for biases and epistemic and rational failures to try to infer the true reward function. Stopping there, is that a succinct way you'd put it?

Rohin: Yeah, I think maybe another point, which might be the same or might be different, is that under our normal definition of what our preferences or our values are, we would say something like, "I value egalitarianism, but it seems predictably true that in the future we're not going to have a single vote per sentient being," or something. Then essentially what that says is that our preferences, our values, are going to change over time, and they depend on the environment we are in right now.

So you can either see that as: okay, I have this really big, really global, really long-term utility function that tells me, given my environment, what my narrow values in that environment are. And in that case you say, "Well okay, then we're really super biased, because we only really know our values in the current environment. We don't know our values in future environments. We'd have to think a lot more for that." Or you can say, "We can infer our narrow values now, and that has some biases thrown in, but we can probably account for those, and then we have to have some sort of story for how we deal with our preferences evolving in the future."

Those are two different perspectives on the same problem, I would say, and they differ in what you're defining values to be. Is it the thing that tells you how to extrapolate what you want all the way into the future, or is it the thing that tells you how you're behaving right now in the current environment? I think our classical notion of preferences or values, the one we use when we say "values" in everyday language, is talking about the second kind, the more narrow kind.

Lucas: There's really a lot there, I think, especially in terms of issues in personal identity over time, commitment to values, and, as you said, different ideas and conceptualizations of value: what is it that I'm actually optimizing for or care about? Population ethics, and tons of questions about how people value future versions of themselves, or whether they actually equally care about their value function at all times as it changes within the environment.

Rohin: That's a great description of why I am nervous around inverse reinforcement learning. You listed a ton of issues, and I'm like, yeah, all of those are really difficult issues. And inverse reinforcement learning is sort of based on the premise that all of that exists, is real and is timeless, and we can infer it, and then maybe we put on some hacks, like continuously improving the value function over time to take changes into account, but it does feel like we're starting with a fundamentally flawed paradigm.

So mostly because it feels like we've taken a flawed paradigm to start with and then changed it so that it doesn't have all the obvious flaws, I'm more optimistic about trying to have a different paradigm for how we want to build AI, which maybe I'll summarize as: just make AIs that do what we want, or what we mean, at the current moment in time, and then make sure that they evolve along with us as we evolve in how we think about the world.

Lucas: Yeah. That specific feature is something we were trying to address in inverse reinforcement learning, if the algorithm were updating over time alongside myself. I just want to step back for a moment to try to get an even grander and more conceptual understanding of the globalness of inverse reinforcement learning. From an evolutionary and more cosmological perspective, you can say that from the time of the first self-replicating organisms on the planet until today, across the entire evolutionary tree, there's sort of a global utility function over all animals that is ultimately driven by thermodynamics and the sun shining light on a planet, and that this global utility function of all agents across the planet seems very ontologically basic and pure, like something that simply empirically exists. Attempting to access that through IRL is just interesting, given the difficulties that arise from it. Does that sort of picture seem accurate?

Rohin: I think I'm not super sure what exactly you're proposing here, so let me try to restate it. If we look at the environment as a whole, or the universe as a whole, or maybe we're looking at evolution, we see that, hey, evolution seems to have spit out all of these creatures that are interacting in this complicated way, but you can look at all of their behavior and trace it back to this objective, in some sense, of maximizing reproductive fitness. And so are we expecting that IRL on this very grand scale would somehow end up with "maximize reproductive fitness"? Is that what … yeah, I think I'm not totally sure what implication you're drawing from this.

Lucas: Yeah. I guess I’m not arguing that there’s going to be some sort of evolutionary thing which is being optimized.

Rohin: IRL does make the assumption that there is something doing an optimization. You usually have to point it towards what that thing is. You have to say, “Look at the behavior of this particular piece of the environment and tell me what it’s optimizing.” Maybe if you’re imagining IRL on this very grand scale, what is the thing you’re pointing it at?

Lucas: Yeah, so to reiterate and specify: pointing IRL at the human species would be like pointing IRL at 7 billion primates. Similarly, I was thinking, what if one pointed IRL at the ecosystem of Earth over time; you could sort of plot this evolving algorithm over time. So I was just noting that accessing this sort of thing, which seems quite ontologically objective and clear in this way, is, interestingly, fraught with so many difficulties. Yeah, in terms of history it seems like all there really is, is the set of all preferences at each time step over time, which could be summarized in some sort of global or individual set of algorithms.

Rohin: Got it. Okay. I think I see what you're saying now. It seems like the intuition is: ecosystems, the universe, the laws of physics are very simple, very ontologically basic things, so there's something more real about any value function we could infer from them. And I think this is a misunderstanding of what IRL does. IRL fundamentally requires you to have some notion of counterfactuals. You need to have a description of the action space that some agent had, and then when you observe their behavior, you see that they made a choice to take one particular action instead of another.

You need to be able to ask the question of what they could have done instead, which is a counterfactual. Now, with laws of physics, it's very unclear what the counterfactual would be. With evolution, you can maybe say something like, "Evolution could have chosen to make a whole bunch of mutations, and it chose this particular one." And then if you use that particular model, what is IRL going to infer? It will probably infer something like "maximize reproductive fitness."

On the other hand, if you model evolution as, "hey, you can design the best possible organism you can; you can just create an organism out of thin air," then what reward function it's maximizing is super unclear. If you could just poof an organism into existence, you could make something that's extremely intelligent, very strong, et cetera, et cetera. And you're like, well, evolution didn't do that. It took millions of years to create even humans, so clearly it wasn't optimizing reproductive fitness, right?

And in fact, I think people often say that evolution is not an optimization process because of things like this. The notion of something doing optimization is very much relative to what you assume its capabilities to be, and in particular what you assume its counterfactuals to be. So if you're talking about this sort of grand scale of ecosystems, the universe, the laws of physics, I would ask you, "What are the counterfactuals? What could the laws of physics have done otherwise, or what could the ecosystem have done if it didn't do the thing that it did?" Once you have an answer to that, I imagine I could predict what IRL would do. And that part is the part that doesn't seem ontologically basic to me, which is why I don't think IRL on this sort of thing makes very much sense.

Lucas: Okay. The part here that seems a little bit funny to me is tracking from physics, or whatever you take to be ontologically basic about the universe, up to the level of whatever our axioms and pre-assumptions for IRL are. What I'm trying to say is, in moving from whatever is ontologically basic to the level of agents, we have some assumptions in our IRL where we're thinking about agents as having theories of counterfactuals, where they can choose between actions and they have some sort of reward or objective function that they're trying to optimize over time.

It seems sort of metaphysically queer where physics stops … where we're going up in levels of abstraction from physics to agents, and we … like, physics couldn't have done otherwise, but somehow agents could have done otherwise. Do you see the concern that I'm raising?

Rohin: Yeah, that's right. And this is perhaps another reason that I'm more optimistic about the "don't try to do anything at the grand scale, just try to do something that does the right thing locally in our current time" approach. But I think that's true. It definitely feels to me like optimization, the concept, should be ontologically basic and not a property of human thought. There's something about how a random universe is high entropy, whereas the ones that humans construct are low entropy. That suggests that we're good at optimization.

It seems like it should be independent of humans. On the other hand, any conception of optimization I come up with is either specific to the way humans think about it, or it relies on this notion of counterfactuals. And yeah, the laws of physics don't seem like they have counterfactuals, so I'm not really sure where that comes in. In some sense, you can see why we have the notion of counterfactuals and agency, thinking that we could have chosen something else, when we're basically just an algorithm that's continually thinking about what we could do, trying to make plans.

So we search over this space of things that could be done, and that search is implemented in physics, which has no counterfactuals, but the search itself, which is an abstraction layer above, is something that is running on physics. It is not itself a physics thing; that search is in fact going through multiple options and then choosing one. Now, it is deterministic from the point of view of physics, but from the point of view of the search, it's not deterministic. The search doesn't know which option is going to happen. I think that's why humans have this notion of choice and of agency.

Lucas: Yeah, and I mean, just in terms of understanding the universe, it's pretty interesting how there are these two levels of abstraction: at the physics level you actually couldn't have done otherwise, but there's also this optimization process running on physics that's searching over space and time, modeling different world scenarios, and then seemingly choosing, and thus creating observed behavior for other agents to try to infer whatever reward function that thing is optimizing. It's an interesting picture.

Rohin: I agree. It's definitely the sort of puzzle that keeps you up at night. But I think one particularly important implication of this is that agency is about how a search process thinks about itself. Well, it's not just about that, because I can look at what someone else is doing and attribute agency to them, figure out that they are themselves running an algorithm that chooses between actions. I don't have a great story for this. Maybe it's just humans realizing that other humans are just like them.

So this is maybe why we get acrimonious debates about whether evolution has agency, but we don't get acrimonious debates about whether humans have agency. Evolution is sufficiently different from us that we can look at the way that it "chooses" "things" and say, "Oh well, but we understand how it chooses things." You could model it as a search process, but you could also model it as: all that's happening is this deterministic, or mostly deterministic, process of which animals survived and had babies, and that is how things happen. And therefore it's not an optimization process, there's no search, it's just deterministic. And so you have these two conflicting views of evolution.

Whereas I can't really say, "Hey Lucas, I know exactly, deterministically, how you're going to do things." I know this in the sense of, man, there are electrons and atoms and stuff moving around in your brain, and electrical signals, but that's not going to let me predict what you'll do. One of the best models I can have of you is that you're just optimizing for some goal, whereas with evolution I can have a more detailed model. And so maybe that's why we set aside the model of evolution as an optimizer.

Under this framing it's like, okay, maybe our views of agency and optimization are just facts about how well we can model the process, which cuts against the "optimization as ontologically basic" idea, and it seems very difficult. It seems like a hard problem to me. I want to reiterate that most of this has just pushed me to: let's instead have an AI alignment focus, try to do things that we understand now, and not get into the metaphilosophy problems. Just get AI systems that broadly do what we want, that ask us for clarification, that help us evolve our thoughts over time, if we can do something like that. I think there are people who would argue that, no, of course we can't do something like that.

But if we could do something like that, that seems significantly more likely to work than something that has to have answers to all these metaphilosophical problems today. My position is just that this is doable. We should be able to make systems that are of the nature that I described.

Lucas: There are clearly a lot of philosophical difficulties that go into IRL. It would be good if we could take a step back and you could summarize your thoughts on inverse reinforcement learning and the place it has in AI alignment.

Rohin: I think my current position is something like: fairly confidently, don't use IRL to infer a utility function that you then optimize over the long term. In general, I would say don't have a utility function that you optimize over the long term, because it doesn't seem like that's easily definable right now. So that's one class of conclusions. On the other hand, I think IRL is probably good as a tool.

There is this nice property of IRL that you figure out what someone wants and then you help them do it. And this seems more robust than hand-writing the things that we care about in any particular domain. Even in a simple household robot setting, there are tons and tons of preferences that we have, like "don't break vases." Something like IRL could infer these sorts of things.

So I think IRL definitely has a place as a tool that helps us figure out what humans want, but I don't think the full story for alignment is going to rest on IRL in particular. It gets us good behavior in the present, but it doesn't tell us how to extrapolate into the future. Maybe if you did IRL in a way that let you infer how we want the AI system to extrapolate our values, or used IRL to figure out our meta-preferences about how the algorithm should infer our preferences, or something like this, that could maybe work, but it's not obvious to me. It seems worth trying at some point.

TL;DR: don't use it to get a long-term utility function. Do use it as a tool to get decent behavior in the short term. Maybe also use it as a tool to infer meta-preferences. That seems broadly good, but I don't know that we know enough about that setting yet.

Lucas: All right. Yeah, that's all just super interesting, and it's great to hear how the space has unfolded for you and what your views are now. So I think we can pivot here into the AI alignment problem more generally. Now that you've moved on from being as excited about IRL, what is essentially capturing your interest currently in the space of AI alignment?

Rohin: The thing that I'm most interested in right now is: can we build an AI system that basically evolves over time with us? I'm thinking of this now as a human-AI interaction problem. You've got an AI system. We want to figure out how to make it so that it broadly helps us, and at the same time figures out what it needs to do based on some sort of data that comes from humans. Now, this doesn't have to be the human saying something. It could be their behavior. It could be things that they have created in the past. It could be all sorts of things. It could be a reward function that they write down.

But I think the perspective that the things that are easy to infer are the things specific to our current environment is pretty important. What I would like to do is build AI systems that infer our preferences in the current environment, or the things we want in the current environment, and do those reasonably well, but don't just extrapolate to the future; let humans adapt to the future, then figure out what the humans value at that point, and do things based on that then.
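In pseudocode, the kind of loop being gestured at might look something like the sketch below. Every function name here is a hypothetical placeholder rather than a real API, and this is a cartoon of the idea, not a concrete proposal:

```python
def assistance_loop(get_recent_human_data, infer_current_preferences,
                    uncertainty, act_to_help, ask_human_for_clarification,
                    max_uncertainty=0.2):
    """Repeatedly re-infer what the human currently wants and act on that,
    deferring to the human when the inference is too uncertain."""
    while True:
        data = get_recent_human_data()            # behavior, feedback, stated goals, ...
        preferences = infer_current_preferences(data)
        if uncertainty(preferences) > max_uncertainty:
            # Unsure what the human wants right now, so check in rather than act.
            ask_human_for_clarification(preferences)
        else:
            # Help with the *current* preferences; nothing is locked in long-term,
            # because the loop re-infers them on the next iteration.
            act_to_help(preferences)
```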

There are a few ways you could imagine this going. One is this notion of corrigibility, in the sense that Paul Christiano writes about it, not the sense that MIRI writes about it, where the AI is basically trying to help you. And if I have an AI that is trying to help me, well, I think one of the most obvious things for someone who's trying to help me to do is make sure that I remain in effective control of any power or resources the AI might have, and to ask me if my values change in the future, or if what I want the AI to do changes in the future. So that's one thing you might hope to do.

You could also imagine building a norm-following AI. I think human society basically runs on norms that we mostly all share and tend to follow. We have norms against particularly bad things like murdering people and stealing. We have norms against shoplifting. We have maybe less strong norms against littering. Unclear. And then we also have norms for things that are not very consequential. We have norms against randomly knocking over a glass at a restaurant in order to break it. That is also a norm, even though there are quite often times when I'm like, "Man, it would be fun to just break a glass at the restaurant, it's very cathartic," but it doesn't happen very often.

And so if we could build an AI system that could infer and follow those norms, it seems like it would behave in a more human-like fashion. This is a pretty new line of thought, so I don't know whether it works, but it could be that such an AI system simultaneously behaves in a fashion that humans would find acceptable, and also lets us do pretty cool, interesting, new things, like developing new technologies that humans can then deploy, without the AI just unilaterally deploying them without any safety checks or running them by humans.

Lucas: So let's just back up a little bit here in terms of the picture of AI alignment. We have a system that we do not want to extrapolate too much toward possible future values. It seems that there are all these ways in which we can use the AI to amplify our own decision making, and also different methodologies which reflect the way that human beings update their own values and preferences over time, something like what's proposed, I believe, by Paul Christiano and Geoffrey Irving and other people at OpenAI, like alignment through debate.

And there are all these sorts of epistemic practices of human beings with regards to this world-model building and how that affects shifts in values and preferences, also given how the environment changes. So yeah, it seems like it's about tracking all of these things, finding ways in which AI can amplify or participate in those sorts of epistemic practices, right?

Rohin: Yeah. So I definitely think that something like amplification can be thought of as improving our epistemics over time. That seems like a reasonable way to see it. I haven't really thought very much about how amplification or debate scale with changing environments. They both operate under this general picture: we could have a deliberation tree, and in principle what we want is this exponentially sized deliberation tree where the human goes through all of the arguments and counter-arguments and breaks them down into sub-points in excruciating detail, in a way that no human could ever actually do because it would take way too long.

And then amplification and debate basically show you how to get the outcome that this reasoning process would have given, by using an AI system to assist the human. I don't know if I would call it improving human epistemics; it's more like taking whatever epistemics you already have and running them for a long amount of time. And it's possible that in that long amount of time you actually figure out how to do better epistemics.

I'm not sure that this perspective really says very much about how preferences change over time. You would hope that it would just naturally be robust to that, in that as the environment changes, your deliberation starts looking different. To go back to my example from before: now suddenly we have uploads, and egalitarianism seems to have some really weird consequences. And then presumably the deliberation tree that amplification and debate are mimicking is going to have a bunch of thoughts about: do we actually want egalitarianism now? What were the moral intuitions that pushed us towards it? Is there some equivalent principle that lets us keep our moral intuitions but doesn't have this weird property where a single person can decide the outcome of an election, et cetera, et cetera?

I think they were not designed to do this, but by virtue of being based on how a human would think, on what a human would do if they got a long time and a lot of helpful tools to think about it, they're essentially just inheriting these properties from the human. If the human, as the environment changed, would start rethinking their priorities or what they care about, then so too would amplification and debate.

Lucas: I think here it also has me thinking about what the meta-preferences and the meta-meta-preferences are. You could imagine taking a human brain and running it to the end, through decision trees and rational and logical thought, over enough time, with enough epistemics and power behind it to navigate its way to the end. It raises interesting questions: is that what we want? Taking that over every single person and then just aggregating all of those preferences together, is that what we want? And what is the role of moral philosophy in thinking here?

Rohin: Well, one thing is that whatever moral philosophy you would do, so would the amplification of you, in theory. I think the benefit of these approaches is that they have this nice property: whatever you would have thought of, they would too, in the limit of good AI and idealizations properly mimicking you and so on and so forth. In this nice world where it all works in an ideal way, it seems like any consideration you could have, or would have, the agent produced by iterated amplification or debate would have as well.

And so if you were going to do a bunch of moral philosophy and come to some sort of decision based on that, so would iterated amplification or debate. So I think it's basically: here is how we build an AI system that solves the problems in the same way a human would solve them. And then you might worry, hey, maybe humans themselves are just not very good at solving these problems. Look at most humans in the world: they don't do moral philosophy and don't extrapolate their values well into the future, and the only reason we have moral progress is that younger generations keep getting born and they have different views than the older generations.

That, I think, could in fact be a problem, but I think there's hope that we could train humans to have these nice sorts of properties, good epistemics, such that they would provide good training data for iterated amplification, if there comes a day when we think we can actually train iterated amplification to mimic human explicit reasoning. Both approaches do have the property that they're only mimicking the explicit reasoning and not necessarily the implicit reasoning.

Lucas: Do you want to unpack that distinction there?

Rohin: Oh, yeah. Sure. So both of them require that you take your high-level question and decompose it into a bunch of sub-questions; or rather, the theoretical model of them does. This is pretty clear with iterated amplification; it is less clear with debate. At each point you need the top-level agent to decompose the problem into a bunch of sub-problems. And this basically requires you to be able to decompose tasks into clearly specified sub-tasks, where "clearly specified" could mean in natural language, but you need to make it explicit in a way that the agent you're assigning the task to can understand without having access to your mind.

Whereas if I'm doing some sort of programming task, often I will just sort of know what direction to go in next, but not be able to cleanly formalize it. You'll give me some challenging algorithms question and I'll be like, "Oh yeah, it kind of seems like dynamic programming is probably the right thing to do here, and maybe if I consider it this particular way, maybe if I put these things in a stack or something," but even the fact that I'm saying this out loud in natural language is misrepresenting my process.

Really there's some intuitive, non-verbalizable process going on in my head that somehow navigates the space of possible programs and picks a thing, and I think the reason I can do this is that I've been programming for a long time and I've trained a bunch of intuitions and heuristics that I cannot easily verbalize as some nice decomposition. So that's implicit. If you did want that to be incorporated in iterated amplification, it would have to be incorporated in the base agent, the one that you start with. But if you start with something relatively simple, which I think is often what we're trying to do, then you don't get those human abilities, and you have to rediscover them in some sense through explicit decompositional reasoning.
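A rough sketch of that explicit-decomposition picture is below. The decompose and base_agent functions are stand-ins supplied by the caller, and this is an illustration of the recursion only, not the actual training scheme for iterated amplification or debate:

```python
from typing import Callable, List

def amplify(question: str,
            decompose: Callable[[str], List[str]],
            base_agent: Callable[[str, List[str]], str],
            depth: int = 2) -> str:
    """Answer a question by explicit recursive decomposition into sub-questions,
    each ultimately answered by a weak base agent."""
    if depth == 0:
        return base_agent(question, [])
    sub_questions = decompose(question)
    sub_answers = [amplify(q, decompose, base_agent, depth - 1)
                   for q in sub_questions]
    # The top level combines sub-answers into an answer to the original question,
    # again using only explicit, verbalized reasoning; intuitions that cannot be
    # written down as sub-questions never enter the tree.
    return base_agent(question, sub_answers)

# Hypothetical usage with trivial stand-ins:
answer = amplify(
    "Should we deploy this system?",
    decompose=lambda q: [f"What could go wrong with: {q}",
                         f"What are the benefits of: {q}"],
    base_agent=lambda q, subs: f"Answer({q}) given {len(subs)} sub-answers",
)
print(answer)
```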

Lucas: Okay, cool. Yeah, that’s super interesting. So now to frame all of this again, do you want to sort of just give a brief summary of your general views here?

Rohin: I wish there were a nice way to summarize this. That would mean we'd made more progress. It seems like there's a bunch of things that people have proposed. There's amplification and debate, which are very similar, and there's IRL as a general approach. I think, but I'm not sure, that most people would agree that we don't want to infer a utility function and optimize it for the long term; more of them are like, yeah, we want this sort of interactive system with the human and the AI. It's not clear to me how different these are from what amplification and debate are aiming for.

So here we're looking at how things change over time and making that a pretty central piece of how we're thinking about it. Initially the AI is trying to help the human: the human has some sort of reward function, and the AI is trying to learn it and help them. But over time this changes, and the AI has to keep up with it. And under this framing you want to think a lot about interaction; you want to think about getting as many bits about the reward from the human to the AI as possible. Maybe think about control theory, and how human data is in some sense a control mechanism for the AI.

You'd want to infer norms and the ways that people behave, how people relate to each other, and try to have your AI systems do that as well. So that's one camp: have the AI interact with humans, behave generally in ways that humans would say are not crazy, and update those over time. And then there's the other side, which is: have an AI system that takes human reasoning, human explicit reasoning, and does it better or does more of it, which allows it to do anything that the human would have done. That's more about taking the thought process that humans go through and putting it at the center. That is the thing we want to mimic and make better.

The part where our preferences change over time is something that you get for free, in some sense, by mimicking human thought processes or reasoning. So, in summary, those are the two camps. I am optimistic about both of them and think that people should be doing research on both of them. I don't really have much more of a perspective than that, I think.

Lucas: That’s excellent. I think that’s a super helpful overview actually. And given that, how do you think that your views of AI alignment have changed over the past few years?

Rohin: I’ll note that I’ve only been in this field for I think 15, 16 months now, so just over a year, but over that year I definitely came into it thinking what we want to do is infer the correct utility function and optimize it. And I have moved away quite strongly from that. I, in fact, recently started writing a value learning sequence or maybe collating is a better word. I’ve written a lot of posts that still have to come out, but I also took a few posts from other people.

The first part of that sequence is basically arguing that it seems bad to try to define a utility function and then optimize it. So I'm trying to move away from long-term utility functions in general, or long-term goals, or things like this. That's probably the biggest update since starting. Other things that have changed: a focus more on norms than on values, trying to do things that are easy to infer right now in the current environment, and making sure that we update these over time, as opposed to trying to get the one true thing that depends on us solving all the hard metaphilosophical problems. That's, I think, another big change in the way I've been thinking about it.

Lucas: Yeah. I mean, there are different levels of alignment at their core.

Rohin: Wait, I don’t know exactly what you mean by that.

Lucas: There's your original point of view, where you came into the field thinking: infer the utility function and maximize it. And your current view is that you're moving away from that and becoming more partial to the view that we want to be inferring from norms and current preferences in the present day, and optimizing that, rather than extrapolating towards some ultimate end-goal and optimizing for that. In terms of aligning in these different ways, isn't there a lot of room for value drift if you allow the thing to run in the real world, rather than amplifying explicit human thought on a machine?

Rohin: Value drift is an interesting question. In some sense, I do want my values to drift, in that whatever I think today about the correct way the future should go, I probably will not endorse in the future, and I endorse the fact that I won't endorse it in the future. I do want to learn more and then figure out what to do in the future based on that. You could call that value drift; it's a thing I want to happen. So in that sense, value drift wouldn't be a bad thing. But there's also a sense in which my values could change in the future in ways that I don't endorse, and that kind of value drift is bad.

So yeah, if you have an AI system that's operating in the real world and changes over time as we humans change, yes, there will be changes in what the AI system is trying to achieve over time. You could call that value drift, but value drift usually has a negative connotation, whereas this process of learning as the environment changes seems to me like a positive thing. It's a thing I would want to do myself.

Lucas: Yeah, sorry, maybe I wasn't clear enough. I mean the case of running human beings in the real world, where there are the causes and effects of history and whatever else, and how that actually changes the expression of people over time. Because if you're running this version of AI alignment where you're always optimizing the current set of values in people, the progression of the world and of civilization is only as good as the best of human values and preferences in that moment.

It’s sort of like limited by what humans are in that specific environment and time, right? If you’re running that in the real world versus running some sort of amplified version of explicit human reasoning, don’t you think that they’re going to come to different conclusions?

Rohin: I think the amplified explicit human reasoning, I imagine that it’s going to operate in the real world. It’s going to see changes that happen. It might be able to predict those changes and then be able to figure out how to respond fast, before the changes even happen perhaps, but I still think of amplification as being very much embedded in the real world. Like you’re asking it questions about things that happen in the real world. It’s going to use explicit reasoning that it would have used if a human were in the real world and thinking about the question.

I don't really see much of a distinction here. I definitely think that even in my setting, where I'm imagining AI systems that evolve over time and change based on that, they are going to be smarter than humans, going to think through things a lot faster, and be able to predict things in advance, in the same way that amplified explicit reasoning would. Maybe there are differences, but value drift doesn't seem like one of them, or at least I cannot predict right now how they would differ along the axis of value drift.

Lucas: So then just sort of again taking a step back to the ways in which your views have shifted over the past few years. Is there anything else there that you’d like to touch on?

Rohin: Oh man, I’m sure there is. My views changed so much because I was just so wrong initially.

Lucas: So most people listening should expect that, given a lot more thought on this subject, their views are likely to be radically different from the views and conceptions they currently have about AI alignment.

Rohin: Seems true from most listeners, yeah. Not all of them, but yeah.

Lucas: Yeah, I guess it’s just an interesting fact. Do you think this is like an experience of most people who are working on this problem?

Rohin: Probably. I mean, within the first year of working on the problem that seems likely. I mean just in general if you work on the problem, if you start with near no knowledge on something and then you work on it for a year, your views should change dramatically just because you’ve learned a bunch of things and I think that basically explains most of my changes in view.

It's just actually hard for me to remember all the ways in which I was wrong back in the past. I focused on not using utility functions because I think that's something even other people in the field still believe right now, so that's where that one came from, but there are plenty of other things that I was just notably, easily, demonstrably wrong about that I'm having trouble recalling now.

Lucas: Yeah, and the utility function one I think is a very good example and I think that if it were possible to find all of these in your brain and distill them, I think it would make a very, very good infographic on AI alignment, because those misconceptions are also misconceptions that I’ve had and I share those and I think that I’ve seen them also in other people. A lot of sort of the intellectual blunders that you or I have made are probably repeated quite often.

Rohin: I definitely believe that. Yeah, I guess I could talk about the things that I'm going to say very soon in the value learning sequence. Those were definitely updates that I made; one of them is the utility functions thing. Another one was the idea that what we want is for the human-AI system as a whole to be optimizing for some sort of goal. This opens up a nice space of possibilities where the AI is not optimizing a goal; only the human-AI system together is. Keeping in mind that that is the goal, and not that the AI itself must be optimizing some sort of goal.

The idea of corrigibility itself as a thing that we should be aiming for was a pretty big update for me, took a while for me to get to that one. I think distributional shift was a pretty key concept that I learned at some point and started applying everywhere. One way of thinking about the evolving preferences over time thing is that humans, they’ve been trained on the environment that we have right now and arguably we’ve been trained on the ancestral environment too by evolution, but we haven’t been trained on whatever the future is going to be.

Or, for a more current example, we haven't been trained on social media. Social media is a fairly new thing, affecting us in ways that we hadn't considered in the past, and this is causing us to change how we do things. So in some sense, what's happening is that as we go into the future, we're encountering a distributional shift, and human values don't extrapolate well to that distributional shift. What you actually need to do is wait for the humans to get to that point, let them experience it, have their values be trained on this new distribution, and then figure out what those values are, rather than trying to do it right now, when their values are just going to be wrong, or not what they would be if they were actually in that situation.

Lucas: Isn’t that sort of summarizing coherent extrapolated volition?

Rohin: I don’t know that coherent extrapolated volition explicitly talks about having the human be in a new environment. I guess you could imagine that CEV considers … If you imagine like a really, really long process of deliberation in CEV, then you could be like, okay what would happen if I were in this environment and all these sorts of things happened. It seems like you would need to have a good model of how the world works and how physics works in order to predict what the environment would be like. Maybe you can do that and then in that case you simulate a bunch of different environments and you think about how humans would adapt and evolve and respond to those environments and then you take all of that together and you summarize it and distill it down into a single utility function.

Plausibly that could work. It doesn't seem like a thing we can actually build, but as a definition of what we might want, that seems not bad. I think that is me putting the distributional shift perspective on CEV, and it was certainly not obvious to me from the statement of CEV itself that you're thinking about how to mitigate the impact of distributional shift on human values. I've had this perspective, and I've put it on CEV and said, yeah, that seems fine, but it was not obvious to me from reading about CEV alone.

Lucas: Okay, cool.

Rohin: I recently posted a comment on the Alignment Forum talking about how we want to … I guess this is sort of about corrigibility too: making an AI system that tries to help us, as opposed to making an AI system that optimizes the one true utility function. So that was an update I made, basically the same update as the one about aiming for corrigibility. I guess another update is that while there is a phase transition, or a sharp change, in the problems that we see when AIs become human-level or superintelligent, I think the underlying causes of the problems don't really change.

The underlying causes of problems with narrow AI systems are probably similar to the ones that underlie problems with superintelligent systems. Having the wrong reward function leads to problems both in narrow settings and in superintelligent settings. This made me more optimistic about doing work that tries to address current problems, but with an eye towards long-term problems.

Lucas: What made you have this update?

Rohin: Thinking about the problems a lot, in particular thinking about how they might happen in current systems as well. So a prediction that I would make is: if it is actually true that superintelligence would end up killing us all, or some similarly catastrophic outcome, then I would predict that before that, we will see some AI system that causes some other smaller-scale catastrophe, where I don't know exactly what "catastrophe" means; it might be something like a few humans die, or the power grid goes down for some time, or something like that.

And then before that, we will have things that fail in relatively unimportant ways, but in ways that say: here's an underlying problem that we need to fix with how we build AI systems. If you extrapolate all the way back to today, that looks like, for example, the boat racing example from OpenAI, a reward hacking one. So generally I expect things to be more continuous. Not necessarily slow, but continuous. That update I made because of the posts arguing for slow takeoff from Paul Christiano and AI Impacts.

Lucas: Right. And the view there is that the world will be populated with lower-level ML as we start to ratchet up the capability of intelligence. So a lot of tasks will already be being done by systems that are slightly less intelligent than the current best system, and so all work ecosystems will already be flooded with AI systems optimizing within those spaces. So there won't be a lot of room for the first AGI system, or whatever, to really get a decisive strategic advantage.

Rohin: Yeah, would I make the prediction that we won't have a system that gets a decisive strategic advantage? I'm not sure about that one. It seems plausible to me that we have one AI system that is improving over time, and we use those improvements in society before it becomes superintelligent. But by the time it becomes superintelligent, it is still the one AI system that is superintelligent, so it does gain a decisive strategic advantage.

An example of this would be if there were just one main AGI project. I would still predict that progress on AI would be continuous, but I would not predict a multipolar outcome in that scenario. The corresponding view is that while I still do use the terminology "first AGI," because it points at some intuitive concept that I think is useful, it's a very, very fuzzy concept, and I don't think we'll be able to point at any particular system and say that was the first AGI. Rather, we'll point to a broad swath of time and say, "Somewhere in there, AI became generally intelligent."

Lucas: There are going to be all these sort of like isolated meta-epistemic reasoning tools which can work in specific scenarios, which will sort of potentially aggregate in that fuzzy space to create something fully general.

Rohin: Yep. They're going to be applied in some domains, and the percentage of domains in which they apply will gradually grow, and eventually we'll be like, huh, looks like there's nothing left for humans to do. It probably won't be a surprise, but I don't think there will be a particular point where everyone agrees, yep, looks like AI is going to automate everything in just a few years. It's more like AI will start automating a bunch of stuff, and the amount of stuff it automates will increase over time. Some people will see full automation coming earlier; some people will be like, nah, this is just a simple task that AI can do, we've still got a long way to go for all the really generally intelligent stuff. People will sign on to "oh yeah, it's actually becoming generally intelligent" at different points.

Lucas: Right. If you have a bunch of small mammalian level AIs automating a lot of stuff in industry, there would likely be a lot of people whose timelines would be skewed in the wrong direction.

Rohin: I'm not even sure this was a point about timelines. It was just a point about which system you call AGI. I claim this will not have a definitive answer. So that was also an update to how I was thinking. That one, I think, is more generally accepted in the community. And this was more like, well, all of the literature on AI safety that's publicly available and commonly read by EAs doesn't really talk about these sorts of points, so I just hadn't encountered these things when I started out. And then I encountered them, or maybe I thought of some of it myself, I don't remember, but once I encountered the arguments I was like, yeah, that makes sense, and maybe I should have thought of that before.

Lucas: In the sequence you’re writing, do you cover all of these items that you didn’t think were in the mainstream literature?

Rohin: I cover some of them. For the first few things I told you, I was basically just recalling what I said in the sequence. There are a few, I think, that probably aren’t going to be in that sequence, just because there’s a lot of stuff that people have not written down.

Lucas: It’s pretty interesting, because with the way the AI alignment field is evolving, it’s often difficult to have a bird’s-eye view of where it is and to track avant-garde ideas as they’re being formulated in people’s brains and shared.

Rohin: Yeah, I definitely agree. I was hoping that the Alignment Newsletter, which I write, would help with that. I would say it probably speeds up the process a bit, but it’s definitely not keeping you at the forefront. There are many ideas that I’ve heard about, and have even read documents about, that haven’t made it into the newsletter yet because they haven’t become public.

Lucas: So how many months behind do you think the newsletter is, for example?

Rohin: Oh, good question. Well, let’s see. There’s a paper that I started writing in May or April that has not made it into the newsletter yet. There’s a paper that I finished and submitted in October that has not made it into the newsletter yet, or was it September, possibly September. That one will come out soon. That suggests a three-month lag, but I think many others have been longer than that. Admittedly, this is for academic researchers at CHAI. At CHAI we tend to publish papers rather than blog posts, and this results in a longer delay on our side.

Also, take the work on relative reachability, for example. I learned about it maybe four or five months before she released it, and that’s when it came out in the newsletter, and of course she’d been working on it for longer. Or AI safety via debate: I think I learned about it six or seven months before it was published and came out in the newsletter. So yeah, somewhere between three months and half a year seems likely. For things that I learn from MIRI, it’s possible that they never get into the newsletter because they’re never made public. So there’s a fairly broad range there.

Lucas: Okay. That’s quite interesting. I think that also sort of gives people a better sense of what’s going on in technical AI alignment because it can seem kind of black boxy.

Rohin: Yeah. I mean, in some sense this is a thing that all fields have. I used to work in programming languages. There, we would often write a paper, submit it, and then go and present it a year later, by which time we had moved on, done a whole other project, and written another paper, and then we’d go back and talk about the old one. I definitely remember sometimes grad students being like, “Hey, I want to give this practice talk.” I’d say, “What’s it about?” It’s some topic, and I’m like, wait, but you did that. I heard about this like two years ago. And they’re like, yep, just got published.

So in that sense, I think AI is faster, and AI alignment is even faster than AI, because it’s a smaller field and people can talk to each other more, and also because a lot of us write blog posts. Blog posts are great.

Lucas: They definitely play a crucial role within the community in general. So I guess just sort of tying things up a bit more here, pivoting back to a broader view. Given everything that you’ve learned and how your ideas have shifted, what are you most concerned about right now in AI alignment? How are the prospects looking to you and how does the problem of AI alignment look right now to Rohin Shah?

Rohin: I think it looks pretty tractable, pretty good. Most of the problems that I see are ones that we can see in advance and probably can solve. None of these seem particularly impossible to me. I think I also give more credit to the machine learning community, or AI community, than other researchers do. I trust our ability, and here I mean the AI field broadly, to notice what things could go wrong and fix them, in a way that maybe other researchers in AI safety don’t.

I think one of the things that feels most problematic to me right now is the problem of inner optimizers, which I’m told there will probably be a sequence on in the future, because there aren’t great resources on it right now. Basically, this is the idea that if you run a search process over a wide space of strategies or options, and you search for something that gets you good external reward or something like that, what you might end up finding is a strategy that is itself a consequentialist agent optimizing for its own internal reward. That internal reward will agree with the external reward on the training data, because that’s why it was selected, but it might diverge as soon as there’s any distribution shift.

And then it might start optimizing against us adversarially, in the same way that you would get if you gave a misspecified reward function to an RL system today. This seems plausible to me. I’ve read a bit more about this and talked to people about it, and about things that aren’t yet public but hopefully will be soon. I definitely recommend reading that if it ever comes out. But yeah, this seems like it could be a problem. I don’t think we have any instance of it being a problem yet. It seems hard to detect, and I’m not sure how I would fix it right now.

But I also don’t think that we’ve thought about the problem much, or at least I don’t think I’ve thought about the problem that much, so I don’t want to say, “Oh man, this is totally unsolvable,” yet. Maybe I’m just an optimistic person by nature. I mean, that’s definitely true, and maybe that’s biasing my judgment here, but it feels like we could probably solve it if it ends up being a problem.
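To make the failure mode Rohin describes a bit more concrete, here is a minimal toy sketch in Python (an illustration of the general idea only, not drawn from the episode or from any real system; the states, thresholds, and function names are all invented): an outer search keeps whichever candidate policy scores best on the training reward, but several candidates with different internal objectives are indistinguishable on the training distribution, and the one the search happens to return can behave badly as soon as the distribution shifts.

```python
# Toy illustration of an "inner optimizer" style failure: the outer search
# only sees training reward, so it cannot distinguish candidates whose
# internal objectives happen to agree with the true reward on training data.

TRAIN_STATES = range(0, 50)     # states seen during training
DEPLOY_STATES = range(50, 100)  # shifted states seen after deployment

def true_reward(state, action):
    """Designer's intent: 'act' only when the state is below 50."""
    return 1.0 if (action == "act") == (state < 50) else 0.0

def make_candidate(threshold):
    """Each candidate pursues its own internal proxy: act while state < threshold."""
    return lambda state: "act" if state < threshold else "wait"

def training_score(policy):
    return sum(true_reward(s, policy(s)) for s in TRAIN_STATES)

# Candidates with thresholds 100, 75 and 50 behave identically on the training
# distribution (every training state is below all three thresholds), so all of
# them get a perfect training score and the tie is broken arbitrarily.
candidates = [make_candidate(t) for t in (100, 75, 50)]
selected = max(candidates, key=training_score)  # happens to return the threshold-100 proxy

deploy_score = sum(true_reward(s, selected(s)) for s in DEPLOY_STATES)
print("training score:", training_score(selected))  # 50.0 out of 50 (perfect)
print("deployment score:", deploy_score)            # 0.0 out of 50 (proxy diverges off-distribution)
```

The point is just that training performance alone cannot separate the aligned candidate (threshold 50) from the misaligned ones, which is part of why this kind of problem seems hard to detect before a distribution shift occurs.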

Lucas: Is there anything else here that you would like to wrap up on in terms of AI alignment or inverse reinforcement learning?

Rohin: I want to continue to exhort that we should not be trying to solve all the metaphilosophical problems and we should not be trying to like infer the one true utility function and we should not be modeling an AI as pursuing a single goal over the long-term. That is a thing I want to communicate to everybody else. Apart from that I think we’ve covered everything at a good depth. Yeah, I don’t think there’s anything else I’d add to that.

Lucas: So given that I think rather succinct distillation of what we are trying not to do, could you try and offer an equally succinct distillation of what we are trying to do?

Rohin: I wish I could. That would be great, wouldn’t it? I can tell you that I can’t do that. I could give you a suggestion on what we are trying to do instead, which would be to try to build an AI system that is corrigible, that is doing what we want, and that is going to remain under human control in some sense. It’s going to ask us, take our preferences into account, and not try to go off behind our backs and optimize against us. That is a summary of a path we could go down that I think is promising, of what I would want our AI systems to be like. But it’s unfortunately very sparse on concrete details, because I don’t know those concrete details yet.

Lucas: Right. I think that that sort of perspective shift is quite important. I think it changes the nature of the problem and how one thinks about the problem, even at the societal level.

Rohin: Yeah. Agreed.

Lucas: All right. So thank you so much Rohin, it’s really been a pleasure. If people are interested in checking out some of this work that we have mentioned or following you, where’s the best place to do that?

Rohin: I have a website. It is just RohinShah.com. Subscribing to the Alignment Newsletter is … well, it’s not a great way to figure out what I personally believe. Maybe if you keep reading the newsletter over time and read my opinions for several weeks in a row, then you’d start getting a sense of what I think. The website will soon have links to my papers and things like that, so that’s probably the best way. I do have a Twitter, but I don’t really use it.

Lucas: Okay. So yeah, thanks again Rohin. It’s really been a pleasure. I think that was a ton to think about and I think that I probably have a lot more of my own thinking and updating to do based off of this conversation.

Rohin: Great. Love it when that happens.

Lucas: So yeah. Thanks so much. Take care and talk again soon.

Rohin: All right. See you soon.

Lucas: If you enjoyed this podcast, please subscribe, give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

[end of recorded material]

Podcast: Governing Biotechnology, From Avian Flu to Genetically-Modified Babies with Catherine Rhodes

A Chinese researcher recently made international news with claims that he had edited the first human babies using CRISPR. In doing so, he violated international ethics standards, and he appears to have acted without his funders or his university knowing. But this is only the latest example of biological research triggering ethical concerns. Gain-of-function research a few years ago, which made avian flu more virulent, also sparked controversy when scientists tried to publish their work. And there’s been extensive debate globally about the ethics of human cloning.

As biotechnology and other emerging technologies become more powerful, the dual-use nature of research — that is, research that can have both beneficial and risky outcomes — is increasingly important to address. How can scientists and policymakers work together to ensure regulations and governance of technological development will enable researchers to do good with their work, while decreasing the threats?

On this month’s podcast, Ariel spoke with Catherine Rhodes about these issues and more. Catherine is a senior research associate and deputy director of the Centre for the Study of Existential Risk. Her work has broadly focused on understanding the intersection and combination of risks stemming from technologies and risks stemming from governance. She has particular expertise in international governance of biotechnology, including biosecurity and broader risk management issues.

Topics discussed in this episode include:

  • Gain-of-function research, the H5N1 virus (avian flu), and the risks of publishing dangerous information
  • The roles of scientists, policymakers, and the public to ensure that technology is developed safely and ethically
  • The controversial Chinese researcher who claims to have used CRISPR to edit the genome of twins
  • How scientists can anticipate whether the results of their research could be misused by someone else
  • To what extent does risk stem from technology, and to what extent does it stem from how we govern it?

You can listen to this podcast above, or read the full transcript below. And feel free to check out our previous podcast episodes on SoundCloud, iTunes, Google Play and Stitcher.

 

Ariel: Hello. I’m Ariel Conn with the Future of Life Institute. Now, I’d been planning to do something about biotechnology this month anyway, since it goes along so nicely with the new resource we just released, which highlights the benefits and risks of biotech. I was very pleased when Catherine Rhodes agreed to be on the show. Catherine is a senior research associate and deputy director of the Centre for the Study of Existential Risk. Her work has broadly focused on understanding the intersection and combination of risks stemming from technologies and risks stemming from governance, or a lack of it.

But she has particular expertise in international governance of biotechnology, including biosecurity and broader risk management issues. The timing of Catherine as a guest is also especially fitting given that just this week the science world was shocked to learn that a researcher out of China is claiming to have created the world’s first genetically edited babies.

Now neither she nor I have had much of a chance to look at this case too deeply but I think it provides a very nice jumping-off point to consider regulations, ethics, and risks, as they pertain to biology and all emerging sciences. So Catherine, thank you so much for being here.

Catherine: Thank you.

Ariel: I also want to add that we did have another guest scheduled to join us today who is unfortunately ill, and unable to participate, so Catherine, I am doubly grateful to you for being here today.

Before we get too far into any discussions, I was hoping to just go over some basics to make sure we’re all on the same page. In my readings of your work, you talk a lot about biorisk and biosecurity, and I was hoping you could just quickly define what both of those words mean.

Catherine: Yes, in terms of thinking about both biological risk and biological security, I think about the objects that we’re trying to protect. It’s about the protection of human, animal, and plant life and health, in particular. Some of that extends to protection of the environment. The risks are the risks to those objects and security is securing and protecting those.

Ariel: Okay. I’d like to start this discussion, where we’ll talk about ethics and policy, by looking first at the example of the gain-of-function experiments that caused another stir in the science community a few years ago. That was research done, I believe, on the H5N1 virus, also known as avian flu, and I believe it made the virus more virulent. First, can you just explain what gain-of-function means? And then I was hoping you could talk a bit about what that research was, and what the scientific community’s reaction to it was.

Catherine: Gain-of-function’s actually quite a controversial term to have selected to describe this work, because a lot of what biologists do is work that would add a function to the organism that they’re working on, without that actually posing any security risk. In this context, it was a gain of a function that would make it perhaps more desirable for use as a biological weapon.

In this case, it was things like an increase in its ability to transmit between mammals. In particular, they were getting it to become transmissible between ferrets in a laboratory, and ferrets are a model for transmission between humans.

Ariel: You actually bring up an interesting point that I hadn’t thought about. To what extent does our choice of terminology affect how we perceive the ethics of some of these projects?

Catherine: In this case, I think it was more that the use of that term, which came more from the security and policy community side, made the conversation with scientists more difficult, as it was felt that this was mislabeling their research, and pulling in research that shouldn’t really come into this kind of conversation about security. So I think that was where it maybe caused some difficulties.

But I think there also needs to be understanding in the other direction: it’s not necessarily the case that all policymakers are going to have that level of detail in mind about what they mean when they’re talking about the science.

Ariel: Right. What was the reaction then that we saw from the scientific community and the policymakers when this research was published?

Catherine: There was firstly a stage of debate about whether those papers should be published or not. There was some guidance given by what’s called the National Science Advisory Board for Biosecurity in the US, that those papers should not be published in full. So, actually, the first part of the debate was about that stage of ‘should you publish this sort of research where it might have a high risk of misuse?’

That was something that the security community had been discussing for at least a decade, that there were certain experiments where they felt that they would meet a threshold of risk, where they shouldn’t be openly published or shouldn’t be published with their methodological details in full. I think for the policy and security community, it was expected that these cases would arise, but this hadn’t perhaps been communicated to the scientific community particularly well, and so I think it came as a shock to some of those researchers, particularly because the research had been approved initially, so they were able to conduct the research, but suddenly they would find that they can’t publish the research that they’ve done. I think that was where this initial point of contention came about.

It then became a broader issue. More generally, how do we handle these sorts of cases? Are there times when we should restrict publication? Or is open publication actually going to be a better way of protecting ourselves, because we’ll all know about the risks as well?

Ariel: Like you said, these scientists had gotten permission to pursue this research, so it’s not like it was questionable, or they had no reason to think it was too questionable to begin with. And yet, I guess there is that issue of how can scientists think about some of these questions more long term and maybe recognize in advance that the public or policymakers might find their research concerning? Is that something that scientists should be trying to do more of?

Catherine: Yes, and I think that’s part of this point about the communication between the scientific and policy communities, so that these things don’t come as a surprise or a shock. Yes, I think there was something in this. If we’re allowed to do the research, should we not have had more conversation at the earlier stages? I think in general I would say that’s where we need to get to, because if you’re trying to intervene at the stage of publication, it’s probably already too late to really contain the risk of publication, because for example, if you’ve submitted a journal article online, that information’s already out there.

So yes, trying to take it further back in the process, so that these things are considered at the early stages of designing research projects, is important. That has been pushed forward by funders, so there are now some clauses along the lines of ‘have you reviewed the potential consequences of your research?’ That is one way of triggering that thinking. But I think there’s been a broader question further back about education and awareness.

It’s all right being asked that question, but do you actually have information that helps you know what would be a security risk, and what elements you might be looking for in your work? So there’s a more general question of how we build awareness amongst the scientific community that these issues might arise, and train them to be able to spot some of the security concerns that may be there.

Ariel: Are we taking steps in that direction to try to help educate both budding scientists and also researchers who have been in the field for a while?

Catherine: Yes, there have been quite a lot of efforts in that area over the last decade or so, done by academic groups and civil society. It’s been something that’s been encouraged by states parties to the Biological Weapons Convention, and also by the World Health Organization, which has a document on responsible life sciences research that also encourages education and awareness-raising efforts.

I think those have further to go, and some of the barriers to their being taken up are the familiar ones: it’s very hard to find space in a scientific curriculum for that teaching, and more resources are needed in terms of materials you can go to, although that is being built up.

That covers the scientific curriculum at the undergraduate and postgraduate levels, but how do you extend this throughout scientific careers as well? There needs to be a way of reaching scientists at all levels.

Ariel: We’re talking a lot about the scientists right now, but in your writings, you mention that there are three groups who have responsibility for ensuring that science is safe and ethical. Those are one, obviously the scientists, but then also you mention policymakers, and you mention the public and society. I was hoping you could talk a little bit about how you see the roles for each of those three groups playing out.

Catherine: I think these sorts of issues, they’re never going to be just the responsibility of one group, because there are interactions going on. Some of those interactions are important in terms of maybe incentives. So we talked about publication. Publication is of such importance within the scientific community and within their incentive structures. It’s so important to publish, that again, trying to intervene just at that stage, and suddenly saying, “No, you can’t publish your research” is always going to be a big problem.

It’s to do with the norms and the practices of science, but some of that, again, comes from the outside. Are there ways we can reshape those sorts of structures so that they would be more useful? That’s one way of thinking about it. I think we need clear signals from policymakers as well, about when to take threats seriously or not. If we’re not hearing from policymakers that there are significant security concerns around some forms of research, then why should we expect the scientists to be aware of it?

Yes, policy also has control and governance mechanisms within it, so it can be very useful, for instance in deciding what research can be done; that’s often done by funders and government bodies, and not by the research community themselves. Then, thinking more broadly about how to bring in the public dimension, I think what I mean there is that it’s about all of us being aware of this. It shouldn’t be about isolating one particular community and saying, “Well, if things go wrong, it was you.”

Socially, we’ve got decisions to make about how we feel about certain risks and benefits and how we want to manage them. In the gain-of-function case, the research that was done had the potential for real benefits for understanding avian influenza, which could produce a human pandemic, and therefore there could be great public health benefits associated with some of this research that also poses great risks.

Again, when we’re dealing with something that for society, could bring both risks and benefits, society should play a role in deciding what balance it wants to achieve.

Ariel: I guess I want to touch on this idea of how we can make sure that policymakers and the public – this comes down to a three way communication. I guess my question is, how do we get scientists more involved in policy, so that policymakers are informed and there is more of that communication? I guess maybe part of the reason I’m fumbling over this question is it’s not clear to me how much responsibility we should be putting specifically on scientists for this, versus how much responsibility does go to the other groups.

Catherine: On science becoming more involved in policy: another part of thinking about the relationship between science and policy, and science and society, is that we expect policymakers to consider how to have regulation and governance that’s appropriate to scientific practice and to advances in emerging science and technology, and for that they need information from the scientific community. There’s a responsibility on policymakers to seek some of that information, but also on scientists to be willing to engage in the other direction.

I think that’s the main answer to how they could be more informed and how there could be more communication. One of the useful ways that’s done at the moment is by having, say, meetings with a horizon-scanning element, so that scientists can have input on where we might see advances going, but where the participants also include policymakers and maybe people who know more about things like technology transfer, startups, and investment, so they can see what’s going on in terms of where the money’s going. Bringing those groups together to look at where the future might be going is quite a good way of capturing some of those advances.

And it helps inform the whole group, so I think those sorts of processes are good, and there are some examples of those, and there are some examples where the international science academies come together to do some of that sort of work as well, so that they would provide information and reports that can go forward to international policy processes. They do that for meetings at the Biological Weapons Convention, for example.

Ariel: Okay, so I want to come back to this broadly in a little bit, but first I want to touch on biologists and ethics and regulation a little more generally. Because I keep thinking of the famous Asilomar meeting, which I think was in the mid-’70s, in which biologists got together, recognized some of the risks in their field, and chose to pause the work that they were doing, because there were ethical issues. I tend to credit them with being more ethically aware than a lot of other scientific fields.

But it sounds like maybe that’s not the case. Was that just a special example in which scientists were unusually proactive? I guess, should we be worried about scientists and biosecurity, or is it just a few bad apples like we saw with this recent Chinese researcher?

Catherine: In terms of ethical awareness, it’s not that I don’t think biologists are ethically aware, but there can be a lot of different things coming onto their agendas, and those can be pushed out by other practices within your daily work. For example, biology is often quite close to medicine, and there’s been a lot over the last few decades about how we treat humans and animals in research.

There’s ethics and biomedical ethics, and there are practices to do with consent and the participation of human subjects that people are aware of. It’s just that sometimes you’ve got such an overload of different issues you’re supposed to be aware of and responding to (sustainable development and environmental protection is another one) that often things will fall off the agenda, or knowing which you should prioritize can be difficult.

I do think there’s a lack of awareness of the past history of biological warfare programs, and of the fact that scientists have always been involved with them, and then, looking forward, of how much easier it may be, because of trends in technology, for more actors to have access to such technologies, and the implications that might have.

I think that picks up on what you were saying about, are we just concerned about the bad apples? Are there some rogue people out there that we should be worried about? I think there’s two parts to that, because there may be some things that are more obvious, where you can spot, “Yeah, that person’s really up to something they shouldn’t be.” I think there are probably mechanisms where people do tend to be aware of what’s going on in their laboratories.

Although, as you mentioned, in the recent Chinese case of potentially CRISPR gene-edited babies, it seems clear that people within that person’s laboratory didn’t know what was going on, the funders didn’t know what was going on, and the government didn’t know what was going on. So yes, there will be some cases where someone is very obviously doing something bad.

I think that’s probably an easier thing to handle and to conceptualize. But now we’re getting these questions about doing scientific work and research that has clear benefits, and you’re doing it for those beneficial purposes, but how do you work out whether the results could be misused by someone else? How do you frame whether you have any responsibility for how someone else would use it, when they may well not be anywhere near your laboratory? They may be very remote, and you probably have no contact with them at all, so how can you judge and assess how your work may be misused, and then try to make some decision about how you should proceed with it? I think that’s a more complex issue.

That does probably, as you say, speak to whether there are things in scientific cultures and working practices that might assist with dealing with that, or might make it problematic. I’ve picked this up a few times already, but there’s a lot going on in terms of the incentive structures that scientists are working in, which more broadly meet up with global economic incentives. Again, not knowing the full details of the recent Chinese CRISPR case, there can often be almost racing dynamics between countries to have done some of this research and to be ahead in it.

I think that did happen with the gain-of-function experiments, so that when the US had a moratorium on doing them, China wrapped up its experiments in the same area. There are all these kinds of incentive structures going on as well, and I think those do affect wider scientific and societal practices.

Ariel: Okay. Quickly touching on some of what you were talking about, in terms of researchers who are doing things right, in most cases I think what happens is this case of dual use, where the research could go either way. I think I’m going to give scientists the benefit of the doubt and say most of them are actually trying to do good with their research. That doesn’t mean that someone else can’t come along later and then do something bad with it.

This is I think especially a threat with biosecurity, and so I guess, I don’t know that I have a specific question that you haven’t really gotten into already, but I am curious if you have ideas for how scientists can deal with the dual use nature of their research. Maybe to what extent does more open communication help them deal with it, or is open communication possibly bad?

Catherine: Yes, I think it’s possibly good and possibly bad. Again, it’s a difficult question without putting the practice into context. It shouldn’t be that the scientist alone has to think through these issues of dual use and whether the work can be misused, if there isn’t really any new information coming out about how serious a threat this might be: do we know that this is being pursued by any terrorist group? Do we know why it might be of particular concern?

I think another interesting thing is that you might get combinations of technology that have developed in different areas, so you might get someone who does something that helps with the dispersal of an agent, that’s entirely disconnected from someone who might be working on an agent, that would be useful to disperse. Knowing about the context of what else is going on in technological development, and not just within your own work is also important.

Ariel: Just to clarify, what are you referring to when you say agent here?

Catherine: In this case, again, thinking of biology, so that might be a microorganism. If you were to be developing a biological weapon, you don’t just need to have a nasty pathogen. You would need some way of dispersing, disseminating that, for it to be weaponized. Those components may be for beneficial reasons going on in very different places. How would scientists be able to predict where those might combine and come together, and create a bigger risk than just their own work?

Ariel: Okay. And then I really want to ask you about the idea of the races, but I don’t have a specific question to be honest. It’s a concerning idea, and it’s something that we look at in artificial intelligence, and it’s clearly a problem with nuclear weapons. I guess what are concerns we have when we look at biological races?

Catherine: It may not necessarily be specific to biological races. It’s not even about military uses of science and technology so much as about how we have very strong drivers for economic growth, and technology advances will be really important to innovation and economic growth.

So, I think this does provide a real barrier to collective state action against some of these threats, because if a country can see an advantage of not regulating an area of technology as strongly, then they’ve got a very strong incentive to go for that. It’s working out how you might maybe overcome some of those economic incentives, and try and slow down some of the development of technology, or application of technology perhaps, to a pace where we can actually start doing these things like working out what’s going on, what the risks might be, how we might manage those risks.

But that is a hugely controversial kind of thing to put forward, because the idea of slowing down technology, which is clearly going to bring us these great benefits and is linked to progress and economic progress is a difficult sell to many states.

Ariel: Yeah, that makes sense. I think I want to turn back to the Chinese case very quickly. I think this is an example of what a lot of people fear, in that you have this scientist who isn’t being open with the university that he’s working with, isn’t being open with his government about the work he’s doing. It sounds like even the people who are working for him in the lab, and possibly even the parents of the babies that are involved may not have been fully aware of what he was doing.

We don’t have all the information, but at the moment, at least what little we have sounds like an example of a scientist gone rogue. How do we deal with that? What policies are in place? What policies should we be considering?

Catherine: I think I share where the concerns in this are coming from, because it looks like there were multiple failures across the layers of systems that should have been able to pick this up and stop it. Yes, we would usually expect that the funder of the research, the institution the person is working in, the government through regulation, or the colleagues of a scientist would be able to pick up on what’s happening and have some ability to intervene, and that doesn’t seem to have happened.

Knowing that these multiple things can all fall down is worrying. I think an interesting thing about how we deal with this is that there seems to be a very strong reaction from the scientific community working around those areas of gene editing, to all come together and collectively say, “This was the wrong thing to do, this was irresponsible, this is unethical. You shouldn’t have done this without communicating more openly about what you were doing, or what you were thinking of doing.”

It’s really interesting to see that community pushback. In those cases, if I were a scientist working in a similar area, I’d be really put off by that, thinking, “Okay, I should stay in line with what the community expects me to do.” I think that is important.

Whether it also kicks in from the more top-down regulatory side as well, so whether China will now get some new regulation in place and do some more checks down through the institutional levels, I don’t know. Likewise, I don’t know whether internationally it will bring a further push for coordination on how we want to regulate those experiments.

Ariel: I guess this also brings up the question of international standards. It does look like we’re getting very broad international agreement that this research shouldn’t have happened. But how do we deal with cases where maybe most countries are opposed to some type of research and another country says, “No, we think it could be possibly ethical so we’re going to allow it?”

Catherine: I think this is, again, the challenging situation. It’s interesting to me; this picks up on the international debates about human cloning, which I’m trying to think were maybe 15 to 20 years ago, about whether there should be a ban on human cloning. There is a UN declaration against human cloning, but it fell down in terms of actually being more than a declaration, of having something stronger in terms of international law, basically because of the differences between states’ views on the status of the embryo.

Regulating human reproductive research at the international level is very difficult because of some of those issues where, like you say, there can be quite significant differences in the ethical approaches taken by different countries. Again, in this case, I think what’s been interesting is: okay, if we’re going to come across a difficulty in getting an agreement between states at the governmental level, are there things that the scientific community or other groups can do to make sure those debates are happening, and that some common ground is being found on how we should pursue research in these areas, and on when we should decide it’s safe enough to go down some of these lines?

I think another point about this case in China is that it’s just not known whether it’s safe to be doing gene editing on humans yet. That’s actually one of the reasons why people shouldn’t be doing it, regardless. I hope that goes some way toward an answer. I think it is very problematic that we will often find that we can’t get broad international agreement on things, even when there seems to be some level of consensus.

Ariel: We’ve been talking a lot about all of these issues from the perspective of biological sciences, but I want to step back and also look at some of these questions more broadly. There’s two sides that I want to look at. One is just this question of how do we enable scientists to basically get into policy more? I mean, how can we help scientists understand how policymaking works and help them recognize that their voices in policy can actually be helpful? Or, do you think that we are already at a good level there?

Catherine: I would say we’re certainly not at an ideal level of science-policy engagement yet. It does vary across different areas, of course. The thing that comes to mind is climate change, for example, with the intergovernmental panel doing its reports every few years; there’s a good, collaborative, international evidence base and a good science-policy process in that area.

But in other areas there’s a big deficit I would say. I’m most familiar with that internationally, but I think some of this scales down to the national level as well. Part of it is going in the other direction almost. When I spoke earlier about needs perhaps for education and awareness raising among scientists about some of these issues around how their research may be used, I think there’s also a need for people in policy to become more informed about science.

That is important. I’m trying to think what the ways are that scientists can do that. There have been some attempts, when international negotiations are going on, to have what I’ve heard described as mini universities: maybe a week’s worth of quick updates on where the science is at, before a negotiation that’s relevant to that science goes on.

I think one of the key things to say is that there are ways for scientists and the scientific community to have influence both on how policy develops and on how it’s implemented, and a lot of this will go through intermediary bodies, in particular the professional associations and academies that represent scientific communities. They will know when, for example (I’m thinking in the UK context, but I think this is similar in the US), there’s a consultation by parliament on how we should address a particular issue.

There was one in the UK a couple of years ago, how should we be regulating genetically modified insects? If a consultation like that’s going on and they’re asking for advice and evidence, there’s often ways of channeling that through academies. They can present statements that represent broader scientific consensus within their communities and input that.

The reason for mentioning them as intermediaries is that, again, it’s a lot of burden to put on individual scientists to say, “You should all be getting involved in policy and informing policy; it’s another part of your role.” But realizing that you can do that as a collective, rather than it having to be an individual thing, I think is valuable.

Ariel: Yeah, there is the issue of, “Hey, in your free time, can you also be doing this?” It’s not like scientists have lots of free time. But one impression I get is that scientists are sometimes a little concerned about getting involved with policymaking because they fear overregulation, and that it could harm their research and the good that they’re trying to do with it. Is this fear justified? Are scientists hampered by policies? Are they helped by policies?

Catherine: Yeah, so it’s both. It’s important to know that the mechanisms of policy can play facilitative roles, they can promote science, as well as setting constraints and limits on it. Again, most governments are recognizing that the life sciences and biology and artificial intelligence and other emerging technologies are going to be really key for their economic growth.

They are doing things to facilitate and support that, and fund it, so it isn’t only about the constraints. However, I guess for a lot of scientists, the way you come across regulation, you’re coming across the bits that are the constraints on your work, or there are things that make you fill in a lot of forms, so it can just be perceived as something that’s burdensome.

But I would also say that certainly something I’ve noticed in recent years is that we shouldn’t think that scientists and technology communities aren’t sometimes asking for areas to be regulated, asking for some guidance on how they should be managing risks. Switching back to a biology example, but with gene drive technologies, the communities working on those have been quite proactive in asking for some forms of, “How do we govern the risks? How should we be assessing things?” Saying, “These don’t quite fit with the current regulatory arrangements, we’d like some further guidance on what we should be doing.”

I can understand that there might be this fear about regulation, and I also think, as you said, it could be the source of the reluctance to engage with policy. But an important thing to say there is that if you’re not engaging with policy, it’s more likely that the regulation will end up, not intentionally, restricting scientific practice. I think that’s really important as well: maybe the regulation is created in a very well-intended way, and it just doesn’t match up with scientific practice.

I think at the moment, internationally this is becoming a discussion around how we might handle the digital nature of biology now, when most regulation is to do with materials. But if we’re going to start regulating the digital versions of biology, so gene sequencing information, that sort of thing, then we need to have a good understanding of what the flows of information are, in which ways they have value within the scientific community, whether it’s fundamentally important to have some of that information open, and we should be very wary of new rules that might enclose it.

I think that’s something again, if you’re not engaging with the processes of regulation and policymaking, things are more likely to go wrong.

Ariel: Okay. We’ve been looking a lot at how scientists deal with the risks of their research, how policymakers can help scientists deal with the risks of their research, et cetera, but it’s all about the risks coming from the research, the technology, and the advances. Something that you brought up in a separate conversation before the podcast is: to what extent does risk stem from technology, and to what extent can it stem from how we govern it? I was hoping we could end with that question.

Catherine: That’s a really interesting question to me, and I’m trying to work that out in my own research. One of the interesting and perhaps obvious things to say is it’s never down to the technology. It’s down to how we develop it, use it, implement it. The human is always playing a big role in this anyway.

But yes, I think a lot of the time governance mechanisms are lagging behind the development of science and technology, and I think some of the risk is coming from the fact that we may just not be governing something properly. This comes down to things we’ve mentioned earlier: collectively, in policy, in the science and technology communities, and in society, we need to be able to get a better grasp on what is happening in the directions of emerging technologies that could have both these very beneficial and very destructive potentials, and on what we might need to do in terms of really rethinking how we govern these things.

Yeah, I don’t have an answer for where the sources of risk are coming from, but I think that intersection between technology development and the development of regulation and governance is an interesting place to look.

Ariel: All right, well yeah, I agree. I think that is a really great question to end on, for the audience to start considering as well. Catherine, thank you so much for joining us today. This has been a really interesting conversation.

Catherine: Thank you.

Ariel: As always, if you’ve been enjoying the show, please take a moment to like it, share it, and follow us on your preferred podcast platform.

[end of recorded material]

Podcast: Can We Avoid the Worst of Climate Change? with Alexander Verbeek and John Moorhead

“There are basically two choices. We’re going to massively change everything we are doing on this planet, the way we work together, the actions we take, the way we run our economy, and the way we behave towards each other and towards the planet and towards everything that lives on this planet. Or we sit back and relax and we just let the whole thing crash. The choice is so easy to make, even if you don’t care at all about nature or the lives of other people. Even if you just look at your own interests and look purely through an economical angle, it is just a good return on investment to take good care of this planet.” – Alexander Verbeek

On this month’s podcast, Ariel spoke with Alexander Verbeek and John Moorhead about what we can do to avoid the worst of climate change. Alexander is a Dutch diplomat and former strategic policy advisor at the Netherlands Ministry of Foreign Affairs. He created the Planetary Security Initiative where representatives from 75 countries meet annually on the climate change-security relationship. John is President of Drawdown Switzerland, an act tank to support Project Drawdown and other science-based climate solutions that reverse global warming. He is a blogger at Thomson Reuters, The Economist, and sciencebasedsolutions.com, and he advises and informs on climate solutions that are economy, society, and environment positive.

Topics discussed in this episode include:

  • Why the difference between 1.5 and 2 degrees C of global warming is so important, and why we can’t exceed 2 degrees C of warming
  • Why the economy needs to fundamentally change to save the planet
  • The inequality of climate change
  • Climate change’s relation to international security problems
  • How we can avoid the most dangerous impacts of climate change: runaway climate change and a “Hothouse Earth”
  • Drawdown’s 80 existing technologies and practices to solve climate change
  • “Trickle up” climate solutions — why individual action is just as important as national and international action
  • What all listeners can start doing today to address climate change

You can listen to this podcast above, or read the full transcript below. And feel free to check out our previous podcast episodes on SoundCloud, iTunes, Google Play and Stitcher.

 

Ariel: Hi everyone, Ariel Conn here with the Future of Life Institute. Now, this month’s podcast is going live on Halloween, so I thought what better way to terrify our listeners than with this month’s IPCC report. If you’ve been keeping up with the news this month, you’re well aware that the report made very dire predictions about what a future warmer world will look like if we don’t keep global temperatures from rising more than 1.5 degrees Celsius. Then of course there were all of the scientists’ warnings that came out after the report about how the report underestimated just how bad things could get.

It was certainly enough to leave me awake at night in a cold sweat. Yet the report wasn’t completely without hope. The authors seem to still think that we can take action in time to keep global warming to 1.5 degrees Celsius. So to consider this report, the current state of our understanding of climate change, and how we can ensure global warming is kept to a minimum, I’m excited to have Alexander Verbeek and John Moorhead join me today.

Alexander is a Dutch environmentalist, diplomat, and former strategic policy advisor at the Netherlands Ministry of Foreign Affairs. Over the past 28 years, he has worked on international security, humanitarian, and geopolitical risk issues, and the linkage to the Earth’s accelerating environmental crisis. He created the Planetary Security Initiative held at The Hague’s Peace Palace where representatives from 75 countries meet annually on the climate change-security relationship. He spends most of his time speaking and advising on planetary change to academia, global NGOs, private firms, and international organizations.

John is President of Drawdown Switzerland, in addition to being a blogger at Thomson Reuters, The Economist, and sciencebasedsolutions.com. He advises and informs on climate solutions that are economy, society, and environment positive. He effects change by engaging on the solutions to global warming with youth, business, policymakers, investors, civil society, government leaders, et cetera. Drawdown Switzerland is an act tank to support Project Drawdown and other science-based climate solutions that reverse global warming, in Switzerland and internationally, through investment at scale in Drawdown solutions. So John and Alexander, thank you both so much for joining me today.

Alexander: It’s a pleasure.

John: Hi Ariel.

Ariel: All right, so before we get too far into any details, I want to just look first at the overall message of the IPCC report. That was essentially: two degrees warming is a lot worse than 1.5 degrees warming. So, I guess my very first question is why did the IPCC look at that distinction as opposed to anything else?

Alexander: Well, I think it’s a direct follow-up from the negotiations on the Paris Agreement, where, at a very late stage, after all the talk about two degrees, the text included a reference to aiming for 1.5 degrees. At that moment, it invited the IPCC to produce a report by 2018 about what the difference actually is between 1.5 and 2 degrees. Another major conclusion is that it is still possible to stay below 1.5 degrees, but then we have to do a lot, really urgently, and that is basically to cut our carbon pollution by 45% in the next 12 years. So that means we have no day to lose, and governments, business, people, basically everybody should get into action. The house is on fire. We need to do something right now.

John: In addition to that, we’re seeing a whole body of scientific study that’s showing just how difficult it would be if we were to get to 2 degrees, and what the differences are. That was also very important. Just for your US listeners, I wanted to clarify, because we’re going to be talking in degrees centigrade: for the sake of argument, if you just multiply by two, every time you hear one degree, that’s roughly two degrees Fahrenheit. I just wanted to add that.
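For readers who want the exact conversion behind that rule of thumb, a temperature difference in Celsius scales to Fahrenheit by a factor of 9/5, or 1.8, so "multiply by two" slightly overstates it. A minimal sketch (the function name is just for illustration):

```python
def delta_c_to_delta_f(delta_c):
    """Convert a temperature *difference* (not an absolute temperature) from Celsius to Fahrenheit."""
    return delta_c * 9.0 / 5.0

print(delta_c_to_delta_f(1.0))  # 1.8 F of warming for 1.0 C
print(delta_c_to_delta_f(1.5))  # 2.7 F
print(delta_c_to_delta_f(2.0))  # 3.6 F
```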

Ariel: Okay great, thank you. So before we talk about how to address the problem, I want to get more into what the problem actually is. And so first, what is the difference between 1.5 degrees Celsius and 2 degrees Celsius in terms of what impact that will have on the planet?

John: So far we’ve already seen a one degree C increase. The impacts that we’re seeing, they were all predicted by the science, but in many cases we’ve really been quite shocked at just how quickly global warming is happening and the impacts it’s having. I live here in Switzerland, and we’re just now actually experiencing another drought, but in the summer we had the worst drought in eastern Switzerland since 1847. Of course we’ve seen the terrible hurricanes hitting the United States this year and last. That’s one degree. So 1.5 degrees increase, I like to use the analogy of our body temperature: If you’re increasing your body temperature by two degrees Fahrenheit, that’s already quite bad, but if you then increase it by three degrees Fahrenheit, or four, or five, or six, then you’re really ill. That’s really what happens with global warming. It’s not a straight line.

For instance, the difference between 1.5 degrees and two degrees is that heat waves are forecast to increase by over 40%. There was another study that showed that fresh water supply would decrease by 9% in the Mediterranean for 1.5 degrees, but it would decrease by 17% if we got to two degrees. So that’s practically doubling the impact for a change of 1.5 degrees. I can go on. If you look at wheat production, the difference between two and 1.5 degrees is a 70% loss in yield. Sea level rise would be 50 centimeters versus 40 centimeters, and 10 centimeters doesn’t sound like that much, but it’s a huge amount in terms of increase.

Alexander: Just to illustrate that a bit: if you have just a 10 centimeter increase, that means an extra 10 million people will be on the move. Or, to put it another way, I remember when Hurricane Sandy hit New York and the subway flooded. At that moment we had, and that’s roughly where we are now, some 20 centimeters of sea level rise since the industrial revolution. If we didn’t have those 20 centimeters, the subways would not have flooded. So it sounds like nothing, but it has a lot of impact. Another one that I saw that was really striking is the impact on nature, on insects and on coral reefs. At two degrees, there’s hardly any coral reef left in the world, whereas at 1.5 degrees we would still lose 70-90%, but there could still be some coral reefs left.

John: That’s a great example, I would say, because currently, at a one degree increase, 50% of coral reefs have already died off. So at 1.5 degrees we could reach 90%, and at two degrees we will have practically wiped out all coral reefs.

Alexander: And the humanitarian aspects are massive. I mean, John just mentioned water. I think one of the things we will see in the next decade or two is a lot of water-related problems. The number of people that will not have access to water is increasing rapidly; it may double in the next decade. So any indication in the report of how many more problems we will see with water if we have that half degree extra is a very good warning. Consider the impact of not having enough water on people’s quality of life, on people going on the move, on increased urbanization, and on more tensions in cities, because there they also have problems getting enough water; and of course water is related to energy and especially to food production. So the humanitarian impact of just that half degree extra is massive.

Then one last thing here: we’re talking about a global average. In some areas, if, let’s say, it gets two degrees warmer globally, it will go much faster, in landlocked countries for instance, or in the Arctic, where it goes about twice as fast, with enormous impacts and potential positive feedback loops that we might end up with.

Ariel: That was something interesting for me to read. I’ve heard about how the global average will increase 1.5 to two degrees, but I hadn’t heard until I read this particular report that that can mean up to 3.5 degrees Celsius in certain places, that it’s not going to be equally distributed, that some places will get significantly hotter. Have models been able to predict where that’s likely to happen?

John: Yeah, and not only that, it’s already happening. That’s also one of the problems we face when we describe global warming in terms of one number, an average: it doesn’t portray the big differences that we’re seeing. For instance, Switzerland is already at a two degree centigrade increase, and that has had huge implications for us already. We’re a landlocked country. We have beautiful mountains, as you know, and beautiful lakes as well, but we’re currently seeing things that we hadn’t seen before: some of our lakes are starting to dry out in this current drought period. Lake levels have dropped very significantly. Not the major ones that are fed by glaciers, but the glaciers themselves are shrinking: out of 80 glaciers that are tracked in Switzerland, 79 are retreating. They’re losing mass.

That’s having impacts, and in terms of extreme weather, just this last summer we saw these incredible events, what Al Gore calls water bombs, in Lausanne and Eschenz, two of our cities, where we saw centimeters of rain, months’ worth, fall in the space of just a few minutes. This caused all sorts of damage as well.

Just a last point about temperature differences: in northern Europe this last summer, we saw temperatures four or five degrees warmer, which caused so much drying out that we saw forest fires we hadn’t seen before in places like Sweden and Finland. We also saw in February of this year what the scientists call a temperature anomaly of 20 degrees, which meant that for a few days it was warmer at the North Pole than it was in Poland. Averages help us understand the overall trends, but they also hide differences that are important to consider as well.

Alexander: Maybe the term global warming is, for the general public, not the right word, because it sounds a bit like “a little bit warmer,” and if it’s two degrees warmer than yesterday, I don’t care so much. Maybe “climate weirding” or “climate chaos” are better, because we will just get more extremes. Take, for instance, how the jet stream is moving. It used to travel rather quickly around the planet at the height where the jets like to fly, at about 10 kilometers. Now, because there’s less temperature difference between the equator and the poles, it’s getting slower. It’s getting a bit lazy.

That means two things. On the one hand, once you have a certain weather pattern, it sticks around longer. On the other hand, this lazy jet stream, a bit like a river that enters the floodplain and starts to meander, makes bigger waves. If it used to be that the jet stream brought cold air from Iceland to the Netherlands, where I’m from, now that it is wavier it brings cold weather all the way from Greenland, and the same with warm weather: it comes from further south and it sticks longer in that pattern. So you get longer droughts, you get longer periods of rain, it all gets more extreme. A country like the Netherlands, which is a delta where we have always dealt with too much water, like many other countries in the world now experiences drought, which is something we’re not used to. We have to ask foreign experts how to deal with drought, because we have always tried to pump the water out.

John: Yeah I think the French, as often is the case, have the best term for it. It’s called dérèglement climatique which is this idea of climate disruption.

Ariel: I’d like to come back to some of the humanitarian impacts because one of the things that I see a lot is this idea that it’s the richer, mostly western but not completely western countries that are causing most of the problems, and yet it’s the poorer countries that are going to suffer the most. I was wondering if you guys could touch on that a little bit?

Alexander: Well, I think everything related to climate change comes back to the fact that it is unfair. It is created by countries that are generally less impacted so far. It started, let’s say, in western Europe with the industrial revolution, followed by the US, which took over; historically the US has produced the most. Then you have different groups of countries. Take a country in the Sahel like Burkina Faso: they contributed practically zero to the whole problem, but the impact falls much more on their side. Then there’s a group of countries in between, say a country like China, which for a long time did not contribute much to the problem and is now rapidly catching up. Then you get this difficult “tragedy of the commons” behavior where everybody points at somebody else for their part, either because of what they did in the past or because of what they do now; everybody can use the statistics to their advantage, apart from these really, really poor countries that are getting the worst of it.

I mean, a country like Tuvalu is just disappearing. That’s one of those low-lying island states in the Pacific. They contributed absolutely zero and their country is drowning. They can point at everybody else and nobody will point at them. So this is an absolutely global problem that you can only solve by respecting each other, by cooperating, and by understanding that helping other countries is not only a moral obligation but also in your own interest.

John: Yeah. Your listeners would most likely also be aware of the sustainable development goals, which are the objectives the UN set for 2030. There are 17 of them. They include things like no poverty, zero hunger, health, education, gender equality, et cetera. If you look at who is being impacted by a 2 degree and a 1.5 degree world, then you can see that it’s particularly in the developing and the least developed countries that the impact is felt the most, and that these SDGs are much more difficult if not impossible to reach in a 2 degree world. Which again is why it’s so important for us to stay within 1.5 degrees.

Ariel: And so looking at this from more of a geopolitical perspective, in terms of trying to govern and address… I guess this is going to be a couple questions. In terms of trying to prevent climate change from getting too bad, what do countries broadly need to be doing? I want to get into specifics about that question later, but broadly for now what do they need to be doing? And then, how do we deal with a lot of the humanitarian impacts at a government level if we don’t keep it below 1.5 degrees?

Alexander: A broad answer would be two things. Get rid of the carbon pollution that we’re producing every day as soon as possible, so phase out fossil fuels. The other broad answer parallels what John was just talking about: we have the 2030 agenda, we have those 17 sustainable development goals. If we would all really follow that and live up to it, we’d actually get a much better world, because all of these things are integrated. If you just look at climate change in isolation you are not going to get there. It’s highly connected to all those related problems.

John: Yeah, just in terms of what needs to be done broadly speaking, it’s the adoption of renewable energy, massively scaling up the way we produce electricity using renewables. The IPCC suggests renewables should reach 85% by 2050, and there are others that say we can even get to 100%. The other side is everything to do with land use and food; our diet has a huge impact as well. On the one hand, as Alexander said very well, we need to cut down on the emissions that are caused by industry and fossil fuel use, but on the other hand what’s really important is to preserve the natural ecosystems that protect us, and to add forest, not deforest. We need to naturally scale up the capture of carbon dioxide. Those are the two pieces of the puzzle.

Alexander: I don’t want to go too much into details, but altogether it ultimately asks for a different kind of economy. In our latest elections, when I looked at the election programs, every party, whether left or right or in the middle, promised something like, “when we’re in government, there will be something like 3% of economic growth every year.” But if you grow 3% every year, that means that roughly every 24 years you double your economy, and every 47 years or so you quadruple it, which might be fine if it were only the services industry. But if you talk about production, we cannot let the amount of resources that we use and the amount of waste we produce keep growing when the Earth itself is not growing. So apart from moving to renewables, it is also about changing how we use everything around us and how we consume.

You don’t have to grow when you have it this good already, but growth is so embedded in the system that we have used for the past 200, 250 years. Everything is based on growth. And as the Club of Rome said in the early ’70s, there are limits to growth, unless our planet were something like a balloon that somebody keeps blowing air into so that it keeps growing; then you would have a different system. But as long as that is not the case, and as long as there are no other planets we can fly to, that is the question where it’s very hard to find an answer. You can conclude that we cannot keep growing, but how do we change that? That’s probably a completely different podcast debate, but it’s something I wanted to flag here, because at the end of the day you always end up with this question.
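The compounding arithmetic behind Alexander’s point is easy to check. Here is a minimal sketch in Python, assuming nothing beyond a constant 3% annual growth rate taken from the conversation; the figures in the comments are just the results of that calculation.

```python
# Compound growth at a constant annual rate (3% assumed, as in the conversation).
# Size after t years is (1 + r)**t, so doubling time is log(2) / log(1 + r).
import math

rate = 0.03  # assumed constant annual growth rate

doubling_time = math.log(2) / math.log(1 + rate)
quadrupling_time = math.log(4) / math.log(1 + rate)

print(f"Doubling time at 3% growth:    {doubling_time:.1f} years")    # ~23.4 years
print(f"Quadrupling time at 3% growth: {quadrupling_time:.1f} years")  # ~46.9 years

# Size of the economy after a century of 3% growth, relative to today:
print(f"After 100 years: {(1 + rate) ** 100:.1f}x today's size")       # ~19.2x
```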

Ariel: This is very much something that I wanted to come back to, especially in terms of what individuals can do; I think consuming less is one of the things that we can do to help. So I want to come back to that idea. I want to talk a little bit more, though, about some of the problems that we face if we don’t address the problem, and then come back to that. So, first going back to the geopolitics of addressing climate change: we’ve talked about some of the problems that can arise as a result of climate change, but climate change is also thought of as a threat multiplier, so it could trigger other problems. I was hoping you could talk a little bit about some of the threats that governments need to be aware of if they don’t address climate change, both in terms of what climate change could directly cause and what it could indirectly cause.

Alexander: There’s so much we can cover here. Let’s start with security; it’s maybe the first one you think of. You’ll read in the paper about climate wars and water wars and those kinds of popular terms, which of course are too simplified. But there is a clear correlation between a changing climate and security.

We’ve seen it in many places. You see it where we’re seeing more extreme weather now, say in the Sahel area or in the Middle East. There are a lot of examples where, because of rising temperatures and because of consistently declining rainfall, the combination is getting worse. You get more periods of drought, so people go on the move. Where do they go? Normally, unlike what many populists in some countries like to claim, they’re not immediately going to western countries. People don’t want to move too far, so they go to an area not far away that is a little less hit by the drought, but by arriving there they increase the pressure on the little water, food, and other resources that it has. That, of course, creates tensions with the people who are already there.

So think, for instance, about nomadic herdsmen and settled farmers and the kind of tension between them; they all need a little bit of water. There’s this well-known graph of world food prices over the past 10 years, with two big spikes where food prices as well as energy prices rapidly went up; the most well known is in late 2010. If you plot the revolutions and uprisings and unrest in the world on that graph, you see that as soon as the world food price gets above, let’s say, 200, there is much more unrest. The 2010 spike was soon followed by the Arab Spring, which is not an automatic connection; in some countries there was no unrest even though they had the same drought, so it’s not a one-to-one connection.

So I think you used the right word in saying a threat multiplier. On top of all the other problems they have, with bad governance and fragile economies and all kinds of other development issues that you find in those same SDGs that were mentioned, if you add the climate change problem, you will get a lot of unrest.

But let me add one last thing here. It’s not just about security. There’s also, for example, when Bangkok was flooding, a factory that produced chips was flooded. Chip prices worldwide suddenly rose by something like 10%, and there was a factory in the UK that produced cars that were otherwise ready to sell; the only thing they were missing was an electronic chip a few centimeters across that needed to go in the car. So they had to close the factory for about six weeks because of a flood in Bangkok. That just shows that in this interconnected worldwide economy, nowhere in the world are you safe from the impacts of climate change.

Ariel: I’m not sure if it was the same flood, but I think Apple had a similar problem, didn’t they? Where they had a backlog of problems with hard drives or something because the manufacturer, I think in Thailand, I don’t remember, flooded.

But anyway, one more problem that I want to bring up, and that is: at the moment we’re talking about actually taking action. I mean even if we only see global temperatures rise to two degrees Celsius, that will be because we took action. But my understanding is, on our current path we will exceed two degrees Celsius. In fact, the US National Highway Traffic Safety Administration Report that came out recently basically says that a 4 degree increase is inevitable. So I want to talk about what the world looks like at that level, and then also what runaway climate change is and whether you think we’re on a path towards runaway climate change, or if that’s still an extreme that hopefully won’t happen.

John: There’s a very important discussion going on about at what point we will reach that tipping point where, because of positive feedback loops, it’s just going to get worse and worse and worse. There have been some very interesting publications lately trying to understand at what level that would happen, and the assessment is that it’s probably around 2 degrees. At the moment, if you look at the Paris Agreement and what all the countries have committed to, and you take all those commitments, the actions you were mentioning that have already been started, and play them out until 2030, we would be on a track that would ultimately take us to a 3 degree increase.

Ariel: And to clarify, that’s still with us taking some level of action, right? I mean, when you talk about that, that’s still us having done something?

John: Yeah, if you add up all the plans the countries committed to, and they fully implement them, it’s not sufficient. We would get to 3 degrees. That’s just to say how much action is required; we really need to step up the effort dramatically. That’s basically what the 1.5 degrees IPCC report tells us. And if we were to get to 2 degrees, let’s not even talk about 3 degrees for the moment, what could happen is that we would reach this tipping point into what scientists are describing as a “Hothouse Earth.” What that means is that you get so much ice melting. Now, the ice and snow serve an important protective function: because they’re white, they reflect a lot of the heat back out. If all that melts and is replaced by much darker land mass or ocean, then that heat is going to be absorbed, not reflected. So that’s one positive feedback loop that constantly makes it even warmer, and that melts more ice, et cetera.

Another one is the permafrost, which, as its name suggests, is ground that stays frozen in the northern latitudes. The risk is that it starts to thaw. The problem is not the permafrost itself, it’s all the methane it contains, which is a very powerful greenhouse gas and would then get released. That leads to warmer temperatures, which thaws even more of the permafrost, et cetera.

That’s the whole idea of runaway: we completely lose control, all the natural cooling systems, the trees and so on, start to die back as well, and so we get four, five, six degrees… But as I mentioned earlier, 4 could be 7 in some parts of the world and 2 or 3 in others. It would make large parts of the world basically uninhabitable, if you take it to the extreme of where it could all go.

Ariel: Do we have ideas of how long that could take? Is that something that we think could happen in the next 100 years or is that something that would still take a couple hundred years?

John: Whenever we talk about the temperature increases, we’re looking at the end of the century, so that’s 2100, but that’s less than 100 years.

Ariel: Okay.

Alexander: The problem with looking at the end of the century is that it always comes back to “the end of the century.” It sounds so far away, but it’s just 82 years. If you flip back 82 years, you’re in 1936; my father was a boy of 10 years old, and that’s not that far away. My daughter might still live in 2100, and by that time she’ll have children and maybe grandchildren who have to live through the next century. It’s not that once we reach the year 2100 the problem suddenly stops. We’re talking about an accelerating problem. If you stay on the business-as-usual scenario and you mitigate hardly anything, then it’s 4 degrees at the end of the century, but the temperatures keep rising.

As we already said, 4 degrees at the end of the century is a kind of average. In the worst-case scenario it might just as well be 6. It could also be less. And in the Arctic it could be anywhere between, let’s say, 6 and maybe even 11. It’s precisely the Arctic where you have this methane John was just talking about, so we don’t want to end up with some kind of Venus. This is exactly the world we do not want. That is why it’s so extremely important to take measures now, because anything you do now is a fantastic investment in the future.

If you look at how we treat other risks: Dick Cheney said a couple of years ago that if there’s only a 1% chance that terrorists will get weapons of mass destruction, we should act as if they have them. Why don’t we do that in this case? If there’s only a 1% chance that we get complete destruction of the planet as we know it, we have to take urgent action. So why apply that logic to the one risk that, however bad terrorism is, hardly kills people if you look at the big numbers, while for a potential massive killer of millions of people we just say, “Yeah, well, you know, there’s only a 50% chance that we end up in this scenario or that scenario”?

What would you do if you were sitting in a plane and at takeoff the pilot said, “Hi guys. Happy to be on board. This is how you buckle and unbuckle your belt. And oh, by the way, we have a 50% chance that we’re going to make it today. Hooray, we’re going to take off.” Well, you would get out of the plane. But you can’t get out of this planet. So we have to take action urgently, and I think the report that came out is excellent.

The problem is, if everybody focuses only on that report, you get into this energetic mood like, “Hey, we can do it!” We only talk about corals, we only talk about this, because suddenly we’re not talking about the three or four or five degree scenarios, which is good for a change because it gives hope. In talks like this I always try to give as much hope as I can and show the possibilities, but we shouldn’t forget how serious the thing is that we’re actually talking about. So, now we can go back to the positive side.

Ariel: Well I am all for switching to the positive side. I find myself getting increasingly cynical about our odds of success, so let’s try to fix that in whatever time we have left.

John: Can I just add briefly, Alex, because I think that’s a great comment. It’s something that I’m also sometimes confronted with by fellow climate change folk: after they’ve heard me talk about what the solutions are, they come up to me and tell me, “Don’t make it sound too easy either.” I think it’s a question of balance. We will talk about the solutions, and we’ll hear about them, but do bear in mind just how much change is involved. It is really very significant change that we need to embark on to avoid going beyond 1.5 degrees.

Alexander: There are basically two choices. Either we massively change everything we are doing on this planet: the way we work together, the actions we take, the way we run our economy, and the way we behave towards each other, towards the planet, and towards everything that lives on it. Or we sit back and relax and just let the whole thing crash. The choice is easy to make, even if you don’t care at all about nature or the lives of other people. Even if you just look at your own interests, purely through an economic lens, it is simply a good return on investment to take good care of this planet.

It is only because those with so much political power are so closely connected to the big corporations that look for short-term profits, certainly not all of them, but the ones that are really influential, and I’m certainly thinking about the country of our host today. They have so much influence on the policies that are made, and their sole interest is the next quarterly financial report. That is not in the interest of the people of this planet.

Ariel: So this is actually a good transition to a couple of questions that I have. I actually did start looking at the book Drawdown, which talks about, what is it, 80 solutions? Is that what they discuss?

John: Yeah, 80 existing solutions, technologies, or practices, and then there are 20 of what they call coming attractions, which would be in addition to that. But it’s the 80 we’re talking about, yeah.

Ariel: Okay, so I started reading that and I read the introduction and the first chapter and felt very, very hopeful. I started reading about some of the technologies and I still felt hopeful. Then as I continued reading it and began to fully appreciate just how many technologies have to be implemented, I started to feel less hopeful. And so, going back, before we talk too much about the specific technologies, I think as someone who’s in the US, one of the questions that I have is even if our federal government isn’t going to take action, is it still possible for those of us who do believe that climate change is an issue to take enough action that we can counter that?

John: That’s an excellent question and a very apropos one as well. My take on this: I had the privilege of being at the Global Climate Action Summit in San Francisco. You’re living it, but I think it’s basically two worlds in the United States at the moment, at least two worlds. What really impressed me, however, was that you had people of all political persuasions, you had indigenous people, you had the head of the union, you had mayors and city leaders. You also had some country leaders there, particularly those who are going to be most impacted by climate change. What really excited me was the number of commitments that were coming at us throughout those days: one city that’s going to go completely renewable, and so on.

We had so many examples of those. In particular, if you’re talking about the US, California, which if it were its own country would be the fifth-largest economy I believe, is committed to achieving 100% renewable energy by 2050. There was also the mayor of Houston, for instance, who explained how quickly he wanted to achieve 100% renewables as well. That’s very exciting, and that movement I think is very important. It would of course be much, much better to have national leaders fully backing this too, but I think there’s a trickle-up aspect, and I don’t know if this is the right time to talk about the exponential growth that can happen. Maybe when we talk about the specific solutions we can talk about just how quickly they can go, particularly when you have a popular movement around saving the climate.

A couple of weeks ago I was in Geneva. There was a protest there. Geneva is actually quite a conservative city; you’ve got some wonderful chocolate, as you know, but also a lot of banks and so on. At the march there were, according to the organizers, 7,000 people. It was really impressive to see that in Geneva, which is not that big a city. The year before, at the same march, there were 500. So the numbers have grown more than tenfold, and I think there are a lot of communities and citizens being affected who are saying, “I don’t care what the federal government’s doing. I’m going to put a solar panel on my roof. I’m going to change my diet, because it’s cheaper, it saves me money, it’s much healthier, and it gives me much more resilience when a hurricane comes around, for instance.”

Ariel: I think now is a good time to start talking about what some of the solutions are. I wanna come back to the idea of trickle up, because I’m still gonna ask you guys more questions about individual action as well, but first let’s talk about some of the things that we can be doing now. What are some of the technological developments that exist today that have the most promise that we should be investing more in and using more?

John: What I perhaps wanted to do is just take a little step back, because the IPCC does talk about some very unpleasant things that could happen to our planet, but they also talk about the steps needed to stay within 1.5 degrees, and there are some other plans we can discuss that also achieve that. So what does the IPCC tell us? You mentioned it earlier. First of all, we need to cut carbon dioxide and greenhouse gas emissions in half every decade. That’s something called the Carbon Law. It’s very convenient, because you can define your objective and say, okay, every 10 years I need to cut emissions in half. That’s number one.

Number two is that we need to go dramatically to renewables. There’s no other way: because of the emissions that fossil fuels produce, they will no longer be an option. We have to go renewable as quickly as possible, and it can be done by 2050. There’s a professor at Stanford called Mark Jacobson who, with an international team, has mapped out the way to get to 100% renewables for 139 countries. It’s called The Solutions Project. Number three has to do with fossil fuels. What the IPCC says is that there should be practically no coal being used in 2050. That’s where there are some differences.

Basically, as I mentioned earlier, on the one hand you have your emissions, and on the other hand you have the capture, the sequestration of carbon by soils and by vegetation. The two sit on either side of the balance: one is putting CO2 into the air, and the other is taking it out. So obviously we need to favor the sequestration. It’s an area-under-the-curve problem. You have a certain budget that’s associated with a given temperature increase. If you emit more, you need to absorb more. There’s just no two ways about it.
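The “Carbon Law” halving and the budget framing John describes can be put into rough numbers with a minimal sketch. The 40 gigatonnes of CO2 per year used as a starting point below is an assumed round figure for current global emissions, not a number from the conversation:

```python
# Illustrative sketch of the "Carbon Law": halve emissions every decade and
# track the cumulative total, i.e. the area under the emissions curve.
# The 40 GtCO2/yr starting rate is an assumed round figure, not from the interview.

start_rate = 40.0   # GtCO2 per year (assumed)
decade_years = 10

rate = start_rate
cumulative = 0.0
for decade in range(1, 6):                  # five decades of halving
    emitted = rate * decade_years           # emissions during this decade
    cumulative += emitted
    print(f"Decade {decade}: {rate:5.1f} Gt/yr, "
          f"{emitted:5.0f} Gt this decade, {cumulative:5.0f} Gt cumulative")
    rate /= 2                               # the Carbon Law step

# Halving forever is a geometric series, so total emissions stay bounded:
print(f"Upper bound if halving continues indefinitely: {2 * start_rate * decade_years:.0f} Gt")
```

The only point of the sketch is that halving every decade keeps cumulative emissions, the budget John refers to, finite rather than ever-growing.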

The IPCC is actually quite conservative in that respect, because they’re saying there will still be coal around, whereas there are other plans, such as Drawdown and the Exponential Climate Action Roadmap, as well as The Solutions Project which I just mentioned, which get us to 100% renewables by 2050, and so to zero emissions for the sake of argument.

The other difference I would note with the IPCC comes from this tremendous problem of all the carbon dioxide we need to take out of the atmosphere, which is where the name Drawdown comes from: the term means drawing the carbon dioxide out of the atmosphere. There’s a technology around for this, basically called energy crops: you grow crops for energy and capture the carbon. That gives us a bit of an issue, because it encourages politicians to think there’s a magic wand that we’ll be able to use in the future to all of a sudden remove the carbon dioxide. I’m not saying we may not very well have to go there; what I am saying is that we can get there with, for instance, Drawdown’s 80 solutions.

Now in terms of the promise, the thing I think is important is that the thinking has to evolve from the magic bullet syndrome we all live with every day, always wanting to find that one magic solution that will solve everything, to thinking more holistically about the Earth’s planetary systems, how they interact, and how we can achieve solutions that way.

Alexander: Can I ask something, John? Would you say that Drawdown, with its 80 technologies, relies completely on proven technology, whereas in the recent 1.5 report I have the impression that for practically every solution they come up with, they rely on technologies that are still unproven, still on the drawing board or tested only at a very small scale? Is there a difference between those two approaches?

John: Not exactly. I think there’s actually a lot of overlap. A lot of the same solutions that are in Drawdown are in all climate plans, so we come back to the same set, which is actually very reassuring, because that’s the way science works: it empirically tests and models all the different solutions. What I always find very reassuring is that whenever I read different approaches, I look back at Drawdown and say, “Okay, yes, that’s in the 80 solutions.” So there is a lot of overlap; a lot of the IPCC solutions are Drawdown solutions. But the IPCC works a bit differently, because the scientists have to work with governments in coming up with proposals, so there is a process of negotiating how far they can take things, which scientists such as the Project Drawdown scientists are unfettered by.

They just go out and look for what’s best. They don’t care whether it’s politically sensitive or not; they will say what they need to say. But I think the big area of concern is this famous bio-energy with carbon capture and storage (BECCS): these energy crops that you grow and whose carbon dioxide you then capture, so you actually are capturing carbon dioxide. On the one hand there’s a moral hazard, because politicians will say, “Okay, I’m just going to wait until BECCS comes round and that will solve all our problems.” On the other hand it poses some serious questions about competition for land between growing crops for food and growing crops for energy.

Ariel: I actually want to follow up with Alexander’s question really quickly because I’ve gotten a similar impression that some of the stuff in the IPCC report is for technologies that are still in development. But my understanding is that the Drawdown solutions are in theory at least, if not in practice, ready to scale up.

John: They’re existing technologies, yeah.

Ariel: So when you say there’s a lot of overlap, is that me or us misunderstanding the IPCC report or are there solutions in the IPCC report that aren’t ready to be scaled up?

John: The approaches are a bit different. The approach Drawdown takes is bottom-up: they basically unleashed 65 scientists to go out and look for the best solutions, so they go out and look at all the literature. And it just so happens that nuclear energy is one of them. It doesn’t produce greenhouse gas emissions; it is a way of producing energy that doesn’t cause climate change. A lot of people don’t like that, of course, because of all the other problems we have with nuclear. But let me just reassure you very quickly that there are three scenarios for Drawdown. It goes from the so-called “Plausible” scenario, which I don’t like as a name because it suggests that the other ones might not be plausible, but it’s the most conservative one. Then the second one is “Drawdown.” Then the third one is “Optimum.”

Optimum doesn’t include the solutions that are called “regrets” solutions, such as nuclear. So when you go optimum, it’s basically 100% renewable; there’s no nuclear energy in the mix either. That’s very positive. But in terms of the solutions, what the IPCC looks at is the trajectory you could achieve given the existing technologies. So they talk about renewables, they talk about fossil fuels going down to net zero, they talk about natural climate solutions, but perhaps they don’t talk about, for instance, educating girls, which is one of the most important Drawdown solutions, because of the approach Drawdown takes where they look at everything. Sorry, that’s a bit of a long answer to your question.

Alexander: That’s actually part of the beauty of Drawdown, that they look so broadly, that educating girls is in there. A girl leaving school at 12 has on average something like five children, and a girl who is educated and leaves school at the age of 18 has on average about two children, and they will have a better quality of life. They will put much less pressure on the planet. So I like this more holistic approach of Drawdown very much, and I think it’s good to see so much overlap between Drawdown and the IPCC. But I was struck by how heavily the IPCC relies on still unproven technologies. I guess we have to bet on all our horses and treat this a bit as a kind of wartime economy. Look at the creativity and the innovation that we saw during the Second World War, in technology as well as in government by the way, or at the race to the moon and the amazing technology that was developed in such a short time.

Once you really dedicate all your knowledge and your creativity and your finances and your political will to solving this, we can solve it. That is what Drawdown is saying and that is also what the IPCC 1.5 report is saying. We can do it, but we need the political will and we need to mobilize the strengths that we have. Unfortunately, when I look around worldwide, the trend in many countries is exactly the opposite. I think Brazil might soon be the latest one that we should be worried about.

John: Yeah.

Ariel: So this is, I guess, where I’m most interested in what we can do, and also possibly where I’m most cynical, and it comes back to this trickle-up idea that you were talking about. That is, we don’t have the political will right now. So what do those of us who do have the will do? How do we make that transition from people caring to governments caring? Because, maybe this is me being optimistic, but I do think that if we can get enough people taking individual action, that will force governments to start taking action.

John: So trickle up, grassroots, I think we’re talking about the same sort of idea. I think it’s really important, and then we will get into the solutions, to talk about these not just as solutions to global warming but to a lot of other problems as well, such as air pollution, our health, and the pollution we see in the environment. And actually, Alexander, you were talking earlier about the huge transformation. But transformation does not necessarily always have to mean sacrifice. It doesn’t have to mean only giving things up, although some of that is certainly a good idea; I think you were going to ask a question about flying, and to fly less, there’s no doubt about that, or to perhaps not buy the 15th set of clothes, and so on and so forth.

So there certainly is an element of that, although the positive side of that is the circular economy. In fact, with these solutions it’s not a question of no growth or less growth, but of different growth. I think one mistake we have made in the climate change discussion is emphasizing the “don’t do this” too much. That’s also what’s really interesting about Drawdown: there are no real judgments in there. They’re basically saying, “These are the facts.” If you have a plant-based diet, you will have a huge impact on the climate versus eating steak every day, right? But it’s not making a judgment. Rather than “don’t eat meat,” it’s saying “eat plant-based foods.”

Ariel: So instead of saying don’t drive your car, try to make it a competition to see who can bike the furthest each week or bike the most miles?

John: For example, yeah. Or consider buying an electric car if you absolutely have to have a car. I mean in the US it’s more indispensable than in Europe.

Alexander: It means that in the US, when you build new cities, you should try to build them in a cleverer way than the US has been doing up until now, because if you’re in America and you want to buy, whatever, a new toothbrush, you have to get in your car to go get it. When I’m in Europe, I just walk out the door and within 100 meters I can buy a toothbrush somewhere. I walk or I go by bicycle.

John: That might be a longer-term solution.

Alexander: Well, actually, it’s not. In the next 30 years, the amount of investment that will be placed in new cities is something like 90 trillion dollars. The city patterns we have in Europe were developed in the Middle Ages in the city centers, so although it is urgent and we have to do a lot of things now, you should also think about the investments you make now being followed for hundreds of years. We shouldn’t keep repeating the mistakes of the past. These are the kinds of things we should also talk about. But to come back to your question on what we can do individually, there is so much you can do that helps the planet.

Of course, you’re only one out of seven billion people, although if you listen to this podcast it is likely that you are in the elite among those seven billion that consumes much more of the planet than the quota you should be allowed. But it means, for instance, changing your diet. If you go to a plant-based diet, the perks are not only that it is good for the planet, it is good for you as well. You have less chance of developing cancer or heart disease or all kinds of other things you don’t want to have. You will live longer, and you will have a healthier life for longer.

It also means that you discover all kinds of wonderful recipes you had never heard of before, when you were still eating steak every day, and it is a real contribution for the animals that are tortured daily on an unimaginable scale all over the world, locked up in small cages. You don’t see it when you buy meat at a butcher, but you are responsible, because they do it because you are the consumer. So stop doing that. Better for the planet, better for the animals, better for yourself. The same with using your bicycle and walking more. I still have a car. It is 21 years old, it’s the only car I ever bought in my life, and I use it at most 20 minutes per month. I’m not even buying an electric vehicle, because I still have the old one. There’s a lot you can do, and it has more advantages than just those for the planet.

John: Absolutely. Actually, walkable cities is one of the Drawdown solutions. Maybe I can just mention very quickly: out of the 80 solutions, there was a very interesting study showing that there are 30 of them that we could put into place today, and that they add up to about 40% of the greenhouse gases that we’ll be able to remove.

I’ll just list them quickly. The ones at the end are more relevant if you’re in an agricultural setting, which is probably not the case for many of your listeners. But: reduced food waste, plant-rich diets, clean cookstoves, composting, electric vehicles which we talked about, ride sharing, mass transit, telepresence (basically video conferencing, where a lot of progress is being made, which means we perhaps don’t need to take that airplane), hybrid cars, bicycle infrastructure, walkable cities, electric bicycles, rooftop solar, solar water (that’s heating your hot water using solar), methane digesters (more for an agricultural setting, where you use biomass to produce methane), LED lighting (which is a 90% gain compared to incandescent), household water saving, smart thermostats, household recycling and recyclable paper, and micro wind (some people are putting a little wind turbine on their roof).

Now these next ones have to do with agriculture, so they’re things like silvopasture, tropical staple trees, tree intercropping, regenerative agriculture, farmland restoration, managed grazing, farmland irrigation, and so on. If you add all those up, it’s already 37% of the solution. I suspect that the 20 is probably a good 20%. Those are things you can do tomorrow, or even today.

Ariel: Those are helpful, and we can find them all at drawdown.org, which also lists all 80. So, you’ve brought this up a couple of times, so let’s talk about flying. This was one of those things that really hit home for me. I’ve done the carbon footprint thing, and I have an excellent carbon footprint right up until I fly, and then it just explodes. As soon as I start adding the footprint from my flights, it’s just awful. I found it frustrating that, one, so many scientists especially have … I mean, it’s not even that they’re flying, it’s that they have to fly if they want to develop their careers. They have to go to conferences. They have to go speak places. I don’t even know where the responsibility should lie, but it seems like maybe we need to try to cut back on all of this in some way, that people need to be trying to do more. I’m curious what you guys think about that.

Alexander: Well, start by paying tax, for instance. Why is it, well, I know why it is, but it’s absurd that when you fly in an airplane you don’t pay tax. You can fly all across Europe for like 50 euros or 50 dollars. That is crazy. If you did the same by car, you would pay tax on the petrol you buy; with flying, worse still, you are not charged for the pollution you cause. We know that airplanes are heavily polluting. It’s not only the CO2 they produce, but where and how they produce it; the effect is something like three to four times stronger than the same CO2 produced driving your car. So we know how bad it is; then make people pay for it. Just make flying more expensive. Pay for the carbon you produce. When I produce waste at home, I pay my municipality because they pick it up and have to take care of my garbage, but if I put garbage in the atmosphere, somehow I don’t pay. Actually, in all sorts of strange ways it’s subsidized, because you don’t pay tax on it; worldwide there are something like five or six times as many subsidies for fossil fuels as there are for renewables.

We completely have to change the system. Give people a budget, maybe; I don’t know, there could be many solutions. You could say that everybody has the right to a certain budget for flying, or for carbon, and you can maybe trade or swap it. Some NGOs already do this, I think the World Wildlife Fund, but correct me if I’m wrong: all the people working there get not only a budget for their projects, they also get a carbon budget. You then have to choose: am I going to this conference or that conference, or should I take the train, and you keep track of what you are doing. That’s something we should maybe roll out on a much bigger scale, and make flying more expensive.

John: Yeah, the whole idea of a carbon tax I think is key; that’s really important. Some other thoughts: definitely reduce; think about whether you really, absolutely need to make that trip. Now with webcasting and video conferencing, we can do a lot more without flying. The other thing I suggest is that when at some point you absolutely do have to travel, try to combine the trip with as many other things as possible, perhaps things that are not directly professional. If you are already in the climate change field, then at least you’re traveling for a reason. Then it’s a question of the offsets: using calculators you can see what the emissions were and pay for what’s called an offset. That’s another option as well.

Ariel: I’ve heard mixed things about offsets. In some cases I hear that yes, you should absolutely buy them; if you fly, you should get them. But in a lot of cases they’re a band-aid, or they might make it seem like it’s okay to do this when it’s still not the solution. I’m curious what your thoughts on that are.

John: For me, something like an offset should as much as possible be a last resort: you absolutely have to make the trip, it’s really important, so you offset it. You pay for some trees to be planted in the rainforest, for instance; there are loads of different possibilities to do so. But relying on offsets is not a good idea. Unfortunately, Switzerland’s plan, for instance, includes a lot of getting others to reduce emissions. You can argue that it’s cheaper to do it that way, that somebody else might do the reduction more cheaply for you, so to speak: it’s cheaper to plant a tree, and it will have more impact, in the rainforest than in Switzerland. But it’s something I think we really have to avoid, also because in the end the green economy is where the future lies and what we need to transform into. If we’re constantly getting others to do the decarbonization for us, then we’ll be stuck with an industry that will ultimately become very expensive. That’s not a good idea either.

Alexander: I also think the prices are absolutely unrealistic. If you fly, let’s say, from London to New York, your personal share, just the fact that you were in the plane, not all the other people, is responsible for about three square meters of the Arctic melting. You can offset that by paying something like, what is it, 15 or 20 dollars for that flight. That makes ice in the Arctic extremely cheap: a square meter would be worth something like seven dollars. Well, I personally believe it’s worth much more.

Then the thing is, they’re going to plant a tree that takes a long time to grow. By the time it’s big and pulling CO2 out of the air, if they cut it down and make newspapers out of it, which you then burn in a fireplace, the carbon is right back where it was. So you really need to think carefully about what you’re doing. To me it feels a bit like going to a priest and saying, “I have flown. I have sinned, but I can now do a few prayers and pay these $20 and it’s fine. I can book my next flight.” That is not the way it should be. Make people pay up front, in the ticket price, for the pollution and the harm they are causing to this planet and to their fellow citizens on this planet.
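Alexander’s back-of-the-envelope numbers are easy to reproduce. A minimal sketch, taking the figures he quotes above (roughly three square meters of Arctic ice per transatlantic passenger and a 15 to 20 dollar offset) as given rather than as verified values:

```python
# Implied "price" of Arctic sea ice from the figures quoted above:
# one transatlantic seat is said to melt about 3 m^2 of ice,
# and a typical offset for that flight costs about $15-20.

ice_melted_m2 = 3.0                   # m^2 per passenger (quoted above)
offset_costs = (15.0, 20.0)           # dollars (quoted above)

for cost in offset_costs:
    price_per_m2 = cost / ice_melted_m2
    print(f"A ${cost:.0f} offset values Arctic ice at about ${price_per_m2:.2f} per m^2")
# -> roughly $5-7 per square meter, which is the point being made:
#    the offset price treats the ice as extremely cheap.
```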

John: Couldn’t agree more. But there are offset providers in the US; look them up, see which one you like best, and perhaps buy more offsets. Economy is half the carbon of business class, I hate to say.

Alexander: On something you mentioned there: I decided long ago, six or seven years ago, that I would never ever fly business again. Even as somebody who has had a thrombosis, and whose doctors advised him to fly business, I don’t. I still fly. I’m very much like Ariel in that my footprint is fine until the moment I start adding flying, because I do that a lot for my job. For example, in the next few weeks I have a meeting in the Netherlands, and only 20 days later a meeting in England. So I stay in the Netherlands, and in between I do all my travel to Belgium and France and the UK by train. The only flight is going back from London to Stockholm, because I couldn’t find any reasonable way to do that otherwise. I wonder why we don’t have high-speed train connections all the way up to Stockholm.

Ariel: We talked a lot about taxing carbon. I had an interesting experience last week. I’m doing what I can to try not to drive if I’m in town; I’m trying to either bike or take the bus. What often happens is that works great until I’m running late for something, and then I just drive because it’s easier. But the other week I was giving a little talk on the campus at CU Boulder, and the parking at CU Boulder is just awful. There is absolutely no way, no matter how late I’m running, that it’s more convenient for me to take my car. It never even once dawned on me to take the car. I took a bus. It’s that much easier. I thought that was really interesting, because I don’t care how expensive you make gas or parking, if I’m running late I’m probably going to pay for it. Whereas if you make it so inconvenient that it just makes me later, I won’t do it. I was wondering, how can we do more things like that, where there’s also this inconvenience factor?

Alexander: Have a look at Europe. Coincidentally, I know CU Boulder and I know how difficult the parking is; that’s part of the brilliance of Boulder, where I see a lot of brilliant things, and it’s what we do in Europe. One of the reasons I never use a car in Stockholm is that I have no clue how or where to park it, nor can I read the signs, because my Swedish is so bad, and I’m afraid of a ticket. I never use the car here, also because we have such good public transport. The latest thing they have here is the VOI, which just came out last month. I don’t know the word in English; we call it a “step” in Dutch, something like a kick scooter, one of these two-wheeled things that kids normally have. You know?

Here they’re electric, so you download an app on your mobile phone, you see one of them in the street because they’re everywhere now, you type in a code, and it unlocks. Then it starts counting your time, and for every minute you pay something like 15 cents. All these little electric scooters are out on the streets; you just ride them all around town and drop them wherever you like. When you need one, you look at the app and it shows you where the nearest one is. It’s an amazing new way of getting around, and a month ago you saw just one or two; now they’re everywhere on the streets. It’s very popular, it runs on electricity, and it makes everywhere in the city so much easier to reach, because you go at least twice as fast as walking.

John: There was a really interesting article in The Economist about parking. Do you know how many parking spots The Shard, the brand-new skyscraper in London, has? Eight. The point being made, in terms of what you were just asking about inconvenience, is that in Europe, in most cases, it really doesn’t make any sense at all to take a car into the city. It’s a nightmare.

Before we talk more about personal solutions, I did want to make some points about the economics of all these solutions, because what’s also really interesting about Drawdown is that they looked at both what it would cost to put the solutions in place over 30 years and what you would save over that period. They came up with figures that at first sight are really quite surprising: you would save 74.4 trillion dollars for an investment, a net cost, of 29.6 trillion.

Now, that’s not for all of the solutions, so it’s not exactly that; for some of the solutions it’s very difficult to estimate, for instance the value of educating girls, which is inestimable. But the point is also made, if you look at The Solutions Project, Professor Jacobson’s work, that they looked at other savings that I think are much more interesting and much more important as well. You would basically see a net increase of over 24 million long-term jobs, and an annual decrease of four to seven million air pollution deaths per year.

You would also see the stabilization of energy prices, think of how the price of oil jumps from one day to the next, and annual savings of over 20 trillion in health and climate costs. Which comes back to this: when you put those solutions in place, you are saving money, but more importantly you are also saving people’s lives, the tragedy of the commons, right? So I think it’s really important to think about those solutions. We know very well why we are still using fossil fuels: it’s because of the massive subsidies and support that they get, and the fact that vested interests are going to defend their interests.
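As a rough check on the headline Drawdown figures John quotes, the arithmetic is simple; the only inputs below are the 29.6 trillion dollar net cost and the 74.4 trillion dollars of savings mentioned above:

```python
# Benefit/cost arithmetic for the Drawdown headline figures quoted above.
net_cost_trillion = 29.6    # net implementation cost over ~30 years (quoted above)
savings_trillion = 74.4     # savings over the same period (quoted above)

net_benefit = savings_trillion - net_cost_trillion
ratio = savings_trillion / net_cost_trillion

print(f"Net benefit: ${net_benefit:.1f} trillion")          # 44.8 trillion
print(f"Savings per dollar invested: about ${ratio:.2f}")   # roughly 2.5 dollars back per dollar
```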

I think that’s really important to bear in mind about those solutions: they are becoming more and more feasible. Which leads me to the other point I’m always asked about: it’s not going fast enough, we’re not seeing enough renewables, why is that? Well, even though we don’t tax fuel, as you mentioned, Alexander, the fact that we’ve now produced so many solar panels means the cost has come way down, and it’ll get cheaper and cheaper. That’s linked to this whole idea of exponential growth and tipping points, where all of a sudden all of us have a solar panel on our roof, and more and more of us become vegetarians.

I’ll just tell you a quick anecdote on that. We had some out-of-town guests who absolutely wanted to go to what is actually a very good steakhouse in Geneva. We didn’t want to offend them and say, “No, no, no, we’re certainly not going to a steakhouse,” so we went along. It was a group of seven of us. Imagine the surprise when they came to take our orders and three out of the seven of us said, “I’m afraid we’re vegetarians.” It was a bit of a shock. I think those types of things start to make others think as well, “Oh, why are you vegetarian,” and so on and so forth.

That sort of reflection means that certain business models are gonna go out of business, perhaps much faster than we think. On the more positive side, there are gonna be many more vegetarian restaurants, you can be sure, in the future.

Ariel: I want to ask about what we’re all doing individually to address climate change. But Alexander, one of the things that you’ve done that’s probably not what just a normal person would do, is start the Planetary Security Initiative. So before we get into what individuals can do, I was hoping you could talk a little bit about what that is.

Alexander: That was not so much as an individual. I was at Yale University for half a year when I started this, and then when I came back to the Ministry of Foreign Affairs for one more year, I had some ideas and I got support from the ministers to bring together the experts in the world who work on the impact that climate change will have on security. The idea at the start was to create an annual meeting where all these experts come together, because that didn’t exist yet, and to encourage more scientists and researchers to study how this relationship works. But more importantly, the idea was also to connect the knowledge and insights of these experts on how the changing climate, its impacts on water and food, and our changing planetary conditions are affecting geopolitics.

I have a background both in security and in environment. Those used to be two completely different tracks that weren’t really interacting, but the more I worked on the two, the more I saw that the changing environment is directly impacting our security situation. It’s already happening, and you can be pretty sure the impact is going to be much greater in the future. So what we then started was a meeting in the Peace Palace in The Hague. There were some 75 countries present the first time, along with the key experts in the world, and it’s now an annual meeting. For anybody who’s interested, contact me and I will put you in touch with the right people. It is now growing into all kinds of other initiatives, other involvement, and more studies that are taking place.

So the issue is really taking off, and that is mainly because more and more people see the need for better insight into the impact that all of these changes we’ve been discussing will have on security, whether that’s the human security of individuals or geopolitical security. Imagine that when so much is changing, when economies are changing so rapidly, when people’s interests change and when people start going on the move, tensions will rise for a number of reasons, partly related to climate change. But it’s very much a situation where climate change, added to an already fragile situation, makes it worse. So that is the Planetary Security Initiative. The government of the Netherlands has been very strong on this, working closely together with some other governments. Sweden, for instance, where I’m living, has in the past year been focusing very much on strengthening the United Nations, so that you would have experts at the relevant high level in New York who can connect the dots, the people, and the issues: not just to raise awareness for the issue, but to make sure that these issues are also taken into account in the policies that are made, because you’d better do it up front rather than repair the damage afterwards if you haven’t taken care of these issues.

It’s a rapidly developing field. There is something new, for instance, in using AI and data. I think the World Resources Institute in Washington is very good at that: they combine, let’s say, the geophysical data, satellite and other data on increasing drought in the world, but also deforestation and other resource issues. They are now connecting that with the geopolitical impacts, using AI and combining all these completely different databases. You get much better insight into where the risks really are, and I believe that in the years to come, WRI, in combination with several other think tanks, can do brilliant work producing the kind of insights the world is really waiting for. International policies will be so much more effective if you know much better where the problems are really going to hit first.

Ariel: Thank you. All right, so we are starting to get a little bit short on time, and I want to finish the discussion with things that we’ve personally been doing. I’m gonna include myself in this one because I think the more examples the better. So, what we’ve personally been doing to change our lifestyles for the better, not as sacrifice, but for the better, to address climate change. And also, to keep us all human, where we’re failing and wish we were doing better.

I can go ahead and start. I am trying to not use my car in town. I’m trying to stick to biking or taking public transportation. I have dropped the temperature in our house by another degree, so I’m wearing more sweaters. I’m going to try to be stricter about flying: I’ll only fly if I feel that I will actually be having a good impact on the world, or for a family emergency, things like that.

I’m pretty sure our house is on wind power. I work remotely, so I work from home; I don’t have to travel for work. I think those are some of the big things, and as I said, flying is still a problem for me, so that’s something I’m working on. Food is also an issue for me. I have lots of food issues, so cutting out meat isn’t something that I can do. But I am trying to buy most of my food, and especially my meat, from local farms where they’re taking better care of the animals as well. So hopefully that helps a little bit. I’m also just trying to cut back on my consumption in general. I’m trying to not buy as many things, and if I do buy things I’m trying to get them from companies that are more environmentally conscious. So I think food and flying are sort of where I’m failing a little bit, but I think that’s everything on my end.

Alexander: I think one of the big changes I made is that I became vegetarian years ago, for a number of good reasons. I am now practically vegan. Sometimes when I travel it’s a bit too difficult. I hardly ever use the car. I guess it’s just five or six times a year that I actually use my car; I use bicycles and public transport. The electricity at our home is all wind power. In the Netherlands, that’s relatively easy to arrange nowadays. There are a lot of offers for it, so I deliberately buy wind power, including back when wind power was still more expensive than other power. I also think about consumption: when I buy food, I try to buy more local food. There’s the occasional kiwi, which I always wonder how it arrives in Europe, but that’s another thing you can think about. Apart from flying, I really do my best with my footprint. Flying is the difficult thing, because with my work, I need to fly. It is about personal contacts. It is about meeting a lot of people. It’s about teaching.

I do teaching online. I use Skype for teaching to classrooms. I do many Skype conferences all the time, but yes, I’m still flying. I refuse to fly business class. I started that some six, seven years ago. Just today a business class ticket was offered to me for a very long flight and I refused it. I said I will fly economy. But yes, the flying is what adds to my footprint. Still, I try to combine trips. I try to stay longer at a certain place, combining things, and then go by train to all kinds of other places. But when you’re based here in Stockholm, it’s quite difficult to get anywhere by means other than flying. Once I’m, let’s say, in the Netherlands or Brussels or Paris or London or Geneva, you can do all those things by train, but it gets a bit more difficult out here.

John: Pretty much the same as in Alexander’s case, except that I’m very local. I actually travel very little and I keep the travel down. If I do have to travel, I have managed to do seven-hour trips by train. That’s a possibility in Europe, but that sort of gets you to the middle of Germany. Then the other thing is I’ve become vegetarian recently. I’m pretty close to vegan, although it’s difficult with such good cheese as we have in this country. But the way it came about is interesting as well. It’s not just me. It’s myself, my wife, my daughter, and my son. The third child is never gonna become vegetarian, I don’t think. But that’s not bad, four out of five.

In terms of what I think you can do, and this also points to things that we perhaps don’t think about contributing: being a voice vis-à-vis others in our own communities and explaining why you do what you do in terms of biking and so on and so forth. I think that really encourages others to do the same. It can grow a lot like that. In that vein, I teach as much as I can to high school students. I talk to them about Drawdown. I talk to them about solutions and so on. They get it. They are very, very switched on about this. I really enjoy that. You really see, it’s their future, it’s their generation. They don’t have very much choice, unfortunately. On a more positive note, I think they can really take it away in terms of a lot of actions which we haven’t done enough of.

Ariel: Well, I wanted to mention this stuff because, going back to your idea of this trickle up, I’m still hopeful that if people take action, that will start to force governments to act. One final question on that note: did you guys find yourselves struggling with any of these changes, or did you find them pretty easy to make?

Alexander: I think all of them were easy. Switching your energy to wind power, et cetera. Buying more consciously. It comes naturally. I was already vegetarian, and then for moving to vegan, you just go online and read about it and how to do it. I remember when I was a kid that hardly anybody was vegetarian. Then I once discussed it with my mother and she said, “Oh, it’s really difficult because then you need to totally balance your food and be in touch with your doctor,” whatever. I’ve never spoken to any doctor. I just stopped eating meat and now I … Years ago I swore off all dairy. I’ve never been ill. I don’t feel ill. Actually, I feel better. It is not complicated. The rather complicated thing is flying. Sometimes I have to make difficult choices, like being away from home for a long time; I’ve saved quite a bit on that part. That’s sometimes more complicated, or, like soon, I’ll be on a nearly eight-hour train ride for something I could have flown in an hour.

John: I totally agree. I mean, I enjoy being on a train, being able to work and not be worried about some truck running into you or the other foibles of driving, which I find very, very … I’ve gotten to a point where I’m actually becoming quite a bad driver. I drive so little that, I hope not, but I might have an accident.

Ariel: Well, fingers crossed that doesn’t happen. And good, that’s been my experience so far too. The changes that I’ve been trying to make haven’t been difficult. I hope that’s an important point for people to realize. Anything else you want to add, either of you?

Alexander: I think there’s just one thing that we didn’t touch on regarding what you can do individually. That’s perhaps the most important one for us in democratic countries, and that is: vote. Vote for the best party that actually takes care of our long-term future, a party that aims to take the right climate change measures rapidly, a party that wants to invest in a new economy and sees that if you invest now, you can be a leader later.

In some countries, you have a lot of parties and there are all kinds of nuances. In other countries you have to deal with basically two parties, where one party is absolutely denying science, doing exactly the wrong things, and basically aiming to ruin the planet as soon as possible, whereas the other party is actually looking for solutions. Well, if you live in a country like that, and there coincidentally are elections coming up soon, vote for the party that takes the best positions on this, because it is about the future of your children. It is the single most influential thing that you can do, certainly if you live in a country whose emissions are still among the highest in the world. Vote. Take people with you to do it.

Ariel: Yeah, so to be more specific about that, as I mentioned at the start of this podcast, it’s coming out on Halloween, which means that in the US, elections are next week. Please vote.

John: Yeah. Perhaps something else is how you invest, where your money is going. That’s one that can have a lot of impact as well. All I can say is, I hate to come back to Drawdown, but go through Drawdown and think about your investments and say, okay, renewables, whether it’s LEDs or whatever technology it is: if it’s in Drawdown, make sure it’s in your investment portfolio. If it’s not, you might want to get out of it, particularly the ones that we already know are causing the problem in the first place.

Ariel: That’s actually, that’s a good reminder. That’s something that has been on my list of things to do. I know I’m guilty of not investing in the proper companies at the moment. That’s something I’ve been wanting to fix.

Alexander: And tell your pension funds: divest from fossil fuels and invest in renewables and all kinds of good things that we need in the new economy.

John: But not necessarily because you’re doing it as a charitable cause; really because these are the businesses of the future. We talked earlier about the growth that these different businesses can see. Another factor that’s really important is efficiency. For instance, I’m sure you have heard of The Impossible Burger. It’s a plant-based burger. Now, what do you think is the difference in terms of the amount of cropland required to produce a beef burger versus an Impossible Burger?

Alexander: I would say one in 25 or one in 35, in that range.

John: Yeah, so it’s one in 20. The thing is that when you look at that type of gain in efficiency, it’s just a question of time. A cow simply can’t compete. You have to cut down the trees to grow the animal feed that you ship to the cow, that the cow then eats. Then you have to wait a number of years, and that’s that factor-of-20 difference in efficiency. Now, our capitalist economic system doesn’t like inefficient systems. You can try to make that cow as efficient as possible, but you’re never going to be able to compete with a plant-based burger. Anybody who thinks that that plant-based burger isn’t going to displace the meat burger should really think again.

Ariel: All right, I think we’re ending on a nice hopeful note. So I want to thank you both for coming on today and talking about all of these issues.

Alexander: Thanks Ariel. It was nice to talk.

John: Thank you very much.

Ariel: If you enjoyed this podcast, please take a moment to like it and share it, and maybe even leave a positive review. And of course, if you haven’t already, please follow us. You can find the FLI podcast on iTunes, Google Play, SoundCloud, and Stitcher.

[end of recorded material]

AI Alignment Podcast: On Becoming a Moral Realist with Peter Singer

Are there such things as moral facts? If so, how might we be able to access them? Peter Singer started his career as a preference utilitarian and a moral anti-realist, and then over time became a hedonic utilitarian and a moral realist. How does such a transition occur, and which positions are more defensible? How might objectivism in ethics affect AI alignment? What does this all mean for the future of AI?

On Becoming a Moral Realist with Peter Singer is the sixth podcast in the AI Alignment series, hosted by Lucas Perry. For those of you that are new, this series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with Peter Singer. Peter is a world-renowned moral philosopher known for his work on animal ethics, utilitarianism, global poverty, and altruism. He’s a leading bioethicist, the founder of The Life You Can Save, and currently holds positions at both Princeton University and The University of Melbourne.

Topics discussed in this episode include:

  • Peter’s transition from moral anti-realism to moral realism
  • Why emotivism ultimately fails
  • Parallels between mathematical/logical truth and moral truth
  • Reason’s role in accessing logical spaces, and its limits
  • Why Peter moved from preference utilitarianism to hedonic utilitarianism
  • How objectivity in ethics might affect AI alignment
In this interview we discuss ideas contained in the work of Peter Singer. You can learn more about Peter’s work here and find many of the ideas discussed on this podcast in his work The Point of View of the Universe: Sidgwick and Contemporary Ethics. You can listen to the podcast above or read the transcript below.

Lucas: Hey, everyone, welcome back to the AI Alignment Podcast series. I’m Lucas Perry, and today, we will be speaking with Peter Singer about his transition from being a moral anti-realist to a moral realist. In terms of AI safety and alignment, this episode primarily focuses on issues in moral philosophy.

In general, I have found the space of moral philosophy to be rather neglected in discussions of AI alignment where persons are usually only talking about strategy and technical alignment. If it is unclear at this point, moral philosophy and issues in ethics make up a substantial part of the AI alignment problem and have implications in both strategy and technical thinking.

In terms of technical AI alignment, it has implications for preference aggregation and its methodology, and for inverse reinforcement learning and preference learning techniques in general. It affects how we ought to proceed with inter-theoretic comparisons of value, with idealizing persons or agents in general and what it means to become realized, how we deal with moral uncertainty, and how robust preference learning versus moral reasoning systems should be in AI systems. It has very obvious implications in determining the sort of society we are hoping for right before, during, and right after the creation of AGI.

In terms of strategy, strategy has to be directed at some end and all strategies smuggle in some sort of values or ethics, and it’s just good here to be mindful of what those exactly are.

And with regards to coordination, we need to be clear, on a descriptive account, of different cultures or groups’ values or meta-ethics and understand how to move from the state of all current preferences and ethics onwards given our current meta-ethical views and credences. All in all, this barely scratches the surface, but it’s just a point to illustrate the interdependence going on here.

Hopefully this episode does a little to nudge your moral intuitions around a little bit and impacts how you think about the AI alignment problem. In coming episodes, I’m hoping to pivot into more strategy and technical interviews, so if you have any requests, ideas, or persons you would like to see interviewed, feel free to reach out to me at lucas@futureoflife.org. As usual, if you find this podcast interesting or useful, it’s really a big help if you can help share it on social media or follow us on your preferred listening platform.

As many of you will already know, Peter is a world-renowned moral philosopher known for his work on animal ethics, utilitarianism, global poverty, and altruism. He’s a leading bioethicist, the founder of The Life You Can Save, and currently holds positions at both Princeton University and The University of Melbourne. And so, without further ado, I give you Peter Singer.

Thanks so much for coming on the podcast, Peter. It’s really wonderful to have you here.

Peter: Oh, it’s good to be with you.

Lucas: So just to jump right into this, it would be great if you could just take us through the evolution of your metaethics throughout your career. As I understand, you began giving most of your credence to being an anti-realist and a preference utilitarian, but then over time, it appears that you’ve developed into a hedonic utilitarian and a moral realist. Take us through the evolution of these views and how you developed and arrived at your new ones.

Peter: Okay, well, when I started studying philosophy, which was in the 1960s, I think the dominant view, at least among people who were not religious and didn’t believe that morals were somehow an objective truth handed down by God, was what was then referred to as an emotivist view, that is the idea that moral judgments express our attitudes, particularly, obviously from the name, emotional attitudes, that they’re not statements of fact, they don’t purport to describe anything. Rather, they express attitudes that we have and they encourage others to share those attitudes.

So that was probably the first view that I held, siding with people who were non-religious. It seemed like a fairly obvious option. Then I went to Oxford and I studied with R.M. Hare, who was a professor of moral philosophy at Oxford at the time and a well-known figure in the field. His view was also in this general ballpark of non-objectivist or, as we would now say, non-realist theories; non-cognitivist was another term used for them. They didn’t purport to be about knowledge.

But his view was that when we make a moral judgment, we are prescribing something. So his idea was that moral judgments fall into the general family of imperative judgments. So if I tell you shut the door, that’s an imperative. It doesn’t say anything that’s true or false. And moral judgments were a particular kind of imperative according to Hare, but they had this feature that they had to be uni