
FLI Podcast: The Psychology of Existential Risk and Effective Altruism with Stefan Schubert

We could all be more altruistic and effective in our service of others, but what exactly is it that’s stopping us? What are the biases and cognitive failures that prevent us from properly acting in service of existential risks, statistically large numbers of people, and long-term future considerations? How can we become more effective altruists? Stefan Schubert, a researcher at the University of Oxford’s Social Behaviour and Ethics Lab, explores questions like these at the intersection of moral psychology and philosophy. This conversation explores the steps that researchers like Stefan are taking to better understand psychology in service of doing the most good we can.

Topics discussed include:

  • The psychology of existential risk, longtermism, effective altruism, and speciesism
  • Stefan’s study “The Psychology of Existential Risks: Moral Judgements about Human Extinction”
  • Various works and studies Stefan Schubert has co-authored in these spaces
  • How this enables us to be more altruistic

Timestamps:

0:00 Intro

2:31 Stefan’s academic and intellectual journey

5:20 How large is this field?

7:49 Why study the psychology of X-risk and EA?

16:54 What does a better understanding of psychology here enable?

21:10 What are the cognitive limitations psychology helps to elucidate?

23:12 Stefan’s study “The Psychology of Existential Risks: Moral Judgements about Human Extinction”

34:45 Messaging on existential risk

37:30 Further areas of study

43:29 Speciesism

49:18 Further studies and work by Stefan

Works Cited 

Understanding cause-neutrality

Considering Considerateness: Why communities of do-gooders should be exceptionally considerate

On Caring by Nate Soares

Against Empathy: The Case for Rational Compassion

Eliezer Yudkowsky’s Sequences

Whether and Where to Give

A Person-Centered Approach to Moral Judgment

Moral Aspirations and Psychological Limitations

Robin Hanson on Near and Far Mode 

Construal-Level Theory of Psychological Distance

The Puzzle of Ineffective Giving (Under Review) 

Impediments to Effective Altruism

The Many Obstacles to Effective Giving (Under Review) 


You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

Lucas Perry: Hello everyone and welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today, we’re speaking with Stefan Schubert about the psychology of existential risk, longtermism, and effective altruism more broadly. This episode focuses on Stefan’s reasons for exploring psychology in this space, how large this space of study currently is, the usefulness of studying psychology as it pertains to these areas, the central questions which motivate his research, a recent publication that he co-authored which motivated this interview, called “The Psychology of Existential Risks: Moral Judgements about Human Extinction,” as well as other related work of his.

This podcast often ranks in the top 100 of technology podcasts on Apple Music. This is a big help for increasing our audience and informing the public about existential and technological risks, as well as what we can do about them. So, if this podcast is valuable to you, consider sharing it with friends and leaving us a good review. It really helps. 

Stefan Schubert is a researcher at the Social Behaviour and Ethics Lab at the University of Oxford, working in the intersection of moral psychology and philosophy. He focuses on psychological questions of relevance to effective altruism, such as why our altruistic actions are often ineffective, and why we don’t invest more in safe-guarding our common future. He was previously a researcher at the Centre for Effective Altruism and a postdoc in philosophy at the London School of Economics.

We can all be more altruistic and effective in our service of others. Expanding our moral circles of compassion farther into space and deeper into time, as well as across species, and possibly even eventually to machines, while mitigating our own tendencies towards selfishness and myopia is no easy task and requires deep self-knowledge and far more advanced psychology than I believe we have today. 

This conversation explores the first steps that researchers like Stefan are taking to better understand this space in service of doing the most good we can. 

So, here is my conversation with Stefan Schubert.

Can you take us through your intellectual and academic journey in the space of EA and longtermism and in general, and how that brought you to what you’re working on now?

Stefan Schubert: I studied a range of different subjects. I guess I had a bit of a hard time deciding what I wanted to do. So I got a masters in political science. But then in the end, I ended up doing a PhD in philosophy at Lund University in Sweden, specifically in epistemology, the theory of knowledge. And then I went to London School of Economics to do a postdoc. And during that time, I discovered effective altruism and I got more and more involved with that.

So then I applied to the Centre for Effective Altruism, here in Oxford, to work as a researcher. And I worked there as a researcher for two years. At first, I did policy work, including reports on catastrophic risk and x-risk for a foundation and for a government. But then I also did some work of a more general, foundational, or theoretical nature, including work on the notion of cause neutrality and how we should understand that. And also on how EAs should think about everyday norms like norms of friendliness and honesty.

And I guess that even though I didn’t do psychological empirical research at the time, that sort of relates to my current work on psychology, because for the last two years I’ve worked on the psychology of effective altruism at the Social Behaviour and Ethics Lab here at Oxford. This lab is headed by Nadira Faber and I also work closely with Lucius Caviola, who did his PhD here at Oxford and recently moved to Harvard to do a postdoc.

So we have three strands of research. The first one is sort of the psychology of effective altruism in general. So why is it that people aren’t effectively altruistic? This is a bit of a puzzle because generally people are at least somewhat effective when they are working in their own interest. To be sure, they are not maximally effective, but when they try to buy a home or save for retirement, they do some research and sort of try to find good value for money.

But they don’t seem to do the same when they donate to charity. They aren’t as concerned with effectiveness. So this is a bit of a puzzle. And then there are two strands of research, which have to do with specific EA causes. So one is the psychology of longtermism and existential risk, and the other is the psychology of speciesism, human-animal relations. So out of these three strands of research, I focused the most on the psychology of effective altruism in general and the psychology of longtermism and existential risk.

Lucas Perry: How large is the body of work regarding the psychology of existential risk and effective altruism in general? How many people are working on this? Can you give us more insight into the state of the field and the amount of interest there?

Stefan Schubert: It’s somewhat difficult to answer because it sort of depends on how you define these domains. There’s research, which is of some relevance to ineffective altruism, but it’s not exactly on that. But I will say that there may be around 10 researchers or so who are sort of EAs and work on these topics for EA reasons. So you definitely want to count them. And then when we’re thinking about non-EA researchers, like other academics, there hasn’t been that much research I would say on the psychology of X-risk and longtermism.

There’s research on the psychology of climate change, that’s a fairly large topic. But more specifically on X-risk and longtermism, there’s less. Effective altruism in general: that’s a fairly large topic. There’s lots of research on biases like the identifiable victim effect: people’s tendency to donate to identifiable victims over larger numbers of unidentifiable, statistical victims. Maybe on the order of a few hundred papers.

And then the last topic, speciesism, or human-animal relations: that’s fairly large. I know less of that literature, but my impression is that it’s fairly large.

Lucas Perry: Going back into the 20th century, much of what philosophers like Peter Singer have done is construct thought experiments that isolate the morally relevant aspects of a situation, which is intended in the end to subvert psychological issues and biases in people.

So I guess I’m just reflecting here on how philosophical thought experiments are sort of the beginnings of elucidating a project of the psychology of EA or existential risk or whatever else.

Stefan Schubert: The vast majority of these papers are not directly inspired by philosophical thought experiments. It’s more like psychologists who run some experiments because there’s some theory that some other psychologist has devised. Most don’t look that much at philosophy I would say. But I think effective altruism and the fact that people are ineffectively altruistic, that’s fairly theoretically interesting for psychologists, and also for economists.

Lucas Perry: So why study psychological questions as they relate to effective altruism, and as they pertain to longtermism and longterm future considerations?

Stefan Schubert: It’s maybe easiest to answer that question in the context of effective altruism in general. I should also mention that when we studied this topic of sort of effectively altruistic actions in general, what we concretely study is effective and ineffective giving. And that is because firstly, that’s what other people have studied, so it’s easier to put our research into context.

The other thing is that it’s quite easy to study in a lab setting, right? So you might ask people, would you donate to the effective or the ineffective charity? You might think that career choice is actually more important than giving, or some people would argue that, but that seems more difficult to study in a lab setting. So with regards to what motivates our research on effective altruism in general and effective giving, what ultimately motivates our research is that we want to make people improve their decisions. We want to make them donate more effectively, be more effectively altruistic in general.

So how can you then do that? Well, I want to make one distinction here, which I think might be important to think about. And that is the distinction between what I call a behavioral strategy and an intellectual strategy. And the behavioral strategy is that you come up with certain framings or setups to decision problems, such that people behave in a more desirable way. So there’s literature on nudging for instance, where you sort of want to nudge people into desirable options.

So for instance, in a cafeteria where you have healthier foods at eye level and the unhealthy food is harder to reach, people will eat healthier than if it’s the other way round. You could come up with interventions that similarly make people donate more effectively. So for instance, the default option could be an effective charity. We know that in general, people tend often to go with the default option because of some kind of cognitive inertia. So that might lead to more effective donations.

I think this behavioral strategy has some limitations. For instance, nudging might be interesting for the government because the government has a lot of power, right? It might frame the decision on whether you want to donate your organs after you’re dead. The other thing is that just creating and implementing these kinds of behavioral interventions can often be very time consuming and costly.

So one might think that this sort of intellectual strategy should be emphasized and it shouldn’t be forgotten. So with respect to the intellectual strategy, you’re not trying to change people’s behavior solely, you are trying to do that as well, but you’re also trying to change their underlying way of thinking. So in a sense it has a lot in common with philosophical argumentation. But the difference is that you start with descriptions of people’s default way of thinking.

You describe that your default way of thinking, that leads you to prioritize an identifiable victim over larger numbers of statistical victims. And then you sort of provide an argument that that’s wrong. Statistical victims, they are just as real individuals as the identifiable victims. So you get people to accept that their own default way of thinking about identifiable versus statistical victims is wrong, and that they shouldn’t trust the default way of thinking but instead think in a different way.

I think that this strategy is actually often used, but we don’t often think about it as a strategy. So for instance, Nate Soares has this blog post “On Caring” where he argues that we shouldn’t trust our internal care-o-meter. And this is because we can’t increase how much we feel about more people dying with the number of people that die or with the badness of those increasing numbers. So it’s sort of an intellectual argument that takes psychological insight as a starting point, and other people have done this as well.

So the psychologist Paul Bloom has this book Against Empathy where he argues for similar conclusions. And I think Eliezer Yudkowsky uses this strategy a lot in his Sequences. I think it’s often an effective strategy that should be used more.

Lucas Perry: So there’s the extent to which we can know about underlying, problematic cognition in persons and we can then change the world in ways. As you said, this is framed as nudging, where you sort of manipulate the environment in such a way without explicitly changing their cognition, in order to produce desired behaviors. Now, my initial reaction to this is, how are you going to deal with the problem when they find out that you’re doing this to them?

Now the second one here is the extent to which we can use our insights from psychological analysis and studies to change implicit and explicit models and cognition in order to effectively be better decision makers. If a million deaths is a statistic and a dozen deaths is a tragedy, then there is some kind of failure of empathy and compassion in the human mind. We’re not evolved or set up to deal with these kinds of moral calculations.

So maybe you could do nudging by setting up the world in such a way that people are more likely to donate to charities that are likely to help out statistically large, difficult to empathize with numbers of people, or you can teach them how to think better and better act on statistically large numbers of people.

Stefan Schubert: That’s a good analysis actually. On the second approach, what I call the intellectual strategy, you are sort of teaching them to think differently. Whereas on this behavioral or nudging approach, you’re changing the world. I also think that this comment about “they might not like the way you nudged them” is a good comment. Yes, that has been discussed. I guess in some cases of nudging, it might be sort of cases of weakness of will. People might not actually want the chocolate but they fall prey to their impulses. And the same might be true with saving for retirement.

Whereas with ineffective giving, there it’s much less clear. Is it really the case that people want to donate effectively and are therefore happy to be nudged in this way? That doesn’t seem too clear at all. So that’s absolutely a reason against that approach.

And then with respect to arguing for certain conclusions, in the sense that it is argument or argumentation, it’s more akin to philosophical argumentation. But it’s different from standard analytic philosophical argumentation in that it discusses human psychology. You discuss how our psychological dispositions mislead us at length and that’s not how analytic philosophers normally do it. And of course you can argue for, for instance, effective giving in the standard philosophical vein.

And some people have done that, like this EA philosopher Theron Pummer, who has an interesting paper called “Whether and Where to Give” on this question of whether it is an obligation to donate effectively. So I think that’s interesting, but one worries that there might not be that much to say about these issues because, everything else equal, it is maybe sort of trivial that the more effectiveness the better. Of course everything isn’t always equal. But in general, there might not be too much interesting stuff you can say about that from a normative or philosophical point of view.

But there are tons of interesting psychological things you can say because there are tons of ways in which people aren’t effective. The other related issue is that this form of psychology might have a substantial readership. So it seems to me based on the success of Kahneman and Haidt and others, that people love to read about how their own and others’ thoughts by default go wrong. Whereas in contrast, standard analytic philosophy, it’s not as widely read, even among the educated public.

So for those reasons, I think that the sort of more psychology-based argumentation may in some respects be more promising than purely abstract philosophical arguments for why we should be effectively altruistic.

Lucas Perry: My view or insight here is that the analytic philosopher is more so trying on the many different perspectives in his or her own head, whereas the psychologist is empirically studying what is happening in the heads of many different people. So clarifying what a perfected science of psychology in this field would look like is useful for illustrating the end goals and what we’re attempting to do here. This isn’t to say that this will necessarily happen in our lifetimes or anything like that, but what does a full understanding of psychology as it relates to existential risk and longtermism and effective altruism enable for human beings?

Stefan Schubert: One thing I might want to say is that psychological insights might help us to formulate a vision of how we ought to behave or what mindset we ought to have and what we ought to be like as people, which is not only normatively valid, which is what philosophers talk about, but also sort of persuasive. So one idea there that Lucius and I have discussed quite extensively recently is that some moral psychologists suggest that when we think about morality, we think to a large degree, not in terms of whether a particular act was good or bad, but rather about whether the person who performed that act is good or bad or whether they are virtuous or vicious.

So this is called the person centered approach to moral judgment. Based on that idea, we’ve been thinking about what lists of virtues people would need, in order to make the world better, more effectively. And ideally these should be virtues that both are appealing to common sense, or which can at least be made appealing to common sense, and which also make the world better when applied.

So we’ve been thinking about which such virtues one would want to have on such a list. We’re not sure exactly what we’ll include, but some examples might be prioritization: that you need to make sure that you prioritize the best ways of helping. And then we have another which we call science: that you do proper research on how to help effectively or that you rely on others who do. And then collaboration: that you’re willing to collaborate on moral issues, potentially even with your moral opponents.

So the details of these virtues aren’t too important, but the idea is that it hopefully should seem like a moral ideal to some people, to be a person who lives these virtues. I think that to many people philosophical arguments about the importance of being more effective and putting more emphasis on consequences, if you read them in a book of analytic philosophy, that might seem pretty uninspiring. So people don’t read that and think “that’s what I would want to be like.”

But hopefully, they could read about these kinds of virtues and think, “that’s what I would want to be like.” So to return to your question, ideally we could use psychology to sort of create such visions of some kind of moral ideal that would not just be normatively correct, but also sort of appealing and persuasive.

Lucas Perry: It’s like a science, which is attempting to contribute to the project of human and personal growth and evolution and enlightenment insofar as that is possible.

Stefan Schubert: We see this as part of the larger EA project of using evidence and reason and research to make the world a better place. EA has this prioritization research where you try to find the best ways of doing good. I gave this talk at EAGx Nordics earlier this year on “Moral Aspirations and Psychological Limitations.” And in that talk I said, well what EAs normally do when they prioritize ways of doing good, is as it were, they look into the world and they think: what ways of doing good are there? What different causes are there? What sort of levers can we pull to make the world better?

So should we reduce existential risk from specific sources like advanced AI or bio risk, or is rather global poverty or animal welfare the best thing to work on? But then the other approach is to rather sort of look inside yourself and think, well I am not perfectly effectively altruistic, and that is because of my psychological limitations. So then we want to find out which of those psychological limitations are most impactful to work on because, for instance, they are more tractable or because it makes a bigger difference if we remove them. That’s one way of thinking about this research, that we sort of take this prioritization research and turn it inwards.

Lucas Perry: Can you clarify the kinds of things that psychology is really pointing out about the human mind? Part of this is clearly about biases and poor aspects of human thinking, but what does it mean for human beings to have these bugs in human cognition? What are the kinds of things that we’re discovering about the person and how he or she thinks that fail to be in alignment with the truth?

Stefan Schubert: I mean, there are many different sources of error, one might say. One thing that some people have discussed is that people are not that interested in being effectively altruistic. Why is that? Some people say that’s just because they get more warm glow out of giving to someone who’s suffering more saliently and then the question arises, why do they get more warm glow out of that? Maybe that’s because they just want to signal their empathy. That’s sort of one perspective, which is maybe a bit cynical, then, that the ultimate source of lots of ineffectiveness is just this preference for signaling and maybe a lack of genuine altruism.

Another approach would be to just say, the world is very complex and it’s very difficult to understand it and we’re just computationally constrained, so we’re not good enough at understanding it. Another approach would be to say that because the world is so complex, we evolved various broad-brushed heuristics, which generally work not too badly, but then, when we are put in some evolutionarily novel context and so on, they don’t guide us too well. That might be another source of error. In general, what I would want to emphasize is that there are likely many different sources of human errors.

Lucas Perry: You’ve discussed here how you focus and work on these problems. You mentioned that you are primarily interested in the psychology of effective altruism in so far as we can become better effective givers and understand why people are not effective givers. And then, there is the psychology of longtermism. Can you enumerate some central questions that are motivating you and your research?

Stefan Schubert: To some extent, we need more research just in order to figure out what further research we and others should do, so I would say that we’re in a pre-paradigmatic stage with respect to that. There are numerous questions one can discuss with respect to psychology of longtermism and existential risks. One is just people’s empirical beliefs on how good the future will be if we don’t go extinct, what the risk of extinction is and so on. This could potentially be useful when presenting arguments for the importance of work on existential risks. Maybe it turns out that people underestimate the risk of extinction and the potential quality of the future and so on. Another issue which is interesting is moral judgements: people’s moral judgements about how bad extinction would be, and the value of a good future, and so on.

Moral judgements about human extinction, that’s exactly what we studied in a recent paper that we published, which is called “The Psychology of Existential Risks: Moral Judgements about Human Extinction.” In that paper, we test this thought experiment by philosopher Derek Parfit. He has this thought experiment where he discusses three different outcomes: first, peace; second, a nuclear war that kills 99% of the world’s existing population; and third, a nuclear war that kills everyone. Parfit says, then, that a war that kills everyone, that’s the worst outcome. Near-extinction is the next worst and peace is the best. Maybe no surprises there, but the more interesting part of the discussion concerns the relative differences between these outcomes in terms of badness. Parfit effectively made an empirical prediction, saying that most people would find the difference in terms of badness between peace and near-extinction to be the greater one, but he himself thought that the difference between near-extinction and extinction, that’s the greater difference. That’s because only extinction would lead to the future forever being lost and Parfit thought that if humanity didn’t go extinct, the future could be very long and good and therefore, it would be a unique disaster if the future was lost.

On this view, extinction is uniquely bad, as we put it. It’s not just bad because it would mean that many people would die, but also because it would mean that we would lose a potentially long and grand future. We tested this hypothesis in the paper, then. First, we had a preliminary study, which didn’t actually pertain directly to Parfit’s hypothesis. We just studied whether people would find extinction a very bad event in the first place and we found that, yes, they do and they think that the government should invest substantially to prevent it.

Then, we moved on to the main topic, which was Parfit’s hypothesis. We made some slight changes. In the middle outcome, Parfit had 99% dying. We reduced that number to 80%. We also talked about catastrophes in general rather than nuclear wars and we didn’t want to talk about peace because we thought that you might have an emotional association with the word “peace”; we just talked about no catastrophe instead. Using this paradigm, we found that Parfit was right. First, most people, just like him, thought that extinction was the worst outcome, near-extinction the next, and no catastrophe was the best. But second, we found that most people find the difference in terms of badness between no one dying and 80% dying to be greater than the difference between 80% dying and 100% dying.

Our interpretation, then, is that this is presumably because they focus most on the immediate harm that the catastrophes cause and in terms of the immediate harm, the difference between no one dying and 80% dying, it’s obviously greater than that between 80% dying and 100% dying. That was a control condition in some of our experiments, but we also had other conditions where we would slightly tweak the question. We had one condition which we call the salience condition, where we made the longterm consequences of the three outcomes salient. We told participants to remember the longterm consequences of the outcomes. Here, we didn’t actually add any information that they don’t have access to, but we just made some information more salient and that made significantly more participants find the difference between 80% dying and 100% dying the greater one.

Then, we had yet another condition which we call the utopia condition, where we told participants that if humanity doesn’t go extinct, then the future will be extremely long and extremely good and it was said that if 80% die, then, obviously, at first, things are not so good, but after a recovery period, we would go on to this rosy future. We included this condition partly because such scenarios have been discussed to some extent by futurists, but partly also because we wanted to know, if we ramp up this goodness of the future to the maximum and maximize the opportunity costs of extinction, how many people would then find the difference between near extinction and extinction the greater one. Indeed, we found, then, that given such a scenario, a large majority found the difference between 80% dying and 100% dying the larger one so then, they did find extinction uniquely bad given this enormous opportunity cost of a utopian future.

Lucas Perry: What’s going on in my head right now is we were discussing earlier the role or not of these philosophical thought experiments in psychological analysis. You’ve done a great study here that helps to empirically concretize the biases and remedies for the issues that Derek Parfit had exposed and pointed to in his initial thought experiment. That was popularized by Nick Bostrom and it’s one of the key thought experiments for much of the existential risk community and people committed to longtermism because it helps to elucidate this deep and rich amount of value in the deep future and how we don’t normally consider that. Your discussion here just seems to be opening up for me tons of possibilities in terms of how far and deep this can go in general. The point of Peter Singer’s child drowning in a shallow pond was to isolate the bias of proximity, and Derek Parfit’s thought experiment isolates the bias of familiarity, temporal bias. And continuing into the future, it’s making me think that we also have biases about identity.

Derek Parfit also has thought experiments about identity, like with his teleportation machine where, say, you stepped into a teleportation machine and it annihilated all of your atoms but before it did so, it scanned all of your information and once it scanned you, it destroyed you and then re-assembled you on the other side of the room, or you can change the thought experiment and say on the other side of the universe. Is that really you? What does it mean to die? Those are the kinds of questions that are elicited. Listening to what you’ve developed and learned and reflecting on the possibilities here, it seems like you’re at the beginning of a potentially extremely important and meaningful field that helps to inform decision-making on these morally crucial and philosophically interesting questions and points of view. How do you feel about that or what I’m saying?

Stefan Schubert: Okay, thank you very much and thank you also for putting this Parfit thought experiment a bit in context. What you’re saying is absolutely right, that this has been used a lot, including by Nick Bostrom and others in the longtermist community and that was indeed one reason why we wanted to test it. I also agree that there are tons of interesting philosophical thought experiments there and they should be tested more. There’s also this other field of experimental philosophy where philosophers test philosophical thought experiments themselves, but in general, I think there’s absolutely more room for empirical testing of them.

With respect to temporal bias, I guess it depends a bit what one means by that, because we actually did get an effect from just mentioning that they should consider the longterm consequences, so I might think that to some extent it’s not only that people are biased in favor of the present, but it’s also that they don’t really consider the longterm future. They sort of neglect it and it’s not something that’s generally discussed among most people. I think this is also something that Parfit’s thought experiment highlights. You have to think about the really longterm consequences here and if you do think about them, then your intuitions about this thought experiment should reverse.

Lucas Perry: People’s cognitive time horizons are really short.

Stefan Schubert: Yes.

Lucas Perry: People probably have the opposite discounting of future persons that I do. Just because I think that the kinds of experiences that Earth-originating intelligent life forms will be having in the next 100 to 200 years will be much more deep and profound than what humans are capable of, I would value them more than I value persons today. Most people don’t think about that. They probably just think there’ll be more humans and short of their bias towards present day humans, they don’t even consider a time horizon long enough to really have the bias kick in, is what you’re saying?

Stefan Schubert: Yeah, exactly. Thanks for that, also, for mentioning that. First of all, my view is that people don’t even think so much about the longterm future unless prompted to do so. Second, in this first study I mentioned, which was sort of a pre-study, we asked, “How good do you think that the future’s going to be?” On average, I think they said, “It’s going to be slightly better than the present” and that would be very different from your view, then, that the future’s going to be much better. You could argue that this view that the future is going to be about as good as the present is somewhat unlikely. I think it’s going to be much better or maybe it’s going to be much worse. There are several different biases or errors that are present here.

Merely making the longterm consequences of the three outcomes salient, that already makes people more inclined to find the difference between 80% dying and 100% dying the greater one, and in that case you don’t add any information. Also, specifying that the longterm outcomes are going to be extremely good, that makes a further difference that makes most people find the difference between 80% dying and 100% dying the greater one.

Lucas Perry: I’m sure you and I, and listeners as well, have the hilarious problem of trying to explain this stuff to friends or family members or people that you meet that are curious about it and the difficulty of communicating it and imparting the moral saliency. I’m just curious to know if you have explicit messaging recommendations that you have extracted or learned from the study that you’ve done.

Stefan Schubert: You want to make the future more salient if you want people to care more about existential risk. With respect to explicit messaging more generally, like I said, there haven’t been that many studies on this topic, so I can’t refer to any specific study that says that this is how you should work with the messaging on this topic. But just thinking more generally, one thing I’ve been thinking about is that maybe, with many of these issues, it’s just that it takes a while for people to get habituated to them. At first, if someone hears a very surprising statement that has very far-reaching conclusions, they might be intuitively a bit skeptical about it, independently of how reasonable that argument would be for someone who would be completely unbiased. Their prior is that, probably, this is not right and to some extent, this might even be reasonable. Maybe people should be a bit skeptical of people who say such things.

But then, what happens is that most such people who make such claims that seem to people very weird and very far-reaching, they get discarded after some time because people poke holes in their arguments and so on. But then, a small subset of all such people, they actually stick around and they get more and more recognition and you could argue that that’s what’s now happening with people who work on longtermism and X-risk. And then, people slowly get habituated to this and they say, “Well, maybe there is something to it.” It’s not a fully rational process. I think this doesn’t just relate to longtermism and X-risk but maybe also specifically to AI risk, where it takes time for people to accept that message.

I’m sure there are some things that you can do to speed up that process and some of them would be fairly obvious like have smart, prestigious, reasonable people talk about this stuff and not people who don’t seem as credible.

Lucas Perry: What are further areas of the psychology of longtermism or existential risk that you think would be valuable to study? And let’s also touch upon other interesting areas for effective altruism as well.

Stefan Schubert: I mentioned previously people’s empirical beliefs, that could be valuable. One thing I should mention there is that I think that people’s empirical beliefs about the distant future are massively affected by framing effects, so depending on how you ask these questions, you are going to get very different answers. So it’s important to remember that it’s not like people have these stable beliefs and they will always say that. The other thing I mentioned is moral judgments, and I said we studied moral judgements about human extinction, but there’s a lot of other stuff to do, like people’s views on population ethics could obviously be useful. Views on whether creating happy people is morally valuable. Whether it’s more valuable to bring a large number of people whose life is barely worth living into existence than to bring a small number of very happy people into existence and so on.

Those questions obviously have relevance for the moral value of the future. One thing I would want to say is that if you’re rational, then, obviously, your view on what and how much we should do to affect the distant future, that should arguably be a function of your moral views, including on population ethics, on the one hand, and also your empirical views of how the future’s likely to pan out. But then, I also think that people obviously aren’t completely rational and I think, in practice, their views on the longterm future will also be influenced by other factors. I think that their view on whether helping the longterm future seems like an inspiring project, that might depend massively on how the issue is framed. I think these aspects could be worth studying because if we find these kinds of aspects, then we might want to emphasize the positive aspects and we might want to adjust our behavior to avoid the negative. The goal should be to formulate a vision of longtermism that feels inspiring to people, including to people who haven’t put a lot of thought into, for instance, population ethics and related matters.

There are also some other specific issues which I think could be useful to study. One is the psychology of predictions about the distant future and the implications of the so-called construal level theory for the psychology of the longterm future. Many effective altruists would know construal level theory under another name: near mode and far mode. This is Robin Hanson’s terminology. Construal level theory is a theory about psychological distance and how it relates to how abstractly we construe things. It says that we conceive of different forms of distance – spatial, temporal, social – similarly. The second claim is that we conceive of items and events at greater psychological distance more abstractly: we focus more on big picture features and less on details. So, Robin Hanson, he’s discussed this theory very extensively including with respect to the long term future. And he argues that the great psychological distance to the distant future causes us to reason in overly abstract ways, to be overconfident, and to have poor epistemics in general about the distant future.

I find this very interesting, and these kinds of ideas are mentioned a lot in EA and the X-risk community. But, to my knowledge there hasn’t been that much research which applies construal level theory specifically to the psychology of the distant future.

It’s more like people look at these general studies of construal level theory, and then they notice that, well, the temporal distance to the distant future is obviously extremely great. Hence, these general findings should apply to a very great extent. But, to my knowledge, this hasn’t been studied so much. And given how much people discuss near or far mode in this case, it seems that there should be some empirical research.

I should also mention that I find construal level theory a very interesting and rich psychological theory in general. I could see that it could illuminate the psychology of the distant future in numerous ways. Maybe it could be some kind of a theoretical framework that I could use for many studies about the distant future. So, I recommend the key paper from 2010 by Trope and Liberman on construal level theory.

Lucas Perry: I think that just hearing you say this right now, it’s sort of opening my mind up to the wide spectrum of possible applications of psychology in this area.

You mentioned population ethics. That makes me just think of in the context of EA and longtermism and life in general, the extent to which psychological study and analysis can find ethical biases and root them out and correct for them, either by nudging or by changing the explicit methods by which humans cognize about such ethics. There’s the extent to which psychology can better inform our epistemics, so this is the extent to which we can be more rational.

And I’m reflecting now how quantum physics subverts many of our Newtonian mechanics and classical mechanics, intuitions about the world. And there’s the extent to which psychology can also inform the way in which our social and experiential lives also condition the way that we think about the world and the extent to which that sets us astray in trying to understand the fundamental nature of reality or thinking about the longterm future or thinking about ethics or anything else. It seems like you’re at the beginning stages of debugging humans on some of the most important problems that exist.

Stefan Schubert: Okay. That’s a nice way of putting it. I certainly think that there is room for way more research on the psychology of longtermism and X-risk.

Lucas Perry: Can you speak a little bit now here about speciesism? This is both an epistemic thing and an ethical thing in the sense that we’ve invented these categories of species to describe the way that evolutionary histories of beings bifurcate. And then, there’s the psychological side of the ethics of it where we unnecessarily devalue the life of other species given that they fit that other category.

Stefan Schubert: So, we have one paper under review, which is called “Why People Prioritize Humans Over Animals: A Framework for Moral Anthropocentrism.”

To give you a bit of context, there’s been a lot of research on speciesism and on humans prioritizing humans over animals. So, in this paper we sort of try to take a bit more systematic approach and pit these different hypotheses for why humans prioritize humans over animals against each other, and look at their relative strengths as well.

And what we find is that there is truth to several of these hypotheses of why humans prioritize humans over animals. One contributing factor is just that they value individuals with greater mental capacities, and most humans have greater mental capacities than most animals.

However, that explains only part of the effect we find. We also find that people think that humans should be prioritized over animals even if they have the same mental capacity. And here, we find that this is for two different reasons.

First, according to our findings, people are what we call species relativists. And by that, we mean that they think that members of a species, including different non-human species, should prioritize other members of that species.

So, for instance, humans should prioritize other humans, and an elephant should prioritize other elephants. And that means that because humans are the ones calling the shots in the world, we have a right then, according to this species relativist view, to prioritize our own species. But other species would, if they were in power. At least that’s the implication of what the participants say, if you take them at face value. That’s species relativism.

But then, there is also the fact that they exhibit an absolute preference for humans over animals, meaning that even if we control for the mental capacities of humans and animals, and even if we control for the species relativist factors, that is, we control for who the individual who could help them is, there remains a difference which can’t be explained by those other factors.

So, there’s an absolute speciesist preference for humans which can’t be explained by any further factor. So, that’s an absolute speciesist preference as opposed to this species relativist view.

In total, there’s a bunch of factors that together explain why humans prioritize humans over animals, and these factors may also influence each other. So, we present some evidence that if people have a speciesist preference for humans over animals, that might, in turn, lead them to believe that animals have less advanced mental capacities than they actually have. And because they have this view that individuals with lower mental capacity, they are less morally valuable, that leads them to further deprioritize animals.

So, these three different factors, they sort of interact with each other in intricate ways. Our paper gives this overview over these different factors which contribute to humans prioritizing humans over animals.

Lucas Perry: This helps to make clear to me that a successful psychological study with regards to at least ethical biases will isolate the salient variables which are knobs that are tweaking the moral saliency of one thing over another.

Now, you said mental capacities there. You guys aren’t bringing consciousness or sentience into this?

Stefan Schubert: We discuss different formulations at length, and we went for the somewhat generic formulation.

Lucas Perry: I think people have beliefs about the ability to rationalize and understand the world, and then how that may or may not be correlated with consciousness that most people don’t make explicit. It seems like there are some variables to unpack underneath cognitive capacity.

Stefan Schubert: I agree. This is still like fairly broad brushed. The other thing to say is that sometimes we say that this human has as advanced mental capacities as these animals. Then, they have no reason to believe that the human has a more sophisticated sentience or is more conscious or something like that.

Lucas Perry: Our species membership tells me that we probably have more consciousness. My bedrock thing is I care about how much the thing can suffer or not, not how well it can model the world. Though those things are maybe probably highly correlated with one another. I think I wouldn’t be a speciesist if I thought human beings were currently the most important thing on the planet.

Stefan Schubert: You’re a speciesist if you prioritize humans over animals purely because of species membership. But, if you prioritize one species over another for some other reasons which are morally relevant, then you would not be seen as a speciesist.

Lucas Perry: Yeah, I’m excited to see what comes of that. I think that working on overcoming racism and misogyny and other things, and I think that overcoming speciesism and temporal biases and physical space, proximity biases are some of the next stages in human moral evolution that have to come. So, I think it’s honestly terrific that you’re working on these issues.

Is there anything you would like to say or that you feel that we haven’t covered?

Stefan Schubert: We have one paper which is called “The Puzzle of Ineffective Giving,” where we study this misconception that people have, which is that they think the difference in effectiveness between charities is much smaller than it actually is. So, experts think that the most effective charities are vastly much more effective than the average charity, and people don’t know that.

That seems to suggest that beliefs play a role in ineffective giving. But, there was one interesting paper called “Impediments to Effective Altruism” where they show that even if you tell people that a cancer charity is less effective than an arthritis charity, they still donate to the cancer charity.

So, then we have this other paper called “The Many Obstacles to Effective Giving.” It’s a bit similar to this speciesist paper, I guess, that we sort of pit different competing hypotheses that people have studied against each other. We give people different tasks, for instance, tasks which involve identifiable victims and tasks which involve ineffective but low overhead charities.

And then, we sort of studied, well, what if we tell them how to be effective? Does that change how they behave? What’s the role of that pure belief factor? What’s the role of preferences? The result is a bit of a mix. Both beliefs and preferences contribute to ineffective giving.

In the real world, it’s likely that there are several beliefs and preferences that obstruct effective giving present simultaneously. For instance, people might fail to donate to the most effective charity because first, it’s not a disaster charity, and they might have a preference for a disaster charity. And it might have a high overhead, and they might falsely believe then that high overhead entails low effectiveness. And it might not highlight identifiable victims, and they have a preference for donating to identifiable victims.

Several of these obstacles are present at the same time, and in that sense, ineffective giving is overdetermined. So, fixing one specific obstacle may not make as much of a difference as one would have wanted. That might support the view that what we need is not primarily behavioral interventions that address individual obstacles, but rather a broader mindset change that can motivate people to proactively seek out the most effective ways of doing good.

Lucas Perry: One other thing that’s coming to my mind is the proximity of a cause to someone’s attention and the degree to which it allows them to be celebrated in their community for the good that they have done.

Are you suggesting that the way for remedying this is to help instill a curiosity and something resembling the EA mindset that would allow people to do the cognitive exploration and work necessary to transcend these limitations that bind them to their ineffective giving or is that unrealistic?

Stefan Schubert: First of all, let me just say that with respect to this proximity issue, that was actually another task that we had. I didn’t mention all the tasks. So, we told people that you can either help a local charity or a charity, I think it was in India. And then, we told them that the Indian charity is more effective and asked “where would you want to donate?”

So, you’re absolutely right. That’s another obstacle to effective giving, that people sometimes have preferences or beliefs that local charities are more effective even when that’s not the case. Some donor I talked to, he said, “Learning how to donate effectively, it’s actually fairly complicated, and there are lots of different things to think about.”

So, just fixing the overhead myth or something like that, that may not take you very far, especially if you think that the very best charities are sort of vastly more effective than the average charity. So, what’s important is not going from an average charity to a somewhat more effective charity, but to actually find the very best charities.

And to do that, we may need to address many psychological obstacles because the most effective charities, they might be very weird and sort of concerned with longterm future or what-not. So, I do think that a mindset where people seek out effective charities, or defer to others who do, that might be necessary. It’s not super easy to make people adopt that mindset, definitely not.

Lucas Perry: We have charity evaluators, right? These institutions which are intended to be reputable enough that they can tell you which are the most effective charities to donate to. It wouldn’t even be enough to just market those really hard. They’d be like, “Okay, that’s cool. But, I’m still going to donate my money to seeing eye dogs because blindness is something that runs in my family and is experientially and morally salient for me.”

Is the way that we fix the world really about just getting people to give more, and what is the extent to which the institutions which exist, which require people to give, need to be corrected and fixed? There’s that tension there between just the mission of getting people to give more, and then the question of, well, why do we need to get everyone to give so much in the first place?

Stefan Schubert: This insight that ineffective giving is overdetermined and there are lots of things that stand in the way of effective giving, one thing I like about it is that it seems to sort of go well with this observation that it is actually, in the real world, very difficult to make people donate effectively.

I might relate there a bit to what you mentioned about the importance of giving more, and so we could sort of distinguish between the different kinds of psychological limitations. First, there are limitations that relate to how much we give. We’re selfish, so therefore we don’t necessarily give as much of our monetary resources as we should. There are sort of limits to altruism.

But then, there are also limits to effectiveness. We are ineffective for various reasons that we’ve discussed. And then, there’s also the fact that we can have the wrong moral goals. Maybe we work towards short term goals, but then we would realize on careful reflection that we should work towards long term goals.

And then, I was thinking like, “Well, which of these obstacles should you then prioritize if you turn this sort of prioritization framework inwards?” And then, you might think that, well, at least with respect to giving, it might be difficult for you to increase the amount that you give by more than 10 times. Americans, for instance, they already donate several percent of their income. We know from historical experience that it might be hard for people to sustain very high levels of altruism, so maybe it’s difficult for them to sort of ramp up this altruist factor to the extreme amount.

But then, with effectiveness, if this story about heavy-tailed distributions of effectiveness is right, then you could increase the effectiveness of your donations a lot. And arguably, the sort of psychological price for that is lower. It’s very demanding to give up a huge proportion of your income for others, but I would say that it’s less demanding to redirect your donations to a more effective cause, even if you feel more strongly for the ineffective cause.

I think it’s difficult to really internalize how enormously important it is to go for the most effective option. And also, of course, then the third factor to sort of change your moral goals if necessary. If people would reduce their donations by 99%, they would reduce the impact by 99%. Many people would feel guilty about it.

But then, if they reduce their impact 99% via reducing their effectiveness 99% through choosing an ineffective charity, then people don’t feel similarly guilty, so similar to Nate Soares’ idea of a care-o-meter: our feelings aren’t adjusted for these things, so we don’t feel as much about the ineffectiveness as we do about altruistic sacrifice. And that might lead us to not focus enough on effectiveness, and we should really think carefully about going that extra mile for the sake of effectiveness.

Lucas Perry: Wonderful. I feel like you’ve given me a lot of concepts and tools that are just very helpful for reinvigorating an introspective mindfulness about altruism in my own life and how that can be nurtured and developed.

So, thank you so much. I’ve really enjoyed this conversation for the reasons I just said. I think this is a very important new research stream in this space, and it seems small now, but I really hope that it grows. And thank you for your and your colleagues’ work here on seeding and doing the initial work in this field.

Stefan Schubert: Thank you very much. Thank you for having me. It was a pleasure.

AI Alignment Podcast: Machine Ethics and AI Governance with Wendell Wallach

Wendell Wallach has been at the forefront of contemporary emerging technology issues for decades now. As an interdisciplinary thinker, he has engaged at the intersections of ethics, governance, AI, bioethics, robotics, and philosophy since the beginning formulations of what we now know as AI alignment were being codified. Wendell began with a broad interest in the ethics of emerging technology and has since become focused on machine ethics and AI governance. This conversation with Wendell explores his intellectual journey and participation in these fields.

 Topics discussed in this episode include:

  • Wendell’s intellectual journey in machine ethics and AI governance 
  • The history of machine ethics and alignment considerations
  • How machine ethics and AI alignment serve to produce beneficial AI 
  • Soft law and hard law for shaping AI governance 
  • Wendell’s and broader efforts for the global governance of AI
  • Social and political mechanisms for mitigating the risks of AI 
  • Wendell’s forthcoming book

Key points from Wendell:

  • “So when you were talking about machine ethics or when we were talking about machine ethics, we were really thinking about it in terms of just how do you introduce ethical procedures so that when machines encounter new situations, particularly when the designers can’t fully predict what their actions will be, that they factor in ethical considerations as they choose between various courses of action. So we were really talking about very basic programming in the machines, but we weren’t just thinking of it in terms of the basics. We were thinking of it in terms of the evolution of smart machines… What we encounter in the Singularity Institute for Artificial Intelligence (now MIRI) approach of friendly AI and what became value alignment is more or less a presumption of very high order intelligence capabilities by the system and how you would ensure that their values align with those of the machines. They tended to start from that level. So that was the distinction. Where the machine ethics folks did look at those futuristic concerns, they did so more from a philosophical level and at least a belief or appreciation that this is going to be a relatively evolutionary course, whereas the friendly AI and value alignment folks, they tended to presume that we’re going to have very high order cognitive capabilities and how do we ensure that those align with the systems. Now, the convergence, I would say, is what’s happening right now because in workshops that have been organized around the societal and ethical impact of intelligent systems.”
  • “My sense has been that with both machine ethics and value alignment, we’ve sort of got the cart in front of the horse. So I’m waiting to see some great implementation breakthroughs, I just haven’t seen them. Most of the time, when I encounter researchers who say they’re taking it seriously, I see they’re tripping over relatively low level implementations. The difficulty is here, and all of this is converging. What AI alignment was initially and what it’s becoming now I think are quite different. I think in the very early days, it really was presumptions that you would have these higher order intelligences and then how were you going to align them. Now, as AI alignment, people look at the value issues as they intersect with present day AI agendas. I realize that you can’t make the presumptions about the higher order systems without going through developmental steps to get there. So, in that sense, I think whether it’s AI alignment or machine ethics, the one will absorb the lessons of the other. Both will utilize advances that happen on both fronts.”
  • “David Collingridge wrote a book where he outlined a problem that is now known as the Collingridge Dilemma. Basically, Collingridge said that while it was easiest to regulate a technology early in its development, early in its development we had little idea of what its societal impact would be. By the time we did understand what the challenges from the societal impact were, the technology would be so deeply entrenched in our society that it would be very difficult to change its trajectory. So we see that today with social media. Social media was totally entrenched in our society before we realized how it could be manipulated in ways that would undermine democracy. Now we’re having a devil of a time figuring out what we could do. So Gary and I, who had been talking about these kinds of problems for years, we realized that we were constantly lamenting the challenge, but we altered the conversation one day over a cup of coffee. We said, “Well, if we had our druthers, if we had some degree of influence, what would we propose?” We came up with a model that we referred to as governance coordinating committees. Our idea was that you would put in place a kind of issues manager that would try and guide the development of a field, but first of all, it would just monitor development, convene forums between the many stakeholders, map issues and gaps, see if anyone was addressing those issues and gaps or whether there were best practices that had come to the fore. If these issues were not being addressed, then how could you address them, looking at a broad array of mechanisms. By a broad array of mechanisms, we meant you start with feasible technological solutions, you then look at what can be managed through corporate self-governance, and if you couldn’t find anything in either of those areas, then you turn to what is sometimes called soft law… So Gary and I proposed this model. Every time we ever talked about it, people would say, “Boy, that’s a great idea. Somebody should do that.” I was going to international forums, such as going to the World Economic Forum meetings in Davos, where I’d be asked to be a fire-starter on all kinds of subject areas by safety and food security and the law of the ocean. In a few minutes, I would quickly outline this model as a way of getting people to think much more richly about ways to manage technological development and not just immediately go to laws and regulatory bodies. All of this convinced me that this model was very valuable, but it wasn’t being taken up. All of that led to this first International Congress for the Governance of Artificial Intelligence, which will be convened in Prague on April 16 to 18. I do invite those of you listening to this podcast who are interested in the international governance of AI or really agile governance for technology more broadly to join us at that gathering.”

 

Important timestamps: 

0:00 intro

2:50 Wendell’s evolution in work and thought

10:45 AI alignment and machine ethics

27:05 Wendell’s focus on AI governance

34:04 How much can soft law shape hard law?

37:27 What does hard law consist of?

43:25 Contextualizing the International Congress for the Governance of AI

45:00 How AI governance efforts might fail

58:40 AGI governance

1:05:00 Wendell’s forthcoming book

 

Works referenced:

A Dangerous Master: How to Keep Technology from Slipping Beyond Our Control

Moral Machines: Teaching Robots Right from Wrong

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Hey everyone, welcome to the AI Alignment Podcast. I’m Lucas Perry. Today, we’ll be speaking with Wendell Wallach. This episode is primarily dedicated to the issue and topic of AI governance, though in order to get there we go on and explore Wendell’s intellectual journey in machine ethics and how that led him up to his current efforts in AI governance. We also discuss how machine ethics and AI alignment both attempt to serve the project of creating beneficial AI and deal with the moral and ethical considerations related to the growing power and use of artificial intelligence. We discuss soft law and hard law for shaping AI governance. We get into Wendell’s efforts for the global governance of AI and discuss the related risks. And to finish things off we also briefly touch on AGI governance and Wendell’s forthcoming book. If you find this podcast valuable, interesting, or helpful, consider sharing it with others who might find it valuable as well.

For those who are not familiar with Wendell, Wendell is an internationally recognized expert on the ethical and governance concerns posed by emerging technologies, particularly artificial intelligence and neuroscience. Wendell is a consultant and ethicist, and a scholar at Yale University’s Interdisciplinary Center for Bioethics. He is also co-author, with Colin Allen, of Moral Machines: Teaching Robots Right from Wrong. This work maps machine ethics, machine morality, computational morality, and friendly AI. He has a second and more recent book, A Dangerous Master: How to Keep Technology from Slipping Beyond our Control. From my perspective of things, it seems there is much growing enthusiasm and momentum in the space of AI policy and governance efforts. So, this conversation and those like it I feel help to further develop my perspective and understanding of where we are in the project and space of AI governance. For these reasons, I hope that you’ll find it valuable as well. So, let’s get into our conversation with Wendell Wallach.

It would be great if you could start by clarifying the evolution of your thought in science and technology over the years. It appears that you’ve gone from being interested in bioethics to machine ethics to now a more recent focus in AI governance and AI ethics. Can you take us through this movement in your thought and work?

Wendell Wallach: In reality, all three of those themes have been involved in my work from the very beginning, but the emphasis has changed. So I lived a very idiosyncratic life that ended with two computer consulting companies that I had helped start. But I had felt that there were books that I wanted to get out of my head, and I turned those companies over to the employees, and I started writing and realized that I was not up on some of the latest work in cognitive science. So one thing led to another, and I was invited to the first meeting of a technology and ethics working group at Yale University that had actually been started by Nick Bostrom when he was at Yale and Bonnie Kaplan. Nick left about a year later, and a year after that, Bonnie Kaplan had an accident, and the chair of that working group was turned over to me.

So that started my focus on technology and ethics more broadly. It was not limited to bioethics, but it did happen within the confines of the Yale Interdisciplinary Center for Bioethics. I was all over the place in the sense that I was already a kind of transdisciplinary thinker, transdisciplinary scholar, but having the challenge of focusing my study and my work so it was manageable. In other words, I was trying to think broadly at the same time as I was trying to focus on different subject areas. One thing led to another. I was invited to a conference in Baden-Baden where I met Colin Allen. We, together with the woman who started the workshop there, Eva Schmidt, began thinking about a topic that we were calling machine morality at that time. By machine morality, we meant thinking about how moral decision making faculties might be implemented in computers and robots.

Around the same time, there were other scholars working on the same themes. Michael and Susan Anderson, for example, had grabbed on to the title ‘machine ethics.’ Over time, as these various pathways converged, machine ethics became the main research area or the way in which this research project was referred to. It did have other names in addition to machine morality. It was sometimes called computational morality. At the same time, there were others who were working on it under the title of friendly AI, a term that was coined by Eliezer Yudkowsky. But the real difference between the machine ethics folks and the friendly AI folks was that the friendly AI folks were explicitly focused upon the challenge of how you would manage or tame superintelligence, whereas the machine ethics crew were much more ethicists, philosophers, computer scientists who were really thinking about first steps toward introducing moral decision making faculties, moral sensitivity into computers and robots. This was a relatively small group of scholars, but as this evolved over time, Eva and Colin and I decided that we would write a book mapping the development of this field of research.

Eva Schmidt fell away, and the book finally came out from Oxford University Press under the title Moral Machines: Teaching Robots Right from Wrong. So, as you may be aware, that’s still a seminal text out there. It’s still something that is read broadly and is being cited broadly, and in fact, its citations are going up, and we’re even being requested by Oxford University Press to produce an update of the book. Machine ethics was two parts philosophy, one part computer science. It was basically two fields of study. One was looking explicitly at the question of implementing sensitivity to moral considerations in computers and robots, and the other was really thinking comprehensively about how humans make moral decisions. So, arguably, Moral Machines was the first book that really took that comprehensive look at human moral decision making seriously. It was also a time when there was a lot of research going on in moral psychology in the way in which people’s affective and decision making concerns affected what became our ethical decision making processes.

So we were also able to bring some of that in, bring evolutionary psychology in and bring a lot of new fields of research that had not really been given their due or had not been integrated very well with the dominant reason-based theories of ethics, such as deontology, which is really ethical approaches that focus on duties and rules, and consequentialism, which is an ethical theory that says right and wrong is not determined by following the rules or doing your duty, it’s determined by looking at the consequences of your action and selecting the course of action likely to produce the greatest good for the greatest number. So it’s like we were integrating evolutionary psychology, cognitive science, moral psychology, together with the more rational-based theories, as we looked at top-down and bottom-up approaches for introducing sensitivity to ethical considerations in computers and robots.

The major shift in that whole trajectory, and one I only learned about at the first FLI conference in Puerto Rico, where Jim Moor and I were the only two people who had been actively involved in the machine ethics community (Jim Moor is a professor at Dartmouth, for those of you who are not aware of him, and he has been a seminal figure in the philosophy of computing for decades now), was that at that Puerto Rico gathering, the concept of value alignment was raised to us for the first time. What I realized was that those who were talking about value alignment from the AI perspective, by and large, had little or no understanding that there had ever been a field or was an ongoing field known as machine ethics.

That led to my applying for a Future of Life Institute grant, which I was awarded as PI. That grant was to host three annual workshops bringing together experts not only in AI, but machine ethics, philosophy generally, resilience engineering, robotics, a broad array of fields of people who had been thinking seriously about value issues in computational systems. Those really became groundbreaking workshops where it was clear that the computer scientists and the AI researchers knew very little about ethics issues, and the ethicists didn’t necessarily have a great depth of understanding of some of the challenges coming up in artificial intelligence. Bart Selman and Stuart Russell agreed to be co-chairs of those workshops with me. The last one was completed over a year ago with some closing presentations in New York City and at Yale.

Lucas Perry: I think it’d be helpful here if you could disambiguate the machine ethics crowd and way of thinking and what has been done there from the AI alignment, value alignment, Eliezer branch of thinking that has been going on. AI alignment seems more focused on explicitly trying to understand human preference hierarchies and be able to specify objectives without the machine systems doing other things that we don’t want them to do. Then you said that machine ethics is about imbuing ethical decision making faculties or reasoning or sensitivities in machine systems. That, to me, seems more like normative ethics. We have these normative theories like you mentioned deontology and consequentialism and virtue ethics, and maybe machines can invent other normative ethical theories. So they seem like different projects.

Wendell Wallach: They are very different projects. The question is whether they converge or not or whether they can really be treated as totally distinct projects from each other. So when you were talking about machine ethics or when we were talking about machine ethics, we were really thinking about it in terms of just how do you introduce ethical procedures so that when machines encounter new situations, particularly when the designers can’t fully predict what their actions will be, that they factor in ethical considerations as they choose between various courses of action. So we were really talking about very basic programming in the machines, but we weren’t just thinking of it in terms of the basics. We were thinking of it in terms of the evolution of smart machines. For example, in Moral Machines, Colin and I had a chart that we had actually developed with Eva Schmidt and that had been in earlier articles that the three of us authored, and it looked at the development of machines on two axes.

One was increasing autonomy, and the other was increasing sensitivity, with, at the far extreme, sensitivity to ethical considerations. We realized that you could put any tool within that chart. So a hammer has no sensitivity, and it has no autonomy. But when you think of a thermostat, it has a very low degree of sensitivity and a very low degree of autonomy, so as temperatures change, it can turn on or off heating. We then, within that chart, had a series of semicircles, one that delineated when we moved into the realm of what we labeled operational morality. By operational morality, we meant that the computer designers could more or less figure out all the situations the system would encounter and hard program its responses to those situations. The next level was what we call functional morality, which was, as the computer programmers could no longer predetermine all the situations the system would encounter, the system would have to have some kind of ethical subroutines. Then at the highest level was full moral agency.

What we encountered in the Singularity Institute for Artificial Intelligence (now MIRI) approach of friendly AI and what became value alignment is more or less a presumption of very high order intelligence capabilities by the system and how you would ensure that their values align with those of the machines. They tended to start from that level. So that was the distinction. Where the machine ethics folks did look at those futuristic concerns, they did more so from a philosophical level and at least a belief or appreciation that this is going to be a relatively evolutionary course, whereas the friendly AI and value alignment folks, they tended to presume that we’re going to have very high order cognitive capabilities and how do we ensure that those align with the systems. Now, the convergence, I would say, is what’s happening right now, because in workshops that have been organized around the societal and ethical impact of intelligent systems, the first experiments even the value alignment people are doing still tend to be relatively low level experiments, given the capabilities systems have today.

So I would say, in effect, they are machine ethics experiments or at least they’re starting to recognize that the challenges at least initially aren’t that much different than those the machine ethicists outlined. As far as the later concerns go, which is what is the best course to proceed on producing systems that are value aligned, well there, I think we have some overlap also coming in from the machine ethicists, who raise questions about some of these more technical and mathematically-based approaches to value alignment and whether they might be successful. In that regard, Shannon Vallor, an ethicist at Santa Clara University, who wrote a book called Technology and the Virtues, and has now taken a professorship at Edinburgh, she and I produced a paper called, I think it was From Machine Ethics to Value Alignment to Virtue Alignment. We’re really proposing that analytical approaches alone will not get us to machines that we can trust or that will be fully ethically aligned.

Lucas Perry: Can you provide some examples about specific implementations or systems or applications of machine ethics today?

Wendell Wallach: There really isn’t much. Sensitivity to ethical considerations is still heavily reliant on how much we can get that input into systems and then how you integrate that input. So we are still very much at the stage of bringing various inputs in without a lot of integration, let alone analysis of what’s been integrated and decisions being made based on that analysis. For all purposes, in both machine ethics and, I would say, bottom-up value alignment, there’s just not a lot that’s been done. These are still somewhat futuristic research trajectories.

Lucas Perry: I think I’m just trying to poke here to understand better about what you find most skillful and useful about both approaches in terms of a portfolio approach to building beneficial AI systems, like if this is an opportunity to convince people that machine ethics is something valuable and that should be considered and worked on and expanded. I’m curious to know what you would say.

Wendell Wallach: Well, I think machine ethics is the name of the game in the sense that, for all the talk about systems that will have very high order capabilities, we just aren’t there. We’re still dealing with relatively limited forms of cognitive decision making. For all the wonder that’s going on in machine learning, that’s still a relatively limited kind of learning approach. So we’re not dealing with machines that are making fundamental decisions at this point, or if they are allowed to, it’s largely because humans have abrogated their responsibility, trust the machines, and let the machines make the decisions regardless of whether the machines actually have the capabilities to make sophisticated decisions.

Well, I think as we move along, as you get more and more inputs into systems and you figure out ways of integrating them, there will be the problem of which decisions can be made without, let’s just say, higher order consciousness or understanding of the full implications of those systems, of the situations, of the ethical concerns arising in the situations, and which decisions really require levels of, and I’m going to use the understanding and consciousness words, but I’m using them in a circumspect way, for the machines to fully appreciate the ramifications of the decisions being made and therefore those who are affected by those decisions or how those decisions will affect those around it.

Our first stage is going to be largely systems of limited consciousness or limited understanding and our appreciation of what they can and cannot do in a successful manner and when you truly need a human decision maker in the loop. I think that’s where we are broadly. The difference between the approaches is that the AI researchers are looking at what kind of flexibility they have within the tools they have now for building AI systems. The machine ethicists, I think, tend to be largely philosophically rooted or ethically rooted or practically ethically rooted, and therefore they tend to be more sensitive to the ramifications of decision making by machines and capacities that need to be accounted for before you want to turn over a decision to a machine, such as a lethal autonomous weapon. What should the machine really understand before it can be a lethal autonomous weapon, and therefore, how tight does the meaningful human control need to be?

Lucas Perry: I’m feeling a tension between trying to understand the role and place of both of these projects and how they’re skillful. In terms of just strict AI alignment, suppose we had a system that wanted to help us and it was very good at preference learning such that it could use all human artifacts in the world like books, movies and other things. It could also study our behavior and also have conversations with us. It could leverage all data points in the world for building a deep and rich understanding of individual human preference hierarchies, and then also it could extrapolate broad preference facts about species-wide general considerations. If that project were to succeed, then within those meta preferences and that preference hierarchy exists the kinds of normative ethical systems that machine ethics is trying to pay lip service to or to be sensitive towards or to imbue in machine systems.

From my perspective, if that kind of narrative that I just gave is true or valid, then that would be sort of a complete value alignment, insofar as it would create beneficial machine systems. But in order to have that kind of normative decision making and sensibilities in machine systems such that they fully understand and are sensitive to the ethical ramifications of certain decisions, that requires higher order logic and the ability to generate concepts and to interrelate them and to shift them around and use them in the kinds of ways that human beings do, which we’re far short of.

Wendell Wallach: So that’s where the convergence is. We’re far short of it. So I have no problem with the description you made. The only thing I noted is, at the beginning you said, if we had, and for me, in order to have, you will have to go through these stages of development that we have been alluding to as machine ethics. Now, how much of that will be able to utilize tools that come out of artificial intelligence that we had not been able to imagine in the early days of machine ethics? I have no idea. There’s so many uncertainties on how that pathway is going to unfold. There’re uncertainties about what order the breakthroughs will take place, how the breakthroughs will interact with other breakthroughs and technology more broadly, whether there will be public reactions to autonomous systems along the way that slow down the course of development or even stop certain areas of research.

So I don’t know how this is all going to unfold. I do see within the AI community, there is kind of a leap of faith to a presumption of breadths of capacity that when I look at it, I still look at, well, how do we get between here and there. When I look at getting between here and there, I see that you’re going to have to solve some of these lower level problems that got described more in the machine ethics world than have initially been seen by the value alignment approaches. That said, now that we’re getting researchers actually trying to look at implementing value alignment, I think they’re coming to appreciate that these lower level problems are there. We can’t presume high level preference parsing by machines without them going through developmental stages in relationship to understanding what a preference is, what a norm is, how they get applied within different contexts.

My sense has been that with both machine ethics and value alignment, we’ve sort of got the cart in front of the horse. So I’m waiting to see some great implementation breakthroughs, I just haven’t seen them. Most of the time, when I encounter researchers who say they’re taking this seriously, I see they’re tripping over relatively low level implementations. The difficulty is here, and all of this is converging. What AI alignment was initially and what it’s becoming now I think are quite different. I think in the very early days, it really was presumptions that you would have these higher order intelligences and then how were you going to align them. Now, as AI alignment people look at the value issues as they intersect with present-day AI agendas, they realize that you can’t make the presumptions about the higher order systems without going through developmental steps to get there.

So, in that sense, I think whether it’s AI alignment or machine ethics, the one will absorb the lessons of the other. Both will utilize advances that happen on both fronts. All I’m trying to underscore here is there are computer engineers and roboticists and philosophers who reflected on issues that perhaps the value alignment people are learning something from. I, in the end, don’t care about machine ethics or value alignment per se, I just care about people talking with each other and learning what they can from each other and moving away from a kind of arrogance that I sometimes see happen on both sides of the fence, where one says to the other, “You do not understand.” The good news, and one thing that I was very happy about in terms of what we did in these three workshops that I was PI on with the help of the Future of Life Institute, was, I think we sort of broke open the door for transdisciplinary dialogue.

Now, true, this was just one workshop. Now, we have gone from a time where, at the first Future of Life Institute gathering in Puerto Rico, the ethicists in the room, Jim Moor and I, were backbenchers, to a time where we have countless conferences that are basically transdisciplinary conferences where people from many fields of research are now beginning to listen to each other. The serious folks in technology and ethics really have recognized the richness of ethical decision making in real contexts. Therefore, I think they can point that out. Technologists sometimes like to say, “Well, you ethicists, what do you have to say because you can’t tell us what’s right and wrong anyway?” Maybe that isn’t what ethics is all about, about dictating what’s right and wrong. Maybe ethics is more about how do we navigate the uncertainties of life, and what kinds of intelligence need to be brought to bear to navigate the uncertainties of life with a degree of sensitivity, depth, awareness, and appreciation for the multilayered kinds of intelligences that come into play.

Lucas Perry: In the context of this uncertainty about machine ethics and about AI alignment and however much or little convergence there might be, let’s talk about how all of this leads up into AI governance now. You touched on a lot of your machine ethics work. What made you pivot into AI governance, and where is that taking you today?

Wendell Wallach: After completing Moral Machines, I started to think about the fact that very few people had a deep and multidisciplinary understanding of the broad array of ethical and societal impacts posed by emerging technologies. I decided to write a primer on that, focusing on what could go wrong and how we might defuse ethical challenges and undesirable societal impacts. That was finally published under the title A Dangerous Master: How to Keep Technology from Slipping Beyond our Control. The first part of that was really a primer on the various fields of science from synthetic biology to geoengineering, what the benefits were, what could go wrong. But then the book was very much about introducing people to various themes that arise: managing complex adaptive systems, resilience engineering, transcending limits, a whole flock of themes that have become part of the language of discussing emerging technologies but weren’t necessarily known to a broader public.

Even for those of us who are specialists in one area of research such as biotech, we have had very little understanding of AI or geoengineering or some of the other fields. So I felt there was a need for a primer. Then in the final chapter of the primer, I turned to how some of these challenges might be addressed through governance and oversight. Simultaneously, while I was working on that book, I was talking with Gary Marchant. Gary Marchant is the director of the Center for Law and Innovation at the Sandra Day O’Connor School of Law at Arizona State University. Gary has been a specialist in the law and governance of emerging technologies. He and I, in our interactions, lamented the fact that it was very difficult to put in place any form of governance of these technologies. There was something called the pacing problem. The pacing problem refers to the fact that scientific discovery and technological innovation is far outpacing our ability to put in place appropriate ethical and legal oversight, and that converges with another dilemma that has bedeviled people in technology governance for decades, going back to 1980.

David Collingridge wrote a book where he outlined a problem that is now known as the Collingridge Dilemma. Basically, Collingridge said that while it was easiest to regulate a technology early in its development, early in its development, we had little idea of what its societal impact would be. By the time we did understand what the challenges from the societal impact were, the technology would be so deeply entrenched in our society that it would be very difficult to change its trajectory. So we see that today with social media. Social media was totally entrenched in our society before we realized how it could be manipulated in ways that would undermine democracy. Now we’re having a devil of a time figuring out what we could do.

So Gary and I, who had been talking about these kinds of problems for years, we realized that we were constantly lamenting the challenge, but we altered the conversation one day over a cup of coffee. We said, “Well, if we had our druthers, if we have some degree of influence, what would we propose?” We came up with a model that we referred to as governance coordinating committees. Our idea was that you would put in place a kind of issues manager that would try and guide the development of a field, but first of all, it would just monitor development, convene forums between the many stakeholders, map issues and gaps, see if anyone was addressing those issues and gaps or where best practices had come to the fore. If these issues were not being addressed, then how could you address them, looking at a broad array of mechanisms. By a broad array of mechanisms, we meant you start with feasible technological solutions, you then look at what can be managed through corporate self-governance, and if you couldn’t find anything in either of those areas, then you turn to what is sometimes called soft law.

Soft law is laboratory practices and procedures, standards, codes of conduct, insurance policy, a whole plethora of mechanisms that fall short of laws and regulatory oversight. The value of soft law is that soft law can be proposed easily, and you can throw it out if technological advances mean it’s no longer necessary. So it’s very agile, it’s very adaptive. Really anyone can propose a new soft law mechanism. But that contributes to one of the downsides, which is you can have competing soft law. The other downside, which is perhaps even more important, is that you seldom have a means of enforcement if there are violations of soft law. So, in some areas, you deem that enforcement is needed, and that’s why hard law and regulatory institutions become important.

So Gary and I proposed this model. Every time we ever talked about it, people would say, “Boy, that’s a great idea. Somebody should do that.” I was going to international forums, such as going to the World Economic Forum meetings in Davos, where I’d be asked to be a fire-starter on all kinds of subject areas, such as biosafety, food security, and the law of the ocean. In a few minutes, I would quickly outline this model as a way of getting people to think much more richly about ways to manage technological development and not just immediately go to laws and regulatory bodies. All of this convinced me that this model was very valuable, but it wasn’t being taken up. All of that led to this first International Congress for the Governance of Artificial Intelligence, which will be convened in Prague on April 16 to 18. I do invite those of you listening to this podcast who are interested in the international governance of AI or really agile governance for technology more broadly to join us at that gathering.

Lucas Perry: Can you specify the extent to which you think that soft law, international norms will shape hard law policy?

Wendell Wallach: I don’t think any of this is that easy at the moment because when I started working on this project and working toward the Congress, there was almost no one in this space. Suddenly, we have a whole flock of organizations that have jumped into it. We have more than 53 lists of principles for artificial intelligence and all kinds of specifications of laws coming along like GDPR, and the EU will actually be coming out very soon with a whole other list of proposed regulations for the development of autonomous systems. So we are now in an explosion of groups, each of which in one form or another is proposing both laws and soft law mechanisms. I think that means we are even more in need of something like a governance coordinating committee. What I mean is loose coordination and cooperation, but at least putting some mechanism in place for that.

Some of the groups that have come to the fore are like the OECD, which actually represents a broad array of the nations, but not all of them. The Chinese were not party to the development of the OECD principles. The Chinese, for example, have somewhat different principles and laws than those that are most attractive in the West. My point is that we have an awful lot of groups, some of which would like to have a significant leadership role or a dominating role, and we’ll have to see to what extent they cooperate with each other or whether we finally have a cacophony of competing soft law recommendations. But I think even if there’s a competition, at the UN perhaps with a new mechanism that we create, or through each of these bodies like the OECD and IEEE individually, best practices will come to the fore over time and they will become the soft law guidelines. Now, which of those soft law guidelines need to be made into hard law? That may vary from nation to nation.

Lucas Perry: The agility here is in part imbued by a large amount of soft laws, which will then clarify best practices?

Wendell Wallach: Well, I think like anything else, just like the development of artificial intelligence, there’s all kinds of experimentation going on, all kinds of soft law frameworks, principles which have to be developed into policy and soft law frameworks. It will vary from nation to nation. We’ll get an insight over time about which practices really work and which haven’t worked. Hopefully, with some degree of coordination, we can underscore the best practices, and we can monitor the development of the field in a way where we can underscore the issues that still need to be addressed. We may have forums to work out differences. There may never be a full consensus and there may not need to be a full consensus considering much of the soft law will be implemented on a national or regional front. Only some of it will need to be top down in the sense that it’s international.

Lucas Perry: Can you clarify the set of things or legal instruments which constitute soft law and then the set of things which make up hard law?

Wendell Wallach: Well, hard law is always things that have become governmentally instituted. So the laws and regulatory agencies that we have in America, for example, or you have the same within Europe, but you have different approaches to hard law. The Europeans are more willing to put in pretty rigorous hard law frameworks, and they believe that if we codify what we don’t want, that will force developers to come up with new creative experimental pathways that accommodate our values and goals. In America, we’re reticent to codify things into hard law because we think that will squelch innovation. So those are different approaches. But below hard law, in terms of soft law, you really do have this vast array of different mechanisms. So I mentioned international standards, some of those are technical. We see a lot of technical standards coming out of the IEEE and the ISO. The IEEE, for example, has jumped into the governance of autonomous systems in a way where it wants to go beyond what can be elucidated technically to talk more about what kinds of values we’re putting in place and what the actual implementation of those values would be. So that’s soft law.

Insurance policies sometimes dictate what you can and cannot do. So that’s soft law. We have laboratory practices and procedures. What’s safe to do in a laboratory and what isn’t? That’s soft law. We have new approaches to implementing values within technical systems, what is sometimes referred to as value-added design. That’s kind of a form of soft law. There are innumerable frameworks that we can come up with, and we can create new ones if we need to, to help delineate what is acceptable and what isn’t acceptable. But again, that delineation may or may not be enforceable. Some enforcement is, if you don’t do what the insurance policy has demanded of you, you lose your insurance policy, and that’s a form of enforceability.

You can lose membership in various organizations. Soft law gets into great detail in terms of acceptable use of humans and animals in research. But at least that’s a soft law that has, within the United States and Europe and elsewhere, some ability to prosecute people who violate the rights of individuals, who harm animals in a way that is not acceptable in the course of doing the research. So what are we trying to achieve by convening a first International Congress for the Governance of Artificial Intelligence? First of all, our hope is that we will get a broad array of stakeholders present. So far, nearly all the governance initiatives are circumspect in terms of who’s there and who is not there. We are making special efforts to ensure that we have a robust representation from the Chinese. We’re going to make sure that we have robust representation from those from underserved nations and communities who are likely to be very affected by AI, but will not necessarily know a great deal about it. So having a broad array of stakeholders is the number one goal of what we are doing.

Secondly, between here and the Congress, we’re convening six expert workshops. What we intend to do with these expert workshops is bring together a dozen or more of those individuals who have already been thinking very deeply about the kinds of governance mechanisms that we need. Do understand that I’m using the word governance, not government. Government usually just entails hard law and bureaucracies. By governance, we mean bringing in many other solutions to what we call regulatory or oversight problems. So we’re hopeful that we’ll get experts not only in AI governance, but also in thinking about agile governance more broadly, that we will have them come to these small expert workshops we’re putting together, and at those expert workshops, we hope to elucidate what are the most promising mechanisms for the international governance of AI. If they can elucidate those mechanisms, they will then be brought before the Congress. At the Congress, we’ll have further discussions and refinement around some of those mechanisms, and then by the end of the Congress, we will have votes to see if there’s an overwhelming consensus of those present to move forward on some of these initiatives.

Perhaps something like what I had called the governance coordinating committee might be one of those mechanisms. I happen to have also been an advisor to the UN Secretary-General’s High-level Panel on Digital Cooperation, and they drew upon some of my research and combined that with others and came up with one of their recommendations, so they recommended something that is sometimes referred to as a network of networks. Very similar to what I’ve been calling a governance coordinating committee. In the end, I don’t care what mechanisms we start to put in place, just that we begin to take first steps toward putting in place something that will be seen as trustworthy. If we can’t do that, then why bother? At the end of the Congress, we’ll have these votes. Hopefully that will bring some momentum behind further action to move expeditiously toward putting some of these mechanisms in place.

Lucas Perry: Can you contextualize this International Congress for the Governance of AI within the broader AI governance landscape? What are the other efforts going on, and how does this fit in with all of them?

Wendell Wallach: Well, there are many different efforts underway. The EU has its efforts, the IEEE has its effort. The World Economic Forum convenes people to talk about some of these issues. You’ll have some of this come up in the Partnership on AI, you have the OECD. There are conversations going on in the UN. You have the High-level Panel’s recommendations. So there is now a vast plethora of different groups that have jumped into it. Our point is that, so far, none of these groups include all the stakeholders. So the Congress is an attempt to bring all of these groups together and ensure that other stakeholders have a place at the table. That would be the main difference.

We want to weave the groups together, but we are not trying to put in place some new authority or someone who has authority over the individual groups. We’re just trying to make sure that we’re looking at the development of AI comprehensively, that we’re talking with each other, that we have forums to talk with each other, that issues aren’t going unaddressed, and then if somebody truly has come forward with best practices and procedures, that those are made available to everyone else in the world or at least underscored for others in the world as promising pathways to go down.

Lucas Perry: Can you elaborate on how these efforts might fail to develop trust or how they might fail to bring about coordination on the issues? Is it always in the incentive of a country to share best practices around AI if that increases the capacity of other countries to catch up?

Wendell Wallach: We always have this problem of competition and cooperation. Where’s competition going to take place? How much cooperation will there actually be? It’s no mystery to anyone in the world that decisions are being made as we speak about whether or not we’re going to move towards wider cooperation within the international world or whether we have movements where we are going to be looking at a war of civilizations or at least a competition between civilizations. I happen to believe there are so many problems within emerging technologies that if we don’t have some degree of coordination, we’re all damned, and that that should prevail in global climate change and in other areas, but whether we’ll actually be able to pull that off has to do with decisions going on in individual countries. So, at the moment, we’re particularly seeing that tension between China and the US. If the trade war can be defused, then maybe we can back off from that tension a little bit, but at the moment, everything’s up for grabs.

That being said, when everything’s up for grabs, my belief is you do what you can to facilitate the values that you think need to be forwarded, and therefore I’m pushing us toward recognizing the importance of a degree of cooperation without pretending that we aren’t going to compete with each other. Competition’s not bad. Competition, as we all know, furthers innovation and helps disrupt technologies that are inefficient and replace them with more efficient ways of moving forward. I’m all for competition, but I would like to see it in a broader framework where there is at least a degree of cooperation on AI ethics and international governmental cooperation.

Lucas Perry: The path forward seems to have something to do with really reifying the importance of cooperation and how that makes us all better off to some extent, not pretending like there’s going to be full 100% cooperation, but cooperation where it’s needed such that we don’t begin defecting on each other in ways that are mutually bad and incompatible.

Wendell Wallach: That claim is central to the whole FLI approach.

Lucas Perry: Yeah. So, if we talk about AI in particular, there’s this issue of lethal autonomous weapons. There’s an issue of, as you mentioned, the spread of disinformation, the way in which AI systems and machine learning can be used more and more to lie and to spread subversive or malicious information campaigns. There’s also the degree to which algorithms will or will not be contributing to discrimination. So these are all like short term things that are governance issues for us to work on today.

Wendell Wallach: I think the longer term trajectory is that AI systems are giving increasing power to those who want to manipulate human behavior either for marketing or political purposes, and they’re manipulating the behavior by studying human behavior and playing to our vulnerabilities. So humans are very much becoming machines in this AI commercial political juggernaut.

Lucas Perry: Sure. So human beings have our own psychological bugs and exploits, and massive machine learning can find those bugs and exploits and exploit them in us.

Wendell Wallach: And in real time. I mean, with the collection of sensors and facial recognition software and emotion recognition software over 5G with a large database of our past preferences and behaviors, we can be bombarded with signals to manipulate our behavior on very low levels and areas where we are known to be vulnerable.

Lucas Perry: So the question is the extent to which we can mitigate these risks, and the strategies we can use to do so, within the context of these national and global AI governance efforts.

Wendell Wallach: To mitigate these risks, to make sure that we have meaningful public education, meaning I would say from grammar school up, digital literacy so that individuals can recognize when they’re being scammed, when they’re being lied to. I mean, we’ll never be perfect at that, but at least have one’s antenna out for that, and the degree to which we perhaps need to have some self-recognition if we’re going to not just be manipulable, but truly cultivate the capacity to recognize when there are internal and external pressures upon us and defuse those pressures so we can look at new, more creative, individualized responses to the challenge at hand.

Lucas Perry: I think that that point about elementary to high school education is really interesting and important. I don’t know what it’s like today. I guess they’re about the same as what I experienced. They just seemed completely incompatible with the way the technology is going and dis-employment and other things in terms of the way that they teach and what they teach.

Wendell Wallach: Well, it’s not happening within the school systems. What I don’t fully understand is how savvy young people are within their own youth culture, whether they’re recognizing when they’re being manipulated or not, whether that’s part of that culture. I mean part of my culture, and God knows I’m getting on in years now, but it goes back to questions of phoniness and pretense and so forth. So we did have our youth culture that was very sensitive to that. But that wasn’t part of what our educational institutions were engaged in.

The difference now is that we’ll have to be both within the youth culture, but also we would need to be actually teaching digital literacy. So, for an example, I’m encountering a scam a week, I would say right now, through the telephone or through email. Some new way that somebody has figured out to try and rip off some money from me. I can’t believe how many new approaches are coming up. It just flags that this form of corruption requires a remarkable degree of sensitivity but also a degree of digital knowledge so that you can recognize when you need to at least check out whether this is real or a scam before you give sensitive information to others.

Lucas Perry: The saving grace, I think, for Gen Z and millennial people is that… I mean, I don’t know what the percentages are, but more than before, many of us have basically grown up on the internet.

Wendell Wallach: So they have a degree of digital literacy.

Lucas Perry: But it’s not codified by an institution like the schooling system. As for changing the schooling system to fit the technological predictions of academics, I don’t know how much hope I have. It seems like it’s a really slow process to change anything about education. It seems like it almost has to be done outside of public education.

Wendell Wallach: That may be what we mean by governance now is what can be done within the existing institutions and what has to find means of being addressed outside of the existing institutions, and is it happening or isn’t it happening? If youth culture in its evolving forms gives 90% of digital literacy to young people, fine, but what about those people who are not within the networks of getting that education, and what about the other 10%? How does that take place? I think that’s the kind of creativity and oversight we need is just monitoring what’s going on, what’s happening, what’s not happening. Some areas may lead to actual governmental needs or interventions. So let’s take the technological unemployment issue. I’ve been thinking a lot about that disruption in new ways. One question I have is whether it can be slowed down. An example for me for a slow down would be if we found ways of not rewarding corporations for introducing technologies that bring about minimal efficiencies but are more costly to the society than the efficiencies that they introduce for their own productivity gains.

So, if it’s a small efficiency, but the corporation fires 10,000 people and those 10,000 people are now on the dole, I’m not sure whether we should be rewarding corporations for that. On the other hand, I’m not quite sure what kind of political economy you could put in place so you didn’t reward corporations for that. Let’s just say that you have automatic long haul trucking. In the United States, we have 1.7 million long haul truck drivers. It’s one of the top jobs in the country. First of all, long haul trucking can probably be replaced more quickly than we’ll have self-driving trucks in the cities because of some of the technical issues encountered in cities and on country roads and so forth. So you could have a long haul truck that just went from on-ramp to off-ramp and then have human drivers who take over the truck for the last few miles to take it to the shipping depot.

But if we’ve replaced long haul truckers in the United States over a 10 year period, that would mean putting 14,000 truck drivers out of work every month. That means you have to create 14,000 jobs a month that are appropriate for long haul truck drivers, at the same time as you’re creating jobs for new people entering the workforce and for others whose jobs are disappearing because of automation. It’s not going to happen. Given the culture in the United States, my melodramatic example is some long haul truckers may just decide to take their semis, close down interstate highways, and sit in their cabs and say to the government, “Bring it on.” We are moving into that kind of social instability. So, on one hand, if getting rid of the human drivers doesn’t bring massive efficiencies, it could very easily bring social instability and large societal costs. So perhaps we don’t want to encourage that. But we need to look at it in greater depth to understand what the benefits and costs are.
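As a rough check on the trucking arithmetic above, the sketch below divides the 1.7 million drivers quoted in the conversation evenly across a 10 year period. The even monthly pace and the function name are assumptions made for illustration, not something stated in the interview.

```python
# Figures quoted in the conversation; the even monthly pace is an assumption.
LONG_HAUL_DRIVERS = 1_700_000
TRANSITION_YEARS = 10


def monthly_displacement(total_drivers: int, years: int) -> float:
    """Average number of drivers displaced per month at an even pace."""
    return total_drivers / (years * 12)


per_month = monthly_displacement(LONG_HAUL_DRIVERS, TRANSITION_YEARS)
print(f"{per_month:,.0f} drivers per month")  # roughly 14,000, matching the quoted figure
```

A faster or uneven transition would raise the peak monthly figure, so the quoted 14,000 is best read as an average under this simplifying assumption.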

We often overplay the benefits, and we under-represent the downsides and the costs. You could see a form of tax on corporations relative to how many workers they laid off and how many jobs they created. It could be a sliding tax. If a corporation is reducing its workforce dramatically, then it gets a higher tax on its profit than one that’s actually increasing its workforce. That would be a form of maybe how you’re funding UBI. In UBI, I would like to see something that I’ve referred to as UBI plus plus plus. I mean there’ve been various UBI pluses. But my thought was that you’re being given that basic income for performing a service for the society. In other words, performing a service for the society is your job. There may not be anybody overseeing what service you are providing or you might be able to decide yourself what that service would be.

Maybe somebody who was an aspiring actor would decide that they were going to put together an acting group and take Shakespeare into the school system, that that was their service to the society. Others may decide they don’t know how to do a service to the society, but they want to go back to school, so perhaps they’re preparing for a new job or a new contribution, and perhaps other people will really need a job and we’ll have to create high touch jobs such as those that you have in Japan for them. But the point is UBI is paying you for a job. The job you’re doing is providing a service to the society, and that service is actually improving the overall society. So, if you had thousands of creative people taking educational programs into schools, perhaps you’re improving overall education and therefore the smarts of the next generation.

Most of this is not international governance, but where it does impinge upon international considerations is if we do have massive unemployment, it’s going to be poorer nations that are going to be truly set back. I’ve been pointing out in international circles that we now have the Sustainable Development Goals. Well, just technological unemployment alone could undermine the realization of the Sustainable Development Goals.

Lucas Perry: So that seems like a really big scary issue.

Wendell Wallach: It’s going to vary from country to country. I mean, the fascinating thing is how different these national governments will be. So some of the countries in Africa are leapfrogging technology. They’re moving forward. They’re building smart cities. They aren’t going through our development. But other countries don’t even have functioning governments or the governments are highly autocratic. When you look at the technology available for surveillance systems now, I mean we’re very likely to see some governments in the world that look like horrible forms of dictatorship gulags, at the same time as there’ll be some countries where human rights are deeply entrenched, and the oversight of the technologies will be such that they will not be overly repressive on individual behavior.

Lucas Perry: Yeah. Hopefully all of these global governance mechanisms that are being developed will bring to light all of these issues and then effectively work on them. One issue which is related, and I’m not sure how it fits in here or fits in with your thinking, is specifically the messaging and thought around the governance related to AGI and superintelligence. Do you have any thinking here about how any of this feeds into that or your thoughts about that?

Wendell Wallach: I think that the difficulty is we’re still in a realm where when AGI or superintelligence will appear and what it will look like is still so highly speculative. So, at this stage of the game, I don’t think that AGI is really a governmental issue beyond the question of whether government should be funding some of the research. There may also be a role for governments in monitoring when we’re crossing thresholds that open the door for AGI. But I’m not so concerned about that because I think there’s a pretty robust community that’s doing that already that’s not governmental, and perhaps we don’t need the government too involved. But the point here is, if we can put in place robust mechanisms for the international governance of AI, then potentially those mechanisms either make recommendations that perhaps slow down the adoption of technologies that could be dangerous or enhance the ethics and the sensitivity in the development of the technologies. And if and when we are about to cross thresholds that open real dangers or serious benefits, we have the mechanisms in place to help regulate the unfolding of that trajectory.

But that, of course, has to be wishful thinking at this point. We’re taking baby steps at this stage of the game. Those baby steps are going to be building on the activities that FLI and OpenAI and other groups are already engaged in. My way of approaching it, and it’s not just with AGI, it’s also in relationship to biotech, is just to flag that there are speculative dangers out there, and we are making decisions today about what pathways we, humanity as a whole, want to navigate. So, oftentimes in my presentations, I will have a slide up, and that slide is two robots kneeling over the corpse of a human. When I put that slide up, I say we may even be dealing with the melodramatic possibility that we are inventing the human species as we have known it out of existence.

So that’s my way of flagging that that’s the concern, but not trying to pretend that that’s one that governments should or can address at this point, more that we are at an inflection point where we should and can put in place values and mechanisms to try and ensure that the trajectory of the emerging technologies is human-centered, is planet-centered, is about human flourishing.

Lucas Perry: I think the worry about the information implicit in that is this: if there are two AIs, embodied as robots or whatever, standing over a human corpse to represent them dominating or transcending the human species, what is implicit is that they have more power than us, because you require more power to be able to do something like that. Having more power than the human species is something governments would maybe be interested in, so that would be something maybe we wouldn’t want to message about.

Wendell Wallach: I mean, it’s the problem with lethal autonomous weapons. Now, I think most of the world has come to understand that lethal autonomous weapons are a bad idea, but that’s not stopping governments from pursuing them or the security establishment within government saying that it’s necessary that we go down this road. Therefore, we don’t get an international ban or treaty. The messaging with governments is complicated. I’m using the messaging only to stress what I think we should be doing in the near term.

Lucas Perry: Yeah, I think that that’s a good idea and the correct approach. So, if everything goes right in terms of this process of AI governance, then we’re able to properly manage the development of new AI technology, what is your hope here? What are optimistic visions of the future, given successful AI governance?

Wendell Wallach: I’m a little bit different than most people on this. I’m not so much caught up in visions of the future based on this technology or that technology. My focus is more that we have a conscious active decision making process in the present where people get to put in place the values and instruments they need to have a degree of control over the overall development of emerging technologies. So, yes, of course I would like to see us address global climate change. I would like us to adapt AI for all. I would like to see all kinds of things take place. But more than anything, I’m acutely aware of what a significant inflection point this is in human history, and that we’re having to pass through a very difficult and perhaps relatively narrow doorway in order to ensure human flourishing for the next couple of hundred years.

I mean, I understand that I’m a little older than most of the people involved in this process, so I’m not going to be on the stage for that much longer barring radical life extension taking place in the next 20 years. So, unlike many people who are working on positive technology visions for the future, I’m less concerned with the future and more concerned with how, in the present, we nudge technology onto a positive course. So my investment is more that we ensure that humanity not only has a chance, but a chance to truly prevail.

Lucas Perry: Beautiful. So you’re now discussing how you’re essentially focused on what we can do immediately. There’s the extent to which AI alignment and machine ethics or whatever are trying to imbue an understanding of human preference hierarchies in machine systems and to develop ethical sensibilities and sensitivities. I wonder what the role is for, first of all, embodied compassion and loving kindness in persons as models for AI systems and then embodied loving kindness and compassion and pure altruism in machine systems as a form of alignment with idealized human preference hierarchies and ethical sensibilities.

Wendell Wallach: In addition to this work I’m doing on the governance of emerging technologies, I’m also writing a book right now. The book has a working title, which is Descartes Meets Buddha: Enlightenment for the Information Age.

Lucas Perry: I didn’t know that. So that’s great.

Wendell Wallach: So this fits in with your question very broadly. I’m looking at this: if the Enlightenment ethos, which has directed humanity’s development over the last few hundred years, is imploding under the weight of its own success, then what ethos do we put in place that gives humanity a direction for flourishing over the next few hundred years? I think central to creating that new ethos is to have a new understanding of what it means to be human. But that new understanding isn’t something totally new. It needs to have some convergence with what’s been perennial wisdom to be meaningful. But the fact is when we ask these questions, how are we similar to and how do we truly differ from the artificial forms of intelligence that we’re creating? Or what will it mean to be human as we evolve through the impact of emerging technologies, whether that’s life extension or uploading or bioengineering?

There still is this fundamental question about what grounds what it means to be human. In other words, what’s not just up for grabs or up for engineering. To that, I bring in my own reflections after having meditated for the last 50 years, my own insights shall we say, and how that converges with what we’ve learned about human functioning, human decision making and human ethics through the cognitive sciences over the last decade or two. Out of that, I’ve come up with a new model that I refer to as cyber souls, meaning that as science is illuminating the computational and biochemical mechanisms that give rise to human capabilities, we have often lost sight of the way in which evolution also forged us into integrated beings, integrated within ourselves and searching for an adapted integration to the environment and the other entities that share in that environment.

And it’s this need for integration and relationship, which is fundamental in ethics, but also in decision making. There’s the second part of this, which is this new fascination with moral psychology and the recognition that reason alone may not be enough for good decision making. And that if we have an ethics that doesn’t accommodate people’s moral psychology, then reason alone isn’t going to be persuasive for people, they have to be moved by it. So I think this leads us to perhaps a new understanding of what’s the role of psychological states in our decision making, what information is carried by different psychological states, and how does that information help direct us toward making good and bad decisions. So I call that a silent ethic. There are certain mental states, which historically have at least indicated for people that they’re in the right place at the right time, in the right way.

Oftentimes, these states, whether they’re called flow or oneness or creativity, they’re being given some spiritual overlay and people look directly at how to achieve these states. But that may be a misunderstanding of the role of mental states. Mental states are giving us information. As we factor that information into our choices and actions, those mental states fall away, and the byproducts are these so-called spiritual or transcendent states, and often they have characteristics where thought and thinking come to a rest. So I call this the silent ethic, taking the actions, making the choices that allow our thoughts to come to rest. When our thoughts are coming to rest, we’re usually in relationships within ourselves and our environments that you can think of as embodied presence or perhaps even the foundations for virtue. So my own sense is we may be moving toward a new or revived virtue ethics. Part of what I’m trying to express in this new book is what I think is foundational to the flourishing of that new virtue ethics.

Lucas Perry: That’s really interesting. I bring this up and ask because I’ve been interested in the role of idealization, ethically, morally and emotionally in people and reaching towards whatever is possible in terms of human psychological enlightenment and how that may exist as certain benchmarks or reference frames in terms of value learning.

Wendell Wallach: Well, it is a counterpose to the notion that machines are going to have this kind of embodied understanding. I’m highly skeptical that we will get machines in the next hundred years that come close to this kind of embodied understanding. I’m not skeptical that we could have a new kind of revival movement among humans where we create a new class of moral exemplars, which seems to be the exact opposite of what we’re doing at the moment.

Lucas Perry: Yeah. If we can get the AI systems and create abundance and reduce existential risk a bunch and have a long period of reflection, perhaps there will be this space for reaching for the limits of human idealization and enlightenment.

Wendell Wallach: That’s part of the whole question going on for us philosophy types: to what extent is this all about machine superintelligence and to what extent are we using the conversation about superintelligence as an imperfect mirror to think more deeply about the ways we’re similar to and dissimilar from the AI systems we’re creating or have a potential to create.

Lucas Perry: All right. So, with that, thank you very much for your time.

 If you enjoyed this podcast, please subscribe. Give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI alignment series.

End of recorded material

FLI Podcast: Cosmological Koans: A Journey to the Heart of Physical Reality with Anthony Aguirre

There exist many facts about the nature of reality which stand at odds with our commonly held intuitions and experiences of the world. Ultimately, there is a relativity of the simultaneity of events and there is no universal “now.” Are these facts baked into our experience of the world? Or are our experiences and intuitions at odds with these facts? When we consider this, the origins of our mental models, and what modern physics and cosmology tell us about the nature of reality, we are beckoned to identify our commonly held experiences and intuitions, to analyze them in the light of modern science and philosophy, and to come to new implicit, explicit, and experiential understandings of reality. In his book Cosmological Koans: A Journey to the Heart of Physical Reality, FLI co-founder Anthony Aguirre explores the nature of space, time, motion, quantum physics, cosmology, the observer, identity, and existence itself through Zen koans fueled by science and designed to elicit questions, experiences, and conceptual shifts in the reader. The universe can be deeply counter-intuitive at many levels and this conversation, rooted in Anthony’s book, is an attempt at exploring this problem and articulating the contemporary frontiers of science and philosophy.

Topics discussed include:

  • What is skillful of a synergy of Zen and scientific reasoning
  • The history and philosophy of science
  • The role of the observer in science and knowledge
  • The nature of information
  • What counts as real
  • The world in and of itself and the world we experience as populated by our concepts and models of it
  • Identity in human beings and future AI systems
  • Questions of how identity should evolve
  • Responsibilities and open questions associated with architecting life 3.0

 

You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

Lucas Perry: Welcome to the Future of Life Institute podcast. I’m Lucas Perry. Today, we’re speaking with Anthony Aguirre. He is a cosmologist, a co-founder of the Future of Life Institute, and a co-founder of the Foundational Questions Institute. He also has a cool prediction market called Metaculus that I suggest you check out. We’re discussing his book, Cosmological Koans: A Journey to the Heart of Physical Reality. This is a book about physics from a deeply philosophical perspective in the format of Zen koans. This discussion is different from the usual topics of the podcast, though there are certainly many parts that directly apply. I feel this will be of interest to people who like big questions about the nature of reality. Some questions that we explore are, what is skillful of a synergy of Zen and scientific reasoning, the history and philosophy of science, the nature of information, we ask what is real, and explore that question. We discuss the world in and of itself and the world we experience as populated by our concepts and stories about the universe. We discuss identity in people and future AI systems. We wonder about how identity should evolve in persons and AI systems. And we also get into the problem we face of architecting new forms of intelligence with their own lived experiences, and identities, and understandings of the world.

As a bit of side news, Ariel is transitioning out of her role at FLI. So, I’ll be taking over the main FLI podcast from here on out. This podcast will continue to deal with broad issues in the space of existential risk and areas that pertain broadly to the Future of Life Institute, like AI risk and AI alignment, as well as bio-risk and climate change, and the stewardship of technology with wisdom and benevolence in mind. And the AI Alignment Podcast will continue to explore the technical, social, political, ethical, psychological, and broadly interdisciplinary facets of the AI alignment problem. So, I deeply appreciated this conversation with Anthony and I feel that conversations like these help me to live what I feel is an examined life. And if these topics and questions that I’ve mentioned are of interest to you or resonate with you then I think you’ll find this conversation valuable as well.

So let’s get into our conversation with Anthony Aguirre.

We’re here today to discuss your work, Cosmological Koans: A Journey to the Heart of Physical Reality. As a little bit of background, tell me a little bit about your experience as a cosmologist and someone interested in Zen whose pursuits have culminated in this book.

Anthony Aguirre: I’ve been a cosmologist professionally for 20 years or so since grad school I suppose, but I’ve also for my whole life had just the drive to understand what reality is, what’s reality all about. One approach to that certainly to understanding physical reality is physics and cosmology and fundamental physics and so on. I would say that the understanding of mental reality, what is going on in the interior sense is also reality and is also crucially important. That’s what we actually experience. I’ve long had an interest in both sides of that question. What is this interior reality? Why do we have experience the way we do? How is our mind working? As well as what is the exterior reality of physics and the fundamental physical laws and the large scale picture of the universe and so on?

While professionally I’ve been very focused on the external side and the cosmological side in particular, I’ve nourished that interest in the inner side as well and how that interior side and the exterior side connect in various ways. I think that longstanding interest has built the foundation of what then turned into this book that I’ve put together over a number of years that I don’t care to admit.

Lucas Perry: There’s this aspect of when we’re looking outward, we’re getting a story of the universe and then that story of the universe eventually leads up into us. For example as Carl Sagan classically pointed out, the atoms which make up your body had to be fused in supernovas, at least the things which aren’t hydrogen and helium. So we’re all basically complex aggregates of collapsed interstellar gas clouds. And this shows that looking outward into the cosmos is also a process of uncovering the story of the person and of the self as well.

Anthony Aguirre: Very much in that I think to understand how our mind works and how our body works, we have to situate that within a chain of wider and wider context. We have to think of ourselves as biological creatures, and that puts us in the biological context and evolution and evolution over the history of the earth, but that in turn is in the context of where the earth sits in cosmic evolution in the universe as a whole, and also where biology and its functioning sits within the context of physics and other sciences, information theory, computational science. I think to understand ourselves, we certainly have to understand those other layers of reality.

I think what’s often assumed though is that to understand those other layers of reality, we don’t have to understand how our mind works. I think that’s tricky because on the one hand, we’re asking for descriptions of objective reality, and we’re asking for laws of physics. We don’t want to ask for our opinion that we’re going to disagree about. We want something that transcends our own minds and our ability to understand or describe those things. We’re looking for something objective in that sense.

I think it’s also true that many of the things that we talk about as fairly objective contain unavoidably a fairly subjective component to them. Once we have the idea of an objective reality out there that is independent of who’s observing it, we ascribe a lot of objectivity to things that are in fact much more of a mix that have a lot more ingredients that we have brought to them than we like to admit and are not wholly out there to be observed by us as impartial observers but are very much a tangled interaction between the observer and the observed.

Lucas Perry: There are many different facets and perspectives here about why taking the cosmological perspective of understanding the history of the universe, as well as the person, is deeply informative. In terms of the perspective of the Future of Life Institute, understanding cosmology tells us what is ultimately possible for life in terms of how long the universe will last, and how far you can spread, and fundamental facts about information and entropy, which are interesting, and also ultimately determine the fate of intelligence and consciousness in the world. There’s also this anthropic aspect that you’re touching on about how observers only observe the kinds of things that observers are able to observe. We can also consider the limits of the concepts that are born of being a primate conditioned by evolution and culture, and the extent to which our concepts are lived experiences within our world model. And then there’s this distinction between the map and the territory, or our world model and the world itself. And so perhaps part of fusing Zen with cosmology is experientially being mindful of not confusing the map for the territory in our moment to moment experience of things.

There’s also this scientific method for understanding what is ultimately true about the nature of reality, and then what Zen offers is an introspective technique for trying to understand the nature of the mind, the nature of consciousness, the causes and conditions which lead to suffering, and the concepts which inhabit and make up conscious experience. I think all of this thinking culminates into an authentically lived life as a scientist and as a person who wants to know the nature of things, to understand the heart of reality, to attempt to not be confused, and to live an examined life – both of the external world and the experiential world as a sentient being. 

Anthony Aguirre: Something like that, except I nurture no hope to ever not be confused. I think confusion is a perfectly admirable state in the sense that reality is confusing. You can try to think clearly, but I think there are always going to be questions of interest that you simply don’t understand. If you go into anything deeply enough, you will fairly quickly run into, wow, I don’t really get that. There are very few things that if you push into them carefully and skeptically and open-mindedly enough, you won’t come to that point. I think it would actually be a letdown if I ever got to the point where I wasn’t confused about something. All the fun would be gone, but otherwise, I think I agree with you. Where shall we start?

Lucas Perry: This helps to contextualize some of the motivations here. We can start by explaining why cosmology and Zen in particular? What are the skillful means born of a fusion of these two things? Why fuse these two things? I think some number of our audience will be intrinsically skeptical of all religion or spiritual pursuits. So why do this?

Anthony Aguirre: There are two aspects to it. I think one is a methodological one, which is Cosmological Koans is made up of these koans, and they’re not quite the same koans that you would get from a Zen teacher, but they’re sort of riddles or confrontations that are meant to take the recipient and cause them to be a little bit baffled, a little bit surprised, a little bit maybe shocked at some aspect of reality. The idea here is to both confront someone with something that is weird or unusual or contradicts what they might have believed beforehand in a comfortable, familiar way and make it uncomfortable and unfamiliar. Also to make the thing that is being discussed about the person rather than an abstract intellectual pursuit. Something that I like about Zen is that it’s about immediate experience. It’s about here you are here and now having this experience.

Part of the hope I think methodologically of Cosmological Koans is to try to put the reader personally in the experience rather than have it be stuff out there that physicists over there are thinking about and researching or we can speculate with a purely third person point of view to emphasize that if we’re talking about the universe and the laws of physics and reality, we’re part of the universe. We’re obeying those laws of physics. We’re part of reality. We’re all mixed up in that there can be cases where it’s useful to get a distance from that, but then there are also cases where it’s really important to understand what that all has to do with you. What does this say about me and my life, my experience, my individual subjective, first person view of the world? What does that have to do with these very third person objective things that physics studies?

Part of the point is an interesting and fun way to jolt someone into seeing the world in a new way. The other part is to make it about the reader in this case or about the person asking the questions and not just the universe out there. That’s one part of why I chose this particular format.

I think the other is a little bit more on the content side to say I think it’s dangerous to take things that were written 2,500 years ago and say, oh look, they anticipated what modern physics is finding now. They didn’t quite. Obviously, they didn’t know calculus, let alone anything else that modern physics knows. On the other hand, I think the history of thinking about reality from the inside out, from the interior perspective using a set of introspective tools that were incredibly sophisticated through thousands of years does have a lot to say about reality when the reality is both the internal reality and the external one.

In particular, when you’re talking about a person experiencing the physical world perceiving something in the exterior physical world in some way, what goes on in that process that has both the physical side to it and an internal subjective mental side to it, observing how much of the interior gets brought to the perception. In that sense, I think the Eastern traditions are way ahead of where the West was. The West has had this idea that there’s the external world out there that sends information in and we receive it and we have a pretty much accurate view of what the world is. The idea that instead what we are actually experiencing is very much a joint effort of the experiencer and that external world building up this thing in the middle that brings that individual along with a whole backdrop of social and biological and physical history to every perception. I think that is something that is (a) true, and (b) there’s been a lot more investigation of that on the Eastern and on the philosophical side, some in Western philosophy too of course, but on the philosophical side rather than just the physical side.

I think the book is also about exploring that connection. What are the connections between our personal first person, self-centered view and the external physical world? In doing that investigation, I’m happy to jump to whatever historical intellectual foundations there are, whether it’s Zen or Western philosophy or Indian philosophy or modern physics or whatever. My effort is to touch on all of those at some level in investigating that set of questions.

Lucas Perry: Human beings are the only general epistemic agents in the universe that we’re currently aware of. From the point of view of the person, all the progress we’ve done in philosophy and science, all that there has ever been historically, from a first person perspective, is consciousness and its contents, and our ability to engage with those contents. It is by virtue of engaging with the contents of consciousness that we believe that we gain access to the outside world.  You point out here that in Western traditions, it’s been felt that we just have all of this data come in and we’re basically just seeing and interacting with the world as it really is. But as we’ve moreso uncovered, and in reality, the process of science and interrogating the external world is more like you have this internal virtual world model simulation that you’re constructing, that is a representation of the world that you use to engage and navigate with it. 

From this first person experiential bedrock, Western philosophers like Descartes have tried to assume certain things about the nature of being, like “I think, therefore I am.” And from assumptions about being, the project and methodologies of science are born of that reasoning and follow from it. It seems like it took Western science a long time, perhaps up until quantum physics, to really come back to the observer, right?

Anthony Aguirre: Yeah. I would say that a significant part of the methodology of physics was at some level to explicitly get the observer out and to talk about only objectively mathematically definable things. The mathematical part is still with physics. The objective is still there, except that I think there’s a realization that one always has to, if one is being careful, talk about what actually gets observed. You could do all of classical physics at some level, physics up to the beginning of the 20th century without ever talking about the observer. You could say there is this object. It is doing this. These are the forces acting on it and so on. You don’t have to be very careful about who is measuring those properties or talking about them or in what terms.

Lucas Perry: Unless they would start to go fast and get big.

Anthony Aguirre: Before the 20th century, you didn’t care if things were going fast. In the beginning of the 20th century though, there was relativity, and there was quantum mechanics, and both of those suddenly had the agent doing the observations at their centers. In relativity, you suddenly have to worry about what reference frame you’re measuring things in, and things that you thought were objective facts like how long is the time interval between two things that happen suddenly were revealed to be not objective facts, but dependent on who the observer is in particular, what reference frame their state of motion and so on.

Everything else as it turned out is really more like a property of the world that the world can either have or not when someone checks. The structure of quantum mechanics is at some level things have a state, which encodes something about the objects, and the something that it encodes is there’s this set of questions that I could ask the object and I can get answers to those questions. There’s a particular set of questions that I might ask and I’d get definite answers. If I ask other questions that aren’t in that list, then I get answers still, but they’re indefinite, and so I have to use probabilities to describe them.

This is a very different structure to say the object is a list of potential answers to questions that I might pose. It’s very different from saying there’s a chunk of stuff that has a position and a momentum and a force is acting on it and so on. It feels very different. While mathematically you can make the connections between those, it is a very different way of thinking about reality. That is a big change obviously and one that I think still isn’t complete in the sense that as soon as you start to talk that way and say an electron or a glass of water or whatever is a set of potential answers to questions, that’s a little bit hard to swallow, but you immediately have to ask, well, who’s asking the questions and who’s getting the answers? That’s the observer.

The structure of quantum mechanics from the beginning has been mute about that. It said make an observation and you’ll get these probabilities. That’s just pushing the observer into the thing that by definition makes observations, but without a specification of what does that mean to make an observation, what’s allowed to do it and what isn’t? Can an electron observe another electron or does it have to be a big group of electrons? What is it exactly that counts as making an observation and so on? There are all these questions about what this actually means that have just been sitting around since quantum mechanics was created and really haven’t been answered at any agreed upon or really I would say satisfactory way.

Lucas Perry: There’s a ton there. In terms of your book, there’s this fusion between what is skillful and true about Zen and what is skillful and true about science. You discussed here historically this transition to an emphasis on the observer and information and how those change both epistemology and ontology. The project of Buddhism or the project of Zen is ultimately also different from the project and intentions of Western science historically in terms of the normative, and the ethics driving it, and whether it’s even trying to make claims about those kinds of things. Maybe you could also explain a little bit there about where the projects diverge, what they’re ultimately trying to say either about the nature of reality or the observer.

Anthony Aguirre: Certainly in physics and much of philosophy of physics I suppose, it’s purely about superior understanding of what physical reality is and how it functions and how to explain the world around us using mathematical theories but with little or no translation of that into anything normative or ethical or prescriptive in some way. It’s purely about what is, and not only is there no ought connected with it as maybe there shouldn’t be, but there’s no necessary connection between any statement of what ought to be and what is. No translation of because reality is like this, if we want this, we should do this.

Physics has got to be part of that. What we need to do in order to achieve our goals has to do with how the world works, and physics describes that so it has to be part of it and yet, it’s been somewhat disconnected from that in a way that it certainly isn’t in spiritual traditions like Buddhism where our goal in Buddhism is to reduce or eliminate suffering. This is how the mind works and therefore, this is what we need to do given the way the mind and reality works to reduce or eliminate suffering. That’s the fundamental goal, which is quite distinct from the fundamental goal of just I want to understand how reality works.

I do think there’s more to do, and obviously there are sciences that fill that role like psychology and social science and so on that are more about let’s understand how the mind works. Let’s understand how society works so that given some set of goals like greater harmony in society or greater individual happiness, we have some sense of what we should do in order to achieve those. I would say there’s a pretty big gap nowadays between those fields on the one hand and fundamental physics on the other hand. You can spend a lot of time doing social science or psychology without knowing any physics and vice versa, but at the same time, it’s not clear that they really should be so separate. Physics is talking about the basic nature of reality. Psychology is also talking about the basic nature of reality but two different sides of it, the interior side and the exterior side.

Those two are very much connected, and so it should not be entirely possible to fully understand one without at least some of the other. That I think is also part of the motivation that I have because I don’t think that you can have a comprehensive worldview of the type that you want to have in order to understand what we should do, without having some of both aspects in it.

Lucas Perry: The observer has been part of the equation the whole time. It’s just that classical mechanics is a problem such that it never really mattered that much, but now it matters more given astronomy and communications technologies.  When determining what is, the fact that an observer is trying to determine what is and that the observer has a particular nature impacts the process of trying to discover what is, but not only are there supposed “is statements” that we’re trying to discover or understand, but we’re also from one perspective conscious beings with experiences and we have suffering and joy, and are trying to determine what we ought to do. I think what you’re pointing towards is basically an alternate unification of the problem of determining what is, and also of the often overlooked fact that we are contextualized as a creature in the world we’re attempting to understand, and make decisions about what to do next.

Anthony Aguirre: I think you can think of that in very big terms like that in this cosmic context, what is subjectivity? What is consciousness? What does it mean to have feelings of moral value and so on? Let’s talk about that. I think it’s also worth being more concrete in the sense that if you think about my experience as an agent in the world insofar as I think the world is out there objectively and I’m just perceiving it more or less directly. I tend to make very real in my mind a lot of things that aren’t necessarily real. Things that are very much half created by me, I tend to then turn into objective things out there and then react to them. This is something that we just all do on a personal basis all the time in our daily lives. We make up stories and then we think that those stories are real. This is just a very concrete thing that we do every day.

Sometimes that works out well and sometimes it doesn’t because if the story that we have is different from the story that someone else has or the story that society has, or some in some ways somewhat more objective story, then we have a mismatch and we can cause a lot of poor choices and poor outcomes by doing that. Simply the very clear psychological fact that we can discover with a little bit of self analysis that the stories that we make up aren’t as true as we usually think they are, that’s just one end of the spectrum of this process by which we as sentient beings are very much co-creating the reality that we’re inhabiting.

I think this co-creation process we’re comfortable with the fact that it awkwardly happens when we make up stories about what happened yesterday when I was talking to so and so. We don’t think of it so much when we’re talking about a table. We think the table is there. It’s real. If anything, it is. When we go deeper, we can realize that all of the things like color and solidity and endurance over time aren’t in the wave function of the atoms and the laws of physics evolving them. Those things are properties that we’ve brought as useful ways to describe the world that have developed over millions of years of evolution and thousands of years of social evolution and so on. Those properties, none of those things are built into the laws of nature. Those are all things that we’ve brought. That’s not to say that the table is made up. Obviously, it’s not. The table is very objective in a sense, but there’s no table built into the structure of the universe.

I think we tend to brush under the rug how much we bring to our description of reality. We say that it’s out there. We can realize that on small levels, but I think to realize the depth of how much we bring to our perceptions and where that stuff comes from, which is a long historical, complicated information generating process that takes a lot more diving in and thinking about.

Lucas Perry: Right. If one were god or if one were omniscient, then to know the universe at the ultimate level would be to know the cosmic wave function, and within the cosmic wave function, things like marriage and identity and the fact that I have a title and conceptual history about my life are not bedrock ontological things. Rather they’re concepts and stories that sentient beings make up due to, as you said, evolution and social conditioning and culture.

Anthony Aguirre: Right, but when you’re saying that, I think there’s a suggestion that the cosmic wave function’s description would be better in some way. I’d take issue with that because I think if you were some super duper mega intelligence that just knew the position of every atom or exactly the cosmic wave function, that doesn’t mean that you would know that the table in front of me is brown. That description of reality has all the particles in it and their positions and at some level, all the information that you could have of the fundamental physics, but it’s completely missing a whole bunch of other stuff, which are the ways that we categorize that information into meaningful things like solidity and color and tableness.

Lucas Perry: It seems to me that that must be contained within that ultimate description of reality because in the end, we’re just arrangements of particles and if god or the omniscient thing could take the perspective of us then they would see the table or the chair and have that same story. Our stories about the world are information built into us. Right?

Anthony Aguirre: How would it do that? What I’m saying is there’s information. Say the wave function of the universe. That’s some big chunk of information describing all kinds of different observations you could make of locations of atoms and things, but nowhere in that description is it going to tell you the things that you would need to know in order to talk about whether there’s a glass on the table in front of me because glass and table and things are not part of that wave function. Those are concepts that have to be added to it. It’s more specification that has been added that exists because of our view of the world. It only exists from the interior perspective of where we are as creatures that have evolved and are looking out.

Lucas Perry: My perspective here is that given the full capacity of the universal wave function for the creation of all possible things, there is the total set of arbitrary concepts and stories and narratives and experiences that sentient beings might dream up that arrive within the context of that particular cosmic wave function. There could be tables and chairs, or sniffelwoops and worbblogs but if we were god and we had the wave function, we could run it such that we created the kinds of creatures who dreamt a life of sniffelwoops and worbblogs or whatever else. To me, it seems like it’s more contained within the original thing.

Anthony Aguirre: This is where I think it’s useful to talk about information because I think that I just disagree with that idea in the sense that if you think of an eight-bit string, so there’s 256 possibilities of where the ones and zeros can be on and off, if you think of all 256 of those things, then there’s no information there. Whereas when I say actually only 128 of these are allowed because the first one is a one, you cut down the list of possibilities, but by cutting it down, now there’s information. This is exactly the way that information physically or mathematically is defined. It’s by saying if all the possibilities are on equal footing, you might say equally probable, then there’s no information there. Whereas, if some of them are more probable or even known, like this is definitely a zero or one, then that whole thing has information in it.

I think very much the same way with reality. If you think of all the possibilities and they’re all on the table with equal validity, then there’s nothing there. There’s nothing interesting. There’s no information there. It’s when you cut down the possibilities that the information appears. You can look at this in many different contexts. If you think about it in quantum mechanics, if you start some system out, it evolves into many possibilities. When you make an observation of it, you’re saying, oh, this possibility was actually realized and in that sense, you’ve created information there.

Now suppose you subscribe to the many worlds view of quantum mechanics. You would say that the world evolves into two copies, one in which thing A happened and one in which thing B happened. In that combination, A and B, there’s less information than in either A or B. If you’re observer A or if you’re observer B, you have more information than if you’re observer C looking at the combination of things. In that sense, I think we as residents, not with omniscient view, but as limited agents that have a particular point of view actually have more information about the world in a particular sense than someone who has the full view. The person with the full view can say, well, if I were this person, I would see this, or if I were this person, I would see that. They have in some sense a greater analytical power, but there’s a missing aspect of that, which is to make a choice as to which one you’re actually looking at, which one you’re actually residing in.

Lucas Perry: It’s like the world model which you’re identified with or the world model which you’re ultimately running is the point. The eight-bit string that you mentioned: that contains all possible information that can be contained within that string. Your point is that when we begin to limit it is when we begin to encode more information.

Anthony Aguirre: That’s right. There’s a famous story called the Library of Babel by Borges. It’s a library with every possible sequence of characters, just book after book after book. You have to ask yourself how much information is there in that library. On the one hand, it seems like a ton because each volume you pick out has a big string of characters in it, but on the other hand, there’s nothing there. You would search practically forever, far longer than the age of the universe, before you found even a sentence that made any sense.

Lucas Perry: The books also contain the entire multi-verse, right?

Anthony Aguirre: If they go on infinitely long, if they’re not finite length books. This is a very paradoxical thing about information, I think, which is that if you combine many things with information in them, you get something without information in it. That’s very, very strange. That’s what the Library of Babel is. I think it’s many things with lots of information, but combined, they give you nothing. I think that’s at some level how the universe is: it might be a very low information thing in and of itself, but incredibly high information from the standpoint of the beings that are in it like us.

Anthony Aguirre: When you think of it that way, we become vastly, vastly more important than you might think because all of that information that the universe then contains is defined in terms of us, in terms of the point of view that we’re looking out from, without which there’s sort of nothing there. That’s a very provocative and strange view of the world, but that’s more and more the way I think maybe it is.

Lucas Perry: I’m honestly confused. Can you expand upon your example? 

Anthony Aguirre: Suppose you’ve got the library of Babel. It’s there, it’s all written out. But suppose that once there’s a sentence like, “I am here observing the world,” that you can attribute to that sentence a point of view. So once you have that sequence of words like, “I am here observing the world,” it has a subjective experience. So then almost no book has that in this whole library, but a very, very, very select few do. And then you focus on those books. That sub-selection of books you would say there’s a lot of information associated with that subsection, because making something more special means that it has more information. So once you specify something, there’s a bunch of information associated with it.

Anthony Aguirre: By picking out those particular books, now you’ve created information. What I’m saying is there’s a very particular subset of the universe or subset of the ways the universe could be, that adds a perspective that has a subjective sense of looking out at the world. And if you specify, once you focus in from all the different states of the universe to those associated … having that perspective, that creates a whole bunch of information. That’s the way that I look at our role as subjective observers in the universe, that by being in a first person perspective, you’re sub-selecting a very, very, very special set of matter and thus creating a whole ton of information relative to all possible ways that the matter could be arranged.

Lucas Perry: So for example, say the kitchen is dirty, and if you leave the kitchen alone, entropy will just continue to make the kitchen more dirty because there are more possible states in which the kitchen is dirty than it is clean, and there are more possible states in the universe in which sentient human beings do not arise. But here we are, encoded on a planet with the rest of organic life … and in total, evolution and the history of life on this planet require a large and unequal amount of information and specification.

Anthony Aguirre: Yes, I would say … We haven’t talked about entropy, and I don’t know if we should. Genericness is the opposite of information. So when something’s very specific, there’s information content, and when it’s very generic, there’s less information content. This is at some level saying, “Our first person perspective as conscious beings is very, very specific.” I think there is something very special and mysterious at least, about the fact that there’s this very particular set of stuff in the universe that seems to have a first person perspective associated with it. That’s where we are, sort of almost by definition.

That’s where I think the question of agency and observation and consciousness has something to do with how the universe is constituted, not in that it changes the universe in some way, but that connected with this particular perspective is all this information, and if the physical world is at some level made of information, that’s a very radical thing because that’s saying that through our conscious existence and our particular point of view, we’re creating information, and information is reality, and therefore we’re creating reality.

There are all these ways that we apply physics to reality. They’re very information theoretic. There’s this sort of claim that a more useful way to think about the constituents of reality are as informational entities. And then the second claim is that by specifying, we create information. And then the third is that by being conscious observers who come into being in the universe and then have our perspective that we look out toward the universe from, that we are making a selection, we’re specifying, “This is what I see.” So we’re then creating a bunch of information and thus creating a reality.

In that sense, I’m claiming that we create a reality, not from some, “I think in my mind and therefore reality appears like magical powers,” but that if we really talk about what’s real, it isn’t just little bits of stuff I think, but it’s everything else that makes up reality and that information that makes up reality is something that we very much are part of the creation of. 

There are different definitions of information, but the way that the word is most commonly used is for Shannon information. And what that is, is an amount that is associated with a set of probabilities. So if I say I’m going to roll some dice, what am I going to roll? So you’d say, “I don’t know.” And I’d say, “Okay, so what probabilities would you ascribe to what I’m going to roll?” And you’d say, “Well probably a sixth for each side of the die.” And I would say that there’s zero information in that description. And I say that because that’s the most uncertain you could be about the rolls of the dice. There’s no information there in your description of the die.

Now I roll it, and we see that it’s a three. So now the probability of three is 100% or at least very close to it. And the probability of all the other ones is zero. And now there is information in our description. Something specific has happened, and we’ve created information. That’s not a magical thing; it’s just the information is associated with probabilities over things, and when we change the probabilities, we change how much information there is.
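The die example works out as follows. This is a minimal sketch, assuming "information" is measured as maximum entropy minus Shannon entropy in bits; the function name `information_bits` is illustrative and not from the conversation.

```python
import math


def information_bits(probabilities):
    """Information in bits: maximum possible entropy minus actual Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return math.log2(len(probabilities)) - entropy


# Before the roll: a sixth for each face, the most uncertain description.
before_roll = [1 / 6] * 6

# After the roll: we saw a three, so all the probability sits on face 3.
after_roll = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

info_before = information_bits(before_roll)
info_after = information_bits(after_roll)

print(info_before, info_after)
```

With these assumptions, the description before the roll carries essentially zero bits, and the description after the roll carries log2(6), roughly 2.58 bits.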

Usually when we observe things, we narrow the probabilities. That’s kind of the point of making observations, to find out more about something. In that sense, we can say that we’re creating information or we’re gathering information, so we’ve created information or gathered it in that sense by doing the measurement. In that sense, any time we look at anything, we’re creating information, right?

If I just think what is behind me, well there’s probably a pillar. It might be over there, it might be over there. Now let me turn around and look. Now I’ve gathered information or created information in my description of pillar location. Now when we’re talking about a wave function and somebody measuring the wave function, and we want to keep track of all of the information and so on, it gets rather tricky because there are questions about whose probabilities are we talking about, and whose observations and what are they observing. So we have to get really careful and technical about what sort of probabilities are being defined and whose they are, and how are they evolving.

When you read something like, “Information is preserved in the universe,” what that actually means is that if I take some description of the universe now and then I close my eyes and I evolve that description using the laws of physics, the information that my description had will be preserved. So the laws of physics themselves will not change the amount of information in that description.

But as soon as I open my eyes and look, it changes, because I just will observe something and I’ll see that I closed my eyes, the universe could have evolved into two different things. Now I open them and see which one it actually evolved into. Now I increased the information. I reduced the uncertainty. So it’s very, very subtle, the way in which the universe preserves information. The dynamics of the universe, the laws of physics, preserve the information that is associated with a description that you have of the world. There’s an incredible amount of richness there because that’s what’s actually happening. If you want to think about what reality is, that’s what reality is, and it’s the observers who are creating that description and observing that world and changing the description to match what they saw. Reality is a combination of those two things: the evolution of the world by the laws of physics, and the interaction of that with the person who or the whatever it is that is asking the questions and making the observations.
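The difference between evolving a description and observing can be sketched with a toy model. It assumes a world with four states, dynamics that simply permute the states (a stand-in for reversible laws of physics, not a real physical model), and information measured as maximum entropy minus Shannon entropy; the names `evolve` and `observe` are hypothetical.

```python
import math


def information_bits(probabilities):
    """Information in bits: maximum possible entropy minus actual Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return math.log2(len(probabilities)) - entropy


def evolve(probabilities, permutation):
    """Move the probability of each state to the state the dynamics maps it to."""
    evolved = [0.0] * len(probabilities)
    for state, p in enumerate(probabilities):
        evolved[permutation[state]] += p
    return evolved


def observe(probabilities, seen_states):
    """Condition the description on having seen one of the states in seen_states."""
    total = sum(probabilities[s] for s in seen_states)
    return [p / total if s in seen_states else 0.0
            for s, p in enumerate(probabilities)]


# Eyes closed: the world is in state 0 or state 1, equally likely.
description = [0.5, 0.5, 0.0, 0.0]

# Toy reversible dynamics: state 0 -> 2, 1 -> 0, 2 -> 3, 3 -> 1.
permutation = [2, 0, 3, 1]

evolved = evolve(description, permutation)
observed = observe(evolved, {2})  # eyes open: we see state 2

info_start = information_bits(description)
info_evolved = information_bits(evolved)
info_observed = information_bits(observed)

print(info_start, info_evolved, info_observed)
```

In this toy setup the evolution leaves the information at one bit, while opening your eyes raises it to two bits.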

What’s very tricky is that unlike matter, information is not something that you can say, “I’ve got four bits of information here and five bits of information here, so I’m going to combine them and get nine bits of information.” Sometimes that’s true, but other times it’s very much not true. That’s what’s very, very, very tricky I think. So if I say I’ve got a die and I rolled a one with a 100% chance, that’s information. If I say I have a die and I rolled a two, or if I say I had a die and then rolled a three, all of those have information associated with them. But if I combine those in the sense that I say I have a die and I rolled a one and a two and a three and a four and a five and a six, then there’s no information associated with that.

All of the things happened, and so that’s what’s so tricky about it. It’s the same with the library of Babel. If I take every possibility on an equal footing, then none of them is special and there’s no information associated with that. If I take a whole bunch of special things and put them in a big pot, I just have a big mess and then there’s nothing special any more.
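The non-additivity can be checked numerically. This is a minimal sketch, assuming that "combining" the six outcomes means giving each one equal weight in a single description, and that information is maximum entropy minus Shannon entropy in bits; both are modelling assumptions rather than anything stated precisely in the conversation.

```python
import math


def information_bits(probabilities):
    """Information in bits: maximum possible entropy minus actual Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return math.log2(len(probabilities)) - entropy


# Six separate descriptions, each certain about a different face.
certain = [[1.0 if face == rolled else 0.0 for face in range(6)]
           for rolled in range(6)]

# Put them all "in one pot" with equal weight.
combined = [sum(d[face] for d in certain) / 6 for face in range(6)]

info_each = [information_bits(d) for d in certain]
info_combined = information_bits(combined)

print(info_each)
print(info_combined)
```

Each certain description carries about 2.58 bits on its own, yet the combination carries essentially none, which is the sense in which information does not add up the way stuff does.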

When I say something like, “The world is made out of information,” that means that it has different sort of properties than if it was made out of stuff. Because stuff … Like you take away some stuff and there’s less stuff. Or you divide the stuff in two and each half has half as much stuff. And information is not necessarily that way. And so if you have a bunch of information or a description of something and you take a subset of it, you’ve actually made more information even though there’s less that you’re talking about.

It’s different than the way we think about the makeup of reality when you think about it as made up of stuff, and has just very different properties that are somewhat counter-intuitive when we’re used to thinking about the world as being made up of stuff.

Lucas Perry: I’m happy that we have spent this much time on just discussing information, because I think that it offers an important conceptual shift for seeing the world, and a good challenging of some commonly held intuitions – at least, that I have. The question for me now is, what are the relevant and interesting implications here for agents? The one thing that had been coming to my mind is… and to inject more Zen here… there is a koan that goes something like: “first there were mountains and then there were no mountains, and then there were mountains.”  This seems to have parallels to the view that you’re articulating, because first you’re just stupefied and bought into the reality of your conceptualizations and stories where you say “I’m actually ultimately a human being, and I have a story about my life where I got married, and I had a thing called a job, and there were tables, which were solid and brown and had other properties…” But as you were saying, there’s no tableness or table in the wave function; these are all stories and abstractions which we use because they are functional or useful for us. And then when we see that we go, “Okay, so there aren’t really mountains in the way that I thought, mountains are just stories we tell ourselves about the wave function.”

But then I think it seems like you’re pointing out here again, there’s sort of this ethical or normative imperative where it’s like, “okay, so mountains are mountains again, because I need my concept and lived experience of a mountain to exist in the world, and to exist amongst human institutions and concepts and language, and even though I may return to this, this all may be viewed in a new light.” Is this pointing in the right direction in your opinion?

Anthony Aguirre: I think in a sense, in that we think we’re so important, and the things around us are real, and then we realize as we study physics that actually, we’re tiny little blips in this potentially infinite or at least extremely large, somewhat uncaring-seeming universe, that the things that we thought are real are kind of fictitious, and partly made up by our own history and perceptions and things, that the table isn’t really real but it’s made up of atoms or wave function or what have you.

But then I would say, why do you attribute more realness to the wave function than the table? The wave function is a sort of very impoverished description of the world that doesn’t contain tables and things. So I think there’s this pathology of saying because something is described by fundamental physical mathematical laws, it’s more real than something like a table that is described by people talking about tables to other people.

There’s something very different about those things, but is one of them more real and what does that even mean? If the table is not contained in the wave function and the wave function isn’t really contained in the table, they’re just different things. They’re both, in my view, made out of information, but rather different types and accessible to rather different things.

To me, the, “Then I realized it was a mountain again,” is that yes, the table is kind of an illusion in a sense. It’s made out of atoms and we bring all this stuff to it and we make up solidity and brownness and stuff. So it’s not a fundamental part of the universe. It’s not objectively real, but then I think at some level nothing is so purely objectively real. It’s a sliding scale, and then it’s got a place for things like the wave function of the universe and the fundamental laws of physics at the more objective end of things, and brownness and solidity at the more subjective end of things, and my feelings about tables and my thirst for water at the very subjective end of things. But I see it as a sort of continuous spectrum, and that all of those things are real, just in somewhat different ways. In that sense, I think I’ve come back to those illusory things being real again in a sense, but just from a rather different perspective, if we’re going to be Zen about it.

Lucas Perry: Yeah, it seems to be an open question in physics and cosmology. There is still arguing now currently going on about what it means for something to be real. I guess I would argue that something is real if it maybe has causality or that causality would supervene upon that thing… I’m not even sure, I don’t think I’m even going to start here, I think I would probably be wrong. So…

Anthony Aguirre: Well, I think the problem is in trying to make a binary distinction between whether things are real or not or objective or not. I just think that’s the wrong way to think about it. I think there are things that are much more objective than other things, and things that are much less objective than other things, and to the extent that you want to connect real with being objective, there are then things that are more and less real.

In one of the koans in the book, I make this argument that we think of a mathematical statement like the Pythagorean theorem, say, or some other beautiful thing like Euler’s theorem relating exponentials to cosines and sines, that these are objective special things built into the universe, because we feel like once we understand these things, we see that they must have been true and existed before any people were around. Like it couldn’t be that the Pythagorean theorem just came into being when Pythagoras or someone else discovered it, or Euler’s theorem. They were true all the way back until before the first stars and whatnot.

And that’s clearly the case. There is no time at which those things became true. At the same time, suppose I just take some axioms of mathematics that we employ now, and some sort of rules for generating new true statements from them. And then I just take a computer and start churning out statements. So I churn out all possible consequences of those axioms. Now, if I let that computer churn long enough, somewhere in that string of true statements will be something that can be translated into the Pythagorean theorem or Euler’s theorem. It’s in there somewhere. But am I doing mathematics? I would say I’m not, in the sense that all I’m doing is generating an infinite number of true statements if I let this thing go on forever.

But almost all of them are super uninteresting. They’re just strings of gobbledygook that are true given the axioms and the rules for generating new true statements, but they don’t mean anything. Whereas Euler’s theorem is a very, very special statement that means something. So what we’re doing when we’re doing mathematics, we feel like what we’re doing is proving stuff to be true. And we are at some level, but I think what we’re really doing from this perspective is out of this catalog that is information-free of true statements, we’re picking out a very, very special subset that are interesting. And in making that selection, we’re once again creating information. And the information that we’re creating is really what we’re doing, I think, when we’re doing mathematics.
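The "churn out every consequence" procedure can be illustrated with a toy formal system. The sketch below uses the MIU system from Douglas Hofstadter's Gödel, Escher, Bach as a stand-in for real mathematical axioms; that choice, and the names `successors` and `enumerate_theorems`, are assumptions made purely for illustration.

```python
def successors(s):
    """Apply every MIU rule to s in every possible position."""
    out = set()
    if s.endswith("I"):                  # Rule 1: xI -> xIU
        out.add(s + "U")
    if s.startswith("M"):                # Rule 2: Mx -> Mxx
        out.add(s + s[1:])
    for i in range(len(s) - 2):          # Rule 3: replace III with U
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):          # Rule 4: drop UU
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def enumerate_theorems(axiom, steps):
    """Breadth-first generation of everything derivable within `steps` rule applications."""
    known = {axiom}
    frontier = {axiom}
    for _ in range(steps):
        frontier = {t for s in frontier for t in successors(s)} - known
        known |= frontier
    return known


theorems = enumerate_theorems("MI", steps=4)
print(len(theorems))
print(sorted(theorems, key=len)[:5])
```

Every string produced is a valid theorem of the toy system, but nothing in the procedure marks any of them as interesting; that selection is the step the generator never performs.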

The information contained in the statement that the Pythagorean theorem is an interesting theorem that applies to stuff in the real world and that we should teach our kids in school, that only came into being when humans did. So although the statement has always been true, the information I think was created along with humans. So I think you kind of get to have it both ways. It is built into the universe, but at the same time, it’s created, so you discover it and you create it.

I think there’s a lot of things that are that way. And although the Pythagorean theorem feels super objective, you can’t disagree with the Pythagorean theorem in a sense, we all agree on it once we understand what it is, at the same time, it’s got this subjective aspect to it that out of all the theorems we selected, this particular one of interest … We also selected the axioms by the way, out of all different sets of axioms we could have chosen. So there’s this combination of objectivity and the subjectivity that we as humans that like to do geometry and think about the world and prove theorems and stuff have brought to it. And that combination is what’s created the information that is associated with the Pythagorean theorem.

Lucas Perry: Yeah. You threw the word “subjectivity” there, but this process is bringing us to the truth, right? I mean, the question is again, what is true or real?

Anthony Aguirre: There are different senses of subjectivity. So there’s one sense of having an interior world view, having consciousness or awareness or something like that, being a subject. And there’s another of saying that it’s perspectival, that it’s relative or something, that different agents might not agree on it or might see it a little bit differently. So I’d want to distinguish between those two.

Lucas Perry: In which sense did you mean?

Anthony Aguirre: What I mean is that the Pythagorean theorem is quite objective in the sense that once lots of agents agree on the premises and the ground rules, we’re all going to agree on the Pythagorean theorem. Whereas we might not agree on whether ice cream is good, but it’s still a little bit not objective.

Lucas Perry: It’s like a small part of all possible mathematically true statements which arise out of those axioms.

Anthony Aguirre: Yes. And that some community of agents in a historical process had to select that out. It can’t be divorced from the process and the agents that brought it into being, and so it’s not entirely objective in that sense.

Lucas Perry: Okay. Yeah, yeah, that makes sense. I see. So this is a question I was intending on asking you an hour ago, before we went down this wormhole. First, I’m interested in just the structure of your book. How do you structure your book in terms of the ideas and what leads to what?

Anthony Aguirre: Just a brief outline of the book: there are a few different layers of structure. One is the koans themselves, which are sort of parables or little tales that encode some idea. There’s maybe a metaphor or just the idea itself, and the koans take place as part of a narrative that starts in 1610 or 1630 or so, on a trip from Italy to, in the end, Kyoto. So there’s this across-the-world journey that takes place through these koans. And they don’t come in chronological order, so you kind of have to piece together the storyline as the book goes on. But it kind of comes together in the end, so there’s a sequence of things that are happening through the koans, and there’s a storyline that you get to see assemble itself and it involves a genie and it involves a sword fight and it involves all kinds of fun stuff.

That’s one layer of the structure, is the koans forming the narrative. Then after each koan is a commentary that’s kind of delving into the ideas, providing some background, filling in some physics, talking about what that koan was getting at. And in some cases, it’s kind of a resolution to it, like here’s the paradox and here’s the resolution to that paradox. But more often, it’s here’s the question, here’s how to understand what that question is really asking. Here’s a deeper question that we don’t know the answer to, and maybe we’ll come back to later in the book or maybe we won’t. So there’s kind of this development of a whole bunch of physics ideas that are going on in those commentaries.

In terms of the physics ideas, there’s a sequence. There’s first classical physics including relativity. The second part is quantum mechanics, essentially. The third part is statistical mechanics and information theory. The fourth part is cosmology. The fifth part is the connections to the interior sense, like subjectivity and the subject and experiments and thinking about interior sense and consciousness and the “I”. And then the last part is a sort of more philosophical section, bringing things together in the way that we’ve been discussing, like how much of reality is out there, how much of it is constructed by us, or us as us writ large as a society and thinking beings and biological evolution and so on. So that’s kind of the structure of the book.

Lucas Perry: Can you read for us two of your favorite koans in the book?

Anthony Aguirre: This one alludes to a classic philosophical thought experiment of the ship of Theseus. This one’s called What Is It You Sail In? It takes place in Shanghai, China in 1620. “After such vast overland distances, you’re relieved that the next piece of your journey will be at sea, where you’ve always felt comfortable. Then you see the ship. You’ve never beheld a sorrier pile of junk. The hull seems to be made mostly of patches, and the patches appear to be made of other patches. The nails look nailed together. The sails are clearly mostly a quilt of canvas sacks and old clothing. ‘Does it float?’ you ask the first mate, packing in as much skepticism as you can fit. ‘Yes. Many repairs, true. But she is still my good companion, [Atixia 00:25:46], still the same ship she ever was.’

Is she?, you wonder. Then you look down at your fingernails, your skin, the fading scar on your arm and wonder, am I? Then you look at the river, the sea, the port and all around. Is anything?”

So what this one’s getting at is this classic tale where if you replace one board of a ship, you’d still say it’s the same ship; you’ve just replaced one little piece of it. But as you replace more and more pieces of it, at some point, every piece of the ship might be a piece that wasn’t there before. So is it the same ship or it’s not? Every single piece has been replaced. And our body is pretty much like this; on a multi-year timescale, we replace pretty much everything.

The idea of this is to get at the fact that when we think of a thing like an identity that something has, it’s much more about the form and I would say the information content in a sense, than about the matter that it’s made up of. The matter’s very interchangeable. That’s sort of the way of kicking off a discussion of what does it mean for something to exist? What is it made of? What does it mean for something to be different than another thing? What are the different forms of existence? What is the form versus the matter?

And with the conclusion that at some level, the very idea of matter is a bit of an illusion. There’s kind of form in the sense that when you think of little bits of stuff, and you break those little bits of stuff down farther, you see that there are protons and electrons and neutrons and whatnot, but what those things are, they’re not little bits of stuff. They’re sort of amounts or properties of something. Like we think of energy or mass as a thing, but it’s better to think of it as a property that something might have if you look.

The fact that you have an electron really means that you’ve got something with a little bit of the energy property or a little bit of the mass property, a little bit of the spin property, a little bit of the electron lepton number property, and that’s it. And maybe you talk about its position or its speed or something. So it’s more like a little bundle of properties than a little bundle of stuff. And then when you think of agglomerations of atoms, it’s the same way. Like the way that they’re arranged is a sort of informational thing, and questions you can ask and get answers to.

Going back to our earlier conversation, this is just a slightly more concrete version of the claim that when we say what something’s made of, there are lots of different answers to that question that are useful in different ways. But the answer that it’s made of stuff is maybe not so useful as we usually think it is.

Lucas Perry: So just to clarify for listeners, koans in Zen traditionally are supposed to be not explicitly philosophically analytical, but experiential things which are supposed to subvert commonly held intuitions which may take you from seeing mountains as mountains, to no mountains, to mountains again. So here there’s this perspective that there’s both supposedly the atoms which make up me and you, and then the way in which the atoms are arranged, and then this koan that you say elicits the thought that you can remove any bit of information from me, and you can continue to move one bit of information from me at a time, and there’s no one bit of information that I would say is essential to what I call Lucas, or what I take to be myself. Nor atoms. So then what am I? How many atoms or bits of information do you have to take away from me until I stop being Lucas? And so one may arrive at the place where you’re deeply questioning the category of Lucas altogether.

Anthony Aguirre: Yeah. The things in this book are not Zen koans in the sense that a lot of them are pretty philosophical and intellectual and analytical, which Zen koans are sort of not. But at the same time, when you delve into them and try to experience them, when you think not of the abstract idea of the ship in this koan and lepton numbers and energy and things like that, but when you apply it to yourself and think, okay, what am I if I’m not this body?, then it becomes a bit more like a genuine Zen koan. You’re sort of like, ah, I don’t know what I am. And that’s a weird place to be. I don’t know what I am.

Lucas Perry: Yeah. Sure. And the wisdom to be found is the subversion of a ton of different commonly held intuitions, which are evolutionarily conditioned, which are culturally conditioned and socially conditioned. So yeah, this has to do with the sense of permanent things and objects, and then what identity ultimately is, or what our preferences are about identity, or if there are normative or ethical imperatives about the sense of identity that we ought to take. Are there any other ideas here for some other major intuitions that you’re attempting to subvert in your book?

Anthony Aguirre: Well yeah, there’s … I guess it depends which ones you have, but I’ve subverted as many as I can. I mean, a big one I think is the idea of a sort of singular individual self, and that’s one that is really interesting to experiment with. The way we go through our lives pretty much all the time is that there’s this one-to-one correspondence between our feeling that we’re an individual self looking out at the world, there’s an “I”. We feel like there’s this little nugget of me-ness that’s experiencing the world and owns mental faculties, and then owns and steers around this body that’s made out of physical stuff.

That’s the intuition that we go through life with, but then there are all kinds of thought experiments you can do that put tension on that. And one of them that I go through a lot in the book is what happens when the body gets split or duplicated, or there are multiple copies of it and things like that. And some of those things are physically impossible or so extraordinarily difficult that they’re not worth thinking about, but some of them are very much things that might automatically happen as part of physics, if we really could instantaneously copy a person and create a duplicate of them across the room or something like that.

What does that mean? How do we think about that? When we’ve broken that one-to-one correspondence between the thing that we like to think of as ourself and our little nugget of I-ness, and the physical body, which we know is very, very closely related to that thing. When one of them bifurcates into two, it kind of throws that whole thing up in the air, like now what do we think? And it gets very unsettling to be confronted with that. There are several koans investigating that at various different levels that don’t really draw any conclusions, I would say. They’re more experiments that I’m sort of inviting other people to subject themselves to, just as I have thinking about them.

It’s very confusing how to think about them. Like, should I care if I get copied to another copy across the room and then get instantaneously destroyed? Should that bother me? Should I fear that process? What if it’s not across the room, but across the universe? And what if it’s not instantaneously that I appear across the room, but I get destroyed now, and I exist on the other side of the universe a billion years from now, the same configuration of atoms? Do I care that that happens? There are no easy answers to this, I think, and they’re not questions that you can easily dismiss.

Lucas Perry: I think that this has extremely huge ethical implications, and represents, if transcended, an important point in human evolution. There is this koan, which is something like, “If you see the Buddha on the road, kill him.” Which means if you think you’ve reached something like enlightenment, it’s not that, because enlightenment is another one of these stories. But insofar as human beings are capable of transcending illusions and reaching anything called enlightenment… I think that an introspective journey into trying to understand the self and the world is one of the most interesting pursuits a human being can do. And just to contextualize this and, I think, paint the picture better, it’s evolution that has evolved these information processing systems, with this virtual sense of self that exists in the world model we have, and the model we have about ourselves and our body, and this is because this is good for self preservation. 

So you can say, “Where do you feel you’re located?” Well I sort of feel I’m behind my face and I feel I have a body and I have this large narrative of self concept and identity, which is like, “I’m Lucas. I’m from here.” I have this concept of self which I’ve created, which is basically this extremely elaborative connotative web of all the things which I think make up my identity. And under scrutiny, this is basically just all conditioned, it’s all outside of myself, all prior to myself, I’m not self-made at all, yet I think that I’m some sort of self separate entity. And then comes along Abrahamic religions at some point in the story of humanity, which are going to have tremendous cultural and social implications on the way that evolution has already bred ego-primates like ourselves. We’re primates with egos and now we have Abrahamic religions, which are contributing to this problem by conditioning the language and philosophy and thought of the West, which say that ultimately you’re a soul, you’re not just a physical thing.

You’re actually a soul who has a body and you’re basically just visiting here for a while, and then the thing that is essentially you will go to the next level of existence. This leads to, I think, reifying this rational conceptualization of self and this experience itself. Where you feel like you have a body, you feel that your heart beats itself, you feel that you think your thoughts and you say things like, “I have a brain.” Who is it that stands in relation to the brain? Or we might say something like, “I have a body.” Who is it that has a body? So it seems like our language is clearly conditioned and structured around our sense and understanding of self. And there’s also this sense in which you’ve been trying to subvert some sorts of ideas here, like sameness or otherness, what counts as the same ship or not. And from an ultimate physics perspective, the thing that is fusing the stars is the same thing that is thinking my thoughts. The fundamental ontology of the world is running everything, and I’m not separate from that, yet it feels like I am, and this seems to have tremendous ethical implications.

For example, people believe that people are deserving of retribution for crimes or acting immorally, as if they had chosen in some ultimate and concrete sense what to do. The ultimate spiritual experience, or at least the ultimate insight, is to see this whole thing for what it is, to realize that basically everyone is spellbound by these narratives of self, and these different intuitions we have about the world, and that we’re basically bought into this story that I think Abrahamic religions have led to a deeper conditioning in us. It seems to me that atheists also experience themselves this way. We think when we die there’ll be nothing, there will just be an annihilation of the self, but part of this realization process is that there’s no self to be annihilated to begin with. There’s just consciousness and its contents, and ultimately by this process you may come to see that consciousness is something empty of self and empty of identity. It’s just another thing that is happening.

Anthony Aguirre: I think there are a lot of these cases where the mountain becomes less then more of a mountain and then more and less of a mountain. You touched upon consciousness and free will and many other things that are also in this, and there’s a lot of discussion of free will in the book and we can get into that too. I think with consciousness or the self, I find myself in this strange sort of war in the sense that, on the one hand I feel like there’s a sense in which this self that we construct, is kind of an illusionary thing and that the ego and things that we attach to, is kind of an illusionary thing. But at the same time, A, it sure feels real and the feeling of being Anthony, I think is a kind of unique thing.

I don’t subscribe to the notion that there’s this little nugget of soul stuff that exists at the core of a person. It’s easy to sort of make fun of this, but at the same time I think the idea that there’s something intrinsically equally valuable to each person is really, really important. I mean it underlies a lot of our way of thinking about society and morality, in ways that I find very valuable. And so while I kind of doubt the sort of metaphysics of the individual’s soul in that sense, I worry what happens to the way we’ve constructed our scheme of values. If we grade people on a sliding scale, you’re more valuable than this other person. I think that sense of equal intrinsic human worth is incredibly crucial and has led to a lot of moral progress. So I have this really ambivalent feeling, in that I doubt that there’s some metaphysical basis for that, but at the same time I really, really value that way of looking at the self, in terms of society and morality and so on, that we’ve constructed on top of that.

Lucas Perry: Yeah, so there’s the concept in Zen Buddhism of skillful means. So one could say that the concept of each human being having some kind of equal and intrinsic worth, which is related to their uniqueness and fundamental being as being a human being, that that is skillful.

Anthony Aguirre: It’s not something that in some sense makes any rational sense. Whatever you name, some people have more of it than others. Money, capability, intelligence, sensitivity.

Lucas Perry: Even consciousness.

Anthony Aguirre: Consciousness maybe. Maybe some people are just a lot more conscious than others. If we can measure it, maybe some people would be like a 10 on the dial and others would be 2. Who knows?

Lucas Perry: I think that’s absolutely probably true, because some people are brain dead. Medically there’s a sliding scale of brain activity, so yeah, I think today it seems clear that some people are more conscious than others.

Anthony Aguirre: Yes, that’s certainly true. I mean when we go to sleep, we’re less conscious. But nonetheless, although anything that you can measure about people and their experience of the world varies and if you could quantify it on a scale, some people would have more and less. Nonetheless, we find it useful to maintain this idea that there is some intrinsic equality among people and I worry what would happen if we let go of that. What kind of world would we build without that assumption? So I find it valuable to keep that assumption, but I’m conflicted about that honestly, because on what basis do we make that assumption? I really feel good about it, but I’m not sure I can point to why. Maybe that’s just what we do. We say this is an axiom that we choose to believe that there’s an intrinsic moral value to people and I respect that, because I think you have to have axioms. But it’s an interesting place that we’ve come to, I think in terms of the relation between our beliefs about reality and our beliefs about morality.

Lucas Perry: Yeah. I mean there’s the question, as we approach AI and super intelligence, of what authentic experiential and ethical enlightenment and idealization means. From my perspective the development of this idea, which is correlated with the Enlightenment and humanism, right? Is a very recent thing, the 1700s and the 1800s, right? So it seems clear from a cosmological context that this norm or ethical view is obviously based on a bunch of things that are just not true, but at the same time it’s been ethically very skillful and meaningful for fixing many of the immoral things that humans do, that are unethical. But obviously it seems like it will give way to something else, and the question is, what else does it give way to?

So if we create Life 3.0 and we create AIs that do not care about getting turned off for two minutes and then waking up again, because they don’t feel the delusion of a self, that to me seems to be a step in moral evolution, and why I think that ultimately it would be super useful for AI design, if the AI designers would consider the role that identity plays in forming strong AI systems that are there to help us. We have the opportunity here to have selfless AI systems, they’re not going to be confused like we are. They’re not going to think they have souls, or feel like they have souls, or have strong senses of self. So it seems like there are opportunities here, and questions around what it means to transcend many of the aspects of human experience, and how best it would be to instantiate that in advanced AI systems.

Anthony Aguirre: Yeah, I think there’s a lot of valuable stuff to talk about there. In humans, there are a whole bunch of things that go together that don’t necessarily have to be packaged together. Intelligence and consciousness are packaged together, it’s not clear to what degree those have to be. It’s not clear how much consciousness and selfness have to be packaged together. It’s not clear how much consciousness or selfness and a valence to consciousness, a positive or negative experience have to be packaged together. Could we conceive of something that is intelligent, but not conscious? I think we certainly could, depending on how intelligent it has to be. I think we have those things and depending on what we mean by consciousness, I guess. Can we imagine something that is conscious and intelligent, but without a self, maybe? Or conscious, but it doesn’t matter to it how something goes. So it’s something that’s conscious, but can’t really have a moral weight in the sense that it doesn’t either suffer or experience positive feelings, but it does experience.

I think there’s often a notion that if something is said to have consciousness, then we have to care about it. It’s not totally clear that that’s the case and at what level do we have to care about something’s preferences? The rain prefers to fall down, but I don’t really care and if I frustrate the rain by putting up an umbrella, I don’t feel bad about that. So at what level do preferences matter and how do we define those? So there are all these really, really interesting questions and what’s both sort of exciting and terrifying, is that we have a situation in which those questions are going to play out. In that we’re going to be creating things that are intelligent and we’re doing that now depending on how intelligent they have to be again. That may or may not be conscious, that may or may not have preferences, may or may not matter. They may or may not experience something positive or negative when those preferences are satisfied or not.

And I think we have the possibility of both moral catastrophe if we do things wrong at some level, but an enormous opportunity as well, in the sense that you’ve pointed out that we may be able to create agents that are purely selfless, and insofar as other beings have a moral value, these beings can be absolute altruists, like Stuart has been pointing out in his book. Absolute altruism is a pretty tough one for humans to attain, but might be really easy for beings that we construct that aren’t tied to an evolutionary history and all those sorts of things that we came out of.

It may still be that the sort of moral value of the universe centers around the beings that do have meaningful preferences, like humans. Where meaning sort of ultimately sits, what is important and what’s not and what’s valuable and what’s not. If that isn’t grounded in the preferences of experiencing conscious beings, then I don’t know where it’s grounded, so there’s a lot of questions that come up with that. Does it just disappear if those beings disappear and so on? All incredibly important questions I think, because we’re now at the point in the next however many years, 50, 100, maybe less, maybe more. Where our decisions are going to affect what sorts of beings the universe gets inhabited by in the far future and we really need to avoid catastrophic blunders in how that plays out.

Lucas Perry: Yeah. There’s this whole aspect of AI alignment that you’re touching on, that is not just AI alignment, but AI generation and creation. The problem has been focused on how we can get AI systems, insofar as we create them, to serve the needs of human beings, to understand our preference hierarchies, to understand our metapreferences. But in the creation of Life 3.0, there’s this perspective that you’re creating something that, by virtue of how it is created, is potentially more morally relevant than you; it may be capable of much more experience, much more profound levels of experience, which also means that there’s this aspect of AI alignment which is about qualia architecting or experience architecting or reflecting on the fact that we’re building Life 3.0. These aren’t just systems that can process information for us, there are important questions about what it is like to be that system in terms of experience and ethics and moral relevance. If you create something with the kind of experience that you have, and it has the escape velocity to become super intelligent and populate the cosmic endowment with whatever it determines to be the good, or what we determine to be the good, what is the result of that?

One last thing that I’m nervous about is the way that the illusion of self will contribute to a fair and valuable AI alignment. This consideration is in relation to us not being able to see what is ultimately good. We could ultimately be tied up in the preservation of our own arbitrary identities, like the Lucas identity or the Anthony identity. We could have created something like blissful, purely altruistic, benevolent Bodhisattva gods, but we never did because we had this fear and this illusion of self-annihilation. And that’s not to deny that our information can be destroyed, and maybe we care a lot about the way that the Lucas identity information is arranged, but when we question these types of intuitions that we have, it makes me question and wonder if my conditioned identity is actually as important as I think it is, or as I experience it to be.

Anthony Aguirre: Yeah, I think this is a very horrifyingly thorny question that we have to face and my hope is that we have a long time to face it. I’m very much an advocate of creating intelligent systems that can be incredibly helpful and economically beneficial and then reaping those benefits for a good long time while we sort ourselves out. But with a fairly strict upper limit on how intelligent and powerful we make those things. Because I think if huge gains in the capability of machine systems happen in a period of years or even decades, the chance of us getting these big questions right seems to me like almost zero. There’s a lot of argumentation about how difficult it is to build a machine system that has the same sort of general intelligence that we do. And I think part of what makes that question hard is thinking about the huge amount of effort that went in, evolutionarily and otherwise, to creating the sort of robust intelligence that humans have.

I mean we’ve built up over millions of years in this incredibly difficult adversarial environment, where robustness is incredibly important. Cleverness is pretty important, but being able to cope with a wide variety of circumstances is kind of what life and mind has done. And I think the degree to which AGI will be difficult, is at some level the degree to which it has to attain a similar level of generality and robustness, that we’ve spent just an ungodly amount of computation over the evolution of life on earth to attain. If we have to do anything like that level of computation, it’s going to take just an extraordinarily long time. But I think we don’t know to what degree all of that is necessary and to what degree we can really skip over a lot of it, in the same way that we skip over a lot of evolution of flying when we build an airplane.

But I think there’s another question, which is that of experience and feeling, where we’re even more clueless as to where we would possibly start. If we wanted to create an appreciation for music, you have no clue where to even begin with that question, right? What does it even mean to appreciate or listen to, in some sense have preferences? You can maybe make a machine that will sort different kinds of music into different categories, but do you really feel like there’s going to be any music appreciation in there, or any other human feeling? These are things that have a very, very long, complicated evolutionary history and it’s really unclear to me that we’re going to get them in machine form without something like that. But at least as our moral system is currently construed, those are the things that actually matter.

Whether conscious beings are having a good time is pretty much the foundation of what we consider to be important, morally speaking at least. Unless we have ideas like we have to do it with a way to please some deity or something like that. So I just don’t know, when you’re talking about future AI beings that have a much richer and deeper interior sense, that’s like the AGI problem squared. We can at least imagine what it’s like to make a general intelligence, an idea of what it would take to do that. But when you talk about creating a feeling being, with deeper, more profound feelings than we have, just no clue what that means in terms of actually engineering or something.

Lucas Perry: So putting on the table all of the moral anti-realism considerations and thought that many people in the AI alignment community may have… Their view is that there’s the set of the historically conditioned preferences that we have and that’s it. We can imagine if horseshoe crabs had been able to create a being more intelligent than them, a being that was aligned to horseshoe crabs’ preferences and preference hierarchy. And we can imagine that the horseshoe crabs were very interested and committed to just being horseshoe crabs, because that’s what a horseshoe crab wants to do. So now you have this being that was able to maintain its own existential condition of the horseshoe crab for a very long time. That just seems like an obvious moral catastrophe. It seems like a waste of what could have been.

Anthony Aguirre: That’s true. But if you imagine that the horseshoe crabs instead created elaborate structures out of sand that they decided were their betters, and their legacy was to create these intricate sand structures, because the universe deserves to be inhabited by these much greater beings than them, then that’s also a moral catastrophe, right? Because the sand structures have no value whatsoever.

Lucas Perry: Yeah. I don’t want humans to do any of these things. I don’t want human beings to go around building monuments, and I don’t want us to lock in to the human condition either. Both of these cases obviously seem like horrible waste, and now you’re helping to articulate the issue that human beings are at a certain place in evolution. 

And so if we’re to create Life 3.0, then it’s also unclear epistemically how we are to evaluate what kinds of exotic qualia states are the kinds that are morally good, and I don’t even know how to begin to answer that question.

So we may be unaware of experiences that are literally astronomically better than the kinds of experiences that we have access to, and it’s unclear to me how you would navigate effectively towards that, other than amplifying what we already have.

Anthony Aguirre: Yeah. I guess my instinct on that is to look more on the biology side than the machine side and to say as biological systems, we’re going to continue to evolve in various ways. Some of those might be natural, some of them might be engineered and so on. Maybe some of them are symbiotic, but I think it’s hard for me to imagine how we’re going to have confidence that the things that are being created have an experience that we would recognize or find valuable, if they don’t have some level of continuity with what we are, that we can directly experience. The reason I feel confidence that my dog is actually feeling some level of joy or frustration or whatever, is really by analogy, right? There’s no way that I can get inside the dog’s mind, maybe someday there will be, but there’s no way at the moment. I assume that because we have this common evolutionary heritage, that the outward manifestations of those feelings correspond to some inward feelings in much the same way that they do in humans and much the same way that they do in me. And I feel quite confident about that really, although for a long period of history, people have believed otherwise at times.

So I think realistically all we’re going to be able to do, is reason by analogy and that’s not going to work very well I think with machine systems, because it’s quite clear that we’ll be able to create machine systems that can wag their tails and smile and things, even though there’s manifestly nothing behind that. So at what point we would start to believe the sort of behavioral cues and say that there’s some interior sense behind that, is very, very unclear when we’re talking about a machine system. And I think we’re very likely to make all kinds of moral errors in either ascribing too much or too little interior experience to machines, because we have no real way of knowing to make any meaningful connection between those things. I suspect that we’ll tend to make the error in both directions. We’ll create things that seem kind of lifelike and attribute all kinds of interior life to them that we shouldn’t and if we go on long enough, we may well create things that have some interior sense that we don’t attribute to them and make all kinds of errors that way too.

So I think it’s quite fraught actually in that sense and I don’t know what we’re going to do about that. I mean we can always hope that the intractably hard problems that we can’t solve now, will just be solved by something much smarter than us. But I do worry a little bit about attributing sort of godlike powers to something by saying, “Oh, it’s super intelligent, so it will be able to do that.” I’m not terribly optimistic. It may well be that the time at which something is so intelligent that it can solve the problem of consciousness and qualia and all these things, it’d be so far beyond the time at which it was smart enough to completely change reality in the world and all kinds of other things. That it’s almost past the horizon of what we can think about now, it’s sort of past the singularity in that sense. We can speculate, hopefully or not hopefully, but it’s not clear on what basis we would be speculating.

Lucas Perry: Yeah. At least the questions that it will need to face, and then we can leave it open as to whether or not and how long it will need to address those questions. So we discussed who I am, I don’t know. You touched on identity and free will. I think that free will in the libertarian sense, as in I could have done otherwise, is basically one of these common sense intuitions that is functionally useful, but ultimately illusory.

Anthony Aguirre: Yeah, I disagree. I will just say briefly, I prefer to think of free will as a set of claims that may or may not be true. And I think in general it’s useful to decompose the question of free will into a set of claims that may or may not be true. And I think when you do that, you find that most of the claims are true, but there may be some big fuzzy metaphysical thing that you’re equating to that set of claims and then claiming it’s not true. So that’s my feeling, that when you actually try to operationalize what you mean by free will, you’ll find that a lot of the things that you mean actually are properties of reality. But if you sort of invent a thing that you call free will, that by its nature can’t be part of a physical world, then yes, that doesn’t exist. In a nutshell that’s my point of view, but we could go into a lot more depth some other time.

Lucas Perry: I think I understand that from that short summary. So for this last part then, can you just touch on, because I think this is an interesting point, as we come to the end of the conversation. Form is emptiness, emptiness is form. What does that mean?

Anthony Aguirre: So form is emptiness, is coming back to the discussion from earlier. That when we talk about something like a table, that thing that we call real and existing and objective in some sense, is actually composed of all kinds of ingredients that are not that thing. Our evolutionary history and our concept of solidity and shape, all of these things come together from many different sources and as the Buddhist would say, “There’s no intrinsic self existence of a table.” It very much exists relative to a whole bunch of other things, that we and many other people and processes and so on, bring into being. So that’s the form is emptiness. The emptiness is the emptiness of an intrinsic self existence, so that’s the way that I view the form is emptiness.

But turning that around, that emptiness is form, is yes, even though the table is empty of inherent existence, you can still knock on it. It’s still there, it’s still real and it’s in many ways as real as anything else. If you look for something that is more intrinsically existing than a table, you’re not really going to find it and so we might as well call all of those things real, in which case the emptiness is form again, it’s something. That’s the way I sort of view it and that’s the way that I’ve explored it in that section of the book.

So to talk about the ship: there’s this form of the ship that is kind of what we call the ship. That’s the arrangement of atoms and so on; it’s kind of made out of information and whatnot. That form is empty in the sense that there are all these ingredients that come from all these different places and come together to make that thing, but that doesn’t mean it’s non-existent or meaningless or something like that. There very much is meaning in the fact that something is a ship rather than something else; that is reality. So that’s kind of the case that I’m putting together in that last section of the book. It’s not simply our straightforward sense of a table as a real existing thing, nor is it that everything is an illusion, that it’s like a dream, it’s like a phantasm, nothing is real. Neither of those is the right way to look at it.

Lucas Perry: Yeah, I think that your articulation here brings me again back, for better or for worse, to mountains, no mountains, and mountains again. I came into this conversation with my conventional view of things, and then there’s “form is emptiness.” Oh so okay, so no mountains. But then “emptiness is form.” Okay, mountains again. And given this conceptual back and forth, you can decide what to do from there.

Anthony Aguirre: So have we come back to the mountain in this conversation, at this point?

Lucas Perry: Yeah. I think we’re back to mountains. So I tremendously valued this conversation and feel that it’s given me a lot to consider. And I will re-enter the realm of feeling like a self and inhabiting a world of chairs, tables, objects and people. And will have to engage with some more thinking about information theory. And with that, thank you so much.

 

AI Alignment Podcast: Human Compatible: Artificial Intelligence and the Problem of Control with Stuart Russell

Stuart Russell is one of AI’s true pioneers and has been at the forefront of the field for decades. His expertise and forward thinking have culminated in his newest work, Human Compatible: Artificial Intelligence and the Problem of Control. The book is a cornerstone piece, alongside Superintelligence and Life 3.0, that articulates the civilization-scale problem we face of aligning machine intelligence with human goals and values. Not only is this a further articulation and development of the AI alignment problem, but Stuart also proposes a novel solution which brings us to a better understanding of what it will take to create beneficial machine intelligence.

 Topics discussed in this episode include:

  • Stuart’s intentions in writing the book
  • The history of intellectual thought leading up to the control problem
  • The problem of control
  • Why tool AI won’t work
  • Messages for different audiences
  • Stuart’s proposed solution to the control problem

Key points from Stuart: 

  •  “I think it was around 2013 that it really struck me that in fact we’d been thinking about AI the wrong way altogether. The way we had set up the whole field was basically kind of a copy of human intelligence in that a human is intelligent, if their actions achieve their goals. And so a machine should be intelligent if its actions achieve its goals. And then of course we have to supply the goals in the form of reward functions or cost functions or logical goal statements. And that works up to a point. It works when machines are stupid. And if you provide the wrong objective, then you can reset them and fix the objective and hope that this time what the machine does is actually beneficial to you. But if machines are more intelligent than humans, then giving them the wrong objective would basically be setting up a kind of a chess match between humanity and a machine that has an objective that’s at cross purposes with our own. And we wouldn’t win that chess match.”
  • “So when a human gives an objective to another human, it’s perfectly clear that that’s not the sole life mission. So you ask someone to fetch the coffee, that doesn’t mean fetch the coffee at all costs. It just means on the whole, I’d rather have coffee than not, but you know, don’t kill anyone to get the coffee. Don’t empty out my bank account to get the coffee. Don’t trudge 300 miles across the desert to get the coffee. In the standard model of AI, the machine doesn’t understand any of that. It just takes the objective and that’s its sole purpose in life. The more general model would be that the machine understands that the human has internally some overall preference structure of which this particular objective, fetch the coffee or take me to the airport, is just a little local manifestation. And the machine’s purpose should be to help the human realize in the best possible way their overall preference structure. If at the moment that happens to include getting a cup of coffee, that’s great, or taking him to the airport. But it’s always in the background of this much larger preference structure that the machine knows it doesn’t fully understand. One way of thinking about it is to say that the standard model of AI assumes that the machine has perfect knowledge of the objective and the model I’m proposing assumes that the machine has imperfect knowledge of the objective or partial knowledge of the objective. So it’s a strictly more general case.”
  • “The objective is to reorient the field of AI so that in future we build systems using an approach that doesn’t present the same risk as the standard model… That’s the message I think for the AI community is the first phase of our existence maybe should come to an end and we need to move on to this other way of doing things. Because it’s the only way that works as machines become more intelligent. We can’t afford to stick with the standard model because as I said, systems with the wrong objective could have arbitrarily bad consequences.”

 

Important timestamps: 

0:00 Intro

2:10 Intentions and background on the book

4:30 Human intellectual tradition leading up to the problem of control

7:41 Summary of the structure of the book

8:28 The issue with the current formulation of building intelligent machine systems

10:57 Beginnings of a solution

12:54 Might tool AI be of any help here?

16:30 Core message of the book

20:36 How the book is useful for different audiences

26:30 Inferring the preferences of irrational agents

36:30 Why does this all matter?

39:50 What is really at stake?

45:10 Risks and challenges on the path to beneficial AI

54:55 We should consider laws and regulations around AI

01:03:54 How is this book differentiated from those like it?

 

Works referenced:

Human Compatible: Artificial Intelligence and the Problem of Control

Superintelligence

Life 3.0

Occam’s razor is insufficient to infer the preferences of irrational agents

Synthesizing a human’s preferences into a utility function with Stuart Armstrong

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas: Hey everyone, welcome back to the AI Alignment Podcast. I’m Lucas Perry and today we’ll be speaking with Stuart Russell about his new book, Human Compatible: Artificial Intelligence and The Problem of Control. Daniel Kahneman says “This is the most important book I have read in quite some time. It lucidly explains how the coming age of artificial super intelligence threatens human control. Crucially, it also introduces a novel solution and a reason for hope.”

Yoshua Bengio says that “This beautifully written book addresses a fundamental challenge for humanity: increasingly intelligent machines that do what we ask, but not what we really intend. Essential reading if you care about our future.”

I found that this book helped clarify both intelligence and AI to me as well as the control problem born of the pursuit of machine intelligence. And as mentioned, Stuart offers a reconceptualization of what it means to build beneficial and intelligent machine systems. That provides a crucial pivot point in how we ought to be building intelligent machine systems.

Many of you will already be familiar with Stuart Russell. He is a professor of computer science and holder of the Smith-Zadeh chair in engineering at the University of California, Berkeley. He has served as the vice chair of the World Economic Forum’s Council on AI and Robotics and as an advisor to the United Nations on arms control. He is an Andrew Carnegie Fellow as well as a fellow of the Association for The Advancement of Artificial Intelligence, the Association for Computing Machinery and the American Association for the Advancement of Science.

He is the author with Peter Norvig of the definitive and universally acclaimed textbook on AI, Artificial Intelligence: A Modern Approach. And so without further ado, let’s get into our conversation with Stuart Russell.

Let’s start with a little bit of context around the book. Can you expand a little bit on your intentions and background for writing this book in terms of timing and inspiration?

Stuart: I’ve been doing AI since I was in high school and for most of that time the goal has been let’s try to make AI better because I think we’ll all agree AI is mostly not very good. When we wrote the first edition of the textbook, we decided to have a section called, What If We Do Succeed? Because it seemed to me that even though everyone was working on making AI equivalent to humans or better than humans, no one was thinking about what would happen if that turned out to be successful.

So that section in the first edition in ’94 was a little equivocal, let’s say, you know, we could lose control or we could have a golden age and let’s try to be optimistic. And then by the third edition, which was 2010, the idea that we could lose control was fairly widespread, at least outside the AI communities. People worrying about existential risk like Steve Omohundro, Eliezer Yudkowsky and so on.

So we included a little bit more of that viewpoint. I think it was around 2013 that it really struck me that in fact we’d been thinking about AI the wrong way altogether. The way we had set up the whole field was basically kind of a copy of human intelligence in that a human is intelligent, if their actions achieve their goals. And so a machine should be intelligent if its actions achieve its goals. And then of course we have to supply the goals in the form of reward functions or cost functions or logical goal statements. And that works up to a point. It works when machines are stupid. And if you provide the wrong objective, then you can reset them and fix the objective and hope that this time what the machine does is actually beneficial to you. But if machines are more intelligent than humans, then giving them the wrong objective would basically be setting up a kind of a chess match between humanity and a machine that has an objective that’s at cross purposes with our own. And we wouldn’t win that chess match.

So I started thinking about how to solve that problem. And the book is a result of the first couple of years of thinking about how to do it.

Lucas: So you’ve given us a short and concise history of the field of AI alignment and the problem of getting AI systems to do what you want. One of the things that I found so great about your book was the history of evolution and concepts and ideas as they pertain to information theory, computer science, decision theory and rationality. Chapters one through three you sort of move sequentially through many of the most essential concepts that have brought us to this problem of human control over AI systems.

Stuart: I guess what I’m trying to show is how ingrained it is in intellectual thought going back a couple of thousand years. Even in the concept of evolution, this notion of fitness, you know we think of it as an objective that creatures are trying to satisfy. So in the 20th century you had a whole lot of disciplines, economics developed around the idea of maximizing utility or welfare or profit depending on which branch you look at. Control theory is about minimizing a cost function, so the cost function described some deviation from ideal behavior and then you build systems that minimize the cost. Operations research, which is dynamic programming and Markov decision processes is all about maximizing the sum of rewards. And statistics if you set it up in general, is about minimizing an expected loss function.

So all of these disciplines have the same bug, if you like. It’s a natural way to set things up, but in the long run we’ll just see it as a bad, cramped way of doing engineering. And what I’m proposing in the book actually is a way of thinking about it that’s much more in a binary way, rather than thinking about the machine and its objective.

You think about this coupled system with humans or, you know, it could be any entity that wants a machine to do something good for it or another system to do something good for it. And then the system itself, which is supposed to do something good for the human or whatever else it is that wants something good to happen. So this kind of coupled system, you don’t really see that in the intellectual tradition. Maybe one exception that I know of, which is the idea of principal-agent games in economics. So a principal might be an employer and the agent might be the employee. And then the game is how does the employer get the employee to do something that the employer actually wants them to do, given that the employee, the agent, has their own utility function and would rather be sitting home drinking beers and watching football on the telly.

How do you get them to show up at work and do all kinds of things they wouldn’t normally want to do? The simplest way is you pay them. But you know, there’s all kinds of other ideas about incentive schemes and status and then various kinds of sanctions if people don’t show up and so on. So the economists study that notion, which is a coupled system where one entity wants to benefit from the behavior of another.

So that’s probably the closest example that we have. And then maybe in ecology, look at symbiotic species or something like that. But there’s not very many examples that I’m aware of. In fact, maybe I can’t think of any, where the entity that’s supposedly in control, namely us, is less intelligent than the entity that it’s supposedly controlling, namely the machine.

Lucas: So providing some framing and context here for the listener, the first part of your book, chapters one through three explores the idea of intelligence in humans and in machines. There you give this historical development of ideas and I feel that this history you give of computer science and the AI alignment problem really helps to demystify both the person and evolution as a process and the background behind this problem.

The second part of your book, chapters four through six, discusses some of the problems arising from imbuing machines with intelligence. So this is a lot of the AI alignment problem considerations. And then the third part, chapters seven through ten, suggests a new way to think about AI, to ensure that machines remain beneficial to humans forever.

You’ve begun stating this problem and readers can see in chapters one through three that this problem goes back a long time, right? The problem with computer science at its inception was that definition that you gave that a machine is intelligent in so far as it is able to achieve its objectives. In reaction to this, you’ve developed cooperative inverse reinforcement learning and inverse reinforcement learning, which is sort of part of the latter stages of this book where you’re arguing for a new definition that is more conducive to alignment.

Stuart: Yeah. In the standard model, as I call it in the book, the human specifies the objective and plugs it into the machine. If for example, you get in your self driving car and it says, “Where do you want to go?” And you say, “Okay, take me to the airport.” For current algorithms as we understand them, built on this kind of model, that objective becomes the sole life purpose of the vehicle. It doesn’t necessarily understand that in fact that’s not your sole life purpose. If you suddenly get a call from the hospital saying, oh, you know, your child has just been run over and is in the emergency room, you may well not want to go to the airport. Or if you get into a traffic jam and you’ve already missed the last flight, then again you might not want to go to the airport.

So when a human gives an objective to another human, it’s perfectly clear that that’s not the sole life mission. So you ask someone to fetch the coffee, that doesn’t mean fetch the coffee at all costs. It just means on the whole, I’d rather have coffee than not, but you know, don’t kill anyone to get the coffee. Don’t empty out my bank account to get the coffee. Don’t trudge 300 miles across the desert to get the coffee.

In the standard model of AI, the machine doesn’t understand any of that. It just takes the objective and that’s its sole purpose in life. The more general model would be that the machine understands that the human has internally some overall preference structure of which this particular objective, fetch the coffee or take me to the airport, is just a little local manifestation. And the machine’s purpose should be to help the human realize in the best possible way their overall preference structure.

If at the moment that happens to include getting a cup of coffee, that’s great, or taking him to the airport. But it’s always in the background of this much larger preference structure that the machine knows it doesn’t fully understand. One way of thinking about it is to say that the standard model of AI assumes that the machine has perfect knowledge of the objective and the model I’m proposing assumes that the machine has imperfect knowledge of the objective or partial knowledge of the objective. So it’s a strictly more general case.

When the machine has partial knowledge of the objective there’s a whole lot of new things that come into play that simply don’t arise when the machine thinks it knows the objective. For example, if the machine knows the objective, it would never ask permission to do an action. It would never say, you know, is it okay if I do this, because it believes that it’s already extracted all there is to know about human preferences in the form of this objective. And so whatever plan it formulates to achieve the objective must be the right thing to do.

Whereas a machine that knows that it doesn’t know the full objective could say, well, given what I know, this action looks okay, but I want to check with the boss before going ahead because it might be that this plan actually violates some part of the human preference structure that it doesn’t know about. So you get machines that ask permission, you get machines that, for example, allow themselves to be switched off because the machine knows that it might do something that will make the human unhappy. And if the human wants to avoid that and switches the machine off, that’s actually a good thing. Whereas a machine that has a fixed objective would never want to be switched off because that guarantees that it won’t achieve the objective.

So in the new approach you have a strictly more general repertoire of behaviors that the machine can exhibit. The idea of inverse reinforcement learning is that this is the way for the machine to actually learn more about what the human preference structure is, by observing human behavior, which could be verbal behavior, like, “Could you fetch me a cup of coffee?” That’s a fairly clear indicator about your preference structure, but it could also be that, you know, you ask a human a question and the human doesn’t reply. Maybe the human’s mad at you and is unhappy about the line of questioning that you’re pursuing.
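The switch-off argument in this passage can be illustrated with a toy expected-value calculation. This is only a sketch under stated assumptions, not something taken from the book or the interview: the machine's uncertainty about the human's utility for a plan is represented by samples, the human is assumed to know that utility and to veto the plan exactly when it is negative, and asking is assumed to cost nothing. The function name `expected_values` and the numbers are hypothetical.

```python
import random

def expected_values(utility_samples):
    """Compare three policies for a machine unsure how much the human values its plan.

    utility_samples: draws from the machine's belief about u, the human's
    utility for the plan (assumption: u < 0 means the human would object).
    """
    n = len(utility_samples)
    # Policy 1: go ahead without asking; the machine gets whatever u turns out to be.
    act = sum(utility_samples) / n
    # Policy 2: switch itself off and do nothing; utility 0 by convention.
    switch_off = 0.0
    # Policy 3: ask permission. The human (assumed to know u) allows the plan
    # only when u > 0 and otherwise switches the machine off.
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return {"act": act, "switch_off": switch_off, "defer": defer}

random.seed(0)
# A machine that is uncertain about the objective: u could be positive or negative.
uncertain = [random.gauss(0.5, 1.0) for _ in range(10_000)]
# A machine that is certain the plan is good: every sample is the same.
certain = [0.5] * 10_000

ev_uncertain = expected_values(uncertain)
ev_certain = expected_values(certain)
print(ev_uncertain)
print(ev_certain)
```

Under these assumptions, deferring is never worse than acting or switching off, and it is strictly better whenever the machine puts some weight on the plan being harmful. For the certain machine, deferring and acting have the same value, so asking permission gains it nothing.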

 So human behavior means everything humans do and have done in the past. So everything we’ve ever written down, every movie we’ve made, every television broadcast contains information about human behavior and therefore about human preferences. Inverse reinforcement learning really means how do we take all that behavior and learn human preferences from it?

Lucas: What can you say about how tool AI as a possible path to AI alignment fits in this schema where we reject the standard model, as you call it, in favor of this new one?

Stuart: Tool AI is a notion that, oddly enough, doesn’t really occur within the field of AI. It’s a phrase that came from people who are thinking from the outside about possible risks from AI. And what it seems to mean is the idea that rather than building general purpose intelligent systems, if you are building AI systems designed for some specific purpose, then that’s sort of innocuous and doesn’t present any risks. And some people argue that in fact if you just have a large collection of these innocuous application-specific AI systems, then there’s nothing to worry about.

My experience of tool AI is that when you build application-specific systems, you can kind of do it in two ways. One is you kind of hack it. In other words, you figure out how you would do this task and then you write a whole bunch of very, very special purpose code. So, for example, if you were doing handwriting recognition, you might think, oh, okay, well in order to find an ‘S’ I have to look for a line that’s curvy and I follow the line and it has to have three bends, it has to be arranged this way. And you know, you write a whole bunch of tests to check each characteristic of an ‘S’, that it has all these characteristics and it doesn’t have any loops and this, that and the other. And then you say okay, that’s an ‘S’.

And that’s actually not the way that people went about the problem of handwriting recognition. The way that they did it was to develop machine learning systems that could take images of characters that were labeled and then train a recognizer that could recognize new instances of characters. And in fact, Yann LeCun at AT&T was doing a system that was designed to recognize words and figures on checks. So very, very, very application specific, very tooly, and in order to do that he invented convolutional neural networks, which is what we now call deep learning.

So, out of this very, very narrow piece of tool AI came this very, very general technique. Which has solved or largely solved object recognition, speech recognition, machine translation, and some people argue will produce general purpose AI. So I don’t think there’s any safety to be found in focusing on tool AI.

The second point is that people feel that somehow tool AI is not an agent. So an agent meaning a system that you can think of as perceiving the world and then taking actions. And again, I’m not sure that’s really true. So a Go program is an agent. It’s an agent that operates in a small world, namely the Go board, but it perceives the board, the move that’s made, and it takes action.

It chooses what to do next. In many applications like this, this is really the only way to build an effective tool: it should be an agent. If it’s a little vacuum cleaning robot or lawn mowing robot, certainly a domestic robot that’s supposed to keep your house clean and look after the dog while you’re out, there’s simply no way to build those kinds of systems except as agents, and as we improve the capabilities of these systems, whether it’s for perception or planning and behaving in the real physical world, we’re effectively going to be creating general purpose intelligent agents. I don’t really see salvation in the idea that we’re just going to build application-specific tools.

Lucas: So that helps to clarify that tool AI does not get around this update that you’re trying to do with regards to the standard model. So pivoting back to intentions surrounding the book, if you could distill the core message or the central objective in writing this book, how would you say that?

Stuart: The objective is to reorient the field of AI so that in future we build systems using an approach that doesn’t present the same risk as the standard model. I’m addressing multiple audiences. The message, I think, for the AI community is that the first phase of our existence maybe should come to an end and we need to move on to this other way of doing things, because it’s the only way that works as machines become more intelligent. We can’t afford to stick with the standard model because as I said, systems with the wrong objective could have arbitrarily bad consequences.

Then the other audience is the general public, people who are interested in policy, how things are going to unfold in future and technology and so on. For them, I think it’s important to actually understand more about AI rather than just thinking of AI as this kind of magic juice that triples the value of your startup company. It’s a collection of technologies and those technologies have been built within a framework, the standard model, that has been very useful and is shared with these other fields: economics, statistics, operations research, control theory. But that model does not work as we move forward and we’re already seeing places where the failure of the model is having serious negative consequences.

One example would be what’s happened with social media. So social media algorithms, content selection algorithms are designed to show you stuff or recommend stuff in order to maximize click-through. Clicking is what generates revenue for the social media platforms. And so that’s what they tried to do and I almost said they want to show you stuff that you will click on. And that’s what you might think is the right solution to that problem, right? If you want to maximize click-through, then show people stuff they want to click on, and that sounds relatively harmless.

Although people have argued that this creates a filter bubble or a little echo chamber where you only see stuff that you like and you don’t see anything outside of your comfort zone. That’s true. It might tend to cause your interests to become narrower, but actually that isn’t really what happened and that’s not what the algorithms are doing. The algorithms are not trying to show you the stuff you like. They’re trying to turn you into predictable clickers. They seem to have figured out that they can do that by gradually modifying your preferences, and they can do that by feeding you material that’s basically, if you think of a spectrum of preferences, to one side or the other, because they want to drive you to an extreme. At the extremes of the political spectrum or the ecological spectrum or whatever image you want to look at, you’re apparently a more predictable clicker and so they can monetize you more effectively.

So this is just a consequence of reinforcement learning algorithms that optimize click-through. And in retrospect, we now understand that optimizing click-through was a mistake. That was the wrong objective. But you know, it’s kind of too late and in fact it’s still going on and we can’t undo it. We can’t switch off these systems because they’re so tied in to our everyday lives and there’s so much economic incentive to keep them going.

So I want people in general to kind of understand what is the effect of operating these narrow optimizing systems that pursue these fixed and incorrect objectives. The effect of those on our world is already pretty big. Some people argue that corporations pursuing the maximization of profit have the same property. They’re kind of like AI systems. They’re kind of super intelligent because they think over long time scales, they have massive information, resources and so on. They happen to have human components, but when you put a couple of hundred thousand humans together into one of these corporations, they kind of have this super intelligent understanding, manipulation capabilities and so on.

Lucas: This is a powerful and important update for research communities. I want to focus here in a little bit on the core messages of the book as per each audience because I think you can say and clarify different things for different people. So for example, my impressions are that for sort of laypersons who are not AI researchers, the history of ideas that you give clarifies the foundations of many fields and how it has led up to this AI alignment problem. As you move through and past single agent cases to multiple agent cases where we give rise to game theory and decision theory and how that all affects AI alignment.

So for laypersons, I think this book is critical for showing the problem, demystifying it, making it simple, and giving the foundational and core concepts that human beings need in order to exist in this world today and to operate in a world where AI is becoming ever more important.

And then for the research community, as you just discussed, it seems like this rejection of the standard model and this clear identification of systems with exogenous objectives that are sort of singular and lack context and nuance. That when these things optimize for their objectives, they run over a ton of other things that we care about. And so we have to shift from this understanding where the objective is something inside of the exogenous system to something that the system is uncertain about and which actually exists inside of the person.

And I think the last thing that I sort of saw was for people who are not AI researchers, it says, here’s this AI alignment problem. It is deeply interdependent and difficult. It requires economists and sociologists and moral philosophers. And for this reason too, it is important for you to join in to help. Do you have anything here you’d like to hit on or expand on or anything I might’ve gotten wrong?

Stuart: I think that’s basically right. One thing that I probably should clarify, and it comes maybe from the phrase value alignment. The goal is not to build machines whose values are identical to those of humans. In other words, it’s not to just put in the right objective because I actually believe that that’s just fundamentally impossible to do that. Partly because humans actually don’t know their own preference structure. There’s lots of things that we might have a future positive or a negative reaction to that we don’t yet know, lots of foods that we haven’t yet tried. And in the book I give the example of the durian fruit, which some people really love and some people find utterly disgusting, and I don’t know which I am because I’ve never tried it. So I’m genuinely uncertain about my own preference structure.

It’s really not going to be possible for machines to be built with the right objective built in. They have to know that they don’t know what the objective is. And it’s that uncertainty that creates this deferential behavior. It becomes rational for that machine to ask permission and to allow itself to be switched off, which as I said, are things that a standard model machine would never do.

The reason why psychology, economics, moral philosophy become absolutely central, is that these fields have studied questions of human preferences, human motivation, and also the fundamental question which machines are going to face, of how do you act on behalf of more than one person? The version of the problem where there’s one machine and one human is relatively constrained and relatively straightforward to solve, but when you get one machine and many humans or many machines and many humans, then all kinds of complications come in, which social scientists have studied for centuries. That’s why they do it, because there’s more than one person.

And psychology comes in because the process whereby the machine is going to learn about human preferences requires that there be some connection between those preferences and the behavior that humans exhibit, because the inverse reinforcement learning process involves observing the behavior and figuring out what are the underlying preferences that would explain that behavior, and then how can I help the human with those preferences.

Humans, surprise, surprise, are not perfectly rational. If they were perfectly rational, we wouldn’t need to worry about psychology; we would do all this just with mathematics. But the connection between human preferences and human behavior is extremely complex. It’s mediated by our whole cognitive structure, and is subject to lots of deviations from perfect rationality. One of the deviations is that we are simply unable, despite our best efforts, to calculate what is the right thing to do given our preferences.

Lee Sedol, I’m pretty sure wanted to win the games of Go that he was playing against AlphaGo, but he wasn’t able to, because he couldn’t calculate the winning move. And so if you observe his behavior and you assume that he’s perfectly rational, the only explanation is that he wanted to lose, because that’s what he did. He made losing moves. But actually that would be obviously a mistake.

So we have to interpret his behavior in the light of his cognitive limitations. That becomes then a matter of empirical psychology. What are the cognitive limitations of humans, and how do they manifest themselves in the kind of imperfect decisions that we make? And then there’s other deviations from rationality. We’re myopic, we suffer from weakness of will. We know that we ought to do this, that this is the right thing to do, but we do something else. And we’re emotional. We do things driven by our emotional subsystems, when we lose our temper for example, that we later regret and say, “I wish I hadn’t done that.”

 All of this is really important for us to understand going forward, if we want to build machines that can accurately interpret human behavior as evidence for underlying human preferences.

Lucas: You’ve touched on inverse reinforcement learning in terms of human behavior. Stuart Armstrong was on the other week, and I believe his claim was that you can’t infer anything about behavior without making assumptions about rationality and vice versa. So there’s sort of an incompleteness there. I’m just pushing here and wondering more about the value of human speech, about what our revealed preferences might be, how this fits in with your book and narrative, as well as furthering neuroscience and psychology, and how all of these things can decrease uncertainty over human preferences for the AI.

Stuart: That’s a complicated set of questions. I agree with Stuart Armstrong that humans are not perfectly rational. I’ve in fact written an entire book about that. But I don’t agree that it’s fundamentally impossible to recover information about preferences from human behavior. Let me give the kind of straw man argument. So let’s take Garry Kasparov: chess player, was world champion in the 1990s, some people would argue the strongest chess player in history. You might think it’s obvious that he wanted to win the games that he played. And when he did win, he was smiling, jumping up and down, shaking his fists in triumph. And when he lost, he behaved in a very depressed way, he was angry with himself and so on.

Now it’s entirely possible logically that in fact he wanted to lose every single game that he played, but his decision making was so far from rational that even though he wanted to lose, he kept playing the best possible move. So he’s got this completely reversed set of goals and a completely reversed decision making process. So it looks on the outside as if he’s trying to win and he’s happy when he wins. But in fact, he’s trying to lose and he’s unhappy when he wins, but his attempt to appear unhappy again is reversed. So it looks on the outside like he’s really happy because he keeps doing the wrong things, so to speak.

This is an old idea in philosophy. Donald Davidson calls it radical interpretation: that from the outside, you can sort of flip all the bits and come up with an explanation that’s sort of the complete reverse of what any reasonable person would think the explanation to be. The problem with that approach is that it then takes away the meaning of the word “preference” altogether. For example, let’s take the situation where Kasparov can checkmate his opponent in one move, and it’s blatantly obvious and in fact, he’s taken a whole sequence of moves to get to that situation.

If in all such cases where there’s an obvious way to achieve the objective, he simply does something different, in other words, let’s say he resigns, so whenever he’s in a position with an obvious immediate win, he instantly resigns, then in what sense is it meaningful to say that Kasparov actually wants to win the game if he always resigns whenever he has a chance of winning?

You simply vitiate the entire meaning of the word “preference”. It’s just not correct to say that a person who always resigns whenever they have a chance of winning really wants to win games. You can then kind of work back from there. So by observing human behavior in situations where the decision is kind of an obvious one that doesn’t require a huge amount of calculation, then it’s reasonable to assume that the preferences are the ones that they reveal by choosing the obvious action. If you offer someone a lump of coal or a $1,000 bill and they choose a $1,000 bill, it’s unreasonable to say, “Oh, they really prefer the lump of coal, but they’re just really stupid, so they keep choosing the $1,000 bill.” That would just be daft. So in fact it’s quite natural that we’re able to gradually infer the preferences of imperfect entities, but we have to make some assumptions that we might call minimal rationality, which is that in cases where the choice is obvious, people will generally tend to make the obvious choice.
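The coal-versus-bill argument can be sketched as a small Bayesian calculation. This is an illustrative toy, not a method described in the interview: it assumes a logistic (Boltzmann-style) choice model in which a hypothetical parameter `beta` measures how reliably a person picks the option they value more, with negative `beta` standing for the "reversed" decision-maker of the radical interpretation. The hypothesis names, values, and the uniform prior are all assumptions made for the example.

```python
import math

def choice_prob(chosen_value, other_value, beta):
    """Probability of picking the chosen option over the other one.

    beta > 0: the person tends to pick the option they value more.
    beta < 0: the person tends to pick the option they value less.
    """
    return 1.0 / (1.0 + math.exp(-beta * (chosen_value - other_value)))

def posterior(hypotheses, observations):
    """Posterior over hypotheses under a uniform prior.

    hypotheses: dict mapping a name to (value_of_bill, value_of_coal, beta).
    observations: list of observed choices, each "bill" or "coal".
    """
    weights = {}
    for name, (v_bill, v_coal, beta) in hypotheses.items():
        likelihood = 1.0
        for obs in observations:
            if obs == "bill":
                likelihood *= choice_prob(v_bill, v_coal, beta)
            else:
                likelihood *= choice_prob(v_coal, v_bill, beta)
        weights[name] = likelihood
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

observations = ["bill"] * 5  # the person takes the $1,000 bill five times

# Radical interpretation allowed: decision-making may be completely reversed.
all_hypotheses = {
    "prefers bill, sensible": (1.0, 0.0, 3.0),
    "prefers coal, sensible": (0.0, 1.0, 3.0),
    "prefers bill, reversed": (1.0, 0.0, -3.0),
    "prefers coal, reversed": (0.0, 1.0, -3.0),
}
unrestricted = posterior(all_hypotheses, observations)

# Minimal rationality assumed: only hypotheses with beta > 0 are kept.
sensible_only = {k: v for k, v in all_hypotheses.items() if v[2] > 0}
minimal = posterior(sensible_only, observations)

print(unrestricted)
print(minimal)
```

In this toy model, "prefers bill, sensible" and "prefers coal, reversed" predict exactly the same behavior, so no amount of observation separates them and they end up with equal posterior weight. Once the reversed hypotheses are ruled out by the minimal rationality assumption, the same five observations point almost entirely to a preference for the bill.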

Lucas: I want to be careful here about not misrepresenting any of Stuart Armstrong’s ideas. I think this is in relation to the work Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents, if you’re familiar with that?

Stuart: Yeah.

Lucas: So then everything you said still suffices. Is that the case?

Stuart: I don’t think we radically disagree. I think maybe it’s a matter of emphasis. How important is it to observe the fact that there is this possibility of radical interpretation? It doesn’t worry me. Maybe it worries him, but it doesn’t worry me because we do a reasonably good job of inferring each other’s preferences all the time by just ascribing at least a minimum amount of rationality in human decision making behavior.

This is why economists, the way they try to elicit preferences, is by offering you direct choices. They say, “Here’s two pizzas. Are you going to have a bubblegum and pineapple pizza, or you can have ham and cheese pizza. Which one would you like?” And if you choose the ham and cheese pizza, they’ll infer that you prefer the ham and cheese pizza, and not the bubblegum and pineapple one, as seems pretty reasonable.

There may be real cases where there is genuine ambiguity about what’s driving human behavior. I am certainly not pretending that human cognition is no mystery; it still is largely a mystery. And I think for the long term, it’s going to be really important to try to unpack some of that mystery. To me, probably the biggest deviation from rationality that humans exhibit is the fact that our choices are always made in the context of a whole hierarchy of commitments that effectively put us into what’s usually a much, much smaller decision-making situation than the real problem. So the real problem is I’m alive, I’m in this enormous world, I’m going to live for a few more decades hopefully, and then my descendants will live for years after that and lots of other people in the world will live for a long time. So which actions do I do now?

And I could do anything. I could continue talking to you and recording this podcast. I could take out my phone and start trading stocks. I could go out on the street and start protesting climate change. I could set fire to the building and claim the insurance payment, and so on and so forth. I could do a gazillion things. Anything that’s logically possible I could do. And I continue to talk in the podcast because I’m existing in this whole network and hierarchy of commitments. I agreed that we would do the podcast, and why did I do that? Well, because you asked me, and because I’ve written the book and why did I write the book and so on.

So there’s a whole nested collection of commitments, and we do that because otherwise we couldn’t possibly manage to behave successfully in the real world at all. The real decision problem is not, what do I say next in this podcast? It’s what motor control commands do I send to my 600 odd muscles in order to optimize my payoff for the rest of time until the heat death of the universe? And that’s completely and utterly impossible to figure out.

I always, and we always, exist within what I think Savage called a small world decision problem. We are aware only of a small number of options. So if you want to understand human behavior, you have to understand what are the commitments and what is the hierarchy of activities in which that human is engaged. Because otherwise you might be wondering, well why isn’t Stuart taking out his phone and trading stocks? But that would be a silly thing to wonder. It’s reasonable to ask, well why is he answering the question that way and not the other way?

Lucas: And so “AI, please fetch the coffee,” also exists in such a hierarchy. And without the hierarchy, the request is missing much of the meaning that is required for the AI to successfully do the thing. So it’s like an inevitability that this hierarchy is required to do things that are meaningful for people.

Stuart: Yeah, I think that’s right. Requests are a very interesting special case of behavior, right? They’re just another kind of behavior. But up to now, we’ve interpreted them as defining the objective for the machine, which is clearly not the case. And people have recognized this for a long time. For example, my late colleague Bob Wilensky had a project called the Unix Consultant, which was a natural language system, and it was actually built as an agent, that would help you with Unix stuff, so managing files on your desktop and so on. You could ask it questions like, “Could you make some more space on my disk?”, and the system needs to know that rm *, which means “remove all files”, is probably not the right thing to do, that this request to make space on the disk is actually part of a larger plan that the user might have. And for that plan, most of the other files are required.

So a more appropriate response would be, “I found these backup files that have already been deleted. Should I empty them from the trash?”, or whatever it might be. So in almost no circumstances would a request be taken literally as defining the sole objective. If you asked for a cup of coffee, what happens if there’s no coffee? Perhaps it’s reasonable to bring a cup of tea or “Would you like a can of Coke instead?”, and not to … I think in the book I had the example that you stop at a gas station in the middle of the desert, 250 miles from the nearest town and they haven’t got any coffee. The right thing to do is not to trundle off across the desert and come back 10 days later with coffee from a nearby town. But instead to ask, well, “There isn’t any coffee. Would you like some tea or some Coca-Cola instead?”

This is very natural for humans, and in philosophy of language, my other late colleague Paul Grice was famous for pointing out that many statements, questions, requests, commands in language have this characteristic that they don’t really mean what they say. I mean, we all understand if someone says, “Can you pass the salt?”, the correct answer is not, “Yes, I am physically able to pass the salt.” He became an adjective, right? So we talk about Gricean analysis, where you don’t take the meaning literally, but you look at the context in which it was said and the motivations of the speaker and so on to infer what is a reasonable course of action when you hear that request.

Lucas: You’ve done a wonderful job so far painting the picture of the AI alignment problem and the solution that you offer, or at least the pivot that you’d like the community to take. So for laypersons who might not be involved or experts in AI research, plus the AI alignment community, plus potential researchers who might be brought in by this process or book, plus policymakers who may also listen to it, what’s at stake here? Why does this matter?

Stuart: I think AI, for most of its history, has been an interesting curiosity. It’s a fascinating problem, but as a technology it was woefully lacking. And it has found various niches where it’s useful, even before the current incarnation in terms of deep learning. But if we assume that progress will continue and that we will create machines with general purpose intelligence, that would be roughly speaking, the biggest event in human history.

History, our civilization, is just a consequence of the fact that we have intelligence, and if we had a lot more, it would be a radical step change in our civilization. If these were possible at all, it would enable other inventions that people have talked about as possibly the biggest event in human history, for example, creating the ability for people to live forever or much, much longer life span than we currently have, or creating the possibility for people to travel faster than light so that we could colonize the universe.

If those are possible, then they’re going to be much more possible with the help of AI. If there’s a solution to climate change, it’s going to be much more possible to solve climate change with the help of AI. It’s this fact that AI in the form of general purpose intelligence systems is this kind of über technology that makes it such a powerful development if and when it happens. So the upside is enormous. And then the downside is also enormous, because if you build things that are more intelligent than you, then you face this problem. You’ve made something that’s much more powerful than human beings, but somehow you’ve got to make sure that it never actually has any power. And it’s not completely obvious how to do that.

The last part of the book is a proposal for how we could do that, how you could change this notion of what we mean by an intelligent system so that rather than copying this sort of abstract human model, this idea of rationality, of decision making in the interest, in the pursuit of one’s own objectives, we have this other kind of system, this sort of coupled binary system where the machine is necessarily acting in the service of human preferences.

If we can do that, then we can reap the benefits of arbitrarily intelligent AI. Then as I said, the upside would be enormous. If we can’t do that, if we can’t solve this problem, then there are really two possibilities. One is that we need to curtail the development of artificial intelligence and for all the reasons that I just mentioned, it’s going to be very hard because the upside incentive is so enormous. It would be very hard to stop research and development in AI.

The other possibility is that we create general purpose, superhuman intelligent machines and we lose control of them, and they’re pursuing objectives that are ultimately mistaken objectives. There’s tons of science fiction stories that tell you what happens next, and none of them are desirable futures for the human race.

Lucas: Can you expand upon what you mean by if we’re successful in the control/alignment problem, what “tremendous” actually means? What actually are the conclusions or what is borne out of the process of generating an aligned super intelligence from that point on until heat death or whatever else?

Stuart: Assuming that we have a general purpose intelligence that is beneficial to humans, then you can think about it in two ways. I already mentioned the possibility that you’d be able to use that capability to solve problems that we find very difficult, such as eternal life, curing disease, solving the problem of climate change, solving the problem of faster than light travel and so on. You might think of these as sort of the science fiction-y upside benefits. But just in practical terms, when you think about the quality of life for most people on earth, let’s say it leaves something to be desired. And you say, “Okay, what would be a reasonable aspiration?”, and put it somewhere like the 90th percentile in the US. That would mean a ten-fold increase in GDP for the world if you brought everyone on earth up to what we call a reasonably nice standard of living by Western standards.

General purpose AI can do that in the following way, without all these science fiction inventions and so on. So just deploying the technologies and materials and processes that we already have in ways that are much, much more efficient and obviously much, much less labor intensive.

The reason that things cost a lot and the reason that people in poor countries can’t afford them … They can’t build bridges or lay railroad tracks or build hospitals because they’re really, really expensive and they haven’t yet developed the productive capacities to produce goods that could pay for all those things. The reason things are really, really expensive is because they have a very long chain of production in which human effort is involved at every stage. The money all goes to pay all those humans, whether it’s the scientists and engineers who designed the MRI machine or the people who worked on the production line or the people who worked mining the metals that go into making the MRI machine.

All the money is really paying for human time. If machines are doing every stage of the production process, then you take all of those costs out, and to some extent it becomes like a digital newspaper, in the sense that you can have as much of it as you want. It’s almost free to make new copies of a digital newspaper, and it would become almost free to produce the material goods and services that constitute a good quality of life for people. And at that point, arguing about who has more of it is like arguing about who has more digital copies of the newspaper. It becomes sort of pointless.

That has two benefits. One is everyone is relatively much better off, assuming that we can get politics and economics out of the way, and also there’s then much less incentive for people to go around starting wars and killing each other, because there isn’t this struggle which has sort of characterized most of human history. The struggle for power, wealth and access to resources and so on. There are other reasons people kill each other, religion being one of them, but it certainly I think would help if this source of competition and warfare were removed.

Lucas: These are very important short-term considerations and benefits from getting this control problem and this alignment problem correct. One thing that the superintelligence will hopefully also do is reduce existential risk to zero, right? And so if existential risk is reduced to zero, then basically what happens is the entire cosmic endowment, some hundreds of thousands of galaxies, become unlocked to us. Perhaps some fraction of it would have to be explored first in order to ensure existential risk is pretty close to zero. I find your arguments are pragmatic and helpful for the common person about why this is important.

For me personally, and why I’m passionate about AI alignment and existential risk issues, is that the reduction of existential risk to zero and having an aligned intelligence that’s capable of authentically spreading through the cosmic endowment, to me seems to potentially unlock a kind of transcendent object at the end of time, ultimately influenced by what we do here and now, which is directed and created by coming to better know what is good, and spreading that.

What I find so beautiful and important and meaningful about this problem in particular, and why your book is such important core reading for laypersons, for computer scientists, for just everyone, is that if we get this right, this universe can be maybe one of the universes in, perhaps, the multiverse, where something like the most beautiful thing physically possible could be made by us within the laws of physics. And that to me is extremely awe-inspiring.

Stuart: I think that human beings being the way they are, will probably find more ways to get it wrong. We’ll need more solutions for those problems and perhaps AI will help us solve other existential risks, and perhaps it won’t. The control problem I think is very important. There are a couple of other issues that I think we still need to be concerned with. Well, I don’t think we need to be concerned with all of them, but a couple of issues that I haven’t begun to address or solve … One of those is obviously the problem of misuse, that we may find ways to build beneficial AI systems that remain under control in a mathematically guaranteed way. And that’s great. But the problem of making sure that only those kinds of systems are ever built and used, that’s a different problem. That’s a problem about human motivation and human behavior, which I don’t really have a good solution to. It’s sort of like the malware problem, except much, much, much, much worse. If we do go ahead developing general purpose intelligence systems that are beneficial and so on, then parts of that technology, the general purpose intelligent capabilities, could be put into systems that are not beneficial, as it were, that don’t have a safety catch. And on that misuse problem, if you look at how well we’re doing with malware, you’d have to say more work needs to be done. We’re kind of totally failing to control malware, and the ability of people to inflict damage on others by uncontrolled software is getting worse. We need an international response and a policing response. Some people argue that, oh, it’s fine. The super intelligent AI that we build will make sure that other nefarious development efforts are nipped in the bud.

This doesn’t make me particularly confident. So I think that’s an issue. The third issue is, shall we say, enfeeblement. This is the notion that if we develop machines that are capable of running every aspect of our civilization, then that changes the dynamic that’s been in place since the beginning of human history or prehistory, which is that for our civilization to continue, we have had to pass on our knowledge and our skills to the next generation. People have to learn what it is that the human race knows, over and over again in every generation, just to keep things going. And if you add it all up, if you look, there’s about a hundred-odd billion people who’ve ever lived and they each spend about 10 years learning stuff on average. So that’s a trillion person-years of teaching and learning to keep our civilization going. And there’s a very good reason why we’ve done that, because without it, things would fall apart very quickly.

But that’s going to change now. We don’t have to put it into the heads of the next generation of humans. We can put it into the heads of the machines and they can take care of the civilization. And then you get this almost irreversible process of enfeeblement, where humans no longer know how their own civilization functions. They lose knowledge of science, of engineering, even of the humanities, of literature. If machines are writing books and producing movies, then we don’t even need to learn that. You see this in E. M. Forster’s story The Machine Stops, from 1909, which is a very prescient story about a civilization that becomes completely dependent on its own machines. Or if you like something more recent, in WALL-E the human race is on a sort of cruise ship in space, and they all become obese and stupid because the machines look after everything and all they do is consume and enjoy. And that’s not a future that I would want for the human race.

And arguably the machines should say, “This is not the future you want; tie your shoelaces,” but we are shortsighted. We may effectively override what the machines are telling us and say, “No, no, you have to tie my shoelaces for me.” So I think this is a problem that we have to think about. Again, this is a problem for infinity. Once you turn things over to the machines, it’s practically impossible, I think, to reverse that process. We have to keep our own human civilization going in perpetuity, and that requires a kind of cultural process that I don’t yet understand how it would work, exactly.

Because the effort involved in learning, let’s say going to medical school, it’s 15 years of school and then college and then medical school and then residency. It’s a huge effort. It’s a huge investment and at some point the incentive to undergo that process will disappear. And so something else other than… So at the moment it’s partly money, partly prestige, partly a desire to be someone who is in a position to help others. So somehow we’ve got to make our culture capable of maintaining that process indefinitely when many of the incentive structures that have kept it in place go away.

Lucas: This makes me wonder, from an evolutionary and cosmological perspective, about this sort of transition from humans being the most intelligent form of life on this planet to machine intelligence being the most intelligent form of life, and how that plays out in the very long term. We can do thought experiments where we imagine that monkeys had actually been creating humans, and ask what the role of the monkey would still be.

Stuart: Yep. But we should not be creating the machine analog of humans, i.e. autonomous entities pursuing their own objectives. So we’ve pursued our objectives pretty much at the expense of the monkeys and the gorillas and we should not be producing machines that play an analogous role. That would be a really dumb thing to do.

Lucas: That’s an interesting comparison because the objectives of the human are exogenous to the monkey and that’s the key issue that you point out. If the monkey had been clever and had been able to control evolution, then they would have made the human uncertain as to the monkey’s preferences and then had him optimize those.

Stuart: Yeah, I mean they could imagine creating a race of humans that were intelligent but completely subservient to the interests of the monkeys. Assuming that they solved the enfeeblement problem and the misuse problem, then they’d be pretty happy with the way things turned out. I don’t see any real alternative. So Samuel Butler in 1872 wrote a book about a society that faces the problem of superintelligent machines and they take the other solution, which is actually to stop. They see no alternative but to just ban the construction of intelligent machines altogether. In fact, they ban all machines and in Frank Herbert’s Dune, the same thing. They have a catastrophic war in which humanity just survives in its conflict with intelligent machines. And then from then on, all intelligent machines, in fact, all computers are banned altogether. I can’t see that that’s a plausible direction, but it could be that we decide at some point that we cannot solve the control problem or we can’t solve the misuse problem or we can’t solve the enfeeblement problem.

And we decide that it’s in our best interests to just not go down this path at all. To me that just doesn’t feel like a possible direction. Things can change if we start to see bigger catastrophes. I think the click-through catastrophe is already pretty big and it results from very, very simple-minded algorithms that know nothing about human cognition or politics or anything else. They’re not even explicitly trying to manipulate us. It’s just that’s what the code does in a very simple-minded way. So we could imagine bigger catastrophes happening that we survive by the skin of our teeth, as happened in Dune for example. And then that would change the way people think about the problem. And we see this over and over again with nuclear power, with fossil fuels and so on: by and large, technology is always seen as beneficial and more technology is therefore more beneficial.

And we push ahead, often ignoring the people who say “But, but, but what about this drawback? What about this drawback?” And maybe that’s starting to change with respect to fossil fuels. Several countries have now decided since Chernobyl and Fukushima to ban nuclear power, and the EU has much stronger restrictions on genetically modified foods than a lot of other countries, so there are pockets where people have pushed back against technological progress and said, “No, not all technology is good and not all uses of technology are good and so we need to exercise a choice.” But the benefits of AI are potentially so enormous. It’s going to take a lot to undo this forward progress.

Lucas: Yeah, absolutely. Whatever results from Earth-originating intelligent life at the end of time, that thing is up to us to create. I’m quoting you here, you say, “A compassionate and jubilant use of humanity’s cosmic endowment sounds wonderful, but we also have to reckon with the rapid rate of innovation in the malfeasance sector. Ill-intentioned people are thinking up new ways to misuse AI so quickly that this chapter is likely to be outdated even before it attains printed form. Think of it not as depressing reading, however, but as a call to act before it’s too late.”

Thinking about this and everything you just touched on, there’s obviously a ton for us to get right here, and it’s a question and problem for everyone in the human species to have a voice in.

Stuart: Yeah. I think we really need to start considering the possibility that there ought to be a law against it. For a long time the IT industry almost uniquely has operated in a completely unregulated way. The car industry for example, cars have to follow various kinds of design and safety rules. You have to have headlights and turn signals and brakes and so on. A car that’s designed in an unsafe way gets taken off the market, but software can do pretty much whatever it wants.

Every license agreement that you sign whenever you buy or use software tells you that it doesn’t matter what their software does. The manufacturer is not responsible for anything and so on. And I think it’s a good idea to actually take legislative steps, regulatory steps just to get comfortable with the idea that yes, I see we maybe do need regulation. San Francisco, for example, has banned the use of facial recognition in public or for policing. California has a ban on the impersonation of human beings by AI systems. I think that ban should be pretty much universal. But in California its primary area of applicability is in persuading people to vote in any particular direction in an election. So it’s a fairly narrow limitation. But when you think about it, why would you want to allow AI systems to impersonate human beings? In other words, the human who’s in conversation believes that they’re talking to another human being, and that they owe that other human being a whole raft of respect, politeness, all kinds of obligations that are involved in interacting with other humans.

But you don’t owe any of those things to an AI system. And so why should we allow people to effectively defraud humans by convincing them that in fact they’re engaged with another human when they aren’t? So I think it would be a good idea to just start things off with some basic common sense rules. For example, there’s the GDPR rule that says that you can’t use an algorithm to make a decision that has a significant legal effect on a person. So you can’t put them in jail simply as a result of an algorithm, for example. You can’t fire them from a job simply as a result of an algorithm. You can use the algorithm to advise, but a human has to be involved in the decision and the person has to be able to query the decision and ask for the reasons and in some sense have a right of appeal.

So these are common sense rules that almost everyone would agree with. And yet certainly in the U.S., there’s reluctance to put them into effect. And I think going forward, if we want to have safe AI systems, there’s at least going to be a role for regulations. There should also be standards, as in IEEE standards. There should also be professional codes of conduct. People should be trained in how to recognize potentially unsafe designs for AI systems, but there should, I think, be a role for regulation where at some point you would say, if you want to put an AI system on the internet, for example, just as if you want to put software into the app store, it has to pass a whole bunch of checks to make sure that it’s safe, to make sure that it won’t wreak havoc. So we’d better start thinking about that. I don’t know yet what that regulation should say, but we shouldn’t be in principle opposed to the idea that such regulations might exist at some point.

Lucas: I basically agree that these regulations should be implemented today, but they seem pretty temporary or transient as the uncertainty in the AI system for the humans’ objective function or utility function decreases. So they become more certain about what we want. At some point it becomes unethical to have human beings governing these processes instead of AI systems. Right? So if we have timelines from AI researchers that range from 50 to a hundred years for AGI, we could potentially see laws and regulations like this go up in the next five to 10 and then disappear again somewhere within the next hundred to 150 years max.

Stuart: That’s an interesting viewpoint. And I think we have to be a little careful because autonomy is part of our preference structure. So although one might say, okay, now who gets to run the government? Well, self-evidently it’s possible that machines could do a better job than the humans we currently have, but that would be better only in a narrow sense. Maybe it would reduce crime, maybe it would increase economic output, we’d have better health outcomes, people would be more educated than they would with humans making those decisions, but there would be a dramatic loss in autonomy. And autonomy is a significant part of our preference structure. And so it isn’t necessarily the case that the right solution is that machines should be running the government. And this is something that the machines themselves will presumably recognize and this is the reason why parents at some point tell the child, “No, you have to tie your own shoelaces.” Because they want the child to develop autonomy.

The same thing will be true. The machines want humans to retain autonomy. As I said earlier, with respect to enfeeblement, right? It’s this conflict between our long-term best interest and our short-termism in the choices that we tend to make. It’s always easier to say, “Oh no, I can’t be bothered to tie my shoelaces. Please could you do it?” But if you keep doing that, then the long-term consequences are bad. We have to understand how autonomy, which includes machines not making decisions, folds into our overall preference structure. And up to now there hasn’t been much of a choice, at least in the global sense. Of course it’s been humans making the decisions, although within any local context it’s only a subset of humans who are making the decisions and a lot of other people don’t have as much autonomy. To me, I think autonomy is a really important currency, and to the extent possible, everyone should have as much of it as possible.

Lucas: I think you really hit the nail on the head. The problem is where autonomy fits in the hierarchy of our preferences and meta preferences. For me, it seems more instrumental than being an end goal in itself. Now this is an empirical question across all people: where autonomy fits in their preference hierarchies, whether it’s like a terminal value or not, and whether under reflection and self-idealization our preferences distill into something else or not. Autonomy could possibly, but not necessarily, be an end goal; it may be that it simply provides utility for all of our other goals, because without autonomy we can’t act on what we think will best optimize our own preferences and end values. So definitely a lot of questions there. The structure of our preference hierarchy will certainly dictate, it seems, the long-term outcome of humanity and how enfeeblement unfolds.

Stuart: The danger would be that we misunderstand the entire nature of the human preference hierarchy. So sociologists and others have talked about the hierarchy of human needs in terms of food, shelter, physical security and so on. But they’ve always kind of assumed that you are a human being and therefore you’re the one deciding stuff. And so they tend not to think so much about fundamental properties of the ability to make your own choices for good or ill. And science fiction writers have had a field day with this. Pointing out that machines that do what you want are potentially disastrous because you lose the freedom of choice.

One could imagine that if we formulate things not quite right and the effect of the algorithms that we build is to make machines that don’t value autonomy in the right way or don’t have it folded into the overall preference structure in the right way, that we could end up with a subtle, gradual and very serious loss of autonomy in a way that we may not even notice as it happens. Like the slow boiling frog. If we could look ahead a hundred years and see how things turn out, we would say, “Oh my goodness, that is a terrible mistake. We’re going to make sure that that doesn’t happen.” So I think we need to be pretty careful. And again this is where we probably need the help of philosophers to make sure that we keep things straight and understand how these things fit together.

Lucas: Right, so seems like we simply don’t understand ourselves. We don’t know the hierarchy of our preferences. We don’t really know what preferences exactly are. Stuart Armstrong talks about how we haven’t figured out the symbol grounding problem. So there are issues with even understanding how preferences relate to one another ultimately and how the meaning there is born. And we’re building AI systems which will be more capable than us. Perhaps they will be conscious. You have a short subchapter I believe on that or at least on how you’re not going to talk about consciousness.

Stuart: Yeah. I have a paragraph saying I have nothing to say.

Lucas: So potentially these things will also be moral patients and we don’t know how to get them to do the things that we’re not entirely sure that we want them to do. So how would you differentiate this book from Superintelligence or Life 3.0 or other books on the AI alignment problem and superintelligence in this space?

Stuart: I think the two major differences are one, I believe that to understand this whole set of issues or even just to understand what’s happening with AI and what’s going to happen, you have to understand something about AI. And I think that Superintelligence and Life 3.0 are, to some extent, easier to grasp if you already understand quite a bit about AI. And if you don’t, then it’s quite difficult to get as much out of those books as is in there. I think they are full of interesting points and ideas, but those points and ideas are easier to get out if you understand AI. So I wanted people to understand AI, and to understand it not just as a technology, right? You could talk about how deep learning works, but that’s not the point. The point is really what is intelligence and how have we taken that qualitative understanding of what that means and turned it into this technical discipline where the standard model is machines that achieve fixed objectives.

And then the second major difference is that I’m proposing a solution for at least one of the big failure modes of AI. And as far as I can tell, that solution, I mean, it’s sort of mentioned in some ways in Superintelligence, I think the phrase there is normative uncertainty, but it has a slightly different connotation. And partly that’s because this approach of inverse reinforcement learning is something that we’ve actually worked on at Berkeley for a little over 20 years. It wasn’t invented for this purpose, but it happens to fit this purpose and then the approach of how we solve this problem is fleshed out in terms of understanding that it’s this coupled system between the human that has the preferences and the machine that’s trying to satisfy those preferences and doesn’t know what they are. So I think that part is different. That’s not really present in those other two books.

It certainly shares, I think, the desire to convince people that this is a serious issue. I think both Superintelligence and Life 3.0 do a good job of that. Superintelligence is sort of a bit more depressing. It does such a good job of convincing you that things can go south in so many ways that you almost despair. Life 3.0 is a bit more cheerful. And also I think Life 3.0 does a good job of asking you what you want the outcome to be. And obviously you don’t want it to be catastrophic outcomes where we’re all placed in concrete coffins with heroin drips, as Stuart Armstrong likes to put it.

But there are lots of other outcomes which are the ones you want. So I think that’s an interesting part of that book. And of course Max Tegmark, the author of Life 3.0, is a physicist. So he has lots of amazing stuff about the technologies of the future, which I don’t have so much. So those are the main differences, I think: wanting to convey the essence of intelligence, how that notion has developed, how it is really an integral part of our whole intellectual tradition and our technological society, how that model is fundamentally wrong, and what the new model is that we have to replace it with.

Lucas: Yeah, absolutely. I feel that you help to clarify intelligence for me, the history of intelligence from evolution up until modern computer science problems. I think that you really set the AI alignment problem up well resulting from there being intelligences and multi-agent scenarios, trying to do different things, and then you suggest a solution, which we’ve discussed here already. So thanks so much for coming on the podcast, Stuart, your book is set for release on October 8th?

Stuart: That’s correct.

Lucas: Great. We’ll include links for that in the description. Thanks so much for coming on.

If you enjoyed this podcast, please subscribe. Give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI alignment series.

End of recorded material

FLI Podcast: Feeding Everyone in a Global Catastrophe with Dave Denkenberger & Joshua Pearce

Most of us working on catastrophic and existential threats focus on trying to prevent them — not on figuring out how to survive the aftermath. But what if, despite everyone’s best efforts, humanity does undergo such a catastrophe? This month’s podcast is all about what we can do in the present to ensure humanity’s survival in a future worst-case scenario. Ariel is joined by Dave Denkenberger and Joshua Pearce, co-authors of the book Feeding Everyone No Matter What, who explain what would constitute a catastrophic event, what it would take to feed the global population, and how their research could help address world hunger today. They also discuss infrastructural preparations, appropriate technology, and why it’s worth investing in these efforts.

Topics discussed include:

  • Causes of global catastrophe
  • Planning for catastrophic events
  • Getting governments onboard
  • Application to current crises
  • Alternative food sources
  • Historical precedent for societal collapse
  • Appropriate technology
  • Hardwired optimism
  • Surprising things that could save lives
  • Climate change and adaptation
  • Moral hazards
  • Why it’s in the best interest of the global wealthy to make food more available

References discussed include:

You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

Ariel Conn: In a world of people who worry about catastrophic threats to humanity, most efforts are geared toward preventing catastrophic threats. But what happens if something does go catastrophically wrong? How can we ensure that things don’t spiral out of control, but instead, humanity is set up to save as many lives as possible, and return to a stable, thriving state, as soon as possible? I’m Ariel Conn, and on this month’s episode of the FLI podcast, I’m speaking with Dave Denkenberger and Joshua Pearce.

Dave and Joshua want to make sure that if a catastrophic event occurs, then at the very least, all of the survivors around the planet will be able to continue eating. Dave got his Master’s from Princeton in mechanical and aerospace engineering, and his PhD from the University of Colorado at Boulder in building engineering. His dissertation was on his patented heat exchanger. He is an assistant professor at University of Alaska Fairbanks in mechanical engineering. He co-founded and directs the Alliance to Feed the Earth in Disasters, also known as ALLFED, and he donates half his income to that. He received the National Science Foundation Graduate Research Fellowship. He is a Penn State distinguished alumnus and he is a registered professional engineer. He has authored 56 publications with over 1600 citations and over 50,000 downloads — including the book Feeding Everyone No Matter What, which he co-authored with Joshua — and his work has been featured in over 20 countries, over 200 articles, including Science.

Joshua received his PhD in materials engineering from the Pennsylvania State University. He then developed the first sustainability program in the Pennsylvania State system of higher education and helped develop the Applied Sustainability Graduate Engineering Program while at Queen’s University, Canada. He is currently the Richard Witte Professor of Materials Science and Engineering and a professor cross-appointed in the Department of Materials Science and Engineering and the Department of Electrical and Computer Engineering at Michigan Technological University, where he runs the Open Sustainability Technology research group. He was a Fulbright-Aalto University Distinguished Chair last year and remains a visiting professor of photovoltaics and nano-engineering at Aalto University. He’s also a visiting professor at the University of Lorraine in France. His research concentrates on the use of open source appropriate technology to find collaborative solutions to problems in sustainability and poverty reduction. He has authored over 250 publications, which have earned more than 11,000 citations. You can find his work on appropedia.org, and his research is regularly covered by the international and national press and continually ranks in the top 0.1% on academia.edu. He helped found the field of alternative food for global catastrophes with Dave, and again he was co-author on the book Feeding Everyone No Matter What.

So Dave and Joshua, thank you so much for joining us this month.

Dave Denkenberger: Thank you.

Joshua Pearce: Thank you for having us.

Ariel Conn: My first question for the two of you is a two-part question. First, why did you decide to consider how to survive a disaster rather than focusing on prevention, as so many other people do? And second, how did you two start working together on this topic?

Joshua Pearce: So, I’ll take a first crack at this. Both of us have worked in the area of prevention, particularly in regards to alternative energy sources in order to be able to mitigate climate destabilization from fossil fuel burning. But what we both came to realize is that many of the disasters that we look at that could actually wipe out humanity aren’t things that we can necessarily do anything to avoid. On the ones that we can do something about, climate change and nuclear winter, we’ve even worked together.

So for example, we did a study where we looked at how many nuclear weapons a state should have if they would continue to be rational. And by rational I mean even if everything were to go your way, if you shot all of your nuclear weapons, they all hit their targets, the people you were aiming at weren’t firing back at you, at what point would just the effects of firing that many weapons hurt your own society, possibly kill many of your own people, or destroy your own nation?

The answer to that turned out to be a really remarkably low number. The answer was 100. And many of the nuclear power states currently have more weapons than that. And so it’s clear at least from our current political system that we’re not behaving rationally and that there’s a real need to have a backup plan for humanity in case something does go wrong — whether it’s our fault, or whether it’s just something that happens in nature that we can’t control like a super volcano or an asteroid impact.

Dave Denkenberger: Even though there is more focus on preventing a catastrophe than there is on resilience to the catastrophe, overall the field is highly neglected. As someone pointed out, there are still more publications on dung beetles than there are on preventing or dealing with global catastrophic risks. But I would say that the particular sub-field of resilience to the catastrophes is even more neglected. That’s why I think it’s a high priority to investigate.

Joshua Pearce: We actually met way back as undergraduate students at Penn State. I was a chemistry and physics double major and one of my friends a year above said, “You have to take an engineering science class before you leave.” It changed his life. I signed up for this class taught by the man that eventually became my advisor, Christopher Wronski, and it was a brutal class — very difficult conceptually and mathematically. And I remember when one of my first tests came back, there was this bimodal distribution where there were two students who scored A’s and everybody else failed. Turned out that the two students were Dave and I, so we started working together then just on homework assignments, and then continued collaborating through all different areas of technical experiments and theory for years and years. And then Dave had this very interesting idea about what do we do in the event of a global catastrophe? How can we feed everybody? And to attack it as an engineering problem, rather than a social problem. We started working on it very aggressively.

Dave Denkenberger: So it’s been, I guess, 18 years now that we’ve been working together: a very fruitful collaboration.

Ariel Conn: Before I get any farther into the interview, let’s quickly define what a catastrophic event is and the types of catastrophic events that you both look at most.

Dave Denkenberger: The original focus was on the catastrophes that could collapse global agriculture. These would include nuclear winter from a full-scale nuclear war like US-Russia, causing burning of cities and blocking of the sun with smoke, but it could also mean a super volcanic eruption like the one that happened about 74,000 years ago that many think nearly wiped out the human species. And then there could also be a large asteroid impact similar to the one that wiped out the dinosaurs about 66 million years ago.

And in those cases, it’s very clear we need to have some other alternative source of food, but we also look at what I call the 10% global shortfalls. These are things like the volcanic eruption that caused the Year Without a Summer in 1816, which might have reduced food supply by about 10% and caused widespread famine, including in Europe and almost in the US. Then it could be a slightly smaller asteroid, or a regional nuclear war, and actually many other catastrophes, such as a super weed, a plant that could out-compete crops. If this happened naturally, it probably would be slow enough that we could respond, but if it were part of a coordinated terrorist attack, that could be catastrophic. Even though technically we waste more than 10% of our food and we feed more than 10% of our food to animals, I think realistically, if we had a 10% food shortfall, the price of food would go so high that hundreds of millions of people could starve.

Joshua Pearce: Something that’s really important to understand about the way that we analyze these risks is that currently, even with the agricultural system completely working fine, we’ve got somewhere on the order of 800 million people without enough food to eat, because of waste and inefficiencies. And so anything that starts to cut into our ability for our agricultural system to continue, especially if all of plant life no longer works for a number of years because of the sun being blocked, we have to have some method to provide alternative foods to feed the bulk of the human population.

Ariel Conn: I think that ties in to the next question then, and that is what does it mean to feed everyone no matter what, as you say in the title of your book?

Dave Denkenberger: As Joshua pointed out, we are still not feeding everyone adequately right now. The idea of feeding everyone no matter what is an aspirational goal, and it’s showing that if we cooperated, we could actually feed everyone, even if the sun is blocked. Of course, it might not work out exactly like that, but we think that we can do much better than if we were not prepared for one of these catastrophes.

Joshua Pearce: Right. Today, roughly one in nine people go to bed hungry every night, and somewhere on the order of 25,000 people starve to death or die from hunger-related disease [per day]. And so one of the inspiring things from our initial analysis drawn up in the book is that even in the worst-case scenarios where something major happens, like a comet strike of the kind that wiped out the dinosaurs, humans don’t need to be wiped out: we could provide for ourselves. And the embarrassing thing is that today, even with the agricultural system working fine, we’re not able to do that. And so what I’m at least hoping is that some of our work on these alternative foods provides another mechanism to provide low-cost calories for the people that need them, even today when there is no catastrophe.

Dave Denkenberger: One of the technologies that we think could be useful even now comes from a company called Comet Bio, which is turning agricultural residues like leaves and stalks into edible sugar, and they think that’s actually going to be able to compete with sugar cane. It has the advantage of not taking up lots of land that we might be cutting the rainforest down for, so it has environmental benefits as well as humanitarian benefits. Another area that I think would be relevant is smaller disasters, such as an earthquake or a hurricane. Generally the cheapest solution is just shipping in grain from outside, but if transportation is disrupted, it might make sense to be able to produce some food locally. If a hurricane blows all the crops down and you’re not going to be able to get any normal harvest from them, you can actually grind up the leaves, such as wheat leaves, squeeze out the liquid, and boil it, and then you get a protein concentrate that people can eat.

Ariel Conn: So that’s definitely a question that I had, and that is to what extent can we start implementing some of the plans today during a disaster? This is a pre-recorded podcast; Dorian has just struck the Bahamas. Can the stuff that you are working on now help people who are still stuck on an island after it’s been ravaged by a hurricane?

Dave Denkenberger: I think there is potential for that, for getting food from leaves. There’s actually a non-profit organization called Leaf for Life that has been doing this in less developed countries for decades now. Another possibility is mushrooms: some can mature in just a few weeks, and they can grow on waste, basically.

Joshua Pearce: The ones that would be good for an immediate catastrophe are the in-between foods that we’re working on: the ones for the period between the time that you run out of stored food and the time that you can ramp up the full-scale alternative foods.

Ariel Conn: Can you elaborate on that a little bit more and explain what that process would look like? What happens right after the disaster strikes? And what does it look like to start ramping up food development in a couple weeks or a couple months or however long that takes?

Joshua Pearce: In the book we develop 10 primary pathways to develop alternative food sources that could feed the entire global population. But the big challenge is that it’s not just whether there are enough calories; you have to have enough calories at the right time.

If, say, a comet strikes tomorrow and throws up a huge amount of earth and ash and covers the sun, we’d have roughly six months of stored food in grocery stores and pantries that we could use to eat. But then for most of the major sources of alternative food, it would take around a year to ramp them up, to take these processes that might not even exist now and get them to industrial scale to feed billions of people. So the most challenging part is that six-month-to-one-year period, and for that we would be using the alternative foods that Dave talked about: the mushrooms that can grow really fast, and leaves. For the leaf option, part of those leaves can come from agricultural residues, things that we already know are safe.

The much larger biomass that we might be able to use is just normal killed tree leaves. The only problem with that is that there hasn’t been really any research into whether or not that’s safe. We don’t know, for example, if you can eat maple or oak leaf concentrate. The studies haven’t been done yet. And that’s one of the areas that we’re really focusing on now, is to take some of these ideas that are promising and prove that they’re actually technically feasible and safe for people to use in the event of a serious catastrophe, a minor one, or just being able to feed people that for whatever reason don’t have enough food.

Dave Denkenberger: I would add that even though we might have six months of stored food, that would be a best-case scenario when we’ve just had the harvest in the northern hemisphere; at other times we could have only two or three months of stored food. But in many of these catastrophes, even a pretty severe nuclear winter, there’s likely to be some sunlight still coming down to the earth, and so a recent project we’ve been working on is growing seaweed. This has a lot of advantages because seaweed can tolerate low light levels, the ocean would not cool as fast as the land, and it grows very quickly. So we’ve actually been applying seaweed growth models to the conditions of nuclear winter.
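To make the timing argument concrete, here is a minimal sketch of the arithmetic, using only the rough figures mentioned in this conversation (two to six months of stored food, and around a year to ramp up the main alternative foods). The function name and the assumption that scale-up starts on the day of the catastrophe are illustrative and are not taken from the book.

```python
def food_gap_months(stored_months: float, ramp_up_months: float = 12.0) -> float:
    """Months between stored food running out and large-scale alternative foods arriving.

    Assumption (hypothetical): ramp-up begins the day the catastrophe strikes.
    """
    return max(0.0, ramp_up_months - stored_months)


# Rough stored-food figures quoted in the conversation: 2, 3, or 6 months.
for stored in (2, 3, 6):
    gap = food_gap_months(stored)
    print(f"{stored} months of stored food leaves a {gap:.0f}-month gap to bridge")
```

Under these assumptions the gap is somewhere between six and ten months, which is the window the fast-growing foods such as mushrooms, leaf concentrate, and seaweed would have to cover.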

Ariel Conn: You talk about the food that we have stored being able to last for two to six months. How much transportation is involved in that? And how much transportation would we have, given different scenarios? I’ve heard that the town I’m in now, if it gets blocked off by a big snow storm, we have about two weeks of food. So I’m curious: How does that apply elsewhere? And are we worried about transportation being cut off, or do we think that transportation will still be possible?

Dave Denkenberger: Certainly there will be destruction of infrastructure regionally, whether it’s nuclear war or a super volcano or asteroid impact. So in those affected countries, transportation of food is going to be very challenging, but most of the people would not be in those countries. That’s why we think that there’s still going to be a lot of infrastructure still functioning. There are still going to be chemical factories that we can retrofit to turn leaves into sugar, or another one of the technologies is turning natural gas into single-cell protein.

Ariel Conn: There’s the issue of developing agriculture if the sun is blocked, which is one of the things that you guys are working on, and that can happen with nuclear war leading to nuclear winter; it can happen with the super volcano, with the asteroid. Let’s go a little more in depth and into what happens with these catastrophic events that block the sun. What happens with them? Why are they so devastating?

Joshua Pearce: All the past literature on what would happen if, say, we lost agriculture for a number of years is pretty grim. The base assumption is that everyone would simply starve to death, and there might be some fighting before that happens. When you look at what would happen based on previous knowledge of generating food in traditional ways, those were the right answers. And so, what we’re calling catastrophic events covers not only the most extreme ones, the sun-blocking ones, but also ones that are maybe a little less tragic but still very detrimental to the agricultural system, something like a number of planned terrorist events to wipe out the major bread baskets of the world. Again, the idea is the same: you’re reducing the number of calories available to the entire population, and our work is trying to ensure that we can still feed everyone.

Dave Denkenberger: We wrote a paper on a scenario in which chaos did not break out, and there was still trade between countries, sharing of information, and a global price of food. In that case, with stored food, there might be around 10% of people surviving. It could be much worse, though. As Joshua pointed out, if the food were distributed equally, then everyone would starve. People have also pointed out that in civilization we have food storage, so some people could survive. But if there’s a loss of civilization through the catastrophe and we have to go back to being hunter-gatherers, there are two problems. First, the hunter-gatherers that we still have now generally don’t have food storage, so they would not survive. Second, there’s a recent book called The Secret of Our Success that argues that it might not be as easy as we think to go back to being hunter-gatherers.

So that is another failure mode where it could actually cause human extinction. But then even if we don’t have extinction, if we have a collapse of civilization, there are many reasons why we might not be able to recover civilization. We’ve had a stable climate for the last 10,000 years; that might not continue. We’ve already used up the easily accessible fossil fuels, so we wouldn’t have them to rebuild industrial civilization. And thinking about the original definition of civilization, being able to cooperate with people who are not related to you, like outside your tribe: maybe the trauma of the catastrophe could make the remaining humans less open to trusting people, and maybe we would not recover that civilization. And then I would say even if we don’t lose civilization, the trauma of the catastrophe could make other catastrophes more likely.

One that people are concerned about is global totalitarianism. We’ve had totalitarian states in the past, but they’ve generally been out-competed by other, freer societies. But if it were a global totalitarianism, then there would be no competition, and that might be a stable state that we could be stuck in. And then even if we don’t go that route, the trauma from the catastrophe could cause worse values that end up in artificial intelligence that could define our future. And I would say even on these catastrophes that are slightly less extreme, the 10% food shortfalls, we don’t know what would happen after that. Tensions would be high; this could end up in full-scale nuclear war, and then some of these really extreme scenarios occurring.

Ariel Conn: What’s the historical precedent that we’ve got to work with in terms of trying to figure out how humanity would respond?

Dave Denkenberger: There have been localized collapses of society, and Jared Diamond has cataloged a lot of these in his book Collapse, but you can argue that there have even been more global collapse scenarios. Jeffrey Ladish has been looking at some collapses and catastrophes historically. The Black Death, for example, had very high mortality but did not result in a collapse of economic production in Europe, but other collapses actually have occurred. There’s enough uncertainty to say that collapse is possible and that we might not recover from it.

Ariel Conn: A lot of this is about food production, but I think you guys have also done work on instances in which maybe it’s easier to produce food but other resources have been destroyed. So for example, a solar flare, a solar storm knocks out our electric grid. How do we address that?

Joshua Pearce: In the event that a solar flare wipes out the electricity grid and most non-shielded electrical devices, that would be another scenario where we might legitimately lose civilization. There’s been a lot of work in the electrical engineering community on how we might shield things and harden them, but one of the things that we can absolutely do, at least on the electricity side, is start to go from our centralized grid infrastructure into a more decentralized method of producing and consuming electricity. The idea here would be that the grid would break down into a federation of micro-grids, and the micro-grids could be as small as even your own house, where you, say, have solar panels on your roof producing electricity that would charge a small battery, and then when those two sources of power don’t provide enough, you have a backup generator, a co-generation system.

And a lot of the work my group has done has shown that in the United States, those types of systems are already economic. Pretty much everywhere in the US now, if you have exposure to sunshine, you can produce electricity less expensively than you buy it from the grid. If you add in the backup generator, the backup co-gen — in many places, particularly in the northern part of the US, that’s necessary in order to provide yourself with power — that again makes you more secure. And in the event of some of these catastrophes that we’re looking at, now the ones that block the sun, the solar won’t be particularly useful, but what solar does do is preserve our fossil fuels for use in the event of a catastrophe. And if you are truly insular, in that you’re able to produce all of your own power, then you have a backup generator of some kind and fuel storage onsite.

In the context of providing some resiliency for the overall civilization, many of the technical paths that we’re on now, at least electrically, are moving us in that direction anyway. Solar and wind power are both the fastest growing sources of electricity generation both in the US and globally, and their costs now are so competitive that we’re seeing that accelerate much faster than anyone predicted.
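The household micro-grid described above (rooftop solar first, then a small battery, then a backup generator when those two fall short) can be sketched as a simple priority order. This is only an illustrative sketch: the function name, the parameter names, and the kWh figures are hypothetical, and it ignores battery charging from surplus solar, conversion losses, and fuel limits.

```python
def dispatch(load_kwh: float, solar_kwh: float,
             battery_kwh: float, generator_max_kwh: float) -> dict:
    """Serve one time step's load from solar, then battery, then backup generator."""
    # 1. Use whatever solar output is available first.
    from_solar = min(load_kwh, solar_kwh)
    remaining = load_kwh - from_solar

    # 2. Draw the shortfall from the battery, up to its stored energy.
    from_battery = min(remaining, battery_kwh)
    remaining -= from_battery

    # 3. Run the backup generator only for what is still unserved.
    from_generator = min(remaining, generator_max_kwh)
    unmet = remaining - from_generator

    return {"solar": from_solar, "battery": from_battery,
            "generator": from_generator, "unmet": unmet}


# Hypothetical evening hour: 5 kWh of demand, little sun, a partly charged battery.
print(dispatch(load_kwh=5.0, solar_kwh=2.0, battery_kwh=1.5, generator_max_kwh=4.0))
```

In a sun-blocking catastrophe the solar term would drop toward zero, and the same ordering shows why onsite fuel storage for the generator becomes the limiting factor.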

Dave Denkenberger: It is true that a solar flare would generally only affect the large grid systems. In 1859 there was the Carrington event that basically destroyed our telegraph systems, which was all we had at the time. But then we also had a near miss with a solar flare in 2012, so the world almost did end in 2012. And there’s evidence that in the first millennium AD there were even larger solar storms, of a size that could disrupt electricity globally. But there are other ways that electricity could be disrupted. One of those is the high altitude detonation of a nuclear weapon, producing an electromagnetic pulse or an EMP. If this were done in multiple places around the world, that could disrupt electricity globally, and the problem with that is it could affect even smaller systems. Then there’s also the coordinated cyber attack, which could be led by a narrow artificial intelligence computer virus, and then anything connected to the internet would be vulnerable, basically.

In these scenarios, at least the sun would still be shining. But we wouldn’t have our tractors, because basically everything is dependent on electricity, like pulling fossil fuels out of the ground, and we also wouldn’t have our industrial fertilizers. And so the assumption is as well that most people would die, because the reason we can feed more than seven billion people is because of the industry we’ve developed. People have also talked about, well, let’s harden the grid to EMP, but that would cost something like $100 billion.

So what we’ve been looking at is inexpensive ways of getting prepared if there is a loss of electricity. One of those is: can we quickly make farming implements that would work by hand or by animal power? And even though only a very small percent of our total land area is being plowed by draft animals, we still actually have a lot of cows that are raised for food, not as draft animals. It would actually be feasible to do that.

But if we lost electricity, we’d lose communications. We have a short wave radio, or ham radio, expert on our team who’s been doing this for 58 years, and he’s estimated that for something like five million dollars, we could actually have a backup communication system, and then we would also need to have a backup power system, which would likely be solar cells. But we would need to have this system not plugged into the grid, because if it’s plugged in, it would likely get destroyed by the EMP.

Joshua Pearce: And this gets into that area of appropriate technology and open source appropriate technology that we’ve done a lot of work on. And the idea basically is that the plans for something like a solar powered ham radio station that would be used as a backup communication system, those plans need to be developed now and shared globally so that everyone, no matter where they happen to be, can start to implement these basic safety precautions now. We’re trying to do that for all the tools that we’re implementing, sharing them on sites like Appropedia.org, which is an appropriate technology wiki that already is trying to help small-scale farmers in the developing world now lift themselves out of poverty by applying science and technologies that we already know about that are generally small-scale, low-cost, and not terribly sophisticated. And so there’s many things as an overall global society that we understand much better how to do now that if you just share a little bit of information in the right way, you can help people — both today but also in the event of a catastrophe.

Dave Denkenberger: And I think that’s critical: that if one of these catastrophes happened and people realized that most people were going to die, I’m very worried that there would be chaos, potentially within countries, and then also between countries. But if people realized that we could actually feed everyone if we cooperated, then I think we have a much better chance of cooperating, so you could think of this actually as a peace project.

Ariel Conn: One of the criticisms that I’ve heard, which honestly I think is a little strange, is the idea that we don’t need to worry about alternative foods now because if a catastrophe strikes, then we’ll be motivated to develop these alternative food systems.

I was curious if you guys have estimates of how much of a time difference you think would exist between us having a plan for how we would feed people if these disasters do strike versus us realizing the disaster has struck and now we need to figure something out, and how long it would take us to figure something out? That second part of the question is both in situations where people are cooperating and also in situations where people are not cooperating.

Dave Denkenberger: I think that if you don’t have chaos, the big problem is that yes, people would be able to put lots of money into developing food sources, but there are some things that take a certain amount of calendar time, like testing out different diets for animals or building pilot factories for food production. You generally need to test these things out before you build the large factories. I don’t have a quantitative estimate, but I do think it would delay things by many months; and as we said, we only have a few months of food storage, so I do think that a delay would cost many lives and could result in the collapse of civilization that could have been prevented if we were actually prepared ahead of time.

Joshua Pearce: I think the Boy Scouts are right on this. You should always be prepared. If you think about just something like the number of types of leaves that would need to be tested, if we get a head start on it in order to determine toxicity as well as the nutrients that could come from them, we’ll be much, much better off in the event of a catastrophe, whether or not we’re working together. And in the cases where we’re not working together, having this knowledge built up within the population and spread out makes it much more likely that overall humanity will survive.

Ariel Conn: What, roughly, does it cost to plan ahead: to do this research and to get systems and organization in place so that we can feed people if a disaster strikes?

Dave Denkenberger: On the order of $100 million. We think that would fund a lot of research to figure out what the most promising food sources are, and also interventions for handling the loss of electricity and industry, and then also development of the most promising food sources at actual pilot scale, and funding a backup communications system, and then also working with countries, corporations, and international organizations to actually have response plans for how we would respond quickly in a catastrophe. It’s really a very small amount of money compared to the benefit, in terms of how many lives we could save and preserving civilization.

Joshua Pearce: All this money doesn’t have to come at once, and some of the issues of alternative foods are being funded in other ways. There already are, for example, chemical engineering plants being looked at to be turned into food supply factories. That work is already ongoing. What Dave is talking about is combining all the efforts that are already existing and what ALLFED is trying to do, in order to be able to provide a very good, solid backup plan for society.

Ariel Conn: So Joshua, you mentioned ALLFED, and I think now is a good time to transition to that. Can you guys explain what ALLFED is?

Dave Denkenberger: The Alliance to Feed the Earth in Disasters, or ALLFED, is a non-profit organization that I helped to co-found, and our goal is to build an alliance with interested stakeholders to do this research on alternate food sources, develop the sources, and then also develop these response plans.

Ariel Conn: I’ll also add a quick disclosure that I also do work with ALLFED, so I don’t know if people will care, but there that is. So what are some of the challenges you’ve faced so far in trying to implement these solutions?

Dave Denkenberger: I would say a big challenge, a surprise that came to me, is that when we started talking to international organizations and countries, no one appeared to have a plan for what would happen. Of course you hear about the continuity of government plans, and bunkers, but there doesn’t seem to be a plan for actually keeping most people alive. And this doesn’t apply just to the sun-blocking catastrophes; it also applies to the 10% shortfalls.

There was a UK government study that estimated that extreme weather on multiple continents, like flooding and droughts, has something like an 80% chance of happening this century that would actually reduce the food supply by 10%. And yet no one has a plan of how they would react. It’s been a challenge for people to actually take this seriously.

Joshua Pearce: I think that goes back to the devaluation of human life, where we’re not taking seriously the thousands of people that, say, starve to death today and we’re not actively trying to solve that problem when from a financial standpoint, it’s trivial based on the total economic output of the globe; from a technical standpoint, it’s ridiculously easy; but we don’t have the social infrastructure in place in order to just be able to feed everyone now and be able to meet the basic needs of humanity. What we’re proposing is to prepare for a catastrophe in order to be able to feed everybody: that actually is pretty radical.

Initially, I think when we got started, overcoming the view that this was a radical departure from the types of research that would normally be funded was challenging. But I think now existential risk just as a field is growing and maturing, and because many of the technologies in the alternative food sector that we’ve looked at have direct applications today, it’s being seen as less and less radical. In the popular media, though, for example, they’d be happier for us to talk about how we could turn rotting wood into beetles and then eat beetles than to actually look at concrete plans to implement it and do the research that needs to be done in order to make sure that that is the right path.

Ariel Conn: Do you think people also struggle with the idea that these disasters will even happen? That there’s that issue of people not being able to recognize the risks?

Joshua Pearce: It’s very hard to comprehend. You may have your family and your friends; it’s hard to imagine a really large catastrophe. But these have happened throughout history, both at the global scale but even just something like a world war has happened multiple times in the last century. We’re, I think, hardwired to be a little bit optimistic about these things, and no one wants to see any of this happen, but that doesn’t mean that it’s a good idea to put our head in the sand. And even though it’s a relatively low probability event, say the case of an all-out nuclear war, something on the order of one percent, it still is there. And as we’ve seen in recent history, even some of the countries that we think of as stable aren’t really necessarily stable.

And so currently we have thousands of nuclear warheads, and it only takes a tiny fraction of them in order to be able to push us into one of these global catastrophic scenarios. Whether that’s an accident or one crazy government actor or a legitimate small-scale war, say an India and a Pakistan that pull out the nuclear weapons, these are things that we should be preparing for.

In the beginning it was a little bit more difficult to have people consider them, but now it’s becoming more and more mainstream. Many of our publications and ALLFED publications and collaborators are pushing into the mainstream of the literature.

Dave Denkenberger: I would say even though the probability each year is relatively low, it certainly adds up over time, and we’re eventually going to have at least some natural disaster like a volcano. But people have said, “Well, it might not occur in my lifetime, so if I work on this or if I donate to it, my money might be wasted” — and I said, “Well, do you consider if you pay for insurance and don’t get anything out of it in a year, your money is wasted?” “No.” So basically I think of this as an insurance policy for civilization.
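The point that a low yearly probability "adds up over time" can be made concrete with a short calculation. The sketch below is illustrative only: it assumes a constant 1% chance per year (borrowing the "one percent" figure mentioned above, which may not have been meant as an annual rate) and assumes that years are independent. Neither assumption comes from the speakers' published estimates.

```python
def cumulative_probability(annual_probability: float, years: int) -> float:
    """Chance of at least one event over `years`, assuming each year is
    independent and has the same probability (a simplifying assumption)."""
    return 1.0 - (1.0 - annual_probability) ** years


if __name__ == "__main__":
    ASSUMED_ANNUAL_PROBABILITY = 0.01  # illustrative assumption, not a sourced estimate
    for years in (1, 10, 50, 100):
        chance = cumulative_probability(ASSUMED_ANNUAL_PROBABILITY, years)
        print(f"{years:>3} years: {chance:.1%}")
```

Under these assumptions the chance of at least one event is roughly 10% over a decade and above 60% over a century, which is the sense in which the insurance analogy applies.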

Ariel Conn: In your research, personally for you, what are some of the interesting things that you found that you think could actually save a lot of lives that you hadn’t expected?

Dave Denkenberger: I think one particularly promising one is the turning of natural gas into single-cell protein, and fortunately, there are actually two companies that are doing this right now. They are focusing on stranded natural gas, which means too far away from a market, and they’re actually producing this as fish food and other animal feed.

Joshua Pearce: For me, living up here in the Upper Peninsula of Michigan where we’re surrounded by trees, I can’t help but look out my window and look at all the potential biomass that could actually be a food source. If it turns out that we can get even a small fraction of that into human edible food, I think that could really shift the balance in providing food, both now and in the case of a disaster.

Dave Denkenberger: One interesting thing coming to Alaska is I’ve learned about the Aleutian Islands that stick out into the Pacific. They are very cloudy. It is so cool in the summer that they cannot even grow trees. They also don’t get very much rain. The conditions there are actually fairly similar to nuclear winter in the tropics; and yet, they can grow potatoes. So lately I’ve become more optimistic that we might be able to do some agriculture near the equator where it would not freeze, even in nuclear winter.

Ariel Conn: I want to switch gears a little bit. We’ve been talking about disasters that would be relatively immediate, but one of the threats that we’re trying to figure out how to deal with now is climate change. And I was wondering how efforts that you’re both putting into alternative foods could help as we try to figure out how to adapt to climate change.

Joshua Pearce: I think a lot of the work that we’re doing has a dual use. Because we are trying to squeeze every last calorie we can out of primarily fossil fuel sources and trees and leaves, by using those same techniques in the ongoing disaster of climate change, we can hopefully feed more people. And so that’s things like growing mushrooms on partially decomposed wood, eating the mushrooms, but then feeding the leftovers to, say, ruminants or chickens, and then eating those. There’s a lot of industrial ecology practices we can apply to the agricultural food system so that we can get every last calorie out of our primary inputs. So that I think is something we can focus on now and push forward regardless of the speed of the catastrophe.

Dave Denkenberger: I would also say that in addition to this extreme weather on multiple continents that is made more likely by climate change, there’s also abrupt climate change in the ice core record. We’ve had an 18 degree Fahrenheit drop in just one decade over a continent. That could be another scenario of a 10% food shortfall globally. And another one people have talked about is what’s called extreme climate change that would still be slow. This is sometimes called tail risk, where we have this expected or median climate change of a few degrees Celsius, but maybe there would be five or even 10 degrees Celsius (so 18 degrees Fahrenheit) that could happen over a century or two. We might not be able to have agriculture at all in the tropics, so it would be very valuable to have some food backup plan for that.

Ariel Conn: I wanted to get into concerns about moral hazards with this research. I’ve heard some criticism that if you present a solution to, say, surviving nuclear winter that maybe people will think nuclear war is more feasible. How do you address concerns like that — that if we give people a means of not starving, they’ll do something stupid?

Dave Denkenberger: I think you’ve actually summarized this succinctly by saying, this would be like saying we shouldn’t have the jaws of life because that would cause people to drive recklessly. But the longer answer would be: there is evidence that the awareness of nuclear winter in the 1980s was a reason that Gorbachev and Reagan worked towards reducing the nuclear stockpile. However, we still have enough nuclear weapons to potentially cause nuclear winter, and I doubt that the decision in the heat of the moment to go to nuclear war is actually going to take into account the non-target countries. I also think that there’s a significant cost of nuclear war directly, independent of nuclear winter. I would also say that this backup plan helps us with catastrophes that we don’t have control over, like a volcanic eruption. Overall, I think we’re much better off with a backup plan.

Joshua Pearce: I of course completely agree. It’s insane to not have a backup plan. The idea that the irrational behavior that’s currently displayed in any country with more than 100 nuclear weapons is going to get worse because now they know that a larger fraction of their population won’t starve to death as they use them, I think that’s crazy.

Ariel Conn: As you’ve mentioned, there are quite a few governments — in fact, as far as I can tell, all governments don’t really have a backup plan. How surprised have you been by this? And also how optimistic are you that you can convince governments to start implementing some sort of plan to feed people if disaster happens?

Dave Denkenberger: As I said, I certainly have been surprised with the lack of plans. I think that as we develop the research further and are able to show examples of companies already doing very similar things, showing more detailed analysis of what current factories we have that could be retrofitted quickly to produce food — that’s actually an active area of research that we’re doing right now — then I am optimistic that governments will eventually come around to the value of planning for these catastrophes.

Joshua Pearce: I think it’s slightly depressing when you look around the globe and all the hundreds of countries, and how poorly most of them care for their own citizens. It’s sort of a commentary on how evolved or how much of a civilization we really are, so instead of comparing number of Olympic medals or how much economic output your country does, I think we should look at the poorest citizens in each country. And if you can’t feed the people that are in your country, you should be embarrassed to be a world leader. And for whatever reason, world leaders show their faces every day while their constituents, the citizens of their countries, are starving to death today, let alone in the event of a catastrophe.

If you look at the — I’ll call them the more civilized countries, and I’ve been spending some time in Europe, where rational, science-based approaches to governing are much more mature than what I’ve been used to. I think it gives me quite a bit of optimism as we take these ideas of sustainability and of long-term planning seriously, try to move civilization into a state where it’s not doing significant harm to the environment or to our own health or to the health and the environment in the future — that gives me a lot of cause for hope. Hopefully as all the different countries throughout the world mature and grow up as governments, they can start taking the health and welfare of their own populations much more seriously.

Dave Denkenberger: And I think that even though I’m personally very motivated about the long-term future of human civilization, I think that because what we’re proposing is so cost effective, even if an individual government doesn’t put very much weight on people outside its borders, or in future generations even within the country, it’s still cost effective. And we actually wrote a paper from the US perspective showing how cheaply they could get prepared and save so many lives just within their own borders.

Ariel Conn: What do you think is most important for people to understand about both ALLFED and the other research you’re doing? And is there anything, especially that you think we didn’t get into, that is important to mention?

Dave Denkenberger: I would say that thanks to recent grants from the Berkeley Existential Risk Initiative, the Effective Altruism Lottery, and the Center for Effective Altruism, we’ve been able to do, especially this year, a lot of new research and, as I mentioned, retrofitting factories to produce food. We’re also looking at whether we can construct factories quickly, like having construction crews work around the clock. We’re also investigating seaweed. But I would still say that there’s much more work to do, and we have been building our alliance, and we have many researchers and volunteers that are ready to do more work with additional funding, so we estimate that in the next 12 months we could effectively use approximately $1.5 million.

Joshua Pearce: A lot of the areas of research that are needed to provide a strong backup plan for humanity are relatively greenfield; these aren’t areas that people have done a lot of research in before. And so for other academics, maybe small companies that slightly overlap the alternative food ecosystem of intellectual pursuits, there’s a lot of opportunities for you to get involved, either in direct collaboration with ALLFED or just bringing these types of ideas into your own subfield. And so we’re always looking out for collaborators, and we’re happy to talk to anybody that’s interested in this area and would like to move the ball forward.

Dave Denkenberger: We have a list of theses that undergraduates or graduates could do on the website called Effective Thesis. We’ve gotten a number of volunteers through that.

I would also say another surprising thing to me was that when we were looking at these scenarios of if the world cooperated but only had stored food, the amount of money people would spend on that stored food was tremendous, something like $90 trillion. And with that huge expenditure, only 10% of people survived. But if instead we could produce alternate foods (our goal is around a dollar per dry pound of food, and one pound of dry food can feed a person for a day), then more like 97% of people would be able to afford food with their current incomes. And yet, even though we feed so many more people, the total expenditure on food was less. You could argue that even if you are among the global wealthy who could potentially survive one of these catastrophes if chaos didn’t break out, it would still be in your interest to get prepared for alternate foods, because you’d have to pay less money for your food.
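The cost target above lends itself to a quick back-of-the-envelope check. The sketch below uses only the two figures stated in the conversation (about a dollar per dry pound, and one dry pound per person per day); the 365-day year and the seven billion population figure (the latter mentioned elsewhere in this conversation) are assumptions for illustration. This is not ALLFED’s economic model, and the $90 trillion stored-food figure is not reproduced here because its time horizon is not stated.

```python
# Figures stated in the conversation
COST_PER_DRY_POUND_USD = 1.00        # target price for alternate food
DRY_POUNDS_PER_PERSON_PER_DAY = 1    # one dry pound feeds a person for a day

# Illustrative assumptions (not from the speakers' model)
DAYS_PER_YEAR = 365
ASSUMED_POPULATION = 7_000_000_000


def annual_cost_per_person() -> float:
    """Yearly food cost for one person at the target price."""
    return COST_PER_DRY_POUND_USD * DRY_POUNDS_PER_PERSON_PER_DAY * DAYS_PER_YEAR


def annual_cost_for_population(population: int) -> float:
    """Yearly food cost for `population` people at the target price."""
    return annual_cost_per_person() * population


if __name__ == "__main__":
    per_person = annual_cost_per_person()
    total = annual_cost_for_population(ASSUMED_POPULATION)
    print(f"Per person per year: ${per_person:,.0f}")
    print(f"Whole population per year: ${total / 1e12:.3f} trillion")
```

At the target price, a year of food costs about $365 per person under these assumptions, which gives a sense of why a much larger share of people could afford it.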

Ariel Conn: And that’s all with a research funding request of 1.5 million? Is that correct?

Dave Denkenberger: The full plan is more like $100 million.

Joshua Pearce: It’s what we could use as the current team now, effectively.

Ariel Conn: Okay. Well, even the 100 million still seems reasonable.

Joshua Pearce: It’s still a bargain. One of the things we’ve been primarily assuming during all of our core scenarios is that there would be human cooperation, and that things would not break down into fighting, but as we know historically, that’s an extremely optimistic way to look at it. And so even if you’re one of the global wealthy, in the top 10% globally in terms of financial means and capital, even if you would be able to feed yourself in one of these relatively modest reductions in overall agricultural supply, it is not realistic to assume that the poor people are just going to lie down and starve to death. They’re going to be storming your mansion. And so if you can provide them with food with a relatively low upfront capital investment, it makes a lot of sense, again, for you personally, because you’re not fighting them off at your door.

Dave Denkenberger: One other thing that surprised me was we did a real worst case scenario where the sun is mostly blocked, say by nuclear winter, but then we also had a loss of electricity and industry globally, say there were multiple EMPs around the world. And I, going into it, was not too optimistic that we’d be able to feed everyone. But we actually have a paper on it saying that it’s technically feasible, so I think it really comes down to getting prepared and getting that message to the decision makers at the right time, such that they realize it’s in their interest to cooperate.

Another issue that surprised me: when we were writing the book, I thought about seaweed, but then I looked at how much seaweed for sushi cost, and it was just tremendously expensive per calorie, so I didn’t pursue it. But then I found out later that we actually produce a lot of seaweed at a reasonable price. And so now I think that we might be able to scale up that food source from seaweed in just a few months.

Ariel Conn: How quickly does seaweed grow, and how abundantly?

Dave Denkenberger: It depends on the species, but one species that is edible, we put into the scenario of nuclear winter, and one thing to note is that the ocean, as the upper layers cool, they sink, and then the lower layers of the ocean come to the surface, and that brings nutrients to the surface. We found in pretty big areas on Earth, in the ocean, that the seaweed could actually grow more than 10% per day. With that exponential growth, you quickly scale up to feeding a lot of people. Now of course we need to scale up the infrastructure, the ropes that it grows on, but that’s what we’re working out.
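To give a feel for what 10% daily growth implies, here is a small sketch of the compound-growth arithmetic. It assumes a constant 10% per day with no harvesting and no limits from nutrients, space, or infrastructure, which are simplifications; the function names are hypothetical and this is not the model used in the speakers' research.

```python
import math


def doubling_time_days(daily_growth_rate: float) -> float:
    """Days for biomass to double under constant compound daily growth."""
    return math.log(2) / math.log(1.0 + daily_growth_rate)


def days_to_scale(factor: float, daily_growth_rate: float) -> float:
    """Days for biomass to grow by `factor` under constant compound daily growth."""
    return math.log(factor) / math.log(1.0 + daily_growth_rate)


if __name__ == "__main__":
    RATE = 0.10  # "more than 10% per day" in favorable areas, per the conversation
    print(f"Doubling time: {doubling_time_days(RATE):.1f} days")
    print(f"Time to grow 1000-fold: {days_to_scale(1000, RATE):.0f} days")
```

Under these idealized assumptions the biomass doubles about once a week and grows a thousandfold in roughly ten weeks, which is consistent with the "few months" scale-up mentioned earlier, provided the ropes and other infrastructure can keep pace.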

The other thing I would add is that in these catastrophes, if many people are starving, then I think not only will people not care about saving other species, but they may actively eat other species to extinction. And it turns out that feeding seven billion people is a lot more food than keeping, say, 500 individuals of many different species alive. And so I think we could actually use this to save a lot of species. And if it were a natural catastrophe, well some species would go extinct naturally — so maybe for the first time, humans could actually be increasing biodiversity.

Joshua Pearce: That’s a nice optimistic way to end this.

Ariel Conn: Yeah, that’s what I was just thinking. Anything else?

Dave Denkenberger: I think that’s it.

Joshua Pearce: We’re all good.

Ariel Conn: All right. This has been a really interesting conversation. Thank you so much for joining us.

Dave Denkenberger: Thank you.

Joshua Pearce: Thank you for having us.

 

Not Cool: A Climate Podcast

FLI is excited to announce the latest in our podcast line-up: Not Cool: A Climate Podcast! In this new series, hosted by Ariel Conn, we’ll hear directly from climate experts from around the world, as they answer every question we can think of about the climate crisis. And we’ve launched it just in time for the United Nations Climate Action Summit, which begins on September 23.

You can listen to the short trailer above that highlights what we’ll be covering in the coming months, or read the transcript below. And of course you can jump right in to the first episode — all podcasts for this series can be found at futureoflife.org/notcool. You can also always listen to all FLI podcasts on any of your favorite podcast platforms just by searching for “Future of Life Institute.” The Not Cool podcasts are all there, and we’ll be releasing new episodes every Tuesday and Thursday for at least the next couple of months. We hope these interviews will help you better understand the science and policies behind the climate crisis and what we can all do to prevent the worst effects of climate change.

We want to make sure we get your questions answered too! If you haven’t had a chance to fill out our survey about what you want to learn about climate change, please consider doing so now, and let us know what you’d like to learn.

Transcript

This is really the issue of our times, and our children and grandchildren will not forgive us if we don’t contain this problem.

~Jessica Troni, Senior Programme Officer, UN Environment-Global Environment Facility Climate Change Adaptation portfolio.

Climate change, to state the obvious, is a huge and complicated problem. The crisis is a problem so big it’s being studied by people with PhDs in meteorology, geology, physics, chemistry, psychology, economics, political science, and more. It’s a problem that needs to be tackled at every level, from individual action to international cooperation. It’s a problem that seems daunting, to say the least. Yet it’s a problem that must be solved. And that’s where hope lies. You see, as far as existential threats to humanity go, climate change stands out as being particularly solvable. Challenging? Yes. But not impossible.

The trends are bad. I will quote René Dubos who said, however, “Trends are not destiny.” So the trends are bad, but we can change the trends.

~Suzanne Jones, Mayor, Boulder CO // Executive Director, Eco-Cycle

Unlike the threats posed by artificial intelligence, biotechnology or nuclear weapons, you don’t need to have an advanced science degree or be a high-ranking government official to start having a meaningful impact on your own carbon footprint. Each of us can begin making lifestyle changes today that will help. The people you vote into office at all levels of government, from local to national, can each influence and create better climate policies. But this is a problem for which every action each of us takes truly does help.

When you have a fractal, complicated, humongous, super wicked problem like this, it means there’s some facet of it that every person on the planet can do something about it. Artist, communicator, teacher, engineer, entrepreneur. There’s something in it for everybody.

~Andrew Revkin, Head of Initiative on Communication and Sustainability, Columbia University // Science & Environmental Journalist

I’m Ariel Conn, and I’m the host of Not Cool, a climate podcast that dives deep into understanding both the climate crisis and the solutions. I started this podcast because the news about climate change seems to get worse with each new article and report, but the solutions, at least as reported, remain vague and elusive. I wanted to hear from the scientists and experts themselves to learn what’s really going on and how we can all come together to solve this crisis. And so I’ll be talking with climate experts from around the world, including scientists, journalists, policy experts and more, to learn the problems climate change poses, what we know and what’s still uncertain about our future climate, and what we can all do to help put the brakes on this threat.

We’ll look at some of the basic science behind climate change and global warming, like the history of climate modeling, what the carbon cycle is, what tipping points are and whether we’ve already passed some, what extreme weather events are and why they’re getting worse. We’ll look at the challenges facing us, from political inertia to technical roadblocks. We’ll talk about the impacts on human health and lifestyles from the spread of deadly diseases to national security threats to problems with zoning laws. We’ll learn about geoengineering, ocean acidification, deforestation, and how local communities can take action, regardless of what’s happening at the federal level.

I think the most important thing that every single person can do is talk more about climate change. Social momentum is the key to political momentum and getting real action.

~John Cook, Founder, SkepticalScience.com // Research Assistant Professor, Center for Climate Change Communication, George Mason University

Let’s start talking. Let’s build momentum. And let’s take real action. Because climate change is so not cool.

Visit futureoflife.org/notcool for a complete list of episodes, which we will be updating every Tuesday and Thursday for at least the next couple of months. And we hope you’ll also join the discussion. You can find us on Twitter using #NotCool and #ChangeForClimate.

AI Alignment Podcast: Synthesizing a human’s preferences into a utility function with Stuart Armstrong

In his Research Agenda v0.9: Synthesizing a human’s preferences into a utility function, Stuart Armstrong develops an approach for generating friendly artificial intelligence. His alignment proposal can broadly be understood as a kind of inverse reinforcement learning where most of the task of inferring human preferences is left to the AI itself. It’s up to us to build the correct assumptions, definitions, preference learning methodology, and synthesis process into the AI system such that it will be able to meaningfully learn human preferences and synthesize them into an adequate utility function. In order to get this all right, his agenda looks at how to understand and identify human partial preferences, how to ultimately synthesize these learned preferences into an “adequate” utility function, the practicalities of developing and estimating the human utility function, and how this agenda can assist in other methods of AI alignment.

Topics discussed in this episode include:

  • The core aspects and ideas of Stuart’s research agenda
  • Human values being changeable, manipulable, contradictory, and underdefined
  • This research agenda in the context of the broader AI alignment landscape
  • What the proposed synthesis process looks like
  • How to identify human partial preferences
  • Why a utility function anyway?
  • Idealization and reflective equilibrium
  • Open questions and potential problem areas

Last chance to take a short (4 minute) survey to share your feedback about the podcast.

 

Key points from Stuart: 

  • “There are two core parts to this research project essentially. The first part is to identify the humans’ internal models, figure out what they are, how we use them and how we can get an AI to realize what’s going on. So those give us the sort of partial preferences, the pieces from which we build our general preferences. The second part is to then knit all these pieces together into an overall preference for any given individual in a way that works reasonably well and respects as much as possible the person’s different preferences, meta-preferences and so on. The second part of the project is the one that people tend to have strong opinions about because they can see how it works and how the building blocks might fit together and how they’d prefer that it would be fit together in different ways and so on but in essence, the first part is the most important because that fundamentally defines the pieces of what human preferences are.”
  • “So, when I said that human values are contradictory, changeable, manipulable and underdefined, I was saying that the first three are relatively easy to deal with but that the last one is not. Most of the time, people have not considered the whole of the situation that they or the world or whatever is confronted with. No situation is exactly analogous to another, so you have to try and fit it into different categories. So if someone dubious gets elected in a country and starts doing very authoritarian things, does this fit in the tyranny box which should be resisted or does this fit in the normal process of democracy box in which case it should be endured and dealt with through democratic means? What’ll happen is generally that it’ll have features of both, so it might not fit comfortably in either box and then there’s a wide variety for someone to be hypocritical or to choose one side or the other but the reason that there’s such a wide variety of possibilities is because this is a situation that has not been exactly confronted before so people don’t actually have preferences here. They don’t have a partial preference over this situation because it’s not one that they’ve ever considered… I’ve actually argued at some point in the research agenda that this is an argument for ensuring that we don’t go too far from the human baseline normal into exotic things where our preferences are not well-defined because in these areas, the chance that there is a large negative seems higher than the chance that there’s a large positive… So, when I say not go too far, I don’t mean not embrace a hugely transformative future. I’m saying not embrace a hugely transformative future where our moral categories start breaking down.”
  • “One of the reasons to look for a utility function is to look for something stable that doesn’t change over time and there is evidence that consistency requirements will push any form of preference function towards a utility function and that if you don’t have a utility function, you just lose value. So, the desire to put this into a utility function is not out of an admiration for utility functions per se but our desire to get something that won’t further change or won’t further drift in a direction that we can’t control and have no idea about. The other reason is that as we start to control our own preferences better and have a better ability to manipulate our own minds, we are going to be pushing ourselves towards utility functions because of the same pressures of basically not losing value pointlessly.”
  • “Reflective equilibrium is basically you refine your own preferences, make them more consistent, apply them to yourself until you’ve reached a moment where your meta-preferences and your preferences are all smoothly aligned with each other. What I’m doing is a much more messy synthesis process and I’m doing it in order to preserve as much as possible of the actual human preferences. It is very easy to reach reflective equilibrium by just, for instance, having completely flat preferences or very simple preferences; these tend to be very much in reflective equilibrium with themselves, and pushing towards this thing is a push towards, in my view, excessive simplicity and the great risk of losing valuable preferences. The risk of losing valuable preferences seems to me a much higher risk than the gain in terms of simplicity or elegance that you might get. There is no reason that the kludgey human brain and its mess of preferences should lead to some simple reflective equilibrium. In fact, you could say that this is an argument against reflective equilibrium because it means that many different starting points, many different minds with very different preferences will lead to similar outcomes which basically means that you’re throwing away a lot of the details of your input data.”
  • “Imagine that we have reached some positive outcome, we have got alignment and we haven’t reached it through a single trick and we haven’t reached it through the sort of tool AIs or software as a service or those kinds of approaches, we have reached an actual alignment. It, therefore, seems to me all the problems that I’ve listed or almost all of them will have had to have been solved, therefore, in a sense, much of this research agenda needs to be done directly or indirectly in order to achieve any form of sensible alignment. Now, the term directly or indirectly is doing a lot of the work here but I feel that quite a bit of this will have to be done directly.”

 

Important timestamps: 

0:00 Introductions 

3:24 A story of evolution (inspiring just-so story)

6:30 How does your “inspiring just-so story” help to inform this research agenda?

8:53 The two core parts to the research agenda 

10:00 How this research agenda is contextualized in the AI alignment landscape

12:45 The fundamental ideas behind the research project 

15:10 What are partial preferences? 

17:50 Why reflexive self-consistency isn’t enough 

20:05 How are humans contradictory and how does this affect the difficulty of the agenda?

25:30 Why human values being underdefined presents the greatest challenge 

33:55 Expanding on the synthesis process 

35:20 How to extract the partial preferences of the person 

36:50 Why a utility function? 

41:45 Are there alternative goal ordering or action producing methods for agents other than utility functions?

44:40 Extending and normalizing partial preferences and covering the rest of section 2 

50:00 Moving into section 3, synthesizing the utility function in practice 

52:00 Why this research agenda is helpful for other alignment methodologies 

55:50 Limits of the agenda and other problems 

58:40 Synthesizing a species wide utility function 

1:01:20 Concerns over the alignment methodology containing leaky abstractions 

1:06:10 Reflective equilibrium and the agenda not being a philosophical ideal 

1:08:10 Can we check the result of the synthesis process?

1:09:55 How did the Mahatma Armstrong idealization process fail? 

1:14:40 Any clarifications for the AI alignment community? 

 

Works referenced:

Research Agenda v0.9: Synthesising a human’s preferences into a utility function 

Some Comments on Stuart Armstrong’s “Research Agenda v0.9” 

Mahatma Armstrong: CEVed to death 

The Bitter Lesson 

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas: Hey everyone and welcome back to the AI Alignment Podcast at the Future of Life Institute. I’m Lucas Perry and today we’ll be speaking with Stuart Armstrong on his Research Agenda version 0.9: Synthesizing a human’s preferences into a utility function. Here Stuart takes us through the fundamental idea behind this research agenda, what this process of synthesizing human preferences into a utility function might look like, key philosophical and empirical insights needed for progress, how human values are changeable, manipulable, under-defined and contradictory, how these facts affect generating an adequate synthesis of human values, where this all fits in the alignment landscape and how it can inform other approaches to aligned AI systems.

If you find this podcast interesting or useful, consider sharing it with friends, on social media platforms, forums or anywhere you think it might be found valuable. I’d also like to put out a final call for this round of SurveyMonkey polling and feedback, so if you have any comments, suggestions or any other thoughts you’d like to share with me about the podcast, potential guests or anything else, feel free to do so through the SurveyMonkey poll link attached to the description of wherever you might find this podcast. I’d love to hear from you. There also seems to be some lack of knowledge regarding the pages that we create for each podcast episode. You can find a link to that in the description as well and it contains a summary of the episode, topics discussed, key points from the guest, important timestamps if you want to skip around, works referenced, as well as a full transcript of the audio in case you prefer reading.

Stuart Armstrong is a researcher at the Future of Humanity Institute who focuses on the safety and possibilities of artificial intelligence, how to define the potential goals of AI and map humanity’s partially defined values into it, and the long-term potential for intelligent life across the reachable universe. He has been working with people at FHI and other organizations such as DeepMind to formalize AI desiderata in general models so that AI designers can include these safety methods in their designs. His collaboration with DeepMind on “Interruptibility” has been mentioned in over 100 media articles. Stuart’s past research interests include comparing existential risks in general, including their probability and their interactions; anthropic probability, or how the fact that we exist affects our probability estimates around that key fact; decision theories that are stable under self-reflection and anthropic considerations; negotiation theory and how to deal with uncertainty about your own preferences; computational biochemistry; fast ligand screening; and parabolic geometry. His Oxford DPhil was on the holonomy of projective and conformal Cartan geometries. And so, without further ado or pretenses that I know anything about the holonomy of projective and conformal Cartan geometries, I give you Stuart Armstrong.

We’re here today to discuss your research agenda version 0.9: Synthesizing a human’s preferences into a utility function. One wonderful place for us to start would be with this story of evolution, which you call an inspiring just-so story. Starting with this, I think, will help us contextualize the place of the human and what the human is as we find ourselves here at the beginning of this value alignment problem. I’ll go ahead and read that here for listeners to begin developing a historical context and narrative.

So, I’m quoting you here. You say, “This is the story of how evolution created humans with preferences and what the nature of these preferences are. The story is not true in the sense of accurate. Instead, it is intended to provide some inspiration as to the direction of this research agenda. In the beginning, evolution created instinct driven agents. These agents have no preferences or goals nor do they need any. They were like Q-learning agents. They knew the correct action to take in different circumstances but that was it. Consider baby turtles that walk towards the light upon birth because traditionally, the sea was lighter than the land. Of course, this behavior fails them in the era of artificial lighting but evolution has a tiny bandwidth, acting once per generation, so it created agents capable of planning, of figuring out different approaches rather than having to follow instincts. This was useful especially in varying environments and so evolution off-loaded a lot of its job onto the planning agents.”

“Of course, to be of any use, the planning agents need to be able to model their environment to some extent or else their plans can’t work and had to have preferences or else every plan was as good as another. So, in creating the first planning agents, evolution created the first agents with preferences. Of course, evolution is a messy, undirected process, so the process wasn’t clean. Planning agents are still riven with instincts and the modeling of the environment is situational, used for when it was needed rather than some consistent whole. Thus, the preferences of these agents were underdefined and sometimes contradictory. Finally, evolution created agents capable of self-modeling and of modeling other agents in their species. This might have been because of competitive social pressures as agents learned to lie and detect lying. Of course, this being evolution, the self and other modeling took the form of kludges built upon spandrels, built upon kludges and then arrived humans, who developed norms and norm violations.”

“As a side effect of this, we started having higher order preferences as to what norms and preferences should be but instincts and contradictions remained. This is evolution after all, and evolution looked upon this hideous mess and saw that it was good. Good for evolution that is but if we want it to be good for us, we’re going to need to straighten out this mess somewhat.” Here we arrive, Stuart, in the human condition after hundreds of millions of years of evolution. So, given the story of human evolution that you’ve written here, why were you so interested in this story and why were you looking into this mess to better understand AI alignment and develop this research agenda?

Stuart: This goes back to a paper that I co-wrote for NeurIPS. It basically develops the idea of inverse reinforcement learning or, more broadly, can you infer what the preferences of an agent are just by observing their behavior. Humans are not entirely rational, so the question I was looking at is can you simultaneously infer the rationality and the preferences of an agent by observing their behavior. It turns out to be mathematically completely impossible. We can’t infer the preferences without making assumptions about the rationality and we can’t infer the rationality without making assumptions about the preferences. This is a rigorous result, so my looking at human evolution is to basically get around this result, in a sense, to make the right assumptions so that we can extract actual human preferences since we can’t just do it by observing behavior. We need to dig a bit deeper.
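[Editor’s illustration, not part of the conversation: the flavor of the impossibility result Stuart describes can be seen in a toy example. The sketch below is a simplification under stated assumptions; the action names, reward numbers and the two planner functions are hypothetical and are not taken from the paper. It shows one observed behavior being explained equally well by a fully rational agent with one set of preferences and by a fully anti-rational agent with the opposite preferences.]

```python
# Toy illustration: the same observed choice is produced by two opposite
# (planner, reward) pairs, so behavior alone cannot tell them apart.
# All names and numbers are hypothetical.

ACTIONS = ["eat_cake", "eat_salad", "skip_meal"]


def rational_planner(reward):
    """Picks the action with the highest reward."""
    return max(ACTIONS, key=lambda a: reward[a])


def anti_rational_planner(reward):
    """Picks the action with the lowest reward."""
    return min(ACTIONS, key=lambda a: reward[a])


# Hypothesis A: the agent likes cake and plans rationally.
likes_cake = {"eat_cake": 1.0, "eat_salad": 0.2, "skip_meal": -0.5}

# Hypothesis B: the agent has the negated rewards and plans anti-rationally.
hates_cake = {action: -value for action, value in likes_cake.items()}

observed_a = rational_planner(likes_cake)
observed_b = anti_rational_planner(hates_cake)

# Both hypotheses predict exactly the same behavior.
print(observed_a, observed_b)
```

[Telling the two hypotheses apart requires an extra assumption about either the planner or the reward, which is the role the internal-model assumption plays in the agenda discussed here.]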

Lucas: So, what have you gleaned then from looking at this process of human evolution and seeing into how messy the person is?

Stuart: Well, there’s two key insights here. The first is that I located where human preferences reside, or where we can assume that human preferences reside, and that’s in the internal models of the humans: how we model the world, how we judge “that was a good thing” or “I want that” or “ooh, I’d be really embarrassed about that.” So human preferences are defined in this project, or at least the building blocks of human preferences are defined, to be in these internal models that humans have, with the labeling of states of outcomes as good or bad. The other point to bring up about evolution is that since it’s not anything like a clean process, it’s not like we have one general model with clearly labeled preferences and then everything else flows from that. It is a mixture of situational models in different circumstances with subtly different things labeled as good or bad. So, as I said, human preferences are contradictory, changeable, manipulable and underdefined.

So, there are two core parts to this research project essentially. The first part is to identify the humans’ internal models, figure out what they are, how we use them and how we can get an AI to realize what’s going on. So those give us the sort of partial preferences, the pieces from which we build our general preferences. The second part is to then knit all these pieces together into an overall preference for any given individual in a way that works reasonably well and respects as much as possible the person’s different preferences, meta-preferences and so on.

The second part of the project is the one that people tend to have strong opinions about because they can see how it works and how the building blocks might fit together and how they’d prefer that it would be fit together in different ways and so on but in essence, the first part is the most important because that fundamentally defines the pieces of what human preferences are.

Lucas: Before we dive into the specifics of your agenda here, can you contextualize it within the evolution of your thought on AI alignment and also how it fits within the broader research landscape?

Stuart: So, this is just my perspective on what the AI alignment landscape looks like. There are a collection of different approaches addressing different aspects of the alignment problem. Some of them, which MIRI is working a lot on, are technical things of how to ensure stability of goals and other similar thoughts along these lines that should be necessary for any approach. Others are developed on how to make the AI safe either indirectly or by making it fully aligned. So, in the first category you have things like software as a service. Can we have super intelligent abilities integrated in a system that doesn’t allow for, say, super intelligent agents with pernicious goals?

Others that I have looked into in the past are things like low impact agents or oracles, where again, the idea is we have a superintelligence, we cannot align it with human preferences, yet we can use it to get some useful work done. Then there are the approaches which aim to solve the whole problem and get actual alignment, what used to be called the friendly AI approach. So here, it’s not an AI that’s constrained in any way, it’s an AI that is intrinsically motivated to do the right thing. There are a variety of different approaches to that, some more serious than others. Paul Christiano has an interesting variant on that, though it’s hard to tell; I would say his is a bit of a mixture of value alignment and constraining what the AI can do in a sense, but it is very similar. And so this agenda is of that last type, of getting the aligned, the friendly AI, the aligned utility function.

In that area, there are what I would call the ones that sort of rely on indirect proxies. This is the ideas of you put Nick Bostrom in a room for 500 years or a virtual version of that and hope that you get something aligned at the end of that. There are direct approaches and this is the basic direct approach, doing everything the hard way in a sense but defining everything that needs to be defined so that the AI can then assemble an aligned preference function from all the data.

Lucas: Wonderful. So you gave us a good summary earlier of the different parts of this research agenda. Would you like to expand a little bit on the “fundamental idea” behind this specific research project?

Stuart: There are two fundamental ideas that are not too hard to articulate. The first is that though our revealed preferences could be wrong, and though our stated preferences could be wrong, what our actual preferences are, at least in one moment, is what we model inside our head, what we’re thinking of as the better option. We might lie, as I say, in politics or in a court of law or just socially but generally, when we know that we’re lying, it’s because there’s a divergence between what we’re saying and what we’re modeling internally. So, it is this internal model which I’m identifying as the place where our preferences lie. Then all the rest of it, the whole convoluted synthesis project, is just basically how do we take these basic pieces and combine them in a way that does not seem to result in anything disastrous, that respects human preferences and meta-preferences and, this is a key thing, actually reaches a result. That’s why the research project is designed for having a lot of default actions in a lot of situations.

Like if the person does not have strong meta-preferences, then there’s a whole procedure of how you combine them. Say, preferences about the world and preferences about your identity are, by default, combined in different ways. If you would want GDP to go up, that’s a preference about the world. If you yourself would want to believe something or believe only the truth, for example, that’s a preference about your identity. It tends to be that identity preferences are more fragile, so the default is that preferences about the world are just added together and this overcomes most of the contradictions because very few human preferences are exactly anti-aligned whereas identity preferences are combined in a more smooth process so that you don’t lose too much on any of them. But as I said, these are the default procedures, and they’re all defined so that we get an answer but there’s also large abilities for the person’s meta-preferences to override the defaults. Again, precautions are taken to ensure that an answer is actually reached.
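[Editor’s illustration, not part of the conversation: a minimal sketch of the two default combination rules described above. Adding up world preferences follows what Stuart says directly. The “more smooth process” for identity preferences is modeled here with a soft minimum, which is an assumption chosen for illustration; the function names, the `sharpness` parameter and all numbers are hypothetical.]

```python
import math


def combine_world(utilities):
    """World preferences: by default, simply added together."""
    return sum(utilities)


def combine_identity(utilities, sharpness=5.0):
    """Identity preferences: combined with a soft minimum (an assumed
    stand-in for the smoother default), so that doing badly on any single
    preference pulls the combined score down."""
    total = sum(math.exp(-sharpness * u) for u in utilities)
    return -math.log(total) / sharpness


# Hypothetical satisfaction levels for a single outcome.
world = [0.8, -0.3, 0.5]      # e.g. GDP goes up, taxes go up, parks improve
identity = [0.9, 0.1, 0.7]    # e.g. believing the truth, being brave, being kind

world_score = combine_world(world)
identity_score = combine_identity(identity)

print(world_score, identity_score)
```

[With a plain sum, a loss on one world preference can be traded off against gains on others. With the soft minimum, the combined identity score stays close to the worst-satisfied preference, which is one way of not losing too much on any of them.]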

Lucas: Can you unpack what partial preferences are? What you mean by partial preferences and how they’re contextualized within human mental models?

Stuart: What I mean by partial preference is mainly that a human has a small model of part of the world like let’s say they’re going to a movie and they would prefer to invite someone they like to go with them. Within this mental model, there is the movie, themselves and the presence or absence of the other person. So, this is a very narrow model of reality, virtually the entire rest of the world and, definitely, the entire rest of the universe does not affect this. It could be very different and not change anything of this. So, this is what I call a partial preference. You can’t go from this to a general rule of what the person would want to do in every circumstance but it is a narrow valid preference. Partial preferences refers to two things, first of all, that it doesn’t cover all of our preferences and secondly, the model in which it lives only covers a narrow slice of the world.

You can make some modifications to this. This is the whole point of the second section that if the approach works, variations on the synthesis project should not actually result in results that are disastrous at all. If the synthesis process being changed a little bit would result in a disaster, then something has gone wrong with the whole approach but you could, for example, add restrictions like looking for consistent preferences but I’m starting with basically the fundamental thing is there is this mental model, there is an unambiguous judgment that one thing is better than another and then we can go from there in many ways. A key part of this approach is that there is no single fundamental synthesis process that would work, so it is aiming for an adequate synthesis rather than an idealized one because humans are a mess of contradictory preferences and because even philosophers have contradictory meta-preferences within their own minds and with each other and because people can learn different preferences depending on the order in which information is presented to them, for example.

Any method has to make a lot of choices, and therefore, I’m writing down explicitly as many of the choices that have to be made as I can so that other people can see what I see the processes entailing. I am quite wary of things that look for reflexive self-consistency because in a sense, if you define your ideal system as one that’s reflexively self-consistent, that’s a sort of local condition in a sense that the morality judges itself by its own assessment and that means that you could theoretically wander arbitrarily far in preference space before you hit that. I don’t want something that is just defined by this has reached reflective equilibrium, this morality synthesis is now self-consistent, I want something that is self-consistent and it’s not too far from where it started. So, I prefer to tie things much more closely to actual human preferences and to explicitly aim for a synthesis process that doesn’t wander too far away from them.

Lucas: I see, so the starting point is the evaluative moral that we’re trying to keep it close to?

Stuart: Yes, I don’t think you can say that any human preference synthesized is intrinsically wrong as long as it reflects some of the preferences that were inputs into it. However, I think you can say that it is wrong from the perspective of the human that you started with if it strongly contradicts what they would want. Disagreement with my starting position is something which I take to be very relevant to the ultimate outcome. There’s a bit of a challenge here because we have to avoid, say, preferences which are based on inaccurate facts. So, some of the preferences are inevitably going to be removed or changed just because they’re based on factually inaccurate beliefs. Some other processes of trying to make consistent what is sort of very vague will also result in some preferences being moved beyond. So, you can’t just say the starting person has veto power over the final outcome but you do want to respect their starting preferences as much as you possibly can.

Lucas: So, reflecting here on the difficulty of this agenda and on how human beings contain contradictory preferences and models, can you expand a bit how we contain these internal contradictions and how this contributes to the difficulty of the agenda?

Stuart: I mean humans contain many contradictions within them. Our mood shifts. We famously are hypocritical in favor of ourselves and against the foibles of others; we basically rewrite narratives to allow ourselves to always be heroes. Anyone who’s had some experience of a human has had knowledge of when they’ve decided one way or decided the other way or felt that something was important and something else wasn’t and often, people just come up with a justification for what they wanted to do anyway, especially if they’re in a social situation, and then some people can cling to this justification and integrate that into their morality while behaving differently in other ways. The easiest examples are political hypocrites. The anti-gay preacher who sleeps with other men is a stereotype for a reason but it’s not just a contradiction at that level. It’s that basically most of the categories in which we articulate our preferences are not particularly consistent.

If we throw a potentially powerful AI in this, which could change the world drastically, we may end up with things across our preferences. For example, suppose that someone created or wanted to create a subspecies of human that was bred to be a slave race. Now, this race did not particularly enjoy being a slave race but they wanted to be slaves very strongly. In this situation, a lot of our intuitions are falling apart because we know that slavery is almost always involuntary and is backed up by coercion. We also know that even though our preferences and our enjoyments do sometimes come apart, they don’t normally come apart that much. So, we’re now confronted by a novel situation where a lot of our intuitions are pushing against each other.

You also have things like nationalism for example. Some people have strong nationalist sentiments about their country and sometimes their country changes and in this case, what seemed like a very simple, yes, I will obey the laws of my nation, for example, becomes much more complicated as the whole concept of my nation starts to break down. This is the main way that I see preferences as being underdefined. They’re articulated in terms of concepts which are not universal and which bind together many, many different concepts that may come apart.

Lucas: So, at any given moment, like myself at this moment, the issue is that there’s a large branching factor of how many possible future Lucases there can be. At this time, currently and maybe a short interval around this time as you sort of explore in your paper, the sum total of my partial preferences and the partial world models in which these partial preferences are contained. The expression of these preferences and models can be expressed differently and sort of hacked and changed based off how questions are asked, the order of questions. I am like a 10,000-faced thing which I can show you one of my many faces depending on how you push my buttons and depending on all of the external input that I get in the future, I’m going to express and maybe become more idealized in one of many different paths. The only thing that we have to evaluate which of these many different paths I would prefer is what I would say right now, right?

Say my core value is joy or certain kinds of conscious experiences over others and all I would have for evaluating this many branching thing is say this preference now at this time but that could be changed in the future, who knows? I will create new narratives and stories that justify the new person that I am and that makes sense of the new values and preferences that I have retroactively, like something that I wouldn’t actually have approved of now but my new, maybe more evil version of myself would approve and create a new narrative retroactively. Is this sort of helping to elucidate and paint the picture of why human beings are so messy?

Stuart: Yes, we need to separate that into two. The first is that our values can be manipulated by other humans as they often are and by the AI itself during the process but that can be combated to some extent. I have a paper that may soon come out on how to reduce the influence of an AI over a learning process that it can manipulate. That’s one aspect. The other aspect is when you are confronted by a new situation, you can go in multiple different directions and these things are just not defined. So, when I said that human values are contradictory, changeable, manipulable and underdefined, I was saying that the first three are relatively easy to deal with but that the last one is not.

Most of the time, people have not considered the whole of the situation that they or the world or whatever is confronted with. No situation is exactly analogous to another, so you have to try and fit it into different categories. So if someone dubious gets elected in a country and starts doing very authoritarian things, does this fit in the tyranny box which should be resisted or does this fit in the normal process of democracy box in which case it should be endured and dealt with through democratic means? What’ll happen is generally that it’ll have features of both, so it might not fit comfortably in either box and then there’s a wide variety for someone to be hypocritical or to choose one side or the other but the reason that there’s such a wide variety of possibilities is because this is a situation that has not been exactly confronted before so people don’t actually have preferences here. They don’t have a partial preference over this situation because it’s not one that they’ve ever considered.

How they develop one depends a lot on, as you say, the order in which information is presented, which category it seems to most strongly fit into and so on. We are going here for very mild underdefinedness. The willing slave race was my attempt to push it out a bit further into something somewhat odd and then if you consider a powerful AI that is able to create vast numbers of intelligent entities, for example, and reshape society, human bodies and human minds in hugely transformative ways, we are going to enter sort of very odd situations where all our starting instincts are almost useless. I’ve actually argued at some point in the research agenda that this is an argument for ensuring that we don’t go too far from the human baseline normal into exotic things where our preferences are not well-defined because in these areas, the chance that there is a large negative seems higher than the chance that there’s a large positive.

Now, I’m talking about things that are very distant in terms of our categories. The world of Star Trek, for example, is exactly the human world from this perspective because even though they have science fiction technology, all of the decisions there are articulated around concepts that we’re very familiar with because it is a work of fiction addressed to us now. So, when I say not go too far, I don’t mean not embrace a hugely transformative future. I’m saying not embrace a hugely transformative future where our moral categories start breaking down.

Lucas: In my mind, there’s two senses. There’s the sense in which we have these models for things and we have all of these necessary and sufficient conditions for which something can be pattern matched to some sort of concept or thing and we can encounter situations where there’re conditions for many different things being included in the context in a new way which makes it so that the thing like goodness or justice is underdefined in the slavery case because we don’t really know initially whether this thing is good or bad. I see this underdefined in this sense. The other sense is maybe the sense in which my brain is a neural architectural aggregate of a lot of neurons and the sum total of its firing statistics and specific neural pathways can be potentially identified as containing preferences and models somewhere within there. So is it also true to say that it’s underdefined in the sense that the human as not a thing in the world but as a process in the world largely constituted of the human brain, even within that process, it’s underdefined where in the neural firing statistics or the processing of the person there could ever be something called a concrete preference or value?

Stuart: I would disagree that it is underdefined in the second sense.

Lucas: Okay.

Stuart: In order to solve the second problem, you need to solve the symbol grounding problem for humans. You need to show that the symbols or the neural pattern firing or the neuron connection or something inside the brain corresponds to some concepts in the outside world. This is one of my sort of side research projects. When I say side research project, I mean I wrote a couple of blog posts on this pointing out how I might approach it and I point out that you can do this in a very empirical way. If you think that a certain pattern of neural firing refers to, say, a rabbit, you can see whether this thing firing in the brain is predictive of, say, a rabbit in the outside world or predictive of this person starting to talk about rabbits soon.

In model theory, the actual thing that gives meaning to the symbols is sort of beyond the scope of the math theory, but if you have a potential connection between the symbols and the outside world, you can check whether this theory is a good one or a terrible one. If you say this corresponds to hunger and yet that thing only seems to trigger when someone’s having sex, for example, we can say, okay, your model that this corresponds to hunger is terrible. It’s wrong. I cannot use it for predicting that the person will eat in the world but I can use it for predicting that they’re having sex. So, if I model this as connected with sex, this is a much better grounding of that symbol. So using methods like this (and there are some subtleties: I also address Quine’s “gavagai” and connect it to sort of webs of connotation and concepts that go together), the basic idea is to empirically solve the symbol grounding problem for humans.
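
To make the empirical check concrete, here is a minimal Python sketch. It assumes we already have paired yes/no observations of whether a neural pattern fired and whether a candidate referent was present. The function names, the agreement-rate score, and the data are all hypothetical illustrations, not anything taken from Stuart's posts.

```python
from typing import Dict, List


def grounding_score(pattern_active: List[bool], referent_present: List[bool]) -> float:
    """Fraction of observations where the pattern and the candidate referent agree."""
    if len(pattern_active) != len(referent_present) or not pattern_active:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(p == r for p, r in zip(pattern_active, referent_present))
    return matches / len(pattern_active)


def best_grounding(pattern_active: List[bool], candidates: Dict[str, List[bool]]) -> str:
    """Pick the candidate referent that the pattern agrees with most often."""
    return max(candidates, key=lambda name: grounding_score(pattern_active, candidates[name]))


# Made-up observations: when the pattern fired, and when each candidate was present.
pattern = [True, False, True, True, False, False]
candidates = {
    "hunger": [False, True, False, False, True, True],
    "sex": [True, False, True, True, False, True],
}

print(best_grounding(pattern, candidates))
```

A plain agreement rate is the simplest possible score; a real study would presumably use a proper predictive measure and far more data.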

When I say that things are underdefined, I mean that they are articulated in terms of concepts that are underdefined across all possibilities in the world, not that these concepts could be anything or we don’t know what they mean. Our mental models correspond to something. It’s a collection of past experience and the concepts in our brain are tying together a variety of experiences that we’ve had. They might not be crisp. They might not be well-defined even if you look at say the totality of the universe but they correspond to something, to some repeated experience, some concepts to some thought process that we’ve had and that we’ve extracted this idea from. When we do this in practice, we are going to inject some of our own judgements into it and since humans are so very similar in how we interpret each other and how we decompose many concepts, it’s not necessarily particularly bad that we do so, but I strongly disagree that these are arbitrary concepts that are going to be put in by hand. They are going to be in the main identified via once you have some criteria for tracking what happens in the brain, comparing it with the outside world and those kinds of things.

My concept, maybe a cinema is not an objectively well-defined fact but what I think of as a cinema and what I expect in a cinema and what I don’t expect in a cinema, like I expect it to go dark and a projector and things like that. I don’t expect that this would be in a completely open space in the Sahara Desert under the sun with no seats and no sounds and no projection. I’m pretty clear that one of these things is a lot more of a cinema than the other.

Lucas: Do you want to expand here a little bit about this synthesis process?

Stuart: The main idea is to try and ensure that no disasters come about and the main thing that could lead to a disaster is the over prioritization of certain preferences over others. There are other avenues to disaster but this seems to be the most obvious. The other important part of the synthesis process is that it has to reach an outcome, which means that a vague description is not sufficient, so that’s why it’s phrased in terms of this is the default way that you synthesize preferences. This way may be modified by certain meta-preferences. The meta-preferences have to be reducible to some different way of synthesizing the preferences.

For example, the synthesis is not particularly over-weighting long-term preferences versus short-term preferences. It would prioritize long-term preferences but not exclude short-term ones. So, “I want to be thin” is not necessarily prioritized over “that’s a delicious piece of cake that I’d like to eat right now,” for example, but human meta-preferences often prioritize long-term preferences over short-term ones, so this is going to be included and this is going to change the default balance towards long-term preferences.

Lucas: So, powering the synthesis process, how are we to extract the partial preferences and their weights from the person?

Stuart: That’s, as I say, the first part of the project and that is a lot more empirical. This is going to be a lot more looking at what neuroscience says, maybe even what algorithm theory says or what modeling of algorithms says about what’s physically going on in the brain and how this corresponds to internal mental models. There might be things like people noting down what they’re thinking and correlating this with changes in the brain. This is a much more empirical aspect to the process that could be carried out essentially independently from the synthesis product.

Lucas: So, a much more advanced neuroscience would be beneficial here?

Stuart: Yes, but even without that, it might be possible to infer some of these things indirectly via the AI and if the AI accounts well for uncertainties, this will not result in disasters. If it knows that we would really dislike losing something of importance to our values, even if it’s not entirely sure what the thing of importance is, it will naturally, with that kind of motivation, act in a cautious way, trying to preserve anything that could be valuable until such time as it figures out better what we want in this model.
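
One way to picture this cautious behavior is an expected-utility calculation in which the AI is unsure which thing matters to us but knows that destroying it would be very costly. The sketch below is a toy illustration only: the credences, the gains, and the size of the loss are all made-up assumptions.

```python
# Hypothetical credences over which thing is actually important to the human.
CREDENCE = {"forests": 0.5, "libraries": 0.3, "neither": 0.2}

# Assumed large penalty for destroying whatever turns out to be important.
LOSS_IF_IMPORTANT = -100.0

# Hypothetical actions: a small direct gain, and the things each one destroys.
ACTIONS = {
    "clear forests": {"gain": 5.0, "destroys": {"forests"}},
    "demolish libraries": {"gain": 4.0, "destroys": {"libraries"}},
    "preserve both": {"gain": 1.0, "destroys": set()},
}


def expected_utility(action: str) -> float:
    """Direct gain plus the expected loss from destroying something important."""
    spec = ACTIONS[action]
    risk = sum(p for thing, p in CREDENCE.items() if thing in spec["destroys"])
    return spec["gain"] + risk * LOSS_IF_IMPORTANT


best_action = max(ACTIONS, key=expected_utility)
print(best_action)
```

With these numbers the preserving action wins even though it has the smallest direct gain, because every destructive action carries a substantial chance of a large loss.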

Lucas: So, in section two of your paper, synthesizing the preference utility function, within this section, you note that this is not the only way of constructing the human utility function. So, can you guide us through this more theoretical section, first discussing what sort of utility function and why a utility function in the first place?

Stuart: One of the reasons to look for a utility function is to look for something stable that doesn’t change over time and there is evidence that consistency requirements will push any form of preference function towards a utility function and that if you don’t have a utility function, you just lose value. So, the desire to put this into a utility function is not out of an admiration for utility functions per se but our desire to get something that won’t further change or won’t further drift in a direction that we can’t control and have no idea about. The other reason is that as we start to control our own preferences better and have a better ability to manipulate our own minds, we are going to be pushing ourselves towards utility functions because of the same pressures of basically not losing value pointlessly.

You can kind of see it in some investment bankers who have, to a large extent, constructed their own preferences to be expected money maximizers within a range. It was quite surprising to see, but human beings are capable of pushing themselves towards that, and this is what repeated exposure to different investment decisions tends to do to you, and it’s the correct thing to do in terms of maximizing the money. This is the kind of thing that general pressure on humans, combined with humans’ ability to self-modify, which we may develop in the future, is going to produce. So all this is going to be pushing us towards a utility function anyway, so we may as well go all the way and get the utility function directly rather than being pushed into it.

Lucas: So, is the view here that the reason why we’re choosing utility functions even when human beings are very far from being utility functions is that when optimizing our choices in mundane scenarios, it’s pushing us in that direction anyway?

Stuart: In part. I mean utility functions can be arbitrarily complicated and can be consistent with arbitrarily complex behavior. A lot of when people think of utility functions, they tend to think of simple utility functions and simple utility functions are obviously simplifications that don’t capture everything that we value but complex utility functions can capture as much of the value as we want. What tends to happen is that when people have say, inconsistent preferences, that they are pushed to make them consistent by the circumstances of how things are presented, like you might start with the chocolate mousse but then if offered a trade for the cherry pie, go for the cherry pie and then if offered a trade for the maple pie, go for the maple pie but then you won’t go back to the chocolate or even if you do, you won’t continue going around the cycle because you’ve seen that there is a cycle and this is ridiculous and then you stop it at that point.

So, what we decide when we don’t have utility functions tends to be determined by the order in which things are encountered and under contingent things and as I say, non-utility functions tend to be intrinsically less stable and so can drift. So, for all these reasons, it’s better to nail down a utility function from the start so that you don’t have the further drift and your preferences are not determined by the order in which you encounter things, for example.
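
The dessert example can be sketched in a few lines of Python. The trade relation below is a hypothetical encoding of the mousse, cherry pie, and maple pie cycle; the code shows both that the preferences contain a cycle and that the final choice depends on the order of the offers.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pairwise preferences: (a, b) means "would trade a away to get b".
TRADES: Set[Tuple[str, str]] = {
    ("chocolate mousse", "cherry pie"),
    ("cherry pie", "maple pie"),
    ("maple pie", "chocolate mousse"),
}


def final_holding(start: str, offers: List[str], trades: Set[Tuple[str, str]] = TRADES) -> str:
    """Accept each offered dessert whenever it is preferred to the one currently held."""
    held = start
    for offered in offers:
        if (held, offered) in trades:
            held = offered
    return held


def has_cycle(trades: Set[Tuple[str, str]] = TRADES) -> bool:
    """Depth-first search for a directed cycle in the trade graph."""
    graph: Dict[str, List[str]] = {}
    for a, b in trades:
        graph.setdefault(a, []).append(b)
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> bool:
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


print(has_cycle())
print(final_holding("chocolate mousse", ["cherry pie", "maple pie"]))
print(final_holding("chocolate mousse", ["maple pie", "cherry pie"]))
```

The same starting dessert and the same two offers lead to different final holdings depending only on their order, which is the contingency that a fixed utility function removes.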

Lucas: This is though in part thus a kind of normative preference then, right? To use utility functions in order not to be pushed around like that. Maybe one can have the meta-preferences for their preferences to be expressed in the order in which they encounter things.

Stuart: You could have that strong meta-preference, yes, though even that can be captured by a utility function if you feel like doing it. Utility functions can capture pretty much any form of preferences, even the ones that seem absurdly inconsistent. So, we’re not actually losing anything in theory by insisting that it should be a utility function. We may be losing things in practice in the construction of that utility function. I’m just saying if you don’t have something that is isomorphic with a utility function or very close to that, your preferences are going to drift randomly affected by many contingent factors. You might want that, in which case, you should put it in explicitly rather than implicitly and if you put it in explicitly, it can be captured by a utility function that is conditional on the things that you see, in the order in which you see them, for example.

Lucas: So, comprehensive AI services and other tool-like AI approaches to AI alignment, I suppose, avoid some of the anxieties produced by strong agential AIs with utility functions. Are there alternative goal ordering or action producing methods in agents other than utility functions that may have the properties that we desire of utility functions, or is the category of utility functions just so large that it encapsulates much of what is just mathematically rigorous and simple?

Stuart: I’m not entirely sure. Alternative goal structures tend to be quite ad hoc and limited in my practical experience, whereas utility functions or reward functions, which may or may not be isomorphic, do seem to be universal. There are possible inconsistencies within utility functions themselves if you get a self-referential utility function including your own preferences, for example, but MIRI’s work should hopefully clarify those aspects. I came up with an alternative goal structure which is basically an equivalence class of utility functions that are not equivalent in terms of utility, and this could successfully model an agent whose preferences were determined by the order in which things were chosen, but I put this together as a toy model or as a thought experiment. I would never seriously suggest building that. So, it just seems that for the moment, most non-utility function things are either ad hoc or under-defined or incomplete and that most things can be captured by utility functions, so the things that are not utility functions all seem at the moment to be flawed and the utility functions seem to be sufficiently versatile to capture anything that you would want.

This may mean, by the way, that we may lose some of the elegant properties of utility functions that we normally assume. For example, deontology can be captured by a utility function that assigns one to obeying all the rules and zero to violating any of them, and this is a perfectly valid utility function; however, there’s not much in terms of expected utility in this. It behaves almost exactly like a behavioral constraint: never choose any option that is against the rules. That kind of thing, even though it’s technically a utility function, might not behave the way that we’re used to utility functions behaving in practice. So, when I say that it should be captured as a utility function, I mean formally it has to be defined in this way but informally, it may not have the properties that we informally expect of utility functions.
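
The deontology example can be written down directly. In this small sketch the rules and actions are hypothetical, and outcomes are assumed to be certain. Maximizing the one-or-zero utility then selects exactly the set of actions that pass a plain rule filter, provided at least one action is permitted.

```python
from typing import Callable, List

Rule = Callable[[str], bool]  # returns True when the action obeys the rule

# Hypothetical rules over action descriptions.
RULES: List[Rule] = [
    lambda action: "lie" not in action,
    lambda action: "steal" not in action,
]

ACTIONS = ["tell truth", "lie for profit", "steal bread", "stay silent"]


def deontic_utility(action: str, rules: List[Rule] = RULES) -> float:
    """One if every rule is obeyed, zero if any rule is violated."""
    return 1.0 if all(rule(action) for rule in rules) else 0.0


def maximizers(actions: List[str]) -> List[str]:
    """Actions that achieve the maximum deontic utility."""
    top = max(deontic_utility(a) for a in actions)
    return [a for a in actions if deontic_utility(a) == top]


def permitted(actions: List[str], rules: List[Rule] = RULES) -> List[str]:
    """The same selection written as a behavioral constraint."""
    return [a for a in actions if all(rule(a) for rule in rules)]


print(maximizers(ACTIONS))
print(permitted(ACTIONS))
```

The utility function offers no way to rank the permitted actions against each other, which is one sense in which it lacks the properties usually expected of utility functions.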

Lucas: Wonderful. This is a really great picture that you’re painting. Can you discuss extending and normalizing the partial preferences? Take us through the rest of section two on synthesizing to a utility function.

Stuart: The extending is just basically you have, for instance, a preference of going to the cinema this day with that friend versus going to the cinema without that friend. That’s an incredibly narrow preference, but you also have preferences about watching films in general, being with friends in general, so these things should be combined in as much as they can be into some judgment of what you like to watch, who you like to watch with and under what circumstances. That’s the generalizing. The extending is basically trying to push these beyond the typical situations. So, if there was a sort of virtual reality, which really gave you the feeling that other people were present with you, which current virtual reality doesn’t tend to, then would this count as being with your friend. What level of interaction would be required for it to count as being with your friend? Well, that’s some of the sort of extending.

The normalizing is just basically the fact that utility functions are defined up to scaling, up to multiplying by some positive real constant. So, if you want to add utilities together or combine them in a smooth-min or combine them in any way, you have to scale the different preferences and there are various ways of doing this. I failed to find an intrinsically good way of doing it that has all the nice formal properties that you would want, but there are a variety of ways that it can be done, all of which seem acceptable. The one I’m currently using is the mean-max normalization, which is that the best possible outcome gets a utility of one, and the average outcome gets a utility of zero. This is the scaling.

Then the weight of these preferences is just how strongly you feel about it. Do you have a mild preference for going to the cinema with this friend? Do you have an overwhelming desire for chocolate? Once they’re normalized, you weigh them, and you combine them.
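
As a rough sketch of the normalize, weigh, and combine steps, the Python below applies a mean-max normalization to two partial preferences and adds them with weights. It assumes a small finite set of outcomes and takes the "average outcome" to be the plain unweighted mean over that set; the outcome names, raw utilities, and weights are made up, and the weighted sum is just one of the combination methods mentioned.

```python
from typing import Dict

Utility = Dict[str, float]


def mean_max_normalize(utility: Utility) -> Utility:
    """Rescale so the best outcome gets 1 and the mean over outcomes is 0."""
    values = list(utility.values())
    mean = sum(values) / len(values)
    best = max(values)
    if best == mean:
        raise ValueError("a constant utility function cannot be normalized")
    return {o: (u - mean) / (best - mean) for o, u in utility.items()}


def combine(preferences: Dict[str, Utility], weights: Dict[str, float]) -> Utility:
    """Weighted sum of the normalized partial preferences."""
    normalized = {name: mean_max_normalize(u) for name, u in preferences.items()}
    outcomes = list(next(iter(preferences.values())))
    return {o: sum(weights[n] * normalized[n][o] for n in preferences) for o in outcomes}


# Made-up raw utilities on arbitrary scales.
preferences = {
    "cinema": {"cinema_with_friend": 10.0, "cinema_alone": 4.0, "stay_home_with_chocolate": 1.0},
    "chocolate": {"cinema_with_friend": 0.0, "cinema_alone": 0.0, "stay_home_with_chocolate": 3.0},
}
# Mild preference for the cinema, overwhelming desire for chocolate.
weights = {"cinema": 1.0, "chocolate": 3.0}

combined = combine(preferences, weights)
best_outcome = max(combined, key=combined.get)
print(best_outcome)
```

Multiplying either raw utility by a positive constant leaves the normalized values unchanged, so only the explicit weights decide how strongly each preference counts.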

Lucas: Can you take us through the rest of section two here, if there’s anything else here that you think is worth mentioning?

Stuart: I’d like to point out that this is intended to work with any particular human being that you point the process at, so there are a lot of assumptions that I made from my perspective as a non-moral realist who is worried about oversimplification and other things. The idea is that if people have strong meta-preferences themselves, these will overwhelm the default decisions that I’ve made, but if people don’t have strong meta-preferences, then they are synthesized in this way, in the way which I feel is the best to not lose any important human value. There are also judgements about what would constitute a disaster or how we might judge this to have gone disastrously wrong; those are important and need to be sort of fleshed out a bit more because many of them can’t be quite captured within this system.

The other thing is that the outcomes may be very different. To choose a silly example, if you are 50% total utilitarian versus 50% average utilitarian or if you’re 45%, 55% either way, the outcomes are going to be very different because the pressure on the future is going to be different and because the AI is going to have a lot of power, it’s going to result in very different outcomes but from our perspective where if we put 50/50 total utilitarianism and average utilitarianism, we’re not exactly 50/50 most of the time. We’re kind of … Yeah, they’re about the same. So, 45, 55, should not result in a disaster if 50/50 doesn’t.

So, even though from the perspective of these three mixes, 45/55, 50/50, 55/45, each mix will look at something that optimizes one of the other two mixes and say that is very bad from my perspective, from a more human perspective we’re saying all of them are pretty much okay. Well, we would say none of them are pretty much okay because they don’t incorporate many other of our preferences, but the idea is that when we get all the preferences together, it shouldn’t matter a bit if it’s a bit fuzzy. So even though the outcome will change a lot if we shift it a little bit, the quality of the outcome shouldn’t change a lot, and this is connected with a point that I’ll put up in section three that uncertainties may change the outcome a lot but, again, uncertainties should not change the quality of the outcome, and the quality of the outcome is measured in a somewhat informal way by our current preferences.

Lucas: So, moving along here into section three, what can you tell us about the synthesis of the human utility function in practice?

Stuart: So, first of all, there’s … Well, let’s do this project, let’s get it done, but we don’t have perfect models of the human brain, we haven’t grounded all the symbols, so what are we going to do with the great uncertainties? So, that’s arguing that even with the uncertainties, this method is considerably better than nothing and you should expect it to be pretty safe and somewhat adequate even with great uncertainties. The other part is I’m showing how thinking in terms of the human mental models can help to correct and improve some other methods like revealed preferences, or stated preferences, or locking the philosopher in a box for a thousand years. All methods fail and we actually have a pretty clear idea when they fail. Revealed preferences fail because we don’t model bounded rationality very well and even when we do, we know that sometimes our preferences are different from what we reveal. Stated preferences fail in situations where there are strong incentives not to tell the truth, for example.

We could deal with these by sort of adding all the counter examples as special cases, or we could add the counter examples as something to learn from, or, what I’m recommending, we add them as something to learn from while stating that the reason that this is a counter example is that there is a divergence between whatever we’re measuring and the internal model of the human. The idea being that it is a lot easier to generalize when you have an error theory rather than just lists of error examples.

Lucas: Right and so there’s also this point of view here that you’re arguing that this research agenda and perspective is also potentially very helpful for things like corrigibility and low impact research and Christiano’s distillation and amplification, which you claim all seem to be methods that require some simplified version of the human utility function. So any sorts of conceptual insights or systematic insights which are generated through this research agenda in your view seem to be able to make significant contributions to other research agendas which don’t specifically take this lens?

Stuart: I feel that even something like corrigibility can benefit from this because in my experience, things like corrigibility, things like low impact have to define to some extent what is important and what can be categorized as unimportant. A low impact AI cannot be agnostic about our preferences, it has to know that a nuclear war is a high impact thing whether or not we’d like it whereas turning on an orange light that doesn’t go anywhere is a low impact thing, but there’s no real intrinsic measure by which one is high impact and the other is low impact. Both of them have ripples across the universe. So, I think I phrased it as Hitler, Gandhi and Thanos all know what a low impact AI is, all know what an oracle AI is, or know the behavior to expect from it. So, it means that we need to get some of the human preferences in, the bit that tells us that nuclear wars are high impact but we don’t need to get all of it in because since so many different humans will agree on it, you don’t need to capture any of their individual preferences.

Lucas: So, it’s applicable to these other methodologies and it’s also your belief and I’m quoting you here, you say that, “I’d give a 10% chance of it being possible this way, meaning through this research agenda and a 95% chance that some of these ideas will be very useful for other methods of alignment.” So, just adding that here as your credences for the skillfulness of applying insights from this research agenda to other areas of AI alignment.

Stuart: In a sense, you could think of this research agenda in reverse. Imagine that we have reached some outcome that is some positive outcome: we have got alignment, and we haven’t reached it through a single trick and we haven’t reached it through the sort of tool AIs or software as a service or those kinds of approaches; we have reached an actual alignment. It, therefore, seems to me all the problems that I’ve listed, or almost all of them, will have had to have been solved. Therefore, in a sense, much of this research agenda needs to be done directly or indirectly in order to achieve any form of sensible alignment. Now, the term directly or indirectly is doing a lot of the work here, but I feel that quite a bit of this will have to be done directly.

Lucas: Yeah, I think that that makes a lot of sense. It seems like there’s just a ton about the person that is just confused and difficult to understand what we even mean here in terms of our understanding of the person and also broader definitions included in alignment. Given this optimism that you’ve stated here surrounding the applicability of this research agenda on synthesizing a human’s preferences into a utility function, what can you say about the limits of this method? Any pessimism to inject here?

Stuart: So, I have a section four, which is labeled as the things that I don’t address. Some of these are actually a bit sneaky, like the section on how to combine the preferences of different people, because if you read that section, it basically lays out ways of combining different people’s preferences. But I’ve put it in there to say I don’t want to talk about this issue in the context of this research agenda because I think this just diverts from the important work here. There are a few of those points, but some of them are genuine things that I think are problems, and the biggest is the fact that there is a sort of informal Gödel statement in humans about their own preferences. How many people would accept a computer synthesis of their preferences and say yes, that is my preferences, especially when they can explore it a bit and find the counterintuitive bits? I expect humans in general to reject the AI-assigned synthesis no matter what it is, pretty much just because it was synthesized and then given to them. I expect them to reject it or want to change it.

We have a natural reluctance to accept the judgment of other entities about our own morality, and this is a perfectly fine meta-preference that most humans have, and I think all humans have to some degree, and I have no way of capturing it within the system because it’s basically a Gödel statement in a sense. The best synthesis process is the one that wasn’t used. The other thing is that people want to continue with moral learning and moral improvement, and I’ve tried to decompose moral learning and moral improvement into different things and show that some forms of moral improvement and moral learning will continue even when you have a fully synthesized utility function, but I know that this doesn’t capture everything of what people mean by this and I think it doesn’t even capture everything of what I would mean by this. So, again, there is a large hole in there.

There are some other holes of the sort of more technical nature like infinite utilities, stability of values and a bunch of other things but conceptually, I’m the most worried about these two aspects, the fact that you would reject what values you were assigned and the fact that you’d want to continue to improve and how do we define continuing improvement that isn’t just the same as well your values may drift randomly.

Lucas: What are your thoughts here? Feel free to expand on both the practical and theoretical difficulties of applying this across humanity and aggregating it into a single human species wide utility function.

Stuart: Well, the practical difficulties are basically politics, how to get agreements between different groups. People might want to hang onto their assets or their advantages. Other people might want sort of stronger equality. Everyone will have broad principles to appeal to. Basically, there’s going to be a lot of fighting over the different weightings of individual utilities. The hope there is that, especially with a powerful AI, the advantage might be sufficiently high that it’s easier to do something where everybody gains, even if the gains are uneven, than to talk about how to divide a fixed-size pie. The theoretical issue is mainly what do we do with anti-altruistic preferences. I’m not talking about selfish preferences; those are very easy to deal with. That’s just basically competition for the utility, for the resources, for the goodness. I mean actual anti-altruistic utilities, so, someone who wants harm to befall other people. We also have to deal with altruistic preferences, because you shouldn’t penalize people for having altruistic preferences.

You should, in a sense, take out the altruistic preferences and put that in the humanity one and allow their own personal preferences some extra weight, but anti-altruistic preferences are a challenge, especially because it’s not quite clear where the edge is. Now, if you want someone to suffer, that’s an anti-altruistic preference. If you want to win a game and part of your enjoyment of the game is that other people lose, where exactly does that lie? That’s a very natural preference. You might become a very different person if you didn’t get some at least mild enjoyment from other people losing or from the status boost, so that is a bit tricky. You might sort of just tone them down so that mild anti-altruistic preferences are perfectly fine, so if you want someone to lose to your brilliant strategy at chess, that’s perfectly fine, but if you want someone to be dropped slowly into a pit of boiling acid, then that’s not fine.

The other big question is population ethics. How do we deal with new entities and how do we deal with other conscious or not quite conscious animals around the world, so who gets to count as a part of the global utility function?

Lucas: So, I’m curious to know about concerns over aspects of this alignment story or any kind of alignment story involving lots of leaky abstractions. In Rich Sutton’s short essay called The Bitter Lesson, he discusses how the bitter lesson of computer science is how leveraging computation over human domain-specific ingenuity has broadly been more efficacious for breeding very powerful results. We seem to have this tendency or partiality towards trying to imbue human wisdom or knowledge or unique techniques or kind of trickery or domain-specific insight into architecting the algorithm and alignment process in specific ways, whereas maybe just throwing tons of computation at the thing has been more productive historically. Do you have any response here for concerns over concepts being leaky abstractions, or the categories which you use to break down human preferences not fully capturing what our preferences are?

Stuart: Well, in a sense that’s part of the research project and part of the reason why I warned against going to distant worlds where, in my phrasing, the web of connotations breaks down and, in your phrasing, the abstractions become too leaky. This is also part of why, even though the second part is done as if this is the theoretical way of doing it, I also think there should be a lot of experimental aspect to it to test where this is going, where it goes surprisingly wrong or surprisingly right. The second part, though it’s presented as just this is basically the algorithm, should be tested and checked and played around with to see how it goes. For The Bitter Lesson, the difference here I think is that in the case of The Bitter Lesson, we know what we’re trying to do.

We have objectives whether it’s winning at a game, whether it’s classifying images successfully, whether it’s classifying some other feature successfully, we have some criteria for the success of it. The constraints I’m putting in by hand are not so much trying to put in the wisdom of the human or the wisdom of the Stuart. There’s some of that but it’s to try and avoid disasters and the disasters cannot be just avoided with more data. You can get to many different points from the data and I’m trying to carve away lots of them. Don’t oversimplify, for example. So, to go back to The Bitter Lesson, you could say that you can tune your regularizer and what I’m saying is have a very weak regularizer, for example and this is not something that The Bitter Lesson applies to because in the real world, on the problems where The Bitter Lesson applies, you can see whether hand tuning the regularizer works because you can check what the outcome is and compare it with what you want.

Since you can’t compare it with what you want, because if we knew what we wanted we’d kind of have it solved, what I’m saying here is don’t put a strong regularizer for these reasons. The data can’t tell me that I need a stronger regularizer because the data has no opinion, if you want, on that. There is no ideal outcome to compare with. There might be some problems, but problems like our preferences not looking like my logic or like our logic point towards the method failing, not towards the method needing more data and fewer restrictions.

Lucas: I mean I’m sure part of this research agenda is also further clarification and refinement of the taxonomy and categories used, which could potentially be elucidated by progress in neuroscience.

Stuart: Yes, and there’s a reason that this is version 0.9 and not yet version 1. I’m getting a lot of feedback and going to refine it before trying to put it out as version 1. It’s in alpha or in beta at the moment. It’s a prerelease agenda.

Lucas: Well, so hopefully this podcast will spark a lot more interest and knowledge about this research agenda and so hopefully we can further contribute to bettering it.

Stuart: When I say that this is in alpha or in beta, that doesn’t mean don’t criticize it, do criticize it and especially if these can lead to improvements but don’t just assume that this is fully set in stone yet.

Lucas: Right, so that’s sort of framing this whole conversation in the light of epistemic humility and willingness to change. So, two more questions here and then we’ll wrap up. So, reflective equilibrium, you say that this is not a philosophical ideal, can you expand here about your thoughts on reflective equilibrium and how this process is not a philosophical ideal?

Stuart: Reflective equilibrium is basically you refine your own preferences, make them more consistent, apply them to yourself until you’ve reached a moment where your meta-preferences and your preferences are all smoothly aligned with each other. What I’m doing is a much more messy synthesis process and I’m doing it in order to preserve as much as possible of the actual human preferences. It is very easy to reach reflective equilibrium by just, for instance, having completely flat preferences or very simple preferences; these tend to be very reflectively in equilibrium with themselves, and pushing towards this thing is a push towards, in my view, excessive simplicity and the great risk of losing valuable preferences. The risk of losing valuable preferences seems to me a much higher risk than the gain in terms of simplicity or elegance that you might get. There is no reason that the kludgey human brain and its mess of preferences should lead to some simple reflective equilibrium.

In fact, you could say that this is an argument against reflective equilibrium, because it means that many different starting points, many different minds with very different preferences, will lead to similar outcomes, which basically means that you’re throwing away a lot of the details of your input data.

Lucas: So, I guess two things, one is that this process clarifies and improves on incorrect beliefs in the person but it does not reflect what you or I might call moral wrongness, so like if some human is evil, then the synthesized human utility function will reflect that evilness. So, my second question here is, an idealization process is very alluring to me. Is it possible to synthesize the human utility function and then run it internally on the AI and then see what we get in the end and then check if that’s a good thing or not?

Stuart: Yes, in practice, this whole thing, if it works, is going to be very experimental and we’re going to be checking the outcomes, and there’s nothing wrong with sort of wanting to be an idealized version of yourself. What I have, especially if it’s just one idealized, it’s the version where you are the idealized version of the idealized version of the idealized version of the idealized version, et cetera, of yourself where there is a great risk of losing yourself and the inputs there. This is where I had the idealized process where I started off wanting to be more compassionate and spreading my compassion to more and more things at each step, eventually coming to value insects as much as humans and then, at the next step, valuing rocks as much as humans and then removing humans because of the damage that they can do to mountains. That was a process, or something along the lines of what I can see happening, if you are constantly idealizing yourself without any criteria for “stop idealizing now” or “you’ve gone too far from where you started.”

Your ideal self is pretty close to yourself. The triple idealized version of your idealized, idealized self or so on, starts becoming pretty far from your starting point and this is the sort of areas where I fear over-simplicity or trying to get to reflective equilibrium at the expense of other qualities and so on, these are the places where I fear this pushes towards.

Lucas: Can you make more clear what failed, in your view, in terms of that idealization process where Mahatma Armstrong turns into a complete negative utilitarian?

Stuart: It didn’t even turn into a negative utilitarian, it just turned into someone that valued rocks as much as they valued humans and therefore eliminated humans on utilitarian grounds in order to preserve rocks or to preserve insects if you wanted to go down one level of credibility. The point of this is this was the outcome of someone that wants to be more compassionate, continuously wanting to make more compassionate versions of themselves that still want to be more compassionate and so on. It went too far from where it had started. It’s one of many possible narratives but the point is the only way of resisting something like that happening is to tie the higher levels to the starting point. A better thing might say I want to be what myself would think is good and what my idealized self would think was good and what the idealized, idealized self would think was good and so on. So that kind of thing could work but just idealizing without ever tying it back to the starting point, to what compassion meant for the first entity, not what it meant for the nth entity is the problem that I see here.

Lucas: If I think about all possible versions of myself across time and I just happen to be one of them, this just seems to be a meta-preference to bias towards the one that I happen to be at this moment, right?

Stuart: We have to make a decision as to what preferences to take and we may as well take now, because if we try and take into account our future preferences, we are starting to come a cropper with the manipulable aspect of our preferences, the fact that these could be literally anything. There is a future Stuart who is probably a Nazi because you can apply a certain amount of pressure to transform my preferences. I would not want to endorse their preferences now. There are future Stuarts who are saints, whose preferences I might endorse. So, if we’re deciding which future preferences we’re accepting, we have to decide it according to criteria, and criteria that are at least in part what we have now.

We could sort of defer to our expected future selves if we sort of say I expect a reasonable experience of the future, define what reasonable means and then average out our current preferences with our reasonable future preferences if we can define what we mean by reasonable then, then yes, we can do this. This is our sole way of doing things and if we do it this way, it will most likely be non-disastrous. If doing the synthesis process with our current preference is non-disastrous then doing it with the average of our future reasonable preferences is also going to be non-disastrous. This is one of the choices that you could choose to put into the process.

Lucas: Right, so we can be mindful here that we’ll have lots of meta-preferences about the synthesis process itself.

Stuart: Yes, you can put it as a meta-preference or you can put it explicitly in the process if that’s the way you would prefer to do it. The whole process is designed strongly around “get an answer from this process,” so yes, we could do this. Let’s see if we can do it for one person over a short period of time, and then we can talk about how we might take into account considerations like that, including, as I say, that this might be in the meta-preferences themselves. This is basically another version of moral learning. We’re kind of okay with our values shifting but not okay with our values shifting arbitrarily. We really don’t want our values to completely flip from what we have now, though some aspects we’re more okay with them changing. This is part of the complicated question of how you do moral learning.

Lucas: All right, beautiful, Stuart. Contemplating all this is really quite fascinating and I just think in general, humanity has a ton more thinking to do and self-reflection in order to get this process really right and I think that this conversation has really helped elucidate that to me and all of my contradictory preferences and my multitudes within the context of my partial and sometimes erroneous mental models, reflecting on that also has me feeling maybe slightly depersonalized and a bit ontologically empty but it’s beautiful and fascinating. Do you have anything here that you would like to make clear to the AI alignment community about this research agenda? Any last few words that you would like to say or points to clarify?

Stuart: There are people who disagree with this research agenda, some of them quite strongly and some of them having alternative approaches. I like the fact that they are researching other alternatives. If they disagree with the agenda and want to engage with it, the best engagement that I could see is pointing out why bits of the agenda are unnecessary or how alternate solutions could work. You could also point out that maybe it’s impossible to do it this way, which would also be useful, but if you think you have a solution or the sketch of a solution, then pointing out which bits of the agenda you solve otherwise would be a very valuable exercise.

Lucas: In terms of engagement, you prefer people writing responses on the AI Alignment Forum or LessWrong?

Stuart: Emailing me is also fine. I will eventually answer every non-crazy email.

Lucas: Okay, wonderful. I really appreciate all of your work here on this research agenda and all of your writing and thinking in general. You’re helping to create beautiful futures with AI and you’re much appreciated for that.

If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

End of recorded material

FLI Podcast: Beyond the Arms Race Narrative: AI & China with Helen Toner & Elsa Kania

Discussions of Chinese artificial intelligence frequently center around the trope of a U.S.-China arms race. On this month’s FLI podcast, we’re moving beyond the arms race narrative and taking a closer look at the realities of AI in China and what they really mean for the United States. Experts Helen Toner and Elsa Kania, both of Georgetown University’s Center for Security and Emerging Technology, discuss China’s rise as a world AI power, the relationship between the Chinese tech industry and the military, and the use of AI in human rights abuses by the Chinese government. They also touch on Chinese-American technological collaboration, technological difficulties facing China, and what may determine international competitive advantage going forward. 

Topics discussed in this episode include:

  • The rise of AI in China
  • The escalation of tensions between U.S. and China in the AI realm 
  • Chinese AI Development plans and policy initiatives
  • The AI arms race narrative and the problems with it 
  • Civil-military fusion in China vs. U.S.
  • The regulation of Chinese-American technological collaboration
  • AI and authoritarianism
  • Openness in AI research and when it is (and isn’t) appropriate
  • The relationship between privacy and advancement in AI 

You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play, and Stitcher.

Ariel Conn: Hi everyone and welcome to another episode of the FLI podcast! I’m your host Ariel Conn. Now, by sheer coincidence, Lucas and I both brought on guests to cover the same theme this month, and that is AI and China. Fortunately, AI and China is a huge topic with a lot to cover. For this episode, I’m pleased to have Helen Toner and Elsa Kania join the show. We will be discussing things like the Beijing AI Principles, why the AI arms race narrative is problematic, civil-military fusion in China versus in the US, the use of AI in human rights abuses, and much more.

Helen is Director of Strategy at Georgetown’s Center for Security and Emerging Technology. She previously worked as a Senior Research Analyst at the Open Philanthropy Project, where she advised policymakers and grantmakers on AI policy and strategy. Between working at Open Philanthropy and joining CSET, Helen lived in Beijing for nine months, studying the Chinese AI ecosystem as a Research Affiliate of Oxford University’s Center for the Governance of AI. Helen holds a Bachelor of Science and a Diploma in Languages from the University of Melbourne.

Elsa is a Research Fellow also at Georgetown’s CSET, and she is also a PhD student in Harvard University’s Department of Government. Her research focuses on Chinese military innovation and technological development.

Elsa and Helen, thank you so much for joining us.

Helen Toner: Great to be here.

Elsa Kania: Glad to be here.

Ariel Conn: So, I have a lot of questions for you about what’s happening in China with AI, and how that’s impacting U.S.-China relations. But before I dig into all of that, I want to actually start with some of the more recent news, which is the Beijing principles that came out recently. I was actually surprised because they seem to be some of the strongest principles about artificial intelligence that I’ve seen, and I was wondering if you both could comment on your own reactions to those principles.

Elsa Kania: I was encouraged to see these principles released, and I think it is heartening to see greater discussion of AI ethics in China. At the same time, I’m not convinced that these are necessarily strong, in the sense that it is not clear what the mechanism for enforcement would be. I think that this is not unique to China, but often the articulation of principles can be a means of burnishing the image, whether of a company or a country, with regard to its intentions in AI.

Although it’s encouraging to hear a commitment to use AI to do good, and for humanity, and control risks, these are very abstract statements, and some of them are rather starkly at odds with realities of how we know AI is being abused by the Chinese government today for purposes that reinforce the coercive capacity of the state: including censorship, surveillance; prominently in Xinjiang where facial recognition has been racially targeted against ethnic minorities, against the backdrop of the incarceration and imprisonment of upwards of a million — by some estimates — Uyghurs in Xinjiang.

So, I think it’s hard not to feel a degree of cognitive dissonance when reading these principles. And again I applaud those involved in the process for their efforts and for continuing to move this conversation forward in China. But again, I’m skeptical that this espoused commitment to certain ethics will necessarily constrain the Chinese government from using AI in ways that it appears to be deeply committed to, for reasons of concerns about social stability and state security.

Ariel Conn: So one question that I have is, did the Chinese government actually sign on to these principles? Or is it other entities that are involved?

Elsa Kania: So the Beijing AI principles were launched in some association with the Ministry of Science and Technology for China. So, certainly the Chinese government, actually initially in its New Generation AI Development Plan back in the summer of 2017, had committed to trying to lead and engage with issues of legal, ethical, and regulatory frameworks for artificial intelligence. And I think it is telling that these have been released in English, and to some degree part of the audience for these principles is international, against the backdrop of a push for the Chinese government to promote international cooperation in AI.

And the launch of a number of world AI conferences and attempts to really engage with the international community, again, are encouraging in some respects — but also there can be a level of inconsistency. And I think a major asymmetry is the fact that these principles, and many initiatives in AI ethics in China, are shaped by the government’s involvement. And it’s hard to imagine the sort of open exchange among civil society and different stakeholders that we’ve seen in the United States, and globally, happen in China, given the role of the government. I think it’s telling at the same time that the preamble for the Beijing AI principles talks about the construction of a human community with a shared future, which is a staple in Xi Jinping’s propaganda, and a concept that really encapsulates Chinese ambitions to shape the future course of global governance.

So again, I think I’m heartened to see greater discussion of AI ethics in China. But I think the environment in which these conversations are happening — as well as of course the constraints from any meaningful enforcement, or alteration of the government’s current trajectory in AI — makes me skeptical in some respects. I hope that I am wrong, and I hope that we will see this call to use AI for humanity, and to be diverse and inclusive, start to shape the conversation. So, it will be interesting to see whether we see indicators of results, or impact from these principles going forward.

Helen Toner: Yeah. I think that’s exactly right. And in particular, the release of these principles I think made clear a limitation of this kind of document in general. This was one of a series of sets of principles like this that have been released by a number of different organizations. And the fact of seeing principles like this that look so good on paper, in contrast with some of the behavior that Elsa described from the Chinese government, I think really puts into stark relief the limitations of well-meaning, nice sounding ideas like this that really have no enforcement mechanism.

Ariel, you asked about whether the Chinese government had signed onto these, and as Elsa described, there was certainly government involvement here. But just because there is some amount of the government giving, or some part of the Chinese government giving its blessing to the principles, does not imply that there are any kind of enforcement mechanisms, or any kind of teeth to a document of this kind.

Elsa Kania: And certainly that’s not unique to China. And I think there have been questions of whether corporate AI principles, whether from American or Chinese companies, are essentially intended for public relations purposes, or will actually shape the company’s decision making. So, I think it’s really important to move these conversations forward on ethics. At the same time, it will be interesting to see how principles translate into practice, or perhaps in some cases don’t.

Ariel Conn: So I want to backtrack a little bit to where some of the discussion about China’s development of AI started, at least from more Western perspectives. My understanding is that seeing AlphaGo beat Lee Sedol led to something of a rallying cry — I don’t know if that’s quite the right phrase — but that that sort of helped trigger the Chinese government to say, “We need to be developing this a lot stronger and faster.” Is that the case? Or what’s been sort of the trajectory of AI development in China?

Elsa Kania: I think it depends on how far back you want to go historically.

Ariel Conn: That’s fair.

Elsa Kania: I think in recent history certainly AlphaGo was a unique moment — both as an indication of how rapidly AI was progressing, given that experts had not anticipated an AI could win the game of Go for another 10, perhaps 15 years — and also in the context of how the Chinese government, and even the Chinese military, saw this as an indication of the capabilities of American artificial intelligence, including the relevance of the capacities for tactics and strategizing, command decision making in a military context. 

At the same time of course I think another influence in 2016 appears to have been the U.S. government’s emphasis on AI at the time, including a plan for research and development that may have received more attention in Beijing than it did in Washington in some respects, because this does appear to have been one of the factors that inspired China’s New Generation AI Development Plan, launched the following year. 

But I think if we’re looking at the history of AI in China, we can trace it back much further: even some linkages to the early history of cybernetics and systems engineering. And there are honestly some quite interesting episodes early on, because during the Cold War, artificial intelligence could be a topic that had some ideological undertones and underpinnings — including how the Soviet Union saw AI in system science, and some of the critiques of this as revisionism.

And then there is even an interesting detour in the 80s or so, when Qian Xuesen, a prominent strategic scientist in China’s nuclear weapons program, saw AI as entangled with an interest in parapsychology, including exceptional human body functions such as the capacity to recognize characters with your ears. There was a craze for ESP in China in the 80s, and it actually received some attention in scientific literature as well. There was an interesting conflation of artificial intelligence and special functions that became the subject of some ideological debate, in which Qian Xuesen was an advocate essentially of ESP in ways that undermined early AI development in China.

And other academic rivals in the Chinese Academy of Sciences argued in favor of AI as a discipline of emerging science relative to the pseudoscience that human special functions turned out to be, and this became a debate of some ideological importance as well against the backdrop of questions of arbitrating what science was, and how the Chinese Communist Party tried to sort of shape science. 

I think that does go to illustrate that although a lot of the headlines about China’s rise in AI are much more recent, not only state support for research but also the significant increase in publications far predates this attention, and really can be traced to some degree to the 90s, and especially from the mid 2000s onward.

Helen Toner: I’ll just add as well that if we’re thinking about what it is that caused this surge in Western interest in Chinese AI, I think a really important part of the backdrop is the shift in U.S. defense thinking to move away from thinking primarily about terrorism, and non-state actors as the primary threat to U.S. security, and shifting towards thinking about near-peer adversaries — so primarily China and Russia — which is a recent change in U.S. doctrine. And I think that is also an important factor in understanding why Chinese interest and success in AI has become such an important sort of conspicuous part of the discussion.

Elsa Kania: There’s also been really a recalibration of assessments of the state of technology and innovation in China, from often outright skepticism and dismissal that China could innovate to sometimes now a course correction towards the opposite extreme; and now anxieties that China may be beating us in the “race for AI” or 5G — even quantum computing has provoked a lot of concern. So, I think on one hand it is long overdue that U.S. policy makers and the American National Security community take seriously what are quite real and rapid advances in science and technology in China.

At the same time I think sometimes this reaction has resulted in more inflated assessments that have provoked concerns about the notion of an arms race, which I think is a really wrong and misleading framing of this when we’re talking about a general purpose technology that has such a range of applications, and for which the economic and societal impacts may be more significant than the military applications in the near-term, which I say as an analyst who focuses on military issues.

Ariel Conn: I want to keep going with this idea of the fear that’s sort of been developing in the U.S. in response to China’s developments. And I guess I first started seeing it a lot more when China released their Next Generation Artificial Intelligence Plan — I believe that’s the one that said by 2030 they wanted to dominate in AI.

Helen Toner: That’s right.

Ariel Conn: So I’d like to hear both of your thoughts on that. But I’m also sort of interested in — to me it seemed like that plan came out in part as a response to what they were seeing from the US, and then the U.S. response to this is to — maybe panic is a little bit extreme, but possibly overreact to the Chinese plan — and maybe they didn’t overreact, that might be incorrect. But it seems like we’re definitely seeing an escalation occurring.

So let’s start by just talking about what that plan said, and then I want to dive into this idea of the escalation, and maybe how we can look at that problem, or address it, or consider it.

Elsa Kania: So, I’d been certainly looking at a lot of different plans and policy initiatives for the 13th Five-Year Plan period, which is 2016 to 2020, and I had noticed when this New Generation AI Development Plan came out; and initially it was only available in Chinese. A couple of us, after we’d come across it initially, had organized to work on a translation of it, and to this day that’s still the only unofficial English translation of this plan available. So far as I can tell the Chinese government itself never actually translated that plan. And in that regard, it does not appear to have been intended for an international audience in the way that, for instance, the Beijing AI Principles were.

So, I think that some of the rhetoric in the plan that rightly provoked concerns — calling for China to lead the world in AI and be a premier global innovation center for artificial intelligence — is striking, but is consistent with S&T plans that often call for China to seize the strategic commanding heights of innovation, and future advantage. So I think that a lot of the signaling about the strategic importance of AI to some degree was intended for an internal audience, and certainly we’ve seen a powerful response in terms of plans and policies launched across all elements of the Chinese government, and at all levels of government including a number of cities and provinces.

I do think it was highly significant in reflecting how the Chinese government saw AI as really a critical strategic technology to transform the Chinese economy, and society, and military — though that’s discussed in less detail in the plan.

But there is also an open acknowledgement in the plan that China still sees itself as well behind the U.S. in some respects. So, I think the ambitions and the resources and policy support across all levels of government that this plan has catalyzed are extremely significant, and I think do merit some concern, but I think some of the rhetoric about an AI race, or arms race — clearly there is competition in this domain. But I do think the plan should be placed in the context of an overall drive by the Chinese government to escape the middle income trap, and sustain economic growth at a time when it’s slowing and looking to AI as an important instrument to advance these national objectives.

Helen Toner: I also think there is something kind of amusing that happened where, as Elsa said earlier, it seems like one driver of the creation of this plan was that China saw the U.S. government under the Obama administration in 2016 run a series of events and then put together a white paper about AI, and a federal R&D plan. And China’s response to this was to think, “Oh, we should really put together our own strategy, since the U.S. has one.” And then somehow with the change in administrations, and the time that had elapsed, there suddenly became this narrative of, “Oh no, China has an AI strategy and the U.S. doesn’t have one. So now we have to have one because they have one.” And that was a little bit farcical to be honest. And I think it has now died down after, I believe it’s called the American AI Initiative, that President Trump released. But that was amusing to watch while it was happening.

Elsa Kania: I hope that the concerns over the state of AI in China can provoke concerns that motivate productive responses. I agree that sometimes the debate has focused too much on the notion of what it would mean to have an AI strategy, or concerns about the plan as sort of one of the most tangible manifestations of these ambitions. But I do think there are reasons for concern that the U.S. has really not recognized the competitive challenge, and sometimes still seems to take for granted American leadership in emerging technologies for which the landscape does remain much more contested.

Helen Toner: For sure.

Ariel Conn: Do you feel like we’re starting to see de-escalation then — that people are starting to maybe change their rhetoric about making sure someone’s ahead, or who’s ahead, or all that type of lingo? Or do you think we are still seeing this escalation that is perhaps being reported in the press still?

Helen Toner: I think there is still a significant amount of concern. Perhaps one shift that we’ve seen a little bit — and Elsa I’d be curious if you agree — is that I think around the time that the Next Generation Plan was released, and attention was starting to turn to China, there began to be a bit of a narrative of, “Not only is China trying to catch up with the U.S. and making progress in catching up with the U.S. but perhaps has already surpassed the U.S. and is perhaps already clearly ahead in AI research globally.” That’s an extremely difficult thing to measure, but I think some of the arguments that were made to say that were not as well backed up as they could have been.

Maybe one thing that I’ve observed over the last six or 12 months is a little bit of a rebalancing in thinking. It’s certainly true that China is investing very heavily in this, and is trying really hard, and it’s certainly true that they are seeing some results from that, but it’s not at all clear that they have already caught up with the U.S. in any meaningful way, or are surpassing it. Of course, it depends how you slice up the space, and whether you’re looking more at fundamental research, or applied research, or so on. But that might be one shift we’ve seen a little bit.

Elsa Kania: I agree. I think there has continued to be a recalibration of assessments, and even a rethinking of the notion of what leading in AI even means. And I used to be asked the question all the time of who was winning the race, or even arms race, for AI. And often I would respond by breaking down the question, asking, “Well what do you mean by who?” Because the answer will differ depending on whether we’re talking about American and Chinese companies, relative to how do we think about aggregating China and the United States as a whole when it comes to AI research — particularly considering the level of integration and interdependence between American and Chinese innovation ecosystems. What do we mean by winning in this context? How do we think about the metrics, or even desired end states? Is this a race to develop something akin to artificial general intelligence? Or is this a rivalry to see which nation can best leverage AI for economic and societal development across the board?

And then again, why do we continue to talk about this as a race? I think that is a metaphor and framing that does readily come to mind and can be catchy. And as someone who looks at the military dimension of this quite frequently, I often find myself explaining why I don’t think “arms race” is an appropriate conceptualization either, because this is a technology that will have a range of applications across different elements of the military enterprise and that does have great promise for providing decisive advantage in the future of warfare. Yet we’re not talking about a single capability or weapon system, but rather something that is much more general purpose, and that is fairly nascent in its development.

So, AI does factor into this overall U.S.-China military competition that is much more complex and amorphous than the notion of an arms race to develop killer robots would imply. Certainly there is autonomous weapons development underway in the U.S. and China today, and I think that is quite concerning from the perspective of thinking about the future military balance, or how the U.S. and Chinese militaries might be increasing the risks of a crisis, and considerations of how to mitigate those concerns and reinforce strategic stability.

So hopefully there is starting to be greater questioning of some of these more simplistic framings, often in headlines, often in some of the more sensationalist statements out there. I don’t believe China is yet an AI superpower, but clearly China is an AI powerhouse.

Ariel Conn: Somewhat recently there was an op-ed by Peter Thiel in which he claims that China’s tech development is naturally a part of the military. There’s also this idea that I think comes from China of military-civil fusion. And I was wondering if you could go into the extent to which China’s AI development is naturally a part of their military, and the extent to which companies and research institutes are able to differentiate their work from military applications.

Elsa Kania: All right. So, the article in question did not provide a very nuanced discussion of these issues. And to start I would say that it is hardly surprising that the Chinese military is apparently enthusiastic about leveraging artificial intelligence. China’s new national defense white paper, titled “China’s National Defense in the New Era,” talked about advances in technologies like big data, cloud computing, artificial intelligence, and quantum information as significant at a time when the character of warfare is evolving from what is known as today’s informatized warfare towards future intelligentized warfare, in which some of these emerging technologies, namely artificial intelligence, could be integrated into the system of systems for future conflict.

And the Chinese military is pursuing this notion of military intelligentization, which essentially involves looking to leverage AI for a range of military applications. At the same time, I see military-civil fusion as a concept and strategy that remains quite aspirational in some respects.

There’s also a degree of irony, I’d argue, that much of what China is attempting to achieve through military-civil fusion is inspired by dynamics and processes that they have seen be successful in the American defense innovation ecosystem. I think sometimes there is this tendency to talk about military-civil fusion as this exotic or uniquely Chinese approach, when in fact there are certain aspects of it that are directly mimicking, or responding to, or learning from what the U.S. has had within our ecosystem for a much longer history. And China’s trying to create this more rapidly and more recently. 

So, the delta of increase, perhaps, and the level of integration between defense, academic, and commercial developments, may be greater. But I think the actual results so far are more limited. And again it is significant, and there are reasons for concern. We are seeing a greater and greater blurring of boundaries between defense and commercial research, but the fusion is again much more aspirational, as opposed to the current state of play.

Helen Toner: I’ll add as well, returning to that specific op ed when Thiel mentioned military-civil fusion, he actually linked to an article by a colleague of Elsa’s and mine, Lorand Laskai, where he wrote about military-civil fusion, and Lorand straight up said that Thiel had clearly not read the article, based on the way that he described military-civil fusion.

Ariel Conn: Well, that’s reassuring.

Elsa Kania: We are seeing militaries around the world, the U.S. and China among them, looking to build bridges to the private sector, and deepening cooperation with commercial enterprises. And I think it’s worth thinking about the factors that could provide a potential advantage; or for militaries that are looking to increase their capacity as organizations to leverage these technologies — this is an important dimension of that. And I think we are seeing some major progress in China in terms of new partnerships, including initiatives at the local level, new parks, new joint laboratories. But I do think, as with the overall status of China’s AI plan, there’s a lot of activity and a lot of investment. But the results are harder to ascertain at this point.

And again, I think it also does speak to questions of ethics in the sense that we have in the U.S. seen very open debate about companies and concerns, particularly of their employees, about whether they should or should not be working with the military or government on different projects. And I remain skeptical that we could see comparable debates or conversations happening in China, or that a Chinese company would outright say no to the government. I think certainly some companies may resist on certain points, or at the margins, especially when they have commercial interests that differ from the priorities of the government. But I do think the political economy of this ecosystem as a whole is very distinct.

And again I’m skeptical that if the employees of a Chinese company had moral qualms about working with the Chinese military, they’d have the freedom to organize, and engage in activism to try to change that.

Ariel Conn: I’d like to go into that a little bit more, because there’s definitely concerns that get raised that we have companies in the U.S. that are rejecting contracts with the U.S. government for fear that their work will be militarized, while at the same time — as you said — companies in China may not have that luxury. But then there’s also instances where you have say Google in China doing research, and so does that mean that Google is essentially working with the Chinese military and not the U.S. military? I think there’s a lot of misunderstanding about what the situation actually is there. I was wondering if you could both go into that a little bit.

Helen Toner: Yeah. I think this is a refrain that comes up a lot in DC as, “Well, look at how Google withdrew from its contract to work on Project Maven,” which is a Department of Defense initiative looking at tagging overhead imagery, “so clearly U.S. companies aren’t willing to work with the U.S. government. But on the other hand they are still working in China. And as we all know, research in China is immediately used by the Chinese military, so therefore, they’re aiding the Chinese military even though they’re not willing to aid the U.S. military.” And I do think this is a highly oversimplified description, and pretty incorrect.

So, a couple elements here. One is that I think the Google Project Maven decision seems to have been pretty unique. We haven’t really seen it repeated by other companies. Google continues to work with the U.S. military and the U.S. government in some other ways, for example working on DARPA projects and on other projects; and other U.S. companies are also very willing to work with the U.S. government, including really world-leading companies. A big example right now is Amazon and Microsoft bidding on this JEDI contract, which is to provide cloud computing services to the Pentagon. So, I think on the one hand, this claim that U.S. companies are unwilling to work with the U.S. military is a vast overgeneralization.

And then on the other hand, I think I would point back to what Elsa was saying about the state of military-civil fusion in China, and the extent to which it makes sense or doesn’t make sense to say that any research done in China is immediately going to be incorporated into Chinese military technologies. I definitely wouldn’t say there is nothing to be concerned about here. But I think that the simplified refrain is not very productive.

Elsa Kania: With regard to some of these controversies, I do continue to believe that having these open debates, and the freedom that American companies and researchers have, is a strength of our system. I don’t think we should envy the state of play in China, where we have seen the Chinese Communist Party become more and more intrusive with regard to its impositions upon the tech sector, and I think there may be costs in terms of the long-term trajectory of innovation in China.

And with regard to the particular activities of American companies in China, certainly there have been some cases where companies have engaged in projects, or with partners, that I think are quite problematic. And one of the most prominent examples of that recently has been Google’s involvement in Dragonfly — creating a censored search engine — which was thoroughly condemned, including because of its apparent inconsistency with their principles. So, I do think there are concerns not only of values but also of security when it comes to American companies and universities that are engaged in China, and it’s never quite a black and white issue or distinction.

So for instance in the case of Google, their presence in China in terms of research does remain fairly limited. There have been a couple of cases where papers published in collaboration between a Google researcher and a Chinese colleague involve topics that are quite sensitive and evidently not the best topic on which to be collaborating, in my opinion — such as target recognition. There’s also been concerns over research on facial recognition, given the known abuse of that technology by the Chinese government. 

I think that also when American companies or universities partner or coauthor with Chinese counterparts, especially those that are linked to or are outright elements of the Chinese military — such as the National University of Defense Technology, which has been quite active in overseas collaborations — I do think that there should be some red lines. I don’t think the answer is “no American companies or universities should do any work on AI in China.” I think that would actually be damaging to American innovation, and I think some of the criticisms of Google have been unfair in that regard, because I do think that a more nuanced conversation is really critical going forward to think about the risks and how to get policy right.

Ariel Conn: So I want to come back to this idea of openness in a minute, but first I want to stick with some pseudo-military concerns. Maybe this is more reflective of what I’m reading, but I seem to see a lot more concern being raised about military applications of AI in China, and some concerns obviously about AI use with their humanitarian issues are starting to come to the surface. In light of some recent events especially like what we’re seeing in Hong Kong, and then with the Uyghurs, should we be worrying more about how China is using AI for what we perceive as human rights abuses?

Elsa Kania: That is something that greatly concerns me, particularly when it comes to the gravity of the atrocities in Xinjiang. And certainly there are very low tech coercive elements to how the Chinese government is essentially trying to re-engineer an entire population in ways that experts have described as tantamount to a cultural genocide, and the creation of concentration camps; and beyond that, the pervasiveness of biometrics and surveillance enabled by facial recognition, and the creation of new software programs to better aggregate big data about individuals. I think all of that paints a very dark picture of ways in which artificial intelligence can enable authoritarianism, and can reinforce the Chinese government’s capability to repress its own population in ways that in some cases can become pervasive in day to day life.

And I’d say that having been to Beijing recently, surveillance is kind of like air pollution. It is pervasive, in terms of the cameras you see out on the streets. It is inescapable in a sense, and it is something that the average person or citizen in China can do very little about. I think of course this is not quite a perfect panopticon yet; elements of this remain a work in progress. But I do think that the overall trajectory of these developments is deeply worrying in terms of human rights abuses, and yet it’s not as much of a feature of conversations in AI ethics in China. But I think it does overshadow some of the more positive aspects of what the Chinese government is doing with AI, like in health care and education, which are also very much a reality.

And I think when it comes to the Chinese military’s interest in AI, it is quite a complex landscape of research and development and experimentation. To my knowledge it does not appear that the Chinese military is yet at the stage of deploying all that much in the way of AI: again very active efforts and long term development of weapons systems — including cruise missiles, hypersonics, a range of unmanned systems across all domains with growing degrees of autonomy, unmanned underwater vehicles and submarines, progress in swarming that has been prominently demonstrated, scavenger robots in space as a covert counter-space capability, human machine integration or interaction.

But I think that the translation of some of these initial stages of military innovation into future capabilities will be challenging for the PLA in some respects. There could be ways in which the Chinese military has advantages relative to the U.S., given apparent enthusiasm and support from top-level leadership at the level of Xi Jinping himself, and several prominent generals, who have been advocating for and supporting investments in these future capabilities.

But I do think that we’re really just at the start of seeing what AI will mean for the future of military affairs, and future of warfare. But when it comes to developments underway in China, particularly in the Chinese defense industry, I think the willingness of Chinese companies to export drones, robotic systems — many of which again have growing levels of autonomy, or at least are advertised as such — is also concerning from the perspective of other militaries that will be acquiring these capabilities and could use them in ways that violate human rights. 

But I do think there are concerns about how the Chinese military would use its own capabilities, about the export of some of these weapons systems going forward, as well as about the potential use of made-in-China technologies by non-state actors and terrorist organizations, as we’ve already seen with the use of drones made by DJI by ISIS, or Daesh, in Syria, including as IEDs. So there is no shortage of reasons for concern, but I’ll stop there for now.

Ariel Conn: Helen, did you have anything you wanted to add?

Helen Toner: I think Elsa said it well. I would just reiterate that I think the ways that we’re starting to see China incorporating AI into its larger surveillance state, and methods of domestic control, are extremely concerning.

Ariel Conn: There’s debate I think about how open AI companies and researchers should be about their technology. But we sort of have a culture of openness in AI. And so I’m sort of curious: how is that being treated in China? Does it seem like that can actually help mitigate some of the negative applications that we see of AI? Or does it help enable the Chinese or anyone else to develop AI in non-beneficial ways that we are concerned about? What’s the role of openness in this?

Elsa Kania: I think openness is vital to innovation, and I hope that can be sustained — even as we are seeing greater concerns about the misuse or transfer of these technologies. I think that the level of openness and integration between the American and Chinese innovation ecosystems is useful in the sense that it does provide a level of visibility, or awareness, or sort of a shared understanding of the state of research. But I think at the same time there are reasons to have some thought-through parameters on that openness, or again — whether from the perspective of ethics or security — ways that having better guidelines or frameworks for how to engage, I think, will be important in order to sustain that openness and engagement.

I think that having better guardrails, and how to think about where openness is warranted, and when there should be at the very least common sense, and hopefully some rigorous consideration of these concerns, will be important. And then also another dimension of openness is thinking about when to release, or publish, or make available certain research, or even the tools underlying those advances; and when it’s better to keep more information proprietary. And I think the greater concern there, beyond the U.S.-China relationship, may be the potential for misuse or exploitation of these technologies by non-state actors, or terrorist organizations, even high end criminal organizations. I think the openness of the AI field is really critical. But I also think to sustain that, it will be important to think very carefully through some of these potential negative externalities across the board.

Helen Toner: One element that makes it extra complicated here in terms of openness and collaboration between U.S. and Chinese researchers: so much of the work that is going on there is really quite basic research — work on computer vision, or on speech recognition, or things of that nature. And that kind of research can be used for so many things, including both harmful, oppressive applications, as well as many much more acceptable applications. I think it’s really difficult to think through how to think about openness in that context.

So, one thing I would love to see is more information being made available to researchers. For example, I do think that any researcher who is working with a Chinese individual, or company, or organization should be aware of what is going on in Xinjiang, and should be aware of the governance practices that are common in China. And it would be great if there were more information available on specific institutions, and how they’re connected to various practices, and so on. That would be a good step towards helping non-Chinese researchers understand what kinds of situations they might be getting themselves involved in.

Ariel Conn: Do you get the sense that AI researchers are considering how some of their work can be applied in these situations where human rights abuses are taking place? I mean, I think we’re starting to see that more, but I guess maybe how much do you feel like you’re seeing that vs. how much more do you think AI researchers need to be making themselves aware?

Helen Toner: I think there’s a lot of interest and care among many AI researchers in how their work will be used, and in making the world a better place, and so on. And I think things like Google’s withdrawal from Project Maven, and also the pressure that was put on Google when it was leaked that it was working on a censored search engine to be used in China: I think those are both evidence of the level of, I guess, caring that is there. But I do think that there could be more awareness of specific issues that are going on in China. I think the situation in Xinjiang is gradually becoming more widely known, but I wouldn’t be surprised if plenty of AI researchers had not yet come across it. I think it’s a matter of pairing that interest in how their work might be used with information about what is going on, and what might happen in the future.

Ariel Conn: One of the things that I’ve also read, and I think both of you addressed this in works of yours that I was looking at: there’s this concern that China obviously has a lot more people, their privacy policies aren’t as strict, and so they have a lot more access to big data, and that that could be a huge advantage for them. Reading some of your work, it sounded like maybe that wasn’t quite the advantage that people worry about, at least yet. And I was hoping you could explain a little bit about technological difficulties that they might be facing even if they do have more data.

Helen Toner: For sure. I think there are quite a few different ways in which this argument is weaker than it might appear at first. So, I think there are many reasons to be concerned about the privacy implications of China’s data practices. Certainly having spent time in China, it’s very clear that the instant messages you’re sending, for example, are not only being read by you; that’s certainly concerning from that perspective. But if we’re talking about whether data will give them an advantage in developing AI, I think there are a few different reasons to be a little bit skeptical.

One reason, which I think you alluded to, is simply whether they can make use of this data that they’re collecting. There was some reporting, I believe, last year coming out of Tencent, talking about ways in which data was very siloed inside the company, and it’s notoriously difficult. The joke among the data scientists is that when you’re trying to solve some problem with data, you spend the first 90 percent of your time just cleaning and structuring the data, and then only the last 10 percent actually solving the problem. So, that’s the sort of logistical or practical issue that you mentioned.

Other issues are things like: the U.S. doesn’t have as large a population as China, but U.S. companies have much greater international reach. So, they often have as many, if not more, users compared with Chinese companies. Even more importantly, I think, are two extra issues — one of which being that for most AI applications, the kind of data that will be useful in training a given model needs to be relevant to the problem that model is solving. So, if you have lots of data about Chinese customers’ purchases on Taobao, which is Chinese Amazon, then you’re going to be really good at predicting what kind of purchases Chinese consumers will make on Taobao. But that’s not going to help you with, for example, the kind of overhead imagery analysis that Project Maven was targeting, and things like this.

So that’s one really fundamental problem, I think: this matter of data primarily being useful for training systems that are solving problems that are very related to the data that you have. And then a second really fundamental issue is thinking about how important it is or isn’t to have pre-gathered data in order to train a given model. And so, something that I think is left out of a lot of conversations on this issue is the fact that many types of models (notably, reinforcement learning models) can often be trained on what is referred to as synthetic data, which basically means data that you generate during the experiment, as opposed to requiring a pre-gathered data set that you are training your model on.

So, an example of this would be AlphaGo, that we mentioned before. The original AlphaGo was first trained on human games, and then fine-tuned from there. But AlphaGo Zero, which was released subsequently, did not actually need any pre-collected data, and instead just used computation to simulate games and play against itself, and thereby learn how to play the game even better than AlphaGo, which was trained on human data. So, I think there are all manner of reasons to be a little bit skeptical of this story that China has some fundamental advantage in access to data.
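
To make the idea of data generated during the experiment concrete, here is a minimal sketch in Python. It is not AlphaGo Zero's method (that system combined neural networks with tree search); it only shows a training set being produced purely by simulated self-play, with no pre-collected games. The toy game (players alternately take 1 to 3 stones, and whoever takes the last stone wins), the random policy, and all function names are assumptions chosen for illustration.

```python
import random


def self_play_game(pile, rng):
    """Play one game of a toy take-away game with a random policy on both sides.

    Players alternate removing 1-3 stones; whoever takes the last stone wins.
    Assumes pile >= 1. Returns (pile_before_move, stones_taken, mover_won)
    records, one per move.
    """
    history = []  # (player, pile_before_move, stones_taken)
    player = 0
    winner = None
    while pile > 0:
        move = rng.randint(1, min(3, pile))
        history.append((player, pile, move))
        pile -= move
        winner = player  # the last player to move takes the final stone
        player = 1 - player
    return [(p, m, int(who == winner)) for who, p, m in history]


def generate_dataset(num_games, start_pile=10, seed=0):
    """Build a labeled dataset purely from simulated games."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_games):
        data.extend(self_play_game(start_pile, rng))
    return data


dataset = generate_dataset(num_games=1000)
print(len(dataset), "training records generated by simulation alone")
```

Every record is labeled with whether the player who made that move went on to win, so a learner could be fit on this data without anyone having gathered human games first.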

Elsa Kania: Those are all great points, and I would just add that I think this is particularly true when we look at the apparent disparities in access to data between China’s commercial ecosystem and the Chinese military. As Helen mentioned, much of that data generated from China’s mobile ecosystem will have very little relevance if you are looking to build advanced weapon systems, and the critical question going forward, or the much more relevant concern, will be the Chinese military’s capacity as an organization to improve its management and employment of its own data, while also gaining access to other relevant sources of data and looking to leverage simulations, even war gaming, as techniques to generate more data of relevance to training AI systems for military purposes.

So, the notion that data is the new oil I think is at best a massive oversimplification, given this is a much more complex landscape; and access to and use of, even labeling of data become very practical measures that militaries, among other bureaucracies, will have to grapple with as they think about how to develop AI that is trained particularly for the missions they have in mind.

Ariel Conn: So, does it seem fair to say then that it’s perfectly reasonable for Western countries to maintain, and possibly even develop, stricter privacy laws and still remain competitive?

Helen Toner: I think absolutely. The idea that one would need to reduce privacy controls in order to keep up with some volume of data that needs to be collected in order to be competitive in AI fundamentally misunderstands how AI research works. And I think also misunderstands the ways that Western companies will stay competitive; I think it’s not an accident that WeChat, for example, the most popular messaging app in China, has really struggled to spread beyond China and the Chinese diaspora. I would posit that a significant part of that is the fact that it’s clear that messages on that app are going to the Chinese government. So, I think U.S. and other Western companies should be wary of sacrificing the kinds of features and functionalities that are based in the values that we hold dear.

Elsa Kania: I’d just add that I think there’s often this framing of a dichotomy between privacy and advancement in AI — and as Helen said, I think that there are ways to reconcile our priorities and our values in this context. And I think the U.S. government can also do much more when it comes to better leveraging data that it does have available, and making it more open for research purposes while focusing on privacy in the process. Exploitation of data should not come at the expense of privacy or be seen as at odds with advancement.

Helen Toner: And I’ll also add as well that we’re seeing advancements in various technologies that make it possible to utilize data without invading the privacy of the holder of that data. So, these are things like differential privacy, multi-party computation, a number of other related techniques that make it possible to securely and privately make use of data for improving goals without exposing the individual data of any particular user.
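
As a rough illustration of one of the techniques named above, differential privacy, here is a minimal Python sketch of the Laplace mechanism applied to a counting query. The data, the function names, and the choice of epsilon are assumptions made for the example; a real deployment would use a vetted library and would track a privacy budget across queries.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # A counting query changes by at most 1 when one record is added or
    # removed, so its sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of eight users.
ages = [23, 35, 41, 29, 52, 38, 61, 19]
rng = random.Random(0)
noisy = private_count(ages, lambda age: age >= 30, epsilon=0.5, rng=rng)
print(f"noisy count of users aged 30 or older: {noisy:.2f}")
```

A smaller epsilon adds more noise and gives stronger privacy; the analyst sees only the noisy aggregate, never an individual record.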

Ariel Conn: I feel like that in and of itself is another podcast topic.

Helen Toner: I agree.

Ariel Conn: The last question I have is: what do you think is most important for people to know and consider when looking at Chinese AI development and the Western concerns about it?

Elsa Kania: The U.S. in many respects does remain in a fairly advantageous position. However, I worry we may erode our own advantages if we don’t recognize what they are. And I think it does come down to the fact that the openness of the American innovation ecosystem, including our welcome to students and scholars from all over the world, has been critical to progress in science in the United States. And I think it’s really vital to sustain that. I think between the United States and China today, the critical determinant of competitive advantage going forward will be talent. I think there are many ways that China continues to struggle and is lagging behind in its access to human capital resources, though there are some major policy initiatives underway from the Chinese Ministry of Education, significant expansions of the use of AI in and for education.

So, I think that as we think about relative trajectories in the long term, it will be important to think about talent, and how this is playing out in a very complex and often very integrated landscape between the U.S. and China. And I’ve said it before, and I’ll say it again: I think in the United States it is encouraging that the Department of Defense has a strategy for AI and is thinking very carefully about the ethics and opportunities it provides. I hope that the U.S. Department of Education, and that states and cities across the U.S., will also start to think more about what AI can do in terms of opportunities, in terms of more personalized and modernized approaches to education in the 21st century.

Because I think again, although I’m someone who as an analyst looks more at the military elements of this question, I think talent and education are foundational to everything. And some of what the Chinese government is doing with exploring the potential of AI in education are things that I wish the U.S. government would consider pursuing equally actively, though with greater concern for privacy and for the well-being of students. I don’t think we should necessarily envy or look to emulate many elements of China’s approach, but I think on talent and education it’s really critical for the U.S. to think about that as a main frontier of competition and to sustain openness to students and scientists from around the world, which requires thinking about some of these tricky issues of immigration that have become politicized to a level that is unfortunate and risks damaging our overall innovation ecosystem, not to mention the well-being and opportunities of those who can sometimes get caught in this crossfire in terms of the geopolitics and politics.

Helen Toner: I’d echo what Elsa said. I think in a nutshell what I would recommend for those interested in thinking about China’s prospects in AI is to be less concerned about how much data they have access to, or about the Chinese government and its plans being a well-oiled machine that works perfectly on the first try — and to pay more attention to, on the one hand, the willingness of the Chinese Communist Party to use extremely oppressive measures, and on the other hand, to pay more attention to the question of human capital and talent in AI development, and to focus more on how the U.S. can do better at attracting and retaining top talent — which has historically been something the U.S. has done really well, but for a variety of reasons has perhaps started to slide a little bit in recent years.

Ariel Conn: All right. Well, thank you both so much for joining this month. This was really interesting for me.

Elsa Kania: Thank you so much. Enjoyed the conversation, and certainly much more to discuss on these fronts.

Helen Toner: Thanks so much for having us.

AI Alignment Podcast: China’s AI Superpower Dream with Jeffrey Ding

“In July 2017, The State Council of China released the New Generation Artificial Intelligence Development Plan. This policy outlines China’s strategy to build a domestic AI industry worth nearly US$150 billion in the next few years and to become the leading AI power by 2030. This officially marked the development of the AI sector as a national priority and it was included in President Xi Jinping’s grand vision for China.” (FLI’s AI Policy – China page) In the context of these developments and an increase in conversations regarding AI and China, Lucas spoke with Jeffrey Ding from the Center for the Governance of AI (GovAI). Jeffrey is the China lead for GovAI where he researches China’s AI development and strategy, as well as China’s approach to strategic technologies more generally. 

Topics discussed in this episode include:

  • China’s historical relationships with technology development
  • China’s AI goals and some recently released principles
  • Jeffrey Ding’s work, Deciphering China’s AI Dream
  • The central drivers of AI and the resulting Chinese AI strategy
  • Chinese AI capabilities
  • AGI and superintelligence awareness and thinking in China
  • Dispelling AI myths, promoting appropriate memes
  • What healthy competition between the US and China might look like

Key points from Jeffrey: 

  • “Even if you don’t think Chinese AI capabilities are as strong as have been hyped up in the media and elsewhere, important actors will treat China as either a bogeyman figure or as a Sputnik type of wake-up call motivator… other key actors will leverage that as a narrative, as a Sputnik moment of sorts to justify whatever policies they want to do. So we want to understand what’s happening and how the conversation around what’s happening in China’s AI development is unfolding.”
  • “There certainly are differences, but we don’t want to exaggerate them. I think oftentimes analysis of China happens in a vacuum where it’s like, ‘Oh, this only happens in this mysterious far-off land we call China and it doesn’t happen anywhere else.’ Shoshana Zuboff has this great book on Surveillance Capitalism that shows how the violation of privacy is pretty extensive on the US side, not only from big companies but also from the national security apparatus. So I think a similar phenomenon is taking place with the social credit system. Jeremy Daum at Yale Law School’s China Center has put it really nicely where he says that, ‘We often project our worst fears about technology and AI onto what’s happening in China, and we look through a glass darkly and we unleash all of our anxieties onto what’s happening in China without reflecting on what’s happening here in the US, what’s happening here in the UK.'”
  • “I think we have to be careful about which historical analogies and memes we choose. So ‘arms race’ is a very specific call back to the Cold War context, where there’s almost these discrete types of missiles that we are racing the Soviet Union on and discrete applications that we can count up; or even going way back to what some scholars call the first industrial arms race in the military sphere over steam-powered boats between Britain and France in the late 19th century. In all of those instances you can count up. France has four ironclads, UK has four ironclads; they’re racing to see who can build more. I don’t think there’s anything like that. There’s not this discrete thing that we’re racing to see who can have more of. If anything, it’s about a competition to see who can absorb AI advances from abroad better, who can diffuse them throughout the economy, who can adopt them in a more sustainable way without sacrificing core values. So that’s sort of one meme that I really want to dispel. Related to that, an assumption that often influences a lot of our discourse on this is the techno-nationalist assumption, which is this idea that technology is contained within national boundaries and that the nation state is the most important actor –– which is correct and a good one to have in a lot of instances. But there are also good reasons to adopt techno-globalist assumptions as well, especially in the area of how fast technologies diffuse nowadays and also how much underneath this national level competition, firms from different countries are working together and making standards alliances with each other. So there’s this undercurrent of techno-globalism, where there are people flows, idea flows, company flows happening while the coverage and the sexy topic is always going to be about national level competition, zero sum competition, relative gains rhetoric. So you’re trying to find a balance between those two streams.”
  • “I think currently a lot of people in the US are locked into this mindset that the only two players that exist in the world are the US and China. And if you look at our conversation, right, oftentimes I’ve displayed that bias as well. We should probably have talked a lot more about China-EU or China-Japan cooperation in this space and networks in this space because there’s a lot happening there too. So a lot of US policy makers see this as a two-player game between the US and China. And then in that sense, if there’s some cancer research project about discovering proteins using AI that may benefit China by 10 points and benefit the US only by eight points, but it’s going to save a lot of people from cancer –– if you only care about making everything about maintaining a lead over China, then you might not take that deal. But if you think about it from the broader landscape, where it’s not just a zero sum competition between the US and China, then your evaluation of those different point structures and what you think is rational will change.”

 

Important timestamps: 

0:00 Intro

2:14 Motivations for the conversation

5:44 Historical background on China and AI 

8:13 AI principles in China and the US 

16:20 Jeffrey Ding’s work, Deciphering China’s AI Dream 

21:55 Does China’s government play a central hand in setting regulations? 

23:25 Can Chinese implementation of regulations and standards move faster than in the US? Is China buying shares in companies to have decision making power? 

27:05 The components and drivers of AI in China and how they affect Chinese AI strategy 

35:30 Chinese government guidance funds for AI development 

37:30 Analyzing China’s AI capabilities 

44:20 Implications for the future of AI and AI strategy given the current state of the world 

49:30 How important are AGI and superintelligence concerns in China?

52:30 Are there explicit technical AI research programs in China for AGI? 

53:40 Dispelling AI myths and promoting appropriate memes

56:10 Relative and absolute gains in international politics 

59:11 On Peter Thiel’s recent comments on superintelligence, AI, and China 

1:04:10 Major updates and changes since Jeffrey wrote Deciphering China’s AI Dream 

1:05:50 What does healthy competition between China and the US look like? 

1:11:05 Where to follow Jeffrey and read more of his work

 

Works referenced 

Deciphering China’s AI Dream

FLI AI Policy – China page

ChinAI Newsletter

Jeff’s Twitter

Previous podcast with Jeffrey

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. More works from GovAI can be found here.

 

Lucas Perry: Hello everyone and welcome back to the AI Alignment Podcast at the Future of Life Institute. I’m Lucas Perry and today we’ll be speaking with Jeffrey Ding from the Future of Humanity Institute on China and their efforts to be the leading AI superpower by 2030. In this podcast, we provide a largely descriptive account of China’s historical technological efforts, their current intentions and methods for pushing Chinese AI success, and some of the foundational AI principles being called for within China. We cover the drivers of AI progress, the components of success, and China’s strategies born of these variables. We also assess China’s current and likely future AI capabilities, and the consequences of all this tied together. The FLI AI Policy China page and Jeffrey Ding’s publication Deciphering China’s AI Dream are large drivers of this conversation, and I recommend you check them out.

If you find this podcast interesting or useful, consider sharing it with friends on social media platforms, forums, or anywhere you think it might be found valuable. As always, you can provide feedback for me by following the SurveyMonkey link found in the description of wherever you might find this podcast. 

Jeffrey Ding specializes in AI strategy and China’s approach to strategic technologies more generally. He is the China lead for the Center for the Governance of AI. There, Jeff researches China’s development of AI, and his work has been cited in the Washington Post, South China Morning Post, MIT Technology Review, Bloomberg News, Quartz, and other outlets. He is a fluent Mandarin speaker and has worked at the US Department of State and the Hong Kong Legislative Council. He is also reading for a PhD in international relations as a Rhodes scholar at the University of Oxford. And so without further ado, let’s jump into our conversation with Jeffrey Ding.

Let’s go ahead and start off by providing a bit of the motivations for this conversation today. So why is it that China is important for AI alignment? Why should we be having this conversation? Why are people worried about the US-China AI dynamic?

Jeffrey Ding: Two main reasons, and I think they follow an “even if” structure. The first reason is China is probably second only to the US in terms of a comprehensive national AI capabilities measurement. That’s a very hard and abstract thing to measure. But if you’re looking at which countries have the firms on the leading edge of the technology, the universities, the research labs, and then the scale to lead in industrial terms and also in potential investment in projects related to artificial general intelligence, I would put China second only to the US, at least in terms of my intuition and the analysis that I’ve done on the subject.

The second reason is even if you don’t think Chinese AI capabilities are as strong as have been hyped up in the media and elsewhere, important actors will treat China as either a bogeyman figure or as a Sputnik type of wake-up call motivator. And you can see this in the rhetoric coming from the US especially today, and even in areas that aren’t necessarily connected. So Axios had a leaked memo from the US National Security Council that was talking about centralizing US telecommunication services to prepare for 5G. And in the memo, one of the justifications for this was because China is leading in AI advances. The memo doesn’t really tie the two together. There are connections –– 5G may empower different AI technologies –– but that’s a clear example of how even if Chinese capabilities in AI, especially in projects related to AGI, are not as substantial as has been reported, or we think, other key actors will leverage that as a narrative, as a Sputnik moment of sorts to justify whatever policies they want to do. So we want to understand what’s happening and how the conversation around what’s happening in China’s AI development is unfolding.

Lucas Perry: So the first aspect being that they’re basically the second most powerful AI developer. And we can get into later their relative strength to the US; I think that in your estimation, they have about half as much AI capability relative to the United States. And here, the second one is you’re saying –– and there’s this common meme in AI Alignment about how avoiding races is important because in races, actors have incentives to cut corners in order to gain decisive strategic advantage by being the first to deploy advanced forms of artificial intelligence –– so there’s this important need, you’re saying, for actually understanding the relationship and state of Chinese AI Development to dispel inflammatory race narratives?

Jeffrey Ding: Yeah, I would say China’s probably at the center of most race narratives when we talk about AI arms races and the conversation in at least US policy-making circles –– which is what I follow most, US national security circles –– has not talked necessarily about AI as a decisive strategic advantage in terms of artificial general intelligence, but definitely in terms of decisive strategic advantage and who has more productive power, military power. So yeah, I would agree with that.

Lucas Perry: All right, so let’s provide a little bit more historical background here, I think, to sort of contextualize why there’s this rising conversation about the role of China in the AI space. So I’m taking this here from the FLI AI Policy China page: “In July of 2017, the State Council of China released the New Generation Artificial Intelligence Development Plan. And this was an AI research strategy policy to build a domestic AI industry worth nearly $150 billion in the next few years” –– again, this was in 2017 –– “and to become a leading AI power by 2030. This officially marked the development of the AI sector as a national priority, and it was included in President Xi Jinping’s grand vision for China.” And just adding a little bit more color here: “given this, the government expects its companies and research facilities to be at the same level as leading countries like the United States by 2020.” So within a year from now –– maybe a bit ambitious, given your estimation that they have about half as much capability as us.

But continuing this picture I’m painting: “five years later, it calls for breakthroughs in select disciplines within AI” –– so that would be by 2025. “That will become a key impetus for economic transformation. And then in the final stage, by 2030, China is intending to become the world’s premier artificial intelligence innovation center, which will in turn foster a new national leadership and establish the key fundamentals for an economic great power,” in their words. So there’s this very clear, intentional stance that China has been developing in the past few years.

Jeffrey Ding: Yeah, definitely. And I think it was Jess Newman who put together the AI policy in China page –– did a great job. It’s a good summary of this New Generation AI Development Plan issued in July 2017 and I would say the plan was more reflective of momentum that was already happening at the local level with companies like Baidu, Tencent, Alibaba, making the shift to focus on AI as a core part of their business strategy. Shenzhen, other cities, had already set up their own local funds and plans, and this was an instance of the Chinese national government, in the words of I think Paul Triolo and some other folks at New America, “riding the wave,” and kind of joining this wave of AI development.

Lucas Perry: And so adding a bit more color here again: there have also been developments in principles that are being espoused in this context. I’d say probably the first major principles on AI were developed at the Asilomar Conference, at least those pertaining to AGI. In June 2019, the New Generation of AI Governance Expert Committee released principles for next-generation artificial intelligence governance, which included tenets like harmony and friendliness, fairness and justice, inclusiveness and sharing, open cooperation, shared responsibility, and agile governance.

And then also in May of 2019 the Beijing AI Principles were released. That was by a multi-stakeholder coalition, including the Beijing Academy of Artificial Intelligence, a bunch of top universities in China, as well as industrial firms such as Baidu, Alibaba, and Tencent. And these 15 principles, among other things, called for “the construction of a human community with a shared future and the realization of beneficial AI for humankind and nature.” So it seems like principles and intentions are also being developed similarly in China that sort of echo and reflect many of the principles and intentions that have been developing in the States.

Jeffrey Ding: Yeah, I think there are definitely a lot of similarities, and I think it’s not just with this recent flurry of AI ethics documents that you’ve done a good job of summarizing. It dates back to even the plan that we were just talking about. If you read the July 2017 New Generation AI Plan carefully, there are a lot of sections devoted to AI ethics, including some sections that are worried about human-robot alienation.

So, depending on how you read that, you could read that as already anticipating some of the issues that could occur if human goals and AI goals do not align. Even back in March, I believe, of 2018, a lot of government bodies came together with companies to put out a white paper on AI standardization, which I translated for New America. And in that, they talk about AI safety and security issues, how it’s important to ensure that the design goals of AI are consistent with the interests, ethics, and morals of most humans. So a lot of these topics, I don’t even know if they’re western topics. These are just basic concepts: We want systems to be controllable and reliable. And yes, those have deeper meanings in the sense of AGI, but that doesn’t mean that some of these initial core values can’t be really easily applied to some of these deeper meanings that we talk about when we talk about AGI ethics.

Lucas Perry: So with all of the animosity and posturing and whatever that happens between the United States and China, these sort of principles and intentions which are being developed, at least in terms of AI –– both of them sort of have international intentions for the common good of humanity; at least that’s what is being stated in these documents. How do you think about the reality of the day-to-day combativeness and competition between the US and China in relation to these principles, which strive towards the deployment of AI for the common good of humanity more broadly, rather than just within the context of one country?

Jeffrey Ding: It’s a really good question. I think the first point to clarify is these statements don’t have teeth behind them unless they’re enforced, unless there are resources dedicated to funding research on these issues, to track 1.5 and track 2 diplomacy, to technical meetings between researchers. These are just statements that people can put out and they don’t have teeth unless they’re actually enforced. Oftentimes, we know that’s the case. Firms like Google, Microsoft, and Amazon will put out principles about facial recognition or what their ethical stances are, but behind the scenes they’ll chase profit motives and maximize shareholder value. And I would say the same would take place for Tencent, Baidu, Alibaba. So I want to clarify that, first of all. The competitive dynamics are real: it’s not just an AI story, it’s partly a broader story of China’s rise. I come from an international relations background, so I’m a PhD student at Oxford studying that, and there’s a big debate in the literature about what happens when a rising power challenges an established power. And oftentimes frictions result, and it’s about how to manage these frictions without leading to accidents, miscalculation, arms races. And that’s the tough part of it.

Lucas Perry: So it seems –– at least for a baseline, thinking that we’re still pretty early in the process of AI alignment or this long-term vision we have –– it seems like at least there are theoretically some shared foundational principles reflected across both cultures. Again, these Beijing AI Principles also include focus on benefiting all of humanity and the environment; serving human values such as privacy, dignity, freedom, autonomy and rights; continuous focus on AI safety and security; inclusivity, openness; supporting international cooperation; and avoiding a malicious AI race. So the question now simply seems to be implementation of these shared principles, ensuring that they manifest.

Jeffrey Ding: Yeah. I don’t mean to be dismissive of these efforts to create principles that are at least expressing the rhetoric of planning for all of humanity. I think there are definitely a lot of areas of US-China cooperation in the past that have also echoed some of these principles: bilateral cooperation on climate change research; there’s a good nuclear safety cooperation module; different centers that we’ve worked on. But at the same time, I also think that even with that list of terms you just mentioned, there are some differences in terms of how both sides understand different terms.

So with privacy in the Chinese context, it’s not necessarily that Chinese people or political actors don’t care about privacy. It’s that privacy might mean more of privacy as an instrumental right, to ensure your financial data doesn’t get leaked, you don’t lose all your money; to ensure that your consumer data is protected from companies; but not necessarily in other contexts where privacy is seen as an intrinsic right, as a civil right of sorts, where it’s also about an individual’s protection from government surveillance. That type of protection is not caught up in conversations about privacy in China as much.

Lucas Perry: Right, so there are going to be implicitly different understandings about some of these principles that we’ll have to navigate. And again, you brought up privacy –– and this has been something people have been paying more attention to, as there has been kind of this hype and maybe a little bit of hysteria over China’s social credit system, and plenty of misunderstanding around that.

Jeffrey Ding: Yeah, and this ties into a lot of what I’ve been thinking about lately, which is there certainly are differences, but we don’t want to exaggerate them. I think oftentimes analysis of China happens in a vacuum where it’s like, “Oh, this only happens in this mysterious far off land we call China and it doesn’t happen anywhere else.” Shoshana Zuboff has this great book on surveillance capitalism that shows how the violation of privacy is pretty extensive on the US side, not only from big companies but also from the national security apparatus.

So I think a similar phenomenon is taking place with the social credit system. Jeremy Daum at Yale Law’s China Center has put it really nicely where he says that, “We often project our worst fears about technology in AI onto what’s happening in China, and we look through a glass darkly and we unleash all of our anxieties on what’s happening onto China without reflecting on what’s happening here in the US, what’s happening here in the UK.”

Lucas Perry: Right. I would guess that generally in human psychology it seems easier to see the evil in the other rather than in the self.

Jeffrey Ding: Yeah, that’s a little bit out of range for me, but I’m sure there’s studies on that.

Lucas Perry: Yeah. All right, so let’s get in here now to your work on Deciphering China’s AI Dream. This is a work that you published in 2018, and in this work you divide it up into four different sections. First you work on context, then you discuss components, then you discuss capabilities, and then you discuss consequences, all in relation to AI in China. Would you like to just sort of unpack the structuring?

Jeffrey Ding: Yeah, this was very much just a descriptive paper. I was just starting out researching this area and I just had a bunch of basic questions. So question number one for context: what is the background behind China’s AI Strategy? How does it compare to other countries’ plans? How does it compare to its own past science and technology plans? The second question was, what are they doing in terms of pushing forward drivers of AI Development? So that’s the component section. The third question is, how well are they doing? It’s about assessing China’s AI capabilities. And then the fourth is, so what’s it all mean? Why does it matter? And that’s where I talk about the consequences and the potential implications of China’s AI ambitions for issues related to AI Safety, some of the AGI issues we’ve been talking about, national security, economic development, and social governance.

Lucas Perry: So let’s go ahead and move sequentially through these. We’ve already here discussed a bit of context about what’s going on in China in terms of at least the intentional stance and the development of some principles. Are there any other key facets or areas here that you’d like to add about China’s AI strategy in terms of its past science and technology? Just to paint a picture for our listeners.

Jeffrey Ding: Yeah, definitely. I think two past critical technologies that you could look at are the plans to grow China’s space industry, the aerospace sector; and then also biotechnology. So in each of these other areas there was also a national level strategic plan; an agency or an office was set up to manage this national plan; substantial funding was dedicated. With the New Generation AI Plan, there was also a sort of implementation office set up across a bunch of the different departments tasked with implementing the plan.

AI was also elevated to the level of a national strategic technology. And so what’s different between these two phases? Because it’s debatable how successful the space plan and the biotech plans have been. What’s different with AI is you already had big tech giants who are pursuing AI capabilities and have the resources to shift a lot of their investments toward the AI space, independent of government funding mechanisms: companies like Baidu, Tencent, Alibaba, even startups that have really risen like SenseTime. And you see that reflected in the type of model.

It’s no longer the traditional national champion model where the government almost builds a company from the ground up, maybe with the help of like international financers and investors. Now it’s a national team model where they ask for the support of these leading tech giants, but it’s not like these tech giants are reliant on the government for subsidies or funding to survive. They are already flourishing firms that have international presence.

The other bit of context I would just add is that if you look at the New Generation Plan, there are a lot of terms that are related to manufacturing. And I mention in Deciphering China’s AI Dream how there are a lot of connections and callbacks to manufacturing plans. And I think this is key because one aspect of China’s striving for AI is that they want to escape the middle-income trap and kind of get to those higher levels of value-add in the manufacturing chain. So I want to stress that as a key point of context.

Lucas Perry: So the framing here is the Chinese government is trying to enable companies which already exist and already are successful. And this stands in contrast to the US and the UK where it seems like the government isn’t even part of a teamwork effort.

Jeffrey Ding: Yeah. So maybe a good comparison would be how technical standards develop, which is an emphasis of not only the Deciphering China’s AI Dream paper but a lot of later work. So I’m talking about technical standards, like how do you measure the accuracy of facial recognition systems and who gets to set those measures, or product safety standards for different AI applications. And in many other countries, including the US, the process for that is much more decentralized. It’s largely done through industry alliances. There is NIST, which is a body under the Department of Commerce in the US that helps coordinate that to some extent, but not nearly as much as what happens in China with the Standardization Administration of China (SAC), I believe. There, it’s much more of a centralized effort to create technical standards. And there are pros and cons to both.

With the more decentralized approach, you minimize the risks of technological lock-in by setting standards too early, and you let firms have a little bit more freedom, competition as well. Whereas having a more centralized top-down effort might lead to earlier harmonization on standards and let you leverage economies of scale when you just have more interoperable protocols. That could help with data sharing, help with creating a stable test bed for different firms to compete and measure the stuff I was talking about earlier, like algorithmic accuracy. So there are pros and cons of the two different approaches. But I think yeah, that does flesh out how the relationship between firms and the government differs a little bit, at least in the context of standards setting.

Lucas Perry: So on top of standards setting, would you say China’s government plays more of a central hand in the regulation as well?

Jeffrey Ding: That’s a good question. It probably differs in terms of what area of regulation. So I think in some cases there’s a willingness to let companies experiment and then put down regulations afterward. So this is the classic example with mobile payments: There was definitely a gray space as to how these platforms like Alipay, WeChat Pay were essentially pushing into a gray area of law in terms of who could handle this much money that’s traditionally in the hands of the banks. Instead of clamping down on it right away, the Chinese government kind of let that play itself out, and then once these mobile pay platforms got big enough that they’re holding so much capital and have so much influence on the monetary stock, they then started drafting regulations for them to be almost treated as banks. So that’s an example of where it’s more of a hands-off approach.

In AI, folks have said that the US and China are probably closer in terms of their approach to regulation, which is much more hands-off than the EU. And I think that’s just a product partly of the structural differences in the AI ecosystem. The EU has very few big internet giants and AI algorithm firms, so they have more of an incentive to regulate other countries’ big tech giants and AI firms.

Lucas Perry: So two questions are coming up. One is, is there sufficiently more unity and coordination in the Chinese government such that when standards and regulations, or decisions surrounding AI, need to be implemented that they’re able to move, say, much quicker than the United States government? And the second thing was, I believe you mentioned also that the Chinese government is also trying to find ways of using potential government money for buying up shares in these companies and try to gain decision making power.

Jeffrey Ding: Yeah, I’ll start with the latter. The reference is to the establishment of special management shares: so these would be almost symbolic, less than 1% shares in a company so that they could maybe get a seat on the board –– or another vehicle is through the establishment of party committees within companies, so there’s always a tie to party leadership. I don’t have that much more insight into how these work. I think probably it’s fair to say that the day-to-day and long-term planning decisions of a lot of these companies are mostly just driven by what their leadership wants, not necessarily what the party leaders want, because it’s just very hard to micromanage these billion dollar giants.

And that was part of a lot of what was happening with the reform of the state-owned enterprise sector, where, I think it was SASAC –– there are a lot of acronyms –– but this was the body in control of state-owned enterprises, and they significantly cut down the number of enterprises that they directly oversee and sort of focused on the big ones, like the big banks or the big oil companies.

To your first point on how smooth policy enforcement is, this is not something I’ve studied that carefully. I think to some extent there’s more variability in terms of what the government does. So I read somewhere that if you look at the government relations departments of Chinese big tech companies versus US big tech companies, there’s just a lot more on the Chinese side –– although that might be changing with recent developments in the US. Two cases I’m thinking of right now are the Chinese government worrying about addictive games and then issuing the ban against some games including Tencent’s PUBG, which has wrecked Tencent’s game revenues and was really hurtful for their stock value.

So that’s something that would be very hard for the US government to do, to be like, “Hey, this game is banned.” At the same time, there’s a lot of messiness with this, which is why I’m pontificating and equivocating and not really giving you a stable answer, because local governments don’t implement things that well. There’s a lot of local-center tension. And especially with technical stuff –– this is the case in the US as well –– there’s just not as much technical talent in the government. So with a lot of these technical privacy issues, it’s very hard to develop good regulations if you don’t actually understand the tech. So what they’ve been trying to do is audit the privacy policies of different social media tech companies, and they started with 10 of the biggest. So I think it’s very much a developing process in both China and the US.

Lucas Perry: So you’re saying that the Chinese government, like the US, lacks much scientific or technical expertise? I had some sort of idea in my head that many of the Chinese mayors or other political figures actually have engineering degrees or degrees in science.

Jeffrey Ding: That’s definitely true. But I mean, by technical expertise I mean something like what the US government did with the digital service corps, where they’re getting people who have worked in the leading edge tech firms to then work for the government. That type of stuff would be useful in China.

Lucas Perry: So let’s move on to the second part, discussing components. And here you relate the key features of China’s AI strategy to the drivers of AI development, and here the drivers of AI development you say are hardware in the form of chips for training and executing AI algorithms, data as an input for AI Algorithms, research and algorithm development –– so actual AI researchers working on the architectures and systems through which the data will be put, and then the commercial AI ecosystems, which I suppose support and feed these first three things. What can you say about the state of these components in China and how it affects China’s AI strategy?

Jeffrey Ding: I think the main thing that I want to emphasize here is that a lot of this is the Chinese government trying to fill in some of the gaps; a lot of this is about enabling people and firms that are already doing the work. One of the gaps is that private firms tend to under-invest in basic research or in broader education because they don’t get to capture all those gains. So the government tries to support not only AI as a national level discipline but also to construct AI institutes and help fund talent programs to bring back the leading researchers from overseas. So that’s one part of it.

The second part of it, which I did not talk about that much in the report in this section but I’ve recently researched more and more about, is that where the government is more actively driving things is when they are the final end client. So this is definitely the case in the surveillance industry space: provincial-level public security bureaus are working with companies in both hardware, data, research and development and the whole security systems integration process to develop more advanced high tech surveillance systems.

Lucas Perry: Expanding here, there’s also this way of understanding Chinese AI strategy as it relates to previous technologies and how it’s similar or different. Ways in which it’s similar involve a strong degree of state support and intervention, transfer of both technology and talent, and investment in long-term whole-of-society measures; I’m quoting you here.

Jeffrey Ding: Yeah.

Lucas Perry: Furthermore, you state that China is adopting a catch-up approach in the hardware necessary to train and execute AI algorithms. This points towards an asymmetry: most of the chip manufacturers are not in China, and they have to buy chips from Nvidia. And then you go on to mention how access to large quantities of data is an important driver for AI systems, and that China’s data protectionism favors Chinese AI companies in accessing data from China’s large domestic market, but it also detracts from cross-border pooling of data.

Jeffrey Ding: Yeah, and just to expand on that point, there’s been good research out of folks at DigiChina, which is a New America Institute, that looks at the cybersecurity law –– and we’re still figuring out how that’s going to be implemented completely, but the original draft would have prevented companies from taking data that was collected inside of China and taking it outside of China.

And actually these folks at DigiChina point out how some of the major backlash to this law didn’t just come from US multinational corporations but also Chinese multinationals. That aspect of data protectionism illustrates a key trade-off: in one sense, countries and national security players are valuing personal data almost as a national security asset because of the risk of blackmail or something. So this is the whole Grindr case in the US where I think Grindr was encouraged or strongly encouraged by the US government to find a non-Chinese owner. So on the one hand you want to protect personal information, but on the other hand, free data flows are critical to spurring gains and innovation as well for some of these larger companies.

Lucas Perry: Is there an interest here to be able to sell their data to other companies abroad? Is that why they’re against this data protectionism in China?

Jeffrey Ding: I don’t know that much about this particular case, but I think Alibaba and Tencent have labs all around the world. So they might want to collate their data together, so they were worried that the cybersecurity law would affect that.

Lucas Perry: And just highlighting here for the listeners that access to large amounts of high quality data is extremely important for efficaciously training models and machine learning systems. Data is a new, very valuable resource. And so you go on here to say, I’m quoting you again, “China’s also actively recruiting and cultivating talented researchers to develop AI algorithms. The state council’s AI plan outlines a two pronged gathering and training approach.” This seems to be very important, but it also seems like from your report that China’s losing AI talent to America largely. What can you say about this?

Jeffrey Ding: Often the biggest bottleneck cited to AI development is lack of technical talent. That gap will eventually be filled just based on pure operations in the market, but in the meantime there has been a focus on AI talent, whether that’s through some of these national talent programs, or it also happens through things like local governments offering tax breaks for companies who may have headquarters around the world.

For example, Jingchi which is an autonomous driving startup, they had I think their main base in California or one of their main bases in California; But then Shenzhen or Guangzhou, I’m not sure which local government it was, they gave them basically free office space to move one of their bases back to China and that brings a lot of talented people back. And you’re right, a lot of the best and brightest do go to US companies as well, and one of the key channels for recruiting Chinese students are big firms setting up offshore research and development labs like Microsoft Research Asia in Beijing.

And then the third thing I’ll point out, and this is something I’ve noticed recently when I was doing translations from science and tech media platforms that are looking at the talent space in particular: They’ve pointed out that there’s sometimes a tension between the gathering and the training planks. So there’ve been complaints from domestic Chinese researchers, so maybe you have two super talented PhD students. One decides to stay in China, the other decides to go abroad for their post-doc. And oftentimes the talent plans –– the recruiting, gathering plank of this talent policy –– will then favor the person who went abroad for the post-doc experience over the person who stayed in China, and they might be just as good. So then that actually creates an incentive for more people to go abroad. There’s been good research that a lot of the best and brightest ended up staying abroad; The stay rates, especially in the US for Chinese PhD students in computer science fields, are shockingly high.

Lucas Perry: What can you say about Chinese PhD student anxieties with regards to leaving the United States to go visit family in China and come back? I’ve heard that there may be anxieties about not being let back in given that their research has focused on AI and that there’s been increasing US suspicions of spying or whatever.

Jeffrey Ding: I don’t know how much of it is a recent development but I think it’s just when applying for different stages of the path to permanent residency –– whether it’s applying for the H-1B visa or if you’re in the green card pipeline –– I’ve heard just secondhand that they avoid traveling abroad or going back to visit family just to kind of show commitment that they’re residing here in the US. So I don’t know how much of that is recent. My dad actually, he started out as a PhD student in math at University of Iowa before switching to computer science and I remember we had a death in the family and he couldn’t go back because it was so early on in his stay. So I’m sure it’s a conflicted situation for a lot of Chinese international students in the US.

Lucas Perry: So moving along here and ending this component section, you also say here –– and this kind of goes back to what we were discussing earlier about government guidance funds –– Chinese government is also starting to take a more active role in funding AI ventures, helping to grow the fourth driver of AI development, which again is the commercial AI ecosystems, which support and are the context for hardware data and research on algorithm development. And so the Chinese government is disbursing funds through what are called Government Guidance Funds or GGFs, set up by local governments and state owned companies. And the government has invested more than a billion US dollars on domestic startups. This seems to be in clear contrast with how America functions on this, with much of the investments shifting towards healthcare and AI as the priority areas in the last two years.

Jeffrey Ding: Right, yeah. So the GGFs are an interesting funding vehicle. The China Money Network, which has I think the best English language coverage of these vehicles, says that they may be history’s greatest experiment in using state capital to reshape a nation’s economy. These essentially are Public Private Partnerships, PPPs, which do exist across the world, including in the US. And the idea is basically the state seeds and anchors these investment vehicles and then they partner with private capital to also invest in startups, companies that the government thinks either are supporting a particular policy initiative or are good for overall development.

A lot of this is hard to decipher in terms of what the impact has been so far, because publicly available information is relatively scarce. I mentioned in my report that these funds haven’t had a successful exit yet, which maybe just means they need more time. I think there’s also been some complaints that the big VCs –– whether it’s Chinese VCs or even international VCs that have a Chinese arm –– much prefer to just go it on their own rather than be tied to all the strings and potential regulations that come with working with the government. So I think it’s definitely a case of time will tell, and also this is a very fertile research area that I know some people are looking into. So be on the lookout for more conclusive findings about these GGFs, especially how they relate to the emerging technologies.

Lucas Perry: All right. So we’re getting to your capabilities section, which assesses the current state of China’s AI capabilities across the four drivers of AI development. Here you’re constructing an AI Potential Index, which is an index for the potentiality of, say, a country, based off these four variables, to be able to create successful AI products. So based on your research, you give China an AI Potential Index score of 17, which is about half of the US’s AI Potential Index score of 33. And so you state here that what is sort of essential to draw from this finding is the relative scale, or at least the proportionality, between China and the US. So the conclusion which we can try to draw from this is that China trails the US in every driver except for access to data, and that on all of these dimensions China is about half as capable as the US.

Jeffrey Ding: Yes, so the AIPI, the AI Potential Index, was definitely just meant as a first cut at developing a measure for which we can make comparative claims. I think at the time, and even now, I think we just throw around things like, “who is ahead in AI?” I was reading this recent Defense One article that was like, “China’s the world leader in GANs,” G-A-Ns, Generative Adversarial Networks. That’s just not even a claim that is coherent. Are you the leader at developing the talent who is going to make advancement to GANs? Are you the leader at applying and deploying GANs in the military field? Are you the leader in producing the most publications related to GANs?

I think that’s what was frustrating me about the conversation and net assessment of different countries’ AI capabilities, so that’s why I tried to develop a more systematic framework which looked at the different drivers, and it was basically looking at what is the potential of country’s AI capabilities based on their marks across these drivers.

Since then, probably the main thing that I’ve done to update this was in my written testimony before the US-China Economic and Security Review Commission, where I kind of switch up a little bit how I evaluate the current AI capabilities of China and the US. Basically there’s this very fuzzy concept of national AI capabilities that we throw around and I slice it up into three cross-sections. The first is, let’s look at what scientific and technological inputs and outputs different countries are putting into AI. So that’s: how many publications are coming out of this country in Europe versus China versus US? How many outputs also in the sense of publications or inputs in the sense of R&D investments? So let’s take a look at that.

The second slice is, let’s not just say AI. I think every time you say AI it’s always better to specify subtypes, or at least in the second slice I look at different layers of the AI value chain: foundational layers, technological layers, and the application layer. So, for example, foundation layers may be who is leading in developing the AI open source software that serves as the technological backbone for a lot of these AI applications and technologies? 

And then the third slice that I take is different sub domains of AI –– so computer vision, predictive intelligence, natural language processing, et cetera. And basically my conclusion: I throw a bunch of statistics in this written testimony out there –– some of it draws from this AI potential index that I put out last year –– and my conclusion is that China is not poised to overtake the US in the technology domain of AI; Rather the US maintains structural advantages in the quality of S and T inputs and outputs, the fundamental layers of the AI value chain, and key sub domains of AI.

So yeah, this stuff changes really fast too. I think a lot of people are trying to put together more systematic ways of measuring these things. So Jack Clark at OpenAI; projects like the AI Index out of Stanford University; Matt Sheehan recently put out a really good piece for MacroPolo on developing sort of a five-dimensional framework for understanding data. So in this AIPI first cut, my data indicator is just a very raw who has more mobile phone users, but that obviously doesn’t matter for who’s going to lead in autonomous vehicles. So having finer grained understanding of how to measure different drivers will definitely help this field going forward.

Lucas Perry: What can you say about symmetries or asymmetries in terms of sub-fields in AI research like GANs or computer vision or any number of different sub-fields? Can we expect very strong specialties to develop in one country rather than another, or there to be lasting asymmetries in this space, or does research publication subvert this to some extent?

Jeffrey Ding: I think natural language processing is probably the best example because everyone says NLP, but then you just have that abstract word and you never dive into, “Oh wait, China might have a comparative advantage in Chinese language data processing, speech recognition, knowledge mapping,” which makes sense. There is just more of an incentive for Chinese companies to put out huge open source repositories to train automatic speech recognition.

So there might be some advantage in Chinese language data processing, although Microsoft Research Asia has very strong NLP capabilities as well. Facial recognition is maybe another area of comparative advantage: I think in my testimony I cite that China published 900 patents in this sub domain in 2017; in that same year fewer than 150 patents related to facial recognition were filed in the US. So that could be partly just because there’s so much more of a fervor for surveillance applications, but in other domains such as the larger scale business applications the US probably possesses a decisive advantage. So autonomous vehicles are the best example of that: In my opinion, Google’s Waymo and GM’s Cruise are lapping the field.

And then finally in my written testimony I also try to look at military applications, and I find one metric that puts the US as having more than seven times as many military patents filed with the terms “autonomous” or “unmanned” in the patent abstract in the years 2003 to 2015. So yeah, that’s one of the research streams I’m really interested in, is how can we have more fine grain metrics that actually put into context China’s AI development, and that way we can have a more measured understanding of it.

Lucas Perry: All right, so we’ve gone into length now providing a descriptive account of China and the United States and key descriptive insights of your research. Moving into consequences now, I’ll just state some of these insights which you bring to light in your paper and then maybe you can expand on them a bit.

Jeffrey Ding: Sure.

Lucas Perry: You discuss the potential implications of China’s AI dream for issues of AI safety and ethics, national security, economic development, and social governance. The thinking here is becoming more diversified and substantive, though you claim it’s also too early to form firm conclusions about the long-term trajectory of China’s AI development; This is probably also true of any other country, really. You go on to conclude that a group of Chinese actors is increasingly engaged with issues of AI safety and ethics. 

A new book has been authored by Tencent’s Research Institute, and it includes a chapter in which the authors discuss the Asilomar Principles in detail and call for strong regulations and controlling spells for AI. There’s also this conclusion that military applications of AI could provide a decisive strategic advantage in international security. The degree to which China’s approach to military AI represents a revolution in military affairs is an important question to study, to see how strategic advantages between the United States and China continue to change. You continue by elucidating how the economic benefit is the primary and immediate driving force behind China’s development of AI –– and again, I think you highlighted this sort of manufacturing perspective on this.

And finally, China’s adoption of AI Technologies could also have implications for its mode of social governance. For the state council’s AI plan, you state, “AI will play an irreplaceable role in maintaining social stability, an aim reflected in local level integrations of AI across a broad range of public services, including judicial services, medical care, and public security.” So given these sort of insights that you’ve come to and consequences of this descriptive picture we’ve painted about China and AI, is there anything else you’d like to add here?

Jeffrey Ding: Yeah, I think as you are laying out those four categories of consequences, I was just thinking this is what makes this area so exciting to study because if you think about it, each of those four consequences maps onto a research field: AI ethics and safety, with benevolent AI efforts, stuff that FLI is doing, the broader technology studies, critical technologies studies, technology ethics field; then in the social governance space, AI as a tool of social control: what are the social aftershocks of AI’s economic implications? You have this entire field of democracy studies or studies of technology and authoritarianism; and with the economic benefits, you have this entire field of innovation studies: how do we understand the productivity benefits of general purpose technologies? And of course with AI as a revolution in military affairs, you have this whole field of security studies that is trying to understand what are the implications of new emerging technologies for national security?

So it’s easy to start delineating these into their separate containers. I think what’s hard, especially for those of us who are really concerned about that first field, AI ethics and safety, and the risks of AGI arms races, is that a lot of other people are really, really concerned about those other three fields. And how do we tie in concepts from those fields? How do we take from those fields, learn from those fields, shape the language that we’re using to also be in conversation with those fields –– and then also see how those fields may actually be in conflict with some of what our goals are? And then how do we navigate those conflicts? How do we prioritize different things over others? It’s an exciting but daunting prospect ahead.

Lucas Perry: If you’re listening to this and are interested in becoming an AI researcher in terms of the China landscape, we need you. There’s a lot of great and open research questions here to work on.

Jeffrey Ding: For sure. For sure.

Lucas Perry: So I’ve extracted some insights from previous podcasts you did –– I can leave a link for that in the page for this podcast –– so I just want to kind of rapid fire these as points that I thought were interesting that we may or may not have covered here. You point out a language asymmetry: The best Chinese AI researchers read English and Chinese, whereas the western researchers generally cannot do this. You have a newsletter called ChinAI, with one A; your newsletter attempts to correct for this as you translate important Chinese tech-related things into English. I suggest everyone follow that if you’re interested in continuing to track China and AI. There is more international cooperation on research at international conferences –– this is a general trend that you point out: Some top Chinese AI conferences are English only. Furthermore, I believe that you claim that the top 10% of AI research is still happening in America and the UK.

Another point which I think that you’ve brought up is that China is behind on military AI uses. I’m also interested here just to see if you can expand a little bit more on it, but that China and AI safety and superintelligence is also something interesting to hear a little bit more about because on this podcast we often take the lens of long-term AI issues and AGI and super intelligence. So I think you mentioned that the Nick Bostrom of China is Professor, correct me if I get this wrong, Jao ting Wang. And also I’m curious here if you might be able to expand on how large or serious this China superintelligence FLI/FHI vibe is and what the implications of this are, and if there are any orgs in China that are explicitly focused on this. I’m sorry if this is a silly question, but are there like nonprofits in China in the same way that there are in the US? How does that function? Is China on the brink of having an FHI or FLI or MIRI or anything like this?

Jeffrey Ding: So a lot to untangle there and all really good questions. First, just to clarify, yeah, there are definitely nonprofits, non-governmental organizations. In recent years there has been some pressure on international nongovernmental organizations, nonprofit organizations, but there’s definitely nonprofits. One of the open source NLP initiatives I mentioned earlier, the Chinese language Corpus, was put together by a nonprofit online organization called AIShell Foundation, and they put together AIShell-1, AIShell-2, which are the largest open source speech Corpus available for Mandarin speech recognition.

I haven’t really followed up on Jao ting Wang. He’s a philosopher at the Chinese Academy of Social Sciences. The sort of “Nick Bostrom of China” label was more of a newsletter headline to get people to read, but he does devote a lot of time and thinking to the long-term risks of AI. Another professor at Nanjing University by the name of Zhi-Hua Zhou, he’s published articles about the need to not even touch some of what he calls strong AI. These were published in a pretty influential publication outlet by the Chinese Computer Federation, which brings together a lot of the big name computer scientists. So there’s definitely conversations about this happening. Whether there is an FHI, FLI equivalent, let’s say probably not, at least not yet.

Peking University may be developing something in this space. Berggruen Institute is also I think looking at some related issues. There’s probably a lot of stuff happening in Hong Kong as well; maybe we just haven’t looked hard enough. I think the biggest difference is there’s definitely not something on the level of a DeepMind or OpenAI, even among the firms with the best general AI capabilities –– DeepMind and OpenAI are almost like these unique entities where profits and stocks don’t matter.

So yeah, definitely some differences, but honestly I updated significantly once I started reading more, and nobody had really looked at this Zhi-Hua Zhou essay before we went looking and found it. So maybe there are a lot of these organizations and institutions out there but we just need to look harder.

Lucas Perry: So on this point of there not being OpenAI or DeepMind equivalents, are there any research organizations or departments explicitly focused on the mission of creating artificial general intelligence or superintelligence safely scalable machine learning systems that could go from now until infinity? Or is this just more like scattered researchers?

Jeffrey Ding: I think it depends on how you define an AGI project. Like what you just said is probably a good tight definition. I know Seth Baum, he’s done some research tracking AGI projects and he says that there are six in China. I would say probably the only ones that come close are, I guess Tencent says it’s one of their mission streams to develop artificial general intelligence; Horizon Robotics, which is actually like a chip company, they also state it as one of their objectives. It depends also on how much you think work on neuroscience related pathways into AGI counts or not. So there’s probably some Chinese Academy of Sciences labs working on whole brain emulation or kind of more brain inspired approaches to AGI, but definitely not anywhere near the level of DeepMind, OpenAI.

Lucas Perry: All right. So there are some myths in table one of your paper which you demystify. Three of these are: China’s approach to AI is defined by its top-down and monolithic nature; China is winning the AI arms race; And there is little to no discussion of issues of AI ethics and safety in China. And then maybe lastly I might add, if you might be able to add to it, that there is just to begin with an AI arms race between the US and China.

Jeffrey Ding: Yeah, I think that’s a good addition. I think we have to be careful about which historical analogies and memes we choose. So “arms race” is a very specific call back to the Cold War context, where there’s almost these discrete types of missiles that we are racing the Soviet Union on and discrete applications that we can count up; or even going way back to what some scholars call the first industrial arms race in the military sphere over steam power boats between Britain and France in the late 19th century. In all of those instances you can count up. France has four ironclads, UK has four ironclads; they’re racing to see who can build more. I don’t think there’s anything like that. There’s not this discrete thing that we’re racing to see who can have more of. If anything, it’s about a competition to see who can absorb AI advances from abroad better, who can diffuse them throughout the economy, who can adopt them in a more sustainable way without sacrificing core values.

So that’s sort of one meme that I really want to dispel. Related to that, an assumption that often influences a lot of our discourse on this is the techno-nationalist assumption, which is this idea that technology is contained within national boundaries and that the nation state is the most important actor –– which is correct and a good one to have in a lot of instances. But there are also good reasons to adopt techno-globalist assumptions as well, especially in the area of how fast technologies diffuse nowadays and also how much, underneath this national level competition, firms from different countries are working together and making standards alliances with each other. So there’s this undercurrent of techno-globalism, where there are people flows, idea flows, company flows happening while the coverage and the sexy topic is always going to be about national level competition, zero sum competition, relative gains rhetoric. So you’re trying to find a balance between those two streams.

Lucas Perry: What can you say about this sort of reflection on zero sum games versus healthy competition and the properties of AI and AI research? I’m seeking clarification on this secondary framing that we can take on a more international perspective about deployment and implementation of AI research and systems rather than, as you said, this sort of techno-nationalist one.

Jeffrey Ding: Actually, this idea comes from my supervisor: Relative gains make sense if there’s only two players involved, just from a pure self-interest maximizing standpoint. But once you introduce three or more players, relative gains doesn’t make as much sense as optimizing for absolute gains. So maybe one way to explain this is to take the perspective of a European country –– let’s say Germany –– and you are working on an AI project with China or some other country that maybe the US is pressuring you not to work with; You’re working with Saudi Arabia or China on some project and it’s going to benefit China 10 arbitrary points and it’s going to benefit Germany eight arbitrary points versus if you didn’t choose to cooperate at all.

So in that sense, Germany, the rational actor, would take that deal. You’re not just caring about being better than China; From a German perspective, you care about maintaining leadership in the European Union, providing health benefits to your citizens, continuing to power your economy. So in that sense you would take the deal even though China benefits a little bit more, relatively speaking. 

I think currently a lot of people in the US are locked into this mindset that the only two players that exist in the world are the US and China. And if you look at our conversation, right, oftentimes I’ve displayed that bias as well. We should probably have talked a lot more about China-EU or China-Japan cooperation in this space and networks in this space because there’s a lot happening there too. So a lot of US policy makers see this as a two-player game between the US and China. And then in that sense, if there’s some cancer research project about discovering proteins using AI that may benefit China by 10 points and benefit the US only by eight points, but it’s going to save a lot of people from cancer –– if you only care about making everything about maintaining a lead over China, then you might not take that deal. But if you think about it from the broader landscape of it’s not just a zero sum competition between US and China, then your kind of evaluation of those different point structures and what you think is rational will change.

Lucas Perry: So as there’s more actors, is the idea here that you care more about absolute gains in the sense that these utility points or whatever can be translated into decisive strategic advantages like military advantages?

Jeffrey Ding: Yeah, I think that’s part of it. What I was thinking along that example is basically if you as Germany don’t choose to cooperate with Saudi Arabia or work on this joint research project with China then the UK or some other country is just going to swoop in. And that possibility doesn’t exist in the world where you’re just thinking about two players. There’s a lot of different ways to fit these sort of formal models, but that’s probably the most simplistic way of explaining it.
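
To make the point structure in this example concrete, here is a minimal sketch in Python. Only the 10 and 8 "arbitrary points" come from the conversation; the zero payoff for not cooperating, the idea that a third country collects the same deal, and all function names are assumptions made purely for illustration, not a formal model anyone in the episode proposed.

```python
# Illustrative payoffs only. The 10 and 8 points are from the conversation;
# everything else is an assumption made for this sketch.
PARTNER_GAIN = 10  # what China gains from the joint project
OWN_GAIN = 8       # what Germany gains from the joint project


def absolute_payoff(cooperate: bool) -> int:
    """Germany's own gain, ignoring how anyone else does."""
    return OWN_GAIN if cooperate else 0


def relative_payoff(cooperate: bool, rival_steps_in: bool) -> int:
    """Germany's gain minus China's gain.

    rival_steps_in models a third country (hypothetically, the UK) taking
    the deal when Germany declines, so China collects its gain either way.
    """
    own = OWN_GAIN if cooperate else 0
    partner = PARTNER_GAIN if (cooperate or rival_steps_in) else 0
    return own - partner


def best_choice(payoff) -> str:
    """Pick whichever option gives the higher payoff under the given rule."""
    return "cooperate" if payoff(True) > payoff(False) else "decline"


two_player_relative = best_choice(lambda c: relative_payoff(c, rival_steps_in=False))
three_player_relative = best_choice(lambda c: relative_payoff(c, rival_steps_in=True))
absolute = best_choice(absolute_payoff)

print("Two players, relative gains:  ", two_player_relative)
print("Three players, relative gains:", three_player_relative)
print("Absolute gains:               ", absolute)
```

Under these assumed numbers, a pure relative-gains player declines the deal only in the two-player world. Once a third country can step in, declining leaves Germany further behind than cooperating, so both the relative and the absolute view point toward taking the deal.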

Lucas Perry: Okay, cool. So you’ve spoken a bit here on important myths that we need to dispel or memes that we need to combat. And recently Peter Thiel has been on a bunch of conservative platforms, and he also wrote an op-ed, basically fanning the flames of AGI as a military weapon, AI as a path to superintelligence and, “Google campuses have lots of Chinese people on them who may be spies,” and that Google is actively helping China with AI military technology. In terms of bad memes and myths to combat, what are your thoughts here?

Jeffrey Ding: There’s just a lot of things that Thiel gets wrong. I’m mostly kind of just confused because he is one of the original founders of OpenAI, he’s funded other institutions, really concerned about AGI safety, really concerned about race dynamics –– and then in the middle of this piece, he first says AI is a military technology, then he goes back to saying AI is dual use in the middle, and then he says this ambiguity is “strangely missing from the narrative that pits a monolithic AI against all of humanity.” He out of anyone should know that these conversations about the risks of AGI, why are you attacking this straw man in the form of a terminator AI meme? Especially, you’re funding a lot of the organizations that are worried about the risks of AGI for all of humanity. 

The other main thing that’s really problematic is if you’re concerned about the US military advantage, that more than ever is rooted in our innovation advantage. It’s not about spinoff from military innovation to civilian innovation, which was the case in the days of US tech competition against Japan. It’s more the case of spin-on, where innovations are happening in the commercial sector that are undergirding the US military advantage.

And this idea of painting Google as anti-American for setting up labs in China is so counterproductive. There are independent Google developer conferences all across China just because so many Chinese programmers want to use Google tools like TensorFlow. It goes back to the fundamental AI open source software I was talking about earlier that lets Google expand its talent pool: People want to work on Google products; They’re more used to the framework of Google tools to build all these products. Google’s not doing this out of charity to help the Chinese military. They’re doing this because the US has a flawed high-skilled immigration system, so they need to go to other countries to get talent. 

Also, the other thing about the piece is he cites no empirical research on any of these fronts, when there’s this whole globalization of innovation literature that backs up empirically a lot of what I’m saying. And then I’ve done my own empirical research on Microsoft Research Asia, which as we’ve mentioned is their second biggest lab overall, it’s based in Beijing. I’ve tracked their PhD Fellowship Program: This basically gives people at Chinese PhD programs, you get a full scholarship and you just do an internship at Microsoft Research Asia for one of the summers. And then we track their career trajectories, and a lot of them end up coming to the US or working for Microsoft Research Asia in Beijing. And the ones that come to the US don’t just go to Microsoft: They go to Snapchat or Facebook or other companies. And it’s not just about the people: As I mentioned earlier, we have this innovation centrism about who produces the technology first, but oftentimes it’s about who diffuses and adopts the technology first. And we’re not always going to be the first on the scene, so we have to be able to adopt and diffuse technologies that are invented first in other areas. And these overseas labs are some of our best portals into understanding what’s happening in these other areas. If we lose them, it’s another form of asymmetry because Chinese AI companies are going abroad and expanding. 

I honestly, I’m just really confused about what the point of this piece was and to be honest, it’s kind of sad because this is not what Thiel researches every day. So he’s obviously picking up bits and pieces from the narrative frames that are dominating our conversation. And it’s actually probably a structural stain on how we’ve allowed the discourse to have so many of these bad problematic memes, and we need more people calling them out actively, doing the heart to heart conversations behind the scenes to get people to change their minds or have productive constructive conversations about these.

And the last thing I’ll point out here is there’s this zombie Cold War mentality that still lingers today, and I think the historian Walter McDougall was really great in calling this out, where he talks about we paint this other, this enemy, and we use it to justify sacrifices in human values to drive society to its fullest technological potential. And that often comes with sacrificing human values like privacy, equality, freedom of speech. And I don’t want us to compete with China over who can build better tools to censor, repress, and surveil dissidents and minority groups, right? Let’s see who can build the better, I don’t know, industrial internet of things or build better privacy preserving algorithms that are going to sustain a more trustworthy AI ecosystem.

Lucas Perry: Awesome. So just moving along here as we’re making it to the end of our conversation: What are updates you’ve had or major changes since you’ve written Deciphering China’s AI Dreams, since it has been a year?

Jeffrey Ding: Yeah, I mentioned some of the updates in the capability section. The consequences, I mean I think those are still the four main big issues, all of them tied to four different literature bases. The biggest change would probably be in the component section. I think when I started out, I was pretty new in this field, I was reading a lot of literature from the China watching community and also a lot from Chinese comparative politics or articles about China, and so I focused a lot on government policies. And while I think the party and the government are definitely major players, I think I probably overemphasized the importance of government policies versus what is happening at the local level.

So if I were to go back and rewrite it, I would’ve looked a lot more at what is happening at the local level, given more examples of AI firms, like iFlytek I think is a very interesting under-covered firm, and how they are setting up research institutes with a university in Chung Cheng very similar to the industry-academia style collaborations in the US, basically just ensuring that they’re able to train the next generation of talent. They have relatively close ties to the state as well, I think controlling shares or a large percentage of shares owned by state-owned vehicles. So I probably would have gone back and looked at some of these more under-covered firms and localities and looked at what they were doing rather than just looking at the rhetoric coming from the central government.

Lucas Perry: Okay. What does it mean for there to be healthy competition between the United States and China? What is an ideal AI research and political situation? What are the ideal properties of the relations the US and China can have on the path to superintelligence?

Jeffrey Ding: Yeah.

Lucas Perry: Solve AI Governance for me, Jeff!

Jeffrey Ding: If I could answer that question, I think I could probably retire or something. I don’t know.

Lucas Perry: Well, we’d still have to figure out how to implement the ideal governance solutions.

Jeffrey Ding: Yeah. I think one starting point is on the way to more advanced AI systems, we have to stop looking at AI as if it’s like this completely special area with no analogs, because even though there are unique aspects of AI –– like their autonomous intelligence systems, a possibility of the product surpassing human level intelligence, or the process surpassing human level intelligence –– we can learn a lot from past general purpose technologies like steam, electricity, the diesel engine. And we can learn about a lot of competition in past strategic industries like chips, steel.

So I think probably one thing that we can distill from some of this literature is there are some aspects of AI development that are going to be more likely to lead to race dynamics than others. So one cut that you could take are industries where it’s likely that there are only going to be two or three, four or five major players –– so it might be the case that capital costs, the upstart costs, the infrastructure costs of autonomous vehicles requires that there are going to be only one or two players across the world. And that is like, hey, if you’re a national government who’s thinking strategically, you might really want to have a player in that space, so that might incentivize more competition. Whereas in other fields, maybe there’s just going to be a lot more competition or less need for relative gain, zero sum thinking. So like neural machine translation, that could be a case of something that just almost becomes like a commodity. 

So then there are things we can think about in those fields where there’s only going to be four or five players or three or four players. Can we maybe balance it out so that at least one is from the two major powers or is the better approach to, I don’t know, enact global competition, global antitrust policy to kind of ensure that there’s always going to be a bunch of different players from a bunch of different countries? So those are some of the things that come to mind that I’m thinking about, but yeah, this is definitely something where I claim zero credibility relative to others who are thinking about it.

Lucas Perry: Right. Well unclear anyone has very good answers here. I think my perspective, to add at least one frame on it, is that given the dual use nature of many of the technologies like computer vision and like embedded robot systems and developing autonomy and image classification –– all of these different AI specialty subsystems can be sort of put together in arbitrary ways. So in terms of autonomous weapons, FLI’s position is, it’s important to establish international standards around the appropriate and beneficial uses of these technologies.

Image classification, as people already know, can be used for discrimination or beneficial things. And the technologies can be aggregated to make anything from literal terminator swarm robots to lifesaving medical treatments. So the relation between the United States and China can be made more productive if clear standards based on the expression of the principles we enumerated earlier could be created. And given that, then we might be taking some paths towards a beneficial beautiful future of advanced AI systems.

Jeffrey Ding: Yeah, no, I like that a lot. And some of the technical standards documents I’ve been translating: I definitely think in the short-term, technical standards are a good way forward, sort of solve the starter pack type of problems before AGI. Even some Chinese white papers on AI standardization have put out the idea of ranking the intelligence level of different autonomous systems –– like an autonomous car might be more than a smart speaker or something: Even that is a nice way to kind of keep track of the progress and discontinuities in terms of intelligence explosions and trajectories in the space. So yeah, I definitely second that idea. Standardization efforts, autonomous weapons regulation efforts, as serving as the building blocks for larger AGI safety issues.

Lucas Perry: I would definitely like to echo this starter pack point of view. There’s a lot of open questions about the architectures or ways in which we’re going to get to AGI, about how the political landscape and research landscape is going to change in time. But I think that we already have enough capabilities and questions that we should really be considering where we can be practicing and implementing the regulations and standards and principles and intentions today in 2019 that are going to lead to robustly good futures for AGI and superintelligence.

Jeffrey Ding: Yeah. Cool.

Lucas Perry: So Jeff, if people want to follow you, what is the best way to do that?

Jeffrey Ding: You can hit me up on Twitter, I’m @JJDing99; Or I put out a weekly newsletter featuring translations on AI related issues from Chinese media, Chinese scholars and that’s ChinAI Newsletter, C-H-I-N-A-I. If you just search that, it should pop up.

Lucas Perry: Links to those will be provided in the description of wherever you might find this podcast. Jeff, thank you so much for coming on and thank you for all of your work and research and efforts in this space, for helping to create a robust and beneficial future with AI.

Jeffrey Ding: All right, Lucas. Thanks. Thanks for the opportunity. This was fun.

Lucas Perry: If you enjoyed this podcast, please subscribe, give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

End of recorded material

The Climate Crisis as an Existential Threat with Simon Beard and Haydn Belfield

Does the climate crisis pose an existential threat? And is that even the best way to formulate the question, or should we be looking at the relationship between the climate crisis and existential threats differently? In this month’s FLI podcast, Ariel was joined by Simon Beard and Haydn Belfield of the University of Cambridge’s Center for the Study of Existential Risk (CSER), who explained why, despite the many unknowns, it might indeed make sense to study climate change as an existential threat. Simon and Haydn broke down the different systems underlying human civilization and the ways climate change threatens these systems; They also discussed our species’ unique strengths and vulnerabilities — and the ways in which technology has heightened both — with respect to the changing climate.

This month’s podcast helps serve as the basis for a new podcast we’re launching later this month about the climate crisis. We’ll be talking to climate scientists, meteorologists, AI researchers, policy experts, economists, social scientists, journalists, and more to go in depth about a vast array of climate topics. We’ll talk about the basic science behind climate change, like greenhouse gases, the carbon cycle, feedback loops, and tipping points. We’ll discuss various impacts of greenhouse gases, like increased extreme weather events, loss of biodiversity, ocean acidification, resource conflict, and the possible threat to our own continued existence. We’ll talk about the human causes of climate change and the many human solutions that need to be implemented. And so much more! If you don’t already subscribe to our podcasts on your preferred podcast platform, please consider doing so now to ensure you’ll be notified when the climate series launches.

We’d also like to make sure we’re covering the climate topics that are of most interest to you. If you have a couple minutes, please fill out a short survey at surveymonkey.com/r/climatepodcastsurvey, and let us know what you want to learn more about.

Topics discussed in this episode include:

  • What an existential risk is and how to classify different threats
  • Systems critical to human civilization
  • Destabilizing conditions and the global systems death spiral
  • How we’re vulnerable as a species
  • The “rungless ladder”
  • Why we can’t wait for technology to solve climate change
  • Uncertainty and how to deal with it
  • How to incentivize more creative science
  • What individuals can do

Want to get involved? CSER is hiring! Find a list of openings here.

Ariel Conn: Hi everyone and welcome to another episode of the FLI podcast. I’m your host, Ariel Conn, and I am especially excited about this month’s episode. Not only because, as always, we have two amazing guests joining us, but also because this podcast helps lay the groundwork for an upcoming series we’re releasing on climate change.

There’s a lot of debate within the existential risk community about whether the climate crisis really does pose an existential threat, or if it will just be really, really bad for humanity. But this debate exists because we don’t know enough yet about how bad the climate crisis will get nor about how humanity will react to these changes. It’s very possible that today’s predicted scenarios for the future underestimate how bad climate change could be, while also underestimating how badly humanity will respond to these changes. Yet if we can get enough people to take this threat seriously and to take real, meaningful action, then we could prevent the worst of climate change, and maybe even improve some aspects of life. 

In late August, we’ll be launching a new podcast series dedicated to climate change. I’ll be talking to climate scientists, meteorologists, AI researchers, policy experts, economists, social scientists, journalists, and more to go in depth about a vast array of climate topics. We’ll talk about the basic science behind climate change, like greenhouse gases, the carbon cycle, feedback loops, and tipping points. We’ll discuss various impacts of greenhouse gases, like increased extreme weather events, loss of biodiversity, ocean acidification, resource conflict, and the possible threat to our own continued existence. We’ll talk about the human causes of climate change and the many human solutions that need to be implemented. And so much more. If you don’t already subscribe to our podcasts on your preferred podcast platform, please consider doing so now to ensure you’ll be notified as soon as the climate series launches.

But first, today, I’m joined by two guests who suggest we should reconsider studying climate change as an existential threat. Dr. Simon Beard and Haydn Belfield are researchers at University of Cambridge’s Center for the Study of Existential Risk, or CSER. CSER is an interdisciplinary research group dedicated to the study and mitigation of risks that could lead to human extinction or a civilizational collapse. They study existential risks, develop collaborative strategies to reduce them, and foster a global community of academics, technologists, and policy makers working to safeguard humanity. Their research focuses on four areas: biological risks, environmental risks, risks from artificial intelligence, and how to manage extreme technological risk in general.

Simon is a senior research associate and academic program manager; He’s a moral philosopher by training. Haydn is a research associate and academic project manager, as well as an associate fellow at the Leverhulme Center for the Future of Intelligence. His background is in politics and policy, including working for the UK Labour Party for several years. Simon and Haydn, thank you so much for joining us today.

Simon Beard: Thank you.

Haydn Belfield: Hello, thank you.

Ariel Conn: So I’ve brought you both on to talk about some work that you’re involved with, looking at studying climate change as an existential risk. But before we really get into that, I want to remind people about some of the terminology. So I was hoping you could quickly go over a reminder of what an existential threat is and how that differs from a catastrophic threat and if there’s any other terminology that you think is useful for people to understand before we start looking at the extreme threats of climate change.

Simon Beard: So, we use these various terms as kind of terms of art within the field of existential risk studies, in a sense. We know what we mean by them, but all of them, in a way, are different ways of pointing to the same kind of outcome — which is something unexpectedly, unprecedentedly bad. And, actually, once you’ve got your head around that, different groups have slightly different understandings of what the differences between these three terms are. 

So, for some groups, it’s all about just the scale of badness. So, an extreme risk is one that does a sort of an extreme level of harm; A catastrophic risk does more harm, a catastrophic level of harm. And an existential risk is something where either everyone dies, human extinction occurs, or you have an outcome which is an equivalent amount of harm: Maybe some people survive, but their lives are terrible. Actually, at the Center for the Study of Existential Risk, we are concerned about this classification in terms of the cost involved, but we also have coupled that with a slightly different sort of terminology, which is really about systems and the operation of the global systems that surround us.

Most of the systems — be these physiological systems, the world’s ecological system, the social, economic, technological, cultural systems that surround those institutions that we build on — they have a kind of normal space of operation where they do the things that you expect them to do. And this is what human life, human flourishing, and human survival are built on: that we can get food from the biosphere, that our bodies will continue to operate in a way that’s consistent with and supporting our health and our continued survival, and that the institutions that we’ve developed will still work, will still deliver food to our tables, will still suppress interpersonal and international violence, and that we’ll basically, we’ll be able to get on with our lives.

If you look at it that way, then an extreme risk, or an extreme threat, is one that pushes at least one of these systems outside of its normal boundaries of operation and creates an abnormal behavior that we then have to work really hard to respond to. A catastrophic risk is one where that happens, but then that also cascades. Particularly in global catastrophe, you have a whole system that encompasses everyone all around the world, or maybe a set of systems that encompass everyone all around the world, that are all operating in this abnormal state that’s really hard for us to respond to.

And then an existential catastrophe is one where the systems have been pushed into such an abnormal state that either you can’t get them back or it’s going to be really hard. And life as we know it cannot be resumed; We’re going to have to live in a very different and very inferior world, at least from our current way of thinking.

Haydn Belfield: I think that sort of captures it really well. One thing that you could kind of visualize, it might be something like, imagine a really bad pandemic. 100 years ago, we had the Spanish flu pandemic that killed 100 million people — that was really bad. But it could be even worse. So imagine one tomorrow that killed a billion people. That would be one of the worst things that’s ever happened to humanity; It would be sort of a global catastrophic risk. But it might not end our story, it might not be the end of our potential. But imagine if it killed everyone, or it killed almost everyone, and it was impossible to recover: That would be an existential risk.

Ariel Conn: So, there’s — at least I’ve seen some debate about whether we want to consider climate change as falling into either a global catastrophic or existential risk category. And I want to start first with an article that, Simon, you wrote back in 2017, to consider this question. The subheading of your article is a question that I think is actually really important. And it was: how much should we care about something that is probably not going to happen? I want to ask you about that — how much should we care about something that is probably not going to happen?

Simon Beard: I think this is really important when you think about existential risk. People’s minds, they want to think about predictions, they want someone who works in existential risk to be a prophet of doom. That is the idea that we have — that you know what the future is going to be like, and it’s going to be terrible, and what you’re saying is, this is what’s going to happen. That’s not how people who work in existential risk operate. We are dealing with risks, and risks are about knowing all the possible outcomes: whether any of those are this severe long term threat, an irrecoverable loss to our species.

And it doesn’t have to be the case that you think that something is the most likely or the most probable as a potential outcome for you to get really worried about the thing that could bring that about. And even a 1% risk of one of these existential catastrophes is still completely unacceptable because of the scale of the threat, and the harm we’re talking about. And because if this happens, there is no going back; It’s not something that we can do a safe experiment with.

So when you’re dealing with risk, you have to deal with probabilities. You don’t have to be convinced that climate change is going to have these effects to really place it on the same level as some of the other existential risks that people talk about — nuclear weapons, and artificial intelligence, and so on — you just need to see that this is possible. We can’t exclude it based on the knowledge that we have at the moment, but it seems like a credible threat with a real chance of materializing. And something that we can do about it, because ultimately the aim of all existential risk research is safety — trying to make the world a safer place and the future of humanity a more certain thing.

Ariel Conn: Before I get into the work that you’re doing now, I want to stick with one more question that I have about this article. I was amused when you sent me the link to it — you sort of prefaced it by saying that you think it’s rather emblematic of some of the problematic ways that we think about climate change, especially as an existential risk, and that your thinking has evolved in the last couple of years since writing this. I was hoping you could just talk a little bit about some of the problems you see with the way we’re thinking about climate change as an x-risk.

Simon Beard: I wrote this paper largely out of a realization that people wanted us to talk about climate change in the next century. And we wanted to talk about it. It’s always up there on the list of risks and threats that people bring up when you talk about existential risk. And so I thought, well, let’s get the ball rolling; Let’s review what’s out there, and the kind of predictions that people who seem to know what they’re talking about have made about this — you know, economists, climate scientists, and so on — and make this case that this suggests there is a credible threat, and we need to take this seriously. And that seemed, at the time, like a really good place to start.

But the more I thought about it afterwards, the more flawed I saw the approach as being. And it’s hard to regret a paper like that, because I’m still convinced that the risk is very real, and people need to take it seriously. But for instance, one of the things that kept on coming up is that when people make predictions about climate change as an existential risk, they’re always very vague. Why is it a risk? What’s the sort of scenarios that we worry about? Where are the danger levels? And they always want to link it to a particular temperature threshold or a particular greenhouse gas trajectory. And that just didn’t strike me as credible, that we would cross a particular temperature threshold and then that would be the end of humanity.

Because of course, a huge amount of the risk that we face depends upon how humanity responds to the changing climate, not just upon climate change. I think people have this idea in their mind that it’ll get so hot, everyone will fry or everyone will die of heat exhaustion. And that’s just not a credible scenario. So there were these really credible scholars, like Marty Weitzman and Ram Ramanathan, who tried to work this out, and have tried to predict what was going to happen. But they seemed to me to be missing a lot, and try and make very precise claims but based on very vague scenarios. So we kind of said at that point, we’re going to stop doing this until we have worked out a better way of thinking about climate change as an existential threat. And we’ve been thinking a lot about this in the intervening 18 months, and that’s where the research that you’re seeing that we’re hoping to publish soon and the desire to do this podcast really come from. So it seems to us that there are kind of three ways that people have gone about thinking about climate change as an existential risk. It’s a really hard question. We don’t really know what’s going to happen. There’s a lot of speculation involved in this.

One of the ways that people have gone about trying to respond to this has just been to speculate, just been to come up with some plausible scenario or pick a temperature number out of the air and say, “Well, that seems about right, if that were to happen that would lead to human extinction, or at least a major disruption of all of these systems that we rely upon. So what’s the risk of that happening, and then we’ll label that as the existential climate threat.” As far as we can tell, there isn’t the research to back up some of these numbers. Many of them conflict: In Ram Ramanathan’s paper he goes for five degrees; In Marty Weitzman’s paper he goes to six degrees; There’s another paper that was produced by Breakthrough where they go for four degrees. There’s kind of quite a lot of disagreement about where the danger levels lie.

And some of it’s just really bad. So there’s this prominent paper by Jem Bendell — he never got it published, but it’s been read like 150,000 times, I think — on adapting to extreme climate change. And he just picks this random scenario where the sea levels rise, a whole bunch of coastal nuclear reactors get inundated with seawater, and they go critical, and this just causes human extinction. That’s not credible in many different ways, not least that it just won’t do that much damage. But it just doesn’t seem credible that this slow sea level rise would have this disastrous meltdown effect — we could respond to that. What passes for scientific study and speculation didn’t seem good enough to us.

Then there were some papers which just kind of passed the whole thing by — say, “Well, we can’t come up with a plausible scenario or a plausible threat level, but there just seem to be a lot of bad things going on around there.” Given that we know that the climate is changing, and that we are responding to this in a variety of ways, probably quite inadequately, that doesn’t help us to prioritize efforts or really understand the level of risk we face and when maybe some more extreme measures like geoengineering become more appropriate because of the level of risk that we face.

And then there’s a final set of studies — there have been an increasing number of these; one recently came out in Vox, Anders Sandberg has done one, and Toby Ord talks about one — where people say, “Well, let’s just go for the things that we know, let’s go for the best data and the best studies.” And these usually focus on a very limited number of climate effects, the more direct impacts of things like heat exhaustion, perhaps sometimes the crop failure — but only really looking at the most direct climate impacts and only where there are existing studies. And then they try and extrapolate from that, sometimes using integrated assessment models, sometimes it’s the other kinds of analysis, but usually in quite a straightforward linear economic analysis or epidemiological analysis.

And that also is useful. I don’t want to dis these papers; I think that they provide very useful information for us. But there is no way that that can constitute an adequate risk assessment, given the complexity of the impacts that climate change is having, and the ways in which we’re responding to that. And it’s very easy for people to read these numbers and these figures and conclude, as I think the Vox article did, climate change isn’t an existential risk, it’s just going to kill a lot of people. Well, no, we know it will kill a lot of people, but that doesn’t answer the question about whether it is an existential threat. There are a lot of things that you’re not considering in this analysis. So given that there wasn’t really a good example that we could follow within the literature, we’ve kind of turned it on its head. And we’re now saying, maybe we need to work backwards.

Rather than trying to work forwards from the climate change we’re expecting and the effects that we think that is going to have and then whether these seem to constitute an existential threat, maybe we need to start from the other end and think about what are the conditions that could most plausibly destabilize the global civilization and the continued future of our species? And then work back from them to ask, are there plausible climate scenarios that could bring these about? And there’s already been some interesting work in this area for natural systems, and this kind of global Earth system thinking and the planetary boundaries framework, but there’s been very little work on this done at the social level.

And even less work done when you consider that we rely on both social and natural systems for our survival. So what we really need is some kind of approach that will integrate these two. That’s a huge research agenda. So this is how we think we’re going to proceed in trying to move beyond the limited research that we’ve got available. And now we need to go ahead and actually construct these analyses and do a lot more work in this field. And maybe we’re going to start to be able to produce a better answer.

Ariel Conn: Can you give some examples of the research that has started with this approach of working backwards?

Simon Beard: So there’s been some really interesting research coming out of the Stockholm Resilience Center dealing with natural Earth systems. So they first produced this paper on planetary boundaries, where they looked at a range of, I think it’s nine systems — the biosphere, biogeochemical systems, yes, climate system and so on — and said, are these systems operating in what we would consider their normal functioning boundaries? That’s how they’ve operated throughout the Holocene, throughout the last several thousand years, during which human civilization has developed. Or do they show signs of transitioning to a new state of abnormal operation? Or are they in a state that’s already posing high risk to the future of human civilization, but without really specifying what that risk is.

Then they produced another paper recently on Hothouse Earth, where they started to look for tipping points within the system, points where, in a sense, change becomes self-perpetuating. And rather than just a kind of gradual transition from what we’re used to, to maybe an abnormal condition, all of a sudden, a whole bunch of changes start to accelerate. So it becomes much harder to adapt to these. Their analysis is quite limited, but they argue that quite a lot of these tipping points seem to start kicking in at about one and a half to two degrees warming above pre-industrial levels.

We’re getting quite close to that now. But yeah, the real question for us at the Center for the Study of Existential Risk looking at humanity is, what are the effects of this going to be? And also what are the risks that exist within those socio-technological systems, the institutions that we set up, the way that we survive as a civilization, the way we get our food, the way we get our information, and so on, because there’s also significant fragilities and potential tipping points there as well.

That’s a very new sort of study, I mean, to the point where a lot of people just refer back to this one book written by Jared Diamond in 2005 as if it was the authoritative tome on collapse. And it’s a popular book, and he’s not an expert in this: He’s kind of a very generalist scholar, but he provides a very narrative-based analysis of the collapse of certain historical civilizations and draws out a couple of key lessons from that. But it’s all very vague and really written for a general audience. And that still kind of stands out as this is the weighty tome, this is where you go to get answers to your questions. It’s very early and we think that there’s a lot of room for better analysis of that question. And that’s something we’re looking at a lot.

Ariel Conn: Can you talk about the difference between treating climate change itself as an existential risk, like saying this is an x-risk, and studying it as if it poses such a threat? If that distinction makes sense?

Simon Beard: Yeah. When you label something as an existential risk, I think that is in many ways a very political move. And I think that that has been the predominant lens through which people have approached this question of how we should talk about climate change. People want to draw attention to it, they realize that there’s a lot of bad things that could come from it. And it seems like we could improve the quality of our future lives relatively easily by tackling climate change.

It’s not like AI safety, you know, the threats that we face from advanced artificial intelligence, where you really have to have advanced knowledge of machine learning and a lot of skills and do a lot of research to understand what’s going on here and what the real threats that we face might be. This is quite clear. So talking about it, labeling it as an existential risk has predominantly been a political act. But we are an academic institution.

I think when you ask this question about studying it as an existential threat, one of the great challenges we face is all things that are perceived as existential threats, they’re all interconnected. Human extinction, or the collapse of our civilization, or these outcomes that we worry about: these are scenarios and they will have complex causes — complex technological causes, complex natural causes. And in a sense, when you want to ask the question, should we study climate change as an existential risk? What you’re really asking is, if we look at everything that flows from climate change, will we learn something about the conditions that could precipitate the end of our civilization? 

Now, ultimately, that might come about because of some heat exhaustion or vast crop failure because of the climate change directly. It may come about because, say, climate change triggers a nuclear war. And then there’s a question of, was that a climate-based extinction or a nuclear-based extinction? Or it might come about because we develop technologies to counter climate change, and then those technologies prove to be more dangerous than we thought and pose an existential threat. So when we carve this off as an academic question, what we really want to know is, do we understand more about the conditions that would lead to existential risk, and do we understand more about how we can prevent this bad thing from happening, if we look specifically at climate change? It’s a slightly different bar. But it’s all really just this question of, is talking about climate change, or thinking about climate change, a way to move to a safer world? We think it is but we think that there’s quite a lot of complex, difficult research that is needed to really make that so. And at the moment, what we have is a lot of speculation.

Haydn Belfield: I’ve got maybe an answer to that as well. Over the last few years, lots and lots of politicians have said climate change is an existential risk, and lots of activists as well. So you get lots and lots of speeches, or rallies, or articles saying this is an existential risk. But at the same time, over the last few years, we’ve had people who study existential risk for a living, saying, “Well, we think it’s an existential risk in the same way that nuclear war is an existential risk. But it’s not maybe this single event that could kill lots and lots of people, or everyone, in kind of one fell swoop.”

So you get people saying, “Well, it’s not a direct risk on its own, because you can’t really kill absolutely everybody on earth with climate change. Maybe there’s bits of the world you can’t live in, but people move around. So it’s not an existential risk.” And I think the problem with both of these ways of viewing it is that word that I’ve been emphasizing, “an.” So I would kind of want to ban the word “an” existential risk, or “a” existential risk, and just say, does it contribute to existential risk in general?

So it’s pretty clear that climate change is going to make a bunch of the hazards that we face — like pandemics, or conflict, or environmental one-off disasters — more likely, but it will also make us more vulnerable to a whole range of hazards, and it will also increase the chances of all these types of things happening, and increase our exposure. So like with Simon, I would want to ask, is climate change going to increase the existential risk we face, and not get hung up on this question of is it “an” existential risk?

Simon Beard: The problem is, unfortunately, there is an existing terminology and existing way of talking that to some extent we’re bound up with. And this is how the debate is. So we’ve really struggled with to what extent we kind of impose the terminology that we most like on the field and the way that these things are discussed. And we know ultimately existential risk is just a thing; It’s a homogenous lump at the end of human civilization or the human species, and what we’re really looking at is the drivers of that and the things that push that up, and we want to push it down. That is not a concept that I think lots of people find easy to engage with. People do like to carve this up into particular hazards and vulnerabilities and so on.

Haydn Belfield: That’s how most of risk studies works. Most of the time when you study natural disasters, or you study accidents, in an industry setting, that’s what you’re looking at. You’re not looking at this risk as completely separate. You’re saying, “What hazards are we facing? What are our vulnerabilities? And what is our exposure?” and kind of combining all of those into having some overall assessment of the risk you face. You don’t try and silo it up into, this is bio, this is nuclear, this is AI, this is environment.

Ariel Conn: So that connects to a question that I have for you both. And that is what do you see as society’s greatest vulnerabilities today?

Haydn Belfield: Do you want to give that a go, Simon?

Simon Beard: Sure. So I really hesitate to answer any question that’s posed quite in that way, just because I don’t know what our greatest vulnerability is.

Haydn Belfield: Because you’re a very good academic, Simon.

Simon Beard: But we know some of the things that contribute to our vulnerability overall. One that really sticks in my head came out of a study we did looking at what we can learn from previous mass extinction events. And one of the things that people have found looking at the species that tend to die out in mass extinctions, and the species that survive, is this idea that the specialists — the efficient specialists — who’ve really carved out a strong biological niche for themselves, and are often the ones that are doing very well as a result of that, tend to be the species that die out, and the species that survive are the species that are generalists. But that means that within any given niche or habitat or environment, they’re always much more marginal, biologically speaking.

And then you say, “Well, what is humanity? Are we a specialist that’s very vulnerable to collapse, or are we a generalist that’s very robust and resilient to this kind of collapse that would fare very well?” And what you have to say is, as a species, when you consider humanity on its own, we seem to be the ultimate generalist, and indeed, we’re the only generalist who’s really moved beyond marginality. We thrive in every environment, every biome, and we survive in places where almost no other life form would survive. We survived on the surface of the moon — not for very long, but we did; We survived Antarctica, on the pack ice, for long periods of time. And we can survive at the bottom of the Mariana Trench, and just a ridiculously large range of habitats.

But of course, the way we’ve achieved that is that every individual is now an incredible specialist. There are very few people in the world who could really support themselves. And you can’t just sort of pick it up and go along with it. You know, like this last weekend, I went to an agricultural museum with my kids, and they were showing, you know, how you plow fields and how you gather crops and look after them. And there’s a lot of really important, quite artisanal skills about what you had to do to gather the food and protect it and prepare it and so on. And you can’t just pick this up with a book; you really have to spend a long time learning it and getting used to it and getting your body strong enough to do these things.

And so every one of us as an individual, I think, is very vulnerable, and relies upon these massive global systems that we’ve set up, these massive global institutions, to provide this support and to make us this wonderfully adaptable generalist species. So, so long as institutions and the technologies that they’ve created and the broad socio-technological systems that we’ve created — so long as they carry on thriving and operating as we want them to, then we are very, very generalist, very adaptable, very likely to make it through any kind of trouble that we might face in the next couple of centuries — with a few exceptions, a few really extreme events. 

But the flip side of that is anything that threatens those global socio-technological institutions also threatens to move us from this very resilient global population we have at the moment to an incredibly fragile one. If we fall back on individuals and our communities, all of a sudden, we are going to become the vulnerable specialist that each of us individually is. That is a potentially catastrophic outcome that people don’t think about enough.

Haydn Belfield: One of my colleagues, Luke Kemp, likes to describe this as a rungless ladder. So the idea is that there’s been lots and lots of collapses before in human history. But what normally happens is elites at the top of the society collapse, and it’s bad for them. But for everyone else, you kind of drop one rung down on the ladder, but it’s okay, you just go back to the farm, and you still know how to farm, your family’s still farming — things get a little worse, maybe, but it’s not really that bad. And you get people leaving the cities, things like that; But you only drop one rung down the ladder, you don’t fall off it. But as we’ve gone many, many more rungs up the ladder, we’ve knocked out every rung below us. And now we’re really high up the ladder. Very few of us know how to farm, how to hunt or gather, how to survive, and so on. So were we to fall off that rungless ladder, then we might come crashing down with a wallop.

Ariel Conn: I’m sort of curious. We’re talking about how humanity is generalist but we’re looking within the boundaries of the types of places we can live. And yet, we’re all very specifically, as you described, reliant on technology in order to live in these very different, diverse environments. And so I wonder if we actually are generalists? Or if we are still specialists at a societal level because of technology, if that makes sense?

Simon Beard: Absolutely. I mean, the point of this was, we kind of wanted to work out where we fell on the spectrum. And basically, it’s a spectrum that you can’t apply to humanity: We appear to fall as the most extreme species in both ends. And I think one of the reasons for that is that the scale as it would be applied to most species really only looks at the physical characteristics of the species, and how they interact directly with their environment — whereas we’ve developed all these highly emergent systems that go way beyond how we interact with the environment, that determine how we interact with one another, and how we interact with the technologies that we’ve created.

And those basically allow us to interact with the world around us in the same ways that both generalists and specialists would. That’s great in many ways: It’s really served us well as a species, it’s been part of the hallmark of our success and our ability to get this far. But it is a real threat, because it adds a whole bunch of systems that have to be operating in the way we expect them to in order for us to continue. Maybe so long as these systems function it makes us more resilient to normal environmental shocks. But it makes us vulnerable to a whole bunch of other shocks.

And then you look at the way that we actually treat these emergent socio-technological systems. And we’re constantly driving for efficiency; We’re constantly driving for growth, as quick and easy growth as we can get. And the ways that you do that are often by making the systems themselves much less resilient. Resiliency requires redundancy, requires diversity, requires flexibility, requires all of the things that either an economic planner or a market functioning on short-term economic return really hate, because they get in the way of productivity.

Haydn Belfield: Do you want to explain what resilience is?

Simon Beard: No.

Ariel Conn: Haydn, do you want to explain it?

Haydn Belfield: I’ll give it a shot, yeah. So, just since people might not be familiar with it — so what I normally think of is someone balancing. How robust they are is how much you can push that person balancing before they fall over, and then resilience is how quickly they get up and can balance again. The next time they balance, they’re even stronger than before. So that’s what we’re talking about when we’re talking about resilience, how quickly and how well you’re able to respond to those kinds of external shocks.

Ariel Conn: I want to stick with this topic of the impact of technology, because one of the arguments that I often hear about why climate change isn’t as big of an existential threat or a contributor to existential risk as some people worry is because at some point in the near future, we will develop technologies that will help us address climate change, and so we don’t need to worry about it. You guys bring this up in the paper that you’re working on as potentially a dangerous approach; I was hoping you could talk about that.

Simon Beard: I think there’s various problems with looking for the technological solutions. One of them is technologies tend to be developed for quite specific purposes. But some of the conditions that we are examining as potential civilization collapse due to climate change scenarios involve quite widespread and wide-scale systemic change to society and to the environment around us. And engineers have a great challenge even capturing and responding to one kind of change. Engineering is an art of the small; It’s a reductionist art; You break things down, and you look at the components, and you solve each of the challenges one by one.

And there are definitely visionary engineers who look at systems and look at how the parts all fit together. But even there, you have to have a model, you have to have a basic set of assumptions of how all these parts fit together and how they’re going to interact. And this is why you get things like Murphy’s Law — you know, if it can go wrong, it will go wrong — because that’s not how the real world works. The real world is constantly throwing different challenges at you, problems that you didn’t foresee, or couldn’t have foreseen because they are inconsistent with the assumption you made, all of these things. 

So it is quite a stretch to put your faith in technology being able to solve this problem, when you don’t understand exactly what the problem that you’re facing is. And you don’t necessarily at this point understand where we may cross the tipping point, the point of no return, when you really have to step up this R&D funding. Or now you know the problem that the engineers have to solve, because it’s staring you in the face: By the time that that happens, it may be too late. If you get positive feedback loops — you know, reinforcement where one bad thing leads to another bad thing, leads to another bad thing, which then contributes to the original bad thing — you need so much more energy to push the system back into a state of normality than it takes for this cycle to just keep on pushing it further and further away from where you previously were.

So that throws up significant barriers to a technological fix. The other issue, just going back to what we were saying earlier, is technology does also breed fragility. We have a set of paradigms about how technologies are developed, how they interface with the economy that we face, which is always pushing for more growth and more efficiency. It has not got a very good track record of investing in resilience, investing in redundancy, investing in fail-safes, and so on. You typically need to have strong, externally enforced incentives for that to happen.

And if you’re busy saying this isn’t really a threat, this isn’t something we need to worry about, there’s a real risk that you’re not going to achieve that. And yes, you may be able to develop new technologies that start to work. But are they actually just storing up more problems for the future? We can’t wait until the story’s ended and then know whether these technologies really did make us safer in the end or more vulnerable.

Haydn Belfield: So I think I would have an overall skepticism about technology from a kind of, “Oh, it’s going to increase our resilience.” My skepticism in this case is just more practical. So it could very well be that we do develop — so there’s these things called negative emissions technologies, which suck CO2 out of the air — we could maybe develop that. Or things that could lower the temperature of the earth: maybe we can find a way to do that without throwing the whole climate and weather system into chaos. Maybe tomorrow’s the day that we get the breakthrough with nuclear fusion. I mean, it could be that all of these things happen — it’d be great if they could. But I just wouldn’t put all my bets on it. The idea that we don’t need to prioritize climate change above all else, and make it a real central effort for societies, for companies, for governments, because we can just hope for some techno-fix to come along and save us — I just think it’s too risky, and it’s unwise. Especially because if we’re listening to the scientists, we don’t have that much longer. We’ve only got a few decades left, maybe even one decade, to really make dramatic changes. And we just won’t have invented some silver bullet within a decade’s time. Maybe technology could save us from climate change; I’d love it if it could. But we just can’t be sure about that, so we need to make other changes.

Simon Beard: That’s really interesting, Haydn, because when you list negative emissions technologies, or nuclear fusion, that’s not the sort of technology I’m talking about. I was thinking about technology as something that would basically just be used to make us more robust. Obviously, one of the things that you do if you think that climate change is an existential threat is you say, “Well, we really need to prioritize more investment into these potential technology solutions.” The belief that climate change is an existential threat is not committing you to trying to make climate change worse, or something like that.

You want to make it as small as possible, you want to reduce this impact as much as possible. That’s how you respond to climate change as an existential threat. If you don’t believe climate change is an existential threat, you would invest less in those technologies. Also, I do wanna say — and I mean, I think there’s some legitimate debate about this, but I don’t like the 12 years terminology, I don’t think we know nearly enough to support those kinds of claims. The IPCC came up with this 12 years, but it’s not really clear what they meant by it. And it’s certainly not clear where they got it from. People have been saying, “Oh, we’ve got a year to fix the climate,” or something, for as long as I can remember discussions going on about climate change.

It’s one of those things where that makes a lot of sense politically, but those claims aren’t scientifically based. We don’t know. We need to make sure that that’s not true; We need to falsify these claims, either by really looking at it, and finding out that it genuinely is safer than we thought it was or by doing the technological development and greenhouse gas reduction efforts and other climate mitigation methods to make it safe. That’s just how it works.

Ariel Conn: Do you think that we’re seeing the kind of investment in technology, you know, trying to develop any of these solutions, that we would be seeing if people were sufficiently concerned about climate change as an existential threat?

Simon Beard: So one of the things that worries me is people always judge this by looking at one thing and saying, “Are we doing enough of that thing? Are we reducing our carbon dioxide emissions fast enough? Are people changing their behaviors fast enough? Are we developing technologies fast enough? Are we ready?” Because we know so little about the nature of the risk, we have to respond to this in a portfolio manner; We have to say, “What are all the different actions and the different things that we can take that will make us safer?” And we need to do all of those. And we need to do as much as we can of all of these.

And I think there is a definite negative answer to your question when you look at it like that, because people aren’t doing enough thinking and aren’t doing enough work about how we do all the things we need to do to make us safe from climate change. People tend to get an idea of what they think a safer world would look like, and then complain that we’re not doing enough of that thing, which is very legitimate and we should be doing more of all of these things. But if you look at it as an existential risk, and you look at it from an existential safety angle, there’s just so few people who are saying, “Let’s do everything we can to protect ourselves from this risk.”

Way too many people are saying, “I’ve had a great idea, let’s do this.” That doesn’t seem to me like safety-based thinking; That seems to me like putting all your eggs in one basket and basically generating the solution to climate change that’s most likely to be fragile, that’s most likely to miss something important and not solve the real problem and store up trouble for a future date and so on. We need to do more — but that’s not just more quantitatively, it’s also more qualitatively.

Haydn Belfield: I think just clearly we’re not doing enough. We’re not cutting emissions enough, we’re not moving to renewables fast enough, we’re not even beginning to explore possible solar geoengineering responses, we don’t have anything that really works to suck carbon dioxide or other greenhouse gases out of the air. Definitely, we’re not yet taking it seriously enough as something that could be a major contributor to the end of our civilization or the end of our entire species.

Ariel Conn: I think this connects nicely to another section of some of the work you’ve been doing. And that is looking at — I think there were seven critical systems that are listed as sort of necessary for humanity and civilization.

Simon Beard: Seven levels of critical systems.

Ariel Conn: Okay.

Simon Beard: We rely on all sorts of systems for our continued functioning and survival. And a sufficiently significant failure in any of these systems could be fatal to all of our species. We can kind of classify these systems at various levels. So at the bottom, there are the physical systems — that’s basically the laws of physics. How atoms operate, how subatomic particles operate, how they interact with each other: those are pretty safe. There are some advanced physics experiments that some people have postulated may be a threat to those systems. But they all seem pretty safe.

We then kind of move up: We’ve got basic chemical systems and biochemical systems, how we generate enzymes and all the molecules that we use — proteins, lipids, and so on. Then we move up to the level of the cell; Then we move up to the level of the anatomical systems — the digestive system, the respiratory system — we need all these things. Then you look at the organism as a whole and how it operates. Then you look at how organisms interact with each other: the biosphere system, the biological system, ecological system.

And then as human beings, we’ve added this kind of seventh, even more emergent, system, which is not just how humans interact with each other, but the kind of systems that we have made to govern our interaction, and to determine how we work together with each other: political institutions, technology, the way we distribute resources around the planet, and so on. So there are a really quite amazing number of potential vulnerabilities that our species has. 

It’s many more than seven, but categorizing them on kind of the seven levels is helpful to not miss anything, because I think most people’s idea of an existential threat is something like a really big gun. Guns, we understand how they kill people, if you just had a really huge gun, and just blew a hole in everyone’s head. But that’s both missing things that are actually a lot more basic than the way that people normally die, but also a lot more sophisticated and emergent. All of these are potentially quite threatening.

Ariel Conn: So can you explain a little bit more detail how climate change affects these different levels?

Haydn Belfield: So I guess the way I’ll do it is I’ll first talk a bit about natural feedback stuff, and then talk about the social feedback loops. Everyone listening to this will be familiar with feedback loops, like methane getting released from permafrost in the Arctic, or methane coming out of clathrates in the ocean, or there’s other kinds of feedback loops. So there’s one that was discovered only recently, a very recent paper about cloud formation. So if it gets to four degrees, these models show that it becomes much harder for clouds to form. And so you don’t get much sort of radiation bouncing off those clouds and you get very rapid additional heating up to 12 degrees, is what it said.

So the first way that climate change could affect these kinds of systems that we’re talking about is it just makes it anatomically way too hot: You get all these feedbacks, and it just becomes far too hot for anyone to survive sort of anywhere on the surface. It might get much too hot in certain areas of the globe for really civilization to be able to continue there, much like it’s very hard in the center of the Sahara to have large cities or anything like that. But that seems quite unlikely that climate change would ever get that bad. The kind of stuff that we’re much more concerned about is the more general effects that climate change, climate chaos, climate breakdown might have on a bunch of other systems.

So in this paper, we’ve broken it down into three. We’ve looked at the effects of climate change on the food/water/energy system, the ecological system, and on our political system and conflict. And climate change is likely to have very negative effects on all three of those systems. It’s likely to negatively affect crop yields; It’s likely to increase freak weather events, and there’s some possibility that you might have these sort of very freak weather events — droughts, or hurricanes is also one — in areas where we produce lots of our calories, so bread baskets around the world. So climate change is going to have very negative effects most likely on our food and energy and water systems.

Then separately, there’s ecological systems. People will be very familiar with climate change driving lots of habitat loss, and therefore the loss of species; People will be very familiar with coral reefs dying and bleaching and going away. This could also have very negative effects on us, because we rely on these ecological systems to provide what we call ecological services. Ecological services are things like pollination, so if all the bees died what would we do? Ecological services also include the fish that we catch and eat, or fresh, clean drinking water. So climate change is likely to have very negative effects on that whole set of systems. And then it’s likely to have negative effects on our political system.

If there are large areas of the world that are nigh on uninhabitable, because you can’t grow food or you can’t go out at midday, or there’s no clean water available, then you’re likely to see maybe state breakdown, maybe huge numbers of people leaving — much more than we’ve ever encountered before, sort of tens or hundreds of millions of people dislocated and moving around the world. That’s likely to lead to conflict and war. So those are some ways in which climate change could have negative effects on three sets of systems that we crucially rely on as a civilization.

Ariel Conn: So in your work, you also talk about the global systems death spiral. Was that part of this?

Haydn Belfield: Yeah, that’s right. The global systems death spiral is a catchy term to describe the interaction between all these different systems. So not only would climate change have negative effects on our ecosystems, on our food and water and energy systems, the political system and conflict, but these different effects are likely to interact and make each other worse. So imagine our ecosystems are harmed by climate change: Well, that probably has an effect on food/water systems, because we rely on our ecosystems for these ecosystem services. 

So then, the bad effects on our food and water systems: Well, that probably leads to conflict. So some colleagues of ours at Anglia Ruskin University have something called a global chaos map, which is a great name for a research project, where they try and link incidences of shocks to the food system and conflict — riots or civil wars. And they’ve identified lots and lots of examples of this. Most famously, the Arab Spring, which has now become lots of conflicts, has been linked to a big spike in food prices several years ago. So there’s that link there between food and water insecurity and conflict.

And then conflict leads back into ecosystem damage. Because if you have conflict, you’ve got weak governance, you’ve got weak governments trying to protect their ecosystems, and weak government has been identified as the strongest single predictor of ecosystem loss, biodiversity loss. They all interact with one another, and make one another worse. And you could also think about things going back the other way. So for example, if you’re in a war zone, if you’ve got conflict, you’ve got failing states — that has knock-on effects on the food systems, and the water systems that we rely on: We often get famines during wartime.

And then if they don’t have enough food to eat, they don’t have water to drink, maybe that has negative effects on our ecosystems, too, because people are desperate to eat anything. So what we’re trying to point out here is that the systems aren’t independent from one another — they’re not like three different knobs that are all getting turned up independently by climate change — but that they interact with one another in a way that could cause lots of chaos and lots of negative outcomes for world society.

Simon Beard: We did this kind of pilot study looking at the ecological system and the food system and the global political system and looking at the connections of those three, really just in one direction: looking at the impact of food insecurity on conflict, and conflict and political instability on the biosphere, and loss of biosphere on integrity of the food system. But that was largely determined by the fact that these were three connections that we either had looked at directly, or had close colleagues who had looked at, so we had quite good access to the resources.

As Haydn said, everything kind of also works in the other direction, most likely. And also, there are many, many more global systems that interact in different ways. Another trio that we’re very interested in looking at in the future is the connection between the biosphere and the political system, but this time, also, with some of the health systems, the emergence of new diseases, the ability to respond to public health emergencies, and especially when these things are looked at in kind of a One Health perspective, where plant health and animal health and human health are all actually very closely interacting with one another.

And then you kind of see this pattern where, yes, we could survive six degrees plus, and we could survive famine, and we could survive x, y, and z. But once these things start interacting, it just drives you to a situation where really everything that we take for granted at the moment up to and including the survival of the species — they’re all on the table, they’re all up for grabs once you start to get this destructive cycle between changes in the environment and changes in how human society interacts with the environment. It’s a very dangerous, potentially very self-perpetuating feedback loop, and that’s why we refer to it as a global systems death spiral: because we really can’t predict at this point in time where it will end. But it looks very, very bleak, and very, very hard to see how once you enter into this situation, you could then kind of dial it back and return to a safe operating environment for humanity and the systems that we rely on.

There’s definitely a new stable state at the end of this spiral. So when you get feedback loops between systems, it’s not that they will just carry on amplifying change forever; they’re moving towards another kind of stable state, but you don’t know how long it’s going to take to get there, you don’t know what that steady state will be. So for the simulation with the death of clouds, this idea that purely physical feedback between rising global temperatures, changes in the water cycle, and cloud cover, then you end up with a world that’s much, much hotter and much more arid than the one we have at the moment, which could be a very dangerous state. For sort of perpetual human survival, we would need a completely different way of feeding ourselves and really interacting with the environment.

You don’t know what sort of death traps or kill mechanisms lie along that path of change; you don’t know if there is, for instance, somewhere here, it’s going to trigger a nuclear war, or it’s going to trigger attempts to geoengineer the climate in a sort of bid to gain safety, but actually these turn out to have catastrophic consequences, or all the others that are unknown unknowns we want to make turn into known unknowns, and then turn into things that we can actually begin to understand and study. So in terms of not knowing where the bottom is, that’s potentially limitless as far as humanity is concerned. We know that it will have an end. Worst case scenario, that end is a very arid climate with a much less complex, much simpler atmosphere, which would basically need to be terraformed back into a livable environment in the way that we’re currently thinking maybe we could do that for Mars. But to get a global effort to do that, in an already sort of disintegrating Earth, I think would be an extremely tall order. There’s a huge range of different threats and different potential opportunities for an existential catastrophe to unravel within this kind of death spiral. And we think this really is a very credible threat.

Ariel Conn: How do we deal with all this uncertainty?

Haydn Belfield: More research needed, is the classic academic response to any time you ask that question. More research.

Simon Beard: That’s definitely the case, but there are also big questions about the kind of research. So mostly scientists want to study things that they already kind of understand: where you already have well established techniques, you have journals that people can publish their research in, you have an extensive peer review community, you can say, yes, you have done this study by the book, you get to publish it. That’s what all the incentives are aligned towards. 

And that sort of research is very important and very valuable, and I don’t want to say that we need less of that kind of research. But that kind of research is not going to deal with the sort of radical uncertainty that we’re talking about here. So we do need more creative science, we need science that is willing to engage in speculation, but to do so in an open and rigorous way. One of the things is you need scientists who are willing to come on the stand and say, “Look, here’s a hypothesis. I think it’s probably wrong, and I don’t yet know how to test it. But I want people to come out and help me find a way to test this hypothesis and falsify it.” 

There aren’t any scientific incentive structures at the moment that encourage that. That is not a way to get tenure, and it’s not a way to get a professorship or chair, or to get your paper published. That is a really stupid strategy to take if you want to be a successful scientist. So what we need to do is we need to create a safe sandbox for people who are concerned about this — and we know from our engagement that there are a lot of people who would really like to study this and really like to understand it better — for them to do that. So one of the big things that we’re really looking at here in CSER is how do we make the tools to make the tools that will then allow us to study this. How do we provide the methodological insights or the new perspectives that are needed to move towards establishing a science of social collapse or environmental collapse that we can actually use to then answer some of these questions.

So there are several things that we’re working on at the moment. One important thing, which I think is a very crucial step for dealing with the sort of radical uncertainty we face, is this classification. We’ve already talked about classifying different levels of critical system. That’s one part of a larger classification scheme that CSER has been developing to just look at all the different components of risk and say, “Well, there’s this and this and this.” Once you start to sort of engage in that exercise and look at what are all the systems that might be vulnerable? What are all the possible vulnerabilities that exist within those systems? What are all the ways in which humanity has exposed these vulnerabilities that they could harness if things go wrong? And you map that out; you haven’t got to the truth, but you’ve moved a lot of things in the unknown category into the, “Okay, I now know all the ways that things could go wrong, and I know that I haven’t a clue how any of these things could happen.” Then you need to say, “Well, what are the techniques that seem appropriate?”

So we think the planetary boundaries framework, although it doesn’t answer the question that we’re interested in, offers a really nice approach to looking at this question about where tipping points arise, where systems move out of their ordinary operation. We want to apply that in new environments, we want to find new ways of using that. And there are other tools as well that we can take, for instance, from disaster studies and risk management studies, looking at things like fault tree analysis where you say, “What are all the things that might go wrong with this? And what are the levers that we currently have or the interventions that we could make to stop this from happening?”

We also think that there’s a lot more room for people to share their knowledge and their thoughts and their fears and expectations through what we call structured expert elicitations, where you get people who have very different knowledge together, and you find a way that they can all talk to each other and they can all learn from each other. And often you get answers out of these sorts of exercises that are very different to what any individual might put in at the beginning, but they represent a much more sort of complete, much more creative structure. And you can get those published because it’s a recognized scientific method, so structured expert elicitations on climate change got published in Nature last month. Which is great, because it’s a really under-researched topic. But I think one of the things that really helped there was that they were using an established method.

What I really hope that CSER’s work going forward is going to achieve is just to make this space that we can actually work with many more of the people who we need to work with to answer these questions and understand the nature of this risk and pull them all together and make the social structures so that the kind of research that we really badly need at this point can actually start to emerge.

Ariel Conn: A lot of what you’re talking about doesn’t sound like something that we can do in the short term; it sounds like it will take at least a decade, if not more, to get some of this research accomplished. So in the interest of speed — which is one of the uncertainties we have, we don’t seem to have a good grasp of how much time we have before the climate could get really bad — what do we do in the short term? What do we do for the next decade? What do non-academics do?

Haydn Belfield: The thing is, it’s kind of two separate questions, right? We certainly know all we need to know to take really drastic, serious action on climate change. What we’re asking is a slightly more specific question, which is how can climate change, climate breakdown, climate chaos contribute to existential risk. So we already know with very high certainty that climate change is going to be terrible for billions of people in the world, that it’s going to make people’s lives harder, it’s going to make getting out of extreme poverty much harder.

And we also know that the people who have contributed the least to the problem are going to be the ones that are screwed the worst by climate change. And it’s just so unfair, and so wrong, that I think we know enough now to take serious action on climate change. And not only is it wrong, it’s not in the interest of rich countries to live in this world of chaos, of worse weather events, and so on. So I think we already know enough, we have enough certainty on those questions to act very seriously, to reduce our emissions very quickly, to invest in as much clean technology as we can, and to collaborate collectively around the world to make those changes. And what we’re saying though, is about the different, more unusual question of how it contributes to existential risk more specifically. So I think I would just make that distinction pretty clear. 

Simon Beard: So there’s a direct answer to your question and an indirect answer to your question. Direct answer to your question is all the things you know you should be doing. Fly less, preferably not at all; eat less meat, preferably not at all, and preferably not dairy, either. Every time there’s an election, vote, but also ask all the candidates — all the candidates, don’t just go for the ones who you think will give you the answer you like — “I’m thinking of voting for you. What are you going to do about climate change?”

There are a lot of people all over the political spectrum who care about climate change. Yeah, there are political slumps in who cares more, and so on. But every political candidate has votes that they could pick up if they did more on climate change, irrespective of their political persuasion. And even if you have a political conviction, so that you’re always going to vote the same way, you can nudge candidates to get those votes and to do more on climate change by just asking that simple question: “I’m thinking of voting for you. What are you going to do about climate change?” That’s a really low buy, it’s good for election; if they get 100 letters, all saying that, and they’re all personal letters, and not just some mass campaign, it really does change the way that people think about the problems that they face. But I also want to challenge you a bit on this, “This is going to take decades,” because it depends — depends how we approach it.

Ariel Conn: So one example of research that can happen quickly and action that can occur quickly is this example that you give early on in the work that you’re doing, comparing the need to study climate change as a contributor to existential risk to the work that was done in the 80s, looking at how nuclear weapons can create a nuclear winter, and how that connects to an existential risk. And so I was hoping you could also talk a little bit about that comparison.

Simon Beard: Yeah, so I think this is really important and I know a lot of the things that we’re talking about here, about critical global systems and how they interact with each other and so on — it’s long winded, and it’s technical, and it can sound a bit boring. But this was, for me, a really big inspiration as for why we’re trying to look at it in this way. So when people started to explode nuclear weapons in the Manhattan Project in the early 1940s, right from the beginning, they were concerned about the kind of threats, or the kind of risks that these posed, and firstly thought, well, maybe it would set light to the upper atmosphere. And there were big worries about the radiation. And then, for a time, there were worries just about the explosive capacity. 

This was enough to raise a kind of general sense of alarm and threat. But none of these were really credible. They didn’t last; they didn’t withstand scientific scrutiny for very long. And then Carl Sagan and some colleagues did this research in the early 1980s on modeling the climate impacts of nuclear weapons, which is not a really intuitive thing to do, right? When you’ve got the most explosive weapon ever envisaged, and it has all this nuclear fallout and so on, and you think, what’s this going to do to the global climate, that doesn’t seem like that’s going to be where the problems lie.

But they discover when they look at that, that no, it’s a big thing. If you have nuclear strikes on cities, it sends a lot of ash into the upper atmosphere. And it’s very similar to what happens if you have a very large asteroid, or a very large set of volcanoes going off; the kind of changes that you see in the upper atmosphere are very similar, and you get this dramatic global cooling. And this then threatens — as a lot of mass extinctions have — threatens the underlying food source. And that’s how humans starve. And this comes out in 1983, this is kind of 40 years after people started talking about nuclear risk. And it changes the game, because all of a sudden, in looking at this rather unusual topic, they find a really credible way in which nuclear winter leads to everyone dying.

The research is still much discussed, and what kind of nuclear warhead, what kind of nuclear explosions, and how many and would they need to hit cities, or would they need to hit areas with particularly large sulphur deposits, or all of these things — these are still being discussed. But all of a sudden, the top leaders, the geopolitical leaders start to take this threat seriously. And we know Reagan was very interested and explored this a lot, the Russians even more so. And it really does seem to have kick-started a lot of nuclear disarmament debate and discussion and real action.

And what we’re trying to do in reframing the way that people research climate change as an existential threat is to look for something like that: What’s a credible way in which this really does lead to an existential catastrophe for humanity? Because that hasn’t been done yet. We don’t have that. We feel like we have it because everyone knows the threat and the risk. But really, we’re just at this area of kind of vague speculation. There’s a lot of room for people to step up with this kind of research. And the historical evidence suggests that this can make a real difference.

Haydn Belfield: We tend to think of existential risks as one-off threats — some big explosion, or some big thing, like an individual asteroid that hits an individual species of dinosaurs and then kills it, right — we tend to think of existential risks as one singular event. But really, that’s not how most mass extinctions happen. That’s not how civilizational collapses have tended to happen over history. The way that all of these things have actually happened, when you go back to look at archeological evidence or you go back to look at the fossil evidence, is that there’s a whole range of different things — different hazards and different internal capabilities of these systems, whether they’re species or societies — and they get overcome by a range of different things. 

So, often in archeological history — in the Pueblo Southwest, for example — there’ll be one set of climatic conditions, and one external shock that faces the community, and they react fine to it. But then, in a few different years, the same community is faced by some similar threats, but reacts completely differently and collapses completely. It’s not that there’s one singular, overwhelming event from outside; it’s that you have to look at all the different systems that this one particular society or whatever relies on. And you have to look at when all of those things overcome the overall resilience of a system.

Or looking at species, like what happens when sometimes a species can recover from an external shock, and sometimes there’s just too many things, and the conditions aren’t right, and they get overcome, and they go extinct. That’s where looking at existential risk, and looking at the study of how we might collapse or how we might go extinct — that’s where the field needs to go: It needs to go into looking at what are all the different hazards we face, how do they interact with the vulnerabilities that we have, and the internal dynamics of our systems that we rely on, and the different resilience of those systems, and how are we exposed to those hazards in different ways, and having a much more sophisticated, complicated, messy look at how they all interact. I think that’s the way that existential risk research needs to go.

Simon Beard: I agree. I think that fits in with various things we said earlier.

Ariel Conn: So then my final question for both of you is — I mean, you’re not even just looking at climate change as an existential threat; I know you look at lots of things and how they contribute to existential threats — but looking at climate change, what gives you hope?

Simon Beard: At a psychological level, hope and fear aren’t actually big day-to-day parts of my life. Because working in existential risk, you have this amazing privilege that you’re doing something, you’re working to make that difference between human extinction and civilization collapse and human survival and flourishing. It’s a waste to have that opportunity and to get too emotional about it. It’s a waste firstly because it is the most fascinating problem. It is intellectually stimulating; it is diverse; it allows you to engage with and talk to the best people, both in terms of intelligence and creativity, but also in terms of drive and passion, and activism and ability to get things done.

But also because it’s a necessary task: We have to get on with it, we have to do this. So I don’t know if I have hope. But that doesn’t mean that I’m scared or anxious, I just have a strong sense of what I have to do. I have to do what I can to contribute, to make a difference, to maximize my impact. That’s a series of problems and we have to solve those problems. If there’s one overriding emotion that I have in relation to my work, and what I do, and what gets me out of bed, it’s curiosity — which is, I think, at the end of the day, one of the most motivating emotions that exists. People often say to me, “What’s the thing I should be most worried about: nuclear war, or artificial intelligence or climate change? Like, tell me, what should I be most worried about?” You shouldn’t worry about any of those things. Because worry is a very disabling emotion.

People who worry stay in bed. I haven’t got time to do that. I had heart surgery about 18 months ago, a big heart bypass operation. And they warned me before that, after this surgery, you’re going to feel emotional, it happens to everyone. It’s basically a near death experience. You have to be cooled down to a state that you can’t recover on your own; they have to heat you up. Your body kind of remembers these things. And I do remember a couple of nights after getting home from that. And I just burst into floods of tears thinking about this kind of existential collapse, and, you know, what it would mean for my kids and how we’d survive it, and it was completely overwhelming. As overwhelming as you’d expect it to be for someone who has to think about that.

But this isn’t how we engage with it. This isn’t science fiction stories that we’re telling ourselves to feel scared or feel a rush. This is a real problem. And we’re here to solve that problem. I’ve been very moved the last month or so by all the stuff about the Apollo landing missions. And it’s reminded me, sort of a big inspiration of my life, one of these bizarre inspirations of my life, was getting Microsoft Encarta 95, which was kind of my first all-purpose knowledge source. And when you loaded it up — because it was the first one on CD ROM — they had these sound clips and they included that bit of JFK’s speech about we choose to go to the moon, not because it’s easy, but because it’s hard. And that has been a really inspiring quote for me. And I think I’ve often chosen to do things because they’re hard. 

And it’s been kind of upsetting — this is the first time this kind of moon landing anniversary’s come up — and I realized no, he was being completely literal. Like the reason that they chose to go to the moon was it was so hard that the Russians couldn’t do it. So they were confident that they were going to win the race. And that was all that mattered. But for me, I think in this case, we’re choosing to do this research and to do this work, not because it’s hard, but because it’s easy. Because understanding climate change, being curious about it, working out new ways to adapt, and to mitigate, and to manage the risk, is so much easier than living with the negative consequences of it. This is the best deal on the table at the moment. This is the way that we maximize the benefit for minimizing the cost.

This is not the great big structural change that completely messes up our entire society, and reduces us to some kind of Greek primitivism. That’s what happens if climate change kicks in. That’s when we start to see people reduced to subsistence level, agricultural, whatever it is. Understanding the risk and responding to it: this is the way that we keep all the good things that our civilization has given us. This is the way that we keep international travel, that we keep our technology, that we keep our food and getting nice things from all around the world. 

And yes, it does require some sacrifices. But these are really small change in the scale of things. And once we start to make them we will find ways of working around it. We are very creative, we are very adaptable, we can adapt to the changes that we need to make to mitigate climate change. And we’ll be good at that. And I just wish that anyone listening to this podcast had that mindset, didn’t think about fear or about blame, or shame or anger — that they thought about curiosity, and they thought about what can I do, and how good this is going to be, how bright and open our future is, and how much we can achieve as a species.

If we can just get over these hurdles, these mistakes that we made years ago, for various reasons — often a small number of people in the land, you know, that’s what determined that we have petrol cars rather than battery cars — and we can undo them; it’s in our power, it’s in our gift. We are the species that can determine our own fate; we get to choose. And that’s why we’re doing this research. And I think if lots of people — especially if lots of people who are well educated, maybe scientists, maybe people who are thinking about a career in science — view this problem in that light, as what can I do? What’s the difference I can make? We’re powerful. It’s a much less difficult problem to solve and a much better ultimate payoff that we’ll get than if we try and solve this any other way, especially if we don’t do anything.

Ariel Conn: That was wonderful.

Simon Beard: Yeah, I’m ready to storm the barricade.

Ariel Conn: All right, Haydn, try to top that.

Haydn Belfield: No way. That’s great. I think Simon said all that needs to be said on that.

Ariel Conn: All right. Well, thank you both for joining us today.

Simon Beard: Thank you. It’s been a pleasure.

Haydn Belfield: Yeah, absolute pleasure.

AI Alignment Podcast: On the Governance of AI with Jade Leung

In this podcast, Lucas spoke with Jade Leung from the Center for the Governance of AI (GovAI). GovAI strives to help humanity capture the benefits and mitigate the risks of artificial intelligence. The center focuses on the political challenges arising from transformative AI, and they seek to guide the development of such technology for the common good by researching issues in AI governance and advising decision makers. Jade is Head of Research and Partnerships at GovAI, where her research focuses on modeling the politics of strategic general purpose technologies, with the intention of understanding which dynamics seed cooperation and conflict.

Topics discussed in this episode include:

  • The landscape of AI governance
  • GovAI’s research agenda and priorities
  • Aligning government and companies with ideal governance and the common good
  • Norms and efforts in the AI alignment community in this space
  • Technical AI alignment vs. AI Governance vs. malicious use cases
  • Lethal autonomous weapons
  • Where we are in terms of our efforts and what further work is needed in this space

You can take a short (3 minute) survey to share your feedback about the podcast here.

Important timestamps: 

0:00 Introduction and updates

2:07 What is AI governance?

11:35 Specific work that Jade and the GovAI team are working on

17:21 Windfall clause

21:20 Policy advocacy and AI alignment community norms and efforts

27:22 Moving away from short-term vs long-term framing to a stakes framing

30:44 How do we come to ideal governance?

40:22 How can we contribute to ideal governance through influencing companies and government?

48:12 US and China on AI

51:18 What more can we be doing to positively impact AI governance?

56:46 What is more worrisome, malicious use cases of AI or technical AI alignment?

01:01:19 What is more important/difficult, AI governance or technical AI alignment?

01:03:49 Lethal autonomous weapons

01:09:49 Thinking through tech companies in this space and what we should do

Two key points from Jade: 

“I think one way in which we need to rebalance a little bit, as kind of an example of this is, I’m aware that a lot of the work, at least that I see in this space, is sort of focused on very aligned organizations and non-government organizations. So we’re looking at private labs that are working on developing AGI. And they’re more nimble. They have more familiar people in them, we think more similarly to those kinds of people. And so I think there’s an attraction. There’s really good rational reasons to engage with the folks because they’re the ones who are developing this technology and they’re plausibly the ones who are going to develop something advanced.

“But there’s also, I think, somewhat biased reasons why we engage, is because they’re not as messy, or they’re more familiar, or we see more value aligned. And I think this early in the field, putting all our eggs in a couple of very, very limited baskets, is plausibly not that great a strategy. That being said, I’m actually not entirely sure what I’m advocating for. I’m not sure that I want people to go and engage with all of the UN conversations on this because there’s a lot of noise and very little signal. So I think it’s a tricky one to navigate, for sure. But I’ve just been reflecting on it lately, that I think we sort of need to be a bit conscious about not group thinking ourselves into thinking we’re sort of covering all the bases that we need to cover.”

“I think one thing I’d like for people to be thinking about… this short term v. long term bifurcation. And I think a fair number of people are. And the framing that I’ve tried on a little bit is more thinking about it in terms of stakes. So how high are the stakes for a particular application area, or a particular sort of manifestation of a risk or a concern.

“And I think in terms of thinking about it in the stakes sense, as opposed to the timeline sense, helps me at least try to identify things that we currently call or label near term concerns, and try to filter the ones that are worth engaging in versus the ones that maybe we just don’t need to engage in at all. An example here is that basically I am trying to identify near term/existing concerns that I think could scale in stakes as AI becomes more advanced. And if those exist, then there’s really good reason to engage in them for several reasons, right? … Plausibly, another one would be privacy as well, because I think privacy is currently a very salient concern. But also, privacy is an example of one of the fundamental values that we are at risk of eroding if we continue to deploy technologies for other reasons: efficiency gains, or for increasing control and centralizing of power. And privacy is this small microcosm of a maybe larger concern about how we could possibly be chipping away at these very fundamental things which we would want to preserve in the longer run, but we’re at risk of not preserving because we continue to operate in this dynamic of innovation and performance for whatever cost. Those are examples of conversations where I find it plausible that there are existing conversations that we should be more engaged in just because those are actually going to matter for the things that we call long term concerns, or the things that I would call sort of high stakes concerns.”

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. Key works mentioned in this podcast can be found here.

Lucas: Hey, everyone. Welcome back to the AI Alignment Podcast. I’m Lucas Perry. And today, we will be speaking with Jade Leung from the Center for the Governance of AI, housed at the Future of Humanity Institute. Their work strives to help humanity capture the benefits and mitigate the risks of artificial intelligence. They focus on the political challenges arising from transformative AI, and seek to guide the development of such technology for the common good by researching issues in AI governance and advising decision makers. Jade is Head of Research and Partnerships at GovAI, and her research focuses on modeling the politics of strategic general purpose technologies, with the intention of understanding which dynamics seed cooperation and conflict.

In this episode, we discuss GovAI’s research agenda and priorities, the landscape of AI governance, how we might arrive at ideal governance, the dynamics and roles of both companies and states within this space, how we might be able to better align private companies with what we take to be ideal governance. We get into the relative importance of technical AI alignment and governance efforts on our path to AGI, we touch on lethal autonomous weapons, and also discuss where we are in terms of our efforts in this broad space, and what work we might like to see more of.

As a general bit of announcement, I found all the feedback coming in through the SurveyMonkey poll to be greatly helpful. I’ve read through all of your comments and thoughts, and am working on incorporating feedback where I can. So for the meanwhile, I’m going to leave the survey up, and you’ll be able to find a link to it in a description of wherever you might find this podcast. Your feedback really helps and is appreciated. And, as always, if you find this podcast interesting or useful, consider sharing with others who might find it valuable as well. And so, without further ado, let’s jump into our conversation with Jade Leung.

So let’s go ahead and start by providing a little bit of framing on what AI governance is, the conceptual landscape that surrounds it. What is AI governance, and how do you view and think about this space?

Jade: I think the way that I tend to think about AI governance is with respect to how it relates to the technical field of AI safety. In both fields, the broad goal is how humanity can best navigate our transition towards a world with advanced AI systems in it. The technical AI safety agenda and the kind of research that’s being done there is primarily focused on how do we build these systems safely and well. And the way that I think about AI governance with respect to that is broadly everything else that’s not that. So that includes things like the social, political, economic context that surrounds the way in which this technology is developed and built and used and employed.

And specifically, I think with AI governance, we focus on a couple of different elements of it. One big element is the governance piece. So what are the kinds of norms and institutions we want around a world with advanced AI serving the common good of humanity. And then we also focus a lot on the kind of strategic political impacts and effects and consequences of the route on the way to a world like that. So what are the kinds of risks, social, political, economic? And what are the kinds of impacts and effects that us developing it in sort of sub-optimal ways could have on the various things that we care about.

Lucas: Right. And so just to throw out some other cornerstones here, because I think there’s many different ways of breaking up this field and thinking about it, and this sort of touches on some of the things that you mentioned. There’s the political angle, the economic angle. There’s the military. There’s the governance and the ethical dimensions.

Here on the AI Alignment Podcast, we’ve previously been breaking down the taxonomy sort of into the technical AI alignment research, which is getting machine systems to be aligned with human values and desires and goals, and then the sort of AI governance, the strategy, the law stuff, and then the ethical dimension. Do you have any preferred view or way of breaking this all down? Or is it all just about as good to you?

Jade: Yeah. I mean, there are a number of different ways of breaking it down. And I think people also mean different things when they say strategy and governance and whatnot. I’m not particularly excited about getting into definitional debates. But maybe one way of thinking about what this word governance means is, at least I often think of governance as the norms, and the processes, and the institutions that are going to, and already do, shape the development and deployment of AI. So I think a couple of things that are worth underlining in that, I think there’s … The word governance isn’t just specifically government and regulations. I think that’s a specific kind of broadening of the term, which is worth pointing out because that’s a common misconception, I think, when people use the word governance.

So when I say governance, I mean government and regulation, for sure. But I also mean what are other actors doing that aren’t government? So labs, researchers, developers, NGOs, journalists, et cetera, and also other mechanisms that aren’t regulation. So it could be things like reputation, financial flows, talent flows, public perception, what’s within and outside the Overton window, et cetera. So there’s a number of different levers I think you can pull if you’re thinking about governance.

It’s probably worth also pointing out, I think, when people say governance, a lot of the time people are talking about the normative side of things, so what should it look like, and how could it be if it were good? A lot of governance research, at least in this space now, is very much descriptive. So it’s kind of like what’s actually happening, and trying to understand the landscape of risk, the landscape of existing norms that we have to work with, what’s a tractable way forward with existing actors? How do you model existing actors in the first place? So a fair amount of the research is very descriptive, and I would qualify that as AI governance research, for sure.

Breaking it down according to the research agenda that we put out is another option. So that kind of breaks it down into firstly understanding the technological trajectory, so that’s understanding where this technology is likely to go, what are the technical inputs and constraints, and particularly the ones that have implications for governance outcomes. This looks like things like modeling AI progress and mapping capabilities, and involves a fair amount of technical work.

And then you’ve got the politics cluster, which is probably where a fair amount of the work is at the moment. This is looking at political dynamics between powerful actors. So, for example, my work is focusing on big firms and government and how they relate to each other, but also includes how AI transforms and impacts political systems, both domestically and internationally. This includes the cluster around international security and the race dynamics that fall into that. And then also international trade, which is a thing that we don’t talk about a huge amount, but politics also includes this big dimension of economics in it.

And then the last cluster is this governance cluster, which is probably the most normative end of what we would want to be working on in this space. This is looking at things like what are the ideal institutions, infrastructure, norms, mechanisms that we can put in place now/in the future that we should be aiming towards that can steer us in robustly good directions. And this also includes understanding what shapes the way that these governance systems are developed. So, for example, what roles does the public have to play in this? What role do researchers have to play in this? And what can we learn from the way that we’ve governed previous technologies in similar domains, or with similar challenges, and how have we done on the governance front on those bits as well. So that’s another way of breaking it down, but I’ve heard more than a couple of ways of breaking this space down.

Lucas: Yeah, yeah. And all of them are sort of valid in their own ways, and so we don’t have to spend too much time on this here. Now, a lot of these things that you’ve mentioned are quite macroscopic effects in the society and the world, like norms and values and developing a concept of ideal governance and understanding actors and incentives and corporations and institutions and governments. Largely, I find myself having trouble developing strong intuitions about how to think about how to impact these things because it’s so big it’s almost like the question of, “Okay, let’s figure out how to model all of human civilization.” At least all of the things that matter a lot for the development and deployment of technology.

And then let’s also think about ideal governance, like what is also the best of all possible worlds, based off of our current values, that we would like to use our model of human civilization to bring us closer towards? So being in this field, and exploring all of these research threads, how do you view making progress here?

Jade: I can hear the confusion in your voice, and I very much resonate with it. We’re sort of consistently confused, I think, at this place. And it is a very big, both set of questions, and a big space to kind of wrap one’s head around. I want to emphasize that this space is very new, and people working in this space are very few, at least with respect to AI safety, for example, which is still a very small section that feels as though it’s growing, which is a good thing. We are at least a couple of years behind, both in terms of size, but also in terms of sophistication of thought and sophistication of understanding what are more concrete/sort of decision relevant ways in which we can progress this research. So we’re working hard, but it’s a fair ways off.

One way in which I think about it is to think about it in terms of what actors are making decisions now/in the near to medium future, that are the decisions that you want to influence. And then you sort of work backwards from that. I think at least, for me, when I think about how we do our research at the Center for the Governance of AI, for example, when I think about what is valuable for us to research and what’s valuable to invest in, I want to be able to tell a story of how I expect this research to influence a decision, or a set of decisions, or a decision maker’s priorities or strategies or whatever.

Ways of breaking that down a little bit further would be to say, you know, who are the actors that we actually care about? One relatively crude bifurcation is focusing on those who are in charge of developing and deploying these technologies, firms, labs, researchers, et cetera, and then those who are in charge of sort of shaping the environment in which this technology is deployed, and used, and is incentivized to progress. So that’s folks who shape the legislative environment, folks who shape the market environment, folks who shape the research culture environment, and expectations and whatnot.

And with those two sets of decision makers, you can then boil it down into what are the particular decisions they are in charge of making that you can decide you want to influence, or try to influence, by providing them with research insights or doing research that will, in some downstream way, affect the way they think about how these decisions should be made. And a very, very concrete example would be to pick, say, a particular firm. And they have a set of priorities, or a set of things that they care about achieving within the lifespan of that firm. And they have a set of strategies and tactics that they intend to use to execute on that set of priorities. So you can either focus on trying to shift their priorities towards better directions if you think they’re off, or you can try to point out ways in which their strategies could be done slightly better, e.g. they should be coordinating more with other actors, or they should be thinking harder about openness in their research norms. Et cetera, et cetera.

Well, you can kind of boil it down to the actor level and the decision specific level, and get some sense of what it actually means for progress to happen, and for you to have some kind of impact with this research. One caveat with this is that I think if one takes this lens on what research is worth doing, you’ll end up missing a lot of valuable research being done. So a lot of the work that we do currently, as I said before, is very much understanding what’s going on in the first place. What are the actual inputs into the AI production function that matter and are constrained and are bottlenecked? Where are they currently controlled? A number of other things are mostly just descriptive, and I can’t tell you which decision I’m going to influence by understanding them. But having a better baseline will inform better work across a number of different areas. I’d say that this particular lens is one way of thinking about progress. There’s a number of other things that it wouldn’t measure, that are still worth doing in this space.

Lucas: So it does seem like we gain a fair amount of tractability by just thinking, at least short term, who are the key actors, and how might we be able to guide them in a direction which seems better. I think here it would also be helpful if you could let us know, what is the actual research that you, and say, Allan Dafoe engage in on a day to day basis. So there’s analyzing historical cases. I know that you guys have done work with specifying your research agenda. You have done surveys of American attitudes and trends on opinions on AI. Jeffrey Ding has also released a paper, Deciphering China’s AI Dream, which tries to understand China’s AI strategy. You’ve also released work on the malicious use cases of artificial intelligence. So, I mean, what is it like being Jade on a day to day trying to conquer this problem?

Jade: The specific work that I’ve spent most of my research time on to date sort of falls into the politics/governance cluster. And basically, the work that I do is centered on the assumption that there are things that we can learn from a history of trying to govern strategic general purpose technologies well. And if you look at AI, you believe that it has certain properties that make it strategic, strategic here in the sense that it’s important for things like national security and economic leadership of nations and whatnot. And it’s also a general purpose technology, in that it has the potential to do what GPTs do, which is to sort of change the nature of economic production, push forward a number of different frontiers simultaneously, enable consistent cumulative progress, change the course of organizational functions like transportation, communication, et cetera.

So if you think that AI looks like a strategic general purpose technology, then the claim is something like, in history we’ve seen a set of technologies that plausibly have the same traits. So the ones that I focus on are biotechnology, cryptography, and aerospace technology. And the question that sort of kicked off this research is, how have we dealt with the very fraught competition that we currently see in the space of AI when we’ve competed across these technologies in the past. And the reason why there’s a focus on competition here is because, I think one important thing that characterizes a lot of the reasons why we’ve got a fair number of risks in the AI space is because we are competing over it. “We” here being very powerful nations, very powerful firms, and the reason why competition is an important thing to highlight is that it exacerbates a number of risks and it causes a number of risks.

So when you’re in a competitive environment, actors are normally incentivized to take larger risks than they otherwise would rationally do. They are largely incentivized to not engage in the kind of thinking that is required to think about public goods governance and serving the common benefit of humanity. And they’re more likely to engage in thinking that is more about serving parochial, sort of private, interests.

Competition is bad for a number of reasons. Or it could be bad for a number of reasons. And so the question I’m asking is, how have we competed in the past? And what have been the outcomes of those competitions? Long story short, so the research that I do is basically I dissect these cases of technology development, specifically in the US. And I analyze the kinds of conflicts, and the kinds of cooperation that have existed between the US government and the firms that were leading technology development, and also the researcher communities that were driving these technologies forward.

Other pieces of research that are going on, we have a fair number of our researchers working on understanding what are the important inputs into AI that are actually progressing us forward. How important is compute relative to algorithmic structures, for example? How important is talent, with respect to other inputs? And then the reason why that’s important to analyze and useful to think about is understanding who controls these inputs, and how they’re likely to progress in terms of future trends. So that’s an example of the technology forecasting work.

In the politics work, we have a pretty big chunk on looking at the relationship between governments and firms. So this is a big piece of work that I’ve been doing, along with a fair amount of others, understanding, for example, if the US government wanted to control AI R&D, what are the various levers that they have available, that they could use to do things like seize patents, or control research publications, or exercise things like export controls, or investment constraints, or whatnot. And the reason why we focus on that is because my hypothesis is that ultimately, ultimately you’re going to start to see states get much more involved. At the moment, you’re in this period of time which a lot of people describe as very private sector driven, with governments behind. But I think, and history would also suggest, that the state is going to be involved much more significantly very soon. So understanding what they could do, and what their motivations are, is important.

And then, lastly, on the governance piece, a big chunk of our work here is specifically on public opinions. So you’ve mentioned this before. But basically, a big substantial chunk of our work, consistently, is just understanding what the public thinks about various issues to do with AI. So recently, we published a report of the recent set of surveys that we did surveying the American public. And we asked them a variety of different questions and got some very interesting answers.

So we asked them questions like: What risks do you think are most important? Which institution do you trust the most to do things with respect to AI governance and development? How important do you think certain types of governance challenges are for American people? Et cetera. And the reason why this is important for the governance piece is because governance ultimately needs to have sort of public legitimacy. And so the idea was that understanding how the American public thinks about certain issues can at least help to shape some of the conversation around where we should be headed in governance work.

Lucas: So there’s also been work here, for example, on capabilities forecasting. And I think Allan and Nick Bostrom also come at these from slightly different angles sometimes. And I’d just like to explore all of these so we can get all of the sort of flavors of the different ways that researchers come at this problem. Was it Ben Garfinkel who did the offense-defense analysis?

Jade: Yeah.

Lucas: So, for example, there’s work on that. That work was specifically on trying to understand how the offense-defense balance scales as capabilities change. This could have been done with nuclear weapons, for example.

Jade: Yeah, exactly. That was an awesome piece of work by Allan and Ben Garfinkel, looking at this concept of the offense-defense balance, which exists for weapon systems broadly. And they were sort of analyzing and modeling. It’s a relatively theoretical piece of work, trying to model how the offense-defense balance changes with investments. And then there was a bit of an investigation there specifically on how we could expect AI to affect the offense-defense balance in different types of contexts. The other cluster of work, which I failed to mention as well, is a lot of our work on policy, specifically. So this is where projects like the windfall clause would fall in.

Lucas: Could you explain what the windfall clause is, in a sentence or two?

Jade: The windfall clause is an example of a policy lever, which we think could be a good idea to talk about in public and potentially think about implementing. And the windfall clause is an ex-ante voluntary commitment by AI developers to distribute profits from the development of advanced AI for the common benefit of humanity. What I mean by ex-ante is that they commit to it now. So an AI developer, say a given AI firm, will commit to, or sign, the windfall clause prior to knowing whether they will get to anything like advanced AI. And what they commit to is saying that if I hit a certain threshold of profits, so what we call windfall profit, and the threshold is very, very, very high. So the idea is that this should only really kick in if a firm really hits the jackpot and develops something that is so advanced, or so transformative in the economic sense, that they get a huge amount of profit from it at some sort of very unprecedented scale.

So if they hit that threshold of profit, this clause will kick in, and that will commit them to distributing their profits according to some kind of pre-committed distribution mechanism. And the idea with the distribution mechanism is that it will redistribute these profits along the lines of ensuring that sort of everyone in the world can benefit from this kind of bounty. There’s a lot of different ways in which you could do the distribution. And we’re about to put out the report which outlines some of our thinking on it. And there are many more ways in which it could be done aside from what we talk about.

But effectively, what you want in a distribution mechanism is you want it to be able to do things like rectify inequalities that could have been caused in the process of developing advanced AI. You want it to be able to provide a financial buffer to those who’ve been thoughtlessly unemployed by the development of advanced AI. And then you also want it to do somewhat positive things too. So it could be, for example, that you distribute it according to meeting the sustainable development goals. Or it could be redistributed according to a scheme that looks something like the UBI. And that transitions us into a different type of economic structure. So there are various ways in which you could play around with it.

Effectively, the windfall clause is starting a conversation about how we should be thinking about the responsibilities that AI developers have to ensure that if they do luck out, or if they do develop something that is as advanced as some of what we speculate we could get to, there is a responsibility there. And there also should be a committed mechanism there to ensure that that is balanced out in a way that reflects the way that we want this value to be distributed across the world.

And that’s an example of the policy lever that is sort of uniquely concrete, in that we don’t actually do a lot of concrete research. We don’t do much policy advocacy work at all. But to the extent that we want to do some policy advocacy work, it’s mostly with the motivation that we want to be starting important conversations about robustly good policies that we could be advocating for now, that can help steer us in better directions.

Lucas: And fitting this into the research threads that we’re talking about here, this goes back to, I believe, Nick Bostrom’s Superintelligence. And so it’s sort of predicated on more foundational principles, which can be attributed to before the Asilomar Conference, but also the Asilomar principles which were developed in 2017, that the benefits of AI should be spread widely, and there should be abundance. And so then there becomes these sort of specific policy implementations or mechanisms by which we are going to realize these principles which form the foundation of our ideal governance.

So Nick has sort of done a lot of this work on forecasting. The forecasting in Superintelligence was less about concrete timelines, and more about the logical conclusions of the kinds of capabilities that AI will have, fitting that into our timeline of AI governance thinking, with ideal governance at the end of that. And then behind us, we have history, which we can, as you’re doing yourself, try to glean more information about how what you call general purpose technologies affect incentives and institutions and policy and law and the reaction of government to these new powerful things. Before we brought up the windfall clause, you were discussing policy at FHI.

Jade: Yeah, and one of the reasons why it’s hard is because if we put on the frame that we mostly make progress by influencing decisions, we want to be pretty certain about what kinds of directions we want these decisions to go, and what we would want these decisions to be, before we engage in any sort of substantial policy advocacy work to try to make that actually a thing in the real world. I am very, very hesitant about our ability to do that well, at least at the moment. I think we need to be very humble about thinking about making concrete recommendations because this work is hard. And I also think there is this dynamic, at least, in setting norms, and particularly legislation or regulation, but also just setting up institutions, in that it’s pretty slow work, but it’s very path dependent work. So if you establish things, they’ll be sort of here to stay. And we see a lot of legacy institutions and legacy norms that are maybe a bit outdated with respect to how the world has progressed in general. But we still struggle with them because it’s very hard to get rid of them. And so the kind of emphasis on humility, I think, is a big one. And it’s a big reason why basically policy advocacy work is quite slim on the ground, at least in the moment, because we’re not confident enough in our views on things.

Lucas: Yeah, but there’s also this tension here. The technology’s coming anyway. And so we’re sort of on this timeline to get the right policy stuff figured out. And here, when I look at, let’s just take the Democrats and the Republicans in the United States, and how they interact. Generally, in terms of specific policy implementation and recommendation, it just seems like different people have various dispositions and foundational principles which are at odds with one another, and that policy recommendations are often not substantially tested, or the result of empirical scientific investigation. They’re sort of a culmination and aggregate of one’s very broad squishy intuitions and modeling of the world, and different intuitions one has. Which is sort of why, at least at the policy level, seemingly in the United States government, it seems like a lot of the conversation is just endless arguing that gets nowhere. How do we avoid that here?

Jade: I mean, this is not just specifically an AI governance problem. I think we just struggle with this in general as we try to do governance and politics work in a good way. It’s a frustrating dynamic. But I think one thing that you said definitely resonates, and that’s a bit contra to what I just said. Whether we like it or not, governance is going to happen, particularly if you take the view that basically anything that shapes the way this is going to go, you could call governance. Something is going to fill the gap because that’s what humans do. You either have the absence of good governance, or you have somewhat better governance if you try to engage a little bit. There’s definitely that tension.

One thing that I’ve recently been reflecting on, in terms of things that we under-prioritize in this community, because it’s sort of a bit of a double-edged sword of being very conscientious about being epistemically humble and being very cautious about things, and trying to be better calibrated and all of that, which are very strong traits of people who work in this space at the moment. But I think almost because of those traits, too, we undervalue, or we don’t invest enough time or resource in just trying to engage in existing policy discussions and existing governance institutions. And I think there’s also an aversion to engaging in things that feel frustrating and slow, and that’s plausibly a mistake, at least in terms of how much attention we pay to it because in the absence of our engagement, the thing’s still going to happen anyway.

Lucas: I must admit that as someone interested in philosophy, I’ve resisted for a long time now the idea of governance in AI, at least casually, in favor of nice calm cool rational conversations at tables that you might have with friends about values, and ideal governance, and what kinds of futures you’d like. But as you’re saying, and as Allan says, that’s not the way that the world works. So here we are.

Jade: So here we are. And I think one way in which we need to rebalance a little bit, as kind of an example of this is, I’m aware that a lot of the work, at least that I see in this space, is sort of focused on very aligned organizations and non-government organizations. So we’re looking at private labs that are working on developing AGI. And they’re more nimble. They have more familiar people in them, we think more similarly to those kinds of people. And so I think there’s an attraction. There’s really good rational reasons to engage with the folks because they’re the ones who are developing this technology and they’re plausibly the ones who are going to develop something advanced.

But there’s also, I think, somewhat biased reasons why we engage, is because they’re not as messy, or they’re more familiar, or we feel more value aligned. And I think this early in the field, putting all our eggs in a couple of very, very limited baskets, is plausibly not that great a strategy. That being said, I’m actually not entirely sure what I’m advocating for. I’m not sure that I want people to go and engage with all of the UN conversations on this because there’s a lot of noise and very little signal. So I think it’s a tricky one to navigate, for sure. But I’ve just been reflecting on it lately, that I think we sort of need to be a bit conscious about not group thinking ourselves into thinking we’re sort of covering all the bases that we need to cover.

Lucas: Yeah. My view on this, and this may be wrong, is just looking at the EA community, and the alignment community, and all that they’ve done to try to help with AI alignment. It seems like a lot of talent feeding into tech companies. And there’s minimal efforts right now to engage in actual policy and decision making at the government level, even for short term issues like disemployment and privacy and other things. The AI alignment is happening now, it seems.

Jade: On the noise to signal point, I think one thing I’d like for people to be thinking about, I’m pretty annoyed at this short term v. long term bifurcation. And I think a fair number of people are. And the framing that I’ve tried on a little bit is more thinking about it in terms of stakes. So how high are the stakes for a particular application area, or a particular sort of manifestation of a risk or a concern.

And I think in terms of thinking about it in the stakes sense, as opposed to the timeline sense, helps me at least try to identify things that we currently call or label near term concerns, and try to filter the ones that are worth engaging in versus the ones that maybe we just don’t need to engage in at all. An example here is that basically I am trying to identify near term/existing concerns that I think could scale in stakes as AI becomes more advanced. And if those exist, then there’s really good reason to engage in them for several reasons, right? One is this path dependency that I talked about before, so norms that you’re developing around, for example, privacy or surveillance. Those norms are going to stick, and the ways in which we decide we want to govern that, even with narrow technologies now, those are the ones we’re going to inherit, grandfather in, as we start to advance this technology space. And then I think you can also just get a fair amount of information about how we should be governing the more advanced versions of these risks or concerns if you engage earlier.

I think there are actually probably, even just off the top of my head, I can think of a couple which seem to have scalable stakes. So, for example, a very existing conversation in the policy space is about this labor displacement problem and automation. And that’s the thing that people are freaking out about now, to the extent that you have legislation and bills and whatnot being passed, or being talked about at least. And you’ve got a number of people running on political platforms on the basis of that kind of issue. And that is both an existing concern, given automation to date. But it’s also plausibly a huge concern as this stuff is more advanced, to the point of economic singularity, if you wanted to use that term, where you’ve got vast changes in the structure of the labor market and the employment market, and you can have substantial transformative impacts on the ways in which humans engage and create economic value and production.

And so existing automation concerns can scale into large scale labor displacement concerns, can scale into pretty confusing philosophical questions about what it means to conduct oneself as a human in a world where you’re no longer needed in terms of employment. And so that’s an example of a conversation which I wish more people were engaged in right now.

Plausibly, another one would be privacy as well, because I think privacy is currently a very salient concern. But also, privacy is an example of one of the fundamental values that we are at risk of eroding if we continue to deploy technologies for other reasons: efficiency gains, or for increasing control and centralizing of power. And privacy is this small microcosm of a maybe larger concern about how we could possibly be chipping away at these very fundamental things which we would want to preserve in the longer run, but we’re at risk of not preserving because we continue to operate in this dynamic of innovation and performance for whatever cost. Those are examples of conversations where I find it plausible that there are existing conversations that we should be more engaged in just because those are actually going to matter for the things that we call long term concerns, or the things that I would call sort of high stakes concerns.

Lucas: That makes sense. I think that trying on the stakes framing is helpful, and I can see why. It’s just a question about what are the things today, and within the next few years, that are likely to have a large effect on a larger end that we arrive at with transformative AI. So we’ve got this space of all these four cornerstones that you guys are exploring. Again, this has to do with the interplay and interdependency of technical AI safety, politics, policy of ideal governance, the economics, the military balance and struggle, and race dynamics all here with AI, on our path to AGI. So starting here with ideal governance, and we can see how we can move through these cornerstones, what is the process by which ideal governance is arrived at? How might this evolve over time as we get closer to superintelligence?

Jade: It may be a couple of thoughts, mostly about what I think a desirable process is that we should follow, or what kind of desired traits do we want to have in the way that we get to ideal governance and what ideal governance could plausibly look like. I think that’s to the extent that I maybe have thoughts about it. And they’re quite obvious ones, I think. Governance literature has said a lot about what consists of both morally sound, politically sound, socially sound governance processes or design of governance processes.

So those are things like legitimacy and accountability and transparency. I think there are some interesting debates about how important certain goals are, either as end goals or as instrumental goals. So for example, I’m not clear where my thinking is on how important inclusion and diversity are as we’re aiming for ideal governance, so I think that’s an open question, at least in my mind.

There are also things to think through around what’s unique to trying to aim for ideal governance for a transformative general purpose technology. We don’t have a very good track record of governing general purpose technologies at all. I think we have general purpose technologies that have integrated into society and have served a lot of value. But that’s not for having had governance of them. I think we’ve been some combination of lucky and somewhat thoughtful sometimes, but not consistently so. If we’re staking the claim that AI could be a uniquely transformative technology, then we need to ensure that we’re thinking hard about the specific challenges that it poses. It’s a very fast-moving emerging technology. And governance historically has always been relatively slow at catching up. But you also have certain capabilities that you can realize by developing, for example, AGI or superintelligence, which governance frameworks or institutions have never had to deal with before. So thinking hard about what’s unique about this particular governance challenge, I think, is important.

Lucas: Seems like often, ideal governance is arrived at through massive suffering of previous political systems, like this form of ideal governance that the founding fathers of the United States came up with was sort of an expression of the suffering they experienced at the hands of the British. And so I guess if you track historically how we’ve shifted from feudalism and monarchy to democracy and capitalism and all these other things, it seems like governance is a large slowly reactive process born of revolution. Whereas, here, what we’re actually trying to do is have foresight and wisdom about what the world should look like, rather than trying to learn from some mistake or some un-ideal governance we generate through AI.

Jade: Yeah, and I think that’s also another big piece of it, is another way of thinking about how to get to ideal governance is to aim for a period of time, or a state of the world in which we can actually do the thinking well without a number of other distractions/concerns on the way. So for example, conditions that we want to drive towards would mean getting rid of things like the current competitor environment that we have, which for many reasons, some of which I mentioned earlier, it’s a bad thing, and it’s particularly counterproductive to giving us the kind of space and cooperative spirit and whatnot that we need to come to ideal governance. Because if you’re caught in this strategic competitive environment, then that makes a bunch of things just much harder to do in terms of aiming for coordination and cooperation and whatnot.

You also probably want better, more accurate, information out there, hence being able to think harder by looking at better information. And so a lot of work can be done to encourage more accurate information to hold more weight in public discussions, and then also encourage an environment of genuine, epistemically healthy deliberation about that kind of information. All of what I’m saying is also not particularly unique, maybe, to ideal governance for AI. I think in general, you can sometimes broaden this discussion to what does it look like to govern a global world relatively well. And AI is one of the particular challenges that are maybe forcing us to have some of these conversations. But in some ways, when you end up talking about governance, it ends up being relatively abstract in a way, I think, ruins technology. At least in some ways there are also particular challenges, I think, if you’re thinking particularly about superintelligence scenarios. But if you’re just talking about governance challenges in general, things like accurate information, more patience, lack of competition and rivalrous dynamics and whatnot, that generally is kind of just helpful.

Lucas: So, I mean, arriving at ideal governance here, I’m just trying to model and think about it, and understand if there should be anything here that should be practiced differently, or if I’m just sort of slightly confused here. Generally, when I think about ideal governance, I see that it’s born of very basic values and principles. And I view these values and principles as coming from nature, like the genetics, evolution instantiating certain biases and principles and people that tend to lead to cooperation, conditioning of a culture, how we’re nurtured in our homes, and how our environment conditions us. And also, people update their values and principles as they live in the world and communicate with other people and engage in public discourse, even more foundational, meta-ethical reasoning, or normative reasoning about what is valuable.

And historically, these sort of conversations haven’t mattered, or they don’t seem to matter, or they seem to just be things that people assume, and they don’t get that abstract or meta about their values and their views of value, and their views of ethics. It’s been said that, in some sense, on our path to superintelligence, we’re doing philosophy on a deadline, and that there are sort of deep and difficult questions about the nature of value, and how best to express value, and how to idealize ourselves as individuals and as a civilization.

So I guess I’m just throwing this all out there. Maybe we don’t necessarily have any concrete answers. But I’m just trying to think more about the kinds of practices and reasoning that should and can be expected to inform ideal governance. Should meta-ethics matter here, where it doesn’t seem to matter in public discourse? I still struggle between the ultimate value expression that might be happening through superintelligence, and the tension between that, and how our public discourse functions. I don’t know if you have any thoughts here.

Jade: No particular thoughts, aside from to generally agree that I think meta-ethics is important. It is also confusing to me why public discourse doesn’t seem to track the things that seem important. This probably is something that we’ve struggled and tried to address in various ways before, so I guess I’m always cognizant of trying to learn from ways in which we’ve tried to improve public discourse and tried to create spaces for this kind of conversation.

It’s a tricky one for sure, and thinking about better practices is probably the main way at least in which I engage with thinking about ideal governance. It’s often the case that people, when they look at the cluster of ideal governance work, think, “Oh, this is the thing that’s going to tell us what the answer is,” like what’s the constitution that we have to put in place, or whatever it is.

At least for me, the main chunk of thinking is mostly centered around process, and it’s mostly centered around what constitutes a productive optimal process, and some ways of answering this pretty hard question. And how do you create the conditions in which you can engage with that process without being distracted or concerned about things like competition? Those are kind of the main ways in which it seems obvious that we can fix the current environment so that we’re better placed to answer what is a very hard question.

Lucas: Coming to mind here is also, is this feature that you pointed out, I believe, that ideal governance is not figuring everything out in terms of our values, but rather creating the kind of civilization and space in which we can take the time to figure out ideal governance. So maybe ideal governance is not solving ideal governance, but creating a space to solve ideal governance.

Usually, ideal governance has to do with modeling human psychology, and how best to get human beings to produce value and live together harmoniously. But when we introduce AI, and human beings become potentially obsolete, then ideal governance potentially becomes something else. And I wonder if the role of, say, experimental cities with different laws, policies, and governing institutions might be helpful here.

Jade: Yeah, that’s an interesting thought. Another thought that came to mind as well, actually, is just kind of reflecting on how ill-equipped I feel thinking about this question. One funny trait of this field is that you have a slim number of philosophers, but especially in the AI strategy and safety space, it’s political scientists, international relations people, economists, and engineers, and computer scientists thinking about questions that other spaces have tried to answer in different ways before.

So when you mention psychology, that’s an example. Obviously, philosophy has something to say about this. But there’s also a whole space of people who have thought about how we govern things well across a number of different domains, and how we do a bunch of coordination and cooperation better, and stuff like that. And so it makes me reflect on the fact that there could be things that we already have learned that we should be reflecting a little bit more on which we currently just don’t have access to because we don’t necessarily have the right people or the right domains of knowledge in this space.

Lucas: Like AI alignment has been attracting a certain crowd of researchers, and so we miss out on some of the insights that, say, psychologists might have about ideal governance.

Jade: Exactly, yeah.

Lucas: So moving along here, from ideal governance, assuming we can agree on what ideal governance is, or if we can come to a place where civilization is stable and out of existential risk territory, and where we can sit down and actually talk about ideal governance, how do we begin to think about how to contribute to AI governance through working with or in private companies and/or government?

Jade: This is a good, and quite large, question. I think there are a couple of main ways in which I think about productive actions that either companies or governments can take, or productive things we can do with both of these actors to make them more inclined to do good things. On the point of other companies, the primary thing I think that is important to work on, at least concretely in the near term, is to do something like establish the norm and expectation that as developers of this important technology that will have a large plausible impact on the world, they have a very large responsibility proportional to their ability to impact the development of this technology. By making the responsibility something that is tied to their ability to shape this technology, I think that as a foundational premise or a foundational axiom to hold about why private companies are important, that can get us a lot of relatively concrete things that we should be thinking about doing.

The simple way of saying it is something like if you are developing the thing, you’re responsible for thinking about how that thing is going to affect the world. And establishing that, I think is a somewhat obvious thing. But it’s definitely not how the private sector operates at the moment, in that there is an assumed limited responsibility irrespective of how your stuff is deployed in the world. What that actually means can be relatively concrete. Just looking at what these labs, or what these firms have the ability to influence, and trying to understand how you want to change it.

So, for example, internal company policy on things like what kind of research is done and invested in, and how you allocate resources across, for example, safety and capabilities research, what particular publishing norms you have, and considerations around risks or benefits. Those are very concrete internal company policies that can be adjusted and shifted based on one’s idea of what they’re responsible for. The broad thing, I think, is to try to steer them in this direction of embracing, acknowledging, and then living up to this greater responsibility, as an entity that is responsible for developing the thing.

Lucas: How would we concretely change the incentive structure of a company that’s interested in maximizing profit towards this increased responsibility, say, in the domains that you just enumerated?

Jade: This is definitely probably one of the hardest things about this claim being translated into practice. I mean, it’s not the first time we’ve been somewhat upset at companies for doing things that society doesn’t agree with. We don’t have a great track record of changing the way that industries or companies work. That being said, I think if you’re outside of the company, there are particular levers that one can pull that can influence the way that a company is incentivized. And then I think we’ve also got examples of us being able to use these levers well.

The fact that companies are constrained by the environment that a government creates, and governments also have the threat of things like regulation, or the threat of being able to pass certain laws or whatnot, which actually the mere threat, historically, has done a fair amount in terms of incentivizing companies to just step up their game because they don’t want regulation to kick in, which isn’t conducive to what they want to do, for example.

Users of the technology is a pretty classic one. It’s a pretty inefficient one, I think, because you’ve got to coordinate many, many different types of users, and actors, and consumers and whatnot, to have an impact on what companies are incentivized to do. But you have seen environmental practices in other types of industries that have been put in place as standards or expectations that companies should abide by because consumers across a long period of time have been able to say, “I disagree with this particular practice.” That’s an example of a trend that has succeeded.

Lucas: That would be like boycotting or divestment.

Jade: Yeah, exactly. And maybe a slightly more efficient one is focusing on things like researchers and employees. That is, if you are a researcher, if you’re an employee, you have levers over the employer that you work for. They need you, and you need them, and there’s that kind of dependency in that relationship. This is all a long way of saying that I think, yes, I agree it’s hard to change incentive structures of any industry, and maybe specifically so in this case because they’re very large. But I don’t think it’s impossible. And I think we need to think harder about how to use those well. I think the other thing that’s working in our favor in this particular case is that we have a unique set of founders or leaders of these labs or companies that have expressed pretty genuine sounding commitments to safety and to cooperativeness, and to serving the common good. It’s not a very robust strategy to rely on certain founders just being good people. But I think in this case, it’s kind of working in our favor.

Lucas: For now, yeah. There’s probably already other interest groups who are less careful, who are actually just making policy recommendations right now, and we’re broadly not in on the conversation due to the way that we think about the issue. So in terms of government, what should we be doing? Yeah, it seems like there’s just not much happening.

Jade: Yeah. So I agree there isn’t much happening, or at least relative to how much work we’re putting into trying to understand and engage with private labs. There isn’t much happening with government. So I think there needs to be more thought put into how we do that piece of engagement. I think good things that we could be trying to encourage more governments to do, for one, investing in productive relationships with the technical community, and productive relationships with the researcher community, and with companies as well. At least in the US, it’s pretty adversarial between Silicon Valley firms and DC.

And that isn’t good for a number of reasons. And one very obvious reason is that there isn’t common information or common understanding of what’s going on, what the risks are, what the capabilities are, et cetera. One of the main critiques of governments is that they’re ill-equipped in terms of access to knowledge, and access to expertise, to be able to appropriately design things like bills, or things like pieces of legislation or whatnot. And I think that’s also something that governments should take responsibility for addressing.

So those are kind of low-hanging fruit. There’s a really tricky balance that I think governments will need to strike, which is the balance between avoiding over-hasty ill-informed regulation. A lot of my work looking at history will show that the main ways in which we’ve achieved substantial regulation is as a result of big public, largely negative events to do with the technology screwing something up, or the technology causing a lot of fear, for whatever reasons. And so there’s a very sharp spike in public fear or public concern, and then the government then kicks into gear. And I think that’s not a good dynamic in terms of forming nuanced well-considered regulation and governance norms. Avoiding that outcome is important, but it’s also important that governments do engage and track how this is going, and particularly track where things like company policy and industry-wide efforts are not going to be sufficient. So when do you start translating some of the more soft law, if you will, into actual hard law?

That will be a very tricky timing question, I think, for governments to grapple with. But ultimately, it’s not sufficient to have companies governing themselves. You’ll need to be able to consecrate it into government backed efforts and initiatives and legislation and bills. My strong intuition is that it’s not quite the right time to roll out object level policies. And so the main task for governments will be just to position themselves to do that well when the time is right.

Lucas: So what’s coming to my mind here is I’m thinking about YouTube compilations of congressional members of the United States and senators asking horrible questions to Mark Zuckerberg and the CEO of, say, Google. They just don’t understand the issues. The United States is currently not really thinking that much about AI, and especially transformative AI. Whereas, China, it seems, has taken a step in this direction and is doing massive governmental investments. So what can we say about this assumed difference? And the question is, what are governments to do in this space? Different governments are paying attention at different levels.

Jade: Some governments are more technologically savvy than others, for one. So I’d push back on the US not … They’re paying attention on different things. So, for example, the Department of Commerce put out a notice to the public indicating that they’re exploring putting in place export controls on a cluster of emerging technologies, including a fair number of AI relevant technologies. The point of export controls is to do something like ensure that adversaries don’t get access to critical technologies that, if they do, then that could undermine national security and/or domestic industrial base. The reasons why export controls are concerning is because they’re a) a relatively outdated tool. They used to work relatively well when you were targeting specific kinds of weapons technologies, or basically things that you could touch and see. And the restriction of them from being on the market by the US means that a fair amount of it won’t be able to be accessed by other folks around the world. And you’ve seen export controls be increasingly less effective the more that we’ve tried to apply them to things like cryptography, where it’s largely software based. And so trying to use export controls, which are applied at the national border, is a very tricky thing to make effective.

So you have the US paying attention to the fact that they think that AI is a national security concern, at least in this respect, enough to indicate that they’re interested in exploring export controls. I think it’s unlikely that export controls are going to be effective at achieving the goals that the US wants to pursue. But I think export controls is also indicative of a world that we don’t want to slide into, which is a world where you have rivalrous economic blocs, where you’re sort of protecting your own base, and you’re not contributing to the kind of global commons of progressing this technology.

Maybe it goes back to what we were saying before, in that if you’re not engaged in the governance, the governance is going to happen anyway. This is an example of activity that is going to happen anyway. I think people assume now, probably rightfully so, that the US government is not going to be very effective because they are not technically literate. In general, they are sort of relatively slow moving. They’ve got a bunch of other problems that they need to think about, et cetera. But I don’t think it’s going to take very, very long for the US government to start to seriously engage. I think the thing that is worth trying to influence is what they do when they start to engage.

If I had a policy in mind that I thought was robustly good that the US government should pass, then that would be the more proactive approach. It seems possible that if we think about this hard enough, there could be robustly good things that the US government could do, that could be good to be proactive about.

Lucas: Okay, so there’s this sort of general sense that we’re pretty heavy on academic papers because we’re really trying to understand the problem, and the problem is so difficult, and we’re trying to be careful and sure about how we progress. And it seems like it’s not clear if there is much room, currently, for direct action, given our uncertainty about specific policy implementations. There are some shorter term issues. And sorry to say shorter term issues. But, by that, I mean automation and maybe lethal autonomous weapons and privacy. These things, we have a more clear sense of, at least about potential things that we can start doing. So I’m just trying to get a sense here from you, on top of these efforts to try to understand the issues more, and on top of these efforts, for example, like 80,000 Hours has contributed. And by working to place aligned persons in various private organizations, what else can we be doing? What would you like to see more being done on here?

Jade: I think this is on top of just more research. But that would be the first thing that comes to mind, is people thinking hard about it seems like a thing that I want a lot more of, in general. But on top of that, what you mentioned, I think, the placing people, that maybe fits into this broader category of things that seems good to do, which is investing in building our capacity to influence the future. That’s quite a general statement. But something like it takes a fair amount of time to build up influence, particularly in certain institutions, like governments, like international institutions, et cetera. And so investing in that early seems good. And doing things like trying to encourage value aligned sensible people to climb the ladders that they need to climb in order to get to positions of influence, that generally seems like a good and useful thing.

The other thing that comes to mind as well is putting out more accurate information. One specific version of things that we could do here is, there is currently a fair number of inaccurate, or not well justified memes that are floating around, that are informing the way that people think. For example, the US and China are in a race. Or a more nuanced one is something like, inevitably, you’re going to have a safety performance trade off. And those are not great memes, in the sense that they don’t seem to be conclusively true. But they’re also not great in that they put you in a position of concluding something like, “Oh, well, if I’m going to invest in safety, I’ve got to be an altruist, or I’m going to trade off my competitive advantage.”

And so identifying what those bad ones are, and countering those, is one thing to do. Better memes could be something like those who are developing this technology are responsible for thinking through its consequences. Or something even as simple as governance doesn’t mean government, and it doesn’t mean regulation. Because I think you’ve got a lot of firms who are terrified of regulation. And so they won’t engage in this governance conversation because of it. So there could be some really simple things I think we could do, just to make the public discourse both more accurate and more conducive to things being done that are good in the future.

Lucas: Yeah, here I’m also just seeing the tension here between the appropriate kinds of memes that inspire, I guess, a lot of the thinking within the AI alignment community, and the x-risk community, versus what is actually useful or spreadable for the general public, adding in here ways in which accurate information can be info-hazardy. I think broadly in our community, the common good principle, and building an awesome future for all sentient creatures, and I am curious to know how spreadable those memes are.

Jade: Yeah, the spreadability of memes is a thing that I want someone to investigate more. The things that make things not spreadable, for example, are just things that are, at a very simple level, quite complicated to explain, or are somewhat counterintuitive so you can’t pump the intuition very easily. Particularly things that require you to decide that one set of values that you care about, that’s competing against another set of values. Any set of things that brings nationalism against cosmopolitanism, I think, is a tricky one, because you have some subset of people. The ones that you and I talk to the most are very cosmopolitan. But you also have a fair amount of people who care about the common good principle, in some sense, but also care about their nation in a fairly large sense as well.

So there are things that make certain memes less good or less spreadable. And one key thing will be to figure out which ones are actually good in the true sense, and good in the pragmatic to spread sense.

Lucas: Maybe there’s a sort of research program here, where psychologists and researchers can explore focus groups on the best spreadable memes, which reflect a lot of the core and most important values that we see within AI alignment, and EA, and x-risk.

Jade: Yeah, that could be an interesting project. I think also in AI safety, or in the AI alignment space, people are framing safety in quite different ways. One framing of that, which like it’s a part of what it means to be a good AI person, is to think about safety. That’s an example of one that I’ve seen take off a little bit more lately because that’s an explicit act of trying to mainstream the thing. That’s a meme, or an example of a framing, or a meme, or whatever you want to call it. And you know there are pros and cons of that. The pros would be, plausibly, it’s just more mainstream. And I think you’ve seen evidence of that be the case because more people are more inclined to say, “Yeah, I agree. I don’t want to build a thing that kills me if I want it to get coffee.” But you’re not going to have a lot of conversations about maybe the magnitude of risks that you actually care about. So that’s maybe a con.

There’s maybe a bunch of stuff to do in this general space of thinking about how to better frame the kind of public facing narratives of some of these issues. Realistically, memes are going to fill the space. People are going to talk about it in certain ways. You might as well try to make it better, if it’s going to happen.

Lucas: Yeah, I really like that. That’s a very good point. So let’s talk here a little bit about technical AI alignment. So in technical AI alignment, the primary concerns are around the difficulty of specifying what humans actually care about. So this is like capturing human values and aligning with our preferences and goals, and what idealized versions of us might want. So, so much of AI governance is thus about ensuring that this AI alignment process we engage in doesn’t cut too many corners. The purpose of AI governance is to decrease risks, to increase coordination, and to do all of these other things to ensure that, say, the benefits of AI are spread widely and robustly, that we don’t get locked into any negative governance systems or value systems, and that this process of bringing AIs in alignment with the good doesn’t have researchers, or companies, or governments cutting too many corners on safety. In this context, and this interplay between governance and AI alignment, how much of a concern are malicious use cases relative to the AI alignment concerns within the context of AI governance?

Jade: That’s a hard one to answer, both because there is a fair amount of uncertainty around how you discuss the scale of the thing. But also because I think there are some interesting interactions between these two problems. For example, if you’re talking about how AI alignment interacts with this AI governance problem. You mentioned before AI alignment research is, in some ways, contingent on other things going well. I generally agree with that.

For example, it depends on AI safety taking place in research cultures and important labs. It requires institutional buy-in and coordination between institutions. It requires this mitigation of race dynamics so that you can actually allocate resources towards AI alignment research. All those things. And so in some ways, that particular problem being solved is contingent on us doing AI governance well. But then, also to the point of how big is malicious use risk relative to AI alignment, I think in some ways that’s hard to answer. But in some ideal world, you could sequence the problems that you could solve. If you solve the AI alignment problem first, then AI governance research basically becomes a much narrower space, addressing how an aligned AI could still cause problems because we’re not thinking about the concentration of power, the concentration of economic gains. And so you need to think about things like the windfall clause, to distribute that, or whatever it is. And you also need to think about the transition to creating an aligned AI, and what could be messy in that transition, how you avoid public backlash so that you can actually see the fruits of you having solved this AI alignment problem.

So that becomes more the kind of nature of the thing that AI governance research becomes, if you assume that you’ve solved the AI alignment problem. But if we assume that, in some world, it’s not that easy to solve, and both problems are hard, then I think there’s this interaction between the two. In some ways, it becomes harder. In some ways, they’re dependent. In some ways, it becomes easier if you solve bits of one problem.

Lucas: I generally model the risks of malicious use cases as being less than the AI alignment stuff.

Jade: I mean, I’m not sure I agree with that. But two things I could say to that. I think, one, intuition is something like you have to be a pretty awful person to really want to use a very powerful system to cause terrible ends. And it seems more plausible that people will just do it by accident, or unintentionally, or inadvertently.

Lucas: Or because the incentive structures aren’t aligned, and then we race.

Jade: Yeah. And then the other way to sort of support this claim is, if you look at biotechnology and bio-weapons, specifically, bio-security/bio-terrorism issues, so like malicious use equivalent. Those have been far less, in terms of frequency, compared to just bio-safety issues, which are the equivalent of accident risks. So people causing unintentional harm because we aren’t treating biotechnology safely, that’s caused a lot more problems, at least in terms of frequency, compared to people actually just trying to use it for terrible means.

Lucas: Right, but don’t we have to be careful here with the strategic properties and capabilities of the technology, especially in the context in which it exists? Because there’s nuclear weapons, which are sort of the larger more absolute power imbuing technology. There has been less of a need for people to take bio-weapons to that level. You know? And also there’s going to be limits, like with nuclear weapons, on the ability of a rogue actor to manufacture really effective bio-weapons without a large production facility or team of research scientists.

Jade: For sure, yeah. And there’s a number of those considerations, I think, to bear in mind. So it definitely isn’t the case that you haven’t seen malicious use in bio strictly because people haven’t wanted to do it. There’s a bunch of things like accessibility problems, and tacit knowledge that’s required, and those kinds of things.

Lucas: Then let’s go ahead and abstract away malicious use cases, and just think about technical AI alignment, and then AI/AGI governance. How do you see the relative importance of AI and AGI governance, and the process of AI alignment that we’re undertaking? Is solving AI governance potentially a bigger problem than AI alignment research, since AI alignment research will require the appropriate political context to succeed? On our path to AGI, we’ll need to mitigate a lot of the race conditions and increase coordination. And then even after we reach AGI, the AI governance problem will continue, as we sort of explored earlier that we need to be able to maintain a space in which humanity, AIs, and all earth originating sentient creatures are able to idealize harmoniously and in unity.

Jade: I don’t think it’s possible to actually assess them at this point, in terms of how much we understand this problem. I also have a bias towards saying that AI governance is the harder problem because I’m embedded in it and see it a lot more. And maybe ways to support that claim are things we’ve talked about. So AI alignment going well, or happening at all, is sort of contingent on a number of other factors that AI governance is trying to solve, so the social, political, and economic context needs to be right in order for that to actually happen, and then in order for that to have an impact.

There are some interesting things that are made maybe easier by AI alignment being solved, or somewhat solved, if you are thinking about the AI governance problem. In fact, it’s just like a general cluster of AI being safer and more robust and more transparent, or whatever, makes certain AI governance challenges just easier. The really obvious example here that comes to mind is the verification problem. The inability to verify what certain systems are designed to do and will do causes a bunch of governance problems. Like, arms control agreements are very hard. Establishing trust between parties to cooperate and coordinate is very hard.

If you happen to be able to solve some of those problems in the process of trying to tackle this AI alignment problem, then that makes AI governance a little bit easier. I’m not sure which direction it cashes out, in terms of which problem is more important. I’m certain that there are interactions between the two, and I’m pretty certain that one depends on the other, to some extent. So it becomes eminently hard to govern the thing, if you can’t align the thing. But it also is probably the case that by solving some of the problems in one domain, you can help make the other problem a little bit more tractable and easier.

Lucas: So now I’d like to get into lethal autonomous weapons. And we can go ahead and add whatever caveats are appropriate here. So in terms of lethal autonomous weapons, some people think that there are major stakes here. Lethal autonomous weapons are a major AI enabled technology that’s likely to come on the stage soon, as we make some moderate improvements to already existing technology, and then package it all together into the form of a lethal autonomous weapon. Some take the view that this is a crucial moment, or that there are high stakes here to get such weapons banned. The thinking here might be that by demarcating unacceptable uses of AI technology, such as for autonomously killing people, and by showing that we are capable of coordinating on this large and initial AI issue, that we might be taking the first steps in AI alignment, and the first steps in demonstrating our ability to take the technology and its consequences seriously.

And so we mentioned earlier how there’s been a lot of thinking, but not much action. This seems to be an initial place where we can take action. We don’t need to keep delaying our direct action and real world participation. So if we can’t get a ban on autonomous weapons, maybe it would seem that we have less hope for coordinating on more difficult issues. And so the lethal autonomous weapons may exacerbate global conflict by increasing skirmishing at borders, decrease the cost of war, dehumanize killing, taking the human element out of death, et cetera.

And other people disagree with this. Other people might argue that banning lethal autonomous weapons isn’t necessary in the long game. It’s not, as we’re framing it, a high stakes thing. Just because this sort of developmental step in this technology is not really crucial for coordination, or for political military stability. Or that coordination later would be born of other things, and that this would just be some other new military technology without much impact. So I’m curious here to gather what views you, or FHI, or the Center for the Governance of AI, might have on autonomous weapons. Should there be a ban? Should the AI alignment community be doing more about this? And if not, why?

Jade: In terms of caveats, I’ve got a lot of them. So I think the first one is that I’ve not read up on this issue much. I’ve followed it very loosely, but not nearly closely enough that I feel like I have a confident, well-informed opinion.

Lucas: Can I ask why?

Jade: Mostly because of bandwidth issues. It’s not because I have categorized them ahead of time as something not worth engaging in. I’m actually pretty uncertain about that. The second caveat is, I definitely don’t claim to speak on behalf of anyone but myself in this case. The Center for the Governance of AI doesn’t have a particular position on this, nor does FHI.

Lucas: Would you say that this is because the Center for the Governance of AI, would it be for bandwidth issues again? Or would it be because it’s de-prioritized?

Jade: The main thing is bandwidth. Also, I think the main reason why it’s probably been de-prioritized, at least subconsciously, has been the framing of sort of focusing on things that are neglected by folks around the world. It seems like there are people at least with sort of somewhat good intentions tentatively engaged in the LAWS (lethal autonomous weapons) discussion. And so within that frame, I think it’s been de-prioritized because it’s not obviously neglected compared to other things that aren’t getting any focus at all.

With those things in mind, I could see a pretty decent case for investing more effort in engaging in this discussion, at least compared to what we currently have. I guess it’s hard to tell, compared to alternatives of how we could be spending those resources, given it’s such a resource constrained space, in terms of people working in AI alignment, or just bandwidth, in terms of this community in general. So briefly, I think we’ve talked about this idea that there’s this fair amount of path dependency in the way that institutions and norms are built up. And if this is one of the first spaces, with respect to AI capabilities, where we’re going to be getting or driving towards some attempt at international norms, or establishing international institutions that could govern this space, then that’s going to be relevant in a general sense. And specifically, it’s going to be relevant for sort of defense and security related concerns in the AI space.

And so I think you both want to engage because there’s an opportunity to seed desirable norms and practices and process and information. But you also possibly want to engage because there could be a risk that bad norms are established. And so it’s important to engage, to prevent it going down something which is not a good path in terms of this path dependency.

Another reason I think that is maybe worth thinking through, in terms of making a case for engaging more, is that applications of AI in the military and defense spaces are possibly one of the most likely to cause substantial disruption in the near-ish future, and could be an example of something that I call the high stakes concerns in the future. And you can talk about AI and its impact on various aspects of the military domain, where it could have substantial risks. So, for example, in cyber escalation, or destabilizing nuclear security. Those would be examples where military and AI come together, and you can have bad outcomes that we do actually really care about. And so for the same reason, engaging specifically in any discussion that is touching on military and AI concerns could be important.

And then the last one that comes to mind is the one that you mentioned. This is an opportunity to basically practice doing this coordination thing. And there are various things that are worth practicing or attempting. For one, I think even just observing how these discussions pan out is going to tell you a fair amount about how important actors think about the trade offs of using AI versus sort of going towards more safe outcomes or governance processes. And then our ability to corral interest around good values or appropriate norms, or whatnot, that’s a good test of our ability to generally coordinate when we have some of those trade offs around, for example, military advantage versus safety. It gives you some insight into how we could be dealing with similarly shaped issues.

Lucas: All right. So let’s go ahead and bring it back here to concrete actionable real world things today, and understanding what’s actually going on outside of the abstract thinking. So I’m curious to know here more about private companies. At least, to me, they largely seem to be agents of capitalism, like we said. They have a bottom line that they’re trying to meet. And they’re not ultimately aligned with pro-social outcomes. They’re not necessarily committed to ideal governance, but perhaps forms of governance which best serve them. And as we sort of feed aligned people into tech companies, how should we be thinking about their goals, modulating their incentives? What does DeepMind really want? Or what can we realistically expect from key players? And what mechanisms, in addition to the windfall clause, can we use to sort of curb the worst aspects of profit-driven private companies?

Jade: If I knew what DeepMind actually wanted, or what Google actually thought, we’d be in a pretty different place. So a fair amount of what we’ve chatted through, I would echo again. So I think there’s both the importance of realizing that they’re not completely divorced from other people influencing them, or other actors influencing them. And so just thinking hard about which levers are in place already that actually constrain the action of companies, is a pretty good place to start, in terms of thinking about how you can have an impact on their activities.

There’s this common way of talking about big tech companies, which is they can do whatever they want, and they run the world, and we’ve got no way of controlling them. Reality is that they are consistently constrained by a fair number of things. Because they are agents of capitalism, as you described, and because they have to respond to various things within that system. So we’ve mentioned things before, like governments have levers, consumers have levers, employees have levers. And so I think focusing on what those are is a good place to start. Another thing that comes to mind is, there’s something here around taking a very optimistic view of how companies could behave. Or at least this is the way that I prefer to think about it, is that you both need to be excited, and motivated, and think that companies can change and create the conditions in which they can. But one also then needs to have a kind of hidden cynic, in some ways.

On both of these, I think the first one, I really want the public discourse to turn more towards the direction of, if we assume that companies want to have the option of demonstrating pro-social incentives, then we should do things like ensure that the market rewards them for acting in pro-social ways, instead of penalizing their attempts at doing so, instead of critiquing every action that they take. So, for example, I think we should be making bigger deals, basically, of when companies are trying to do things that at least will look like them moving in the right direction, as opposed to immediately critiquing them as ethics washing, or sort of just paying lip service to the thing. I want there to be more of an environment where, if you are a company, or you’re a head of a company, if you’re genuinely well-intentioned, you feel like your efforts will be rewarded, because that’s how incentive structures work, right?

And then on the second point, in terms of being realistic about the fact that you can’t just wish companies into being good, that’s when I think things like public institutions and civil society groups become important. So ensuring that there are consistent forms of pressure, and making sure that they feel like their actions are being rewarded if pro-social, but also that there are ways of spotting when they are speaking as if they’re being pro-social, but acting differently.

So I think everyone’s kind of basically got a responsibility here, to ensure that this goes forward in some kind of productive direction. I think it’s hard. And we said before, you know, some industries have changed in the past successfully. But that’s always been hard, and long, and messy, and whatnot. But yeah, I do think it’s probably more tractable than the average person would think, in terms of influencing these companies to move in directions that are generally just a little bit more socially beneficial.

Lucas: Yeah. I mean, also the companies are generally made up of fairly reasonable well-intentioned people. I’m not all pessimistic. There are just a lot of people who sit at desks and have their structure. So yeah, thank you so much for coming on, Jade. It’s really been a pleasure. And, yeah.

Jade: Likewise.

Lucas: If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

End of recorded material

FLI Podcast: Is Nuclear Weapons Testing Back on the Horizon? With Jeffrey Lewis and Alex Bell

Nuclear weapons testing is mostly a thing of the past: The last nuclear weapon test explosion on US soil was conducted over 25 years ago. But how much longer can nuclear weapons testing remain a taboo that almost no country will violate? 

In an official statement from the end of May, the Director of the U.S. Defense Intelligence Agency (DIA) expressed the belief that both Russia and China were preparing for explosive tests of low-yield nuclear weapons, if not already testing. Such accusations could potentially be used by the U.S. to justify a breach of the Comprehensive Nuclear-Test-Ban Treaty (CTBT).

The CTBT prohibits all signatories from testing nuclear weapons of any size (North Korea, India, and Pakistan are not signatories). But the CTBT never actually entered into force, in large part because the U.S. has still not ratified it, though Russia did.

The existence of the treaty, even without ratification, has been sufficient to establish the norms and taboos necessary to ensure an international moratorium on nuclear weapons tests for a couple decades. But will that last? Or will the U.S., Russia, or China start testing nuclear weapons again? 

This month, Ariel was joined by Jeffrey Lewis, Director of the East Asia Nonproliferation Program at the Center for Nonproliferation Studies and founder of armscontrolwonk.com, and Alex Bell, Senior Policy Director at the Center for Arms Control and Non-Proliferation. Lewis and Bell discuss the DIA’s allegations, the history of the CTBT, why it’s in the U.S. interest to ratify the treaty, and more.

Topics discussed in this episode: 

  • The validity of the U.S. allegations – Is Russia really testing weapons?
  • The International Monitoring System — How effective is it if the treaty isn’t in effect?
  • The modernization of U.S./Russian/Chinese nuclear arsenals and what that means
  • Why there’s a push for nuclear testing
  • Why opposing nuclear testing can help ensure the US maintains nuclear superiority 

You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play, and Stitcher.

Ariel Conn: Welcome to another episode of the FLI Podcast. I’m your host Ariel Conn, and the big question I want to delve into this month is: will the U.S. or Russia or China start testing nuclear weapons again? Now, at the end of May, the Director of the U.S. Defense Intelligence Agency, the DIA, gave a statement about Russian and Chinese nuclear modernization trends. I want to start by reading a couple short sections of his speech.

About Russia, he said, “The United States believes that Russia probably is not adhering to its nuclear testing moratorium in a manner consistent with the zero-yield standard. Our understanding of nuclear weapon development leads us to believe Russia’s testing activities would help it to improve its nuclear weapons capabilities.”

And then later in the statement that he gave, he said, “U.S. government information indicates that China is possibly preparing to operate its test site year-round, a development that speaks directly to China’s growing goals for its nuclear forces. Further, China continues to use explosive containment chambers at its nuclear test site and Chinese leaders previously joined Russia in watering down language in a P5 statement that would have affirmed a uniform understanding of zero-yield testing. The combination of these facts and China’s lack of transparency on their nuclear testing activities raises questions as to whether China could achieve such progress without activities inconsistent with the Comprehensive Nuclear-Test-Ban Treaty.”

Now, we’ve already seen this year that the Intermediate-Range Nuclear Forces Treaty, the INF, has started to falter. The U.S. seems to be trying to pull itself out of the treaty and now we have reason possibly to be a little worried about the Comprehensive Test-Ban Treaty. So to discuss what the future may hold for this test ban treaty, I am delighted to be joined today by Jeffrey Lewis and Alex Bell.

Jeffrey is the Director of the East Asia Nonproliferation Program at the Center for Nonproliferation Studies at the Middlebury Institute. Before coming to CNS, he was the Director of the Nuclear Strategy and Nonproliferation Initiative at the New America Foundation and prior to that, he worked with the Managing the Atom Project at the Belfer Center for Science and International Affairs, the Association of Professional Schools of International Affairs, the Center for Strategic and International Studies, and he was once a Desk Officer in the Office of the Under Secretary of Defense for Policy. But he’s probably a little bit more famous as being the founder of armscontrolwonk.com, which is the leading blog and podcast on disarmament, arms control, and nonproliferation.

Alex Bell is the Senior Policy Director at the Center for Arms Control and Non-Proliferation. Previously, she served as a Senior Advisor in the Office of the Under Secretary of State for Arms Control and International Security. Before joining the Department of State in 2010, she worked on nuclear policy issues at the Ploughshares Fund and the Center for American Progress. Alex is on the board of the British American Security Information Council and she was also a Peace Corps volunteer. And she is fairly certain that she is Tuxedo, North Carolina’s only nuclear policy expert.

So, Alex and Jeffrey, thank you so much for joining me today.

Jeffrey Lewis: It’s great to be here.

Ariel Conn: Let’s dive right into questions. I was hoping one of you or maybe both of you could just sort of give a really quick overview or a super brief history of the Comprehensive Nuclear-Test-Ban Treaty –– especially who has signed and ratified, and who hasn’t signed and/or ratified with regard to the U.S., Russia, and China.

Jeffrey Lewis: So, there were a number of treaties during the Cold War that restricted nuclear explosions, so you had to do them underground. But in the 1990s, the Clinton administration helped negotiate a global ban on all nuclear explosions. So that’s what the Comprehensive Nuclear-Test-Ban Treaty is. The comprehensive part is, you can’t do any explosions of any yield.

And a curious feature of this agreement is that for the treaty to come into force, certain countries must sign and ratify the treaty. One of those countries was Russia, which has both signed and ratified it. Another country was the United States. We have signed it, but the Senate did not ratify it in 1999, and I think we’re still waiting. China has signed it and basically indicated that they’ll ratify it only when the United States does. India has not signed and not ratified, and North Korea and Iran –– not signed and not ratified.

So it’s been 23 years. There’s a Comprehensive Test-Ban Treaty Organization, which is responsible for getting things ready to go when the treaty is ready; I’m actually here in Vienna at a conference that they’re putting on. But 23 years later, the treaty is still not in force even though we haven’t had any nuclear explosions in the United States or Russia since the end of the Cold War.

Ariel Conn: Yeah. So my understanding is that even though we haven’t actually ratified this and it’s not in force, most countries, with maybe one or two exceptions, do actually abide by it. Is that true?

Alex Bell: Absolutely. There are 184 member states to the treaty, 168 total ratifications, and the only country to conduct explosive tests in the 21st century is North Korea. So while it is not yet in force, the moratorium against explosive testing is incredibly strong.

Ariel Conn: And do you remain hopeful that that’s going to stay the case, or do comments from people like Lieutenant General Ashley have you concerned?

Alex Bell: It’s a little concerning that the nature of these accusations that came from Lieutenant General Ashley didn’t seem to follow the pattern of how the U.S. government historically has talked about compliance issues that it has seen with various treaties and obligations. We have yet to hear a formal statement from the Department of State, which actually has the responsibility to manage compliance issues, nor have we heard from the main part of the Intelligence Community, the Office of the Director of National Intelligence. It’s a bit strange and it has had people thinking, what was the purpose of this accusation if not to sort of move us away from the test ban?

Jeffrey Lewis: I would add that during the debate inside the Trump administration, when they were writing what was called the Nuclear Posture Review, there was a push by some people for the United States to start conducting nuclear explosions again, something that it had not done since the early 1990s. So on the one hand, it’s easy to see this as a kind of straightforward intelligence matter: Are the Russians doing it or are they not?

But on the other hand, there has always been a group of people in the United States who are upset about the test moratorium, and don’t want to see the test ban ratified, and would like the United States to resume nuclear testing. And those people have, since the 1990s, always pointed at the Russians, claiming that they must be doing secret tests and so we should start our own.

And the kind of beautiful irony of this is that when you read articles from Russians who want to start testing –– because, you know, their labs are like ours, they want to do nuclear explosions –– they say, “The Americans are surely getting ready to cheat. So we should go ahead and get ready to go.” So you have these people pointing fingers at one another, but I think the reality is that there are too many people in the United States and Russia who’d be happy to go back to a world in which there was a lot of nuclear testing.

Ariel Conn: And so do we have reason to believe that the Russians might be testing low-yield nuclear weapons or does that still seem to be entirely speculative?

Alex Bell: I’ll let Jeffrey go into some of the historical concerns people have had about the Russian program, but I think it’s important to note that the Russians immediately denied these accusations. The Foreign Minister, Lavrov, actually described them as delusional, and the Deputy Foreign Minister, Sergei Ryabkov, affirmed that they’re in full and absolute compliance with the treaty and with the unilateral moratorium on nuclear testing that is also in place until the treaty enters into force. He also penned an op-ed a number of years ago affirming that the Russians believed that any yield on any tests would violate the agreement.

Jeffrey Lewis: Yeah, you know, really from the day the test ban was signed, there have been a group of people in the United States who have argued that the U.S. and Russia have different definitions of zero –– which I don’t find very credible, but it’s a thing people say –– and that the Russians are using this to conduct very small nuclear explosions. This literally was a debate that tore the U.S. Intelligence Community apart during the Clinton administration and these fears led to a really embarrassing moment.

There was a seismic event, some ground motion, some shaking near the Russian nuclear test site in 1997 and the Intelligence Community decided, “Aha, this is it. This is a nuclear test. We’ve caught the Russians,” and Madeleine Albright démarched Moscow for conducting a clandestine nuclear test in violation of the CTBT, which it had just signed, and it turned out it was an earthquake out in the ocean.

So there have been a group of people who have been making this claim for more than 20 years. I have never seen any evidence that would persuade me that this is anything other than something they say because they just don’t trust the Russians. I suppose it is possible –– even a stopped watch is right twice a day. But I think before we take any actions, it would behoove us to figure out if there are any facts behind this. Because when you’ve heard the same story for 20 years with no evidence, it’s like the boy who cried wolf. It’s kind of hard to believe.

Alex Bell: And that gets back to the sort of strange way that this accusation was framed: not by the Department of State; it’s not clear that Congress has been briefed about it; it’s not clear our allies were briefed about it before Lieutenant General Ashley made these comments. Everything’s been done in a rather unorthodox way and for something as serious as a potential low-yield nuclear test, this really needs to be done according to form.

Jeffrey Lewis: It’s not typical if you’re going to make an accusation that the country is cheating on an arms control treaty to drive a clown car up and then have 15 clowns come out and honk some horns. It makes it harder to accept whatever underlying evidence there may be if you choose to do it in this kind of ridiculous fashion.

Alex Bell: And that would be for any administration, but particularly, an administration that has made a habit of getting out of agreements sort of habitually now.

Jeffrey Lewis: What I loved about the statement that the Defense Intelligence Agency released –– so after the DIA director made this statement, and it’s really worth watching because he reads the statement, which is super inflammatory and there was a reporter in the audience who had been given his remarks in advance. So someone clearly leaked the testimony to make sure there was a reporter there and the reporter asks a question, and then Ashley kind of freaks out and walks back what he said.

So DIA then releases a statement where they double down and say, “No, no, no, he really meant it,” but it starts with the craziest sentence I’ve ever seen, which is “The United States government, including the Intelligence Community, assesses,” which if you know anything about the way the U.S. government works is insane because only the Intelligence Community is supposed to assess. This implies that John Bolton had an assessment, and Mike Pompeo had an assessment, and just the comical manner in which it was handled makes it very hard to take seriously or to see it as anything other than just a nakedly partisan assault on the test moratorium and the test ban.

Ariel Conn: So I want to follow up about what the implications are for the test ban, but I want to go back real quick just to some of the technical side of identifying a low-yield explosion. I actually have a background in seismology, so I know that it’s not that big of a challenge for people who study seismic waves to recognize the difference between an earthquake and a blast. And so I’m wondering how small a low yield test actually is. Is it harder to identify, or are there just not seismic stations that the U.S. has access to, or is there something else involved?

Jeffrey Lewis: Well so these are called hydronuclear experiments. They are so incredibly small. They are, on the order in the U.S., there’s something like four pounds of explosive, so basically less explosion than the actual conventional explosions that are used to detonate the nuclear weapon. Some people think the Russians have a slightly bigger definition that might go up to 100 kilograms, but these are mouse farts. They are so small that unless you have the seismic station sitting right next to it, you would never know.

In a way, I think that’s a perfect example of why we’re so skeptical because when the test ban was negotiated, there was this giant international monitoring system put into place. It is not just seismic stations, but it is hydroacoustic stations to listen underwater, infrasound stations to listen for explosions in the air, radionuclide stations to detect any radioactive particles that happen to escape in the event of a test. It’s all of this stuff and it is incredibly sensitive and can detect incredibly small explosions down to about 1,000 tons of explosive and in many cases even less.

And so what’s happened is the allegations against the Russians, every time we have better monitoring and it’s clear that they’re not doing the bigger things, then the allegations are they’re doing ever smaller things. So, again, the way in which it was rolled out was kind of comical and caused us, at least me, to have some doubts about it. It is also the case that the nature of the allegation –– that it’s these tiny, tiny, tiny, tiny experiments, which U.S. scientists, by the way, have said they don’t have any interest in doing because they don’t think they are useful –– it’s almost like the perfect accusation and so that also to me is a little bit suspicious in terms of the motives of the people claiming this is happening.

Alex Bell: I think it’s also important to remember when dealing with verification of treaties, we’re looking for things that would be militarily significant. That’s how we try to build the verification system: that if anybody tried to do anything militarily significant, we’d be able to detect that in enough time to respond effectively and make sure the other side doesn’t gain anything from the violation.

So you could say that experiments like this that our own scientists don’t think are useful are not actually militarily significant, so why are we bringing it up? Do we think that this is a challenge to the treaty overall or do we not like the nature of Russia’s violations? And further, if we’re concerned about it, we should be talking to the Russians instead of about them.

Jeffrey Lewis: I think that is actually the most important point that Alex just made. If you actually think that the Russians have a different definition of zero, then go talk to them and get the same definition. If you think that the Russians are conducting these tests, then talk to the Russians and see if you can get access. If the United States were to ratify the test ban and the treaty were to come into force, there is a provision for the U.S. to ask for an inspection. It’s just a little bit rich to me that the people making this allegation are also the people who refuse to do anything about it diplomatically. If they were truly worried, they’d try to fix the problem.

Ariel Conn: Regarding the fact that the Test-Ban Treaty isn’t technically in force, are a lot of the verification processes still essentially in force anyway?

Alex Bell: The International Monitoring System, as Jeff pointed out, was just sort of in its infancy when the treaty was negotiated and now it’s become this marvel of modern technology capable of detecting tests at even very low yields. And so it is up and running and functioning. It was monitoring the various North Korean nuclear tests that have taken place in this century. It also was doing a lot of additional science like tracking radioactive particulates that came from the Fukushima disaster back in 2011.

So it is functioning. It is giving readings to any party to the treaty, and it is particularly useful right now to have an independent international source of information of this kind. They specifically did put out a very brief statement following this accusation from the Defense Intelligence Agency saying that they had detected nothing that would indicate a test. So that’s about as far as I think they could get, as far as a diplomatic equivalent of, “What are you talking about?”

Jeffrey Lewis: I Googled it because I don’t remember it off the top of my head, but it’s 321 monitoring stations and 16 laboratories. So the entire monitoring system has been built out and it works far better than anybody thought it would. It’s just that once the treaty comes into force, there will be an additional provision, which is: in the event that the International Monitoring System, or a state party, has any reason to think that there is a violation, that country can request an inspection. And the CTBTO trains to send people to do onsite inspections in the event of something like this. So there is a mechanism to deal with this problem. It’s just that you have to ratify the treaty.

Ariel Conn: So what are the political implications, I guess, of the fact that the U.S. has not ratified this, but Russia has –– and that it’s been, I think you said 23 years? It sounds like the U.S. is frustrated with Russia, but is there a point at which Russia gets frustrated with the U.S.?

Jeffrey Lewis: I’m a little worried about that, yeah. The reality of the situation is I’m not sure that the United States can continue to reap the benefits of this monitoring system and the benefits of what I think Alex rightly described as a global norm against nuclear testing and sort of expect everybody else to restrain themselves while in the United States we refuse to ratify the treaty and talk about resuming nuclear testing.

And so I don’t think it’s a near term risk that the Russians are going to resume testing, but we have seen… We do a lot of work with satellite images at the Middlebury Institute and the U.S. has undertaken a pretty big campaign to keep its nuclear test site modern and ready to conduct a nuclear test on as little as six months’ notice. In the past few years, we’ve seen the Russians do the same thing.

For many years, they neglected their test site. It was in really poor shape and starting in about 2015, they started putting money into it in order to improve its readiness. So it’s very hard for us to say, “Do as we say, not as we do.”

Alex Bell: Yeah, I think it’s also important to realize that if the United States resumes testing, everyone will resume testing. The guardrails will be completely off and that doesn’t make any sense because having the most technologically advanced and capable nuclear weapons infrastructure like we do, we benefit from a global ban on explosive testing. It means we’re sort of locking in our own superiority.

Ariel Conn: So we’re putting that at risk. So I want to expand the conversation from just Russia and the U.S. to pull China in as well because the talk that Ashley gave was also about China’s modernization efforts. And he made some comments that sounded almost like maybe China is considering testing as well. I was sort of curious what your take on his China comments is.

Jeffrey Lewis: I’m going to jump in and be aggressive on this one because my doctoral dissertation was on the history of China’s nuclear weapons program. The class I teach at the Middlebury Institute is one in which we look at declassified U.S. intelligence assessments and then we look at Chinese historical materials in order to see how wrong the intelligence assessments were. This specifically covers U.S. assessments of China’s nuclear testing, and the U.S. just has an awful track record on this topic.

I actually interviewed the former head of China’s nuclear weapons program once, and I was talking to him about this because I was showing him some declassified assessments and I was sort of asking him about, you know, “Had you done this or had you done that?” He sort of kind of took it all in and he just kind of laughed, and he said, “I think many of your assessments were not very accurate.” There was sort of a twinkle in his eye as he said it because I think he was just sort of like, “We wrote a book about it, we told you what we did.”

Anything is possible, and the point of these allegations is events are so small that they are impossible to disprove, but to me, that’s looking at it backwards. If you’re going to cause a major international crisis, you need to come to the table with some evidence, and I just don’t see it.

Alex Bell: The GEM, the Group of Eminent Members, which is an advisory group to the CTBTO, put it best when they said the most effective way to sort of deal with this problem is to get the treaty into force. So we could have intrusive short notice onsite inspections to detect and deter any possible violations.

Jeffrey Lewis: I actually got in trouble, I got shushed because I was talking to a member and they were trying to work on this statement and they needed the member to come back in.

Ariel Conn: So I guess when you look at stuff like this –– so, basically, all three countries are currently modernizing their nuclear arsenals. Maybe we should just spend a couple minutes talking about that too. What does it mean for each country to be modernizing their arsenal? What does that sort of very briefly look like?

Alex Bell: Nuclear weapons delivery systems, nuclear weapons do age. You do have to maintain them, like you would with any weapon system, but fortunately, from the U.S. perspective, we have exceedingly capable scientists who are able to extend the life of these systems without testing. Jeffrey, if you want to go into what other countries are doing.

Jeffrey Lewis: Yeah. I think the simplest thing to do is to talk about, at least for the nuclear warheads part, I think as Alex mentioned, all of the countries are building new submarines, and missiles, and bombers that can deliver these nuclear weapons. And that’s a giant enterprise. It costs many billions of dollars every year. But when you actually look at the warheads themselves, I can tell you what we do in the United States. In some cases, we build new versions of existing designs. In almost all cases, we replace components as they age.

So the warhead design might stay the same, but piece by piece things get replaced. And because we’ve been replacing those pieces over time, if they have to put a new fuse in for a nuclear warhead, they don’t go back and build the ’70s era fuse. They build a new fuse. So even though we say that we’re only replacing the existing components and we don’t try to add new capabilities, in fact, we add new capabilities all the time because as all of these components get better, then the weapons themselves get better, and we’re altering the characteristics of the warheads.

So the United States has a warhead on its submarine-launched ballistic missiles, and the Trump administration just undertook a program to give it a capability so that we can turn down the yield. So if we want to make it go off with a very small explosion, they can do that. It’s a full plate of the kinds of changes that are being made, and I think we’re seeing that in Russia and China too.

They are doing all of the same things to preserve the existing weapons they have. They rebuild designs that they have, and I think that they tinker with those designs. And that is constrained somewhat by the fact that there is no explosive testing –– that makes it harder to do those things, which is precisely why we wanted this ban in the first place –– but everybody is playing with their nuclear weapons.

And I think just because there’s a testing moratorium, the scientists who do this, some of them, because they want to go back to nuclear testing or nuclear explosions, they say, “If we could only test with explosions, that would be better.” So there’s even more they want to do, but let’s not act like they don’t get to touch the bombs, because they play with them all the time.

Alex Bell: Yeah. It’s interesting you brought up the low yield option for our submarine-launched ballistic missiles because the House of Representatives actually in the defense appropriations and authorization process that it’s going through right now actually blocked further funding and the deployment of this particular type of warhead because, in their opinion, the President already had plenty of low-yield nuclear options, thank you very much. He doesn’t need any more.

Jeffrey Lewis: Of course, I don’t think this president needs any nuclear options, but-

Alex Bell: But it just shows there’s definitely a political and oversight feature that comes into this modernization debate. The idea that even if the forces that Jeffrey talked about who’ve always wanted to return to testing, even if they could prevail upon a particular administration to go in that direction, it’s unlikely Congress would be as sanguine about it.

Nevada, where our former nuclear testing site is, now the Nevada National Security Site –– it’s not clear that Nevadans are going to be okay with a return to explosive nuclear testing, nor will the people of Utah who sit downwind from that particular site. So there’s actually a “not in my backyard” kind of feature to the debate about further testing.

Jeffrey Lewis: Yeah. The Department of Energy has actually taken… Anytime they do a conventional explosion at the Nevada site, they keep it a secret because they were going to do a conventional explosion 10 or 15 years ago and people got wind of it and were outraged because they were terrified the conventional explosion would kick up a bunch of dust and that there might still be radioactive particulates.

I’m not sure that that was an accurate worry, but I think it speaks to the lack of trust that people around the test site have, given some of the irresponsible things that the U.S. nuclear weapons complex has done over the years. That’s a whole other podcast, but you don’t want to live next to anything that NNSA oversees.

Alex Bell: There’s also a proximity issue. Las Vegas is incredibly close to that facility. Back in the day when they did underground testing there, it used to shake the buildings on the Strip. And Las Vegas has only expanded from 20, 30 years ago, so you’re going to have a lot of people that would be very worried.

Ariel Conn: Yeah. So that’s actually a question that I had. I mean, we have a better idea today of what the impacts of nuclear testing are. Would Americans approve of nuclear weapons being tested on our ground?

Jeffrey Lewis: Probably if they didn’t have to live next to them.

Alex Bell: Yeah. I’ve been to some of the states where we conducted tests other than Nevada. So Colorado, where we tried to do this brilliant idea of whether we could do fracking via nuclear explosion. You can see the problems inherent in that idea. Alaska, New Mexico, obviously, where the first nuclear test happened. We also tested weapons in Mississippi. So all of these states have been affected in various ways and radioactive particulates from the sites in Nevada have drifted as far away as Maine, and scientists have been able to trace cancer clusters half a continent away.

Jeffrey Lewis: Yeah, I would add that –– Alex mentioned testing in Alaska –– so there was a giant test in 1971 in Alaska called Cannikin. It was five megatons. So a megaton is 1,000 kilotons. Hiroshima was 20 kilotons and it really made some Canadians angry and the consequence of the angry Canadians was they founded Greenpeace. So the whole iconic Greenpeace on a boat was originally driven by a desire to stop U.S. nuclear testing in Alaska. So, you know, people get worked up.

Ariel Conn: Do you think someone in the U.S. is actively trying to bring testing back? Do you think that we’re going to see more of this or do you think this might just go away?

Jeffrey Lewis: Oh yeah. There was a huge debate at the beginning of the Trump administration. I actually wrote this article making fun of Rick Perry, the Secretary of Energy, who I have to admit has turned out to be a perfectly normal cabinet secretary in an administration that looks like the Star Wars Cantina.

Alex Bell: It’s a low bar.

Jeffrey Lewis: It’s a low bar, and maybe just barely, but Rick got over it. But I was sort of mocking him and the article was headlined, “Even Rick Perry isn’t dumb enough to resume nuclear testing,” and I got notes, people saying, “This is not funny. This is a serious possibility.” So, yeah, I think there has long been a group of people who did not want to end testing. U.S. labs refused to prepare for the end of testing. So when the U.S. stopped, it was Congress just telling them to stop. They have always wanted to go back to testing, and these are the same people I think who are accusing the Russians of doing things, I think as much so that they can get out of the test ban as anything else.

Alex Bell: Yeah, I would agree with that assessment. Those people have always been here. It’s strange to me because most scientists have affirmed that we know more about our nuclear weapons now not blowing them up than we did before because of the advanced computer modeling, technological advances of the Stockpile Stewardship program, which is the program that extends the life of these warheads. They get to do a lot of great science, and they’ve learned a lot of things about our nuclear forces that we didn’t know before.

So it’s hard to make a case that it is absolutely necessary or would ever be absolutely necessary to return to testing. You would have to totally throw out our obligations that we have to things like the nuclear non-proliferation treaty, which is to pursue the cessation of an arms race in good faith, and a return to testing I think would not be very good faith.

Ariel Conn: Maybe we’ve sort of touched on this, but I guess it’s still not clear to me. Why would we want to return to testing? Especially if, like you said, the models are so good?

Jeffrey Lewis: I think you have to approach that question like an anthropologist. Because some countries are quite happy living under a test ban for exactly the reason that you pointed out, that they are getting all kinds of money to do all kinds of interesting science. And so the Chinese seem pretty happy about it; the UK, actually –– I’ve met some UK scientists who are totally satisfied with it.

But I think the culture in the U.S. laboratories, which had really nothing to do with the reliability of the weapons and everything to do with the culture of the lab, was like the day that a young designer became a man or a woman was the day that person’s design went out into the desert and they had to stand there and be terrified it wasn’t going to work, and then feel the big rumble. So I think there are different ways of doing science. I think the labs in the United States were and are sentimentally attached to solving these problems with explosions.

Alex Bell: There’s also sort of a strange desire to see them. My first trip out to the test site, I was the only woman on the trip and we were looking at the Sedan Crater, which is just this enormous crater from an explosion underground that was much bigger than we thought it was going to be. It made this, I think it’s seven football fields across, and to me, it was just sort of horrifying, and I looked at it with dread. And a lot of the people who were on the trip reacted entirely differently with, “I thought it would be bigger,” and, “Wouldn’t it be awesome to see one of these go off, just once?” and had a much different take on what these tests were for and what they sort of indicated.

Ariel Conn: So we can actually test nuclear weapons without exploding them. Can you talk about what the difference is between testing and explosions, and what that means?

Jeffrey Lewis: The way a nuclear weapon works is you have a sphere of fissile material –– so that’s plutonium or highly enriched uranium –– and that’s surrounded by conventional explosives. And around that, there are detonators and electronics to make sure that the explosives all detonate at the exact same moment so that they spherically compress or implode the plutonium or highly enriched uranium. So when it gets squeezed down, it makes a big bang, and then if it’s a thermonuclear weapon, then there’s something called a secondary, which complicates it.

But you can do that –– you can test all of those components, just as long as you don’t have enough plutonium or highly enriched uranium in the middle to cause a nuclear explosion. So you can fill it with just regular uranium, which won’t go critical, and so you could test the whole setup that way for all of the things in a nuclear weapon that would make it a thermonuclear weapon. There’s a variety of different fusion research techniques you can do to test those kinds of reactions.

So you can really simulate everything, and you can do as many computer simulations as you want, it’s just that you can’t put it all together and get the big bang. And so the U.S. has built this giant facility at Livermore called NIF, the National Ignition Facility, which is a many billion-dollar piece of equipment, in order to sort of simulate some of the fusion aspects of a nuclear weapon. It’s an incredible piece of equipment that has taught U.S. scientists far more than they ever knew about these processes when they were actually exploding things. It’s far better for them, and they can do that. It’s completely legal.

Alex Bell: Yeah, the most powerful computer in the world belongs to Los Alamos. Its job is to help simulate these nuclear explosions and process data related to the nuclear stockpile.

Jeffrey Lewis: Yeah, I got a kick –– I always check in on that list, and it’s almost invariably one of the U.S. nuclear laboratories that has the top computer. And then one time I noticed that the Chinese had jumped up there for a minute and it was their laboratory.

Alex Bell: Yup, it trades back and forth.

Jeffrey Lewis: Good times.

Alex Bell: A lot of the data that goes into this is observational information and technical readings that we got from when we did explosive testing. And our testing record is far more extensive than any other country, which is one of the reasons why we have sort of this advantage that would be locked in, in the event of a CTBT entering into force.

Ariel Conn: Yeah, I thought that was actually a really interesting point. I don’t know if there’s more to elaborate on it, but the idea that the U.S. could actually sacrifice some of its nuclear superiority by ––

Alex Bell: Returning to testing?

Ariel Conn: Yeah.

Alex Bell: Yeah, because if we go, everyone goes.

Ariel Conn: There were countries that still weren’t thrilled even with the testing that is allowed. Can you elaborate on that a little bit?

Alex Bell: Yes. A lot of countries, particularly the countries that back the Treaty on the Prohibition of Nuclear Weapons, which is a new treaty that does not have any nuclear weapon states as a part of it, but it’s a total ban on the possession and use of nuclear weapons, and those countries are particularly frustrated with what they see as the slow pace of disarmament by the nuclear weapon states.

The Nonproliferation Treaty, which is sort of the glue that holds all this together, was indefinitely extended back in 1995. The price for that from the non-nuclear weapon states was the commitment of nuclear weapon states to sign and ratify a comprehensive test ban. So 25 years later almost, they’re still waiting.

Ariel Conn: I will add that, I think as of this week, I believe three of the United States –– California, New Jersey and Oregon –– have passed resolutions supporting the U.S. joining the treaty that actually bans nuclear weapons, that recent one.

Alex Bell: Yeah. It’s been interesting, while it’s something that the verification measures –– Jeffrey might have some thoughts on this too –– to me, principles aside, the verification measures in the Treaty on the Prohibition of Nuclear Weapons make it sort of an unviable treaty. But from a messaging perspective, you’re seeing kind of the first time since the Cold War where citizenry around the world is saying, “You have to get rid of these weapons. They’re no longer acceptable. They’ve become liabilities, not assets.”

So while I don’t think the treaty itself is a workable treaty for the United States, I think that the sentiment behind it is useful in persuading leaders that we do need to do more on disarmament.

Jeffrey Lewis: I would just say that I think just like we saw earlier, there’s a lot of the U.S. wanting to have its cake and eat it too. And so the Nonproliferation Treaty, which is the big treaty that says, “Countries should not be able to acquire nuclear weapons,” it also commits the United States and the other nuclear powers to work toward disarmament. That’s not something they take seriously.

Just like with nuclear testing where you see this, “Oh, well, maybe we could edge back and do it,” you see the same thing just on disarmament issues generally. So having people out there who are insisting on holding the most powerful countries to account to make sure that they do their share, I also think is really important.

Ariel Conn: All right. So I actually think that’s sort of a nice note to end on. Is there anything else that you think is important that we didn’t get into or that just generally is important for people to know?

Alex Bell: I would just reiterate the point that if the U.S. government is truly concerned that Russia is conducting tests at even very low yields, that we need to be engaged in a conversation with them, that a global ban on nuclear explosive testing is good for every country in this world and we shouldn’t be doing things to derail the pursuit of such a treaty.

Ariel Conn: Agreed. All right, well, thank you both so much for joining today.

As always, if you’ve been enjoying the podcast, please take a moment to like it, share it, and maybe even leave a good review and I will be back again next month with another episode of the FLI Podcast.

FLI Podcast: Applying AI Safety & Ethics Today with Ashley Llorens & Francesca Rossi

As we grapple with questions about AI safety and ethics, we’re implicitly asking something else: what type of a future do we want, and how can AI help us get there?

In this month’s podcast, Ariel spoke with Ashley Llorens, the Founding Chief of the Intelligent Systems Center at the Johns Hopkins Applied Physics Laboratory, and Francesca Rossi, the IBM AI Ethics Global Leader at the IBM TJ Watson Research Lab and an FLI board member, about developing AI that will make us safer, more productive, and more creative. Too often, Rossi points out, we build our visions of the future around our current technology. Here, Llorens and Rossi take the opposite approach: let’s build our technology around our visions for the future.

Topics discussed in this episode include:

  • Hopes for the future of AI
  • AI-human collaboration
  • AI’s influence on art and creativity
  • The UN AI for Good Summit
  • Gaps in AI safety
  • Preparing AI for uncertainty
  • Holding AI accountable

Ariel: Hello and welcome to another episode of the FLI podcast. I’m your host Ariel Conn, and today we’ll be looking at how to address safety and ethical issues surrounding artificial intelligence, and how we can implement safe and ethical AIs both now and into the future. Joining us this month are Ashley Llorens and Francesca Rossi who will talk about what they’re seeing in academia, industry, and the military in terms of how AI safety is already being applied and where the gaps are that still need to be addressed.

Ashley is the Founding Chief of the Intelligent Systems Center at the Johns Hopkins Applied Physics Laboratory where he directs research and development in machine learning, robotics, autonomous systems, and neuroscience all towards addressing national and global challenges. He has served on the Defense Science Board, the Naval Studies Board of the National Academy of Sciences, and the Center for a New American Security’s AI task force. He is also a voting member of the Recording Academy, which is the organization that hosts the Grammy Awards, and I will definitely be asking him about that later in the show.

Francesca is the IBM AI Ethics Global Leader at the IBM TJ Watson Research Lab. She is an advisory board member for FLI, a founding board member for the Partnership on AI, a deputy academic director of the Leverhulme Centre for the Future of Intelligence, a fellow with AAAI and EurAI (that’s e-u-r-a-i), and she will be the general chair of AAAI in 2020. She was previously Professor of Computer Science at the University of Padova in Italy, and she’s been president of IJCAI and the editor-in-chief of the Journal of AI Research. She is currently joining us from the United Nations AI For Good Summit, which I will also ask about later in the show.

So Ashley and Francesca, thank you so much for joining us today.

Francesca: Thank you.

Ashley: Glad to be here.

Ariel: Alright. The first question that I have for both of you, and Ashley, maybe I’ll direct this towards you first: basically, as you look into the future and you look at artificial intelligence becoming more of a role in our everyday lives — before we look at how everything could go wrong, what are we striving for? What do you hope will happen with artificial intelligence and humanity?

Ashley: My perspective on AI is informed a lot by my research and experiences at the Johns Hopkins Applied Physics Lab, which I’ve been at for a number of years. My earliest explorations had to do with applications of artificial intelligence to robotics systems, in particular underwater robotics systems, systems where signal processing and machine learning are needed to give the system situational awareness. And of course, light doesn’t travel very well underwater, so it’s an interesting task to make a machine see with sound for all of its awareness and all of its perception.

And in that journey, I realized how hard it is to have AI-enabled systems capable of functioning in the real world. That’s really been a personal research journey that’s turned into an institution-wide research journey for Johns Hopkins APL writ large. And we’re a large not-for-profit R & D organization that does national security, space exploration, and health. We’re about 7,000 folks or so across many different disciplines, but many scientists and engineers working on those kinds of problems — we say critical contributions to critical challenges.

So as I look forward, I’m really looking at AI-enabled systems, whether they’re algorithmic in cyberspace or they’re real-world systems that are really able to act with greater autonomy in the context of these important national and global challenges. So for national security: to have robotic systems that can be where people don’t want to be, in terms of being under the sea or even having a robot go into a situation that could be dangerous so a person doesn’t have to. And to have that system be able to deal with all the uncertainty associated with that.

You look at future space exploration missions where — in terms of AI for scientific discovery, we talk a lot about that — imagine a system that can perform science with greater degrees of autonomy and figure out novel ways of using its instruments to form and interrogate hypotheses when billions of miles away. Or in health applications where we can have systems more ubiquitously interpreting data and helping us to make decisions about our health to increase our lifespan, or health span as they say.

I’ve been accused of being a techno-optimist, I guess. I don’t think technology is the solution to everything, but it is my personal fascination. And in general, just having this AI capable of adding value for humanity in a real world that’s messy and sloppy and uncertain.

Ariel: Alright. Francesca, you and I have talked a bit in the past, and so I know you do a lot of work with AI safety and ethics. But I know you’re also incredibly hopeful about where we can go with AI. So if you could start by talking about some of the things that you’re most looking forward to.

Francesca: Sure. Ashley focused in part on the need to develop autonomous AI systems that can act where humans cannot go, for example, and that’s definitely very, very important. I would like to focus more on the need for AI systems that can actually work together with humans, augmenting our own capabilities to make decisions or to function in our work environment or in our private environment. That’s the purpose of AI that I see and that I work on, and I focus on the challenges in making these systems really work well with humans.

Of course, it may seem that in some sense it’s easier to develop an AI system that works together with humans because there is complementarity: some things are done by humans, some things are done by the machine. But actually, there are several additional challenges because you want these two entities, the human and the machine, to actually become a real team and collaborate to achieve a certain goal. You want these machines to be able to communicate and interact in a very natural way with human beings, and you want these machines to be not just reactive to commands, but also proactive in trying to understand what the human being needs in that moment, in that context, in order to provide all the information and knowledge that person needs from the data that surrounds whatever task is going to be addressed.

That’s also the focus of IBM’s business model, because of course IBM releases AI to be used in other companies so that their professionals can use it to do their jobs better. And it has many, many different interesting research directions. The one that I’m mostly focused on is value alignment. How do you make sure that these systems know and are aware of the values and the ethical principles that they should follow while trying to help human beings do whatever they need to do? And there are many ways that you can do that and many ways to model them to reason with these ethical principles and so on.

Being here in Geneva at AI For Good, I think that the emphasis here is, and rightly so, on the sustainable development goals of the UN: these 17 goals that define a vision of the future, the future that we want. And we’re trying to understand how we can leverage technologies such as AI to achieve that vision. The vision can be slightly nuanced and different, but to me, the development of advanced AI is not the end goal, but is only a way to get to the vision of the future that I have. And so, to me, this AI For Good Summit and the 17 sustainable development goals define a vision of the future that is important to have when one has in mind how to improve technology.

Ariel: For listeners who aren’t as familiar with the sustainable development goals, we can include links to what all of those are in the podcast description.

Francesca: I was impressed at this AI For Good Summit. This Summit started three years ago with about 400 people. Then last year it was about 500 people, and this year there are 3,200 registered participants. That really gives you an idea of how more and more people are interested in these subjects.

Ariel: Have you also been equally impressed by the topics that are covered?

Francesca: Well, I mean, it started today. So I just saw in the morning that there are five different parallel sessions that will go throughout the following two days. One is AI education and learning. One is health and wellbeing. One is AI, human dignity, and inclusive society. One is scaling AI for good. And one is AI for space. These five themes will go throughout two days together with many other smaller ones. But from what I’ve seen this morning, the level of the discussion is really very high. It’s going to be very impactful. Each event has its own specificity, but this event is unique because it’s focused on a vision of the future, which in this case is the sustainable development goals.

Ariel: Well, I’m really glad that you’re there. We’re excited to have you there. And so, you’re talking about moving towards futures where we have AIs that can do things that humans either can’t do, don’t want to do, or can’t do safely, visions where we can achieve more because we’re working with AI systems as opposed to just humans trying to do things alone. But we still have to get to those points where this is being implemented safely and ethically.

I’ll come back to the question of what we’re doing right so far, but first, what do you see as the biggest gaps in AI safety and ethics? And this is a super broad question, but looking at it with respect to, say, the military or industry or academia. What are some of the biggest problems you see in terms of us safely applying AI to solve problems?

Ashley: It’s a really important question. My answer is going to center around uncertainty and dealing with that in the context of the operation of the system, and let’s say the implementation or the execution of the ethics of the system as well. But first, backing up to Francesca’s comment, I just want to emphasize this notion of teaming and really embrace this narrative in my remarks here.

I’ve heard it said before that every machine is part of some human workflow. I think a colleague Matt Johnson at the Florida Institute for Human and Machine Cognition says that, which I really like. And so, just to make clear, whether we’re talking about the cognitive enhancements, an application of AI where maybe you’re doing information retrieval, or even a space exploration example, it’s always part of a human-machine team. In the space exploration example, the scientists and the engineers are on the earth, maybe many light hours away, but the machines are helping them do science. But at the end of the day, the scientific discovery is really happening on earth with the scientists. And so, whether it’s a machine operating remotely or by cognitive assistance, it’s always part of a human-machine team. That’s just something I wanted to amplify that Francesca said.

But coming back to the gaps, a lot of times I think what we’re missing in our conversations is getting some structure around the role of uncertainty in these agents that we’re trying to create that are going to help achieve that bright future that Francesca was referring to. To help us think about this at APL, we think about agents as needing to perceive, decide, act in teams. This is a framework that just helps us understand these general capabilities that we’ll need and to start thinking about the role of uncertainty, and then combinations of learning and reasoning that would help agents to deal with that. And so, if you think about an agent pursuing goals, the first thing it has to do is get an understanding of the world states. This is this task of perception.

We often talk about, well, if an agent sees this or that, or if an agent finds itself in this situation, we want it to behave this way. Obviously, the trolley problem is an example we revisit often. I won’t go into the details there, but the question is, I think, given some imperfect observation of the world, how does the structure of that uncertainty factor into the correct functioning of the agent in that situation? And then, how does that factor into the ethical, I’ll say, choices or data-driven responses that an agent might have to that situation?

Then we talk about decision making. An agent has goals. In order to act on its goals, it has to decide about how certain sequences of actions would affect future states of the world. And then again how, in the context of an uncertain world, is the agent going to go about accurately evaluating possible future actions when it’s outside of a gaming environment, for example. How does uncertainty play into that and its evaluation of possible actions? And then in the carrying out of those actions, there may be physical reasoning, geometric reasoning that has to happen. For example, if an agent is going to act in a physical space, or reasoning about a cyber-physical environment where there’s critical infrastructure that needs to be protected or something like that.

And then finally, to Francesca’s point, the interactions, or the teaming with other agents that may be teammates or actually may be adversarial. And so, how does the reasoning about what my teammates might be intending to do, what the state of my teammates might be in terms of cognitive load if it’s a human teammate, what might the intent of adversarial agents be in confounding or interfering with the goals of the human-machine team?

And so, to recap a little bit, I think this notion of machines dealing with uncertainty in real world situations is one of the key challenges that we need to deal with over the coming decades. And so, I think having more explicit conversations about how uncertainty manifests in these situations, how you deal with it in the context of the real world operation of an AI-enabled system, and then how we give structure to the uncertainty in a way that should inform our ethical reasoning about the operation of these systems. I think that’s a very worthy area of focus for us over the coming decades.

Ariel: Could you walk us through a specific example of how an AI system might be applied and what sort of uncertainties it might come across?

Ashley: Yeah, sure. So think about the situation where there’s a dangerous environment, let’s say, in a policing action or in a terrorist situation. Hey, there might be hostiles in this building, and right now a human being might have to go into that building to investigate it. We’ll send a team of robots in there to do the investigation of the building to see if it’s safe, and you can think about that situation as analogous for a number of possible different situations.

And now, let’s think about the state of computer vision technology, where straight pattern recognition is hopefully a fair characterization of the state of the art, where we know we can very accurately recognize objects from a given universe of objects in a computer vision feed, for example. Well, now what happens if these agents encounter objects from outside of that universe of training classes? How can we start to bound the performance of the computer vision algorithm with respect to objects from unknown classes? You can start to get a sense of that progression, just from the perception part of that problem: from “of these 200 possible objects, tell me which class this one comes from” to having to do vision-type tasks in environments that would present many new and novel objects that the agents may have to perceive and reason about.

You can think about that perception task now as extending to agents that might be in that environment, and trying to ascertain, from partial observations of what the agents might look like and partial observations of the things they might be doing, some assessment of “this is a friendly agent” or “this is an unfriendly agent,” and then to reasoning about affordances of objects in the environment that might present our systems with ways of dealing with those agents that conform to ethical principles.

That was not a very, very concrete example, but hopefully starts to get one level deeper into the kinds of situations we want to put systems into and the kinds of uncertainty that might arise.
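To make the unknown-class problem above a little more concrete, here is a minimal sketch that is not from the conversation itself. It assumes a classifier that outputs one raw score per known class and uses a simple confidence threshold to answer “unknown” instead of forcing a choice among the training classes. The function names, the labels, and the 0.8 threshold are all hypothetical, and thresholding softmax confidence is only a baseline: real classifiers can still be confidently wrong on novel objects.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def classify_open_set(scores, labels, threshold=0.8):
    """Return (label, confidence), or ("unknown", confidence) when the
    best class is not confident enough. The threshold is an assumed
    tuning parameter, not a recommended value."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return labels[best], probs[best]


LABELS = ["door", "chair", "person"]  # hypothetical universe of known classes

# One class clearly dominates: the classifier commits to it.
confident = classify_open_set([6.0, 1.0, 0.5], LABELS)

# Scores are nearly tied: the system declines to pick a known class.
ambiguous = classify_open_set([1.2, 1.0, 0.9], LABELS)

print(confident[0], ambiguous[0])
```

The design point is only that the system has an explicit way to say “I do not recognize this,” which downstream decision making can then treat as its own kind of uncertainty.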

Francesca: To tie to what Ashley just said, we definitely need a lot more ways to have realistic simulations of what can happen in real life. So testbeds, sandboxes, that is definitely needed. But related to that, there is also this ongoing effort — which has already resulted in tools and mechanisms, but many people are still working on it — which is to understand better the error landscape that the machine learning approach may have. We know machine learning always has a small percentage of error in any given situation and that’s okay, but we need to understand what’s the robustness of the system in terms of that error, and also we need to understand the structure of that error space because this information can inform us on what are the most appropriate or less appropriate use cases for the system.

Of course, going from there, this understanding of the error landscape is just one aspect of the need for transparency on the capabilities and limitations of AI systems when they are deployed. It’s a challenge that spans from academia or research centers to, of course, the business units and the companies developing and delivering AI systems. So that’s why at IBM we are working a lot on this issue of collecting information during the design and development phases about the properties of the systems, because we think that understanding these properties is very important to really understand what should or should not be done with the system.

And then, of course, there is, as you know, a lot of work around understanding other properties of the system. Like, fairness is one of the values that we may want to inject, but of course it’s not as simple as it looks because there are many, many definitions of fairness and each one is more appropriate or less appropriate in certain scenarios and certain tasks. It is important to identify the right one at the beginning of the design and the development process, and then to inject mechanisms to detect and mitigate bias according to that notion of fairness that we have decided is the correct one for that product.

And so, this brings us also to the other big challenge, which is to help developers understand how to define these notions, these values like fairness that they need to use in developing the system: how to define them not just by themselves within the tech company, but also by communicating with the communities that are going to be impacted by these AI products, and that may have something to say on what is the right definition of fairness that they care about. That’s why, for example, besides developing research and products, we also invest a lot in educating developers, trying to help them understand in their everyday jobs how to think about these issues, whether it’s fairness, robustness, transparency, and so on.

And so, we built this very small booklet (we call it the Everyday AI Ethics Guide for Designers and Developers) that raises a lot of questions that should be in their minds in their everyday jobs. Because as you know, for example, if you don’t think about bias or fairness during these development phases and you only check whether your product is fair when it’s ready to be deployed, then you may discover that it doesn’t have the right notion of fairness and that you actually need to start from scratch again.

Another effort that we really care a lot about in building teams of humans and machines is the issue of explainability, to make sure that it is possible to understand why these systems are recommending certain decisions. Explainability is very important, especially in this environment of human-machine teaming, because if AI systems are not capable of explaining why they are recommending certain decisions, then the human members of the team will not trust the AI system in the long run, and so possibly will not adopt it. And so we will also lose the positive and beneficial effect of the AI system.

The last thing that I want to say is that this education of developers actually extends much beyond the developers, to the policy makers as well. That’s why it’s important to have a lot of interaction with policy makers, who really need to be educated about the state of the art, about the challenges, and about the limits of current AI, in order to understand how to best drive the current technology to be more and more advanced, but also beneficial. And what are the right mechanisms to drive the technology in the direction that we want? That still needs a lot more multi-stakeholder discussion to really achieve the best results, I think.

Ashley: Just picking up on a couple of those themes that Francesca raised: first, I just want to touch on simulations. At the Applied Physics Laboratory, one of the core things we do is develop systems for the real world. And so, as the tools of artificial intelligence are evolving, the art and the science of systems engineering is starting to morph into this AI systems engineering regime. And we see simulation as key, more key than it’s ever been, to developing real world systems that are enabled by AI.

One of the things we’re really looking into now is what we call live virtual constructive simulations. These are simulations in which you can do distributed learning for agents in a constructive mode, where you have highly parallelized learning, but where you actually have links and hooks for live interactions with humans to get the human-machine teaming. And then finally, they bridge the gap between simulation and the real world: some of the agents represented in the context of the human-machine teaming functionality can be virtual, and some can actually be represented by real systems in the real world. And so, we think that these kinds of environments, these live virtual constructive environments, will be important for bridging the gap from simulation to real.

Now, in the context of that is this notion of sharing information. If you think about the complexity of the systems that we’re building, and the complexity and the uncertainty of the real world conditions, whether that’s physical or cyber or what have you, it’s going to be more and more challenging for a single development team to analytically characterize the performance of the system in the context of real-world environments. And so, I think as a community we’re really doing science; we’re performing science by fielding these complex systems in these real-world environments. And so, the more we can make that a collective scientific exploration where we’re setting hypotheses and performing these experiments, these experiments of deploying AI in real world situations, the more quickly we’ll make progress.

And then, finally, I just wanted to talk about accountability, which I think builds on this notion of transparency and explainability. From what I can see, and this is something we don’t talk about enough, I think we need to change our notion of accountability when it comes to AI-enabled systems. I think our human nature is we want individual accountability for individual decisions and individual actions. If an accident happens, our whole legal system, our whole accountability framework is, “Well, tell me exactly what happened that time,” and I want to get some accountability based on that and I want to see something improve based on that. Whether it’s a plane crash or a car crash, or let’s say there’s corruption in a Fortune 500 company, we want to see the CFO fired and we want to see a new person hired.

I think when you look at these algorithms, they’re driven by statistics, and the statistics that drive these models are really not well suited for individual accountability. It’s very hard to establish the validity of a particular answer or classification or something that comes out of the algorithm. Rather, we’re really starting to look at the performance of these algorithms over a period of time. It’s hard to say, “Okay, this AI-enabled system: tell me what happened on Wednesday,” or, “Let me hold you accountable for what happened on Wednesday.” It’s more a matter of, “Let me hold you accountable for everything that you did during the month of April that resulted in this performance.”

And so, I think our notion of accountability is going to have to embrace this notion of ensemble validity, validity over a collection of activities, actions, decisions. Because right now, I think if you look at the underlying mathematical frameworks for these algorithms, they’re not well supported for this notion of individual accountability for decisions.
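As a rough illustration of what validity over a collection of decisions could look like in practice, here is a minimal sketch that is not from the conversation itself. It assumes each decision in a period can be marked correct or incorrect after the fact, and it reports a Wilson score interval for the success rate over the whole period rather than a verdict on any single decision. The outcome counts, the 0.90 requirement, and the variable names are made up for the example.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a success rate (z=1.96 is roughly 95%)."""
    if trials == 0:
        raise ValueError("need at least one decision")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p * (1 - p) / trials + z * z / (4 * trials * trials)
    ) / denom
    return center - half, center + half


# Hypothetical record of one month of decisions: True means the decision
# was later judged correct.
april_outcomes = [True] * 940 + [False] * 60

low, high = wilson_interval(sum(april_outcomes), len(april_outcomes))

# Hold the system accountable for the ensemble: is the lower bound of its
# success rate over the month above an agreed requirement (assumed 0.90)?
meets_requirement = low >= 0.90

print(f"success rate in [{low:.3f}, {high:.3f}], meets 0.90: {meets_requirement}")
```

A statement of this kind says nothing about why any one decision on a given Wednesday went wrong, which is exactly the gap between statistical performance and individual accountability described above.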

Francesca: Accountability is very important. It needs a lot more discussion. This is also one of the topics that we have been discussing in this initiative by the European Commission to define the AI Ethics Guidelines for Europe, and accountability is one of the seven requirements. But it’s not easy to define what it means. What Ashley said is one possibility: change our idea of accountability from one specific instance to several instances. That’s one possibility, but I think that’s something that needs a lot more discussion with several stakeholders.

Ariel: You’ve both mentioned some things that sound like we’re starting to move in the right direction. Francesca, you talked about getting developers to think about some of the issues like fairness and bias before they start to develop things. You talked about trying to get policy makers more involved. Ashley, you mentioned the live virtual simulations. Looking at where we are today, what are some of the things that you think have been most successful in moving towards a world where we’re considering AI safety more regularly, or completely regularly?

Francesca: First of all, we’ve gone a really long way in a relatively short period of time, and the Future of Life Institute has been instrumental in building the community, and everybody understands that the only approach to address this issue is a multidisciplinary, multi-stakeholder approach. The Future of Life Institute, with the first Puerto Rico conference, showed very clearly that this is the approach to follow. So I think that in terms of building the community that discusses and identifies the issues, I think we have done a lot.

I think that at this point, what we need is greater coordination and also redundancy removal among all these different initiatives. I think we have to find, as a community, the main issues and the main principles and guidelines that we think are needed for the development of more advanced forms of AI, starting from the current state of the art. If you look at these guidelines or lists of principles around AI ethics from the various initiatives, they are of course different from each other but they have a lot in common. So we really were able to identify these issues, and this identification of the main issues is important as we move forward to more advanced versions of AI.

And then, I think another thing that we are also doing in a rather successful though not complete way is trying to move from research to practice: from high-level principles to concretely developing and deploying products that embed these principles and guidelines, not just in the scientific papers that are published, but also in the platforms, the services, and the tool kits that companies use with their clients. We needed an initial phase where there were high level discussions about guidelines and principles, but now we are in the second phase where these go and percolate down to the business units and to how products are built and deployed.

Ashley: Yeah, just building on some of Francesca’s comments, I’ve been very inspired by the work of the Future of Life Institute and the burgeoning, I’ll say, emerging AI safety community. Similar to Francesca’s comment, I think that the real frontier here is now taking a lot of that energy, a lot of that academic exploration, research, and analysis and starting to find the intersections of a lot of those explorations with the real systems that we’re building.

You’re definitely seeing within IBM, as Francesca mentioned, within Microsoft, within more applied R & D organizations like Johns Hopkins APL, where I am, internal efforts to try to bridge the gap. And what I really want to try to work to catalyze in the coming years is a broader, more community-wide intersection between the academic research community looking out over the coming centuries and the applied research community that’s looking out over the coming decades, and find the intersection there. How do we start to pose a lot of these longer term challenge problems in the context of real systems that we’re developing?

And maybe we get to examples. Let’s say, for ethics, beyond the trolley problem and into posing problems that are more real-world or closer, better analogies to the kinds of systems we’re developing, the kinds of situations they will find themselves in, and start to give structure to some of the underlying uncertainty. Having our debates informed by those things.

Ariel: I think that transitions really nicely to the next question I want to ask you both, and that is, over the next 5 to 10 years, what do you want to see out of the AI community that you think will be most useful in implementing safety and ethics?

Ashley: I’ll probably sound repetitive, but I really think focusing in on characterizing — I think I like the way Francesca put it — the error landscape of a system as a function of the complex internal states and workings of the system, and the complex and uncertain real-world environments, whether cyber or physical that the system will be operating in, and really get deeper there. It’s probably clear to anyone that works in the space that we really need to fundamentally advance the science and the technology. I’ll start to introduce the word now: trust, as it pertains to AI-enabled systems operating in these complex and uncertain environments. And again, starting to better ground some of our longer-term thinking about AI being beneficial for humanity and grounding those conversations into the realities of the technologies as they stand today and as we hope to develop and advance them over the next few decades.

Francesca: Trust means building trust in the technology itself — and so the things that we already mentioned like making sure that it’s fair, value aligned, robust, explainable — but also building trust in those that produce the technology. But then, I mean, this is the current topic: How do we build trust? Because without trust we’re not going to adopt the full potential of the beneficial effect of the technology. It makes sense to also think in parallel, and more in the long-term, what’s the right governance? What’s the right coordination of initiatives around AI and AI ethics? And this is already a discussion that is taking place.

And then, after governance and coordination, it’s also important with more and more advanced versions of AI, to think about our identity, to think about the control issues, to think in general about this vision of the future, the wellbeing of the people, of the society, of the planet. And how to reverse engineer, in some sense, from a vision of the future to what it means in terms of a behavior of the technology, behavior of those that produce the technology, and behavior of those that regulate the technology, and so on.

We need a lot more of this reverse engineering approach. One approach is to start from the current state of the art of the technology and say, “Okay, these are the properties that I think I want in this technology: fairness, robustness, transparency, and so on, because I wouldn’t like this technology to be deployed without these properties.” And then see what happens in the next, more advanced version of the technology, and think about possibly new properties and so on. But the other approach is to say, “Okay, this is the vision of life in, I don’t know, 50 years from now. How do I go from that to the kind of technology, to the direction that I want to push the technology towards to achieve that vision?”

Ariel: We are getting a little bit short on time, and I did want to follow up with Ashley about his other job. Basically, Ashley, as far as my understanding, you essentially have a side job as a hip hop artist. I think it would be fun to just talk a little bit in the last couple of minutes that we have about how both you and Francesca see artificial intelligence impacting these more creative fields. Is this something that you see as enhancing artists’ abilities to do more? Do you think there’s a reason for artists to be concerned that AI will soon be a competition for them? What are your thoughts for the future of creativity and AI?

Ashley: Yeah. It’s interesting. As you point out, over the last decade or so, in addition to furthering my career as an engineer, I’ve also been a hip hop artist and I’ve toured around the world and put out some albums. I think where we see the biggest impact of technology on music and creativity is, one, in the democratization of access to creation. Technology is a lot cheaper. Having a microphone and a recording setup or something like that, from the standpoint of somebody that does vocals like me, is much more accessible to many more people. And then, you see advances and, you know, when I started doing music I would print CDs and press vinyl. There was no iTunes. And iTunes has just revolutionized how music is accessed by people, and more generally how creative products are accessed by people, in streaming, etc. So I think looking backward, we’ve seen most of the impact of technology on those two things: access to the creation and then access to the content.

Looking forward, will those continue to be the dominant factors in terms of how technology is influencing the creation of music, for example? Or will there be something more? Will AI start to become more of a creative partner? We’ll see that and it will be evolutionary. I think we already see technology being a creative partner more and more so over time. A lot of the things that I studied in school, like digital signal processing and frequency-selective filtering, are baked into the tools already. And just as we see AI helping to interpret other kinds of signal processing products like radiology scans, we’ll see more and more of that in the creation of music: for example, if I’m looking for samples from other music, an AI assistant that can comb through a large library of music and find good samples for me. Just as we do with Instagram filters, with an AI suggesting good filters for pictures I take on my iPhone, you can see in music an AI suggesting good audio filters or good mastering settings or something, given a song that I’m trying to produce or goals that I have for the feel and tone of the product.

And so, already I think as an evolutionary step, not even a revolutionary step, AI becoming more present in the creation of music. I think maybe, as in other application areas, we may see, again, AI being more of a teammate, not only in the creation of the music, but in the playing of the music. I heard an article or a cast on NPR about a piano player that developed an AI accompaniment for himself. And so, as he played in a live show, for example, there would be an AI accompaniment and you could dial back the settings on it in terms of how aggressive it was in rhythm and time, where it situated with respect to the lead performer. Maybe in hip hop we’ll see AI hype men or AI DJs. It’s expensive to travel overseas, and so somebody like me goes overseas to do a show, and instead of bringing a DJ with me, I have an AI program that can select my tracks and add cuts at the right places and things like that. So that was a long-winded answer, but there’s a lot there. Hopefully that was addressing your question.

Ariel: Yeah, absolutely. Francesca, did you have anything you wanted to add about what you think AI can do for creativity?

Francesca: Yeah. I mean, of course I’m less familiar with what AI is already doing right now, but I am aware of many systems from companies in the space of delivering content or music and so on, systems where the AI part is helping humans develop their own creativity even further. And as Ashley said, I hope that in the future AI can help us be more creative, even people who maybe are less able than Ashley to be creative themselves. And I hope that this will enhance the creativity of everybody, because this will enhance creativity, yes, in hip hop or in making songs or in other things, but I also think it will help to solve some very fundamental problems, because a population which is more creative is, of course, more creative in everything.

So in general, I hope that AI will help us human beings be more creative in all aspects of our life besides entertainment — which is of course very, very important for all of us for the wellbeing and so on — but also in all the other aspects of our life. And this is the goal that I think — going to the beginning where I said AI’s purpose should be the one of enhancing our own capabilities. And of course, creativity is also a very important capability that human beings have.

Ariel: Alright. Well, thank you both so much for joining us today. I really enjoyed the conversation.

Francesca: Thank you.

Ashley: Thanks for having me. I really enjoyed it.

Ariel: For all of our listeners, if you have been enjoying this podcast, please take a moment to like it or share it and maybe even give us a good review. And we will be back again next month.

AI Alignment Podcast: On Consciousness, Qualia, and Meaning with Mike Johnson and Andrés Gómez Emilsson

Consciousness is a concept which is at the forefront of much scientific and philosophical thinking. At the same time, there is large disagreement over what consciousness exactly is and whether it can be fully captured by science or is best explained away by a reductionist understanding. Some believe consciousness to be the source of all value and others take it to be a kind of delusion or confusion generated by algorithms in the brain. The Qualia Research Institute takes consciousness to be something substantial and real in the world that they expect can be captured by the language and tools of science and mathematics. To understand this position, we will have to unpack the philosophical motivations which inform this view, the intuition pumps which lend themselves to these motivations, and then explore the scientific process of investigation which is born of these considerations. Whether you take consciousness to be something real or illusory, the implications of these possibilities certainly have tremendous moral and empirical implications for life’s purpose and role in the universe. Is existence without consciousness meaningful?

In this podcast, Lucas spoke with Mike Johnson and Andrés Gómez Emilsson of the Qualia Research Institute. Andrés is a consciousness researcher at QRI and is also the Co-founder and President of the Stanford Transhumanist Association. He has a Master’s in Computational Psychology from Stanford. Mike is Executive Director at QRI and is also a co-founder. Mike is interested in neuroscience, philosophy of mind, and complexity theory.

Topics discussed in this episode include:

  • Functionalism and qualia realism
  • Views that are skeptical of consciousness
  • What we mean by consciousness
  • Consciousness and causality
  • Marr’s levels of analysis
  • Core problem areas in thinking about consciousness
  • The Symmetry Theory of Valence
  • AI alignment and consciousness

You can take a short (3 minute) survey to share your feedback about the podcast here.

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, iTunes, Google Play, Stitcher, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can learn more about consciousness research at the Qualia Research Institute, Mike’s blog, and Andrés’s blog. You can listen to the podcast above or read the transcript below. Thanks to Ian Rusconi for production and edits as well as Scott Hirsh for feedback.

Lucas: Hey, everyone. Welcome back to the AI Alignment Podcast. I’m Lucas Perry, and today we’ll be speaking with Andrés Gómez Emilsson and Mike Johnson from the Qualia Research Institute. In this episode, we discuss the Qualia Research Institute’s mission and core philosophy. We get into the differences between, and arguments for and against, functionalism and qualia realism. We discuss definitions of consciousness and how consciousness might be causal, we explore Marr’s Levels of Analysis, and we discuss the Symmetry Theory of Valence. We also get into identity and consciousness, the is-ought problem, and what this all means for AI alignment and building beautiful futures.

And then we end on some fun bits, exploring the potentially large amounts of qualia hidden away in cosmological events, and whether or not our universe is something more like heaven or hell. And remember, if you find this podcast interesting or useful, like, comment, subscribe, and follow us on your preferred listening platform. You can continue to help make this podcast better by participating in a very short survey linked in the description of wherever you might find this podcast. It really helps. Andrés is a consciousness researcher at QRI and is also the Co-founder and President of the Stanford Transhumanist Association. He has a Master’s in Computational Psychology from Stanford. Mike is Executive Director at QRI and is also a co-founder.

He is interested in neuroscience, philosophy of mind, and complexity theory. And so, without further ado, I give you Mike Johnson and Andrés Gómez Emilsson. So, Mike and Andrés, thank you so much for coming on. Really excited about this conversation and there’s definitely a ton for us to get into here.

Andrés: Thank you so much for having us. It’s a pleasure.

Mike: Yeah, glad to be here.

Lucas: Let’s start off just talking to provide some background about the Qualia Research Institute. If you guys could explain a little bit, your perspective of the mission and base philosophy and vision that you guys have at QRI. If you could share that, that would be great.

Andrés: Yeah, for sure. I think one important point is there’s some people that think that really what matters might have to do with performing particular types of algorithms, or achieving external goals in the world. Broadly speaking, we tend to focus on experience as the source of value, and if you assume that experience is a source of value, then really mapping out what is the set of possible experiences, what are their computational properties, and above all, how good or bad they feel seems like an ethical and theoretical priority to actually make progress on how to systematically figure out what it is that we should be doing.

Mike: I’ll just add to that, this thing called consciousness seems pretty confusing and strange. We think of it as pre-paradigmatic, much like alchemy. Our vision for what we’re doing is to systematize it and to do to consciousness research what chemistry did to alchemy.

Lucas: To sort of summarize this, you guys are attempting to be very clear about phenomenology. You want to provide a formal structure for understanding and also being able to infer phenomenological states in people. So you guys are realists about consciousness?

Mike: Yes, absolutely.

Lucas: Let’s go ahead and lay some conceptual foundations. On your website, you guys describe QRI’s full stack, so the kinds of metaphysical and philosophical assumptions that you guys are holding to while you’re on this endeavor to mathematically capture consciousness.

Mike: I would say ‘full stack’ talks about how we do philosophy of mind, we do neuroscience, and we’re just getting into neurotechnology with the thought that yeah, if you have a better theory of consciousness, you should be able to have a better theory about the brain. And if you have a better theory about the brain, you should be able to build cooler stuff than you could otherwise. But starting with the philosophy, there’s this conception of qualia formalism: the idea that phenomenology can be precisely represented mathematically. We borrow the goal from Giulio Tononi’s IIT. We don’t necessarily agree with the specific math involved, but the goal of constructing a mathematical object that is isomorphic to a system’s phenomenology would be the correct approach if you want to formalize phenomenology.

And then from there, one of the big questions in how you even start is, what’s the simplest starting point? And here, I think one of our big innovations that is not seen at any other research group is we’ve started with emotional valence and pleasure. We think these are not only very ethically important, but also just literally the easiest place to start reverse engineering.

Lucas: Right, and so this view is also colored by physicalism and qualia structuralism and valence realism. Could you explain some of those things in a non-jargony way?

Mike: Sure. Qualia formalism is this idea that math is the right language to talk about qualia in, and that we can get a precise answer. This is another way of saying that we’re realists about consciousness, much as people can be realists about electromagnetism. We’re also valence realists. This refers to emotional valence, or pain and pleasure, the goodness or badness of an experience: we think this is a natural kind. This concept carves reality at the joints. We have some further thoughts on how to define this mathematically as well.

Lucas: So you guys are physicalists, so you think that basically the causal structure of the world is best understood by physics and that consciousness was always part of the game engine of the universe from the beginning. Ontologically, it was basic and always there in the same sense that the other forces of nature were already in the game engine since the beginning?

Mike: Yeah, I would say so. I personally like the frame of dual aspect monism, but I would also step back a little bit and say there’s two attractors in this discussion. One is the physicalist attractor, and that’s QRI. Another would be the functionalist/computationalist attractor. I think a lot of AI researchers are in this attractor and this is a pretty deep question of, if we want to try to understand what value is, or what’s really going on, or if we want to try to reverse engineer phenomenology, do we pay attention to bits or atoms? What’s more real; bits or atoms?

Lucas: That’s an excellent question. Scientific reductionism here I think is very interesting. Could you guys go ahead and unpack, though, the skeptic’s position on your view and broadly adjudicate the merits of each view?

Andrés: Maybe a really important frame here is called Marr’s Levels of Analysis. David Marr was a cognitive scientist who wrote a really influential book in the ’80s called Vision, where he basically creates a schema for how to understand knowledge about, in this particular case, how you actually make sense of the world visually. The framework goes as follows: you have three ways in which you can describe an information processing system. First of all, the computational/behavioral level. What that is about is understanding the input-output mapping of an information processing system. Part of it is also understanding the run-time complexity of the system and under what conditions it’s able to perform its actions. Here an analogy would be with an abacus, for example.

On the computational/behavioral level, what an abacus can do is add, subtract, multiply, divide, and if you’re really creative you can also exponentiate and do other interesting things. Then you have the algorithmic level of analysis, which is a little bit more detailed, and in a sense more constrained. What the algorithmic level of analysis is about is figuring out what are the internal representations and possible manipulations of those representations such that you get the input-output mapping described by the first layer. Here you have an interesting relationship where understanding the first layer doesn’t fully constrain the second one. That is to say, there are many systems that have the same input-output mapping but that under the hood use different algorithms.

In the case of the abacus, an algorithm might be something like: whenever you want to add a number, you just push a bead. Whenever you’re done with a row, you push all of the beads back and then you add a bead in the row underneath. And finally, you have the implementation level of analysis, and that is, what is the system actually made of? How is it constructed? All of these different levels ultimately also map onto different theories of consciousness, and that is basically where in the stack you locate consciousness, or being, or “what matters”. So, for example, behaviorists in the ’50s may associate consciousness, if they give any credibility to that term, with the behavioral level. They don’t really care what’s happening inside as long as you have an extended pattern of reinforcement learning over many iterations.

What matters is basically how you’re behaving, and that’s the crux of who you are. A functionalist will actually care about what algorithms you’re running, how it is that you’re actually transforming the input into the output. Functionalists generally do care about, for example, brain imaging, and they do care about the high-level algorithms that the brain is running. They generally will be very interested in figuring out these algorithms and generalizing them in fields like machine learning and digital neural networks and so on. A physicalist associates consciousness with the implementation level of analysis: how the system is physically constructed has bearing on what it is like to be that system.
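The point that the computational level does not pin down the algorithmic level can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than anything from the conversation: the function names `add_builtin` and `add_abacus` are hypothetical, and the bead-pushing procedure is only a loose rendering of the abacus description above. Both functions compute the same input-output mapping for non-negative integers while manipulating very different internal representations.

```python
def add_builtin(a: int, b: int) -> int:
    """Algorithm 1: use the machine's native integer addition."""
    return a + b


def add_abacus(a: int, b: int, base: int = 10) -> int:
    """Algorithm 2: push one bead at a time, carrying when a row fills up.

    Only meant for non-negative integers (hypothetical abacus model).
    """
    # Represent `a` as rows of beads, least significant row first.
    rows = []
    n = a
    while n > 0:
        rows.append(n % base)
        n //= base

    for _ in range(b):          # add `b` by pushing `b` single beads
        i = 0
        while True:
            if i == len(rows):  # need a fresh, empty row
                rows.append(0)
            rows[i] += 1        # push one bead in the current row
            if rows[i] < base:
                break
            rows[i] = 0         # row is full: push all its beads back...
            i += 1              # ...and add a bead in the next row

    # Read the bead rows back out as an ordinary integer.
    return sum(beads * base**k for k, beads in enumerate(rows))


# Same computational-level description, different algorithmic level.
same_mapping = all(
    add_builtin(a, b) == add_abacus(a, b)
    for a in range(30)
    for b in range(30)
)
```

On this toy reading, an observer who only sees inputs and outputs cannot tell the two apart, which is why the first level leaves the second underdetermined. The implementation level (silicon versus wooden beads) is a further question the code does not capture at all.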

Lucas: So, you guys haven’t said that this was your favorite approach, but if people are familiar with David Chalmers, these seem to be the easy problems, right? And functionalists are interested in just the easy problems and some of them will actually just try to explain consciousness away, right?

Mike: Yeah, I would say so. And I think to try to condense some of the criticism we have of functionalism, I would claim that it looks like a theory of consciousness and can feel like a theory of consciousness, but it may not actually do what we need a theory of consciousness to do; specify which exact phenomenological states are present.

Lucas: Is there not some conceptual partitioning that we need to do between functionalists who believe in qualia or consciousness, and those that are illusionists or want to explain it away or think that it’s a myth?

Mike: I think that there is that partition, and I guess there is a question of how principled the partition can be, or whether, if you chase the ideas down as far as you can, the partition collapses. Either consciousness is a thing that is real in some fundamental sense, and I think you can get there with physicalism, or consciousness is more of a process, a leaky abstraction. I think functionalism naturally tugs in that direction. For example, Brian Tomasik has followed this line of reasoning and come to the conclusion of analytic functionalism, which is trying to explain away consciousness.

Lucas: What is your guys’s working definition of consciousness and what does it mean to say that consciousness is real?

Mike: It is a word that’s overloaded. It’s used in many contexts. I would frame it as what it feels like to be something, and something is conscious if there is something it feels like to be that thing.

Andrés: It’s important also to highlight some of its properties. As Mike pointed out, the word consciousness is used in many different ways. There are something like eight definitions for the word consciousness, and honestly, all of them are really interesting. Some of them are more fundamental than others and we tend to focus on the more fundamental side of the spectrum for the word. A sense that would be very not fundamental would be consciousness in the sense of social awareness or something like that. We actually think of consciousness much more in terms of qualia; what is it like to be something? What is it like to exist? Some of the key properties of consciousness are as follows: First of all, we do think it exists.

Second, in some sense it has causal power, in the sense that the fact that we are conscious matters for evolution: evolution made us conscious for a reason, namely that it’s actually doing some computational legwork that would maybe be possible to do otherwise, but just not as efficiently or as conveniently as it is with consciousness. Then you also have the property of qualia, the fact that we can experience sights, and colors, and tactile sensations, and thoughts, and emotions, and so on. All of these seem to be in completely different worlds, and in a sense they are, but they have the property that they can be part of a unified experience: you can experience color at the same time as experiencing sound. In that sense, we put those different types of sensations in the category of consciousness because they can be experienced together.

And finally, you have unity, the fact that you have the capability of experiencing many qualia simultaneously. That’s generally a very strong claim to make, but we think you need to acknowledge and take seriously its unity.

Lucas: What are your guys’s intuition pumps for thinking why consciousness exists as a thing? Why are there qualia?

Andrés: There’s the metaphysical question of why consciousness exists to begin with. That’s something I would like to punt for the time being. There’s also the question of why it was recruited for information processing purposes in animals. The intuition here is that the various contrasts that you can have within experience can serve a computational role. So, there may be a very deep reason why color qualia or visual qualia are used for information processing associated with sight, and why tactile qualia are associated with information processing useful for touching and making haptic representations, and that might have to do with the actual map of how all the qualia values are related to each other. Obviously, you have all of these edge cases, people who are synesthetic.

They may open their eyes and they experience sounds associated with colors, and people tend to think of those as abnormal. I would flip it around and say that we are all synesthetic, it’s just that the synesthesia that we have in general is very evolutionarily adaptive. The reason why you experience colors when you open your eyes is that that type of qualia is really well suited to represent geometrically a projective space. That’s something that naturally comes out of representing the world with the sensory apparatus like eyes. That doesn’t mean that there aren’t other ways of doing it. It’s possible that you could have an offshoot of humans that whenever they opened their eyes, they experience sound and they use that very well to represent the visual world.

But we may very well be in a local maximum of how different types of qualia are used to represent and do certain types of computations in a very well-suited way. So the intuition behind why we’re conscious is that all of these different contrasts in the structure of the relationships between possible qualia values have computational implications, and there are actual ways of using these contrasts in very computationally effective ways.

Lucas: So, just to channel the functionalist here, wouldn’t he just say that everything you just said about qualia could be fully reducible to input-output and algorithmic information processing? So, why do we need this extra property of qualia?

Andrés: There’s this article, I believe by Brian Tomasik, that basically says flavors of consciousness are flavors of computation. It might be very useful to do that exercise, where basically you identify color qualia as just a certain type of computation, and it may very well be that the geometric structure of color is actually just a particular algorithmic structure, so that whenever you have a particular type of algorithmic information processing, you get this geometric state space. In the case of color, that’s a Euclidean three-dimensional space. In the case of tactile sensation or smell, it might be a much more complicated space, but then it’s in a sense implied by the algorithms that we run. There are a number of good arguments there.

The general approach to how to tackle them is that when it comes down to actually defining what algorithms a given system is running, you will hit a wall when you try to formalize exactly how to do it. So, one example is, how do you determine the scope of an algorithm? When you’re analyzing a physical system and you’re trying to identify what algorithm it is running, are you allowed to basically contemplate 1,000 atoms? Are you allowed to contemplate a million atoms? Where is a natural boundary for you to say, “Whatever is inside here can be part of the same algorithm, but whatever is outside of it can’t”? And there really isn’t a frame-invariant way of making those decisions. On the other hand, if you associate qualia with actual physical states, there is a frame-invariant way of describing what the system is.

Mike: So, a couple of years ago I posted a piece giving a critique of functionalism and one of the examples that I brought up was, if I have a bag of popcorn and I shake the bag of popcorn, did I just torture someone? Did I just run a whole brain emulation of some horrible experience, or did I not? There’s not really an objective way to determine which algorithms a physical system is objectively running. So this is a kind of an unanswerable question from the perspective of functionalism, whereas with the physical theory of consciousness, it would have a clear answer.

Andrés: Another metaphor here is, let’s say you’re at a park enjoying an ice cream. In this system that I created that has, let’s say, isomorphic algorithms to whatever is going on in your brain, the particular algorithms that your brain is running in that precise moment, within a functionalist paradigm, map onto a metal ball rolling down one of the paths within this machine in a straight line, not touching anything else. So there’s actually not much going on. According to functionalism, that would have to be equivalent and it would actually be generating your experience. Now the weird thing there is that you could actually break the machine, you could do a lot of things, and the behavior of the ball would not change.

Meaning that within functionalism, to actually understand what a system is doing, you need to understand the counterfactuals of the system. You need to understand, what would the system be doing if the input had been different? And all of a sudden, you end up with this very, very gnarly problem of defining, well, how do you actually objectively decide what is the boundary of the system? Even in some of these particular states that allegedly are very complicated, the system looks extremely simple, and you can remove a lot of parts without actually modifying its behavior. That calls into question whether there is an objective boundary, any non-arbitrary boundary, that you can draw around the system and say, “Yeah, this is equivalent to what’s going on in your brain right now.”
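The counterfactual worry can be sketched with a toy model. This is a hedged illustration only: the two functions, the gate, and the path names are all hypothetical stand-ins for the rolling-ball machine described above, not a model of any real system. On the input that actually occurs, the intact machine and a machine with its gates removed produce exactly the same trace; they differ only in what they would have done on an input that never arrives.

```python
def intact_machine(x: int) -> list[str]:
    """Hypothetical machine with a working gate that reacts to its input."""
    trace = ["start"]
    if x == 0:
        trace.append("straight path")   # ball touches nothing
    else:
        trace.append("gate flips")      # gate redirects the ball
        trace.append("side path")
    trace.append("end")
    return trace


def broken_machine(x: int) -> list[str]:
    """Same machine with the gate removed: the input is ignored."""
    return ["start", "straight path", "end"]


actual_input = 0          # the input that really occurs
counterfactual_input = 1  # an input that never occurs in this run

# Identical observed behavior on the actual input...
same_on_actual = intact_machine(actual_input) == broken_machine(actual_input)

# ...but different dispositions on the counterfactual one.
same_on_counterfactual = (
    intact_machine(counterfactual_input) == broken_machine(counterfactual_input)
)
```

If which algorithm is being run depends on the counterfactual branch, then parts of the system that never did anything in the actual run still matter to the description, which is one way of stating the boundary problem raised above.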

This has a very heavy bearing on the binding problem. The binding problem for those who haven’t heard of it is basically, how is it possible that 100 billion neurons just because they’re skull-bound, spatially distributed, how is it possible that they simultaneously contribute to a unified experience as opposed to, for example, neurons in your brain and neurons in my brain contributing to a unified experience? You hit a lot of problems like what is the speed of propagation of information for different states within the brain? I’ll leave it at that for the time being.

Lucas: I would just like to be careful about this intuition here that experience is unified. I think that the intuition pump for that is direct phenomenological experience like experience seems unified, but experience also seems a lot of different ways that aren’t necessarily descriptive of reality, right?

Andrés: You can think of it as different levels of sophistication, where you may start out with a very naive understanding of the world, where you confuse your experience for the world itself. A very large percentage of people perceive the world and in a sense think that they are experiencing the world directly, whereas all the evidence indicates that actually you’re experiencing an internal representation. You can go and dream, you can hallucinate, you can enter interesting meditative states, and those don’t map to external states of the world.

There’s this transition that happens when you realize that in some sense you’re experiencing a world simulation created by your brain, and of course, you’re fooled by it in countless ways, especially when it comes to e