
AI Alignment Podcast: On the Long-term Importance of Current AI Policy with Nicolas Moës and Jared Brown

 Topics discussed in this episode include:

  • The importance of current AI policy work for long-term AI risk
  • Where we currently stand in the process of forming AI policy
  • Why persons worried about existential risk should care about present day AI policy
  • AI and the global community
  • The rationality and irrationality around AI race narratives

Timestamps: 

0:00 Intro

4:58 Why it’s important to work on AI policy 

12:08 Our historical position in the process of AI policy

21:54 For long-termists and those concerned about AGI risk, how is AI policy today important and relevant? 

33:46 AI policy and shorter-term global catastrophic and existential risks

38:18 The Brussels and Sacramento effects

41:23 Why is racing on AI technology bad? 

48:45 The rationality of racing to AGI 

58:22 Where is AI policy currently?

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today’s episode is with Jared Brown and Nicolas Moës, two AI Policy researchers and AI influencers who are both concerned with the long-term and existential risks associated with artificial general intelligence and superintelligence. For us at the Future of Life Institute, we’re particularly interested in mitigating threats from powerful AI that could lead to the extinction of life. One avenue of trying to address such threats could be through action in the space of AI policy. But just what can we do today to help ensure beneficial outcomes from AGI and superintelligence in the policy sphere? This podcast focuses on this question.

As for some key points to reflect on throughout the podcast, Nicolas Moës points out that engaging in AI policy today is important because: 1) Experience gained on short-term AI policy issues is important to be considered a relevant advisor on long-term AI policy issues coming up in the future. 2) There are very few people that care about AGI safety currently in government, politics or in policy communities. 3) There are opportunities to influence current AI policy decisions in order to provide a fertile ground for future policy decisions or, better but rarer, to be directly shaping AGI safety policy today through evergreen texts. Future policy that is implemented is path dependent on current policy that we implement today. What we do now is precedent setting. 4) There are opportunities today to develop a skillset useful for other policy issues and causes. 5) Little resource is being spent on this avenue for impact, so the current return on investment is quite good.

Finally, I’d like to reflect on the need to bridge the long-term and short-term partitioning of AI risk discourse. You might have heard of this divide before, where there are long-term risks from AI, like powerful AGI or superintelligence misaligned with human values causing the extinction of life, and then short-term risks like algorithmic bias and automation-induced disemployment. Bridging this divide means understanding the real and deep interdependencies and path dependencies between the technology and governance which we choose to develop today, and the world where AGI or superintelligence emerges.

For those not familiar with Jared Brown or Nicolas Moës, Nicolas is an economist by training focused on the impact of Artificial Intelligence on geopolitics, the economy and society. He is the Brussels-based representative of The Future Society. Passionate about global technological progress, Nicolas monitors global developments in the legislative framework surrounding AI. He completed his Master’s degree in Economics at the University of Oxford with a thesis on institutional engineering for resolving the tragedy of the commons in global contexts.

Jared is the Senior Advisor for Government Affairs at FLI, working to reduce global catastrophic and existential risk (GCR/x-risk) by influencing the U.S. policymaking process, especially as it relates to emerging technologies. He is also a Special Advisor for Government Affairs at the Global Catastrophic Risk Institute. He has spent his career working at the intersection of public policy, emergency management, and risk management, having previously served as an Analyst in Emergency Management and Homeland Security Policy at the U.S. Congressional Research Service and in homeland security at the U.S. Department of Transportation.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate. These contributions make it possible for us to bring you conversations like these and to develop the podcast further. You can also follow us on your preferred listening platform by searching for us directly or following the links on the page for this podcast found in the description.

And with that, here is Jared Brown and Nicolas Moës on AI policy. 

I guess we can start off here, with developing the motivations around why it’s important for people to be considering AI policy. So, why is it important to be working on AI policy right now?

Nicolas Moës: It’s important right now because there has been an uptick in markets, right? So AI technologies are now embedded in many more products than ever before. Part of it is hype, but part of it is also having a real impact on profits and bottom line. So there is an impact on society that we have never seen before. For example, the way Facebook algorithms have affected recent history is something that has made the population and policy makers panic a bit.

And so quite naturally the policy window has opened. I think it’s also important to be working on it for people who would like to make the world better, for two reasons. As I mentioned, since the policy window is open, that means that there is a demand for advice to fill in the gaps that exist in the legislation, right? There have been many concrete situations where, as an AI policy researcher, you get asked to provide input, either by joining an expert group or workshops, or simply directly by some people who say, “Oh, you know about AI, so could you just send me a position paper on this?”

So these policies are getting written right now. At first they are quite soft, and then they become harder and harder, to the point that, at least in the EU, you have regulations for AI on the agenda, which is one of the hardest forms of legislation out there. Once these are written it is very difficult to change them. It’s quite sticky. There is a lot of path dependency in legislation. So this first legislation that passes will probably shape the box in which future legislation can evolve. It constrains the trajectory of future policies, and therefore it’s really difficult to take future policies in another direction. So for people who are concerned about AGI, it’s important to be already present right now.

The second point is that these people who are currently interacting with policymakers on a daily basis are concerned about very specific things and they are gaining a lot of experience with policymakers, so that in the future when you have more general algorithms that come into play, the people with experience to advise on these policies will actually be concerned about what many people call short term issues. People who are concerned more about the safety, the robustness of these more general algorithms would actually end up having a hard time getting into the room, right? You cannot just walk in and claim authority when you have people with 10, 15 or even 20 years of experience regulating this particular field of engineering.

Jared Brown: I think that sums it up great, and I would just add that there are some very specific examples of where we’re seeing what has largely been, up to this point, a set of principles being developed by different governments, or industry groups. We’re now seeing attempts to actually enact hard law or policy.

Just in the US, the Office of Management and Budget and the Office of Science and Technology Policy issued a memorandum calling for further AI regulation and non-regulatory actions and they issued a set of principles, that’s out for comment right now, and people are looking at those principles, trying to see if there’s ways of commenting on it to increase its longterm focus and its ability to adapt to increasingly powerful AI.

The OECD has already issued, and had sign ons to its AI principles, which are quite good.

Lucas Perry: What is the OECD?

Nicolas Moës: The Organization for Economic Cooperation and Development.

Jared Brown: Yes. Those principles are now going from principles to an observatory, and that will be launched by the end of February. And we’re seeing the effect of these principles now being adopted, and attempts now are being made to implement those into real regulatory approaches. So, the window for transitioning from principles to hard law is occurring right now, and as Nicolas said, decisions that are made now will have longterm effects because typically governments don’t turn their attention to issues more than once every five, maybe even 10 years. And so, if you come in three years from now with some brilliant idea about AI policy, chances are, the moment to enact that policy has already passed because the year prior, or two years prior, your government has enacted its formative legislation on AI.

Nicolas Moës: Yeah, yeah. So long as this policy benefits most people, they are very unlikely to even reopen, let’s say, the discussion, at all.

Lucas Perry: Right. So a few points here. The first is this one about path dependency, which means that the kinds of policies which we adopt now are going to be really important, because they’re going to inform and shape the kinds of policies that we’re able or willing to adopt later, and AI is going to be around for a long, long time. So we’re setting a lot of the foundation. The second thing was that if you care about AGI risk, or the risks of superintelligence, or very powerful forms of AI, then you need to have been part of the conversation since the beginning, or else you’re not going to really be able to get a seat at the table when these things come around.

And Jared, is there a point here that I’m missing that you were trying to make?

Jared Brown: No, I think that sums it up nicely. The effect of these policies, and the ability of these policies to remain what you might call evergreen, so long lasting and adaptive to the changing nature of AI technology, is going to be critical. We see this all the time in tech policy. There are tech policies out there that were informed by the challenges of the time in which they were made and they quickly become detrimental, or outdated at best. And then there are tech policies that tend to be more adaptive, and those stand the test of time. And we need to be willing to engage with the short term policy making considerations, such that we’re making sure that the policies are evergreen for AI, as it becomes increasingly powerful.

Nicolas Moës: Besides the evergreen aspects of the policies that you want to set up now, there’s this notion of providing a fertile ground. So some policies that are very appropriate for short term issues, for example, fairness and deception, and fundamental rights abuse and that kind of thing, are actually almost copy pasted to future legislation. So, if you manage to already put concerns for safety, like robustness, corrigibility, and value alignment of the algorithm today, even if you don’t have any influence in 10 or 15 years when they review the legislation, you have some chances to see the policymakers just copy pasting this part on safety and to put it in whatever new legislation comes up in 10 years.

Jared Brown: There’s precedent setting, and legislators are loath to have to make fundamental reforms to legislation, and so if we see proper consideration of safety and security on AI in the evergreen pieces of legislation that are being developed now, that’s unlikely to be removed in future legislation.

Lucas Perry: Jared, you said that a lot of the principles and norms which have been articulated over say, the past five years are becoming codified into hard law slowly. It also would just be good if you guys could historically contextualize our position in terms of AI policy, whether or not we stand at an important inflection point, where we are in terms of this emerging technology.

Jared Brown: Sure, sure. So, I think if you went back just to 2017, 2016, at least in the US, there was very little attention to artificial intelligence. There were a smattering of congressional hearings being held, a few pertinent policy documents being released by executive agencies, but by and large, the term artificial intelligence remained in the science fiction realm of thinking.

Since that time, there’s been a massive amount of attention paid to artificial intelligence, such that in almost every Western democracy that I’m familiar with, it’s now part of the common discourse about technology policy. The phrase emerging tech is something that you see all over the place, regardless of the context, and there’s a real sensitivity by Western style democracy policymakers towards this idea that technology is shifting under our feet. There’s this thing called artificial intelligence, there’s this thing called synthetic biology, there’s other technologies linked into that — 5G and hypersonics are two other areas — where there’s a real understanding that something is changing, and we need to get our arms around it. Now, that has largely started with, in the past year, or year and a half, a slew of principles. There are at least 80 some odd sets of principles. FLI was one of the first to create a set of principles, along with many partners, and those are the Asilomar AI Principles.

Those principles you can see replicated and informing many sets of principles since then. We mentioned earlier, the OECD AI principles are probably the most substantive and important at this point, because they have the signature and backing of so many sovereign nation states, including the United States and most of the EU. Now that we have these core soft law principles, there’s an appetite for converting that into real hard law regulation or approaches to how AI will be managed in different governance systems.

What we’re seeing in the US, there’s been a few regulatory approaches already taken. For instance, rule making on the inclusion of AI algorithms into the housing market. This vision, if you will, from the Department of Transportation, about how to deal with autonomous vehicles. The FDA has approved products coming into the market that involve AI and diagnostics in the healthcare industry, and so forth. We’re seeing initial policies being established, but what we haven’t yet seen in any real context, is sort of a cross-sectoral AI broadly-focused piece of legislation or regulation.

And that’s what’s currently being developed both in the EU and in the US. That type of legislation, which seems like a natural evolution from where we’re at with principles, into a comprehensive holistic approach to AI regulation and legislation, is now occurring. And that’s why this time is so critical for AI policy.

Lucas Perry: So you’re saying that a broader and more holistic view about AI regulation and what it means to have and regulate beneficial AI is developed before more specific policies are implemented, with regards to the military, or autonomous weapons, or healthcare, or nuclear command and control.

Jared Brown: So, typically, governments try, whether or not they succeed remains to be seen, to be more strategic in their approach. If there is a common element that’s affecting many different sectors of society, they try and at least strategically approach that issue, to think: what is common across all policy arenas, where AI is having an effect, and what can we do to legislate holistically about AI? And then as necessary, build sector specific policies on particular issues.

So clearly, you’re not going to see some massive piece of legislation that covers all the potential issues that have to do with autonomous vehicles, labor displacement, workforce training, et cetera. But you do want to have an overarching strategic plan for how you’re regulating, how you’re thinking about governing AI holistically. And that’s what’s occurring right now: we have the principles, now we need to develop that cross-sectoral approach, so that we can then subsequently have consistent and informed policy on particular issue areas as they come up, and as they’re needed.

Lucas Perry: And that cross-sectoral approach would be something like: AI should be interpretable and robust and secure.

Jared Brown: That’s written in principles to a large degree. But now we’re seeing, what does that really mean? So in the EU they’re calling it the European Approach to AI, and they’re going to be coming out with a white paper, maybe by the time this podcast is released, and that will sort of be their initial official set of options and opinions about how AI can be dealt with holistically by the EU. In the US, they’re setting regulatory principles for individual regulatory agencies. These are principles that will apply to the FDA, or the Department of Transportation, or the Department of Commerce, or the Department of Defense, as they think about how they deal with the specific issues of AI in their arenas of governance. Making sure that baseline foundation is informed and is an evergreen document, so that it incorporates future considerations, or is at least adaptable to future technological development in AI is critically important.

Nicolas Moës: With regards to the EU in particular, the historical context is maybe a bit different. As you mentioned, right now they are discussing this white paper with many transversal policy instruments that would be put forward, with this legislation. This is going to be negotiated over the next year. There are intentions to have the legislation at the EU level by the end of the current commission’s term. So that means within five years. Something that is quite interesting to explore is that in 2016 there was this parliamentary own-initiative dossier, so it’s something that does not have any binding power, just to show the opinion of the European Parliament, that was dealing with robotics and civil laws. So, considering how civil law in Europe should be adjusted to robotics.

That was in 2016, right? And now there’s been this uptick in activities. This is something that we have to be aware of. It’s moved quite fast, but then again, there are still a couple of years before regulations get approved. This is one point that I wanted to clarify: when we say it is fast or it is slow, we are still talking about a couple of years. And when you know how long it takes for you to develop your network, to develop your understanding of the issues, and to try to influence the issues, a couple of years is really way too short. The second point I wanted to make is also, what will the policy landscape look like in two years? Will we have the EU again leveraging its huge market power to impose its regulations? Within the European Commission, there are some intentions to diffuse whatever regulations come out of the European Commission right now throughout the world, right? To form a sort of influence sphere, where all the AI produced, even abroad, would actually be fitting EU standards.

Over the past two, three years there has been a mushrooming of AI policy players, right? The ITU has set up this AI For Good, and has reoriented its position towards AI. There has been the Global Forum on AI for Humanity, political AI summits, which kind of pace the discussions about the global governance of artificial intelligence.

But would there be space for new players in the future? That’s something that I’m a bit unsure about. One of the reasons why it might be an inflection point, as you asked, is because now I think the pawns are set on the board, right? And it is unlikely that somebody could come in and just disturb everything. I don’t know in Washington how it plays, but in Brussels it seems very much like everybody knows each other already and it’s only about bargaining with each other, not especially listening to outside views.

Jared Brown: So, I think the policy environment is being set. I wouldn’t quite go so far as to say all of the pawns are on the chess board, but I think many of them are. The queen is certainly industry, and industry has stood up and taken notice that governments want to regulate and want to be proactive about their approach to artificial intelligence. And you’ve seen this, because you can open up your daily newspaper pretty much anywhere in the world and see some headline about some CEO of some powerful tech company mentioning AI in the same breath as government, and government action or government regulations.

Industry is certainly aware of the attention that AI is getting, and they are positioning themselves to influence that as much as possible. And so civil society groups such as the ones Nico and I represent have to step up, which is not to say the industry has all bad ideas, some of what they’re proposing is quite good. But it’s not exclusively a domain controlled by industry opinions about the regulatory nature of future technologies.

Lucas Perry: All right. I’d like to pivot here, more into some of the views and motivations the Future of Life Institute and the Future Society take, when looking at AI policy. The question in particular that I’d like to explore is how is current AI policy important for those concerned with AGI risk and longterm considerations about artificial intelligence growing into powerful generality, and then one day surpassing human beings in intelligence? For those interested in the issue of AGI risk or super intelligence risk, is AI policy today important? Why might it be important? What can we do to help shape or inform the outcomes related to this?

Nicolas Moës: I mean, obviously, I’m working full time on this and if I could, I would work double full time on this. So I do think it’s important. But it’s still too early to be talking about this in the policy rooms, at least in Brussels, even though we have identified a couple of policymakers that would be keen to talk about that. It’s politically not feasible to put forward these kinds of discussions. However, AI policy currently is important because there is a demand for advice, for policy research, for concrete recommendations about how to govern this technological transition that we are experiencing.

So there is this demand where people who are concerned about fundamental rights, and safety, and robustness, civil society groups, but also academics and industry themselves sometimes come in with their clear recommendations about how you should concretely regulate, or govern, or otherwise influence the development and deployment of AI technologies. And in that set of people, if you have people who are concerned about safety, you would then be able to provide advice for evergreen policies, as we’ve mentioned earlier, and set up, let’s say, a fertile ground for better policies in the future, as well.

The second part of why it’s important right now is also the longterm workforce management. If people who are concerned about the AGI safety are not in the room right now, and if they are in the room but focused only on AGI safety, they might be perceived as irrelevant by current policymakers, and therefore they might have restricted access to opportunities for gaining experience in that field. And therefore over the long term this dynamic reduces the growth rate, let’s say, of the workforce that is concerned about AGI safety, and that could be identified as a relevant advisor in the future. As a general purpose technology, even short term issues regarding AI policy have a long term impact on the whole of society.

Jared Brown: Both Nicolas and I have used this term “path dependency,” which you’ll hear a lot in our community and I think it really helps maybe to build out that metaphor. Various different members of the audience of this podcast are going to have different timelines in their heads when they think about when AGI might occur, and who’s going to develop it, what the characteristics of that system will be, and how likely it is that it will be unaligned, and so on and so forth. I’m not here to engage in that debate, but I would encourage everyone to literally think about whatever timeline you have in your head, or whatever descriptions you have for the characteristics that are most likely to occur when AGI occurs.

You have a vision of that future environment, and clearly you can imagine different environments by which humanity is more likely to be able to manage that challenge than other environments. An obvious example, if the world were engaged in World War Three, 30 years from now, and some company develops AGI, that’s not good. It’s not a good world for AGI to be developed in, if it’s currently engaged in World War Three at the same time. I’m not suggesting we’re doing anything to mitigate World War Three, but there are different environments for when AGI can occur that will make it more or less likely that we will have a beneficial outcome from the development of this technology.

We’re literally on a path towards that future. More government funding for AI safety research is a good thing. That’s a decision that has to get made, that’s made every single day, in governments all across the world. Governments have R&D budgets. How much is that being spent on AI safety versus AI capability development? If you would like to see more, then that decision is being made every single fiscal year of every single government that has an R&D budget. And what you can do to influence it is really up to you and how many resources you’re going to put into it.

Lucas Perry: Many of the ways it seems that AI policy currently is important for AGI existential risk are indirect. Perhaps it’s direct insofar as there’s these foundational evergreen documents, and maybe changing our trajectory directly is kind of a direct intervention.

Jared Brown: How much has nuclear policy changed? Our governance of nuclear weapons changed because the US initially decided to use the weapon. That decision irrevocably changed the future of nuclear weapons policy, and there is no way you can counterfactually unspool all of the various different ways the initial use of the weapon, not once, but twice by the US, sent a signal to the world that, A, the US was willing to use this weapon and, B, the power of that weapon was on full display.

There are going to be junctures in the trajectory of AI policy that are going to be potentially as fundamental as whether or not the US should use a nuclear weapon at Hiroshima. Those decisions are going to be hard to see necessarily right now, if you’re not in the room and you’re not thinking about the way that policy is going to project into the future. That’s where this matters. You can’t unspool and rerun history. Take, for instance, lethal autonomous weapons policy. There is a world that exists, a future scenario 30 years from now, where international governance has never been established on lethal autonomous weapons, and lethal autonomous weapons are completely the norm for militaries to use indiscriminately or without proper safety at all. And then there’s a world where they’ve been completely banned. Those two conditions will have serious effects on the likelihood that governments are up to the challenge of addressing potential global catastrophic and existential risk arising from unaligned AGI. And so it’s more than just setting a path. It’s central to the capacity building of our future to deal with these challenges.

Nicolas Moës: Regarding other existential risks, I mean Jared is more of an expert on that than I am. In the EU, because this topic is so hot, it’s much more promising, let’s say as an avenue for impact, than other policy dossiers because we don’t have the omnibus type of legislation that you have in the US. The EU remains quite topic for topic. In the end, there is very little power embedded in the EU, mostly it depends on the nation states as well, right?

So AI moves at the EU level, which makes you want to work on EU-level AI policy for sure. But the other issues sometimes still remain at the national level. That being said, the EU also has this particularity, let’s say, of being able to reshape debates at the national level. So, if there were people to consider what are the best approaches to reduce existential risk in general via EU policy, I’m sure there would be a couple of dossiers right now with open policy windows that could be a conduit for impact.

Jared Brown: If the community of folks that are concerned about the development of AGI are correct and that it may have potentially global catastrophic and existential threat to society, then you’re necessarily obviously admitting that AGI is also going to affect the society extremely broadly. It’s going to be akin to an industrial revolution, as is often said. And that’s going to permeate every which way in society.

And there’s been some great work to scope this out. For instance, in the nuclear sphere, I would recommend to all the audience that they take a look at a recent edited compendium of papers by the Stockholm International Peace Research Institute. They have a fantastic compendium of papers about AI’s effect on strategic stability and nuclear risk. That type of sector specific analysis can be done with synthetic biology and various other things that people are concerned about as evolving into existential or global catastrophic risk.

And then there are current concerns with non-anthropogenic risk. AI is going to be tremendously helpful, if used correctly, to track and monitor near-Earth objects. You have to be concerned about asteroid impacts. AI is a great tool to be used to help reduce that risk by monitoring and tracking near-Earth objects.

We may yet make tremendous discoveries in geology to deal with supervolcanoes. Just recently there’s been some great coverage of an AI company called BlueDot for monitoring the potential pandemics arising with the coronavirus. We see these applications of AI very beneficially reducing other global catastrophic and existential risks, but there are aggravating factors as well, especially for other anthropogenic concerns related to nuclear risk and synthetic biology.

Nicolas Moës: Some people who are concerned about AGI sometimes might see AI as overall negative in expectation, but a lot of policy makers see AI as an opportunity more than as a risk, right? So, starting with a negative narrative or a pessimistic narrative is difficult in the current landscape.

In Europe it might be a bit easier because for odd historical reasons it tends to be a bit more cautious about technology and tends to be more proactive about regulations than maybe anywhere else in the world. I’m not saying whether it’s a good thing or a bad thing. I think there are advantages and disadvantages. It’s important to know though that even in Europe you still have people who are anti-regulation. The European Commission set up this independent high level expert group on AI, with 52 or 54 experts on AI, to decide about the ethical principles that will inform the legislation on AI. So this was for the past year and a half, or the past two years even. Among them, the divisions are really important. Some of them wanted to just let it go for self-regulation because even issues of fairness or safety will be detected eventually by society and addressed when they arise. And it’s important to mention that actually in the Commission, even though the current white paper seems to be more on the side of preventive regulations or proactive regulations, the commissioner for digital, Thierry Breton, is definitely cautious about the approach he takes. But you can see that he is quite positive about the potential of technology.

The important thing here as well is that these players have an influential role to play on policy, right? So, going back to this negative narrative about AGI, it’s also something where we have to talk about how you communicate and how you influence in the end the policy debate, given the current preferences and the opinions of people in society as a whole, not only the opinions of experts. If it was only about experts, it would be maybe different, but this is politics, right? The opinion of everybody matters and it’s important that whatever influence you want to have on AI policy is compatible with the rest of society’s opinion.

Lucas Perry: So, I’m curious to know more about the extent to which the AI policy sphere is mindful of and exploring the shorter term global catastrophic or maybe even existential risks that arise from the interplay of more near term artificial intelligence with other kinds of technologies. Jared mentioned a few in terms of synthetic biology, and global pandemics, and autonomous weapons, and AI being implemented in the military and early warning detection systems. So, I’m curious to know more about the extent to which there are considerations and discussions around the interplay of shorter term AI risks with actual global catastrophic and existential risks.

Jared Brown: So, there’s this general understanding, which I think most people accept, that AI is not magical. It is open to manipulation, it has certain inherent flaws in its current capability and constructs. We need to make sure that that is fully embraced as we consider different applications of AI into systems like nuclear command and control. At a certain point in time, the argument could be sound that AI is a better decision maker than your average set of humans in a command and control structure. There’s no shortage of instances of near misses with nuclear war based on existing sensor arrays, and so on and so forth, and the humans behind those sensor arrays, with nuclear command and control. But we have to be making those evaluations fully informed about the true limitations of AI and that’s where the community is really important. We have to cut through the hype and cut through overselling what AI is capable of, and be brutally honest about the current limitations of AI as it evolves, and whether or not it makes sense from a risk perspective to integrate AI in certain ways.

Nicolas Moës: There have been human mistakes that have led to close calls, but I believe these close calls have been corrected because of another human in the loop. In early warning systems though, you might actually end up with no human in the loop. I mean, again, we cannot really say whether these humans in the loop were statistically important because we don’t have the alternatives obviously to compare it to.

Another thing, regarding whether some people think that AI is magic: I think I would be a bit more cynical. I still find myself in some workshops or policy conferences where you have some people who apparently have never seen a line of code in their entire life and still believe that if you tell the developer “make sure your AI is explainable,” that magically the AI would become explainable. This is still quite common in Brussels, I’m afraid. But there is a lot of heterogeneity. I think now we have, even among the 705 MEPs, one who is a former programmer from France. And that’s the kind of person who, if he was placed on the AI dossier, I guess would have a lot more influence because of his expertise.

Jared Brown: Yeah. I think in the US there’s this phrase that kicks around that the US is experiencing a techlash, meaning there’s a growing reluctance, cynicism, criticism of major tech industry players. So, this started with the Cambridge Analytica problems that arose in the 2016 election. Some of it’s related to concerns about potential monopolies. I will say that it’s not directly related to AI, but that general level of criticism, more skepticism, is being imbued into the overall policy environment. And so people are more willing to question the latest, next greatest thing that’s coming from the tech industry because we’re currently having this retrospective analysis of how what we used to think of as a fantastic development may not be as fantastic as we thought it was. That kind of skepticism is somewhat helpful for our community because it can be leveraged for people to be more willing to take a critical eye in the way that we apply technology going forward, knowing that there may have been some mistakes made in the past.

Lucas Perry: Before we move on to more empirical questions and questions about how AI policy is actually being implemented today, are there any other things here that you guys would like to touch on or say about the importance of engaging with AI policy and its interplay and role in mitigating both AGI risk and existential risk?

Nicolas Moës: Yeah, the so called Brussels effect, which actually describes how whatever decisions are made in European policy actually influence the rest of the world. I mentioned it briefly earlier. I’d be curious to hear what you, Jared, think about that. In Washington, do people consider it, the GDPR for example, as a pre made text that they can just copy paste? Because apparently, I know that California has released something quite similar based on GDPR. By the way, GDPR is the General Data Protection Regulation governing protection of privacy in the EU. It’s a regulation, so it has a binding effect on EU member states. By the Brussels effect, what I mean is that, for example, this big piece of legislation is being, let’s say, integrated by big companies abroad, including US companies, to ensure that they can keep access to the European market.

And so the commission is actually quite proud of announcing that for example, some Brazilian legislator or some Japanese legislator or some Indian legislators are coming to the commission to translate the text of GDPR, and to take it back to their discussion in their own jurisdiction. I’m curious to hear what you think of whether the European third way about AI has a greater potential to lead to beneficial AI and beneficial AGI than legislation coming out of the US and China given the economic incentives that they’ve got.

Jared Brown: I think in addition to the Brussels effect, we might have to amend it to say the Brussels and the Sacramento effect. Sacramento being the State Capitol of California because it’s one thing for the EU who have adopted the GDPR, and then California essentially replicated a lot of the GDPR, but not entirely, into what they call the CCPA, the California Consumer Privacy Act. If you combine the market size of the EU with California, you clearly have enough influence over the global economy. California for those who aren’t familiar, would be the seventh or sixth largest economy in the world if it were a standalone nation. So, the combined effect of Brussels and Sacramento developing tech policy or leading tech policy is not to be understated.

What remains to be seen though is how long lasting that precedent will be, and whether their ability to essentially be the first movers in the regulatory space will remain. With some of the criticism being developed around GDPR and the CCPA, it could be that this leads to other governments trying to be more proactive to be the first out the door, the first movers in terms of major regulatory effects, which would minimize the Brussels effect or the Brussels and Sacramento effect.

Lucas Perry: So in terms of race conditions and sticking here on questions of global catastrophic risk and existential risks and why AI policy and governance and strategy considerations are important for risks associated with racing between say the United States and China on AI technology. Could you guys speak a little bit to the importance of appropriate AI policy and strategic positioning on mitigating race conditions and why a race would be bad for AGI risk and existential and global catastrophic risks in general?

Jared Brown: To simplify it, the basic logic here is that if two competing nation states or companies are engaged in a competitive environment to be the first to develop X, Y, Z, and they see tremendous incentive and advantage to being the first to develop such technology, then they’re more likely to cut corners when it comes to safety, and to cut corners in thinking about how to carefully apply these new developments to various different environments. There has been a lot of discussion about who will come to dominate the world and control AI technology. I’m not sure that either Nicolas or I really think that narrative is entirely accurate. Technology need not be a zero sum environment where the benefits are only accrued by one state or another, or where the benefits accruing to one state necessarily reduce the benefits to another state. And there has been a growing recognition of this.

Nicolas earlier mentioned the high level expert group in the EU. An equivalent type of body in the US is called the National Security Commission on AI. And in their interim report they recognize that there is a strong need, and one of their early recommendations is, for what they call Track 1.5 or Track 2 diplomacy, which is essentially jargon for engagement with China and Russia on AI safety issues. Because if we deploy these technologies in reckless ways, that doesn’t benefit anyone. And we can still move cooperatively on AI safety and on the responsible use of AI without mitigating or entering into a zero sum environment where the benefits are only going to be accrued by one state or another.

Nicolas Moës: I definitely see the safety technologies as something that would benefit everybody. If you’re thinking in two different types of inventions, the one that promotes safety indeed would be useful, but I believe that for enhancing raw capabilities, you would actually race. Right? So, I totally agree with your decision narrative. I know people on both sides see this as a silly thing, you know, with media hype and of course industry benefiting a lot from this narrative.

There is a lot of this though that remains the rational thing to do, right? Whenever you start negotiating standards, you can say, “Well look at our systems. They are more advanced, so they should become the global standards for AI,” right? That actually is worrisome because the trajectory right now, since there is this narrative in place, is that over the medium term, you would expect the technologies maybe to diverge, and so both blocs, or if you want to charitably include the EU into this race, the three blocs, would start diverging and therefore will need each other less and less. The economic cost of an open conflict would actually decrease, but this is over the very long term.

That’s kind of the dangers of race dynamics as I see them. Again, it’s very heterogeneous, right? When we say the US against China, when you look at the more granular level, even units of government are sometimes operating with a very different mindset. So, as for whether AI policy can actually be relevant to this, I do think it can, because at least on the Chinese side as far as I know, there is this awareness of the safety issue. Right? And there has been a pretty explicit article. It was like, “the US and China should work together to future proof AI.” So, it gives you the impression that some government officials or former government officials in China are interested in this dialogue about the safety of AI, which is what we would want. We don’t especially have to put the raw capabilities question on the table so long as there is common agreement about safety.

At the global level, there’s a lot of things happening to tackle this coordination problem. For example, the OECD AI Policy Observatory is an interesting setup because that’s an institution with which the US is still interacting. There have been fewer and fewer multilateral fora with which the US administration has been willing to interact constructively, let’s say. But for the OECD one yes, there’s been quite a lot of interactions. China is an observer to the OECD. So, I do believe that there is potential there to have a dialogue between the US and China, in particular about AI governance. And plenty of other fora exist at the global level to enable this Track 1.5 / Track 2 diplomacy that you mentioned Jared. For example, the Global Governance of AI Forum that the Future Society has organized, and Beneficial AGI that Future of Life Institute has organized.

Jared Brown: Yeah, and that’s sort of part and parcel with one of the most prominent examples of, some people call it scientific diplomacy, and that’s kind of a weird term, but the Pugwash conferences that occurred all throughout the Cold War where technical experts were meeting on the side to essentially establish a rapport between Russian and US scientists on issues of nuclear security and biological security as well.

So, there are plenty of examples where even if this race dynamic gets out of control, and even if we find ourselves 20 years from now in an extremely competitive, contentious relationship with near peer adversaries competing over the development of AI technology and other technologies, we shouldn’t, as civil society groups, give up hope and surrender to the inevitability that safety problems are likely to occur. We need to be looking to the past examples of what can be leveraged in order to appeal to essentially the common humanity of these nation states in their common interest in not wanting to see threats arise that would challenge either of their power dynamics.

Nicolas Moës: The context matters a lot, but sometimes it can be easier than one can think, right? So, I think when we organized the US China AI Tech Summit, because it was about business, about the cutting edge, and because it was also about just getting together to discuss, and a bit before this US / China race dynamic was full on, there were not so many issues with getting our guests. Now it might be a bit more difficult, with some officials not able to join events where officials from other countries are, because of diplomatic reasons. And that was in June 2018, right? But back then there was the willingness and the possibility, since the US China tension was quite limited.

Jared Brown: Yeah, and I’ll just throw out a quick plug for other FLI podcasts. I recommend listeners check out the work that we did with Matthew Meselson. Max Tegmark had a great podcast on the development of the Biological Weapons Convention, which is a great example of how two competing nation states came to a common understanding about what was essentially, or is, a global catastrophic and existential risk, and developed the Biological Weapons Convention.

Lucas Perry: So, tabling collaboration on safety, which can certainly be mutually beneficial, and just focusing on capabilities research, where at least it seems basically just rational to race in a game theoretic sense: I’m interested in exploring if you guys have any views or points to add here about mitigating the risks there, and how it may simply actually not be rational to race for that?

Nicolas Moës: So, there is the narrative currently that it’s rational to race on some aspect of raw capabilities, right? However, when you go beyond the typical game theoretical model, when you enable people to build bridges, you could actually find certain circumstances under which you have a so-called institutional entrepreneur building up an institution that is legitimate, that everybody agrees upon, and that enforces the cooperation agreement.

In economics, the windfall clause is regarding the distribution of it. Here what I’m talking about, in the game theoretical space, is how to avoid the negative impact, right? So, the windfall clause would operate in this very limited set of scenarios whereby the AGI leads to an abundance of wealth, and then a windfall clause deals with the distributional aspect and therefore reduces the incentive to a certain extent to produce AGI. However, to abide by the windfall clause, you still have to preserve the incentive to develop the AGI. Right? But you might actually tamp that down.

What I was talking about here, regarding the institutional entrepreneur, is someone who can break this race by simply having a credible commitment from both sides and enforcing that commitment. So like the typical model of the tragedy of the commons, which here could be seen as you over-explored the time to superintelligence level, you can solve the tragedy of the commons, actually. So it’s not that rational anymore. Once you know that there is a solution, it’s not rational to go for the worst case scenario, right? You actually can design a mechanism that forces you to move towards the better outcome. It’s costly though, but it can be done if people are willing to put in the effort, and it’s not costly enough to justify not doing it.
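As a rough illustration of the credible-commitment point, here is a minimal sketch of a two-player race game in Python. The payoff numbers and the `penalty` parameter are purely hypothetical, chosen only to give the game a prisoner's dilemma shape; they are assumptions for illustration, not figures from the conversation or from any empirical estimate. The sketch looks for pure-strategy Nash equilibria with and without an enforced penalty on racing.

```python
from itertools import product

ACTIONS = ("cooperate", "race")

# Hypothetical payoffs as (player A, player B); illustrative numbers only.
BASE_PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "race"): (0, 4),
    ("race", "cooperate"): (4, 0),
    ("race", "race"): (1, 1),
}


def with_penalty(payoffs, penalty):
    """Subtract an enforced penalty from any player who chooses to race."""
    adjusted = {}
    for (a, b), (pay_a, pay_b) in payoffs.items():
        adjusted[(a, b)] = (
            pay_a - (penalty if a == "race" else 0),
            pay_b - (penalty if b == "race" else 0),
        )
    return adjusted


def pure_nash_equilibria(payoffs):
    """Return action pairs where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        pay_a, pay_b = payoffs[(a, b)]
        a_is_best = all(pay_a >= payoffs[(alt, b)][0] for alt in ACTIONS)
        b_is_best = all(pay_b >= payoffs[(a, alt)][1] for alt in ACTIONS)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria


no_enforcement = pure_nash_equilibria(BASE_PAYOFFS)
with_enforcement = pure_nash_equilibria(with_penalty(BASE_PAYOFFS, penalty=2))
print(no_enforcement)    # racing is the only equilibrium without enforcement
print(with_enforcement)  # mutual cooperation is the only one with the penalty
```

With these assumed numbers, the enforced penalty is what moves the stable outcome from mutual racing to mutual cooperation; a penalty smaller than the gain from defecting would not be enough.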

Jared Brown: I would just add that the underlying assumptions about the rationality of racing towards raw capability development, largely depend on the level of risk you assign to unaligned AI or deploying narrow AI in ways that exacerbate global catastrophic and existential risk. Those game theories essentially can be changed and those dynamics can be changed if our community eventually starts to better sensitize players on both sides about the lose/lose situation, which we could find ourselves in through this type of racing. And so it’s not set in stone and the environment can be changed as information asymmetry is decreased between the two competing partners and there’s a greater appreciation for the lose/lose situations that can be developed.
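The dependence on assigned risk can be made concrete with a small expected-value sketch. Everything here is an assumption for illustration: the function names, the single-actor framing, and the payoff numbers are hypothetical and are not taken from the discussion.

```python
def expected_value_of_racing(p_catastrophe, win_payoff, catastrophe_loss):
    """Expected payoff of racing when it risks a catastrophe with probability p."""
    return (1 - p_catastrophe) * win_payoff - p_catastrophe * catastrophe_loss


def break_even_risk(win_payoff, catastrophe_loss, cooperative_payoff):
    """Risk level above which racing pays less than cooperating."""
    return (win_payoff - cooperative_payoff) / (win_payoff + catastrophe_loss)


# Hypothetical numbers: winning the race pays 4, cooperating pays 3,
# and a catastrophe costs 96.
threshold = break_even_risk(win_payoff=4, catastrophe_loss=96, cooperative_payoff=3)
low_risk_value = expected_value_of_racing(0.005, win_payoff=4, catastrophe_loss=96)
high_risk_value = expected_value_of_racing(0.05, win_payoff=4, catastrophe_loss=96)
print(threshold, low_risk_value, high_risk_value)
```

Under these assumptions, an actor who believes the risk is below the threshold sees racing as the better bet, while one who believes it is above the threshold does not, which is why changing beliefs about risk can change the dynamics.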

Lucas Perry: Yeah. So I guess I just want to highlight the point that on a superficial first analysis, it would seem that the rational game theoretic thing to do is to increase capability as much as possible, so that you have power and security over other actors. But that might not be true under further investigation.

Jared Brown: Right, and I mean, for those people who haven’t had to suffer through game theory classes, there’s a great popular culture example here: a lot of people have seen Stranger Things on Netflix. If you haven’t, maybe skip ahead 20 seconds until I’m done saying this. But there is an example of the US and Russia competing to understand the upside down world, and then releasing untold havoc onto their societies, because of this upside down discovery. For those of you who have watched, it’s actually a fairly realistic example of where this kind of competing technological development leads somewhere that’s a lose/lose for both parties, and if they had better cooperation and better information sharing about the potential risks, because they were each discovering it themselves without communicating those risks, neither would have opened up the portals to the upside down world.

Nicolas Moës: The same dynamics, the same “oh it’s rational to race” dynamic applied to nuclear policy and the nuclear arms race has led to, actually, some treaties, far from perfection. Right? But some treaties. So this is the thing where, because of the model, the tragedy of the commons, it’s easy to communicate. It’s a nice thing with doom and fatality embedded in it. This resonates really well with people, especially in the media, it’s a very simple thing to say. But this simply might not be true. Right? As I mentioned. So there is this institutional entrepreneurship aspect which requires resources, right? So that is very costly to do. But civil society is doing that, and I think the Future of Life Institute has agency to do that. The Future Society is definitely doing that. We are actually agents of breaking away from these game theoretical situations that would be otherwise unlikely.

We fixate a lot on the model, but in reality, we have seen in nuclear policy the worst case scenario being averted, sometimes by mistake. Right? The human in the loop not following the policy or something like that. Right. So it’s interesting as well. It shows how unpredictable all this is. It really shows that for AI, it’s the same. You could have the militaries on both sides, literally from one day to the next, start a discussion about AI safety, and how to ensure that they keep control. There’s a lot of goodwill on both sides and so maybe we could say like, “Oh, the economist” (and I’m an economist by training, so I can be a bit harsh on myself), the economist would say, “But this is not rational.” Well, in the end, it is more rational, right? So long as you win, you know, remain in a healthy life and feel like you have done the right thing, this is the rational thing to do. Maybe if Netflix is not your thing, “Inadequate Equilibria” by Eliezer Yudkowsky explores these kinds of conundrums as well. Why do you have sub-optimal situations in life in general? It’s a very general model, but I found it very interesting to think about these issues, and in the end it boils down to these kinds of situations.

Lucas Perry: Yeah, right. Like for example, the United States and Russia having like 7,000 nuclear warheads each, and being on hair trigger alert with one another, is a kind of suboptimal equilibrium that we’ve nudged ourselves into. I mean it may be just completely unrealistic, but a more optimal place to be would be no nuclear weapons, but have used all of that technology and information for nuclear power, where we would all just be better off.

Nicolas Moës: Yeah. What you describe seems to be a better situation. However, the rational thing to do at some point would have been, before the Soviet Union developed them, to incapacitate the Soviet Union’s ability to develop them. Now, the mutually assured destruction policy is holding up a lot of that. But I do believe that the diplomacy, the discussions, the communication, even merely the fact of communicating like, “Look, if you do that, then we will do that,” is a form of progress towards: basically you should not use it.

Jared Brown: Game theory is nice for boiling things down into nice little boxes, clearly. But the dynamics of the nuclear situation with the USSR and the US add a countless number of boxes that you could end up in, and yes, each of us having way too large nuclear arsenals is a sub-optimal outcome, but it’s not the worst possible outcome; that would have been total nuclear annihilation. So it’s important not just to look at criticisms of the current situation, but also to see the benefits of this current situation and why this box is better than some other boxes that we could have ended up in. And that way, we can leverage the path that we have taken to get to where we’re at, find the paths that were actually positive, and reapply those lessons learned to the trajectory of emerging technology once again. We can’t throw out everything that has happened on nuclear policy and assume that there’s nothing to be gained from it, just because the situation that we’ve ended up in is suboptimal.

Nicolas Moës: Something that I have experienced while interacting with policymakers and diplomats: you actually have agency over what is going on. This is important also to note. It’s not like you are a small thing, and the world is passing by. No. Even in policy, which seems to be maybe a bit more arcane, you can pull the right levers to make somebody feel less like they have to obey this race narrative.

Jared Brown: Just recently in the last National Defense Authorization Act, there was a provision talking about the importance of military to military dialogues being established, potentially even with adversarial states like North Korea and Iran, for that exact reason: that better communication between militaries can lead to a reduction of miscalculation, and therefore of adverse escalation of conflicts. We saw this just recently between the US and Iran. There was not direct communication perhaps between the US and Iran, but there was indirect communication, some of that over Twitter, about the intentions and the actions that different states, Iran and the US, might take in reaction to other events, and that may have helped de-escalate the situation to where we find ourselves now. It’s far from perfect, but this is the type of thing that civil society can help encourage as we are dealing with new types of technology that can be as dangerous as nuclear weapons.

Lucas Perry: I just want to touch on what is actually going on now and actually being considered before we wrap things up. You talked about this a little bit before, Jared, you mentioned that currently in terms of AI policy, we are moving from principles and recommendations to the implementation of these into hard law. So building off of this, I’m just trying to get a better sense of where AI policy is, currently. What are the kinds of things that have been implemented, and what hasn’t, and what needs to be done?

Jared Brown: So there are some key decisions that have to be made in the near term on AI policy that I see replicating in many different government environments. One of them is about liability. I think it’s very important for people to understand the influence that establishing liability has for safety considerations. By liability, I mean who is legally responsible if something goes wrong? The basic idea is if an autonomous vehicle crashes into a school bus, who’s going to be held responsible and under what conditions? Or if an algorithm is biased and systematically violates the civil rights of one minority group, who is legally responsible for that? Is it the creator of the algorithm, the developer of the algorithm? Is it the deployer of that algorithm? Is there no liability for anyone at all in that system? And governments writ large are struggling with trying to assign liability, and that’s a key area of governance and AI policy that’s occurring now.

For the most part, it would be wise for governments to not provide blanket liability protection for AI simply as a matter of trying to encourage and foster the adoption of those technologies, such that we encourage people to essentially use those technologies in unquestioning ways and sincerely surrender the decision making from the human to that AI algorithm. There are other key issue areas. There is the question of educating the populace. The example here I give is, you hear the term financial literacy all the time, about how educated your populace is about how to deal with money matters.

There’s a lot about technical literacy, technology literacy being developed. The Finnish government has a whole course on AI that they’re making available to the entire EU. How we educate our population and prepare our population from a workforce training perspective matters a lot. If that training incorporates considerations for common AI safety problems, if we’re training people about how adversarial examples can affect machine learning and so on and so forth, we’re doing a better job of sensitizing the population to potential longterm risks. That’s another example of where AI policy is being developed. And I’ll throw out one more, which is a common example that people will understand. You have a driver’s license from your state. The state has traditionally been responsible for deciding the human qualities that are necessary in order for you to operate a vehicle. And the same goes for state licensing boards, which have been responsible for certifying and allowing people to practice law or practice medicine.

For doctors and lawyers, there are national organizations, but licensing is typically done at the state level. Now if we talk about AI starting to essentially replace human functions, governments have to look again at this division about who regulates what and when. There’s sort of an opportunity in all democracies to reevaluate the distribution of responsibility between units of government, about who has the responsibility to regulate and monitor and govern AI, when it is doing something that a human being used to do. And there are different pros and cons for different models. But suffice it to say that that’s a common theme in AI policy right now: how to deal with who has the responsibility to govern AI, if it’s essentially replacing what used to be formerly, exclusively a human function.

Nicolas Moës: Yeah, so in terms of where we stand, currently, actually let’s bring some context maybe to this question as well, right? The way it has evolved over the past few years is that you had really ethical principles in 2017 and 2018. Let’s look at the global level first. Like at the global level, you had for example, the Montréal Declaration, which was intended to be global, but for mostly fundamental rights-oriented countries, so that excludes some of the key players. We have already talked about dozens and dozens of principles for AI in various contexts or in general, right. That was 2018, and then what we have seen is more the first multilateral guidelines. So we have the OECD principles. GPAI, which is this global panel on AI, was also a big thing between Canada and France, which was initially intended to become kind of the international body for AI governance, but that deflated a bit over time. And so you had also the establishment of all these fora for discussion that I have already mentioned: political AI summits and the Global Forum on AI for Humanity, which is, again, a Franco-Canadian initiative, like the AI for Good, and the Global Governance of AI Forum in the Middle East. There was this ethically aligned design initiative at the IEEE, which is a global standards setter, which has garnered a lot of attention among policymakers and other stakeholders. But the move towards harder law is coming, and since it’s towards harder law, at the global level there is not much that can happen. Nation states remain sovereign in the eyes of international law.

So unless you write up an international treaty, it would be at the government level that you have to move towards hard law. So at the global level, the next step that we can see is these audits and certification principles. It’s not hard law, but you use labels to independently certify whether an algorithm is good. Some of them are tailored for specific countries. So I think Denmark has its own certification mechanism for AI algorithms. The US is seeing the appearance of various initiatives, notably by the big consulting companies, which are all of the auditors. So it is interesting to see how we shift from soft law towards this industry-wide regulation for these algorithms. At the EU level, where you have some hard legislative power, you had also a high level group on liability, which is very important, because they basically argued that we’re going to have to update product liability rules in certain ways for AI and for internet of things products.

This is interesting to look at as well, because when you look at product liability rules, this is hard law, right? So what they have recommended is directly translatable into this legislation. And so you move on at this stage since the end of 2019, you have this hard law coming up and this commission white paper which really kickstarts the debates about what will the regulation for AI be? And whether it will be a regulation. So it could be something else like a directive. The high level expert group has come up with a self assessment list for companies to see whether they are obeying the ethical principles decided upon in Europe. So these are kind of soft self regulation things, which might eventually affect court rulings or something like that. But they do not represent the law, and now the big players are moving in, either at the global level with these more and more powerful labeling initiatives, or certification initiatives, and at the EU level with this hard law.

And the reason why the EU level has moved on towards hard law so quickly, is because during the very short campaign of the commission president, AI was a political issue. The techlash was strong, and of course a lot of industry was complaining that there was nothing happening in AI in the EU. So they wanted strong action and that kind of stuff. Those are the circumstances that led the EU to be in pole position for developing hard law. Elsewhere in the world, you actually have more fragmented initiatives at this stage, except the OECD AI policy observatory, which might be influential in itself, right? It’s important to note the AI principles that the OECD has published. Even though they are not binding, they would actually influence the whole debate. Right? Because at the international level, for example, when the OECD had privacy principles, this became the reference point for many legislators. So some countries who don’t want to spend years even debating how to legislate AI might just be like, “okay, here are the OECD principles, how do we implement that in our current body of law?” And that’s it.

Jared Brown: And I’ll just add one more quick dynamic that’s coming up with AI policy, which is essentially the tolerance of that government for the risk associated with emerging technology. A classic example here is, the US actually has a much higher level of flood risk tolerance than other countries. So we engineer largely, throughout the US, our dams and our flood walls and our other flood protection systems to a 1-in-100 year standard. Meaning the flood protection system is supposed to protect you from a severe storm that would have a 1% chance of occurring in a given year. Other countries have vastly different decisions there. Different countries make different policy decisions about the tolerance that they’re going to have for certain things to happen. And so as we think about emerging technology risk, it’s important to think about the way that your government is shaping policies and the underlying tolerance that they have for something going wrong.
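
To make the 1-in-100 year standard concrete, here is a minimal Python sketch of how a 1% annual chance accumulates over longer horizons. It assumes each year is independent and that the annual chance stays fixed, which is a simplification and not something stated in the conversation; the function name is hypothetical.

```python
def chance_of_at_least_one_event(annual_probability: float, years: int) -> float:
    """Chance that an event with a fixed annual probability happens at least
    once over `years` years, assuming the years are independent (an assumption)."""
    chance_of_no_event = (1.0 - annual_probability) ** years
    return 1.0 - chance_of_no_event


# A "1-in-100 year" storm has a 1% chance of occurring in any given year.
for horizon in (1, 30, 100):
    chance = chance_of_at_least_one_event(0.01, horizon)
    print(f"{horizon:>3} years: {chance:.1%}")
```

Under those assumptions, a system engineered to this standard still faces roughly a 26% chance of seeing at least one such storm over 30 years, which is the kind of residual risk a government implicitly accepts when it picks the standard.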

It could be as simple as how likely it is that you will die because of an autonomous vehicle crash. And the EU, traditionally, has had what they call a precautionary principle approach, which is in the face of uncertain risks, they’re more likely to regulate and restrict development until those risks are better understood, than the US, which typically has adopted the precautionary principle less often.

Nicolas Moës: There is a lot of uncertainty. A lot of uncertainty about policy, but also a lot of uncertainty about the impact that all these technologies are having. The dam standard, you can quantify quite easily the force of nature, but here we are dealing with social forces that are a bit different. I still remember quite a lot of people being very negative about Facebook’s chances of success, because people would not be willing to put pictures of themselves online. I guess 10 years later, these people have been proven wrong. The same thing could happen with AI, right? So people are currently, at least in the EU, afraid of some aspects of AI. So let’s say an autonomous vehicle. Surrendering decision-making about our life and death to an autonomous vehicle, that’s something that maybe, as technology improves, people would be more and more willing to do. So yeah, it’s very difficult to predict, and even more to quantify I think.

Lucas Perry: All right. So thank you both so much. Do either of you guys have any concluding thoughts about AI policy or anything else you’d just like to wrap up on?

Jared Brown: I just hope the audience really appreciates the importance of engaging in the policy discussion. Trying to map out a beneficial path forward for AI policy, because if you’re concerned like we are about the long term trajectory of this emerging technology and other emerging technologies, it’s never too early to start engaging in the policy discussion on how to map a beneficial path forward.

Nicolas Moës: Yeah, and one last thought, we were talking with Jared a couple of days ago about the number of people doing that. So thank you by the way, Jared for inviting me, and Lucas, for inviting me on the podcast. But that led us to wonder how many people are doing what we are doing, with the motivation that we have regarding these longer term concerns. That makes me think, yeah, there’s very few resources like labor resources, financial resources, dedicated to this issue. And I’d be really interested if there is, in the audience, anybody interested in that issue, definitely, they should get in touch. There are too few people right now with similar motivations, and caring about the same thing in AI policy to actually miss the opportunity of meeting each other and coordinating better.

Jared Brown: Agreed.

Lucas Perry: All right. Wonderful. So yeah, thank you guys both so much for coming on.

End of recorded material

FLI Podcast: Identity, Information & the Nature of Reality with Anthony Aguirre

Our perceptions of reality are based on the physics of interactions ranging from millimeters to miles in scale. But when it comes to the very small and the very massive, our intuitions often fail us. Given the extent to which modern physics challenges our understanding of the world around us, how wrong could we be about the fundamental nature of reality? And given our failure to anticipate the counterintuitive nature of the universe, how accurate are our intuitions about metaphysical and personal identity? Just how seriously should we take our everyday experiences of the world? Anthony Aguirre, cosmologist and FLI co-founder, returns for a second episode to offer his perspective on these complex questions. This conversation explores the view that reality fundamentally consists of information and examines its implications for our understandings of existence and identity.

Topics discussed in this episode include:

  • Views on the nature of reality
  • Quantum mechanics and the implications of quantum uncertainty
  • Identity, information and description
  • Continuum of objectivity/subjectivity

Timestamps: 

3:35 – General history of views on fundamental reality

9:45 – Quantum uncertainty and observation as interaction

24:43 – The universe as constituted of information

29:26 – What is information and what does the view of reality as information have to say about objects and identity

37:14 – Identity as on a continuum of objectivity and subjectivity

46:09 – What makes something more or less objective?

58:25 – Emergence in physical reality and identity

1:15:35 – Questions about the philosophy of identity in the 21st century

1:27:13 – Differing views on identity changing human desires

1:33:28 – How the reality as information perspective informs questions of identity

1:39:25 – Concluding thoughts

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Recently we had a conversation between Max Tegmark and Yuval Noah Harari where, in consideration of 21st century technological issues, Yuval recommended, “Get to know yourself better. It’s maybe the most important thing in life. We haven’t really progressed much in the last thousands of years and the reason is that yes, we keep getting this advice but we don’t really want to do it…. I mean, especially as technology will give us all, at least some of us, more and more power, the temptations of naive utopias are going to be more and more irresistible and I think the really most powerful check on these naive utopias is really getting to know yourself better.”

Drawing inspiration from this, our following podcast was with Andrés Gómez Emilsson and David Pearce on different views of identity, like open, closed, and empty individualism, and their importance in the world. Our conversation today with Anthony Aguirre follows up on and further explores the importance of questions of self and identity in the 21st century.

This episode focuses on exploring this question from a physics perspective where we discuss the view of reality as fundamentally consisting of information. This helps us to ground what actually exists, how we come to know that, and how this challenges our commonly held intuitions about there existing a concrete reality out there populated by conventionally accepted objects and things, like cups and people, that we often take for granted without challenging or looking into much. This conversation subverted many of my assumptions about science, physics, and the nature of reality, and if that sounds interesting to you, I think you’ll find it valuable as well. 

For those of you not familiar with Anthony Aguirre, he is a physicist who studies the formation, nature, and evolution of the universe, focusing primarily on the model of eternal inflation (the idea that inflation goes on forever in some regions of the universe) and what it may mean for the ultimate beginning of the universe and time. He is the co-founder and associate scientific director of the Foundational Questions Institute and is also a co-founder of the Future of Life Institute. He also co-founded Metaculus, an effort to optimally aggregate predictions about scientific discoveries, technological breakthroughs, and other interesting issues.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate. These contributions make it possible for us to bring you conversations like these and to develop the podcast further. You can also follow us on your preferred listening platform by searching for us directly or following the links on the page for this podcast found in the description.

And with that, let’s get into our conversation with Anthony Aguirre.

So the last time we had you on, we had a conversation on information. Could you take us through the history of how people have viewed fundamental reality and fundamental ontology over time, from a kind of idealism to then materialism to then this new shift that’s informed by quantum mechanics about seeing things as being constituted of information?

Anthony Aguirre: So, without being a historian of science, I can only give you the general impression that I have. And of course through history, many different people have viewed things very different ways. So, I would say in the history of humanity, there have obviously been many, many ways to think about the ultimate nature of reality, if you will, starting with a sense that the fundamental nature of external reality is one that’s based on different substances and tendencies and some level of regularity in those things, but without a sense that there are firm or certainly not mathematical regularities and things. And that there are causes of events, but without a sense that those causes can be described in some mathematical way.

So that changed obviously in terms of Western science with the advent of mechanics by Galileo and Newton and others showing that there are not just regularities in the sense that the same result will happen from the same causes over and over again, that was appreciated for a long time, but that those could be accessed not just experimentally but modeled mathematically and that there could be a relatively small set of mathematical laws that could then be used to explain a very wide range of different physical phenomena. I think that sense was not there before, it was clear that things caused other things and events caused other events, but I suspect the thinking was that it was more in a one off way, like, “That’s a complicated thing. It’s caused by a whole bunch of other complicated things. In principle, those things are connected.” But there wasn’t a sense that you could get in there and understand what that connection was analytically or intellectually and certainly not in a way that had some dramatic economy in the sense that we now appreciate from Galileo and Newton and subsequent physics.

Once we had that change to mathematical laws, then there was a question of, what are those mathematical laws describing? And the answer there was essentially that those mathematical laws are describing particles and forces between particles. And at some level, a couple of other auxiliary things like space and time are sort of there in the backdrop, but essentially the nature of reality is a bunch of little bits of stuff that are moving around under mathematically specified forces.

That is a sort of complete-ish description. I mean certainly Newton would not have said that that’s a complete description in the sense that, in Newton’s view, there were particles and those particles made up things and the forces told them exactly what to do, but at the same time there were lots of other things in Newton’s conception of reality like God and presumably other entities. So it’s not exactly clear how materialist Newton or Galileo for example were, but as time went on that became a more entrenched idea among hardcore theoretical physicists at least, or physicists, that there was ultimately this truest, most fundamental, most base description of reality that was lots of particles moving around under mathematical forces.

Now, that I think is a conception that is very much still with us in many senses but has taken on a much deeper level of subtlety given the advent of modern physics including particularly quantum mechanics and also I think a sort of modern recognition or sort of higher level maybe of sophistication and thinking about the relation between different descriptions of natural phenomena. So, let’s talk about quantum mechanics first. Quantum mechanics does say that there are particles in a sense, like you can say that there are particles but particles aren’t really the thing. You can ask questions of reality that entail that reality is made of particles and you will get answers that look like answers about particles. But you can also ask questions about the same physical system about how it is as a wave and you will get answers about how it is as a wave.

And in general in quantum mechanics, there are all sorts of questions that you can ask and you will get answers about the physical system in the terms that you asked those questions about. So as long as it is a sort of well-defined physical experiment that you can do and that you can translate into a kind of mathematical form, what does it mean to do that experiment? Quantum mechanics gives you a way to compute predictions for how that experiment will turn out without really taking a particular view on what that physical system is, is it a particle? Is it a wave? Or is it something else? And I think this is important to note, it’s not just that quantum mechanics says that things are particles and waves at the same time, it’s that they’re all sorts of things at the same time.

So you can ask how much of my phone is an elephant in quantum mechanics. A phone is totally not the same thing as an elephant, but a phone has a wave function, so if I knew the wave function of the phone and I knew a procedure for asking, “Is something an elephant?”, then I could apply that procedure to the phone and the answer would not be, “No, the phone is definitely not an elephant.” The answer would be, “The phone is a tiny, tiny, tiny, tiny, tiny bit an elephant.” So this is very exaggerated because we’re talking phones and elephants, all these numbers are so tiny. But the point is that I can interrogate reality in quantum mechanics in many different ways. I can formulate whatever questions I want and it will give me answers in terms of those questions.

And generally if my questions are totally mismatched with what the system is, I’ll get, “No, it’s not really that.” But the no is always a, “No, the probability is incredibly tiny that it’s that.” But in quantum mechanics, there’s always some chance that if you look at your phone, you’ll notice that it’s an elephant. It’s just that that number is so tiny that it never matters, but when you’re talking about individual particles, you might find that that probability is significant, that the particle is somewhat different than you thought it was and that’s part of the quantum uncertainty and weirdness.

Lucas Perry: Can you unpack a little bit that quantum uncertainty and weirdness that explains, when you ask questions to quantum mechanics, you don’t ever get definite answers? Is that right?

Anthony Aguirre: Almost never. So there are occasions where you get definite answers. If you ask a question of a quantum system and it gives you an answer and then you ask that question immediately again, you’ll get the same answer for sure.

Lucas Perry: What does immediately mean?

Anthony Aguirre: Really immediately. So formally, like immediately, immediately. If time goes by between the two measurements then the system can evolve a little bit and then you won’t definitely get the same answer. That is, if you have a quantum system, there is a particular set of questions that you can ask it that you will get definite answers to, and the quantum state essentially is that set of questions. When you say an electron is here and it has this spin, that is, it’s rotating around this direction, what you really mean is that there are a particular set of questions like, “Where are you? And what is your spin?” that, if you asked them of this electron, you would get a definite answer to.

Now if you take that same electron that I was going to ask those questions to, and I would get a definite answer because that’s the state the electron is in, but you come along and ask a different question than one of the ones that is in that list, you will get an answer but it won’t be a definite answer. So the fundamental hallmark of quantum mechanics is that the list of questions you can ask to which you will get a definite answer is a finite one. And for a little particle it’s a very short list; for an electron it’s a very short list.
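
The idea of a short list of questions with definite answers can be illustrated with a toy model. The following is a minimal Python sketch, assuming a single spin-1/2 particle described by two complex amplitudes and the textbook Born rule; the names `UP_Z`, `X_BASIS`, `overlap_probability` and `measure` are hypothetical and chosen only for this illustration.

```python
import math
import random

# Toy model (an assumption for illustration): a spin-1/2 state is a pair of
# complex amplitudes written in the "spin along z" basis.
UP_Z = (1 + 0j, 0j)
DOWN_Z = (0j, 1 + 0j)
_S = 1 / math.sqrt(2)
UP_X = (_S + 0j, _S + 0j)
DOWN_X = (_S + 0j, -_S + 0j)

Z_BASIS = (UP_Z, DOWN_Z)  # the question "is your spin up or down along z?"
X_BASIS = (UP_X, DOWN_X)  # the question "is your spin up or down along x?"


def overlap_probability(outcome, state):
    """Born rule: probability of finding `state` in `outcome`, |<outcome|state>|^2."""
    amplitude = (outcome[0].conjugate() * state[0]
                 + outcome[1].conjugate() * state[1])
    return abs(amplitude) ** 2


def measure(state, basis, rng):
    """Ask the question defined by `basis`.

    Returns (answer_index, state_after), where the state afterwards is the
    basis state that matches the answer.
    """
    p_first = overlap_probability(basis[0], state)
    index = 0 if rng.random() < p_first else 1
    return index, basis[index]


rng = random.Random(0)

# The z question is on the electron's "list": the answer is definite.
print("P(up along z | prepared up along z):", overlap_probability(UP_Z, UP_Z))

# The x question is not on the list: the answer is a coin flip.
print("P(up along x | prepared up along z):", overlap_probability(UP_X, UP_Z))

# Asking the same question again immediately repeats the first answer.
first, state_after = measure(UP_Z, X_BASIS, rng)
second, _ = measure(state_after, X_BASIS, rng)
print("first answer:", first, "| immediate repeat:", second)
```

In this sketch the "right question" for a state prepared up along z is the z question, which returns the same answer every time, while the x question returns either answer with equal probability and leaves the particle in a state for which x, not z, is now the question with a definite answer.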

Lucas Perry: Is this because the act of observation includes interaction with the particle in such a way that it is changed by the interaction?

Anthony Aguirre: I think that’s a useful way to look at it in a sense, but it’s slightly misleading in the sense that as I said, if you ask exactly the right question, then you will get a definite answer. So you haven’t interfered with the system at all if you ask exactly the right question.

Lucas Perry: That means performing the kind of experiment that doesn’t change what the particle will be doing or its nature? Is that what that means?

Anthony Aguirre: Yes. It’s sort of like you’ve got a very, very particularly shaped net and you can cast it on something and if the thing happens to have exactly the right shape, your net just falls right over it and it doesn’t affect the thing at all and you say, “Oh, it has that property.” But if it has any other shape, then your net kind of messes it up, it gets perturbed and you catch something in your net. The net is your experiment, but you mess up the system while you’re doing it, but it’s not that you necessarily mess up the system, it’s that you’re asking it a question that it isn’t ready to answer definitively, but rather some other question.

So this is always true, but it’s kind of the crucial thing of reality. But the crucial thing about quantum mechanics is that that list is finite. We’re used to asking any question that… I’ve got a mug, I can ask, “Is it brown? Is it here? Is it there? How heavy?” Whatever question I think of, I feel like I can answer. I can ask the question and there will be an answer to it because whatever question I ask, if it’s a well-defined question before I ask it, the mug either has that property or it doesn’t. But quantum mechanics tells us that is true. But there’s only a finite number of answers that are built in to the object. And I can ask other questions, but I just can’t expect the answer to already be there in the sense that I’ll get a definite answer to it.

So this is a very subtle way that there’s this interactive process between the observer and the thing that’s observed. If we’re talking about something that is maximally specified that it has a particular quantum state, there is some way that it is in a sense, but you can’t ever find that out because as soon as you start asking questions of it, you change the thing unless you happen to ask exactly the right questions. But in order to ask exactly the right questions, you would already have to know what state it’s in. And the only way you can do that is by actually creating the system effectively.

So if I create an electron in a particular state in my lab, then I know what state it’s in and I know exactly what questions to ask it in order to get answers that are certain. But if I just come across an electron in the wild, I don’t know exactly what questions to ask. And so I just have to ask whatever questions I will and chances are it won’t be the right questions for that electron. And I won’t ever know whether they were or not because I’ll just get some set of answers and I won’t know whether those were the properties that the electron actually had already or if they were the ones that it fell into by chance upon my asking those questions.

Lucas Perry: How much of this is actual properties and features about the particles in and of themselves and how much is it about the fact that we’re like observers or agents that have to interact with the particles in some ways in order to get information about them? Such that we can’t ask too many questions without perturbing the thing in and of itself and then not being able to get definitive answers to other questions?

Anthony Aguirre: Well, I’m not sure how to answer that because I think it’s just that is the structure of quantum mechanics, which is the structure of reality. So it’s explicitly posed in terms of quantum states of things and a structure of observations that can be made or observables that can be measured so you can see whether the system has a particular value of that observable or not. If you take out the observation part or the measurement part, you just have a quantum state which evolves according to some equation and that’s fine, but that’s not something you can actually compare in any sense to reality or to observation or use in any way. You need something that will connect that quantum state and evolution equation to something that you can actually do or observe.

And I think that is something that’s a little bit different. You can say in Newtonian mechanics or classical physics, there’s something arguably reasonable about saying, “Here is the system, it’s these particles and they’re moving around in this way.” And that’s saying something. I think you can argue about whether that’s actually true, that that’s saying something. But you can talk about the particles themselves in a fairly meaningful way without talking about the observer or the person who’s measuring it or something like that. Whereas in quantum mechanics, it’s really fairly useless to talk about the wave function of something without talking about the way that you measure things or the basis that you operate it on and so on.

That was a long sort of digression in a sense, but I think that’s crucial because that I think is a major underlying change in the way that we think about reality, not as something that is purely out there, but understanding that even to the extent that there’s something out there, any sense of our experiencing that is unavoidably an interactive one and in a way that you cannot ignore the interaction, that you might have this idea that there’s an external objective reality that although it’s inconvenient to know, although on an everyday basis you might mess with it a little bit when you interact with it, in principle it’s out there and if you could just be careful enough, you could avoid that input from the observer. Quantum mechanics says, “No. That’s a fundamental part of it. There’s no avoiding that. It’s a basic part of the theory that reality is made up of this combination of the measurer and the state.”

I also think that once you admit, because you have to in this case that there is more to a useful or complete description of reality than just the kind of objective state of the physical system, then you notice that there are a bunch of other things that actually are there as well that you have to admit are part of reality. So, if you ask some quantum mechanical question, like if I ask, “Is my mug brown? And is it spinning? Where is it?” Those kinds of questions, you have to ask, what is the reality status of those questions or the categories that I’m defining and asking those questions? Like brownness, what is that? That’s obviously something that I invented, not me personally, but I invented in this particular case. Brownness is something that biological creatures and humans and so on invented. The sensation of brown is something that biological creatures maybe devised, the calling something brown and the word brown are obviously human and English creations.

So those are things that are created through this process and are not there certainly in the quantum state. And yet if we say that the quantum state on its own is not a meaningful or useful description of reality, but we have to augment it with the sorts of questions that we ask and the sort of procedure of asking and getting questions answered, then those extra things that we have to put into the description entail a whole lot of different things. So there’s not just the wave function. So in that simple example, there’s a set of questions and possible answers to those questions that the mug could give me. And there are different ways of talking about how mathematically to define those questions.

One way is to call them coarse-grained states or macro states, that is, there are lots of ways that reality can be, but I want to extract out certain features of reality. So if I take the set of possible ways that a mug can be, there’s some tiny subset of all those different ways that the atoms in my mug could be that I would actually call a mug and a smaller subset of those that I would call a brown mug and a smaller subset of those that I would call a brown mug that’s sitting still and so on. So they’re kind of subsets of the set of all possible ways that a physical system with that many atoms and that mass and so on could be and when I’m asking questions about the mug, like are you brown? I’m asking, “Is the system in that particular subset of possibilities that I call a brown mug sitting on a table?”
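
The nesting of macro states described here can be sketched with ordinary sets. The Python below is a toy illustration only: it assumes a made-up space of twelve "microstates" labelled by shape, color and motion, whereas a real mug has an astronomically larger space, and all names are hypothetical.

```python
from itertools import product

# Toy stand-in for "all the ways the atoms could be": every combination of
# three labels. These labels are invented for illustration.
SHAPES = ("mug", "bowl", "puddle")
COLORS = ("brown", "white")
MOTIONS = ("still", "spinning")
MICROSTATES = set(product(SHAPES, COLORS, MOTIONS))

# Each macro state is the subset of microstates we would describe that way.
MUG = {m for m in MICROSTATES if m[0] == "mug"}
BROWN_MUG = {m for m in MUG if m[1] == "brown"}
STILL_BROWN_MUG = {m for m in BROWN_MUG if m[2] == "still"}


def ask(actual_microstate, macrostate):
    """A question about the system: is it in this subset of possibilities?"""
    return actual_microstate in macrostate


actual = ("mug", "brown", "spinning")
print(len(MICROSTATES), len(MUG), len(BROWN_MUG), len(STILL_BROWN_MUG))
print("Is it a brown mug?", ask(actual, BROWN_MUG))
print("Is it a brown mug sitting still?", ask(actual, STILL_BROWN_MUG))
```

Each more specific description picks out a smaller subset of the same underlying possibilities, and asking a question amounts to a membership test against one of those subsets.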

I would say that at some level, almost all of what we do in interacting with reality is like that process. There’s this huge set of possible realities that we could inhabit. What we do is divvy up that reality into many, many possibilities corresponding to questions that we might ask and answers to those questions we might ask and then we go and ask those questions of reality and we get sort of yes or no answers to them. And quantum mechanics is sort of the enactment of that process with full exactness that applies to even the smallest systems, but we can think of that process just on a day to day level, like we can think of, what are all the possible ways that the system could be? And then ask certain questions. Is it this? Is it that?

So this is a conception of reality that’s kind of like a big game of 20 questions. Every time we look out at reality, we’re just asking different questions of it. Normally we’re narrowing down the possibility space of how reality is by asking those questions, getting answers to it. To me a really interesting question is like, what is the ontological reality status of all those big sets of questions that we’re asking? Your tendency as a theoretical physicist is to say, “Oh, the wave function is the thing that’s real and that’s what actually exists, and all these extra things are just extra things that we made up and are globbed onto the wave function.” But I think that’s kind of a very impoverished view of reality, not just impoverished, but completely useless and empty of any utility or meaning because quantum mechanics by its nature requires both parts. The questions and the state. If you cut out all the questions, you’re just left with this very empty thing that has no applicability or meaning.
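
The "20 questions" picture of narrowing a possibility space can be made quantitative in a simple classical setting. This is a minimal Python sketch, assuming the possibilities are just the integers below some bound and every question is a yes/no question that splits the remaining candidates in half; the function names are hypothetical and nothing here is quantum mechanical.

```python
import math


def bits_gained(candidates_before: int, candidates_after: int) -> float:
    """Information gained, in bits, by narrowing the set of candidates."""
    return math.log2(candidates_before / candidates_after)


def play_twenty_questions(secret: int, n_possibilities: int):
    """Find `secret` among range(n_possibilities) using yes/no questions of
    the form "is it below the midpoint of what is left?".

    Returns (the value found, the number of questions asked).
    """
    low, high = 0, n_possibilities  # remaining candidates are low .. high - 1
    questions = 0
    while high - low > 1:
        mid = (low + high) // 2
        if secret < mid:   # answer "yes": keep the lower half
            high = mid
        else:              # answer "no": keep the upper half
            low = mid
        questions += 1
    return low, questions


n = 2 ** 20  # about a million possibilities
found, asked = play_twenty_questions(123456, n)
print("found:", found, "| questions asked:", asked)
print("bits gained in total:", bits_gained(n, 1))
```

With about a million equally likely possibilities, twenty well-chosen yes/no questions are enough to single out one of them, and each answer contributes one bit.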

Lucas Perry: But doesn’t that tell us how reality is in and of itself?

Anthony Aguirre: I don’t think it tells you anything, honestly. It’s almost impossible to even say what the wave function is except in some terms. Like if I just write down, “Okay, the wave function of the universe is psi.” What did that tell me? Nothing. There’s nothing there. There’s no way that I could even communicate to you what the wave function is without reference to some set of questions because remember the wave function is a definite set of answers to a particular set of questions. So, I have to communicate to you the set of questions to which the wave function is the definite answer and those questions are things that have to do with macroscopic reality.

There’s no way that I can tell you what the wave function is if I were to try to communicate it to you without reference to those questions. Like if I say, “Okay, I’ve got a thingie here and it’s got a wave function,” and you asked me, “Okay, what is the wave function?” I don’t know how to tell you. I could tell you its mass, but now what I’m really saying is, here’s a set of energy measuring things that I might do and the amplitude for getting those different possible outcomes in that energy measuring thing is 0.1 for that one and 0.2 for that one and so on. But I have to tell you what those energy measuring things are in order to be able to tell you what the wave function is.

Lucas Perry: If you go back to the starting conditions of the universe, that initial state is a definite thing, right? Prior to any observers and defined coherently and exactly in and of itself. Right?

Anthony Aguirre: I don’t know if I would say that.

Lucas Perry: I understand that for us to know anything we have to ask questions. I’m asking you about something that I know that has no utility because we’re always going to be the observer standing in reference, right? But just to think about reality in and of itself.

Anthony Aguirre: Right. But you’re assuming that there is such a thing and that’s not entirely clear to me. So I recognize that there’s a desire to feel like there is a sort of objective reality that is out there and that there’s meaning to saying what that reality is, but that is not entirely clear to me that that’s a safe assumption to make. So it is true that we can go back in time and attribute all kinds of pretty objective properties of the universe and it certainly is true that it can’t be that we needed people and observers and things back at the beginning in order to be able to talk about those things. But it’s a very thorny question to me, that it’s meaningful to say that there was a quantum state that the universe had at the very beginning when I don’t know what operationally that means. I wouldn’t know how to describe that quantum state or make it meaningful other than in terms of measurable things which require adding a whole bunch of ingredients to the description of what the universe is.

To say that the universe started in this quantum state, to make that meaningful requires these extra ingredients. But we also recognize that those extra ingredients are themselves parts of the universe. So, either you have to take this view that there is a quantum state and somehow we’re going to get out of that in this kind of circular self-consistent way, a bunch of measuring apparatuses that are hidden in that quantum state and make certain measurements and then define the quantum state in this bootstrapping way. Or you have to say that the quantum state, and I’m not sure how different these things are, that the quantum state is part of reality, but in order to actually specify what reality is, there’s a whole bunch of extra ingredients that we have to define and we have to put in there.

And that’s kind of the view that I take nowadays, that there is reality and then there’s our description of reality. And as we describe reality, one of the things that we need to describe reality are quantum states and one of the things that we need to describe reality are coarse grainings or systems of measurement or bases and so on. There are all these extra things that we need to put in. And the quantum states are one of them and a very important one. And evolution equations are one of them and a very important one. But to identify reality with the state plus the fundamental laws that evolve that state, I just don’t think is quite the right way to think about it.

Lucas Perry: Okay, so this is all very illuminating for this perspective here that we’re trying to explore, which is the universe being simply constituted of information.

Anthony Aguirre: Yeah, so let’s talk about that. Once you let go, I think, of the idea that there is matter that is made of particles and then there are arrangements of that matter and there are things that that matter does, but the matter is this intrinsically existing stuff. Once you start to think of there being the state, which is a set of answers to questions, that set of answers to questions is a very informative thing. It’s a kind of maximally informative thing, but it isn’t a different kind of thing to other sets of answers to questions.

That is, to say that I’ve got information about something is kind of saying that I’ve asked a bunch of questions and I’ve gotten answers about it, so I know about it. If I keep asking enough incredibly detailed questions, then maybe I’ve maximally specified the state of the cup and I have as much information as I can have about the cup. But in that process, as I ask for more and more information, as I more and more specify what the cup is like, there’s no particular place in which the cup changes its nature. So I start out asking questions and I get more and more and more information until I get the most information that I can. And then I say that’s the most information I can get, and now I’ve specified the quantum state of the cup.

But in that sense then a quantum state is like the sort of end state of a process of interrogating a physical system to get more and more information about it. So to me that suggests this interpretation that the nature of something like the quantum state of something is an informational thing. It’s identified with a maximal set of information that you can have about something. But that’s kind of one end of the spectrum, the maximal knowing about that thing end of the spectrum. But if we don’t go that far, then we just have less information about the thing. And once you start to think that way, well what then isn’t information? If the nature of things is to be a state and a set of questions and the state gives me answers to those questions, that’s a set of information. But as I said, that sort of applies to all physical systems that’s kind of what they are according to quantum mechanics.

So there used to be a sense, I think, that there was a thing, it was a bunch of particles and then when I ask questions I could learn about that thing. The lesson to me of quantum mechanics is that there’s no space between the answers to questions that I get when I ask questions of a thing and the thing itself. The thing is, in a sense, the set of answers to the questions that I have or could ask of it. It becomes much less of a kind of physical tangible thing made of stuff and much more of a thing made out of information and it’s information that I can get by interacting with that thing, but there isn’t a thing there that the information is about. That notion seems to be sort of absent. There’s no need to think that there is a thing that the information is about. All we know is the information.

Lucas Perry: Is that true of the particles arranged cup wise or the cup thing that is there? Is it true of that thing in and of itself or is that basically just the truth of being an epistemic agent who’s trying to interrogate the cup thing?

Anthony Aguirre: Suppose the fundamental nature of reality was a bunch of particles, then what I said is still true. I can imagine if things like observers exist, then they can ask questions and they can get answers and those will be answers about the physical system that kind of has this intrinsic nature of bits of stuff. And it would still, I think, be true that most of reality is made of everything but the little bits of stuff, the little bits of stuff are only there at the very end. If you ask the very most precise questions you get more and more a sense of, “Oh they’re little bits of stuff.” But I think what’s interesting is that what quantum mechanics tells us is we keep getting more and more fine grained information about something, but then at the very end rather than little bits of stuff, it sort of disappears before our eyes. There aren’t any little bits of stuff there, there’s just the answers to the most refined sets of questions that we can ask.

So that’s where I think there’s sort of a difference is that there’s this sense in classical physics that underlying all these questions and answers and information is this other thing of a different nature, that is matter and it has a different fundamental quality to it than the information. And in quantum mechanics it seems to me like there’s no need to think that there is such a thing, that there is no need to think that there is some other different stuff that is non-informational that’s out there that the information is about because the informational description is complete.

Lucas Perry: So I guess there’s two questions here that come out of this. It’d be good if you could define and unpack what information exactly is and then if you could explore and get further into the idea of how this challenges our notion of what a macroscopic thing is or what a microscopic or what a quantum thing is, something that we believe to have identity. And then also how this impacts identity like cup identity or particle identity, what it means for people and galaxies and the universe to be constituted of information. So those two things.

Anthony Aguirre: Okay. So there are lots of ways to talk about information. There are also qualitative and quantitative ways to talk about it. So let me talk about the quantitative way first. So you can say that if I have a whole possibility space, like many different possibilities for the way something can be and then I restrict those possibilities to a smaller set of possibilities in some way. Either I say it’s definitely in one of these, or maybe there’s a higher probability that it’s one of these than one of those. I, in some way restrict rather than every possibility is the same, I say that some possibilities are more than others, they’re more likely or it’s restricted to some subset. Then I have information about that system and that information is precisely the gap between everything being just equally likely and every possibility being equally good and knowing that some of them are more likely or valid or something than others.

So, information is that gap that says it’s more this than some of those other things. So, that’s a super general way of talking about it but that can be made very mathematically precise. So if I say there are four bits of information stored in my computer, exactly what I mean is that there are a bunch of registers and if I don’t know whether they’re ones or zeros, I say I have no information. If I know that these four are 1101, then I’ve restricted my full set of possibilities to this subset in which those are 1101 and I have those four bits of information. So I can be very mathematically precise about this. And I can even say of the first bit, well, I don’t know whether it’s 0 or 1, but there’s a 75% chance that it’s zero and a 25% chance that it’s one, and that’s still information. It’s less than one bit of information.

People think of bits as being very discrete things, but you can have fractions of bits of information. There’s nothing wrong with that. The very general definition is a restriction away from every possibility being equally likely to some being more likely than others. And that can be made mathematically precise and is exactly the sort of information we talk about when we say, “My hard drive is 80 gigabytes in size” or “I have 20 megabits per second of internet speed.” It’s exactly that sort of information that we’re quantifying.
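A minimal sketch of that quantitative definition, assuming the standard Shannon measure: information is taken to be the gap between the entropy of "every possibility equally likely" and the entropy of the distribution actually held. The function names `entropy_bits` and `information_bits` are illustrative choices, not anything named in the conversation.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_bits(probs):
    """Gap between total ignorance (uniform distribution) and this distribution."""
    max_entropy = math.log2(len(probs))
    return max_entropy - entropy_bits(probs)

# Four registers have 16 possible patterns; knowing the pattern is exactly 1101
# puts all the probability on one of them.
known = [0.0] * 16
known[0b1101] = 1.0
print(information_bits(known))  # 4.0 bits

# A single bit that is zero with 75% probability: a fraction of a bit.
print(round(information_bits([0.75, 0.25]), 3))  # 0.189 bits

# A single bit about which nothing is known carries no information.
print(information_bits([0.5, 0.5]))  # 0.0 bits
```

Under this measure the 75%/25% case comes out at roughly 0.19 bits, which is the sense in which a fraction of a bit is perfectly well defined.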

Now, when I think about a cup, I can think about the system in some way like, there are some number of atoms like 10 to the 25th or whatever, atoms or electrons and protons and neutrons or whatever, and there are then some huge, huge possible set of ways that those things can be and some tiny, tiny, tiny, tiny, tiny, tiny, almost infinitesimally tiny subset of those ways that can be are something that I would label a cup. So if I say, “Oh look, I have a cup”, I’m actually specifying a vast amount of information by saying, “Look, I have a cup.”

Now if I say, “Look, I have a cup and inside it is some dregs of coffee.” I’ve got a huge amount more information. Now, it doesn’t feel like a huge amount more of information. It’s just like, “Yeah, what did I expect? Dregs of coffee.” It’s not that big of a deal but physically speaking, it’s a huge amount of information that I’ve specified just by noticing that there are dregs of coffee in the cup instead of dregs of all kinds of other liquids and all kinds of other states and so on.

So that’s the quantitative aspect, I can quantify how much information is in a description of a system and the description of it is important because you might come along and you can’t see this cup. So I can tell you, there’s some stuff on my desk. You know a lot less about what’s on my desk than I do. So we have different descriptions of this same system and I’ve got a whole lot more information than you do about what’s on my desk. So the information, and this is an important thing, is associated with somebody’s description of the system, not necessarily a person’s, but any way of specifying probabilities of the system being in a subset of all of its possibilities. Whether that’s somebody describing it or whatever else, anything that defines probabilities over the states that the system could be in, that’s defining an amount of information associated with those probabilities.

So there’s that quantity. But there’s also, when I say, what is a mug? So you can say that the mug is made of protons, electrons, and neutrons, but of course pretty much anything in our world is made of protons, neutrons, and electrons. So what makes this a mug rather than a phone or a little bit of an elephant or whatever, is the particular arrangement that those atoms have. To say that a mug is just protons, neutrons, and electrons, I think is totally misleading in the sense that the protons, neutrons, and electrons are the least informative part of what makes it a mug. So there’s a quantity associated with that, the mug part of possibility space is very small compared to all of the possibilities. So that means that there’s a lot of information in saying that it’s a mug.

But there’s also the quality of what that particular subset is and that that particular subset is connected in various ways with things in my description, like solidity and mass and brownness and hardness and hollowness. It is at the intersection of a whole bunch of other properties that a system might have. So each of those properties I can also think of as subsets of possibility space. Suppose I take all things that are a kilogram, that’s how many protons, neutrons, and electrons they have. So, that’s my system. There’s a gazillion different ways that a kilogram of protons and neutrons and electrons can be where we could write down the very exponential numbers that it is.

Now, if I then say, “Okay, let me take a subset of that possibility space that are solid,” that’s a very small subset. There are lots of ways things can be gases and liquids. Okay, so I’ve made a small subset. Now let me take another property, which is hardness. So, that’s another subset of all possibilities. And where hardness intersects solid, I have hard, solid things and so on. So I can keep adding properties on and when I’ve specified enough properties, it’s something that I would give the label of a mug. So when I ask, what is a mug made of? In some sense it’s made of protons, neutrons, and electrons, but I think in a more meaningful sense, it’s made of the properties that make up it being a mug rather than some other thing. And those properties are these subsets or these ways of breaking up the state space of the mug into different possibilities.
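The idea of properties as subsets of a possibility space, intersected until only "mug" remains, can be sketched with a deliberately tiny toy model. Everything here is a hypothetical stand-in: the three coarse properties, their values, and the assumption that all 24 combinations are equally likely are invented for illustration and bear no relation to the real state space of a kilogram of matter.

```python
import math
from itertools import product

# Hypothetical toy possibility space: every combination of three coarse properties.
phases = ["solid", "liquid", "gas"]
hardnesses = ["hard", "soft"]
shapes = ["hollow", "filled", "flat", "irregular"]
space = set(product(phases, hardnesses, shapes))  # 24 possibilities, assumed equally likely

def subset(predicate):
    """All states in the possibility space that satisfy a property."""
    return {state for state in space if predicate(state)}

def information_bits(states):
    """Information gained by restricting the whole space to this subset."""
    return math.log2(len(space) / len(states))

solid = subset(lambda s: s[0] == "solid")
hard = subset(lambda s: s[1] == "hard")
hollow = subset(lambda s: s[2] == "hollow")

# Intersecting properties narrows the possibilities down to something "mug-like".
mug_like = solid & hard & hollow

for name, states in [("solid", solid), ("hard", hard),
                     ("hollow", hollow), ("mug-like", mug_like)]:
    print(name, len(states), round(information_bits(states), 3))
```

In this toy space each added property shrinks the subset, and the information in "mug-like" is the sum of the information in the three properties, because the properties were constructed to be independent. Real properties generally overlap, so that additivity should not be expected in general.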

In that sense, I kind of think of the mug as more made of properties with an associated amount of information with them and the sort of fundamental nature of the mug is that set of properties. And your reaction to that might be like, “Yes it has those properties but it is made of stuff.” But then if you go back and ask, what is that stuff? Again, the stuff is a particular set of properties. As deep as you go, it’s properties all the way down until you get to the properties of electrons, protons, and neutrons, which are just particular ways that those are and answers to those questions that you get by asking the right questions of those things.

And so that’s what it means to me to take the view that everything is made up of information in some way, it’s to take a view that there isn’t a separation between the properties that we intersect to say that it is something and the thing itself that has those properties.

Lucas Perry: So in terms of identity here, there was a question about the identity status of the cup. I think that, from hearing your talks previously, you propose a spectrum of subjectivity and objectivity rather than it being a kind of binary thing, because the cup is a set of questions and properties. Can you expand a little bit about the identity of the cup and what the meaning of the cup is, given that it is constituted, from this quantum mechanical perspective, of just information about the kinds of questions and properties we may ask of cup-like objects?

Anthony Aguirre: I think there are different kinds of things that we mean when we describe a system or say that it is this kind of thing. “It is a cup,” or a law of physics or a statement like “There is this theorem of mathematics,” or “I feel itchy” are three fairly different statements. But my view is that we should not try to sort them into objective facts of the world and individual subjective or personal perspective kind of things.

But there’s really this continuum in between them. So when I say that there’s this thing on my desk that is a cup, there’s my particular point of view that sees the cup and that has a whole bunch of personal associations with the cup. Like I really like this one. I like that it’s made out of clay. I’ve had a lot of nice coffee out of it. And so I’m like … So that’s very personal stuff.

There’s cupness which is obviously not there in the fires of the Big Bang. It’s something that has evolved socially and via biological utility and all the processes that have led to our technological society and our culture having things that we store stuff in and liquids and-

Lucas Perry: That cupness though is kind of like the platonic idealism that we experience imbued upon the object, right? Because of our conventional experience of reality. We can forget the cupness experience is there like that and we identify it and like reify it, right? And then we’re like, “Oh, there’s just cupness there.”

Anthony Aguirre: We get this sense that there is an objectively speaking cup out there, but we forget the level of creation and formulation that has gone on historically and socially and so on to create this notion, this shared collective notion of cupness that is a creation of humanity and that we all carry around with us as part of our mental apparatus.

And then we say, “Oh, cupness is an objective thing and we all agree that this is a cup and the cup is out there.” But really it’s not. It’s somewhere in this spectrum, in the sense that there will certainly be objects where it’s ambiguous whether they’re cups or not. There will be people who don’t know what a cup is and so on.

It’s not like every possible person will agree even whether this is a brown cup. Some people may say, “Well actually I’d call that grayish.” It feels fairly objective, but obviously there’s this intersubjective component to it of all these ingredients that we invented going into making that a cup.

Now there are other things that feel more objective than that in a sense, like the laws of physics or some things about mathematics where you say like, “Oh, the ratio of the circumference to the diameter of a circle.” We didn’t make that up. That was there at the beginning of the universe. And that’s a longer conversation, but certainly that feels more objective than the cup.

Once it’s understood what the terms are, there’s sort of no disagreeing with that statement as long as we’re in flat space and so on. And there’s no sense in which we feel like that statement has a large human input. We certainly feel like that ratio was what it was and that we can express it as this series of fractions and so on. Long before there were people, that was true. So there’s a feeling that that is a much more objective thing. And I think that’s fair to say. It has more of that objectivity than a cup. But what I disagree with and find kind of not useful is the notion that there is a demarcation between things that are and aren’t objective.

I sort of feel like you will never find that bright line between an actually objective thing and a not actually objective thing. It will always be somewhere on this continuum and it’s probably not even a one dimensional continuum, but somewhere in this spectrum between things that are quite objective and things that are very, very subjective will be somewhere in that region, kind of everything that makes up our world that we experience.

Lucas Perry: Right. So I guess you could just kind of boil that down by saying that is true because all of the things are just constituted of the kinds of properties and questions that you’re interested in asking about the thing and the questions about the mathematical properties feel and seem more objective because they’re derived from primitive self-intuitive axioms. And then it’s just question wormholes from there, you know? That stand upon bedrock of slightly more and more dubious and relativistic and subjective questions and properties that one may or may not be interested in.

Anthony Aguirre: Yeah. So there are a couple of things I would say to that. One is that there’s a tendency among some people to feel like more objective is more true or more real or something like that. Whereas I think it’s different. And with more true and more real tends to come a normative sense of better. Like more true things are better things. There are two steps there from more objective to more true and from more true to better, both of which are kind of ones that we shouldn’t necessarily just swallow because I think it’s more complicated than that.

So more objective is different and might be more useful for certain purposes. Like it’s really great that the laws of physics are in the very objective side of the spectrum in that we feel like once we’ve found some, lots of different people can use them for all kinds of different things without having to refigure them out. And we can kind of agree on them. And we can also feel like they were true a long time ago and use them for all kinds of things that happened long ago and far away. So there are all these great things about the fact that they are on this sort of objective side of things.

At the same time, the things that actually matter to us and that are like the most important things in the world to us are totally subjective things.

Lucas Perry: Love and human rights and the fact that other humans exist.

Anthony Aguirre: Right. Like all value at some level … I certainly see value as very connected with the subjective experience of things that are experiencing things and that’s purely subjective. Nobody would tell you that the subjective experience of beings is unimportant, I think.

Lucas Perry: But there’s the objectivity of the subjectivity, right? One might argue that the valence of the conscious experience is objective and that that is the objective ground.

Anthony Aguirre: So this was just to say that it’s not that objective is better or more valuable or something like that. It’s just different. And important in different ways. The laws of physics are super important and useful in certain ways, but if someone only knew and applied the laws of physics and held no regard or importance for the subjective experience of beings, I would be very worried about the sorts of things that they would do.

I think there’s some way in which people think dismissively of things that are less objective or that are subjective, like, “Oh, that’s just a subjective feeling of something.” Or, “That’s not like the true objective reality. Like I’m superior because I’m talking about the true objective reality” and I just don’t think that’s a useful way to think about it.

Lucas Perry: Yeah. These deflationary memes or jokes or arguments that love is an absurd reduction of a bunch of chemicals or whatever, that’s this kind of reduction of the supposed value of something which is subjective. But all of the things that we care about most in life, we talked about this last time that like hold together the fabric of reality and provide a ton of meaning, are subjective things. What are these kinds of things? I guess from the perspective of this conversation, it’s like they’re the kinds of questions that you can ask about systems and like how they will interact with each other and the kinds of properties that they have. Right?

Why are these particular questions and properties important? Well, I mean historically and evolutionarily speaking, they have particular functions, right? So it seems clear, and I would agree with you, that there’s the space of all possible questions and properties we can ask about things. And because of historical reasons, we care about a particularly arbitrary subset of those questions and properties that have functional use. And that is constituted of all of these subjective things like cups and houses and like love and like marriage and like rights.

Anthony Aguirre: I’m only, I think, objecting to the notion that those are somehow less real or sort of derivative of a description in terms of particles or fields or mathematics.

Lucas Perry: So the sense in which they’re less real is the sense in which we’ll get confused by the cupness being like a thing in the world. So that’s why I wanted to highlight that phenomenological sense of cupness before where the platonic idealism we see of the cupness is there in and of itself.

Anthony Aguirre: Yeah, I think I agree with that.

Lucas Perry: So what is it that defines whether or not something falls more on the objective side or more on the subjective side? Aren’t all the questions that we ask about macroscopic and fuzzy concepts like love and human rights and cups and houses and human beings … Don’t all those questions have definitive answers as long as the categories are coherent and properly defined?

Anthony Aguirre: I guess the way I see it is that there’s kind of a sense of how broadly shared through agents and through space and time are those categorizations or those sets of properties. Cupness is pretty widespread. It doesn’t go further back in time than humanity. Protozoa don’t use cups. So cupness is fairly objective in that sense. It’s tricky because there exists a subjectivity objectivity axis of how widely shared are the sets of properties and then there’s a different subjective objective axis of experience of my individual phenomenological experience of subjectivity versus an objective view of the world. And I think those are connected but they’re not quite the same sense of the subjective and objective.

Lucas Perry: I think that to put it on that axis is actually a little bit confusing. I understand that the more functional that a meme or an idea or concept is, the more widely shared it’s going to be. But I don’t think that just because more and more agents are agreeing to use some kind of concept like money, that that is becoming more objective. I think it’s just becoming more shared.

Anthony Aguirre: Yeah, that’s fine. I guess I would ask you what does more and less objective mean, if it’s not that?

Lucas Perry: Yeah, I mean I don’t know.

Anthony Aguirre: I’m not sure how to say something is more or less objective without referring to some sense like that, that it is more widespread in some way or that there are more sort of subjective views of the world that share that set of descriptions.

If we go back to the thinking about the probabilities in whatever sense you’re defining the probabilities and the properties, the more perspectives are using a shared set of properties, the more objectively defined are the things that are defined by those properties. Now, how to say that precisely like is this objectivity level 12 because 12 people share that set of properties and 50 people share these, so it’s objectivity level … I wouldn’t want to quantify it that way necessarily.

But I think there is some sort of sense of that, that the more different perspectives on the world use that same set of descriptions in order to interact with the world, the more kind of objective that set of descriptions is. Again, I don’t think that captures everything. Like I still think there was a sense in which the laws of physics were objective before anyone was talking about them and using them. It’s quite difficult. I mean when you think about mathematics-

Lucas Perry: Yeah, I was going to bring that up.

Anthony Aguirre: You know, if you think of mathematics as you’ve got a set of axioms and a set of rules for generating true statements out of those axioms. Even if you pick a particular set of rules, there are a huge number of sets of possible axioms and then each set of axioms, if you just grind those rules on those axioms, will produce just an infinite number of true statements. But grinding axioms into true statements is not doing mathematics, I would say.

So it is true that every true mathematical statement should have a sequence of steps that goes from the axioms to that true mathematical statement. But for every thing that we read in a math textbook, there’s an exponentially large number of other consequences of axioms that just nobody cares about because they’re totally uninteresting.

Lucas Perry: Yeah, there’s no utility to them. So this is again finding spaces of mathematics that have utility.

Anthony Aguirre: What makes certain ones more useful than others? So it seems like you know, e, Euler’s number is a very special number. It’s useful for all kinds of stuff. Obviously there are a continuous infinity of other numbers that are just as valid as that one. Right? But there’s something very special about that one because it shows up all the time, it’s really useful for all these different things.

So we’ve picked out that particular number as being special. And I would say there’s a lot of information associated with that pointing to e and saying, “Oh look, this number”, we’ve done something by that pointing. There’s a whole bunch of information and interesting stuff associated with pointing out that that number is special. So that pointing is something that we humans have done at some level. There wasn’t a symbol e or the notion of e or anything like that before humans were around.

Nonetheless, there’s some sense in which once we find e and see how cool it is and how useful it is, we say, “It was always true that e^(ix) = cos(x) + i sin(x).” Like that was always true even though we just proved it a couple of centuries ago and so on. How could that have not been true? And it was always true, but it wasn’t always true that we knew that it was interesting.
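For reference, the identity being pointed to is Euler's formula, e^(ix) = cos(x) + i sin(x). A small numerical spot check, offered only as a sketch: it uses Python's standard `cmath` and `math` modules, the helper name `euler_gap` is an illustrative choice, and agreement at a few sample points is of course not a proof.

```python
import cmath
import math

def euler_gap(x):
    """Absolute difference between e^(ix) and cos(x) + i*sin(x) for a real x."""
    lhs = cmath.exp(1j * x)
    rhs = complex(math.cos(x), math.sin(x))
    return abs(lhs - rhs)

# The two sides agree to floating-point precision at every sample point tried.
for x in (0.0, 1.0, math.pi, 2.5, -7.3):
    print(x, euler_gap(x) < 1e-12)
```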

So it’s kind of the interesting-ness and the pointing to that particular theorem as being an interesting one out of all the possible consequences that you could grind out of a set of axioms, that’s what was created by humanity. Now why the process by which we noticed that that was an interesting thing, much more interesting than many other things, how much objectivity there is to that is an interesting question.

Surely some other species that we encountered, almost surely, would have noticed that that was a particularly interesting mathematical fact like we did. Why? That’s a really hard question to answer. So there is a subjective or non-objective part of it in that we as a species developed that thing. The interesting-ness of it wasn’t always there. We kind of created that interesting-ness of it, but we probably noticed its interesting-ness for some reason and that reason seems to go above and beyond the sort of human processes that noticed it. So there’s no easy answer to this, I think.

Lucas Perry: My layman’s easy answer would be just that it helps you describe and make the formalization and development of mathematical fields, right?

Anthony Aguirre: Sure. But is that helpfulness a fact of the world or a contingent thing that we’ve noticed as we’ve developed mathematics? Among all species that could ever be imagined to exist, would almost all of them identify that as being useful and interesting, or would only some of them, and other ones have a very different concept of what’s useful and interesting? That’s really hard to know. And is it more or less objective in that sort of sense?

Lucas Perry: I guess, part of my intuition here is just that it has to do with the way that our universe is constituted. Calculus is useful for like modeling and following velocities and accelerations and objects in Newtonian physics. So like this calculus thing has utility because of this.

Anthony Aguirre: Right. But that which makes it useful, that feels like it’s something more objective, right? Like calculus is inheriting its objectiveness from the objective nature of the universe that makes calculus useful.

Lucas Perry: So the objectiveness is born of its relationship to the real world?

Anthony Aguirre: Yes, but again, what does that mean? It’s hard to put your finger at all on what that thing is that the real world has that makes calculus useful for describing it other than saying the real world is well-described by calculus, right? It feels very circular to say that.

Lucas Perry: Okay, so I’m thoroughly confused then about subjectivity and objectivity, so this is good.

Anthony Aguirre: I think we all have this intense desire to feel like we understand what’s going on. We don’t really understand how reality works or is constituted. We can nonetheless learn more about how it’s constituted and sitting on that razor’s edge between feeling pride and like, “Yes, we figured a bunch of stuff out and we really can predict the world and we can do technology and all these things”, all of which is true, while also feeling the humility that when we really go into it, reality is fundamentally very mysterious, I think is right, but difficult.

My frustration is when I see people purporting to fully understand things like, “Oh, I get it. This is the way that the world is.” And taking a very dismissive attitude toward thinking the world is not the way that they particularly see it. And that’s not as uncommon an attitude as one would like. Right? That is a lot of people’s tendency because there’s a great desire and safety in feeling like you understand this is the way that the world is and if only these poor benighted other souls could see it the way I do, they would be better off. That’s hard because we genuinely do understand much, much, much more about the world than we ever did.

So much so that there is a temptation to feel like we really understand it, and I think at some level that’s the notion I feel it’s important to push back against: the notion that we get it all. Like you know, we more or less understand how the world is and how it works and how it fundamentally operates. Among some circles, there’s more of a hubristic danger of falling into that than there is of falling into the, “We don’t know anything.” Although there are other parts of society where there’s the other end too, the anti-intellectual stance that, like, my conception of reality that I just made up yesterday is just as good as yours and we’re all equally good at understanding what the world is really like. Also quite dangerous.

Lucas Perry: The core takeaway here for me is just this essential confusion about how to navigate this space of what it means for something to be more subjective and objective and the perspective of analyzing it through the kinds of questions and properties we would ask or be interested in. What you were just saying also had me reflecting a lot on people whose identity is extremely caught up in nationalism or like a team sport. It would seem to be trivial questions or properties you could ask. Like where did you happen to be born? Which city do you particularly have fondness towards? The identity of really being like an American or like really being a fan of the Patriots, people become just completely enthralled and engrossed by that. Your consciousness and ego just get obliterated into identification with, “I am an American Patriot fan” and like there’s just no perspective. There is no context. That’s when one goes way too far towards the objective, when one is mistaking the nature of things.

Anthony Aguirre: Yeah, there are all sorts of mistakes that we all make all the time and it’s interesting to see pathologies in all directions in terms of how we think about the world and our relation to it. And there are certain cases where you feel like if we could just all take a little bit more of an objective view of this, everyone would be so much better off and kind of vice versa. It takes a lot of very difficult skill to approach our complex world and reality in a way that we’re thinking about it in a useful way in this wide variety of different circumstances where sometimes it’s more useful to think about it more objectively and sometimes more subjectively or along all sorts of other different axes.

It’s a real challenge. I mean that’s part of what it is to be human and to engage in a worthy way with other people and with the world and so on, is to have to understand the more and less useful and skillful ways and lenses through which to look at those things.

At one time, almost everything we do is in error, but you also have to be forgiven because almost everything that you could do would be an error in some way from some standpoint. And sometimes thinking that the cup is objectively real is an error. Thinking that you made up the cup and invented it all on your own is also an error. So like the cup is real and isn’t real and is made up and isn’t made up. Any way you think about it is kind of wrong, but it’s also all kind of okay because you can still pick up the cup and take a drink.

So it’s very tricky. It’s a tricky reality we reside in, but that’s good. I think if everything was straightforward and obvious, that would be a boring world.

Lucas Perry: If everything were straightforward and obvious, then I would reprogram everyone to not find straightforward and obvious things boring and then we would not have this requirement to be in a complicated, un-understandable world.

Anthony Aguirre: I think there’s a Douglas Adams line that, “If you figure it all out, then immediately it all stops and starts again in a more complicated way that becomes more and more difficult. And of course this is something that’s happened many, many times before.”

Lucas Perry: I don’t know how useful it is now, but is talking about emergence here, is that something that’s useful, you think, for talking about identity?

Anthony Aguirre: Maybe. There’s a question of identity of what makes something one thing rather than another and then there’s another question of personal identity and sort of my particular perspective or view of the world, like what I identify as my awareness, my consciousness, my phenomenal experience of the world and that identity and how it persists through time. That identity and how it does or doesn’t connect with other ones. Like, is it truly its own island or should I take a more expansive view of it and is it something that persists over time?

Is there a core thing that persists over time or is it a succession of things that are loosely identified or tightly identified with each other? I’m not sure whether all of the stuff that we’ve been talking about in terms of properties and questions and answers and states and things applies to that, but I’m not sure that it doesn’t either.

Lucas Perry: I think it does. Wouldn’t the self or like questions on personal identity be arbitrary questions in a very large state that we would be interested in asking particular questions about what constitutes the person? Is there a self? The self is like a squishy fuzzy concept like love. Does the self exist? Does love exist? Where do they fall on the subjective objective scale?

Anthony Aguirre: Well there are many different questions we could think about, but if I think of my identity through time, I could maybe talk about how similar some physical system is to the physical system I identify as me right now. And I could say I’ve sort of identified through time with the physical system that is really much like me and physics makes that easy because physical systems are very stable and this body kind of evolves slowly. But once you get to the really hard questions like suppose I duplicate this physical system in some way, is my identity one of those or two of those and what happens if you destroy the original one and, you know, those are genuinely confusing questions that I’m not sure that the sort of niceties of understanding emergence and the properties and so on, I’m not sure how much it has to say about it. I’m not sure that it doesn’t, but having thought a lot about the earlier identity questions, I feel no less confused.

Lucas Perry: The way in which emergence is helpful or interesting to me is the way in which … the levels of reality at which human beings conceptualize, which would be like quantum mechanics and then atomic science and then chemistry and then biology and so on.

We imagine them as being sort of stacked up on each other and that if reductionism is attractive to one, you would think that all the top layers supervene upon the nature of the very bottom layer, quantum mechanics. Which is true in some sense and you would want to say that there are fundamental brute identity facts about the, like, very, very, very base layer.

So you could say that there are such things as irreducible quantum atoms like maybe they reduce into other things but that’s an open question for now. And if we are confident about the identity of those things, there’s at least a starting place, you know from which we would have true answers about identity. Does that make sense?

Anthony Aguirre: Well the sentences make sense but I just largely don’t agree with them. And for all the reasons that we’ve talked about. I think there needs to be a word that is the opposite of emergence, like distillation or something, because I think it’s useful to think both directions.

Like I think it is certainly useful to be able to think about, I have a whole bunch of particles that do these things and then I have another description of them that glosses over say the individual actions of the particles, but creates some very reliable regularity that I can call a law like thermodynamics or like some chemical laws and so on.

So I think that is true, but it’s also useful to think of the other direction, which is we have complicated physical systems and by making very particular simplifications and carving away a lot of the complexity, we create systems that are simple enough to have very simple laws describe them. I would call that a sort of distillation process, which is one that we do. So we go through this process when we encounter new phenomena. We kind of look for ways that we can cut away lots of the complexity, cut away a lot of the properties, try to create a system that’s simple enough to describe in some mathematical way, using some simple attenuated set of concepts and so on.

And then often we take that set and then we try to work our way back up by using those laws and kind of having things that emerge from that lower level description. But I think both processes are quite important and it’s a little bit intellectually dangerous to think of what I’d call the distillation process as a truth-finding process. Like I’m finding these laws that were all already there rather than I’m finding some regularities that are left when I remove all this extra stuff and then forget that you’ve removed all the extra stuff and that when you go back from the so-called more fundamental description, to the emerged description, that you’re secretly sticking a lot of that stuff back in without noticing that you’re doing it.

So that’s sort of my point of view, that the notion that we can go from this description in terms of particles and fields and that we could derive all these emerged layers from it, I think it’s just not true in practice for sure, but also not really true in principle. There’s stuff that we have to add to the system in order to describe those other levels that we sort of pretend that we’re not adding. We say, “Oh, I’m just assuming this extra little thing” but really you’re adding concepts and quantities and all kinds of other apparatus to the thing that you started with.

Lucas Perry: Does that actually describe reality then or does that give you an approximation, the emergent levels?

Anthony Aguirre: Sure. It just gives you answers to different questions than the particle and field level does.

Lucas Perry: But given that the particle and field level stuff is still there, doesn’t that higher order thing still have the capacity for like strange quantum things to happen and that would not be accounted for in the emergent level understanding and therefore it would not always be true if there was some like entanglement or like quantum tunneling business going on?

Anthony Aguirre: Yeah, I think there’s more latitude perhaps. The statistical laws and statistical mechanics are statistical laws. They’re totally exact, but the things that they make are statistical descriptions of the world that are approximate in some way. So it’s like they’re approximate but they’re approximate in a very, very well defined way. I mean it’s certainly true that the different descriptions should not contradict each other. If you have a description of a macroscopic phenomenon that doesn’t conserve energy, then that’s a sort of wrongheaded way to look at that system.

Lucas Perry: But what if that macroscopic system does something quantum? Then the macroscopic description fails. So then it’s like not true or it’s not predictive.

Anthony Aguirre: Yeah, “not true” I think is not quite right. It’s more like that description let you down in that circumstance. Everything will let you down sometimes.

Lucas Perry: I understand what you’re saying. The things are like functional at the perspective and scales at which you’re interested. And this goes back to kind of this more epistemological agent-centered view of science and like engaging in the world that we were talking about earlier. I guess, for a very long time the way that I viewed science was as explaining the intrinsic nature of the physical, but really it’s not doing that because all of these things are going to fail at different times. They just have strong predictive power. And maybe it was very wrong of me early on to ever think that science was describing the intrinsic nature of the physical.

Anthony Aguirre: I don’t think it’s entirely wrong. You do get something through distilling more and going more toward the particle and field level in that once you specify something that the quantum mechanics and the standard model of particle physics say gives you a well-defined answer to, then you feel really sure that you’re going to get that result. You do get a dramatically higher level of confidence from doing that distilling process and idealizing a system enough that you can actually do the mathematics to figure out what should happen according to the fundamental physical laws, as we describe them in terms of particles and fields and so on.

So I think that’s the sense in which they’re extra true or real or fundamental, is that you get that higher level of confidence. But at the cost that you had to shoehorn your physical system, either add in assumptions or cut away things in order to make it something that is describable using that level of description.

You know, not everyone will agree with the way that I’m characterizing this. I think you’ll talk to other physicists and they would say, “Yes they are approximations, but really there’s this objective description and you know, there’s this fundamental description in terms of particles and fields and we’re just making different approximations to it when we talk about these other levels.”

I don’t think there’s much of a difference operationally in terms of that way of talking about it and mine. But I think this is a more true-to-life description of reality, I guess.

Lucas Perry: Right. So I mean there are the fundamental forces and the fundamental forces are what evolve everything. And you’re saying that the emergent things have to do with adding and cutting away things so that you can like simplify the whole process, extract out these other rules and laws which are still highly predictive. Is that all true to say so far?

Anthony Aguirre: Somewhat. I think it’s just that we don’t actually do any of that. We very, very, very, very rarely take a more fundamental set of rules and derive.

Lucas Perry: Yeah, yeah, yeah. That’s not how science works.

Anthony Aguirre: Right. We think that there is such a process in principle.

Lucas Perry: Right.

Anthony Aguirre: But not in practice.

Lucas Perry: But yeah, understanding it in principle would give us more information about how reality is.

Anthony Aguirre: I don’t believe that there is in principle that process. I think the going from the more fundamental level to the “emerged” can’t be done without taking input that comes from the emerged level. Like I don’t think you’re going to find the emerged level in the fundamental description in and of itself without unavoidably taking information from the emerged level.

Lucas Perry: Yeah. To modify the-

Anthony Aguirre: Not modifying but augmenting. Augmenting in the sense that you’re adding things like brownness that you will never find, as far as you will ever look, you will never find brownness in the wave function. It just isn’t there.

Lucas Perry: It’s like you wouldn’t find some kind of chemical law or property in the wave function.

Anthony Aguirre: Any more than you’ll find here or now in the state of the universe. Like they’re just not there. Those are things, incredibly useful things, important things like here and now are pretty central to my description of the world. I’m not going to do much without those, but they’re not in the wave function and they’re not in the boundary conditions of the universe and it’s okay that I have to add those. There’s nothing evil in doing that.

Like I can just accept that I have to have some input from the reality that I’m trying to describe in order to use that fundamental description. It’s fine. But like, there’s nothing to be worried about, there’s nothing anti-scientific about that. It’s just the idea that someone’s going to hand you the wave function and you’ll derive that the cup is brown here and now is crazy. It just doesn’t work that way. Not in there. That’s my view anyway.

Lucas Perry: But the cup being brown here and now is a consequence of the wave function evolving an agent who then specifies that information, right?

Anthony Aguirre: Again, I don’t know what that would look like. Here’s the wave function. Here’s Schrodinger’s equation and the Hamiltonian. Now tell me is the brown cup in front of or in back of the tape measure? It’s not in there. There’s all colored cups and all colored tape measures and all kinds of configurations. They’re all there in the wave function. To get an answer to that question, you have to put in more information which is like which cup and where and when.

That’s just information you have to put in, in order to get an answer. The answer is not there to begin with and that’s okay. It doesn’t mean that there’s something wrong with the wave function description or that you’ve got the wrong Hamiltonian or the wrong Schrodinger’s equation. It just means that to call that a complete description of reality, I think that’s just very misleading. I understand what people intend by saying that everything is just the wave function and the Schrodinger equation. I just think that’s not the right way to look at it.

Lucas Perry: I understand what you’re saying, like the question only makes sense if say that wave function has evolved to a point that it has created human beings who would specify that information, right?

Anthony Aguirre: None of those things are in there.

Lucas Perry: They’re not in the primordial state but they’re born later.

Anthony Aguirre: Later is no different from the beginning thing. It’s just a wave function. There’s really no difference in quality between the wave function now and at the beginning. It’s exactly the same sort of entity. There’s no more, no less in it than there was then. Everything that we ascribe to being now in the universe that wasn’t there at the beginning are additional ingredients that we have to specify from our position, things like now and here and all those properties of things.

Lucas Perry: Does the wave function just evolve the initial conditions? Are the initial conditions contained within the wave function?

Anthony Aguirre: Well, both in the sense that if there’s such a thing as the wave function of the universe, and that’s a whole nother topic as to whether that’s a right-minded thing to say, but say that there is, then there’s exactly the same information content to that wave function at any time and that given the wave function at a time, and the Schrodinger equation, we can say what the wave function is at any other time. There’s nothing added or subtracted.

One is just as good as the other. In that sense, there’s no more stuff in the wave function “now” than there was at the beginning. It’s just the same. All of the sense in which there’s more in the universe now than there was at the Big Bang has to do with things that we specify in addition to the wave function, I would say, that constitute the other levels of reality that we interact with. They’re extra information that we’ve added to the wave function from our actual experience of reality.

If you take a timeline of all possible times, without pointing to any particular one, there’s no time information in that system, but when I say, “Oh look, I declare that I’m now 13.8 billion years from the Big Bang,” I’m pointing to a particular time by associating it with my experience now. By doing that pointing, I’m creating information in just the same way that we’ve described it before. I’m making information by picking out a particular time. That’s something new that I’ve added to what was a barren timeline before. I’ve added now something.

There’s more information than there was before by the fact of my pointing to it. I think most of the world is of that nature that it is made of information created by our pointing to it from our particular perspective here and now in the universe seeing this and that and having measured this and that and the other thing. Most of the universe I contend is made of that sort of stuff, information that comes from our pointing to it by seeing it, not information that was there intrinsically in the universe, which is, I think, radical in a sense, but I think is just the way reality is, and that none of that stuff is there in the wave function.

Lucas Perry: At least the capacity is there for it because the wave function will produce us to then specify that information.

Anthony Aguirre: Right, but it produces all kinds of other stuff. It’s like if I create a random number generator, and it just generates a whole list of random numbers, if I look at that list and find, “Oh look, there’s one, one, one, one, one, one, one, one, one,” that’s interesting. I didn’t see that before. By pointing to that, you’ve now created information. The information wasn’t there before. That’s largely what I see the universe as, and in large part, it’s low information in a sense.

I’m hemming and hawing because there are ways in which it’s very high information too, but I think most of the information that we see about the world is information of that type that exists because we very collectively as beings that have evolved and had culture and all the stuff that we’ve gone through historically we are pointing to it.
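One way to make the "pointing creates information" idea above concrete is a simple counting sketch. This is not Aguirre's own formalism: it assumes, purely for illustration, that all candidate positions are equally likely, so that singling out one of N alternatives takes log2(N) bits. The function names `bits_to_point` and `find_run` are hypothetical.

```python
import math


def bits_to_point(num_alternatives: int) -> float:
    """Bits needed to single out one of `num_alternatives` equally likely options.

    Assumption for illustration: every alternative is equally likely.
    """
    if num_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(num_alternatives)


def find_run(digits: list, digit: int, length: int) -> int:
    """Return the index of the first run of `digit` repeated `length` times, or -1."""
    target = [digit] * length
    for i in range(len(digits) - length + 1):
        if digits[i:i + length] == target:
            return i
    return -1


# A list nobody has looked at carries no "pointer". Saying "the run of ones
# starts at index 1" is the extra information supplied by the observer.
sequence = [3, 1, 1, 1, 5, 9, 2, 6]
where = find_run(sequence, digit=1, length=3)
pointer_bits = bits_to_point(len(sequence))
```

In this toy picture the list itself is unchanged by the act of pointing; what is new is the index, which costs `pointer_bits` to specify.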

Lucas Perry: So connecting this back to the spectrum of objectivity and subjectivity, as we were talking for a long time about cups and as we talked about on the last podcast about human rights for example as being a myth or kinds of properties which we’re interested in ascribing to all people, which people actually intrinsically lack. People are numerically distinct over time. They’re qualitatively distinct, very often. There’s nothing in the heart of physics which gives us the kinds of properties human rights, for example, are supposed to be instantiating in us.

Rather, it’s a functional convention that is very useful for producing value. We’ve specified this information that all human beings share unalienable rights, but as we enter the 21st century, the way that things are changing is that the numerical and qualitative facts about being a human being that have held for thousands of years are going to begin to be perturbed.

Anthony Aguirre: Yes.

Lucas Perry: You brought this up by saying… You could either duplicate yourself arbitrarily, whether you do that physically via scans and instantiating actual molecular duplicates of yourself. You could be mind uploaded, and then you could have that duplicated arbitrarily. For hundreds of thousands of years, your atoms would cycle out every seven years or so, and that’s how you would be numerically distinct, and qualitatively, you would just change over your whole lifetime until you became thermodynamically very uninteresting and spread out and died.

Now, there’s this duplication stuff. There is your ability to qualitatively change yourself very arbitrarily. So at first, it will be through bioengineering like designer babies. There’s all these interesting things and lots of thought experiments that go along with it. What about people who have their corpus callosum cut? You have the sense of phenomenological self, which is associated with that. You feel like you’re a unitary subject of experience.

What happens to your first person phenomenological perspective if you do something like that? What about if you create a corpus callosum bridge to another person’s brain, what happens to the phenomenological self or identity? Science and AI and increasing intelligence and power over the universe will increasingly give us this power to radically change and subvert our commonly held intuitions about identity, which are constituted about the kinds of questions and properties which we’re interested in.

Then also the phenomenological experience, which is whether or not you have a strong sense of self, whether or not you are empty of a sense of self or whether or not you feel identified with all of consciousness and the whole world. There’s spectrums and degrees and all kinds of things around here. That is an introduction to the kind of problem that this is.

Anthony Aguirre: I agree with everything you said, but you’re very unhelpfully asking all the super interesting questions-

Lucas Perry: At once.

Anthony Aguirre: … which are all totally impossible to solve. No, I totally agree. We’ve had this enviable situation of one mind equals one self equals one brain equals one body that has made it much easier to accord to that whole set of things, all of which are identified with each other a set of rights and moral values and things like that.

Lucas Perry: Which all rest on these intuitions, right? That are all going to change.

Anthony Aguirre: Right.

Lucas Perry: Property and rights and value and relationships and phenomenological self, et cetera.

Anthony Aguirre: Right, so we either have a choice of trying to maintain that identity, and remove any possibility of breaking some of those identities because it’s really important to keep all those things identified, or we have to understand some other way to accord value and rights and all those things given that the one-to-one correspondence can break. Both of those are going to be very hard, I think. As a practical matter, it’s simply going to happen that those identifications are going to get broken sooner or later.

As you say, if we have a sufficient communication bandwidth between two different brains, for example, one can easily imagine that they’ll start to have a single identity just as the two hemispheres of our brain are connected enough that they generally have what feels like a single identity. Even though if you cut it, it seems fairly clear that there are in some sense two different identities. At minimum, technologically, we ought to be able to do that.

It seems very likely that we’ll have machine intelligence systems whose phenomenological awareness of the world is unclear but at least have a concept of self and a history and agency and will be easily duplicatable. They at least will have to face the question of what it means when they get duplicated because that’s going to happen to them, and they’re going to have to have a way of dealing with that reality because it’s going to be their everyday reality that they can be copied, ad infinitum, and reset and so on, if their functioning is at all like a current digital computer.

There are also going to be even bigger gulfs than there are now between levels of capability and awareness and knowledge and perhaps consciousness. We already have those, and we gloss over them, and I think that’s a good thing in according people fundamental human rights. We don’t give people at least explicitly legally more rights when they’re better educated and wealthier and so on, even if in practice they do get more.

Legally, we don’t, even though that range is pretty big, but if it gets dramatically bigger, it may get harder and harder to maintain even that principle. I find it both exciting and incredibly daunting because the questions are so hard to think of how we’re going to deal with that set of ethical questions and identity questions, and yet we’re going to have to somehow. I don’t think we can avoid them. One possibility is to decide that we’re going to attempt to never break those sets of identities.

I sometimes think about Star Wars. They’ve got all this amazing technology, right? They can zip across the universe, but then it’s incredibly primitive in others. Their computers suck and all of their AI is in robots. One robot, one brain, one consciousness, they’re all identical. So I have this theory of Star Wars that behind the scenes, there’s some vast intelligence that’s maybe baked into the midi-chlorians or whatever, that prevents more weird, complicated things like powerful AI or powerful software systems.

It’s like an overseer that keeps everything just nicely embodied in individual physical agents that do stuff. Obviously, that’s not part of the Star Wars canon, but that’s how it plays out, right? Even though there’s all this high tech, they’ve neatly avoided all of these annoying questions and difficult questions by just maintaining that one-to-one correspondence. That is at some level an option. That is something that we could try to do because we might decide that not doing that leads to such a big open can of worms that we will never be able to deal with, that we better maintain that one-to-one correspondence.

My guess is that even if that was a good idea, we wouldn’t be coordinated enough or foresightful enough to maintain that.

Lucas Perry: There would be optimization pressures to do otherwise.

Anthony Aguirre: There would. It would take some almost God-like entity to keep it from happening. Then we have to ask, “Where is the theory of what to value and how do we value individual people? Where is that next going to come from?” The last time, at least in the West, it was born out of Enlightenment philosophy and coming out of, honestly, I think Judeo-Christian religion. That’s very tied together. Is there something that is going to come out of some other major philosophical work? I’m not sure that I see that project happening and unfolding.

Lucas Perry: Formally right now?

Anthony Aguirre: Yes. Do you?

Lucas Perry: No, I don’t see that, but I think that there are the beginnings of that. I would propose, and I don’t know how others would feel, that the foundation, instead of Enlightenment philosophy about rights based off the immutable rights that beings have given their identity class, would be, in the future, a sufficiently advanced science of consciousness that would just value all of the different agents based off the understanding of the degrees and kinds of experience and awareness and causal implications that they could have in the world.

I would just do a kind of consequentialism insofar as it would be possible. Then I guess the interesting part would be where consequentialism fails because it’s computationally intractable. You would want to invent other kinds of things that would stand in the way, but I feel optimistic that the very, very smart things in the future could do something like that. I would ground it on consciousness.

Anthony Aguirre: I mean, there are so many questions even if you take the view that you’re trying to maximize high quality phenomenological experience moments or whatever, I think there’s so many things that that leaves either problematic or unanswered.

Lucas Perry: Like what?

Anthony Aguirre: What about beings that may have super high levels of awareness and consciousness but not positive or negative valence? Do they count or not? Does it mean anything that experiences are connected through time in some large set of personal identity or is a bunch of disconnected experiences just as good as other ones? There may be a positive valence to experience that comes out of its aggregation over time and its development and evolution over time that is absent from any individual one of those moments, all of which may be less good than a drug trip or just eating a candy bar, but like a life of eating candy bars versus a less pleasurable but more fulfilling life. How do we quantify those things against each other?

Lucas Perry: The repugnant conclusion, what do we think about the repugnant conclusion that’s like kind of that. A quick definition, the repugnant conclusion is how you would compare a very small, limited number of amazing experiences against an astronomically large number of experiences which are just barely better than non-existence, very, very, very, very slightly better than a valence of zero. If all of those added up to be just like a fraction of a hair larger than the few really, really good experiences, which world should you pick? Hedonic consequentialism would argue that you should pick the astronomically large number of experiences that are barely worth living and that to some is repugnant.
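The comparison just described can be written out as plain arithmetic. The sketch below uses made-up population sizes and welfare scores (arbitrary integer units, with 0 meaning neutral) purely to show how a total-welfare rule and an average-welfare rule can rank the same two worlds in opposite ways. The `World` class and the function names are hypothetical, and nothing here settles which rule is correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    name: str
    population: int
    welfare_each: int  # hypothetical integer units; 0 = neutral valence


def total_welfare(world: World) -> int:
    """Sum of welfare, assuming everyone in the world has the same score."""
    return world.population * world.welfare_each


def prefer_by_total(a: World, b: World) -> World:
    """Pick the world with the larger summed welfare (ties go to `a`)."""
    return a if total_welfare(a) >= total_welfare(b) else b


def prefer_by_average(a: World, b: World) -> World:
    """Pick the world with the larger per-person welfare (ties go to `a`)."""
    return a if a.welfare_each >= b.welfare_each else b


# Illustrative numbers only: a few very good lives versus an astronomically
# large number of lives that are barely above neutral.
world_a = World("A", population=1_000, welfare_each=100_000)
world_z = World("Z", population=1_000_000_000_000, welfare_each=1)

by_total = prefer_by_total(world_a, world_z).name
by_average = prefer_by_average(world_a, world_z).name
```

With these numbers the total rule favors world Z while the average rule favors world A, which is the tension the repugnant conclusion points at.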

Anthony Aguirre: I think it’s safe to say that there is no proposal on the table that everyone feels like, “Oh yeah, that’s the way to do it.” I’d be profoundly suspicious of anything that claimed to be that. So I don’t think there are going to be easy answers, but it may be that there’s at least a framework from which we can stand to get into some of the complexities. That may be a very different framework than the one that we have now.

Where that will come from and how we would transition to it and what that would mean and what kind of terrible and wonderful consequences that might have, I think, certainly nobody knows. It’s not even clear that anybody has a sense of what that will look like.

Lucas Perry: I think that one of the last questions and perspectives that I’d like to get from you is how human perspectives on identity change what we want. So this one-to-one correspondence, one body, one brain, one phenomenological self that feels like its consciousness is its own and is like an island: how does that experience change what human beings want in the 21st century with regard to upgrading or merging with AI and technology or with cryonics?

If everything and everyone is numerically and qualitatively completely impermanent, such that no matter what kind of technological intervention we do, in 100 to 200 years everyone will either be thermodynamically scattered or so completely and fundamentally changed that you won’t be able to recognize yourself, what are the ethical implications of this, and how does it change what kinds of futures people want? I’m curious to know if you have any thoughts on this, holding in your head the perspective of Max’s book Life 3.0 and the kinds of world trajectories that people are interested in from there.

Anthony Aguirre: That’s a big question. That’s hard to know how to approach. I think there are many genuinely qualitatively different possible futures, so I don’t think there is a way that things are going to turn out in terms of all these questions. I think it’s going to be historically contingent and there are going to be real choices that we make. I’m of two minds in this, in that I do believe in something like moral progress and that I feel like there’s an agreed sense that we feel now that things that we did in the past were morally incorrect, and that we’ve learned new moral truths that allow us to live in a better way than we used to.

At the same time, I feel like there are ways that society has turned out. It could have been that the world became much more dominated by Eastern philosophy than Western philosophy say. I think we would probably still feel like we had made moral progress through that somewhat different history as we’ve made moral progress through this history that we did take. I’m torn between a feeling that there is real moral progress, but that progress is not toward some predefined optimal moral system that we’re going to progress towards and find, but that the progress will also have a whole bunch of contingent things that occur through our society’s evolution through chance or through choice that we make, and that there genuinely are very different paths that we have ahead of us.

No small part of that will be our current way of thinking and our current values and how we try to keep things aligned with those current values. I think there will be a strong desire to maintain this one-to-one connection between identity and moral value and mind and so on, and that things that violate that, I think, are going to be seen as threats. They are profound threats to our current moral system. How that will play out is really unforeseeable.

Will those be seen as threats that we eventually just say actually, they weren’t that scary after all and we just have to adjust? Will they be threats that are just pushed aside by the tide of reality and technology? Will they be threats that we decide are so threatening that we want to hold on and really solidify and codify this relation? I think those are all possibilities, and it’s also possible that I’m wrong and that there will just be this smooth evolution where our connection between our phones will become brain interfaces, and we’ll just get more and more de-individualized in some smooth way, and that people will sound an alarm that that’s happening and no one will care. That’s also quite possible, whether that alarm is appropriate or not.

Lucas Perry: They just look at the guy sounding the alarm, and then stick the plug in their head.

Anthony Aguirre: Right. So it’s good for us all to think deeply about this and think about what preferences we have, because where we go will end up being some combination of where the technology goes and what preferences we choose and how we express them. Part of the direction will be determined by those that express their preferences convincingly and loudly and have some arguments for them and that defend them and so on. That’s how our progress happens.

Some set of ideas prevails, and we can hope that it’s a good one. I have my own personal prejudices and preferences about some of the questions that are, for example, asked in Max’s book about what futures are most preferred. At some point, I may put more time into developing those into arguments and see if I still feel those preferences or believe them. I’m not sure that I’m ready to do that at the moment, but I think that’s something that we all have to do.

I mean, I think, I do feel a little bit like step one was to identify some of the thorny questions that we’re going to have to answer and talk about how we have to have a conversation about those things and how difficult those questions are going to be, but at some point, we’re actually going to have to start taking positions on some of those questions. I think that’s something that largely nobody is doing now, but it’s not clear how much time we have before we need to have thought about them and actually taking a position on them and argued it out and had some positions prevail.

The alternative to that is this random process driven by the technology and the other social forces that are at work like moneyed interests and social imperatives and all those sorts of things. Having these questions decided by those forces rather than reflection and thinking and debate among people who are trying really hard to think about these questions, that seems like not such a great idea.

Lucas Perry: I agree. That’s my felt sense too. We went from talking to information about emergence to identity. I think it would be really helpful if you could tie together in particular the information discussion with how that information perspective and discussion can inform these questions about identity in the 21st and 22nd centuries.

Anthony Aguirre: I guess one way that the identity and the information parts are connected is I made this argument that a lot of what the world is is information that is associated with a particular vantage point and a particular set of pointings to things that we have as an agent, as a perspective in the world. I think that there’s a question as to whether there is moral value in that. There’s a real sense that every person views the world from their own perspective, but I think it’s more real than that and that when you identify a view of the world and all that comes with that, it really is creating a world in a sense.

There’s some of the world that’s objective at various different levels, but a lot of what the world is is what is created by an individual standpoint and vantage point that is seeing that world and interacting with it. I do wonder is there some sense of grounding some level of value on that creative act? On the fact that as an individual agent that understands and exists over time and assembles this whole sophisticated, complicated view of the world that has all this information content to it, should we not accord some high level of normative value to that, that it’s not just a way to describe how the world is made, but that what is valuable in the world be connected with that creation process by the individual standpoint?

That may be a seed for developing some bridge between the view of reality as information, information as something that is largely connected with a vantage point and a vantage point as something that is personal self identity and as connected now with individual consciousness and mind and brain and so on. Is there a way to inhere value in that ability to create lots of sophisticated information through interaction with the world that would bring value to also not just individuals but sets of individuals that together create large amounts of information?

That’s something to develop further, I think: that link, that view of how the world is constituted as this interaction between the agent and the world. Maybe there’s something there in terms of a seed for how to ground moral value in a way that’s distinct from the identification that we do now.

Lucas Perry: I guess there’s also this facet where this process of agents asking particular questions and specifying certain kinds of properties that they care about and pointing to specific things, that that process is the same process of construction of the self or the egocentric phenomenal experience and conceptual experience of self. This is all just information that you specify as part of this identification process and the reification process of self.

It would be very good if everyone were mindful enough about thinking about where on the spectrum of objectivity and subjectivity these things they take to ultimately be part of self actually fall, and what are the questions and properties and features they’re actually constituted of? Then what will happen is likely, your commonly held intuitions will largely be subverted. Maybe you’ll still be interested in being a strong nationalist, but maybe you’ll have a better understanding of what it’s actually constituted of.

That’s the Buddhist perspective. I’m just articulating it, I think, through the language and concepts that you’ve provided, where one begins seeing conventional reality as how it’s actually being formulated and no longer confuses the conventional as the ultimate.

Anthony Aguirre: There’s a lot of sophistication, I think, to Buddhist moral thinking, but a lot of it is based around this notion of avoiding suffering in sentient beings. I think there’s so many different sorts of suffering and there’s so many different levels that just avoiding suffering ends up implying a lot of stuff, because we’re very good at suffering when our needs are not met. Avoiding suffering is very, very complicated because our unmet needs are very, very complicated.

The view that I was just pointing to is pointing towards some level of value that is rather distinct from suffering because one can imagine a super sophisticated system that has this incredibly rich identity and incredibly rich view of the world and may suffer or not. It’s not clear how closely connected those things are. It’s always dangerous when you think about how to ground value because you realize that any answer you have to that question leaves certain things out.

If we try to ground value in sophistication of worldview or something like that, then do we really not value the young kids? I mean, that seems monstrous. Even though they have a pretty simple minded worldview, that seems wrong. I think there are no easy answers to this, but that’s just a sense in which I think I do feel instinctively that there ought to be some level of moral value accorded to beautifully complex, self-aware systems in the world that have created this sophisticated universe through their being and experience and existence and interaction with the world.

That ought to count for something. Certainly, it’s not something we want to just blindly destroy, but exactly why we don’t want to destroy it, the deep reason, I think, needs to be investigated. That seems true to me, but I can’t necessarily defend why.

Lucas Perry: That’s really good and, I think, an excellent place to wrap up concluding thoughts. My ethics is so sentience focused that that is an open question, and I would want to pursue deeply why that seems intrinsically valuable for me. Just the obvious direct answer would be because it allows or does not allow for certain kinds of conscious experiences, which is what matters. That is not intrinsically valuable, but it is valuable based off of its relationship to consciousness obviously.

Of course, that’s up for debate and to be argued about. Given uncertainty about consciousness, the view which you propose may be very skillful for dealing with the uncertainty. This is one of the most interesting conversations for me. Like you said, I think it’s very neglected. There’s no one working on it formally. Maybe it’s just too early. I think that it seems like there’s a really big role for popular media and communication to explore these issues.

There are so many good thought experiments in philosophy of personal identity and elsewhere that could be excellent and fun for the public. It’s not just that it’s philosophy that is becoming increasingly needed, but it’s also fun and interesting philosophy. Much of it like the teleportation machines and severing the corpus callosum, it’s perfect stuff for Black Mirror episodes and popular science things which are increasingly becoming interesting, but it’s also I feel existentially very important and interesting.

I think I have a pretty big fear of death. I feel like a lot of that fear is born of this individualism, where you identify with your own personal consciousness and qualitative nature and some of your numerical nature perhaps, and there’s this great attachment to it. There’s the question and journey of further and always investigating this question of identity and who am I or what am I? That process, I think, also has a lot of important implications for people’s existential anxiety.

That also feeds into and informs how people wish to relate and deal with these technological changes in the 21st century and the kinds of futures they would or would not be excited about. I think those are generally my feelings about this. I hope that it doesn’t just come down to what you were talking about, the socioeconomic and social forces just determining how the whole process unfolds, but there’s actually a philosophical and moral reflection and idealization that happens there, so we can decide how consciousness ever evolves into the deep future.

Anthony Aguirre: I think I agree with a lot of what you said. I think we’ve had this very esoteric discussion about the nature of reality and self and all these things that obviously a lot of people in the world are not going to be that into, but at the same time, I think as you said, some will and some of the questions when framed in evocative ways are super just intrinsically interesting. I think it’s also important to realize how large an effect some of this pretty esoteric philosophical thinking about the nature of reality has had: our moral system and legal system and governmental system were largely created in response to careful philosophical thinking and long treatises in the 17th and 18th and 19th centuries.

We need more of those now. We need brilliant works that are not just asking these questions, but actually compellingly arguing for ways to think about them, and putting it out there and saying, “This is the way that we ought to value things, or this is the ground for valuing this or that, or this is the way that we should consider reality and what it means for us.” We don’t have to accept any one of those views, but I fear that in the lack of daringly trying to deeply develop those ideas and push for them and argue for them that we will end up, as you say, just randomly meandering around to where the social forces push.

If we really want a development of real ideas on which to found our long-term future, we better start really developing them and valuing them and putting them out there and taking them seriously rather than thinking, “Oh, this is weird esoteric conversation off in the corner of philosophy academia, blah, blah, blah.” De-valuing it in that way, I think, is not just not useful, but really misunderstanding how things have happened historically. Those discussions in the right way and published and pushed in the right ways have had huge influence on the course of humanity. So they shouldn’t be underestimated, and let’s keep going. You can write the book, and we’ll read it.

Lucas Perry: Wonderful. Well, the last point I think is very useful, and what you’re saying is very true in terms of the pragmatics and illustrating that. In particular, the Enlightenment treatises have very particular views on personal identity. The personal identity of people of color over time has shifted in terms of slavery. The way in which Western colonial powers conceptualized West Africans, for example, was very particular.

Even today with gender issues in general, that is also a mainstream discourse on the nature of personal identity. It’s already been a part of the formation of society and culture and civilization, and it will only continue to do so. With that, thanks so much, Anthony. I appreciate it.

AI Alignment Podcast: Identity and the AI Revolution with David Pearce and Andrés Gómez Emilsson

 Topics discussed in this episode include:

  • Identity from epistemic, ontological, and phenomenological perspectives
  • Identity formation in biological evolution
  • Open, closed, and empty individualism
  • The moral relevance of views on identity
  • Identity in the world today and on the path to superintelligence and beyond

Timestamps: 

0:00 – Intro

6:33 – What is identity?

9:52 – Ontological aspects of identity

12:50 – Epistemological and phenomenological aspects of identity

18:21 – Biological evolution of identity

26:23 – Functionality or arbitrariness of identity / whether or not there are right or wrong answers

31:23 – Moral relevance of identity

34:20 – Religion as codifying views on identity

37:50 – Different views on identity

53:16 – The hard problem and the binding problem

56:52 – The problem of causal efficacy, and the palette problem

1:00:12 – Navigating views of identity towards truth

1:08:34 – The relationship between identity and the self model

1:10:43 – The ethical implications of different views on identity

1:21:11 – The consequences of different views on identity on preference weighting

1:26:34 – Identity and AI alignment

1:37:50 – Nationalism and AI alignment

1:42:09 – Cryonics, species divergence, immortality, uploads, and merging.

1:50:28 – Future scenarios from Life 3.0

1:58:35 – The role of identity in the AI itself

 

You can listen to the podcast above or read the transcript below. 

The transcript has been edited for style and clarity.

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today we have an episode with Andrés Gómez Emilsson and David Pearce on identity. This episode is about identity from the ontological, epistemological and phenomenological perspectives. In less jargony language, we discuss identity from the fundamental perspective of what actually exists, of how identity arises given functional world models and self models in biological organisms, and of the subjective or qualitative experience of self or identity as a feature of consciousness. Given these angles on identity, we discuss what identity is, the formation of identity in biological life via evolution, why identity is important to explore and its ethical implications and implications for game theory, and we directly discuss its relevance to the AI alignment problem and the project of creating beneficial AI.

I think the question of “How is this relevant to AI Alignment?” is useful to explore here in the intro. The AI Alignment problem can be construed in the technical limited sense of the question of “how to program AI systems to understand and be aligned with human values, preferences, goals, ethics, and objectives.” In a limited sense this is strictly a technical problem that supervenes upon research in machine learning, AI, computer science, psychology, neuroscience, philosophy, etc. I like to approach the problem of aligning AI systems from a broader and more generalist perspective. In the way that I think about the problem, a broader view of AI alignment takes into account the problems of AI governance, philosophy, AI ethics, and reflects deeply on the context in which the technical side of the problem will be taking place, the motivations of humanity and the human beings engaged in the AI alignment process, the ingredients required for success, and other civilization level questions on our way hopefully to beneficial superintelligence. 

It is from both of these perspectives that I feel exploring the question of identity is important. AI researchers have their own identities and those identities factor into their lived experience of the world, their motivations, and their ethics. In fact, the same is of course true of policy makers and anyone in positions of power to influence the alignment process, so being aware of commonly held identity models and views is important for understanding their consequences and functions in the world. From a macroscopic perspective, identity has evolved over the past 4.5 billion years on earth and surely will continue to do so in AI systems themselves and in the humans which hope to wield that power. Some humans may wish to merge, others to pass away or simply die, and others to be upgraded or uploaded in some way. Questions of identity are also crucial to this process of relating to one another and to AI systems in a rapidly evolving world where what it means to be human is quickly changing, where copies of digital minds or AIs can be made trivially, and the boundary between what we conventionally call the self and world begins to dissolve and break down in new ways, demanding new understandings of ourselves and identity in particular. I also want to highlight an important thought from the podcast that any actions we wish to take with regards to improving or changing understandings or lived experience of identity must be sociologically relevant, or such interventions simply risk being irrelevant. This means understanding what is reasonable for human beings to be able to update their minds with and accept over certain periods of time and also the game theoretic implications of certain views of identity and their functional usefulness. This conversation is thus an attempt to broaden the conversation on these issues outside of what is normally discussed and to flag this area as something worthy of consideration.

For those not familiar with David Pearce or Andrés Gómez Emilsson: David is a co-founder of the World Transhumanist Association, rebranded Humanity Plus, and is a prominent figure within the transhumanism movement in general. You might know him from his work on the Hedonistic Imperative, a book which explores our moral obligation to work towards the abolition of suffering in all sentient life through technological intervention. Andrés is a consciousness researcher at the Qualia Research Institute and is also the Co-founder and President of the Stanford Transhumanist Association. He has a Master’s in Computational Psychology from Stanford.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate

 If you’d like to be a regular supporter, please consider a monthly subscription donation to make sure we can continue our efforts into the future. 

These contributions make it possible for us to bring you conversations like these and to develop the podcast further. You can also follow us on your preferred listening platform by searching for us directly or following the links on the page for this podcast found in the description. 

And with that, here is my conversation with Andrés Gómez Emilsson and David Pearce.

I just want to start off with some quotes here that I think would be useful. The last podcast that we had was with Yuval Noah Harari and Max Tegmark. One of the points that Yuval really emphasized was the importance of self-understanding, questions like, who am I? What am I in the age of technology? Yuval said, “Get to know yourself better. It’s maybe the most important thing in life. We haven’t really progressed much in the last thousands of years, and the reason is that yes, we keep getting this advice, but we don’t really want to do it.” He goes on to say that, “especially as technology will give us all, at least some of us more and more power, the temptations of naive utopias are going to be more and more irresistible, and I think the really most powerful check on these naive utopias is really getting to know yourself better.”

In search of getting to know ourselves better, I want to explore this question of identity with both of you. To start off, what is identity?

David Pearce: One problem is that we have more than one conception of identity. There is the straightforward, logical sense that philosophers call the indiscernibility of identicals, namely that if A equals B, then anything true of A is true of B. In one sense, that’s trivially true, but when it comes to something like personal identity, it just doesn’t hold water at all. You are a different person from your namesake who went to bed last night – and it’s very easy carelessly to shift between these two different senses of identity.

Or one might speak of the United States. In what sense is the United States the same nation in 2020 as it was in 1975? It’s interest-relative.

Andrés Gómez Emilsson: Yeah and to go a little bit deeper on that, I would make the distinction as David made it between ontological identity, what fundamentally is actually going on in the physical world? In instantiated reality? Then there’s conventional identity definitely, the idea of continuing to exist from one moment to another as a human and also countries and so on.

Then there’s also phenomenological identity, which is our intuitive common sense view of: What are we and basically, what are the conditions that will allow us to continue to exist? We can go into more detail but yet, the phenomenological notion of identity is an incredible can of worms because there’s so many different ways of experiencing identity and all of them have their own interesting idiosyncrasies. Most people tend to confuse the two. They tend to confuse ontological and phenomenological identity. Just as a simple example that I’m sure we will revisit in the future, when a person has, let’s say an ego dissolution or a mystical experience and they feel that they merged with the rest of the cosmos, and they come out and say, “Oh, we’re all one consciousness.” That tends to be interpreted as some kind of grasp of an ontological reality. Whereas we could argue in a sense that that was just the shift in phenomenological identity, that your sense of self got transformed, not necessarily that you’re actually directly merging with the cosmos in a literal sense. Although, of course it might be very indicative of how conventional our sense of identity is if it can be modified so drastically in other states of consciousness.

Lucas Perry: Right, and let’s just start with the ontological sense. How does one understand or think about identity from the ontological side?

Andrés Gómez Emilsson: In order to reason about this, you need a shared frame of reference for what actually exists, and a number of things including the nature of time and space, and memory. In the common sense view of time called presentism, basically there’s just the present moment; the past is a convenient construction and the future is a fiction useful in a practical sense, but they don’t literally exist in that sense. This notion that A equals B in the sense of, Hey, you could modify what happens to A and that will automatically also modify what happens to B, kind of makes sense, and you can perhaps think of identity as moving over time along with everything else.

On the other hand, if you have an eternalist point of view where basically you interpret the whole of space time as just basically there, on their own coordinates in the multiverse, that kind of provides a different notion of ontological identity because it’s in a sense, a moment of experience is its own separate piece of reality.

In addition, you also need to consider the question of connectivity: in what way different parts of reality are connected to each other? In a conventional sense, as you go from one second to the next, you’ve continued to be connected to yourself in an unbroken stream of consciousness and this has actually led some philosophers to hypothesize that the proper unit of identity is from the moment you wake up to the moment in which you go to sleep because that’s an unbroken chain/stream of consciousness.

From a scientific and philosophically rigorous point of view, it’s actually difficult to make the case that our stream of consciousness is truly unbroken. Definitely if you have an eternalist point of view on experience and on the nature of time, what you will instead see is from the moment you wake up to the moment you go to sleep, there’s actually been an extraordinarily large amount of snapshots of discrete moments of experience. In that sense, each of those individual moments of experience would be its own ontologically separate individual.

Now one of the things that becomes kind of complicated with a kind of an eternalist account of time and identity is that you cannot actually change it. There’s nothing you can actually do to A, so that reasoning of if you do anything to A and A equals B, then the same will happen to B, doesn’t even actually apply in here because everything is already there. You cannot actually modify A any more than you can modify the number five.

David Pearce: Yes, it’s a rather depressing perspective in many ways, the eternalist view. If one internalizes it too much, it can lead to a sense of fatalism and despair. A lot of the time it’s probably actually best to think of the future as open.

Lucas Perry: This helps to clarify some of the ontological part of identity. Now, you mentioned this phenomenological aspect and I want to say also the epistemological aspect of identity. Could you unpack those two? And maybe clarify this distinction for me if you wouldn’t parse it this way? I guess I would say that the epistemological one is the models that human beings have about the world and about ourselves. It includes how the world is populated with a lot of different objects that have identity like humans and planets and galaxies. Then we have our self model, which is the model of our body and our space in social groups and who we think we are.

Then there’s the phenomenological identity, which is that subjective qualitative experience of self or the ego in relation to experience. Or where there’s an identification with attention and experience. Could you unpack these two latter senses?

Andrés Gómez Emilsson: Yeah, for sure. I mean in a sense you could have like an implicit self model that doesn’t actually become part of your consciousness or it’s not necessarily something that you’re explicitly rendering. This goes on all the time. You’ve definitely, I’m sure, had the experience of riding a bicycle and after a little while you can almost do it without thinking. Of course, you’re engaging with the process in a very embodied fashion, but you’re not cognizing very much about it. Definitely you’re not representing, let’s say your body state, or you’re representing exactly what is going on in a cognitive way. It’s all kind of implicit in the way in which you feel. I would say that paints a little bit of a distinction between a self model which is ultimately functional. It has to do with, are you processing the information that you’re required to solve the task that involves modeling what you are in your environment and distinguishing it from the felt sense of, are you a person? What are you? How are you located and so on.

The first one is the one that most of robotics and machine learning, that have like an embodied component, are really trying to get at. You just need the appropriate information processing in order to solve the task. They’re not very concerned about, does this feel like anything? Or does it feel like a particular entity or a self to be that particular algorithm?

Whereas, we’re talking about the phenomenological sense of identity. That’s very explicitly about what it feels like, and there are all kinds of interesting ways in which a healthy, so to speak, sense of identity can break down. There are many failure modes, we can put it that way.

One might argue, and I suspect David Pearce, for example, might say this, that our self models or our implicit sense of self, because of the way in which it was brought up through Darwinian selection pressures, is already extremely ill, in some sense at least, from the point of view of actually telling us something true and actually making us do something ethical. It has all sorts of problems, but it is definitely functional. You can anticipate being a person tomorrow and plan accordingly. You leave messages to yourself by encoding them in memory and yeah, this is a convenient sense of conventional identity.

It’s very natural for most people’s experiences. I can briefly mention a couple of ways in which it can break down. One of them is depersonalization. It’s a particular psychological disorder where one stops feeling like a person, and it might have something to do with basically not being able to synchronize with your bodily feelings, in such a way that you don’t actually feel embodied. You may feel like a disincarnate entity or just a witness experiencing a human experience, but not actually being that person.

Then you also have things such as an empathogen-induced sense of shared identity with others. If you take MDMA, you may feel that all of humanity is deeply connected, or that we’re all part of the same essence of humanity, in a very positive sense of identity, but perhaps not in an evolutionarily adaptive sense. Finally, there are people with multiple personality disorder, where in a sense they have a very unstable sense of who they are, and sometimes it can be so extreme that there are epistemological blockages from one sense of self to another.

David Pearce: As neuroscientist Donald Hoffman likes to say, fitness trumps truth. Each of us runs a world-simulation. But it’s not an impartial, accurate, faithful world-simulation. I am at the center of a world-simulation, my egocentric world, the hub of reality that follows me around. And of course there are billions upon billions of other analogous examples too. This is genetically extremely fitness-enhancing. But it’s systematically misleading. In that sense, I think Darwinian life is malware.

Lucas Perry: Wrapping up here on these different aspects of identity, I just want to make sure that I have all of them here. Would you say that those are all of the aspects?

David Pearce: One can add the distinction between type- and token- identity. In principle, it’s possible to create from scratch a molecular duplicate of you. Is that person you? It’s type-identical, but it’s not token-identical.

Lucas Perry: Oh, right. I think I’ve heard this used in some other places as numerical distinction versus qualitative distinction. Is that right?

David Pearce: Yeah, that’s the same distinction.

Lucas Perry: Unpacking here more about what identity is. Let’s talk about it purely as something that the world has produced. What can we say about the evolution of identity in biological life? What is the efficacy of certain identity models in Darwinian evolution?

Andrés Gómez Emilsson: I would say that self models most likely have existed, potentially since pretty early on in the evolutionary timeline. You may argue that in some sense even bacteria have some kind of self model. But again, a self model is really just functional. A bacterium does need to know, at least implicitly, its size in order to be able to navigate its environment, follow chemical gradients, and so on, not step on itself. That’s not the same, again, as a phenomenal sense of identity, and that one I would strongly suspect came much later. Perhaps with the advent of the first primitive nervous systems. That would be only if actually running that phenomenal model is giving you some kind of fitness advantage.

One of the things that you will encounter with David and I is that we think that phenomenally bound experiences have a lot of computational properties and in a sense, the reason why we’re conscious has to do with the fact that unified moments of experience are doing computationally useful legwork. It comes when you merge implicit self models in just the functional sense together with the computational benefits of actually running a conscious system that, perhaps for the first time in history, you will actually have a phenomenal self model.

I would suspect probably in the Cambrian explosion this was already going on to some extent. All of these interesting evolutionary oddities that happened in the Cambrian explosion probably had some kind of rudimentary sense of self. I would be skeptical that is going on, for example, in plants. One of the key reasons is that running a real time world simulation in a conscious framework is very calorically expensive.

David Pearce: Yes, it’s a scandal. What, evolutionarily speaking, is consciousness “for”? What could a hypothetical p-zombie not do? The perspective that Andrés and I are articulating is that essentially what makes biological minds special is phenomenal binding – the capacity to run real-time, phenomenally-bound world-simulations, i.e. not just be 86 billion discrete, membrane-bound pixels of experience. Somehow, we each generate an entire cross-modally matched, real-time world-simulation, made up of individual perceptual objects, somehow bound into a unitary self. The unity of perception is extraordinarily computationally powerful and adaptive. Simply saying that it’s extremely fitness-enhancing doesn’t explain it, because something like telepathy would be extremely fitness-enhancing too, but it’s physically impossible.

Yes, how biological minds manage to run phenomenally-bound world-simulations is unknown: they would seem to be classically impossible. One way to appreciate just how advantageous is (non-psychotic) phenomenal binding is to look at syndromes where binding even partially breaks down: simultanagnosia, where one can see only one object at once, or motion blindness (akinetopsia), where you can’t actually see moving objects, or florid schizophrenia. Just imagine those syndromes combined. Why aren’t we just micro-experiential zombies?

Lucas Perry: Do we have any interesting points here to look at in the evolutionary tree for where identity is substantially different from ape consciousness? If we look back at human evolution, it seems that it’s given the apes and particularly our species a pretty strong sense of self, and that gives rise to much of our ape socialization and politics. I’m wondering if there was anything else like maybe insects or other creatures that have gone in a different direction? Also if you guys might be able to just speak a little bit on the formation of ape identity.

Andrés Gómez Emilsson: Definitely, I think the perspective of the selfish gene is pretty illuminating here. Nominally, our sense of identity is the sense of one person, one mind. In practice however, if you make sense of identity as well in terms of that which you want to defend, or that which you consider worth preserving, you will see that people’s sense of identity also extends to their family members. And of course, with the neocortex and the ability to create more complex associations, you then have crazy things like sense of identity being based on race or country of origin or other constructs like that, which are building on top of imports from the sense of, hey, the people who are familiar to you feel more like you.

It’s genetically adaptive to have that, and from the point of view of the selfish gene, genes that could recognize themselves in others and favor the existence of others that also share the same genes are more likely to reproduce. That’s called inclusive fitness in biology: you’re not just trying to survive yourself or make copies of yourself, you’re also trying to help those that are very similar to you do the same. Almost certainly, it’s a huge aspect of how we perceive the world. Just anecdotally from a number of trip reports, there’s this interesting thread of how, with some chemicals like MDMA and 2C-B (for those who don’t know, these are empathogenic psychedelics), people get the strange sense that people they’ve never met before in their life are as close to them as a cousin, or maybe a half brother, or half sister. It’s a very comfortable and quite beautiful feeling. You could imagine that nature was very selective about who you give that feeling to, in order to maximize inclusive fitness.

All of this builds up to the overall prediction I would make that, the sense of identity of ants and other extremely social insects might be very different. The reason being that they are genetically incentivized to basically treat each other as themselves. Most ants themselves don’t produce any offspring. They are genetically sisters and all of their genetic incentives are into basically helping the queen pass on the genes into other colonies. In that sense, I would imagine an ant probably sees other ants of the same colony pretty much as themselves.

David Pearce: Yes. There was an extraordinary finding a few years ago: members of one species of social ant actually passed the mirror test – which has traditionally been regarded as the gold standard for the concept of a self. It was shocking enough, to many people, when a small fish was shown to be capable of mirror self-recognition. If some ants too can pass the mirror test, it suggests some form of meta-cognition, self-recognition, that is extraordinarily ancient.

What is it that distinguishes humans from nonhuman animals? I suspect the distinction relates to something that is still physically unexplained: how is it that a massively parallel brain gives rise to serial, logico-linguistic thought? It’s unexplained, but I would say this serial stream is what distinguishes us, most of all – not possession of a self-concept.

Lucas Perry: Is there such a thing as a right answer to questions of identity? Or is it fundamentally just something that’s functional? Or is it ultimately arbitrary?

Andrés Gómez Emilsson: I think there is a right answer. From a functional perspective, there’s just so many different ways of thinking about it. As I was describing perhaps with ants and humans, their sense of identity is probably pretty different. But they both are useful for passing on the genes. In that sense they’re all equally valid. Imagine that in the future there is some kind of a swarm mind that also has its own distinct, functionally adaptive sense of identity. I mean, in that sense there is no ground truth to what it should be from the point of view of functionality. It really just depends on what the replication unit is.

Ontologically though, I think there’s a case to be made that either open or empty individualism is true. Maybe it would be good to define those terms first.

Lucas Perry: Before we do that. Your answer then is just that, yes, you suspect that also ontologically in terms of fundamental physics, there are answers to questions of identity? Identity itself isn’t a confused category?

Andrés Gómez Emilsson: Yeah, I don’t think it’s a leaky reification as they say.

Lucas Perry: From the phenomenological sense, is the self an illusion or not? Is the self a valid category? Is your view also on identity that there is a right answer there?

Andrés Gómez Emilsson: From the phenomenological point of view? No, I would consider it a parameter, mostly. Just something that you can vary, and there’s trade offs or different experiences of identity.

Lucas Perry: Okay. How about you David?

David Pearce: I think ultimately, yes, there are right answers. In practice, life would be unlivable if we didn’t maintain these fictions. These fictions are (in one sense) deeply immoral. We punish someone for a deed that their namesake performed, let’s say 10, 15, 20 years ago. America recently executed a murderer for a crime that was done 20 years ago. Now quite aside from issues of freedom and responsibility and so on, this is just scapegoating.

Lucas Perry: David, do you feel that in the ontological sense there are right or wrong answers to questions of identity? And in the phenomenological sense? And in the functional sense?

David Pearce: Yes.

Lucas Perry: Okay, so then I guess you disagree with Andrés about the phenomenological sense?

David Pearce: I’m not sure, Andrés and I agree about most things. Are we disagreeing Andrés?

Andrés Gómez Emilsson: I’m not sure. I mean, what I said about the phenomenal aspect of identity was that I think of it as a parameter of our world simulation. In that sense, there’s no true phenomenological sense of identity. They’re all useful for different things. The reason I would say this too is, you can assume that something like each snapshot of experience, is its own separate identity. I’m not even sure you can accurately represent that in a moment of experience itself. This is itself a huge can of worms that opens up the problem of referents. Can we even actually refer to something from our own point of view? My intuition here is that, whatever sense of identity you have at a phenomenal level, I think of it as a parameter of the world simulation and I don’t think it can be an accurate representation of something true. It’s just going to be a feeling, so to speak.

David Pearce: I could endorse that. We fundamentally misperceive each other. The Hogan sisters, conjoined craniopagus twins, know something that the rest of us don’t. The Hogan sisters share a thalamic bridge, which enables them partially, to a limited extent, to “mind-meld”. The rest of us see other people essentially as objects that have feelings. When one thinks of one’s own ignorance, perhaps one laments one’s failures as a mathematician or a physicist or whatever; but an absolutely fundamental form of ignorance that we take for granted is we (mis)conceive other people and nonhuman animals as essentially objects with feelings, whereas individually, we ourselves have first-person experience. Whether it’s going to be possible to overcome this limitation in the future I don’t know. It’s going to be immensely technically challenging – building something like reversible thalamic bridges. A lot depends on one’s theory of phenomenal binding. But let’s imagine a future civilization in which partial “mind-melding” is routine. I think it will lead to a revolution not just in morality, but in decision-theoretic rationality too – one will take into account the desires, the interests, and the preferences of what will seem like different aspects of oneself.

Lucas Perry: Why does identity matter morally? I think you guys have made a good case about how it’s important functionally, historically in terms of biological evolution, and then in terms of like society and culture identity is clearly extremely important for human social relations, for navigating social hierarchies and understanding one’s position of having a concept of self and identity over time, but why does it matter morally here?

Andrés Gómez Emilsson: One interesting story is that you can think of a lot of social movements, and in a sense a lot of the ideologies that have existed in human history, as attempts to hack people’s sense of identity, or make use of it for the purpose of the reproduction of the ideology or the social movement itself.

To a large extent, a lot of the things that you see in therapy have a lot to do with expanding your sense of identity to include your future self as well, which is something that a lot of people struggle with when it comes to impulsive decisions or rationality. There’s this interesting point of view of how a two year old or a three year old hasn’t yet internalized the fact that they will wake up tomorrow and that the consequences of what they did today will linger on in the following days. It’s kind of a revelation when a kid finally internalizes the fact that, Oh my gosh, I will continue to exist for the rest of my life. There’s going to be a point where I’m going to be 40 years old and also there’s going to be a time where I’m 80 years old and all of those are real, and I should plan ahead for it.

Ultimately, I do think that advocating for a very inclusive sense of identity, where the locus of identity is consciousness itself, might have tremendous moral and ethical implications.

David Pearce: We want an inclusive sense of “us” that embraces all sentient beings. This is extremely ambitious, but I think that should be the long-term goal.

Lucas Perry: Right, there’s a spectrum here and where you fall on the spectrum will lead to different functions and behaviors, solipsism or extreme egoism on one end, pure selflessness or ego death or pure altruism on the other end. Perhaps there are other degrees and axes on which you can move, but the point is it leads to radically different identifications and relations with other sentient beings and with other instantiations of consciousness.

David Pearce: Would our conception of death be different if it was a convention to give someone a different name when they woke up each morning? Because after all, waking up is akin to reincarnation. Why is it that when one is drifting asleep each night, one isn’t afraid of death? It’s because (in some sense) one believes one is going to be reincarnated in the morning.

Lucas Perry: I like that. Okay, I want to return to this question after we hit on the different views of identity to really unpack the different ethical implications more. I wanted to sneak that in here for a bit of context. Pivoting back to this sort of historical and contextual analysis of identity. We talked about biological evolution as like instantiating these things. How do you guys view religion as codifying an egoist view on identity? Religion codifies the idea of the eternal soul and the soul, I think, maps very strongly onto the phenomenological self. It makes that the thing that is immutable or undying or which transcends this realm?

I’m talking obviously specifically here about Abrahamic religions, but then also in Buddhism there is, the self is an illusion, or what David referred to as empty individualism, which we’ll get into, where it says that identification with the phenomenological self is fundamentally a misapprehension of reality and like a confusion and that that leads to attachment and suffering and fear of death. Do you guys have comments here about religion as codifying views on identity?

Andrés Gómez Emilsson: I think it’s definitely really interesting that there are different views of identity in religion. The way I grew up, I always assumed religion was about souls and getting into heaven. As it turns out, I just needed to learn about Eastern religions and cults, which also happen to sometimes have different views of personal identity. That was definitely a revelation to me. I would actually say that I started questioning the common-sense view of personal identity before I learned about Eastern religions, and I was really pretty surprised and very happy when I found out that, let’s say, Hinduism actually has a kind of universal consciousness take on identity, a socially sanctioned way of looking at the world that has a very expansive sense of identity. Buddhism is also pretty interesting because, as far as I understand it, they consider pretty much any view of identity to be a cause of suffering. It fundamentally has to do with a sense of craving, either for existence or for non-existence, which they also consider a problem. A Buddhist would generally say that even something like universal consciousness, believing that we’re all fundamentally Krishna incarnating in many different ways, will itself also be a source of suffering to some extent, because you may crave further existence, which may not be very good from their point of view. It makes me optimistic that there are other types of religions with other views of identity.

David Pearce: Yes. Here is one of my earliest memories. My mother belonged to The Order of the Cross – a very obscure, small, vaguely Christian denomination, non-sexist, who worship God the Father-Mother. And I recall being told, aged five, that I could be born again. It might be as a little boy, but it might be as a little girl – because gender didn’t matter. And I was absolutely appalled at this – at the age of five or so – because in some sense girls were, and I couldn’t actually express this, defective.

And religious conceptions of identity vary immensely. One thinks of something like Original Sin in Christianity. I could now make a lot of superficial comments about religion. But one would need to explore in detail the different religious traditions and their different conceptions of identity.

Lucas Perry: What are the different views on identity? If you can say anything, why don’t you hit on the ontological sense and the phenomenological sense? Or if we just want to stick to the phenomenological sense then we can.

Andrés Gómez Emilsson: I mean, are you talking about an open, empty, closed?

Lucas Perry: Yeah. So that would be the phenomenological sense, yeah.

Andrés Gómez Emilsson: No, actually I would claim those are attempts at getting at the ontological sense.

Lucas Perry: Okay.

Andrés Gómez Emilsson: If you do truly have a soul ontology, something that implicitly a very large percentage of the human population have, that would be, yeah, in this view called a closed individualist perspective. Common sense, you start existing when you’re born, you stop existing when you die, you’re just a stream of consciousness. Even perhaps more strongly, you’re a soul that has experiences, but experiences maybe are not fundamental to what you are.

Then there is the more Buddhist and definitely more generally scientifically-minded view, which is empty individualism, which is that you only exist as a moment of experience, and from one moment to the next you are a completely different entity. And then, finally, there is open individualism, which is like Hinduism claiming that we are all one consciousness fundamentally.

There is an ontological way of thinking of these notions of identity. It’s possible that a lot of people think of them just phenomenologically, or they may just think there’s no further fact beyond the phenomenal. In which case something like closed individualism, for most people most of the time, is self-evidently true because you are moving in time and you can notice that you continue to be yourself from one moment to the next. Then, of course, what would it feel like if you weren’t the same person from one moment to the next? Well, each of those moments might completely be under the illusion that it is a continuous self.

For most things in philosophy and science, if you want to use something as evidence, it has to agree with one theory and disagree with another one. And the sense of continuity from one second to the next seems to be compatible with all three views. So it’s not itself much evidence either way.

States of depersonalization are probably much more akin to empty individualism from a phenomenological point of view, and then you have ego death and definitely some experiences of the psychedelic variety, especially high doses of psychedelics tend to produce very strong feelings of open individualism. That often comes in the form of noticing that your conventional sense of self is very buggy and doesn’t seem to track anything real, but then realizing that you can identify with awareness itself. And if you do that, then in some sense automatically, you realize that you are every other experience out there, since the fundamental ingredient of a witness or awareness is shared with every conscious experience.

Lucas Perry: These views on identity are confusing to me because agents haven’t existed for most of the universe and I don’t know why we need to privilege agents in our ideas of identity. They seem to me just emergent patterns of a big, ancient, old, physical universe process that’s unfolding. It’s confusing to me that just because there are complex self- and world-modeling patterns in the world, that we need to privilege them with some kind of shared identity across themselves or across the world. Do you see what I mean here?

Andrés Gómez Emilsson: Oh, yeah, yeah, definitely. I’m not agent-centric. And I mean, in a sense also, all of these other exotic feelings of identity often also come with states of low agency. You actually don’t feel that you have much of a choice in what you could do. I mean, definitely depersonalization, for example, often comes with a sense of inability to make choices, that actually it’s not you who’s making the choice, they’re just unfolding and happening. Of course, in some meditative traditions that’s considered a path to awakening, but in practice for a lot of people, that’s a very unpleasant type of experience.

It sounds like it might be privileging agents; I would say that’s not the case. If you zoom out and you see the bigger worldview, it includes basically this concept, David calls it non-materialist physicalist idealism, where the laws of physics describe the behavior of the universe, but that which is behaving according to the laws of physics is qualia, is consciousness itself.

I take very seriously the idea that a given molecule or a particular atom contains moments of experience. It’s just perhaps very fleeting and very dim, or just not very relevant in many ways, but I do think it’s there. And the sense of identity, maybe not in a phenomenal sense, since I don’t think an atom actually feels like an agent over time, but the continuity of its experience and the boundaries of its experience would have strong bearings on the ontological sense of identity.

There’s a huge, obviously, a huge jump between talking about the identity of atoms and then talking about the identity of a moment of experience, which presumably is an emergent effect of 100 billion neurons, themselves made of so many different atoms. Crazy as it may be, it is both David Pearce’s view and my view that actually each moment of experience does stand as an ontological unit. It’s just the ontological unit of a certain kind that usually we don’t see in physics, but it is both physical and ontologically closed.

Lucas Perry: Maybe you could unpack this. You know mereological nihilism, maybe I privilege this view where I just am trying to be as simple as possible and not build up too many concepts on top of each other.

Andrés Gómez Emilsson: Mereological nihilism basically says that there are no entities that have parts. Everything is part-less. All that exists in reality is individual monads, so to speak, things that are fundamentally self-existing. For that, if you have let’s say monad A and monad B, just put together side by side, that doesn’t entail that now there is a monad AB that mixes the two.

Lucas Perry: Or if you put a bunch of fundamental quarks together that it makes something called an atom. You would just say that it’s quarks arranged atom-wise. There’s the structure and the information there, but it’s just made of the monads.

Andrés Gómez Emilsson: Right. And the atom is a wonderful case, basically the same as a molecule, where I would say mereological nihilism with fundamental particles as just the only truly existing beings does seem to be false when you look at how, for example, molecules behave. The building block account of how chemical bonds happen, which is with these Lewis diagrams of how it can have a single bond or double bond and you have the octet rule, and you’re trying to build these chains of atoms strung together. And all that matters for those diagrams is what each atom is locally connected to.

However, if you just use these in order to predict what molecules are possible and how they behave and their properties, you will see that there are a lot of artifacts that are empirically disproven. And over the years, chemistry has become more and more sophisticated, where eventually it’s come to the realization that you need to take into account the entire molecule at once in order to understand what its “dynamically stable” configuration is, which involves all of the electrons and all of the nuclei simultaneously interlocking into a particular pattern that self replicates.

Lucas Perry: And it has new properties over and above the parts.

Andrés Gómez Emilsson: Exactly.

Lucas Perry: That doesn’t make any sense to me or my intuitions, so maybe my intuitions are just really wrong. Where does the new property or causality come from? Because it essentially has causal efficacy over and above the parts.

Andrés Gómez Emilsson: Yeah, it’s tremendously confusing. I mean, I’m currently writing an article about basically how this sense of topological segmentation can, in a sense, account for this effect of what we might call weak downward causation, which is like, you get a molecule and now the molecule will have effects in the world; you need to take into account all of the electrons and all of the nuclei simultaneously as a unit in order to actually know what the effect is going to be in the world. You cannot just take each of the components separately, and that’s something that we could call weak downward causation. It’s not that fundamentally you’re introducing a new law of physics. Everything is still predicted by the Schrödinger equation, which is still governing the behavior of the entire molecule. It’s just that the appropriate unit of analysis is not the electron, but it would be the entire molecule.

Now, if you pair this together with a sense of identity that comes from topology, then I think there might be a good case for why moments of experience are discrete entities. The analogy here with the topological segmentation, hopefully I’m not going to lose too many listeners here, but we can make an analogy with, for example, a balloon. If you start out imagining that you are the surface of the balloon and then you take the balloon by two ends and you twist them in opposite directions, eventually at the middle point you get what’s called a pinch point. Basically, the balloon collapses in the center and you end up having these two smooth surfaces connected by a pinch point. Each of those twists creates a new topological segment, or in a sense is segmenting out the balloon. You could basically interpret things such as molecules as new topological segmentations of what are fundamentally the quantum fields that are implementing them.

Usually, the segmentations may look like an electron or a proton, but if you assemble them together just right, you can get them to essentially melt with each other and become one topologically continuous unit. The nice thing about this account is that you get everything that you want. You explain, on the one hand, why identity would actually have causal implications, and it’s this weak downward causation effect, at the same time as being able to explain: how is it possible that the universe can break down into many different entities? Well, the answer is the way in which it is breaking down is through topological segmentations. You end up having these self-contained regions of the wave function that are discommunicated from the rest of it, and each of those might be a different subject of experience.

David Pearce: It’s very much an open question: the intrinsic nature of the physical. Commonly, materialism and physicalism are conflated. But the point of view that Andrés and I take seriously, non-materialist physicalism, is actually a form of idealism. Recently, philosopher Phil Goff, who used to be a skeptic-critic of non-materialist physicalism because of the binding problem, published a book defending it, “Galileo’s Error”.

Again, it’s very much an open question. We’re making some key background assumptions here. A critical background assumption is physicalism, and that quantum mechanics is complete:  there is no “element of reality” that is missing from the equations (or possibly the fundamental equation) of physics. But physics itself seems to be silent on the intrinsic nature of the physical. What is the intrinsic nature of a quantum field? Intuitively, it’s a field of insentience; but this isn’t a scientific discovery, it’s a (very strong) philosophical intuition.

And couple this with the fact that the only part of the world to which one has direct access, i.e., one’s own conscious mind (though this is controversial), is consciousness, sentience. The non-materialist physicalist conjectures that we are typical, in one sense – inasmuch as the fields of your central nervous system aren’t ontologically different from the fields of the rest of the world. And what makes sentient beings special is the way that fields are organized into unified subjects of experience, egocentric world-simulations.

Now, I’m personally fairly confident that we are, individually, minds running egocentric world-simulations: direct realism is false. I’m not at all confident – though I explore the idea – that experience is the intrinsic nature of the physical, the “stuff” of the world. This is a tradition that goes back via Russell, ultimately, to Schopenhauer. Schopenhauer essentially turns Kant on his head.

Kant famously said that all we will ever know is phenomenology, appearances; we will never, never know the intrinsic, noumenal nature of the world. But Schopenhauer argues that essentially we do actually know one tiny piece of the noumenal essence of the world, the essence of the physical, and it’s experiential. So yes, tentatively, at any rate, Andrés and I would defend non-materialist or idealistic physicalism. The actual term “non-materialist physicalism” is due to the late Grover Maxwell.

Lucas Perry: Sorry, could you just define that real quick? I think we haven’t.

David Pearce: Physicalism is the idea that no “element of reality” is missing from the equations of physics, presumably (some relativistic generalization of) the universal Schrödinger equation.

Lucas Perry: It’s a kind of naturalism, too.

David Pearce: Oh, yes. It is naturalism. There are some forms of idealism and panpsychism that are non-naturalistic, but this view is uncompromisingly monist. Non-materialist physicalism isn’t claiming that a primitive experience is attached in some way to fundamental physical properties. The idea is that the actual intrinsic nature, the essence of the physical, is experiential.

Stephen Hawking, for instance, was a wave function monist. A doctrinaire materialist, but he famously said that we have no idea what breathes fire into the equations and makes a universe for them to describe. Now, intuitively, of course one assumes that the fire in the equations, Kant’s noumenal essence of the world, is non-experiential. But if so, we have the hard problem, we have the binding problem, we have the problem of causal efficacy, a great mess of problems.

But if, and it’s obviously a huge if, the actual intrinsic nature of the physical is experiential, then we have a theory of reality that is empirically adequate, that has tremendous explanatory and predictive power. It’s mind-bogglingly implausible, at least to those of us steeped in the conceptual framework of materialism. But yes, by transposing the entire mathematical apparatus of modern physics, quantum field theory or its generalization, onto an idealist ontology, one actually has a complete account of reality that explains the technological successes of science, its predictive power, and doesn’t give rise to such insoluble mysteries as the hard problem.

Lucas Perry: I think all of this is very clarifying. There are also background metaphysical views, which people may or may not disagree upon, which are also important for identity. I also want to be careful to define some terms, in case some listeners don’t know what they mean. I think you hit on like four different things which all had to do with consciousness. The hard problem is why different kinds of computation actually… why it’s something to be that computation or like why there is consciousness correlated or associated with that experience.

Then you also said the binding problem. Is it the binding problem, why there is a unitary experience that’s, you said, modally connected earlier?

David Pearce: Yes, and if one takes the standard view from neuroscience that your brain consists of 86-billion-odd discrete, decohered, membrane-bound nerve cells, then phenomenal binding, whether local or global, ought to be impossible. So yeah, this is the binding problem, this (partial) structural mismatch. If your brain is scanned when you’re seeing a particular perceptual object, neuroscanning can apparently pick out distributed feature-processors, edge-detectors, motion-detectors, color-mediating neurons (etc). And yet there isn’t the perfect structural match that must exist if physicalism is true. And David Chalmers – because of this (partial) structural mismatch – goes on to argue that dualism must be true. Although I agree with David Chalmers that yes, phenomenal binding is classically impossible, if one takes the intrinsic nature argument seriously, then phenomenal unity is minted in.

The intrinsic nature argument, recall, is that experience, consciousness, discloses the intrinsic nature of the physical. Now, one of the reasons why this idea is so desperately implausible is it makes the fundamental “psychon” of consciousness ludicrously small. But there’s a neglected corollary of non-materialist physicalism, namely that if experience discloses the intrinsic nature of the physical, then experience must be temporally incredibly fine-grained too. And if we probe your nervous system at a temporal resolution of femtoseconds or even attoseconds, what would we find? My guess is that it would be possible to recover a perfect structural match between what you are experiencing now in your phenomenal world-simulation and the underlying physics. Superpositions (“cat states”) are individual states [i.e. not classical aggregates].

Now, if the effective lifetime of neuronal superpositions in the CNS were milliseconds, they would be the obvious candidate for a perfect structural match and explain the phenomenal unity of consciousness. But physicists, not least Max Tegmark, have done the maths: decoherence means that the effective lifetime of neuronal superpositions in the CNS, assuming the unitary-only dynamics, is femtoseconds or less, which is intuitively the reductio ad absurdum of any kind of quantum mind.

But one person’s reductio ad absurdum is another person’s falsifiable prediction. I’m guessing – I’m sounding like a believer, but I’m not – I am guessing that with sufficiently sensitive molecular matter-wave interferometry, perhaps using “trained up” mini-brains, the non-classical interference signature will disclose a perfect structural match between what you’re experiencing right now, your unified phenomenal world-simulation, and the underlying physics.

Lucas Perry: So, we hit on the hard problem and also the binding problem. There was like two other ones that you threw out there earlier that… I forget what they were?

David Pearce: Yeah, the problem of causal efficacy. How is it that you and I can discuss consciousness? How is it that the “raw feels” of consciousness have not merely the causal, but also the functional efficacy to inspire discussions of their existence?

Lucas Perry: And then what was the last one?

David Pearce: Oh, it’s been called the palette problem, P-A-L-E-T-T-E. As in the fact that there is tremendous diversity of different kinds of experience and yet the fundamental entities recognized by physics, at least on the normal tale, are extremely simple and homogeneous. What explains this extraordinarily rich palette of conscious experience? Physics exhaustively describes the structural-relational properties of the world. What physics doesn’t do is deal in the essence of the physical, its intrinsic nature.

Now, it’s an extremely plausible assumption that the world’s fundamental fields are non-experiential, devoid of any subjective properties – and this may well be the case. But if so, we have the hard problem, the binding problem, the problem of causal efficacy, the palette problem – a whole raft of problems.

Lucas Perry: Okay. So, this all serves the purpose of codifying that there’s these questions up in the air about these metaphysical views which inform identity. We got here because we were talking about mereological nihilism, and Andrés said that one view that you guys have is that you can divide or cut or partition consciousness into individual, momentary, unitary moments of experience that you claim are ontologically simple. What is your credence on this view?

Andrés Gómez Emilsson: Phenomenological evidence. When you experience your visual field, you don’t only experience one point at a time. The contents of your experience are not ones and zeros; it isn’t the case that you experience one and then zero and then one again. Rather, you experience many different types of qualia varieties simultaneously: visual experience and auditory experience and so on. All of that gets presented to you. I take that very seriously. I mean, some other researchers may fundamentally say that that’s an illusion, that there’s actually never a unified experience, but that has many more problems than actually taking seriously the unity of consciousness.

David Pearce: A number of distinct questions arise here. Are each of us egocentric phenomenal world-simulations? A lot of people are implicitly perceptual direct realists, even though they might disavow the label. Implicitly, they assume that they have some kind of direct access to physical properties. They associate experience with some kind of stream of thoughts and feelings behind their forehead. But if instead we are world-simulationists, then there arises the question: what is the actual fundamental nature of the world beyond your phenomenal world-simulation? Is it experiential or non-experiential? I am agnostic about that – even though I explore non-materialist physicalism.

Lucas Perry: So, I guess I’m just trying to get a better answer here on how is it that we navigate these views of identity towards truth?

Andrés Gómez Emilsson: An example I thought of, of a very big contrast between what you may intuitively imagine is going on versus what’s actually happening, is if you are very afraid of snakes, for example, you look at a snake. You feel, “Oh, my gosh, it’s intruding into my world and I should get away from it,” and you have this representation of it as a very big other. Anything that is very threatening, oftentimes you represent it as “an other”.

But crazily, that’s actually just yourself to a large extent because it’s still part of your experience. Within your moment of experience, the whole phenomenal quality of looking at a snake and thinking, “That’s an other,” is entirely contained within you. In that sense, these ways of ascribing identity and continuity to the things around us or a self-other division are almost psychotic. They start out by assuming that you can segment out a piece of your experience and call it something that belongs to somebody else, even though clearly, it’s still just part of your own experience; it’s you.

Lucas Perry: But the background here is also that you’re calling your experience your own experience, which is maybe also a kind of psychopathy. Is that the word you used?

Andrés Gómez Emilsson: Yeah, yeah, yeah, that’s right.

Lucas Perry: Maybe the scientific thing is, there’s just snake experience and it’s neither yours nor not yours, and there’s what we conventionally call a snake.

Andrés Gómez Emilsson: That said, there are ways in which I think you can use experience to gain insight about other experiences. If you’re looking at a picture that has two blue dots, I think you can accurately say, by paying attention to one of those blue dots, the phenomenal property of my sensation of blue is also in that other part of my visual field. And this is a case where in a sense you can, I think, meaningfully refer to some aspect of your experience by pointing at another aspect of your experience. It’s still maybe in some sense kind of crazy, but it’s still closer to truth than many other things that we think of or imagine.

Honest and true statements about the nature of other people’s experiences, I think are very much achievable. The reference gap, I think, might be possible to bridge, and you can probably aim for a true sense of identity, harmonizing the phenomenal and the ontological sense of identity.

Lucas Perry: I mean, I think that part of the motivation, for example in Buddhism, is that you need to always be understanding yourself in reality as it is or else you will suffer, and that it is through understanding how things are that you’ll stop suffering. I like this point that you said about unifying the phenomenal identity and phenomenal self with what is ontologically true, but that also seems not intrinsically necessary because there’s also this other point here where you can maybe function or have the epistemology of any arbitrary identity view but not identify with it. You don’t take it as your ultimate understanding of the nature of the world, or what it means to be this limited pattern in a giant system.

Andrés Gómez Emilsson: I mean, generally speaking, that’s obviously pretty good advice. It does seem to be something that’s constrained to the workings of the human mind as it is currently implemented. I mean, definitely all this Buddhist advice of “don’t identify with it” or “don’t get attached to it.” Ultimately, it cashes out in experiencing less of a craving, for example, or feeling less despair in some cases. Useful advice, not universally applicable.

For many people, their problem might be something like, sure, like desire, craving, attachment, in which case these Buddhist practices will actually be very helpful. But if your problem is something like a melancholic depression, then lack of desire doesn’t actually seem very appealing; that is the default state and it’s not a good one. Just be mindful of universalizing this advice.

David Pearce: Yes. Other things being equal, the happiest people tend to have the most desires. Of course, a tremendous desire can also bring tremendous suffering, but there are a very large number of people in the world who are essentially unmotivated. Nothing really excites them. In some cases, they’re just waiting to die: melancholic depression. Desire can be harnessed.

A big problem, of course, is that in a Darwinian world, many of our desires are mutually inconsistent. And to use (what to me at least would be) a trivial example – it’s not trivial to everyone –  if you have 50 different football teams with all their supporters, there is logically no way that the preferences of these fanatical football supporters can be reconciled. But nonetheless, by raising their hedonic set-points, one can allow all football supporters to enjoy information-sensitive gradients of bliss. But there is simply no way to reconcile their preferences.

Lucas Perry: There’s part of me that does want to do some universalization here, and maybe that is wrong or unskillful to do, but I seem to be able to imagine a future where, say we get aligned superintelligence and there’s some kind of rapid expansion, some kind of optimization bubble of some kind. And maybe there are the worker AIs and then there are the exploiter AIs, and the exploiter AIs just get blissed out.

And imagine if some of the exploiter AIs are egomaniacs in their hedonistic simulations and some of them are hive minds, and they all have different views on open individualism or closed individualism. Some of the views on identity just seem more deluded to me than others. I seem to have a problem with a self identification and reification of self as something. It seems to me, to take something that is conventional and make it an ultimate truth, which is confusing to the agent, and that to me seems bad or wrong, like our world model is wrong. Part of me wants to say it is always better to know the truth, but I also feel like I’m having a hard time being able to say how to navigate views of identity in a true way, and then another part of me feels like actually it doesn’t really matter only in so far as it affects the flavor of that consciousness.

Andrés Gómez Emilsson: If we find like the chemical or genetic levers for different notions of identity, we could presumably imagine a lot of different ecosystems of approaches to identity in the future, some of them perhaps being much more adaptive than others. I do think I grasp a little bit maybe the intuition pump, and I think that’s actually something that resonates quite a bit with us, which is that it is an instrumental value for sure to always be truth-seeking, especially when you’re talking about general intelligence.

It’s very weird and it sounds like it’s going to fail if you say, “Hey, I’m going to be truth-seeking in every domain except this one.” And this might be identity, or your value function, or your model of physics or something like that, but perhaps actual superintelligence in some sense really entails having an open-ended model for everything, including ultimately who you are. If you’re not having those open-ended models that can be revised with further evidence and reasoning, you are not a superintelligence.

That intuition pump may suggest that if intelligence turns out to be extremely adaptive and powerful, then presumably, the superintelligences of the future will have true models of what’s actually going on in the world, not just convenient fictions.

David Pearce: Yes. In some sense I would hope our long-term goal is ignorance of the entire Darwinian era and its horrors. But it would be extremely dangerous if we were to give up prematurely. We need to understand reality and the theoretical upper bounds of rational moral agency in the cosmos. But ultimately, when we have done literally everything that it is possible to do to minimize and prevent suffering, I think in some sense we should want to forget about it altogether. But I would stress the risks of premature defeatism.

Lucas Perry: Of course we’re always going to need a self model, a model of the cognitive architecture in which the self model is embedded, it needs to understand the directly adjacent computations which are integrated into it, but it seems like the views of identity go beyond just this self model. Is that the solution to identity? What does open, closed, or empty individualism have to say about something like that?

Andrés Gómez Emilsson: Open, empty and closed as ontological claims, yeah, I mean they are separable from the functional uses of a self model. They do, however, have a bearing on basically the decision-theoretic rationality of an intelligence, because when it comes to planning ahead, if you have the intense objective of being as happy as you can, and somebody offers you a cloning machine and they say, “Hey, you can trade one year of your life for just a completely new copy of yourself.” Do you press the button to make that happen? For making that decision, you actually do require a model of an ontological notion of identity, unless you just care about replication.

Lucas Perry: So I think that the problem there is that identity, at least in us apes, is caught up in ethics. If you could have an agent like that where identity was not factored into ethics, then I think that it would make a better decision.

Andrés Gómez Emilsson: It’s definitely a question too of whether you can bootstrap an impartial god’s-eye-view on the wellbeing of all sentient beings without first having developed a sense of your own identity and then wanting to preserve it, and finally updating it with more information, you know, philosophy, reasoning, physics. I do wonder if you can start out without caring about identity, and finally conclude with kind of an impartial god’s-eye-view. I think probably in practice a lot of those transitions do happen because the person is first concerned with themselves, and then they update the model of who they are based on more evidence. You know, I could be wrong, it might be possible to completely sidestep Darwinian identities and just jump straight up into impartial care for all sentient beings, I don’t know.

Lucas Perry: So we’re getting into the ethics of identity here, and why it matters. The question for this portion of the discussion is what are the ethical implications of different views on identity? Andrés, I think you can sort of kick this conversation off by talking a little bit about the game theory.

Andrés Gómez Emilsson: Right, well yeah, the game theory is surprisingly complicated. Just consider within a given person, in fact, the different “sub agents” of an individual. Let’s say you’re drinking with your friends on a Friday evening, but you know you have to wake up early at 8:00 AM for whatever reason, and you’re deciding whether to have another drink or not. Your intoxicated self says, “Yes, of course. Tonight is all that matters.” Whereas your cautious self might try to persuade you that no, you will also exist tomorrow in the morning.

Within a given person, there’s all kinds of complex game theory that happens between alternative views of identity, even implicitly. It obviously becomes much more tricky when you expand it outwards, like how some social movements in a sense are trying to hack people’s view of identity, whether the unit is your political party, or the country, or the whole ecosystem, or whatever it may be. A key thing to consider here is the existence of legible Schelling points, also called focal points: in communication between entities, what are some guiding principles that they can use in order to effectively coordinate and move towards a certain goal?

I would say that having something like open individualism itself can be a powerful Schelling point for coordination. Especially because if you can be convinced that somebody is an open individualist, you have reasons to trust them. There’s all of this research on how high-trust social environments are so much more conducive to productivity and long-term sustainability than low-trust environments, and expansive notions of identity are very trust building.

On the other hand, from a game theoretical point of view, you also have the problem of defection. Within an open individualist society, you have a small group of people who can fake the test of open individualism. They can take over from within, and instantiate some kind of a dictatorship or some type of a closed individualist takeover of what was a really good society, good for everybody.

This is a serious problem, even when it comes to, for example, forming groups of people who all share a certain experience. For example, MDMA, or 5-MeO-DMT, or let’s say deep stages of meditation. Even then, you’ve got to be careful, because people who are resistant to those states may pretend that they have an expanded notion of identity, but actually covertly work towards a much more reduced sense of identity. I have yet to see a credible game theoretically aware solution to how to make this work.

Lucas Perry: If you could clarify the knobs in a person, whether it be altruism, or selfishness, or other things that the different views on identity turn, and if you could clarify how that affects the game theory, then I think that that would be helpful.

Andrés Gómez Emilsson: I mean, I think the biggest knob is fundamentally what experiences count from the point of view of the fact that you expect to, in a sense, be there or expect them to be real, in as real of a way as your current experience is. It’s also contingent on theories of consciousness, because you could be an open individualist and still believe that higher order cognition is necessary for consciousness, and that non-human animals are not conscious. That gives rise to all sorts of other problems, the person presumably is altruistic and cares about others, but they just still don’t include non-human animals for a completely different reason in that case.

Definitely another knob is how you consider what you will be in the future. Whether you consider that to be part of the universe or the entirety of the universe. I guess I used to think that personal identity was very tied to a hedonic tone. I think of them as much more dissociated now. There is a general pattern: people who are very low mood may have kind of a bias towards empty individualism. People who become open individualists often experience a huge surge in positive feelings for a while because they feel that they’re never going to die, like the fear of death greatly diminishes, but I don’t actually think it’s a surefire or a foolproof way of increasing wellbeing, because if you take seriously open individualism, it also comes with terrible implications. Like that hey, we are also the pigs in factory farms. It’s not a very pleasant view.

Lucas Perry: Yeah, I take that seriously.

Andrés Gómez Emilsson: I used to believe for a while that the best thing we could possibly do in the world was to just write a lot of essays and books about why open individualism is true. Now I think it’s important to combine it with consciousness technologies so that, hey, once we do want to upgrade our sense of identity to a greater circle of compassion, that we also have the enhanced happiness and mental stability to be able to actually engage with that without going crazy.

Lucas Perry: This has me thinking about one point that I think is very motivating for me for the ethical case of veganism. Take the common sense, normal consciousness, like most people have, and that I have, you just feel like a self that’s having an experience. You just feel like you are fortunate enough to be born as you, and to be having the Andrés experience or the Lucas experience, and that your life is from birth to death, or whatever, and when you die you will be annihilated, you will no longer have experience. Then who is it that is experiencing the cow consciousness? Who is it that is experiencing the chicken and the pig consciousness? There’s so many instantiations of that, like billions. Even if this is based off of the irrationality, it still feels motivating to me. Yeah, I could just die and wake up as a cow 10 billion times. That’s kind of the experience that is going on right now. The sudden confused awakening into cow consciousness plus factory farming conditions. I’m not sure if you find that completely irrational or motivating or what.

Andrés Gómez Emilsson: No, I mean I think it makes sense. We have a common friend as well, Magnus Vinding. He wrote a pro-veganism book actually kind of with this line of reasoning. It’s called You Are Them. About how post theoretical science of consciousness and identity itself is a strong case for an ethical lifestyle.

Lucas Perry: Just touching here on the ethical implications, some other points that I want to add are about when one is identified with one’s phenomenal identity. In particular, I want to talk about the experience of self where you feel like you’re a closed individualist: your life runs from when you were born up until when you die, and that’s you. I think that that breeds a very strong duality in terms of your relationship with your own personal phenomenal consciousness. The suffering and joy which you have direct access to are categorized as mine or not mine.

Those which are mine take high moral and ethical priority over the suffering of others. You’re not mind-melded with all of the other brains, right? So there’s an epistemological limitation there where you’re not directly experiencing the suffering of other people, but the closed individualist view goes a step further and isn’t just saying that there’s an epistemological limitation, but it’s also saying that this consciousness is mine, and that consciousness is yours, and this is the distinction between self and other. And given selfishness, that self consciousness will take moral priority over other consciousness.

That I think just obviously has massive ethical implications with regard to the greed of people. I view the ethical implications here as being important because, at least in the way that human beings function, if one is able to fully rid oneself of the ultimate identification with one’s personal consciousness as being the content of self, then one can move beyond the duality of consciousness of self and other, and care about all instances of wellbeing and suffering much more equally than I currently do. That to me seems harder to do, at least with human brains, if we have a strong reification and identification with our instances of suffering or wellbeing as our own.

David Pearce: Part of the problem is that the existence of other subjects of experience is metaphysical speculation. It’s metaphysical speculation that one should take extremely seriously: I’m not a solipsist. I believe that other subjects of experience, human and nonhuman, are as real as my experience. But nonetheless, it is still speculative and theoretical. One cannot feel their experiences. There is simply no way, given the way that we are constituted, the way we are now, that one can behave with impartial, God-like benevolence.

Andrés Gómez Emilsson: I guess I would question it perhaps a little bit that we only care about our future suffering within our own experience, because this is me, this is mine, it’s not an other. In a sense I think we care about those more, largely because they’re more intense. You do see examples, such as mirror-touch synesthesia, of people who, if they see somebody else get hurt, also experience pain. I don’t mean a fleeting sense of discomfort, but perhaps even actual strong pain because they’re able to kind of reflect that for whatever reason.

People like that are generally very motivated to help others as well. In a sense, their implicit self model includes others, or at least weighs others more than most people do. I mean in some sense you can perhaps make sense of selfishness in this context as the coincidence that what is within our self model is experienced as more intense. But there’s plenty of counter examples to that, including sense of depersonalization or ego death, where you can experience the feeling of God, for example, as being this eternal and impersonal force that is infinitely more intense than you, and therefore it matters more, even though you don’t experience it as you. Perhaps the core issue is what gets the highest amount of intensity within your world simulation.

Lucas Perry: Okay, so I also just want to touch on a little bit about preferences here before we move on to how this is relevant to AI alignment and the creation of beneficial AI. From the moral realist perspective, if you take the metaphysical existence of consciousness very substantially, and you view it as the ground of morality, then different views on identity will shift how you weight the preferences of other creatures.

So from a moral perspective, whatever kinds of views of identity end up broadening your moral circle of compassion closer and closer to the end goal of impartial benevolence for all sentient beings according to their degree and kinds of worth, I would view as a good thing. But now there’s this other way to think about identity because if you’re listening to this, and you’re a moral anti-realist, there is just the arbitrary, evolutionary, and historical set of preferences that exist across all creatures on the planet.

Then the views on identity I think are also obviously again going to weigh into your moral considerations about how much to just respect different preferences, right. One might want to go beyond hedonic consequentialism here, and could just be a preference consequentialist. You could be a deontological ethicist or a virtue ethicist too. We could also consider how different views on identity as lived experiences would affect what it means to become virtuous, if being virtuous means moving beyond the self actually.

Andrés Gómez Emilsson: I think I understand what you’re getting at. I mean, really there’s kind of two components to ontology. One is what exists, and then the other one is what is valuable. You can arrive at something like open individualism just from the point of view of what exists, but still have disagreements with other open individualists about what is valuable. Alternatively, you could agree on what is valuable with somebody but completely disagree on what exists. To get the power of cooperation of open individualism as a Schelling point, there also needs to be some level of agreement on what is valuable, not just what exists.

It definitely sounds arrogant, but I do think that by the same principle by which you arrive at open individualism or empty individualism, basically nonstandard views of identities, you can also arrive at hedonistic utilitarianism, and that is, again, like the principle of really caring about knowing who or what you are fundamentally. To know yourself more deeply also entails understanding from second to second how your preferences impact your state of consciousness. It is my view that just as open individualism, you can think of it as the implication of taking a very systematic approach to make sense of identity. Likewise, philosophical hedonism is also an implication of taking a very systematic approach at trying to figure out what is valuable. How do we know that pleasure is good?

David Pearce: Yeah, does the pain-pleasure axis disclose the world’s intrinsic metric of (dis)value? There is something completely coercive about pleasure and pain. One can’t transcend the pleasure/pain axis. Compare the effects of taking heroin, or “enhanced interrogation”. There is no one with an inverted pleasure/pain axis. Supposed counter-examples, like sado-masochists, in fact just validate the primacy of the pleasure/pain axis.

What follows from the primacy of the pleasure/pain axis? Should we be aiming, as classical utilitarians urge, to maximize the positive abundance of subjective value in the universe, or at least our forward light-cone? But if we are classical utilitarians, there is a latently apocalyptic implication of classical utilitarianism – namely, that we ought to be aiming to launch something like a utilitronium (or hedonium) shockwave – where utilitronium or hedonium is matter and energy optimized for pure bliss.

So rather than any kind of notion of personal identity as we currently understand it, if one is a classical utilitarian – or if one is programming a computer or a robot with the utility function of classical utilitarianism –  should one therefore essentially be aiming to launch an apocalyptic utilitronium shockwave? Or alternatively, should one be trying to ensure that the abundance of positive value within our cosmological horizon is suboptimal by classical utilitarian criteria?

I don’t actually personally advocate a utilitronium shockwave. I don’t think it’s sociologically realistic. I think much more sociologically realistic is to aim for a world based on gradients of intelligent bliss -because that way, people’s existing values and preferences can (for the most part) be conserved. But nonetheless, if one is a classical utilitarian, it’s not clear one is allowed this kind of messy compromise.

Lucas Perry: All right, so now that we’re getting into the juicy, hedonistic imperative type stuff, let’s talk about how this is relevant to AI alignment and the creation of beneficial AI. I think that this is clear based off of the conversations we’ve had already about the ethical implications, and just how prevalent identity is in our world for the functioning of society and sociology, and just civilization in general.

Let’s limit the conversation for the moment just to AI alignment. And for this initial discussion of AI alignment, I just want to limit it to the definition of AI alignment as developing the technical process by which AIs can learn human preferences, and help further express and idealize humanity. So exploring how identity is important and meaningful for that process, two points I think that it’s relevant for, who are we making the AI for? Different views on identity I think would matter, because if we assume that sufficiently powerful and integrated AI systems are likely to have consciousness or to have qualia, they’re moral agents in themselves.

So who are we making the AI for? We’re making new patients or subjects of morality if we ground morality on consciousness. So from a purely egoistic point of view, the AI alignment process is just for humans. It’s just to get the AI to serve us. But if we care about all sentient beings impartially, and we just want to maximize conscious bliss in the world, and we don’t have this dualistic distinction of consciousness being self or other, we could make the AI alignment process something that is more purely altruistic. That we recognize that we’re creating something that is fundamentally more morally relevant than we are, given that it may have more profound capacities for experience or not.

David, I’m also holding in my hand, I know that you’re skeptical of the ability of AGI or superintelligence to be conscious. I agree that that’s not solved yet, but I’m just working here with the idea of, okay, maybe if they are. So I think it can change the altruism versus selfishness, the motivations around who we’re training the AIs for. And then the second part is why are we making the AI? Are we making it for ourselves or are we making it for the world?

If we take a view from nowhere, what Andrés called a god’s-eye-view, is this ultimately something that is for humanity or is it something ultimately for just making a better world? Personally, I feel that if the end goal is ultimate loving kindness and impartial ethical commitment to the wellbeing of all sentient creatures in all directions, then ideally the process is something that we’re doing for the world, and that we recognize the intrinsic moral worth of the AGI and superintelligence as ultimately more morally relevant descendants of ours. So I wonder if you guys have any reactions to this?

Andrés Gómez Emilsson: Yeah, yeah, definitely. So many. Tongue in cheek, but you’ve just made me chuckle when you said, “Why are we making the AI to begin with?” I think there’s a case to be made that the actual reason why we’re making AI is a kind of an impressive display of fitness in order to signal our intellectual fortitude and superiority. I mean sociologically speaking, you know, actually getting an AI to do something really well. It’s a way in which you can yourself signal your own intelligence, and I guess I worry to some extent that this is a bit of a tragedy of the commons, as it is the case with our weapon development. You’re so concerned with whether you can, and especially because of the social incentives, that you’re going to gain status and be looked at as somebody who’s really competent and smart, that you don’t really stop and wonder whether you should be building this thing in the first place.

Leaving that aside, just from a purely ethically motivated point of view, I do remember thinking and having a lot of discussions many years ago about if we can make a super computer experience what it is like for a human to be on MDMA. Then all of a sudden that supercomputer becomes a moral patient. It actually matters, you probably shouldn’t turn it off. Maybe in fact you should make more of them. A very important thing I’d like to say here is: I think it’s really important to distinguish the notion of intelligence.

On the one hand, as causal power over your environment, and on the other hand as the capacity for self insight, and introspection, and understanding reality. I would say that we tend to confuse these quite a bit. I mean especially in circles that don’t take consciousness very seriously. It’s usually implicitly assumed that having a superhuman ability to control your environment entails that you also have, in a sense, kind of a superhuman sense of self or a superhuman broad sense of intelligence. Whereas even if you are a functionalist, I mean even if you believe that a digital computer can be conscious, you can make a pretty strong case that even then, it is not something automatic. It’s not just that if you program the appropriate behavior, it will automatically also be conscious.

A super straightforward example here is that if you have the Chinese room, if it’s just a giant lookup table, clearly it is not a subject of experience, even though the input / output mapping might be very persuasive. There’s definitely still the problems there, and I think if we aim instead towards maximizing intelligence in the broad sense, that does entail also the ability to actually understand the nature and scope of other states of consciousness. And in that sense, I think a superintelligence of that sort would be intrinsically aligned with the intrinsic values of consciousness. But there are just so many ways of making partial superintelligences that maybe are superintelligent in many ways, but not in that one in particular, and I worry about that.

David Pearce: I sometimes sketch this simplistic trichotomy, three conceptions of superintelligence. One is a kind of “Intelligence Explosion” of recursively self-improving software-based AI. Then there is the Kurzweilian scenario – a complete fusion of humans and our machines. And then there is, very crudely, biological superintelligence, not just rewriting our genetic source code, but also (and Neuralink prefigures this) essentially “narrow” superintelligence-on-a-chip so that anything a classical digital computer can do, a biological human or a transhuman can do.

So yes, I see full-spectrum superintelligence as our biological descendants, super-sentient, able to navigate radically alien states of consciousness. So I think the question that you’re asking is why are we developing “narrow” AI – non-biological machine superintelligence.

Lucas Perry: Speaking specifically from the AI alignment perspective, how you align current day systems and future systems to superintelligence and beyond with human values and preferences, and so the question born of that, in the context of these questions of identity, is who are we making that AI for and why are we making the AI?

David Pearce: If you’ve got Buddha, “I teach one thing and one thing only, suffering and the end of suffering”… Buddha would press the OFF button, and I would press the OFF button.

Lucas Perry: What’s the off button?

David Pearce: Sorry, the notional initiation of a vacuum phase-transition (or something similar) that (instantaneously) obliterates Darwinian life. But when people talk about “AI alignment”, or most people working in the field at any rate, they are not talking about a Buddhist ethic [the end of suffering] – they have something else in mind. In practical terms, this is not a fruitful line of thought to pursue – you know, the implications of Buddhist, Benatarian, negative utilitarian, suffering-focused ethics.

Essentially that one wants to ratchet up hedonic range and hedonic set-points in such a way that you’re conserving people’s existing preferences – even though their existing preferences and values are, in many cases, in conflict with each other. Now, how one actually implements this in a classical digital computer, or a classically parallel connectionist system, or some kind of hybrid, I don’t know precisely.

Andrés Gómez Emilsson: At least one pretty famous cognitive scientist and AI theorist does propose the Buddhist ethic of pressing the off button of the universe: Thomas Metzinger, with his benevolent artificial anti-natalism. I mean, yeah. Actually that’s pretty interesting because he explores the idea of an AI that truly kind of extrapolates human values and what’s good for us as subjects of experience. The AI concludes what we are psychologically unable to, which is that the ethical choice is non-existence.

But yeah, I mean, I think that’s, as David pointed out, implausible. I think it’s much better to put our efforts in creating a super cooperator cluster that tries to recalibrate the hedonic set point so that we are animated by gradients of bliss. Sociological constraints are really, really important here. Otherwise you risk…

Lucas Perry: Being irrelevant.

Andrés Gómez Emilsson: … being irrelevant, yeah, is one thing. The other thing is unleashing an ineffective or failed attempt at sterilizing the world, which would be so much, much worse.

Lucas Perry: I don’t agree with this view, David. Generally, I think that Darwinian history has probably been net negative, but I’m extremely optimistic about how good the future can be. And so I think it’s an open question at the end of time, how much misery and suffering and positive experience there was. So I guess I would say I’m agnostic as to this question. But if we get AI alignment right, and these other things, then I think that it can be extremely good. And I just want to tether this back to identity and AI alignment.

Andrés Gómez Emilsson: I do have the strong intuition that if empty individualism is correct at an ontological level, then actually negative utilitarianism can be pretty strongly defended on the grounds that when you have a moment of intense suffering, that’s the entirety of that entity’s existence. And especially with eternalism, once it happens, there’s nothing you can do to prevent it.

There’s something that seems particularly awful about allowing inherently negative experiences to just exist. That said, I think open individualism actually may to some extent weaken that. Because even if the suffering was very intense, you can still imagine that if you identify with consciousness as a whole, you may be willing to undergo some bad suffering as a trade-off for something much, much better in the future.

It sounds completely insane if you’re currently experiencing a cluster headache or something astronomically painful. But maybe from the point of view of eternity, it actually makes sense. Those are still tiny specks of experience relative to the beings that are going to exist in the future. You can imagine Jupiter brains and Dyson spheres just in a constant ecstatic state. I think open individualism might counterbalance some of the negative utilitarian worries and would be something that an AI would have to contemplate and might push it one way or the other.

Lucas Perry: Let’s go ahead and expand the definition of AI alignment. A broader way we can look at the AI alignment problem, or the problem of generating beneficial AI, and making future AI stuff go well, is to understand it as the project of making sure that the technical, political, social, and moral consequences of AI, from the short term to superintelligence and beyond, are beneficial as we go through all of that.

Thinking about identity in that process, we were talking about how strong nationalism or strong identity or identification with regards to a nation state is a form of identity construction that people do. The nation or the country becomes part of self. One of the problems of the AI alignment problem is arms racing between countries, and so taking shortcuts on safety. I’m not trying to propose clear answers or solutions here. It’s unclear how successful an intervention here could even be. But these views on identity and how much nationalism shifts or not, I think feed into how difficult or not the problem will be.

Andrés Gómez Emilsson: The point of game theory becomes very, very important in that yes, you do want to help other people who are also trying to improve the well-being of all consciousness. On the other hand, if there’s a way to fake caring about the entirety of consciousness, that is a problem because then you would be using resources on people who would hoard them or even worse wrest the power away from you so that they can focus on their narrow sense of identity.

In that sense, I think having technologies in order to set particular phenomenal experiences of identity, as well as to be able to detect them, might be super important. But above all, and I mean this is definitely my area of research, having a way of objectively quantifying how good or bad a state of consciousness is based on the activity of a nervous system seems to me like an extraordinarily key component for any kind of a serious AI alignment.

If you’re actually trying to prevent bad scenarios in the future, you’ve got to have a principled way of knowing whether the outcome is bad, or at the very least knowing whether the outcome is terrible. The aligned AI should be able to grasp that a certain state of consciousness, even if nobody has experienced it before, will be really bad and it should be avoided, and that tends to be the lens through which I see this.

In terms of improving people’s internal self-consistency, as David pointed out, I think it’s kind of pointless to try to satisfy a lot of people’s preferences, such as having their favorite sports team win, because there’s really just no way of satisfying everybody’s preferences. In the realm of psychology is where a lot of these interventions would happen. You can’t expect an AI to be aligned with you, if you yourself are not aligned with yourself, right, if you have all of these strange, psychotic, competing sub-agents. It seems like part of the process is going to be developing techniques to become more consistent, so that we can actually be helped.

David Pearce: In terms of risks this century, nationalism has been responsible for most of the wars of the past two centuries, and nationalism is highly likely to lead to catastrophic war this century. And the underlying source of global catastrophic risk? I don’t think it’s AI. It’s male human primates doing what male human primates have been “designed” by evolution to do – to fight, to compete, to wage war. And even vegan pacifists like me, how do we spend our leisure time? Playing violent video games.

There are technical ways one can envisage mitigating the risk. Perhaps it’s unduly optimistic aiming for all-female governance or for a democratically-accountable world state under the auspices of the United Nations. But I think unless one does have somebody with a monopoly on the use of force, we are going to have cataclysmic nuclear war this century. It’s highly likely: I think we’re sleepwalking our way towards disaster. It’s more intellectually exciting discussing exotic risks from AI that goes FOOM, or something like that. But there are much more mundane catastrophes that are, I suspect, going to unfold this century.

Lucas Perry: All right, so getting into the other part here about AI alignment and beneficial AI throughout this next century, there’s a lot of different things that increased intelligence and capacity and power over the world is going to enable. There’s going to be human biological species divergence via AI-enabled bioengineering. There is this fundamental desire for immortality with many people, and the drive towards super intelligence and beyond for some people promises immortality. I think that in terms of closed individualism here, closed individualism is extremely motivating for this extreme self-concern of desire for immortality.

There are people currently today who are investing in say, cryonics, because they want to freeze themselves and make it long enough so that they can somehow become immortal, very clearly influenced by their ideas of identity. As Yuval Noah Harari was saying on our last podcast, it subverts many of the classic liberal myths that we have about the same intrinsic worth across all people; and then if you add humans 2.0 or 3.0 or 4.0 into the mixture, it’s going to subvert that even more. So there are important questions of identity there, I think.

With sufficiently advanced super intelligence people flirt with the idea of being uploaded. The identity questions here which are relevant are if we scan the information architecture or the neural architecture of your brain and upload it, will people feel like that is them? Is it not them? What does it mean to be you? Also, of course, in scenarios where people want to merge with the AI, what is it that you would want to be kept in the merging process? What is superfluous to you? What is nonessential to your identity or what it means to be you, that you would be okay or not with merging?

And then I think that most importantly here, I’m very interested in the Descendants scenario, where we just view AI as like our evolutionary descendants. There’s this tendency in humanity to not be okay with this descendant scenario. Because of closed individualist views on identity, they won’t see that consciousness is the same kind of thing, or they won’t see it as their own consciousness. They see that well-being through the lens of self and other, so that makes people less interested in there being descendant, super-intelligent conscious AIs. Maybe there’s also a bit of speciesism in there.

I wonder if you guys want to have any reactions to identity in any of these processes? Again, they are human, biological species divergence via AI-enabled bioengineering, immortality, uploads, merging, or the Descendants scenario.

David Pearce: In spite of thinking that Darwinian life is sentient malware, I think cryonics should be opt-out, and cryothanasia should be opt-in, as a way to defang death. So long as someone is suspended in optimal conditions, it ought to be possible for advanced intelligence to reanimate that person. And sure, if one is an “empty” individualist, or if you’re the kind of person who wakes up in the morning troubled that you’re not the person who went to sleep last night, this reanimated person may not really be “you”. But if you’re more normal, yes, I think it should be possible to reanimate “you” if you are suspended.

In terms of mind uploads, this is back to the binding problem. Even assuming that you can be scanned with a moderate degree of fidelity, I don’t think your notional digital counterpart is a subject of experience. Even if I am completely wrong here and that somehow subjects of experience inexplicably emerge in classical digital computers, there’s no guarantee that the qualia would be the same. After all, you can replay a game of chess with perfect fidelity, but there’s no guarantee incidentals like the textures of the pieces will be the same. Why expect the textures of qualia to be the same? But that isn’t really my objection. It’s the fact that a digital computer cannot support phenomenally-bound subjects of experience.

Andrés Gómez Emilsson: I also think cryonics is really good. Even though with a different nonstandard view of personal identity, it’s kind of puzzling. Why would you care about it? Lots of practical considerations. I like what David said of like defanging death. I think that’s a good idea, but also giving people skin in the game for the future.

People who enact policy and become politically successful, often tend to be 50 years plus, and there’s a lot of things that they weigh on, that they will not actually get to experience, that probably biases politicians and people who are enacting policy to focus, especially just on short-term gains as opposed to really genuinely trying to improve the long-term; and I think cryonics would be helpful in giving people skin in the game.

More broadly speaking, it does seem to be the case that what aspect of transhumanism a person is likely to focus on depends a lot on their theories of identity. I mean, if we break down transhumanism into the three supers of super happiness, super longevity, and super intelligence, the longevity branch is pretty large. There’s a lot of people looking for ways of rejuvenating, preventing aging, and reviving ourselves, or even uploading ourselves.

Then there’s people who are very interested in super intelligence. I think that’s probably the most popular type of transhumanism nowadays. That one I think does rely to some extent on people having a functionalist information theoretic account of their own identity. There’s all of these tropes of, “Hey, if you leave a large enough digital footprint online, a super intelligence will be able to reverse engineer your brain just from that, and maybe reanimate you in the future,” or something of that nature.

And then there’s, yeah, people like David and I, and the Qualia Research Institute as well, that care primarily about super happiness. We think of it as kind of a requirement for a future that is actually worth living. You can have all the longevity and all the intelligence you want, but if you’re not happy, I don’t really see the point. A lot of the concerns with longevity, fear of death and so on, in retrospect, I think will be probably considered some kind of a neurosis. Obviously a genetically adaptive neurosis, but something that can be cured with mood-enhancing technologies.

Lucas Perry: Leveraging human selfishness or leveraging how most people are closed individualists seems like the way to having good AI alignment. To one extent, I find the immortality pursuits through cryonics to be pretty elitist. But I think it’s a really good point that giving the policymakers and the older generation and people in power more skin in the game over the future is both potentially very good and also very scary.

It’s very scary to the extent to which they could get absolute power, but also very good if you’re able to mitigate risks of them developing absolute power. But again, as you said, it motivates them towards more deeply and profoundly considering future considerations, being less myopic, being less selfish. So that getting the AI alignment process right and doing the necessary technical work, it’s not done for a short-term nationalistic gain. Again, with an asterisk here that the risk is unilaterally getting more and more power.

Andrés Gómez Emilsson: Yeah, yeah, yeah. Also, without cryonics, another way to increase skin in the game, may be more straight-forwardly positive. Bliss technologies do that. A lot of people who are depressed or nihilistic or vengeful or misanthropic, they don’t really care about destroying the world or watching it burn, so to speak, because they don’t have anything to lose. But if you have a really reliable MDMA-like technological device that reliably produces wonderful states of consciousness, I think people will be much more careful about preserving their own health, and also not watch the world burn, because they know “I could be back home and actually experiencing this rather than just trying to satisfy my misanthropic desires.”

David Pearce: Yeah, the happiest people I know work in the field of existential risk. Rather than great happiness just making people reckless, it can also make them more inclined to conserve and protect.

Lucas Perry: Awesome. I guess just one more thing that I wanted to hit on these different ways that technology is going to change society is… I don’t know. In my heart, the ideal is the vow to liberate all sentient beings in all directions from suffering. The closed individualist view seems generally fairly antithetical to that, but there’s also this desire for me to be realistic about leveraging that human selfishness towards that ethic. The capacity here for conversations on identity going forward, if we can at least give people more information to subvert or challenge or give them information about why the common sense closed individualist view might be wrong, I think it would just have a ton of implications for how people end up viewing human species divergence, or immortality, or uploads, or merging, or the Descendants scenario.

In Max’s book, Life 3.0, he describes a bunch of different scenarios for how you want the world to be as the impact of AI grows, if we’re lucky enough to reach superintelligent AI. These scenarios that he gives are, for example, an Egalitarian Utopia where humans, cyborgs and uploads coexist peacefully thanks to property abolition and guaranteed income. There’s a Libertarian Utopia where humans, cyborgs, uploads, and superintelligences coexist peacefully thanks to property rights. There is a Protector God scenario where essentially omniscient and omnipotent AI maximizes human happiness by intervening only in ways that preserve our feeling of control of our own destiny, and hides well enough that many humans even doubt the AI’s existence. There’s Enslaved God, which is kind of self-evident. The AI is a slave to our will. The Descendants Scenario, which I described earlier, where AIs replace human beings, but give us a graceful exit, making us view them as our worthy descendants, much as parents feel happy and proud to have a child who’s smarter than them, who learns from them, and then accomplishes what they could only dream of, even if they can’t live to see it.

After the book was released, Max did a survey of which ideal societies people were most excited about. And basically most people wanted either the Egalitarian Utopia or the Libertarian Utopia. These are very human centric of course, because I think most people are closed individualists, so okay, they’re going to pick that. And then they wanted the Protector God next, and then the fourth most popular was an Enslaved God. The fifth most popular was Descendants.

I’m a very big fan of the Descendants scenario. Maybe it’s because of my empty individualism. I just feel here that as views on identity are quite uninformed for most people or most people don’t take it, or closed individualism just seems intuitively true from the beginning because it seems like it’s been selected for mostly by Darwinian evolution to have a very strong sense of self. I just think that challenging conventional views on identity will very much shift the kinds of worlds that people are okay with or the kinds of worlds that people want.

If we had a big, massive public education campaign about the philosophy of identity and then took the same survey later, I think that the numbers would be very different. That seems to be a necessary part of the education of humanity in the process of beneficial AI and AI alignment. To me, the Descendants scenario just seems best because it’s more clearly in line with this ethic of being impartially devoted to maximizing the well-being of sentience everywhere.

I’m curious to know your guys’ reaction to these different scenarios about how you feel views on identity as they shift will inform the kinds of worlds that humanity finds beautiful or meaningful or worthy of pursuit through and with AI.

David Pearce: If today’s hedonic range is -10 to zero to +10, yes, whether building a civilization with a hedonic range of +70 to +100, i.e. with more hedonic contrast, or +90 to +100 with less hedonic contrast, the multiple phase-changes in consciousness involved are completely inconceivable to humans. But in terms of full-spectrum superintelligence, what we don’t know is the nature of their radically alien-state-spaces of consciousness – far more different than, let’s say, dreaming consciousness and waking consciousness – that I suspect that intelligence is going to explore. And we just do not have the language, the concepts, to conceptualize what these alien state-spaces of consciousness are like. I suspect billions of years of consciousness-exploration lie ahead. I assume that a central element will be the pleasure-axis – that these states will be generically wonderful – but they will otherwise be completely alien. And so talk of “identity” with primitive Darwinian malware like us is quite fanciful.

Andrés Gómez Emilsson: Consider the following thought experiment where you have a chimpanzee right next to a person, who is right next to another person, where the third one is currently on a high dose of DMT, combined with ketamine and salvia. If you consider those three entities, I think very likely, actually the experience of the chimpanzee and the experience of the sober person are very much alike, compared to the person who is on DMT, ketamine, and salvia, who is in a completely different alien-state space of consciousness. And in some sense, biologically you’re unrelatable from the point of view of qualia and the sense of self, and time, and space, and all of those things.

Personally, I think having intimations with alien-state spaces of consciousness is actually good also quite apart from changes in a feeling that you’ve become one with the universe. Merely having experience with really different states of consciousness makes it easier for you to identify with consciousness as a whole: you realize, okay, my DMT self, so to speak, cannot exist naturally, and it’s just so much different to who I am normally, and even more different than perhaps being a chimpanzee, that you could imagine caring as well about alien-state spaces of consciousness that are completely nonhuman, and I think that it can be pretty helpful.

The other reason why I give a lot of credence to open individualism being a winning strategy, even just from a purely political and sociological point of view, is that open individualists are not afraid of changing their own state of consciousness, because they realize that it will be them either way. Whereas closed individualists can actually be pretty scared of, for example, taking DMT or something like that. They tend to have at least the suspicion that, oh my gosh, is the person who is going to be on DMT me? Am I going to be there? Or maybe I’m just being possessed by a different entity with completely different values and consciousness.

With open individualism, no matter what type of consciousness your brain generates, it’s going to be you. It massively amplifies the degrees of freedom for coordination. Plus, you’re not afraid of tuning your consciousness for particular new computational uses. Again, this could be extremely powerful as a cooperation and coordination tool.

To summarize, I think a plausible and very nice future scenario is going to be the mixture of open individualism, on the one hand; second, generically enhanced hedonic tone, so that everything is amazing; and third, expanded range of possible experiences. That we will have the tools to experience pretty much arbitrary state spaces of consciousness and consider them our own.

With the Descendants scenario, I think it’s much easier to imagine thinking of the new entities as your offspring if you can at least know what they feel like. You can take a drug or something and know, “okay, this is what it’s like to be a post-human android. I like it. This is wonderful. It’s better than being a human.” That would make it possible.

Lucas Perry: Wonderful. This last question is just the role of identity in the AI itself, or the superintelligence itself, as it experiences the world, the ethical implications of those identity models, et cetera. There is the question of identity now, and if we get aligned superintelligence and post-human superintelligence, and we have Jupiter rings or Dyson spheres or whatever, there’s the question of identity evolving in that system. We are very much creating Life 3.0, and there is a substantive question of what kind of identity views it will take, and what its phenomenal experience of self, or lack of one, will be. This all is relevant and important because if we’re concerned with maximizing conscious well-being, then these are flavors of consciousness which would require a sufficiently rigorous science of consciousness to understand their valence properties.

Andrés Gómez Emilsson: I mean, I think it’s a really, really good thing to think about. The overall frame I tend to use to analyze these kinds of questions comes from an article I wrote, which you can find on Qualia Computing, called “Consciousness Versus Replicators.” I think that is a pretty good overarching ethical framework, where I describe how different kinds of ethics can give different worldviews, but also how they depend on your philosophical sophistication.

At the very beginning, you have ethics such as the battle between good and evil, but then you start introspecting. You’re like, “okay, what is evil exactly,” and you realize that nobody sets out to do evil from the very beginning. Usually, they actually have motivations that make sense within their own experience. Then you shift towards this other theory that’s called the balance between good and evil, super common in Eastern religions. Also, people who take a lot of psychedelics or meditate a lot tend to arrive at that view, as in like, “oh, don’t be too concerned about suffering or the universe. It’s all a huge yin and yang. The evil part makes the good part better,” or like weird things like that.

Then you have something a little bit more developed, what I call gradients of wisdom. I would say like Sam Harris, and definitely a lot of people in our community think that way, which is they come to the realization that there are societies that don’t help human flourishing, and there are ideologies that do, and it’s really important to be discerning. We can’t just say, “Hey, everything is equally good.”

But finally, I would say the fourth level would be consciousness versus replicators, which involves, one, taking open individualism seriously; and second, realizing that anything that matters, it matters because it influences experiences. Can you have that as your underlying ethical principle? There’s this danger of replicators hijacking our motivational architecture in order to pursue their own replication, independent of the well-being of sentience, and you guard for that. I think you’re in a pretty good space to actually do a lot of good. I would say perhaps that is the sort of ethics or morality we should think about how to instantiate in an artificial intelligence.

In the extreme, you have what I call a pure replicator, and a pure replicator essentially is a system or an entity that uses all of its resources exclusively to make copies of itself, independently of whether that causes good or bad experiences elsewhere. It just doesn’t care. I would argue that humans are not pure replicators. That in fact, we do care about consciousness, at the very least our own consciousness. And evolution is recruiting the fact that we care about consciousness in order to, as a side effect, increase the inclusive fitness of our genes.

But these discussions we’re having right now, about the possibility of a post-human ethic, are the genie getting out of the bottle, in the sense that consciousness is kind of taking its own values and trying to transcend the selfish genetic process that gave rise to it.

Lucas Perry: Ooh, I like that. That’s good. Anything to add, David?

David Pearce: No. Simply, I hope we have a Buddhist AI.

Lucas Perry: I agree. All right, so I’ve really enjoyed this conversation. I feel more confused now than when I came in, which is very good. Yeah, thank you both so much for coming on.

End of recorded material

FLI Podcast: On Consciousness, Morality, Effective Altruism & Myth with Yuval Noah Harari & Max Tegmark

Neither Yuval Noah Harari nor Max Tegmark need much in the way of introduction. Both are avant-garde thinkers at the forefront of 21st century discourse around science, technology, society and humanity’s future. This conversation represents a rare opportunity for two intellectual leaders to apply their combined expertise — in physics, artificial intelligence, history, philosophy and anthropology — to some of the most profound issues of our time. Max and Yuval bring their own macroscopic perspectives to this discussion of both cosmological and human history, exploring questions of consciousness, ethics, effective altruism, artificial intelligence, human extinction, emerging technologies and the role of myths and stories in fostering societal collaboration and meaning. We hope that you’ll join the Future of Life Institute Podcast for our final conversation of 2019, as we look toward the future and the possibilities it holds for all of us.

Topics discussed include:

  • Max and Yuval’s views and intuitions about consciousness
  • How they ground and think about morality
  • Effective altruism and its cause areas of global health/poverty, animal suffering, and existential risk
  • The function of myths and stories in human society
  • How emerging science, technology, and global paradigms challenge the foundations of many of our stories
  • Technological risks of the 21st century

Timestamps:

0:00 Intro

3:14 Grounding morality and the need for a science of consciousness

11:45 The effective altruism community and its main cause areas

13:05 Global health

14:44 Animal suffering and factory farming

17:38 Existential risk and the ethics of the long-term future

23:07 Nuclear war as a neglected global risk

24:45 On the risks of near-term AI and of artificial general intelligence and superintelligence

28:37 On creating new stories for the challenges of the 21st century

32:33 The risks of big data and AI enabled human hacking and monitoring

47:40 What does it mean to be human and what should we want to want?

52:29 On positive global visions for the future

59:29 Goodbyes and appreciations

01:00:20 Outro and supporting the Future of Life Institute Podcast

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today, I’m excited to be bringing you a conversation between professor, philosopher, and historian Yuval Noah Harari and MIT physicist and AI researcher, as well as Future of Life Institute president, Max Tegmark. Yuval is the author of the popular science best sellers Sapiens: A Brief History of Humankind, Homo Deus: A Brief History of Tomorrow, and 21 Lessons for the 21st Century. Max is the author of Our Mathematical Universe and Life 3.0: Being Human in the Age of Artificial Intelligence.

This episode covers a variety of topics related to the interests and work of both Max and Yuval. It requires some background knowledge for everything to make sense, and so I’ll try to provide some necessary information here in the intro for listeners unfamiliar with the area of Max’s work in particular. If you already feel well acquainted with Max’s work, feel free to skip ahead a minute or use the timestamps in the description for the podcast.

Topics discussed in this episode include: morality, consciousness, the effective altruism community, animal suffering, existential risk, the function of myths and stories in our world, and the benefits and risks of emerging technology. For those new to the podcast or effective altruism, effective altruism or EA for short is a philosophical and social movement that uses evidence and reasoning to determine the most effective ways of benefiting and improving the lives of others. And existential risk is any risk that has the potential to eliminate all of humanity or, at the very least, to kill large swaths of the global population and leave the survivors unable to rebuild society to current living standards. Advanced emerging technologies are the most likely source of existential risk in the 21st century, for example through unfortunate uses of synthetic biology, nuclear weapons, and powerful future artificial intelligence misaligned with human values and objectives.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate

These contributions make it possible for us to bring you conversations like these and to develop the podcast further. You can also follow us on your preferred listening platform by searching for us directly or following the links on the page for this podcast found in the description. 

And with that, here is our conversation between Max Tegmark and Yuval Noah Harari.

Max Tegmark: Maybe to start at a place where I think you and I both agree, even though it’s controversial, I get the sense from reading your books that you feel that morality has to be grounded on experience, subjective experience. It’s just what I like to call consciousness. I love this argument you’ve given, for example, to people who think consciousness is just bullshit and irrelevant: you challenge them to tell you what’s wrong with torture if it’s just a bunch of electrons and quarks moving around this way rather than that way.

Yuval Noah Harari: Yeah. I think that there is no morality without consciousness and without subjective experiences. At least for me, this is very, very obvious. One of my concerns, again, if I think about the potential rise of AI, is that AI will be superintelligent but completely non-conscious, which is something that we never had to deal with before. There’s so much of the philosophical and theological discussions of what happens when there is a greater intelligence in the world. We’ve been discussing this for thousands of years with God of course as the object of discussion, but the assumption always was that this greater intelligence would be A) conscious in some sense, and B) good, infinitely good.

And therefore I think that the question we are facing today is completely different, and to a large extent I suspect that we are really facing philosophical bankruptcy: what we’ve done for thousands of years didn’t really prepare us for the kind of challenge that we have now.

Max Tegmark: I certainly agree that we have a very urgent challenge there. I think there is an additional risk which comes from the fact that, I’m embarrassed as a scientist that we actually don’t know for sure which kinds of information processing are conscious and which are not. For many, many years, I’ve been told for example that it’s okay to put lobsters in hot water to boil them alive before we eat them because they don’t feel any suffering. And then I guess some guy asked the lobster, “Does this hurt?” And it didn’t say anything, and it was a self-serving argument. But then there was a recent study out that showed that actually lobsters do feel pain, and they banned lobster boiling in Switzerland now.

I’m very nervous whenever we humans make these very self serving arguments saying, don’t worry about the slaves. It’s okay. They don’t feel, they don’t have a soul, they won’t suffer or women don’t have a soul or animals can’t suffer. I’m very nervous that we’re going to make the same mistake with machines just because it’s so convenient. When I feel the honest truth is, yeah, maybe future superintelligent machines won’t have any experience, but maybe they will. And I think we really have a moral imperative there to do the science to answer that question because otherwise we might be creating enormous amounts of suffering that we don’t even know exists.

Yuval Noah Harari: For this reason and for several other reasons, I think we need to invest as much time and energy in researching consciousness as we do in researching and developing intelligence. If we develop sophisticated artificial intelligence before we really understand consciousness, there are a lot of really big ethical problems that we just don’t know how to solve. One of them is the potential existence of some kind of consciousness in these AI systems, but there are many, many others.

Max Tegmark: I’m so glad to hear you say this actually because I think we really need to distinguish between artificial intelligence and artificial consciousness. Some people just take for granted that they’re the same thing.

Yuval Noah Harari: Yeah, I’m really amazed by it. I’ve been having quite a lot of discussions about these issues in the last two or three years, and I’m repeatedly amazed that a lot of brilliant people just don’t understand the difference between intelligence and consciousness. It comes up in discussions about animals, but it also comes up in discussions about computers and about AI. To some extent the confusion is understandable because in humans and other mammals and other animals, consciousness and intelligence, they really go together, but we can’t assume that this is the law of nature and that it’s always like that. In a very, very simple way, I would say that intelligence is the ability to solve problems. Consciousness is the ability to feel things like pain and pleasure and love and hate.

Now in humans and chimpanzees and dogs and maybe even lobsters, we solve problems by having feelings. For a lot of the problems we solve, who to mate with and where to invest our money and who to vote for in the elections, we rely on our feelings to make these decisions, but computers make decisions in a completely different way. At least today, very few people would argue that computers are conscious, and still they can solve certain types of problems much, much better than we can.

They have high intelligence in a particular field without having any consciousness and maybe they will eventually reach superintelligence without ever developing consciousness. And we don’t know enough about these ideas of consciousness and superintelligence, but it’s at least feasible that you can solve all problems better than human beings and still have zero consciousness. You just do it in a different way. Just like airplanes fly much faster than birds without ever developing feathers.

Max Tegmark: Right. That’s definitely one of the reasons why people are so confused. There are two other reasons I’ve noticed, even among very smart people, why they are utterly confused on this. One is that there are so many different definitions of consciousness. Some people define consciousness in a way that’s almost equivalent to intelligence, but you defined it as the ability to feel things, simply having subjective experience. The other is that I think a lot of people get confused because they have always thought of subjective experience, and intelligence for that matter, as something mysterious that can only exist in biological organisms like us. Whereas what I think we’re really learning from the whole last century of progress in science is that no, intelligence and consciousness are all about information processing.

People fall prey to this carbon chauvinism idea that it’s only carbon or meat that can have these traits. Whereas in fact it really doesn’t matter whether the information is processed by a carbon atom in a neuron in the brain or by a silicon atom in a computer.

Yuval Noah Harari: I’m not sure I completely agree. I mean, we still don’t have enough data on that. There doesn’t seem to be any reason that we know of that consciousness would be limited to carbon based life forms, but so far this is the case. So maybe we don’t know something. My hunch is that it could be possible to have non-organic consciousness, but until we have better evidence, there is an open possibility that maybe there is something about organic biochemistry, which is essential and we just don’t understand.

And there is also the other open question: we are not really sure that consciousness is just about information processing. I mean, at present, this is the dominant view in the life sciences, but we don’t really know because we don’t understand consciousness. My personal hunch is that nonorganic consciousness is possible, but I wouldn’t say that we know that for certain. And the other point is that really if you think about it in the broadest sense possible, I think that there is an entire potential universe of different conscious states and we know just a tiny, tiny bit of it.

Max Tegmark: Yeah.

Yuval Noah Harari: Again, thinking a little about different life forms, so human beings are just one type of life form and there are millions of other life forms that existed and billions of potential life forms that never existed but might exist in the future. And it’s a bit like that with consciousness that we really know just human consciousness, we don’t understand even the consciousness of other animals and beyond that potentially there is an infinite number of conscious states or traits that never existed and might exist in the future.

Max Tegmark: I agree with all of that. And I think if you can have nonorganic consciousness, artificial consciousness, which would be my guess, although we don’t know it, I think it’s quite clear then that the mind space of possible artificial consciousness is vastly larger than anything that evolution has given us, so we have to have a very open mind.

If we simply take away from this that we should understand which entities biological and otherwise are conscious and can experience suffering, pleasure and so on, and we try to base our morality on this idea that we want to create more positive experiences and eliminate suffering, then this leads straight into what I find very much at the core of the so called effective altruism community, which we with the Future of Life Institute view ourselves as part of where the idea is we want to help do what we can to make a future that’s good in that sense. Lots of positive experiences, not negative ones and we want to do it effectively.

We want to put our limited time and money and so on into those efforts which will make the biggest difference. And the EA community has for a number of years been highlighting a top three list of issues that they feel are the ones that are most worth putting effort into in this sense. One of them is global health, which is very, very non-controversial. Another one is animal suffering and reducing it. And the third one is preventing life from going extinct by doing something stupid with technology.

I’m very curious whether you feel that the EA movement has basically picked out the correct three things to focus on or whether you have things you would subtract from that list or add to it. Global health, animal suffering, X-risk.

Yuval Noah Harari: Well, I think that nobody can do everything, so whether you’re an individual or an organization, it’s a good idea to pick a good cause and then focus on it and not spend too much time wondering about all the other things that you might do. I mean, these three causes are certainly some of the most important in the world. I would just say that about the first one. It’s not easy at all to determine what are the goals. I mean, as long as health means simply fighting illnesses and sicknesses and bringing people up to what is considered as a normal level of health, then that’s not very problematic.

But in the coming decades, I think that the healthcare industry will focus more and more not on fixing problems but rather on enhancing abilities, enhancing experiences, enhancing bodies and brains and minds and so forth. And that’s much, much more complicated, both because of the potential issues of inequality and simply because we don’t know where to aim. One of the reasons that, when you asked me at first about morality, I focused on suffering and not on happiness is that suffering is a much clearer concept than happiness. And that’s why, when you talk about health care, if you think about this image of the line of normal health, like the baseline of what’s a healthy human being, it’s much easier to deal with things falling under this line than things that potentially are above this line. So I think even this first issue will become extremely complicated in the coming decades.

Max Tegmark: And then for the second issue on animal suffering, you’ve used some pretty strong words before. You’ve said that industrial farming is one of the worst crimes in history and you’ve called the fate of industrially farmed animals one of the most pressing ethical questions of our time. A lot of people would be quite shocked when they hear you using strong words about this since they routinely eat factory farmed meat. How do you explain to them?

Yuval Noah Harari: This is quite straightforward. I mean, we are talking about billions upon billions of animals. The majority of large animals today in the world are either humans or are domesticated animals, cows and pigs and chickens and so forth. And so we’re talking about a lot of animals and we are talking about a lot of pain and misery. The industrially farmed cow and chicken are probably competing for the title of the most miserable creature that ever existed. They are capable of experiencing a wide range of sensations and emotions and in most of these industrial facilities they are experiencing the worst possible sensations and emotions.

Max Tegmark: In my case, you’re preaching to the choir here. I find this so disgusting that my wife and I just decided to mostly be vegan. I don’t go preach to other people about what they should do, but I just don’t want to be a part of this. It reminds me so much also of things you’ve written yourself, about how people used to justify having slaves before by saying, “It’s the white man’s burden. We’re helping the slaves. It’s good for them.” And in much the same way now, we make these very self-serving arguments for why we should be doing this. What do you personally take away from this? Do you eat meat now, for example?

Yuval Noah Harari: Personally I define myself as vegan-ish. I mean I’m not strictly vegan. I don’t want to make a kind of religion out of it and start thinking in terms of purity and whatever. I try to limit as far as possible my involvement with industries that harm animals for no good reason, and it’s not just meat and dairy and eggs, it can be other things as well. The chains of causality in the world today are so complicated that you cannot really extricate yourself completely. It’s just impossible. So for me, and also what I tell other people, is just do your best. Again, don’t make it into a kind of religious issue. If somebody comes and tells you, “I’m now thinking about this animal suffering and I decided to have one day a week without meat,” then don’t start blaming this person for eating meat the other six days. Just congratulate them on making one step in the right direction.

Max Tegmark: Yeah, that sounds not just like good morality but also good psychology if you actually want to nudge things in the right direction. And then coming to the third one, existential risk. There, I love how Nick Bostrom asks us to compare these two scenarios, one in which some calamity kills 99% of all people and another where it kills 100% of all people, and then he asks how much worse is the second one. The point being, obviously, that if we kill everybody we might actually forfeit having billions or quadrillions or more of future minds experiencing these amazing things for billions of years. This is not something I’ve seen you talk as much about in your writing. So I’m very curious how you think about this morally? How you weigh future experiences that could exist versus the ones that we know exist now?

Yuval Noah Harari: I don’t really know. I don’t think that we understand consciousness and experience well enough to even start making such calculations. In general, my suspicion, at least based on our current knowledge, is that it’s simply not a mathematical entity that can be calculated. So we know all these philosophical riddles that people sometimes enjoy so much debating, about whether you have five people of this kind and a hundred people of that kind and who should you save and so forth and so on. It’s all based on the assumption that experience is a mathematical entity that can be added and subtracted. And my suspicion is that it’s just not like that.

To some extent, yes, we make these kinds of comparison and calculations all the time, but on a deeper level, I think it’s taking us in the wrong direction. At least at our present level of knowledge, it’s not like eating ice cream is one point of happiness. Killing somebody is a million points of misery. So if by killing somebody we can allow 1,000,001 persons to enjoy ice cream, it’s worth it.

I think the problem here is not that we’ve given the wrong points to the different experiences, it’s just that it’s not a mathematical entity in the first place. And again, I know that in some cases we have to do these kinds of calculations, but I would be extremely careful about it and I would definitely not use it as the basis for building entire moral and philosophical projects.

Max Tegmark: I certainly agree with you that it’s an extremely difficult set of questions you get into if you try to trade off positives against negatives, like you mentioned in the ice cream versus murder case there. But I still feel that all in all, as a species, we tend to be a little bit too sloppy and flippant about the future, maybe partly because we haven’t evolved to think so much about what happens in billions of years anyway. Look at how reckless we’ve been with nuclear weapons, for example. I recently was involved with our organization giving this award to honor Vasily Arkhipov, who quite likely prevented nuclear war between the US and the Soviet Union, and most people hadn’t even heard about that for 40 years. More people have heard of Justin Bieber than Vasily Arkhipov, even though I would argue that nuclear war would really unambiguously have been a really, really bad thing and that we should celebrate people who do courageous acts that prevent nuclear war, for instance.

In the same spirit, I often feel concerned that there’s so little attention paid even to risks that we drive ourselves extinct or cause giant catastrophes, compared to how much attention we pay to the Kardashians or whether we can get 1% less unemployment next year. So I’m curious if you have some sympathy for my angst here or whether you think I’m overreacting.

Yuval Noah Harari: I completely agree. I often define it that we are now kind of irresponsible gods. Certainly with regard to the other animals and the ecological system and with regard to ourselves, we have really divine powers of creation and destruction, but we don’t take our job seriously enough. We tend to be very irresponsible in our thinking, and in our behavior. On the other hand, part of the problem is that the number of potential apocalypses is growing exponentially over the last 50 years. And as a scholar and as a communicator, I think it’s part of our job to be extremely careful in the way that we discuss these issues with the general public. And it’s very important to focus the discussion on the more likely scenarios because if we just go on bombarding people with all kinds of potential scenarios of complete destruction, very soon we just lose people’s attention.

They become extremely pessimistic that everything is hopeless. So why worry about all that? So I think part of the job of the scientific community and people who deal with these kinds of issues is to really identify the most likely scenarios and focus the discussion on that. Even if there are some other scenarios which have a small chance of occurring and completely destroying all of humanity and maybe all of life, but we just can’t deal with everything at the same time.

Max Tegmark: I completely agree with that, with one caveat. I think it’s very much in the spirit of effective altruism, what you said. We want to focus on the things that really matter the most and not turn everybody into paranoid hypochondriacs, getting worried about everything. The one caveat I would give is, we shouldn’t just look at the probability of each bad thing happening but we should look at the expected damage it will do, so the probability times how bad it is.

Yuval Noah Harari: I agree.

Max Tegmark: Because nuclear war for example, maybe the chance of having an accidental nuclear war between the US and Russia is only 1% per year or 10% per year or one in a thousand per year. But if you have the nuclear winter caused by that, by soot and smoke in the atmosphere, you know, blocking out the sun for years, that could easily kill 7 billion people, so most people on Earth, through mass starvation, because it would be about 20 degrees Celsius colder. That means that if it’s a 1% chance per year, which seems small, you’re still killing on average 70 million people per year. That’s the number that sort of matters I think. That means we should make it a higher priority to reduce that more.

Yuval Noah Harari: With nuclear war, I would say that we are not concerned enough. I mean, too many people, including politicians have this weird impression that well, “Nuclear war, that’s history. No, that was in the 60s and 70s people worried about it.”

Max Tegmark: Exactly.

Yuval Noah Harari: “It’s not a 21st century issue.” This is ridiculous. I mean we are now in even greater danger, at least in terms of the technology, than we were in the Cuban missile crisis. But you must remember this in Stanley Kubrick’s Dr. Strangelove-

Max Tegmark: One of my favorite films of all time.

Yuval Noah Harari: Yeah. And so the subtitle of the film is “How I Stopped Fearing and Learned to Love the Bomb.”

Max Tegmark: Exactly.

Yuval Noah Harari: And the funny thing is it actually happened. People stopped fearing it. Maybe they don’t love it very much, but compared to the 50s and 60s people just don’t talk about it. Like you look at the Brexit debate in Britain, and Britain is one of the leading nuclear powers in the world, and it’s not even mentioned. It’s not part of the discussion anymore. And that’s very problematic because I think that this is a very serious existential threat. But I’ll take a counter example, which is in the field of AI, even though I understand the philosophical importance of discussing the possibility of general AI emerging in the future and then rapidly taking over the world and you know all the paperclip scenarios and so forth.

I think that at the present moment it really distracts people’s attention from the immediate dangers of the AI arms race, which has a far, far higher chance of materializing in the next, say, 10, 20, 30 years. And we need to focus people’s minds on these short term dangers. And I know that there is a small chance that general AI would be upon us say in the next 30 years. But I think it’s a very, very small chance, whereas the chance that kind of primitive AI will completely disrupt the economy, the political system and human life in the next 30 years is about 100%. It’s bound to happen.

Max Tegmark: Yeah.

Yuval Noah Harari: And I worry far more about what primitive AI will do to the job market, to the military, to people’s daily lives than about a general AI appearing in the more distant future.

Max Tegmark: Yeah, there are a few reactions to this. We can talk more about artificial general intelligence and superintelligence later if we get time. But there was a recent survey of AI researchers around the world asking what they thought, and I was interested to note that actually most of them guessed that we will get artificial general intelligence within decades. So I wouldn’t say that the chance is small, but I would agree with you that it is certainly not going to happen tomorrow.

But if we eat our vitamins, you and I, and meditate, go to the gym, it’s quite likely we will actually get to experience it. But more importantly, coming back to what you said earlier, I see all of these risks as really being one and the same risk in the sense that what’s happened is of course that science has kept getting ever more powerful. And science definitely gives us ever more powerful technology. And I love technology. I’m a nerd. I work at a university that has technology in its name and I’m optimistic we can create an inspiring high tech future for life if we win what I like to call the wisdom race.

The race between the growing power of the technology and the growing wisdom with which we manage it or putting it in your words, that you just used there, if we can basically learn to take more seriously our job as stewards of this planet, you can look at every science and see exactly the same thing happening. So we physicists are kind of proud that we gave the world cell phones and computers and lasers, but our problem child has been nuclear energy obviously, nuclear weapons in particular. Chemists are proud that they gave the world all these great new materials and their problem child is climate change. Biologists in my book actually have done the best so far, they actually got together in the 70s and persuaded leaders to ban biological weapons and draw a clear red line more broadly between what was acceptable and unacceptable uses of biology.

And that’s why today most people think of biology as really a force for good, something that cures people or helps them live healthier lives. And I think AI is right now lagging a little bit in time. It’s finally getting to the point where they’re starting to have an impact and they’re grappling with the same kind of question. They haven’t had big disasters yet, so they’re in the biology camp there, but they’re trying to figure out where do they draw the line between acceptable and unacceptable uses so you don’t get a crazy military AI arms race in lethal autonomous weapons, so you don’t create very destabilizing income inequality so that AI doesn’t create 1984 on steroids, et cetera.

And I wanted to ask you about what sort of new story as a society you feel we need in order to tackle these challenges. And I’ve been very, very persuaded by your arguments that stories are so central to society for us to collaborate and accomplish stuff, but you’ve also made a really compelling case, I think, that the most popular recent stories are all getting less powerful or popular. Communism, now there’s a lot of disappointment, and in liberalism, and it feels like a lot of people are kind of craving a new story that involves technology somehow and that can help us get our act together and also help us feel meaning and purpose in this world. But I’ve never in your books seen a clear answer to what you feel that this new story should be.

Yuval Noah Harari: Because I don’t know. If I knew the new story, I would tell it. I think we are now in a kind of double bind, we have to fight on two different fronts. On the one hand we are witnessing in the last few years the collapse of the last big modern story of liberal democracy and liberalism more generally, which has been, I would say as a story, the best story humans ever came up with and it did create the best world that humans ever enjoyed. I mean the world of the late 20th century and early 21st century with all its problems, it’s still better for humans, not for cows or chickens, for humans, it’s still better than at any previous moment in history.

There are many problems, but anybody who says that this was a bad idea, I would like to hear which year are you thinking about as a better year? Now in 2019, when was it better? In 1919, in 1719, in 1219? I mean, for me, it’s obvious this has been the best story we have come up with.

Max Tegmark: That’s so true. I have to just admit that whenever I read the news for too long, I start getting depressed. But then I always cheer myself up by reading history and reminding myself it was always worse in the past.

Yuval Noah Harari: That never fails. I mean, the last four years have been quite bad, things are deteriorating, but we are still better off than in any previous era, but people are losing faith in this story. We are reaching really a situation of zero story. All the big stories of the 20th century have collapsed or are collapsing and the vacuum is currently filled by nostalgic fantasies, nationalistic and religious fantasies, which simply don’t offer any real solutions to the problems of the 21st century. So on the one hand we have the task of supporting or reviving the liberal democratic system, which is so far the only game in town. I keep listening to the critics and they have a lot of valid criticism, but I’m waiting for the alternative and the only thing I hear is completely unrealistic nostalgic fantasies about going back to some past golden era that as a historian I know was far, far worse, and even if it was not so far worse, you just can’t go back there. You can’t recreate the 19th century or the middle ages under the conditions of the 21st century. It’s impossible.

So we have this one struggle to maintain what we have already achieved, but then at the same time, on a much deeper level, my suspicion is that the liberal story, as we know it at least, is really not up to the challenges of the 21st century because it’s built on foundations that the new science and especially the new technologies of artificial intelligence and bioengineering are just destroying: the belief we have inherited in the autonomous individual, in free will, in all these basically liberal mythologies. They will become increasingly untenable in contact with new powerful bioengineering and artificial intelligence.

To put it in a very, very concise way, I think we are entering the era of hacking human beings, not just hacking smartphones and bank accounts, but really hacking homo sapiens, which was impossible before. I mean, AI gives us the computing power necessary and biology gives us the necessary biological knowledge, and when you combine the two you get the ability to hack human beings. And if you continue to try and build society on the philosophical ideas of the 18th century about the individual and free will and all that, in a world where it’s feasible technically to hack millions of people systematically, it’s just not going to work. And we need an updated story, I’ll just finish this thought. And our problem is that we need to defend the story from the nostalgic fantasies at the same time that we are replacing it by something else. And it’s just very, very difficult.

When I began writing my books like five years ago, I thought the real project was to really go down to the foundations of the liberal story, expose the difficulties and build something new. And then you had all these nostalgic populist eruptions of the last four or five years, and I personally find myself more and more engaged in defending the old fashioned liberal story instead of replacing it. Intellectually, it’s very frustrating because I think the really important intellectual work is finding out the new story, but politically it’s far more urgent. If we allow the emergence of some kind of populist authoritarian regimes, then whatever comes out of it will not be a better story.

Max Tegmark: Yeah, unfortunately I agree with your assessment here. I love to travel. I work in basically a United Nations-like environment at my university with students from all around the world, and I have this very strong sense that people are feeling increasingly lost around the world today because the stories that used to give them a sense of purpose and meaning and so on are sort of dissolving in front of their eyes. And of course, we don’t like to feel lost, so we’re likely to jump on whatever branches are held out for us. And they are often just retrograde things. Let’s go back to the good old days and all sorts of other unrealistic things. But I agree with you that the rise in populism we’re seeing now is not the cause. It’s a symptom of people feeling lost.

So I think I was a little bit unfair to ask you in a few minutes to answer the toughest question of our time, what should our new story be? But maybe we could break it into pieces a little bit and say what are at least some elements that we would like the new story to have? For example, it should accomplish, of course, multiple things. It has to incorporate technology in a meaningful way, which our past stories did not, and has to incorporate AI and progress in biotech, for example. And it also has to be a truly global story, I think this time, which isn’t just a story about how America is going to get better off or China is going to get better off, but one about how we’re all going to get better off together.

And we can put up a whole bunch of other requirements. If we start maybe with this part about the global nature of the story, people disagree violently about so many things around the world, but are there any ingredients at all of the story that you think people around the world would already agree to, some principles or ideas?

Yuval Noah Harari: Again, I don’t really know. I mean, I don’t know what the new story would look like. Historically, these kinds of really grand narratives, they aren’t created by two, three people having a discussion and thinking, okay, what new story should we tell? It’s far deeper and more powerful forces that come together to create these new stories. I mean, even trying to say, okay, we don’t have the full view, but let’s try to put a few ingredients in place. The whole thing about the story is that the whole comes before the parts. The narrative is far more important than the individual facts that build it up.

So I’m not sure that we can start creating the story by just, okay, let’s put the first few sentences and who knows how it will continue. You wrote books. I write books, we know that the first few sentences are the last sentences that you usually write.

Max Tegmark: That’s right.

Yuval Noah Harari: Only when you know what the whole book is going to look like, then you go back to the beginning and you write the first few sentences.

Max Tegmark: Yeah. And sometimes the very last thing you write is the new title.

Yuval Noah Harari: So I agree that whatever the new story is going to be, it’s going to be global. The world is now too small and too interconnected to have just a story for one part of the world. It won’t work. And also it will have to take very seriously both the most updated science and the most updated technology. Something that liberal democracy as we know it doesn’t do; it’s basically still in the 18th century. It’s taking an 18th century story and simply following it to its logical conclusions. For me, maybe the most amazing thing about liberal democracy is that it really completely disregarded all the discoveries of the life sciences over the last two centuries.

Max Tegmark: And of the technical sciences!

Yuval Noah Harari: I mean, as if Darwin never existed and we know nothing about evolution. I mean, you can basically meet these folks from the middle of the 18th century, whether it’s Rousseau, Jefferson, and all these guys, and they will be surprised by some of the conclusions we have drawn from the basis they provided us. But fundamentally nothing has changed. Darwin didn’t really change anything. Computers didn’t really change anything. And I think the next story won’t have that luxury of being able to ignore the discoveries of science and technology.

The number one thing it will have to take into account is how do humans live in a world when there is somebody out there that knows you better than you know yourself, but that somebody isn’t God, that somebody is a technological system, which might not be a good system at all. That’s a question we never had to face before. We could always comfort ourselves with the idea that we are kind of a black box to the rest of humanity. Nobody can really understand me better than I understand myself. The king, the emperor, the church, they don’t really know what’s happening within me. Maybe God knows. So we had a lot of discussions about what to do with that, the existence of a God who knows us better than we know ourselves, but we didn’t really have to deal with a non-divine system that can hack us.

And this system is emerging. I think it will be in place within our lifetime, in contrast to general artificial intelligence, which I’m skeptical I’ll see in my lifetime. I’m convinced we will see, if we live long enough, a system that knows us better than we know ourselves, and the basic premises of democracy, of free market capitalism, even of religion just don’t work in such a world. How does democracy function in a world when somebody understands the voter better than the voter understands herself or himself? And the same with the free market. I mean, if the customer is not right, if the algorithm is right, then we need a completely different economic system. That’s the big question that I think we should be focusing on. I don’t have the answer, but whatever story will be relevant to the 21st century will have to answer this question.

Max Tegmark: I certainly agree with you that democracy has totally failed to adapt to the developments in the life sciences, and I would add to that the developments in the natural sciences too. I watched all of the debates between Trump and Clinton in the last election here in the US and I didn’t notice artificial intelligence getting mentioned even a single time, not even when they talked about jobs. And the voting system we have, with an electoral college system here where it doesn’t even matter how people vote except in a few swing states, where there’s so little influence from the voter on what actually happens, even though we now have blockchain and could easily implement technical solutions where people will be able to have much more influence, just reflects that we basically declared victory on our democratic system hundreds of years ago and haven’t updated it.

And I’m very interested in how we can dramatically revamp it, if we believe in some form of democracy, so that we actually can have more influence on how our society is run as individuals, and how we can have good reason to actually trust the system, if it is able to hack us, that it is actually working in our best interest. There’s a key tenet in religions that you’re supposed to be able to trust the God as having your best interest in mind. And I think many people in the world today do not trust that their political leaders actually have their best interest in mind.

Yuval Noah Harari: Certainly, I mean that’s the issue. You give really divine powers to far from divine systems. We shouldn’t be too pessimistic. I mean, the technology is not inherently evil either. And what history teaches us about technology is that technology is also never deterministic. You can use the same technologies to create very different kinds of societies. We saw that in the 20th century when the same technologies were used to build communist dictatorships and liberal democracies, there was no real technological difference between the USSR and the USA. It was just people making different decisions about what to do with the same technology.

I don’t think that the new technology is inherently anti-democratic or inherently anti-liberal. It really is about choices that people make, even in what kind of technological tools to develop. If I think about, again, AI and surveillance, at present we see all over the world that corporations and governments are developing AI tools to monitor individuals, but technically we can do exactly the opposite. We can create tools that monitor and surveil governments and corporations in the service of individuals. For instance, to fight corruption in the government. As an individual, it’s very difficult for me to, say, monitor nepotism, politicians appointing all kinds of family members to lucrative positions in the government or in the civil service, but it should be very easy to build an AI tool that goes over the immense amount of information involved. And in the end you just get a simple application on your smartphone: you enter the name of a politician and you immediately see within two seconds who he appointed or she appointed from their family and friends to what positions. It should be very easy to do it. I don’t see the Chinese government creating such an application anytime soon, but people can create it.

Or if you think about the fake news epidemic, basically what’s happening is that corporations and governments are hacking us in their service, but the technology can work the other way around. We can develop an antivirus for the mind, the same way we developed antivirus for the computer. We need to develop an antivirus for the mind, an AI system that serves me and not a corporation or a government, and it gets to know my weaknesses in order to protect me against manipulation.

At present, what’s happening is that the hackers are hacking me. They get to know my weaknesses and that’s how they are able to manipulate me. For instance, with fake news. If they discover that I already have a bias against immigrants, they show me one fake news story, maybe about a group of immigrants raping local women. And I easily believe that because I already have this bias. My neighbor may have an opposite bias. She may think that anybody who opposes immigration is a fascist and the same hackers will find that out and will show her a fake news story about, I don’t know, right wing extremists murdering immigrants and she will believe that.

And then if I meet my neighbor, there is no way we can have a conversation about immigration. Now we can, and should, develop an AI system that serves me and my neighbor and alerts us: look, somebody is trying to hack you, somebody is trying to manipulate you. And if we learn to trust that this system serves us, that it doesn’t serve any corporation or government, it’s an important tool in protecting our minds from being manipulated. Another tool in the same field: we are now basically feeding enormous amounts of mental junk food to our minds.

We spend hours every day basically feeding our hatred, our fear, our anger, and that’s a terrible and stupid thing to do. The thing is that people discovered that the easiest way to grab our attention is by pressing the hate button in the mind or the fear button in the mind, and we are very vulnerable to that.

Now, just imagine that somebody develops a tool that shows you what’s happening to your brain or to your mind as you’re watching these YouTube clips. Maybe it doesn’t block you, it’s not Big Brother that blocks all these things. It’s just like when you buy a product and it shows you how many calories are in the product and how much saturated fat and how much sugar there is in the product. So at least in some cases you learn to make better decisions. Just imagine that you have this small window in your computer which tells you what’s happening to your brain as you’re watching this video and what’s happening to your levels of hatred or fear or anger, and then you make your own decision. But at least you are more aware of what kind of food you’re giving to your mind.

Max Tegmark: Yeah. This is something I am also very interested in seeing more of: AI systems that empower the individual in all the ways that you mentioned. We are very interested at the Future of Life Institute actually in supporting this kind of thing on the nerdy technical side, and I think this also drives home this very important fact that technology is not good or evil. Technology is an amoral tool that can be used both for good things and for bad things. That’s exactly why I feel it’s so important that we develop the wisdom to use it for good things rather than bad things. So in that sense, AI is no different than fire, which can be used for good things and for bad things, but we as a society have developed a lot of wisdom now in fire management. We educate our kids about it. We have fire extinguishers and fire trucks, and with artificial intelligence and other powerful tech, I feel we need to do better in similarly developing the wisdom that can steer the technology towards better uses.

Now we’re reaching the end of the hour here. I’d like to just finish with two more questions. One of them is about what we want it to ultimately mean to be human as we get ever more tech. You put it so beautifully, and I think it was in Sapiens, that tech progress is gradually taking us beyond asking what we want, to asking instead what we want to want, and I guess even more broadly how we want to brand ourselves, how we want to think about ourselves as humans in the high tech future.

I’m quite curious. First of all, you personally, if you think about yourself in 30 years, 40 years, what do you want to want and what sort of society would you like to live in say 2060 if you could have it your way?

Yuval Noah Harari: It’s a profound question. It’s a difficult question. My initial answer is that I would really like not just to know the truth about myself but to want to know the truth about myself. Usually the main obstacle in knowing the truth about yourself is that you don’t want to know it. It’s always accessible to you. I mean, we’ve been told for thousands of years by all the big names in philosophy and religion. Almost all say the same thing. Get to know yourself better. It’s maybe the most important thing in life. We haven’t really progressed much in the last thousands of years, and the reason is that yes, we keep getting this advice but we don’t really want to do it.

Working on our motivation in this field I think would be very good for us. It will also protect us from all the naive utopias which tend to draw far more of our attention. I mean, especially as technology will give us all, or at least some of us, more and more power, the temptations of naive utopias are going to be more and more irresistible, and I think the really most powerful check on these naive utopias is really getting to know yourself better.

Max Tegmark: Would you like what it means to be Yuval in 2060 to be more on the hedonistic side, that you have all these blissful experiences and serene meditation and so on, or would you like there to be a lot of challenges in there that give you a sense of meaning or purpose? Would you like to be somehow upgraded with technology?

Yuval Noah Harari: None of the above. I mean, at least if I think deeply enough about these issues. And yes, I would like to be upgraded, but only in the right way, and I’m not sure what the right way is. I’m not a great believer in blissful experiences, in meditation or otherwise. They tend to be traps. This is what we’ve been looking for all our lives, and for millions of years all the animals have just constantly looked for blissful experiences, and after a couple of millions of years of evolution, it doesn’t seem that it has brought us anywhere. And especially in meditation you learn these kinds of blissful experiences can be the most deceptive because you fall under the impression that this is the goal that you should be aiming at.

This is a really good meditation. This is a really deep meditation, simply because you’re very pleased with yourself, and then you spend countless hours later on trying to get back there or regretting that you are not there, and in the end it’s just another experience. What we experience right now, when we are now talking on the phone to each other and I feel something in my stomach and you feel something in your head, this is as special and amazing as the most blissful experience of meditation. The only difference is that we’ve gotten used to it so we are not amazed by it, but right now we are experiencing the most amazing thing in the universe and we just take it for granted. Partly because we are distracted by this notion that out there, there is something really, really special that we should be experiencing. So I’m a bit suspicious of blissful experiences.

Again, I would just basically repeat that to really understand yourself also means to really understand the nature of these experiences and if you really understand that, then so many of these big questions will be answered. Similarly, the question that we dealt with in the beginning of how to evaluate different experiences and what kind of experiences should we be creating for humans or for artificial consciousness. For that you need to deeply understand the nature of experience. Otherwise, there’s so many naive utopias that can tempt you. So I would focus on that.

When I say that I want to know the truth about myself, it really also means to really understand the nature of these experiences.

Max Tegmark: To my very last question, coming back to this story and ending on a positive inspiring note. I’ve been thinking back about when new stories led to very positive change. And then I started thinking about a particular Swedish story. So the year was 1945, people were looking at each other all over Europe saying, “We screwed up again.” People were saying then, “How about we, instead of using all this technology to build ever more powerful weapons, use it to create a society that benefits everybody, where we can have free health care, free university for everybody, free retirement, and build a real welfare state?” And I’m sure there were a lot of curmudgeons around who said, “Aw, you know, that’s just hopeless naive dreamery, go smoke some weed and hug a tree because it’s never going to work.” Right?

But this story, this optimistic vision was sufficiently concrete and sufficiently both bold and realistic seeming that it actually caught on. We did this in Sweden and it actually conquered the world. Not like when the Vikings tried and failed to do it with swords, but this idea conquered the world. So now so many rich countries have copied this idea. I keep wondering if there is another new vision or story like this, some sort of welfare 3.0 which incorporates all of the exciting new technology that has happened since ’45 on the biotech side, on the AI side, et cetera, to envision a society which is truly bold and sufficiently appealing to people around the world that people could rally around this.

I feel that the shared positive experience is something that more than anything else can really help foster collaboration around the world. And I’m curious what you would say in terms of, what do you think of as a bold, positive vision for the planet now going away from what you spoke about earlier with yourself personally, getting to know yourself and so on.

Yuval Noah Harari: I think we can aim towards what you define as welfare 3.0, which is again based on a better understanding of humanity. The welfare state, which many countries have built over the last decades, has been an amazing human achievement and it achieved many concrete results in fields that we knew what to aim for, like in health care. So okay, let’s vaccinate all the children in the country and let’s make sure everybody has enough to eat. We succeeded in doing that, and the kind of welfare 3.0 program would try to expand that to other fields in which our achievements are far more moderate simply because we don’t know what to aim for. We don’t know what we need to do.

If you think about mental health, it’s much more difficult than providing food to people because we have a very poor understanding of the human mind and of what mental health is. Even if you think about food, one of the scandals of science is that we still don’t know what to eat. So we basically solved the problem of enough food. Now actually we have the opposite problem of people eating too much and not too little, but beyond the matter of quantity, it’s I think one of the biggest scandals of science that after centuries we still don’t know what we should eat. And mainly because so many of these miracle diets, they are a one size fits all, as if everybody should eat the same thing. Whereas obviously it should be tailored to individuals.

So if you harness the power of AI and big data and machine learning and biotechnology, you could create the best dietary system in the world that tells people individually what would be good for them to eat. And this will have enormous side benefits in reducing medical problems, in reducing waste of food and resources, helping the climate crisis and so forth. So this is just one example.

Max Tegmark: Yeah. Just on that example, I would argue also that part of the problem, beyond that we just don’t know enough, is that there are actually a lot of lobbyists who are telling people what to eat, knowing full well that that’s bad for them, just because that way they’ll make more of a profit. Which gets back to your question of hacking, how we can prevent ourselves from getting hacked by powerful forces that don’t have our best interest in mind. But the things you mentioned seemed like a little bit of a first world perspective, which it’s easy to get when we live in Israel or Sweden, but of course there are many people on the planet who still live in pretty miserable situations where we actually can quite easily articulate how to make things at least a bit better.

But then also in our societies, I mean you touched on mental health. There’s a significant rise in depression in the United States. Life expectancy in the US has gone down three years in a row, which does not suggest that people are getting happier here. I’m wondering if, in your positive vision of the future that we can hopefully end on here, you’d also want to throw in some ingredients about the sort of society where we don’t just have the lowest rung of the Maslow pyramid taken care of, food and shelter and stuff, but also feel meaning and purpose and meaningful connections with our fellow lifeforms.

Yuval Noah Harari: I think it’s not just a first world issue. Again, even if you think about food, even in developing countries, more people today die from diabetes and diseases related to overeating or to overweight than from starvation and mental health issues are certainly not just the problem for the first world. People are suffering from that in all countries. Part of the issue is that mental health is far, far more expensive. Certainly if you think in terms of going to therapy once or twice a week than just giving vaccinations or antibiotics. So it’s much more difficult to create a robust mental health system in poor countries, but we should aim there. It’s certainly not just for the first world. And if we really understand humans better, we can provide much better health care, both physical health and mental health for everybody on the planet, not just for Americans or Israelis or Swedes.

Max Tegmark: In terms of physical health, it’s usually a lot cheaper and simpler to not treat the diseases, but to instead prevent them from happening in the first place by reducing smoking, reducing people eating extremely unhealthy foods, et cetera. And the same way with mental health, presumably a key driver of a lot of the problems we have is that we have put ourselves in a human made environment, which is incredibly different from the environment that we evolved to flourish in. And I’m wondering, rather than just trying to develop new pills to help us live in this environment, which is often optimized for the ability to produce stuff rather than for human happiness, if you think that deliberately changing our environment to be more conducive to human happiness might improve our happiness a lot without having to treat mental health disorders.

Yuval Noah Harari: It will demand enormous amounts of resources and energy. But if you are looking for a big project for the 21st century, then yeah, that’s definitely a good project to undertake.

Max Tegmark: Okay. That’s probably a good challenge from you on which to end this conversation. I’m extremely grateful for having had this opportunity to talk with you about these things. These are ideas I will continue thinking about with great enthusiasm for a long time to come and I very much hope we can stay in touch and actually meet in person, even, before too long.

Yuval Noah Harari: Yeah. Thank you for hosting me.

Max Tegmark: I really can’t think of anyone on the planet who thinks more profoundly about the big picture of the human condition here than you and it’s such an honor.

Yuval Noah Harari: Thank you. It was a pleasure for me too. Not a lot of opportunities to really go deeply about these issues. I mean, usually you get pulled away to questions about the 2020 presidential elections and things like that, which is important. But, we still have also to give some time to the big picture.

Max Tegmark: Yeah. Wonderful. So once again, todah, thank you so much.

Lucas Perry: Thanks so much for tuning in and being a part of our final episode of 2019. Many well and warm wishes for a happy and healthy new year from myself and the rest of the Future of Life Institute team. This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

FLI Podcast: Existential Hope in 2020 and Beyond with the FLI Team

As 2019 is coming to an end and the opportunities of 2020 begin to emerge, it’s a great time to reflect on the past year and our reasons for hope in the year to come. We spend much of our time on this podcast discussing risks that will possibly lead to the extinction or the permanent and drastic curtailing of the potential of Earth-originating intelligent life. While this is important and useful, much has been done at FLI and in the broader world to address these issues in service of the common good. It can be skillful to reflect on this progress to see how far we’ve come, to develop hope for the future, and to map out our path ahead. This podcast is a special end of the year episode focused on meeting and introducing the FLI team, discussing what we’ve accomplished and are working on, and sharing our feelings and reasons for existential hope going into 2020 and beyond.

Topics discussed include:

  • Introductions to the FLI team and our work
  • Motivations for our projects and existential risk mitigation efforts
  • The goals and outcomes of our work
  • Our favorite projects at FLI in 2019
  • Optimistic directions for projects in 2020
  • Reasons for existential hope going into 2020 and beyond

Timestamps:

0:00 Intro

1:30 Meeting the Future of Life Institute team

18:30 Motivations for our projects and work at FLI

30:04 What we strive to result from our work at FLI

44:44 Favorite accomplishments of FLI in 2019

01:06:20 Project directions we are most excited about for 2020

01:19:43 Reasons for existential hope in 2020 and beyond

01:38:30 Outro

 

You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today’s episode is a special end of the year episode structured as an interview with members of the FLI core team. The purpose of this episode is to introduce the members of our team and their roles, explore the projects and work we’ve been up to at FLI throughout the year, and discuss future project directions we are excited about for 2020. Some topics we explore are the motivations behind our work and projects, what we are hoping will result from them, favorite accomplishments at FLI in 2019, and general trends and reasons we see for existential hope going into 2020 and beyond.

If you find this podcast interesting and valuable, you can follow us on your preferred listening platform, like iTunes, SoundCloud, Google Play, Stitcher, and Spotify.

If you’re curious to learn more about the Future of Life Institute, our team, our projects, and our feelings about the state and ongoing efforts related to existential risk mitigation, then I feel you’ll find this podcast valuable. So, to get things started, we’re going to have the team introduce ourselves, and our role(s) at the Future of Life Institute.

Jared Brown: My name is Jared Brown, and I’m the Senior Advisor for Government Affairs at the Future of Life Institute. I help inform and execute FLI’s strategic advocacy work on governmental policy. It sounds a little bit behind the scenes because it is, but I primarily work in the U.S. and in global forums like the United Nations.

Kirsten Gronlund: My name is Kirsten and I am the Editorial Director for The Future of Life Institute. Basically, I run the website. I also create new content and manage the content that’s being created to help communicate the issues that FLI works on. I have been helping to produce a lot of our podcasts. I’ve been working on getting some new long form articles written; we just came out with one about CRISPR and gene drives. Right now I’m actually working on putting together a book list for recommended reading for things related to effective altruism and AI and existential risk. I also do social media, and write the newsletter, and a lot of things. I would say that my job is to figure out what is most important to communicate about what FLI does, and then to figure out how it’s best to communicate those things to our audience. Experimenting with different forms of content, experimenting with different messaging. Communication, basically, and writing and editing.

Meia Chita-Tegmark: I am Meia Chita-Tegmark. I am one of the co-founders of the Future of Life Institute. I am also the treasurer of the Institute, and recently I’ve been focusing many of my efforts on the Future of Life website and our outreach projects. For my day job, I am a postdoc in the human-robot interaction lab at Tufts University. My training is in social psychology, so my research actually focuses on the human end of the human-robot interaction. I mostly study uses of assistive robots in healthcare and I’m also very interested in ethical implications of using, or sometimes not using, these technologies. Now, with the Future of Life Institute, as a co-founder, I am obviously involved in a lot of the decision-making regarding the different projects that we are pursuing, but my main focus right now is the FLI website and our outreach efforts.

Tucker Davey: I’m Tucker Davey. I’ve been a member of the FLI core team for a few years. And for the past few months, I’ve been pivoting towards focusing on projects related to FLI’s AI communication strategy, various projects, especially related to advanced AI and artificial general intelligence, and considering how FLI can best message about these topics. Basically these projects are looking at what we believe about the existential risk of advanced AI, and we’re working to refine our core assumptions and adapt to a quickly changing public understanding of AI. In the past five years, there’s been much more money and hype going towards advanced AI, and people have new ideas in their heads about the risk and the hope from AI. And so, our communication strategy has to adapt to those changes. So that’s kind of a taste of the questions we’re working on, and it’s been really interesting to work with the policy team on these questions.

Jessica Cussins Newman: My name is Jessica Cussins Newman, and I am an AI policy specialist with the Future of Life Institute. I work on AI policy, governance, and ethics, primarily. Over the past year, there have been significant developments in all of these fields, and FLI continues to be a key stakeholder and contributor to numerous AI governance forums. So it’s been exciting to work on a team that’s helping to facilitate the development of safe and beneficial AI, both nationally and globally. To give an example of some of the initiatives that we’ve been involved with this year, we provided comments to the European Commission’s high level expert group on AI, to the Defense Innovation Board’s work on AI ethical principles, to the National Institute of Standards and Technology, or NIST, which developed a plan for federal engagement on technical AI standards.

We’re also continuing to participate in several multi-stakeholder initiatives, such as the Partnership on AI, the CNAS AI Task Force, and the UN Secretary-General’s High-level Panel on Digital Cooperation, among others. I think all of this is helping to lay the groundwork for a more trustworthy AI, and we’ve also been involved in direct policy engagement. Earlier this year we co-hosted an AI policy briefing at the California state legislature, and met with the White House Office of Science and Technology Policy. Lastly, on the educational side of this work, we maintain an online resource for global AI policy. So this includes information about national AI strategies and provides background resources and policy recommendations around some of the key issues.

Ian Rusconi: My name is Ian Rusconi and I edit and produce these podcasts. Since FLI’s podcasts aren’t recorded in a controlled studio setting, the interviews often come with a host of technical issues, so some of what I do for these podcasts overlaps with forensic audio enhancement, removing noise from recordings; removing as much of the reverb as possible from recordings, which works better sometimes than others; removing clicks and pops and sampling errors and restoring the quality of clipping audio that was recorded too loudly. And then comes the actual editing, getting rid of all the breathing and lip smacking noises that people find off-putting, and cutting out all of the dead space and vocal dithering, um, uh, like, you know, because we aim for a tight final product that can sometimes end up as much as half the length of the original conversation even before any parts of the conversation are cut out.

Part of working in an audio only format is keeping things to the minimum amount of information required to get your point across, because there is nothing else that distracts the listener from what’s going on. When you’re working with video, you can see people’s body language, and that’s so much of communication. When it’s audio only, you can’t. So a lot of the time, if there is a divergent conversational thread that may be an interesting and related point, it doesn’t actually fit into the core of the information that we’re trying to access, and you can construct a more meaningful narrative by cutting out superfluous details.

Emilia Javorsky: My name’s Emilia Javorsky and at the Future of Life Institute, I work on the topic of lethal autonomous weapons, mainly focusing on issues of education and advocacy efforts. It’s an issue that I care very deeply about and I think is one of the more pressing ones of our time. I actually come from a slightly atypical background to be engaged in this issue. I’m a physician and a scientist by training, but what’s conserved there is a discussion of how do we use AI in high stakes environments where life and death decisions are being made. And so when you are talking about the decisions to prevent harm, which is my field of medicine, or in the case of lethal autonomous weapons, the decision to enact lethal harm, there’s just fundamentally different moral questions, and also system performance questions that come up.

Key ones that I think about a lot are system reliability, accountability, transparency. But when it comes to thinking about lethal autonomous weapons in the context of the battlefield, there’s also this inherent scalability issue that arises. When you’re talking about scalable weapon systems, that quickly introduces unique security challenges in terms of proliferation and an ability to become what you could quite easily define as weapons of mass destruction. 

There’s also the broader moral questions at play here, and the question of whether we as a society want to delegate the decision to take a life to machines. And I personally believe that if we allow autonomous weapons to move forward and we don’t do something to really set a stake in the ground, it could set an irrecoverable precedent when we think about getting ever more powerful AI aligned with our values in the future. It is a very near term issue that requires action.

Anthony Aguirre: I’m Anthony Aguirre. I’m a professor of physics at the University of California at Santa Cruz, and I’m one of FLI’s founders, part of the core team, and probably work mostly on the policy related aspects of artificial intelligence and a few other topics. 

I’d say there are two major efforts that I’m heading up. One is the overall FLI artificial intelligence policy effort. That encompasses a little bit of our efforts on lethal autonomous weapons, but it’s mostly about wider issues of how artificial intelligence development should be thought about, how it should be governed, what kind of soft or hard regulations might we contemplate about it. Global efforts which are really ramping up now, both in the US and Europe and elsewhere, to think about how artificial intelligence should be rolled out in a way that’s kind of ethical, that keeps with the ideals of society, that’s safe and robust and in general is beneficial, rather than running into a whole bunch of negative side effects. That’s part of it.

And then the second thing is I’ve been thinking a lot about what sort of institutions and platforms and capabilities might be useful for society down the line that we can start to create, and nurture and grow now. So I’ve been doing a lot of thinking about… let’s imagine that we’re in some society 10 or 20 or 30 years from now that’s working well, how did it solve some of the problems that we see on the horizon? If we can come up with ways that this fictitious society in principle solved those problems, can we try to lay the groundwork for possibly actually solving those problems by creating new structures and institutions now that can grow into things that could help solve those problems in the future?

So an example of that is Metaculus. This is a prediction platform that I’ve been involved with in the last few years. So this is an effort to create a way to better predict what’s going to happen and make better decisions, both for individual organizations and FLI itself, but just for the world in general. This is kind of a capability that it would be good if the world had, making better predictions about all kinds of things and making better decisions. So that’s one example, but there are a few others that I’ve been contemplating and trying to get spun up.

Max Tegmark: Hi, I’m Max Tegmark, and I think of myself as having two jobs. During the day, I do artificial intelligence research at MIT, and on nights and weekends, I help lead the Future of Life Institute. My day job at MIT used to be focused on cosmology, because I was always drawn to the very biggest questions. The bigger the better, and studying our universe and its origins seemed to be kind of as big as it gets. But in recent years, I’ve felt increasingly fascinated that we have to understand more about how our own brains work, how our intelligence works, and building better artificial intelligence. Asking the question, how can we make sure that this technology, which I think is going to be the most powerful ever, actually becomes the best thing ever to happen to humanity, and not the worst.

Because all technology is really a double-edged sword. It’s not good or evil, it’s just a tool that we can do good or bad things with. If we think about some of the really horrible things that have happened because of AI systems, so far, it’s largely been not because of evil, but just because people didn’t understand how the system worked, and it did something really bad. So what my MIT research group is focused on is exactly tackling that. How can you take today’s AI systems, which are often very capable, but total black boxes… So that if you ask your system, “Why should this person be released on probation, but not this one?” You’re not going to get any better answer than, “I was trained on three terabytes of data and this is my answer. Beep, beep. Boop, boop.” Whereas, I feel we really have the potential to make systems that are just as capable, and much more intelligible. 

Trust should be earned and trust should be built based on us actually being able to peek inside the system and say, “Ah, this is why it works.” And the reason we have founded the Future of Life Institute was because all of us founders, we love technology, and we felt that the reason we would prefer living today rather than any time in the past, is all because of technology. But, for the first time in cosmic history, this technology is also on the verge of giving us the ability to actually self-destruct as a civilization. If we build AI, which can amplify human intelligence like never before, and eventually supersede it, then just imagine your least favorite leader on the planet, and imagine them having artificial general intelligence so they can impose their will on the rest of Earth.

How does that make you feel? It does not make me feel great, and I had a New Year’s resolution in 2014 that I was no longer allowed to complain about stuff if I didn’t actually put some real effort into doing something about it. This is why I put so much effort into FLI. The solution is not to try to stop technology, it just ain’t going to happen. The solution is instead win what I like to call the wisdom race. Make sure that the wisdom with which we manage our technology grows faster than the power of the technology.

Lucas Perry: Awesome, excellent. As for me, I’m Lucas Perry, and I’m the project manager for the Future of Life Institute. I’ve been with FLI for about four years now, and have focused on enabling and delivering projects having to do with existential risk mitigation. Beyond basic operations tasks at FLI that help keep things going, I’ve seen my work as having three cornerstones, these being supporting research on technical AI alignment, on advocacy relating to existential risks and related issues, and on direct work via our projects focused on existential risk. 

In terms of advocacy related work, you may know me as the host of the AI Alignment Podcast Series, and more recently the host of the Future of Life Institute Podcast. I see my work on the AI Alignment Podcast Series as promoting and broadening the discussion around AI alignment and AI safety to a diverse audience of both technical experts and persons interested in the issue.

There I am striving to include a diverse range of voices from many different disciplines, in so far as they can inform the AI alignment problem. The Future of Life Institute Podcast is a bit more general, though often dealing with related issues. There I strive to have conversations about avant garde subjects as they relate to technological risk, existential risk, and cultivating the wisdom with which to manage powerful and emerging technologies. For the AI Alignment Podcast, our most popular episode of all time so far is On Becoming a Moral Realist with Peter Singer, and a close second and third were On Consciousness, Qualia, and Meaning with Mike Johnson and Andres Gomez Emilsson, and An Overview of Technical AI Alignment with Rohin Shah. There are two parts to that podcast. These were really great episodes, and I suggest you check them out if they sound interesting to you. You can do that under the podcast tab on our site or by finding us on your preferred listening platform.

As for the main FLI Podcast Series, our most popular episode has been an interview with FLI President Max Tegmark called Life 3.0: Being Human in the Age of Artificial Intelligence. A podcast similar to this one last year, called Existential Hope in 2019 and Beyond, was the second most listened to FLI podcast. And then the third is a more recent podcast called The Climate Crisis As An Existential Threat with Simon Beard and Haydn Belfield.

As for the other avenue of my work, my support of research can be stated quite simply as fostering review of grant applications, and also reviewing interim reports for disbursing funds related to AGI safety grants. And then just touching again on my direct work around our projects, often if you see some project put out by the Future of Life Institute, I usually have at least some involvement with it from a logistics, operations, execution, or ideation standpoint.

And moving into the next line of questioning here for the team, what would you all say motivates your interest in existential risk and the work that you do at FLI? Is there anything in particular that is motivating this work for you?

Ian Rusconi: What motivates my interest in existential risk in general I think is that it’s extraordinarily interdisciplinary. But my interest in what I do at FLI is mostly that I’m really happy to have a hand in producing content that I find compelling. But it isn’t just the subjects and the topics that we cover in these podcasts, it’s how you and Ariel have done so. One of the reasons I have so much respect for the work that you two have done and consequently enjoy working on it so much is the comprehensive approach that you take in your lines of questioning.

You aren’t afraid to get into the weeds with interviewees on very specific technical details, but still seek to clarify jargon and encapsulate explanations, and there’s always an eye towards painting a broader picture so we can contextualize a subject’s placement in a field as a whole. I think that FLI’s podcasts often do a tightrope act, walking the line between popular audience and field specialists in a way that doesn’t treat the former like children, and doesn’t bore the latter with a lack of substance. And that’s a really hard thing to do. And I think it’s a rare opportunity to be able to help create something like this.

Kirsten Gronlund: I guess really broadly, I feel like there’s sort of this sense generally that a lot of these technologies and things that we’re coming up with are going to fix a lot of issues on their own. Like new technology will help us feed more people, and help us end poverty, and I think that that’s not true. We already have the resources to deal with a lot of these problems, and we haven’t been. So I think, really, we need to figure out a way to use what is coming out and the things that we’re inventing to help people. Otherwise we’re going to end up with a lot of new technology making the top 1% way more wealthy, and everyone else potentially worse off.

So I think for me that’s really what it is, is to try to communicate to people that these technologies are not, on their own, the solution, and we need to all work together to figure out how to implement them, and how to restructure things in society more generally so that we can use these really amazing tools to make the world better.

Lucas Perry: Yeah. I’m just thinking about how technology enables abundance and how it seems like there are not limits to human greed, and there are limits to human greed. Human greed can potentially want infinite power, but also there’s radically diminishing returns on one’s own happiness and wellbeing as one gains more access to more abundance. It seems like there’s kind of a duality there. 

Kirsten Gronlund: I agree. I mean, I think that’s a very effective altruist way to look at it. That those same resources, if everyone has some power and some money, people will on average be happier than if you have all of it and everyone else has less. But I feel like people, at least people who are in the position to accumulate way more money than they could ever use, tend to not think of it that way, which is unfortunate.

Tucker Davey: In general with working with FLI, I think I’m motivated by some mix of fear and hope. And I would say the general fear is that, if we as a species don’t figure out how to cooperate on advanced technology, and if we don’t agree to avoid certain dangerous paths, we’ll inevitably find some way to destroy ourselves, whether it’s through AI or nuclear weapons or synthetic biology. But then that’s also balanced by a hope that there’s so much potential for large scale cooperation to achieve our goals on these issues, and so many more people are working on these topics as opposed to five years ago. And I think there really is a lot of consensus on some broad shared goals. So I have a hope that through cooperation and better coordination we can better tackle some of these really big issues.

Emilia Javorsky: Part of the reason as a physician I went into the research side of it is this idea of wanting to help people at scale. I really love the idea of how do we use science and translational medicine, not just to help one person, but to help whole populations of people. And so for me, this issue of lethal autonomous weapons is the converse of that. This is something that really has the capacity to both destroy lives at scale in the near term, and also as we think towards questions like value alignment and longer term, more existential questions, it’s something that for me is just very motivating. 

Jared Brown: This is going to sound a little cheesy and maybe even a little selfish, but my main motivation is my kids. I know that they have a long life ahead of them, hopefully, and there are various different versions of the future that’ll be better or worse for them. And I know that emerging technology policy is going to be key to maximizing the benefit of their future and everybody else’s, and that’s ultimately what motivates me. I’ve been thinking about tech policy basically ever since I started researching and reading futurism books when my daughter was born about eight years ago, and that’s what really got me into the field and motivated to work on it full-time.

Meia Chita-Tegmark: I like to think of my work as being ultimately about people. I think that one of the most interesting aspects of this human drama is our relationship with technology, which recently has become evermore promising and also evermore dangerous. So, I want to study that, and I feel crazy lucky that there are universities willing to pay me to do it. And also to the best of my abilities, I want to try to nudge people in the technologies that they develop in more positive directions. I’d like to see a world where technology is used to save lives and not to take lives. I’d like to see technologies that are used for nurture and care rather than power and manipulation. 

Jessica Cussins Newman: I think the integration of machine intelligence into the world around us is one of the most impactful changes that we’ll experience in our lifetimes. I’m really excited about the beneficial uses of AI, but I worry about its impacts, and the questions of not just what we can build, but what we should build. I worry that we could see these technologies being destabilizing, or that we won’t be sufficiently thoughtful about ensuring that the systems aren’t developed or used in ways that expose us to new vulnerabilities, or impose undue burdens on particular communities.

Anthony Aguirre: I would say it’s kind of a combination of things. Everybody looks at the world and sees that there are all kinds of problems and issues and negative directions that lots of things are going, and it feels frustrating and depressing. And I feel that given that I’ve got a particular day job that affords me a lot of freedom, given that I have this position at Future of Life Institute, that there are a lot of talented people around who I’m able to work with, there’s a huge opportunity, and a rare opportunity to actually do something.

Who knows how effective it’ll actually be in the end, but to try to do something and to take advantage of the freedom, and standing, and relationships, and capabilities that I have available. I kind of see that as a duty in a sense, that if you find yourself in a place where you have a certain set of capabilities, and resources, and flexibility, and safety, you kind of have a duty to make use of that for something beneficial. I sort of feel that, and so try to do so, but I also feel like it’s just super interesting. Thinking about the ways that you can create things that can be effective, it’s just a fun intellectual challenge.

There are certainly aspects of what I do at Future of Life Institute that are sort of, “Oh, yeah, this is important so I should do it, but I don’t really feel like it.” Those are occasionally there, but mostly it feels like, “Ooh, this is really interesting and exciting, I want to get this done and see what happens.” So in that sense it’s really gratifying in both ways, to feel like it’s both potentially important and positive, but also really fun and interesting.

Max Tegmark: What really motivates me is this optimistic realization that after 13.8 billion years of cosmic history, we have reached this fork in the road where we have these conscious entities on this little spinning ball in space here who, for the first time ever, have the future in their own hands. In the stone age, who cared what you did? Life was going to be more or less the same 200 years later regardless, right? Whereas now, we can either develop super powerful technology and use it to destroy life on earth completely, go extinct and so on. Or, we can create a future where, with the help of artificial intelligence amplifying our intelligence, we can help life flourish like never before. And I’m not talking just about the next election cycle, I’m talking about for billions of years. And not just here, but throughout much of our amazing universe. So I feel actually that we have a huge responsibility, and a very exciting one, to make sure we don’t squander this opportunity, don’t blow it. That’s what lights me on fire.

Lucas Perry: So I’m deeply motivated by the possibilities of the deep future. I often take cosmological or macroscopic perspectives when thinking about my current condition or the condition of life on earth. The universe is about 13.8 billion years old and our short lives of only a few decades are couched within the context of this ancient evolving system of which we are a part. As far as we know, consciousness has only really exploded and come onto the scene in the past few hundred million years, at least in our sector of space and time, and the fate of the universe is uncertain but it seems safe to say that we have at least billions upon billions of years left before the universe perishes in some way. That means there’s likely longer than the current lifetime of the universe for earth originating intelligent life to do and experience amazing and beautiful things beyond what we can even know or conceive of today.

It seems very likely to me that the peaks and depths of human consciousness, from the worst human misery to the greatest of joy, peace, euphoria, and love, represent only a very small portion of a much larger and higher dimensional space of possible conscious experiences. So given this, I’m deeply moved by the possibility of artificial intelligence being the next stage in the evolution of life and the capacities for that intelligence to solve existential risk, for that intelligence to explore the space of consciousness and optimize the world, for super-intelligent and astronomical degrees of the most meaningful and profound states of consciousness possible. So sometimes I ask myself, what’s a universe good for if not ever evolving into higher and more profound and intelligent states of conscious wellbeing? I’m not sure, and this is still an open question for sure, but this deeply motivates me as I feel that the future can be unimaginably good to degrees and kinds of wellbeing that we can’t even conceive of today. There’s a lot of capacity there for the future to be something that is really, really, really worth getting excited and motivated about.

And moving along in terms of questioning again here, this question is again for the whole team: do you have anything more specifically that you hope results from your work, or is born of your work at FLI?

Jared Brown: So, I have two primary objectives, the first is sort of minor but significant. A lot of what I do on a day-to-day basis is advocate for relatively minor changes to existing and future near term policy on emerging technology. And some of these changes won’t make a world of difference unto themselves, but the small marginal benefits to the future can cumulate rather significantly over time. So, I look for as many small wins as possible in different policy-making environments, and try and achieve those on a regular basis.

And then more holistically in the long-run, I really want to help destigmatize the discussion around global catastrophic and existential risk in traditional national security and international security policy-making. It’s still quite an obscure and weird thing to say to people, “I work on global catastrophic and existential risk,” and it really shouldn’t be. I should be able to talk to most policy-makers in security related fields, and have it not come off as a weird or odd thing to be working on. Because inherently what we’re talking about is the very worst of what could happen to you or humanity or even life as we know it on this planet. And there should be more people who work on these issues both from an effective altruistic perspective and other perspectives going forward.

Jessica Cussins Newman: I want to raise awareness about the impacts of AI and the kinds of levers that we have available to us today to help shape these trajectories. So from designing more robust machine learning models, to establishing the institutional procedures or processes that can track and monitor those design decisions and outcomes and impacts, to developing accountability and governance mechanisms to ensure that those AI systems are contributing to a better future. We’ve built a tool that can automate decision making, but we need to retain human control and decide collectively as a society where and how to implement these new abilities.

Max Tegmark: I feel that there’s a huge disconnect right now between our potential, as the human species, and the direction we’re actually heading in. We are spending most of our discussions in news media on total BS. You know, like country A and country B are squabbling about something which is quite minor, in the grand scheme of things, and people are often treating each other very badly in the misunderstanding that they’re in some kind of zero-sum game, where one person can only get better off if someone else gets worse off. Technology is not a zero-sum game. Everybody wins at the same time, ultimately, if you do it right. 

Why are we so much better off now than 50,000 years ago or 300 years ago? It’s because we have antibiotics so we don’t die of stupid diseases all the time. It’s because we have the means to produce food and keep ourselves warm, and so on, with technology, and this is nothing compared to what AI can do.

I’m very much hoping that this mindset that we all lose together or win together is something that can catch on a bit more as people gradually realize the power of this tech. It’s not the case that either China is going to win and the U.S. is going to lose, or vice versa. What’s going to happen is either we’re both going to lose because there’s going to be some horrible conflict and it’s going to ruin things for everybody, or we’re going to have a future where people in China are much better off, and people in the U.S. and elsewhere in the world are also much better off, and everybody feels that they won. There really is no third outcome that’s particularly likely.

Lucas Perry: So, in the short term, I’m hoping that all of the projects we’re engaging with help to nudge the trajectory of life on earth in a positive direction. I’m hopeful that we can mitigate an arms race in lethal autonomous weapons. I see that as being a crucial first step in coordination around AI issues such that, if that fails, it may likely be much harder to coordinate in the future on making sure that beneficial AI takes place. I am also hopeful that we can promote beneficial AI alignment and AI safety research further and mainstream its objectives and understandings about the risks posed by AI and what it means to create beneficial AI. I’m hoping that we can maximize the wisdom with which we handle technology through projects and outreach, which explicitly cultivate ethics and coordination and governance in ways which help to direct and develop technologies in ways that are beneficial.

I’m also hoping that we can promote and instantiate a culture and interest in existential risk issues and the technical, political, and philosophical problems associated with powerful emerging technologies like AI. It would be wonderful if the conversations that we have on the podcast and at FLI and in the surrounding community weren’t just something for us. These are issues that are deeply interesting and will become ever more important as technology becomes more powerful. And so I’m really hoping that one day discussions about existential risk and all the kinds of conversations that we have on the podcast are much more mainstream, are normal, that there are serious institutions in government and society which explore these, and that this is part of our common discourse as a society and civilization.

Emilia Javorsky: In an ideal world, all of FLI’s work in this area, a great outcome would be the realization of the Asilomar principle that an arms race in lethal autonomous weapons must be avoided. I hope that we do get there in the shorter term. I think the activities that we’re doing now on increasing awareness around this issue, better understanding and characterizing the unique risks that these systems pose across the board from a national security perspective, a human rights perspective, and an AI governance perspective, are a really big win in my book.

Meia Chita-Tegmark: When I allow myself to unreservedly daydream about how I want my work to manifest itself into the world, I always conjure up fantasy utopias in which people are cared for and are truly inspired. For example, that’s why I am very committed to fighting against the development of lethal autonomous weapons. It’s precisely because a world with such technologies would be one in which human lives would be cheap, killing would be anonymous, our moral compass would likely be very damaged by this. I want to start work on using technology to help people, maybe to heal people. In my research, I tried to think of various disabilities and how technology can help with those, but that is just one tiny aspect of a wealth of possibilities for using technology, and in particular, AI for good.

Anthony Aguirre: I’ll be quite gratified if I can find that some results of some of the things that I’ve done help society be better and more ready, and to wisely deal with challenges that are unfolding. There are a huge number of problems in society, but there are a particular subset that are just sort of exponentially growing problems, because they have to do with exponentially advancing technology. And the set of people who are actually thinking proactively of the problems that those technologies are going to create, rather than just creating the technologies or sort of dealing with the problems when they arise, it’s quite small.

FLI is a pretty significant part of that tiny community of people who are thinking about that. But I also think it’s very important. Problems are better solved in advance, if possible. So I think anything that we can do to nudge things in the right direction, taking the relatively high point of leverage I think the Future of Life Institute has, will feel useful and worthwhile. Any of these projects being successful, I think will have a significant positive impact, and it’s just a question of buckling down and trying to get them to work.

Kirsten Gronlund: A big part of this field, not necessarily, but sort of just historically has been that it’s very male, and it’s very white, and in and of itself is a pretty privileged group of people, and something that I personally care about a lot is to try to expand some of these conversations around the future, and what we want it to look like, and how we’re going to get there, and involve more people and more diverse voices, more perspectives.

It goes along with what I was saying, that if we don’t figure out how to use these technologies in better ways, we’re just going to be contributing to people who have historically been benefiting from technology, and so I think bringing some of the people who have historically not been benefiting from technology and the way that our society is structured into these conversations, can help us figure out how to make things better. I’ve definitely been trying, while we’re doing this book guide thing, to make sure that there’s a good balance of male and female authors, people of color, et cetera and same with our podcast guests and things like that. But yeah, I mean I think there’s a lot more to be done, definitely, in that area.

Tucker Davey: So with the projects related to FLI’s AI communication strategy, I am hopeful that as an overall community, as an AI safety community, as an effective altruism community, existential risk community, we’ll be able to better understand what our core beliefs are about risks from advanced AI, and better understand how to communicate to different audiences, whether these are policymakers that we need to convince that AI is a problem worth considering, or whether it’s just the general public, or shareholders, or investors. Different audiences have different ideas of AI, and if we as a community want to be more effective at getting them to care about this issue and understand that it’s a big risk, we need to figure out better ways to communicate with them. And I’m hoping that a lot of this communications work will help the community as a whole, not just FLI, communicate with these different parties and help them understand the risks.

Ian Rusconi: Well, I can say that I’ve learned more since I started working on these podcasts about more disparate subjects than I had any idea about. Take lethal autonomous weapon systems, for example, I didn’t know anything about that subject when I started. These podcasts are extremely educational, but they’re conversational, and that makes them accessible, and I love that. And I hope that as our audience increases, other people find the same thing and keep coming back because we learn something new every time. I think that through podcasts, like the ones that we put out at FLI, we are enabling that sort of educational enrichment.

Lucas Perry: Cool. I feel the same way. So, you actually have listened to more FLI podcasts than perhaps anyone, since you’ve listened to all of them. Of all of these podcasts, do you have any specific projects, or a series that you have found particularly valuable? Any favorite podcasts, if you could mention a few, or whatever you found most valuable?

Ian Rusconi: Yeah, a couple of things. First, back in February, Ariel and Max Tegmark did a two part conversation with Matthew Meselson in advance of FLI awarding him in April, and I think that was probably the most fascinating and wide ranging single conversation I’ve ever heard. Philosophy, science history, weapons development, geopolitics, the value of the humanities from a scientific standpoint, artificial intelligence, treaty development. It was just such an incredible amount of lived experience and informed perspective in that conversation. And, in general, when people ask me what kinds of things we cover on the FLI podcast, I point them to that episode.

Second, I’m really proud of the work that we did on Not Cool, A Climate Podcast. The amount of coordination and research Ariel and Kirsten put in to make that project happen was staggering. I think my favorite episodes from there were those dealing with the social ramifications of climate change, specifically human migration. It’s not my favorite topic to think about, for sure, but I think it’s something that we all desperately need to be aware of. I’m oversimplifying things here, but Kris Ebi’s explanations of how crop failure and malnutrition and vector borne diseases can lead to migration, Cullen Hendrix touching on migration as it relates to the social changes and conflicts born of climate change, Lindsay Getschel’s discussion of climate change as a threat multiplier and the national security implications of migration.

Migration is happening all the time and it’s something that we keep proving we’re terrible at dealing with, and climate change is going to increase migration, period. And we need to figure out how to make it work and we need to do it in a way that ameliorates living standards and prevents this extreme concentrated suffering. And there are questions about how to do this while preserving cultural identity, and the social systems that we have put in place, and I know none of these are easy. But if instead we’d just take the question of, how do we reduce suffering? Well, we know how to do that and it’s not complicated per se: have compassion and act on it. We need compassionate government and governance. And that’s a thing that came up a few times, sometimes directly and sometimes obliquely, in Not Cool. The more I think about how to solve problems like these, the more I think the intelligent answer is compassion.

Lucas Perry: So, do you feel like you just learned a ton about climate change from the Not Cool podcast that you just had no idea about?

Ian Rusconi: Yeah, definitely. And that’s really something that I can say about all of FLI’s podcast series in general, is that there are so many subtopics on the things that we talk about that I always learn something new every time I’m putting together one of these episodes. 

Some of the actually most thought provoking podcasts to me are the ones about the nature of intelligence and cognition, and what it means to experience something, and how we make decisions. Two of the AI Alignment Podcast episodes from this year stand out to me in particular. First was the one with Josh Greene in February, which did an excellent job of explaining the symbol grounding problem and grounded cognition in an understandable and engaging way. And I’m also really interested in his lab’s work using the veil of ignorance. And second was the episode with Mike Johnson and Andres Gomez Emilsson of the Qualia Research Institute in May, where I particularly liked the discussion of electromagnetic harmony in the brain, and the interaction between the consonance and dissonance of its waves, and how you can basically think of music as a means by which we can hack our brains. Again, it gets back to the fabulously, extraordinarily interdisciplinary aspect of everything that we talk about here.

Lucas Perry: Kirsten, you’ve also been integral to the podcast process. What are your favorite things that you’ve done at FLI in 2019, and are there any podcasts in particular that stand out for you?

Kirsten Gronlund: The Women For The Future campaign was definitely one of my favorite things, which was basically just trying to highlight the work of women involved in existential risk, and through that try to get more women feeling like this is something that they can do and to introduce them to the field a little bit. And then also the Not Cool Podcast that Ariel and I did. I know climate isn’t the major focus of FLI, but it is such an important issue right now, and it was really just interesting for me because I was much more closely involved with picking the guests and stuff than I have been with some of the other podcasts. So it was just cool to learn about various people and their research and what’s going to happen to us if we don’t fix the climate. 

Lucas Perry: What were some of the most interesting things that you learned from the Not Cool podcast? 

Kirsten Gronlund: Geoengineering was really crazy. I didn’t really know at all what geoengineering was before working on this podcast, and I think it was Alan Robock in his interview who was saying even just for people to learn about the fact that one of the solutions that people are considering to climate change right now being shooting a ton of crap into the atmosphere and basically creating a semi nuclear winter, would hopefully be enough to kind of freak people out into being like, “maybe we should try to fix this a different way.” So that was really crazy.

I also thought it was interesting just learning about some of the effects of climate change that you wouldn’t necessarily think of right away. The fact that they’ve shown the links between increased temperature and upheaval in government, and they’ve shown links between increased temperature and generally bad mood, poor sleep, things like that. The quality of our crops is going to get worse, so we’re going to be eating less nutritious food.

Then some of the cool things, I guess this ties in as well with artificial intelligence, is some of the ways that people are using some of these technologies like AI and machine learning to try to come up with solutions. I thought that was really cool to learn about, because that’s kind of like what I was saying earlier: if we can figure out how to use these technologies in productive ways, they are such powerful tools and can do so much good for us. So it was cool to see that in action in the ways that people are implementing automated systems and machine learning to reduce emissions and help out with the climate.

Lucas Perry: From my end, I’m probably most proud of our large conference, Beneficial AGI 2019, we did to further mainstream AGI safety thinking and research and then the resulting projects which were a result of conversations which took place there were also very exciting and encouraging. I’m also very happy about the growth and development of our podcast series. This year, we’ve had over 200,000 listens to our podcasts. So I’m optimistic about the continued growth and development of our outreach through this medium and our capacity to inform people about these crucial issues.

Everyone else, other than podcasts, what are some of your favorite things that you’ve done at FLI in 2019?

Tucker Davey: I would have to say the conferences. So the beneficial AGI conference was an amazing start to the year. We gathered such a great crowd in Puerto Rico, people from the machine learning side, from governance, from ethics, from psychology, and really getting a great group together to talk out some really big questions, specifically about the long-term future of AI, because there’s so many conferences nowadays about the near term impacts of AI, and very few are specifically dedicated to thinking about the long term. So it was really great to get a group together to talk about those questions and that set off a lot of good thinking for me personally. That was an excellent conference. 

And then a few months later, Anthony and a few others organized a conference called the Augmented Intelligence Summit, and that was another great collection of people from many different disciplines, basically thinking about a hopeful future with AI and trying to do world building exercises to figure out what that ideal world with AI would look like. These conferences and these events and these summits do a great job of bringing together people from different disciplines and different schools of thought to really tackle these hard questions, and everyone who attends them is really dedicated and motivated, so seeing all those faces is really inspiring.

Jessica Cussins Newman: I’ve really enjoyed the policy engagement that we’ve been able to have this year. You know, looking back to last year, we did see a lot of successes around the development of ethical principles for AI, and I think this past year, there’s been significant interest in actually implementing those principles into practice. So we’re seeing many different governance forums, both within the U.S. and around the world, look to that next level, and so I think one of my favorite things has just been seeing FLI become a trusted resource for so many of those governance and policy processes that I think will significantly shape the future of AI.

I think the thing that I continue to value significantly about FLI is its ability as an organization to just bring together an amazing network of AI researchers and scientists, and to be able to hold events, and networking and outreach activities, that can merge those communities with other people thinking about issues around governance or around ethics or other kinds of sectors and disciplines. We have been playing a key role in translating some of the technical challenges related to AI safety and security into academic and policy spheres. And so that continues to be one of my favorite things that FLI is really uniquely good at.

Jared Brown: A recent example here, Future of Life Institute submitted some comments on a regulation that the Department of Housing and Urban Development put out in the U.S. And essentially the regulation is quite complicated, but they were seeking comment about how to integrate artificial intelligence systems into the legal liability framework surrounding something called ‘the Fair Housing Act,’ which is an old, very important civil rights legislation and protection to prevent discrimination in the housing market. And their proposal was essentially to grant users, such as a mortgage lender, or the banking system seeking loans, or even a landlord, if they were to use an algorithm to decide who they rent out a place to, or who to give a loan, that met certain technical standards, they’d be given liability protection. And this stems from the growing use of AI in the housing market. 

Now, in theory, there’s nothing wrong with using algorithmic systems so long as they’re not biased, and they’re accurate, and well thought out. However, if you grant blanket liability protection, like HUD wanted to, you’re essentially telling that bank officer or that landlord that they should exclusively use those AI systems that have the liability protection. And if they see a problem in those AI systems, and they’ve got somebody sitting across from them, and think this person really should get a loan, or this person should be able to rent my apartment because I think they’re trustworthy, but the AI algorithm says “no,” they’re not going to dispute what the AI algorithm tells them, because to do that, they take on liability of their own, and could potentially get sued. So, there’s a real danger here in moving too quickly in terms of how much legal protection we give these systems. And so, the Future of Life Institute, as well as many other different groups, commented on this proposal and pointed out these flaws to the Department of Housing and Urban Development. That’s an example of just one of many different things that the Future of Life has done, and you can actually go online and see our public comments for yourself, if you want to.

Lucas Perry: Wonderful.

Jared Brown: Honestly, a lot of my favorite things are just these off the record type conversations that I have in countless formal and informal settings with different policymakers and people who influence policy. The policy-making world is an old-fashioned, face-to-face type business, and essentially you really have to be there, and to meet these people, and to have these conversations to really develop a level of trust, and a willingness to engage with them in order to be most effective. And thankfully I’ve had a huge range of those conversations throughout the year, especially on AI. And I’ve been really excited to see how well received Future of Life has been as an institution. Our reputation precedes us because of a lot of the great work we’ve done in the past with the Asilomar AI principles, and the AI safety grants. It’s really helped me get in the room for a lot of these conversations, and given us a lot of credibility as we discuss near-term AI policy.

In terms of bigger public projects, I also really enjoyed coordinating with some community partners across the space in our advocacy on the U.S. National Institute of Standards and Technology’s plan for engaging in the development of technical standards on AI. In the policy realm, it’s really hard to see some of the end benefit of your work, because you’re doing advocacy work, and it’s hard to get folks to really tell you why certain changes were made, and if you were able to persuade them. But in this circumstance, I happen to know for a fact that we had a real positive effect on the end products that they developed. I talked to the lead authors about it, and others, and can see the evidence in the final product of the effect of our changes.

In addition to our policy and advocacy work, I really, really like that FLI continues to interface with the AI technical expert community on a regular basis. And this isn’t just through our major conferences, but also informally throughout the entire year, through various different channels and personal relationships that we’ve developed. It’s really critical for anyone’s policy work to be grounded in the technical expertise on the topic that they’re covering. And I’ve been thankful for the number of opportunities I’ve been given throughout the year to really touch base with some of the leading minds in AI about what might work best, and what might not work best from a policy perspective, to help inform our own advocacy and thinking on various different issues.

I also really enjoy the educational and outreach work that FLI is doing. As with our advocacy work, it’s sometimes very difficult to see the end benefit of the work that we do with our podcasts, and our website, and our newsletter. But I know anecdotally, from various different people, that they are listened to, that they are read by leading policymakers and researchers in this space. And so, they have a real effect on developing a common understanding in the community and helping network and develop collaboration on some key topics that are of interest to the Future of Life and people like us.

Emilia Javorsky: 2019 was a great year at FLI. It’s my first year at FLI, so I’m really excited to be part of such an incredible team. There are two real highlights that come to mind. One was publishing an article in the British Medical Journal on this topic of engaging the medical community in the lethal autonomous weapons debate. In previous disarmament conversations, it’s always been a community that has played an instrumental role in getting global action on these issues passed, whether you look at nuclear, landmines, biorisk… So that was something that I thought was a great contribution, because up until now, they hadn’t really been engaged in the discussion.

The other that comes to mind that was really amazing was a workshop that we hosted, where we brought together AI researchers, and roboticists, and lethal autonomous weapons experts, with a very divergent range of views on the topic, to see if they could achieve consensus on something. Anything. We weren’t really optimistic about what that could be going into it, and the result of that was actually remarkably heartening. They came up with a roadmap that outlined four components for action on lethal autonomous weapons, including things like the potential role that a moratorium may play, research areas that need exploration, non-proliferation strategies, ways to avoid unintentional escalation. They actually published this in the IEEE Spectrum, which I really recommend reading, but it was just really exciting to see just how much agreement and consensus can exist among people that you would normally think have very divergent views on the topic.

Max Tegmark: To make it maximally easy for them to get along, we actually did this workshop in our house, and we had lots of wine. And because they were in our house, also it was a bit easier to exert social pressure on them to make sure they were nice to each other, and have a constructive discussion. The task we gave them was simply: write down anything that they all agreed on that should be done to reduce the risk of terrorism or destabilizing events from this tech. And you might’ve expected a priori that they would come up with a blank piece of paper, because some of these people had been arguing very publicly that we need lethal autonomous weapons, and others had been arguing very vociferously that we should ban them. Instead, it was just so touching to see that when they actually met each other, often for the first time, they could actually listen directly to each other, rather than seeing weird quotes in the news about each other. 

Meia Chita-Tegmark: If I had to pick one thing, especially in terms of emotional intensity, it’s really been a while since I’ve been on such an emotional roller coaster as the one during the workshop related to lethal autonomous weapons. It was so inspirational to see how people that come with such diverging opinions could actually put their minds together, and work towards finding consensus. For me, that was such a hope inducing experience. It was a thrill.

Max Tegmark: They built a real camaraderie and respect for each other, and they wrote this report with five different sets of recommendations in different areas, including a moratorium on these things and all sorts of measures to reduce proliferation, and terrorism, and so on, and that made me feel more hopeful.

We got off to a great start I feel with our January 2019 Puerto Rico conference. This was the third one in a series where we brought together world leading AI researchers from academia, and industry, and other thinkers, to talk not about how to make AI more powerful, but how to make it beneficial. And what I was particularly excited about was that this was the first time when we also had a lot of people from China. So it wasn’t just this little western club, it felt much more global. It was very heartening to see how well everybody got along and what shared visions people really, really had. And I hope that if people who are actually building this stuff can all get along, and can help spread this kind of constructive collaboration to the politicians and the political leaders in their various countries, we’ll all be much better off.

Anthony Aguirre: That felt really worthwhile in multiple aspects. One, just it was a great meeting getting together with this small, but really passionately positive, and smart, and well-intentioned, and friendly community. It’s so nice to get together with all those people, it’s very inspiring. But also, that out of that meeting came a whole bunch of ideas for very interesting and important projects. And so some of the things that I’ve been working on are projects that came out of that meeting, and there’s a whole long list of other projects that came out of that meeting, some of which some people are doing, some of which are just sitting, gathering dust, because there aren’t enough people to do them. That feels like really good news. It’s amazing when you get a group of smart people together to think in a way that hasn’t really been widely done before. Like, “Here’s the world 20 or 30 or 50 or 100 years from now, what are the things that we’re going to want to have happened in order for the world to be good then?”

Not many people sit around thinking that way very often. So to get 50 or 100 people who are really talented together thinking about that, it’s amazing how easy it is to come up with a set of really compelling things to do. Now actually getting those done, getting the people and the money and the time and the organization to get those done is a whole different thing. But that was really cool to see, because you can easily imagine things that have a big influence 10 or 15 years from now that were born right at that meeting.

Lucas Perry: Okay, so that hits on BAGI. So, were there any other policy-related things that you’ve done at FLI in 2019 that you’re really excited about?

Anthony Aguirre: It’s been really good to see, both at FLI and globally, the new and very serious attention being paid to AI policy and technology policy in general. We created the Asilomar principles back in 2017, and now two years later, there are multiple other sets of principles, many of which are overlapping and some of which aren’t. And more importantly, there are now institutions coming into being, international groups like the OECD, like the United Nations, the European Union, maybe someday the US government, actually taking seriously these sets of principles about how AI should be developed and deployed, so as to be beneficial.

There’s kind of now too much going on to keep track of, multiple bodies, conferences practically every week, so the FLI policy team has been kept busy just keeping track of what’s going on, and working hard to positively influence all these efforts that are going on. Because of course while there’s a lot going on, it doesn’t necessarily mean that there’s a huge amount of expertise that is available to feed those efforts. AI is relatively new on the world’s stage, at least at the size that it’s assuming. AI and policy expertise, that intersection, there just aren’t a huge number of people who are ready to give useful advice on the policy side and the technical side and what the ramifications are and so on.

So I think the fact that FLI has been there from the early days of AI policy five years ago, means that we have a lot to offer to these various efforts that are going on. I feel like we’ve been able to really positively contribute here and there, taking opportunistic chances to lend our help and our expertise to all kinds of efforts that are going on and doing real serious policy work. So that’s been really interesting to see that unfold and how rapidly these various efforts are gearing up around the world. I think that’s something that FLI can really do, bringing the technical expertise to make those discussions and arguments more sophisticated, so that we can really take it to the next step and try to get something done.

Max Tegmark: Another one which was very uplifting is this tradition we have to celebrate unsung heroes. So three years ago we celebrated the guy who prevented the world from getting nuked in 1962, Vasili Arkhipov. Two years ago, we celebrated the man who probably helped us avoid getting nuked in 1983, Stanislav Petrov. And this year we celebrated an American who I think has done more than anyone else to prevent all sorts of horrible things happening with bioweapons, Matthew Meselson from Harvard, who ultimately persuaded Kissinger, who persuaded Brezhnev and everyone else that we should just ban them. 

We celebrated them all by giving them or their survivors a $50,000 award and having a ceremony where we honored them, to remind the world of how valuable it is when you can just draw a clear, moral line between the right thing to do and the wrong thing to do. Even though we call this the Future of Life award officially, informally, I like to think of this as our unsung hero award, because there really aren’t awards particularly for people who prevented shit from happening. Almost all awards are for someone causing something to happen. Yet, obviously we wouldn’t be having this conversation if there’d been a global thermonuclear war. And it’s so easy to think that just because something didn’t happen, there’s not much to think about it. I’m hoping this can help create both a greater appreciation of how vulnerable we are as a species and the value of not being too sloppy. And also, that it can help foster a tradition that if someone does something that future generations really value, we actually celebrate them and reward them. I want us to have a norm in the world where people know that if they sacrifice themselves by doing something courageous, that future generations will really value, then they will actually get appreciation. And if they’re dead, their loved ones will get appreciation.

We now feel incredibly grateful that our world isn’t radioactive rubble, or that we don’t have to read about bioterrorism attacks in the news every day. And we should show our gratitude, because this sends a signal to people today who can prevent tomorrow’s catastrophes. And the reason I think of this as an unsung hero award, and the reason these people have been unsung heroes, is because what they did was often going a little bit against what they were supposed to do at the time, according to the little system they were in, right? Arkhipov and Petrov, neither of them got any medals for averting nuclear war because their peers either were a little bit pissed at them for violating protocol, or a little bit embarrassed that we’d almost had a war by mistake. And we want to send the signal to the kids out there today that, if push comes to shove, you got to go with your own moral principles.

Lucas Perry: Beautiful. What project directions are you most excited about moving in, in 2020 and beyond?

Anthony Aguirre: Along with the ones that I’ve already mentioned, something I’ve been involved with is Metaculus, this prediction platform, and the idea there is there are certain facts about the future world, and Metaculus is a way to predict probabilities for those facts being true about the future world. But there are also facts about the current world, that we either don’t know whether they’re true or not or we disagree about whether they’re true or not. Something I’ve been thinking a lot about is how to extend the predictions of Metaculus into a general truth-seeking mechanism. If there’s something that’s contentious now, and people disagree about something that should be sort of a fact, can we come up with a reliable truth-seeking arbiter that people will believe, because it’s been right in the past, and it has a very clear, reliable track record for getting things right, in the same way that Metaculus has that record for getting predictions right?

So that’s something that interests me a lot, is kind of expanding that very strict level of accountability and track record creation from prediction to just truth-seeking. And I think that could be really valuable, because we’re entering this phase where people feel like they don’t know what’s true and facts are under contention. People simply don’t know what to believe. The institutions that they’re used to trusting to give them reliable information are either conflicting with each other or getting drowned in a sea of misinformation.

Lucas Perry: So, would this institution gain its credibility and epistemic status and respectability by taking positions on unresolved, yet concrete issues, which are likely to resolve in the short-term?

Anthony Aguirre: Or the not as short-term. But yeah, so just like in a prediction, where there might be disagreements as to what’s going to happen because nobody quite knows, and then at some point something happens and we all agree, “Oh, that happened, and some people were right and some people were wrong,” I think there are many propositions under contention now, but in a few years when the dust has settled and there’s not so much heat about them, everybody’s going to more or less agree on what the truth was.

And so I think, in a sense, this is about saying, “Here’s something that’s contentious now, let’s make a prediction about how that will turn out to be seen five or 10 or 15 years from now, when the dust has settled and people more or less agree on how this was.”

I think there’s only so long that people can go without feeling like they can actually rely on some source of information. I mean, I do think that there is a reality out there, and ultimately you have to pay a price if you are not acting in accordance with what is true about that reality. You can’t indefinitely win by just denying the truth of the way that the world is. People seem to do pretty well for a while, but I maintain my belief that eventually there will be a competitive advantage in understanding the way things actually are, rather than your fantasy of them.

We in the past did have trusted institutions that people generally listened to, and felt like, “I’m being told the basic truth.” Now they weren’t always, and there were lots of problems with those institutions, but we’ve lost something, in that almost nobody trusts anything anymore at some level, and we have to get that back. We will solve this problem, I think, in the sense that we sort of have to. What that solution will look like is unclear, and this is sort of an effort to feel our way towards a potential solution to that.

Tucker Davey: I’m definitely excited to continue this work on our AI messaging and generally just continuing the discussion about advanced AI and artificial general intelligence within the FLI team and within the broader community, to get more consensus about what we believe and how we think we should approach these topics with different communities. And I’m also excited to see how our policy team continues to make more splashes across the world, because it’s really been exciting to watch how Jared and Jessica and Anthony have been able to talk with so many diverse stakeholders and help them make better decisions about AI.

Jessica Cussins Newman: I’m most excited to see the further development of some of these global AI policy forums in 2020. For example, the OECD is establishing an AI policy observatory, which we’ll see further development on early next year. And FLI is keen to support this initiative, and I think it may be a really meaningful forum for global coordination and cooperation on some of these key AI global challenges. So I’m really excited to see what they can achieve.

Jared Brown: I’m really looking forward to the opportunity the Future of Life Institute has to lead the implementation of a recommendation related to artificial intelligence from the UN’s High-Level Panel on Digital Cooperation. This is a group that was led by Jack Ma and Melinda Gates, and they produced an extensive report that had many different recommendations on a range of digital or cyber issues, including one specifically on artificial intelligence. And because of our past work, we were invited to be a leader on the effort to implement and further refine the recommendation on artificial intelligence. And we’ll be able to do that with cooperation from the governments of France and Finland, and also with a UN agency called UN Global Pulse. So I’m really excited about this opportunity to help lead a major project in the global governance arena, and to help actualize how some of these early soft law norms that have developed in AI policy can be developed for a better future.

I’m also excited about continuing to work with other civil society organizations, such as the Future of Humanity Institute, the Center for the Study of Existential Risk, other groups that are like-minded in their approach to tech issues. And helping to inform how we work on AI policy in a number of different governance spaces, including with the European Union, the OECD, and other environments where AI policy has suddenly become the topic du jour of interest to policy-makers.

Emilia Javorsky: Something that I’m really excited about is continuing to work on this issue of global engagement in the topic of lethal autonomous weapons, as I think this issue is heading in a very positive direction. By that I mean starting to move towards meaningful action. And really the only way we get to action on this issue is through education, because policy makers really need to understand what these systems are, what their risks are, and how AI differs from other traditional areas of technology that have really well established existing governance frameworks. So that’s something I’m really excited about for the next year. And this has been especially in the context of engaging with states at the United Nations. So it’s really exciting to continue those efforts and continue to keep this issue on the radar.

Kirsten Gronlund: I’m super excited about our website redesign. I think that’s going to enable us to reach a lot more people and communicate more effectively, and obviously it will make my life a lot easier. So I think that’s going to be great.

Lucas Perry: I’m excited about that too. I think there’s a certain amount of a maintenance period that we need to kind of go through now, with regards to the website and a bunch of the pages, so that everything is refreshed and new and structured better. 

Kirsten Gronlund: Yeah, we just need like a little facelift. We are aware that the website right now is not super user friendly, and we are doing an incredibly in depth audit of the site to figure out, based on data, what’s working and what isn’t working, and how people would best be able to use the site to get the most out of the information that we have, because I think we have really great content, but the way that the site is organized is not super conducive to finding it, or using it.

So anyone who likes our site and our content but has trouble navigating or searching or anything: hopefully that will be getting a lot easier.

Ian Rusconi: I think I’d be interested in more conversations about ethics overall, and how ethical decision making is something that we need more of, as opposed to just economic decision making, and reasons for that with actual concrete examples. It’s one of the things that I find is a very common thread throughout almost all of the conversations that we have, but is rarely explicitly connected from one episode to another. And I think that there is some value in creating a conversational narrative about that. If we look at, say, the Not Cool Project, there are episodes about finance, and episodes about how the effects of what we’ve been doing to create a global economy have created problems. And if we look at the AI Alignment Podcasts, there are concerns about how systems will work in the future, and who they will work for, and who benefits from things. And if you look at FLI’s main podcast, there are concerns about denuclearization, and lethal autonomous weapons, and things like that, and there are major ethical considerations to be had in all of these.

And I think that there’s benefit in taking all of these ethical considerations, and talking about them specifically outside of the context of the fields that they are in, just as a way of getting more people to think about ethics. Not in opposition to thinking about, say, economics, but just to get people thinking about ethics as a stand-alone thing, before trying to introduce how it’s relevant to something. I think if more people thought about ethics, we would have a lot less problems than we do.

Lucas Perry: Yeah, I would be interested in that too. I would first want to know empirically how many of the decisions that the average human being makes in a day are actually informed by “ethical decision making,” which I guess my intuition at the moment is probably not that many?

Ian Rusconi: Yeah, I don’t know how much ethics plays into my autopilot-type decisions. I would assume probably not very much.

Lucas Perry: Yeah. We think about ethics explicitly a lot. I think that that definitely shapes my terminal values. But yeah, I don’t know, I feel confused about this. I don’t know how much of my moment to moment lived experience and decision making is directly born of ethical decision making. So I would be interested in that too, with that framing that I would first want to know the kinds of decision making faculties that we have, and how often each one is employed, and the extent to which improving explicit ethical decision making would help in making people more moral in general.

Ian Rusconi: Yeah, I could absolutely get behind that.

Max Tegmark: What I find also to be a concerning trend, and a predictable one, is that just like we had a lot of greenwashing in the corporate sector about environmental and climate issues, where people would pretend to care about the issues just so they didn’t really have to do much, we’re seeing a lot of what I like to call “ethics washing” now in AI, where people say, “Yeah, yeah. Okay, let’s talk about AI ethics now, like an ethics committee, and blah, blah, blah, but let’s not have any rules or regulations, or anything. We can handle this because we’re so ethical.” And interestingly, the very same people who talk the loudest about ethics are often among the ones who are the most dismissive about the bigger risks from human level AI, and beyond. And also the ones who don’t want to talk about malicious use of AI, right? They’ll be like, “Oh yeah, let’s just make sure that robots and AI systems are ethical and do exactly what they’re told,” but they don’t want to discuss what happens when some country, or some army, or some terrorist group has such systems, and tells them to do things that are horrible for other people. That’s an elephant in the room we are looking forward to helping draw more attention to, I think, in the coming year.

And what I also feel is absolutely crucial here is to avoid splintering the planet again, into basically an eastern and a western zone of dominance that just don’t talk to each other. Trade is down between China and the West. China has its great firewall, so they don’t see much of our internet, and we also don’t see much of their internet. It’s becoming harder and harder for students to come here from China because of visas, and there’s sort of a partitioning into two different spheres of influence. And as I said before, this is a technology which could easily make everybody a hundred times better or richer, and so on. You can imagine many futures where countries just really respect each other’s borders, and everybody can flourish. Yet, major political leaders are acting like this is some sort of zero-sum game. 

I feel that this is one of the most important things to help people understand that, no, it’s not like we have a fixed amount of money or resources to divvy up. If we can avoid very disruptive conflicts, we can all have the future of our dreams.

Lucas Perry: Wonderful. I think this is a good place to end on that point. So, what are reasons that you see for existential hope, going into 2020 and beyond?

Jessica Cussins Newman: I have hope for the future because I have seen this trend where it’s no longer a fringe issue to talk about technology ethics and governance. And I think that used to be the case not so long ago. So it’s heartening that so many people and institutions, from engineers all the way up to nation states, are really taking these issues seriously now. I think that momentum is growing, and I think we’ll see engagement from even more people and more countries in the future.

I would just add that it’s a joy to work with FLI, because it’s an incredibly passionate team, and everybody has a million things going on, and still gives their all to this work and these projects. I think what unites us is that we all think these are some of the most important issues of our time, and so it’s really a pleasure to work with such a dedicated team.

Lucas Perry: Wonderful.

Jared Brown: As many of the listeners will probably realize, governments across the world have really woken up to this thing called artificial intelligence, and what it means for civil society, their governments, and the future really of humanity. And I’ve been surprised, frankly, over the past year, about how many of the new national and international strategies, the new principles, and so forth are actually quite aware of both the potential benefits but also the real safety risks associated with AI. And frankly, this time last year, I wouldn’t have thought as many principles would have come out. There’s a lot of positive work in those principles, there’s a lot of serious thought about the future of where this technology is going. And so, on the whole, I think the picture is much better than what most people might expect in terms of the level of high-level thinking that’s going on in policy-making about AI, its benefits, and its risks going forward. And so on that score, I’m quite hopeful that there’s a lot of positive soft norms to work from. And hopefully we can work to implement those ideas and concepts going forward in real policy.

Lucas Perry: Awesome.

Emilia Javorsky: I am optimistic, and it comes from having had a lot of these conversations, specifically this past year, on lethal autonomous weapons, and speaking with people from a range of views and being able to sit down, come together, have a rational and respectful discussion, and identify actionable areas of consensus. That has been something that has been very heartening for me, because there is just so much positive potential for humanity waiting on the science and technology shelves of today, never mind what’s in the pipeline that’s coming up. And I think that despite all of this tribalism and hyperbole that we’re bombarded with in the media every day, there are ways to work together as a society, and as a global community, and just with each other to make sure that we realize all that positive potential, and I think that sometimes gets lost. I’m optimistic that we can make that happen and that we can find a path forward on restoring that kind of rational discourse and working together.

Tucker Davey: I think my main reasons for existential hope in 2020 and beyond are, first of all, seeing how many more people are getting involved in AI safety, in effective altruism, and existential risk mitigation. It’s really great to see the community growing, and I think just by having more people involved, that’s a huge step. As a broader existential hope, I am very interested in thinking about how we can better coordinate to collectively solve a lot of our civilizational problems, and to that end, I’m interested in ways where we can better communicate about our shared goals on certain issues, ways that we can more credibly commit to action on certain things. So these ideas of credible commitment mechanisms, whether that’s using advanced technology like blockchain or whether that’s just smarter ways to get people to commit to certain actions, I think there’s a lot of existential hope for bigger groups in society coming together and collectively coordinating to make systemic change happen.

I see a lot of potential for society to organize mass movements to address some of the biggest risks that we face. For example, I think it was last year, an AI researcher, Toby Walsh, who we’ve worked with, he organized a boycott against a South Korean company that was working to develop these autonomous weapons. And within a day or two, I think, he contacted a bunch of AI researchers and they signed a pledge to boycott this group until they decided to ditch the project. And the boycott succeeded basically within two days. And I think that’s one good example of the power of boycotts, and the power of coordination and cooperation to address our shared goals. So if we can learn lessons from Toby Walsh’s boycott, as well as from the fossil fuel and nuclear divestment movements, I think we can start to realize some of our potential to push these big industries in more beneficial directions.

So whether it’s the fossil fuel industry, the nuclear weapons industry, or the AI industry, as a collective, we have a lot of power to use stigma to push these companies in better directions. No company or industry wants bad press. And if we get a bunch of researchers together to agree that a company’s doing some sort of bad practice, and then we can credibly say that, “Look, you guys will get bad press if you guys don’t change your strategy,” many of these companies might start to change their strategy. And I think if we can better coordinate and organize certain movements and boycotts to get different companies and industries to change their practices, that’s a huge source of existential hope moving forward.

Lucas Perry: Yeah. I mean, it seems like the point that you’re trying to articulate is that there are particular instances like this thing that happened with Toby Walsh that show you the efficacy of collective action around our issues.

Tucker Davey: Yeah. I think there’s a lot more agreement on certain shared goals, such as: we don’t want banks investing in fossil fuels, or we don’t want AI companies developing weapons that can make targeted kill decisions without human intervention. And if we take some of these broad shared goals and then we develop some sort of plan to basically pressure these companies to change their ways or to adopt better safety measures, I think these sorts of collective action can be very effective. And I think as a broader community, especially with more people in the community, we have much more of a possibility to make this happen.

So I think I see a lot of existential hope from these collective movements to push industries in more beneficial directions, because they can really help us, as individuals, feel more of a sense of agency that we can actually do something to address these risks.

Kirsten Gronlund: I feel like there’s actually been a pretty marked difference in the way that people are reacting to… at least things like climate change, and I sort of feel like more generally, there’s sort of more awareness just of the precariousness of humanity, and the fact that our continued existence and success on this planet is not a given, and we have to actually work to make sure that those things happen. Which is scary, and kind of exhausting, but I think is ultimately a really good thing, the fact that people seem to be realizing that this is a moment where we actually have to act and we have to get our shit together. We have to work together and this isn’t about politics, this isn’t about, I mean it shouldn’t be about money. I think people are starting to figure that out, and it feels like that has really become more pronounced as of late. I think especially younger generations, like obviously there’s Greta Thunberg and the youth movement on these issues. It seems like the people who are growing up now are so much more aware of things than I certainly was at that age, and that’s been cool to see, I think. They’re better than we were, and hopefully things in general are getting better.

Lucas Perry: Awesome.

Ian Rusconi: I think it’s often easier for a lot of us to feel hopeless than it is to feel hopeful. Most of the news that we get comes in the form of warnings, or the existing problems, or the latest catastrophe, and it can be hard to find a sense of agency as an individual when talking about huge global issues like lethal autonomous weapons, or climate change, or runaway AI.

People frame little issues that add up to bigger ones as things like death by 1,000 bee stings, or the straw that broke the camel’s back, and things like that, but that concept works both ways. 1,000 individual steps in a positive direction can change things for the better. And working on these podcasts has shown me the number of people taking those steps. People working on AI safety, international weapons bans, climate change mitigation efforts. There are whole fields of work, absolutely critical work, that so many people, I think, probably know nothing about. Certainly that I knew nothing about. And sometimes, knowing that there are people pulling for us, that’s all we need to be hopeful. 

And beyond that, once you know that work exists and that people are doing it, nothing is stopping you from getting informed and helping to make a difference. 

Kirsten Gronlund: I had a conversation with somebody recently who is super interested in these issues, but was feeling like they just didn’t have particularly relevant knowledge or skills. And what I would say is “neither did I when I started working for FLI,” or at least I didn’t know a lot about these specific issues. But really anyone, if you care about these things, you can bring whatever skills you have to the table, because we need all the help we can get. So don’t be intimidated, and get involved.

Ian Rusconi: I guess I think that’s one of my goals for the podcast, is that it inspires people to do better, which I think it does. And that sort of thing gives me hope.

Lucas Perry: That’s great. I feel happy to hear that, in general.

Max Tegmark: Let me first give a more practical reason for hope, and then get a little philosophical. So on the practical side, there are a lot of really good ideas that the AI community is quite unanimous about, in terms of policy and things that need to happen, that basically aren’t happening because policy makers and political leaders don’t get it yet. And I’m optimistic that we can get a lot of that stuff implemented, even though policy makers won’t pay attention now. If we get AI researchers around the world to formulate and articulate really concrete proposals and plans for policies that should be enacted, and they get totally ignored for a while? That’s fine, because eventually some bad stuff is going to happen because people weren’t listening to their advice. And whenever those bad things do happen, then leaders will be forced to listen because people will be going, “Wait, what are you going to do about this?” And if at that point, there are broad international consensus plans worked out by experts about what should be done, that’s when they actually get implemented. So the hopeful message I have to anyone working in AI policy is: don’t despair if you’re being ignored right now, keep doing all the good work and flesh out the solutions, and start building consensus for it among the experts, and there will be a time people will listen to you. 

To just end on a more philosophical note, again, I think it’s really inspiring to think how much impact intelligence has had on life so far. We realize that we’ve already completely transformed our planet with intelligence. If we can use artificial intelligence to amplify our intelligence, it will empower us to solve all the problems that we’re stumped by thus far, including curing all the diseases that kill our near and dear today. And for those so minded, even help life spread into the cosmos. Not even the sky is the limit, and the decisions about how this is going to go are going to be made within the coming decades, so within the lifetime of most people who are listening to this. There’s never been a more exciting moment to think about grand, positive visions for the future. That’s why I’m so honored and excited to get to work with the Future of Life Institute.

Anthony Aguirre: Just like disasters, I think big positive changes can arise with relatively little warning and then seem inevitable in retrospect. I really believe that people are actually wanting and yearning for a society and a future that gives them fulfillment and meaning, and that functions and works for people.

There’s a lot of talk in the AI circles about how to define intelligence, and defining intelligence as the ability to achieve one’s goals. And I do kind of believe that for all its faults, humanity is relatively intelligent as a whole. We can be kind of foolish, but I think we’re not totally incompetent at getting what we are yearning for, and what we are yearning for is a kind of just and supportive and beneficial society that we can exist in. Although there are all these ways in which the dynamics of things that we’ve set up are going awry in all kinds of ways, and people’s own self-interest fighting it out with the self-interest of others is making things go terribly wrong, I do nonetheless see lots of people who are putting interesting, passionate effort forward toward making a better society. I don’t know that that’s going to turn out to be the force that prevails, I just hope that it is, and I think it’s not time to despair.

There’s a little bit of a selection effect in the people that you encounter through something like the Future of Life Institute, but there are a lot of people out there who genuinely are trying to work toward a vision of some better future, and that’s inspiring to see. It’s easy to focus on the differences in goals, because it seems like different factions that people want totally different things. But I think that belies the fact that there are lots of commonalities that we just kind of take for granted, and accept, and brush under the rug. Putting more focus on those and focusing the effort on, “given that we can all agree that we want these things and let’s have an actual discussion about what is the best way to get those things,” that’s something that there’s sort of an answer to, in the sense that we might disagree on what our preferences are, but once we have the set of preferences we agree on, there’s kind of the correct or more correct set of answers to how to get those preferences satisfied. We actually are probably getting better, we can get better, this is an intellectual problem in some sense and a technical problem that we can solve. There’s plenty of room for progress that we can all get behind.

Again, strong selection effect. But when I think about the people that I interact with regularly through the Future of Life Institute and other organizations that I work as a part of, they’re almost universally highly-effective, intelligent, careful-thinking, well-informed, helpful, easy to get along with, cooperative people. And it’s not impossible to create or imagine a society where that’s just a lot more widespread, right? It’s really enjoyable. There’s no reason that the world can’t be more or less dominated by such people.

As economic opportunity grows and education grows and everything, there’s no reason to see that that can’t grow also, in the same way that non-violence has grown. It used to be a part of everyday life for pretty much everybody, now many people I know go through many years without having any violence perpetrated on them or vice versa. We still live in a sort of overall, somewhat violent society, but nothing like what it used to be. And that’s largely because of the creation of wealth and institutions and all these things that make it unnecessary and impossible to have that as part of everybody’s everyday life.

And there’s no reason that can’t happen in most other domains, I think it is happening. I think almost anything is possible. It’s amazing how far we’ve come, and I see no reason to think that there’s some hard limit on how far we go.

Lucas Perry: So I’m hopeful for the new year simply because in areas that are important, I think things are on average getting better than they are getting worse. And it seems to be that much of what causes pessimism is perception that things are getting worse, or that we have these strange nostalgias for past times that we believe to be better than the present moment.

This isn’t new thinking, and is much in line with what Steven Pinker has said, but I feel that when we look at the facts about things like poverty, or knowledge, or global health, or education, or even the conversation surrounding AI alignment and existential risk, that things really are getting better, and that generally the extent to which it seems like it isn’t or that things are getting worse can be seen in many cases as our trend towards more information causing the perception that things are getting worse. But really, we are shining a light on everything that is already bad or we are coming up with new solutions to problems which generate new problems in and of themselves. And I think that this trend towards elucidating all of the problems which already exist, or through which we develop technologies and come to new solutions, which generate their own novel problems, this can seem scary as all of these bad things continue to come up, it seems almost never ending.

But they seem to me more now like revealed opportunities for growth and evolution of human civilization to new heights. We are clearly not at the pinnacle of life or existence or wellbeing, so as we encounter and generate and uncover more and more issues, I find hope in the fact that we can rest assured that we are actively engaged in the process of self-growth as a species. Without encountering new problems about ourselves, we are surely stagnating and risk decline. However, it seems that as we continue to find suffering and confusion and evil in the world and to notice how our new technologies and skills may contribute to these things, we have an opportunity to act upon remedying them and then we can know that we are still growing and that, that is a good thing. And so I think that there’s hope in the fact that we’ve continued to encounter new problems because it means that we continue to grow better. And that seems like a clearly good thing to me.

And with that, thanks so much for tuning into this Year In Review Podcast on our activities and team as well as our feelings about existential hope moving forward. If you’re a regular listener, we want to share our deepest thanks for being a part of this conversation and thinking about these most fascinating and important of topics. And if you’re a new listener, we hope that you’ll continue to join us in our conversations about how to solve the world’s most pressing problems around existential risks and building a beautiful future for all. Many well and warm wishes for a happy and healthy end of the year for everyone listening from the Future of Life Institute team. If you find this podcast interesting, valuable, unique, or positive, consider sharing it with friends and following us on your preferred listening platform. You can find links for that on the pages for these podcasts found at futureoflife.org.

AI Alignment Podcast: On DeepMind, AI Safety, and Recursive Reward Modeling with Jan Leike

Jan Leike is a senior research scientist who leads the agent alignment team at DeepMind. His is one of three teams within their technical AGI group; each team focuses on different aspects of ensuring advanced AI systems are aligned and beneficial. Jan’s journey in the field of AI has taken him from a PhD on a theoretical reinforcement learning agent called AIXI to empirical AI safety research focused on recursive reward modeling. This conversation explores his movement from theoretical to empirical AI safety research: why empirical safety research is important and how this has led him to his work on recursive reward modeling. We also discuss research directions he’s optimistic will lead to safely scalable systems, more facets of his own thinking, and other work being done at DeepMind.

 Topics discussed in this episode include:

  • Theoretical and empirical AI safety research
  • Jan’s and DeepMind’s approaches to AI safety
  • Jan’s work and thoughts on recursive reward modeling
  • AI safety benchmarking at DeepMind
  • The potential modularity of AGI
  • Comments on the cultural and intellectual differences between the AI safety and mainstream AI communities
  • Joining the DeepMind safety team

Timestamps: 

0:00 Intro

2:15 Jan’s intellectual journey in computer science to AI safety

7:35 Transitioning from theoretical to empirical research

11:25 Jan’s and DeepMind’s approach to AI safety

17:23 Recursive reward modeling

29:26 Experimenting with recursive reward modeling

32:42 How recursive reward modeling serves AI safety

34:55 Pessimism about recursive reward modeling

38:35 How this research direction fits in the safety landscape

42:10 Can deep reinforcement learning get us to AGI?

42:50 How modular will AGI be?

44:25 Efforts at DeepMind for AI safety benchmarking

49:30 Differences between the AI safety and mainstream AI communities

55:15 Most exciting piece of empirical safety work in the next 5 years

56:35 Joining the DeepMind safety team

 

Works referenced:

Scalable agent alignment via reward modeling

The Boat Race Problem

Move 37

Jan Leike on reward hacking

OpenAI Safety Gym

ImageNet

Unrestricted Adversarial Examples

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Hello everyone and welcome to the AI Alignment Podcast. I’m Lucas Perry. Today, we’re speaking with Jan Leike. Jan Leike is a senior research scientist at DeepMind and his research aims at helping to make machine learning robust and beneficial; he works on safety and alignment of reinforcement learning agents. His current research can be understood as motivated by the following question: How can we design competitive and scalable machine learning algorithms that make sequential decisions in the absence of a reward function? If this podcast is interesting or valuable to you, please consider following us on your preferred listening platform and leaving us a good review.

This conversation covers Jan’s PhD and movement from theoretical to empirical AI research, why this shift took place and his view on the importance of empirical AI safety research, we discuss how DeepMind approaches the projects of beneficial AI and AI safety. We discuss the AI alignment landscape today and the kinds of approaches that Jan is most excited about. We get into Jan’s main area of research of recursive reward modeling, and we talk about AI safety benchmarking efforts at DeepMind and the intellectual and cultural differences between the AI alignment/AI safety community and the mainstream AI and machine learning community. As a friendly heads up, there were some audio issues with the incoming audio in the second half of the podcast. We did our best to clean these up and I feel the resulting audio to be easily listenable. I’d also like to give many thanks to Richard Ngo, Vishal Maini, and Richard Mallah for help on developing and refining the questions for this podcast. And with that, let’s get into our conversation with Jan Leike.

Why don’t you start off by taking us through your journey in the field of AI. How did you first become interested in math and computer science? Tell me a little bit about your time as a PhD student. What piqued your curiosity, and why were you pursuing what you were pursuing?

Jan Leike: I got interested in AGI and AGI safety around 2012. I was doing a Master’s degree at the time, and I was trying to think about what I should do with my career. And I was reading a whole bunch of stuff online. That’s how I got into this whole area. My background was kind of in math and computer science at the time, but I wasn’t really working on AI. I was more working on software verification. Then I came across Marcus Hutter’s AIXI model, which is basically a formal mathematical model for what AGI could look like. And it’s highly idealized. It’s not something that could actually run, but you can kind of think about it and you can actually prove things about it. And I was really excited about that. I thought that was a great starting point because you remember that was back in 2012 before the whole deep learning revolution happened, so it was not really clear what kind of approaches we might actually take towards AGI. The purpose of my PhD was to kind of understand AGI from a high-level theoretical perspective.

Lucas Perry: The PhD was with Marcus Hutter on AIXI or “A,” “I,” “X,” “I.” From that pursuit, what interesting or really valuable things did you glean from that process?

Jan Leike: So my thesis ended up being just a number of theoretical results, some of which are that actually this idealized agent AIXI is not optimal in any objective sense. In a way, it all depends on the universal Turing machine that is used to define it. However, there are variants of AIXI that have objective properties, such as asymptotic convergence to the optimal policy. This variant is basically a variant based on Thompson sampling, but this is a fully general reinforcement learning setting. So that’s partially observable, and you don’t have episodes. It’s like everything is one long episode. So it’s not really a setting where you can give any sample complexity bounds. Asymptotic convergence is all you can do. And then another thing that came out of that was what we called, “A Formal Solution to the Grain of Truth Problem.” This is a collaboration with the Machine Intelligence Research Institute.

And the idea here is that one of the problems with the AIXI formal model is that it assumes that its environment is computable, but it is itself incomputable. You can’t really do multi-agent analysis with it. And so what we did was propose a formalism that is like a variant on AIXI that can be in its own environment class if we embed an agent or environment together with other AIXI-like agents. And then while they do that, they can still asymptotically learn to predict correctly what the agents will do and then converge to Nash equilibrium asymptotically.
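For readers who have not met Thompson sampling, the idea behind the AIXI variant mentioned above, here is a minimal sketch in a far simpler setting: a two-armed Bernoulli bandit rather than the general, partially observable setting Jan describes. The arm probabilities, the Beta(1, 1) priors, and all names are illustrative assumptions, not anything from the thesis.

```python
import random

def thompson_bandit(true_probs, steps, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors (toy setting)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [1] * n_arms    # Beta alpha parameters (prior pseudo-count of 1)
    losses = [1] * n_arms  # Beta beta parameters (prior pseudo-count of 1)
    pulls = [0] * n_arms
    for _ in range(steps):
        # Sample one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(wins[a], losses[a]) for a in range(n_arms)]
        # Act optimally with respect to the sampled hypothesis.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Bayesian update of the chosen arm's posterior.
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical arms: the second arm pays off far more often.
pulls = thompson_bandit([0.2, 0.8], steps=2000)
```

The agent repeatedly samples a hypothesis from its posterior and acts as if that hypothesis were true, so over time most pulls should go to the better arm.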

Lucas Perry: So the sense in which AIXI was a theoretical ideal was that the process by which it was able to learn or infer preferences was computationally intractable.

Jan Leike: The AIXI model basically just tries to answer the reinforcement learning question. So you’re given an environment and you’re given a reward signal, how do you optimize that? In a way, you’re using what we call the Solomonoff prior to predict the next observations that come from the environment, and then you essentially do an exhaustive tree search over all the possible action sequences that you could take and the possible consequences that you would predict and then take the best action in terms of returns. This is kind of similar to how AlphaGo uses Monte Carlo tree search to select the best actions. The reason why you can’t literally build AIXI is that this Solomonoff prior that I mentioned, it is basically the set of all possible Turing machines, which is countably infinite, and then you have to run all of them in parallel and take a weighted average over what they would predict.

If you’ve ever tried to run all computer programs in parallel, you’ll know that this is not going to go too well. I find AIXI helpful when thinking about AGI in terms of what could advanced machine learning or reinforcement learning agents look like. So they have some kind of learned model in which they do planning and select actions that are good in the long run. So I think in a way it tells us that if you believe that the reinforcement learning problem is the right problem to phrase an AGI in, then AIXI proves that there can be a solution to that problem. I think on a high level, having thought about this model is useful when thinking about where are we headed and what are potential milestones on the way to AGI. But at the same time, I think my mistake at the time was really getting lost in some of the technical details that really matter if you want to publish a paper on this stuff, but don’t transfer as much in the analogy.
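The recipe described above (a weighted average over models, plus an exhaustive search over action sequences) can be sketched at toy scale. This is not AIXI: instead of all Turing machines it assumes a hand-picked class of three deterministic models, and the "description lengths" used for the prior are invented for illustration. Every name below is hypothetical.

```python
def left_is_good(history, action):
    return (0, 1 if action == "left" else 0)   # (observation, reward)

def right_is_good(history, action):
    return (0, 1 if action == "right" else 0)

def switching_is_good(history, action):
    # Rewards choosing a different action than on the previous step.
    changed = bool(history) and history[-1][0] != action
    return (0, 1 if changed else 0)

MODELS = [left_is_good, right_is_good, switching_is_good]
# Assumed description lengths of 2, 3 and 5 bits: shorter models get more weight.
PRIOR = [2.0 ** -2, 2.0 ** -3, 2.0 ** -5]
ACTIONS = ("left", "right")

def expectimax(models, weights, history, horizon):
    """Exhaustive search over actions, averaging percepts under the weighted mixture."""
    if horizon == 0:
        return 0.0, None
    total = sum(weights)
    best_value, best_action = float("-inf"), None
    for action in ACTIONS:
        # Group models by the percept they predict; each group is one branch.
        branches = {}
        for model, weight in zip(models, weights):
            percept = model(history, action)
            branches.setdefault(percept, []).append((model, weight))
        value = 0.0
        for (obs, reward), group in branches.items():
            prob = sum(w for _, w in group) / total
            # Conditioning on the percept keeps only the models that predicted it.
            future, _ = expectimax([m for m, _ in group], [w for _, w in group],
                                   history + ((action, obs, reward),), horizon - 1)
            value += prob * (reward + future)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action

value, action = expectimax(MODELS, PRIOR, history=(), horizon=3)
```

Even here the search grows exponentially with the horizon, which hints at why the full construction over a countably infinite model class cannot be run.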

Lucas Perry: After you finished up your PhD with Hutter, you finished working on AIXI. Is this when you transitioned to DeepMind and made this transition from theoretical to empirical research?

Jan Leike: Yeah, that’s basically right. At the time when I started my PhD, I decided to work on theory because it wasn’t really clear to me what AGI would look like and how we’d build it. So I wanted to do something general and something more timeless. Then we saw a whole bunch of evidence that deep reinforcement learning is viable and you can make it work. The DQN Nature paper came out, and there was the AlphaGo event. And then it became pretty clear to me that deep reinforcement learning was going somewhere, and that’s something we should work on. At the time, my tool set was very theoretical and my comparative advantage was thinking about theory and using the tools that I learned and developed in my PhD.

And the problem was that deep reinforcement learning has very little theory behind it, and there’s very little theory on RL. And the little theory on RL that we have says that basically function approximation shouldn’t really work. So that means it’s really hard to actually gain traction on doing something theoretically. And at the same time, it was very clear that we could just take some agents that existed and we could just build something, and then we could make incremental progress on things that would actually help us make AGI safer.

Lucas Perry: Can you clarify how the theoretical foundations of deep reinforcement learning are weak? What does that mean? Does that mean that we have this thing and it works, but we’re not super sure about how it works? Or the theories about the mechanisms which constitute that functioning thing are weak? We can’t extrapolate out very far with them?

Jan Leike: Yeah, so basically there are two parts. So if you take deep neural networks, there are some results that tell you that depth is better than width. And if you increase capacity, you can represent any function and things like that. But basically the kind of thing that I would want to use is real sample complexity bounds that tell you if your network has X many parameters, how much training data do you need, how many batches do you need to train in order to actually converge? Can you converge asymptotically? None of these things are even true in theory. You can get examples where it doesn’t work. And of course we know that in practice because sometimes training is just unstable, but it doesn’t mean that you can’t tune it and make it work in practice.

On the RL side, there are a bunch of convergence results that people have given in tabular MDPs, Markov decision processes. In that setting, everything is really nice and you can give sample complexity bounds, or let’s say some bounds on how long learning will take. But as soon as you kind of go into a function approximation setting, all bets are off and there are very simple two-state MDPs you can draw where just simple linear function approximation completely breaks. And this is a problem that we haven’t really gotten a great handle on theoretically. And so going from linear function approximation to deep neural networks is just going to make everything so much harder.
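One well-known textbook-style illustration of this kind of breakdown (not necessarily the exact example Jan has in mind) uses two states whose values are approximated as a single weight times a feature, with features 1 and 2. All rewards are zero, so the true values are zero, yet repeatedly applying the TD(0) update to the transition from the first state to the second makes the weight grow without bound whenever the discount is above 0.5. The step size, discount, and names below are assumptions chosen for the sketch.

```python
def td_updates(w, gamma=0.9, alpha=0.1, steps=100):
    """Repeated TD(0) updates on the single transition s1 -> s2 with reward 0."""
    phi_s1, phi_s2 = 1.0, 2.0  # features; value estimate is V(s) = w * phi(s)
    history = [w]
    for _ in range(steps):
        # TD error: reward + discounted next-state estimate - current estimate.
        td_error = 0.0 + gamma * w * phi_s2 - w * phi_s1
        # Semi-gradient update: the gradient of V(s1) with respect to w is phi_s1.
        w = w + alpha * td_error * phi_s1
        history.append(w)
    return history

weights = td_updates(w=1.0)
```

Each update multiplies the weight by 1 + alpha * (2 * gamma - 1), which is 1.08 with these settings, so the estimate moves away from the true value of zero instead of toward it.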

Lucas Perry: Are there any other significant ways in which your thinking has changed as you transitioned from theoretical to empirical?

Jan Leike: In the absence of these theoretical tools, you have two options. Either you try to develop these tools, and that seems very hard and many smart people have tried for a long time. Or you just move on to different tools. And I think especially if you have systems that you can do experiments on, then having an empirical approach makes a lot of sense if you think that these systems actually can teach us something useful about the kind of systems that we are going to build in the future.

Lucas Perry: A lot of your thinking has been about understanding the theoretical foundations, like what AGI might even look like, and then transitioning to an empirical based approach that you see as efficacious for studying systems in the real world and bringing about safe AGI systems. So now that you’re in DeepMind and you’re in this intellectual journey that we’re covering, how is DeepMind and how is Jan approaching beneficial AI and AI alignment in this context?

Jan Leike: DeepMind is a big place, and there are a lot of different safety efforts across the organization. People are working on say robustness to adversarial inputs, fairness, verification of neural networks, interpretability and so on and so on. What I’m doing is I’m focusing on reward modeling as an approach to alignment.

Lucas Perry: So just taking a step back and still trying to get a bigger picture of DeepMind’s approach to beneficial AI and AI alignment. It’s being attacked from many different angles. So could you clarify what seems to be a portfolio approach? The AI alignment slash AI safety agendas that I’ve seen enumerate several different characteristics or areas of alignment and safety research that we need to get a handle on, and it seems like DeepMind is trying its best to hit on all of them.

Jan Leike: DeepMind’s approach to safety is quite like a portfolio. We don’t really know what will end up panning out. So we pursue a bunch of different approaches in parallel. So I’m on the technical AGI safety team that roughly consists of three subteams. There’s a team around incentive theory that tries to model on a high level what incentives agents could have in different situations and how we could understand them. Then there is an agent analysis team that is trying to take some of our state of the art agents and figure out what they are doing and how they’re making the decisions they make. And this can be both from a behavioral perspective and from actually looking inside the neural networks. And then finally there is the agent alignment team, which I’m leading, and that’s trying to figure out how to scale reward modeling. There’s also an ethics research team, and then there’s a policy team.

Lucas Perry: This is a good place then to pivot into how you see the AI alignment landscape today. You guys have this portfolio approach that you just explained. Given that and given all of these various efforts for attacking the problem of beneficial AI from different directions, how do you see the AI alignment landscape today? Is there any more insight that you can provide into that portfolio approach given that it is contextualized within many other different organizations who are also coming at the problem from specific angles? So like MIRI is working on agent foundations and does theoretical research. CHAI has its own things that it’s interested in, like cooperative inverse reinforcement learning and inverse reinforcement learning. OpenAI is also doing its own thing that I’m less clear on, which may have to do with factored evaluation. Ought as well is working on stuff. So could you expand a little bit here on that?

Jan Leike: Our direction for getting a solution to the alignment problem revolves around recursive reward modeling. I think on a high level, basically the way I’m thinking about this is that if you’re working on alignment, you really want to be part of the project that builds AGI. Be there and have impact while that happens. In order to do that, you kind of need to be a part of the action. So you really have to understand the tech on a detailed level. And I don’t think that safety is an add-on that you think of later or then add at a later stage in the process. And I don’t think we can just do some theory work that informs algorithmic decisions that make everything go well. I think we need something that is a lot more integrated with the project that actually builds AGI. So in particular, the way we are currently thinking about it is that it seems like the part that actually gives you alignment is not some algorithmic change but more something like an overall training procedure on how to combine your machine learning components into a big system.

So in terms of how I pick my research directions, I am most excited about approaches that can scale to AGI and beyond. Another thing that I think is really important is that I think people will just want to build agents, and we can’t only constrain ourselves to building say question answering systems. There’s basically a lot of real world problems that we want to solve with AGI, and these are fundamentally sequential decision problems, right? So if I look something up online and then I write an email, there’s a sequence of decisions I make, which websites I access and which links I click on. And then there’s a sequence of decisions of which characters are input in the email. And if you phrase the problem as, “I want to be able to do most things that humans can do with a mouse and a keyboard on a computer,” then that’s a very clearly scoped reinforcement learning problem. Although the reward function problem is not very clear.

Lucas Perry: So you’re articulating that DeepMind, you would explain that even given all these different approaches you guys have on all these different safety teams, the way that you personally pick your research direction is that you’re excited about things which safely scale to AGI and superintelligence and beyond. And that recursive reward modeling is one of these things.

Jan Leike: Yeah. So the problem that we’re trying to solve is the agent alignment problem. And the agent alignment problem is the question of how can we create agents that act in accordance with the user’s intentions. We are kind of inherently focused around agents. But also, we’re trying to figure out how to get them to do what we want. So in terms of reinforcement learning, what we’re trying to do is learn a reward function that captures the user’s intentions and that we can optimize with RL.

Lucas Perry: So let’s get into your work here on recursive reward modeling. This is something that you’re personally working on. Let’s just start off with what is recursive reward modeling?

Jan Leike: I’m going to start off with explaining what reward modeling is. What we want to do is we want to apply reinforcement learning to the real world. And one of the fundamental problems of doing that is that the real world doesn’t have built in reward functions. And so basically what we need is a reward function that captures the user’s intentions. Let me give you an example for the core motivation of why we want to do reward modeling, from a blog post that OpenAI made a while back: The Boat Race Problem, where they were training a reinforcement learning agent to race a boat around the track and complete the race as fast as possible, but what actually ended up happening is that the boat was getting stuck in a small lagoon and then circling around there. And the reason for that is the RL agent was trying to maximize the number of points that it gets.

And the way you get points in this game is by moving over these buoys that are along the track. And so if you go to the lagoon, there’s these buoys that keep respawning, and then so you can get a lot of points without actually completing the race. This is the kind of behavior that we don’t want out of our AI systems. But then on the other hand, there’s things we wouldn’t think of but we want out of our AI systems. And I think a good example is AlphaGo’s famous Move 37. In its Go game against Lee Sedol, Move 37 was this brilliant move that AlphaGo made that was a move that no human would have made, but it actually ended up turning around the entire game in AlphaGo’s favor. And this is how Lee Sedol ended up losing the game. The commonality between both of these examples is some AI system doing something that a human wouldn’t do.

In one case, that’s something that we want: Move 37. In the other case, it’s something that we don’t want, this is the circling boat. I think the crucial difference here is in what is the goal of the task. In the Go example, the goal was to win the game of Go. Whereas in the boat race example, the goal was to go around the track and complete the race, and the agent clearly wasn’t accomplishing that goal. So that’s why we want to be able to communicate goals to our agents. So we need these goals or these reward functions that our agents learn to be aligned with the user’s intentions. If we do it this way, we also get the possibility that our systems actually outperform humans and actually do something that would be better than what the human would have done. And this is something that you, for example, couldn’t get out of imitation learning or inverse reinforcement learning.
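The gap between the points the boat agent maximizes and the goal the designers had in mind can be made concrete with a toy calculation. Everything here is invented for illustration (track length, buoy spacing, point values, lap length) and does not reflect the actual game; the sketch only shows how a proxy score can rank a goal-missing behavior above a goal-achieving one.

```python
def run_episode(policy, steps=100, track_length=20, buoy_points=10):
    """Return (score, finished) for a hypothetical 'race' or 'circle' policy."""
    position, score, finished = 0, 0, False
    for t in range(steps):
        if policy == "race":
            if position < track_length:
                position += 1
                # Assumed: one buoy every 5 cells along the track, collected once.
                if position % 5 == 0:
                    score += buoy_points
            finished = position >= track_length
        elif policy == "circle":
            # Assumed: a 4-step lagoon loop that passes one respawning buoy per lap.
            if (t + 1) % 4 == 0:
                score += buoy_points
    return score, finished

race_score, race_finished = run_episode("race")
circle_score, circle_finished = run_episode("circle")
```

Under these made-up numbers the circling policy scores 250 points without finishing, while the racing policy finishes with only 40, so an agent trained on points alone has no reason to complete the race.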

The central claim that reward modeling is revolving around is that evaluation is easier than behavior. So I can, for example, look at a video of an agent and tell you whether or not that agent is doing a backflip, even though I couldn’t do a backflip myself. So in this case, it’s kind of harder to actually do the task than to evaluate it. And that kind of puts the human in the leveraged position because the human only has to be able to give feedback on the behavior rather than actually being able to do it. So we’ve been building prototypes for reward modeling for a number of years now. We want to actually get hands on experience with these systems and see examples of where they fail and how we can fix them. One particular example seen again and again is if you don’t provide online feedback to the agent, something that can happen is that the agent finds loopholes in the reward model.

It finds states where the reward model thinks it’s a high reward state, but actually it isn’t. So one example is in the Atari game Hero, where you can get points for shooting laser beams at spiders. And so what the agent figures out is that if it stands really close to the spider and starts shooting but then turns around and the shot goes the other way, then the reward model will think the shot is about to hit the spider so it should give you a reward because that gives you points. But actually the agent doesn’t end up killing the spider, and so it can just do the same thing again and again and get reward for it.

And so it’s kind of found this exploit in the reward model. We know that online training, training with an actual human in the loop who keeps giving feedback, can get you around this problem. And the reason is that whenever the agent gets stuck in these kinds of loopholes, a human can just look at the agent’s behavior and give some additional feedback you can then use to update the reward model. And the reward model in turn can then teach the agent that this is actually not a high reward state. So what about recursive reward modeling? One question that we have when you’re trying to think about how to scale reward modeling is that eventually you want to tackle domains where it’s too hard for the human to really figure out what’s going on because the core problem is very complex or the human is not an expert in the domain.
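The loophole and its online fix can be sketched with a deliberately tiny reward model. The assumptions are heavy: states are reduced to two hand-made features (shot fired near the spider, spider killed), the human gives scalar labels, and the reward model is linear and fit by plain stochastic gradient descent. Real systems learn from feedback on agent behavior with neural networks, so this only illustrates the mechanism.

```python
def predict(w, x):
    """Linear reward model: predicted reward is the dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit(w, labelled, lr=0.1, epochs=500):
    """Fit the reward model to (features, human_label) pairs by squared-error SGD."""
    w = list(w)
    for _ in range(epochs):
        for x, y in labelled:
            err = y - predict(w, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical features: (shot_fired_near_spider, spider_killed)
IDLE, KILL, FAKE = (0.0, 0.0), (1.0, 1.0), (1.0, 0.0)

# Offline feedback never shows a shot without a kill, so the model
# cannot tell which of the two features the human actually cares about.
offline = [(IDLE, 0.0), (KILL, 1.0)]
w_offline = fit([0.0, 0.0], offline)
exploit_before = predict(w_offline, FAKE)  # positive reward for the fake shot

# Online feedback: the human sees the exploit and labels it as worthless.
online = offline + [(FAKE, 0.0)]
w_online = fit(w_offline, online)
exploit_after = predict(w_online, FAKE)    # the loophole is closed
```

Before the extra label the model credits the fake shot with roughly half the reward of a real kill; after it, the predicted reward for the exploit drops to about zero while the real kill keeps its value.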

Right now, this is basically only in the idea stage, but the basic idea is to apply reward modeling recursively. You have this evaluation task that is too complex for the human to do, and you’re training a bunch of evaluation helper agents that will help you do the evaluation of the main agent that you’re training. These agents then in turn will be trained with reward modeling or recursive reward modeling.

Let’s say you want to train the agent that designs a computer chip, and so it does a bunch of work, and then it outputs this giant schema for what the chip could look like. Now that schema is so complex and so complicated that, as a human, even if you were an expert in chip design, you wouldn’t be able to understand all of that, but you can figure out what aspects of the chip you care about, right? Like what is the number of, say, FLOPS it can do per second or what are the thermal properties.

For each of these aspects that you care about, you will spin up another agent, you teach another agent how to do that subtask, and then you would use the output of that agent, could be, let’s say, a document that details the thermal properties or a benchmark result on how this chip would do if we actually built it. And then you can look at all of these outputs of the evaluation helper agents, and then you compose those into feedback for the actual agent that you’re training.

The idea is here that basically the tasks that the evaluation helper agents have to do are easier problems in a more narrow domain because, A, they only have to do one sub-aspect of the evaluation, and also you’re relying on the fact that evaluation is easier than behavior. Since you have this easier task, you would hope that if you can solve easier tasks, then you can use the solutions or the agents that you train on these easier tasks to kind of scaffold your way up to solving harder and harder tasks. You could use this to push out the general scope of tasks that you can tackle with your AI systems.

The hope, or at least what I would claim, is that this is a general scheme that, in principle, can capture a lot of human economic activity. One really crucial aspect is that you’re able to compose a training signal for the agents that are trying to solve the task. You have to ground this out at some level: if you picture this big tree or directed acyclic graph of agents that help you train other agents and so on, there has to be a bottom level where the human can just look at what’s going on and give feedback directly, and then you use the feedback on the lowest-level tasks to build up more and more complex training signals for more and more complex agents that are solving harder and harder tasks.

Lucas Perry: Can you clarify how the bootstrapping here happens? Like the very bottom level, how you’re first able to train the agents dedicated to sub-questions of the larger question?

Jan Leike: If you give me a new task to solve with recursive reward modeling, the way I would proceed is, assuming that we solved all of these technical problems, let’s say we can train agents with reward modeling on arbitrary tasks, then the way I would solve it is I would first think about what do you care about in this task? How do I measure its success? What are the different aspects of success that I care about? These are going to be my evaluation criteria.

For each of my evaluation criteria, I’m going to define a new subtask, and the subtasks will be “help me evaluate this criterion.” In the computer chip example, that was FLOPs per second, and so on, and so on. Then I proceed recursively. For each of the subtasks that I just identified, I start again by saying, “Okay, so now I have this agent, it’s supposed to get a computer chip design, and a bunch of associated documentation, say, and now it has to produce this document that outlines the thermal properties of this chip.”

What I would do, again, is I’d be like, “Okay, this is a pretty complex task, so let’s think about how to break it down. How would I evaluate this document?” So I proceed to do this until I arrive at a task where I can just say, “Okay, I basically know how to do this task, or I know how to evaluate this task.” And then I can start spinning up my agents, right? And then I train the agents on those tasks, and then once I’ve trained all of my agents on the leaf level tasks, and I’m happy with those, I then proceed training the next level higher.
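As an illustration (not part of the conversation), the decomposition and bottom-up training order described above can be sketched as a tree traversal. The class `EvalTask`, the function `training_order`, and the task names are hypothetical; the sketch only captures the ordering constraint that helper agents are trained before the agent they help evaluate, and it does no actual training.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvalTask:
    """A task in the evaluation tree (hypothetical structure for illustration)."""
    name: str
    subtasks: list[EvalTask] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf stands for a task the human can evaluate directly.
        return not self.subtasks


def training_order(task: EvalTask) -> list[str]:
    """Return task names in post-order, so every helper precedes the task it evaluates."""
    order: list[str] = []
    for sub in task.subtasks:
        order.extend(training_order(sub))
    order.append(task.name)
    return order


# Hypothetical decomposition of the chip design example.
chip = EvalTask("design_chip", [
    EvalTask("report_thermal_properties", [
        EvalTask("check_thermal_document"),
    ]),
    EvalTask("benchmark_flops"),
])

print(training_order(chip))
```

A post-order traversal is one simple way to respect the "leaves first, then the next level higher" procedure; training strictly level by level would be an equally valid reading.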

Lucas Perry: And the evaluation criteria, or aspects, are an expression of your reward function, right?

Jan Leike: The reward function will end up capturing all of that. Let’s say we have solved all of the evaluation subtasks, right? We can use the evaluation assistant agents to help us evaluate the overall performance of the main agent that you were training. Of course, whenever this agent does something, you don’t want to have to evaluate all of its behavior, so what you do is you essentially distill this whole tree of different evaluation helper agents that you build, with lots of little humans in the loop in that tree, into one model that will predict what that whole tree of agents and humans will say. That will basically be the reward that the main agent is being trained on.

Lucas Perry: That’s pretty beautiful. I mean, the structure is elegant.

Jan Leike: Thanks.

Lucas Perry: I still don’t fully understand it obviously, but it’s beginning to dawn upon my non-computer science consciousness.

Jan Leike: Well, current research on this stuff revolves around two questions, and I think these are the main questions that we need to think about when trying to figure out whether or not a scheme like this can work. The first question is how well does the one-step set up work, only reward modeling, no recursion? If one-step reward modeling doesn’t work, you can’t hope to ever build a whole tree out of that component, so clearly that has to be true.

And then the second question is how do errors accumulate if we build a system? Essentially what you’re doing is you’re training a bunch of machine learning components to help you build a training signal for other machine learning components. Of course none of them are going to be perfect. Even if my ability to do machine learning is infinitely great, which of course it isn’t, at the end of the day, they’re still being trained by humans, and humans make mistakes every once in a while.

If my bottom level has a certain, let’s say, reward accuracy, the next level up that I use those to train is going to have a lower accuracy, or potentially have a lower accuracy, because their training signal is slightly off. Now, if you keep doing this and building a more and more complex system, how do the errors in the system accumulate? This is a question we haven’t really done much work on so far, and this is certainly something we need to do more on in the future.
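As an illustration (not part of the conversation), here is a deliberately pessimistic toy model of the error accumulation question. It assumes, purely for the sake of the sketch, that each level of the tree independently preserves a correct training signal with the same probability and that an error at any level is never corrected further up. The function name `compounded_accuracy` and these assumptions are not from the conversation, which treats the real behavior as an open research question.

```python
def compounded_accuracy(per_level_accuracy: float, depth: int) -> float:
    """Accuracy after `depth` levels if each level independently keeps the signal correct."""
    if not 0.0 <= per_level_accuracy <= 1.0:
        raise ValueError("per_level_accuracy must be between 0 and 1")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    # Under the independence assumption, the per-level accuracies multiply.
    return per_level_accuracy ** depth


for depth in (1, 3, 5):
    print(depth, round(compounded_accuracy(0.95, depth), 4))
```

Under these assumptions accuracy decays geometrically with depth; whether real systems degrade this quickly, more slowly, or not at all is exactly what the experiments mentioned above would need to establish.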

Lucas Perry: What sorts of experiments can we do with recursive reward modeling today, and why is it hard?

Jan Leike: The reason why it is difficult to find such tasks is because essentially you need tasks that have two properties. The first property is they have to be difficult enough so that you can’t evaluate them directly, right? Otherwise, you wouldn’t need the recursive part in recursive reward modeling. And then secondly, they have to be easy enough that we can actually hope to be able to do them with today’s systems.

In a way, it’s two very contradictory objectives, so it’s kind of hard to find something in the intersection. We can study a lot of the crucial parts of this independently of actually being able to build a prototype of the recursive part of recursive reward modeling.

Lucas Perry: I guess I’m just trying to also get a sense of when you think that recursive reward modeling might become feasible.

Jan Leike: One good point would be where we could get to the point where we’ve done like a whole lot of tasks with reward modeling, and we’re basically running out of tasks that we can do directly. Or an opportunity comes up when we find a task that we actually think we can do and that requires a decomposition. There’s ways in which you could try to do this now by artificially limiting yourself. You could, for example, solve chess with recursive reward modeling by pretending that there isn’t a procedure or reward function for chess.

If you rely on the human to look at the board and tell you whether or not it’s checkmate, if you’re like a pro chess player, you could probably do that quite well. But if you’re an amateur or a non-expert, you don’t really know that much about chess other than the rules, it’s kind of hard for a human to do that quickly.

What you could do is you could train evaluation helper agents that give you useful information about the chessboard, where, let’s say, they color certain tiles on the board that are currently under threat. And then using that information, you can make the assessment of whether this is a checkmate situation much more easily.

While we could do this kind of setup and use recursive reward modeling, and you’ll maybe learn something, at the same time, it’s not an ideal test bed because it’s just not going to look impressive as a solution because we already know how to use machine learning to play chess, so we wouldn’t really add anything in terms of value of tasks that we can do now that we couldn’t do otherwise.

Lucas Perry: But wouldn’t it show you that recursive reward modeling works?

Jan Leike: You get one data point on this particular domain, and so the question is what data points would you learn about recursive reward modeling that you wouldn’t learn in other ways? You could treat this as like two different individual tasks that you just solve with reward modeling. One task is coloring the tiles of the board, and one task is actually playing chess. We know we can do the latter, because we’ve done that.

What would be interesting about this experiment would be that you kind of learn how to cope with the errors in the system. Every once in a while, like the human will label a state incorrectly, and so you would learn how well you can still train even though your training signal is slightly off. I think we can also investigate that without actually having to literally build this recursive setup. I think there’s easier experiments we could do.

Lucas Perry: Do you want to offer any final words of clarification or just something succinct about how this serves AI safety and alignment?

Jan Leike: One way to think about safety is this specification, robustness, assurance framework. What this is, is basically a very high-level way of carving the space of safety problems into different categories. These are three categories. The first category is specification. How do you get the system to do what you want? Basically, what we usually mean when we talk about alignment. The second category is robustness. How can you make your system robust to various perturbations, such as adversarial inputs or distributional changes? The third category is assurance. How can you get better calibrated beliefs about how safe, in the sense of both robustness and specification, your system actually is?

Usually in the assurance category, we talk about various tools for understanding and monitoring agents, right? This is stuff about testing, interpretability, verification, and so on, and so on. The stuff that I am primarily working on is in the specification category, where we’re basically trying to figure out how to get our agents to pursue the goals that we want them to pursue. The ambition of recursive reward modeling is to solve all of the specification problems. Some of the problems that we worry about are, let’s say, off switch problems, where your agent might meddle with its off switch, and you just don’t want it to do that. Another problem, let’s say, what about side effects? What about reward tampering? There’s a whole class of these kinds of problems, and instead of trying to solve them each individually, we try to solve the whole class of problems at once.

Lucas Perry: Yeah, it’s an ambitious project. The success of figuring out the specification problem supervenes upon other things, but at the same time, if those other things are figured out, the solution to this enables, as you’re saying, a system which safely scales to super intelligence and beyond, and retains alignment, right?

Jan Leike: That’s the claim.

Lucas Perry: Its position then in AI safety and alignment is pretty clear to me. I’m not sure if you have other points you want to add to that though?

Jan Leike: Nope, I think that was it.

Lucas Perry: Okay. We’re just going to hit on a few more questions here then on recursive reward modeling. Recursive reward modeling seems to require some very competent agent or person to break down an evaluation into sub-questions or sub-evaluations. Is the creation of that structure actually scalable?

Jan Leike: Yeah, this is a really good question. I would picture these decompositions of the evaluation task to be essentially hardcoded, so you have a human expert that knows something about the task, and they can tell you what they care about in the task. The way I picture this is you could probably do a lot of the tasks in recursion depth of three, or five, or something like that, but eventually they’re so out of the range of what the human can do that they don’t even know how to break down the evaluation of that task.

Then this decomposition problem is a problem that you want to tackle with recursive reward modeling, where basically you train an agent to propose decompositions, and then you have an evaluation where the human evaluates whether or not that was a good decomposition. This is very, very far future stuff; at that point you’ve already worked with recursive reward modeling for a while, and you have done a bunch of decompositions, and so I don’t expect this to be something that we will be addressing anytime soon, but it’s certainly something that is within the scope of what the stuff should be able to do.

Recursive reward modeling is a super general method that you basically want to be able to apply to any kind of task that humans typically do, and proposing decompositions of the evaluation is one of them.

Lucas Perry: Are there pessimisms that you have about recursive reward modeling? How might recursive reward modeling fall short? And I think the three areas that we want to hit here are robustness, mesa-optimizers, and tasks that are difficult to ground.

Jan Leike: As I said earlier, basically you’re trying to solve the whole class of specification problems, where you still need robustness and assurance. In particular, there’s what we call the reward to result gap, where you might have the right reward function, and then you still need to find an agent that actually is good at optimizing that reward function. That’s an obvious problem, and there’s a lot of people just trying to make RL agents perform better. Mesa-optimizers I think are, in general, an open question. There’s still a lot of uncertainty around how they would come up, and what exactly is going on there. I think one thing that would be really cool is to actually have a demo of how they could come up in a training procedure in a way that people wouldn’t expect. I think that would be pretty valuable.

And then thirdly, recursive reward modeling is probably not very well suited for tasks that are really difficult to ground. Moral philosophy might be in that category. The way I understand this is that moral philosophy tries to tackle questions on which it is really difficult to get hard facts and empirical evidence. These human intuitions might be really difficult to ground, and to actually teach to our agents in a way that generalizes. If you don’t have this grounding, then I don’t know how you could build a training signal for the higher level questions that might evolve from that.

In other words, to make this concrete, let’s say I want to train an agent to write a really good book in moral philosophy, and now of course I can evaluate that book based on how novel it is relative to what humans have written, or the general literature. How interesting is it? Does it make intuitive sense? But then in order to actually make progress on moral philosophy, I need to update my values somehow in a way that is actually the right direction, and I don’t really know what would be a good way to evaluate that.

Lucas Perry: I think then it would be a good spot here for us to circle back around to the alignment landscape. A lot of what you’ve been saying here has rung bells in my head about other efforts, like with iterated distillation and amplification, and debate with Geoffrey Irving, and factored evaluation at Ought. There’s these categories of things which are supposed to be general solutions to making systems which safely scale to aligned super intelligence and beyond. This also fits in that vein of the alignment landscape, right?

Jan Leike: Yeah. I think that’s right. In some ways, the stuff that you mentioned, like projects that people are pursuing at OpenAI, and at Ought, shares a lot of structure with what recursive reward modeling is trying to do, where you try to compose training signals for tasks that are too hard for humans. I think one of the big differences in how we think about this problem is that we want to figure out how to train the agents that do stuff in the world, and I think a lot of the discussion at OpenAI and Ought kind of centers around building question answering systems and fine tuning language models, where the ambition is to get them to do reasoning tasks that are very difficult for humans to do directly, and then you do that by decomposing them into easier reasoning tasks. You could say it’s one scalable alignment technique out of several that are being proposed, and we have a special focus on agents. I think agents are great. I think people will build agents that do stuff in the world, take sequential actions, look at videos.

Lucas Perry: What research directions or projects are you most excited about, just in general?

Jan Leike: In general, the safety community as a whole should have a portfolio approach where we just try to pursue a bunch of paths in parallel, essentially as many as can be pursued in parallel. I personally am most excited about approaches that can work with existing deep learning and scale to AGI and beyond. There could be many ways in which things pan out in the future, but right now there’s an enormous amount of resources being put towards scaling deep learning. That’s something that we should take seriously and factor into the way we think about solutions to the problem.

Lucas Perry: This also reflects your support and insistence on empirical practice as being beneficial, and the importance of being in and amongst the pragmatic, empirical, tangible, real-world, present-day projects, which are likely to bring about AGI, such that one can have an impact. What do you think is missing or underserved in AI alignment and AI safety? If you were given, say, like $1 billion, how would you invest it here?

Jan Leike: That would be the same answer I just gave you before. Basically, I think you want to have a portfolio, so you invest that money to like a whole bunch of directions. I think I would invest more than many other people in the community towards working empirically with, let’s say, today’s deep RL systems, build prototypes of aligned AGI, and then do experiments on them. I’ll be excited to see more of that type of work. Those might be like my personal biases speaking too because that’s like why I’m working on this direction.

Lucas Perry: Yeah. I think that’s deeply informative though for other people who might be trying to find their footing in determining how they ought to approach this problem. So how likely do you think it is that deep reinforcement learning scales up to AGI? What are the strongest considerations for and against that?

Jan Leike: I don’t think anyone really knows whether that’s the case or not. Deep learning certainly has a pretty convincing track record of fitting arbitrary functions. We can basically fit a function that knows how to play StarCraft. That’s a pretty complicated function. I think, well, whatever the answer is to this question, in safety what we should be doing is we should be conservative about this. Take the possibility that deep RL could scale to AGI very seriously, and plan for that possibility.

Lucas Perry: How modular do you think AGI will be and what makes you optimistic about having clearly defined components which do planning, reward modeling, or anything else?

Jan Leike: There’s certainly a lot of advantages if you can build a system of components that you understand really well. The way I’m currently picturing trying to build, say, a prototype for aligned AGI would be somewhat modular. The trend in deep learning is always towards training end-to-end. Meaning that you just have your raw data coming in and the raw predictions coming out and you just train some giant model that does all of that. That certainly gives you performance benefits on some tasks because whatever structure the model ends up learning can just be better than what humans’ perceptions would recommend.

How it actually ends up working out is kind of unclear at the moment. I think in terms of what we’d like for safety is that if you have a modular system, it’s going to make it easier to really understand what’s going on there because you can understand the components and you can understand how they’re working together, so it helps you break down the problem of doing assurance in the system, so that’s certainly a path that we would like to work out.

Lucas Perry: And is inner alignment here a problem that is relevant to both the modular components and then how the modular components are interrelated within the system?

Jan Leike: Yeah, I think you should definitely think about what the incentives are and what the training signals are of all of the components that you’re using to build a system.

Lucas Perry: As we approach AGI, what efforts are going on right now at DeepMind for AI safety benchmarking?

Jan Leike: We’ve spent some time thinking about AI safety benchmarks. We made a few little environments that are called gridworlds that are basically just kind of like chess board tiles where your agent moves around, and those are I think useful to showcase some of the problems. But really I think there’s a lot of appetite right now for building benchmark environments that let you test your agent on different properties. For example, OpenAI just recently released a collection of environments for safe exploration that require you to train in the presence of side constraints. But there’s also a lot of other properties that you could actually build tractable benchmarks for today.

So another example would be adversarial inputs, and there’s this generalized adversarial examples challenge. You could also build a benchmark for distributional shift, which in some ways you already do in machine learning a lot, where you do a train and test split, but usually these are from the same distribution. There’s various transfer learning research going on. I don’t think there are really established benchmarks for those. This is certainly something that could be done.
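As an illustration (not part of the conversation), the gap between an ordinary train and test split and a distributional shift test can be sketched with a one-dimensional toy problem. The data generator, the threshold classifier, and the size of the shift are all hypothetical choices made for this sketch; it is not an established benchmark.

```python
import random


def sample(rng, n, shift=0.0):
    """Draw n labeled points: class 0 centered at 0, class 1 centered at 4, plus an input shift."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(4.0 * label, 1.0) + shift
        data.append((x, label))
    return data


def fit_threshold(train):
    """Fit a trivial classifier: the midpoint between the two class means."""
    xs0 = [x for x, label in train if label == 0]
    xs1 = [x for x, label in train if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def accuracy(threshold, data):
    """Fraction of points whose side of the threshold matches their label."""
    correct = sum((x > threshold) == bool(label) for x, label in data)
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train = sample(rng, 2000)
test_same = sample(rng, 2000)                # usual held-out split, same distribution
test_shifted = sample(rng, 2000, shift=3.0)  # inputs moved away from the training data

threshold = fit_threshold(train)
in_dist_acc = accuracy(threshold, test_same)
shifted_acc = accuracy(threshold, test_shifted)
print(in_dist_acc, shifted_acc)
```

The held-out split from the same distribution says little about the shifted case, which is why a dedicated benchmark for distributional shift would measure something the usual split does not.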

There’s also problems that we worry about in longterm safety that I think would be kind of hard to really do good benchmarks on right now. Here I’m thinking of things like the off-switch problems, reward gaming, where you actually have an agent that can modify its own input rewards. The problem here is really you need very complex environments that are difficult to build and learn with current systems.

But I think overall this is something that would be very useful for the community to pursue, because the history of recent machine learning progress has always been that if you make a benchmark, people will start improving on the benchmark. The benchmark starts driving progress, and we’ve seen this with the ImageNet benchmark. We’ve seen that with the Atari benchmark, just to name two examples. And so if you had a safety benchmark, you would kind of incentivize people to make safety progress. Then if it’s an established benchmark, you can also publish on this. Then longer term, once you’ve had success with a bunch of benchmarks or they’ve been established and accepted, they could also become industry norms.

Lucas Perry: I’m just trying to understand how benchmarks in general, whether they’re safety benchmarks or not, exist in the international and national communities of computer science. Are Chinese computer scientists going to care about the DeepMind safety benchmarks? Are they something that necessarily are incorporated?

Jan Leike: Why do you think Chinese researchers care about the ImageNet benchmark?

Lucas Perry: Well, I don’t really know anything about the ImageNet benchmark.

Jan Leike: Oh, so ImageNet is this big collection of labeled images that a lot of people train image classifiers on. So these are like pictures of various breeds of dogs and cats and so on. What people are doing, or at least were doing for a while, was training larger and larger vision models on ImageNet, and then you can measure your test accuracy on ImageNet, and that’s a very tangible benchmark of how well you can do computer vision with your machine learning models.

Lucas Perry: So when these benchmarks are created, they’re just published openly?

Jan Leike: Yeah, you can just download ImageNet. You can just get started training a model on ImageNet in like half an hour on your computer.

Lucas Perry: So the safety benchmarks are just published openly. They can just be easily accessed, and they are public and open methods by which systems can be benchmarked to the degree to which they’re aligned and safe?

Jan Leike: Yeah. I think in general people underestimate how difficult environment design is. I think in order for a safety benchmark to get established, it actually has to be done really well. But if you do it really well, you can get a whole bunch of people interested because it becomes clear that this is something that is hard to do and methods can’t … but it’s also something where, let’s say if you made progress on it, you could write a paper or you could get employed at a company because you did something that people agreed was hard to do. At the same time, it has a result that is very easily measurable.

Lucas Perry: Okay. To finish up on this point, I’m just interested in if you could give a final summary on your feelings and interests in AI safety benchmarking. Besides the efforts that are going on right now, what are you hoping for?

Jan Leike: I think in summary I would be pretty excited about seeing more safety benchmarks that actually measure some of the things that we care about if they’re done well and if they really pay attention to a lot of detail, because I think that can drive a lot of progress on these problems. It’s like the same story as with reward modeling, right? Because then it becomes easy to evaluate progress and it becomes easy to evaluate what people are doing and that makes it easy for people to do stuff and then see whether or not whatever they’re doing is helpful.

Lucas Perry: So there appears to be cultural and intellectual differences between the AI alignment and AI safety communities and the mainstream AI community, people who are just probably interested in deep learning and ML.

Jan Leike: Yeah. So historically the machine learning community and the long term safety community have been somewhat disjoint.

Lucas Perry: So given this disjointedness, what would you like the mainstream ML and AI community to do or to think differently?

Jan Leike: The mainstream ML community doesn’t think enough about how whatever they are building will actually end up being deployed in practice, and I think people are starting to realize that they can’t really do RL in the real world if they don’t do reward modeling, and I think it’s most obvious to robotics people trying to get robots to do stuff in the real world all the time. So I think reward modeling will become a lot more popular. We’ve already seen that in the past two years.

Lucas Perry: Some of what you’re saying is reminding me of Stuart Russell’s new book, Human Compatible. I’m curious to know if you have any thoughts on that and what he said there and how that relates to this.

Jan Leike: Yes. Stuart also has been a proponent of this for a long time. In a way, he has been one of the few computer science professors who are really engaging with some of these longer term AI questions, in particular around safety. I don’t know why there aren’t more people saying what he’s saying.

Lucas Perry: It seems like the difference and disjointedness between the mainstream ML and AI community and the AI safety and AI alignment community is not just that one group is thinking longterm and the other is not. It’s a whole different perspective and understanding about what it means for something to be beneficial and what it takes for something to be beneficial. I don’t think you need to think about the future to understand the importance of recursive reward modeling or the kinds of shifts that Stuart Russell is arguing for, given that the systems being created today are already creating plenty of problems. We’ve enumerated those many times here. That seems to me to be because the systems are clearly not fully capturing what human beings really want. I’m just trying to understand this better, and also what you think the alignment and AI safety community should do or think differently to address this difference.

Jan Leike: In the longterm safety community in particular, initially I think a lot of the arguments people made were very high level and almost philosophical. There has been a little bit of a shift towards concrete mathematical, but also at the same time very abstract, research, and then towards empirical research. I think this is kind of a natural transition from one mode of operation to something more concrete, but there are still some parts of the safety community in the first phases.

I think there’s a failure mode here where people just spend a lot of time thinking about what would be the optimal way of addressing a certain problem before they actually go out and do something, and I think an approach that I tend to favor is thinking about this problem for a bit and then doing some stuff and then iterating and thinking some more. That way you get some concrete feedback on whether or not you’re actually making progress.

I think another thing that I would love to see the community do more of is clear explanation. I think there’s not enough appreciation for clear explanations, and there’s a tendency that people write a lot of vague blog posts, and that’s difficult to critique and to build on. Where we really have to move as a community is toward more concrete technical stuff where you can clearly point at parts of it and be like, “This makes sense. This doesn’t make sense. Here they have very likely made a mistake.” That’s something we can actually build on and make progress with.

In general, I think this is the sort of community that attracts a lot of thinking from first principles, and there’s a lot of power in that approach. If you’re not bound by what other people think and what other people have tried, then you can really discover truly novel directions and truly novel approaches and ideas, but at the same time I think there’s also a danger of overusing this kind of technique, because I think it’s right to also connect what you’re doing with the literature and what everyone else is doing. Otherwise, you will just keep reinventing the wheel on some kind of potential solution to safety.

Lucas Perry: Do you have any suggestions for the AI safety and alignment community regarding also alleviating or remedying this cultural and intellectual difference between what they’re working on and what the mainstream ML and AI communities are doing and working on such that it shifts their mindset and work to increase the chances that more people are aware of what is required to create beneficial AI systems?

Jan Leike: Something that would be helpful for this bridge would be if the safety community as a whole, let’s say, spends more time engaging with the machine learning literature, the machine learning lingo and jargon, and tries to phrase the safety ideas and the research in those terms and write it up in a paper that can be published in NeurIPS rather than something that is a blog post. The former is just a format that people are much more likely to engage with.

This is not to say that I don’t like blog posts. Blogs are great for getting some of the ideas across. We also provide blog posts about our safety research at DeepMind, but if you really want to dive into the technical details and you want to get the machine learning community to really engage with the details of your work, then writing a technical paper is just the best way to do that.

Lucas Perry: What do you think might be the most exciting piece of empirical safety work which you can realistically imagine seeing within the next five years?

Jan Leike: We’ve done a lot of experiments with reward modeling, and I personally have been surprised how far we could scale it. It’s been able to tackle every problem that we’ve been throwing at it. So right now we’re training agents to follow natural language instructions in these 3D environments that we’re building here at DeepMind. These are a lot harder problems than, say, Atari games, and reward modeling still is able to tackle them just fine.

One idea of what that prototype could look like is a model-based reinforcement learning agent, where you learn a dynamics model, then train a reward model from human feedback, and then the reinforcement learning agent uses the dynamics model and the reward model to do search at training and at test time. So you can actually deploy it in the environment and it can just learn to adapt its plans quickly. Then we could use that to do a whole bunch of experiments that we would want that system to do. You know, like solve off-switch problems or reward tampering problems or side-effect problems, and so on. So I think that’d be really exciting, and I think that’s well within the kind of system that we could build in the near future.
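The agent described above can be sketched in a few lines. This is only a toy illustration under loud assumptions, not DeepMind’s system: the "dynamics model" is a hand-written function standing in for a learned one, the "reward model" simply averages made-up human ratings per state, and all names (`dynamics_model`, `fit_reward_model`, `plan`) are hypothetical.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical action set: step left, stay, step right


def dynamics_model(state, action):
    """Stand-in for a learned dynamics model: predicts the next state."""
    return max(0, min(10, state + action))


def fit_reward_model(feedback):
    """Toy reward model: average the human ratings seen for each state."""
    totals, counts = {}, {}
    for state, rating in feedback:
        totals[state] = totals.get(state, 0.0) + rating
        counts[state] = counts.get(state, 0) + 1
    means = {s: totals[s] / counts[s] for s in totals}
    return lambda state: means.get(state, 0.0)  # unrated states score 0


def plan(state, reward_model, horizon=3):
    """Search every action sequence inside the model; return the best first action."""
    best_return, best_action = float("-inf"), None
    for seq in product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = dynamics_model(s, a)
            total += reward_model(s)
        if total > best_return:
            best_return, best_action = total, seq[0]
    return best_action


# Made-up human feedback as (state, rating) pairs; raters prefer states near 7.
feedback = [(5, 0.2), (6, 0.6), (7, 1.0), (7, 0.8), (8, 0.5)]
reward_model = fit_reward_model(feedback)

# Re-plan at every step. In this toy the model doubles as the environment.
state = 4
trajectory = [state]
for _ in range(5):
    state = dynamics_model(state, plan(state, reward_model))
    trajectory.append(state)
```

Re-planning at every step is what lets such an agent adapt its plans at test time: if the reward model is updated with new feedback, the next call to `plan` reflects it without retraining a policy.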

Lucas Perry: Cool. So then wrapping up here, let’s talk a little bit about the pragmatic side of your team and DeepMind in general. Is DeepMind currently hiring? Is the safety team hiring? What is the status of all of that, and if listeners might be able to get involved?

Jan Leike: We always love to hire new people to join us in these efforts. In particular, we’re hiring research engineers and research scientists to help us build this stuff. So, if you, dear listener, have, let’s say, a Master’s degree in machine learning or some other kind of hands-on experience in building and training deep learning systems, you might want to apply for a research engineering position. For a research scientist position the best qualification is probably a PhD in machine learning or something equivalent. We also do research internships for people who are maybe a little bit early in their PhD. So if this is the kind of thing that applies to you and you’re excited about working on these sorts of problems, then please contact us.

Lucas Perry: All right, and with that, thank you very much for your time, Jan.

End of recorded material

FLI Podcast: The Psychology of Existential Risk and Effective Altruism with Stefan Schubert

We could all be more altruistic and effective in our service of others, but what exactly is it that’s stopping us? What are the biases and cognitive failures that prevent us from properly acting in service of existential risks, statistically large numbers of people, and long-term future considerations? How can we become more effective altruists? Stefan Schubert, a researcher at University of Oxford’s Social Behaviour and Ethics Lab, explores questions like these at the intersection of moral psychology and philosophy. This conversation explores the steps that researchers like Stefan are taking to better understand psychology in service of doing the most good we can. 

Topics discussed include:

  • The psychology of existential risk, longtermism, effective altruism, and speciesism
  • Stefan’s study “The Psychology of Existential Risks: Moral Judgements about Human Extinction”
  • Various works and studies Stefan Schubert has co-authored in these spaces
  • How this enables us to be more altruistic

Timestamps:

0:00 Intro

2:31 Stefan’s academic and intellectual journey

5:20 How large is this field?

7:49 Why study the psychology of X-risk and EA?

16:54 What does a better understanding of psychology here enable?

21:10 What are the cognitive limitations psychology helps to elucidate?

23:12 Stefan’s study “The Psychology of Existential Risks: Moral Judgements about Human Extinction”

34:45 Messaging on existential risk

37:30 Further areas of study

43:29 Speciesism

49:18 Further studies and work by Stefan

Works Cited 

Understanding cause-neutrality

Considering Considerateness: Why communities of do-gooders should be exceptionally considerate

On Caring by Nate Soares

Against Empathy: The Case for Rational Compassion

Eliezer Yudkowsky’s Sequences

Whether and Where to Give

A Person-Centered Approach to Moral Judgment

Moral Aspirations and Psychological Limitations

Robin Hanson on Near and Far Mode 

Construal-Level Theory of Psychological Distance

The Puzzle of Ineffective Giving (Under Review) 

Impediments to Effective Altruism

The Many Obstacles to Effective Giving (Under Review) 

 

You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play, and Stitcher.

Lucas Perry: Hello everyone and welcome to the Future of Life Institute Podcast. I’m Lucas Perry.  Today, we’re speaking with Stefan Schubert about the psychology of existential risk, longtermism, and effective altruism more broadly. This episode focuses on Stefan’s reasons for exploring psychology in this space, how large this space of study currently is, the usefulness of studying psychology as it pertains to these areas, the central questions which motivate his research, a recent publication that he co-authored which motivated this interview called The Psychology of Existential Risks: Moral Judgements about Human Extinction, as well as other related work of his. 

This podcast often ranks in the top 100 of technology podcasts on Apple Music. This is a big help for increasing our audience and informing the public about existential and technological risks, as well as what we can do about them. So, if this podcast is valuable to you, consider sharing it with friends and leaving us a good review. It really helps. 

Stefan Schubert is a researcher at the Social Behaviour and Ethics Lab at the University of Oxford, working at the intersection of moral psychology and philosophy. He focuses on psychological questions of relevance to effective altruism, such as why our altruistic actions are often ineffective, and why we don’t invest more in safeguarding our common future. He was previously a researcher at the Centre for Effective Altruism and a postdoc in philosophy at the London School of Economics. 

We can all be more altruistic and effective in our service of others. Expanding our moral circles of compassion farther into space and deeper into time, as well as across species, and possibly even eventually to machines, while mitigating our own tendencies towards selfishness and myopia is no easy task and requires deep self-knowledge and far more advanced psychology than I believe we have today. 

This conversation explores the first steps that researchers like Stefan are taking to better understand this space in service of doing the most good we can. 

So, here is my conversation with Stefan Schubert.

Can you take us through your intellectual and academic journey in the space of EA and longtermism and in general, and how that brought you to what you’re working on now?

Stefan Schubert: I studied a range of different subjects. I guess I had a bit of a hard time deciding what I wanted to do. So I got a master’s in political science. But then in the end, I ended up doing a PhD in philosophy at Lund University in Sweden, specifically in epistemology, the theory of knowledge. And then I went to the London School of Economics to do a postdoc. And during that time, I discovered effective altruism and I got more and more involved with that.

So then I applied to the Centre for Effective Altruism, here in Oxford, to work as a researcher. And I worked there as a researcher for two years. At first, I did policy work, including reports on catastrophic risk and x-risk for a foundation and for a government. But then I also did some work which was of a more general and foundational or theoretical nature, including work on the notion of cause neutrality and how we should understand that. And also on how EAs should think about everyday norms like norms of friendliness and honesty.

And I guess that even though at the time I didn’t do psychological empirical research, that sort of relates to my current work on psychology, because for the last two years I’ve worked on the psychology of effective altruism at the Social Behaviour and Ethics Lab here at Oxford. This lab is headed by Nadira Faber, and I also work closely with Lucius Caviola, who did his PhD here at Oxford and recently moved to Harvard to do a postdoc.

So we have three strands of research. The first one is sort of the psychology of effective altruism in general. So why is it that people aren’t effectively altruistic? This is a bit of a puzzle because generally people are at least somewhat effective when they are working for their own interest. To be sure, they are not maximally effective, but when they try to buy a home or save for retirement, they do some research and sort of try to find good value for money.

But they don’t seem to do the same when they donate to charity. They aren’t as concerned with effectiveness. So this is a bit of a puzzle. And then there are two strands of research, which have to do with specific EA causes. So one is the psychology of longtermism and existential risk, and the other is the psychology of speciesism, human-animal relations. So out of these three strands of research, I focused the most on the psychology of effective altruism in general and the psychology of longtermism and existential risk.

Lucas Perry: How large is the body of work regarding the psychology of existential risk and effective altruism in general? How many people are working on this? Could you give us more insight into the state of the field and the amount of interest there?

Stefan Schubert: It’s somewhat difficult to answer because it sort of depends on how you define these domains. There’s research which is of some relevance to ineffective altruism, but it’s not exactly on that. But I will say that there may be around 10 researchers or so who are sort of EAs and work on these topics for EA reasons. So you definitely want to count them. And then when we’re thinking about non-EA researchers, like other academics, there hasn’t been that much research, I would say, on the psychology of X-risk and longtermism.

There’s research on the psychology of climate change; that’s a fairly large topic. But more specifically on X-risk and longtermism, there’s less. Effective altruism in general, that’s a fairly large topic. There’s lots of research on biases like the identifiable victim effect: people’s tendency to donate to identifiable victims over larger numbers of unidentifiable statistical victims. Maybe on the order of a few hundred papers.

And then the last topic, speciesism and human-animal relations: that’s fairly large. I know less of that literature, but my impression is that it’s fairly large.

Lucas Perry: Going back into the 20th century, much of what philosophers like Peter Singer have done is construct thought experiments that isolate the morally relevant aspects of a situation, which is intended in the end to subvert psychological issues and biases in people.

So I guess I’m just reflecting here on how philosophical thought experiments are sort of the beginnings of elucidating a project of the psychology of EA or existential risk or whatever else.

Stefan Schubert: The vast majority of these papers are not directly inspired by philosophical thought experiments. It’s more like psychologists who run some experiments because there’s some theory that some other psychologist has devised. Most don’t look that much at philosophy I would say. But I think effective altruism and the fact that people are ineffectively altruistic, that’s fairly theoretically interesting for psychologists, and also for economists.

Lucas Perry: So why study psychological questions as they relate to effective altruism, and as they pertain to longtermism and longterm future considerations?

Stefan Schubert: It’s maybe easiest to answer that question in the context of effective altruism in general. I should also mention that when we studied this topic of sort of effectively altruistic actions in general, what we concretely study is effective and ineffective giving. And that is because firstly, that’s what other people have studied, so it’s easier to put our research into context.

The other thing is that it’s quite easy to study in a lab setting, right? So you might ask people, would you donate to the effective or the ineffective charity? You might think that career choice is actually more important than giving, or some people would argue that, but that seems more difficult to study in a lab setting. So with regards to what motivates our research on effective altruism in general and effective giving, what ultimately motivates our research is that we want to help people improve their decisions. We want to make them donate more effectively, be more effectively altruistic in general.

So how can you then do that? Well, I want to make one distinction here, which I think might be important to think about. And that is the distinction between what I call a behavioral strategy and an intellectual strategy. And the behavioral strategy is that you come up with certain framings or setups to decision problems, such that people behave in a more desirable way. So there’s literature on nudging for instance, where you sort of want to nudge people into desirable options.

So for instance, in a cafeteria where you have healthier foods at eye level and the unhealthy food is harder to reach, people will eat healthier than if it’s the other way round. You could come up with interventions that similarly make people donate more effectively. So for instance, the default option could be an effective charity. We know that in general, people often tend to go with the default option because of some kind of cognitive inertia. So that might lead to more effective donations.

I think it has some limitations. For instance, nudging might be interesting for the government because the government has a lot of power, right? It might frame the decision on whether you want to donate your organs after you’re dead. The other thing is that just creating and implementing these kinds of behavioral interventions can often be very time-consuming and costly.

So one might think that this sort of intellectual strategy should be emphasized and it shouldn’t be forgotten. So with respect to the intellectual strategy, you’re not trying to change people’s behavior solely, you are trying to do that as well, but you’re also trying to change their underlying way of thinking. So in a sense it has a lot in common with philosophical argumentation. But the difference is that you start with descriptions of people’s default way of thinking.

You describe how your default way of thinking leads you to prioritize an identifiable victim over larger numbers of statistical victims. And then you provide an argument that that’s wrong: statistical victims are just as real individuals as the identifiable victims. So you get people to accept that their own default way of thinking about identifiable versus statistical victims is wrong, and that they shouldn’t trust the default way of thinking but instead think in a different way.

I think that this strategy is actually often used, but we don’t often think about it as a strategy. So for instance, Nate Soares has this blog post “On Caring” where he argues that we shouldn’t trust our internal care-o-meter. And this is because how much we feel about people dying doesn’t increase with the number of people that die or with the badness of those increasing numbers. So it’s sort of an intellectual argument that takes psychological insight as a starting point, and other people have done this as well.

So the psychologist Paul Bloom has this book Against Empathy where he argues for similar conclusions. And I think Eliezer Yudkowsky uses this strategy a lot in his Sequences. I think it’s often an effective strategy that should be used more.

Lucas Perry: So there’s the extent to which we can know about underlying, problematic cognition in persons and we can then change the world in ways. As you said, this is framed as nudging, where you sort of manipulate the environment in such a way without explicitly changing their cognition, in order to produce desired behaviors. Now, my initial reaction to this is, how are you going to deal with the problem when they find out that you’re doing this to them?

Now the second one here is the extent to which we can use our insights from psychological analysis and studies to change implicit and explicit models and cognition in order to effectively be better decision makers. If a million deaths is a statistic and a dozen deaths is a tragedy, then there is some kind of failure of empathy and compassion in the human mind. We’re not evolved or set up to deal with these kinds of moral calculations.

So maybe you could do nudging by setting up the world in such a way that people are more likely to donate to charities that are likely to help out statistically large, difficult to empathize with numbers of people, or you can teach them how to think better and better act on statistically large numbers of people.

Stefan Schubert: That’s a good analysis actually. On the second approach: what I call the intellectual strategy, you are sort of teaching them to think differently. Whereas on this behavioral or nudging approach, you’re changing the world. I also think that this comment about “they might not like the way you nudged them” is a good comment. Yes, that has been discussed. I guess in some cases of nudging, it might be sort of cases of weakness of will. People might not actually want the chocolate but they fall prey to their impulses. And the same might be true with saving for retirement.

Whereas with ineffective giving, there it’s much less clear. Is it really the case that people want to donate effectively and are therefore happy to be nudged in this way? That doesn’t seem too clear at all. So that’s absolutely a reason against that approach.

And then with respect to arguing for certain conclusions, in the sense that it is argument or argumentation, it’s more akin to philosophical argumentation. But it’s different from standard analytic philosophical argumentation in that it discusses human psychology. You discuss at length how our psychological dispositions mislead us, and that’s not how analytic philosophers normally do it. And of course you can argue for, for instance, effective giving in the standard philosophical vein.

And some people have done that, like the EA philosopher Theron Pummer, who has an interesting paper called Whether and Where to Give on this question of whether it is an obligation to donate effectively. So I think that’s interesting, but one worries that there might not be that much to say about these issues because, everything else equal, it is maybe sort of trivial that the more effectiveness the better. Of course everything isn’t always equal. But in general, there might not be too much interesting stuff you can say about that from a normative or philosophical point of view.

But there are tons of interesting psychological things you can say because there are tons of ways in which people aren’t effective. The other related issue is that this form of psychology might have a substantial readership. So it seems to me based on the success of Kahneman and Haidt and others, that people love to read about how their own and others’ thoughts by default go wrong. Whereas in contrast, standard analytic philosophy, it’s not as widely read, even among the educated public.

So for those reasons, I think that the more psychology-based argumentation may in some respects be more promising than purely abstract philosophical arguments for why we should be effectively altruistic.

Lucas Perry: My view or insight here is that the analytic philosopher is more so trying on the many different perspectives in his or her own head, whereas the psychologist is empirically studying what is happening in the heads of many different people. So clarifying what a perfected science of psychology in this field would look like is useful for illustrating the end goals and what we’re attempting to do here. This isn’t to say that this will necessarily happen in our lifetimes or anything like that, but what does a full understanding of psychology as it relates to existential risk and longtermism and effective altruism enable for human beings?

Stefan Schubert: One thing I might want to say is that psychological insights might help us to formulate a vision of how we ought to behave or what mindset we ought to have and what we ought to be like as people, which is not only normatively valid, which is what philosophers talk about, but also sort of persuasive. So one idea there that Lucius and I have discussed quite extensively recently is that some moral psychologists suggest that when we think about morality, we think to a large degree not in terms of whether a particular act was good or bad, but rather about whether the person who performed that act is good or bad, or whether they are virtuous or vicious.

So this is called the person centered approach to moral judgment. Based on that idea, we’ve been thinking about what lists of virtues people would need, in order to make the world better, more effectively. And ideally these should be virtues that both are appealing to common sense, or which can at least be made appealing to common sense, and which also make the world better when applied.

So we’ve been thinking about which such virtues one would want to have on such a list. We’re not sure exactly what we’ll include, but some examples might be prioritization, that you need to make sure that you prioritize the best ways of helping. And then we have another which we call Science: that you do proper research on how to help effectively or that you rely on others who do. And then collaboration, that you’re willing to collaborate on moral issues, potentially even with your moral opponents.

So the details of these virtues aren’t too important, but the idea is that it hopefully should seem like a moral ideal to some people, to be a person who lives these virtues. I think that to many people, philosophical arguments about the importance of being more effective and putting more emphasis on consequences, if you read them in a book of analytic philosophy, might seem pretty uninspiring. So people don’t read that and think “that’s what I would want to be like.”

But hopefully, they could read about these kinds of virtues and think, “that’s what I would want to be like.” So to return to your question, ideally we could use psychology to sort of create such visions of some kind of moral ideal that would not just be normatively correct, but also sort of appealing and persuasive.

Lucas Perry: It’s like a science which is attempting to contribute to the project of human and personal growth and evolution and enlightenment, insofar as that is possible.

Stefan Schubert: We see this as part of the larger EA project of using evidence and reason and research to make the world a better place. EA has this prioritization research where you try to find the best ways of doing good. I gave this talk at EAGx Nordics earlier this year on “Moral Aspirations and Psychological Limitations.” And in that talk I said, well, what EAs normally do when they prioritize ways of doing good is, as it were, they look into the world and they think: what ways of doing good are there? What different causes are there? What sort of levers can we pull to make the world better?

So should we reduce existential risk from specific sources like advanced AI or bio risk, or is rather global poverty or animal welfare the best thing to work on? But then the other approach is to rather sort of look inside yourself and think, well I am not perfectly effectively altruistic, and that is because of my psychological limitations. So then we want to find out which of those psychological limitations are most impactful to work on because, for instance, they are more tractable or because it makes a bigger difference if we remove them. That’s one way of thinking about this research, that we sort of take this prioritization research and turn it inwards.

Lucas Perry: Can you clarify the kinds of things that psychology is really pointing out about the human mind? Part of this is clearly about biases and poor aspects of human thinking, but what does it mean for human beings to have these bugs in human cognition? What are the kinds of things that we’re discovering about the person and how he or she thinks that fail to be in alignment with the truth?

Stefan Schubert: I mean, there are many different sources of error, one might say. One thing that some people have discussed is that people are not that interested in being effectively altruistic. Why is that? Some people say that’s just because they get more warm glow out of giving to someone who’s suffering more saliently, and then the question arises, why do they get more warm glow out of that? Maybe that’s because they just want to signal their empathy. That’s sort of one perspective, which is maybe a bit cynical, then, that the ultimate source of lots of ineffectiveness is just this preference for signaling and maybe a lack of genuine altruism.

Another approach would be to just say, the world is very complex and it’s very difficult to understand it and we’re just computationally constrained, so we’re not good enough at understanding it. Another approach would be to say that because the world is so complex, we evolved various broad-brushed heuristics, which generally work not too badly, but then, when we are put in some evolutionarily novel context and so on, they don’t guide us too well. That might be another source of error. In general, what I would want to emphasize is that there are likely many different sources of human errors.

Lucas Perry: You’ve discussed here how you focus and work on these problems. You mentioned that you are primarily interested in the psychology of effective altruism in so far as we can become better effective givers and understand why people are not effective givers. And then, there is the psychology of longtermism. Can you enumerate some central questions that are motivating you and your research?

Stefan Schubert: To some extent, we need more research just in order to figure out what further research we and others should do so I would say that we’re in a pre-paradigmatic stage with respect to that. There are numerous questions one can discuss with respect to psychology of longtermism and existential risks. One is just people’s empirical beliefs on how good the future will be if we don’t go extinct, what the risk of extinction is and so on. This could potentially be useful when presenting arguments for the importance of work on existential risks. Maybe it turns out that people underestimate the risk of extinction and the potential quality of the future and so on. Another issue which is interesting is moral judgments, people’s moral judgements about how bad extinction would be, and the value of a good future, and so on.

Moral judgements about human extinction, that’s exactly what we studied in a recent paper that we published, which is called “The Psychology of Existential Risks: Moral Judgements about Human Extinction.” In that paper, we test this thought experiment by philosopher Derek Parfit. He has this thought experiment where he discusses three different outcomes: first, peace; second, a nuclear war that kills 99% of the world’s existing population; and third, a nuclear war that kills everyone. Parfit says, then, that a war that kills everyone, that’s the worst outcome. Near-extinction is the next worst and peace is the best. Maybe no surprises there, but the more interesting part of the discussion concerns the relative differences between these outcomes in terms of badness. Parfit effectively made an empirical prediction, saying that most people would find the difference in terms of badness between peace and near-extinction to be greater, but he himself thought that the difference between near-extinction and extinction, that’s the greater difference. That’s because only extinction would lead to the future forever being lost, and Parfit thought that if humanity didn’t go extinct, the future could be very long and good and therefore, it would be a unique disaster if the future was lost.

On this view, extinction is uniquely bad, as we put it. It’s not just bad because it would mean that many people would die, but also because it would mean that we would lose a potentially long and grand future. We tested this hypothesis in the paper, then. First, we had a preliminary study, which didn’t actually pertain directly to Parfit’s hypothesis. We just studied whether people would find extinction a very bad event in the first place and we found that, yes, they do and they think that the government should invest substantially to prevent it.

Then, we moved on to the main topic, which was Parfit’s hypothesis. We made some slight changes. In the middle outcome, Parfit had 99% dying. We reduced that number to 80%. We also talked about catastrophes in general rather than nuclear wars and we didn’t want to talk about peace because we thought that you might have an emotional association with the word “peace;” we just talked about no catastrophe instead. Using this paradigm, we found that Parfit was right. First, most people, just like him, thought that extinction was the worst outcome, near extinction the next, and no catastrophe was the best. But second, we find, then, that most people find the difference in terms of badness, between no one dying and 80% dying, that’s greater than the difference between 80% dying and 100% dying.

Our interpretation, then, is that this is presumably because they focus most on the immediate harm that the catastrophes cause and in terms of the immediate harm, the difference between no one dying and 80% dying, it’s obviously greater than that between 80% dying and 100% dying. That was a control condition in some of our experiments, but we also had other conditions where we would slightly tweak the question. We had one condition which we call the salience condition, where we made the longterm consequences of the three outcomes salient. We told participants to remember the longterm consequences of the outcomes. Here, we didn’t actually add any information that they don’t have access to, but we just made some information more salient and that made significantly more participants find the difference between 80% dying and 100% dying the greater one.

Then, we had yet another condition which we call the utopia condition, where we told participants that if humanity doesn’t go extinct, then the future will be extremely long and extremely good and it was said that if 80% die, then, obviously, at first, things are not so good, but after a recovery period, we would go on to this rosy future. We included this condition partly because such scenarios have been discussed to some extent by futurists, but partly also because we wanted to know, if we ramp up this goodness of the future to the maximum and maximize the opportunity costs of extinction, how many people would then find the difference between near extinction and extinction the greater one. Indeed, we found, then, that given such a scenario, a large majority found the difference between 80% dying and 100% dying the larger one so then, they did find extinction uniquely bad given this enormous opportunity cost of a utopian future.
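The reasoning behind the control and utopia conditions can be made concrete with a little arithmetic. The sketch below is only an illustration, not the paper’s analysis: it assumes badness is the fraction of people dying plus a value for the lost future that applies only under full extinction, and the figure of 10.0 for that future value is an arbitrary assumption. The names `badness` and `differences` are hypothetical.

```python
def badness(fraction_dying, future_value_lost_if_extinct=0.0):
    """Immediate harm plus the future that is lost only if everyone dies."""
    immediate = fraction_dying  # population normalised to 1
    lost_future = future_value_lost_if_extinct if fraction_dying >= 1.0 else 0.0
    return immediate + lost_future


def differences(future_value):
    """Return (no catastrophe vs 80% dying, 80% dying vs 100% dying)."""
    none, most, everyone = (badness(f, future_value) for f in (0.0, 0.8, 1.0))
    return most - none, everyone - most


# Focusing on immediate harm only: the first difference is the larger one.
immediate_only = differences(0.0)

# Counting a long, good future (arbitrary value 10.0): the second difference dominates.
with_long_future = differences(10.0)
```

On these assumptions, a respondent who attends only to immediate deaths ranks the first difference as greater, while one who also counts the lost future ranks the second as greater, which matches the shift described between the control and utopia conditions.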

Lucas Perry: What’s going on in my head right now is we were discussing earlier the role or not of these philosophical thought experiments in psychological analysis. You’ve done a great study here that helps to empirically concretize the biases and remedies for the issues that Derek Parfit had exposed and pointed to in his initial thought experiment. That was popularized by Nick Bostrom and it’s one of the key thought experiments for much of the existential risk community and people committed to longtermism because it helps to elucidate this deep and rich amount of value in the deep future and how we don’t normally consider that. Your discussion here just seems to be opening up for me tons of possibilities in terms of how far and deep this can go in general. The point of Peter Singer’s child drowning in a shallow pond was to isolate the bias of proximity and Derek Parfit’s thought experiment isolates the bias of familiarity, temporal bias and continuing into the future, it’s making me think, we also have biases about identity.

Derek Parfit also has thought experiments about identity, like with his teleportation machine where, say, you stepped into a teleportation machine and it annihilated all of your atoms but before it did so, it scanned all of your information and once it scanned you, it destroyed you and then re-assembled you on the other side of the room, or you can change the thought experiment and say on the other side of the universe. Is that really you? What does it mean to die? Those are the kinds of questions that are elicited. Listening to what you’ve developed and learned and reflecting on the possibilities here, it seems like you’re at the beginning of a potentially extremely important and meaningful field that helps to inform decision-making on these morally crucial and philosophically interesting questions and points of view. How do you feel about that or what I’m saying?

Stefan Schubert: Okay, thank you very much and thank you also for putting this Parfit thought experiment a bit in context. What you’re saying is absolutely right, that this has been used a lot, including by Nick Bostrom and others in the longtermist community and that was indeed one reason why we wanted to test it. I also agree that there are tons of interesting philosophical thought experiments there and they should be tested more. There’s also this other field of experimental philosophy where philosophers test philosophical thought experiments themselves, but in general, I think there’s absolutely more room for empirical testing of them.

With respect to temporal bias, I guess it depends a bit what one means by that, because we actually did get an effect from just mentioning that they should consider the longterm consequences, so I might think that to some extent it’s not only that people are biased in favor of the present, but it’s also that they don’t really consider the longterm future. They sort of neglect it and it’s not something that’s generally discussed among most people. I think this is also something that Parfit’s thought experiment highlights. You have to think about the really longterm consequences here and if you do think about them, then, your intuitions about this thought experiment should reverse.

Lucas Perry: People’s cognitive time horizons are really short.

Stefan Schubert: Yes.

Lucas Perry: People probably have the opposite discounting of future persons that I do. Just because I think that the kinds of experiences that Earth-originating intelligent life forms will be having in the next 100 to 200 years will be much more deep and profound than what humans are capable of, that I would value them more than I value persons today. Most people don’t think about that. They probably just think there’ll be more humans and short of their bias towards present day humans, they don’t even consider a time horizon long enough to really have the bias kick in, is what you’re saying?

Stefan Schubert: Yeah, exactly. Thanks for that, also, for mentioning that. First of all, my view is that people don’t even think so much about the longterm future unless prompted to do so. Second, in this first study I mentioned, which was sort of a pre-study, we asked, “How good do you think that the future’s going to be?” On the average, I think they said, “It’s going to be slightly better than the present” and that would be very different from your view, then, that the future’s going to be much better. You could argue that this view that the future is going to be about as good as present is somewhat unlikely. I think it’s going to be much better or maybe it’s going to be much worse. There’s several different biases or errors that are present here.

Merely making the longterm consequences of the three outcomes salient, that already makes people more inclined to find the difference between 80% dying and 100% dying the greater one, so then you don’t add any information. Also, specifying that the longterm outcomes are going to be extremely good, that makes a further difference that makes most people find the difference between 80% dying and 100% dying the greater one.

Lucas Perry: I’m sure you and I, and listeners as well, have the hilarious problem of trying to explain this stuff to friends or family members or people that you meet that are curious about it and the difficulty of communicating it and imparting the moral saliency. I’m just curious to know if you have explicit messaging recommendations that you have extracted or learned from the study that you’ve done.

Stefan Schubert: You want to make the future more salient if you want people to care more about existential risk. With respect to explicit messaging more generally, like I said, there haven’t been that many studies on this topic, so I can’t refer to any specific study that says that this is how you should work with the messaging on this topic but just thinking more generally, one thing I’ve been thinking about is that maybe, with many of these issues, it’s just that it takes a while for people to get habituated with them. At first, if someone hears a very surprising statement that has very far reaching conclusions, they might be intuitively a bit skeptical about it, independently of how reasonable that argument would be for someone who would be completely unbiased. Their prior is that, probably, this is not right and to some extent, this might even be reasonable. Maybe people should be a bit skeptical of people who say such things.

But then, what happens is that most such people who make such claims that seem to people very weird and very far-reaching, they get discarded after some time because people poke holes in their arguments and so on. But then, a small subset of all such people, they actually stick around and they get more and more recognition and you could argue that that’s what’s now happening with people who work on longtermism and X-risk. And then, people slowly get habituated to this and they say, “Well, maybe there is something to it.” It’s not a fully rational process. I think this doesn’t just relate to longtermism and X-risk but maybe also specifically to AI risk, where it takes time for people to accept that message.

I’m sure there are some things that you can do to speed up that process and some of them would be fairly obvious like have smart, prestigious, reasonable people talk about this stuff and not people who don’t seem as credible.

Lucas Perry: What are further areas of the psychology of longtermism or existential risk that you think would be valuable to study? And let’s also touch upon other interesting areas for effective altruism as well.

Stefan Schubert: I mentioned previously people’s empirical beliefs, that could be valuable. One thing I should mention there is that I think that people’s empirical beliefs about the distant future are massively affected by framing effects, so depending on how you ask these questions, you are going to get very different answers so that’s important to remember that it’s not like people have these stable beliefs and they will always say that. The other thing I mentioned is moral judgments, and I said we studied moral judgments about human extinction, but there’s a lot of other stuff to do, like people’s views on population ethics could obviously be useful. Views on whether creating happy people is morally valuable. Whether it’s more valuable to bring a large number of people whose life is barely worth living into existence than to bring in a small number of very happy people into existence and so on.

Those questions obviously have relevance for the moral value of the future. One thing I would want to say is that if you’re rational, then, obviously, your view on what and how much we should do to affect the distant future, that should arguably be a function of your moral views, including on population ethics, on the one hand, and also your empirical views of how the future’s likely to pan out. But then, I also think that people obviously aren’t completely rational and I think, in practice, their views on the longterm future will also be influenced by other factors. I think that their view on whether helping the longterm future seems like an inspiring project, that might depend massively on how the issue is framed. I think these aspects could be worth studying because if we find these kinds of aspects, then we might want to emphasize the positive aspects and we might want to adjust our behavior to avoid the negative. The goal should be to formulate a vision of longtermism that feels inspiring to people, including to people who haven’t put a lot of thought into, for instance, population ethics and related matters.

There are also some other specific issues which I think could be useful to study. One is the psychology of predictions about the distant future and the implications of the so-called construal level theory for the psychology of the longterm future. Many effective altruists would know construal level theory under another name: near mode and far mode. This is Robin Hanson’s terminology. Construal level theory is a theory about psychological distance and how it relates to how abstractly we construe things. It says that we conceive of different forms of distance – spatial, temporal, social – similarly. The second claim is that we conceive of items and events at greater psychological distance more abstractly: we focus more on big picture features and less on details. So, Robin Hanson, he’s discussed this theory very extensively including with respect to the long term future. And he argues that the great psychological distance to the distant future causes us to reason in overly abstract ways, to be overconfident, to have poor epistemics in general about the distant future.

I find this very interesting, and these kinds of ideas are mentioned a lot in EA and the X-risk community. But, to my knowledge there hasn’t been that much research which applies construal level theory specifically to the psychology of the distant future.

It’s more like people look at these general studies of construal level theory, and then they noticed that, well, the temporal distance to the distant future is obviously extremely great. Hence, these general findings should apply to a very great extent. But, to my knowledge, this hasn’t been studied so much. And given how much people discuss near or far mode in this case, it seems that there should be some empirical research.

I should also mention that I find construal level theory a very interesting and rich psychological theory in general. I could see that it could illuminate the psychology of the distant future in numerous ways. Maybe it could be some kind of a theoretical framework that I could use for many studies about the distant future. So, I recommend that key paper from 2010 by Trope and Liberman on construal level theory.

Lucas Perry: I think that just hearing you say this right now, it’s sort of opening my mind up to the wide spectrum of possible applications of psychology in this area.

You mentioned population ethics. That makes me just think of in the context of EA and longtermism and life in general, the extent to which psychological study and analysis can find ethical biases and root them out and correct for them, either by nudging or by changing the explicit methods by which humans cognize about such ethics. There’s the extent to which psychology can better inform our epistemics, so this is the extent to which we can be more rational.

And I’m reflecting now how quantum physics subverts many of our Newtonian and classical mechanics intuitions about the world. And there’s the extent to which psychology can also inform the way in which our social and experiential lives also condition the way that we think about the world and the extent to which that sets us astray in trying to understand the fundamental nature of reality or thinking about the longterm future or thinking about ethics or anything else. It seems like you’re at the beginning stages of debugging humans on some of the most important problems that exist.

Stefan Schubert: Okay. That’s a nice way of putting it. I certainly think that there is room for way more research on the psychology of longtermism and X-risk.

Lucas Perry: Can you speak a little bit now here about speciesism? This is both an epistemic thing and an ethical thing in the sense that we’ve invented these categories of species to describe the way that evolutionary histories of beings bifurcate. And then, there’s the psychological side of the ethics of it where we unnecessarily devalue the life of other species given that they fit that other category.

Stefan Schubert: So, we have one paper under review, which is called “Why People Prioritize Humans Over Animals: A Framework for Moral Anthropocentrism.”

To give you a bit of context, there’s been a lot of research on speciesism and on humans prioritizing humans over animals. So, in this paper we sort of try to take a bit more systematic approach and pit these different hypotheses for why humans prioritize humans over animals against each other, and look at their relative strengths as well.

And what we find is that there is truth to several of these hypotheses of why humans prioritize humans over animals. One contributing factor is just that they value individuals with greater mental capacities, and most humans have greater mental capacities than most animals.

However, that explains only part of the effect we find. We also find that people think that humans should be prioritized over animals even if they have the same mental capacity. And here, we find that this is for two different reasons.

First, according to our findings, people are what we call species relativists. And by that, we mean that they think that members of the species, including different non-human species, should prioritize other members of that species.

So, for instance, humans should prioritize other humans, and an elephant should prioritize other elephants. And that means that because humans are the ones calling the shots in the world, we have a right then, according to this species relativist view, to prioritize our own species. But other species would have that right too, if they were in power. At least that’s the implication of what the participants say, if you take them at face value. That’s species relativism.

But then, there is also the fact that they exhibit an absolute preference for humans over animals, meaning that even if we control for the mental capacities of humans and animals, and even if we control for the species relativist factors, that is, for who the individual who could help them is, there remains a difference which can’t be explained by those other factors.

So, there’s an absolute speciesist preference for humans which can’t be explained by any further factor. So, that’s an absolute speciesist preference as opposed to this species relativist view.

In total, there’s a bunch of factors that together explain why humans prioritize humans over animals, and these factors may also influence each other. So, we present some evidence that if people have a speciesist preference for humans over animals, that might, in turn, lead them to believe that animals have less advanced mental capacities than they actually have. And because they have this view that individuals with lower mental capacity, they are less morally valuable, that leads them to further deprioritize animals.

So, these three different factors, they sort of interact with each other in intricate ways. Our paper gives this overview over these different factors which contribute to humans prioritizing humans over animals.

Lucas Perry: This helps to make clear to me that a successful psychological study with regards to at least ethical biases will isolate the salient variables which are knobs that are tweaking the moral saliency of one thing over another.

Now, you said mental capacities there. You guys aren’t bringing consciousness or sentience into this?

Stefan Schubert: We discuss different formulations at length, and we went for the somewhat generic formulation.

Lucas Perry: I think people have beliefs about the ability to rationalize and understand the world, and then how that may or may not be correlated with consciousness that most people don’t make explicit. It seems like there are some variables to unpack underneath cognitive capacity.

Stefan Schubert: I agree. This is still like fairly broad brushed. The other thing to say is that sometimes we say that this human has as advanced mental capacities as these animals. Then, they have no reason to believe that the human has a more sophisticated sentience or is more conscious or something like that.

Lucas Perry: Our species membership tells me that we probably have more consciousness. My bedrock thing is I care about how much the thing can suffer or not, not how well it can model the world. Though those things are maybe probably highly correlated with one another. I think I wouldn’t be a speciesist if I thought human beings were currently the most important thing on the planet.

Stefan Schubert: You’re a speciesist if you prioritize humans over animals purely because of species membership. But, if you prioritize one species over another for some other reasons which are morally relevant, then you would not be seen as a speciesist.

Lucas Perry: Yeah, I’m excited to see what comes of that. I think that, like working on overcoming racism and misogyny and other things, overcoming speciesism and temporal biases and physical space, proximity biases are some of the next stages in human moral evolution that have to come. So, I think it’s honestly terrific that you’re working on these issues.

Is there anything you would like to say or that you feel that we haven’t covered?

Stefan Schubert: We have one paper which is called “The Puzzle of Ineffective Giving,” where we study this misconception that people have, which is that they think the difference in effectiveness between charities is much smaller than it actually is. So, experts think that the most effective charities are vastly much more effective than the average charity, and people don’t know that.

That seems to suggest that beliefs play a role in ineffective giving. But, there was one interesting paper called “Impediments to Effective Altruism” where they show that even if you tell people that a cancer charity is less effective than an arthritis charity, they still donate to the cancer charity.

So, then we have this other paper called “The Many Obstacles to Effective Giving.” It’s a bit similar to this speciesist paper, I guess, that we sort of pit different competing hypotheses that people have studied against each other. We give people different tasks, for instance, tasks which involve identifiable victims and tasks which involve ineffective but low overhead charities.

And then, we sort of studied, well, what if we tell them how to be effective? Does that change how they behave? What’s the role of that pure belief factor? What’s the role of preferences? The result is a bit of a mix. Both beliefs and preferences contribute to ineffective giving.

In the real world, it’s likely that several beliefs and preferences that obstruct effective giving are present simultaneously. For instance, people might fail to donate to the most effective charity because first, it’s not a disaster charity, and they might have a preference for a disaster charity. And it might have a high overhead, and they might falsely believe then that high overhead entails low effectiveness. And it might not highlight identifiable victims, and they have a preference for donating to identifiable victims.

Several of these obstacles are present at the same time, and in that sense, ineffective giving is overdetermined. So, fixing one specific obstacle may not make as much of a difference as one would have wanted. That might support the view that what we need is not primarily behavioral interventions that address individual obstacles, but rather a more broad mindset change that can motivate people to proactively seek out the most effective ways of doing good.

Lucas Perry: One other thing that’s coming to my mind is the proximity of a cause to someone’s attention and the degree to which it allows them to be celebrated in their community for the good that they have done.

Are you suggesting that the way for remedying this is to help instill a curiosity and something resembling the EA mindset that would allow people to do the cognitive exploration and work necessary to transcend these limitations that bind them to their ineffective giving or is that unrealistic?

Stefan Schubert: First of all, let me just say that with respect to this proximity issue, that was actually another task that we had. I didn’t mention all the tasks. So, we told people that you can either help a local charity or a charity, I think it was in India. And then, we told them that the Indian charity is more effective and asked “where would you want to donate?”

So, you’re absolutely right. That’s another obstacle to effective giving, that people sometimes have preferences or beliefs that local charities are more effective even when that’s not the case. Some donor I talked to, he said, “Learning how to donate effectively, it’s actually fairly complicated, and there are lots of different things to think about.”

So, just fixing the overhead myth or something like that, that may not take you very far, especially if you think that the very best charities are sort of extremely much more effective than the average charity. So, what’s important is not going from an average charity to a somewhat more effective charity, but to actually find the very best charities.

And to do that, we may need to address many psychological obstacles because the most effective charities, they might be very weird and sort of concerned with longterm future or what-not. So, I do think that a mindset where people seek out effective charities, or defer to others who do, that might be necessary. It’s not super easy to make people adopt that mindset, definitely not.

Lucas Perry: We have charity evaluators, right? These institutions which are intended to be reputable enough that they can tell you which are the most effective charities to donate to. It wouldn’t even be enough to just market those really hard. They’d be like, “Okay, that’s cool. But, I’m still going to donate my money to seeing eye dogs because blindness is something that runs in my family and is experientially and morally salient for me.”

Is the way that we fix the world really about just getting people to give more, and what is the extent to which the institutions which exist, which require people to give, need to be corrected and fixed? There’s that tension there between just the mission of getting people to give more, and then the question of, well, why do we need to get everyone to give so much in the first place?

Stefan Schubert: This insight that ineffective giving is overdetermined and there are lots of things that stand in the way of effective giving, one thing I like about it is that it seems to sort of go well with this observation that it is actually, in the real world, very difficult to make people donate effectively.

It might relate there a bit to what you mentioned about the importance of giving more, and so we could sort of distinguish between the different kinds of psychological limitations. First, there are limitations that relate to how much we give. We’re selfish, so therefore we don’t necessarily give as much of our monetary resources as we should. There are sort of limits to altruism.

But then, there are also limits to effectiveness. We are ineffective for various reasons that we’ve discussed. And then, there’s also the fact that we can have the wrong moral goals. Maybe we work towards short term goals, but then we would realize on careful reflection that we should work towards long term goals.

And then, I was thinking like, “Well, which of these obstacles should you then prioritize if you turn this sort of prioritization framework inwards?” And then, you might think that, well, at least with respect to giving, it might be difficult for you to increase the amount that you give by more than 10 times. Americans, for instance, they already donate several percent of their income. We know from historical experience that it might be hard for people to sustain very high levels of altruism, so maybe it’s difficult for them to sort of ramp up this altruist factor to the extreme amount.

But then, with effectiveness, if this story about heavy-tailed distributions of effectiveness is right, then you could increase the effectiveness of your donations a lot. And arguably, the sort of psychological price for that is lower. It’s very demanding to give up a huge proportion of your income for others, but I would say that it’s less demanding to redirect your donations to a more effective cause, even if you feel more strongly for the ineffective cause.

I think it’s difficult to really internalize how enormously important it is to go for the most effective option. And also, of course, then the third factor to sort of change your moral goals if necessary. If people would reduce their donations by 99%, they would reduce the impact by 99%. Many people would feel guilty about it.

But then, if they reduce their impact 99% via reducing their effectiveness 99% through choosing an ineffective charity, then people don’t feel similarly guilty, so similar to Nate Soares’ idea of a care-o-meter: our feelings aren’t adjusted for these things, so we don’t feel as much about the ineffectiveness as we do about altruistic sacrifice. And that might lead us to not focus enough on effectiveness, and we should really think carefully about going that extra mile for the sake of effectiveness.

Lucas Perry: Wonderful. I feel like you’ve given me a lot of concepts and tools that are just very helpful for reinvigorating an introspective mindfulness about altruism in my own life and how that can be nurtured and developed.

So, thank you so much. I’ve really enjoyed this conversation for the reasons I just said. I think this is a very important new research stream in this space, and it seems small now, but I really hope that it grows. And thank you for your and your colleagues’ work here on seeding and doing the initial work in this field.

Stefan Schubert: Thank you very much. Thank you for having me. It was a pleasure.

AI Alignment Podcast: Machine Ethics and AI Governance with Wendell Wallach

Wendell Wallach has been at the forefront of contemporary emerging technology issues for decades now. As an interdisciplinary thinker, he has engaged at the intersections of ethics, governance, AI, bioethics, robotics, and philosophy since the beginning formulations of what we now know as AI alignment were being codified. Wendell began with a broad interest in the ethics of emerging technology and has since become focused on machine ethics and AI governance. This conversation with Wendell explores his intellectual journey and participation in these fields.

 Topics discussed in this episode include:

  • Wendell’s intellectual journey in machine ethics and AI governance 
  • The history of machine ethics and alignment considerations
  • How machine ethics and AI alignment serve to produce beneficial AI 
  • Soft law and hard law for shaping AI governance 
  • Wendell’s and broader efforts for the global governance of AI
  • Social and political mechanisms for mitigating the risks of AI 
  • Wendell’s forthcoming book

Key points from Wendell:

  • “So when you were talking about machine ethics or when we were talking about machine ethics, we were really thinking about it in terms of just how do you introduce ethical procedures so that when machines encounter new situations, particularly when the designers can’t fully predict what their actions will be, that they factor in ethical considerations as they choose between various courses of action. So we were really talking about very basic program in the machines, but we weren’t just thinking of it in terms of the basics. We were thinking of it in terms of the evolution of smart machines… What we encounter in the Singularity Institute, now MIRI for artificial intelligence approach of friendly AI and what became value alignment is more or less a presumption of very high order intelligence capabilities by the system and how you would ensure that their values align with those of the machines. They tended to start from that level. So that was the distinction. Where the machine ethics folks did look at those futuristic concerns, they did more so from a philosophical level and at least a belief or appreciation that this is going to be a relatively evolutionary course, whereby the friendly AI and value alignment folks, they tended to presume that we’re going to have very high order cognitive capabilities and how do we ensure that those align with the systems. Now, the convergence, I would say, is what’s happening right now because in workshops that have been organized around the societal and ethical impact of intelligent systems.”
  • “My sense has been that with both machine ethics and value alignment, we’ve sort of got the cart in front of the horse. So I’m waiting to see some great implementation breakthroughs, I just haven’t seen them. Most of the time, when I encounter researchers who say they’re taking seriously, I see they’re tripping over relatively low level implementations. The difficulty is here, and all of this is converging. What AI alignment was initially and what it’s becoming now I think are quite different. I think in the very early days, it really was presumptions that you would have these higher order intelligences and then how were you going to align them. Now, as AI alignment, people look at the value issues as they intersect with present day AI agendas. I realize that you can’t make the presumptions about the higher order systems without going through developmental steps to get there. So, in that sense, I think whether it’s AI alignment or machine ethics, the one will absorb the lessons of the other. Both will utilize advances that happen on both fronts.”
  • “David Collingridge wrote a book where he outlined a problem that is now known as the Collingridge Dilemma. Basically, Collingridge said that while it was easiest to regulate a technology early in its development, we had little idea of what its societal impact would be. By the time we did understand what the challenges from the societal impact were, the technology would be so deeply entrenched in our society that it would be very difficult to change its trajectory. So we see that today with social media. Social media was totally entrenched in our society before we realized how it could be manipulated in ways that would undermine democracy. Now we’re having a devil of a time figuring out what we could do. So Gary and I, who had been talking about these kinds of problems for years, we realized that we were constantly lamenting the challenge, but we altered the conversation one day over a cup of coffee. We said, “Well, if we had our druthers, if we have some degree of influence, what would we propose?” We came up with a model that we referred to as governance coordinating committees. Our idea was that you would put in place a kind of issues manager that would try and guide the development of a field, but first of all, it would just monitor development, convene forums between the many stakeholders, map issues and gaps, see if anyone was addressing those issues and gaps or where best practices had come to the fore. If these issues were not being addressed, then how could you address them, looking at a broad array of mechanisms. By a broad array of mechanisms, we meant you start with feasible technological solutions, you then look at what can be managed through corporate self-governance, and if you couldn’t find anything in either of those areas, then you turn to what is sometimes called soft law… So Gary and I proposed this model. Every time we ever talked about it, people would say, “Boy, that’s a great idea. Somebody should do that.” I was going to international forums, such as going to the World Economic Forum meetings in Davos, where I’d be asked to be a fire-starter on all kinds of subject areas by safety and food security and the law of the ocean. In a few minutes, I would quickly outline this model as a way of getting people to think much more richly about ways to manage technological development and not just immediately go to laws and regulatory bodies. All of this convinced me that this model was very valuable, but it wasn’t being taken up. All of that led to this first International Congress for the Governance of Artificial Intelligence, which will be convened in Prague on April 16 to 18. I do invite those of you listening to this podcast who are interested in the international governance of AI or really agile governance for technology more broadly to join us at that gathering.”

 

Important timestamps: 

0:00 Intro

2:50 Wendell’s evolution in work and thought

10:45 AI alignment and machine ethics

27:05 Wendell’s focus on AI governance

34:04 How much can soft law shape hard law?

37:27 What does hard law consist of?

43:25 Contextualizing the International Congress for the Governance of AI

45:00 How AI governance efforts might fail

58:40 AGI governance

1:05:00 Wendell’s forthcoming book

 

Works referenced:

A Dangerous Master: How to Keep Technology from Slipping Beyond Our Control

Moral Machines: Teaching Robots Right from Wrong

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Hey everyone, welcome to the AI Alignment Podcast. I’m Lucas Perry. Today, we’ll be speaking with Wendell Wallach. This episode is primarily dedicated to the issue and topic of AI governance, though in order to get there we go on and explore Wendell’s intellectual journey in machine ethics and how that led him up to his current efforts in AI governance. We also discuss how machine ethics and AI alignment both attempt to serve the project of creating beneficial AI and deal with the moral and ethical considerations related to the growing power and use of artificial intelligence. We discuss soft law and hard law for shaping AI governance. We get into Wendell’s efforts for the global governance of AI and discuss the related risks. And to finish things off we also briefly touch on AGI governance and Wendell’s forthcoming book. If you find this podcast valuable, interesting, or helpful, consider sharing it with others who might find it valuable as well.

For those who are not familiar with Wendell, Wendell is an internationally recognized expert on the ethical and governance concerns posed by emerging technologies, particularly artificial intelligence and neuroscience. Wendell is a consultant and ethicist, and a scholar at Yale University’s Interdisciplinary Center for Bioethics. He is also a co-author, with Colin Allen, of Moral Machines: Teaching Robots Right from Wrong. This work maps machine ethics, machine morality, computational morality, and friendly AI. He has a second and more recent book, A Dangerous Master: How to Keep Technology from Slipping Beyond Our Control. From my perspective, it seems there is much growing enthusiasm and momentum in the space of AI policy and governance efforts. So, this conversation and those like it, I feel, help to further develop my perspective and understanding of where we are in the project and space of AI governance. For these reasons, I hope that you’ll find it valuable as well. So, let’s get into our conversation with Wendell Wallach.

It would be great if you could start by clarifying the evolution of your thought in science and technology over the years. It appears that you’ve gone from being interested in bioethics to machine ethics to now a more recent focus in AI governance and AI ethics. Can you take us through this movement in your thought and work?

Wendell Wallach: In reality, all three of those themes have been involved in my work from the very beginning, but the emphasis has changed. So I lived a very idiosyncratic life that ended with two computer consulting companies that I had helped start. But I had felt that there were books that I wanted to get out of my head, and I turned those companies over to the employees, and I started writing and realized that I was not up on some of the latest work in cognitive science. So one thing led to another, and I was invited to the first meeting of a technology and ethics working group at Yale University that had actually been started by Nick Bostrom when he was at Yale and Bonnie Kaplan. Nick left about a year later, and a year after that, Bonnie Kaplan had an accident, and the chair of that working group was turned over to me.

So that started my focus on technology and ethics more broadly. It was not limited to bioethics, but it did happen within the confines of the Yale Interdisciplinary Center for Bioethics. I was all over the place in the sense that I was already a kind of transdisciplinary thinker, transdisciplinary scholar, but having the challenge of focusing my study and my work so it was manageable. In other words, I was trying to think broadly at the same time as I was trying to focus on different subject areas. One thing led to another. I was invited to a conference in Baden-Baden where I met Colin Allen. We, together with the woman who started the workshop there, Eva Schmidt, began thinking about a topic that we were calling machine morality at that time. By machine morality, we meant thinking about how moral decision making faculties might be implemented in computers and robots.

Around the same time, there were other scholars working on the same themes. Michael and Susan Anderson, for example, had grabbed on to the title ‘machine ethics.’ Over time, as these various pathways converged, machine ethics became the main research area or the way in which this research project was referred to. It did have other names in addition to machine morality. It was sometimes called computational morality. At the same time, there were others who were working on it under the title of friendly AI, a term that was coined by Eliezer Yudkowsky. But the real difference between the machine ethics folks and the friendly AI folks was that the friendly AI folks were explicitly focused upon the challenge of how you would manage or tame superintelligence, whereas the machine ethics crew were much more ethicists, philosophers, computer scientists who were really thinking about first steps toward introducing moral decision making faculties, moral sensitivity into computers and robots. This was a relatively small group of scholars, but as this evolved over time, Eva and Colin and I decided that we would write a book mapping the development of this field of research.

Eva Schmidt fell away, and the book finally came out from Oxford University Press under the title Moral Machines: Teaching Robots Right from Wrong. So, as you may be aware, that’s still a seminal text out there. It’s still something that is read broadly and is being cited broadly, and in fact, its citations are going up, and we’re even being requested by Oxford University Press to produce an update of the book. Machine ethics was two parts philosophy, one part computer science. It was basically two fields of study. One was looking explicitly at the question of implementing sensitivity to moral considerations in computers and robots, and the other side was really thinking comprehensively about how humans make moral decisions. So, arguably, Moral Machines was the first book that really took that comprehensive look at human moral decision making seriously. It was also a time when there was a lot of research going on in moral psychology in the way in which people’s affective and decision making concerns affected what became our ethical decision making processes.

So we were also able to bring some of that in, bring evolutionary psychology in, and bring in a lot of new fields of research that had not really been given their due or had not been integrated very well with the dominant reason-based theories of ethics, such as deontology, which is really ethical approaches that focus on duties and rules, and consequentialism, which is an ethical theory that says right and wrong is not determined by following the rules or doing your duty; it’s determined by looking at the consequences of your action and selecting the course of action likely to produce the greatest good for the greatest number. So it’s like we were integrating evolutionary psychology, cognitive science, moral psychology, together with the more rational-based theories, as we looked at top down and bottom up approaches for introducing sensitivity to ethical considerations in computers and robots.

The major shift in that whole trajectory, and one I only learned about at the first FLI conference in Puerto Rico, where Jim Moor and I were the only two people who had been actively involved in the machine ethics community (Jim Moor is a professor at Dartmouth, for those of you who are not aware of him, and he has been a seminal figure in the philosophy of computing for decades now), was that at that Puerto Rican gathering, the concept of value alignment was raised to us for the first time. What I realized was that those who are talking about value alignment from the AI perspective, by and large, had little or no understanding that there had ever been a field or was an ongoing field known as machine ethics.

That led to my applying for a Future of Life Institute grant, which I was awarded as PI. That grant was to host three annual workshops bringing together experts not only in AI, but machine ethics, philosophy generally, resilience engineering, robotics, a broad array of fields of people who had been thinking seriously about value issues in computational systems. Those really became groundbreaking workshops where it was clear that the computer scientists and the AI researchers knew very little about ethics issues, and the ethicists didn’t necessarily have a great depth of understanding of some of the challenges coming up in artificial intelligence. Bart Selman and Stuart Russell agreed to be co-chairs of those workshops with me. The last one was completed over a year ago with some closing presentations in New York City and at Yale.

Lucas Perry: I think it’d be helpful here if you could disambiguate the machine ethics crowd and way of thinking and what has been done there from the AI alignment, value alignment, Eliezer branch of thinking that has been going on. AI alignment seems more focused on explicitly trying to understand human preference hierarchies and be able to specify objectives without the machine systems doing other things that we don’t want them to do. Then you said that machine ethics is about imbuing ethical decision making faculties or reasoning or sensitivities in machine systems. That, to me, seems more like normative ethics. We have these normative theories like you mentioned deontology and consequentialism and virtue ethics, and maybe machines can invent other normative ethical theories. So they seem like different projects.

Wendell Wallach: They are very different projects. The question is whether they converge or not or whether they can really be treated as totally distinct projects from each other. So when you were talking about machine ethics or when we were talking about machine ethics, we were really thinking about it in terms of just how do you introduce ethical procedures so that when machines encounter new situations, particularly when the designers can’t fully predict what their actions will be, they factor in ethical considerations as they choose between various courses of action. So we were really talking about very basic programming in the machines, but we weren’t just thinking of it in terms of the basics. We were thinking of it in terms of the evolution of smart machines. For example, in Moral Machines, Colin and I had a chart that we had actually developed with Eva Schmidt and that had been in earlier articles that the three of us authored, and it looked at the development of machines on two axes.

One was increasing autonomy, and the other was increasing sensitivity, with, at the far extreme, sensitivity to ethical considerations. We realized that you could put any tool within that chart. So a hammer has no sensitivity, and it has no autonomy. But when you think of a thermostat, it has a very low degree of sensitivity and a very low degree of autonomy, so as temperatures change, it can turn heating on or off. We then, within that chart, had a series of semicircles, one that delineated when we moved into the realm of what we labeled operational morality. By operational morality, we meant that the computer designers could more or less figure out all the situations the system would encounter and hard program its responses to those situations. The next level was what we call functional morality, which was that, as the computer programmers could no longer predetermine all the situations the system would encounter, the system would have to have some kind of ethical subroutines. Then at the highest level was full moral agency.

What we encounter in the friendly AI approach of the Singularity Institute for Artificial Intelligence, now MIRI, and in what became value alignment is more or less a presumption of very high order intelligence capabilities by the system and how you would ensure that their values align with those of the machines. They tended to start from that level. So that was the distinction. Where the machine ethics folks did look at those futuristic concerns, they did so more from a philosophical level and with at least a belief or appreciation that this is going to be a relatively evolutionary course, whereas the friendly AI and value alignment folks tended to presume that we’re going to have very high order cognitive capabilities and ask how do we ensure that those align with the systems. Now, the convergence, I would say, is what’s happening right now, because in workshops that have been organized around the societal and ethical impact of intelligent systems, the first experiments even the value alignment people are doing still tend to be relatively low level experiments, given the capabilities systems have today.

So I would say, in effect, they are machine ethics experiments or at least they’re starting to recognize that the challenges at least initially aren’t that much different than those the machine ethicists outlined. As far as the later concerns go, which is what is the best course to proceed on producing systems that are value aligned, well there, I think we have some overlap also coming in from the machine ethicists, who raise questions about some of these more technical and mathematically-based approaches to value alignment and whether they might be successful. In that regard, Shannon Vallor, an ethicist at Santa Clara University, who wrote a book called Technology and the Virtues, and has now taken a professorship at Edinburgh, she and I produced a paper called, I think it was, From Machine Ethics to Value Alignment to Virtue Alignment. We’re really proposing that analytical approaches alone will not get us to machines that we can trust or that will be fully ethically aligned.

Lucas Perry: Can you provide some examples about specific implementations or systems or applications of machine ethics today?

Wendell Wallach: There really isn’t much. Sensitivity to ethical considerations is still heavily reliant on how much we can get that input into systems and then how you integrate that input. So we are still very much at the stage of bringing various inputs in without a lot of integration, let alone analysis of what’s been integrated and decisions being made based on that analysis. For all purposes, in both machine ethics and, I would say, bottom up value alignment, there’s just not a lot that’s been done. These are still somewhat futuristic research trajectories.

Lucas Perry: I think I’m just trying to poke here to understand better about what you find most skillful and useful about both approaches in terms of a portfolio approach to building beneficial AI systems, like if this is an opportunity to convince people that machine ethics is something valuable and that should be considered and worked on and expanded. I’m curious to know what you would say.

Wendell Wallach: Well, I think machine ethics is the name of the game in the sense that, for all the talk about systems that will have very high order capabilities, we just aren’t there. We’re still dealing with relatively limited forms of cognitive decision making. For all the wonder that’s going on in machine learning, that’s still a relatively limited kind of learning approach. So we’re not dealing with machines that are making fundamental decisions at this point, or if they are allowed to, it’s largely because humans have abrogated their responsibility, trust the machines, and let the machines make the decisions regardless of whether the machines actually have the capabilities to make sophisticated decisions.

Well, I think as we move along, as you get more and more inputs into systems and you figure out ways of integrating them, there will be the problem of which decisions can be made without, let’s just say, higher order consciousness or understanding of the full implications of those systems, of the situations, of the ethical concerns arising in the situations, and which decisions really require levels of, and I’m going to use the understanding and consciousness words, but I’m using them in a circumspect way, for the machines to fully appreciate the ramifications of the decisions being made and therefore those who are affected by those decisions or how those decisions will affect those around it.

Our first stage is going to be largely systems of limited consciousness or limited understanding and our appreciation of what they can and cannot do in a successful manner and when you truly need a human decision maker in the loop. I think that’s where we are broadly. The difference between the approaches is that the AI researchers are looking at what kind of flexibility they have within the tools they have now for building AI systems. The machine ethicists, I think, tend to be largely philosophically rooted or ethically rooted or practically ethically rooted, and therefore they tend to be more sensitive to the ramifications of decision making by machines and the capacities that need to be accounted for before you want to turn over a decision to a machine, such as a lethal autonomous weapon. What should the machine really understand before it can be a lethal autonomous weapon, and therefore, how tight does the meaningful human control need to be?

Lucas Perry: I’m feeling a tension between trying to understand the role and place of both of these projects and how they’re skillful. In terms of just strict AI alignment, if we had a system that wanted to help us and it was very good at preference learning such that it could use all human artifacts in the world like books, movies and other things. It could also study our behavior and also have conversations with us. It could leverage all data points in the world for building a deep and rich understanding of individual human preference hierarchies, and then also it could extrapolate broad preference facts about species wide general considerations. If that project were to succeed, then within those meta preferences and that preference hierarchy exist the kinds of normative ethical systems that machine ethics is trying to pay lip service to or to be sensitive towards or to imbue in machine systems.

From my perspective, if that kind of narrative that I just gave is true or valid, then that would be sort of a complete value alignment, insofar as it would create beneficial machine systems. But in order to have that kind of normative decision making and sensibilities in machine systems such that they fully understand and are sensitive to the ethical ramifications of certain decisions, that requires higher order logic and the ability to generate concepts and to interrelate them and to shift them around and use them in the kinds of ways that human beings do, which we’re far short of.

Wendell Wallach: So that’s where the convergence is. We’re far short of it. So I have no problem with the description you made. The only thing I noted is, at the beginning you said, if we had, and for me, in order to have, you will have to go through these stages of development that we have been alluding to as machine ethics. Now, how much of that will be able to utilize tools that come out of artificial intelligence that we had not been able to imagine in the early days of machine ethics? I have no idea. There’s so many uncertainties on how that pathway is going to unfold. There’re uncertainties about what order the breakthroughs will take place, how the breakthroughs will interact with other breakthroughs and technology more broadly, whether there will be public reactions to autonomous systems along the way that slow down the course of development or even stop certain areas of research.

So I don’t know how this is all going to unfold. I do see within the AI community, there is kind of a leap of faith to a presumption of breadths of capacity that when I look at it, I still look at, well, how do we get between here and there. When I look at getting between here and there, I see that you’re going to have to solve some of these lower level problems that got described more in the machine ethics world than have initially been seen by the value alignment approaches. That said, now that we’re getting researchers actually trying to look at implementing value alignment, I think they’re coming to appreciate that these lower level problems are there. We can’t presume high level preference parsing by machines without them going through developmental stages in relationship to understanding what a preference is, what a norm is, how they get applied within different contexts.

My sense has been that with both machine ethics and value alignment, we’ve sort of got the cart in front of the horse. So I’m waiting to see some great implementation breakthroughs; I just haven’t seen them. Most of the time, when I encounter researchers who say they’re taking it seriously, I see they’re tripping over relatively low level implementations. The difficulty is here, and all of this is converging. What AI alignment was initially and what it’s becoming now I think are quite different. I think in the very early days, it really was presumptions that you would have these higher order intelligences and then how were you going to align them. Now, as AI alignment people look at the value issues as they intersect with present day AI agendas, they realize that you can’t make the presumptions about the higher order systems without going through developmental steps to get there.

So, in that sense, I think whether it’s AI alignment or machine ethics, the one will absorb the lessons of the other. Both will utilize advances that happen on both fronts. All I’m trying to underscore here is there are computer engineers and roboticists and philosophers who reflected on issues that perhaps the value alignment people are learning something from. I, in the end, don’t care about machine ethics or value alignment per se, I just care about people talking with each other and learning what they can from each other and moving away from a kind of arrogance that I sometimes see happen on both sides of the fence, where one says to the other, you do not understand. The good news, and one thing that I was very happy about in terms of what we did in these three workshops that I was PI on with the help of the Future of Life Institute, was that I think we sort of broke open the door for transdisciplinary dialogue.

Now, true, this was just one workshop. Now, we have gone from a time where, at the first Future of Life Institute gathering in Puerto Rico, the ethicists in the room, Jim Moor and I, were backbenchers, to a time where we have countless conferences that are basically transdisciplinary conferences where people from many fields of research are now beginning to listen to each other. The serious folks in technology and ethics really have recognized the richness of ethical decision making in real contexts. Therefore, I think they can point that out. Technologists sometimes like to say, “Well, you ethicists, what do you have to say because you can’t tell us what’s right and wrong anyway?” Maybe that isn’t what ethics is all about, about dictating what’s right and wrong. Maybe ethics is more about how do we navigate the uncertainties of life, and what kinds of intelligence need to be brought to bear to navigate the uncertainties of life with a degree of sensitivity, depth, awareness, and appreciation for the multilayered kinds of intelligences that come into play.

Lucas Perry: In the context of this uncertainty about machine ethics and about AI alignment and however much or little convergence there might be, let’s talk about how all of this leads up into AI governance now. You touched on a lot of your machine ethics work. What made you pivot into AI governance, and where is that taking you today?

Wendell Wallach: After completing Moral Machines, I started to think about the fact that very few people had a deep and multidisciplinary understanding of the broad array of ethical and societal impacts posed by emerging technologies. I decided to write a primer on that, focusing on what could go wrong and how we might defuse ethical challenges and undesirable societal impacts. That was finally published under the title A Dangerous Master: How to Keep Technology from Slipping Beyond Our Control. The first part of that was really a primer on the various fields of science from synthetic biology to geoengineering, what the benefits were, what could go wrong. But then the book was very much about introducing people to various themes that arise: managing complex adaptive systems, resilience engineering, transcending limits, a whole flock of themes that have become part of the language of discussing emerging technologies but weren’t necessarily known to a broader public.

Even for those of us who are specialists in one area of research such as biotech, we have had very little understanding of AI or geoengineering or some of the other fields. So I felt there was a need for a primer. Then, in the final chapter of the primer, I turned to how some of these challenges might be addressed through governance and oversight. Simultaneously, while I was working on that book, Gary Marchant and I (Gary Marchant is the director of the Center for Law and Innovation at the Sandra Day O’Connor School of Law at Arizona State University, and he has been a specialist in the law and governance of emerging technologies) lamented in our interactions the fact that it was very difficult to put in place any form of governance of these technologies. There was something called the pacing problem. The pacing problem refers to the fact that scientific discovery and technological innovation is far outpacing our ability to put in place appropriate ethical and legal oversight, and that converges with another dilemma that has bedeviled people in technology governance for decades, going back to 1980.

David Collingridge wrote a book where he outlined a problem that is now known as the Collingridge Dilemma. Basically, Collingridge said that while it was easiest to regulate a technology early in its development, early in its development we had little idea of what its societal impact would be. By the time we did understand what the challenges from the societal impact were, the technology would be so deeply entrenched in our society that it would be very difficult to change its trajectory. So we see that today with social media. Social media was totally entrenched in our society before we realized how it could be manipulated in ways that would undermine democracy. Now we’re having a devil of a time figuring out what we could do.

So Gary and I, who had been talking about these kinds of problems for years, we realized that we were constantly lamenting the challenge, but we altered the conversation one day over a cup of coffee. We said, “Well, if we had our druthers, if we had some degree of influence, what would we propose?” We came up with a model that we referred to as governance coordinating committees. Our idea was that you would put in place a kind of issues manager that would try and guide the development of a field, but first of all, it would just monitor development, convene forums between the many stakeholders, map issues and gaps, see if anyone was addressing those issues and gaps or where there were best practices that had come to the fore. If these issues were not being addressed, then how could you address them, looking at a broad array of mechanisms. By a broad array of mechanisms, we meant you start with feasible technological solutions, you then look at what can be managed through corporate self-governance, and if you couldn’t find anything in either of those areas, then you turn to what is sometimes called soft law.

Soft law is laboratory practices and procedures, standards, codes of conduct, insurance policies, a whole plethora of mechanisms that fall short of laws and regulatory oversight. The value of soft law is that soft law can be proposed easily, and you can throw it out if technological advances mean it’s no longer necessary. So it’s very agile, it’s very adaptive. Really anyone can propose a new soft law mechanism. But that contributes to one of the downsides, which is you can have competing soft law. The other downside, perhaps even more important, is that you seldom have a means of enforcement if there are violations of soft law. So, some areas you deem need enforcement, and that’s why hard law and regulatory institutions become important.

So Gary and I proposed this model. Every time we ever talked about it, people would say, “Boy, that’s a great idea. Somebody should do that.” I was going to international forums, such as going to the World Economic Forum meetings in Davos, where I’d be asked to be a fire-starter on all kinds of subject areas: biosafety and food security and the law of the ocean. In a few minutes, I would quickly outline this model as a way of getting people to think much more richly about ways to manage technological development and not just immediately go to laws and regulatory bodies. All of this convinced me that this model was very valuable, but it wasn’t being taken up. All of that led to this first International Congress for the Governance of Artificial Intelligence, which will be convened in Prague on April 16 to 18. I do invite those of you listening to this podcast who are interested in the international governance of AI or really agile governance for technology more broadly to join us at that gathering.

Lucas Perry: Can you specify the extent to which you think that soft law, international norms will shape hard law policy?

Wendell Wallach: I don’t think any of this is that easy at the moment because when I started working on this project and working toward the Congress, there was almost no one in this space. Suddenly, we have a whole flock of organizations that have jumped into it. We have more than 53 lists of principles for artificial intelligence and all kinds of specifications of laws coming along like GDPR, and the EU will actually be coming out very soon with a whole other list of proposed regulations for the development of autonomous systems. So we are now in an explosion of groups, each of which in one form or another is proposing both laws and soft law mechanisms. I think that means we are even more in need of something like a governance coordinating committee. What I mean is loose coordination and cooperation, but at least putting some mechanism in place for that.

Some of the groups that have come to the fore are like the OECD, which actually represents a broad array of the nations, but not all of them. The Chinese were not party to the development of the OECD principles. The Chinese, for example, have somewhat different principles and laws than those that are most attractive in the West. My point is that we have an awful lot of groups, some of which would like to have a significant leadership role or a dominating role, and we’ll have to see to what extent they cooperate with each other or whether we finally have a cacophony of competing soft law recommendations. But I think even if there’s a competition at the UN perhaps with a new mechanism that we create or through each of these bodies like the OECD and IEEE individually, best practices will come to the fore over time and they will become the soft law guidelines. Now, which of those soft guidelines need to be made hard law? That may vary from nation to nation.

Lucas Perry: The agility here is in part imbued by a large amount of soft laws, which will then clarify best practices?

Wendell Wallach: Well, I think it’s like anything else, just like the development of artificial intelligence. There’s all kinds of experimentation going on, all kinds of soft law frameworks, principles which have to be developed into policy and soft law frameworks going on. It will vary from nation to nation. We’ll get an insight over time about which practices really work and which haven’t worked. Hopefully, with some degree of coordination, we can underscore the best practices, we can monitor the development of the field in a way where we can underscore the issues that still need to be addressed. We may have forums to work out differences. There may never be a full consensus and there may not need to be a full consensus considering much of the soft law will be implemented on a national or regional front. Only some of it will need to be top down in the sense that it’s international.

Lucas Perry: Can you clarify the set of things or legal instruments which constitute soft law and then the set of things which make up hard law?

Wendell Wallach: Well, hard law is always things that have become governmentally instituted. So the laws and regulatory agencies that we have in America, for example, or you have the same within Europe, but you have different approaches to hard law. The Europeans are more willing to put in pretty rigorous hard law frameworks, and they believe that if we codify what we don’t want, that will force developers to come up with new creative experimental pathways that accommodate our values and goals. In America, we’re reticent to codify things into hard law because we think that will squelch innovation. So those are different approaches. But below hard law, in terms of soft law, you really do have this vast array of different mechanisms. So I mentioned international standards, some of those are technical. We see a lot of technical standards coming out of the IEEE and the ISO. The IEEE, for example, has jumped into the governance of autonomous systems in a way where it wants to go beyond what can be elucidated technically to talk more about what kinds of values we’re putting in place and what the actual implementation of those values would be. So that’s soft law.

Insurance policies sometimes dictate what you can and cannot do. So that’s soft law. We have laboratory practices and procedures. What’s safe to do in a laboratory and what isn’t? That’s soft law. We have new approaches to implementing values within technical systems, what is sometimes referred to as value-added design. That’s kind of a form of soft law. There are innumerable frameworks that we can come up with, and we can create new ones if we need to, to help delineate what is acceptable and what isn’t acceptable. But again, that delineation may or may not be enforceable. Some enforcement is, if you don’t do what the insurance policy has demanded of you, you lose your insurance policy, and that’s a form of enforceability.

You can lose membership in various organizations. Soft law gets into great detail in terms of acceptable use of humans and animals in research. But at least that’s a soft law that has, within the United States and Europe and elsewhere, some ability to prosecute people who violate the rights of individuals, who harm animals in a way that is not acceptable in the course of doing the research. So what are we trying to achieve by convening a first International Congress for the Governance of Artificial Intelligence? First of all, our hope is that we will get a broad array of stakeholders present. So far, nearly all the governance initiatives are circumscribed in terms of who’s there and who is not there. We are making special efforts to ensure that we have a robust representation from the Chinese. We’re going to make sure that we have robust representation from those from underserved nations and communities who are likely to be very affected by AI, but will not necessarily know a great deal about it. So having a broad array of stakeholders is the number one goal of what we are doing.

Secondly, between here and the Congress, we’re convening six expert workshops. What we intend to do with these expert workshops is bring together a dozen or more of those individuals who have already been thinking very deeply about the kinds of governance mechanisms that we need. Do understand that I’m using the word governance, not government. Government usually just entails hard law and bureaucracies. By governance, we mean bringing in many other solutions to what we call regulatory or oversight problems. So we’re hopeful that we’ll get experts not only in AI governance, but also in thinking about agile governance more broadly, that we will have them come to these small expert workshops we’re putting together, and at those expert workshops, we hope to elucidate what are the most promising mechanisms for the international governance of AI. If they can elucidate those mechanisms, they will then be brought before the Congress. At the Congress, we’ll have further discussions and refinement around some of those mechanisms, and then by the end of the Congress, we will have votes to see if there’s an overwhelming consensus of those present to move forward on some of these initiatives.

Perhaps something like what I had called the governance coordinating committee might be one of those mechanisms. I happen to have also been an advisor to the UN Secretary-General’s High-level Panel on Digital Cooperation, and they drew upon some of my research and combined that with others and came up with one of their recommendations, so they recommended something that is sometimes referred to as a network of networks. Very similar to what I’ve been calling a governance coordinating committee. In the end, I don’t care what mechanisms we start to put in place, just that we begin to take first steps toward putting in place something that will be seen as trustworthy. If we can’t do that, then why bother? At the end of the Congress, we’ll have these votes. Hopefully that will bring some momentum behind further action to move expeditiously toward putting some of these mechanisms in place.

Lucas Perry: Can you contextualize this International Congress for the Governance of AI within the broader AI governance landscape? What are the other efforts going on, and how does this fit in with all of them?

Wendell Wallach: Well, there are many different efforts underway. The EU has its efforts, the IEEE has its effort. The World Economic Forum convenes people to talk about some of these issues. You’ll have some of this come up in the Partnership on AI, you have OECD. There are conversations going on in the UN. You have the High-level Panel’s recommendations. So there has now come to be a vast plethora of different groups that have jumped into it. Our point is that, so far, none of these groups include all the stakeholders. So the Congress is an attempt to bring all of these groups together and ensure that other stakeholders have a place at the table. That would be the main difference.

We want to weave the groups together, but we are not trying to put in place some new authority or someone who has authority over the individual groups. We’re just trying to make sure that we’re looking at the development of AI comprehensively, that we’re talking with each other, that we have forums to talk with each other, that issues aren’t going unaddressed, and then if somebody truly has come forward with best practices and procedures, that those are made available to everyone else in the world or at least underscored for others in the world as promising pathways to go down.

Lucas Perry: Can you elaborate on how these efforts might fail to develop trust or how they might fail to bring about coordination on the issues? Is it always in the incentive of a country to share best practices around AI if that increases the capacity of other countries to catch up?

Wendell Wallach: We always have this problem of competition and cooperation. Where’s competition going to take place? How much cooperation will there actually be? It’s no mystery to anyone in the world that decisions are being made as we speak about whether or not we’re going to move towards wider cooperation within the international world or whether we have movements where we are going to be looking at a war of civilizations or at least a competition between civilizations. I happen to believe there’s so many problems within emerging technologies that if we don’t have some degree of coordination, we’re all damned and that that should prevail in global climate change and in other areas, but whether we’ll actually be able to pull that off has to do with decisions going on in individual countries. So, at the moment, we’re particularly seeing that tension between China and the US. If the trade war can be defused, then maybe we can back off from that tension a little bit, but at the moment, everything’s up for grabs.

That being said, when everything’s up for grabs, my belief is you do what you can to facilitate the values that you think need to be forwarded, and therefore I’m pushing us toward recognizing the importance of a degree of cooperation without pretending that we aren’t going to compete with each other. Competition’s not bad. Competition, as we all know, furthers innovation and helps disrupt technologies that are inefficient and replace them with more efficient ways of moving forward. I’m all for competition, but I would like to see it in a broader framework where there is at least a degree of cooperation on AI ethics and international governmental cooperation.

Lucas Perry: The path forward seems to have something to do with really reifying the importance of cooperation and how that makes us all better off to some extent, not pretending like there’s going to be full 100% cooperation, but cooperation where it’s needed such that we don’t begin defecting on each other in ways that are mutually bad and incompatible.

Wendell Wallach: That claim is central to the whole FLI approach.

Lucas Perry: Yeah. So, if we talk about AI in particular, there’s this issue of lethal autonomous weapons. There’s an issue of, as you mentioned, the spread of disinformation, the way in which AI systems and machine learning can be used more and more to lie and to spread subversive or malicious information campaigns. There’s also the degree to which algorithms will or will not be contributing to discrimination. So these are all like short term things that are governance issues for us to work on today.

Wendell Wallach: I think the longer term trajectory is that AI systems are giving increasing power to those who want to manipulate human behavior either for marketing or political purposes, and they’re manipulating the behavior by studying human behavior and playing to our vulnerabilities. So humans are very much becoming machines in this AI commercial political juggernaut.

Lucas Perry: Sure. So human beings have our own psychological bugs and exploits, and massive machine learning can find those bugs and exploits and exploit them in us.

Wendell Wallach: And in real time. I mean, with the collection of sensors and facial recognition software and emotion recognition software over 5G with a large database of our past preferences and behaviors, we can be bombarded with signals to manipulate our behavior on very low levels and areas where we are known to be vulnerable.

Lucas Perry: So the question is the extent to which, and the strategies by which, we can mitigate these risks within the context of these national and global AI governance efforts.

Wendell Wallach: To mitigate these risks, to make sure that we have meaningful public education, meaning I would say from grammar school up, digital literacy so that individuals can recognize when they’re being scammed, when they’re being lied to. I mean, we’ll never be perfect at that, but at least have one’s antenna out for that, and the degree to which we perhaps need to have some self-recognition if we’re going to be not just manipulable but truly cultivate the capacity to recognize when there are internal and external pressures upon us and defuse those pressures so we can look at new, more creative, individualized responses to the challenge at hand.

Lucas Perry: I think that that point about elementary to high school education is really interesting and important. I don’t know what it’s like today. I guess it’s about the same as what I experienced. Schools just seem completely incompatible with the way the technology is going and dis-employment and other things in terms of the way that they teach and what they teach.

Wendell Wallach: Well, it’s not happening within the school systems. What I don’t fully understand is how savvy young people are within their own youth culture, whether they’re recognizing when they’re being manipulated or not, whether that’s part of that culture. I mean part of my culture, and God knows I’m getting on in years now, but it goes back to questions of phoniness and pretense and so forth. So we did have our youth culture that was very sensitive to that. But that wasn’t part of what our educational institutions were engaged in.

The difference now is that we’ll have to be both within the youth culture, but also we would need to be actually teaching digital literacy. So, for an example, I’m encountering a scam a week, I would say right now through the telephone or through email. Some new way that somebody has figured out to try and rip off some money from me. I can’t believe how many new approaches are coming up. It just flags that this form of corruption requires a remarkable degree of both sensitivity and digital knowledge so that you can recognize when you need to at least check out whether this is real or a scam before you give sensitive information to others.

Lucas Perry: The saving grace, I think, for Gen Z and millennial people is that… I mean, I don’t know what the percentages are, but more than before, many of us have basically grown up on the internet.

Wendell Wallach: So they have a degree of digital literacy.

Lucas Perry: But it’s not codified by an institution like the schooling system. As for changing the schooling system to fit the technological predictions of academics, I don’t know how much hope I have. It seems like it’s a really slow process to change anything about education. It seems like it almost has to be done outside of public education.

Wendell Wallach: That may be what we mean by governance now: what can be done within the existing institutions and what has to find means of being addressed outside of the existing institutions, and is it happening or isn’t it happening? If youth culture in its evolving forms gives 90% of digital literacy to young people, fine, but what about those people who are not within the networks of getting that education, and what about the other 10%? How does that take place? I think that’s the kind of creativity and oversight we need: just monitoring what’s going on, what’s happening, what’s not happening. Some areas may lead to actual governmental needs or interventions. So let’s take the technological unemployment issue. I’ve been thinking a lot about that disruption in new ways. One question I have is whether it can be slowed down. An example for me of a slowdown would be if we found ways of not rewarding corporations for introducing technologies that bring about minimal efficiencies but are more costly to the society than the efficiencies that they introduce for their own productivity gains.

So, if it’s a small efficiency, but the corporation fires 10,000 people and those 10,000 people are now on the dole, I’m not sure whether we should be rewarding corporations for that. On the other hand, I’m not quite sure what kind of political economy you could put in place so you didn’t reward corporations for that. Let’s just say that you have automated long haul trucking. In the United States, we have 1.7 million long haul truck drivers. It’s one of the top jobs in the country. First of all, long haul trucking can probably be replaced more quickly than we’ll have self driving trucks in the cities because of some of the technical issues encountered in cities and on country roads and so forth. So you could have a long haul truck that just went from on-ramp to off-ramp and then have human drivers who take over the truck for the last few miles to take it to the shipping depot.

But if we’ve replaced long haul truckers in the United States over a 10 year period, that would mean putting 14,000 truck drivers out of work every month. That means you have to create 14,000 jobs a month that are appropriate for long haul truck drivers, at the same time as you’re creating jobs for new people entering the workforce and for others whose jobs are disappearing because of automation. It’s not going to happen. Given the culture in the United States, my melodramatic example is some long haul truckers may just decide to take their semis, close down interstate highways, and sit in their cabs and say to the government, “Bring it on.” We are moving into that kind of social instability. So, on one hand, if getting rid of the human drivers doesn’t bring massive efficiencies, it could very easily bring social instability and large societal costs. So perhaps we don’t want to encourage that. But we need to look at it in greater depth to understand what the benefits and costs are.

We often overplay the benefits, and we under-represent the downsides and the costs. You could see a form of tax on corporations relative to how many workers they laid off and how many jobs they created. It could be a sliding tax. If a corporation is reducing its workforce dramatically, then it gets a higher tax on its profit than one that’s actually increasing its workforce. That might be one way of funding UBI. In UBI, I would like to see something that I’ve referred to as UBI plus plus plus. I mean there’ve been various UBI pluses. But my thought was that you’re being given that basic income for performing a service for the society. In other words, performing a service for the society is your job. There may not be anybody overseeing what service you are providing or you might be able to decide yourself what that service would be.

Maybe somebody who was an aspiring actor would decide that they were going to put together an acting group and take Shakespeare into the school system, that that was their service to the society. Others may decide they don’t know how to do a service to the society, but they want to go back to school, so perhaps they’re preparing for a new job or a new contribution, and perhaps other people will really need a job and we’ll have to create high touch jobs such as those that you have in Japan for them. But the point is UBI is paying you for a job. The job you’re doing is providing a service to the society, and that service is actually improving the overall society. So, if you had thousands of creative people taking educational programs into schools, perhaps you’re improving overall education and therefore the smarts of the next generation.

Most of this is not international governance, but where it does impinge upon international considerations is if we do have massive unemployment. It’s going to be poorer nations that are going to be truly set back. I’ve been pointing out in international circles that we now have the Sustainable Development Goals. Well, just technological unemployment alone could undermine the realization of the Sustainable Development Goals.

Lucas Perry: So that seems like a really big scary issue.

Wendell Wallach: It’s going to vary from country to country. I mean, the fascinating thing is how different these national governments will be. So some of the countries in Africa are leapfrogging technology. They’re moving forward. They’re building smart cities. They aren’t going through our development. But other countries don’t even have functioning governments or the governments are highly autocratic. When you look at the technology available for surveillance systems now, I mean we’re very likely to see some governments in the world that look like horrible forms of dictatorship gulags, at the same time as there’ll be some countries where human rights are deeply entrenched, and the oversight of the technologies will be such that they will not be overly repressive on individual behavior.

Lucas Perry: Yeah. Hopefully all of these global governance mechanisms that are being developed will bring to light all of these issues and then effectively work on them. One issue which is related, and I’m not sure how it fits in here or fits in with your thinking, is specifically the messaging and thought around the governance related to AGI and superintelligence. Do you have any thinking here about how any of this feeds into that or your thoughts about that?

Wendell Wallach: I think that the difficulty is we’re still in a realm where when AGI or superintelligence will appear and what it will look like is still so highly speculative. So, at this stage of the game, I don’t think that AGI is really a governmental issue beyond the question of whether government should be funding some of the research. There may also be a role for governments in monitoring when we’re crossing thresholds that open the door for AGI. But I’m not so concerned about that because I think there’s a pretty robust community that’s doing that already that’s not governmental, and perhaps we don’t need the government too involved. But the point here is, if we can put in place robust mechanisms for the international governance of AI, then potentially those mechanisms either make recommendations that perhaps slow down the adoption of technologies that could be dangerous or enhance the ethics and the sensitivity in the development of the technologies. If and when we are about to cross thresholds that open real dangers or serious benefits, we would have the mechanisms in place to help regulate the unfolding of that trajectory.

But that, of course, has to be wishful thinking at this point. We’re taking baby steps at this stage of the game. Those baby steps are going to be building on the activities that FLI and OpenAI and other groups are already engaged in. My way of approaching it is, and it’s not just with AGI, it’s also in relationship to biotech, is just to flag that there are speculative dangers out there, and we are making decisions today about what pathways we, humanity as a whole, want to navigate. So, oftentimes in my presentations, I will have a slide up, and that slide is two robots kneeling over the corpse of a human. When I put that slide up, I say we may even be dealing with the melodramatic possibility that we are inventing the human species as we have known it out of existence.

So that’s my way of flagging that that’s the concern, but not trying to pretend that that’s one that governments should or can address at this point, more that we are at an inflection point where we should and can put in place values and mechanisms to try and ensure that the trajectory of the emerging technologies is human-centered, is planet-centered, is about human flourishing.

Lucas Perry: I think the worry about the information that is implicit in that is this: if there are two AIs, embodied as robots or whatever, standing over a human corpse to represent them dominating or transcending the human species, what is implicit is that they have more power than us, because you require more power to be able to do something like that. Having more power than the human species is something governments would maybe be interested in, and that would be something maybe we wouldn’t want to message about.

Wendell Wallach: I mean, it’s the problem with lethal autonomous weapons. Now, I think most of the world has come to understand that lethal autonomous weapons are a bad idea, but that’s not stopping governments from pursuing them or the security establishment within government saying that it’s necessary that we go down this road. Therefore, we don’t get an international ban or treaty. The messaging with governments is complicated. I’m using the messaging only to stress what I think we should be doing in the near term.

Lucas Perry: Yeah, I think that that’s a good idea and the correct approach. So, if everything goes right in terms of this process of AI governance, then we’re able to properly manage the development of new AI technology, what is your hope here? What are optimistic visions of the future, given successful AI governance?

Wendell Wallach: I’m a little bit different than most people on this. I’m not so much caught up in visions of the future based on this technology or that technology. My focus is more that we have a conscious active decision making process in the present where people get to put in place the values and instruments they need to have a degree of control over the overall development of emerging technologies. So, yes, of course I would like to see us address global climate change. I would like us to adapt AI for all. I would like to see all kinds of things take place. But more than anything, I’m acutely aware of what a significant inflection point this is in human history, and that we’re having to pass through a very difficult and perhaps relatively narrow doorway in order to ensure human flourishing for the next couple of hundred years.

I mean, I understand that I’m a little older than most of the people involved in this process, so I’m not going to be on the stage for that much longer barring radical life extension taking place in the next 20 years. So, unlike many people who are working on positive technology visions for the future, I’m less concerned with the future and more concerned with how, in the present, we nudge technology onto a positive course. So my investment is more that we ensure that humanity not only has a chance, but a chance to truly prevail.

Lucas Perry: Beautiful. So you’re now discussing how you’re essentially focused on what we can do immediately. There’s the extent to which AI alignment and machine ethics or whatever are trying to imbue an understanding of human preference hierarchies in machine systems and to develop ethical sensibilities and sensitivities. I wonder what the role is for, first of all, embodied compassion and loving kindness in persons as models for AI systems and then embodied loving kindness and compassion and pure altruism in machine systems as a form of alignment with idealized human preference hierarchies and ethical sensibilities.

Wendell Wallach: In addition to this work I’m doing on the governance of emerging technologies, I’m also writing a book right now. The book has a working title, which is Descartes Meets Buddha: Enlightenment for the Information Age.

Lucas Perry: I didn’t know that. So that’s great.

Wendell Wallach: So this fits in with your question very broadly. I’m both looking at if the enlightenment ethos, which has directed humanity’s development over the last few hundred years, is imploding under the weight of its own success, then what ethos do we put in place that gives humanity a direction for flourishing over the next few hundred years? I think central to creating that new ethos is to have a new understanding of what it means to be human. But that new understanding isn’t something totally new. It needs to have some convergence with what’s been perennial wisdom to be meaningful. But the fact is when we ask these questions, how are we similar to and how do we truly differ from the artificial forms of intelligence that we’re creating? Or what will it mean to be human as we evolve through the impact of emerging technologies, whether that’s life extension or uploading or bioengineering?

There still is this fundamental question about what grounds what it means to be human. In other words, what’s not just up for grabs or up for engineering. To that, I bring in my own reflections after having meditated for the last 50 years, my own insights shall we say, and how that converges with what we’ve learned about human functioning, human decision making and human ethics through the cognitive sciences over the last decade or two. Out of that, I’ve come up with a new model that I refer to as cyber souls, meaning that as science is illuminating the computational and biochemical mechanisms that give rise to human capabilities, we have often lost sight of the way in which evolution also forged us into integrated beings, integrated within ourselves and searching for an adapted integration to the environment and the other entities that share in that environment.

And it’s this need for integration and relationship, which is fundamental in ethics, but also in decision making. There’s the second part of this, which is this new fascination with moral psychology and the recognition that reason alone may not be enough for good decision making. And that if we have an ethics that doesn’t accommodate people’s moral psychology, then reason alone isn’t going to be persuasive for people, they have to be moved by it. So I think this leads us to perhaps a new understanding of what’s the role of psychological states in our decision making, what information is carried by different psychological states, and how does that information help direct us toward making good and bad decisions. So I call that a silent ethic. There are certain mental states, which historically have at least indicated for people that they’re in the right place at the right time, in the right way.

Oftentimes, these states, whether they’re called flow or oneness or creativity, they’re being given some spiritual overlay and people look directly at how to achieve these states. But that may be a misunderstanding of the role of mental states. Mental states are giving us information. As we factor that information into our choices and actions, those mental states fall away, and the byproducts are these so-called spiritual or transcendent states, and often they have characteristics where thought and thinking comes to a rest. So I call this the silent ethic, taking the actions, making the choices that allow our thoughts to come to rest. When our thoughts are coming to rest, we’re usually in relationships within ourselves and our environments that you can think of as embodied presence or perhaps even the foundations for virtue. So my own sense is we may be moving toward a new or revived virtue ethics. Part of what I’m trying to express in this new book is what I think is foundational to the flourishing of that new virtue ethics.

Lucas Perry: That’s really interesting. I bring this up and am asking because I’ve been interested in the role of idealization, ethically, morally and emotionally in people and reaching towards whatever is possible in terms of human psychological enlightenment and how that may exist as certain benchmarks or reference frames in terms of value learning.

Wendell Wallach: Well, it is a counterpose to the notion that machines are going to have this kind of embodied understanding. I’m highly skeptical that we will get machines in the next hundred years that come close to this kind of embodied understanding. I’m not skeptical that we could have a new kind of revival movement among humans where we create a new class of moral exemplars, which seems to be the exact opposite of what we’re doing at the moment.

Lucas Perry: Yeah. If we can get the AI systems and create abundance and reduce existential risk a bunch and have a long period of reflection, perhaps there will be this space for reaching for the limits of human idealization and enlightenment.

Wendell Wallach: It’s part of the whole question that is going on for us philosophy types: to what extent is this all about machine superintelligence, and to what extent are we using the conversation about superintelligence as an imperfect mirror to think more deeply about the ways we’re similar to and dissimilar from the AI systems we’re creating or have the potential to create?

Lucas Perry: All right. So, with that, thank you very much for your time.

If you enjoyed this podcast, please subscribe. Give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI alignment series.

End of recorded material

FLI Podcast: Cosmological Koans: A Journey to the Heart of Physical Reality with Anthony Aguirre

There exist many facts about the nature of reality which stand at odds with our commonly held intuitions and experiences of the world. Ultimately, there is a relativity of the simultaneity of events and there is no universal “now.” Are these facts baked into our experience of the world? Or are our experiences and intuitions at odds with these facts? When we consider this, the origins of our mental models, and what modern physics and cosmology tell us about the nature of reality, we are beckoned to identify our commonly held experiences and intuitions, to analyze them in the light of modern science and philosophy, and to come to new implicit, explicit, and experiential understandings of reality. In his book Cosmological Koans: A Journey to the Heart of Physical Reality, FLI co-founder Anthony Aguirre explores the nature of space, time, motion, quantum physics, cosmology, the observer, identity, and existence itself through Zen koans fueled by science and designed to elicit questions, experiences, and conceptual shifts in the reader. The universe can be deeply counter-intuitive at many levels and this conversation, rooted in Anthony’s book, is an attempt at exploring this problem and articulating the contemporary frontiers of science and philosophy.

Topics discussed include:

  • What is skillful about a synergy of Zen and scientific reasoning
  • The history and philosophy of science
  • The role of the observer in science and knowledge
  • The nature of information
  • What counts as real
  • The world in and of itself and the world we experience as populated by our concepts and models of it
  • Identity in human beings and future AI systems
  • Questions of how identity should evolve
  • Responsibilities and open questions associated with architecting life 3.0

 

You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play, and Stitcher.

Lucas Perry: Welcome to the Future of Life Institute podcast. I’m Lucas Perry. Today, we’re speaking with Anthony Aguirre. He is a cosmologist, a co-founder of the Future of Life Institute, and a co-founder of the Foundational Questions Institute. He also has a cool prediction market called Metaculus that I suggest you check out. We’re discussing his book, Cosmological Koans: A Journey to the Heart of Physical Reality. This is a book about physics from a deeply philosophical perspective in the format of Zen koans. This discussion is different from the usual topics of the podcast, though there are certainly many parts that directly apply. I feel this will be of interest to people who like big questions about the nature of reality. Some of the things we explore are what is skillful about a synergy of Zen and scientific reasoning, the history and philosophy of science, and the nature of information. We ask what is real, and explore that question. We discuss the world in and of itself and the world we experience as populated by our concepts and stories about the universe. We discuss identity in people and future AI systems. We wonder about how identity should evolve in persons and AI systems. And we also get into the problem we face of architecting new forms of intelligence with their own lived experiences, and identities, and understandings of the world.

As a bit of side news, Ariel is transitioning out of her role at FLI. So, I’ll be taking over the main FLI podcast from here on out. This podcast will continue to deal with broad issues in the space of existential risk and areas that pertain broadly to the Future of Life Institute, like AI risk and AI alignment, as well as bio-risk and climate change, and the stewardship of technology with wisdom and benevolence in mind. And the AI Alignment Podcast will continue to explore the technical, social, political, ethical, psychological, and broadly interdisciplinary facets of the AI alignment problem. So, I deeply appreciated this conversation with Anthony and I feel that conversations like these help me to live what I feel is an examined life. And if these topics and questions that I’ve mentioned are of interest to you or resonate with you then I think you’ll find this conversation valuable as well.

So let’s get into our conversation with Anthony Aguirre.

We’re here today to discuss your work, Cosmological Koans: A Journey to the Heart of Physical Reality. As a little bit of background, tell me a little bit about your experience as a cosmologist and someone interested in Zen whose pursuits have culminated in this book.

Anthony Aguirre: I’ve been a cosmologist professionally for 20 years or so, since grad school I suppose, but I’ve also for my whole life had just the drive to understand what reality is, what’s reality all about. One approach to that, certainly to understanding physical reality, is physics and cosmology and fundamental physics and so on. I would say that the understanding of mental reality, what is going on in the interior sense, is also reality and is also crucially important. That’s what we actually experience. I’ve long had an interest in both sides of that question. What is this interior reality? Why do we have experience the way we do? How is our mind working? As well as what is the exterior reality of physics and the fundamental physical laws and the large scale picture of the universe and so on?

While professionally I’ve been very focused on the external side and the cosmological side in particular, I’ve nourished that interest in the inner side as well and how that interior side and the exterior side connect in various ways. I think that longstanding interest has built the foundation of what then turned into this book that I’ve put together over a number of years that I don’t care to admit.

Lucas Perry: There’s this aspect of when we’re looking outward, we’re getting a story of the universe and then that story of the universe eventually leads up into us. For example as Carl Sagan classically pointed out, the atoms which make up your body had to be fused in supernovas, at least the things which aren’t hydrogen and helium. So we’re all basically complex aggregates of collapsed interstellar gas clouds. And this shows that looking outward into the cosmos is also a process of uncovering the story of the person and of the self as well.

Anthony Aguirre: Very much in that I think to understand how our mind works and how our body works, we have to situate that within a chain of wider and wider context. We have to think of ourselves as biological creatures, and that puts us in the biological context and evolution and evolution over the history of the earth, but that in turn is in the context of where the earth sits in cosmic evolution in the universe as a whole, and also where biology and its functioning sits within the context of physics and other sciences, information theory, computational science. I think to understand ourselves, we certainly have to understand those other layers of reality.

I think what’s often assumed though is that to understand those other layers of reality, we don’t have to understand how our mind works. I think that’s tricky because on the one hand, we’re asking for descriptions of objective reality, and we’re asking for laws of physics. We don’t want to ask for our opinion that we’re going to disagree about. We want something that transcends our own minds and our ability to understand or describe those things. We’re looking for something objective in that sense.

I think it’s also true that many of the things that we talk about as fairly objective contain unavoidably a fairly subjective component to them. Once we have the idea of an objective reality out there that is independent of who’s observing it, we ascribe a lot of objectivity to things that are in fact much more of a mix, that have a lot more ingredients that we have brought to them than we like to admit, and are not wholly out there to be observed by us as impartial observers but are very much a tangled interaction between the observer and the observed.

Lucas Perry: There are many different facets and perspectives here about why taking the cosmological perspective of understanding the history of the universe, as well as the person, is deeply informative. In terms of the perspective of the Future of Life Institute, understanding cosmology tells us what is ultimately possible for life in terms of how long the universe will last, and how far you can spread, and fundamental facts about information and entropy, which are interesting, and also ultimately determine the fate of intelligence and consciousness in the world. There’s also this anthropic aspect that you’re touching on about how observers only observe the kinds of things that observers are able to observe. We can also consider the limits of the concepts that are born of being a primate conditioned by evolution and culture, and the extent to which our concepts are lived experiences within our world model. And then there’s this distinction between the map and the territory, or our world model and the world itself. And so perhaps part of fusing Zen with cosmology is experientially being mindful of not confusing the map for the territory in our moment to moment experience of things.

There’s also this scientific method for understanding what is ultimately true about the nature of reality, and then what Zen offers is an introspective technique for trying to understand the nature of the mind, the nature of consciousness, the causes and conditions which lead to suffering, and the concepts which inhabit and make up conscious experience. I think all of this thinking culminates in an authentically lived life as a scientist and as a person who wants to know the nature of things, to understand the heart of reality, to attempt to not be confused, and to live an examined life – both of the external world and the experiential world as a sentient being.

Anthony Aguirre: Something like that, except I nurture no hope to ever not be confused. I think confusion is a perfectly admirable state in the sense that reality is confusing. You can try to think clearly, but I think there are always going to be questions of interest that you simply don’t understand. If you go into anything deeply enough, you will fairly quickly run into, wow, I don’t really get that. There are very few things that if you push into them carefully and skeptically and open-mindedly enough, you won’t come to that point. I think it would actually be a letdown if I ever got to the point where I wasn’t confused about something. All the fun would be gone, but otherwise, I think I agree with you. Where shall we start?

Lucas Perry: This helps to contextualize some of the motivations here. We can start by explaining why cosmology and Zen in particular? What are the skillful means born of a fusion of these two things? Why fuse these two things? I think some number of our audience will be intrinsically skeptical of all religion or spiritual pursuits. So why do this?

Anthony Aguirre: There are two aspects to it. I think one is a methodological one, which is Cosmological Koans is made up of these koans, and they’re not quite the same koans that you would get from a Zen teacher, but they’re sort of riddles or confrontations that are meant to take the recipient and cause them to be a little bit baffled, a little bit surprised, a little bit maybe shocked at some aspect of reality. The idea here is both to confront someone with something that is weird or unusual or contradicts what they might have believed beforehand in a comfortable, familiar way and make it uncomfortable and unfamiliar, and also to make the thing that is being discussed about the person rather than an abstract intellectual pursuit. Something that I like about Zen is that it’s about immediate experience. It’s about: here you are, here and now, having this experience.

Part of the hope I think methodologically of Cosmological Koans is to try to put the reader personally in the experience rather than have it be stuff out there that physicists over there are thinking about and researching, or that we can speculate about from a purely third person point of view, and to emphasize that if we’re talking about the universe and the laws of physics and reality, we’re part of the universe. We’re obeying those laws of physics. We’re part of reality. We’re all mixed up in that. There can be cases where it’s useful to get a distance from that, but then there are also cases where it’s really important to understand what that all has to do with you. What does this say about me and my life, my experience, my individual subjective, first person view of the world? What does that have to do with these very third person objective things that physics studies?

Part of the point is an interesting and fun way to jolt someone into seeing the world in a new way. The other part is to make it about the reader in this case or about the person asking the questions and not just the universe out there. That’s one part of why I chose this particular format.

I think the other is a little bit more on the content side to say I think it’s dangerous to take things that were written 2,500 years ago and say, oh look, they anticipated what modern physics is finding now. They didn’t quite. Obviously, they didn’t know calculus, let alone anything else that modern physics knows. On the other hand, I think the history of thinking about reality from the inside out, from the interior perspective using a set of introspective tools that were incredibly sophisticated through thousands of years does have a lot to say about reality when the reality is both the internal reality and the external one.

In particular, when you’re talking about a person experiencing the physical world perceiving something in the exterior physical world in some way, what goes on in that process that has both the physical side to it and an internal subjective mental side to it, observing how much of the interior gets brought to the perception. In that sense, I think the Eastern traditions are way ahead of where the West was. The West has had this idea that there’s the external world out there that sends information in and we receive it and we have a pretty much accurate view of what the world is. The idea that instead what we are actually experiencing is very much a joint effort of the experiencer and that external world building up this thing in the middle that brings that individual along with a whole backdrop of social and biological and physical history to every perception. I think that is something that is (a) true, and (b) there’s been a lot more investigation of that on the Eastern and on the philosophical side, some in Western philosophy too of course, but on the philosophical side rather than just the physical side.

I think the book is also about exploring that connection. What are the connections between our personal first person, self-centered view and the external physical world? In doing that investigation, I’m happy to jump to whatever historical intellectual foundations there are, whether it’s Zen or Western philosophy or Indian philosophy or modern physics or whatever. My effort is to touch on all of those at some level in investigating that set of questions.

Lucas Perry: Human beings are the only general epistemic agents in the universe that we’re currently aware of. From the point of view of the person, all the progress we’ve made in philosophy and science, all that there has ever been historically, from a first person perspective, is consciousness and its contents, and our ability to engage with those contents. It is by virtue of engaging with the contents of consciousness that we believe that we gain access to the outside world. You point out here that in Western traditions, it’s been felt that we just have all of this data come in and we’re basically just seeing and interacting with the world as it really is. But as we’ve more so uncovered, in reality the process of science and interrogating the external world is more like you have this internal virtual world model simulation that you’re constructing, that is a representation of the world that you use to engage and navigate with it.

From this first person experiential bedrock, Western philosophers like Descartes have tried to assume certain things about the nature of being, like “I think, therefore I am.” And from assumptions about being, the project and methodologies of science are born of that reasoning and follow from it. It seems like it took Western science a long time, perhaps up until quantum physics, to really come back to the observer, right?

Anthony Aguirre: Yeah. I would say that a significant part of the methodology of physics was at some level to explicitly get the observer out and to talk about only objectively mathematically definable things. The mathematical part is still with physics. The objective is still there, except that I think there’s a realization that one always has to, if one is being careful, talk about what actually gets observed. You could do all of classical physics at some level, physics up to the beginning of the 20th century without ever talking about the observer. You could say there is this object. It is doing this. These are the forces acting on it and so on. You don’t have to be very careful about who is measuring those properties or talking about them or in what terms.

Lucas Perry: Unless they would start to go fast and get big.

Anthony Aguirre: Before the 20th century, you didn’t care if things were going fast. In the beginning of the 20th century though, there was relativity, and there was quantum mechanics, and both of those suddenly had the agent doing the observations at their centers. In relativity, you suddenly have to worry about what reference frame you’re measuring things in, and things that you thought were objective facts, like how long the time interval is between two things that happen, suddenly were revealed to be not objective facts, but dependent on who the observer is, in particular what reference frame they’re in, their state of motion and so on.

Everything else as it turned out is really more like a property of the world that the world can either have or not when someone checks. The structure of quantum mechanics is at some level things have a state, which encodes something about the objects, and the something that it encodes is there’s this set of questions that I could ask the object and I can get answers to those questions. There’s a particular set of questions that I might ask and I’d get definite answers. If I ask other questions that aren’t in that list, then I get answers still, but they’re indefinite, and so I have to use probabilities to describe them.

This is a very different structure to say the object is a list of potential answers to questions that I might pose. It’s very different from saying there’s a chunk of stuff that has a position and a momentum and a force is acting on it and so on. It feels very different. While mathematically you can make the connections between those, it is a very different way of thinking about reality. That is a big change obviously and one that I think still isn’t complete in the sense that as soon as you start to talk that way and say an electron or a glass of water or whatever is a set of potential answers to questions, that’s a little bit hard to swallow, but you immediately have to ask, well, who’s asking the questions and who’s getting the answers? That’s the observer.

The structure of quantum mechanics from the beginning has been mute about that. It said make an observation and you’ll get these probabilities. That’s just pushing the observer into the thing that by definition makes observations, but without a specification of what it means to make an observation, what’s allowed to do it and what isn’t. Can an electron observe another electron or does it have to be a big group of electrons? What is it exactly that counts as making an observation and so on? There are all these questions about what this actually means that have just been sitting around since quantum mechanics was created and really haven’t been answered in any agreed upon or really I would say satisfactory way.

Lucas Perry: There’s a ton there. In terms of your book, there’s this fusion between what is skillful and true about Zen and what is skillful and true about science. You discussed here historically this transition to an emphasis on the observer and information and how those change both epistemology and ontology. The project of Buddhism or the project of Zen is ultimately also different from the project and intentions of Western science historically in terms of the normative, and the ethics driving it, and whether it’s even trying to make claims about those kinds of things. Maybe you could also explain a little bit there about where the projects diverge, what they’re ultimately trying to say either about the nature of reality or the observer.

Anthony Aguirre: Certainly in physics and much of philosophy of physics I suppose, it’s purely about superior understanding of what physical reality is and how it functions and how to explain the world around us using mathematical theories but with little or no translation of that into anything normative or ethical or prescriptive in some way. It’s purely about what is, and not only is there no ought connected with it as maybe there shouldn’t be, but there’s no necessary connection between any statement of what ought to be and what is. No translation of because reality is like this, if we want this, we should do this.

Physics has got to be part of that. What we need to do in order to achieve our goals has to do with how the world works, and physics describes that so it has to be part of it and yet, it’s been somewhat disconnected from that in a way that it certainly isn’t in spiritual traditions like Buddhism where our goal in Buddhism is to reduce or eliminate suffering. This is how the mind works and therefore, this is what we need to do given the way the mind and reality works to reduce or eliminate suffering. That’s the fundamental goal, which is quite distinct from the fundamental goal of just I want to understand how reality works.

I do think there’s more to do, and obviously there are sciences that fill that role like psychology and social science and so on that are more about let’s understand how the mind works. Let’s understand how society works so that given some set of goals like greater harmony in society or greater individual happiness, we have some sense of what we should do in order to achieve those. I would say there’s a pretty big gap nowadays between those fields on the one hand and fundamental physics on the other hand. You can spend a lot of time doing social science or psychology without knowing any physics and vice versa, but at the same time, it’s not clear that they really should be so separate. Physics is talking about the basic nature of reality. Psychology is also talking about the basic nature of reality but two different sides of it, the interior side and the exterior side.

Those two are very much connected, and so it should not be possible to fully understand one without at least some of the other. That I think is also part of the motivation that I have because I don’t think that you can have a comprehensive worldview of the type that you want to have in order to understand what we should do, without having some of both aspects in it.

Lucas Perry: The observer has been part of the equation the whole time. It’s just that classical mechanics is a problem such that it never really mattered that much, but now it matters more given astronomy and communications technologies. When determining what is, the fact that an observer is trying to determine what is and that the observer has a particular nature impacts the process of trying to discover what is, but not only are there supposed “is statements” that we’re trying to discover or understand, but we’re also from one perspective conscious beings with experiences and we have suffering and joy, and are trying to determine what we ought to do. I think what you’re pointing towards is basically an alternate unification of the problem of determining what is, and also of the often overlooked fact that we are contextualized as a creature in the world we’re attempting to understand, and make decisions about what to do next.

Anthony Aguirre: I think you can think of that in very big terms like that in this cosmic context, what is subjectivity? What is consciousness? What does it mean to have feelings of moral value and so on? Let’s talk about that. I think it’s also worth being more concrete in the sense that if you think about my experience as an agent in the world, insofar as I think the world is out there objectively and I’m just perceiving it more or less directly, I tend to make very real in my mind a lot of things that aren’t necessarily real. Things that are very much half created by me, I tend to then turn into objective things out there and then react to them. This is something that we just all do on a personal basis all the time in our daily lives. We make up stories and then we think that those stories are real. This is just a very concrete thing that we do every day.

Sometimes that works out well and sometimes it doesn’t, because if the story that we have is different from the story that someone else has or the story that society has, or from some in some ways somewhat more objective story, then we have a mismatch and we can cause a lot of poor choices and poor outcomes by doing that. Simply the very clear psychological fact that we can discover with a little bit of self analysis, that the stories that we make up aren’t as true as we usually think they are, that’s just one end of the spectrum of this process by which we as sentient beings are very much co-creating the reality that we’re inhabiting.

With this co-creation process, I think we’re comfortable with the fact that it awkwardly happens when we make up stories about what happened yesterday when I was talking to so and so. We don’t think of it so much when we’re talking about a table. We think the table is there. It’s real, if anything is. When we go deeper, we can realize that all of the things like color and solidity and endurance over time aren’t in the wave function of the atoms and the laws of physics evolving them. Those things are properties that we’ve brought as useful ways to describe the world that have developed over millions of years of evolution and thousands of years of social evolution and so on. Those properties, none of those things are built into the laws of nature. Those are all things that we’ve brought. That’s not to say that the table is made up. Obviously, it’s not. The table is very objective in a sense, but there’s no table built into the structure of the universe.

I think we tend to brush under the rug how much we bring to our description of reality. We say that it’s out there. We can realize that on small levels, but to realize the depth of how much we bring to our perceptions and where that stuff comes from, which is a long historical, complicated information generating process, takes a lot more diving in and thinking about.

Lucas Perry: Right. If one were god or if one were omniscient, then to know the universe at the ultimate level would be to know the cosmic wave function, and within the cosmic wave function, things like marriage and identity and the fact that I have a title and conceptual history about my life are not bedrock ontological things. Rather they’re concepts and stories that sentient beings make up due to, as you said, evolution and social conditioning and culture.

Anthony Aguirre: Right, but when you’re saying that, I think there’s a suggestion that the cosmic wave function’s description would be better in some way. I’d take issue with that because I think if you were some super duper mega intelligence that just knew the position of every atom or exactly the cosmic wave function, that doesn’t mean that you would know that the table in front of me is brown. That description of reality has all the particles in it and their positions and at some level, all the information that you could have of the fundamental physics, but it’s completely missing a whole bunch of other stuff, which are the ways that we categorize that information into meaningful things like solidity and color and tableness.

Lucas Perry: It seems to me that that must be contained within that ultimate description of reality because in the end, we’re just arrangements of particles and if god or the omniscient thing could take the perspective of us then they would see the table or the chair and have that same story. Our stories about the world are information built into us. Right?

Anthony Aguirre: How would it do that? What I’m saying is there’s information. Say the wave function of the universe. That’s some big chunk of information describing all kinds of different observations you could make of locations of atoms and things, but nowhere in that description is it going to tell you the things that you would need to know in order to talk about whether there’s a glass on the table in front of me because glass and table and things are not part of that wave function. Those are concepts that have to be added to it. It’s more specification that has been added that exists because of our view of the world. It only exists from the interior perspective of where we are as creatures that have evolved and are looking out.

Lucas Perry: My perspective here is that given the full capacity of the universal wave function for the creation of all possible things, there is the total set of arbitrary concepts and stories and narratives and experiences that sentient beings might dream up that arrive within the context of that particular cosmic wave function. There could be tables and chairs, or sniffelwoops and worbblogs but if we were god and we had the wave function, we could run it such that we created the kinds of creatures who dreamt a life of sniffelwoops and worbblogs or whatever else. To me, it seems like it’s more contained within the original thing.

Anthony Aguirre: This is where I think it’s useful to talk about information because I think that I just disagree with that idea in the sense that if you think of an eight-bit string, so there’s 256 possibilities of where the ones and zeros can be on and off, if you think of all 256 of those things, then there’s no information there. Whereas when I say actually only 128 of these are allowed because the first one is a one, you cut down the list of possibilities, but by cutting it down, now there’s information. This is exactly the way that information physically or mathematically is defined. It’s by saying if all the possibilities are on equal footing, you might say equally probable, then there’s no information there. Whereas, if some of them are more probable or even known, like this is definitely a zero or one, then that whole thing has information in it.

I think very much the same way with reality. If you think of all the possibilities and they’re all on the table with equal validity, then there’s nothing there. There’s nothing interesting. There’s no information there. It’s when you cut down the possibilities that the information appears. You can look at this in many different contexts. If you think about it in quantum mechanics, if you start some system out, it evolves into many possibilities. When you make an observation of it, you’re saying, oh, this possibility was actually realized and in that sense, you’ve created information there.

Now suppose you subscribe to the many worlds view of quantum mechanics. You would say that the world evolves into two copies, one in which thing A happened and one in which thing B happened. In that combination, A and B, there’s less information than in either A or B. If you’re observer A or if you’re observer B, you have more information than if you’re observer C looking at the combination of things. In that sense, I think we as residents, not with omniscient view, but as limited agents that have a particular point of view actually have more information about the world in a particular sense than someone who has the full view. The person with the full view can say, well, if I were this person, I would see this, or if I were this person, I would see that. They have in some sense a greater analytical power, but there’s a missing aspect of that, which is to make a choice as to which one you’re actually looking at, which one you’re actually residing in.

Lucas Perry: It’s like the world model which you’re identified with or the world model which you’re ultimately running is the point. The eight-bit string that you mentioned: that contains all possible information that can be contained within that string. Your point is that when we begin to limit it is when we begin to encode more information.

Anthony Aguirre: That’s right. There’s a famous story called the Library of Babel by Borges. It’s a library with every possible sequence of characters, just book after book after book. You have to ask yourself how much information is there in that library. On the one hand, it seems like a ton because each volume you pick out has a big string of characters in it, but on the other hand, there’s nothing there. You would search forever, practically, far longer than the age of the universe, before you found even a sentence that made any sense.

Lucas Perry: The books also contain the entire multi-verse, right?

Anthony Aguirre: If they go on infinitely long, if they’re not finite length books. This is a very paradoxical thing about information, I think, which is that if you combine many things with information in them, you get something without information in it. That’s very, very strange. That’s what the Library of Babel is. I think it’s many things with lots of information, but combined, they give you nothing. I think that’s in some level how the universe is that it might be a very low information thing in and of itself, but incredibly high information from the standpoint of the beings that are in it like us.

Anthony Aguirre: When you think of it that way, we become vastly, vastly more important than you might think because all of that information that the universe then contains is defined in terms of us, in terms of the point of view that we’re looking out from, without which there’s sort of nothing there. That’s a very provocative and strange view of the world, but that’s more and more the way I think maybe it is.

Lucas Perry: I’m honestly confused. Can you expand upon your example? 

Anthony Aguirre: Suppose you’ve got the library of Babel. It’s there, it’s all written out. But suppose that once there’s a sentence like, “I am here observing the world,” that you can attribute to that sentence a point of view. So once you have that sequence of words like, “I am here observing the world,” it has a subjective experience. So then almost no book has that in this whole library, but a very, very, very select few do. And then you focus on those books. That sub-selection of books you would say there’s a lot of information associated with that subsection, because making something more special means that it has more information. So once you specify something, there’s a bunch of information associated with it.

Anthony Aguirre: By picking out those particular books, now you’ve created information. What I’m saying is there’s a very particular subset of the universe or subset of the ways the universe could be, that adds a perspective that has a subjective sense of looking out at the world. And if you specify, once you focus in from all the different states of the universe to those associated … having that perspective, that creates a whole bunch of information. That’s the way that I look at our role as subjective observers in the universe, that by being in a first person perspective, you’re sub-selecting a very, very, very special set of matter and thus creating a whole ton of information relative to all possible ways that the matter could be arranged.

Lucas Perry: So for example, say the kitchen is dirty, and if you leave the kitchen alone, entropy will just continue to make the kitchen more dirty because there are more possible states in which the kitchen is dirty than it is clean, and there are more possible states in the universe in which sentient human beings do not arise. But here we are, encoded on a planet with the rest of organic life … and in total, evolution and the history of life on this planet require a large and unequal amount of information and specification.

Anthony Aguirre: Yes, I would say … We haven’t talked about entropy, and I don’t know if we should. Genericness is the opposite of information. So when something’s very specific, there’s information content, and when it’s very generic, there’s less information content. This is at some level saying, “Our first person perspective as conscious beings is very, very specific.” I think there is something very special and mysterious at least, about the fact that there’s this very particular set of stuff in the universe that seems to have a first person perspective associated with it. That’s where we are, sort of almost by definition.

That’s where I think the question of agency and observation and consciousness has something to do with how the universe is constituted, not in that it changes the universe in some way, but that connected with this particular perspective is all this information, and if the physical world is at some level made of information, that’s a very radical thing because that’s saying that through our conscious existence and our particular point of view, we’re creating information, and information is reality, and therefore we’re creating reality.

There are all these ways that we apply physics to reality. They’re very information theoretic. There’s this sort of claim that a more useful way to think about the constituents of reality is as informational entities. And then the second claim is that by specifying, we create information. And then the third is that by being conscious observers who come into being in the universe and then have our perspective that we look out toward the universe from, that we are making a selection, we’re specifying, “This is what I see.” So we’re then creating a bunch of information and thus creating a reality.

In that sense, I’m claiming that we create a reality, not from some, “I think in my mind and therefore reality appears like magical powers,” but that if we really talk about what’s real, it isn’t just little bits of stuff I think, but it’s everything else that makes up reality and that information that makes up reality is something that we very much are part of the creation of. 

There are different definitions of information, but the way that the word is most commonly used is for Shannon information. And what that is, is an amount that is associated with a set of probabilities. So if I say I’m going to roll some dice, what am I going to roll? So you’d say, “I don’t know.” And I’d say, “Okay, so what probabilities would you ascribe to what I’m going to roll?” And you’d say, “Well probably a sixth for each side of the die.” And I would say that there’s zero information in that description. And I say that because that’s the most uncertain you could be about the rolls of the dice. There’s no information there in your description of the die.

Now I roll it, and we see that it’s a three. So now the probability of three is 100% or at least very close to it. And the probability of all the other ones is zero. And now there is information in our description. Something specific has happened, and we’ve created information. That’s not a magical thing; it’s just the information is associated with probabilities over things, and when we change the probabilities, we change how much information there is.
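The dice example can be sketched numerically. Standard Shannon entropy measures uncertainty, so this sketch assumes that "information" in the sense used here is the maximum possible entropy minus the actual entropy of the probabilities. The helper names are illustrative, not from the conversation.

```python
import math


def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_bits(probs):
    """Assumed measure: information = log2(number of outcomes) - H."""
    return math.log2(len(probs)) - entropy_bits(probs)


before_roll = [1 / 6] * 6                     # a sixth for each face
after_roll = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # we saw a three

info_before = information_bits(before_roll)   # approximately zero
info_after = information_bits(after_roll)     # log2(6), about 2.585 bits

print(f"before the roll: about {abs(info_before):.3f} bits")
print(f"after the roll:  about {info_after:.3f} bits")
```

Nothing about the die changed between the two lines. Only the probabilities in the description did, which is the sense in which the observation "creates" information.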

Usually when we observe things, we narrow the probabilities. That’s kind of the point of making observations, to find out more about something. In that sense, we can say that we’re creating information or we’re gathering information, so we’ve created information or gathered it in that sense by doing the measurement. In that sense, any time we look at anything, we’re creating information, right?

If I just think what is behind me, well there’s probably a pillar. It might be over there, it might be over there. Now let me turn around and look. Now I’ve gathered information or created information in my description of pillar location. Now when we’re talking about a wave function and somebody measuring the wave function, and we want to keep track of all of the information and so on, it gets rather tricky because there are questions about whose probabilities are we talking about, and whose observations and what are they observing. So we have to get really careful and technical about what sort of probabilities are being defined and whose they are, and how are they evolving.

When you read something like, “Information is preserved in the universe,” what that actually means is that if I take some description of the universe now and then I close my eyes and I evolve that description using the laws of physics, the information that my description had will be preserved. So the laws of physics themselves will not change the amount of information in that description.

But as soon as I open my eyes and look, it changes, because I just will observe something and I’ll see that I closed my eyes, the universe could have evolved into two different things. Now I open them and see which one it actually evolved into. Now I increased the information. I reduced the uncertainty. So it’s very, very subtle, the way in which the universe preserves information. The dynamics of the universe, the laws of physics, preserve the information that is associated with a description that you have of the world. There’s an incredible amount of richness there because that’s what’s actually happening. If you want to think about what reality is, that’s what reality is, and it’s the observers who are creating that description and observing that world and changing the description to match what they saw. Reality is a combination of those two things: the evolution of the world by the laws of physics, and the interaction of that with the person who or the whatever it is that is asking the questions and making the observations.
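A toy sketch of the distinction above, under loud assumptions: the "universe" is four discrete states, the "laws of physics" are a hypothetical reversible rule (a permutation of the states), and information is again measured as maximum entropy minus entropy. None of the names come from the conversation.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_bits(probs):
    """Assumed measure: log2(number of states) - entropy."""
    return math.log2(len(probs)) - entropy_bits(probs)


def evolve(probs, permutation):
    """Deterministic, reversible dynamics: state i moves to permutation[i]."""
    evolved = [0.0] * len(probs)
    for i, p in enumerate(probs):
        evolved[permutation[i]] += p
    return evolved


def observe(probs, compatible_states):
    """Condition the description on an observation that rules out
    every state not in `compatible_states`."""
    total = sum(probs[i] for i in compatible_states)
    return [probs[i] / total if i in compatible_states else 0.0
            for i in range(len(probs))]


description = [0.5, 0.5, 0.0, 0.0]   # eyes closed: two possibilities
step = [2, 3, 0, 1]                  # hypothetical reversible update rule

evolved = evolve(description, step)  # [0.0, 0.0, 0.5, 0.5]
observed = observe(evolved, {2})     # eyes open: it turned out to be state 2

print(information_bits(description))  # 1.0
print(information_bits(evolved))      # 1.0
print(information_bits(observed))     # 2.0
```

The reversible step only relabels which states carry the probability, so the information in the description is unchanged. It goes up only when the observation rules out one of the two possibilities.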

What’s very tricky is that unlike matter, information is not something that you can say, “I’ve got four bits of information here and five bits of information here, so I’m going to combine them and get nine bits of information.” Sometimes that’s true, but other times it’s very much not true. That’s what’s very, very, very tricky I think. So if I say I’ve got a die and I rolled a one with a 100% chance, that’s information. If I say I have a die and I rolled a two, or if I say I had a die and then rolled a three, all of those have information associated with them. But if I combine those in the sense that I say I have a die and I rolled a one and a two and a three and a four and a five and a six, then there’s no information associated with that.

All of the things happened, and so that’s what’s so tricky about it. It’s the same with the library of Babel. If I take every possibility on an equal footing, then none of them is special and there’s no information associated with that. If I take a whole bunch of special things and put them in a big pot, I just have a big mess and then there’s nothing special any more.
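The non-additivity described here can be illustrated with the same die. The sketch assumes that "combining" descriptions means giving each one equal weight (an equal mixture), and that information is maximum entropy minus entropy. Both are modeling choices made for illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_bits(probs):
    """Assumed measure: log2(number of outcomes) - entropy."""
    return math.log2(len(probs)) - entropy_bits(probs)


def combine(distributions):
    """Equal-weight mixture: every description is put on an equal footing."""
    n = len(distributions)
    size = len(distributions[0])
    return [sum(d[i] for d in distributions) / n for i in range(size)]


# Six descriptions, each certain that a different face was rolled.
certain = [[1.0 if face == rolled else 0.0 for face in range(6)]
           for rolled in range(6)]

each = [information_bits(d) for d in certain]   # log2(6) bits apiece
combined = combine(certain)                     # back to a sixth for each face

print([round(x, 3) for x in each])
print(f"combined: about {abs(information_bits(combined)):.3f} bits")
```

Each description on its own carries about 2.585 bits, yet the mixture of all six is the uniform distribution and carries essentially none, so the amounts do not add up the way quantities of stuff would.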

When I say something like, “The world is made out of information,” that means that it has different sort of properties than if it was made out of stuff. Because stuff … Like you take away some stuff and there’s less stuff. Or you divide the stuff in two and each half has half as much stuff. And information is not necessarily that way. And so if you have a bunch of information or a description of something and you take a subset of it, you’ve actually made more information even though there’s less that you’re talking about.

It’s different than the way we think about the makeup of reality when you think about it as made up of stuff, and has just very different properties that are somewhat counter-intuitive when we’re used to thinking about the world as being made up of stuff.

Lucas Perry: I’m happy that we have spent this much time on just discussing information, because I think that it offers an important conceptual shift for seeing the world, and a good challenging of some commonly held intuitions – at least, that I have. The question for me now is, what are the relevant and interesting implications here for agents? The one thing that had been coming to my mind is… and to inject more Zen here… there is a koan that goes something like: “first there were mountains and then there were no mountains, and then there were mountains.”  This seems to have parallels to the view that you’re articulating, because first you’re just stupefied and bought into the reality of your conceptualizations and stories where you say “I’m actually ultimately a human being, and I have a story about my life where I got married, and I had a thing called a job, and there were tables, which were solid and brown and had other properties…” But as you were saying, there’s no tableness or table in the wave function; these are all stories and abstractions which we use because they are functional or useful for us. And then when we see that we go, “Okay, so there aren’t really mountains in the way that I thought, mountains are just stories we tell ourselves about the wave function.”

But then I think it seems like you’re pointing out here again, there’s sort of this ethical or normative imperative where it’s like, “okay, so mountains are mountains again, because I need my concept and lived experience of a mountain to exist in the world, and to exist amongst human institutions and concepts and language, and even though I may return to this, this all may be viewed in a new light.” Is this pointing in the right direction in your opinion?

Anthony Aguirre: I think in a sense, in that we think we’re so important, and the things around us are real, and then we realize as we study physics that actually, we’re tiny little blips in this potentially infinite or at least extremely large, somewhat uncaring-seeming universe, that the things that we thought are real are kind of fictitious, and partly made up by our own history and perceptions and things, that the table isn’t really real but it’s made up of atoms or wave function or what have you.

But then I would say, why do you attribute more realness to the wave function than the table? The wave function is a sort of very impoverished description of the world that doesn’t contain tables and things. So I think there’s this pathology of saying because something is described by fundamental physical mathematical laws, it’s more real than something like a table that is described by people talking about tables to other people.

There’s something very different about those things, but is one of them more real and what does that even mean? If the table is not contained in the wave function and the wave function isn’t really contained in the table, they’re just different things. They’re both, in my view, made out of information, but rather different types and accessible to rather different things.

To me, the, “Then I realized it was a mountain again,” is that yes, the table is kind of an illusion in a sense. It’s made out of atoms and we bring all this stuff to it and we make up solidity and brownness and stuff. So it’s not a fundamental part of the universe. It’s not objectively real, but then I think at some level nothing is so purely objectively real. It’s a sliding scale, and then it’s got a place for things like the wave function of the universe and the fundamental laws of physics at the more objective end of things, and brownness and solidity at the more subjective end of things, and my feelings about tables and my thirst for water at the very subjective end of things. But I see it as a sort of continuous spectrum, and that all of those things are real, just in somewhat different ways. In that sense, I think I’ve come back to those illusory things being real again in a sense, but just from a rather different perspective, if we’re going to be Zen about it.

Lucas Perry: Yeah, it seems to be an open question in physics and cosmology. There is still arguing now currently going on about what it means for something to be real. I guess I would argue that something is real if it maybe has causality or that causality would supervene upon that thing… I’m not even sure, I don’t think I’m even going to start here, I think I would probably be wrong. So…

Anthony Aguirre: Well, I think the problem is in trying to make a binary distinction between whether things are real or not or objective or not. I just think that’s the wrong way to think about it. I think there are things that are much more objective than other things, and things that are much less objective than other things, and to the extent that you want to connect real with being objective, there are then things that are more and less real.

In one of the koans in the book, I make this argument that we think of a mathematical statement like the Pythagorean theorem, say, or some other beautiful thing like Euler’s theorem relating exponentials to cosines and sines, that these are objective special things built into the universe, because we feel like once we understand these things, we see that they must have been true and existed before any people were around. Like it couldn’t be that the Pythagorean theorem just came into being when Pythagoras or someone else discovered it, or Euler’s theorem. They were true all the way back until before the first stars and whatnot.

And that’s clearly the case. There is no time at which those things became true. At the same time, suppose I just take some axioms of mathematics that we employ now, and some sort of rules for generating new true statements from them. And then I just take a computer and start churning out statements. So I churn out all possible consequences of those axioms. Now, if I let that computer churn long enough, somewhere in that string of true statements will be something that can be translated into the Pythagorean theorem or Euler’s theorem. It’s in there somewhere. But am I doing mathematics? I would say I’m not, in the sense that all I’m doing is generating an infinite number of true statements if I let this thing go on forever.

But almost all of them are super uninteresting. They’re just strings of gobbledygook that are true given the axioms and the rules for generating new true statements, but they don’t mean anything. Whereas Euler’s theorem is a very, very special statement that means something. So what we’re doing when we’re doing mathematics, we feel like what we’re doing is proving stuff to be true. And we are at some level, but I think what we’re really doing from this perspective is out of this catalog that is information-free of true statements, we’re picking out a very, very special subset that are interesting. And in making that selection, we’re once again creating information. And the information that we’re creating is really what we’re doing, I think, when we’re doing mathematics.
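The "computer churning out true statements" picture can be sketched with a toy formal system. The example below uses Hofstadter's MIU system, which is not mentioned in the conversation and is swapped in here only because it is small: one axiom, `MI`, and four rewriting rules. The final filter for "interesting" strings is a deliberately arbitrary, hypothetical stand-in for the human act of selection.

```python
from collections import deque


def successors(s):
    """Yield every string obtainable from `s` by one MIU rule."""
    if s.endswith("I"):
        yield s + "U"                        # rule 1: xI -> xIU
    if s.startswith("M"):
        yield "M" + s[1:] * 2                # rule 2: Mx -> Mxx
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            yield s[:i] + "U" + s[i + 3:]    # rule 3: III -> U
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            yield s[:i] + s[i + 2:]          # rule 4: UU -> (deleted)


def enumerate_theorems(axiom, limit):
    """Breadth-first enumeration of the first `limit` distinct derivable strings."""
    seen = {axiom}
    queue = deque([axiom])
    ordered = []
    while queue and len(ordered) < limit:
        current = queue.popleft()
        ordered.append(current)
        for nxt in successors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return ordered


theorems = enumerate_theorems("MI", limit=200)

# Hypothetical notion of "interesting": short enough to read at a glance.
interesting = [t for t in theorems if len(t) <= 3]

print(len(theorems), "derived;", len(interesting), "selected:", interesting)
```

The enumeration is mechanical and could run forever. The short list at the end exists only because a selection criterion was brought to it from outside, which is the role the conversation assigns to mathematicians.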

The information contained in the statement that the Pythagorean theorem is an interesting theorem that applies to stuff in the real world and that we should teach our kids in school, that only came into being when humans did. So although the statement has always been true, the information I think was created along with humans. So I think you kind of get to have it both ways. It is built into the universe, but at the same time, it’s created, so you discover it and you create it.

I think there’s a lot of things that are that way. And although the Pythagorean theorem feels super objective (you can’t disagree with the Pythagorean theorem in a sense; we all agree on it once we understand what it is), at the same time, it’s got this subjective aspect to it: out of all the theorems, we selected this particular one of interest … We also selected the axioms by the way, out of all different sets of axioms we could have chosen. So there’s this combination of objectivity and the subjectivity that we as humans that like to do geometry and think about the world and prove theorems and stuff have brought to it. And that combination is what’s created the information that is associated with the Pythagorean theorem.

Lucas Perry: Yeah. You threw the word “subjectivity” there, but this process is bringing us to the truth, right? I mean, the question is again, what is true or real?

Anthony Aguirre: There are different senses of subjectivity. So there’s one sense of having an interior world view, having consciousness or awareness or something like that, being a subject. And there’s another of saying that it’s perspectival, that it’s relative or something, that different agents might not agree on it or might see it a little bit differently. So I’d want to distinguish between those two.

Lucas Perry: In which sense did you mean?

Anthony Aguirre: What I mean is that the Pythagorean theorem is quite objective in the sense that once lots of agents agree on the premises and the ground rules, we’re all going to agree on the Pythagorean theorem, whereas we might not agree on whether ice cream is good. But it’s still a little bit not objective.

Lucas Perry: It’s like a small part of all possible mathematically true statements which arise out of those axioms.

Anthony Aguirre: Yes. And that some community of agents in a historical process had to select that out. It can’t be divorced from the process and the agents that brought it into being, and so it’s not entirely objective in that sense.

Lucas Perry: Okay. Yeah, yeah, that makes sense. I see. So this is a question I was intending on asking you an hour ago before we went down this wormhole. First, I’m interested in just the structure of your book. How do you structure your book in terms of the ideas and what leads to what?

Anthony Aguirre: Just a brief outline of the book: there are a few different layers of structure. One is the koans themselves, which are sort of parables or little tales that encode some idea. There’s maybe a metaphor or just the idea itself, and the koans take place as part of a narrative that takes place starting in 1610 or 1630 or so, in a trip from Italy to, in the end, Kyoto. So there’s this across-the-world journey that takes place through these koans. And they don’t come in chronological order, so you kind of have to piece together the storyline as the book goes on. But it kind of comes together in the end, so there’s a sequence of things that are happening through the koans, and there’s a storyline that you get to see assemble itself and it involves a genie and it involves a sword fight and it involves all kinds of fun stuff.

That’s one layer of the structure, is the koans forming the narrative. Then after each koan is a commentary that’s kind of delving into the ideas, providing some background, filling in some physics, talking about what that koan was getting at. And in some cases, it’s kind of a resolution to it, like here’s the paradox and here’s the resolution to that paradox. But more often, it’s here’s the question, here’s how to understand what that question is really asking. Here’s a deeper question that we don’t know the answer to, and maybe we’ll come back to later in the book or maybe we won’t. So there’s kind of this development of a whole bunch of physics ideas that are going on in those commentaries.

In terms of the physics ideas, there’s a sequence. There’s first classical physics including relativity. The second part is quantum mechanics, essentially. The third part is statistical mechanics and information theory. The fourth part is cosmology. The fifth part is the connections to the interior sense, like subjectivity and the subject and experiments and thinking about interior sense and consciousness and the “I”. And then the last part is a sort of more philosophical section, bringing things together in the way that we’ve been discussing, like how much of reality is out there, how much of it is constructed by us, or us as us writ large as a society and thinking beings and biological evolution and so on. So that’s kind of the structure of the book.

Lucas Perry: Can you read for us two of your favorite koans in the book?

Anthony Aguirre: This one alludes to a classic philosophical thought experiment of the ship of Theseus. This one’s called What Is It You Sail In? It takes place in Shanghai, China in 1620. “After such vast overland distances, you’re relieved that the next piece of your journey will be at sea, where you’ve always felt comfortable. Then you see the ship. You’ve never beheld a sorrier pile of junk. The hull seems to be made mostly of patches, and the patches appear to be made of other patches. The nails look nailed together. The sails are clearly mostly a quilt of canvas sacks and old clothing. ‘Does it float?’ you ask the first mate, packing in as much skepticism as you can fit. ‘Yes. Many repairs, true. But she is still my good companion, [Atixia 00:25:46], still the same ship she ever was.’

Is she?, you wonder. Then you look down at your fingernails, your skin, the fading scar on your arm and wonder, am I? Then you look at the river, the sea, the port and all around. Is anything?”

So what this one’s getting at is this classic tale where if you replace one board of a ship, you’d still say it’s the same ship; you’ve just replaced one little piece of it. But as you replace more and more pieces of it, at some point, every piece of the ship might be a piece that wasn’t there before. So is it the same ship or it’s not? Every single piece has been replaced. And our body is pretty much like this; on a multi-year timescale, we replace pretty much everything.

The idea of this is to get at the fact that when we think of a thing like an identity that something has, it’s much more about the form and I would say the information content in a sense, than about the matter that it’s made up of. The matter’s very interchangeable. That’s sort of the way of kicking off a discussion of what does it mean for something to exist? What is it made of? What does it mean for something to be different than another thing? What are the different forms of existence? What is the form versus the matter?

And with the conclusion that at some level, the very idea of matter is a bit of an illusion. There’s kind of form in the sense that when you think of little bits of stuff, and you break those little bits of stuff down farther, you see that there are protons and electrons and neutrons and whatnot, but what those things are, they’re not little bits of stuff. They’re sort of amounts or properties of something. Like we think of energy or mass as a thing, but it’s better to think of it as a property that something might have if you look.

The fact that you have an electron really means that you’ve got something with a little bit of the energy property or a little bit of the mass property, a little bit of the spin property, a little bit of the electron lepton number property, and that’s it. And maybe you talk about its position or its speed or something. So it’s more like a little bundle of properties than a little bundle of stuff. And then when you think of agglomerations of atoms, it’s the same way. Like the way that they’re arranged is a sort of informational thing, and questions you can ask and get answers to.

Going back to our earlier conversation, this is just a slightly more concrete version of the claim that when we say what something’s made of, there are lots of different answers to that question that are useful in different ways. But the answer that it’s made of stuff is maybe not so useful as we usually think it is.

Lucas Perry: So just to clarify for listeners, koans in Zen traditionally are supposed to be not explicitly philosophically analytical, but experiential things which are supposed to subvert commonly held intuitions which may take you from seeing mountains as mountains, to no mountains, to mountains again. So here there’s this perspective that there’s both supposedly the atoms which make up me and you, and then the way in which the atoms are arranged, and then this koan that you say elicits the thought that you can remove any bit of information from me, and you can continue to move one bit of information from me at a time, and there’s no one bit of information that I would say is essential to what I call Lucas, or what I take to be myself. Nor atoms. So then what am I? How many atoms or bits of information do you have to take away from me until I stop being Lucas? And so one may arrive at the place where you’re deeply questioning the category of Lucas altogether.

Anthony Aguirre: Yeah. The things in this book are not Zen koans in the sense that a lot of them are pretty philosophical and intellectual and analytical, which Zen koans are sort of not. But at the same time, when you delve into them and try to experience them, when you think not of the abstract idea of the ship in this koan and lepton numbers and energy and things like that, but when you apply it to yourself and think, okay, what am I if I’m not this body?, then it becomes a bit more like a genuine Zen koan. You’re sort of like, ah, I don’t know what I am. And that’s a weird place to be. I don’t know what I am.

Lucas Perry: Yeah. Sure. And the wisdom to be found is the subversion of a ton of different commonly held intuitions, which are evolutionarily conditioned, which are culturally conditioned and socially conditioned. So yeah, this has to do with the sense of permanent things and objects, and then what identity ultimately is, or what our preferences are about identity, or if there are normative or ethical imperatives about the sense of identity that we ought to take. Are there any other ideas here for some other major intuitions that you’re attempting to subvert in your book?

Anthony Aguirre: Well yeah, there’s … I guess it depends which ones you have, but I’ve subverted as many as I can. I mean, a big one I think is the idea of a sort of singular individual self, and that’s one that is really interesting to experiment with. The way we go through our lives pretty much all the time is that there’s this one-to-one correspondence between our feeling that we’re an individual self looking out at the world, there’s an “I”. We feel like there’s this little nugget of me-ness that’s experiencing the world and owns mental faculties, and then owns and steers around this body that’s made out of physical stuff.

That’s the intuition that we go through life with, but then there are all kinds of thought experiments you can do that put tension on that. And one of them that I go through a lot in the book is what happens when the body gets split or duplicated, or there are multiple copies of it and things like that. And some of those things are physically impossible or so extraordinarily difficult that they’re not worth thinking about, but some of them are very much things that might automatically happen as part of physics, if we really could instantaneously copy a person and create a duplicate of them across the room or something like that.

What does that mean? How do we think about that? When we’ve broken that one-to-one correspondence between the thing that we like to think of as ourself and our little nugget of I-ness, and the physical body, which we know is very, very closely related to that thing. When one of them bifurcates into two, it kind of throws that whole thing up in the air, like now what do we think? And it gets very unsettling to be confronted with that. There are several koans investigating that at various different levels that don’t really draw any conclusions, I would say. They’re more experiments that I’m sort of inviting other people to subject themselves to, just as I have thinking about them.

It’s very confusing how to think about them. Like, should I care if I get copied to another copy across the room and then get instantaneously destroyed? Should that bother me? Should I fear that process? What if it’s not across the room, but across the universe? And what if it’s not instantaneously that I appear across the room, but I get destroyed now, and I exist on the other side of the universe a billion years from now, the same configuration of atoms? Do I care that that happens? There are no easy answers to this, I think, and they’re not questions that you can easily dismiss.

Lucas Perry: I think that this has extremely huge ethical implications, and represents, if transcended, an important point in human evolution. There is this koan, which is something like, “If you see the Buddha on the road, kill him.” Which means if you think you’ve reached something like enlightenment, it’s not that, because enlightenment is another one of these stories. But insofar as human beings are capable of transcending illusions and reaching anything called enlightenment… I think that an introspective journey into trying to understand the self and the world is one of the most interesting pursuits a human being can do. And just to contextualize this and, I think, paint the picture better, it’s evolution that has evolved these information processing systems, with this virtual sense of self that exists in the world model we have, and the model we have about ourselves and our body, and this is because this is good for self preservation. 

So you can say, “Where do you feel you’re located?” Well I sort of feel I’m behind my face and I feel I have a body and I have this large narrative of self concept and identity, which is like, “Oh, I’m Lucas. I’m from here.” I have this concept of self which I’ve created, which is basically this extremely elaborate connotative web of all the things which I think make up my identity. And under scrutiny, this is basically just all conditioned, it’s all outside of myself, all prior to myself, I’m not self-made at all, yet I think that I’m some sort of separate self. And then along come the Abrahamic religions at some point in the story of humanity, which are going to have tremendous cultural and social implications on the way that evolution has already bred ego-primates like ourselves. We’re primates with egos and now we have Abrahamic religions, which are contributing to this problem by conditioning the language and philosophy and thought of the West, which say that ultimately you’re a soul, you’re not just a physical thing.

You’re actually a soul who has a body and you’re basically just visiting here for a while, and then the thing that is essentially you will go to the next level of existence. This leads to, I think, reifying this rational conceptualization of self and this experience itself. Where you feel like you have a body, you feel that your heart beats itself, you feel that you think your thoughts and you say things like, “I have a brain.” Who is it that stands in relation to the brain? Or we might say something like, “I have a body.” Who is it that has a body? So it seems like our language is clearly conditioned and structured around our sense and understanding of self. And there’s also this sense in which you’ve been trying to subvert some sorts of ideas here, like sameness or otherness, what counts as the same ship or not. And from an ultimate physics perspective, the thing that is fusing the stars is the same thing that is thinking my thoughts. The fundamental ontology of the world is running everything, and I’m not separate from that, yet it feels like I am, and this seems to have tremendous ethical implications.

For example, people believe that people are deserving of retribution for crimes or acting immorally, as if they had chosen in some ultimate and concrete sense what to do. The ultimate spiritual experience, or at least the ultimate insight, is to see this whole thing for what it is, to realize that basically everyone is spell bound by these narratives of self, and these different intuitions we have about the world, and that we’re basically bought into this story that I think Abrahamic religions have led to a deeper conditioning in us. It seems to me that atheists also experience themselves this way. We think when we die there’ll be nothing, there will just be an annihilation of the self, but part of this realization process is that there’s no self to be annihilated to begin with. There’s just consciousness and its contents, and ultimately by this process you may come to see that consciousness is something empty of self and empty of identity. It’s just another thing that is happening.

Anthony Aguirre: I think there are a lot of these cases where the mountain becomes less then more of a mountain and then more and less of a mountain. You touched upon consciousness and free will and many other things that are also in this, and there’s a lot of discussion of free will in the book and we can get into that too. I think with consciousness or the self, I find myself in this strange sort of war in the sense that, on the one hand I feel like there’s a sense in which this self that we construct, is kind of an illusionary thing and that the ego and things that we attach to, is kind of an illusionary thing. But at the same time, A, it sure feels real and the feeling of being Anthony, I think is a kind of unique thing.

I don’t subscribe to the notion that there’s this little nugget of soul stuff that exists at the core of a person. It’s easy to sort of make fun of this, but at the same time I think the idea that there’s something intrinsically equally valuable to each person is really, really important. I mean it underlies a lot of our way of thinking about society and morality, in ways that I find very valuable. And so while I kind of doubt the sort of metaphysics of the individual’s soul in that sense, I worry what happens to the way we’ve constructed our scheme of values. If we grade people on a sliding scale, you’re more valuable than this other person. I think that sense of equal intrinsic human worth is incredibly crucial and has led to a lot of moral progress. So I have this really ambivalent feeling, in that I doubt that there’s some metaphysical basis for that, but at the same time I really, really value that way of looking at the self, in terms of society and morality and so on, that we’ve constructed on top of that.

Lucas Perry: Yeah, so there’s the concept in Zen Buddhism of skillful means. So one could say that the concept of each human being having some kind of equal and intrinsic worth, which is related to their uniqueness and fundamental being as a human being, is skillful.

Anthony Aguirre: It’s not something that in some sense makes any rational sense. Whatever you name, some people have more of it than others. Money, capability, intelligence, sensitivity.

Lucas Perry: Even consciousness.

Anthony Aguirre: Consciousness maybe. Maybe some people are just a lot more conscious than others. If we can measure it, maybe some people would be like a 10 on the dial and others would be 2. Who knows?

Lucas Perry: I think that’s absolutely probably true, because some people are brain dead. Medically there’s a sliding scale of brain activity, so yeah, I think today it seems clear that some people are more conscious than others.

Anthony Aguirre: Yes, that’s certainly true. I mean when we go to sleep, we’re less conscious. But nonetheless, although anything that you can measure about people and their experience of the world varies and if you could quantify it on a scale, some people would have more and less. Nonetheless, we find it useful to maintain this idea that there is some intrinsic equality among people and I worry what would happen if we let go of that. What kind of world would we build without that assumption? So I find it valuable to keep that assumption, but I’m conflicted about that honestly, because on what basis do we make that assumption? I really feel good about it, but I’m not sure I can point to why. Maybe that’s just what we do. We say this is an axiom that we choose to believe that there’s an intrinsic moral value to people and I respect that, because I think you have to have axioms. But it’s an interesting place that we’ve come to, I think in terms of the relation between our beliefs about reality and our beliefs about morality.

Lucas Perry: Yeah. I mean there’s the question, as we approach AI and super intelligence, of what authentic experiential and ethical enlightenment and idealization means. From my perspective the development of this idea, which is correlated with the Enlightenment and humanism, is a very recent thing, from the 1700s and 1800s, right? So it seems clear from a cosmological context that this norm or ethical view is obviously based on a bunch of things that are just not true, but at the same time it’s been ethically very skillful and meaningful for fixing many of the immoral things that humans do. But obviously it seems like it will give way to something else, and the question is, what else does it give way to?

So if we create Life 3.0 and we create AIs that do not care about getting turned off for two minutes and then waking up again, because they don’t feel the delusion of a self, that to me seems to be a step in moral evolution. It’s why I think that ultimately it would be super useful for AI design if the AI designers would consider the role that identity plays in forming strong AI systems that are there to help us. We have the opportunity here to have selfless AI systems; they’re not going to be confused like we are. They’re not going to think they have souls, or feel like they have souls, or have strong senses of self. So it seems like there are opportunities here, and questions around what it means to transcend many of the aspects of human experience, and how best to instantiate that in advanced AI systems.

Anthony Aguirre: Yeah, I think there’s a lot of valuable stuff to talk about there. In humans, there are a whole bunch of things that go together that don’t necessarily have to be packaged together. Intelligence and consciousness are packaged together, it’s not clear to what degree those have to be. It’s not clear how much consciousness and selfness have to be packaged together. It’s not clear how much consciousness or selfness and a valence to consciousness, a positive or negative experience have to be packaged together. Could we conceive of something that is intelligent, but not conscious? I think we certainly could, depending on how intelligent it has to be. I think we have those things and depending on what we mean by consciousness, I guess. Can we imagine something that is conscious and intelligent, but without a self, maybe? Or conscious, but it doesn’t matter to it how something goes. So it’s something that’s conscious, but can’t really have a moral weight in the sense that it doesn’t either suffer or experience positive feelings, but it does experience.

I think there’s often a notion that if something is said to have consciousness, then we have to care about it. It’s not totally clear that that’s the case, and at what level do we have to care about something’s preferences? The rain prefers to fall down, but I don’t really care and if I frustrate the rain by putting up an umbrella, I don’t feel bad about that. So at what level do preferences matter and how do we define those? So there are all these really, really interesting questions and what’s both sort of exciting and terrifying is that we have a situation in which those questions are going to play out. We’re going to be creating things that are intelligent, and we’re doing that now, depending again on how intelligent they have to be, that may or may not be conscious, that may or may not have preferences, and that may or may not matter. They may or may not experience something positive or negative when those preferences are satisfied or not.

And I think we have the possibility of moral catastrophe if we do things wrong at some level, but an enormous opportunity as well, in the sense that you’ve pointed out that we may be able to create agents that are purely selfless and, insofar as other beings have a moral value, these beings can be absolute altruists, like Stuart has been pointing out in his book. Absolute altruism is a pretty tough one for humans to attain, but might be really easy for beings that we construct that aren’t tied to an evolutionary history and all those sorts of things that we came out of.

It may still be that the sort of moral value of the universe centers around the beings that do have meaningful preferences, like humans. Where meaning sort of ultimately sits, what is important and what’s not and what’s valuable and what’s not. If that isn’t grounded in the preferences of experiencing conscious beings, then I don’t know where it’s grounded, so there’s a lot of questions that come up with that. Does it just disappear if those beings disappear and so on? All incredibly important questions I think, because we’re now at the point in the next however many years, 50, 100, maybe less, maybe more. Where our decisions are going to affect what sorts of beings the universe gets inhabited by in the far future and we really need to avoid catastrophic blunders in how that plays out.

Lucas Perry: Yeah. There’s this whole aspect of AI alignment that you’re touching on, that is not just AI alignment, but AI generation and creation. The problem has been focused on how we can get AI systems, insofar as we create them, to serve the needs of human beings, to understand our preference hierarchies, to understand our metapreferences. But in the creation of Life 3.0, there’s this perspective that you’re creating something that, by virtue of how it is created, is potentially more morally relevant than you. It may be capable of much more experience, much more profound levels of experience, which also means that there’s this aspect of AI alignment which is about qualia architecting or experience architecting or reflecting on the fact that we’re building Life 3.0. These aren’t just systems that can process information for us; there are important questions about what it is like to be that system in terms of experience and ethics and moral relevance. If you create something with the kind of experience that you have, and it has the escape velocity to become super intelligent and populate the cosmic endowment with whatever it determines to be the good, or what we determine to be the good, what is the result of that?

One last thing that I’m nervous about is the way that the illusion of self will affect whether we get a fair and valuable AI alignment. This consideration is in relation to us not being able to see what is ultimately good. We could ultimately be tied up in the preservation of our own arbitrary identities, like the Lucas identity or the Anthony identity. We could have created something like blissful, purely altruistic, benevolent Bodhisattva gods, but we never did because we had this fear and this illusion of self-annihilation. And that’s not to deny that our information can be destroyed, and maybe we care a lot about the way that the Lucas identity information is arranged, but when we question these types of intuitions that we have, it makes me question and wonder if my conditioned identity is actually as important as I think it is, or as I experience it to be.

Anthony Aguirre: Yeah, I think this is a very horrifyingly thorny question that we have to face and my hope is that we have a long time to face it. I’m very much an advocate of creating intelligent systems that can be incredibly helpful and economically beneficial and then reaping those benefits for a good long time while we sort ourselves out, but with a fairly strict upper limit on how intelligent and powerful we make those things. Because I think if huge gains in the capability of machine systems happen in a period of years or even decades, the chance of us getting these big questions right seems to me like almost zero. There’s a lot of argumentation about how difficult it is to build a machine system that has the same sort of general intelligence that we do. And I think part of what makes that question hard is thinking about the huge amount of effort that went in, evolutionarily and otherwise, to creating the sort of robust intelligence that humans have.

I mean we’ve built up over millions of years in this incredibly difficult adversarial environment, where robustness is incredibly important. Cleverness is pretty important, but being able to cope with a wide variety of circumstances is kind of what life and mind has done. And I think the degree to which AGI will be difficult, is at some level the degree to which it has to attain a similar level of generality and robustness, that we’ve spent just an ungodly amount of computation over the evolution of life on earth to attain. If we have to do anything like that level of computation, it’s going to take just an extraordinarily long time. But I think we don’t know to what degree all of that is necessary and to what degree we can really skip over a lot of it, in the same way that we skip over a lot of evolution of flying when we build an airplane.

But I think there’s another question, which is that of experience and feeling, where we’re even more clueless as to where we would possibly start. If we wanted to create an appreciation for music, we have no clue where to even begin with that question, right? What does it even mean to appreciate or listen to music, and in some sense to have preferences? You can maybe make a machine that will sort different kinds of music into different categories, but do you really feel like there’s going to be any music appreciation in there, or any other human feeling? These are things that have a very, very long, complicated evolutionary history and it’s really unclear to me that we’re going to get them in machine form without something like that. But at least as our moral system is currently construed, those are the things that actually matter.

Whether conscious beings are having a good time is pretty much the foundation of what we consider to be important, morally speaking at least, unless we have ideas like having to act in a way that pleases some deity or something like that. So I just don’t know. When you’re talking about future AI beings that have a much richer and deeper interior sense, that’s like the AGI problem squared. We can at least imagine what it’s like to make a general intelligence, and have an idea of what it would take to do that. But when you talk about creating a feeling being, with deeper, more profound feelings than we have, I have just no clue what that means in terms of actually engineering it.

Lucas Perry: So putting on the table all of the moral anti-realism considerations and thought that many people in the AI alignment community may have… Their view is that there’s the set of the historically conditioned preferences that we have and that’s it. We can imagine if horseshoe crabs had been able to create a being more intelligent than them, a being that was aligned to horseshoe crabs’ preferences and preference hierarchy. And we can imagine that the horseshoe crabs were very interested in and committed to just being horseshoe crabs, because that’s what a horseshoe crab wants to do. So now you have this being that was able to maintain its own existential condition of the horseshoe crab for a very long time. That just seems like an obvious moral catastrophe. It seems like a waste of what could have been.

Anthony Aguirre: That’s true. But imagine instead that the horseshoe crabs created elaborate structures out of sand, that they decided these were their betters, and that their legacy was to create these intricate sand structures, because the universe deserves to be inhabited by these much greater beings than them. Then that’s also a moral catastrophe, right? Because the sand structures have no value whatsoever.

Lucas Perry: Yeah. I don’t want humans to do any of these things. I don’t want human beings to go around building monuments, and I don’t want us to lock in to the human condition either. Both of these cases obviously seem like a horrible waste, and now you’re helping to articulate the issue that human beings are at a certain place in evolution.

And so if we’re to create Life 3.0, then it’s also unclear epistemically how we are to evaluate what kinds of exotic qualia states are the kinds that are morally good, and I don’t even know how to begin to answer that question.

So we may be unaware of experiences that are literally astronomically better than the kinds of experiences that we have access to, and it’s unclear to me how you would navigate effectively towards that, other than amplifying what we already have.

Anthony Aguirre: Yeah. I guess my instinct on that is to look more on the biology side than the machine side and to say as biological systems, we’re going to continue to evolve in various ways. Some of those might be natural, some of them might be engineered and so on. Maybe some of them are symbiotic, but I think it’s hard for me to imagine how we’re going to have confidence that the things that are being created have an experience that we would recognize or find valuable, if they don’t have some level of continuity with what we are, that we can directly experience. The reason I feel confident that my dog is actually feeling some level of joy or frustration or whatever, is really by analogy, right? There’s no way that I can get inside the dog’s mind, maybe someday there will be, but there’s no way at the moment. I assume that because we have this common evolutionary heritage, the outward manifestations of those feelings correspond to some inward feelings in much the same way that they do in humans and much the same way that they do in me. And I feel quite confident about that really, although for a long period of history, people have believed otherwise at times.

So I think realistically all we’re going to be able to do, is reason by analogy and that’s not going to work very well I think with machine systems, because it’s quite clear that we’ll be able to create machine systems that can wag their tails and smile and things, even though there’s manifestly nothing behind that. So at what point we would start to believe the sort of behavioral cues and say that there’s some interior sense behind that, is very, very unclear when we’re talking about a machine system. And I think we’re very likely to make all kinds of moral errors in either ascribing too much or too little interior experience to machines, because we have no real way of knowing to make any meaningful connection between those things. I suspect that we’ll tend to make the error in both directions. We’ll create things that seem kind of lifelike and attribute all kinds of interior life to them that we shouldn’t and if we go on long enough, we may well create things that have some interior sense that we don’t attribute to them and make all kinds of errors that way too.

So I think it’s quite fraught actually in that sense and I don’t know what we’re going to do about that. I mean we can always hope that the intractably hard problems that we can’t solve now, will just be solved by something much smarter than us. But I do worry a little bit about attributing sort of godlike powers to something by saying, “Oh, it’s super intelligent, so it will be able to do that.” I’m not terribly optimistic. It may well be that the time at which something is so intelligent that it can solve the problem of consciousness and qualia and all these things, it’d be so far beyond the time at which it was smart enough to completely change reality in the world and all kinds of other things. That it’s almost past the horizon of what we can think about now, it’s sort of past the singularity in that sense. We can speculate, hopefully or not hopefully, but it’s not clear on what basis we would be speculating.

Lucas Perry: Yeah. At least the questions that it will need to face, and then we can leave it open as to whether or not and how long it will need to address those questions. So we discussed who I am, I don’t know. You touched on identity and free will. I think that free will in the libertarian sense, as in I could have done otherwise, is basically one of these common sense intuitions that is functionally useful, but ultimately illusory.

Anthony Aguirre: Yeah, I disagree. I will just say briefly, I prefer to think of free will as a set of claims that may or may not be true. And I think in general it’s useful to decompose the question of free will into a set of claims that may or may not be true. And I think when you do that, you find that most of the claims are true, but there may be some big fuzzy metaphysical thing that you’re equating to that set of claims and then claiming it’s not true. So that’s my feeling, that when you actually try to operationalize what you mean by free will, you’ll find that a lot of the things that you mean actually are properties of reality. But if you sort of invent a thing that you call free will, that by its nature can’t be part of a physical world, then yes, that doesn’t exist. In a nutshell that’s my point of view, but we could go into a lot more depth some other time.

Lucas Perry: I think I understand that from that short summary. So for this last part then, can you just touch on, because I think this is an interesting point, as we come to the end of the conversation. Form is emptiness, emptiness is form. What does that mean?

Anthony Aguirre: So form is emptiness is coming back to the discussion from earlier: when we talk about something like a table, that thing that we call real and existing and objective in some sense is actually composed of all kinds of ingredients that are not that thing. Our evolutionary history and our concept of solidity and shape, all of these things come together from many different sources and as the Buddhist would say, “There’s no intrinsic self existence of a table.” It very much exists relative to a whole bunch of other things, that we and many other people and processes and so on, bring into being. So that’s the form is emptiness. The emptiness is the emptiness of an intrinsic self existence, so that’s the way that I view the form is emptiness.

But turning that around, that emptiness is form, is yes, even though the table is empty of inherent existence, you can still knock on it. It’s still there, it’s still real and it’s in many ways as real as anything else. If you look for something that is more intrinsically existing than a table, you’re not really going to find it and so we might as well call all of those things real, in which case the emptiness is form again, it’s something. That’s the way I sort of view it and that’s the way that I’ve explored it in that section of the book.

So to talk about the ship, there’s this form of the ship that is kind of what we call the ship. That’s the arrangement of atoms and so on, it’s kind of made out of information and whatnot. That form is empty in the sense that there are all these ingredients that come from all these different places and come together to make that thing, but that doesn’t mean it’s non-existent or meaningless or something like that. There very much is meaning in the fact that something is a ship rather than something else; that is reality. So that’s kind of the case that I’m putting together in that last section of the book. It’s not so simple. It’s neither our straightforward sense of a table as a real existing thing, nor is it that everything is an illusion, that it’s like a dream, it’s like a phantasm, nothing is real. Neither of those is the right way to look at it.

Lucas Perry: Yeah, I think that your articulation here brings me again back, for better or for worse, to mountains, no mountains, and mountains again. I came into this conversation with my conventional view of things, and then there’s “form is emptiness.” Oh so okay, so no mountains. But then “emptiness is form.” Okay, mountains again. And given this conceptual back and forth, you can decide what to do from there.

Anthony Aguirre: So have we come back to the mountain in this conversation, at this point?

Lucas Perry: Yeah. I think we’re back to mountains. So I tremendously valued this conversation and feel that it’s given me a lot to consider. And I will re-enter the realm of feeling like a self and inhabiting a world of chairs, tables, objects and people. And will have to engage with some more thinking about information theory. And with that, thank you so much.

 

AI Alignment Podcast: Human Compatible: Artificial Intelligence and the Problem of Control with Stuart Russell

Stuart Russell is one of AI’s true pioneers and has been at the forefront of the field for decades. His expertise and forward thinking have culminated in his newest work, Human Compatible: Artificial Intelligence and the Problem of Control. The book is a cornerstone piece, alongside Superintelligence and Life 3.0, that articulates the civilization-scale problem we face of aligning machine intelligence with human goals and values. Not only is this a further articulation and development of the AI alignment problem, but Stuart also proposes a novel solution which brings us to a better understanding of what it will take to create beneficial machine intelligence.

 Topics discussed in this episode include:

  • Stuart’s intentions in writing the book
  • The history of intellectual thought leading up to the control problem
  • The problem of control
  • Why tool AI won’t work
  • Messages for different audiences
  • Stuart’s proposed solution to the control problem

Key points from Stuart: 

  • “I think it was around 2013 that it really struck me that in fact we’d been thinking about AI the wrong way altogether. The way we had set up the whole field was basically kind of a copy of human intelligence in that a human is intelligent, if their actions achieve their goals. And so a machine should be intelligent if its actions achieve its goals. And then of course we have to supply the goals in the form of reward functions or cost functions or logical goal statements. And that works up to a point. It works when machines are stupid. And if you provide the wrong objective, then you can reset them and fix the objective and hope that this time what the machine does is actually beneficial to you. But if machines are more intelligent than humans, then giving them the wrong objective would basically be setting up a kind of a chess match between humanity and a machine that has an objective that’s at cross purposes with our own. And we wouldn’t win that chess match.”
  • “So when a human gives an objective to another human, it’s perfectly clear that that’s not the sole life mission. So you ask someone to fetch the coffee, that doesn’t mean fetch the coffee at all costs. It just means on the whole, I’d rather have coffee than not, but you know, don’t kill anyone to get the coffee. Don’t empty out my bank account to get the coffee. Don’t trudge 300 miles across the desert to get the coffee. In the standard model of AI, the machine doesn’t understand any of that. It just takes the objective and that’s its sole purpose in life. The more general model would be that the machine understands that the human has internally some overall preference structure of which this particular objective, fetch the coffee or take me to the airport, is just a little local manifestation. And the machine’s purpose should be to help the human realize in the best possible way their overall preference structure. If at the moment that happens to include getting a cup of coffee, that’s great, or taking them to the airport. But it’s always in the background of this much larger preference structure that the machine knows it doesn’t fully understand. One way of thinking about it is to say that the standard model of AI assumes that the machine has perfect knowledge of the objective and the model I’m proposing assumes that the machine has imperfect knowledge of the objective or partial knowledge of the objective. So it’s a strictly more general case.”
  • “The objective is to reorient the field of AI so that in future we build systems using an approach that doesn’t present the same risk as the standard model… That’s the message I think for the AI community: the first phase of our existence maybe should come to an end and we need to move on to this other way of doing things. Because it’s the only way that works as machines become more intelligent. We can’t afford to stick with the standard model because as I said, systems with the wrong objective could have arbitrarily bad consequences.”

 

Important timestamps: 

0:00 Intro

2:10 Intentions and background on the book

4:30 Human intellectual tradition leading up to the problem of control

7:41 Summary of the structure of the book

8:28 The issue with the current formulation of building intelligent machine systems

10:57 Beginnings of a solution

12:54 Might tool AI be of any help here?

16:30 Core message of the book

20:36 How the book is useful for different audiences

26:30 Inferring the preferences of irrational agents

36:30 Why does this all matter?

39:50 What is really at stake?

45:10 Risks and challenges on the path to beneficial AI

54:55 We should consider laws and regulations around AI

01:03:54 How is this book differentiated from those like it?

 

Works referenced:

Human Compatible: Artificial Intelligence and the Problem of Control

Superintelligence

Life 3.0

Occam’s razor is insufficient to infer the preferences of irrational agents

Synthesizing a human’s preferences into a utility function with Stuart Armstrong

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas: Hey everyone, welcome back to the AI Alignment Podcast. I’m Lucas Perry and today we’ll be speaking with Stuart Russell about his new book, Human Compatible: Artificial Intelligence and The Problem of Control. Daniel Kahneman says “This is the most important book I have read in quite some time. It lucidly explains how the coming age of artificial super intelligence threatens human control. Crucially, it also introduces a novel solution and a reason for hope.”

Yoshua Bengio says that “This beautifully written book addresses a fundamental challenge for humanity: increasingly intelligent machines that do what we ask, but not what we really intend. Essential reading if you care about our future.”

I found that this book helped clarify both intelligence and AI to me as well as the control problem born of the pursuit of machine intelligence. And as mentioned, Stuart offers a reconceptualization of what it means to build beneficial and intelligent machine systems. That provides a crucial pivot in how we ought to be building intelligent machine systems.

Many of you will already be familiar with Stuart Russell. He is a professor of computer science and holder of the Smith-Zadeh chair in engineering at the University of California, Berkeley. He has served as the vice chair of the World Economic Forum’s Council on AI and Robotics and as an advisor to the United Nations on arms control. He is an Andrew Carnegie Fellow as well as a fellow of the Association for the Advancement of Artificial Intelligence, the Association for Computing Machinery and the American Association for the Advancement of Science.

He is the author with Peter Norvig of the definitive and universally acclaimed textbook on AI, Artificial Intelligence: A Modern Approach. And so without further ado, let’s get into our conversation with Stuart Russell.

Let’s start with a little bit of context around the book. Can you expand a little bit on your intentions and background for writing this book in terms of timing and inspiration?

Stuart: I’ve been doing AI since I was in high school and for most of that time the goal has been let’s try to make AI better because I think we’ll all agree AI is mostly not very good. When we wrote the first edition of the textbook, we decided to have a section called, What If We Do Succeed? Because it seemed to me that even though everyone was working on making AI equivalent to humans or better than humans, no one was thinking about what would happen if that turned out to be successful.

So that section in the first edition in ’94 was a little equivocal, let’s say, you know, we could lose control or we could have a golden age and let’s try to be optimistic. And then by the third edition, which was 2010, the idea that we could lose control was fairly widespread, at least outside the AI communities, among people worrying about existential risk like Steve Omohundro, Eliezer Yudkowsky and so on.

So we included a little bit more of that viewpoint. I think it was around 2013 that it really struck me that in fact we’d been thinking about AI the wrong way altogether. The way we had set up the whole field was basically kind of a copy of human intelligence in that a human is intelligent, if their actions achieve their goals. And so a machine should be intelligent if its actions achieve its goals. And then of course we have to supply the goals in the form of reward functions or cost functions or logical goal statements. And that works up to a point. It works when machines are stupid. And if you provide the wrong objective, then you can reset them and fix the objective and hope that this time what the machine does is actually beneficial to you. But if machines are more intelligent than humans, then giving them the wrong objective would basically be setting up a kind of a chess match between humanity and a machine that has an objective that’s at cross purposes with our own. And we wouldn’t win that chess match.

So I started thinking about how to solve that problem. And the book is a result of the first couple of years of thinking about how to do it.

Lucas: So you’ve given us a short and concise history of the field of AI alignment and the problem of getting AI systems to do what you want. One of the things that I found so great about your book was the history of the evolution of concepts and ideas as they pertain to information theory, computer science, decision theory and rationality. In chapters one through three you sort of move sequentially through many of the most essential concepts that have brought us to this problem of human control over AI systems.

Stuart: I guess what I’m trying to show is how ingrained it is in intellectual thought going back a couple of thousand years. Even in the concept of evolution, this notion of fitness, you know we think of it as an objective that creatures are trying to satisfy. So in the 20th century you had a whole lot of disciplines, economics developed around the idea of maximizing utility or welfare or profit depending on which branch you look at. Control theory is about minimizing a cost function, so the cost function described some deviation from ideal behavior and then you build systems that minimize the cost. Operations research, which is dynamic programming and Markov decision processes is all about maximizing the sum of rewards. And statistics if you set it up in general, is about minimizing an expected loss function.

So all of these disciplines have the same bug if you like. It’s a natural way to set things up, but in the long run we’ll just see it as a bad cramped way of doing engineering. And what I’m proposing in the book actually is a way of thinking about it that’s much more in a binary rather than thinking about the machine and its objective.

You think about this coupled system with humans or you know, it could be any entity that wants a machine to do something good for it or another system to do something good for it. And then the system itself, which is supposed to do something good for the human or whatever else it is that wants something good to happen. So this kind of coupled system, you don’t really see that in the intellectual tradition. Maybe one exception that I know of is the idea of principal-agent games in economics. So a principal might be an employer and the agent might be the employee. And then the game is how does the employer get the employee to do something that the employer actually wants them to do, given that the employee, the agent, has their own utility function and would rather be sitting home drinking beers and watching football on the telly.

How do you get them to show up at work and do all kinds of things they wouldn’t normally want to do? The simplest way is you pay them. But you know, there’s all kinds of other ideas about incentive schemes and status and then various kinds of sanctions if people don’t show up and so on. So the economists study that notion, which is a coupled system where one entity wants to benefit from the behavior of another.

So that’s probably the closest example that we have. And then maybe in ecology, look at symbiotic species or something like that. But there’s not very many examples that I’m aware of. In fact, maybe I can’t think of any, where the entity that’s supposedly in control, namely us, is less intelligent than the entity that it’s supposedly controlling, namely the machine.

Lucas: So providing some framing and context here for the listener, the first part of your book, chapters one through three explores the idea of intelligence in humans and in machines. There you give this historical development of ideas and I feel that this history you give of computer science and the AI alignment problem really helps to demystify both the person and evolution as a process and the background behind this problem.

The second part of your book, chapters four through six, discusses some of the problems arising from imbuing machines with intelligence. So this is a lot of the AI alignment problem considerations. And then the third part, chapters seven through ten, suggests a new way to think about AI, to ensure that machines remain beneficial to humans forever.

You’ve begun stating this problem and readers can see in chapters one through three that this problem goes back a long time, right? The problem with computer science at its inception was that definition that you gave, that a machine is intelligent insofar as it is able to achieve its objectives. In reaction to this, you’ve developed cooperative inverse reinforcement learning and inverse reinforcement learning, which is sort of part of the latter stages of this book where you’re arguing for a new definition that is more conducive to alignment.

Stuart: Yeah. In the standard model as I call it in the book, the human specifies the objective and plugs it into the machine. If for example, you get in your self-driving car and it says, “Where do you want to go?” And you say, “Okay, take me to the airport.” For current algorithms as we understand them, built on this kind of model, that objective becomes the sole life purpose of the vehicle. It doesn’t necessarily understand that in fact that’s not your sole life purpose. If you suddenly get a call from the hospital saying, oh, you know, your child has just been run over and is in the emergency room, you may well not want to go to the airport. Or if you get into a traffic jam and you’ve already missed the last flight, then again you might not want to go to the airport.

So when a human gives an objective to another human, it’s perfectly clear that that’s not the sole life mission. So you ask someone to fetch the coffee, that doesn’t mean fetch the coffee at all costs. It just means on the whole, I’d rather have coffee than not, but you know, don’t kill anyone to get the coffee. Don’t empty out my bank account to get the coffee. Don’t trudge 300 miles across the desert to get the coffee.

In the standard model of AI, the machine doesn’t understand any of that. It just takes the objective and that’s its sole purpose in life. The more general model would be that the machine understands that the human has internally some overall preference structure of which this particular objective, fetch the coffee or take me to the airport, is just a little local manifestation. And the machine’s purpose should be to help the human realize in the best possible way their overall preference structure.

If at the moment that happens to include getting a cup of coffee, that’s great, or taking them to the airport. But it’s always in the background of this much larger preference structure that the machine knows it doesn’t fully understand. One way of thinking about it is to say that the standard model of AI assumes that the machine has perfect knowledge of the objective and the model I’m proposing assumes that the machine has imperfect knowledge of the objective or partial knowledge of the objective. So it’s a strictly more general case.

When the machine has partial knowledge of the objective there’s a whole lot of new things that come into play that simply don’t arise when the machine thinks it knows the objective. For example, if the machine knows the objective, it would never ask permission to do an action. It would never say, you know, is it okay if I do this, because it believes that it’s already extracted all there is to know about human preferences in the form of this objective. And so whatever plan it formulates to achieve the objective must be the right thing to do.

Whereas a machine that knows that it doesn’t know the full objective could say, well, given what I know, this action looks okay, but I want to check with the boss before going ahead because it might be that this plan actually violates some part of the human preference structure that it doesn’t know about. So you get machines that ask permission, you get machines that, for example, allow themselves to be switched off because the machine knows that it might do something that will make the human unhappy. And if the human wants to avoid that and switches the machine off, that’s actually a good thing. Whereas a machine that has a fixed objective would never want to be switched off because that guarantees that it won’t achieve the objective.
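The switch-off argument can be illustrated with a small numeric sketch. This is not code from the book; the function name, the belief distributions and the assumption that the human knows the true utility and simply blocks harmful actions are all hypothetical simplifications chosen for illustration.

```python
import random

def expected_values(utility_samples):
    """Compare three policies for a machine that is unsure how much
    the human values a proposed action.

    utility_samples: samples from the machine's belief about the true
    utility u of the action (hypothetical numbers; u > 0 means the
    human would approve, u < 0 means the human would object).
    """
    n = len(utility_samples)
    # Policy 1: act without asking. The machine receives whatever u turns out to be.
    act = sum(utility_samples) / n
    # Policy 2: switch itself off. Nothing happens, utility 0.
    switch_off = 0.0
    # Policy 3: defer. The human (assumed here to know u) lets the action
    # proceed when u > 0 and switches the machine off otherwise.
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return {"act": act, "switch_off": switch_off, "defer": defer}

random.seed(0)
uncertain_belief = [random.gauss(0.5, 1.0) for _ in range(10_000)]
fixed_objective = [0.5] * 10_000  # a machine certain that the action is good

print("uncertain:", expected_values(uncertain_belief))
print("certain:  ", expected_values(fixed_objective))
```

Under these assumptions, deferring is never worse than acting or switching off, and it is strictly better whenever the machine assigns some probability to the action being harmful. When the machine is certain of the objective, deferring gains it nothing, which matches the point that a fixed-objective machine has no reason to ask permission.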

So in the new approach you have a strictly more general repertoire of behaviors that the machine can exhibit. The idea of inverse reinforcement learning is that this is the way for the machine to actually learn more about what the human preference structure is, by observing human behavior, which could be verbal behavior, like, could you fetch me a cup of coffee? That’s a fairly clear indicator about your preference structure, but it could also be that you know, you ask a human a question and the human doesn’t reply. Maybe the human’s mad at you and is unhappy about the line of questioning that you’re pursuing.

 So human behavior means everything humans do and have done in the past. So everything we’ve ever written down, every movie we’ve made, every television broadcast contains information about human behavior and therefore about human preferences. Inverse reinforcement learning really means how do we take all that behavior and learn human preferences from it?
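As a rough illustration of learning preferences from behavior, here is a toy Bayesian sketch. It is far simpler than inverse reinforcement learning as used in research: the two candidate preference structures, the softmax choice model and the `beta` parameter are assumptions made up for this example, not anything specified in the conversation.

```python
import math

def choice_prob(utilities, chosen, beta):
    """Probability that a noisily rational (softmax) chooser picks `chosen`."""
    weights = {option: math.exp(beta * u) for option, u in utilities.items()}
    return weights[chosen] / sum(weights.values())

def update(belief, hypotheses, observed_choice, beta=2.0):
    """One Bayesian update of the belief over candidate preference structures."""
    unnormalized = {
        name: belief[name] * choice_prob(hypotheses[name], observed_choice, beta)
        for name in belief
    }
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}

# Two hypothetical preference structures the machine considers possible.
hypotheses = {
    "likes_coffee": {"coffee": 1.0, "tea": 0.0},
    "likes_tea": {"coffee": 0.0, "tea": 1.0},
}
belief = {"likes_coffee": 0.5, "likes_tea": 0.5}

# Observed behavior: mostly coffee, with one exception.
for choice in ["coffee", "coffee", "tea", "coffee"]:
    belief = update(belief, hypotheses, choice)

print(belief)
```

Each observation shifts the belief toward the hypothesis that better explains it, and the single "tea" observation lowers confidence without overturning it, because the chooser is modeled as noisy rather than perfectly rational.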

Lucas: What can you say about how tool AI as a possible path to AI alignment fits in this schema where we reject the standard model, as you call it, in favor of this new one?

Stuart: Tool AI is a notion that, oddly enough, doesn’t really occur within the field of AI. It’s a phrase that came from people who are thinking from the outside about possible risks from AI. And what it seems to mean is the idea that rather than building general purpose intelligent systems, if you are building AI systems designed for some specific purpose, then that’s sort of innocuous and doesn’t present any risks. And some people argue that in fact if you just have a large collection of these innocuous application specific AI systems, then there’s nothing to worry about.

My experience of tool AI is that when you build application specific systems, you can kind of do it in two ways. One is you kind of hack it. In other words, you figure out how you would do this task and then you write a whole bunch of very, very special purpose code. So, for example, if you were doing handwriting recognition, you might think, oh, okay, well in order to find an ‘S’ I have to look for a line that’s curvy and I follow the line and it has to have three bends, it has to be arranged this way. And you know, you write a whole bunch of tests to check each characteristic of an ‘S’, that it has all these characteristics and it doesn’t have any loops and this, that and the other. And then you say, okay, that’s an S.

And that’s actually not the way that people went about the problem of handwriting recognition. The way that they did it was to develop machine learning systems that could take images of characters that were labeled and then train a recognizer that could recognize new instances of characters. And in fact, Yann LeCun at AT&T was doing a system that was designed to recognize words and figures on checks. So very, very, very application specific, very tooley, and in order to do that he invented convolutional neural networks, which is what we now call deep learning.

So, out of this very, very narrow piece of tool AI came this very, very general technique. Which has solved or largely solved object recognition, speech recognition, machine translation, and some people argue will produce general purpose AI. So I don’t think there’s any safety to be found in focusing on tool AI.

The second point is that people feel that somehow tool AI is not an agent. So an agent meaning a system that you can think of as perceiving the world and then taking actions. And again, I’m not sure that’s really true. So a Go program is an agent. It’s an agent that operates in a small world, namely the Go board, but it perceives the board, the move that’s made and it takes action.

It chooses what to do next. In many applications like this, this is really the only way to build an effective tool: it should be an agent. If it’s a little vacuum cleaning robot or lawn mowing robot, certainly a domestic robot that’s supposed to keep your house clean and look after the dog while you’re out, there’s simply no way to build those kinds of systems except as agents, and as we improve the capabilities of these systems, whether it’s for perception or planning and behaving in the real physical world, we’re effectively going to be creating general purpose intelligent agents. I don’t really see salvation in the idea that we’re just going to build application specific tools.

Lucas: So that helps to clarify that tool AI does not get around this update that you’re trying to do with regards to the standard model. So pivoting back to intentions surrounding the book, if you could distill the core message or the central objective in writing this book, how would you say that?

Stuart: The objective is to reorient the field of AI so that in future we build systems using an approach that doesn’t present the same risk as the standard model. I’m addressing multiple audiences. The message I think for the AI community is that the first phase of our existence maybe should come to an end and we need to move on to this other way of doing things. Because it’s the only way that works as machines become more intelligent. We can’t afford to stick with the standard model because as I said, systems with the wrong objective could have arbitrarily bad consequences.

Then the other audience is the general public, people who are interested in policy, how things are going to unfold in future and technology and so on. For them, I think it’s important to actually understand more about AI rather than just thinking of AI as this kind of magic juice that triples the value of your startup company. It’s a collection of technologies and those technologies have been built within a framework, the standard model, that has been very useful and is shared with these other fields: economics, statistics, operations research, control theory. But that model does not work as we move forward and we’re already seeing places where the failure of the model is having serious negative consequences.

One example would be what’s happened with social media. So social media algorithms, content selection algorithms, are designed to show you stuff or recommend stuff in order to maximize click-through. Clicking is what generates revenue for the social media platforms. And so that’s what they tried to do, and I almost said they want to show you stuff that you will click on. And that’s what you might think is the right solution to that problem, right? If you want to maximize click-through, then show people stuff they want to click on, and that sounds relatively harmless.

Although people have argued that this creates a filter bubble or a little echo chamber where you only see stuff that you like and you don’t see anything outside of your comfort zone. That’s true. It might tend to cause your interests to become narrower, but actually that isn’t really what happened and that’s not what the algorithms are doing. The algorithms are not trying to show you the stuff you like. They’re trying to turn you into predictable clickers. They seem to have figured out that they can do that by gradually modifying your preferences, and they can do that by feeding you material that’s basically, if you think of a spectrum of preferences, to one side or the other, because they want to drive you to an extreme. At the extremes of the political spectrum or the ecological spectrum or whatever image you want to look at, you’re apparently a more predictable clicker and so they can monetize you more effectively.

So this is just a consequence of reinforcement learning algorithms that optimize click-through. And in retrospect, we now understand that optimizing click-through was a mistake. That was the wrong objective. But you know, it’s kind of too late and in fact it’s still going on and we can’t undo it. We can’t switch off these systems because they’re so tied in to our everyday lives and there’s so much economic incentive to keep them going.

So I want people in general to kind of understand what is the effect of operating these narrow optimizing systems that pursue these fixed and incorrect objectives. The effect of those on our world is already pretty big. Some people argue that corporations pursuing the maximization of profit have the same property. They’re kind of like AI systems. They’re kind of super intelligent because they think over long time scales, they have massive information, resources and so on. They happen to have human components, but when you put a couple of hundred thousand humans together into one of these corporations, they kind of have this super intelligent understanding, manipulation capabilities and so on.

Lucas: This is a powerful and important update for research communities. I want to focus here in a little bit on the core messages of the book as per each audience because I think you can say and clarify different things for different people. So for example, my impressions are that for sort of laypersons who are not AI researchers, the history of ideas that you give clarifies the foundations of many fields and how it has led up to this AI alignment problem. As you move through and past single agent cases to multiple agent cases where we give rise to game theory and decision theory and how that all affects AI alignment.

So for laypersons, I think this book is critical for showing the problem, demystifying it, making it simple, and giving the foundational and core concepts for which human beings need to exist in this world today. And to operate in a world where AI is ever becoming a more important thing.

And then for the research community, as you just discussed, it seems like this rejection of the standard model and this clear identification of systems with exogenous objectives that are sort of singular and lack context and nuance. That when these things optimize for their objectives, they run over a ton of other things that we care about. And so we have to shift from this understanding where the objective is something inside of the exogenous system to something that the system is uncertain about and which actually exists inside of the person.

And I think the last thing that I sort of saw was for people who are not AI researchers, it says, here’s this AI alignment problem. It is deeply interdependent and difficult. It requires economists and sociologists and moral philosophers. And for this reason too, it is important for you to join in to help. Do you have anything here you’d like to hit on or expand on or anything I might’ve gotten wrong?

Stuart: I think that’s basically right. One thing that I probably should clarify, and it comes maybe from the phrase value alignment. The goal is not to build machines whose values are identical to those of humans. In other words, it’s not to just put in the right objective because I actually believe that that’s just fundamentally impossible to do that. Partly because humans actually don’t know their own preference structure. There’s lots of things that we might have a future positive or a negative reaction to that we don’t yet know, lots of foods that we haven’t yet tried. And in the book I give the example of the durian fruit, which some people really love and some people find utterly disgusting, and I don’t know which I am because I’ve never tried it. So I’m genuinely uncertain about my own preference structure.

It’s really not going to be possible for machines to be built with the right objective built in. They have to know that they don’t know what the objective is. And it’s that uncertainty that creates this deferential behavior. It becomes rational for that machine to ask permission and to allow itself to be switched off, which as I said, are things that a standard model machine would never do.

The reason why psychology, economics, moral philosophy become absolutely central, is that these fields have studied questions of human preferences, human motivation, and also the fundamental question which machines are going to face, of how do you act on behalf of more than one person? The version of the problem where there’s one machine and one human is relatively constrained and relatively straightforward to solve, but when you get one machine and many humans or many machines and many humans, then all kinds of complications come in, which social scientists have studied for centuries. That’s why they do it, because there’s more than one person.

And psychology comes in because the process whereby the machine is going to learn about human preferences requires that there be some connection between those preferences and the behavior that humans exhibit, because the inverse reinforcement learning process involves observing the behavior and figuring out what are the underlying preferences that would explain that behavior, and then how can I help the human with those preferences.

Humans, surprise, surprise, are not perfectly rational. If they were perfectly rational, we wouldn’t need to worry about psychology; we would do all this just with mathematics. But the connection between human preferences and human behavior is extremely complex. It’s mediated by our whole cognitive structure, and is subject to lots of deviations from perfect rationality. One of the deviations is that we are simply unable, despite our best efforts, to calculate what is the right thing to do given our preferences.

Lee Sedol, I’m pretty sure wanted to win the games of Go that he was playing against AlphaGo, but he wasn’t able to, because he couldn’t calculate the winning move. And so if you observe his behavior and you assume that he’s perfectly rational, the only explanation is that he wanted to lose, because that’s what he did. He made losing moves. But actually that would be obviously a mistake.

So we have to interpret his behavior in the light of his cognitive limitations. That becomes then a matter of empirical psychology. What are the cognitive limitations of humans, and how do they manifest themselves in the kind of imperfect decisions that we make? And then there’s other deviations from rationality. We’re myopic, we suffer from weakness of will. We know that we ought to do this, that this is the right thing to do, but we do something else. And we’re emotional. We do things driven by our emotional subsystems, when we lose our temper for example, that we later regret and say, “I wish I hadn’t done that.”

 All of this is really important for us to understand going forward, if we want to build machines that can accurately interpret human behavior as evidence for underlying human preferences.

Lucas: You’ve touched on inverse reinforcement learning in terms of human behavior. Stuart Armstrong was on the other week, and I believe his claim was that you can’t infer anything about behavior without making assumptions about rationality and vice versa. So there’s sort of an incompleteness there. I’m just pushing here and wondering more about the value of human speech, about what our revealed preferences might be, how this fits in with your book and narrative, as well as furthering neuroscience and psychology, and how all of these things can decrease uncertainty over human preferences for the AI.

Stuart: That’s a complicated set of questions. I agree with Stuart Armstrong that humans are not perfectly rational. I’ve in fact written an entire book about that. But I don’t agree that it’s fundamentally impossible to recover information about preferences from human behavior. Let me give the kind of straw man argument. So let’s take Garry Kasparov: chess player, was world champion in the 1990s, some people would argue the strongest chess player in history. You might think it’s obvious that he wanted to win the games that he played. And when he did win, he was smiling, jumping up and down, shaking his fists in triumph. And when he lost, he behaved in a very depressed way, he was angry with himself and so on.

Now it’s entirely possible logically that in fact he wanted to lose every single game that he played, but his decision making was so far from rational that even though he wanted to lose, he kept playing the best possible move. So he’s got this completely reversed set of goals and a completely reversed decision making process. So it looks on the outside as if he’s trying to win and he’s happy when he wins. But in fact, he’s trying to lose and he’s unhappy when he wins, but his attempt to appear unhappy again is reversed. So it looks on the outside like he’s really happy because he keeps doing the wrong things, so to speak.

This is an old idea in philosophy. Donald Davidson calls it radical interpretation: that from the outside, you can sort of flip all the bits and come up with an explanation that’s sort of the complete reverse of what any reasonable person would think the explanation to be. The problem with that approach is that it then takes away the meaning of the word “preference” altogether. For example, let’s take the situation where Kasparov can checkmate his opponent in one move, and it’s blatantly obvious and in fact, he’s taken a whole sequence of moves to get to that situation.

If in all such cases where there’s an obvious way to achieve the objective, he simply does something different, in other words, let’s say he resigns, so whenever he’s in a position with an obvious immediate win, he instantly resigns, then in what sense is it meaningful to say that Kasparov actually wants to win the game if he always resigns whenever he has a chance of winning?

You simply vitiate the entire meaning of the word “preference”. It’s just not correct to say that a person who always resigns whenever they have a chance of winning really wants to win games. You can then kind of work back from there. So by observing human behavior in situations where the decision is kind of an obvious one that doesn’t require a huge amount of calculation, then it’s reasonable to assume that the preferences are the ones that they reveal by choosing the obvious action. If you offer someone a lump of coal or a $1,000 bill and they choose a $1,000 bill, it’s unreasonable to say, “Oh, they really prefer the lump of coal, but they’re just really stupid, so they keep choosing the $1,000 bill.” That would just be daft. So in fact it’s quite natural that we’re able to gradually infer the preferences of imperfect entities, but we have to make some assumptions that we might call minimal rationality, which is that in cases where the choice is obvious, people will generally tend to make the obvious choice.
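The reversed-preferences puzzle and the minimal rationality assumption can both be shown in a few lines. This is a hypothetical toy model, not anything from the book or from Stuart Armstrong’s paper: the softmax chooser, the utility numbers and the sign convention on `beta` (positive means the person tends to pick what they prefer, negative means they tend to pick what they dislike) are all assumptions for illustration.

```python
import math

def likelihood(utilities, choices, beta):
    """Probability of a sequence of choices under a softmax chooser.

    beta > 0: tends to pick the option it prefers (minimally rational).
    beta < 0: tends to pick the option it dislikes (reversed decision making).
    """
    normalizer = sum(math.exp(beta * u) for u in utilities.values())
    prob = 1.0
    for choice in choices:
        prob *= math.exp(beta * utilities[choice]) / normalizer
    return prob

# Observed behavior: the person takes the $1,000 bill every time.
choices = ["bill"] * 5

prefers_bill = {"bill": 1.0, "coal": 0.0}
prefers_coal = {"bill": 0.0, "coal": 1.0}

# Sensible reading: prefers the bill and chooses competently.
sensible = likelihood(prefers_bill, choices, beta=3.0)
# Radical reading: prefers coal, but decision making is completely reversed.
flipped = likelihood(prefers_coal, choices, beta=-3.0)
# Minimal rationality: prefers coal AND chooses competently.
coal_if_rational = likelihood(prefers_coal, choices, beta=3.0)

print(sensible, flipped, coal_if_rational)
```

In this toy model the sensible and the fully reversed readings explain the behavior equally well, so behavior alone cannot separate them. Once `beta` is required to be positive, which is one way of encoding minimal rationality, the coal-loving hypothesis becomes very unlikely and the preference for the bill is recovered.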

Lucas: I want to be careful here about not misrepresenting any of Stuart Armstrong’s ideas. I think this is in relation to the work Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents, if you’re familiar with that?

Stuart: Yeah.

Lucas: So then everything you said still suffices. Is that the case?

Stuart: I don’t think we radically disagree. I think maybe it’s a matter of emphasis. How important is it to observe the fact that there is this possibility of radical interpretation? It doesn’t worry me. Maybe it worries him, but it doesn’t worry me because we do a reasonably good job of inferring each other’s preferences all the time by just ascribing at least a minimum amount of rationality in human decision making behavior.

This is why economists, the way they try to elicit preferences, is by offering you direct choices. They say, “Here’s two pizzas. You can have a bubblegum and pineapple pizza, or you can have a ham and cheese pizza. Which one would you like?” And if you choose the ham and cheese pizza, they’ll infer that you prefer the ham and cheese pizza, and not the bubblegum and pineapple one, which seems pretty reasonable.

There may be real cases where there is genuine ambiguity about what’s driving human behavior. I am certainly not pretending that human cognition is no mystery; it still is largely a mystery. And I think for the long term, it’s going to be really important to try to unpack some of that mystery. Arguably, to me, the biggest deviation from rationality that humans exhibit is the fact that our choices are always made in the context of a whole hierarchy of commitments that effectively put us into what’s usually a much, much smaller decision-making situation than the real problem. So the real problem is I’m alive, I’m in this enormous world, I’m going to live for a few more decades hopefully, and then my descendants will live for years after that and lots of other people in the world will live for a long time. So which actions do I do now?

And I could do anything. I could continue talking to you and recording this podcast. I could take out my phone and start trading stocks. I could go out on the street and start protesting climate change. I could set fire to the building and claim the insurance payment, and so on and so forth. I could do a gazillion things. Anything that’s logically possible I could do. And I continue to talk in the podcast because I’m existing in this whole network and hierarchy of commitments. I agreed that we would do the podcast, and why did I do that? Well, because you asked me, and because I’ve written the book and why did I write the book and so on.

So there’s a whole nested collection of commitments, and we do that because otherwise we couldn’t possibly manage to behave successfully in the real world at all. The real decision problem is not, what do I say next in this podcast? It’s what motor control commands do I send to my 600 odd muscles in order to optimize my payoff for the rest of time until the heat death of the universe? And that’s completely and utterly impossible to figure out.

I always, and we always, exist within what I think Savage called a small world decision problem. We are aware only of a small number of options. So if you want to understand human behavior, you have to understand what are the commitments and what is the hierarchy of activities in which that human is engaged. Because otherwise you might be wondering, well why isn’t Stuart taking out his phone and trading stocks? But that would be a silly thing to wonder. It’s reasonable to ask, well why is he answering the question that way and not the other way?

Lucas: And so “AI, please fetch the coffee,” also exists in such a hierarchy. And without the hierarchy, the request is missing much of the meaning that is required for the AI to successfully do the thing. So it’s like an inevitability that this hierarchy is required to do things that are meaningful for people.

Stuart: Yeah, I think that’s right. Requests are a very interesting special case of behavior, right? They’re just another kind of behavior. But up to now, we’ve interpreted them as defining the objective for the machine, which is clearly not the case. And people have recognized this for a long time. For example, my late colleague Bob Wilensky had a project called the Unix Consultant, which was a natural language system, and it was actually built as an agent, that would help you with Unix stuff, so managing files on your desktop and so on. You could ask it questions like, “Could you make some more space on my disk?”, and the system needs to know that “rm *”, which means “remove all files”, is probably not the right thing to do, that this request to make space on the disk is actually part of a larger plan that the user might have. And for that plan, most of the other files are required.

So a more appropriate response would be, “I found these backup files that have already been deleted. Should I empty them from the trash?”, or whatever it might be. So in almost no circumstances would a request be taken literally as defining the sole objective. If you asked for a cup of coffee, what happens if there’s no coffee? Perhaps it’s reasonable to bring a cup of tea or “Would you like a can of Coke instead?”, and not to … I think in the book I had the example that you stop at a gas station in the middle of the desert, 250 miles from the nearest town and they haven’t got any coffee. The right thing to do is not to trundle off across the desert and come back 10 days later with coffee from a nearby town. But instead to ask, well, “There isn’t any coffee. Would you like some tea or some Coca-Cola instead?”

This is very natural for humans, and in philosophy of language, my other late colleague Paul Grice was famous for pointing out that many statements, questions, requests, commands in language have this characteristic that they don’t really mean what they say. I mean, we all understand if someone says, “Can you pass the salt?”, the correct answer is not, “Yes, I am physically able to pass the salt.” He became an adjective, right? So we talk about Gricean analysis, where you don’t take the meaning literally, but you look at the context in which it was said and the motivations of the speaker and so on to infer what is a reasonable course of action when you hear that request.

Lucas: You’ve done a wonderful job so far painting the picture of the AI alignment problem and the solution which you offer, or at least the pivot which you’d like the community to take. So for laypersons who might not be involved or experts in AI research, plus the AI alignment community, plus potential researchers who might be brought in by this process or book, plus policymakers who may also listen to it, what’s at stake here? Why does this matter?

Stuart: I think AI, for most of its history, has been an interesting curiosity. It’s a fascinating problem, but as a technology it was woefully lacking. And it has found various niches where it’s useful, even before the current incarnation in terms of deep learning. But if we assume that progress will continue and that we will create machines with general purpose intelligence, that would be roughly speaking, the biggest event in human history.

History, our civilization, is just a consequence of the fact that we have intelligence, and if we had a lot more, it would be a radical step change in our civilization. If these were possible at all, it would enable other inventions that people have talked about as possibly the biggest event in human history, for example, creating the ability for people to live forever or much, much longer life span than we currently have, or creating the possibility for people to travel faster than light so that we could colonize the universe.

If those are possible, then they’re going to be much more possible with the help of AI. If there’s a solution to climate change, it’s going to be much more possible to solve climate change with the help of AI. It’s this fact that AI in the form of general purpose intelligence systems is this kind of über technology that makes it such a powerful development if and when it happens. So the upside is enormous. And then the downside is also enormous, because if you build things that are more intelligent than you, then you face this problem. You’ve made something that’s much more powerful than human beings, but somehow you’ve got to make sure that it never actually has any power. And that’s not completely obvious how to do that.

The last part of the book is a proposal for how we could do that, how you could change this notion of what we mean by an intelligent system so that rather than copying this sort of abstract human model, this idea of rationality, of decision making in the interest, in the pursuit of one’s own objectives, we have this other kind of system, this sort of coupled binary system where the machine is necessarily acting in the service of human preferences.

If we can do that, then we can reap the benefits of arbitrarily intelligent AI. Then as I said, the upside would be enormous. If we can’t do that, if we can’t solve this problem, then there are really two possibilities. One is that we need to curtail the development of artificial intelligence and for all the reasons that I just mentioned, it’s going to be very hard because the upside incentive is so enormous. It would be very hard to stop research and development in AI.

The third alternative is that we create general purpose, superhuman intelligent machines and we lose control of them, and they’re pursuing objectives that are ultimately mistaken objectives. There’s tons of science fiction stories that tell you what happens next, and none of them are desirable futures for the human race.

Lucas: Can you expand upon what you mean by if we’re successful in the control/alignment problem, what “tremendous” actually means? What actually are the conclusions or what is borne out of the process of generating an aligned super intelligence from that point on until heat death or whatever else?

Stuart: Assuming that we have a general purpose intelligence that is beneficial to humans, then you can think about it in two ways. I already mentioned the possibility that you’d be able to use that capability to solve problems that we find very difficult, such as eternal life, curing disease, solving the problem of climate change, solving the problem of faster than light travel and so on. You might think of these as sort of the science fiction-y upside benefits. But just in practical terms, when you think about the quality of life for most people on earth, let’s say it leaves something to be desired. And you say, “Okay, what would be a reasonable aspiration?”, and put it somewhere like the 90th percentile in the US. That would mean a ten-fold increase in GDP for the world if you brought everyone on earth up to what we call a reasonably nice standard of living by Western standards.

General purpose AI can do that in the following way, without all these science fiction inventions and so on. So just deploying the technologies and materials and processes that we already have in ways that are much, much more efficient and obviously much, much less labor intensive.

The reason that things cost a lot and the reason that people in poor countries can’t afford them … They can’t build bridges or lay railroad tracks or build hospitals because they’re really, really expensive and they haven’t yet developed the productive capacities to produce goods that could pay for all those things. The reason things are really, really expensive is because they have a very long chain of production in which human effort is involved at every stage. The money all goes to pay all those humans, whether it’s the scientists and engineers who designed the MRI machine or the people who worked on the production line or the people who worked mining the metals that go into making the MRI machine.

All the money is really paying for human time. If machines are doing every stage of the production process, then you take all of those costs out, and to some extent it becomes like a digital newspaper, in the sense that you can have as much of it as you want. It’s almost free to make new copies of a digital newspaper, and it would become almost free to produce the material goods and services that constitute a good quality of life for people. And at that point, arguing about who has more of it is like arguing about who has more digital copies of the newspaper. It becomes sort of pointless.

That has two benefits. One is everyone is relatively much better off, assuming that we can get politics and economics out of the way, and also there’s then much less incentive for people to go around starting wars and killing each other, because there isn’t this struggle which has sort of characterized most of human history. The struggle for power, wealth and access to resources and so on. There are other reasons people kill each other, religion being one of them, but it certainly I think would help if this source of competition and warfare were removed.

Lucas: These are very important short-term considerations and benefits from getting this control problem and this alignment problem correct. One thing that the superintelligence will hopefully also do is reduce existential risk to zero, right?  And so if existential risk is reduced to zero, then basically what happens is the entire cosmic endowment, some hundreds of thousands of galaxies, become unlocked to us. Perhaps some fraction of it would have to be explored first in order to ensure existential risk is pretty close to zero. I find your arguments are pragmatic and helpful for the common person about why this is important.

For me personally, and why I’m passionate about AI alignment and existential risk issues, is that the reduction of existential risk to zero and having an aligned intelligence that’s capable of authentically spreading through the cosmic endowment, to me seems to potentially unlock a kind of transcendent object at the end of time, ultimately influenced by what we do here and now, which is directed and created by coming to better know what is good, and spreading that.

What I find so beautiful and important and meaningful about this problem in particular, and why your book is such important core reading for laypersons, for computer scientists, for just everyone, is that if we get this right, this universe can be maybe one of the universes, in perhaps the multiverse, where something like the most beautiful thing physically possible could be made by us within the laws of physics. And that to me is extremely awe-inspiring.

Stuart: I think that human beings, being the way they are, will probably find more ways to get it wrong. We’ll need more solutions for those problems and perhaps AI will help us solve other existential risks, and perhaps it won’t. The control problem I think is very important. There are a couple of other issues that I think we still need to be concerned with. Well, I don’t think we need to be concerned with all of them, but a couple of issues that I haven’t begun to address or solve … One of those is obviously the problem of misuse, that we may find ways to build beneficial AI systems that remain under control in a mathematically guaranteed way. And that’s great. But the problem of making sure that only those kinds of systems are ever built and used, that’s a different problem. That’s a problem about human motivation and human behavior, which I don’t really have a good solution to. It’s sort of like the malware problem, except much, much, much, much worse. If we do go ahead developing general purpose intelligence systems that are beneficial and so on, then parts of that technology, the general purpose intelligent capabilities, could be put into systems that are not beneficial as it were, that don’t have a safety catch. And on that misuse problem, if you look at how well we’re doing with malware, you’d have to say more work needs to be done. We’re kind of totally failing to control malware, and the ability of people to inflict damage on others by uncontrolled software is getting worse. We need an international response and a policing response. Some people argue that, oh, it’s fine. The super intelligent AI that we build will make sure that other nefarious development efforts are nipped in the bud.

This doesn’t make me particularly confident. So I think that’s an issue. The third issue is, shall we say, enfeeblement. This notion that if we develop machines that are capable of running every aspect of our civilization, then that changes the dynamic that’s been in place since the beginning of human history or prehistory. Which is that for our civilization to continue, we have had to pass on our knowledge and our skills to the next generation. That people have to learn what it is that the human race knows over and over again in every generation, just to keep things going. And if you add it all up, if you look, there’s about a hundred odd billion people who’ve ever lived and they each spend about 10 years learning stuff on average. So that’s a trillion person years of teaching and learning to keep our civilization going. And there’s a very good reason why we’ve done that because without it, things would fall apart very quickly.

But that’s going to change. Now we don’t have to put it into the heads of the next generation of humans. We can put it into the heads of the machines and they can take care of the civilization. And then you get this almost irreversible process of enfeeblement, where humans no longer know how their own civilization functions. They lose knowledge of science, of engineering, even of the humanities, of literature. If machines are writing books and producing movies, then we don’t even need to learn that. You see this in E. M. Forster’s story, The Machine Stops, from 1909, which is a very prescient story about a civilization that becomes completely dependent on its own machines. Or if you like something more recent, in WALL-E the human race is on a sort of cruise ship in space and they all become obese and stupid because the machines look after everything and all they do is consume and enjoy. And that’s not a future that I would want for the human race.

And arguably the machines should say, “This is not the future you want, tie your shoelaces,” but we are shortsighted. We may effectively override what the machines are telling us and say, “No, no, you have to tie my shoelaces for me.” So I think this is a problem that we have to think about. Again, this is a problem for infinity. Once you turn things over to the machines, it’s practically impossible, I think, to reverse that process. We have to keep our own human civilization going in perpetuity and that requires a kind of a cultural process that I don’t yet understand how it would work, exactly.

Because the effort involved in learning, let’s say going to medical school, it’s 15 years of school and then college and then medical school and then residency. It’s a huge effort. It’s a huge investment and at some point the incentive to undergo that process will disappear. And so something else other than… So at the moment it’s partly money, partly prestige, partly a desire to be someone who is in a position to help others. So somehow we’ve got to make our culture capable of maintaining that process indefinitely when many of the incentive structures that have kept it in place go away.

Lucas: This makes me wonder and think about how, from an evolutionary cosmological perspective, this sort of transition from humans being the most intelligent form of life on this planet to machine intelligence being the most intelligent form of life plays out in the very long term. We can do thought experiments where we imagine that monkeys had actually been creating humans, and then ask what the role of the monkey would still be.

Stuart: Yep. But we should not be creating the machine analog of humans, i.e. autonomous entities pursuing their own objectives. So we’ve pursued our objectives pretty much at the expense of the monkeys and the gorillas and we should not be producing machines that play an analogous role. That would be a really dumb thing to do.

Lucas: That’s an interesting comparison because the objectives of the human are exogenous to the monkey and that’s the key issue that you point out. If the monkey had been clever and had been able to control evolution, then they would have set the human uncertain as to the monkey’s preferences and then had him optimize those.

Stuart: Yeah, I mean they could imagine creating a race of humans that were intelligent but completely subservient to the interests of the monkeys. Assuming that they solved the enfeeblement problem and the misuse problem, then they’d be pretty happy with the way things turned out. I don’t see any real alternative. So Samuel Butler in 1863 wrote a book about a society that faces the problem of superintelligent machines and they take the other solution, which is actually to stop. They see no alternative but to just ban the construction of intelligent machines altogether. In fact, they ban all machines and in Frank Herbert’s Dune, the same thing. They have a catastrophic war in which humanity just survives in its conflict with intelligent machines. And then from then on, all intelligent machines, in fact, all computers are banned altogether. I can’t see that that’s a plausible direction, but it could be that we decide at some point that we cannot solve the control problem or we can’t solve the misuse problem or we can’t solve the enfeeblement problem.

And we decide that it’s in our best interests to just not go down this path at all. To me that just doesn’t feel like a possible direction. Things can change if we start to see bigger catastrophes. I think the click through catastrophe is already pretty big and it results from very, very simple minded algorithms that know nothing about human cognition or politics or anything else. They’re not even explicitly trying to manipulate us. It’s just, that’s what the code does in a very simple minded way. So we could imagine bigger catastrophes happening that we survived by the skin of our teeth as happened in Dune for example. And then that would change the way people think about the problem. And we see this over and over again with nuclear power, with fossil fuels and so on, that by and large technology is always seen as beneficial and more technology is therefore more beneficial.

And we push ahead, often ignoring the people who say “But, but, but what about this drawback? What about this drawback?” And maybe that’s starting to change with respect to fossil fuels. Several countries have now decided since Chernobyl and Fukushima to ban nuclear power, the EU has much stronger restrictions on genetically modified foods than a lot of other countries, so there are pockets where people have pushed back against technological progress and said, “No, not all technology is good and not all uses of technology are good and so we need to exercise a choice.” But the benefits of AI are potentially so enormous. It’s going to take a lot to undo this forward progress.

Lucas: Yeah, absolutely. Whatever results from Earth-originating intelligent life at the end of time, that thing is up to us to create. I’m quoting you here, you say, “A compassionate and jubilant use of humanity’s cosmic endowment sounds wonderful, but we also have to reckon with the rapid rate of innovation in the malfeasance sector. Ill-intentioned people are thinking up new ways to misuse AI so quickly that this chapter is likely to be outdated even before it attains printed form. Think of it not as depressing reading, however, but as a call to act before it’s too late.”

Thinking about this and everything you just touched on, there’s obviously a ton for us to get right here, and it’s a question and problem for everyone in the human species to have a voice in.

Stuart: Yeah. I think we really need to start considering the possibility that there ought to be a law against it. For a long time the IT industry almost uniquely has operated in a completely unregulated way. The car industry for example, cars have to follow various kinds of design and safety rules. You have to have headlights and turn signals and brakes and so on. A car that’s designed in an unsafe way gets taken off the market, but software can do pretty much whatever it wants.

Every license agreement that you sign whenever you buy or use software tells you that it doesn’t matter what their software does. The manufacturer is not responsible for anything and so on. And I think it’s a good idea to actually take legislative steps, regulatory steps just to get comfortable with the idea that yes, I see we maybe do need regulation. San Francisco, for example, has banned the use of facial recognition in public or for policing. California has a ban on the impersonation of human beings by AI systems. I think that ban should be pretty much universal. But in California its primary area of applicability is in persuading people to vote in any particular direction in an election. So it’s a fairly narrow limitation. But when you think about it, why would you want to allow AI systems to impersonate human beings? In other words, the human who’s in conversation believes that they’re talking to another human being, and that they owe that other human being a whole raft of respect, politeness, all kinds of obligations that are involved in interacting with other humans.

But you don’t owe any of those things to an AI system. And so why should we allow people to effectively defraud humans by convincing them that in fact they’re engaged with another human when they aren’t? So I think it would be a good idea to just start things off with some basic common sense rules. I think the GDPR has a rule that says that you can’t use an algorithm to make a decision that has a significant legal effect on a person. So you can’t put them in jail simply as a result of an algorithm, for example. You can’t fire them from a job simply as a result of an algorithm. You can use the algorithm to advise, but a human has to be involved in the decision and the person has to be able to query the decision and ask for the reasons and in some sense have a right of appeal.

So these are common sense rules that almost everyone would agree with. And yet certainly in the U.S., there’s reluctance to put them into effect. And I think going forward, if we want to have safe AI systems, there’s at least going to be a role for regulations. There should also be standards, as in IEEE standards. There should also be professional codes of conduct. People should be trained in how to recognize potentially unsafe designs for AI systems, but there should, I think, be a role for regulation where at some point you would say, if you want to put an AI system on the internet, for example, just as if you want to put software into the app store, it has to pass a whole bunch of checks to make sure that it’s safe, to make sure that it won’t wreak havoc. So, we better start thinking about that. I don’t know yet what that regulation should say, but we shouldn’t be in principle opposed to the idea that such regulations might exist at some point.

Lucas: I basically agree that these regulations should be implemented today, but they seem pretty temporary or transient as the uncertainty in the AI system for the humans’ objective function or utility function decreases. So they become more certain about what we want. At some point it becomes unethical to have human beings governing these processes instead of AI systems. Right? So if we have timelines from AI researchers that range from 50 to a hundred years for AGI, we could potentially see laws and regulations like this go up in the next five to 10 and then disappear again somewhere within the next hundred to 150 years max.

Stuart: That’s an interesting viewpoint. And I think we have to be a little careful because autonomy is part of our preference structure. So although one might say, okay, now who gets to run the government? Well, self-evidently, it’s possible that machines could do a better job than the humans we currently have. But that would be better only in a narrow sense: maybe it would reduce crime, maybe it would increase economic output, we’d have better health outcomes, people would be more educated than they would with humans making those decisions, but there would be a dramatic loss in autonomy. And autonomy is a significant part of our preference structure. And so it isn’t necessarily the case that the right solution is that machines should be running the government. And this is something that the machines themselves will presumably recognize and this is the reason why parents at some point tell the child, “No, you have to tie your own shoelaces.” Because they want the child to develop autonomy.

The same thing will be true. The machines want humans to retain autonomy. As I said earlier, with respect to enfeeblement, right? It’s this conflict between our long-term best interest and our short-termism in the choices that we tend to make. It’s always easier to say, “Oh no, I can’t be bothered to tie my shoelaces. Please could you do it?” But if you keep doing that, then the long-term consequences are bad. We have to understand how autonomy, which includes machines not making decisions, folds into our overall preference structure. And up to now there hasn’t been much of a choice, at least in the global sense. Of course it’s been humans making the decisions, although within any local context it’s only a subset of humans who are making the decisions and a lot of other people don’t have as much autonomy. To me, I think autonomy is a really important currency, and to the extent possible, everyone should have as much of it as possible.

Lucas: I think you really hit the nail on the head. The problem is where autonomy fits in the hierarchy of our preferences and meta preferences. For me, it seems more instrumental than being an end goal in itself. Now this is an empirical question across all people: where autonomy fits in their preference hierarchies, whether it’s like a terminal value or not, and whether under reflection and self idealization, our preferences distill into something else or not. Autonomy could possibly, but not necessarily, be an end goal, insofar as it simply provides utility for all of our other goals, because without autonomy we can’t act on what we think will best optimize our own preferences and end values. So definitely a lot of questions there. The structure of our preference hierarchy will certainly dictate, it seems, the long-term outcome of humanity and how enfeeblement unfolds.

Stuart: The danger would be that we misunderstand the entire nature of the human preference hierarchy. So sociologists and others have talked about the hierarchy of human needs in terms of food, shelter, physical security and so on. But they’ve always kind of assumed that you are a human being and therefore you’re the one deciding stuff. And so they tend not to think so much about fundamental properties of the ability to make your own choices for good or ill. And science fiction writers have had a field day with this, pointing out that machines that do what you want are potentially disastrous because you lose the freedom of choice.

One could imagine that if we formulate things not quite right and the effect of the algorithms that we build is to make machines that don’t value autonomy in the right way or don’t have it folded into the overall preference structure in the right way, that we could end up with a subtle but gradual and very serious loss of autonomy in a way that we may not even notice as it happens. Like the slow boiling frog. If we could look ahead a hundred years and see how things turn out, we would say, “Oh my goodness, that is a terrible mistake. We’re going to make sure that that doesn’t happen.” So I think we need to be pretty careful. And again this is where we probably need the help of philosophers to make sure that we keep things straight and understand how these things fit together.

Lucas: Right, so it seems like we simply don’t understand ourselves. We don’t know the hierarchy of our preferences. We don’t really know what preferences exactly are. Stuart Armstrong talks about how we haven’t figured out the symbol grounding problem. So there are issues with even understanding how preferences relate to one another ultimately and how the meaning there is born. And we’re building AI systems which will be more capable than us. Perhaps they will be conscious. You have a short subchapter, I believe, on that, or at least on how you’re not going to talk about consciousness.

Stuart: Yeah. I have a paragraph saying I have nothing to say.

Lucas: So potentially these things will also be moral patients, and we don’t know how to get them to do the things that we’re not entirely sure that we want them to do. So how would you differentiate this book from Superintelligence or Life 3.0 or other books on the AI alignment problem and superintelligence in this space?

Stuart: I think the two major differences are, one, I believe that to understand this whole set of issues, or even just to understand what’s happening with AI and what’s going to happen, you have to understand something about AI. And I think that Superintelligence and Life 3.0 are, to some extent, easier to grasp if you already understand quite a bit about AI. And if you don’t, then it’s quite difficult to get as much out of those books as is in there. I think they are full of interesting points and ideas, but those points and ideas are easier to get out if you understand AI. So I wanted people to understand AI, and to understand it not just as a technology, right? You could talk about how deep learning works, but that’s not the point. The point is really what is intelligence, and how have we taken that qualitative understanding of what that means and turned it into this technical discipline where the standard model is machines that achieve fixed objectives.

And then the second major difference is that I’m proposing a solution for at least one of the big failure modes of AI. And as far as I can tell, that solution, I mean, it’s sort of mentioned in some ways in Superintelligence, I think the phrase there is normative uncertainty, but it has a slightly different connotation. And partly that’s because this approach of inverse reinforcement learning is something that we’ve actually worked on at Berkeley for a little over 20 years. It wasn’t invented for this purpose, but it happens to fit this purpose and then the approach of how we solve this problem is fleshed out in terms of understanding that it’s this coupled system between the human that has the preferences and the machine that’s trying to satisfy those preferences and doesn’t know what they are. So I think that part is different. That’s not really present in those other two books.

It certainly shares, I think, the desire to convince people that this is a serious issue. I think both Superintelligence and Life 3.0 do a good job of that. Superintelligence is sort of a bit more depressing. It does such a good job of convincing you that things can go south in so many ways that you almost despair. Life 3.0 is a bit more cheerful. And also I think Life 3.0 does a good job of asking you what you want the outcome to be. And obviously you don’t want it to be catastrophic outcomes where we’re all placed in concrete coffins with heroin drips, as Stuart Armstrong likes to put it.

But there are lots of other outcomes which are the ones you want. So I think that’s an interesting part of that book. And of course Max Tegmark, the author of Life 3.0, is a physicist. So he has lots of amazing stuff about the technologies of the future, which I don’t have so much. So those are the main differences, I think: wanting to convey the essence of intelligence, how that notion has developed, how it is really an integral part of our whole intellectual tradition and our technological society, how that model is fundamentally wrong, and what the new model is that we have to replace it with.

Lucas: Yeah, absolutely. I feel that you helped to clarify intelligence for me: the history of intelligence from evolution up until modern computer science problems. I think that you really set up the AI alignment problem well, as resulting from there being intelligences in multi-agent scenarios trying to do different things, and then you suggest a solution, which we’ve discussed here already. So thanks so much for coming on the podcast, Stuart. Your book is set for release on October 8th?

Stuart: That’s correct.

Lucas: Great. We’ll include links for that in the description. Thanks so much for coming on.

 If you enjoyed this podcast, please subscribe. Give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI alignment series.

End of recorded material

FLI Podcast: Feeding Everyone in a Global Catastrophe with Dave Denkenberger & Joshua Pearce

Most of us working on catastrophic and existential threats focus on trying to prevent them — not on figuring out how to survive the aftermath. But what if, despite everyone’s best efforts, humanity does undergo such a catastrophe? This month’s podcast is all about what we can do in the present to ensure humanity’s survival in a future worst-case scenario. Ariel is joined by Dave Denkenberger and Joshua Pearce, co-authors of the book Feeding Everyone No Matter What, who explain what would constitute a catastrophic event, what it would take to feed the global population, and how their research could help address world hunger today. They also discuss infrastructural preparations, appropriate technology, and why it’s worth investing in these efforts.

Topics discussed include:

  • Causes of global catastrophe
  • Planning for catastrophic events
  • Getting governments onboard
  • Application to current crises
  • Alternative food sources
  • Historical precedence for societal collapse
  • Appropriate technology
  • Hardwired optimism
  • Surprising things that could save lives
  • Climate change and adaptation
  • Moral hazards
  • Why it’s in the best interest of the global wealthy to make food more available

You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

Ariel Conn: In a world of people who worry about catastrophic threats to humanity, most efforts are geared toward preventing catastrophic threats. But what happens if something does go catastrophically wrong? How can we ensure that things don’t spiral out of control, but instead, humanity is set up to save as many lives as possible, and return to a stable, thriving state, as soon as possible? I’m Ariel Conn, and on this month’s episode of the FLI podcast, I’m speaking with Dave Denkenberger and Joshua Pearce.

Dave and Joshua want to make sure that if a catastrophic event occurs, then at the very least, all of the survivors around the planet will be able to continue eating. Dave got his Master’s from Princeton in mechanical and aerospace engineering, and his PhD from the University of Colorado at Boulder in building engineering. His dissertation was on his patented heat exchanger. He is an assistant professor at the University of Alaska Fairbanks in mechanical engineering. He co-founded and directs the Alliance to Feed the Earth in Disasters, also known as ALLFED, and he donates half his income to that. He received the National Science Foundation Graduate Research Fellowship. He is a Penn State distinguished alumnus and he is a registered professional engineer. He has authored 56 publications with over 1600 citations and over 50,000 downloads, including the book Feeding Everyone No Matter What, which he co-authored with Joshua, and his work has been featured in over 20 countries and in over 200 articles, including in Science.

Joshua received his PhD in materials engineering from the Pennsylvania State University. He then developed the first sustainability program in the Pennsylvania State System of Higher Education and helped develop the Applied Sustainability Graduate Engineering Program while at Queen’s University, Canada. He is currently the Richard Witte Professor of Materials Science and Engineering and a professor cross-appointed in the Department of Materials Science and Engineering and the Department of Electrical and Computer Engineering at Michigan Technological University, where he runs the Open Sustainability Technology research group. He was a Fulbright-Aalto University Distinguished Chair last year and remains a visiting professor of photovoltaics and nano-engineering at Aalto University. He’s also a visiting professor at the University of Lorraine in France. His research concentrates on the use of open source appropriate technology to find collaborative solutions to problems in sustainability and poverty reduction. He has authored over 250 publications, which have earned more than 11,000 citations. You can find his work on appropedia.org, and his research is regularly covered by the international and national press and continually ranks in the top 0.1% on academia.edu. He helped found the field of alternative food for global catastrophes with Dave, and again he was co-author on the book Feeding Everyone No Matter What.

So Dave and Joshua, thank you so much for joining us this month.

Dave Denkenberger: Thank you.

Joshua Pearce: Thank you for having us.

Ariel Conn: My first question for the two of you is a two-part question. First, why did you decide to consider how to survive a disaster rather than focusing on prevention, as so many other people do? And second, how did you two start working together on this topic?

Joshua Pearce: So, I’ll take a first crack at this. Both of us have worked in the area of prevention, particularly with regard to alternative energy sources to mitigate climate destabilization from fossil fuel burning. But what we both came to realize is that many of the disasters that we look at that could actually wipe out humanity aren’t things that we can necessarily do anything to avoid. As for the ones that we can do something about, climate change and nuclear winter, we’ve even worked together on those.

So for example, we did a study where we looked at how many nuclear weapons a state should have if they would continue to be rational. And by rational I mean even if everything were to go your way, if you shot all of your nuclear weapons, they all hit their targets, the people you were aiming at weren’t firing back at you, at what point would just the effects of firing that many weapons hurt your own society, possibly kill many of your own people, or destroy your own nation?

The answer to that turned out to be a really remarkably low number. The answer was 100. And many of the nuclear power states currently have more weapons than that. And so it’s clear at least from our current political system that we’re not behaving rationally and that there’s a real need to have a backup plan for humanity in case something does go wrong — whether it’s our fault, or whether it’s just something that happens in nature that we can’t control like a super volcano or an asteroid impact.

Dave Denkenberger: Even though there is more focus on preventing a catastrophe than there is on resilience to the catastrophe, overall the field is highly neglected. As someone pointed out, there are still more publications on dung beetles than there are on preventing or dealing with global catastrophic risks. But I would say that the particular sub-field of resilience to the catastrophes is even more neglected. That’s why I think it’s a high priority to investigate.

Joshua Pearce: We actually met way back as undergraduate students at Penn State. I was a chemistry and physics double major and one of my friends a year above said, “You have to take an engineering science class before you leave.” It changed his life. I signed up for this class taught by the man that eventually became my advisor, Christopher Wronski, and it was a brutal class — very difficult conceptually and mathematically. And I remember when one of my first tests came back, there was this bimodal distribution where there were two students who scored A’s and everybody else failed. Turned out that the two students were Dave and I, so we started working together then just on homework assignments, and then continued collaborating through all different areas of technical experiments and theory for years and years. And then Dave had this very interesting idea about what do we do in the event of a global catastrophe? How can we feed everybody? And to attack it as an engineering problem, rather than a social problem. We started working on it very aggressively.

Dave Denkenberger: So it’s been, I guess, 18 years now that we’ve been working together: a very fruitful collaboration.

Ariel Conn: Before I get any farther into the interview, let’s quickly define what a catastrophic event is and the types of catastrophic events that you both look at most.

Dave Denkenberger: The original focus was on the catastrophes that could collapse global agriculture. These would include nuclear winter from a full-scale nuclear war like US-Russia, causing burning of cities and blocking of the sun with smoke, but it could also mean a super volcanic eruption like the one that happened about 74,000 years ago that many think nearly wiped out the human species. And then there could also be a large asteroid impact similar to the one that wiped out the dinosaurs about 66 million years ago.

And in those cases, it’s very clear we need to have some other alternative source of food, but we also look at what I call the 10% global shortfalls. These are things like the volcano that caused the year without a summer in 1816, which might have reduced food supply by about 10% and caused widespread famine, including in Europe and almost in the US. Then it could be a slightly smaller asteroid, or a regional nuclear war, and actually many other catastrophes, such as a super weed, a plant that could out-compete crops. If this happened naturally, it probably would be slow enough that we could respond, but if it were part of a coordinated terrorist attack, that could be catastrophic. Even though technically we waste more than 10% of our food and we feed more than 10% of our food to animals, I think realistically, if we had a 10% food shortfall, the price of food would go so high that hundreds of millions of people could starve.

Joshua Pearce: Something that’s really important to understand about the way that we analyze these risks is that currently, even with the agricultural system completely working fine, we’ve got somewhere on the order of 800 million people without enough food to eat, because of waste and inefficiencies. And so if anything starts to cut into the ability of our agricultural system to continue, especially if all of plant life no longer works for a number of years because of the sun being blocked, we have to have some method to provide alternative foods to feed the bulk of the human population.

Ariel Conn: I think that ties in to the next question then, and that is what does it mean to feed everyone no matter what, as you say in the title of your book?

Dave Denkenberger: As Joshua pointed out, we are still not feeding everyone adequately right now. The idea of feeding everyone no matter what is an aspirational goal, and it’s showing that if we cooperated, we could actually feed everyone, even if the sun is blocked. Of course, it might not work out exactly like that, but we think that we can do much better than if we were not prepared for one of these catastrophes.

Joshua Pearce: Right. Today, roughly one in nine people go to bed hungry every night, and somewhere on the order of 25,000 people starve to death or die from hunger-related disease [per day]. And so one of the inspiring things from our initial analysis drawn up in the book is that even in the worst-case scenarios where something major happens, like a comet strike of the kind that would wipe out the dinosaurs, humans don’t need to be wiped out: We could provide for ourselves. And the embarrassing thing is that today, even with the agricultural system working fine, we’re not able to do that. And so what I’m at least hoping is that some of our work on these alternative foods provides another mechanism to provide low-cost calories for the people that need them, even today when there is no catastrophe.

Dave Denkenberger: One of the technologies that we think could be useful even now comes from a company called Comet Bio, which is turning agricultural residues like leaves and stalks into edible sugar, and they think that’s actually going to be able to compete with sugar cane. It has the advantage of not taking up lots of land that we might be cutting the rainforest down for, so it has environmental benefits as well as humanitarian benefits. Another area that I think would be relevant is smaller disasters, such as an earthquake or a hurricane. Generally the cheapest solution is just shipping in grain from outside, but if transportation is disrupted, it might make sense to be able to produce some food locally. For example, if a hurricane blows all the crops down and you’re not going to be able to get any normal harvest from them, you can actually grind up the leaves, such as wheat leaves, squeeze out the liquid, and boil the liquid, and then you get a protein concentrate that people can eat.

Ariel Conn: So that’s definitely a question that I had, and that is to what extent can we start implementing some of the plans today during a disaster? This is a pre-recorded podcast; Dorian has just struck the Bahamas. Can the stuff that you are working on now help people who are still stuck on an island after it’s been ravaged by a hurricane?

Dave Denkenberger: I think there is potential for that with getting food from leaves. There’s actually a non-profit organization called Leaf for Life that has been doing this in less developed countries for decades now. Another possibility would be mushrooms: some can mature in just a few weeks, and they can grow on waste, basically.

Joshua Pearce: The ones that would be good for an immediate catastrophe are the in-between foods that we’re working on: for the period between the time that you run out of stored food and the time that you can ramp up the full-scale alternative foods.

Ariel Conn: Can you elaborate on that a little bit more and explain what that process would look like? What happens right after the disaster strikes? And what does it look like to start ramping up food development in a couple of weeks or a couple of months, or however long that takes?

Joshua Pearce: In the book we develop 10 primary pathways to develop alternative food sources that could feed the entire global population. But the big challenge is that it’s not just whether there are enough calories; you have to have enough calories at the right time.

If, say, a comet strikes tomorrow and throws up a huge amount of earth and ash and covers the sun, we’d have roughly six months of stored food in grocery stores and pantries that we could use to eat. But then for most of the major sources of alternative food, it would take around a year to ramp them up, to take these processes that might not even exist now and get them to industrial scale to feed billions of people. So the most challenging part is that six-month-to-one-year period, and for that we would be using the alternative foods that Dave talked about: the mushrooms that can grow really fast, and leaves. And for the leaf option, part of those leaves can come from agricultural residues, things that we already know are safe.

The much larger biomass that we might be able to use is just normal leaves from killed trees. The only problem with that is that there hasn’t really been any research into whether or not that’s safe. We don’t know, for example, if you can eat maple or oak leaf concentrate. The studies haven’t been done yet. And that’s one of the areas that we’re really focusing on now: taking some of these ideas that are promising and proving that they’re actually technically feasible and safe for people to use in the event of a serious catastrophe, a minor one, or just to feed people who for whatever reason don’t have enough food.

Dave Denkenberger: I would add that even though we might have six months of stored food, that would be a best-case scenario, when we’ve just had the harvest in the northern hemisphere; at other times we could have only two or three months of stored food. But in many of these catastrophes, even a pretty severe nuclear winter, there’s likely to be some sunlight still coming down to the earth, and so a recent project we’ve been working on is growing seaweed. This has a lot of advantages because seaweed can tolerate low light levels, the ocean would not cool as fast as the land, and it grows very quickly. So we’ve actually been applying seaweed growth models to the conditions of nuclear winter.

Ariel Conn: You talk about the food that we have stored being able to last for two to six months. How much transportation is involved in that? And how much transportation would we have, given different scenarios? I’ve heard that the town I’m in now, if it gets blocked off by a big snow storm, we have about two weeks of food. So I’m curious: How does that apply elsewhere? And are we worried about transportation being cut off, or do we think that transportation will still be possible?

Dave Denkenberger: Certainly there will be destruction of infrastructure regionally, whether it’s nuclear war or a super volcano or asteroid impact. So in those affected countries, transportation of food is going to be very challenging, but most of the people would not be in those countries. That’s why we think that there’s going to be a lot of infrastructure still functioning. There are still going to be chemical factories that we can retrofit to turn leaves into sugar, and another one of the technologies is turning natural gas into single-cell protein.

Ariel Conn: There’s the issue of developing agriculture if the sun is blocked, which is one of the things that you guys are working on, and that can happen with nuclear war leading to nuclear winter; it can happen with the super volcano, with the asteroid. Let’s go a little more in depth into what happens with these catastrophic events that block the sun. What happens with them? Why are they so devastating?

Joshua Pearce: All the past literature on what would happen if, say, we lost agriculture for a number of years, is all pretty grim. The base assumption is that everyone would simply starve to death, and there might be some fighting before that happens. When you look at what would happen based on previous knowledge of generating food from traditional ways, those were the right answers. And so, what we’re calling catastrophic events not only deal with the most extreme ones, the sun-killing ideas, but also the maybe a little less tragic but still very detrimental to the agricultural system: so something like a planned number of terrorist events to wipe out the major bread baskets of the world. Again, for the same idea, is that you&#