Podcast: AI and the Value Alignment Problem with Meia Chita-Tegmark and Lucas Perry

Published

28 February, 2018

What does it mean to create beneficial artificial intelligence? How can we expect to align AIs with human values if humans can't even agree on what we value? Building safe and beneficial AI involves tricky technical research problems, but it also requires input from philosophers, ethicists, and psychologists on these fundamental questions. How can we ensure the most effective collaboration?

Ariel spoke with FLI's Meia Chita-Tegmark and Lucas Perry on this month's podcast about the value alignment problem: the challenge of aligning the goals and actions of AI systems with the goals and intentions of humans.

Topics discussed in this episode include:

how AGI can inform human values,
the role of psychology in value alignment,
how the value alignment problem includes ethics, technical safety research, and international coordination,
a recent value alignment workshop in Long Beach,
and the possibility of creating suffering risks (s-risks).

This podcast was edited by Tucker Davey.

Transcript

Ariel: I'm Ariel Conn with the Future of Life Institute, and I'm excited to have FLI's Lucas Perry and Meia Chita-Tegmark with me today to talk about AI, ethics and, more specifically, the value alignment problem. But first, if you've been enjoying our podcast, please take a moment to subscribe and like this podcast. You can find us on iTunes, SoundCloud, Google Play, and all of the other major podcast platforms.

And now, AI, ethics, and the value alignment problem. First, consider the statement “I believe that harming animals is bad.” Now, that statement can mean something very different to a vegetarian than it does to an omnivore. Both people can honestly say that they don't want to harm animals, but how they define "harm" is likely very different, and these types of differences in values are common between countries and cultures, and even just between individuals within the same town. And then we want to throw AI into the mix. How can we train AIs to respond ethically to situations when the people involved still can't come to an agreement about what an ethical response should be?

The problem is even more complicated because often we don't even know what we really want for ourselves, let alone how to ask an AI to help us get what we want. And as we've learned with stories like that of King Midas, we need to be really careful what we ask for. That is, when King Midas asked the genie to turn everything to gold, he didn't really want everything -- like his daughter and his food -- turned to gold. And we would prefer than an AI we design recognize that there's often implied meaning in what we say, even if we don't say something explicitly. For example, if we jump into an autonomous car and ask it to drive us to the airport as fast as possible, implicit in that request is the assumption that, while we might be OK with some moderate speeding, we intend for the car to still follow most rules of the road, and not drive so fast as to put anyone's life in danger or take illegal routes. That is, when we say "as fast as possible," we mean "as fast as possible within the rules of law," and not within the rules of physics or within the laws of physics. And these examples are just the tiniest tip of the iceberg, given that I didn't even mention artificial general intelligence (AGI) and how that can be developed such that its goals align with our values.

So as I mentioned a few minutes ago, I'm really excited to have Lucas and Meia joining me today. Meia is a co-founder of the Future of Life Institute. She's interested in how social sciences can contribute to keeping AI beneficial, and her background is in social psychology. Lucas works on AI and nuclear weapons risk-related projects at FLI. His background is in philosophy with a focus on ethics. Meia and Lucas, thanks for joining us today.

Meia: It's a pleasure. Thank you.

Lucas: Thanks for having us.

Ariel: So before we get into anything else, one of the big topics that comes up a lot when we talk about AI and ethics is this concept value alignment. I was hoping you could both maybe talk just a minute about what value alignment is and why it's important to this question of AI and ethics.

Lucas: So value alignment, in my view, is bringing AI’s goals, actions, intentions and decision-making processes in accordance with what humans deem to be the good or what we see as valuable or what our ethics actually are.

Meia: So for me, from the point of view of psychology, of course, I have to put the humans at the center of my inquiry. So from that point of view, value alignment ... You can think about it also in terms of humans’ relationships with other humans. But I think it's even more interesting when you add artificial agents into the mix. Because now you have an entity that is so wildly different from humans yet we would like it to embrace our goals and our values in order to keep it beneficial for us. So I think the question of value alignment is very central to keeping AI beneficial.

Lucas: Yeah. So just to expand on what I said earlier: The project of value alignment is in the end creating beneficial AI. It's working on what it means for something to be beneficial, what beneficial AI exactly entails, and then learning how to technically instantiate that into machines and AI systems. Also, building the proper like social and political context for that sort of technical work to be done and for it to be fulfilled and manifested in our machines and AIs.

Ariel: So when you're thinking of AI and ethics, is value alignment basically synonymous, just another way of saying AI and ethics or is it a subset within this big topic of AI and ethics?

Lucas: I think they have different connotations. If one's thinking about AI ethics, I think that one is tending to be moreso focused on applied ethics and normative ethics. One might be thinking about the application of AI systems and algorithms and machine learning in domains in the present day and in the near future. So one might think about atomization and other sorts of things. I think that when one is thinking about value alignment, it's much more broad and expands also into metaethics and really sort of couches and frames the problem of AI ethics as something which happens over decades and which has a tremendous impact. I think that value alignment has a much broader connotation than what AI ethics has traditionally had.

Meia: I think it all depends on how you define value alignment. I think if you take the very broad definition that Lucas has just proposed, I think that yes, it probably includes AI ethics. But you can also think of it more narrowly as simply instantiating your own values into AI systems and having them adopt your goals. In that case, I think there are other issues as well because if you think about it from the point of view of psychology, for example, then it's not just about which values get instantiated and how you do that, how you solve the technical problem, but also we know that humans, even if they know what goals they have and what values they uphold, it's very, very hard for them sometimes to actually act in accordance to them because they have all sorts of cognitive and emotional effective limitations. So in that case I think value alignment is, in this narrow sense, is basically not sufficient. We also need to think about AIs and applications of AIs in terms of how do they help us and how do they make sure that we gain the cognitive competencies that we need to be moral beings and to be really what we should be, not just what we are.

Lucas: Right. I guess to expand on what I was just saying. Value alignment I think in the more traditional sense, it's sort of all ... It's more expansive and inclusive in that it's recognizing a different sort of problem than AI ethics alone has. I think that when one is thinking about value alignment, there are elements of thinking about -- somewhat about machine ethics but also about social, political, technical and ethical issues surrounding the end goal of eventually creating AGI. Whereas, AI ethics can be more narrowly interpreted just as certain sorts of specific cases where AI’s having impact and implications in our lives in the next 10 years. Whereas, value alignment’s really thinking about the instantiation of ethics and machines and making machine systems that are corrigible and robust and docile, which will create a world that we're all happy about living in.

Ariel: Okay. So I think that actually is going to flow really nicely into my next question, and that is, at FLI we tend to focus on existential risks. I was hoping you could talk a little bit about how issues of value alignment are connected to the existential risks that we concern ourselves with.

Lucas: Right. So, we can think of AI systems as being very powerful optimizers. We can imagine there being a list of all possible futures and what intelligence is good for is for modeling the world and then committing to and doing actions which constrain the set of all possible worlds to ones which are desirable. So intelligence is sort of the means by which we get to an end, and ethics is the end towards which we strive. So these are how these two things really integral and work together and how AI without ethics makes no sense and how ethics without AI or intelligence in general also just doesn't work. So in terms of existential risk, there are possible futures that intelligence can lead us to where earth-originating intelligent life no longer exists either intentionally or by accident. So value alignment sort of fits in by constraining the set of all possible futures by working on technical work by doing political and social work and also work in ethics to constrain the actions of AI systems such that existential risks do not occur, such that by some sort of technical oversight, by some misalignment of values, by some misunderstanding of what we want, the AI generates an existential risk.

Meia: So we should remember that homo sapiens represent an existential risk to itself also. We are creating nuclear weapons. We have more of them than we need. So many, in fact, that we could destroy the entire planet with them. Not to mention homo sapiens has also represented an existential risk for all other species. The problem is AI is that we're introducing in the mix a whole new agent that is by definition supposed to be more intelligent, more powerful than us and also autonomous. So as Lucas mentioned, it's very important to think through what kind of things and abilities do we delegate to these AIs and how can we make sure that they have the survival and the flourishing of our species in mind. So I think this is where value alignment comes in as a safeguard against these very terrible and global risks that we can imagine coming from AI.

Lucas: Right. What makes doing that so difficult is beyond the technical issue of just having AI researchers and AI safety researchers knowing how to just get AI systems to actually do what we want without creating a universe of paperclips. There's also this terrible social and political context in which this is all happening where there is really great game-theoretic incentives to be the first person to create artificial general intelligence. So in a race to create AI, a lot of these efforts that seem very obvious and necessary could be cut in favor of more raw power. I think that's probably one of the biggest risks for us not succeeding in creating value-aligned AI.

Ariel: Okay. Right now it's predominantly technical AI people who are considering mostly technical AI problems. How to solve different problems is usually, you need a technical approach for this. But when it comes to things like value alignment and ethics, most of the time I'm hearing people suggest that we can't leave that up to just the technical AI researchers. So I was hoping you could talk a little bit about who should be part of this discussion, why we need more people involved, how we can get more people involved, stuff like that.

Lucas: Sure. So maybe if I just break the problem down into just what I view to be the three different parts then talking about it will make a little bit more sense. So we can break down the value alignment problem into three separate parts. The first one is going to be the technical issues, the issues surrounding actually creating artificial intelligence. The issues of ethics, so the end towards which we strive. The set of possible futures which we would be happy in living, and then also there's the governance and the coordination and the international problem. So we can sort of view this as a problem of intelligence, a problem of agreeing on the end towards which intelligence is driven towards, and also the political and social context in which all of this happens.

So thus far, there's certainly been a focus on the technical issue. So there's been a big rise in the field of AI safety and in attempts to generate beneficial AI, attempts at creating safe AGI and mechanisms for avoiding reward hacking and other sorts of things that happen when systems are trying to optimize their utility function. The Concrete Problems on AI Safety paper has been really important and sort of illustrates some of these technical issues. But even between technical AI safety research and ethics there's disagreement about something also like machine ethics. So how important is machine ethics? Where does machine ethics fit in to technical AI safety research? How much time and energy should we put into certain kinds of technical AI research versus how much time and effort should we put into issues in governance and coordination and addressing the AI arms race issues? How much of ethics do we really need to solve?

So I think there's a really important and open question regarding how do we apply and invest our limited resources in sort of addressing these three important cornerstones in value alignment so that the technical issue, the issues in ethics and then issues in governance and coordination, and how do we optimize working on these issues given the timeline that we have? How much resources should we put in each one? I think that's an open question. Yeah, one that certainly needs to be addressed more about how we're going to move forward given limited resources.

Meia: I do think though the focus so far has been so much on the technical aspect. As you were saying, Lucas, there are other aspects to this problem that need to be tackled. What I'd like to emphasize is that we cannot solve the problem if we don't pay attention to the other aspects as well. So I'm going to try to defend, for example, psychology here, which has been largely ignored I think in the conversation.

So from the point of view of psychology, I think the value alignment problem is double fold in a way. It's about a triad of interactions. Human, AI, other humans, right? So we are extremely social animals. We interact a lot with other humans. We need to align our goals and values with theirs. Psychology has focused a lot on that. We have a very sophisticated set of psychological mechanisms that allow us to engage in very rich social interactions. But even so, we don't always get it right. Societies have created a lot of suffering, a lot of moral harm, injustice, unfairness throughout the ages. So for example, we are very ill-prepared by our own instincts and emotions to deal with inter-group relations. So that's very hard.

Now, people coming from the technical side, they can say, "We're just going to have AI learn our preferences." Inverse reinforcement learning is a proposal that says that basically explains how to keep humans in the loop. So it's a proposal for programing AI such that it gets its reward not from achieving a goal but from getting good feedback from a human because it achieved a goal. So the hope is that this way AI can be correctable and can learn from human preferences.

As a psychologist, I am intrigued, but I understand that this is actually very hard. Are we humans even capable of conveying the right information about our preferences? Do we even have access to them ourselves or is this all happening in some sort of subconscious level? Sometimes knowing what we want is really hard. How do we even choose between our own competing preferences? So this involves a lot more sophisticated abilities like impulse control, executive function, etc. I think that if we don't pay attention to that as well in addition to solving the technical problem, I think we are very likely to not get it right.

Ariel: So I'm going to want to come back to this question of who should be involved and how we can get more people involved, but one of the reasons that I'm talking to the both of you today is because you actually have made some steps in broadening this discussion already in that you set up a workshop that did bring together a multidisciplinary team to talk about value alignment. I was hoping you could tell us a bit more about how that workshop went, what interesting insights were gained that might have been expressed during the workshop, what you got out of it, why you think it's important towards the discussion? Etc.

Meia: Just to give a few facts about the workshop. The workshop took place in December 2017 in Long Beach, California. We were very lucky to have two wonderful partners in co-organizing this workshop. The Berggruen Institute and the Canadian Institute for Advanced Research. And the idea for the workshop was very much to have a very interdisciplinary conversation about value alignment and reframe it as not just a technical problem but also one that involves disciplines such as philosophy and psychology, political science and so on. So we were very lucky actually to have a fantastic group of people there representing all these disciplines. The conversation was very lively and we discussed topics all the way from near term considerations in AI and how we align AI to our goals and also all the way to thinking about AGI and even super intelligence. So it was a fascinating range both of topics discussed and also perspectives being represented.

Lucas: So my inspiration for the workshop was being really interested in ethics and the end towards which this is all going. What really is the point of creating AGI and perhaps even eventually superintelligence? What is it that is good and what is that is valuable? Broadening from that and becoming more interested in value alignment, the conversation thus far has been primarily understood as something that is purely technical. So value alignment has only been seen as something that is for technical AI safety researchers to work on because there are technical issues regarding AI safety and how you get AIs to do really simple things without destroying the world or ruining a million other things that we care about. But this is really, as we discussed earlier, an interdependent issue that covers issues in metaethics and normative ethics, applied ethics. It covers issues in psychology. It covers issue in law, policy, governance, coordination. It covers the AI arms race issue. Solving the value alignment problem and creating a future with beneficial AI is a civilizational project where we need everyone working on all these different issues. On issues of value, on issues of game theory among countries, on the technical issues, obviously.

So what I really wanted to do was I wanted to start this workshop in order to broaden the discussion. To reframe value alignment as not just something in technical AI research but something that really needs voices from all disciplines and all expertise in order to have a really robust conversation that reflects the interdependent nature of the issue and where different sorts of expertise on the different parts of the issue can really come together and work on it.

Ariel: Is there anything specific that you can tell us about what came out of the workshop? Were there any comments that you thought were especially insightful or ideas that you think are important for people to be considering?

Lucas: I mean, I think that for me one of the takeaways from the workshop is that there's still a mountain of work to do and that there are a ton of open questions. This is a very, very difficult issue. I think that one thing I took away from the workshop was that we couldn't even agree on the minimal conditions for which it would be okay to safely deploy AGI. There are just issues that seem extremely trivial in value alignment from the technical side and from the ethical side that seem very trivial, but on which I think there is very little understanding or agreement right now.

Meia: I think the workshop was a start and one good thing that happened during the workshop is I felt that the different disciplines or rather their representatives were able to sort of air out their frustrations and also express their expectations of the others. So I remember this quite iconic moment when one roboticist simply said, "But I really want you ethics people to just tell me what to implement in my system. What do you want my system to do?" So I think that was actually very illustrative of what Lucas was saying -- the need for more joint work. I think there was a lot of expectations I think from both the technical people towards the ethicists but also from the ethicists in terms of like, "What are you doing? Explain to us what are the actual ethical issues that you think you are facing with the things that you are building?" So I think there's a lot of catching up to do on both sides and there's much work to be done in terms of making these connections and bridging the gaps.

Ariel: So you referred to this as sort of a first step or an initial step. What would you like to see happen next?

Lucas: I don't have any concrete or specific ideas for what exactly should happen next. I think that's a really difficult question. Certainly, things that most people would want or expect. I think in the general literature and conversations that we were having, I think that value alignment, as a word and as something that we understand, needs to be expanded outside of the technical context. I don't think that it’s expanded that far. I think that more ethicists and more moral psychologists and people in law policy and governance need to come in and need to work on this issue. I'd like to see more coordinated collaborations, specifically involving interdisciplinary crowds informing each other and addressing issues and identifying issues and really some sorts of formal mechanisms for interdisciplinary coordination on value alignment.

It would be really great if people in technical research, in technical AI safety research and in ethics and governance could also identify all of the issues in their own fields, which the resolution to those issues and the solution to those issues requires answers from other fields. So for example, inverse reinforcement learning is something that Meia was talking about earlier and I think it's something that we can clearly decide and see as being interdependent on a ton of issues in a law and also in ethics and in value theory. So that would be sort of like an issue or node in the landscape of all issues and technical safety research that would be something that is interdisciplinary.

So I think it would be super awesome if everyone from their own respective fields are able to really identify the core issues which are interdisciplinary and able to dissect them into the constituent components and sort of divide them among the disciplines and work together on them and identify the different timelines at which different issues need to be worked on. Also, just coordinate on all those things.

Ariel: Okay. Then, Lucas, you talked a little bit about nodes and a landscape, but I don't think we've explicitly pointed out that you did create a landscape of value alignment research so far. Can you talk a little bit about what that is and how people can use it?

Lucas: Yeah. For sure. With the help of other colleagues at the Future of Life Institute like Jessica Cussins and Richard Mallah, we've gone ahead and created a value alignment conceptual landscape. So what this is is it's a really big tree, almost like an evolutionary tree that you would see, but what it is, is a conceptual mapping and landscape of the value alignment problem. What it's broken down into are the three constituent components, which we were talking about earlier, which is the technical issues, the issues in technically creating safe AI systems. Issues in ethics, breaking that down into issues in metaethics and normative ethics and applied ethics and moral psychology and descriptive ethics where we're trying to really understand values, what it means for something to be valuable and what is the end towards which intelligence will be aimed at. Then also, the other last section is governance. So issues in coordination and policy and law in creating a world where AI safety research can proceed and where there aren't ... Where we don't develop or allow a sort of winner-take-all scenario to rush us towards the end and not really have a final and safe solution towards fully autonomous powerful systems.

So what the landscape here does is it sort of outlines all of the different conceptual nodes in each of these areas. It lays out what all the core concepts are, how they're all related. It defines the concepts and also gives descriptions about how the concepts fit into each of these different sections of ethics, governance, and technical AI safety research. So the hope here is that people from different disciplines can come and see the truly interdisciplinary nature of the value alignment problem, to see where ethics and governance and the technical AI safety research stuff all fits in together and how this all together really forms, I think, the essential corners of the value alignment problem. It's also nice for researchers and other persons to understand the concepts and the landscape of the other parts of this problem.

I think that, for example, technical AI safety researchers probably don't know much about metaethics or they don't spend too much time thinking about normative ethics. I'm sure that ethicists don't spend very much time thinking about technical value alignment and how inverse reinforcement learning is actually done and what it means to do robust human imitation in machines. What are the actual technical, ethical mechanisms that are going to go into AI systems. So I think that this is like a step in sort of laying out the conceptual landscape, in introducing people to each other's concepts. It's a nice visual way of interacting with I think a lot of information and sort of exploring all these different really interesting nodes that explore a lot of very deep, profound moral issues, very difficult and interesting technical issues, and issues in law, policy and governance that are really important and profound and quite interesting.

Ariel: So you've referred to this as the value alignment problem a couple times. I'm curious, do you see this ... I'd like both of you to answer this. Do you see this as a problem that can be solved or is this something that we just always keep working towards and it's going to influence -- whatever the current general consensus is will influence how we're designing AI and possibly AGI, but it's not ever like, "Okay. Now we've solved the value alignment problem." Does that make sense?

Lucas: I mean, I think that that sort of question really depends on your metaethics, right? So if you think there are moral facts, if you think that more statements can be true or false and aren't just sort of subjectively dependent upon whatever our current values and preferences historically and evolutionarily and accidentally happen to be, then there is an end towards which intelligence can be aimed that would be objectively good and which would be the end toward which we would strive. In that case, if we had solved the technical issue and the governance issue and we knew that there was a concrete end towards which we would strive that was the actual good, then the value alignment problem would be solved. But if you don't think that there is a concrete end, a concrete good, something that is objectively valuable across all agents, then the value alignment problem or value alignment in general is an ongoing process and evolution.

In terms of the technical and governance sides of those, I think that there's nothing in the laws of physics or I think in computer science or in game theory that says that we can't solve those parts of the problem. Those ones seem intrinsically like they can be solved. That's nothing to say about how easy or how hard it is to solve those. But whether or not there is sort of an end towards value alignment I think depends on difficult questions in metaethics and whether something like moral error theory is true where all moral statements are simply false and that morality is maybe sort of just like a human invention, which has no real answers or who's answers are all false. I think that's sort of the crux of whether or not value alignment can "be solved" because I think the technical issues and the issues in governance are things which are in principle able to be solved.

Ariel: And Meia?

Meia: I think that regardless of whether there is an absolute end to this problem or not, there's a lot of work that we need to do in between. I also think that in order to even achieve this end, we need more intelligence, but as we create more intelligent agents, again, this problem gets magnified. So there's always going to be a race between the intelligence that we're creating and making sure that it is beneficial. I think at every step of the way, the more we increase the intelligence, the more we need to think about the broader implications. I think in the end we should think of artificial intelligence also not just as a way to amplify our own intelligence but also as a way to amplify our moral competence as well. As a way to gain more answers regarding ethics and what our ultimate goals should be.

So I think that the interesting questions that we can do something about are somewhere sort of in between. We will not have the answer before we are creating AI. So we always have to figure out a way to keep up with the development of intelligence in terms of our development of moral competence.

Ariel: Meia, I want to stick with you for just a minute. When we talked for the FLI end of your podcast, one of the things you said you were looking forward to in 2018 is broadening this conversation. I was hoping you could talk a little bit more about some of what you would like to see happen this year in terms of getting other people involved in the conversation, who you would like to see taking more of an interest in this?

Meia: So I think that unfortunately, especially in academia, we've sort of defined our work so much around these things that we call disciplines. I think we are now faced with problems, especially in AI, that really are very interdisciplinary. We cannot get the answers from just one discipline. So I would actually like to see in 2018 more sort of, for example, funding agencies proposing and creating funding sources for interdisciplinary projects. The way it works, especially in academia, so you propose grants to very disciplinary-defined granting agencies.

Another thing that would be wonderful to start happening is our education system is also very much defined and described around these disciplines. So I feel that, for example, there's a lack of courses, for example, that teach students in technical fields things about ethics, moral psychology, social sciences and so on. The converse is also true; in social sciences and in philosophy we hear very little about advancements in artificial intelligence and what's new and what are the problems that are there. So I'd like to see more of that. I'd like to see more courses like this developed. I think a friend of mine and I, we've spent some time thinking about how many courses are there that have an interdisciplinary nature and actually talk about the societal impacts of AI and there's a handful in the entire world. I think we counted about five or six of them. So there's a shortage of that as well.

But then also educating the general public. I think thinking about the implications of AI and also the societal implications of AI and also the value alignment problem is something that's probably easier for the general public to grasp rather than thinking about the technical aspects of how to make it more powerful or how to make it more intelligent. So I think there's a lot to be done in educating, funding, and also just simply having these conversations. I also very much admire what Lucas has been doing. I hope he will expand on it, creating this conceptual landscape so that we have people from different disciplines understanding their terms, their concepts, each other’s theoretical frameworks with which they work. So I think all of this is valuable and we need to start. It won't be completely fixed in 2018 I think. But I think it's a good time to work towards these goals.

Ariel: Okay. Lucas, is there anything that you wanted to add about what you'd like to see happen this year?

Lucas: I mean, yeah. Nothing else I think to add on to what I said earlier. Obviously we just need as many people from as many disciplines working on this issue because it's so important. But just to go back a little bit, I was also really liking what Meia said about how AI systems and intelligence can help us with our ethics and with our governance. I think that seems like a really good way forward potentially if as our AI systems grow more powerful in their intelligence, they're able to inform us moreso about our own ethics and our own preferences and our own values, about our own biases and about what sorts of values and moral systems are really conducive to the thriving of human civilization and what sorts of moralities lead to sort of navigating the space of all possible minds in a way that is truly beneficial.

So yeah. I guess I'll be excited to see more ways in which intelligence and AI systems can be deployed for really tackling the question of what beneficial AI exactly entails. What does beneficial mean? We all want beneficial AI, but what is beneficial, what does that mean? What does that mean for us in a world in which no one can agree on what beneficial exactly entails? So yeah, I'm just excited to see how this is going to work out, how it's going to evolve and hopefully we'll have a lot more people joining this work on this issue.

Ariel: So your comment reminded me of a quote that I read recently that I thought was pretty interesting. I've been reading Paula Boddington's book Toward a Code of Ethics for Artificial Intelligence. This was actually funded at least in part if not completely by FLI grants. But she says, "It's worth pointing out that if we need AI to help us make moral decisions better, this cast doubt on the attempts to ensure humans always retain control over AI." I'm wondering if you have any comments on that.

Lucas: Yeah. I don't know. I think this sort of a specific way of viewing the issue or it's a specific way of viewing what AI systems are for and the sort of future that we want. In the end is the best at all possible futures a world in which human beings ultimately retain full control over AI systems. I mean, if AI systems are autonomous and if value alignment actually succeeds, then I would hope that we created AI systems which are more moral than we are. AI systems which have better ethics, which are less biased, which are more rational, which are more benevolent and compassionate than we are. If value alignment is able to succeed and if we're able to create autonomous intelligent systems of that sort of caliber of ethics and benevolence and intelligence, then I'm not really sure what the point is of maintaining any sort of meaningful human control.

Meia: I agree with you, Lucas. That if we do manage to create ... In this case, I think it would have to be artificial general intelligence that is more moral, more beneficial, more compassionate than we are, then the issue of control, it's probably not so important. But in the meantime, I think, while we are sort of tinkering with artificial intelligent systems, I think the issue of control is very important.

Lucas: Yeah. For sure.

Meia: Because we wouldn't want to ... We wouldn't want to cut out of the loop too early before we've managed to properly test the system, make sure that indeed it is doing what we intended to do.

Lucas: Right. Right. I think that in the process of that that it requires a lot of our own moral evolution, something which we humans are really bad and slow at. As president of FLI Max Tegmark likes to talk about, he likes to talk about the race between our growing wisdom and the growing power of our technology. Now, human beings are really kind of bad at keeping our wisdom in pace with the growing power of our technology. If we sort of look at the moral evolution of our species, we can sort of see huge eras in which things which were seen as normal and mundane and innocuous, like slavery or the subjugation of women or other sorts of things like that. Today we have issues with factory farming and animal suffering and income inequality and just tons of people who are living with exorbitant wealth that doesn't really create much utility for them, whereas there's tons of other people who are in poverty and who are still starving to death. There are all sorts of things that we can see in the past as being obviously morally wrong.

Meia: Under the present too.

Lucas: Yeah. So then we can see that obviously there must be things like that today. We wonder, "Okay. What are the sorts of things today that we see and innocuous and normal and as mundane that the people of tomorrow, as William MacAskill says, will see us as moral monsters? How are we moral monsters today, but we simply can't see it? So as we create powerful intelligence systems and we're working on our ethics and we're trying to really converge on constraining the set of all possible worlds into ones which are good and which are valuable and ethical, it really demands a moral evolution of ourselves that we sort of have to figure out ways to catalyze and work on and move through, I think, faster.

Ariel: Thank you. So as you consider attempts to solve the value alignment problem, what are you most worried about, either in terms of us solving it badly or not quickly enough or something along those lines? What is giving you the most hope in terms of us being able to address this problem?

Lucas: I mean, I think just technically speaking, ignoring the likelihood of this -- the worst of all possible outcomes would be something like an s-risk. So an s-risk is a subset of x-risks -- s-risk stands for suffering risk. So this is a sort of risk whereby some sort of value misalignment, whether it be intentional or much more likely accidental, some seemingly astronomical amount of suffering is produced by deploying a misaligned AI system. The way that this was function is given certain sorts of assumptions about the philosophy of mind, about consciousness and machines, if we understand potentially consciousness and experience to be substrate-independent, meaning if consciousness can be instantiated in machine systems, that you don't just need meat to be conscious, but you need something like integrated information or information processing or computation or something like that, then the invention of AI systems and superintelligence and the spreading of intelligence, which optimizes towards any sort of arbitrary end, it could potentially lead to vast amounts of digital suffering, which would potentially arise accidentally or through subroutines or simulations, which would be epistemically useful but that involve a great amount of suffering. That coupled with these artificial intelligent systems running on silicon and iron and not on squishy, wet, human neurons would be that it would be running at digital time scales and not biological time scales. So there would be huge amplification of the speed of which the suffering was run. So subjectively, we might infer that a second for a computer, a simulated person on a computer, would be much greater than that for a biological person. Then we can sort of reflect that these are the sorts of risks -- or an s-risk would be something that would be really bad. Just any sort of way that AI can be misaligned and lead to a great amount of suffering. There's a bunch of different ways that this could happen.

So something like an s-risk would be something super terrible but it's not really clear how likely that would be. But yeah, I think that beyond that obviously we're worried about existential risk, we're worried about ways that this could curtail or destroy the development of earth-originating intelligent life. Ways that this really might happen are I think most likely because of this winner-take-all scenario that you have with AI. We've had nuclear weapons for a very long time now, and we're super lucky that nothing bad has happened. But I think the human civilization is really good at getting stuck into minimum equilibria where we get locked into these positions where it's not easy to escape from. So it's really not easy to disarm and get out of the nuclear weapons situation once we've discovered it. Once we start to develop, I think, more powerful and robust AI systems, I think already that a race towards AGI and towards more and more powerful AI might be very, very hard to stop if we don't make significant progress on that soon, if we're not able to get a ban on lethal autonomous weapons and if we're not able to introduce any real global coordination and that we all just start racing towards more powerful systems that there might be a race towards AGI, which would cut corners on safety and potentially make the likelihood of an existential risk or suffering risk more likely.

Ariel: Are you hopeful for anything?

Lucas: I mean, yeah. If we get it right, then the next billion years can be super amazing, right? It's just kind of hard to internalize that and think about that. It's really hard to say I think how likely it is that we'll succeed in any direction. But yeah, I'm hopeful that if we succeed in value alignment that the future can be unimaginably good.

Ariel: And Meia?

Meia: What's scary to me is that it might be too easy to create intelligence. That there's nothing in the laws of physics making it hard for us. Thus I think that it might happen too fast. Evolution took a long time to figure out how to make us intelligent, but that was probably just because it was trying to optimize for things like energy consumption and making us a certain size. So that’s scary. It's scary that it's happening so fast. I'm particularly scared that it might be easy to crack general artificial intelligence. I keep asking Max, "Max, but isn't there anything in the laws of physics that might make it tricky?" His answer and also that of more physicists that I've been discussing with is that, "No, it doesn't seem to be the case."

Now, what makes me hopeful is that we are creating this. Stuart Russell likes to give this example of a message from an alien civilization, an alien intelligence that says, "We will be arriving in 50 years." Then he poses the question, "What would you do when you prepare for that?" But I think with artificial intelligence it's different. It's not like it's arriving and it's a given and it has a certain form or shape that we cannot do anything about. We are actually creating artificial intelligence. I think that's what makes me hopeful that if we actually research it right, that if we think hard about what we want and we work hard at getting our own act together, first of all, and also on making sure that this stays and is beneficial, we have a good chance to succeed.

Now, there'll be a lot of challenges in between from very near-term issues like Lucas was mentioning, for example, autonomous weapons, weaponizing our AI and giving it the right to harm and kill humans, to other issues regarding income inequality enhanced by technological development and so on, to down the road how do we make sure that autonomous AI systems actually adopt our goals. But I do feel that it is important to try and it's important to work at it. That's what I'm trying to do and that's what I hope others will join us in doing.

Ariel: All right. Well, thank you both again for joining us today.

Lucas: Thanks for having us.

Meia: Thanks for having us. This was wonderful.

Ariel: If you're interested in learning more about the value alignment landscape that Lucas was talking about, please visit FutureofLife.org/valuealignmentmap. We'll also link to this in the transcript for this podcast. If you enjoyed this podcast, please subscribe, give it a like, and share it on social media. We'll be back again next month with another conversation among experts.

View transcript

Podcast

Related episodes

If you enjoyed this episode, you might also like:

17 July, 2026

Podcast: AI and the Value Alignment Problem with Meia Chita-Tegmark and Lucas Perry

Transcript

Related episodes

Why AI Evaluations Are Broken and How to Fix Them (with David Manheim)

How AI Is Replacing Children’s Ability to Think (with Randi Weingarten)

The Gulf Between AI Progress and Political Understanding (with Dex Hunter-Torricke)

How AI Companions Trap Users Through Addictive Design (with Claire Boine)

Sign up for the Future of Life Institute newsletter