AI Alignment Podcast: On Becoming a Moral Realist with Peter Singer

Are there such things as moral facts? If so, how might we be able to access them? Peter Singer started his career as a preference utilitarian and a moral anti-realist, and then over time became a hedonic utilitarian and a moral realist. How does such a transition occur, and which positions are more defensible? How might objectivism in ethics affect AI alignment? What does this all mean for the future of AI?

On Becoming a Moral Realist with Peter Singer is the sixth podcast in the AI Alignment series, hosted by Lucas Perry. For those of you that are new, this series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with Peter Singer. Peter is a world-renowned moral philosopher known for his work on animal ethics, utilitarianism, global poverty, and altruism. He’s a leading bioethicist, the founder of The Life You Can Save, and currently holds positions at both Princeton University and The University of Melbourne.

Topics discussed in this episode include:

  • Peter’s transition from moral anti-realism to moral realism
  • Why emotivism ultimately fails
  • Parallels between mathematical/logical truth and moral truth
  • Reason’s role in accessing logical spaces, and its limits
  • Why Peter moved from preference utilitarianism to hedonic utilitarianism
  • How objectivity in ethics might affect AI alignment
In this interview we discuss ideas contained in the work of Peter Singer. You can learn more about Peter’s work here and find many of the ideas discussed on this podcast in his work The Point of View of the Universe: Sidgwick and Contemporary EthicsYou can listen to the podcast above or read the transcript below.

Lucas: Hey, everyone, welcome back to the AI Alignment Podcast series. I’m Lucas Perry, and today, we will be speaking with Peter Singer about his transition from being a moral anti-realist to a moral realist. In terms of AI safety and alignment, this episode primarily focuses on issues in moral philosophy.

In general, I have found the space of moral philosophy to be rather neglected in discussions of AI alignment where persons are usually only talking about strategy and technical alignment. If it is unclear at this point, moral philosophy and issues in ethics make up a substantial part of the AI alignment problem and have implications in both strategy and technical thinking.

In terms of technical AI alignment, it has implications in preference aggregation, and it’s methodology, in inverse reinforcement learning, and preference learning techniques in general. It affects how we ought to proceed with inter-theoretic comparisons of value, with idealizing persons or agents in general and what it means to become realized, how we deal with moral uncertainty, and how robust preference learning versus moral reasoning systems should be in AI systems. It has very obvious implications in determining the sort of society we are hoping for right before, during, and right after the creation of AGI.

In terms of strategy, strategy has to be directed at some end and all strategies smuggle in some sort of values or ethics, and it’s just good here to be mindful of what those exactly are.

And with regards to coordination, we need to be clear, on a descriptive account, of different cultures or groups’ values or meta-ethics and understand how to move from the state of all current preferences and ethics onwards given our current meta-ethical views and credences. All in all, this barely scratches the surface, but it’s just a point to illustrate the interdependence going on here.

Hopefully this episode does a little to nudge your moral intuitions around a little bit and impacts how you think about the AI alignment problem. In coming episodes, I’m hoping to pivot into more strategy and technical interviews, so if you have any requests, ideas, or persons you would like to see interviewed, feel free to reach out to me at lucas@futureoflife.org. As usual, if you find this podcast interesting or useful, it’s really a big help if you can help share it on social media or follow us on your preferred listening platform.

As many of you will already know, Peter is a world-renowned moral philosopher known for his work on animal ethics, utilitarianism, global poverty, and altruism. He’s a leading bioethicist, the founder of The Life You Can Save, and currently holds positions at both Princeton University and The University of Melbourne. And so, without further ado, I give you Peter Singer.

Thanks so much for coming on the podcast, Peter. It’s really wonderful to have you here.

Peter: Oh, it’s good to be with you.

Lucas: So just to jump right into this, it would be great if you could just take us through the evolution of your metaethics throughout your career. As I understand, you began giving most of your credence to being an anti-realist and a preference utilitarian, but then over time, it appears that you’ve developed into a hedonic utilitarian and a moral realist. Take us through the evolution of these views and how you developed and arrived at your new ones.

Peter: Okay, well, when I started studying philosophy, which was in the 1960s, I think the dominant view, at least among people who were not religious and didn’t believe that morals were somehow an objective truth handed down by God, was what was then referred to as an emotivist view, that is the idea that moral judgments express our attitudes, particularly, obviously from the name, emotional attitudes, that they’re not statements of fact, they don’t purport to describe anything. Rather, they express attitudes that we have and they encourage others to share those attitudes.

So that was probably the first view that I held, siding with people who were non-religious. It seemed like a fairly obvious option. Then I went to Oxford and I studied with R.M. Hare who was a professor of moral philosophy at Oxford at the time and a well-known figure in the field. His view was also in this general ballpark of non-objectivist or, as we would know say, non-realist theories, non-cognitivist] was another term used for them. They didn’t purport to be about knowledge.

But his view was that when we make a moral judgment, we are prescribing something. So his idea was that moral judgments fall into the general family of imperative judgments. So if I tell you shut the door, that’s an imperative. It doesn’t say anything that’s true or false. And moral judgments were a particular kind of imperative according to Hare, but they had this feature that they had to be universalizable. So by universalizable, Hare meant that if you were to make a moral judgment, your prescription would have to hold in all relevantly similar circumstances. And relevantly similar was defined in such a way that it didn’t depend on who the people were.

So, for example, if I were to prescribe that you should be my slave, the fact that I’m the slave master and you’re the slave isn’t a relevantly similar circumstance. If there’s somebody just like me and somebody just like you, that I happen to occupy your place, then the person who is just like me would also be entitled to be the slave master of me ’cause now I’m in the position of the slave.

Obviously, if you think about moral judgments that way, that does put a constraint on what moral judgments you can accept because you wouldn’t want to be a slave, presumably. So I liked this view better than the straightforwardly emotivist view because it did seem to give more scope for argument. It seemed to say look, there’s some kind of constraint that really, in practice, means we have to take everybody’s interests into account.

And I thought that was a good feature about this, and I drew on that in various kinds of applied contexts where I wanted to make moral arguments. So that was my position, I guess, after I was at Oxford, and for some decades after that, but I was never completely comfortable with it. And the reason I was not completely comfortable with it was that there was always a question you could ask on Hare’s view. Hare said where does this universalizability constraint come from on our moral judgment? And Hare’s answer was well, it’s a feature of moral language. It’s implied in, say, using the terms ought or good or bad or beauty or obligation. It’s implied that the judgments you are making are universalizable in this way.

And that, in itself, was plausible enough, but it was open to the response that well, in that case, I’m just not gonna use moral language. If moral language requires me to make universalizable prescriptions and that means that I can’t do all sorts of things or can’t advocate all sorts of things that I would want to advocate, then I just won’t use moral language to justify my conduct. I’ll use some other kind of language, maybe prudential language, language of furthering my self-interests. And what’s wrong with doing that moreover, and it’s not just that they can do that, but tell me what’s wrong with them doing that?

So this is a kind of a question about why act morally. And on his view, it wasn’t obvious from his view what the answer to that would be, and, in particular, it didn’t seem that there would be any kind of answer about that’s irrational or you’re missing something. It seemed, really, as if it was an open choice that you had whether to use moral language or not.

So as I got further into the problem, as I tried to develop arguments that would show that it was a requirement of reason, not just a requirement of moral language, but a requirement of reason that we universalize our judgements.

And yet, it was obviously a problem in fitting that in to Hare’s framework, which is, I’ve been saying, was a framework within this general non-cognitivist family. And for Hare, the idea that there are objective reasons for action didn’t really make sense. They were just these desires that we had, which led to us making prescriptions and then the constraint that we universalize their prescriptions, but he explicitly talked about the possibility of objective prescriptions and said that that was a kind of nonsense, which I think comes out of the general background of the kind of philosophy that came out of logical positivism and the verificationist idea that things that you couldn’t verify were nonsense or so and so. And that’s why I was pretty uncomfortable with this, but I didn’t really see bright alternatives to it for some time.

And then, I guess, gradually, I was persuaded by a number of philosophers who were respected that Hare was wrong about rejecting the idea of objective truth in morality. I talked to Tom Nagel and probably most significant was the work of Derek Parfit, especially his work On What Matters, volumes one and two, which I saw in advance in draft form. He circulated drafts of his books to lots of people who he thought might give him some useful criticism. And so I saw that many years before it came out, and the arguments did seem, to me, pretty strong, particularly the objections to the kind of view that I’d held, which, by this time, was no longer usually called emotivism, but was called expressivism, but I think it’s basically a similar view, a view in the ballpark.

And so I came to the conclusion that there is a reasonable case for saying that there are objective moral truths and this is not just a matter of our attitudes or of our preferences universalized, but there’s something stronger going on and it’s, in some ways, more like the objectivity of mathematical truths or perhaps of logical truths. It’s not an empirical thing. This is not something you can describe that comes in the world, the natural world of our sense that you can find or prove empirically. It’s rather something that is rationally self-evident, I guess, to people who reflect on it properly and think about it carefully. So that’s how I gradually made the move towards objectivist metaethic.

Lucas: I think here, it would be really helpful if you could thoroughly unpack what your hedonistic utilitarian objectivist meta-ethics actually looks like today, specifically getting into the most compelling arguments that you found in Parfit and in Nagel that led you to this view.

Peter: First off, I think we should be clear that being an objectivist about metaethics is one thing. Being a hedonist rather than a preference utilitarian is a different thing, and I’ll describe … There is some connection between them as I’ll describe in a moment, but I could have easily become an objectivist and remained a preference utilitarian or held some other kind of normative moral view.

Lucas: Right.

Peter: So the metaethic view is separate from that. What were the most compelling arguments here? I think one of the things that had stuck with me for a long time and that had restrained me from moving in this direction was the idea that it’s hard to know what you mean when you say that something is an objective truth outside the natural world. So in terms of saying that things are objectively true in science, the truths of scientific investigation, we can say well, there’s all of this evidence for it. No rational person would refuse to believe this once they were acquainted with all of this evidence. So that’s why we can say that that is objectively true.

But that’s clearly not going to work for truths in ethics, which, assuming of course that we’re not naturalists, that we don’t think this can be deduced from some examination of human nature or the world, I certainly don’t think that and the people that are influential on me, Nagel and Parfit in particular, also didn’t think that.

So the only restraining question was well, what could this really amount to? I had known going back to the intuitionists in the early 20th century, people like W.D. Ross or, earlier, Henry Sidgwick, who was a utilitarian objectivist philosopher, that people made the parallel with mathematical proofs that there are mathematical proofs that we see as true by direct insight into their truths by their self-evidence, but I have been concerned about this. I’d never really done a deep study of philosophy or mathematics, but I’d been concerned about this because I thought there’s a case for saying that mathematical truths are an analytic truths, they’re truths in virtue of the meanings of the terms and virtue of the way we define what we mean by the numbers and by equals or the various other terms that we use in mathematics so that it’s basically just the unpacking of an analytic system.

The philosophers that I respected didn’t think this view had been more popular at the time when I was a student and it had stuck with me for a while, and although it’s not disappeared, I think it’s perhaps not as widely held a view now as it was then. So that plus the arguments that were being made about how do we understand mathematical truths, how do we understand the truths of logical inference. We grasps these as self-evident. We find them undeniable, yet this is, again, a truth that is not part of the empirical world, but it doesn’t just seem that it’s an analytic truth either. It doesn’t just seem that it’s the meanings of the terms. It does seem that we know something when we know the truths of logic or the truths of mathematics.

On this basis, it started to seem like the idea that there are these non-empirical truths in ethics as well might be more plausible than I thought it was before. And I also went back and read Henry Sidgwick who’s a philosopher that I greatly admire and that Parfit also greatly admired, and looked at his arguments about what he saw as, what he called, moral axioms, and that obviously makes the parallel with axioms of mathematics.

I looked at them and it did seem to me difficult to deny, that is, claims, for example, that there’s no reason for preferring one moment of our existence to another in itself. In other words that we shouldn’t discount the future, except for things like uncertainty, but otherwise, the future is just as important as the present, an idea somewhat similar to his universalizability, but somewhat differently stated by Sidgwick that if something is right for someone, then it’s right independently of the identities of the people involved. But for Sidgwick, as I say, that was, again, a truth of reason, not simply an implication of the use of particular moral terms. Thinking about that, that started to seem right to me, too.

And, I guess, finally, Sidgwick’s claim that the interest of one individual are no more important than the interests of another, assuming that the goods involved that can be done to that person, that is the extent of their interests are similar. Sidgwick’s claim was that people were reflecting carefully on these truths can see that they’re true, and I thought about that, and it did seem to me that … It was pretty difficult to deny, not that nobody will deny them, but that they do have a self-evidence about them. That seemed to me to be a better basis for ethics than views that I’d been holding up to that point, the views that so came out of, originally, emotivism and then out of prescriptivism.

It was a reasonable chance that that was right. As you say, I should give it more credence than I have. It’s not that I’m 100% certain that it’s right by any means, but that’s a plausible view that’s worth defending and trying to see what objections people make to it.

Lucas: I think there’s three things here that would be helpful for us to dive in more on. The first thing is, and this isn’t a part of metaethics, which I’m particularly acquainted with, so, potentially, you can help guide us through this part a little bit more. This non-naturalism vs naturalism argument. Your view is, I believe you’re claiming, is a non-naturalist view is you’re claiming that you can not deduce the axioms of ethics or the basis of ethics from a descriptive or empirical account of the universe?

Peter: That’s right. There certainly are still naturalists around. I guess Peter Railton is a well-known, contemporary, philosophical naturalist. Perhaps Frank Jackson, my Australian friend and colleague. And some of the naturalist views have become more complicated than they used to be. I suppose the original idea of naturalism that people might be more familiar with is simply the claim that there is a human nature and that acting in accordance with that human nature is the right thing to do, so you describe human nature and then you draw from that what are the characteristics that we ought to follow.

That, I think, just simply doesn’t work. I think it has its origins in a religious framework in which you think that God has created our nature with particular purposes that we should behave in certain ways. But the naturalists who defend it, going back to Aquinas even, maintain that it’s actually independent of that view.

If you, in fact, you take an evolutionary view of human nature, as I think we should, then our nature is morally neutral. You can’t derive any moral conclusions from what our nature is like. It might be relevant to know what our nature is like in order to know that if you do one thing, that might lead to certain consequences, but it’s quite possible that, for example, our nature is to seek power and to use force to obtain power, that that’s an element of human nature or, on a group level, to go to war in order to have power over others, and yet naturalists wouldn’t wanna say that those are the right things. They would try and give some account as to why how some of that’s a corruption of human nature.

Lucas: Putting aside naturalist accounts that involve human nature, what about a purely descriptive or empirical understanding of the world, which includes, for example, sentient beings and suffering, and suffering is like a substantial and real ontological fact of the universe and the potential of deducing ethics from facts about suffering and what it is like to suffer? Would that not be a form of naturalism?

Peter: I think you have to be very careful about how you formulate this. What you said sounds a little bit like what Sam Harris says in his book, The Moral Landscape, which does seem to be a kind of naturalism because he thinks that you can derive moral conclusions from science, including exactly the kinds of things that you’ve talked about, but I think there’s a gap there, and the gap has to be acknowledged. You can certainly describe suffering and you can describe happiness conversely, but you need to get beyond description if you’re going to have a normative judgment. That is if you’re gonna have a judgment that says what we ought to do or what’s the right thing to do or what’s a good thing to do, there’s a step that’s just being left out.

If somebody says sentient beings can suffer pain or they can be happy, this is what suffer and pain is like, this is what being happy is like; therefore, we ought to promote happiness, which goes back to David Hume who pointed this out that various moral arguments describe the world using is, is, is, this is the case, and then, suddenly, but without any explanation, they say and therefore, we ought. Needs to be explained how you get from this is statement to the ought statements.

Lucas: It seems that reason, whatever reason might be and however you might define that, seems to do a lot of work at the foundation of your moral view because it seems that reason is what leads you towards the self-evident truth of certain foundational ethical axioms. Why might we not be able to pull the same sort of move with a form of naturalistic moral realism like Sam Harris develops by simply stating that given a full descriptive account of the universe and given first person accounts of suffering and what suffering is like, that it is self-evidently true that built into the nature of that sort of property or part of the universe is that it ought to be diminished?

Peter: Well, if you’re saying that … There is a fine line, maybe this is what you’re suggesting, between saying from the description, we can deduce what we ought to do and between saying when we reflect on what suffering is and when we reflect on what happiness is, we can see that it is self-evident that we ought to promote happiness and we ought to reduce suffering. So I regard that as a non-naturalist position, but you’re right that the two come quite close together.

In fact, this is one of the interesting features of volume three of Parfit’s On What Matters, which was only published posthumously, but was completed before he died, and in that, he responds to essays that are in a book that I edited called Does Anything Really Matter. The original idea was that he would respond in that volume, but, as often happened with Parfit, he wrote responses as such length that it needed to be a separate volume. It would’ve made the work too bulky to put them together, but Peter Railton had an essay in Does Anything Really Matter, and Parfit responded to it, and then he invited Railton to respond to his response, and, essentially, they are saying that yeah, their views have become closer anyway, there’s been a convergence, which is pretty unusual in philosophy because philosophers tend to emphasize the differences between their views.

Between what Parfit calls his non-natural objectivist view and between Railton’s naturalist view, because Railton’s is a more sophisticated naturalist view, the line starts to become a little thin, I agree. But, to me, the crucial thing is that you’re not just saying here’s this description; therefore, we ought to do this. But you’re saying if we understand what we’re talking about here, we can have as an intuition of self-evidence, the proposition that it’s good to promote this or it’s good to try to prevent this. So that’s the moral proposition, that it is good to do this. And that’s the proposition that you have to take some other step. You can say it’s self-evident, but you have to take some other step from simply saying this is what suffering is like.

Lucas: Just to sort of capture and understand your view a bit more here, and going back to, I think, mathematics and reason and what reason means to you and how it operates the foundation of your ethics, I think that a lot of people will sort of get lost or potentially feel it is maybe an arbitrary or cheap move to …

When thinking about the foundations of mathematics, there are foundational axioms, which is self-evidently true, which no one will deny, and then translating that move into the foundations of ethics into determining what we ought to do, it seems like there would be a lot of peole being lost there, there would be a lot of foundational disagreement there. When is it permissible or okay or rational to make that sort of move? What does it mean to say that these really foundational parts of ethics are self-evidently true? How is not the case that that’s simply an illusion or simply a byproduct of evolution that we’re confused that these certain fictions that we’ve evolved are self-evidently true?

Peter: Firstly, let me say, as I’ve mentioned before, I don’t claim that we can be 100% certain about moral truths, but I do think that it’s a plausible view. One reason why it relates to, you just mentioned, being a product of evolution, one reason why it relates to that, and this is something that I argued with my co-author Katarzyna de Lazari-Radek in the 2014 book we wrote called The Point of View of the Universe, which is, in fact, a phrase form Sidgwick, and that argument is that there are a number of moral judgments that we make, there are many moral judgments that we make that we know have evolutionary origins, so lots of things that we think of as wrong, originated because they would not have helped us to survive or they would not have helped a small tribal group to survive to allow certain kinds of conduct. And some of those, we might wanna reject today.

We might think, for example, we have an instinctive repugnance of incest, but Jonathon Hyde has shown that even if you describe a case where adult brothers and sisters who choose to have sex and nothing bad happens as a result of that, their relationship remains as strong as ever, and they have fun, and that’s the end of it, people still say oh, somehow that’s wrong. They try to make up reasons why it’s wrong. That, I think, is an example of an evolved impulse, which, perhaps, is no longer really apposite because we have effective contraception, and so what are the evolutionary reasons why we might want to avoid incest are not necessarily there.

But in a case of the kinds of things that I’m talking about and that Sidgwick is talking about, like the idea that everyone’s good is of equal significance, they have perceived why we would’ve evolved to have bad attitude because, in fact, it seems harmful to our prospects of survival and reproduction to give equal weight to the interest of complete strangers.

The fact that people do think this, and if you look at a whole lot of different independent, historical, ethical traditions in different cultures and different parts of the world at different times, you do find many thinkers who converge on something like this idea in various formulations. So why do they converge on this given that it doesn’t seem to have that evolutionary justification or explanation as to why it would’ve evolved?

I think that suggests that it may be a truth of reason and, of course, you may then say well, but reason has also evolved, and indeed it has, but I think that reason may be a little different in that we evolved a capacity to reason various specific problem solving needs, helped us to survive in lots of circumstances. But it may then enable us to see things that have no survival value, just as no doubt simple arithmetic has a survival value, but understanding the truths of higher mathematics don’t really have a survival value, so maybe similarly in ethics, there are some of these more abstract universal truths that don’t have a survival value, but which, nevertheless, the best explanation for why many people seem to come to these views is that they’re truths of reason, and once we’re capable of reasoning, we’re capable of understanding these truths.

Lucas: Let’s start off at reason and reason alone. When moving from reason and thinking, I guess, alongside here about mathematics for example, how is one moving specifically from reason to moral realism and what is the metaphysics of this kind of moral realism in a naturalistic universe without anything supernatural?

Peter: I don’t think that it needs to have a very heavyweight metaphysical presence in the universe. Parfit actually avoided the term realism in describing his view. He called it non-naturalistic normative objectivism because he thought that realism carried this idea that it was part of the furniture of the universe, as philosophers say, that the universe consists of the various material objects, but in addition to that, it consists of moral truths is if they’re somehow sort of floating there out in space, and that’s not the right way to think about it.

I’d say, rather, the right way to think about it is as, you know, we do with logical and mathematical truths that once you have been capable of a certain kind of thought, they will move towards these truths. They have the potential and capacity for thinking along these lines. One of the claims that I would make a consequence of my acceptance of objectivism in ethics as the rationally based objectivism is that the morality that we humans have developed on Earth in this, anyway, at this more abstract, universal level is something that aliens from another galaxy could also have achieved if they had similar capacities of thought or maybe greater capacities of thought. It’s always a possible logical space, you could say, or a rational space that is there that beings may be able to discover once they develop those capacities.

You can see mathematics in that way, too. It’s one of a number of possible ways of seeing mathematics and of seeing logic, but they’re just timeless things that, in some way, truths or laws, if you like, but they don’t exist in the sense in which the physical universe exists.

Lucas: I think that’s really a very helpful way of putting it. So the claim here is that through reason, one can develop the axioms of mathematics and then eventually develop quantum physics and other things. And similarly, when reason is applied to thinking about what one ought to do or when thinking about the status of sentient creatures that one is applying logic and reason to this rational space and that this rational space has truths in the same way that mathematics does?

Peter: Yes, that’s right. It has at least perhaps only a very small number, Sidgwick came up with three axioms that are perhaps only a very small number of truths and fairly abstract truths, but that they are truths. That’s the important aspect. That they’re not just particular attitudes, which beings who evolved as homo sapiens have all are likely to understand and accept, but beings who evolved in a different galaxy in a quite different way would not accept. My claim is that if they are also capable of reasoning, if evolution had again produced rational beings, they would be able to see the truths in the same way as we can.

Lucas: So spaces of rational thought and of logic, which can or can not be explored, seems very conceptual queer to me, such that I don’t even really know how to think about it. I think that one would worry that one is applying reason, whatever reason might be, to a fictional space. I mean you’re discussing earlier that some people believe mathematics to be simply the formalization of what is analytically true about the terms and judgments and the axioms and then it’s just a systematization of that and an unpacking of it from beginning into infinity. And so, I guess, it’s unclear to me how one can discern spaces of rational inquiry which are real, from ones which are anti-real or which are fictitious. Does that make sense?

Peter: It’s a problem. I’m not denying that there is something mysterious, I think maybe my former professor, R.M. Hare, would have said queer … No, it was John Mackie, actually, John Mackie was also at Oxford when I was there, who said these must be very queer things if there are some objective moral truths. I’m not denying that it’s something that, in a way, would be much simpler if we could explain everything in terms of empirical examination of the natural world and say there’s only that plus there are formal systems. There are analytic systems.

But I’m not persuaded that that’s a satisfactory explanation of mathematics or logic either, so if those who are convinced that this is a satisfactory way of explaining logic and mathematics, may well think that then they don’t need this explanation of ethics either, but it is a matter of if we need to appeal to something outside the natural realm to understand some of the other things about the way we reason, then perhaps ethics is another candidate for this.

Lucas: So just drawing parallels again here with mathematics ’cause I think it’s the most helpful. Mathematics is incredible for helping us to describe and predict the universe. The president of the Future of Life Institute, Max Tegmark, develops an idea of potential mathematical Platonism or realism where the universe can be understood primarily as, and sort of ontologically, a mathematical object within, potentially, a multiverse because as we look into the properties and features of quarks and the building blocks of the universe, all we find is more mathematical properties and mathematical relationships.

So within the philosophy of math, there’s certainly, it seems, open questions about what math is and what the relation of mathematics is to the fundamental metaphysics and ontology of the universe and potential multiverse. So in terms of ethics, what information or insight or anything do you think that we’re missing could further inform our view that there potentially is objective morality or whatever that means or inform us that there is a space of moral truths which can be arrived at by non-anthropocentric minds, like aliens minds you said could also arrive at the moral truths as they could also arrive at mathematical truths.

Peter: So what further insight would show that this was correct, other, presumably, than the arrival of aliens who start swapping mathematical theorems with us?

Lucas: And have arrived at the same moral views. For example, if they show up and they’re like hey, we’re hedonistic consequentialists and we’re really motivated to-

Peter: I’m not saying they’d necessarily be hedonistic consequentialists, but they would-

Lucas: I think they should be.

Peter: That’s a different question, right?

Lucas: Yeah, yeah, yeah.

Peter: We haven’t really discussed steps to get there yet, so I think they’re separate questions. My idea is that they would be able to see that if we had similar interests to the ones that they did, then those interests ought to get similar weight, that they shouldn’t ignore our interests just because we’re not members of whatever civilization or species they are. I would hope that if they are rationally sophisticated, they would at least be able to see that argument, right?

Some of them, just as with us, might see the argument and then say yes, but I love the tastes of your flesh so much I’m gonna kill you and eat you anyway. So, like us, they may not be purely rational beings. We’re obviously not purely rational beings. But if they can get here and contact us somehow, they should be sufficiently rational to be able to see the point of the moral view that I’m describing.

But that wasn’t a very serious suggestion about waiting for the aliens to arrive, and I’m not sure that I can give you much of an answer to say what further insights are relevant here. Maybe it’s interesting to try and look at this cross-culturally, as I was saying, and to examine the way that great thinkers of different cultures and different eras have converged on something like this idea despite the fact that it seems unlikely to have been directly produced by evolutionary beings in the same way that our other more emotionally driven moral reactions are.

Peter: I don’t know that the argument can go any further, and it’s not completely conclusive, but I think it remains plausible. You might say well, that’s a stalemate. Here are some reasons for thinking morality’s objective and other reasons for rejecting that, and that’s possible. That happens in philosophy. We get down to bedrock disagreements and it’s hard to move people with different views.

Lucas: What is reason? One could also view reason as some human-centric bundle of both logic and intuitions, and one can be mindful that the intuitions, which are sort of bundled with this logic, are almost arbitrary consequences of evolution. So what is reason fundamentally and what does it mean that other reasonable agents could explore spaces of math and morality in similar ways?

Peter: Well, I would argue that there are common principles that don’t depend on our specific human nature and don’t depend on the path of our evolution. I accept, to the extent, that because the path of our evolution has given us the capacity to solve various problems through thought and that that is what our reason amounts to and therefore, we have insight into these truths that we would not have if we did not have that capacity. From this kind of reasoning, you can think of as something that goes beyond specific problem solving skills to insights into laws of logic, laws of mathematics, and laws of morality as well.

Lucas: When we’re talking about axiomatic parts of mathematics and logics and, potentially, ethics here as you were claiming with this moral realism, how is it that reason allows us to arrive at the correct axioms in these rational spaces?

Peter: We developed the ability when we’re presented with these things to consider whether we can deny them or not, whether they are truly self-evident. We can reflect on them, we can talk to others about them, we can consider biases that we might have that might explain why we believe them and see where there are any such biases, and once we’ve done all that, we’re left with the insight that some things we can’t deny.

Lucas: I guess I’m just sort of poking at this idea of self-evidence here, which is doing a lot of work in the moral realism. Whether or not something is self-evident, at least to me, it seems like a feeling, like I just look at the thing and I’m like clearly that’s true, and if I get a little bit meta, I ask okay, why is that I think that this thing is obviously true? Well, I don’t really know, it just seems self-evidently true. It just seems so and this, potentially, just seems to be a consequence of evolution and of being imbued with whatever reason is. So I don’t know if I can always trust my intuitions about things being self-evidently true. I’m not sure how to navigate my intuitions and views of what is self-evident in order to come upon what is true.

Peter: As I said, it’s possible that we’re mistaken, that I’m mistaken in these particular instances. I can’t exclude that possibility, but it seems to me that there’s hypotheses that we hold these views because they are self-evident, and look for evolutionary explanations and, as I’ve said, I’ve not really found them, so that’s as far as I can go with that.

Lucas: Just moving along here a little bit, and I’m becoming increasingly mindful of your time, would you like to cover briefly this sort of shift that you had from preference utilitarianism to hedonistic utilitarianism?

Peter: So, again, let’s go back to my autobiographical story. For Hare, the only basis for making moral judgments was to start from our preferences and then to universalize them. There could be no arguments about something else being intrinsically good or bad, whether it was happiness or whether it was justice or freedom or whatever because that would be to import some kind of objective claims into this debate that just didn’t have a place in his framework, so all I could do was take my preferences and prescribe them universally, and, as I said, that involved putting myself in the position of the others affected by my action and asking whether I could still accept it.

When you do that, and if you, let’s say your action affects many people, not just you and one other, what you’re really doing is you’re trying to sum up how this would be from the point of view of every one of these people. So if I put myself in A’s position, would I be able to accept this? But then I’ve gotta put myself in B’s position as well, and C, and D, and so on. And to say can I accept this prescription universalized is to say if I were living the lives of all of those people, would I want this to be done or not? And that’s a kind of, as they say, a summing of the extent to which doing this satisfies everyone’s preferences net on balance after deducting, of course, the way in which is thwarts or frustrates or is contrary to their preferences.

So this seem to be the only way in which you could go further with Hare’s views as they eventually worked it out and changed it a little bit over the years, but in his later formulations of it. So it was a kind of a preference utilitarianism that it led to, and I was reasonably happy with that, and I accepted the idea that this meant that what we ought to be doing is to maximize the satisfaction of preferences and avoid thwarting them.

And it gives you, in many cases, of course, somewhat similar conclusions to what you would say if what we wanna do is maximize happiness an minimize suffering or misery because for most people, happiness is something that they very much desire and misery is something that they don’t want. Some people might have different preferences that are not related to that, but for most people, they will probably come down some way or other to how it relates to their well-being, their interests.

There are certainly objections to this, and some of the objections relate to preferences that people have when they’re not fully informed about things. And Hare’s view was that, in fact, the preferences that we should universalize are the preferences people should have when they are fully informed and when they’re thinking calmly, they’re not, let’s say, angry with somebody and therefore they have a strong preference to hit him in the face, even though this will be bad for them and bad for him.

So the preference view sort of then took this further step of saying it’s the preferences that you would have if you were well informed and rational and calm, and that seemed to solve some problems with preference utilitarianism, but it gave rise to other problems. One of the problems were well, does this mean that if somebody is misinformed in a way that you can be pretty confident they’re never gonna be correctly informed, you should still do what they would want if they were correctly informed.

An example of this might be someone who’s a very firm religious believer and has been all their life, and let’s say one of their religious beliefs is that having sex outside marriage is wrong because God has forbid it, let’s say, it’s contrary to the commandments or whatever, but given that, let’s say, let’s just assume, there is no God, therefore, a priori there’s no commandments that God made against sex outside marriage, and given that if they didn’t believe in God, they would be happy to have sex outside marriage, and this would make them happier, and would make their partner happy as well, should I somehow try to wangle things so that they do have sex outside marriage even though, as they are now, they prefer not to.

And that seems a bit of a puzzle, really. Seems highly paternalistic to ignore their preferences in the base of their knowledge even though you’re convinced that they’re knowledge is false. So there are puzzles and paradoxes like that. And then there was another argument that does actually, again, come out of Sidgwick, although I didn’t find it in Sidgwick until I read it in other philosopher’s later.

Again, I think Peter Railton’s is one who uses his. and that is that if you’re really asking what people would do if they’re rational and fully informed, you have to make judgments about what is a rational and fully informed view in this situation. And that might involve even the views that we’ve just been discussing, that if you were rational, you would know what the objective truth was and you would want to do it. So, at that level, a preference view actually seems to amount to a different view, an objectivist view, that you would hold where you would have to actually know what the things that were good.

So, as I say, it had a number of internal problems, even just if you assume the meta-ethic that I was taking from Hare originally. But if then, as happened with me, you become convinced that there can be objective moral truths. This was, in some ways, opened up the field to other possible ideas as to what was intrinsically good because now you could argue that something was intrinsically good even if it was not something that people preferred, and in that light, I went back to reading some of the classical utilitarians, again, particularly, Sidgwick and his arguments for why happiness rather than the satisfaction of desires is the ultimate value, something that is of intrinsic value, and it did seem to overcome these problems with preference utilitarianism that had been troubling me.

It had certainly had some paradoxes of its own, some things that it seemed not to handle as well, but after thinking about it, again, I decided that it was more likely than not that a hedonistic view was the right view. I wouldn’t put it stronger than that. I still think preference utilitarianism has some things to be said for it and they’re also, of course, views that say yes, happiness is intrinsically good and suffering is intrinsically bad, but they’re not the only things that are intrinsically good or bad, things like justice or freedom or whatever. There’s various other candidates that people have put forward. Many of them, in fact, are being objectively good or bad. So there are also possibilities.

Lucas: When you mentioned that happiness or certain sorts of conscious states of sentient creatures can be seen as intrinsically good or valuable, keeping in mind the moral realism that you hold, what is the metaphysical status of experiences in the universe given this view? Is it that happiness is good based off of the application of reason and the rational space of ethics? Unpack the ontology of happiness and the metaphysics here a bit.

Peter: Well, of course it doesn’t change what happiness is. That’s to say that it’s of intrinsic value, but that is the claim that I’m making. That the world is a better place if it has more happiness in it and less suffering in it. That’s judgment that I’m making about the state of the universe. Obviously, there have to be beings who can be happy or can be miserable, and that requires a conscious mind, but the judgment that the universe if better with more happiness and less suffering is mind independent. I think … Let’s imagine that there were beings that could feel pain and pleasure, but could not make any judgments about anything of value. They’re like some non-humans animals, I guess. It would still be the case that the universe was better if those non-human animals suffered less and had more pleasure.

Lucas: Right. Because it would be sort of intrinsic quality or property to the experience that it be valuable or disvaluable. So yeah, thanks so much for your time, Peter. It’s really been wonderful and informative. If people would like to follow you or check you out somewhere, where can they go ahead and do that?

Peter: I have a website, which actually I’m in the process of reconstructing a bit, but it’s Petersinger.info. There’s a Wikipedia page. They wanna look at things that I’m involved in, they can look at thelifeyoucansave.org, which is the nonprofit organization that I’ve founded that is recommending perfective charities that people can donate to. That probably gives people a bit of an idea. There’s books that I’ve written that are discussing these things. I probably mentioned The Point of View of the Universe, which goes into the things we’ve discussed today, probably more thoroughly than anything else. For people who don’t wanna read a big book, I’ve also got Oxford University Press’ very short introduction series. The book on utilitarianism is, again, co-authored by the same co-author as The Point of View of the Universe, Katarzyna de Lazari-Radek and myself, and that’s just a hundred page version of some of these arguments we’ve been discussing.

Lucas: Wonderful. Well, thanks again, Peter. We haven’t ever met in person, but hopefully I’ll catch you around the Effective Altruism conference track sometime soon.

Peter: Okay, hope so.

Lucas: Alright, thanks so much, Peter.

Hey, it’s post-podcast Lucas here and just wanted to chime in with some of my thoughts and tie this all into AI thinking. For me, the most consequential aspect of moral thought in this space and moral philosophy, generally, is how much disagreement there is between people who’ve thought long and hard about this issue and what an enormous part of AI alignment this makes up, and the effects, different moral and meta-ethical views have on preferred AI alignment methodology.

Current general estimates by AI researchers, but human level AI on the decade to century long timescale with about a 50% probability by mid-century with that obviously increasing over time, and it’s quite obvious that moral philosophy ethics and issues of value and meaning will not be solved on that timescale. So if we assume at the worst case success story where technical alignment and coordination and strategy issues will continue in their standard, rather morally messy way with how we currently unreflectively deal with things, where moral information isn’t taken very seriously, then I’m really hoping the technical alignment and coordination succeed well enough for us to create a very low level aligned system, that we’re able to pull the brakes on and work hard on issues of value, ethics, and meaning. The end towards which that AGI will be aimed. Otherwise, it seems very clear that given all of this moral uncertainty that is shared, we risk value drift or catastrophically unoptimal or even negative futures.

Turning into Peter’s views that we discussed here today, if axioms of morality are accessible through reason alone, as the axioms of mathematics appear to be, then we ought to consider the implications here for how we want to progress with AI systems and AI alignment more generally.

If we take human beings to be agents of limited or semi-rationality, then we could expect that some of us, or some fraction of us, have gained access to what might potentially be core axioms of the logical space of morality. When AI systems are trained on human data in order to infer and learn human preferences, given Peter’s view, this could be seen as a way of learning the moral thinking of imperfectly rational beings. This, or any empirical investigation, given Peter’s views, would not be able to arrive at any clear, moral truth, rather it would find areas where semi-rational beings, like ourselves, generally tend to converge in this space.

This would be useful or potentially passable up until AGI, but if such a system is to be fully autonomous and safe, then a more robust form of alignment is necessary. If the AGI we create is one day rational, putting aside whatever reason might be and how it gives rational creatures access to self-evident truths and rational spaces, then if AGI is a fully rational agent, then it, perhaps, would arrive at self-evident truths of mathematics and logic, and even morality, just as aliens on another planet might if they’re fully rational as is Peter’s view. If so, this would potentially be evidence of this view being true and can also reflect here that AGI from this point of using reason to have insight into the core truths of logical spaces could reason much better and more impartially than any human in order to fully explore and realize universal truths of morality.

At this point, we would essentially have a perfect moral reasoner on our hands with access to timeless universal truths. Now the question would be could we trust it and what would ever be sufficient reasoning or explanation given to humans by this moral oracle that would satisfy and satiate us of our appetites and desires to know moral truth and to be sure that we have arrived at moral truth?

It’s above my pay grade what rationality or reason actually is and might be prior to certain logical and mathematical axioms and how such a truth seeking meta-awareness can grasps these truths as self-evident or whether the self-evidence of the truths of mathematics and logic are programmed into us by evolution trying and failing over millions of year. But maybe that’s an issue for another time. Regardless, we’re doing philosophy, computer science, and poly-sci on a deadline, so let’s keep working on getting it right.

If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

[end of recorded material]